Compare commits

...

678 Commits

Author SHA1 Message Date
Ryan Dick
f6045682c0 Fix bug with partial offload of model buffers. 2024-12-10 22:19:17 +00:00
Ryan Dick
84a75ddb72 Fix bug in ModelCache that was causing it to offload more models from VRAM than necessary. 2024-12-10 20:38:37 +00:00
Ryan Dick
a9fb1c82a0 Fix handling of torch.nn.Module buffers in CachedModelWithPartialLoad. 2024-12-10 19:38:04 +00:00
Ryan Dick
cc7391e630 Enable LoRAPatcher.apply_smart_lora_patches(...) throughout the stack. 2024-12-10 17:27:33 +00:00
Ryan Dick
62d595f695 (minor) Rename num_layers -> num_loras in unit tests. 2024-12-10 16:41:52 +00:00
Ryan Dick
5e2080266e Add test_apply_smart_lora_patches_to_partially_loaded_model(...). 2024-12-10 16:38:48 +00:00
Ryan Dick
ed7bb7ea3d Add LoRAPatcher.smart_apply_lora_patches() 2024-12-10 16:26:34 +00:00
Ryan Dick
62407f7c6b Refactor LoRAPatcher slightly in preparation for a 'smart' patcher. 2024-12-10 15:36:36 +00:00
Ryan Dick
80128e1e14 Fix LoRAPatcher.apply_lora_wrapper_patches(...) 2024-12-10 03:10:23 +00:00
Ryan Dick
4c84d39e7d Finish consolidating LoRA sidecar wrapper implementations. 2024-12-10 02:54:32 +00:00
Ryan Dick
0c4a368555 Begin to consolidate the LoRA sidecar and LoRA layer wrapper implementations. 2024-12-10 01:16:01 +00:00
Ryan Dick
55dc762a91 Fix bias handling in LoRAModuleWrapper and add unit test that checks that all LoRA patching methods produce the same outputs. 2024-12-09 16:59:37 +00:00
Ryan Dick
d825d3856e Add LoRA wrapper patching to LoRAPatcher. 2024-12-09 16:35:23 +00:00
Ryan Dick
d94733f55a Add LoRA wrapper layer. 2024-12-09 15:17:50 +00:00
Ryan Dick
2144d21f80 Maintain a read-only CPU state dict copy in CachedModelWithPartialLoad. 2024-12-06 21:49:24 +00:00
Ryan Dick
958efa19d7 Memoize frequently accessed values in CachedModelWithPartialLoad. 2024-12-06 20:39:05 +00:00
Ryan Dick
11af57def3 More ModelCache logging improvements. 2024-12-06 18:38:36 +00:00
Ryan Dick
8b70a5b9bd Cleanup of ModelCache and added a bunch of debug logging. 2024-12-06 17:39:16 +00:00
Ryan Dick
5d9fdcd78d Fix a couple of bugs to get basic vanilla partial model load working with the model cache. 2024-12-06 00:50:58 +00:00
Ryan Dick
c7b84cf012 WIP - first pass at overhauling ModelCache to work with partial loads. 2024-12-05 23:03:40 +00:00
Ryan Dick
8e409e3436 Delete experimental torch device autocasting solutions and clean up TorchFunctionAutocastDeviceContext. 2024-12-05 19:36:44 +00:00
Ryan Dick
987393853c Create CachedModelOnlyFullLoad class. 2024-12-05 18:43:50 +00:00
Ryan Dick
91c5af1b95 Move CachedModelWithPartialLoad into the main model_cache/ directory. 2024-12-05 18:21:26 +00:00
Ryan Dick
5c67dd507a Get rid of ModelLocker. It was an unnecessary layer of indirection. 2024-12-05 16:59:40 +00:00
Ryan Dick
2ff928ec17 Move lock(...) and unlock(...) logic from ModelLocker to the ModelCache and make a bunch of ModelCache properties/methods private. 2024-12-05 16:11:40 +00:00
Ryan Dick
4327bbe77e Pull get_model_cache_key(...) out of ModelCache. The ModelCache should not be concerned with implementation details like the submodel_type. 2024-12-04 22:53:57 +00:00
Ryan Dick
ad1c0d37ef Rename model_cache_default.py -> model_cache.py. 2024-12-04 22:45:30 +00:00
Ryan Dick
9708d87946 Remove ModelCacheBase. 2024-12-04 22:05:34 +00:00
Ryan Dick
3ad44f7850 Move CacheStats to its own file. 2024-12-04 21:56:50 +00:00
Ryan Dick
9a482981b2 Move CacheRecord out to its own file. 2024-12-04 21:53:19 +00:00
Ryan Dick
6b02362b12 Rip out ModelLockerBase. 2024-12-04 21:47:11 +00:00
Ryan Dick
8fec4ec91c Tidy up CachedModel and improve unit test coverage. 2024-12-04 20:28:31 +00:00
Ryan Dick
693e421970 Alternative implementation with torch.nn.Linear module streaming. 2024-12-03 22:32:15 +00:00
Ryan Dick
dc14104bc8 Add TorchFunctionAutocastContext 2024-12-03 19:26:46 +00:00
Ryan Dick
f286a1d1f3 Remove debug logs. 2024-12-03 18:04:55 +00:00
Ryan Dick
9dc86b2b71 Add basic CachedModel class with features for partial load/unload. 2024-12-03 17:12:22 +00:00
Ryan Dick
2cab689b79 Naive TorchAutocastContext. 2024-12-03 14:55:43 +00:00
psychedelicious
f8c7adddd0 feat(ui): add vietnamese to language picker
Closes #7384
2024-12-02 08:12:14 -05:00
psychedelicious
17da1d92e9 fix(ui): remove "adding to" text on Invoke tooltip on Workflows/Upscaling tabs
The "adding to" text indicates if images are going to the gallery or staging area. This info is relevant only to the canvas tab, but was displayed on Upscaling and Workflows tabs. Removed it from those tabs.
2024-12-02 08:08:16 -05:00
psychedelicious
1cc57a4854 chore(ui): lint 2024-12-02 07:59:12 -05:00
psychedelicious
3993fae331 fix(ui): unable to invoke w/ empty inpaint mask or raster layer
Removed the empty state checks for these layer types - it's always OK to invoke when they are empty.
2024-12-02 07:59:12 -05:00
psychedelicious
1446526d55 tidy(ui): translation keys for canvas layer warnings 2024-12-02 07:59:12 -05:00
psychedelicious
62c024e725 feat(ui): add gallery image ctx menu items to create ref image from image
Appears these actions disappeared at some point. Restoring them.
2024-12-02 07:52:58 -05:00
psychedelicious
1e92bb4e94 fix(ui): ref image defaults to prev ref image's image selection
A redux selector is used to get the "default" IP Adapter. The selector uses the model list query result to select an IP Adapter model to be preset by default.

The selector is memoized, so if we mutate the returned default IP Adapter state, it mutates the result of the selector for all consumers.

For example, the `image` property of the default IP Adapter selector result is `null`. When we set the `image` property of the selector result while creating an IP Adapter, this does not trigger the selector to recompute its result. We end up setting the image for the selector result directly, and all other consumers now have that same image set.

Solution - we need to clone the selector result everywhere it is used. This was missed in a few spots, causing the issue.
2024-12-02 07:48:39 -05:00
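A minimal TypeScript sketch of the memoized-selector pitfall described in the commit above; the names (`selectDefaultIPAdapter`, `IPAdapterConfig`) and state shapes are stand-ins, not the actual InvokeAI code.

```ts
import { createSelector } from 'reselect';

type IPAdapterConfig = { model: string | null; image: string | null };
type RootState = { models: { ipAdapterModels: string[] } };

// Memoized selector: every consumer receives the *same* cached object
// until the model list changes.
const selectDefaultIPAdapter = createSelector(
  (state: RootState) => state.models.ipAdapterModels,
  (models): IPAdapterConfig => ({ model: models[0] ?? null, image: null })
);

// Buggy: mutating the memoized result leaks `image` to every other consumer.
const addIPAdapterBuggy = (state: RootState, image: string): IPAdapterConfig => {
  const ipAdapter = selectDefaultIPAdapter(state);
  ipAdapter.image = image; // mutates the cached selector result
  return ipAdapter;
};

// Fixed: clone the selector result before modifying it.
const addIPAdapterFixed = (state: RootState, image: string): IPAdapterConfig => {
  const ipAdapter = { ...selectDefaultIPAdapter(state) };
  ipAdapter.image = image; // only the local copy changes
  return ipAdapter;
};
```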
psychedelicious
db6398fdf6 feat(ui): less confusing empty state for rg ref images
It was easy to misunderstand the empty state for a regional guidance reference image. There was no label, so it seemed like it was the whole region that was empty.

This small change adds the "Reference Image" heading to the empty state, so it's clear that the empty state messaging refers to this reference image, not the whole regional guidance layer.
2024-12-02 07:46:10 -05:00
Riccardo Giovanetti
ebd73a2ac2 translationBot(ui): update translation (Italian)
Currently translated at 98.7% (1622 of 1643 strings)

Co-authored-by: Riccardo Giovanetti <riccardo.giovanetti@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
Translation: InvokeAI/Web UI
2024-12-02 02:13:51 -08:00
Hosted Weblate
8ee95cab00 translationBot(ui): update translation files
Updated by "Cleanup translation files" hook in Weblate.

Co-authored-by: Hosted Weblate <hosted@weblate.org>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/
Translation: InvokeAI/Web UI
2024-12-02 02:13:51 -08:00
Linos
d1184201a8 translationBot(ui): update translation (Vietnamese)
Currently translated at 100.0% (1643 of 1643 strings)

translationBot(ui): update translation (Vietnamese)

Currently translated at 100.0% (1638 of 1638 strings)

Co-authored-by: Linos <linos.coding@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/vi/
Translation: InvokeAI/Web UI
2024-12-02 02:13:51 -08:00
Nik Nikovsky
5887891654 translationBot(ui): update translation (Polish)
Currently translated at 4.9% (81 of 1638 strings)

Co-authored-by: Nik Nikovsky <zejdzztegomaila@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/pl/
Translation: InvokeAI/Web UI
2024-12-02 02:13:51 -08:00
Riku
765ca4e004 translationBot(ui): update translation (German)
Currently translated at 69.7% (1142 of 1638 strings)

Co-authored-by: Riku <riku.block@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/de/
Translation: InvokeAI/Web UI
2024-12-02 02:13:51 -08:00
Riku
159b00a490 fix(app): adjust session queue api type 2024-12-01 20:06:05 -08:00
Riku
3fbf6f2d2a chore(ui): update typegen schema 2024-12-01 19:56:09 -08:00
Riku
931fca7cd1 fix(ui): call cancel instead of clear queue 2024-12-01 19:53:12 -08:00
Riku
db84a3a5d4 refactor(ui): move clear queue hook to separate file 2024-12-01 19:42:25 -08:00
psychedelicious
ca8313e805 feat(ui): add new layer from image menu items for staging area
The layers are disabled when created so as to not interfere with the canvas state.
2024-12-01 19:37:49 -08:00
psychedelicious
df849035ee feat(ui): allow setting isEnabled, isLocked and name in createNewCanvasEntityFromImage util 2024-12-01 19:37:49 -08:00
psychedelicious
8d97fe69ca feat(ui): use imageDTOToFile in staging area save to gallery button 2024-12-01 19:37:49 -08:00
psychedelicious
9044e53a9b feat(ui): add imageDTOToFile util 2024-12-01 19:37:49 -08:00
Jonathan
6012b0f912 Update flux_text_encoder.py
Updated version number for FLUX Text Encoding.
2024-11-30 08:29:21 -05:00
Jonathan
bb0ed5dc8a Update flux_denoise.py
Updated node version for FLUX Denoise.
2024-11-30 08:29:21 -05:00
Ryan Dick
021552fd81 Avoid unnecessary dtype conversions with rope encodings. 2024-11-29 12:32:50 -05:00
Ryan Dick
be73dbba92 Use view() instead of rearrange() for better performance. 2024-11-29 12:32:50 -05:00
Ryan Dick
db9c0cad7c Replace custom RMSNorm implementation with torch.nn.functional.rms_norm(...) for improved speed. 2024-11-29 12:32:50 -05:00
Ryan Dick
54b7f9a063 FLUX Regional Prompting (#7388)
## Summary

This PR adds support for regional prompting with FLUX.

### Example 1
Global prompt: `An architecture rendering of the reception area of a
corporate office with modern decor.`
<img width="1386" alt="image"
src="https://github.com/user-attachments/assets/c8169bdb-49a9-44bc-bd9e-58d98e09094b">

![image](https://github.com/user-attachments/assets/4a426be9-9d7a-4527-b27c-2d2514ee73fe)

## QA Instructions

- [x] Test that there is no slowdown in the base case with a single
global prompt.
- [x] Test image fully covered by regional masks.
- [x] Test image covered by region masks with small gaps.
- [x] Test region masks with large unmasked ‘background’ regions.
- [x] Test region masks with significant overlap.
- [x] Test multiple global prompts.
- [x] Test no global prompt.
- [x] Test regional negative prompts (It runs... but results are not
great. Needs more tuning to be useful.)
- Test compatibility with:
    - [x] ControlNet
    - [x] LoRA
    - [x] IP-Adapter

## Remaining TODO

- [x] Disable the following UI features for FLUX prompt regions:
negative prompts, reference images, auto-negative.

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a
changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_
2024-11-29 08:56:42 -05:00
psychedelicious
7d488a5352 feat(ui): add delete button to regional ref image empty state 2024-11-29 15:51:24 +10:00
psychedelicious
4d7667f63d fix(ui): add missing translations 2024-11-29 15:43:49 +10:00
psychedelicious
08704ee8ec feat(ui): use canvas layer validators in control/ip adapter graph builders 2024-11-29 15:32:48 +10:00
psychedelicious
5910892c33 Merge remote-tracking branch 'origin/main' into ryan/flux-regional-prompting 2024-11-29 15:19:39 +10:00
psychedelicious
46a09d9e90 feat(ui): format warnings tooltip 2024-11-29 13:32:51 +10:00
psychedelicious
df0c7d73f3 feat(ui): use regional guidance validation utils in graph builders 2024-11-29 13:26:09 +10:00
psychedelicious
3905c97e32 feat(ui): return translation keys from validation utils instead of translated strings 2024-11-29 13:25:09 +10:00
psychedelicious
0be796a808 feat(ui): use layer validation utils in invoke readiness utils 2024-11-29 13:14:26 +10:00
psychedelicious
7dd33b0f39 feat(ui): add indicator to canvas layer headers, displaying validation warnings
If there are any issues with the layer, the icon is displayed. If the layer is disabled, the icon is greyed out but still visible.
2024-11-29 13:13:47 +10:00
psychedelicious
484aaf1595 feat(ui): add canvas layer validation utils
These helpers consolidate layer validation checks. For example, checking that the layer has content drawn, is compatible with the selected main model, has valid reference images, etc.
2024-11-29 13:12:32 +10:00
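A hedged sketch of what such validation helpers could look like, returning i18n translation keys rather than translated strings (per the related commit above); the state shapes and keys below are invented for illustration.

```ts
type RasterLayerState = { objects: unknown[]; isEnabled: boolean };
type RegionalGuidanceState = {
  positivePrompt: string | null;
  referenceImages: { image: string | null }[];
};
type MainModelBase = 'sd-1' | 'sdxl' | 'flux';

// Each helper returns translation keys (not translated strings), so the UI
// decides where and how to render the warning text.
const getRasterLayerWarnings = (layer: RasterLayerState): string[] => {
  const warnings: string[] = [];
  if (layer.objects.length === 0) {
    warnings.push('controlLayers.warnings.layerEmpty'); // no content drawn
  }
  return warnings;
};

const getRegionalGuidanceWarnings = (
  rg: RegionalGuidanceState,
  base: MainModelBase
): string[] => {
  const warnings: string[] = [];
  if (!rg.positivePrompt && rg.referenceImages.length === 0) {
    warnings.push('controlLayers.warnings.regionEmpty');
  }
  if (rg.referenceImages.some((ri) => ri.image === null)) {
    warnings.push('controlLayers.warnings.referenceImageMissing');
  }
  if (base === 'flux' && rg.referenceImages.length > 0) {
    warnings.push('controlLayers.warnings.referenceImagesUnsupported');
  }
  return warnings;
};
```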
psychedelicious
c276b60af9 tidy(ui): use object for addRegions graph builder util arg 2024-11-29 08:49:41 +10:00
Ryan Dick
5d8dd6e26e Fix FLUX regional negative prompts. 2024-11-28 18:49:29 +00:00
Emmanuel Ferdman
5bca68d873 docs: update code of conduct reference
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2024-11-27 17:38:33 -08:00
Ryan Dick
64364e7911 Short-circuit if there are no region masks in FLUX and don't apply attention masking. 2024-11-27 22:40:10 +00:00
Ryan Dick
6565cea039 Comment unused _prepare_unrestricted_attn_mask(...) for future reference. 2024-11-27 22:16:44 +00:00
Ryan Dick
3ebd8d6c07 Delete outdated TODO comment. 2024-11-27 22:13:25 +00:00
Ryan Dick
e970185161 Tweak flux regional prompting attention scheme based on latest experimentation. 2024-11-27 22:13:07 +00:00
Ryan Dick
fa5653cdf7 Remove unused 'denoise' param to addRegions(). 2024-11-27 17:08:42 +00:00
Ryan Dick
9a7b000995 Update frontend to support regional prompts with FLUX in the canvas. 2024-11-27 17:04:43 +00:00
Ryan Dick
3a27242838 Bump transformers. The main motivation for this bump is to ingest a fix for DepthAnything postprocessing artifacts. 2024-11-27 07:46:16 -08:00
Ryan Dick
8cfb032051 Add utility ImagePanelLayoutInvocation for working with In-Context LoRA workflows. 2024-11-26 20:58:31 -08:00
Ryan Dick
06a9d4e2b2 Use a Textarea component for the FluxTextEncoderInvocation prompt field. 2024-11-26 20:58:31 -08:00
Brandon Rising
ed46acee79 fix: Fail scan on InvalidMagicError in picklescan, update default for read_checkpoint_meta to scan unless explicitly told not to 2024-11-26 16:17:12 -05:00
Ryan Dick
b54463d294 Allow regional prompting background regions to attend to themselves and to the entire txt embedding. 2024-11-26 17:57:31 +00:00
Ryan Dick
faee79dc95 Distinguish between restricted and unrestricted attn masks in FLUX regional prompting. 2024-11-26 16:55:52 +00:00
Mary Hipp
965cd76e33 lint fix 2024-11-26 11:25:53 -05:00
Mary Hipp
e5e8cbf34c shorten reference image mode descriptions; 2024-11-26 11:25:53 -05:00
Mary Hipp
3412a52594 (ui): updates various informational tooltips, adds descriptions to IP adapter method options 2024-11-26 11:25:53 -05:00
Ryan Dick
e01f66b026 Apply regional attention masks in the single stream blocks in addition to the double stream blocks. 2024-11-25 22:40:08 +00:00
Ryan Dick
53abdde242 Update Flux RegionalPromptingExtension to prepare both a mask with restricted image self-attention and a mask with unrestricted image self attention. 2024-11-25 22:04:23 +00:00
Ryan Dick
94c088300f Be smarter about selecting the global CLIP embedding for FLUX regional prompting. 2024-11-25 20:15:04 +00:00
Ryan Dick
3741a6f5e0 Fix device handling for regional masks and apply the attention mask in the FLUX double stream block. 2024-11-25 16:02:03 +00:00
Kent Keirsey
059336258f Create SECURITY.md 2024-11-25 04:10:03 -08:00
Ryan Dick
2c23b8414c Use a single global CLIP embedding for FLUX regional guidance. 2024-11-22 23:01:43 +00:00
Mary Hipp
271cc52c80 fix(ui): use token for download if its in store 2024-11-22 12:08:05 -05:00
Ryan Dick
20356c0746 Fixup the logic for preparing FLUX regional prompt attention masks. 2024-11-21 22:46:25 +00:00
psychedelicious
e44458609f chore: bump version to v5.4.3rc1 2024-11-21 10:32:43 -08:00
psychedelicious
69d86a7696 feat(ui): address feedback 2024-11-21 09:54:35 -08:00
Hippalectryon
56db1a9292 Use proxyrect and setEntityPosition to sync transformer position 2024-11-21 09:54:35 -08:00
Hippalectryon
cf50e5eeee Make sure the canvas is focused 2024-11-21 09:54:35 -08:00
Hippalectryon
c9c07968d2 lint 2024-11-21 09:54:35 -08:00
Hippalectryon
97d0757176 use $isInteractable instead of $isDisabled 2024-11-21 09:54:35 -08:00
Hippalectryon
0f51b677a9 refactor 2024-11-21 09:54:35 -08:00
Hippalectryon
56ca94c3a9 Don't move if the layer is disabled
Lint
2024-11-21 09:54:35 -08:00
Hippalectryon
28d169f859 Allow moving layers using the keyboard 2024-11-21 09:54:35 -08:00
psychedelicious
92f71d99ee tweak(ui): use X icon for rg ref image delete button 2024-11-21 08:50:39 -08:00
psychedelicious
0764c02b1d tweak(ui): code style 2024-11-21 08:50:39 -08:00
psychedelicious
081c7569fe feat(ui): add global ref image empty state 2024-11-21 08:50:39 -08:00
psychedelicious
20f6532ee8 feat(ui): add empty state for regional guidance ref image 2024-11-21 08:50:39 -08:00
Mary Hipp
b9e8910478 feat(ui): add actions for video modal clicks 2024-11-21 11:15:55 -05:00
Mary Hipp
ded8391e3c use nanostore for schema parsed instead 2024-11-20 20:13:31 -05:00
Mary Hipp
e9dd2c396a limit to one hook 2024-11-20 20:13:31 -05:00
Mary Hipp
0d86de0cb5 fix(ui): make sure schema has loaded before trying to load any workflows 2024-11-20 20:13:31 -05:00
Ryan Dick
bad1149504 WIP - add rough logic for preparing the FLUX regional prompting attention mask. 2024-11-20 22:29:36 +00:00
Ryan Dick
fda7aaa7ca Pass RegionalPromptingExtension down to the CustomDoubleStreamBlockProcessor in FLUX. 2024-11-20 19:48:04 +00:00
Ryan Dick
85c616fa34 WIP - Pass prompt masks to FLUX model during denoising. 2024-11-20 18:51:43 +00:00
psychedelicious
549f4e9794 feat(ui): set default infill method to lama 2024-11-20 11:19:17 -05:00
psychedelicious
ef8ededd2f fix(ui): disable width and height output on image batch output
There's a technical challenge with outputting these values directly. `ImageField` does not store them, so the batch's `ImageField` collection does not have width and height for each image.

In order to set up the batch and pass along width and height for each image, we'd need to make a network request for each image when the user clicks Invoke. It would often be cached, but this will eventually create a scaling issue and poor user experience.

As a very simple workaround, users can output the batch image output into an `Image Primitive` node to access the width and height.

This change is implemented by adding some simple special handling when parsing the output fields for the `image_batch` node.

I'll keep this situation in mind when extending the batching system to other field types.
2024-11-20 11:16:54 -05:00
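A rough sketch of the special handling mentioned above, assuming a hypothetical output-field parser; width and height are dropped for `image_batch` because `ImageField` does not carry them.

```ts
type OutputFieldTemplate = { name: string; type: string };

// Hypothetical parser hook: for the image_batch node, disable (drop) the
// width and height outputs, since ImageField does not store those values and
// fetching them per image at enqueue time would not scale.
const parseOutputFields = (
  nodeType: string,
  outputs: OutputFieldTemplate[]
): OutputFieldTemplate[] => {
  if (nodeType === 'image_batch') {
    return outputs.filter((o) => o.name !== 'width' && o.name !== 'height');
  }
  return outputs;
};
```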
Mary Hipp
1948ffe106 make sure Soft Edge Detection has preprocessor applied 2024-11-20 08:46:02 -05:00
psychedelicious
c70f4404c4 fix(ui): special node icon tooltip 2024-11-19 14:29:09 -08:00
psychedelicious
b157ae928c chore(ui): update what's new copy 2024-11-19 14:29:09 -08:00
psychedelicious
7a0871992d chore: bump version to v5.4.2 2024-11-19 14:29:09 -08:00
Hosted Weblate
b38e2e14f4 translationBot(ui): update translation files
Updated by "Cleanup translation files" hook in Weblate.

translationBot(ui): update translation files

Updated by "Cleanup translation files" hook in Weblate.

Co-authored-by: Hosted Weblate <hosted@weblate.org>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/
Translation: InvokeAI/Web UI
2024-11-19 14:12:00 -08:00
psychedelicious
7c0e70ec84 tweak(ui): "Watch on YouTube" -> "Watch" 2024-11-19 14:02:11 -08:00
psychedelicious
a89ae9d2bf feat(ui): add links to studio sessions/discord 2024-11-19 14:02:11 -08:00
psychedelicious
ad1fcb3f07 chore(ui): bump @invoke-ai/ui-library
Brings in a fix for `ExternalLink`
2024-11-19 14:02:11 -08:00
psychedelicious
87d74b910b feat(ui): support videos modal 2024-11-19 14:02:11 -08:00
psychedelicious
7ad1c297a4 feat(ui): add actions for reset canvas layers / generation settings to session menus 2024-11-19 13:55:16 -08:00
psychedelicious
fbc629faa6 feat(ui): change reset canvas button to new session menu 2024-11-19 13:55:16 -08:00
psychedelicious
7baa6b3c09 feat(ui): split up new from image into submenus
- `New Canvas from Image` -> `As Raster Layer`, `As Raster Layer (Resize)`, `As Control Layer`, `As Control Layer (Resize)`
- `New Layer from Image` -> (each layer type)
2024-11-19 10:34:00 -08:00
psychedelicious
53d482bade feat(ui): add image ctx menu new canvas without resize option 2024-11-19 10:34:00 -08:00
psychedelicious
5aca04b51b feat(ui): change reset canvas icon to "empty" 2024-11-19 09:56:25 -08:00
psychedelicious
ea8787c8ff feat(ui): update invoke button tooltip for batching
- Split up logic to determine reason why the user cannot invoke for each tab.
- Fix issue where the workflows tab would show reasons related to canvas/upscale tab. The tooltip now only shows information relevant to the current tab.
- Add calculation for batch size to the queue count prediction.
- Use a constant for the enqueue mutation's fixed cache key, instead of a string. Just some typo protection.
2024-11-19 09:53:59 -08:00
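A simplified sketch of folding batch size into the queue-count prediction described above; the combination rule (product of batch collection sizes times iterations) is an assumption, not the exact InvokeAI formula.

```ts
// Predict how many queue items an enqueue will create. Assumes each batch
// collection multiplies the count, and iterations multiply the result.
const predictQueueCount = (iterations: number, batchCollectionSizes: number[]): number => {
  const batchSize = batchCollectionSizes.reduce((acc, size) => acc * Math.max(size, 1), 1);
  return iterations * batchSize;
};

// e.g. 2 iterations, a 3-image batch and a 4-prompt batch -> 24 queue items
console.log(predictQueueCount(2, [3, 4]));
```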
psychedelicious
cead2c4445 feat(ui): split up selector utils for useIsReadyToEnqueue 2024-11-19 09:53:59 -08:00
Mary Hipp
f76ac1808c fix(ui): simplify logic for non-local invocation progress alerts 2024-11-19 12:40:40 -05:00
psychedelicious
f01210861b chore: ruff 2024-11-19 07:02:37 -08:00
psychedelicious
f757f23ef0 chore(ui): typegen 2024-11-19 07:02:37 -08:00
psychedelicious
872a6ef209 tidy(nodes): extract slerp from lblend to util fn 2024-11-19 07:02:37 -08:00
psychedelicious
4267e5ffc4 tidy(nodes): bring masked blend latents masking logic into invoke core 2024-11-19 07:02:37 -08:00
Brandon Rising
a69c5ff9ef Add copyright notice for CIELab_to_UPLab.icc 2024-11-19 07:02:37 -08:00
Brandon Rising
3ebd8d7d1b Fix .icc asset file in pyproject.toml 2024-11-19 07:02:37 -08:00
Brandon Rising
1fd80d54a4 Run Ruff 2024-11-19 07:02:37 -08:00
Brandon Rising
991f63e455 Store CIELab_to_UPLab.icc within the repo 2024-11-19 07:02:37 -08:00
Brandon Rising
6a1efd3527 Add validation to some of the node inputs 2024-11-19 07:02:37 -08:00
Brandon Rising
0eadc0dd9e feat: Support a subset of composition nodes within base invokeai 2024-11-19 07:02:37 -08:00
youjayjeel
481423d678 translationBot(ui): update translation (Chinese (Simplified Han script))
Currently translated at 86.0% (1367 of 1588 strings)

Co-authored-by: youjayjeel <youjayjeel@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/zh_Hans/
Translation: InvokeAI/Web UI
2024-11-18 19:29:29 -08:00
Riccardo Giovanetti
89ede0aef3 translationBot(ui): update translation (Italian)
Currently translated at 99.3% (1578 of 1588 strings)

Co-authored-by: Riccardo Giovanetti <riccardo.giovanetti@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
Translation: InvokeAI/Web UI
2024-11-18 19:29:29 -08:00
gallegonovato
359bdee9c6 translationBot(ui): update translation (Spanish)
Currently translated at 42.3% (672 of 1588 strings)

translationBot(ui): update translation (Spanish)

Currently translated at 28.0% (445 of 1588 strings)

Co-authored-by: gallegonovato <fran-carro@hotmail.es>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/es/
Translation: InvokeAI/Web UI
2024-11-18 19:29:29 -08:00
psychedelicious
0e6fba3763 chore: bump version to v5.4.2rc1 2024-11-18 19:25:39 -08:00
psychedelicious
652502d7a6 fix(ui): add sd-3 grid size of 16px to grid util 2024-11-18 19:15:15 -08:00
psychedelicious
91d981a49e fix(ui): reactflow drag interactions with custom scrollbar 2024-11-18 19:12:27 -08:00
psychedelicious
24f61d21b2 feat(ui): make image field collection scrollable 2024-11-18 19:12:27 -08:00
psychedelicious
eb9a4177c5 feat(ui): allow removing individual images from batch 2024-11-18 19:12:27 -08:00
psychedelicious
3c43351a5b feat(ui): add reset to default value button to field title 2024-11-18 19:12:27 -08:00
psychedelicious
b1359b6dff feat(ui): update field validation logic to handle collection sizes 2024-11-18 19:12:27 -08:00
psychedelicious
bddccf6d2f feat(ui): add graph validation for image collection size 2024-11-18 19:12:27 -08:00
psychedelicious
21ffaab2a2 fix(ui): do not allow invoking when canvas is selecting an object 2024-11-18 19:12:27 -08:00
psychedelicious
1e969f938f feat(ui): autosize image collection field grid 2024-11-18 19:12:27 -08:00
psychedelicious
9c6c86ee4f fix(ui): image field collection dnd adds instead of replaces 2024-11-18 19:12:27 -08:00
psychedelicious
6b53a48b48 fix(ui): zod schema refiners must return boolean 2024-11-18 19:12:27 -08:00
psychedelicious
c813fa3fc0 feat(ui): support min and max length for image collections 2024-11-18 19:12:27 -08:00
psychedelicious
a08e61184a chore(ui): typegen 2024-11-18 19:12:27 -08:00
psychedelicious
a0d62a5f41 feat(nodes): add minimum image count to ImageBatchInvocation 2024-11-18 19:12:27 -08:00
psychedelicious
616c0f11e1 feat(ui): image batching in workflows
- Add special handling for `ImageBatchInvocation`
- Add input component for image collections, supporting multi-image upload and dnd
- Minor rework of some hooks for accessing node data
2024-11-18 19:12:27 -08:00
psychedelicious
e1626a4e49 chore(ui): typegen 2024-11-18 19:12:27 -08:00
psychedelicious
6ab891a319 feat(nodes): add ImageBatchInvocation 2024-11-18 19:12:27 -08:00
psychedelicious
492de41316 feat(app): add Classification.Special, used for batch nodes 2024-11-18 19:12:27 -08:00
psychedelicious
c064efc866 feat(app): add ImageField as an allowed batching data type 2024-11-18 19:12:27 -08:00
Ryan Dick
1a0885bfb1 Update FLUX IP-Adapter starter model from XLabs v1 to XLabs v2. 2024-11-18 17:06:53 -08:00
Ryan Dick
e8b202d0a5 Update FLUX IP-Adapter graph construction to optimize for XLabs IP-Adapter v2 over v1. This results in degraded performance with v1 IP-Adapters. 2024-11-18 17:06:53 -08:00
Ryan Dick
c6fc82f756 Infer the clip_extra_context_tokens param from the state dict for FLUX XLabs IP-Adapter V2 models. 2024-11-18 17:06:53 -08:00
Ryan Dick
9a77e951d2 Add unit test for FLUX XLabs IP-Adapter V2 model format. 2024-11-18 17:06:53 -08:00
psychedelicious
8bd4207a27 docs(ui): add docstring to CanvasEntityStateGate 2024-11-18 13:40:08 -08:00
psychedelicious
0bb601aaf7 fix(ui): prevent entity not found errors
The canvas react components pass canvas entity identifiers around, then redux selectors are used to access that entity. This is good for perf - entity states may rapidly change. Passing only the identifiers allows components and other logic to have more granular state updates.

Unfortunately, this design opens the possibility for an entity identifier to point to an entity that does not exist.

To get around this, I had created a redux selector `selectEntityOrThrow` for canvas entities. As the name implies, it throws if the entity is not found.

While it prevents components/hooks from needing to deal with missing entities, it results in mysterious errors if an entity is missing. Without sourcemaps, it's very difficult to determine what component or hook couldn't find the entity.

Refactoring the app to not depend on this behaviour is tricky. We could pass the entity state around directly as a prop or via context, but as mentioned, this could cause performance issues with rapidly changing entities.

As a workaround, I've made two changes:
- `<CanvasEntityStateGate/>` is a component that takes an entity identifier, returning its children if the entity state exists, or null if not. This component wraps every usage of `selectEntityOrThrow`. Theoretically, this should prevent the entity not found errors.
- Add a `caller: string` arg to `selectEntityOrThrow`. This string is now added to the error message when the assertion fails, so we can more easily track the source of the errors.

In the future we can work out a way to not use this throwing selector and retain perf. The app has changed quite a bit since that selector was created - so we may not have to worry about perf at all.
2024-11-18 13:40:08 -08:00
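A hedged React/TypeScript sketch of the two workarounds described above; the store shape and prop names are simplified stand-ins for the actual canvas slice.

```tsx
import { memo, type PropsWithChildren } from 'react';
import { useSelector } from 'react-redux';

type CanvasEntityIdentifier = { id: string; type: string };
type RootState = {
  canvas: { entities: Record<string, { id: string; name: string } | undefined> };
};

// Throwing selector: `caller` is appended to the error message so a failed
// assertion identifies which component or hook looked up the missing entity.
export const selectEntityOrThrow = (
  state: RootState,
  identifier: CanvasEntityIdentifier,
  caller: string
) => {
  const entity = state.canvas.entities[identifier.id];
  if (!entity) {
    throw new Error(`Entity ${identifier.id} not found (caller: ${caller})`);
  }
  return entity;
};

// Gate component: renders children only if the entity exists, so components
// that use the throwing selector are never mounted for a missing entity.
export const CanvasEntityStateGate = memo(
  (props: PropsWithChildren<{ entityIdentifier: CanvasEntityIdentifier }>) => {
    const exists = useSelector(
      (state: RootState) => state.canvas.entities[props.entityIdentifier.id] !== undefined
    );
    return exists ? <>{props.children}</> : null;
  }
);
CanvasEntityStateGate.displayName = 'CanvasEntityStateGate';
```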
psychedelicious
2da25a0043 fix(ui): progress bar not throbbing when it should (#7332)
When we added more progress events during generation, we indirectly broke the logic that controls when the progress bar throbs.

Co-authored-by: Mary Hipp Rogers <maryhipp@gmail.com>
2024-11-18 14:02:20 +00:00
Mary Hipp
51d0931898 remove GPL-3 licensed package easing-functions 2024-11-18 08:55:17 -05:00
Riccardo Giovanetti
357b68d1ba translationBot(ui): update translation (Italian)
Currently translated at 99.3% (1577 of 1587 strings)

Co-authored-by: Riccardo Giovanetti <riccardo.giovanetti@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
Translation: InvokeAI/Web UI
2024-11-16 05:49:57 +11:00
Mary Hipp
d9ddb6c32e fix(ui): add padding to the metadata recall section so buttons are not blocked 2024-11-16 05:47:45 +11:00
Mary Hipp
ad02a99a83 fix(ui): ignore user setting for commercial, remove unused state 2024-11-16 05:21:30 +11:00
Mary Hipp
b707dafc7b translation 2024-11-16 05:21:30 +11:00
Mary Hipp
02906c8f5d feat(ui): deferred invocation progress details for model loading 2024-11-16 05:21:30 +11:00
psychedelicious
8538e508f1 chore(ui): lint 2024-11-15 12:59:30 +11:00
Hosted Weblate
8c333ffd14 translationBot(ui): update translation files
Updated by "Remove blank strings" hook in Weblate.

Co-authored-by: Hosted Weblate <hosted@weblate.org>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
Riccardo Giovanetti
72ace5fdff translationBot(ui): update translation (Italian)
Currently translated at 99.4% (1575 of 1583 strings)

Co-authored-by: Riccardo Giovanetti <riccardo.giovanetti@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
Gohsuke Shimada
9b7583fc84 translationBot(ui): update translation (Japanese)
Currently translated at 33.6% (533 of 1583 strings)

translationBot(ui): update translation (Japanese)

Currently translated at 30.3% (481 of 1583 strings)

Co-authored-by: Gohsuke Shimada <ghoskay@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/ja/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
dakota2472
989eee338e translationBot(ui): update translation (Italian)
Currently translated at 99.8% (1580 of 1583 strings)

Co-authored-by: dakota2472 <gardaweb.net@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
Linos
acc3d7b91b translationBot(ui): update translation (Vietnamese)
Currently translated at 100.0% (1583 of 1583 strings)

Co-authored-by: Linos <tt250208@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/vi/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
Riccardo Giovanetti
49de868658 translationBot(ui): update translation (Italian)
Currently translated at 99.4% (1575 of 1583 strings)

translationBot(ui): update translation (Italian)

Currently translated at 99.4% (1573 of 1581 strings)

Co-authored-by: Riccardo Giovanetti <riccardo.giovanetti@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
Hosted Weblate
b1702c7d90 translationBot(ui): update translation files
Updated by "Cleanup translation files" hook in Weblate.

translationBot(ui): update translation files

Updated by "Cleanup translation files" hook in Weblate.

Co-authored-by: Hosted Weblate <hosted@weblate.org>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
Riccardo Giovanetti
e49e19ea13 translationBot(ui): update translation (Italian)
Currently translated at 99.4% (1569 of 1577 strings)

Co-authored-by: Riccardo Giovanetti <riccardo.giovanetti@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
gallegonovato
c9f91f391e translationBot(ui): update translation (Spanish)
Currently translated at 17.6% (278 of 1575 strings)

translationBot(ui): update translation (Spanish)

Currently translated at 17.3% (274 of 1575 strings)

Co-authored-by: gallegonovato <fran-carro@hotmail.es>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/es/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
Hosted Weblate
4cb6b2b701 translationBot(ui): update translation files
Updated by "Cleanup translation files" hook in Weblate.

translationBot(ui): update translation files

Updated by "Remove blank strings" hook in Weblate.

Co-authored-by: Hosted Weblate <hosted@weblate.org>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
Riccardo Giovanetti
7d132ea148 translationBot(ui): update translation (Italian)
Currently translated at 99.1% (1564 of 1577 strings)

translationBot(ui): update translation (Italian)

Currently translated at 99.4% (1566 of 1575 strings)

Co-authored-by: Riccardo Giovanetti <riccardo.giovanetti@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
Riku
1088accd91 translationBot(ui): update translation (German)
Currently translated at 71.8% (1131 of 1575 strings)

Co-authored-by: Riku <riku.block@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/de/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
dakota2472
8d237d8f8b translationBot(ui): update translation (Italian)
Currently translated at 99.6% (1569 of 1575 strings)

Co-authored-by: dakota2472 <gardaweb.net@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
Linos
0c86a3232d translationBot(ui): update translation (Vietnamese)
Currently translated at 100.0% (1581 of 1581 strings)

translationBot(ui): update translation (Vietnamese)

Currently translated at 100.0% (1576 of 1576 strings)

translationBot(ui): update translation (Vietnamese)

Currently translated at 100.0% (1575 of 1575 strings)

translationBot(ui): update translation (Vietnamese)

Currently translated at 85.0% (1340 of 1575 strings)

translationBot(ui): update translation (Vietnamese)

Currently translated at 78.7% (1240 of 1575 strings)

translationBot(ui): update translation (Vietnamese)

Currently translated at 73.1% (1152 of 1575 strings)

translationBot(ui): update translation (English)

Currently translated at 99.9% (1574 of 1575 strings)

translationBot(ui): update translation (Vietnamese)

Currently translated at 57.9% (913 of 1575 strings)

translationBot(ui): update translation (Vietnamese)

Currently translated at 37.0% (584 of 1575 strings)

translationBot(ui): update translation (Vietnamese)

Currently translated at 3.2% (51 of 1575 strings)

translationBot(ui): update translation (Vietnamese)

Currently translated at 3.2% (51 of 1575 strings)

Co-authored-by: Linos <tt250208@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/en/
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/vi/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
aidawanglion
dbfb0359cb translationBot(ui): update translation (Chinese (Simplified Han script))
Currently translated at 79.9% (1266 of 1583 strings)

translationBot(ui): update translation (Chinese (Simplified Han script))

Currently translated at 74.4% (1171 of 1573 strings)

Co-authored-by: aidawanglion <youjayjeel@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/zh_Hans/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
Riccardo Giovanetti
b4c2aa596b translationBot(ui): update translation (Italian)
Currently translated at 99.6% (1569 of 1575 strings)

translationBot(ui): update translation (Italian)

Currently translated at 99.4% (1567 of 1575 strings)

translationBot(ui): update translation (Italian)

Currently translated at 99.4% (1565 of 1573 strings)

Co-authored-by: Riccardo Giovanetti <riccardo.giovanetti@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
psychedelicious
87e89b7995 fix(ui): remove progress message condition for canvas destinations 2024-11-15 12:55:46 +11:00
psychedelicious
9b089430e2 chore: bump version to v5.4.1 2024-11-15 11:51:06 +11:00
psychedelicious
f2b0025958 chore(ui): update what's new 2024-11-15 11:51:06 +11:00
psychedelicious
4b390906bc fix(ui): multiple selection dnd sometimes doesn't get full selection
Turns out a gallery image's `imageDTO` object can actually be a different object by reference. I thought this was not possible thanks to how we have a quasi-normalized cache.

Need to check against image name instead of reference equality when deciding whether to use the single image or the gallery selection for the dnd payload.
2024-11-15 11:21:03 +11:00
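A small sketch of the check described above; `ImageDTO` here is reduced to just its `image_name`, and the surrounding dnd wiring is omitted.

```ts
type ImageDTO = { image_name: string };

// Decide whether a drag carries the whole gallery selection or only the image
// under the cursor. Reference equality is unreliable because the same image
// can appear as different ImageDTO instances, so compare by image_name.
const getDragPayload = (dragged: ImageDTO, selection: ImageDTO[]): ImageDTO[] => {
  const isInSelection = selection.some((i) => i.image_name === dragged.image_name);
  return isInSelection && selection.length > 1 ? selection : [dragged];
};
```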
psychedelicious
c5b8efe03b fix(ui): unable to use text inputs within draggable 2024-11-15 10:25:30 +11:00
psychedelicious
4d08d00ad8 chore(ui): knip 2024-11-14 13:38:40 -08:00
psychedelicious
9b0130262b fix(ui): use silent upload for single-image upload buttons 2024-11-14 13:38:40 -08:00
psychedelicious
878093f64e fix(ui): image uploading handling
Rework uploadImage and uploadImages helpers and the RTK listener, ensuring gallery view isn't changed unexpectedly and preventing extraneous toasts.

Fix staging area save to gallery button to essentially make a copy of the image, instead of changing its intermediate status.
2024-11-14 13:38:40 -08:00
psychedelicious
d5ff7ef250 feat(ui): update output only masked regions
- New name: "Output only Generated Regions"
- New default: true (this was the intention, but at some point the behaviour of the setting was inverted without the default being changed)
2024-11-14 13:35:55 -08:00
psychedelicious
f36583f866 feat(ui): tweak image selection/hover styling
The styling in gallery for selected vs hovered was very similar, leading users to think that the hovered image was also selected.

Reducing the borders for hovered images to a single pixel makes it easier to distinguish between selected and hovered.
2024-11-14 16:28:53 -05:00
psychedelicious
829bc1bc7d feat(ui): progress alert config setting
- Add `invocationProgressAlert` as a disable-able feature. Hide the alert and the setting in system settings when disabled.
- Fix merge conflict
2024-11-15 05:49:05 +11:00
Mary Hipp
17c7b57145 (ui): make detailed progress view a setting that can be hidden 2024-11-15 05:49:05 +11:00
psychedelicious
6a12189542 feat(ui): updated progress event display
- Tweak layout/styling of alerts for consistent spacing
- Add percentage to message if it has percentage
- Only show events if the destination is canvas (so workflows events are hidden for example)
2024-11-15 05:49:05 +11:00
psychedelicious
96a31a5563 feat(app): add more events when loading/running models 2024-11-15 05:49:05 +11:00
psychedelicious
067747eca9 feat(app): tweak model load events
- Pass in the `UtilInterface` to the `ModelsInterface` so we can call the simple `signal_progress` method instead of the complicated `emit_invocation_progress` method.
- Only emit load events when starting to load - not after.
- Add more detail to the messages, like submodel type
2024-11-15 05:49:05 +11:00
Mary Hipp
c7878fddc6 (pytest) mock emit_invocation_progress on events service 2024-11-15 05:49:05 +11:00
maryhipp
54c51e0a06 (worker) add progress images for downloading remote models 2024-11-15 05:49:05 +11:00
Mary Hipp
1640ea0298 (pytest) add missing arg for mocked context 2024-11-15 05:49:05 +11:00
Mary Hipp
0c32ae9775 (pytest) fix import 2024-11-15 05:49:05 +11:00
maryhipp
fdb8ca5165 (worker) use source if name is not available 2024-11-15 05:49:05 +11:00
Mary Hipp
571faf6d7c (pytest) add queue_item and invocation to data in context for test 2024-11-15 05:49:05 +11:00
Mary Hipp
bdbdb22b74 (ui) add Canvas Alert for invocation progress messages 2024-11-15 05:49:05 +11:00
maryhipp
9bbb5644af (worker) add invocation_progress events to model loading 2024-11-15 05:49:05 +11:00
Mary Hipp
e90ad19f22 (ui): update en string for full IP adapter 2024-11-14 10:07:42 -08:00
Ryan Dick
0ba11e8f73 SD3 Image-to-Image and Inpainting (#7295)
## Summary

Add support for SD3 image-to-image and inpainting. Similar to FLUX, the
implementation supports fractional denoise_start/denoise_end for more
fine-grained denoise strength control, and a gradient mask adjustment
schedule for smoother inpainting seams.

## Example
Workflow
<img width="1016" alt="image"
src="https://github.com/user-attachments/assets/ee598d77-be80-4ca7-9355-c3cbefa2ef43">

Result

![image](https://github.com/user-attachments/assets/43953fa7-0e4e-42b5-84e8-85cfeeeee00b)

## QA Instructions

- [x] Regression test of text-to-image
- [x] Test image-to-image without mask
- [x] Test that adjusting denoising_start allows fine-grained control of
amount of change in image-to-image
- [x] Test inpainting with mask
- [x] Smoke test SD1, SDXL, FLUX image-to-image to make sure there was
no regression with the frontend changes.

## Merge Plan

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a
changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_
2024-11-14 09:33:51 -08:00
Ryan Dick
1cf7600f5b Merge branch 'main' into ryan/sd3-image-to-image 2024-11-14 09:25:23 -08:00
Ryan Dick
4f9d12b872 Fix FLUX diffusers LoRA models with no .proj_mlp layers (#7313)
## Summary

Add support for FLUX diffusers LoRA models without `.proj_mlp` layers.

## Related Issues / Discussions

Closes #7129 

## QA Instructions

- [x] FLUX diffusers LoRA **without .proj_mlp** layers
- [x] FLUX diffusers LoRA **with .proj_mlp** layers
- [x] FLUX diffusers LoRA **without .proj_mlp** layers, quantized base
model
- [x] FLUX diffusers LoRA **with .proj_mlp** layers, quantized base
model

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a
changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_
2024-11-14 09:09:10 -08:00
Ryan Dick
68c3b0649b Add unit tests for FLUX diffusers LoRA without .proj_mlp layers. 2024-11-14 16:53:49 +00:00
Ryan Dick
8ef8bd4261 Add state dict tensor shapes for existing LoRA unit tests. 2024-11-14 16:53:49 +00:00
Ryan Dick
50897ba066 Add flag to optionally allow missing layer keys in FLUX lora loader. 2024-11-14 16:53:49 +00:00
Ryan Dick
3510643870 Support FLUX LoRAs without .proj_mlp layers. 2024-11-14 16:53:49 +00:00
Ryan Dick
ca9cb1c9ef Flux VAE broke for float16, force bfloat16 or float32 where compatible (#7213)
## Summary

The Flux VAE, like many VAEs, is broken if run using float16 inputs, returning black images due to NaNs.
This PR fixes the issue by forcing the VAE to run in bfloat16 or float32 where compatible.

## Related Issues / Discussions

Fix for issue https://github.com/invoke-ai/InvokeAI/issues/7208

## QA Instructions

Tested on macOS: the VAE works with float16 set in invoke.yaml and when left at the default.
I also briefly forced it down the float32 route to check that too.
Needs testing on CUDA / ROCm.

## Merge Plan

It should be a straightforward merge.
2024-11-13 15:51:40 -08:00
Ryan Dick
b89caa02bd Merge branch 'main' into flux_vae_fp16_broke 2024-11-13 15:33:43 -08:00
Ryan Dick
eaf4e08c44 Use vae.parameters() for more efficient access of the first model parameter. 2024-11-13 23:32:40 +00:00
Darrell
fb19621361 Updated link to flux ip adapter model 2024-11-12 08:11:40 -05:00
Mary Hipp
9179619077 actually use optimized denoising 2024-11-08 20:46:08 -05:00
Mary Hipp
13cb5f0ba2 Merge remote-tracking branch 'origin/main' into ryan/sd3-image-to-image 2024-11-08 20:29:56 -05:00
Mary Hipp
7e52fc1c17 Merge branch 'ryan/sd3-image-to-image' of https://github.com/invoke-ai/InvokeAI into ryan/sd3-image-to-image 2024-11-08 20:14:24 -05:00
Mary Hipp
7f60a4a282 (ui): update more generation settings for SD3 linear UI 2024-11-08 20:14:13 -05:00
psychedelicious
3f880496f7 feat(ui): clarify denoising strength badge text 2024-11-09 08:38:41 +11:00
Ryan Dick
f05efd3270 Fix import for getInfill. 2024-11-08 20:42:44 +00:00
psychedelicious
79eb8172b6 feat(ui): update warnings on upscaling tab based on model arch
When an unsupported model architecture is selected, show that warning only, without the extra warnings (i.e. no "missing tile controlnet" warning)

Update Invoke tooltip warnings accordingly

Closes #7239
Closes #7177
2024-11-09 07:34:03 +11:00
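A hedged sketch of the short-circuit described above; the supported-architecture list and warning strings are illustrative only.

```ts
type MainModelBase = 'sd-1' | 'sd-2' | 'sdxl' | 'flux' | 'sd-3';
const UPSCALING_SUPPORTED_BASES: MainModelBase[] = ['sd-1', 'sdxl'];

// If the selected architecture is unsupported, return only that reason and
// skip follow-on checks (e.g. "missing tile controlnet"), so the user sees
// the root cause instead of a pile of secondary warnings.
const getUpscalingWarnings = (base: MainModelBase, hasTileControlNet: boolean): string[] => {
  if (!UPSCALING_SUPPORTED_BASES.includes(base)) {
    return [`Upscaling is not supported for ${base} models`];
  }
  const warnings: string[] = [];
  if (!hasTileControlNet) {
    warnings.push('No tile ControlNet model installed for the selected architecture');
  }
  return warnings;
};
```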
Ryan Dick
7732b5d478 Fix bug related to i2l nodes during graph construction of image-to-image workflows. 2024-11-08 20:15:34 +00:00
Mary Hipp
a2a1934b66 Merge branch 'ryan/sd3-image-to-image' of https://github.com/invoke-ai/InvokeAI into ryan/sd3-image-to-image 2024-11-08 13:43:19 -05:00
Mary Hipp
dff6570078 (ui) SD3 support in linear UI 2024-11-08 13:42:57 -05:00
maryhipp
04e4fb63af add SD3 generation modes for metadata validation 2024-11-08 13:13:58 -05:00
Vargol
83609d5008 Merge branch 'invoke-ai:main' into flux_vae_fp16_broke 2024-11-08 10:37:31 +00:00
David Burnett
2618ed0ae7 ruff complained 2024-11-08 10:31:53 +00:00
David Burnett
bb3cedddd5 Rework change based on comments 2024-11-08 10:27:47 +00:00
psychedelicious
5b3e1593ca fix(ui): restore missing image paste handler
Missed migrating this logic over during dnd migration.
2024-11-08 16:42:39 +11:00
psychedelicious
2d08078a7d fix(ui): fit bbox to layers math 2024-11-08 16:40:24 +11:00
psychedelicious
75acece1f1 fix(ui): excessive toasts when generating on canvas
- Add `withToast` flag to `uploadImage` util
- Skip the toast if this is not set
- Use the flag to disable toasts when canvas does internal image-uploading stuff that should be invisible to user
2024-11-08 10:30:04 +11:00
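A minimal sketch of the `withToast` flag described above; the API client and toast helper are declared as placeholders rather than the real InvokeAI utilities.

```ts
// Placeholder stand-ins for the real upload endpoint and toast helper.
declare const api: {
  uploadImage: (file: File, opts: { is_intermediate: boolean }) => Promise<{ image_name: string }>;
};
declare const toast: (opts: { title: string }) => void;

type UploadImageArg = {
  file: File;
  isIntermediate: boolean;
  // When false, skip the success toast - used for canvas-internal uploads
  // that should be invisible to the user.
  withToast?: boolean;
};

export const uploadImage = async ({ file, isIntermediate, withToast = true }: UploadImageArg) => {
  const imageDTO = await api.uploadImage(file, { is_intermediate: isIntermediate });
  if (withToast) {
    toast({ title: 'Image uploaded' });
  }
  return imageDTO;
};
```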
psychedelicious
a9db2ffefd fix(ui): ensure clip vision model is set correctly for FLUX IP Adapters 2024-11-08 10:02:41 +11:00
psychedelicious
cdd148b4d1 feat(ui): add toast for graph building errors 2024-11-08 10:02:41 +11:00
psychedelicious
730fabe2de feat(ui): add util to extract message from a tsafe AssertionError 2024-11-08 10:02:41 +11:00
psychedelicious
6c59790a7f chore: bump version to v5.4.1rc2 2024-11-08 10:00:20 +11:00
Ryan Dick
0e6cb91863 Update SD3 InpaintExtension with gradient adjustment to match FLUX. 2024-11-07 22:55:30 +00:00
Ryan Dick
a0fefcd43f Switch to using a custom scheduler implementation for SD3 rather than the diffusers FlowMatchEulerDiscreteScheduler. It is easier to work with and enables us to re-use the clip_timestep_schedule_fractional() utility from FLUX. 2024-11-07 22:46:52 +00:00
psychedelicious
c37251d6f7 tweak(ui): workflow linear field styling 2024-11-08 07:39:09 +11:00
psychedelicious
2854210162 fix(ui): dnd autoscroll on elements w/ custom scrollbar
Have to do a bit of finagling to get it to work and get `pragmatic-drag-and-drop` to not complain.
2024-11-08 07:39:09 +11:00
psychedelicious
5545b980af fix(ui): workflow field sorting doesn't use unique identifier for fields 2024-11-08 07:39:09 +11:00
psychedelicious
0c9434c464 chore(ui): lint 2024-11-08 07:39:09 +11:00
psychedelicious
8771de917d feat(ui): migrate fullscreen drop zone to pdnd 2024-11-08 07:39:09 +11:00
psychedelicious
122946ef4c feat(ui): DndDropOverlay supports react node for label 2024-11-08 07:39:09 +11:00
psychedelicious
2d974f670c feat(ui): restore missing upload buttons 2024-11-08 07:39:09 +11:00
psychedelicious
75f0da9c35 fix(ui): use revised uploader for CL empty state 2024-11-08 07:39:09 +11:00
psychedelicious
5df3c00e28 feat(ui): remove SerializableObject, use type-fest's JsonObject 2024-11-08 07:39:09 +11:00
psychedelicious
b049880502 fix(ui): uploads initiated from canvas 2024-11-08 07:39:09 +11:00
psychedelicious
e5293fdd1a fix(ui): match new default controlnet behaviour 2024-11-08 07:39:09 +11:00
psychedelicious
8883775762 feat(ui): rework image uploads (wip) 2024-11-08 07:39:09 +11:00
psychedelicious
cfadb313d2 fix(ui): ts issues 2024-11-08 07:39:09 +11:00
psychedelicious
b5cadd9a1a fix(ui): scroll issue w/ boards list 2024-11-08 07:39:09 +11:00
psychedelicious
5361b6e014 refactor(ui): image actions sep of concerns 2024-11-08 07:39:09 +11:00
psychedelicious
ff346172af feat(ui): use new image actions system for image menu 2024-11-08 07:39:09 +11:00
psychedelicious
92f660018b refactor(ui): dnd actions to image actions
We don't need a "dnd" image system. We need a "image action" system. We need to execute specific flows with images from various "origins":
- internal dnd e.g. from gallery
- external dnd e.g. user drags an image file into the browser
- direct file upload e.g. user clicks an upload button
- some other internal app button e.g. a context menu

The actions are now generalized to better support these various use-cases.
2024-11-08 07:39:09 +11:00
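A hedged sketch of generalizing dnd-specific actions into origin-agnostic image actions, as the commit above describes; the action and origin types are invented for illustration.

```ts
type ImageDTO = { image_name: string };

// Where the image came from: internal dnd, an external file drop, a direct
// upload button, or another app control such as a context menu.
type ImageActionOrigin = 'internal-dnd' | 'external-dnd' | 'upload-button' | 'context-menu';

type ImageAction =
  | { type: 'set-as-raster-layer'; imageDTO: ImageDTO }
  | { type: 'set-as-ref-image'; imageDTO: ImageDTO }
  | { type: 'add-to-board'; imageDTO: ImageDTO; boardId: string };

// A single executor handles every origin, so dnd handlers, upload buttons and
// context menus all funnel into the same flows instead of duplicating them.
const executeImageAction = (action: ImageAction, origin: ImageActionOrigin): void => {
  console.log(`[${origin}] ${action.type}:`, action.imageDTO.image_name);
  // ...dispatch to the relevant slice / service here
};
```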
psychedelicious
1afc2cba4e feat(ui): support different labels for external drop targets (e.g. uploads) 2024-11-08 07:39:09 +11:00
psychedelicious
ee8359242c feat(ui): more dnd cleanup and tidy 2024-11-08 07:39:09 +11:00
psychedelicious
f0c80a8d7a tidy(ui): dnd stuff 2024-11-08 07:39:09 +11:00
psychedelicious
8da9e7c1f6 fix(ui): min height for workflow image field drop target 2024-11-08 07:39:09 +11:00
psychedelicious
6d7a486e5b feat(ui): restore dnd to workflow fields 2024-11-08 07:39:09 +11:00
psychedelicious
57122c6aa3 feat(ui): layer reordering styling 2024-11-08 07:39:09 +11:00
psychedelicious
54abd8d4d1 feat(ui): dnd layer reordering (wip) 2024-11-08 07:39:09 +11:00
psychedelicious
06283cffed feat(ui): use custom drag previews for images 2024-11-08 07:39:09 +11:00
psychedelicious
27fa0e1140 tidy(ui): more efficient dnd overlay styling 2024-11-08 07:39:09 +11:00
psychedelicious
533d48abdb feat(ui): multi-image drag preview 2024-11-08 07:39:09 +11:00
psychedelicious
6845cae4c9 tidy(ui): move new dnd impl into features/dnd 2024-11-08 07:39:09 +11:00
psychedelicious
31c9acb1fa tidy(ui): clean up old dnd stuff 2024-11-08 07:39:09 +11:00
psychedelicious
fb5e462300 tidy(ui): document & clean up dnd 2024-11-08 07:39:09 +11:00
psychedelicious
2f3abc29b1 feat(ui): better types for getData 2024-11-08 07:39:09 +11:00
psychedelicious
c5c071f285 feat(ui): better type name 2024-11-08 07:39:09 +11:00
psychedelicious
93a3ed56e7 feat(ui): simpler dnd typing implementation 2024-11-08 07:39:09 +11:00
psychedelicious
406fc58889 feat(ui): migrate to pragmatic-drag-and-drop (wip 4) 2024-11-08 07:39:09 +11:00
psychedelicious
cf67d084fd feat(ui): migrate to pragmatic-drag-and-drop (wip 3) 2024-11-08 07:39:09 +11:00
psychedelicious
d4a95af14f perf(ui): more gallery perf improvements 2024-11-08 07:39:09 +11:00
psychedelicious
8c8e7102c2 perf(ui): improved gallery perf 2024-11-08 07:39:09 +11:00
psychedelicious
b6b9ea9d70 feat(ui): migrate to pragmatic-drag-and-drop (wip 2) 2024-11-08 07:39:09 +11:00
psychedelicious
63126950bc feat(ui): migrate to pragmatic-drag-and-drop (wip) 2024-11-08 07:39:09 +11:00
psychedelicious
29d63d5dea fix(app): silence pydantic protected namespace warning
Closes #7287
2024-11-08 07:36:50 +11:00
Ryan Dick
a5f8c23dee Add inpainting support for SD3. 2024-11-07 20:21:43 +00:00
Ryan Dick
7bb4ea57c6 Add SD3ImageToLatentsInvocation. 2024-11-07 16:07:57 +00:00
Ryan Dick
75dc961bcb Add image-to-image support for SD3 - WIP. 2024-11-07 15:48:35 +00:00
Vargol
a9a1f6ef21 Merge branch 'invoke-ai:main' into flux_vae_fp16_broke 2024-11-07 14:02:51 +00:00
Jonathan
aa40161f26 Update flux_denoise.py
Added a bool to allow the node user to add noise into the initial latents (default) or to leave them alone.
2024-11-07 14:02:20 +00:00
psychedelicious
6efa812874 chore(ui): bump version to v5.4.1rc1 2024-11-07 14:02:20 +00:00
psychedelicious
8a683f5a3c feat(ui): updated whats new handling and v5.4.1 items 2024-11-07 14:02:20 +00:00
Brandon Rising
f4b0b6a93d fix: Look in known subfolders for configs for clip variants 2024-11-07 14:02:20 +00:00
Brandon Rising
1337c33ad3 fix: Avoid downloading unsafe .bin files if a safetensors file is available 2024-11-07 14:02:20 +00:00
Jonathan
2f6b035138 Update flux_denoise.py
Added a bool to allow the node user to add noise into the initial latents (default) or to leave them alone.
2024-11-07 08:44:10 -05:00
psychedelicious
4f9ae44472 chore(ui): bump version to v5.4.1rc1 2024-11-07 12:19:28 +11:00
psychedelicious
c682330852 feat(ui): updated whats new handling and v5.4.1 items 2024-11-07 12:19:28 +11:00
Brandon Rising
c064257759 fix: Look in known subfolders for configs for clip variants 2024-11-07 12:01:02 +11:00
Brandon Rising
8a4c629576 fix: Avoid downloading unsafe .bin files if a safetensors file is available 2024-11-06 19:31:18 -05:00
David Burnett
496b02a3bc Same issue affects image2image, so do the same again 2024-11-06 17:47:22 -05:00
David Burnett
7b5efc2203 Flux VAE broke for float16, force bfloat16 or float32 where compatible 2024-11-06 17:47:22 -05:00
psychedelicious
a01d44f813 chore(ui): lint 2024-11-06 10:25:46 -05:00
psychedelicious
63fb3a15e9 feat(ui): default to no control model selected for control layers 2024-11-06 10:25:46 -05:00
psychedelicious
4d0837541b feat(ui): add simple mode filtering 2024-11-06 10:25:46 -05:00
psychedelicious
999809b4c7 fix(ui): minor viewer close button styling 2024-11-06 10:25:46 -05:00
psychedelicious
c452edfb9f feat(ui): add control layer empty state 2024-11-06 10:25:46 -05:00
psychedelicious
ad2cdbd8a2 feat(ui): tooltip for canvas preview image 2024-11-06 10:25:46 -05:00
psychedelicious
f15c24bfa7 feat(ui): add " (recommended)" to balanced control mode label 2024-11-06 10:25:46 -05:00
psychedelicious
d1f653f28c feat(ui): make default control end step 0.75 2024-11-06 10:25:46 -05:00
psychedelicious
244465d3a6 feat(ui): make default control weight 0.75 2024-11-06 10:25:46 -05:00
psychedelicious
c6236ab70c feat(ui): add menubar-ish header on comparison 2024-11-06 10:25:46 -05:00
psychedelicious
644d5cb411 feat(ui): add menubar-ish header on viewer 2024-11-06 10:25:46 -05:00
Riku
bb0a630416 fix(ui): adjust knip config to ignore parameter schema exports 2024-11-06 22:51:17 +11:00
Riku
2148ae9287 feat(ui): simplify parameter schema declaration and type inference 2024-11-06 22:51:17 +11:00
psychedelicious
42d242609c chore(gh): update pr template w/ reminder for what's new copy 2024-11-06 19:03:31 +11:00
psychedelicious
fd0a52392b feat(ui): added line about when denoising str is disabled 2024-11-06 19:01:33 +11:00
psychedelicious
e64415d59a feat(ui): revised logic to disable denoising str 2024-11-06 19:01:33 +11:00
psychedelicious
1871e0bdbf feat(ui): tweaked denoise str styling 2024-11-06 19:01:33 +11:00
Mary Hipp
3ae9a965c2 lint 2024-11-06 19:01:33 +11:00
Mary Hipp
85932e35a7 update copy again 2024-11-06 19:01:33 +11:00
Mary Hipp
41b07a56cc update popover copy and add image 2024-11-06 19:01:33 +11:00
Mary Hipp
54064c0cb8 fix(ui): match badge height to slider height so layout does not shift 2024-11-06 19:01:33 +11:00
Mary Hipp
68284b37fa remove opacity logic from WavyLine, add badge explaining disabled state, add translations 2024-11-06 19:01:33 +11:00
Mary Hipp
ae5bc6f5d6 feat(ui): move denoising strength to layers panel w/ visualization of how much change will be applied, only enable if 1+ enabled raster layer 2024-11-06 19:01:33 +11:00
Mary Hipp
6dc16c9f54 wip 2024-11-06 19:01:33 +11:00
Brandon Rising
faa9ac4e15 fix: get_clip_variant_type should never return None 2024-11-06 09:59:50 +11:00
Mary Hipp Rogers
d0460849b0 fix bad merge conflict (#7273)
Co-authored-by: Mary Hipp <maryhipp@Marys-MacBook-Air.local>
2024-11-05 16:02:03 -05:00
Mary Hipp Rogers
bed3c2dd77 update Whats New for 5.3.1 (#7272)
Co-authored-by: Mary Hipp <maryhipp@Marys-MacBook-Air.local>
2024-11-05 15:43:16 -05:00
Mary Hipp
916ddd17d7 fix(ui): fix link for infill method popover 2024-11-05 15:39:03 -05:00
Mary Hipp
accfa7407f fix undefined 2024-11-05 15:30:17 -05:00
Mary Hipp
908db31e48 feat(api,ui): allow Whats New module to get content from back-end 2024-11-05 15:30:17 -05:00
Mary Hipp
b70f632b26 fix(ui): add some feedback while layers are merging 2024-11-05 12:38:50 -05:00
Brandon Rising
d07a6385ab Always default to ClipVariantType.L instead of None 2024-11-05 12:03:40 -05:00
Brandon Rising
68df612fa1 fix: Never throw an exception when finding the clip variant type 2024-11-05 12:03:40 -05:00
psychedelicious
3b96c79461 chore: bump version to v5.4.0 2024-11-05 10:09:21 +11:00
psychedelicious
89bda5b983 Ryan/sd3 diffusers (#7222)
## Summary

Nodes to support SD3.5 txt2img generations
* adds SD3.5 to starter models
* adds default workflow for SD3.5 txt2img

## Related Issues / Discussions

<!--WHEN APPLICABLE: List any related issues or discussions on github or
discord. If this PR closes an issue, please use the "Closes #1234"
format, so that the issue will be automatically closed when the PR
merges.-->

## QA Instructions

<!--WHEN APPLICABLE: Describe how you have tested the changes in this
PR. Provide enough detail that a reviewer can reproduce your tests.-->

## Merge Plan

<!--WHEN APPLICABLE: Large PRs, or PRs that touch sensitive things like
DB schemas, may need some care when merging. For example, a careful
rebase by the change author, timing to not interfere with a pending
release, or a message to contributors on discord after merging.-->

## Checklist

- [ ] _The PR has a short but descriptive title, suitable for a
changelog_
- [ ] _Tests added / updated (if applicable)_
- [ ] _Documentation added / updated (if applicable)_
2024-11-05 08:21:28 +11:00
Brandon Rising
22bff1fb22 Fix conditional within filter_by_variant to not read all candidates as default 2024-11-04 12:42:09 -05:00
Mary Hipp
55ba6488d1 fix up types file 2024-11-04 12:42:09 -05:00
brandonrising
2d78859171 Create bespoke latents to image node for sd3 2024-11-04 12:42:09 -05:00
Mary Hipp
3a661bac34 fix(ui): exclude submodels from model manager 2024-11-04 12:42:09 -05:00
Mary Hipp
bb8a02de18 update schema 2024-11-04 12:42:09 -05:00
maryhipp
78155344f6 update node fields for SD3 to match other SD nodes 2024-11-04 12:42:09 -05:00
Brandon Rising
391a24b0f6 Re-add erroneously removed hash code 2024-11-04 12:42:09 -05:00
Brandon Rising
e75903389f Run ruff, fix bug in hf downloading code which failed to download parts of a model 2024-11-04 12:42:09 -05:00
Brandon Rising
27567052f2 Create new latent factors for sd35 2024-11-04 12:42:09 -05:00
Brandon Rising
6f447f7169 Rather than .fp16., some repos start the suffix with .fp16... for weights spread across multiple files 2024-11-04 12:42:09 -05:00
Mary Hipp
8b370cc182 (ui): dont show SD3 in main model dropdown yet 2024-11-04 12:42:09 -05:00
maryhipp
af583d2971 ruff format 2024-11-04 12:42:09 -05:00
Mary Hipp
0ebe8fb1bd (ui): add required/optional logic to other submodel fields 2024-11-04 12:42:09 -05:00
maryhipp
befb629f46 add default workflow 2024-11-04 12:42:09 -05:00
maryhipp
874d67cb37 add SD3.5 to starter models 2024-11-04 12:42:09 -05:00
Mary Hipp
19f7a1295a (ui): add fields for CLIP-L and CLIP-G, remove MainModelConfig type changes 2024-11-04 12:42:09 -05:00
maryhipp
78bd605617 (nodes,api): expose the submodels on SD3 model loader as optional, add types needed for CLIP-L and CLIP-G fields 2024-11-04 12:42:09 -05:00
Brandon Rising
b87f4e59a5 Create clip variant type, create new functions for discerning clipL and clipG in the frontend 2024-11-04 12:42:09 -05:00
Ryan Dick
1eca4f12c8 Make T5 encoder optional in SD3 workflows. 2024-11-04 12:42:09 -05:00
Ryan Dick
f1de11d6bf Set the default CFG for SD3 to 3.5. 2024-11-04 12:42:09 -05:00
Ryan Dick
9361ed9d70 Add progress images to SD3 and make denoising cancellable. 2024-11-04 12:42:09 -05:00
Brandon Rising
ebabf4f7a8 Setup Model and T5 Encoder selection fields for sd3 nodes 2024-11-04 12:42:09 -05:00
Brandon Rising
606f3321f5 Initial wave of frontend updates for sd-3 node inputs 2024-11-04 12:42:09 -05:00
Brandon Rising
3970aa30fb define submodels on sd3 models during probe 2024-11-04 12:42:09 -05:00
Ryan Dick
678436e07c Add tqdm progress bar for SD3. 2024-11-04 12:42:09 -05:00
Ryan Dick
c620581699 Bug fixes to get SD3 text-to-image workflow running. 2024-11-04 12:42:09 -05:00
Ryan Dick
c331d42ce4 Temporary hack for testing SD3 model loader. 2024-11-04 12:42:09 -05:00
Ryan Dick
1ac9b502f1 Fix Sd3TextEncoderInvocation output type. 2024-11-04 12:42:09 -05:00
Ryan Dick
3fa478a12f Initial draft of SD3DenoiseInvocation. 2024-11-04 12:42:09 -05:00
Ryan Dick
2d86298b7f Add first draft of Sd3TextEncoderInvocation. 2024-11-04 12:42:09 -05:00
Ryan Dick
009cdb714c Add Sd3ModelLoaderInvocation. 2024-11-04 12:42:09 -05:00
Ryan Dick
9d3f5427b4 Move FluxModelLoaderInvocation to its own file. model.py was getting bloated. 2024-11-04 12:42:09 -05:00
Ryan Dick
e4b17f019a Get diffusers SD3 model probing working. 2024-11-04 12:42:09 -05:00
Ryan Dick
586c00bc02 (minor) Remove unused dict. 2024-11-04 12:42:09 -05:00
Eugene Brodsky
0f11fda65a fix(deps): pin mediapipe strictly to a known working version 2024-11-04 10:16:19 -05:00
psychedelicious
3e75331ef7 fix(ui): load workflow from file
In a8de6406c5 a change was made to many menus in an effort to improve performance. The menus were made to be lazy, so that they are mounted only while open.

This causes unexpected behaviour when there is some logic in the menu that may need to execute after the user selects a menu item.

In this case, when you click to load a workflow from file, the file picker opens but then the menuitem unmounts, taking the input element and all uploading logic with it. When you select a file, nothing happens because we've nuked the handlers by unmounting everything.

Easy fix - un-lazy-fy the menu.

Closes #7240
2024-11-04 08:02:55 -05:00
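The fix above boils down to keeping the menu content mounted so the hidden file input and its handler survive until the user picks a file. A minimal React/TypeScript sketch of the pattern (component and prop names are hypothetical, not the actual InvokeAI code):

```tsx
import { useRef, type ChangeEvent } from 'react';

// Hypothetical menu item that opens a hidden file input. If the surrounding
// menu is "lazy" (unmounted as soon as it closes), the <input> and its
// onChange handler are destroyed before the user picks a file, so nothing
// happens. Keeping the menu content mounted preserves the handler.
export const LoadWorkflowFromFileMenuItem = ({ onLoad }: { onLoad: (file: File) => void }) => {
  const inputRef = useRef<HTMLInputElement>(null);

  const onChange = (e: ChangeEvent<HTMLInputElement>) => {
    const file = e.target.files?.[0];
    if (file) {
      onLoad(file);
    }
  };

  return (
    <>
      {/* Clicking the menu item forwards the click to the hidden input. */}
      <button onClick={() => inputRef.current?.click()}>Load Workflow from File</button>
      {/* This input must stay mounted until onChange fires - hence the un-lazy menu. */}
      <input ref={inputRef} type="file" accept=".json" hidden onChange={onChange} />
    </>
  );
};
```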
psychedelicious
be133408ac fix(nodes): relaxed validation for segment anything
The validation on this node causes graph validation to fail. It must be validated _after_ instantiation.

Also, it was a bit too strict. The only case we explicitly do not handle is when both bboxes and points are provided. It's acceptable if neither are provided.

Closes #7248
2024-11-04 08:00:52 -05:00
psychedelicious
7e1e0d6928 fix(ui): non-default filters can erase layer
When filtering, we use a listener to trigger processing the image whenever a filter setting changes. For example, if the user changes from canny to depth, and auto-process is enabled, we re-process the layer with new filter settings.

The filterer has a method to reset its ephemeral state. This includes the filter settings, so resetting the ephemeral state is expected to trigger processing of the filter.

When we exit filtering, we reset the ephemeral state before resetting everything else, like the listeners.

This can cause a problem when we exit filtering. The sequence:
- Start filtering a layer.
- Auto-process the filter in response to starting the filter process.
- Change the filter settings.
- Auto-process the filter in response to the changed settings.
- Apply the filter.
- Exit filtering, first by resetting the ephemeral state.
- Auto-process the filter in response to the reset settings.*
- Finish exiting, including unsubscribing from listeners.

*Whoops! That last auto-process has now borked the layer's rendering by processing a filter when we shouldn't be processing a filter.

We need to first unsubscribe from listeners, so we don't react to that change to the filter settings and erroneously process the layer.

Also, add a check to the `processImmediate` method to prevent processing if that method is accidentally called without first starting the filterer.

The same issue could affect the segment anything module - the same fixes are implemented there.
2024-11-04 07:11:20 -05:00
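A rough TypeScript sketch of the teardown ordering and the `processImmediate` guard described above (class and method names are illustrative, not the real filter module):

```ts
type Unsubscribe = () => void;

class FilterModule {
  private subscriptions: Unsubscribe[] = [];
  private isFiltering = false;

  start(onSettingsChanged: (listener: () => void) => Unsubscribe) {
    this.isFiltering = true;
    // React to settings changes only while filtering is active.
    this.subscriptions.push(onSettingsChanged(() => this.processImmediate()));
  }

  processImmediate() {
    // Guard: bail if the filterer was never started (or was already torn down),
    // so a stray settings change cannot re-process and bork the layer.
    if (!this.isFiltering) {
      return;
    }
    // ... run the filter against the current settings ...
  }

  exit() {
    // 1. Unsubscribe first, so the ephemeral-state reset below does not
    //    trigger one last auto-process.
    this.subscriptions.forEach((unsubscribe) => unsubscribe());
    this.subscriptions = [];
    // 2. Only now reset the ephemeral state (filter settings, flags, etc.).
    this.resetEphemeralState();
    this.isFiltering = false;
  }

  private resetEphemeralState() {
    // ... restore default filter settings ...
  }
}
```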
psychedelicious
cd3d8df5a8 fix(ui): save canvas to gallery does nothing
The root issue is the compositing cache. When we save the canvas to gallery, we need to first composite raster layers together and then upload the image.

The compositor makes extensive use of caching to reduce the number of images created and improve performance. There are two "layers" of caching:
1. Caching the composite canvas element, which is used both for uploading the canvas and for generation mode analysis.
2. Caching the uploaded composite canvas element as an image.

The combination of these caches allows for the various processes that require composite canvases to do minimal work.

But this causes a problem in this situation, because the user expects a new image to be uploaded when they click save to gallery.

For example, suppose we have already composited and uploaded the raster layer state for use in a generation. Then, we ask the compositor to save the canvas to gallery.

The compositor sees that we are requesting an image for the current canvas state, and instead of recompositing and uploading the image again, it just returns the cached image.

In this case, no image is uploaded and the button does nothing.

We need to be able to opt out of the caching at some level, for certain actions. A `forceUpload` arg is added to the compositor's high-level `getCompositeImageDTO` method to do this.

When true, we ignore the uppermost caching layer (the uploaded image layer), but still use the lower caching layer (the canvas element layer). So we don't recompute the canvas element, but we do upload it as a new image to the server.
2024-11-04 07:11:20 -05:00
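A simplified TypeScript sketch of the two cache layers and the `forceUpload` escape hatch (the method shape and names approximate the behaviour described above, not the actual compositor code):

```ts
type ImageDTO = { image_name: string };

class Compositor {
  private canvasCache = new Map<string, HTMLCanvasElement>(); // lower layer
  private imageCache = new Map<string, ImageDTO>(); // upper layer

  async getCompositeImageDTO(
    hash: string,
    compose: () => HTMLCanvasElement,
    upload: (canvas: HTMLCanvasElement) => Promise<ImageDTO>,
    forceUpload = false
  ): Promise<ImageDTO> {
    // Upper cache layer: skip it when the caller needs a fresh upload
    // (e.g. "Save Canvas to Gallery"), otherwise the click appears to do nothing.
    const cachedImage = this.imageCache.get(hash);
    if (cachedImage && !forceUpload) {
      return cachedImage;
    }

    // Lower cache layer: re-use the composited canvas element even when
    // forcing a new upload - compositing is the expensive part.
    let canvas = this.canvasCache.get(hash);
    if (!canvas) {
      canvas = compose();
      this.canvasCache.set(hash, canvas);
    }

    const imageDTO = await upload(canvas);
    this.imageCache.set(hash, imageDTO);
    return imageDTO;
  }
}
```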
psychedelicious
24d3c22017 fix(ui): temp fix for stuck tooltips 2024-11-04 07:11:20 -05:00
psychedelicious
b0d37f4e51 fix(ui): progress image does not reset when canceling generation
Previously, we cleared the canvas progress image when the canvas had no active generations. This allowed for a brief flash of canvas state between the last progress image for a given generation, and when the output image for that generation rendered. Here's the sequence:
- Progress images are received and rendered
- Generation completes - no active canvas generations
- Clear the progress image -> canvas layers visible unexpectedly, creating an awkward jarring change
- Generation output image is rendered -> output image overlaid on canvas layers

In 83538c4b2b I attempted to fix this by only clearing the progress image while we were not staging.

This isn't quite right, though. We are often staging with no active generations - for example, you have a few images completed and are waiting to choose one.

In this situation, if you cancel a pending generation, the logic to clear the progress image doesn't fire because it sees staging is in progress.

What we really need is:
- Staging area module clears the progress image once it has rendered an output image.
- Progress image module clears the progress image when a generation is canceled or failed, in which case there will be no output image.

To do this, we can add an event listener to the progress image module to listen for queue item status changes, and when we get a cancelation or failure, clear the progress image.
2024-11-04 07:11:20 -05:00
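A small TypeScript sketch of the event-listener side of this change (the event name and module shape are assumptions, not the actual implementation):

```ts
type QueueItemStatus = 'pending' | 'in_progress' | 'completed' | 'canceled' | 'failed';

class ProgressImageModule {
  progressImageUrl: string | null = null;

  constructor(socket: { on: (event: string, cb: (payload: { status: QueueItemStatus }) => void) => void }) {
    // Clear the progress image when a generation ends without an output image.
    socket.on('queue_item_status_changed', ({ status }) => {
      if (status === 'canceled' || status === 'failed') {
        this.clear();
      }
    });
  }

  setProgressImage(url: string) {
    this.progressImageUrl = url;
  }

  // The staging-area module calls this once it has rendered the output image,
  // so there is no flash of bare canvas between progress image and result.
  clear() {
    this.progressImageUrl = null;
  }
}
```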
psychedelicious
3559124674 feat(ui): use nanostores in CanvasProgressImageModule for internal state 2024-11-04 07:11:20 -05:00
Eugene Brodsky
6c33e02141 fix(pkg): pin torch to <2.5.0 to prevent unnecessary downloads
pip's dependency resolution doesn't take into account transitive
dependencies when choosing package versions for download.
Even though `torch~=2.4.1` is required by `diffusers`, pip will
download 2.5.0 and higher, but only install 2.4.1.
Pinning torch to <2.5.0 prevents this behaviour.
2024-11-01 12:27:28 -04:00
psychedelicious
8cf94d602f chore: bump version to v5.3.1 2024-11-01 13:31:51 +11:00
psychedelicious
016a6f182f Make T2I Adapters work with any resolution supported by the models (#7215)
## Summary

This change mimics the unet padding strategy to align T2I featuremaps
with the latents during denoising. It also slightly adjusts the crop and
scale logic so that the control will match the input image without
shifting when it needs to pad.

## Related Issues / Discussions

<!--WHEN APPLICABLE: List any related issues or discussions on github or
discord. If this PR closes an issue, please use the "Closes #1234"
format, so that the issue will be automatically closed when the PR
merges.-->

## QA Instructions

Image generated at 1032x1024

![image](https://github.com/user-attachments/assets/7ea579e4-61dc-4b6b-aa84-33d676d160c6)

Image generated at 1080x1040 to prove feature alignment.

![image](https://github.com/user-attachments/assets/ee6e5b6a-d0d5-474d-9fc4-f65c104964bd)

Edge artifacts on the bottom and right are a result of SDXL's unet
padding, and t2i influence will be cut off in those regions.

## Merge Plan

Contingent on #7205 
Currently the Canvas UI prevents users from generating at resolutions that are not
multiples of 64 while T2I Adapter layers are active. Will leave this as a
draft until fixing that.

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a
changelog_
- [ ] _Tests added / updated (if applicable)_
- [ ] _Documentation added / updated (if applicable)_
2024-11-01 13:22:00 +11:00
Kent Keirsey
6fbc019142 Merge branch 'main' into t2i_resolution_hack 2024-10-31 22:08:38 -04:00
psychedelicious
26f95d6a97 fix(ui): disable move tool when staging 2024-10-31 22:08:16 -04:00
psychedelicious
40f7b0d171 fix(ui): cursor disappearing on empty layers 2024-10-31 22:08:16 -04:00
psychedelicious
4904700751 feat(ui): more info in state module repr 2024-10-31 22:08:16 -04:00
psychedelicious
83538c4b2b fix(ui): flash of canvas state between last progress image and generation result 2024-10-31 22:08:16 -04:00
psychedelicious
eb7b559529 fix(ui): sync canvas layer visibility when staging state changes 2024-10-31 22:08:16 -04:00
Kent Keirsey
4945465cf0 Merge branch 'main' into t2i_resolution_hack 2024-10-31 21:17:06 -04:00
Will
7eed7282a9 removing periods from update link to prevent page not found error 2024-11-01 07:42:31 +11:00
psychedelicious
47f0781822 fix(ui): add missing translations
Closes #7229
2024-11-01 07:40:52 +11:00
Eugene Brodsky
88b8e3e3d5 chore(deps): adjust pins for torch, numpy, other dependencies, to satisfy stricter dependency resolution 2024-10-31 16:26:53 -04:00
dunkeroni
47c3ab9214 Remove UI restrictions for T2I resolutions 2024-10-31 16:07:46 -04:00
dunkeroni
d6d436b59c Merge branch 'invoke-ai:main' into t2i_resolution_hack 2024-10-31 15:52:24 -04:00
Hippalectryon
6ff7057967 fix broken link in installer 2024-10-31 09:50:08 -04:00
psychedelicious
e032ab1179 fix(ui): ensure compositing rect is rendered correctly
This fixes an issue uncovered by the previous commit in which we do not exit filter/select-object on save-as.
2024-10-31 08:57:10 -04:00
psychedelicious
65bddfcd93 feat(ui): filter/select-object do not exit on save-as 2024-10-31 08:57:10 -04:00
aidawanglion
2d3ce418dd translationBot(ui): update translation (Chinese (Simplified Han script))
Currently translated at 73.7% (1160 of 1573 strings)

Translation: InvokeAI/Web UI
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/zh_Hans/
2024-10-31 17:18:35 +11:00
Hosted Weblate
548d72f7b9 translationBot(ui): update translation files
Updated by "Cleanup translation files" hook in Weblate.

Translation: InvokeAI/Web UI
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/
2024-10-31 17:18:35 +11:00
aidawanglion
19837a0f29 translationBot(ui): update translation (Chinese (Simplified Han script))
Currently translated at 73.3% (1146 of 1563 strings)

Translation: InvokeAI/Web UI
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/zh_Hans/
2024-10-31 17:18:35 +11:00
aidawanglion
483b65a1dc translationBot(ui): update translation (Chinese (Simplified Han script))
Currently translated at 69.4% (1086 of 1563 strings)

Translation: InvokeAI/Web UI
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/zh_Hans/
2024-10-31 17:18:35 +11:00
Riccardo Giovanetti
b85931c7ab translationBot(ui): update translation (Italian)
Currently translated at 99.4% (1554 of 1563 strings)

Translation: InvokeAI/Web UI
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
2024-10-31 17:18:35 +11:00
Hosted Weblate
9225f47338 translationBot(ui): update translation files
Updated by "Remove blank strings" hook in Weblate.

Translation: InvokeAI/Web UI
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/
2024-10-31 17:18:35 +11:00
Riccardo Giovanetti
bccac5e4a6 translationBot(ui): update translation (Italian)
Currently translated at 99.4% (1553 of 1562 strings)

Translation: InvokeAI/Web UI
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
2024-10-31 17:18:35 +11:00
Hosted Weblate
7cb07fdc04 translationBot(ui): update translation files
Updated by "Cleanup translation files" hook in Weblate.

Translation: InvokeAI/Web UI
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/
2024-10-31 17:18:35 +11:00
dakota2472
b137450026 translationBot(ui): update translation (Italian)
Currently translated at 100.0% (1562 of 1562 strings)

Translation: InvokeAI/Web UI
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
2024-10-31 17:18:35 +11:00
Hosted Weblate
dc5090469a translationBot(ui): update translation files
Updated by "Cleanup translation files" hook in Weblate.

Translation: InvokeAI/Web UI
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/
2024-10-31 17:18:35 +11:00
Thomas Bolteau
e0ae2ace89 translationBot(ui): update translation (French)
Currently translated at 100.0% (1561 of 1561 strings)

Translation: InvokeAI/Web UI
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/fr/
2024-10-31 17:18:35 +11:00
Riku
269faae04b translationBot(ui): update translation (German)
Currently translated at 71.4% (1115 of 1561 strings)

Translation: InvokeAI/Web UI
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/de/
2024-10-31 17:18:35 +11:00
Riccardo Giovanetti
e282acd41c translationBot(ui): update translation (Italian)
Currently translated at 98.8% (1543 of 1561 strings)

Translation: InvokeAI/Web UI
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
2024-10-31 17:18:35 +11:00
Ettore Atalan
a266668348 translationBot(ui): update translation (German)
Currently translated at 69.3% (1083 of 1561 strings)

Translation: InvokeAI/Web UI
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/de/
2024-10-31 17:18:35 +11:00
Riccardo Giovanetti
3bb3e142fc translationBot(ui): update translation (Italian)
Currently translated at 98.8% (1543 of 1561 strings)

Translation: InvokeAI/Web UI
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
2024-10-31 17:18:35 +11:00
Hosted Weblate
6ac6d70a22 translationBot(ui): update translation files
Updated by "Cleanup translation files" hook in Weblate.

translationBot(ui): update translation files

Updated by "Cleanup translation files" hook in Weblate.

translationBot(ui): update translation files

Updated by "Cleanup translation files" hook in Weblate.

Co-authored-by: Hosted Weblate <hosted@weblate.org>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/
Translation: InvokeAI/Web UI
2024-10-31 17:18:35 +11:00
Riccardo Giovanetti
b0acf33ba5 translationBot(ui): update translation (Italian)
Currently translated at 98.5% (1496 of 1518 strings)

Co-authored-by: Riccardo Giovanetti <riccardo.giovanetti@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
Translation: InvokeAI/Web UI
2024-10-31 17:18:35 +11:00
qyouqme
b3eb64b64c translationBot(ui): update translation (Chinese (Simplified Han script))
Currently translated at 66.0% (1003 of 1518 strings)

Co-authored-by: qyouqme <camtasiacn@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/zh_Hans/
Translation: InvokeAI/Web UI
2024-10-31 17:18:35 +11:00
Riku
95f8ab1a29 translationBot(ui): update translation (German)
Currently translated at 71.3% (1083 of 1518 strings)

Co-authored-by: Riku <riku.block@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/de/
Translation: InvokeAI/Web UI
2024-10-31 17:18:35 +11:00
Hosted Weblate
4e043384db translationBot(ui): update translation files
Updated by "Cleanup translation files" hook in Weblate.

Co-authored-by: Hosted Weblate <hosted@weblate.org>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/
Translation: InvokeAI/Web UI
2024-10-31 17:18:35 +11:00
psychedelicious
0f5df8ba17 chore(ui): lint 2024-10-31 16:54:31 +11:00
psychedelicious
2826ab48a2 refactor(ui): layer interaction locking
Previously we maintained an `isInteractable` flag, which was derived from these layer flags:
- Locked/unlocked
- Enabled/disabled
- Layer's type visible/hidden

When a layer was not interactable, we blocked all layer actions.

After comparing to the behaviour in Affinity and considering user feedback, I've loosened these restrictions while maintaining safety. First, some definitions.

There are two kinds of layer actions - mutating actions and non-mutating actions.
- Mutating actions are drawing on the layer, cropping it, filtering it, converting it, etc. Anything that changes the layer.
- Non-mutating actions are copying the layer, saving the layer to gallery, etc. Anything that _uses_ the layer.

Then, there are two broad canvas states - busy and not busy. "Busy" means the canvas is actively filtering, staging, compositing layers together, etc - something that is "single-threaded" by nature.

And here are the revised restrictions:
- When canvas is busy, you cannot initiate any layer actions.
- When the canvas is not busy, and the layer is locked, you cannot initiate any mutating actions.
- When the canvas is not busy and the layer is not locked, you can initiate any layer action.

Besides safely giving users more freedom, it also fixes an issue where the context menu for a layer was disabled if it was not the selected layer.
2024-10-31 16:54:31 +11:00
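A compact TypeScript sketch of the revised gating rules (types and names are illustrative, not the real canvas state):

```ts
type LayerAction = { mutates: boolean }; // drawing/cropping/filtering vs copy/save-to-gallery

interface CanvasState {
  isBusy: boolean; // filtering, staging, compositing, ...
}

interface LayerState {
  isLocked: boolean;
}

const canInitiateAction = (canvas: CanvasState, layer: LayerState, action: LayerAction): boolean => {
  if (canvas.isBusy) {
    // While the canvas is busy, no layer action may start.
    return false;
  }
  if (layer.isLocked && action.mutates) {
    // A locked layer may still be used (copied, saved), but not changed.
    return false;
  }
  return true;
};
```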
psychedelicious
7ff1b635c8 docs: clarify comments for invoke method return annotation validation 2024-10-31 16:21:07 +11:00
psychedelicious
dfb5e8b5e5 tests: add invoke method & output annotation tests 2024-10-31 16:21:07 +11:00
psychedelicious
7259da799c feat(nodes): attempt to look up invoke return types by name 2024-10-31 16:21:07 +11:00
psychedelicious
965069fce1 tests: fix nodes tests
they now require a valid output
2024-10-31 16:21:07 +11:00
psychedelicious
90232806d9 feat(nodes): add validation for invoke method return types 2024-10-31 16:21:07 +11:00
Hippalectryon
81bc153399 Fix link in dev docs 2024-10-31 16:06:44 +11:00
Jonathan
c63e526f99 Update FAQ.md
Fixed typo
2024-10-31 16:04:23 +11:00
nirmal0001
2b74263007 Update patchmatch.md
add required install dependencies for Arch Linux
2024-10-31 16:01:57 +11:00
psychedelicious
d3a82f7119 feat(ui): do not show hftoken error until user attempts to set it 2024-10-31 15:47:14 +11:00
Mary Hipp
291c5a0341 lint 2024-10-31 15:47:14 +11:00
Mary Hipp
bcb41399ca feat(ui,api): support for HF tokens in UI, handle Unauthorized and Forbidden errors 2024-10-31 15:47:14 +11:00
psychedelicious
6f0f53849b tests: reset config changes in test_deny_nodes when finished testing 2024-10-31 15:22:14 +11:00
psychedelicious
4e7d63761a fix(nodes): nodes denylist handling
- Add method to force a rebuild of the pydantic type adapter for the union of invocations, which is used to validate graphs.
- Update the xfail'd test.
2024-10-31 15:22:14 +11:00
psychedelicious
198c84105d fix(ui): compositor not setting processing flag when cleaning up 2024-10-30 16:27:36 +11:00
psychedelicious
2453b9f443 chore: bump version to v5.3.0rc1 2024-10-30 13:11:41 +11:00
psychedelicious
b091aca986 chore(ui): lint 2024-10-30 11:05:46 +11:00
psychedelicious
8f02ce54a0 perf(ui): cache image data & transparency mode during generation mode calculation
Perf boost and reduces the number of images we create on the backend.
2024-10-30 11:05:46 +11:00
psychedelicious
f4b7c63002 feat(ui): omit non-render-impacting keys when hashing entities
Had missed several of these, which means we were invalidating caches far too often. For example, when you changed a RG prompt, we were invalidating the cached canvas for that entity, even though changing the prompt doesn't affect the canvas at all.
2024-10-30 11:05:46 +11:00
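A minimal TypeScript sketch of the idea - hash only render-impacting fields (the entity shape and the field split shown here are assumptions, not the real entity types):

```ts
type RegionalGuidanceEntity = {
  id: string;
  objects: unknown[]; // brush strokes, images, ... - these affect the rendered canvas
  position: { x: number; y: number }; // affects the rendered canvas
  prompt: string; // does NOT affect the rendered canvas
  name: string; // does NOT affect the rendered canvas
};

const getRenderHash = (entity: RegionalGuidanceEntity): string => {
  // Hash only the fields that can change the rendered pixels, so editing the
  // prompt or renaming the layer does not invalidate the cached canvas.
  return JSON.stringify({ objects: entity.objects, position: entity.position }); // stand-in for a real hash
};
```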
psychedelicious
a4629280b5 feat(ui): use typeguard instead of string comparison 2024-10-30 11:05:46 +11:00
psychedelicious
855fb007da tidy(ui): minor type fix 2024-10-30 11:05:46 +11:00
psychedelicious
d805b52c1f feat(ui): merge down deletes merged entities 2024-10-30 11:05:46 +11:00
psychedelicious
2ea55685bb feat(ui): add save to assets for inpaint & rg 2024-10-30 11:05:46 +11:00
psychedelicious
bd6ff3deaa feat(ui): add merge down for all entity types 2024-10-30 11:05:46 +11:00
psychedelicious
82dd53ec88 tidy(ui): clean up merge visible logic 2024-10-30 11:05:46 +11:00
psychedelicious
71d749541d feat(ui): control layers supports merge visible
The "lighter" GlobalCompositeOperation is used. This seems to be the best one when merging control layers, as it retains edge maps.
2024-10-30 11:05:46 +11:00
psychedelicious
48a57fc4b9 feat(ui): support globalCompositeOperation when compositing canvas 2024-10-30 11:05:46 +11:00
psychedelicious
530e0910fc feat(ui): regional guidance supports merge visible 2024-10-30 11:05:46 +11:00
psychedelicious
2fdf8fc0a2 feat(ui): merge visible creates new layer
Previously, merge visible deleted all other visible layers. This is not how Affinity works; I should have confirmed before making it work like this in the first place.
2024-10-30 11:05:46 +11:00
psychedelicious
91db9c9300 refactor(ui): generalize compositor methods
`CanvasCompositorModule` had a fairly inflexible API, only supporting compositing all raster layers or inpaint masks.

The API has been generalized to work with a list of canvas entities. This enables `Merge Down` and `Merge Selected` functionality (though `Merge Selected` is not part of this set of changes).
2024-10-30 11:05:46 +11:00
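A simplified TypeScript sketch of a compositor that takes a list of entities and a compositing operation, in the spirit of the change described above ('lighter' is a standard Canvas `GlobalCompositeOperation`; the entity shape is an assumption):

```ts
type CanvasEntity = { canvas: HTMLCanvasElement; x: number; y: number };

const compositeEntities = (
  entities: CanvasEntity[],
  width: number,
  height: number,
  compositeOperation: GlobalCompositeOperation = 'source-over'
): HTMLCanvasElement => {
  const output = document.createElement('canvas');
  output.width = width;
  output.height = height;
  const ctx = output.getContext('2d');
  if (!ctx) {
    throw new Error('Unable to get 2d context');
  }
  // 'lighter' is the operation used when merging control layers so edge maps are retained.
  ctx.globalCompositeOperation = compositeOperation;
  for (const entity of entities) {
    // Draw each entity in order - "Merge Down" would pass just the two layers involved.
    ctx.drawImage(entity.canvas, entity.x, entity.y);
  }
  return output;
};
```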
psychedelicious
bc42205593 fix(ui): remember to disable isFiltering when finishing filtering 2024-10-30 09:19:30 +11:00
psychedelicious
2e3cba6416 fix(ui): flash of original layer when applying filter/segment
Let the parent module adopt the filtered/segmented image instead of destroying it and making the parent re-create it, which results in a brief flash of the parent layer's original objects before the new image is rendered.
2024-10-30 09:19:30 +11:00
psychedelicious
7852aacd11 fix(ui): track whether graph succeeded in runGraphAndReturnImageOutput 2024-10-30 09:19:30 +11:00
This prevents extraneous graph cancel requests when cleaning up the abort signal after a successful run of a graph.
2024-10-30 09:19:30 +11:00
psychedelicious
6cccd67ecd feat(ui): update SAM module to w/ minor improvements from filter module 2024-10-30 09:19:30 +11:00
psychedelicious
a7a89c9de1 feat(ui): use more resilient logic in canvas filter module, same as in SAM module 2024-10-30 09:19:30 +11:00
psychedelicious
5ca8eed89e tidy(ui): remove all buffer renderer interactions in SAM module
We don't use the buffer renderer in this module; there's no reason to clear it.
2024-10-30 09:19:30 +11:00
psychedelicious
c885c3c9a6 fix(ui): filter layer data pushed to parent rendered when saving as 2024-10-30 09:19:30 +11:00
Mary Hipp
d81c38c350 update announcements 2024-10-29 09:53:13 -04:00
Riku
92d5b73215 fix(ui): seamless zod parameter cleanup 2024-10-29 20:43:44 +11:00
Riku
097e92db6a fix(ui): always write seamless metadata
Ensure images without seamless enabled correctly reset the setting
when all parameters are recalled
2024-10-29 20:43:44 +11:00
Riku
84c6209a45 feat(ui): display seamless values in metadata viewer 2024-10-29 20:43:44 +11:00
Riku
107e48808a fix(ui): recall seamless settings 2024-10-29 20:43:44 +11:00
dunkeroni
47168b5505 chore: make ruff 2024-10-29 14:07:20 +11:00
dunkeroni
58152ec981 fix preview progress bar pre-denoise 2024-10-29 14:07:20 +11:00
dunkeroni
c74afbf332 convert to bgr on sdxl t2i 2024-10-29 14:07:20 +11:00
psychedelicious
7cdda00a54 feat(ui): rearrange canvas paste back nodes to save an image step
We were scaling the unscaled image and mask down before doing the paste-back, but this adds an extraneous step & image output.

We can do the paste-back first, then scale to output size after. So instead of 2 resizes before the paste-back, we have 1 resize after.

The end result is the same.
2024-10-29 11:13:31 +11:00
psychedelicious
a74282bce6 feat(ui): graph builders use objects for arg instead of many args 2024-10-29 11:13:31 +11:00
psychedelicious
107f048c7a feat(ui): extract canvas output node prefix to constant 2024-10-29 11:13:31 +11:00
Ryan Dick
a2486a5f06 Remove unused prediction_type and upcast_attention from from_single_file(...) calls. 2024-10-28 13:05:17 -04:00
Ryan Dick
07ab116efb Remove load_safety_checker=False from calls to from_single_file(...).
This param has been deprecated, and by including it (even when set to
False) the safety checker automatically gets downloaded.
2024-10-28 13:05:17 -04:00
Ryan Dick
1a13af3c7a Fix huggingface_hub.errors imports after version bump. 2024-10-28 13:05:17 -04:00
Ryan Dick
f2966a2594 Fix changed import for FromOriginalControlNetMixin after diffusers bump. 2024-10-28 13:05:17 -04:00
Ryan Dick
58bb97e3c6 Bump diffusers, accelerate, and huggingface-hub. 2024-10-28 13:05:17 -04:00
dunkeroni
34569a2410 Make T2I Adapters compatible with x8 resolutions 2024-10-27 15:38:22 -04:00
psychedelicious
a84aa5c049 fix(ui): canvas alerts blocking metadata panel 2024-10-27 09:46:01 +11:00
dunkeroni
acfa9c87ef Merge branch 'main' into sdxl_t2i_bgr 2024-10-25 23:44:13 -04:00
dunkeroni
f245d8e429 chore: make ruff 2024-10-25 23:43:33 -04:00
dunkeroni
62cf0f54e0 fix preview progress bar pre-denoise 2024-10-25 23:22:06 -04:00
dunkeroni
5f015e76ba convert to bgr on sdxl t2i 2024-10-25 23:04:17 -04:00
psychedelicious
aebcec28e0 chore: bump version to v5.3.0 2024-10-25 22:37:59 -04:00
psychedelicious
db1c5a94f7 feat(ui): image ctx -> New from Image -> Canvas as Raster/Control Layer 2024-10-25 22:27:00 -04:00
psychedelicious
56222a8493 feat(ui): organize layer context menu items 2024-10-25 22:27:00 -04:00
psychedelicious
b7510ce709 feat(ui): filter, select object and transform UI buttons
- Restore dedicated `Apply` buttons
- Remove icons from the buttons, too much noise when the words are short and clear
- Update loading state to show a spinner next to the `Process` button instead of on _every_ button
2024-10-25 22:27:00 -04:00
psychedelicious
5739799e2e fix(ui): close viewer when transforming 2024-10-25 22:27:00 -04:00
psychedelicious
813cf87920 feat(ui): move canvas alerts to top-left corner 2024-10-25 22:27:00 -04:00
psychedelicious
c95b151daf feat(ui): add layer title heading for canvas ctx menu 2024-10-25 22:27:00 -04:00
psychedelicious
a0f823a3cf feat(ui): reset shouldShowStagedImage flag when starting staging 2024-10-25 22:27:00 -04:00
Hippalectryon
64e0f6d688 Improve dev install docs
Fix numbering
2024-10-25 08:27:26 -04:00
psychedelicious
ddd5b1087c fix(nodes): return copies of objects in invocation ctx
Closes #6820
2024-10-25 08:26:09 -04:00
psychedelicious
008be9b846 feat(ui): add all save as options to filter 2024-10-25 08:12:14 -04:00
psychedelicious
8e7cabdc04 feat(ui): add Replace Current open to Select Object -> Save As 2024-10-25 08:12:14 -04:00
psychedelicious
a4c4237f99 feat(ui): use PiPlayFill for process buttons for filter & select object 2024-10-25 08:12:14 -04:00
psychedelicious
bda3740dcd feat(ui): use fill style icons for Filter 2024-10-25 08:12:14 -04:00
psychedelicious
5b4633baa9 feat(ui): use PiShapesFill icon for Select Object 2024-10-25 08:12:14 -04:00
psychedelicious
96351181cb feat(ui): make canvas layer toolbar icons a bit larger 2024-10-25 08:12:14 -04:00
psychedelicious
957d591d99 feat(ui): "Auto-Mask" -> "Select Object" 2024-10-25 08:12:14 -04:00
psychedelicious
75f605ba1a feat(ui): support inverted selection in auto-mask 2024-10-25 08:12:14 -04:00
psychedelicious
ab898a7180 chore(ui): typegen 2024-10-25 08:12:14 -04:00
psychedelicious
c9a4516ab1 feat(nodes): add invert to apply_tensor_mask_to_image 2024-10-25 08:12:14 -04:00
psychedelicious
fe97c0d5eb tweak(ui): default settings verbiage 2024-10-25 16:09:59 +11:00
psychedelicious
6056764840 feat(ui): disable default settings button when synced
A blue button is begging to be clicked, but clicking it will do nothing. Instead, we should communicate that no action is needed by disabling the button when the default settings are already in use.
2024-10-25 16:09:59 +11:00
psychedelicious
8747c0dbb0 fix(ui): handle no model selection in default settings tooltip 2024-10-25 16:09:59 +11:00
psychedelicious
c5cdd5f9c6 fix(ui): use const EMPTY_OBJECT to prevent rerenders 2024-10-25 16:09:59 +11:00
psychedelicious
abc5d53159 fix(ui): use explicit null check when comparing default settings
Using `&&` will result in false negatives for settings where a falsy value might be valid - for example, any setting for which 0 is a valid number. To be on the safe side, just use an explicit null check on all values.
2024-10-25 16:09:59 +11:00
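A short TypeScript sketch of the explicit null check versus a truthiness check (the settings shape is illustrative):

```ts
type DefaultSettings = { cfgScale?: number | null; steps?: number | null };

const isSynced = (current: DefaultSettings, defaults: DefaultSettings): boolean =>
  (Object.keys(defaults) as (keyof DefaultSettings)[]).every((key) => {
    const value = defaults[key];
    // A truthiness check (`value && ...`) would skip valid falsy values like 0.
    if (value === null || value === undefined) {
      return true; // no default recorded for this setting
    }
    return current[key] === value;
  });
```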
psychedelicious
2f76019a89 tweak(ui): defaults sync tooltip styling 2024-10-25 16:09:59 +11:00
Mary Hipp
3f45beb1ed feat(ui): add out of sync details to model default settings button 2024-10-25 16:09:59 +11:00
Mary Hipp
bc1126a85b (ui): add setting for showing model descriptions in dropdown defaulted to true 2024-10-25 14:52:33 +11:00
psychedelicious
380017041e fix(app): mutating an image also changes the in-memory cached image
We use an in-memory cache for PIL images to reduce I/O. If a node mutates the image in any way, the cached image object is also updated (but the on-disk image file is not).

We've lucked out that this hasn't caused major issues in the past (well, maybe it has but we didn't understand them?) mainly because of a happy accident. When you call `context.images.get_pil` in a node, if you provide an image mode (e.g. `mode="RGB"`), we call `convert`  on the image. This returns a copy. The node can do whatever it wants to that copy and nothing breaks.

However, when mode is not specified, we return the image directly. This is where we get in trouble - nodes that load the image like this, and then mutate the image, update the cache. Other nodes that reference that same image will now get the mutated version of it.

The fix is super simple - we make sure to return only copies from `get_pil`.
2024-10-25 10:22:22 +11:00
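The actual fix lives in the Python image service, but the pattern is language-agnostic; a TypeScript illustration of copy-on-get, kept in the same language as the other sketches here:

```ts
// Never hand callers the cached object itself - return a copy so a caller
// that mutates its image cannot corrupt what other callers will receive later.
class ImageCache<T extends { clone(): T }> {
  private cache = new Map<string, T>();

  set(name: string, image: T) {
    this.cache.set(name, image);
  }

  get(name: string): T | undefined {
    return this.cache.get(name)?.clone();
  }
}
```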
psychedelicious
ab7cdbb7e0 fix(ui): do not delete point on right-mouse click 2024-10-25 10:22:22 +11:00
psychedelicious
e5b78d0221 fix(ui): canvas drop area grid layout 2024-10-25 10:22:22 +11:00
psychedelicious
1acaa6c486 chore: bump version to v5.3.0rc2 2024-10-25 07:50:58 +11:00
psychedelicious
b0381076b7 revert(ui): drop targets for inpaint mask and rg 2024-10-25 07:42:46 +11:00
psychedelicious
ffff2d6dbb feat(ui): add New from Image submenu for image ctx menu 2024-10-25 07:42:46 +11:00
psychedelicious
afa9f07649 fix(ui): missing cursor when transforming 2024-10-25 07:42:46 +11:00
psychedelicious
addb5c49ea feat(ui): support dnd images onto inpaint mask/rg entities 2024-10-25 07:42:46 +11:00
psychedelicious
a112d2d55b feat(ui): add logging to useCopyLayerToClipboard 2024-10-25 07:42:46 +11:00
psychedelicious
619a271c8a feat(ui): disable copy to clipboard when layer is empty 2024-10-25 07:42:46 +11:00
psychedelicious
909f2ee36d feat(ui): add help tooltip to automask 2024-10-25 07:42:46 +11:00
psychedelicious
b4cf3d9d03 fix(ui): canvas context menu w/ eraser tool erases 2024-10-25 07:42:46 +11:00
psychedelicious
e6ab6e0293 chore(ui): lint 2024-10-24 08:39:29 -04:00
psychedelicious
66d9c7c631 fix(ui): icon for automask save as 2024-10-24 08:39:29 -04:00
psychedelicious
fec45f3eb6 feat(ui): animate automask preview overlay 2024-10-24 08:39:29 -04:00
psychedelicious
7211d1a6fc feat(ui): add context menu options for layer type convert/copy 2024-10-24 08:39:29 -04:00
psychedelicious
f3069754a9 feat(ui): add logic to convert/copy between all layer types 2024-10-24 08:39:29 -04:00
psychedelicious
4f43152aeb fix(ui): handle pen/touch events on submenu 2024-10-24 08:39:29 -04:00
psychedelicious
7125055d02 fix(ui): icon menu item group spacing 2024-10-24 08:39:29 -04:00
psychedelicious
c91a9ce390 feat(ui): add pull bbox to global ref image ctx menu 2024-10-24 08:39:29 -04:00
psychedelicious
3e7b73da2c feat(ui): add entity context menu as canvas context menu sub-menu 2024-10-24 08:39:29 -04:00
psychedelicious
61ac50c00d feat(ui): use sub-menu for image metadata recall 2024-10-24 08:39:29 -04:00
psychedelicious
c1201f0bce feat(ui): add useSubMenu hook to abstract logic for sub-menus 2024-10-24 08:39:29 -04:00
psychedelicious
acdffac5ad feat(ui): close viewer when filtering/transforming/automasking 2024-10-24 08:39:29 -04:00
psychedelicious
e420300fa4 feat(ui): replace automask apply w/ save as menu 2024-10-24 08:39:29 -04:00
psychedelicious
260a5a4f9a feat(ui): add automask button to toolbar 2024-10-24 08:39:29 -04:00
psychedelicious
ed0c2006fe feat(ui): rename "foreground"/"background" -> "include"/"exclude" 2024-10-24 08:39:29 -04:00
psychedelicious
9ffd888c86 feat(ui): remove neutral points 2024-10-24 08:39:29 -04:00
psychedelicious
175a9dc28d feat(ui): more resilient auto-masking processing
- Use a hash of the last processed points instead of a `hasProcessed` flag to determine whether or not we should re-process a given set of points.
- Store point coords in state instead of pulling them out of the konva node positions. This makes moving a point a more explicit action in code.
- Add a `roundCoord` util to round the x and y values of a coordinate.
- Ensure we always re-process when $points changes.
2024-10-24 08:39:29 -04:00
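A minimal TypeScript sketch of the hash-based re-process check and the `roundCoord` util described above (names and the hashing approach are illustrative):

```ts
type Coordinate = { x: number; y: number };
type SAMPoint = Coordinate & { label: 'include' | 'exclude' };

// Round a coordinate so tiny sub-pixel drags do not count as "new" points.
const roundCoord = (coord: Coordinate): Coordinate => ({
  x: Math.round(coord.x),
  y: Math.round(coord.y),
});

const hashPoints = (points: SAMPoint[]): string =>
  JSON.stringify(points.map((p) => ({ ...roundCoord(p), label: p.label })));

let lastProcessedHash: string | null = null;

const maybeProcess = (points: SAMPoint[], process: (points: SAMPoint[]) => void) => {
  const hash = hashPoints(points);
  // Re-process only when the points actually changed since the last run,
  // instead of relying on a single hasProcessed boolean.
  if (hash !== lastProcessedHash) {
    lastProcessedHash = hash;
    process(points);
  }
};
```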
psychedelicious
5764e4f7f2 chore(ui): lint 2024-10-24 23:34:06 +11:00
psychedelicious
4275a494b9 tweak(ui): bundle info icon 2024-10-24 23:34:06 +11:00
psychedelicious
a3deb8d30d tweak(ui): bundle tooltip styling 2024-10-24 23:34:06 +11:00
Mary Hipp
aafdb0a37b update popover copy 2024-10-24 23:34:06 +11:00
Mary Hipp
56a815719a update schema 2024-10-24 23:34:06 +11:00
Mary Hipp
4db26bfa3a (ui): add information popovers for other layer types 2024-10-24 23:34:06 +11:00
Mary Hipp
8d84ccb12b bump UI dep for combobox descriptions 2024-10-24 23:34:06 +11:00
Mary Hipp
3321d14997 undo show descriptions for now 2024-10-24 23:34:06 +11:00
maryhipp
43cc4684e1 (api) make sure all controlnet starter models will still have pre-processors correctly assigned when probed based on name 2024-10-24 23:34:06 +11:00
Mary Hipp
afa5a4b17c (ui): add informational popover for controlnet layers 2024-10-24 23:34:06 +11:00
Mary Hipp
33c433fe59 (ui): show models in starter bundles on hover, use previous_names for isInstalled logic, allow grouped model combobox to optionally show descriptions 2024-10-24 23:34:06 +11:00
maryhipp
9cd47fa857 (api): update names of starter models, add ability to track previous_names so it does not mess up logic that prevents dupe starter model installs 2024-10-24 23:34:06 +11:00
psychedelicious
32d9abe802 tweak(ui): prevent show/hide boards button cutoff
The use of hard 25% widths caused issues for some translations. Adjusted styling to not rely on any hard numbers. Tested with a project name and URL.
2024-10-24 08:21:16 -04:00
psychedelicious
3947d4a165 fix(ui): normalize infill alpha to 0-255 when building infill nodes
The browser/UI uses a 0-1 float for alpha, while the backend uses 0-255. We need to normalize the value when building the infill nodes for outpaint.
2024-10-24 19:22:36 +11:00
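The conversion itself is one line; a hedged TypeScript sketch (the function name is made up):

```ts
// The UI stores alpha as a 0-1 float; the backend infill nodes expect an integer 0-255.
const toBackendAlpha = (uiAlpha: number): number =>
  Math.round(Math.min(Math.max(uiAlpha, 0), 1) * 255);

// e.g. toBackendAlpha(0.5) === 128, toBackendAlpha(1) === 255
```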
psychedelicious
3583d03b70 feat(ui): improve subs and cleanup in filterer module
- Subscribe when starting the filterer
- Remember to abort the abortcontroller when destroying
- Unsubscribe when destroying
2024-10-23 08:21:12 -04:00
psychedelicious
bc954b9996 feat(ui): abort controller in SAM module when destroying 2024-10-23 08:21:12 -04:00
psychedelicious
c08075946a feat(ui): only subscribe listeners when segmenting
Realized we are doing a lot of event listening even when segmenting is not occurring. I don't think this will have a meaningful performance impact, but it makes sense to remove these listeners when not in use.
2024-10-23 08:21:12 -04:00
psychedelicious
df8df914e8 docs(ui): add comments to CanvasSegmentAnythingModule 2024-10-23 08:21:12 -04:00
psychedelicious
33924e8491 feat(ui): ensure abort controllers are cleaned up 2024-10-23 08:21:12 -04:00
psychedelicious
7e5ce1d69d fix(ui): when last SAM point is deleted, reset ephemeral state 2024-10-23 08:21:12 -04:00
Riku
6a24594140 feat(ui): move model manager in-place install state to redux
- persists across sessions/refreshes
- shared state for all installers (local path, scan folder)
2024-10-23 21:17:31 +11:00
psychedelicious
61d26cffe6 chore: bump version to v5.3.0rc1 2024-10-23 16:11:20 +11:00
psychedelicious
fdbc244dbe tidy(ui): autoProcessFilter -> autoProcess
It's used for more than filters now.
2024-10-23 16:01:15 +11:00
psychedelicious
0eea84c90d chore(ui): lint 2024-10-23 16:01:15 +11:00
psychedelicious
e079a91800 feat(ui): reorder point type radios 2024-10-23 16:01:15 +11:00
psychedelicious
eb20173487 fix(ui): set hasProcessed on segment module when deleting a point 2024-10-23 16:01:15 +11:00
psychedelicious
20dd0779b5 feat(ui): use radio instead of drop-down for point label 2024-10-23 16:01:15 +11:00
psychedelicious
b384a92f5c fix(ui): let segment module handle cursor if segmenting 2024-10-23 16:01:15 +11:00
psychedelicious
116d32fbbe feat(ui): auto-process for segment anything 2024-10-23 16:01:15 +11:00
psychedelicious
b044f31a61 fix(ui): translation for isolated layer preview 2024-10-23 16:01:15 +11:00
psychedelicious
6c3c24403b feat(ui): rename "Segment" -> "Auto Mask" 2024-10-23 16:01:15 +11:00
psychedelicious
591f48bb95 chore(ui): lint 2024-10-23 16:01:15 +11:00
psychedelicious
dc6e45485c feat(ui): update CanvasSegmentAnythingModule for new nodes 2024-10-23 16:01:15 +11:00
psychedelicious
829820479d chore(ui): typegen 2024-10-23 16:01:15 +11:00
psychedelicious
48a471bfb8 fix(nodes): apply_tensor_mask_to_image transparent image handling
Fix an issue where if the input image is transparent in a region to be masked, that transparent region ends up opaque black. Need to respect the input image transparency by applying the mask to the alpha channel only.
2024-10-23 16:01:15 +11:00
psychedelicious
ff72315db2 feat(nodes): update SAM backend and nodes to work with SAM points 2024-10-23 16:01:15 +11:00
psychedelicious
790846297a feat(ui): add more data to canvas module reprs 2024-10-23 16:01:15 +11:00
psychedelicious
230b455a13 tidy(ui): $pointTypeEnglish -> $pointTypeString 2024-10-23 16:01:15 +11:00
psychedelicious
71f0fff55b fix(ui): right click on stage draws 2024-10-23 16:01:15 +11:00
psychedelicious
7f2c83b9e6 feat(ui): consolidate isolated preview settings
`isolatedFilteringPreview` and `isolatedTransformingPreview` are merged into `isolatedLayerPreview`. This is also used for segment anything.
2024-10-23 16:01:15 +11:00
psychedelicious
bc85bd4bd4 tidy(ui): clean up and document CanvasSegmentAnythingModule 2024-10-23 16:01:15 +11:00
psychedelicious
38b09d73e4 feat(ui): masking UX (wip - interaction state issue) 2024-10-23 16:01:15 +11:00
psychedelicious
606c4ae88c feat(ui): masking UX (wip - issue w/ positioning) 2024-10-23 16:01:15 +11:00
psychedelicious
f666bac77f tidy(ui): CanvasToolView -> CanvasViewToolModule 2024-10-23 16:01:15 +11:00
psychedelicious
c9bf7da23a tidy(ui): CanvasToolRect -> CanvasRectToolModule 2024-10-23 16:01:15 +11:00
psychedelicious
dfc65b93e9 tidy(ui): CanvasToolMove -> CanvasMoveToolModule 2024-10-23 16:01:15 +11:00
psychedelicious
9ca40b4cf5 tidy(ui): CanvasToolErase -> CanvasEraserToolModule 2024-10-23 16:01:15 +11:00
psychedelicious
d571e71d5e tidy(ui): CanvasToolColorPicker -> CanvasColorPickerToolModule 2024-10-23 16:01:15 +11:00
psychedelicious
ad1e6c3fe6 tidy(ui): CanvasToolBrush -> CanvasBrushToolModule 2024-10-23 16:01:15 +11:00
psychedelicious
21d02911dd tidy(ui): CanvasBboxModule -> CanvasBboxToolModule, move file 2024-10-23 16:01:15 +11:00
psychedelicious
43afe0bd9a feat(ui): move cursor handling to tool modules
Also add cursors for move tool and bbox tool - when pointer is over the layer or bbox, use the move cursor.
2024-10-23 16:01:15 +11:00
psychedelicious
e7a68c446d feat(ui): add CanvasToolView
It's nearly a noop but I think it makes sense to have a module for each tool...
2024-10-23 16:01:15 +11:00
psychedelicious
b9c68a2e7e feat(ui): add CanvasToolMove
It's essentially a noop but I think it makes sense to have a module for each tool...
2024-10-23 16:01:15 +11:00
psychedelicious
371a1b1af3 feat(ui): make CanvasBboxModule child of CanvasToolModule 2024-10-23 16:01:15 +11:00
psychedelicious
dae4591de6 feat(ui): let tool modules set own visibility 2024-10-23 16:01:15 +11:00
psychedelicious
8ccb2e30ce feat(ui): bail on stage events when not targeting the stage 2024-10-23 16:01:15 +11:00
psychedelicious
b8106a4613 fix(ui): bail on drawing when mouse not down 2024-10-23 16:01:15 +11:00
psychedelicious
ce51e9582a feat(ui): add CanvasRectTool 2024-10-23 16:01:15 +11:00
psychedelicious
00848eb631 feat(ui): let color picker tool handle its events 2024-10-23 16:01:15 +11:00
psychedelicious
b48430a892 feat(ui): let eraser tool handle its events 2024-10-23 16:01:15 +11:00
psychedelicious
f94a218561 tidy(ui): remove extraneous checks from CanvasToolBrush 2024-10-23 16:01:15 +11:00
psychedelicious
9b6ed40875 fix(ui): edge case where pressure could be added erroneously to points 2024-10-23 16:01:15 +11:00
psychedelicious
26553dbb0e tidy(ui): CanvasToolModule 2024-10-23 16:01:15 +11:00
psychedelicious
9eb695d0b4 docs(ui): update CanvasToolModule 2024-10-23 16:01:15 +11:00
psychedelicious
babab17e1d feat(ui): let brush tool handle its events
Move brush tool event logic to its class.
2024-10-23 16:01:15 +11:00
psychedelicious
d0a80f3347 feat(ui): create zCoordinateWithPressure & export type from canvas types 2024-10-23 16:01:15 +11:00
psychedelicious
9b30363177 tidy(ui): CanvasToolModule structure 2024-10-23 16:01:15 +11:00
psychedelicious
89bde36b0c feat(ui): support draggable SAM points 2024-10-23 16:01:15 +11:00
psychedelicious
86a8476d97 feat(ui): working segment anything flow 2024-10-23 16:01:15 +11:00
psychedelicious
afa0661e55 chore(ui): typegen 2024-10-23 16:01:15 +11:00
psychedelicious
ba09c1277f feat(nodes): hacked together nodes for segment anything w/ points 2024-10-23 16:01:15 +11:00
psychedelicious
80bf9ddb71 feat(ui): rough out points UI for segment anything module 2024-10-23 16:01:15 +11:00
psychedelicious
1dbc98d747 feat(ui): add CanvasSegmentAnythingModule (wip) 2024-10-23 16:01:15 +11:00
psychedelicious
0698188ea2 feat(ui): support readonly arrays in SerializableObject type 2024-10-23 16:01:15 +11:00
psychedelicious
59d0ad4505 chore(ui): migrate from ts-toolbelt to type-fest
`ts-toolbelt` is unmaintained while `type-fest` is very actively maintained. Both provide similar TS utilities.
2024-10-23 16:01:15 +11:00
Thomas Bolteau
074a5692dd translationBot(ui): update translation (French)
Currently translated at 100.0% (1509 of 1509 strings)

translationBot(ui): update translation (French)

Currently translated at 100.0% (1509 of 1509 strings)

Co-authored-by: Thomas Bolteau <thomas.bolteau50@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/fr/
Translation: InvokeAI/Web UI
2024-10-23 10:23:37 +11:00
Васянатор
bb0741146a translationBot(ui): update translation (Russian)
Currently translated at 99.6% (1504 of 1509 strings)

Co-authored-by: Васянатор <ilabulanov339@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/ru/
Translation: InvokeAI/Web UI
2024-10-23 10:23:37 +11:00
Riccardo Giovanetti
1845d9a87a translationBot(ui): update translation (Italian)
Currently translated at 98.8% (1492 of 1509 strings)

Co-authored-by: Riccardo Giovanetti <riccardo.giovanetti@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
Translation: InvokeAI/Web UI
2024-10-23 10:23:37 +11:00
Riku
748c393e71 translationBot(ui): update translation (German)
Currently translated at 71.0% (1072 of 1509 strings)

Co-authored-by: Riku <riku.block@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/de/
Translation: InvokeAI/Web UI
2024-10-23 10:23:37 +11:00
David Burnett
9bd17ea02f Get flux working with MPS on 2.4.1, with GGUF support 2024-10-23 10:20:42 +11:00
David Burnett
24f9b46fbc ruff fix 2024-10-23 10:09:24 +11:00
David Burnett
54b3aa1d01 load t5 model in the same format as it is saved, seems to load as float32 on Macs 2024-10-23 10:09:24 +11:00
Maximilian Maag
d85733f22b fix(installer): pytorch and ROCm versions are incompatible
Each version of torch is only available for specific versions of CUDA and ROCm.
The Invoke installer and dockerfile try to install torch 2.4.1 with ROCm 5.6
support, which does not exist. As a result, the installation falls back to the
default CUDA version so AMD GPUs aren't detected. This commit fixes that by
bumping the ROCm version to 6.1, as suggested by the PyTorch documentation. [1]

The specified CUDA version of 12.4 is still correct according to [1], so it does
not need to be changed.

Closes #7006
Closes #7146

[1]: https://pytorch.org/get-started/previous-versions/#v241
2024-10-23 09:59:00 +11:00
psychedelicious
aff6ad0316 FLUX XLabs IP-Adapter Support (#7157)
## Summary

This PR adds support for the XLabs IP-Adapter
(https://huggingface.co/XLabs-AI/flux-ip-adapter) in workflows. Linear
UI integration is coming in a follow-up PR. The XLabs IP-Adapter can be
installed in the Starter Models tab.

Usage tips:

- Use a `cfg_scale` value of 2.0 to 4.0
- Start with an IP-Adapter weight of ~0.6 and adjust from there.
- Set `cfg_scale_start_step = 1`
- Set `cfg_scale_end_step` to roughly the halfway point (it's
unnecessary to apply CFG to all steps, and this will improve processing
time).

Sample workflow:
<img width="976" alt="image"
src="https://github.com/user-attachments/assets/4627b459-7e5a-4703-80e7-f7575c5fce19">

Result:

![image](https://github.com/user-attachments/assets/220b6a4c-69c6-447f-8df6-8aa6a56f3b3f)

## Related Issues / Discussions

Prerequisite: https://github.com/invoke-ai/InvokeAI/pull/7152

## Remaining TODO:

- [ ] Update default workflows.

## QA Instructions

- [x] Test basic happy path
- [x] Test with multiple IP-Adapters (it runs, but results aren't great)
- [ ] ~Test with multiple images to a single IP-Adapter~ (this is not
supported for now)
- [ ] Test automatic runtime installation of CLIP-L, CLIP-H, and CLIP-G
image encoder models if they are not already installed.
- [ ] Test starter model installation of the XLabs FLUX IP-Adapter
- [ ] Test SD and SDXL IP-Adapters for regression.
- [ ] Check peak memory utilization.

## Merge Plan

- [ ] Merge #7152 
- [ ] Change target branch to main

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a
changelog_
- [x] _Tests added / updated (if applicable)_
- [ ] _Documentation added / updated (if applicable)_
2024-10-23 09:57:39 +11:00
psychedelicious
61496fdcbc fix(nodes): load IP Adapter images as RGB
FLUX IP Adapter only works with RGB. Did the same for non-FLUX to be safe & consistent, though I don't think it's strictly necessary.
2024-10-23 08:34:15 +10:00
psychedelicious
ee8975401a fix(ui): remove special handling for flux in IPAdapterModel
This masked an issue w/ the CLIP Vision model. Issue is now handled in reducer/graph builder.
2024-10-23 08:31:10 +10:00
psychedelicious
bf3260446d fix(ui): use flux_ip_adapter for flux 2024-10-23 08:30:11 +10:00
psychedelicious
f53823b45e fix(ui): update CLIP Vision when ipa model changes 2024-10-23 08:29:14 +10:00
Ryan Dick
5cbe89afdd Merge branch 'main' into ryan/flux-ip-adapter-cfg-2 2024-10-22 21:17:36 +00:00
Ryan Dick
c466d50c3d FLUX CFG support (#7152)
## Summary

Add support for Classifier-Free Guidance with FLUX.

- Using CFG doubles the time for the denoising process. Running both the
positive and negative conditioning in a single batch is left for future
work, because most users are already VRAM-constrained (this would
probably be faster at the cost of higher peak VRAM).
- Negative text conditioning is optional and only required if `cfg_scale
!= 1.0`
- CFG is skipped if `cfg_scale == 1.0` (i.e. no compute overhead in this
case)
- `cfg_scale_start_step` and `cfg_scale_end_step` can be used to easily
control the range of steps that CFG is applied for.
- CFG is a prerequisite for IP-Adapter support.

## Example

Positive Caption: `Professional photography of a luxury hotel in the
Nevada desert`
CFG: 1.0

![image](https://github.com/user-attachments/assets/f25ff832-d69b-4c5f-88f4-9429ce96d598)

Positive Caption: `Professional photography of a luxury hotel in the
Nevada desert`
Negative Caption: `Swimming pool`
CFG: 2.0
Same seed

![image](https://github.com/user-attachments/assets/27e3b952-2795-469f-bb24-b7fddb726ba1)


## QA Instructions

- [ ] Test interactions with ControlNet
- [ ] Verify that peak RAM/VRAM utilization has not increased
significantly
- [ ] Test that CFG is skipped when cfg_scale == 1.0
- [ ] Test that negative text conditioning can be omitted when cfg_scale
== 1.0
- [ ] Test that a clear error message is returned when negative text
conditioning is omitted when cfg_scale != 1.0
- [ ] Test that the negative text prompt gets applied when cfg_scale
>1.0
- [ ] Test that a collection of cfg_scale values can be provided for
per-step control.
- [ ] Test that `cfg_scale_start_step` and `cfg_scale_end_step` control
the range of steps that CFG is applied

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a
changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
2024-10-22 17:09:40 -04:00
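For reference, the behaviour described above corresponds to the usual classifier-free guidance blend; a sketch in my own notation (the PR states the behaviour, not this exact formula):

```latex
% s_t: the cfg_scale for step t (a scalar, or the t-th entry of a per-step list).
% Guidance is applied only for cfg_scale_start_step <= t <= cfg_scale_end_step.
\mathrm{pred}_t =
\begin{cases}
  \mathrm{pred}_{\mathrm{pos}} & s_t = 1 \quad \text{(CFG skipped, no extra compute)} \\
  \mathrm{pred}_{\mathrm{neg}} + s_t \left( \mathrm{pred}_{\mathrm{pos}} - \mathrm{pred}_{\mathrm{neg}} \right) & \text{otherwise}
\end{cases}
```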
Ryan Dick
d20b894a61 Add cfg_scale_start_step and cfg_scale_end_step to FLUX Denoise node. 2024-10-23 07:59:48 +11:00
Ryan Dick
20362448b9 Make negative_text_conditioning nullable on FLUX Denoise invocation. 2024-10-23 07:59:48 +11:00
Ryan Dick
5df10cc494 Add support for cfg_scale list on FLUX Denoise node. 2024-10-23 07:59:48 +11:00
Ryan Dick
da171114ea Naive implementation of CFG for FLUX. 2024-10-23 07:59:48 +11:00
Eugene Brodsky
62919a443c fix(installer): remove xformers before installation 2024-10-23 07:57:52 +11:00
Mary Hipp
ffcec91d87 Merge branch 'ryan/flux-ip-adapter-cfg-2' of https://github.com/invoke-ai/InvokeAI into ryan/flux-ip-adapter-cfg-2 2024-10-22 15:23:35 -04:00
Mary Hipp
0a96466b60 feat(ui): add IP adapters to FLUX in linear UI 2024-10-22 15:22:56 -04:00
Ryan Dick
e48cab0276 Only allow a single image prompt for FLUX IP-Adapters (haven't really looked into this much, but punting on it for now). 2024-10-22 16:32:01 +00:00
Ryan Dick
740f6eb19f Skip tests that use the meta device - they fail on the MacOS CI runners. 2024-10-22 15:56:49 +00:00
psychedelicious
d1bb4c2c70 fix(nodes): FluxDenoiseInvocation.controlnet_vae missing default=None 2024-10-22 10:54:15 +11:00
Ryan Dick
e545f18a45 (minor) Fix ruff. 2024-10-21 22:38:06 +00:00
Ryan Dick
e8cd1bb3d8 Add FLUX IP-Adapter starter models. 2024-10-21 22:17:42 +00:00
Ryan Dick
90a906e203 Simplify handling of CLIP ViT selection for FLUX IP-Adapter invocation. 2024-10-21 19:54:59 +00:00
Ryan Dick
5546110127 Add FluxIPAdapterInvocation. 2024-10-21 18:27:40 +00:00
Ryan Dick
73bbb12f7a Use a black image as the negative IP prompt for parity with X-Labs implementation. 2024-10-21 15:47:22 +00:00
Ryan Dick
dde54740c5 Test out IP-Adapter with CFG. 2024-10-21 15:47:17 +00:00
Ryan Dick
f70a8e2c1a A bunch of HACKS to get ViT-L CLIP vision encoder working for FLUX IP-Adapter. Need to revisit how to clean this all up long term. 2024-10-21 15:43:00 +00:00
Ryan Dick
fdccdd52d5 Fixes to get XLabsIpAdapterExtension running. 2024-10-21 15:43:00 +00:00
Ryan Dick
31ffd73423 Initial draft of integrating FLUX IP-Adapter inference support. 2024-10-21 15:42:56 +00:00
Ryan Dick
3fa1012879 Add IPAdapterDoubleBlocks wrapper to tidy FLUX ip-adapter handling. 2024-10-21 15:38:50 +00:00
Ryan Dick
c2a8fbd8d6 (minor) Move infer_xlabs_ip_adapter_params_from_state_dict(...) to state_dict_utils.py. 2024-10-21 15:38:50 +00:00
Ryan Dick
d6643d7263 Add model loading code for xlabs FLUX IP-Adapter (not tested). 2024-10-21 15:38:50 +00:00
Ryan Dick
412e79d8e6 Add model probing for XLabs FLUX IP-Adapter. 2024-10-21 15:38:50 +00:00
Ryan Dick
f939dbdc33 Add is_state_dict_xlabs_ip_adapter() utility function. 2024-10-21 15:38:50 +00:00
Ryan Dick
24a0ca86f5 Add logic for loading an Xlabs IP-Adapter from a state dict. 2024-10-21 15:38:50 +00:00
Ryan Dick
95c30f6a8b Add initial logic for inferring FLUX IP-Adapter params from a state_dict. 2024-10-21 15:38:50 +00:00
Ryan Dick
ac7441e606 Fixup typing/imports for IPDoubleStreamBlockProcessor. 2024-10-21 15:38:50 +00:00
Ryan Dick
9c9af312fe Copy IPDoubleStreamBlockProcessor from 47495425db/src/flux/modules/layers.py (L221). 2024-10-21 15:38:50 +00:00
Ryan Dick
7bf5927c43 Add XLabs IP-Adapter state dict for unit tests. 2024-10-21 15:38:50 +00:00
Ryan Dick
32c7cdd856 Add cfg_scale_start_step and cfg_scale_end_step to FLUX Denoise node. 2024-10-21 14:52:02 +00:00
Mary Hipp
bbd89d54b4 add it to list 2024-10-19 14:08:49 +11:00
Mary Hipp
ee61006a49 add starter model 2024-10-19 14:08:49 +11:00
psychedelicious
0b43f5fd64 docs(ui): improve docstrings for LoggingOverrides 2024-10-19 08:04:20 +11:00
psychedelicious
6c61266990 refactor(ui): logging config handling
Introduce two-stage logging configuration and overrides for enabled status, log level and log namespaces.

The first stage is in `<InvokeAIUI />`, before we set up redux (and therefore before we have access to the user's configured logging setup). In this stage, we use the overrides or default values.

The second stage is in `<App />`, after we set up redux, via `useSyncLoggingConfig`. In this stage, we use the overrides or the user's configured logging setup. This hook also handles pushing changes made by the user into localstorage.

Other changes:
- Extract logging config to util function
- Remove the `useEffect` from `SettingsModal` that was changing the logging settings
- Remove extraneous log effects from `useLogger`
- Export new `LoggingOverrides` type
2024-10-19 08:04:20 +11:00
Maximilian Maag
2d5afe8094 fix(installer): Print maximize suggestion when Python is found, not when it's missing 2024-10-18 16:35:51 -04:00
Maximilian Maag
2430137d19 fix(installer): Avoid misleading error message when searching for python binary
which prints a message to stderr when it doesn't find anything. In this case,
not finding anything is expected so the error is misleading.
2024-10-18 16:35:51 -04:00
Ryan Dick
6df4ee5fc8 Make negative_text_conditioning nullable on FLUX Denoise invocation. 2024-10-18 20:31:27 +00:00
Ryan Dick
371742d8f9 Add support for cfg_scale list on FLUX Denoise node. 2024-10-18 20:14:47 +00:00
psychedelicious
5440c03767 fix(app): directory traversal when deleting images 2024-10-18 14:27:41 +11:00
Ryan Dick
73d4c4d56d Naive implementation of CFG for FLUX. 2024-10-16 16:22:35 +00:00
561 changed files with 34791 additions and 12446 deletions

View File

@@ -19,3 +19,4 @@
- [ ] _The PR has a short but descriptive title, suitable for a changelog_
- [ ] _Tests added / updated (if applicable)_
- [ ] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_

14
SECURITY.md Normal file
View File

@@ -0,0 +1,14 @@
# Security Policy
## Supported Versions
Only the latest version of Invoke will receive security updates.
We do not currently maintain multiple versions of the application with updates.
## Reporting a Vulnerability
To report a vulnerability, contact the Invoke team directly at security@invoke.ai
At this time, we do not maintain a formal bug bounty program.
You can also share identified security issues with our team on huntr.com

View File

@@ -38,7 +38,7 @@ RUN --mount=type=cache,target=/root/.cache/pip \
if [ "$TARGETPLATFORM" = "linux/arm64" ] || [ "$GPU_DRIVER" = "cpu" ]; then \
extra_index_url_arg="--extra-index-url https://download.pytorch.org/whl/cpu"; \
elif [ "$GPU_DRIVER" = "rocm" ]; then \
extra_index_url_arg="--extra-index-url https://download.pytorch.org/whl/rocm5.6"; \
extra_index_url_arg="--extra-index-url https://download.pytorch.org/whl/rocm6.1"; \
else \
extra_index_url_arg="--extra-index-url https://download.pytorch.org/whl/cu124"; \
fi &&\

View File

@@ -1364,7 +1364,6 @@ the in-memory loaded model:
|----------------|-----------------|------------------|
| `config` | AnyModelConfig | A copy of the model's configuration record for retrieving base type, etc. |
| `model` | AnyModel | The instantiated model (details below) |
| `locker` | ModelLockerBase | A context manager that mediates the movement of the model into VRAM |
### get_model_by_key(key, [submodel]) -> LoadedModel

View File

@@ -5,7 +5,7 @@ If you're a new contributor to InvokeAI or Open Source Projects, this is the gui
## New Contributor Checklist
- [x] Set up your local development environment & fork of InvokeAI by following [the steps outlined here](../dev-environment.md)
- [x] Set up your local tooling with [this guide](InvokeAI/contributing/LOCAL_DEVELOPMENT/#developing-invokeai-in-vscode). Feel free to skip this step if you already have tooling you're comfortable with.
- [x] Set up your local tooling with [this guide](../LOCAL_DEVELOPMENT.md). Feel free to skip this step if you already have tooling you're comfortable with.
- [x] Familiarize yourself with [Git](https://www.atlassian.com/git) & our project structure by reading through the [development documentation](development.md)
- [x] Join the [#dev-chat](https://discord.com/channels/1020123559063990373/1049495067846524939) channel of the Discord
- [x] Choose an issue to work on! This can be achieved by asking in the #dev-chat channel, tackling a [good first issue](https://github.com/invoke-ai/InvokeAI/contribute) or finding an item on the [roadmap](https://github.com/orgs/invoke-ai/projects/7). If nothing in any of those places catches your eye, feel free to work on something of interest to you!

View File

@@ -17,46 +17,49 @@ If you just want to use Invoke, you should use the [installer][installer link].
## Setup
1. Run through the [requirements][requirements link].
1. [Fork and clone][forking link] the [InvokeAI repo][repo link].
1. Create a directory for user data (images, models, db, etc). This is typically at `~/invokeai`, but if you already have a non-dev install, you may want to create a separate directory for the dev install.
1. Create a python virtual environment inside the directory you just created:
2. [Fork and clone][forking link] the [InvokeAI repo][repo link].
3. Create a directory for user data (images, models, db, etc). This is typically at `~/invokeai`, but if you already have a non-dev install, you may want to create a separate directory for the dev install.
4. Create a python virtual environment inside the directory you just created:
```sh
python3 -m venv .venv --prompt InvokeAI-Dev
```
```sh
python3 -m venv .venv --prompt InvokeAI-Dev
```
1. Activate the venv (you'll need to do this every time you want to run the app):
5. Activate the venv (you'll need to do this every time you want to run the app):
```sh
source .venv/bin/activate
```
```sh
source .venv/bin/activate
```
1. Install the repo as an [editable install][editable install link]:
6. Install the repo as an [editable install][editable install link]:
```sh
pip install -e ".[dev,test,xformers]" --use-pep517 --extra-index-url https://download.pytorch.org/whl/cu121
```
```sh
pip install -e ".[dev,test,xformers]" --use-pep517 --extra-index-url https://download.pytorch.org/whl/cu121
```
Refer to the [manual installation][manual install link] instructions for determining the correct install options. `xformers` is optional, but `dev` and `test` are not.
Refer to the [manual installation][manual install link] instructions for determining the correct install options. `xformers` is optional, but `dev` and `test` are not.
1. Install the frontend dev toolchain:
7. Install the frontend dev toolchain:
- [`nodejs`](https://nodejs.org/) (recommend v20 LTS)
- [`pnpm`](https://pnpm.io/installation#installing-a-specific-version) (must be v8 - not v9!)
- [`pnpm`](https://pnpm.io/8.x/installation) (must be v8 - not v9!)
1. Do a production build of the frontend:
8. Do a production build of the frontend:
```sh
pnpm build
```
```sh
cd PATH_TO_INVOKEAI_REPO/invokeai/frontend/web
pnpm i
pnpm build
```
1. Start the application:
9. Start the application:
```sh
python scripts/invokeai-web.py
```
```sh
cd PATH_TO_INVOKEAI_REPO
python scripts/invokeai-web.py
```
1. Access the UI at `localhost:9090`.
10. Access the UI at `localhost:9090`.
## Updating the UI

View File

@@ -38,7 +38,7 @@ This project is a combined effort of dedicated people from across the world. [C
## Code of Conduct
The InvokeAI community is a welcoming place, and we want your help in maintaining that. Please review our [Code of Conduct](https://github.com/invoke-ai/InvokeAI/blob/main/CODE_OF_CONDUCT.md) to learn more - it's essential to maintaining a respectful and inclusive environment.
The InvokeAI community is a welcoming place, and we want your help in maintaining that. Please review our [Code of Conduct](https://github.com/invoke-ai/InvokeAI/blob/main/docs/CODE_OF_CONDUCT.md) to learn more - it's essential to maintaining a respectful and inclusive environment.
By making a contribution to this project, you certify that:

View File

@@ -209,7 +209,7 @@ checkpoint models.
To solve this, go to the Model Manager tab (the cube), select the
checkpoint model that's giving you trouble, and press the "Convert"
button in the upper right of your browser window. This will conver the
button in the upper right of your browser window. This will convert the
checkpoint into a diffusers model, after which loading should be
faster and less memory-intensive.

View File

@@ -97,16 +97,16 @@ Prior to installing PyPatchMatch, you need to take the following steps:
sudo pacman -S --needed base-devel
```
2. Install `opencv` and `blas`:
2. Install `opencv`, `blas`, and required dependencies:
```sh
sudo pacman -S opencv blas
sudo pacman -S opencv blas fmt glew vtk hdf5
```
or for CUDA support
```sh
sudo pacman -S opencv-cuda blas
sudo pacman -S opencv-cuda blas fmt glew vtk hdf5
```
3. Fix the naming of the `opencv` package configuration file:

View File

@@ -99,7 +99,6 @@ their descriptions.
| Scale Latents | Scales latents by a given factor. |
| Segment Anything Processor | Applies segment anything processing to image |
| Show Image | Displays a provided image, and passes it forward in the pipeline. |
| Step Param Easing | Experimental per-step parameter easing for denoising steps |
| String Primitive Collection | A collection of string primitive values |
| String Primitive | A string primitive value |
| Subtract Integers | Subtracts two numbers |

View File

@@ -12,7 +12,7 @@ MINIMUM_PYTHON_VERSION=3.10.0
MAXIMUM_PYTHON_VERSION=3.11.100
PYTHON=""
for candidate in python3.11 python3.10 python3 python ; do
if ppath=`which $candidate`; then
if ppath=`which $candidate 2>/dev/null`; then
# when using `pyenv`, the executable for an inactive Python version will exist but will not be operational
# we check that this found executable can actually run
if [ $($candidate --version &>/dev/null; echo ${PIPESTATUS}) -gt 0 ]; then continue; fi
@@ -30,10 +30,11 @@ done
if [ -z "$PYTHON" ]; then
echo "A suitable Python interpreter could not be found"
echo "Please install Python $MINIMUM_PYTHON_VERSION or higher (maximum $MAXIMUM_PYTHON_VERSION) before running this script. See instructions at $INSTRUCTIONS for help."
echo "For the best user experience we suggest enlarging or maximizing this window now."
read -p "Press any key to exit"
exit -1
fi
echo "For the best user experience we suggest enlarging or maximizing this window now."
exec $PYTHON ./lib/main.py ${@}
read -p "Press any key to exit"
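For reference, a hedged Python rendition of the interpreter search the installer script above performs: try candidates in order, require that the binary both exists on PATH and can actually report its version (pyenv shims for inactive versions exist but fail to run). The function and variable names here are illustrative only; the real logic lives in the shell script.
```py
import shutil
import subprocess

CANDIDATES = ["python3.11", "python3.10", "python3", "python"]


def find_python() -> str | None:
    for candidate in CANDIDATES:
        path = shutil.which(candidate)  # analogous to `which $candidate 2>/dev/null`
        if path is None:
            continue
        try:
            # Found on PATH, but is it operational? (pyenv shims may not be.)
            subprocess.run([path, "--version"], check=True, capture_output=True)
        except (OSError, subprocess.CalledProcessError):
            continue
        return path
    return None


print(find_python() or "A suitable Python interpreter could not be found")
```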

View File

@@ -245,6 +245,9 @@ class InvokeAiInstance:
pip = local[self.pip]
# Uninstall xformers if it is present; the correct version of it will be reinstalled if needed
_ = pip["uninstall", "-yqq", "xformers"] & FG
pipeline = pip[
"install",
"--require-virtualenv",
@@ -407,7 +410,7 @@ def get_torch_source() -> Tuple[str | None, str | None]:
optional_modules: str | None = None
if OS == "Linux":
if device == GpuType.ROCM:
url = "https://download.pytorch.org/whl/rocm5.6"
url = "https://download.pytorch.org/whl/rocm6.1"
elif device == GpuType.CPU:
url = "https://download.pytorch.org/whl/cpu"
elif device == GpuType.CUDA:
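A minimal standalone sketch of the index-URL selection this hunk touches: the `GpuType` branches and the ROCm/CPU URLs come from the hunk itself, while the enum definition, the function name `pick_torch_index_url`, and the cu124 CUDA URL (taken from the Dockerfile hunk above) are assumptions for illustration.
```py
from enum import Enum
from typing import Optional, Tuple


class GpuType(str, Enum):
    CUDA = "cuda"
    ROCM = "rocm"
    CPU = "cpu"


def pick_torch_index_url(os_name: str, device: GpuType) -> Tuple[Optional[str], Optional[str]]:
    """Return (extra index URL, optional modules) for the pip install step."""
    url: Optional[str] = None
    optional_modules: Optional[str] = None
    if os_name == "Linux":
        if device == GpuType.ROCM:
            url = "https://download.pytorch.org/whl/rocm6.1"  # bumped from rocm5.6 in this diff
        elif device == GpuType.CPU:
            url = "https://download.pytorch.org/whl/cpu"
        elif device == GpuType.CUDA:
            url = "https://download.pytorch.org/whl/cu124"  # assumed, matching the Dockerfile hunk
    return url, optional_modules


print(pick_torch_index_url("Linux", GpuType.ROCM))
# ('https://download.pytorch.org/whl/rocm6.1', None)
```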

View File

@@ -259,7 +259,7 @@ def select_gpu() -> GpuType:
[
f"Detected the [gold1]{OS}-{ARCH}[/] platform",
"",
"See [deep_sky_blue1]https://invoke-ai.github.io/InvokeAI/#system[/] to ensure your system meets the minimum requirements.",
"See [deep_sky_blue1]https://invoke-ai.github.io/InvokeAI/installation/requirements/[/] to ensure your system meets the minimum requirements.",
"",
"[red3]🠶[/] [b]Your GPU drivers must be correctly installed before using InvokeAI![/] [red3]🠴[/]",
]

View File

@@ -68,7 +68,7 @@ do_line_input() {
printf "2: Open the developer console\n"
printf "3: Command-line help\n"
printf "Q: Quit\n\n"
printf "To update, download and run the installer from https://github.com/invoke-ai/InvokeAI/releases/latest.\n\n"
printf "To update, download and run the installer from https://github.com/invoke-ai/InvokeAI/releases/latest\n\n"
read -p "Please enter 1-4, Q: [1] " yn
choice=${yn:='1'}
do_choice $choice

View File

@@ -40,6 +40,8 @@ class AppVersion(BaseModel):
version: str = Field(description="App version")
highlights: Optional[list[str]] = Field(default=None, description="Highlights of release")
class AppDependencyVersions(BaseModel):
"""App depencency Versions Response"""

View File

@@ -1,6 +1,7 @@
# Copyright (c) 2023 Lincoln D. Stein
"""FastAPI route for model configuration records."""
import contextlib
import io
import pathlib
import shutil
@@ -10,6 +11,7 @@ from enum import Enum
from tempfile import TemporaryDirectory
from typing import List, Optional, Type
import huggingface_hub
from fastapi import Body, Path, Query, Response, UploadFile
from fastapi.responses import FileResponse, HTMLResponse
from fastapi.routing import APIRouter
@@ -27,6 +29,7 @@ from invokeai.app.services.model_records import (
ModelRecordChanges,
UnknownModelException,
)
from invokeai.app.util.suppress_output import SuppressOutput
from invokeai.backend.model_manager.config import (
AnyModelConfig,
BaseModelType,
@@ -34,7 +37,7 @@ from invokeai.backend.model_manager.config import (
ModelFormat,
ModelType,
)
from invokeai.backend.model_manager.load.model_cache.model_cache_base import CacheStats
from invokeai.backend.model_manager.load.model_cache.cache_stats import CacheStats
from invokeai.backend.model_manager.metadata.fetch.huggingface import HuggingFaceMetadataFetch
from invokeai.backend.model_manager.metadata.metadata_base import ModelMetadataWithFiles, UnknownMetadataException
from invokeai.backend.model_manager.search import ModelSearch
@@ -808,7 +811,11 @@ def get_is_installed(
for model in installed_models:
if model.source == starter_model.source:
return True
if model.name == starter_model.name and model.base == starter_model.base and model.type == starter_model.type:
if (
(model.name == starter_model.name or model.name in starter_model.previous_names)
and model.base == starter_model.base
and model.type == starter_model.type
):
return True
return False
@@ -919,3 +926,51 @@ async def get_stats() -> Optional[CacheStats]:
"""Return performance statistics on the model manager's RAM cache. Will return null if no models have been loaded."""
return ApiDependencies.invoker.services.model_manager.load.ram_cache.stats
class HFTokenStatus(str, Enum):
VALID = "valid"
INVALID = "invalid"
UNKNOWN = "unknown"
class HFTokenHelper:
@classmethod
def get_status(cls) -> HFTokenStatus:
try:
if huggingface_hub.get_token_permission(huggingface_hub.get_token()):
# Valid token!
return HFTokenStatus.VALID
# No token set
return HFTokenStatus.INVALID
except Exception:
return HFTokenStatus.UNKNOWN
@classmethod
def set_token(cls, token: str) -> HFTokenStatus:
with SuppressOutput(), contextlib.suppress(Exception):
huggingface_hub.login(token=token, add_to_git_credential=False)
return cls.get_status()
@model_manager_router.get("/hf_login", operation_id="get_hf_login_status", response_model=HFTokenStatus)
async def get_hf_login_status() -> HFTokenStatus:
token_status = HFTokenHelper.get_status()
if token_status is HFTokenStatus.UNKNOWN:
ApiDependencies.invoker.services.logger.warning("Unable to verify HF token")
return token_status
@model_manager_router.post("/hf_login", operation_id="do_hf_login", response_model=HFTokenStatus)
async def do_hf_login(
token: str = Body(description="Hugging Face token to use for login", embed=True),
) -> HFTokenStatus:
HFTokenHelper.set_token(token)
token_status = HFTokenHelper.get_status()
if token_status is HFTokenStatus.UNKNOWN:
ApiDependencies.invoker.services.logger.warning("Unable to verify HF token")
return token_status
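The `get_is_installed(...)` change above also matches a starter model against any of its `previous_names`, so a model that was renamed upstream still counts as installed. A small self-contained sketch of that predicate, with stand-in dataclasses for the real config and starter-model types:
```py
from dataclasses import dataclass, field


@dataclass
class InstalledModel:
    source: str
    name: str
    base: str
    type: str


@dataclass
class StarterModel:
    source: str
    name: str
    base: str
    type: str
    previous_names: list[str] = field(default_factory=list)


def is_installed(starter: StarterModel, installed: list[InstalledModel]) -> bool:
    for model in installed:
        # Exact source match always wins.
        if model.source == starter.source:
            return True
        # Otherwise match on (name or any previous name) + base + type, as in the diff.
        if (
            (model.name == starter.name or model.name in starter.previous_names)
            and model.base == starter.base
            and model.type == starter.type
        ):
            return True
    return False


starter = StarterModel(source="hf/org/new-name", name="New Name", base="flux",
                       type="ip_adapter", previous_names=["Old Name"])
print(is_installed(starter, [InstalledModel(source="local", name="Old Name",
                                            base="flux", type="ip_adapter")]))
# True
```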

View File

@@ -110,7 +110,7 @@ async def cancel_by_batch_ids(
@session_queue_router.put(
"/{queue_id}/cancel_by_destination",
operation_id="cancel_by_destination",
responses={200: {"model": CancelByBatchIDsResult}},
responses={200: {"model": CancelByDestinationResult}},
)
async def cancel_by_destination(
queue_id: str = Path(description="The queue id to perform this operation on"),

View File

@@ -4,6 +4,7 @@ from __future__ import annotations
import inspect
import re
import sys
import warnings
from abc import ABC, abstractmethod
from enum import Enum
@@ -62,6 +63,7 @@ class Classification(str, Enum, metaclass=MetaEnum):
- `Prototype`: The invocation is not yet stable and may be removed from the application at any time. Workflows built around this invocation may break, and we are *not* committed to supporting this invocation.
- `Deprecated`: The invocation is deprecated and may be removed in a future version.
- `Internal`: The invocation is not intended for use by end-users. It may be changed or removed at any time, but is exposed for users to play with.
- `Special`: The invocation is a special case and does not fit into any of the other classifications.
"""
Stable = "stable"
@@ -69,6 +71,7 @@ class Classification(str, Enum, metaclass=MetaEnum):
Prototype = "prototype"
Deprecated = "deprecated"
Internal = "internal"
Special = "special"
class UIConfigBase(BaseModel):
@@ -192,12 +195,19 @@ class BaseInvocation(ABC, BaseModel):
"""Gets a pydantc TypeAdapter for the union of all invocation types."""
if not cls._typeadapter or cls._typeadapter_needs_update:
AnyInvocation = TypeAliasType(
"AnyInvocation", Annotated[Union[tuple(cls._invocation_classes)], Field(discriminator="type")]
"AnyInvocation", Annotated[Union[tuple(cls.get_invocations())], Field(discriminator="type")]
)
cls._typeadapter = TypeAdapter(AnyInvocation)
cls._typeadapter_needs_update = False
return cls._typeadapter
@classmethod
def invalidate_typeadapter(cls) -> None:
"""Invalidates the typeadapter, forcing it to be rebuilt on next access. If the invocation allowlist or
denylist is changed, this should be called to ensure the typeadapter is updated and validation respects
the updated allowlist and denylist."""
cls._typeadapter_needs_update = True
@classmethod
def get_invocations(cls) -> Iterable[BaseInvocation]:
"""Gets all invocations, respecting the allowlist and denylist."""
@@ -479,6 +489,26 @@ def invocation(
title="type", default=invocation_type, json_schema_extra={"field_kind": FieldKind.NodeAttribute}
)
# Validate the `invoke()` method is implemented
if "invoke" in cls.__abstractmethods__:
raise ValueError(f'Invocation "{invocation_type}" must implement the "invoke" method')
# And validate that `invoke()` returns a subclass of `BaseInvocationOutput`
invoke_return_annotation = signature(cls.invoke).return_annotation
try:
# TODO(psyche): If `invoke()` is not defined, `return_annotation` ends up as the string "BaseInvocationOutput"
# instead of the class `BaseInvocationOutput`. This may be a pydantic bug: https://github.com/pydantic/pydantic/issues/7978
if isinstance(invoke_return_annotation, str):
invoke_return_annotation = getattr(sys.modules[cls.__module__], invoke_return_annotation)
assert invoke_return_annotation is not BaseInvocationOutput
assert issubclass(invoke_return_annotation, BaseInvocationOutput)
except Exception:
raise ValueError(
f'Invocation "{invocation_type}" must have a return annotation of a subclass of BaseInvocationOutput (got "{invoke_return_annotation}")'
)
docstring = cls.__doc__
cls = create_model(
cls.__qualname__,
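The decorator change above enforces that every registered invocation implements `invoke()` and annotates it with a strict subclass of `BaseInvocationOutput`, resolving string annotations against the defining module. A standalone sketch of the same checks, with stand-in base classes (names here are illustrative, not the real InvokeAI classes):
```py
import sys
from abc import ABC, abstractmethod
from inspect import signature


class BaseInvocationOutput:
    pass


class MyOutput(BaseInvocationOutput):
    pass


class BaseInvocation(ABC):
    @abstractmethod
    def invoke(self) -> BaseInvocationOutput: ...


def validate_invocation(cls: type, invocation_type: str) -> None:
    # invoke() must actually be implemented.
    if "invoke" in getattr(cls, "__abstractmethods__", frozenset()):
        raise ValueError(f'Invocation "{invocation_type}" must implement the "invoke" method')
    annotation = signature(cls.invoke).return_annotation
    # With deferred annotations, the annotation may be a string; resolve it against the module.
    if isinstance(annotation, str):
        annotation = getattr(sys.modules[cls.__module__], annotation)
    if annotation is BaseInvocationOutput or not (
        isinstance(annotation, type) and issubclass(annotation, BaseInvocationOutput)
    ):
        raise ValueError(
            f'Invocation "{invocation_type}" must have a return annotation of a subclass of '
            f"BaseInvocationOutput (got {annotation!r})"
        )


class GoodInvocation(BaseInvocation):
    def invoke(self) -> MyOutput:
        return MyOutput()


validate_invocation(GoodInvocation, "good")  # passes: MyOutput is a proper subclass
```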

View File

@@ -1,98 +1,120 @@
from typing import Any, Union
from typing import Optional, Union
import numpy as np
import numpy.typing as npt
import torch
import torchvision.transforms as T
from PIL import Image
from torchvision.transforms.functional import resize as tv_resize
from invokeai.app.invocations.baseinvocation import BaseInvocation, invocation
from invokeai.app.invocations.fields import FieldDescriptions, Input, InputField, LatentsField
from invokeai.app.invocations.fields import FieldDescriptions, ImageField, Input, InputField, LatentsField
from invokeai.app.invocations.primitives import LatentsOutput
from invokeai.app.services.shared.invocation_context import InvocationContext
from invokeai.backend.stable_diffusion.diffusers_pipeline import image_resized_to_grid_as_tensor
from invokeai.backend.util.devices import TorchDevice
def slerp(
t: Union[float, np.ndarray],
v0: Union[torch.Tensor, np.ndarray],
v1: Union[torch.Tensor, np.ndarray],
device: torch.device,
DOT_THRESHOLD: float = 0.9995,
):
"""
Spherical linear interpolation
Args:
t (float/np.ndarray): Float value between 0.0 and 1.0
v0 (np.ndarray): Starting vector
v1 (np.ndarray): Final vector
DOT_THRESHOLD (float): Threshold for considering the two vectors as
collinear. Not recommended to alter this.
Returns:
v2 (np.ndarray): Interpolation vector between v0 and v1
"""
inputs_are_torch = False
if not isinstance(v0, np.ndarray):
inputs_are_torch = True
v0 = v0.detach().cpu().numpy()
if not isinstance(v1, np.ndarray):
inputs_are_torch = True
v1 = v1.detach().cpu().numpy()
dot = np.sum(v0 * v1 / (np.linalg.norm(v0) * np.linalg.norm(v1)))
if np.abs(dot) > DOT_THRESHOLD:
v2 = (1 - t) * v0 + t * v1
else:
theta_0 = np.arccos(dot)
sin_theta_0 = np.sin(theta_0)
theta_t = theta_0 * t
sin_theta_t = np.sin(theta_t)
s0 = np.sin(theta_0 - theta_t) / sin_theta_0
s1 = sin_theta_t / sin_theta_0
v2 = s0 * v0 + s1 * v1
if inputs_are_torch:
v2 = torch.from_numpy(v2).to(device)
return v2
@invocation(
"lblend",
title="Blend Latents",
tags=["latents", "blend"],
tags=["latents", "blend", "mask"],
category="latents",
version="1.0.3",
version="1.1.0",
)
class BlendLatentsInvocation(BaseInvocation):
"""Blend two latents using a given alpha. Latents must have same size."""
"""Blend two latents using a given alpha. If a mask is provided, the second latents will be masked before blending.
Latents must have same size. Masking functionality added by @dwringer."""
latents_a: LatentsField = InputField(
description=FieldDescriptions.latents,
input=Input.Connection,
)
latents_b: LatentsField = InputField(
description=FieldDescriptions.latents,
input=Input.Connection,
)
alpha: float = InputField(default=0.5, description=FieldDescriptions.blend_alpha)
latents_a: LatentsField = InputField(description=FieldDescriptions.latents, input=Input.Connection)
latents_b: LatentsField = InputField(description=FieldDescriptions.latents, input=Input.Connection)
mask: Optional[ImageField] = InputField(default=None, description="Mask for blending in latents B")
alpha: float = InputField(ge=0, default=0.5, description=FieldDescriptions.blend_alpha)
def prep_mask_tensor(self, mask_image: Image.Image) -> torch.Tensor:
if mask_image.mode != "L":
mask_image = mask_image.convert("L")
mask_tensor = image_resized_to_grid_as_tensor(mask_image, normalize=False)
if mask_tensor.dim() == 3:
mask_tensor = mask_tensor.unsqueeze(0)
return mask_tensor
def replace_tensor_from_masked_tensor(
self, tensor: torch.Tensor, other_tensor: torch.Tensor, mask_tensor: torch.Tensor
):
output = tensor.clone()
mask_tensor = mask_tensor.expand(output.shape)
if output.dtype != torch.float16:
output = torch.add(output, mask_tensor * torch.sub(other_tensor, tensor))
else:
output = torch.add(output, mask_tensor.half() * torch.sub(other_tensor, tensor))
return output
def invoke(self, context: InvocationContext) -> LatentsOutput:
latents_a = context.tensors.load(self.latents_a.latents_name)
latents_b = context.tensors.load(self.latents_b.latents_name)
if self.mask is None:
mask_tensor = torch.zeros(latents_a.shape[-2:])
else:
mask_tensor = self.prep_mask_tensor(context.images.get_pil(self.mask.image_name))
mask_tensor = tv_resize(mask_tensor, latents_a.shape[-2:], T.InterpolationMode.BILINEAR, antialias=False)
latents_b = self.replace_tensor_from_masked_tensor(latents_b, latents_a, mask_tensor)
if latents_a.shape != latents_b.shape:
raise Exception("Latents to blend must be the same size.")
raise ValueError("Latents to blend must be the same size.")
device = TorchDevice.choose_torch_device()
def slerp(
t: Union[float, npt.NDArray[Any]], # FIXME: maybe use np.float32 here?
v0: Union[torch.Tensor, npt.NDArray[Any]],
v1: Union[torch.Tensor, npt.NDArray[Any]],
DOT_THRESHOLD: float = 0.9995,
) -> Union[torch.Tensor, npt.NDArray[Any]]:
"""
Spherical linear interpolation
Args:
t (float/np.ndarray): Float value between 0.0 and 1.0
v0 (np.ndarray): Starting vector
v1 (np.ndarray): Final vector
DOT_THRESHOLD (float): Threshold for considering the two vectors as
collinear. Not recommended to alter this.
Returns:
v2 (np.ndarray): Interpolation vector between v0 and v1
"""
inputs_are_torch = False
if not isinstance(v0, np.ndarray):
inputs_are_torch = True
v0 = v0.detach().cpu().numpy()
if not isinstance(v1, np.ndarray):
inputs_are_torch = True
v1 = v1.detach().cpu().numpy()
dot = np.sum(v0 * v1 / (np.linalg.norm(v0) * np.linalg.norm(v1)))
if np.abs(dot) > DOT_THRESHOLD:
v2 = (1 - t) * v0 + t * v1
else:
theta_0 = np.arccos(dot)
sin_theta_0 = np.sin(theta_0)
theta_t = theta_0 * t
sin_theta_t = np.sin(theta_t)
s0 = np.sin(theta_0 - theta_t) / sin_theta_0
s1 = sin_theta_t / sin_theta_0
v2 = s0 * v0 + s1 * v1
if inputs_are_torch:
v2_torch: torch.Tensor = torch.from_numpy(v2).to(device)
return v2_torch
else:
assert isinstance(v2, np.ndarray)
return v2
# blend
bl = slerp(self.alpha, latents_a, latents_b)
assert isinstance(bl, torch.Tensor)
blended_latents: torch.Tensor = bl # for type checking convenience
blended_latents = slerp(self.alpha, latents_a, latents_b, device)
# https://discuss.huggingface.co/t/memory-usage-by-later-pipeline-stages/23699
blended_latents = blended_latents.to("cpu")
TorchDevice.empty_cache()
torch.cuda.empty_cache()
name = context.tensors.save(tensor=blended_latents)
return LatentsOutput.build(latents_name=name, latents=blended_latents, seed=self.latents_a.seed)
return LatentsOutput.build(latents_name=name, latents=blended_latents)
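The refactor above lifts `slerp(...)` out of `invoke()` and adds optional mask-based blending. For reference, a compact NumPy rendition of just the interpolation math from the diff (linear fallback when the vectors are nearly collinear, spherical weights otherwise); the node's torch/device handling and masking are omitted.
```py
import numpy as np


def slerp(t: float, v0: np.ndarray, v1: np.ndarray, dot_threshold: float = 0.9995) -> np.ndarray:
    dot = np.sum(v0 * v1 / (np.linalg.norm(v0) * np.linalg.norm(v1)))
    if np.abs(dot) > dot_threshold:
        # Nearly collinear: plain linear interpolation is numerically safer.
        return (1 - t) * v0 + t * v1
    theta_0 = np.arccos(dot)
    theta_t = theta_0 * t
    s0 = np.sin(theta_0 - theta_t) / np.sin(theta_0)
    s1 = np.sin(theta_t) / np.sin(theta_0)
    return s0 * v0 + s1 * v1


rng = np.random.default_rng(0)
a, b = rng.standard_normal(16), rng.standard_normal(16)
blended = slerp(0.5, a, b)
print(blended.shape)  # (16,)
```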

View File

@@ -82,10 +82,11 @@ class CompelInvocation(BaseInvocation):
# apply all patches while the model is on the target device
text_encoder_info.model_on_device() as (cached_weights, text_encoder),
tokenizer_info as tokenizer,
LoRAPatcher.apply_lora_patches(
LoRAPatcher.apply_smart_lora_patches(
model=text_encoder,
patches=_lora_loader(),
prefix="lora_te_",
dtype=TorchDevice.choose_torch_dtype(),
cached_weights=cached_weights,
),
# Apply CLIP Skip after LoRA to prevent LoRA application from failing on skipped layers.
@@ -95,6 +96,7 @@ class CompelInvocation(BaseInvocation):
ti_manager,
),
):
context.util.signal_progress("Building conditioning")
assert isinstance(text_encoder, CLIPTextModel)
assert isinstance(tokenizer, CLIPTokenizer)
compel = Compel(
@@ -178,10 +180,11 @@ class SDXLPromptInvocationBase:
# apply all patches while the model is on the target device
text_encoder_info.model_on_device() as (cached_weights, text_encoder),
tokenizer_info as tokenizer,
LoRAPatcher.apply_lora_patches(
LoRAPatcher.apply_smart_lora_patches(
text_encoder,
patches=_lora_loader(),
prefix=lora_prefix,
dtype=TorchDevice.choose_torch_dtype(),
cached_weights=cached_weights,
),
# Apply CLIP Skip after LoRA to prevent LoRA application from failing on skipped layers.
@@ -191,6 +194,7 @@ class SDXLPromptInvocationBase:
ti_manager,
),
):
context.util.signal_progress("Building conditioning")
assert isinstance(text_encoder, (CLIPTextModel, CLIPTextModelWithProjection))
assert isinstance(tokenizer, CLIPTokenizer)

File diff suppressed because it is too large

View File

@@ -65,6 +65,7 @@ class CreateDenoiseMaskInvocation(BaseInvocation):
img_mask = tv_resize(mask, image_tensor.shape[-2:], T.InterpolationMode.BILINEAR, antialias=False)
masked_image = image_tensor * torch.where(img_mask < 0.5, 0.0, 1.0)
# TODO:
context.util.signal_progress("Running VAE encoder")
masked_latents = ImageToLatentsInvocation.vae_encode(vae_info, self.fp32, self.tiled, masked_image.clone())
masked_latents_name = context.tensors.save(tensor=masked_latents)

View File

@@ -131,6 +131,7 @@ class CreateGradientMaskInvocation(BaseInvocation):
image_tensor = image_tensor.unsqueeze(0)
img_mask = tv_resize(mask, image_tensor.shape[-2:], T.InterpolationMode.BILINEAR, antialias=False)
masked_image = image_tensor * torch.where(img_mask < 0.5, 0.0, 1.0)
context.util.signal_progress("Running VAE encoder")
masked_latents = ImageToLatentsInvocation.vae_encode(
vae_info, self.fp32, self.tiled, masked_image.clone()
)

View File

@@ -13,6 +13,7 @@ from diffusers.models.unets.unet_2d_condition import UNet2DConditionModel
from diffusers.schedulers.scheduling_dpmsolver_sde import DPMSolverSDEScheduler
from diffusers.schedulers.scheduling_tcd import TCDScheduler
from diffusers.schedulers.scheduling_utils import SchedulerMixin as Scheduler
from PIL import Image
from pydantic import field_validator
from torchvision.transforms.functional import resize as tv_resize
from transformers import CLIPVisionModelWithProjection
@@ -510,6 +511,7 @@ class DenoiseLatentsInvocation(BaseInvocation):
context: InvocationContext,
t2i_adapters: Optional[Union[T2IAdapterField, list[T2IAdapterField]]],
ext_manager: ExtensionsManager,
bgr_mode: bool = False,
) -> None:
if t2i_adapters is None:
return
@@ -519,6 +521,10 @@ class DenoiseLatentsInvocation(BaseInvocation):
t2i_adapters = [t2i_adapters]
for t2i_adapter_field in t2i_adapters:
image = context.images.get_pil(t2i_adapter_field.image.image_name)
if bgr_mode: # SDXL t2i trained on cv2's BGR outputs, but PIL won't convert straight to BGR
r, g, b = image.split()
image = Image.merge("RGB", (b, g, r))
ext_manager.add_extension(
T2IAdapterExt(
node_context=context,
@@ -547,7 +553,9 @@ class DenoiseLatentsInvocation(BaseInvocation):
if not isinstance(single_ipa_image_fields, list):
single_ipa_image_fields = [single_ipa_image_fields]
single_ipa_images = [context.images.get_pil(image.image_name) for image in single_ipa_image_fields]
single_ipa_images = [
context.images.get_pil(image.image_name, mode="RGB") for image in single_ipa_image_fields
]
with image_encoder_model_info as image_encoder_model:
assert isinstance(image_encoder_model, CLIPVisionModelWithProjection)
# Get image embeddings from CLIP and ImageProjModel.
@@ -614,13 +622,17 @@ class DenoiseLatentsInvocation(BaseInvocation):
for t2i_adapter_field in t2i_adapter:
t2i_adapter_model_config = context.models.get_config(t2i_adapter_field.t2i_adapter_model.key)
t2i_adapter_loaded_model = context.models.load(t2i_adapter_field.t2i_adapter_model)
image = context.images.get_pil(t2i_adapter_field.image.image_name)
image = context.images.get_pil(t2i_adapter_field.image.image_name, mode="RGB")
# The max_unet_downscale is the maximum amount that the UNet model downscales the latent image internally.
if t2i_adapter_model_config.base == BaseModelType.StableDiffusion1:
max_unet_downscale = 8
elif t2i_adapter_model_config.base == BaseModelType.StableDiffusionXL:
max_unet_downscale = 4
# SDXL adapters are trained on cv2's BGR outputs
r, g, b = image.split()
image = Image.merge("RGB", (b, g, r))
else:
raise ValueError(f"Unexpected T2I-Adapter base model type: '{t2i_adapter_model_config.base}'.")
@@ -628,29 +640,39 @@ class DenoiseLatentsInvocation(BaseInvocation):
with t2i_adapter_loaded_model as t2i_adapter_model:
total_downscale_factor = t2i_adapter_model.total_downscale_factor
# Resize the T2I-Adapter input image.
# We select the resize dimensions so that after the T2I-Adapter's total_downscale_factor is applied, the
# result will match the latent image's dimensions after max_unet_downscale is applied.
t2i_input_height = latents_shape[2] // max_unet_downscale * total_downscale_factor
t2i_input_width = latents_shape[3] // max_unet_downscale * total_downscale_factor
# Note: We have hard-coded `do_classifier_free_guidance=False`. This is because we only want to prepare
# a single image. If CFG is enabled, we will duplicate the resultant tensor after applying the
# T2I-Adapter model.
#
# Note: We re-use the `prepare_control_image(...)` from ControlNet for T2I-Adapter, because it has many
# of the same requirements (e.g. preserving binary masks during resize).
# Assuming fixed dimensional scaling of LATENT_SCALE_FACTOR.
_, _, latent_height, latent_width = latents_shape
control_height_resize = latent_height * LATENT_SCALE_FACTOR
control_width_resize = latent_width * LATENT_SCALE_FACTOR
t2i_image = prepare_control_image(
image=image,
do_classifier_free_guidance=False,
width=t2i_input_width,
height=t2i_input_height,
width=control_width_resize,
height=control_height_resize,
num_channels=t2i_adapter_model.config["in_channels"], # mypy treats this as a FrozenDict
device=t2i_adapter_model.device,
dtype=t2i_adapter_model.dtype,
resize_mode=t2i_adapter_field.resize_mode,
)
# Resize the T2I-Adapter input image.
# We select the resize dimensions so that after the T2I-Adapter's total_downscale_factor is applied, the
# result will match the latent image's dimensions after max_unet_downscale is applied.
# We crop the image to this size so that the positions match the input image on non-standard resolutions
t2i_input_height = latents_shape[2] // max_unet_downscale * total_downscale_factor
t2i_input_width = latents_shape[3] // max_unet_downscale * total_downscale_factor
if t2i_image.shape[2] > t2i_input_height or t2i_image.shape[3] > t2i_input_width:
t2i_image = t2i_image[
:, :, : min(t2i_image.shape[2], t2i_input_height), : min(t2i_image.shape[3], t2i_input_width)
]
adapter_state = t2i_adapter_model(t2i_image)
if do_classifier_free_guidance:
@@ -898,7 +920,8 @@ class DenoiseLatentsInvocation(BaseInvocation):
# ext = extension_field.to_extension(exit_stack, context, ext_manager)
# ext_manager.add_extension(ext)
self.parse_controlnet_field(exit_stack, context, self.control, ext_manager)
self.parse_t2i_adapter_field(exit_stack, context, self.t2i_adapter, ext_manager)
bgr_mode = self.unet.unet.base == BaseModelType.StableDiffusionXL
self.parse_t2i_adapter_field(exit_stack, context, self.t2i_adapter, ext_manager, bgr_mode)
# ext: t2i/ip adapter
ext_manager.run_callback(ExtensionCallbackType.SETUP, denoise_ctx)
@@ -980,10 +1003,11 @@ class DenoiseLatentsInvocation(BaseInvocation):
ModelPatcher.apply_freeu(unet, self.unet.freeu_config),
SeamlessExt.static_patch_model(unet, self.unet.seamless_axes), # FIXME
# Apply the LoRA after unet has been moved to its target device for faster patching.
LoRAPatcher.apply_lora_patches(
LoRAPatcher.apply_smart_lora_patches(
model=unet,
patches=_lora_loader(),
prefix="lora_unet_",
dtype=unet.dtype,
cached_weights=cached_weights,
),
):
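The T2I-Adapter changes above first resize the control image back to pixel space and then crop it so that, after the adapter's own downscaling, its feature maps line up with the UNet's most-downscaled latent grid. A worked example of that arithmetic under assumed constants: `LATENT_SCALE_FACTOR` is taken to be 8 (the usual VAE downscale), `max_unet_downscale=4` matches the SDXL branch in the diff, and `total_downscale_factor=32` is a placeholder for whatever the loaded adapter actually reports.
```py
LATENT_SCALE_FACTOR = 8  # assumed VAE downscale factor


def t2i_sizes(latent_h: int, latent_w: int, max_unet_downscale: int, total_downscale_factor: int):
    # 1) Resize target passed to prepare_control_image(...): back to pixel space.
    control_h, control_w = latent_h * LATENT_SCALE_FACTOR, latent_w * LATENT_SCALE_FACTOR
    # 2) Crop target so the adapter's downscaled features align with the UNet's deepest latent grid.
    crop_h = latent_h // max_unet_downscale * total_downscale_factor
    crop_w = latent_w // max_unet_downscale * total_downscale_factor
    return (control_h, control_w), (crop_h, crop_w)


# Standard 1024x1024 SDXL render (128x128 latents): nothing gets cropped.
print(t2i_sizes(128, 128, 4, 32))  # ((1024, 1024), (1024, 1024))
# Non-standard 1080-high render (135x128 latents): the control image is cropped 1080 -> 1056.
print(t2i_sizes(135, 128, 4, 32))  # ((1080, 1024), (1056, 1024))
```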

View File

@@ -41,6 +41,7 @@ class UIType(str, Enum, metaclass=MetaEnum):
# region Model Field Types
MainModel = "MainModelField"
FluxMainModel = "FluxMainModelField"
SD3MainModel = "SD3MainModelField"
SDXLMainModel = "SDXLMainModelField"
SDXLRefinerModel = "SDXLRefinerModelField"
ONNXModel = "ONNXModelField"
@@ -52,6 +53,8 @@ class UIType(str, Enum, metaclass=MetaEnum):
T2IAdapterModel = "T2IAdapterModelField"
T5EncoderModel = "T5EncoderModelField"
CLIPEmbedModel = "CLIPEmbedModelField"
CLIPLEmbedModel = "CLIPLEmbedModelField"
CLIPGEmbedModel = "CLIPGEmbedModelField"
SpandrelImageToImageModel = "SpandrelImageToImageModelField"
# endregion
@@ -131,8 +134,10 @@ class FieldDescriptions:
clip = "CLIP (tokenizer, text encoder, LoRAs) and skipped layer count"
t5_encoder = "T5 tokenizer and text encoder"
clip_embed_model = "CLIP Embed loader"
clip_g_model = "CLIP-G Embed loader"
unet = "UNet (scheduler, LoRAs)"
transformer = "Transformer"
mmditx = "MMDiTX"
vae = "VAE"
cond = "Conditioning tensor"
controlnet_model = "ControlNet model to load"
@@ -140,6 +145,7 @@ class FieldDescriptions:
lora_model = "LoRA model to load"
main_model = "Main model (UNet, VAE, CLIP) to load"
flux_model = "Flux model (Transformer) to load"
sd3_model = "SD3 model (MMDiTX) to load"
sdxl_main_model = "SDXL Main model (UNet, VAE, CLIP1, CLIP2) to load"
sdxl_refiner_model = "SDXL Refiner Main Model (UNet, VAE, CLIP2) to load"
onnx_main_model = "ONNX Main model (UNet, VAE, CLIP) to load"
@@ -244,6 +250,17 @@ class FluxConditioningField(BaseModel):
"""A conditioning tensor primitive value"""
conditioning_name: str = Field(description="The name of conditioning tensor")
mask: Optional[TensorField] = Field(
default=None,
description="The mask associated with this conditioning tensor. Excluded regions should be set to False, "
"included regions should be set to True.",
)
class SD3ConditioningField(BaseModel):
"""A conditioning tensor primitive value"""
conditioning_name: str = Field(description="The name of conditioning tensor")
class ConditioningField(BaseModel):

View File

@@ -1,15 +1,19 @@
from contextlib import ExitStack
from typing import Callable, Iterator, Optional, Tuple
import numpy as np
import numpy.typing as npt
import torch
import torchvision.transforms as tv_transforms
from torchvision.transforms.functional import resize as tv_resize
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection
from invokeai.app.invocations.baseinvocation import BaseInvocation, Classification, invocation
from invokeai.app.invocations.fields import (
DenoiseMaskField,
FieldDescriptions,
FluxConditioningField,
ImageField,
Input,
InputField,
LatentsField,
@@ -17,6 +21,7 @@ from invokeai.app.invocations.fields import (
WithMetadata,
)
from invokeai.app.invocations.flux_controlnet import FluxControlNetField
from invokeai.app.invocations.ip_adapter import IPAdapterField
from invokeai.app.invocations.model import TransformerField, VAEField
from invokeai.app.invocations.primitives import LatentsOutput
from invokeai.app.services.shared.invocation_context import InvocationContext
@@ -25,7 +30,10 @@ from invokeai.backend.flux.controlnet.xlabs_controlnet_flux import XLabsControlN
from invokeai.backend.flux.denoise import denoise
from invokeai.backend.flux.extensions.inpaint_extension import InpaintExtension
from invokeai.backend.flux.extensions.instantx_controlnet_extension import InstantXControlNetExtension
from invokeai.backend.flux.extensions.regional_prompting_extension import RegionalPromptingExtension
from invokeai.backend.flux.extensions.xlabs_controlnet_extension import XLabsControlNetExtension
from invokeai.backend.flux.extensions.xlabs_ip_adapter_extension import XLabsIPAdapterExtension
from invokeai.backend.flux.ip_adapter.xlabs_ip_adapter_flux import XlabsIpAdapterFlux
from invokeai.backend.flux.model import Flux
from invokeai.backend.flux.sampling_utils import (
clip_timestep_schedule_fractional,
@@ -35,6 +43,7 @@ from invokeai.backend.flux.sampling_utils import (
pack,
unpack,
)
from invokeai.backend.flux.text_conditioning import FluxTextConditioning
from invokeai.backend.lora.conversions.flux_lora_constants import FLUX_LORA_TRANSFORMER_PREFIX
from invokeai.backend.lora.lora_model_raw import LoRAModelRaw
from invokeai.backend.lora.lora_patcher import LoRAPatcher
@@ -49,7 +58,7 @@ from invokeai.backend.util.devices import TorchDevice
title="FLUX Denoise",
tags=["image", "flux"],
category="image",
version="3.1.0",
version="3.2.2",
classification=Classification.Prototype,
)
class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
@@ -74,14 +83,33 @@ class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
description=FieldDescriptions.denoising_start,
)
denoising_end: float = InputField(default=1.0, ge=0, le=1, description=FieldDescriptions.denoising_end)
add_noise: bool = InputField(default=True, description="Add noise based on denoising start.")
transformer: TransformerField = InputField(
description=FieldDescriptions.flux_model,
input=Input.Connection,
title="Transformer",
)
positive_text_conditioning: FluxConditioningField = InputField(
positive_text_conditioning: FluxConditioningField | list[FluxConditioningField] = InputField(
description=FieldDescriptions.positive_cond, input=Input.Connection
)
negative_text_conditioning: FluxConditioningField | list[FluxConditioningField] | None = InputField(
default=None,
description="Negative conditioning tensor. Can be None if cfg_scale is 1.0.",
input=Input.Connection,
)
cfg_scale: float | list[float] = InputField(default=1.0, description=FieldDescriptions.cfg_scale, title="CFG Scale")
cfg_scale_start_step: int = InputField(
default=0,
title="CFG Scale Start Step",
description="Index of the first step to apply cfg_scale. Negative indices count backwards from the "
+ "the last step (e.g. a value of -1 refers to the final step).",
)
cfg_scale_end_step: int = InputField(
default=-1,
title="CFG Scale End Step",
description="Index of the last step to apply cfg_scale. Negative indices count backwards from the "
+ "last step (e.g. a value of -1 refers to the final step).",
)
width: int = InputField(default=1024, multiple_of=16, description="Width of the generated image.")
height: int = InputField(default=1024, multiple_of=16, description="Height of the generated image.")
num_steps: int = InputField(
@@ -96,10 +124,15 @@ class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
default=None, input=Input.Connection, description="ControlNet models."
)
controlnet_vae: VAEField | None = InputField(
default=None,
description=FieldDescriptions.vae,
input=Input.Connection,
)
ip_adapter: IPAdapterField | list[IPAdapterField] | None = InputField(
description=FieldDescriptions.ip_adapter, title="IP-Adapter", default=None, input=Input.Connection
)
@torch.no_grad()
def invoke(self, context: InvocationContext) -> LatentsOutput:
latents = self._run_diffusion(context)
@@ -114,15 +147,6 @@ class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
):
inference_dtype = torch.bfloat16
# Load the conditioning data.
cond_data = context.conditioning.load(self.positive_text_conditioning.conditioning_name)
assert len(cond_data.conditionings) == 1
flux_conditioning = cond_data.conditionings[0]
assert isinstance(flux_conditioning, FLUXConditioningInfo)
flux_conditioning = flux_conditioning.to(dtype=inference_dtype)
t5_embeddings = flux_conditioning.t5_embeds
clip_embeddings = flux_conditioning.clip_embeds
# Load the input latents, if provided.
init_latents = context.tensors.load(self.latents.latents_name) if self.latents else None
if init_latents is not None:
@@ -137,15 +161,45 @@ class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
dtype=inference_dtype,
seed=self.seed,
)
b, _c, latent_h, latent_w = noise.shape
packed_h = latent_h // 2
packed_w = latent_w // 2
# Load the conditioning data.
pos_text_conditionings = self._load_text_conditioning(
context=context,
cond_field=self.positive_text_conditioning,
packed_height=packed_h,
packed_width=packed_w,
dtype=inference_dtype,
device=TorchDevice.choose_torch_device(),
)
neg_text_conditionings: list[FluxTextConditioning] | None = None
if self.negative_text_conditioning is not None:
neg_text_conditionings = self._load_text_conditioning(
context=context,
cond_field=self.negative_text_conditioning,
packed_height=packed_h,
packed_width=packed_w,
dtype=inference_dtype,
device=TorchDevice.choose_torch_device(),
)
pos_regional_prompting_extension = RegionalPromptingExtension.from_text_conditioning(
pos_text_conditionings, img_seq_len=packed_h * packed_w
)
neg_regional_prompting_extension = (
RegionalPromptingExtension.from_text_conditioning(neg_text_conditionings, img_seq_len=packed_h * packed_w)
if neg_text_conditionings
else None
)
transformer_info = context.models.load(self.transformer.transformer)
is_schnell = "schnell" in transformer_info.config.config_path
# Calculate the timestep schedule.
image_seq_len = noise.shape[-1] * noise.shape[-2] // 4
timesteps = get_schedule(
num_steps=self.num_steps,
image_seq_len=image_seq_len,
image_seq_len=packed_h * packed_w,
shift=not is_schnell,
)
@@ -162,9 +216,12 @@ class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
"to be poor. Consider using a FLUX dev model instead."
)
# Noise the orig_latents by the appropriate amount for the first timestep.
t_0 = timesteps[0]
x = t_0 * noise + (1.0 - t_0) * init_latents
if self.add_noise:
# Noise the orig_latents by the appropriate amount for the first timestep.
t_0 = timesteps[0]
x = t_0 * noise + (1.0 - t_0) * init_latents
else:
x = init_latents
else:
# init_latents are not provided, so we are not doing image-to-image (i.e. we are starting from pure noise).
if self.denoising_start > 1e-5:
@@ -179,20 +236,17 @@ class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
inpaint_mask = self._prep_inpaint_mask(context, x)
b, _c, latent_h, latent_w = x.shape
img_ids = generate_img_ids(h=latent_h, w=latent_w, batch_size=b, device=x.device, dtype=x.dtype)
bs, t5_seq_len, _ = t5_embeddings.shape
txt_ids = torch.zeros(bs, t5_seq_len, 3, dtype=inference_dtype, device=TorchDevice.choose_torch_device())
# Pack all latent tensors.
init_latents = pack(init_latents) if init_latents is not None else None
inpaint_mask = pack(inpaint_mask) if inpaint_mask is not None else None
noise = pack(noise)
x = pack(x)
# Now that we have 'packed' the latent tensors, verify that we calculated the image_seq_len correctly.
assert image_seq_len == x.shape[1]
# Now that we have 'packed' the latent tensors, verify that we calculated the image_seq_len, packed_h, and
# packed_w correctly.
assert packed_h * packed_w == x.shape[1]
# Prepare inpaint extension.
inpaint_extension: InpaintExtension | None = None
@@ -204,6 +258,21 @@ class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
noise=noise,
)
# Compute the IP-Adapter image prompt clip embeddings.
# We do this before loading other models to minimize peak memory.
# TODO(ryand): We should really do this in a separate invocation to benefit from caching.
ip_adapter_fields = self._normalize_ip_adapter_fields()
pos_image_prompt_clip_embeds, neg_image_prompt_clip_embeds = self._prep_ip_adapter_image_prompt_clip_embeds(
ip_adapter_fields, context
)
cfg_scale = self.prep_cfg_scale(
cfg_scale=self.cfg_scale,
timesteps=timesteps,
cfg_scale_start_step=self.cfg_scale_start_step,
cfg_scale_end_step=self.cfg_scale_end_step,
)
with ExitStack() as exit_stack:
# Prepare ControlNet extensions.
# Note: We do this before loading the transformer model to minimize peak memory (see implementation).
@@ -227,10 +296,11 @@ class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
if config.format in [ModelFormat.Checkpoint]:
# The model is non-quantized, so we can apply the LoRA weights directly into the model.
exit_stack.enter_context(
LoRAPatcher.apply_lora_patches(
LoRAPatcher.apply_smart_lora_patches(
model=transformer,
patches=self._lora_iterator(context),
prefix=FLUX_LORA_TRANSFORMER_PREFIX,
dtype=inference_dtype,
cached_weights=cached_weights,
)
)
@@ -242,7 +312,7 @@ class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
# The model is quantized, so apply the LoRA weights as sidecar layers. This results in slower inference,
# than directly patching the weights, but is agnostic to the quantization format.
exit_stack.enter_context(
LoRAPatcher.apply_lora_sidecar_patches(
LoRAPatcher.apply_lora_wrapper_patches(
model=transformer,
patches=self._lora_iterator(context),
prefix=FLUX_LORA_TRANSFORMER_PREFIX,
@@ -252,23 +322,121 @@ class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
else:
raise ValueError(f"Unsupported model format: {config.format}")
# Prepare IP-Adapter extensions.
pos_ip_adapter_extensions, neg_ip_adapter_extensions = self._prep_ip_adapter_extensions(
pos_image_prompt_clip_embeds=pos_image_prompt_clip_embeds,
neg_image_prompt_clip_embeds=neg_image_prompt_clip_embeds,
ip_adapter_fields=ip_adapter_fields,
context=context,
exit_stack=exit_stack,
dtype=inference_dtype,
)
x = denoise(
model=transformer,
img=x,
img_ids=img_ids,
txt=t5_embeddings,
txt_ids=txt_ids,
vec=clip_embeddings,
pos_regional_prompting_extension=pos_regional_prompting_extension,
neg_regional_prompting_extension=neg_regional_prompting_extension,
timesteps=timesteps,
step_callback=self._build_step_callback(context),
guidance=self.guidance,
cfg_scale=cfg_scale,
inpaint_extension=inpaint_extension,
controlnet_extensions=controlnet_extensions,
pos_ip_adapter_extensions=pos_ip_adapter_extensions,
neg_ip_adapter_extensions=neg_ip_adapter_extensions,
)
x = unpack(x.float(), self.height, self.width)
return x
def _load_text_conditioning(
self,
context: InvocationContext,
cond_field: FluxConditioningField | list[FluxConditioningField],
packed_height: int,
packed_width: int,
dtype: torch.dtype,
device: torch.device,
) -> list[FluxTextConditioning]:
"""Load text conditioning data from a FluxConditioningField or a list of FluxConditioningFields."""
# Normalize to a list of FluxConditioningFields.
cond_list = [cond_field] if isinstance(cond_field, FluxConditioningField) else cond_field
text_conditionings: list[FluxTextConditioning] = []
for cond_field in cond_list:
# Load the text embeddings.
cond_data = context.conditioning.load(cond_field.conditioning_name)
assert len(cond_data.conditionings) == 1
flux_conditioning = cond_data.conditionings[0]
assert isinstance(flux_conditioning, FLUXConditioningInfo)
flux_conditioning = flux_conditioning.to(dtype=dtype, device=device)
t5_embeddings = flux_conditioning.t5_embeds
clip_embeddings = flux_conditioning.clip_embeds
# Load the mask, if provided.
mask: Optional[torch.Tensor] = None
if cond_field.mask is not None:
mask = context.tensors.load(cond_field.mask.tensor_name)
mask = mask.to(device=device)
mask = RegionalPromptingExtension.preprocess_regional_prompt_mask(
mask, packed_height, packed_width, dtype, device
)
text_conditionings.append(FluxTextConditioning(t5_embeddings, clip_embeddings, mask))
return text_conditionings
@classmethod
def prep_cfg_scale(
cls, cfg_scale: float | list[float], timesteps: list[float], cfg_scale_start_step: int, cfg_scale_end_step: int
) -> list[float]:
"""Prepare the cfg_scale schedule.
- Clips the cfg_scale schedule based on cfg_scale_start_step and cfg_scale_end_step.
- If cfg_scale is a list, then it is assumed to be a schedule and is returned as-is.
- If cfg_scale is a scalar, then a linear schedule is created from cfg_scale_start_step to cfg_scale_end_step.
"""
# num_steps is the number of denoising steps, which is one less than the number of timesteps.
num_steps = len(timesteps) - 1
# Normalize cfg_scale to a list if it is a scalar.
cfg_scale_list: list[float]
if isinstance(cfg_scale, float):
cfg_scale_list = [cfg_scale] * num_steps
elif isinstance(cfg_scale, list):
cfg_scale_list = cfg_scale
else:
raise ValueError(f"Unsupported cfg_scale type: {type(cfg_scale)}")
assert len(cfg_scale_list) == num_steps
# Handle negative indices for cfg_scale_start_step and cfg_scale_end_step.
start_step_index = cfg_scale_start_step
if start_step_index < 0:
start_step_index = num_steps + start_step_index
end_step_index = cfg_scale_end_step
if end_step_index < 0:
end_step_index = num_steps + end_step_index
# Validate the start and end step indices.
if not (0 <= start_step_index < num_steps):
raise ValueError(f"Invalid cfg_scale_start_step. Out of range: {cfg_scale_start_step}.")
if not (0 <= end_step_index < num_steps):
raise ValueError(f"Invalid cfg_scale_end_step. Out of range: {cfg_scale_end_step}.")
if start_step_index > end_step_index:
raise ValueError(
f"cfg_scale_start_step ({cfg_scale_start_step}) must be before cfg_scale_end_step "
+ f"({cfg_scale_end_step})."
)
# Set values outside the start and end step indices to 1.0. This is equivalent to disabling cfg_scale for those
# steps.
clipped_cfg_scale = [1.0] * num_steps
clipped_cfg_scale[start_step_index : end_step_index + 1] = cfg_scale_list[start_step_index : end_step_index + 1]
return clipped_cfg_scale
def _prep_inpaint_mask(self, context: InvocationContext, latents: torch.Tensor) -> torch.Tensor | None:
"""Prepare the inpaint mask.
@@ -408,6 +576,112 @@ class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
return controlnet_extensions
def _normalize_ip_adapter_fields(self) -> list[IPAdapterField]:
if self.ip_adapter is None:
return []
elif isinstance(self.ip_adapter, IPAdapterField):
return [self.ip_adapter]
elif isinstance(self.ip_adapter, list):
return self.ip_adapter
else:
raise ValueError(f"Unsupported IP-Adapter type: {type(self.ip_adapter)}")
def _prep_ip_adapter_image_prompt_clip_embeds(
self,
ip_adapter_fields: list[IPAdapterField],
context: InvocationContext,
) -> tuple[list[torch.Tensor], list[torch.Tensor]]:
"""Run the IPAdapter CLIPVisionModel, returning image prompt embeddings."""
clip_image_processor = CLIPImageProcessor()
pos_image_prompt_clip_embeds: list[torch.Tensor] = []
neg_image_prompt_clip_embeds: list[torch.Tensor] = []
for ip_adapter_field in ip_adapter_fields:
# `ip_adapter_field.image` could be a list or a single ImageField. Normalize to a list here.
ipa_image_fields: list[ImageField]
if isinstance(ip_adapter_field.image, ImageField):
ipa_image_fields = [ip_adapter_field.image]
elif isinstance(ip_adapter_field.image, list):
ipa_image_fields = ip_adapter_field.image
else:
raise ValueError(f"Unsupported IP-Adapter image type: {type(ip_adapter_field.image)}")
if len(ipa_image_fields) != 1:
raise ValueError(
f"FLUX IP-Adapter only supports a single image prompt (received {len(ipa_image_fields)})."
)
ipa_images = [context.images.get_pil(image.image_name, mode="RGB") for image in ipa_image_fields]
pos_images: list[npt.NDArray[np.uint8]] = []
neg_images: list[npt.NDArray[np.uint8]] = []
for ipa_image in ipa_images:
assert ipa_image.mode == "RGB"
pos_image = np.array(ipa_image)
# We use a black image as the negative image prompt for parity with
# https://github.com/XLabs-AI/x-flux-comfyui/blob/45c834727dd2141aebc505ae4b01f193a8414e38/nodes.py#L592-L593
# An alternative scheme would be to apply zeros_like() after calling the clip_image_processor.
neg_image = np.zeros_like(pos_image)
pos_images.append(pos_image)
neg_images.append(neg_image)
with context.models.load(ip_adapter_field.image_encoder_model) as image_encoder_model:
assert isinstance(image_encoder_model, CLIPVisionModelWithProjection)
clip_image: torch.Tensor = clip_image_processor(images=pos_images, return_tensors="pt").pixel_values
clip_image = clip_image.to(device=image_encoder_model.device, dtype=image_encoder_model.dtype)
pos_clip_image_embeds = image_encoder_model(clip_image).image_embeds
clip_image = clip_image_processor(images=neg_images, return_tensors="pt").pixel_values
clip_image = clip_image.to(device=image_encoder_model.device, dtype=image_encoder_model.dtype)
neg_clip_image_embeds = image_encoder_model(clip_image).image_embeds
pos_image_prompt_clip_embeds.append(pos_clip_image_embeds)
neg_image_prompt_clip_embeds.append(neg_clip_image_embeds)
return pos_image_prompt_clip_embeds, neg_image_prompt_clip_embeds
def _prep_ip_adapter_extensions(
self,
ip_adapter_fields: list[IPAdapterField],
pos_image_prompt_clip_embeds: list[torch.Tensor],
neg_image_prompt_clip_embeds: list[torch.Tensor],
context: InvocationContext,
exit_stack: ExitStack,
dtype: torch.dtype,
) -> tuple[list[XLabsIPAdapterExtension], list[XLabsIPAdapterExtension]]:
pos_ip_adapter_extensions: list[XLabsIPAdapterExtension] = []
neg_ip_adapter_extensions: list[XLabsIPAdapterExtension] = []
for ip_adapter_field, pos_image_prompt_clip_embed, neg_image_prompt_clip_embed in zip(
ip_adapter_fields, pos_image_prompt_clip_embeds, neg_image_prompt_clip_embeds, strict=True
):
ip_adapter_model = exit_stack.enter_context(context.models.load(ip_adapter_field.ip_adapter_model))
assert isinstance(ip_adapter_model, XlabsIpAdapterFlux)
ip_adapter_model = ip_adapter_model.to(dtype=dtype)
if ip_adapter_field.mask is not None:
raise ValueError("IP-Adapter masks are not yet supported in Flux.")
ip_adapter_extension = XLabsIPAdapterExtension(
model=ip_adapter_model,
image_prompt_clip_embed=pos_image_prompt_clip_embed,
weight=ip_adapter_field.weight,
begin_step_percent=ip_adapter_field.begin_step_percent,
end_step_percent=ip_adapter_field.end_step_percent,
)
ip_adapter_extension.run_image_proj(dtype=dtype)
pos_ip_adapter_extensions.append(ip_adapter_extension)
ip_adapter_extension = XLabsIPAdapterExtension(
model=ip_adapter_model,
image_prompt_clip_embed=neg_image_prompt_clip_embed,
weight=ip_adapter_field.weight,
begin_step_percent=ip_adapter_field.begin_step_percent,
end_step_percent=ip_adapter_field.end_step_percent,
)
ip_adapter_extension.run_image_proj(dtype=dtype)
neg_ip_adapter_extensions.append(ip_adapter_extension)
return pos_ip_adapter_extensions, neg_ip_adapter_extensions
def _lora_iterator(self, context: InvocationContext) -> Iterator[Tuple[LoRAModelRaw, float]]:
for lora in self.transformer.loras:
lora_info = context.models.load(lora.lora)
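The new `prep_cfg_scale(...)` above turns a scalar or per-step `cfg_scale` into a clipped schedule, honouring negative start/end indices. A standalone sketch of that behaviour on a concrete schedule; it mirrors the logic in the diff (with the two range checks folded into one) and is reproduced here only to show the resulting schedule.
```py
def prep_cfg_scale(cfg_scale, timesteps, start_step, end_step):
    num_steps = len(timesteps) - 1  # one denoising step per timestep interval
    cfg_list = [cfg_scale] * num_steps if isinstance(cfg_scale, float) else list(cfg_scale)
    assert len(cfg_list) == num_steps
    # Negative indices count back from the last step, like normal Python indexing.
    start = start_step if start_step >= 0 else num_steps + start_step
    end = end_step if end_step >= 0 else num_steps + end_step
    if not (0 <= start <= end < num_steps):
        raise ValueError("cfg_scale_start_step/cfg_scale_end_step out of range")
    # Outside [start, end], cfg_scale is forced to 1.0, i.e. CFG is disabled for those steps.
    clipped = [1.0] * num_steps
    clipped[start : end + 1] = cfg_list[start : end + 1]
    return clipped


# 6 denoising steps (7 timesteps), cfg_scale=3.5 applied from step 1 through the last step:
timesteps = [1.0, 0.83, 0.66, 0.5, 0.33, 0.16, 0.0]
print(prep_cfg_scale(3.5, timesteps, start_step=1, end_step=-1))
# [1.0, 3.5, 3.5, 3.5, 3.5, 3.5]
```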

View File

@@ -0,0 +1,89 @@
from builtins import float
from typing import List, Literal, Union
from pydantic import field_validator, model_validator
from typing_extensions import Self
from invokeai.app.invocations.baseinvocation import BaseInvocation, Classification, invocation
from invokeai.app.invocations.fields import InputField, UIType
from invokeai.app.invocations.ip_adapter import (
CLIP_VISION_MODEL_MAP,
IPAdapterField,
IPAdapterInvocation,
IPAdapterOutput,
)
from invokeai.app.invocations.model import ModelIdentifierField
from invokeai.app.invocations.primitives import ImageField
from invokeai.app.invocations.util import validate_begin_end_step, validate_weights
from invokeai.app.services.shared.invocation_context import InvocationContext
from invokeai.backend.model_manager.config import (
IPAdapterCheckpointConfig,
IPAdapterInvokeAIConfig,
)
@invocation(
"flux_ip_adapter",
title="FLUX IP-Adapter",
tags=["ip_adapter", "control"],
category="ip_adapter",
version="1.0.0",
classification=Classification.Prototype,
)
class FluxIPAdapterInvocation(BaseInvocation):
"""Collects FLUX IP-Adapter info to pass to other nodes."""
# FluxIPAdapterInvocation is based closely on IPAdapterInvocation, but with some unsupported features removed.
image: ImageField = InputField(description="The IP-Adapter image prompt(s).")
ip_adapter_model: ModelIdentifierField = InputField(
description="The IP-Adapter model.", title="IP-Adapter Model", ui_type=UIType.IPAdapterModel
)
# Currently, the only known ViT model used by FLUX IP-Adapters is ViT-L.
clip_vision_model: Literal["ViT-L"] = InputField(description="CLIP Vision model to use.", default="ViT-L")
weight: Union[float, List[float]] = InputField(
default=1, description="The weight given to the IP-Adapter", title="Weight"
)
begin_step_percent: float = InputField(
default=0, ge=0, le=1, description="When the IP-Adapter is first applied (% of total steps)"
)
end_step_percent: float = InputField(
default=1, ge=0, le=1, description="When the IP-Adapter is last applied (% of total steps)"
)
@field_validator("weight")
@classmethod
def validate_ip_adapter_weight(cls, v: float) -> float:
validate_weights(v)
return v
@model_validator(mode="after")
def validate_begin_end_step_percent(self) -> Self:
validate_begin_end_step(self.begin_step_percent, self.end_step_percent)
return self
def invoke(self, context: InvocationContext) -> IPAdapterOutput:
# Lookup the CLIP Vision encoder that is intended to be used with the IP-Adapter model.
ip_adapter_info = context.models.get_config(self.ip_adapter_model.key)
assert isinstance(ip_adapter_info, (IPAdapterInvokeAIConfig, IPAdapterCheckpointConfig))
# Note: There is an IPAdapterInvokeAIConfig.image_encoder_model_id field, but it isn't trustworthy.
image_encoder_starter_model = CLIP_VISION_MODEL_MAP[self.clip_vision_model]
image_encoder_model_id = image_encoder_starter_model.source
image_encoder_model_name = image_encoder_starter_model.name
image_encoder_model = IPAdapterInvocation.get_clip_image_encoder(
context, image_encoder_model_id, image_encoder_model_name
)
return IPAdapterOutput(
ip_adapter=IPAdapterField(
image=self.image,
ip_adapter_model=self.ip_adapter_model,
image_encoder_model=ModelIdentifierField.from_config(image_encoder_model),
weight=self.weight,
target_blocks=[], # target_blocks is currently unused for FLUX IP-Adapters.
begin_step_percent=self.begin_step_percent,
end_step_percent=self.end_step_percent,
mask=None, # mask is currently unused for FLUX IP-Adapters.
),
)

View File

@@ -0,0 +1,89 @@
from typing import Literal
from invokeai.app.invocations.baseinvocation import (
BaseInvocation,
BaseInvocationOutput,
Classification,
invocation,
invocation_output,
)
from invokeai.app.invocations.fields import FieldDescriptions, Input, InputField, OutputField, UIType
from invokeai.app.invocations.model import CLIPField, ModelIdentifierField, T5EncoderField, TransformerField, VAEField
from invokeai.app.services.shared.invocation_context import InvocationContext
from invokeai.backend.flux.util import max_seq_lengths
from invokeai.backend.model_manager.config import (
CheckpointConfigBase,
SubModelType,
)
@invocation_output("flux_model_loader_output")
class FluxModelLoaderOutput(BaseInvocationOutput):
"""Flux base model loader output"""
transformer: TransformerField = OutputField(description=FieldDescriptions.transformer, title="Transformer")
clip: CLIPField = OutputField(description=FieldDescriptions.clip, title="CLIP")
t5_encoder: T5EncoderField = OutputField(description=FieldDescriptions.t5_encoder, title="T5 Encoder")
vae: VAEField = OutputField(description=FieldDescriptions.vae, title="VAE")
max_seq_len: Literal[256, 512] = OutputField(
description="The max sequence length to used for the T5 encoder. (256 for schnell transformer, 512 for dev transformer)",
title="Max Seq Length",
)
@invocation(
"flux_model_loader",
title="Flux Main Model",
tags=["model", "flux"],
category="model",
version="1.0.4",
classification=Classification.Prototype,
)
class FluxModelLoaderInvocation(BaseInvocation):
"""Loads a flux base model, outputting its submodels."""
model: ModelIdentifierField = InputField(
description=FieldDescriptions.flux_model,
ui_type=UIType.FluxMainModel,
input=Input.Direct,
)
t5_encoder_model: ModelIdentifierField = InputField(
description=FieldDescriptions.t5_encoder, ui_type=UIType.T5EncoderModel, input=Input.Direct, title="T5 Encoder"
)
clip_embed_model: ModelIdentifierField = InputField(
description=FieldDescriptions.clip_embed_model,
ui_type=UIType.CLIPEmbedModel,
input=Input.Direct,
title="CLIP Embed",
)
vae_model: ModelIdentifierField = InputField(
description=FieldDescriptions.vae_model, ui_type=UIType.FluxVAEModel, title="VAE"
)
def invoke(self, context: InvocationContext) -> FluxModelLoaderOutput:
for key in [self.model.key, self.t5_encoder_model.key, self.clip_embed_model.key, self.vae_model.key]:
if not context.models.exists(key):
raise ValueError(f"Unknown model: {key}")
transformer = self.model.model_copy(update={"submodel_type": SubModelType.Transformer})
vae = self.vae_model.model_copy(update={"submodel_type": SubModelType.VAE})
tokenizer = self.clip_embed_model.model_copy(update={"submodel_type": SubModelType.Tokenizer})
clip_encoder = self.clip_embed_model.model_copy(update={"submodel_type": SubModelType.TextEncoder})
tokenizer2 = self.t5_encoder_model.model_copy(update={"submodel_type": SubModelType.Tokenizer2})
t5_encoder = self.t5_encoder_model.model_copy(update={"submodel_type": SubModelType.TextEncoder2})
transformer_config = context.models.get_config(transformer)
assert isinstance(transformer_config, CheckpointConfigBase)
return FluxModelLoaderOutput(
transformer=TransformerField(transformer=transformer, loras=[]),
clip=CLIPField(tokenizer=tokenizer, text_encoder=clip_encoder, loras=[], skipped_layers=0),
t5_encoder=T5EncoderField(tokenizer=tokenizer2, text_encoder=t5_encoder),
vae=VAEField(vae=vae),
max_seq_len=max_seq_lengths[transformer_config.config_path],
)

View File

@@ -1,11 +1,18 @@
from contextlib import ExitStack
from typing import Iterator, Literal, Tuple
from typing import Iterator, Literal, Optional, Tuple
import torch
from transformers import CLIPTextModel, CLIPTokenizer, T5EncoderModel, T5Tokenizer
from invokeai.app.invocations.baseinvocation import BaseInvocation, Classification, invocation
from invokeai.app.invocations.fields import FieldDescriptions, Input, InputField
from invokeai.app.invocations.fields import (
FieldDescriptions,
FluxConditioningField,
Input,
InputField,
TensorField,
UIComponent,
)
from invokeai.app.invocations.model import CLIPField, T5EncoderField
from invokeai.app.invocations.primitives import FluxConditioningOutput
from invokeai.app.services.shared.invocation_context import InvocationContext
@@ -15,6 +22,7 @@ from invokeai.backend.lora.lora_model_raw import LoRAModelRaw
from invokeai.backend.lora.lora_patcher import LoRAPatcher
from invokeai.backend.model_manager.config import ModelFormat
from invokeai.backend.stable_diffusion.diffusion.conditioning_data import ConditioningFieldData, FLUXConditioningInfo
from invokeai.backend.util.devices import TorchDevice
@invocation(
@@ -22,7 +30,7 @@ from invokeai.backend.stable_diffusion.diffusion.conditioning_data import Condit
title="FLUX Text Encoding",
tags=["prompt", "conditioning", "flux"],
category="conditioning",
version="1.1.0",
version="1.1.1",
classification=Classification.Prototype,
)
class FluxTextEncoderInvocation(BaseInvocation):
@@ -41,7 +49,10 @@ class FluxTextEncoderInvocation(BaseInvocation):
t5_max_seq_len: Literal[256, 512] = InputField(
description="Max sequence length for the T5 encoder. Expected to be 256 for FLUX schnell models and 512 for FLUX dev models."
)
prompt: str = InputField(description="Text prompt to encode.")
prompt: str = InputField(description="Text prompt to encode.", ui_component=UIComponent.Textarea)
mask: Optional[TensorField] = InputField(
default=None, description="A mask defining the region that this conditioning prompt applies to."
)
@torch.no_grad()
def invoke(self, context: InvocationContext) -> FluxConditioningOutput:
@@ -54,7 +65,9 @@ class FluxTextEncoderInvocation(BaseInvocation):
)
conditioning_name = context.conditioning.save(conditioning_data)
return FluxConditioningOutput.build(conditioning_name)
return FluxConditioningOutput(
conditioning=FluxConditioningField(conditioning_name=conditioning_name, mask=self.mask)
)
def _t5_encode(self, context: InvocationContext) -> torch.Tensor:
t5_tokenizer_info = context.models.load(self.t5_encoder.tokenizer)
@@ -71,6 +84,7 @@ class FluxTextEncoderInvocation(BaseInvocation):
t5_encoder = HFEncoder(t5_text_encoder, t5_tokenizer, False, self.t5_max_seq_len)
context.util.signal_progress("Running T5 encoder")
prompt_embeds = t5_encoder(prompt)
assert isinstance(prompt_embeds, torch.Tensor)
@@ -98,10 +112,11 @@ class FluxTextEncoderInvocation(BaseInvocation):
if clip_text_encoder_config.format in [ModelFormat.Diffusers]:
# The model is non-quantized, so we can apply the LoRA weights directly into the model.
exit_stack.enter_context(
LoRAPatcher.apply_lora_patches(
LoRAPatcher.apply_smart_lora_patches(
model=clip_text_encoder,
patches=self._clip_lora_iterator(context),
prefix=FLUX_LORA_CLIP_PREFIX,
dtype=TorchDevice.choose_torch_dtype(),
cached_weights=cached_weights,
)
)
@@ -111,6 +126,7 @@ class FluxTextEncoderInvocation(BaseInvocation):
clip_encoder = HFEncoder(clip_text_encoder, clip_tokenizer, True, 77)
context.util.signal_progress("Running CLIP encoder")
pooled_prompt_embeds = clip_encoder(prompt)
assert isinstance(pooled_prompt_embeds, torch.Tensor)

View File

@@ -41,7 +41,8 @@ class FluxVaeDecodeInvocation(BaseInvocation, WithMetadata, WithBoard):
def _vae_decode(self, vae_info: LoadedModel, latents: torch.Tensor) -> Image.Image:
with vae_info as vae:
assert isinstance(vae, AutoEncoder)
latents = latents.to(device=TorchDevice.choose_torch_device(), dtype=TorchDevice.choose_torch_dtype())
vae_dtype = next(iter(vae.parameters())).dtype
latents = latents.to(device=TorchDevice.choose_torch_device(), dtype=vae_dtype)
img = vae.decode(latents)
img = img.clamp(-1, 1)
@@ -53,6 +54,7 @@ class FluxVaeDecodeInvocation(BaseInvocation, WithMetadata, WithBoard):
def invoke(self, context: InvocationContext) -> ImageOutput:
latents = context.tensors.load(self.latents.latents_name)
vae_info = context.models.load(self.vae.vae)
context.util.signal_progress("Running VAE")
image = self._vae_decode(vae_info=vae_info, latents=latents)
TorchDevice.empty_cache()

View File

@@ -44,9 +44,8 @@ class FluxVaeEncodeInvocation(BaseInvocation):
generator = torch.Generator(device=TorchDevice.choose_torch_device()).manual_seed(0)
with vae_info as vae:
assert isinstance(vae, AutoEncoder)
image_tensor = image_tensor.to(
device=TorchDevice.choose_torch_device(), dtype=TorchDevice.choose_torch_dtype()
)
vae_dtype = next(iter(vae.parameters())).dtype
image_tensor = image_tensor.to(device=TorchDevice.choose_torch_device(), dtype=vae_dtype)
latents = vae.encode(image_tensor, sample=True, generator=generator)
return latents
@@ -60,6 +59,7 @@ class FluxVaeEncodeInvocation(BaseInvocation):
if image_tensor.dim() == 3:
image_tensor = einops.rearrange(image_tensor, "c h w -> 1 c h w")
context.util.signal_progress("Running VAE")
latents = self.vae_encode(vae_info=vae_info, image_tensor=image_tensor)
latents = latents.to("cpu")

View File

@@ -0,0 +1,59 @@
from pydantic import ValidationInfo, field_validator
from invokeai.app.invocations.baseinvocation import (
BaseInvocation,
BaseInvocationOutput,
Classification,
invocation,
invocation_output,
)
from invokeai.app.invocations.fields import InputField, OutputField
from invokeai.app.services.shared.invocation_context import InvocationContext
@invocation_output("image_panel_coordinate_output")
class ImagePanelCoordinateOutput(BaseInvocationOutput):
x_left: int = OutputField(description="The left x-coordinate of the panel.")
y_top: int = OutputField(description="The top y-coordinate of the panel.")
width: int = OutputField(description="The width of the panel.")
height: int = OutputField(description="The height of the panel.")
@invocation(
"image_panel_layout",
title="Image Panel Layout",
tags=["image", "panel", "layout"],
category="image",
version="1.0.0",
classification=Classification.Prototype,
)
class ImagePanelLayoutInvocation(BaseInvocation):
"""Get the coordinates of a single panel in a grid. (If the full image shape cannot be divided evenly into panels,
then the grid may not cover the entire image.)
"""
width: int = InputField(description="The width of the entire grid.")
height: int = InputField(description="The height of the entire grid.")
num_cols: int = InputField(ge=1, default=1, description="The number of columns in the grid.")
num_rows: int = InputField(ge=1, default=1, description="The number of rows in the grid.")
panel_col_idx: int = InputField(ge=0, default=0, description="The column index of the panel to be processed.")
panel_row_idx: int = InputField(ge=0, default=0, description="The row index of the panel to be processed.")
@field_validator("panel_col_idx")
def validate_panel_col_idx(cls, v: int, info: ValidationInfo) -> int:
if v < 0 or v >= info.data["num_cols"]:
raise ValueError(f"panel_col_idx must be between 0 and {info.data['num_cols'] - 1}")
return v
@field_validator("panel_row_idx")
def validate_panel_row_idx(cls, v: int, info: ValidationInfo) -> int:
if v < 0 or v >= info.data["num_rows"]:
raise ValueError(f"panel_row_idx must be between 0 and {info.data['num_rows'] - 1}")
return v
def invoke(self, context: InvocationContext) -> ImagePanelCoordinateOutput:
x_left = self.panel_col_idx * (self.width // self.num_cols)
y_top = self.panel_row_idx * (self.height // self.num_rows)
width = self.width // self.num_cols
height = self.height // self.num_rows
return ImagePanelCoordinateOutput(x_left=x_left, y_top=y_top, width=width, height=height)
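As a worked example of the integer arithmetic above (illustrative values, not from the diff): a 1000x800 grid with num_cols=3 and num_rows=2 yields 333x400 panels, so panel (col=2, row=1) gets x_left=666 and y_top=400, and 1 px of width is left uncovered, which is the caveat noted in the docstring.

# Illustrative check of the panel arithmetic (hypothetical values).
width, height, num_cols, num_rows = 1000, 800, 3, 2
panel_w, panel_h = width // num_cols, height // num_rows  # 333, 400
x_left = 2 * panel_w  # 666 for panel_col_idx=2
y_top = 1 * panel_h   # 400 for panel_row_idx=1
assert num_cols * panel_w == 999  # 1 px of the 1000 px width is not covered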

View File

@@ -117,6 +117,7 @@ class ImageToLatentsInvocation(BaseInvocation):
if image_tensor.dim() == 3:
image_tensor = einops.rearrange(image_tensor, "c h w -> 1 c h w")
context.util.signal_progress("Running VAE encoder")
latents = self.vae_encode(
vae_info=vae_info, upcast=self.fp32, tiled=self.tiled, image_tensor=image_tensor, tile_size=self.tile_size
)

View File

@@ -9,6 +9,7 @@ from invokeai.app.invocations.fields import FieldDescriptions, InputField, Outpu
from invokeai.app.invocations.model import ModelIdentifierField
from invokeai.app.invocations.primitives import ImageField
from invokeai.app.invocations.util import validate_begin_end_step, validate_weights
from invokeai.app.services.model_records.model_records_base import ModelRecordChanges
from invokeai.app.services.shared.invocation_context import InvocationContext
from invokeai.backend.model_manager.config import (
AnyModelConfig,
@@ -17,6 +18,12 @@ from invokeai.backend.model_manager.config import (
IPAdapterInvokeAIConfig,
ModelType,
)
from invokeai.backend.model_manager.starter_models import (
StarterModel,
clip_vit_l_image_encoder,
ip_adapter_sd_image_encoder,
ip_adapter_sdxl_image_encoder,
)
class IPAdapterField(BaseModel):
@@ -55,10 +62,14 @@ class IPAdapterOutput(BaseInvocationOutput):
ip_adapter: IPAdapterField = OutputField(description=FieldDescriptions.ip_adapter, title="IP-Adapter")
CLIP_VISION_MODEL_MAP = {"ViT-H": "ip_adapter_sd_image_encoder", "ViT-G": "ip_adapter_sdxl_image_encoder"}
CLIP_VISION_MODEL_MAP: dict[Literal["ViT-L", "ViT-H", "ViT-G"], StarterModel] = {
"ViT-L": clip_vit_l_image_encoder,
"ViT-H": ip_adapter_sd_image_encoder,
"ViT-G": ip_adapter_sdxl_image_encoder,
}
@invocation("ip_adapter", title="IP-Adapter", tags=["ip_adapter", "control"], category="ip_adapter", version="1.4.1")
@invocation("ip_adapter", title="IP-Adapter", tags=["ip_adapter", "control"], category="ip_adapter", version="1.5.0")
class IPAdapterInvocation(BaseInvocation):
"""Collects IP-Adapter info to pass to other nodes."""
@@ -70,7 +81,7 @@ class IPAdapterInvocation(BaseInvocation):
ui_order=-1,
ui_type=UIType.IPAdapterModel,
)
clip_vision_model: Literal["ViT-H", "ViT-G"] = InputField(
clip_vision_model: Literal["ViT-H", "ViT-G", "ViT-L"] = InputField(
description="CLIP Vision model to use. Overrides model settings. Mandatory for checkpoint models.",
default="ViT-H",
ui_order=2,
@@ -111,9 +122,11 @@ class IPAdapterInvocation(BaseInvocation):
image_encoder_model_id = ip_adapter_info.image_encoder_model_id
image_encoder_model_name = image_encoder_model_id.split("/")[-1].strip()
else:
image_encoder_model_name = CLIP_VISION_MODEL_MAP[self.clip_vision_model]
image_encoder_starter_model = CLIP_VISION_MODEL_MAP[self.clip_vision_model]
image_encoder_model_id = image_encoder_starter_model.source
image_encoder_model_name = image_encoder_starter_model.name
image_encoder_model = self._get_image_encoder(context, image_encoder_model_name)
image_encoder_model = self.get_clip_image_encoder(context, image_encoder_model_id, image_encoder_model_name)
if self.method == "style":
if ip_adapter_info.base == "sd-1":
@@ -147,7 +160,10 @@ class IPAdapterInvocation(BaseInvocation):
),
)
def _get_image_encoder(self, context: InvocationContext, image_encoder_model_name: str) -> AnyModelConfig:
@classmethod
def get_clip_image_encoder(
cls, context: InvocationContext, image_encoder_model_id: str, image_encoder_model_name: str
) -> AnyModelConfig:
image_encoder_models = context.models.search_by_attrs(
name=image_encoder_model_name, base=BaseModelType.Any, type=ModelType.CLIPVision
)
@@ -159,7 +175,11 @@ class IPAdapterInvocation(BaseInvocation):
)
installer = context._services.model_manager.install
job = installer.heuristic_import(f"InvokeAI/{image_encoder_model_name}")
# Note: We hard-code the type to CLIPVision here because if the model contains both a CLIPVision and a
# CLIPText model, the probe may treat it as a CLIPText model.
job = installer.heuristic_import(
image_encoder_model_id, ModelRecordChanges(name=image_encoder_model_name, type=ModelType.CLIPVision)
)
installer.wait_for_job(job, timeout=600) # Wait for up to 10 minutes
image_encoder_models = context.models.search_by_attrs(
name=image_encoder_model_name, base=BaseModelType.Any, type=ModelType.CLIPVision

View File

@@ -60,6 +60,7 @@ class LatentsToImageInvocation(BaseInvocation, WithMetadata, WithBoard):
vae_info = context.models.load(self.vae.vae)
assert isinstance(vae_info.model, (AutoencoderKL, AutoencoderTiny))
with SeamlessExt.static_patch_model(vae_info.model, self.vae.seamless_axes), vae_info as vae:
context.util.signal_progress("Running VAE decoder")
assert isinstance(vae, (AutoencoderKL, AutoencoderTiny))
latents = latents.to(vae.device)
if self.fp32:

View File

@@ -5,6 +5,7 @@ from PIL import Image
from invokeai.app.invocations.baseinvocation import BaseInvocation, Classification, InvocationContext, invocation
from invokeai.app.invocations.fields import ImageField, InputField, TensorField, WithBoard, WithMetadata
from invokeai.app.invocations.primitives import ImageOutput, MaskOutput
from invokeai.backend.image_util.util import pil_to_np
@invocation(
@@ -148,3 +149,55 @@ class MaskTensorToImageInvocation(BaseInvocation, WithMetadata, WithBoard):
mask_pil = Image.fromarray(mask_np, mode="L")
image_dto = context.images.save(image=mask_pil)
return ImageOutput.build(image_dto)
@invocation(
"apply_tensor_mask_to_image",
title="Apply Tensor Mask to Image",
tags=["mask"],
category="mask",
version="1.0.0",
)
class ApplyMaskTensorToImageInvocation(BaseInvocation, WithMetadata, WithBoard):
"""Applies a tensor mask to an image.
The image is converted to RGBA and the mask is applied to the alpha channel."""
mask: TensorField = InputField(description="The mask tensor to apply.")
image: ImageField = InputField(description="The image to apply the mask to.")
invert: bool = InputField(default=False, description="Whether to invert the mask.")
def invoke(self, context: InvocationContext) -> ImageOutput:
image = context.images.get_pil(self.image.image_name, mode="RGBA")
mask = context.tensors.load(self.mask.tensor_name)
# Squeeze the channel dimension if it exists.
if mask.dim() == 3:
mask = mask.squeeze(0)
# Ensure that the mask is binary.
if mask.dtype != torch.bool:
mask = mask > 0.5
mask_np = (mask.float() * 255).byte().cpu().numpy().astype(np.uint8)
if self.invert:
mask_np = 255 - mask_np
# Apply the mask only to the alpha channel where the original alpha is non-zero. This preserves the original
# image's transparency; otherwise the transparent regions would end up as opaque black.
# Separate the image into R, G, B, and A channels
image_np = pil_to_np(image)
r, g, b, a = np.split(image_np, 4, axis=-1)
# Apply the mask to the alpha channel
new_alpha = np.where(a.squeeze() > 0, mask_np, a.squeeze())
# Stack the RGB channels with the modified alpha
masked_image_np = np.dstack([r.squeeze(), g.squeeze(), b.squeeze(), new_alpha])
# Convert back to an image (RGBA)
masked_image = Image.fromarray(masked_image_np.astype(np.uint8), "RGBA")
image_dto = context.images.save(image=masked_image)
return ImageOutput.build(image_dto)
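A small numpy sketch of the alpha-merge rule used above, with made-up 2x2 values, showing that fully transparent pixels keep alpha 0 while opaque pixels take the mask value:

import numpy as np

a = np.array([[0, 255], [128, 0]], dtype=np.uint8)          # original alpha channel
mask_np = np.array([[255, 0], [255, 255]], dtype=np.uint8)  # binary mask scaled to 0/255
new_alpha = np.where(a > 0, mask_np, a)
# new_alpha == [[0, 0], [255, 0]]: transparent pixels stay transparent,
# opaque pixels adopt the mask value.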

View File

@@ -40,7 +40,7 @@ class IPAdapterMetadataField(BaseModel):
image: ImageField = Field(description="The IP-Adapter image prompt.")
ip_adapter_model: ModelIdentifierField = Field(description="The IP-Adapter model.")
clip_vision_model: Literal["ViT-H", "ViT-G"] = Field(description="The CLIP Vision model")
clip_vision_model: Literal["ViT-L", "ViT-H", "ViT-G"] = Field(description="The CLIP Vision model")
method: Literal["full", "style", "composition"] = Field(description="Method to apply IP Weights with")
weight: Union[float, list[float]] = Field(description="The weight given to the IP-Adapter")
begin_step_percent: float = Field(description="When the IP-Adapter is first applied (% of total steps)")
@@ -147,6 +147,10 @@ GENERATION_MODES = Literal[
"flux_img2img",
"flux_inpaint",
"flux_outpaint",
"sd3_txt2img",
"sd3_img2img",
"sd3_inpaint",
"sd3_outpaint",
]

View File

@@ -1,5 +1,5 @@
import copy
from typing import List, Literal, Optional
from typing import List, Optional
from pydantic import BaseModel, Field
@@ -13,11 +13,9 @@ from invokeai.app.invocations.baseinvocation import (
from invokeai.app.invocations.fields import FieldDescriptions, Input, InputField, OutputField, UIType
from invokeai.app.services.shared.invocation_context import InvocationContext
from invokeai.app.shared.models import FreeUConfig
from invokeai.backend.flux.util import max_seq_lengths
from invokeai.backend.model_manager.config import (
AnyModelConfig,
BaseModelType,
CheckpointConfigBase,
ModelType,
SubModelType,
)
@@ -139,78 +137,6 @@ class ModelIdentifierInvocation(BaseInvocation):
return ModelIdentifierOutput(model=self.model)
@invocation_output("flux_model_loader_output")
class FluxModelLoaderOutput(BaseInvocationOutput):
"""Flux base model loader output"""
transformer: TransformerField = OutputField(description=FieldDescriptions.transformer, title="Transformer")
clip: CLIPField = OutputField(description=FieldDescriptions.clip, title="CLIP")
t5_encoder: T5EncoderField = OutputField(description=FieldDescriptions.t5_encoder, title="T5 Encoder")
vae: VAEField = OutputField(description=FieldDescriptions.vae, title="VAE")
max_seq_len: Literal[256, 512] = OutputField(
description="The max sequence length to used for the T5 encoder. (256 for schnell transformer, 512 for dev transformer)",
title="Max Seq Length",
)
@invocation(
"flux_model_loader",
title="Flux Main Model",
tags=["model", "flux"],
category="model",
version="1.0.4",
classification=Classification.Prototype,
)
class FluxModelLoaderInvocation(BaseInvocation):
"""Loads a flux base model, outputting its submodels."""
model: ModelIdentifierField = InputField(
description=FieldDescriptions.flux_model,
ui_type=UIType.FluxMainModel,
input=Input.Direct,
)
t5_encoder_model: ModelIdentifierField = InputField(
description=FieldDescriptions.t5_encoder, ui_type=UIType.T5EncoderModel, input=Input.Direct, title="T5 Encoder"
)
clip_embed_model: ModelIdentifierField = InputField(
description=FieldDescriptions.clip_embed_model,
ui_type=UIType.CLIPEmbedModel,
input=Input.Direct,
title="CLIP Embed",
)
vae_model: ModelIdentifierField = InputField(
description=FieldDescriptions.vae_model, ui_type=UIType.FluxVAEModel, title="VAE"
)
def invoke(self, context: InvocationContext) -> FluxModelLoaderOutput:
for key in [self.model.key, self.t5_encoder_model.key, self.clip_embed_model.key, self.vae_model.key]:
if not context.models.exists(key):
raise ValueError(f"Unknown model: {key}")
transformer = self.model.model_copy(update={"submodel_type": SubModelType.Transformer})
vae = self.vae_model.model_copy(update={"submodel_type": SubModelType.VAE})
tokenizer = self.clip_embed_model.model_copy(update={"submodel_type": SubModelType.Tokenizer})
clip_encoder = self.clip_embed_model.model_copy(update={"submodel_type": SubModelType.TextEncoder})
tokenizer2 = self.t5_encoder_model.model_copy(update={"submodel_type": SubModelType.Tokenizer2})
t5_encoder = self.t5_encoder_model.model_copy(update={"submodel_type": SubModelType.TextEncoder2})
transformer_config = context.models.get_config(transformer)
assert isinstance(transformer_config, CheckpointConfigBase)
return FluxModelLoaderOutput(
transformer=TransformerField(transformer=transformer, loras=[]),
clip=CLIPField(tokenizer=tokenizer, text_encoder=clip_encoder, loras=[], skipped_layers=0),
t5_encoder=T5EncoderField(tokenizer=tokenizer2, text_encoder=t5_encoder),
vae=VAEField(vae=vae),
max_seq_len=max_seq_lengths[transformer_config.config_path],
)
@invocation(
"main_model_loader",
title="Main Model",

View File

@@ -1,43 +1,4 @@
import io
from typing import Literal, Optional
import matplotlib.pyplot as plt
import numpy as np
import PIL.Image
from easing_functions import (
BackEaseIn,
BackEaseInOut,
BackEaseOut,
BounceEaseIn,
BounceEaseInOut,
BounceEaseOut,
CircularEaseIn,
CircularEaseInOut,
CircularEaseOut,
CubicEaseIn,
CubicEaseInOut,
CubicEaseOut,
ElasticEaseIn,
ElasticEaseInOut,
ElasticEaseOut,
ExponentialEaseIn,
ExponentialEaseInOut,
ExponentialEaseOut,
LinearInOut,
QuadEaseIn,
QuadEaseInOut,
QuadEaseOut,
QuarticEaseIn,
QuarticEaseInOut,
QuarticEaseOut,
QuinticEaseIn,
QuinticEaseInOut,
QuinticEaseOut,
SineEaseIn,
SineEaseInOut,
SineEaseOut,
)
from matplotlib.ticker import MaxNLocator
from invokeai.app.invocations.baseinvocation import BaseInvocation, invocation
from invokeai.app.invocations.fields import InputField
@@ -65,191 +26,3 @@ class FloatLinearRangeInvocation(BaseInvocation):
def invoke(self, context: InvocationContext) -> FloatCollectionOutput:
param_list = list(np.linspace(self.start, self.stop, self.steps))
return FloatCollectionOutput(collection=param_list)
EASING_FUNCTIONS_MAP = {
"Linear": LinearInOut,
"QuadIn": QuadEaseIn,
"QuadOut": QuadEaseOut,
"QuadInOut": QuadEaseInOut,
"CubicIn": CubicEaseIn,
"CubicOut": CubicEaseOut,
"CubicInOut": CubicEaseInOut,
"QuarticIn": QuarticEaseIn,
"QuarticOut": QuarticEaseOut,
"QuarticInOut": QuarticEaseInOut,
"QuinticIn": QuinticEaseIn,
"QuinticOut": QuinticEaseOut,
"QuinticInOut": QuinticEaseInOut,
"SineIn": SineEaseIn,
"SineOut": SineEaseOut,
"SineInOut": SineEaseInOut,
"CircularIn": CircularEaseIn,
"CircularOut": CircularEaseOut,
"CircularInOut": CircularEaseInOut,
"ExponentialIn": ExponentialEaseIn,
"ExponentialOut": ExponentialEaseOut,
"ExponentialInOut": ExponentialEaseInOut,
"ElasticIn": ElasticEaseIn,
"ElasticOut": ElasticEaseOut,
"ElasticInOut": ElasticEaseInOut,
"BackIn": BackEaseIn,
"BackOut": BackEaseOut,
"BackInOut": BackEaseInOut,
"BounceIn": BounceEaseIn,
"BounceOut": BounceEaseOut,
"BounceInOut": BounceEaseInOut,
}
EASING_FUNCTION_KEYS = Literal[tuple(EASING_FUNCTIONS_MAP.keys())]
# actually I think for now could just use CollectionOutput (which is list[Any]
@invocation(
"step_param_easing",
title="Step Param Easing",
tags=["step", "easing"],
category="step",
version="1.0.2",
)
class StepParamEasingInvocation(BaseInvocation):
"""Experimental per-step parameter easing for denoising steps"""
easing: EASING_FUNCTION_KEYS = InputField(default="Linear", description="The easing function to use")
num_steps: int = InputField(default=20, description="number of denoising steps")
start_value: float = InputField(default=0.0, description="easing starting value")
end_value: float = InputField(default=1.0, description="easing ending value")
start_step_percent: float = InputField(default=0.0, description="fraction of steps at which to start easing")
end_step_percent: float = InputField(default=1.0, description="fraction of steps after which to end easing")
# if None, then start_value is used prior to easing start
pre_start_value: Optional[float] = InputField(default=None, description="value before easing start")
# if None, then end value is used prior to easing end
post_end_value: Optional[float] = InputField(default=None, description="value after easing end")
mirror: bool = InputField(default=False, description="include mirror of easing function")
# FIXME: add alt_mirror option (alternative to default or mirror), or remove entirely
# alt_mirror: bool = InputField(default=False, description="alternative mirroring by dual easing")
show_easing_plot: bool = InputField(default=False, description="show easing plot")
def invoke(self, context: InvocationContext) -> FloatCollectionOutput:
log_diagnostics = False
# convert from start_step_percent to nearest step <= (steps * start_step_percent)
# start_step = int(np.floor(self.num_steps * self.start_step_percent))
start_step = int(np.round(self.num_steps * self.start_step_percent))
# convert from end_step_percent to nearest step >= (steps * end_step_percent)
# end_step = int(np.ceil((self.num_steps - 1) * self.end_step_percent))
end_step = int(np.round((self.num_steps - 1) * self.end_step_percent))
# end_step = int(np.ceil(self.num_steps * self.end_step_percent))
num_easing_steps = end_step - start_step + 1
# num_presteps = max(start_step - 1, 0)
num_presteps = start_step
num_poststeps = self.num_steps - (num_presteps + num_easing_steps)
prelist = list(num_presteps * [self.pre_start_value])
postlist = list(num_poststeps * [self.post_end_value])
if log_diagnostics:
context.logger.debug("start_step: " + str(start_step))
context.logger.debug("end_step: " + str(end_step))
context.logger.debug("num_easing_steps: " + str(num_easing_steps))
context.logger.debug("num_presteps: " + str(num_presteps))
context.logger.debug("num_poststeps: " + str(num_poststeps))
context.logger.debug("prelist size: " + str(len(prelist)))
context.logger.debug("postlist size: " + str(len(postlist)))
context.logger.debug("prelist: " + str(prelist))
context.logger.debug("postlist: " + str(postlist))
easing_class = EASING_FUNCTIONS_MAP[self.easing]
if log_diagnostics:
context.logger.debug("easing class: " + str(easing_class))
easing_list = []
if self.mirror: # "expected" mirroring
# if number of steps is even, squeeze duration down to (number_of_steps)/2
# and create reverse copy of list to append
# if number of steps is odd, squeeze duration down to ceil(number_of_steps/2)
# and create reverse copy of list[1:end-1]
# but if even then number_of_steps/2 === ceil(number_of_steps/2), so can just use ceil always
base_easing_duration = int(np.ceil(num_easing_steps / 2.0))
if log_diagnostics:
context.logger.debug("base easing duration: " + str(base_easing_duration))
even_num_steps = num_easing_steps % 2 == 0 # even number of steps
easing_function = easing_class(
start=self.start_value,
end=self.end_value,
duration=base_easing_duration - 1,
)
base_easing_vals = []
for step_index in range(base_easing_duration):
easing_val = easing_function.ease(step_index)
base_easing_vals.append(easing_val)
if log_diagnostics:
context.logger.debug("step_index: " + str(step_index) + ", easing_val: " + str(easing_val))
if even_num_steps:
mirror_easing_vals = list(reversed(base_easing_vals))
else:
mirror_easing_vals = list(reversed(base_easing_vals[0:-1]))
if log_diagnostics:
context.logger.debug("base easing vals: " + str(base_easing_vals))
context.logger.debug("mirror easing vals: " + str(mirror_easing_vals))
easing_list = base_easing_vals + mirror_easing_vals
# FIXME: add alt_mirror option (alternative to default or mirror), or remove entirely
# elif self.alt_mirror: # function mirroring (unintuitive behavior (at least to me))
# # half_ease_duration = round(num_easing_steps - 1 / 2)
# half_ease_duration = round((num_easing_steps - 1) / 2)
# easing_function = easing_class(start=self.start_value,
# end=self.end_value,
# duration=half_ease_duration,
# )
#
# mirror_function = easing_class(start=self.end_value,
# end=self.start_value,
# duration=half_ease_duration,
# )
# for step_index in range(num_easing_steps):
# if step_index <= half_ease_duration:
# step_val = easing_function.ease(step_index)
# else:
# step_val = mirror_function.ease(step_index - half_ease_duration)
# easing_list.append(step_val)
# if log_diagnostics: logger.debug(step_index, step_val)
#
else: # no mirroring (default)
easing_function = easing_class(
start=self.start_value,
end=self.end_value,
duration=num_easing_steps - 1,
)
for step_index in range(num_easing_steps):
step_val = easing_function.ease(step_index)
easing_list.append(step_val)
if log_diagnostics:
context.logger.debug("step_index: " + str(step_index) + ", easing_val: " + str(step_val))
if log_diagnostics:
context.logger.debug("prelist size: " + str(len(prelist)))
context.logger.debug("easing_list size: " + str(len(easing_list)))
context.logger.debug("postlist size: " + str(len(postlist)))
param_list = prelist + easing_list + postlist
if self.show_easing_plot:
plt.figure()
plt.xlabel("Step")
plt.ylabel("Param Value")
plt.title("Per-Step Values Based On Easing: " + self.easing)
plt.bar(range(len(param_list)), param_list)
# plt.plot(param_list)
ax = plt.gca()
ax.xaxis.set_major_locator(MaxNLocator(integer=True))
buf = io.BytesIO()
plt.savefig(buf, format="png")
buf.seek(0)
im = PIL.Image.open(buf)
im.show()
buf.close()
# output array of size steps, each entry list[i] is param value for step i
return FloatCollectionOutput(collection=param_list)

View File

@@ -4,7 +4,13 @@ from typing import Optional
import torch
from invokeai.app.invocations.baseinvocation import BaseInvocation, BaseInvocationOutput, invocation, invocation_output
from invokeai.app.invocations.baseinvocation import (
BaseInvocation,
BaseInvocationOutput,
Classification,
invocation,
invocation_output,
)
from invokeai.app.invocations.constants import LATENT_SCALE_FACTOR
from invokeai.app.invocations.fields import (
BoundingBoxField,
@@ -18,6 +24,7 @@ from invokeai.app.invocations.fields import (
InputField,
LatentsField,
OutputField,
SD3ConditioningField,
TensorField,
UIComponent,
)
@@ -426,6 +433,17 @@ class FluxConditioningOutput(BaseInvocationOutput):
return cls(conditioning=FluxConditioningField(conditioning_name=conditioning_name))
@invocation_output("sd3_conditioning_output")
class SD3ConditioningOutput(BaseInvocationOutput):
"""Base class for nodes that output a single SD3 conditioning tensor"""
conditioning: SD3ConditioningField = OutputField(description=FieldDescriptions.cond)
@classmethod
def build(cls, conditioning_name: str) -> "SD3ConditioningOutput":
return cls(conditioning=SD3ConditioningField(conditioning_name=conditioning_name))
@invocation_output("conditioning_output")
class ConditioningOutput(BaseInvocationOutput):
"""Base class for nodes that output a single conditioning tensor"""
@@ -521,3 +539,23 @@ class BoundingBoxInvocation(BaseInvocation):
# endregion
@invocation(
"image_batch",
title="Image Batch",
tags=["primitives", "image", "batch", "internal"],
category="primitives",
version="1.0.0",
classification=Classification.Special,
)
class ImageBatchInvocation(BaseInvocation):
"""Create a batched generation, where the workflow is executed once for each image in the batch."""
images: list[ImageField] = InputField(min_length=1, description="The images to batch over", input=Input.Direct)
def __init__(self):
raise NotImplementedError("This class should never be executed or instantiated directly.")
def invoke(self, context: InvocationContext) -> ImageOutput:
raise NotImplementedError("This class should never be executed or instantiated directly.")

View File

@@ -0,0 +1,338 @@
from typing import Callable, Optional, Tuple
import torch
import torchvision.transforms as tv_transforms
from diffusers.models.transformers.transformer_sd3 import SD3Transformer2DModel
from torchvision.transforms.functional import resize as tv_resize
from tqdm import tqdm
from invokeai.app.invocations.baseinvocation import BaseInvocation, Classification, invocation
from invokeai.app.invocations.constants import LATENT_SCALE_FACTOR
from invokeai.app.invocations.fields import (
DenoiseMaskField,
FieldDescriptions,
Input,
InputField,
LatentsField,
SD3ConditioningField,
WithBoard,
WithMetadata,
)
from invokeai.app.invocations.model import TransformerField
from invokeai.app.invocations.primitives import LatentsOutput
from invokeai.app.invocations.sd3_text_encoder import SD3_T5_MAX_SEQ_LEN
from invokeai.app.services.shared.invocation_context import InvocationContext
from invokeai.backend.flux.sampling_utils import clip_timestep_schedule_fractional
from invokeai.backend.model_manager.config import BaseModelType
from invokeai.backend.sd3.extensions.inpaint_extension import InpaintExtension
from invokeai.backend.stable_diffusion.diffusers_pipeline import PipelineIntermediateState
from invokeai.backend.stable_diffusion.diffusion.conditioning_data import SD3ConditioningInfo
from invokeai.backend.util.devices import TorchDevice
@invocation(
"sd3_denoise",
title="SD3 Denoise",
tags=["image", "sd3"],
category="image",
version="1.1.0",
classification=Classification.Prototype,
)
class SD3DenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
"""Run denoising process with a SD3 model."""
# If latents is provided, this means we are doing image-to-image.
latents: Optional[LatentsField] = InputField(
default=None, description=FieldDescriptions.latents, input=Input.Connection
)
# denoise_mask is used for image-to-image inpainting. Only the masked region is modified.
denoise_mask: Optional[DenoiseMaskField] = InputField(
default=None, description=FieldDescriptions.denoise_mask, input=Input.Connection
)
denoising_start: float = InputField(default=0.0, ge=0, le=1, description=FieldDescriptions.denoising_start)
denoising_end: float = InputField(default=1.0, ge=0, le=1, description=FieldDescriptions.denoising_end)
transformer: TransformerField = InputField(
description=FieldDescriptions.sd3_model, input=Input.Connection, title="Transformer"
)
positive_conditioning: SD3ConditioningField = InputField(
description=FieldDescriptions.positive_cond, input=Input.Connection
)
negative_conditioning: SD3ConditioningField = InputField(
description=FieldDescriptions.negative_cond, input=Input.Connection
)
cfg_scale: float | list[float] = InputField(default=3.5, description=FieldDescriptions.cfg_scale, title="CFG Scale")
width: int = InputField(default=1024, multiple_of=16, description="Width of the generated image.")
height: int = InputField(default=1024, multiple_of=16, description="Height of the generated image.")
steps: int = InputField(default=10, gt=0, description=FieldDescriptions.steps)
seed: int = InputField(default=0, description="Randomness seed for reproducibility.")
@torch.no_grad()
def invoke(self, context: InvocationContext) -> LatentsOutput:
latents = self._run_diffusion(context)
latents = latents.detach().to("cpu")
name = context.tensors.save(tensor=latents)
return LatentsOutput.build(latents_name=name, latents=latents, seed=None)
def _prep_inpaint_mask(self, context: InvocationContext, latents: torch.Tensor) -> torch.Tensor | None:
"""Prepare the inpaint mask.
- Loads the mask
- Resizes if necessary
- Casts to same device/dtype as latents
Args:
context (InvocationContext): The invocation context, for loading the inpaint mask.
latents (torch.Tensor): A latent image tensor. Used to determine the target shape, device, and dtype for the
inpaint mask.
Returns:
torch.Tensor | None: Inpaint mask. Values of 0.0 represent the regions to be fully denoised, and 1.0
represent the regions to be preserved.
"""
if self.denoise_mask is None:
return None
mask = context.tensors.load(self.denoise_mask.mask_name)
# The input denoise_mask contains values in [0, 1], where 0.0 represents the regions to be fully denoised, and
# 1.0 represents the regions to be preserved.
# We invert the mask so that the regions to be preserved are 0.0 and the regions to be denoised are 1.0.
mask = 1.0 - mask
_, _, latent_height, latent_width = latents.shape
mask = tv_resize(
img=mask,
size=[latent_height, latent_width],
interpolation=tv_transforms.InterpolationMode.BILINEAR,
antialias=False,
)
mask = mask.to(device=latents.device, dtype=latents.dtype)
return mask
def _load_text_conditioning(
self,
context: InvocationContext,
conditioning_name: str,
joint_attention_dim: int,
dtype: torch.dtype,
device: torch.device,
) -> Tuple[torch.Tensor, torch.Tensor]:
# Load the conditioning data.
cond_data = context.conditioning.load(conditioning_name)
assert len(cond_data.conditionings) == 1
sd3_conditioning = cond_data.conditionings[0]
assert isinstance(sd3_conditioning, SD3ConditioningInfo)
sd3_conditioning = sd3_conditioning.to(dtype=dtype, device=device)
t5_embeds = sd3_conditioning.t5_embeds
if t5_embeds is None:
t5_embeds = torch.zeros(
(1, SD3_T5_MAX_SEQ_LEN, joint_attention_dim),
device=device,
dtype=dtype,
)
clip_prompt_embeds = torch.cat([sd3_conditioning.clip_l_embeds, sd3_conditioning.clip_g_embeds], dim=-1)
clip_prompt_embeds = torch.nn.functional.pad(
clip_prompt_embeds, (0, t5_embeds.shape[-1] - clip_prompt_embeds.shape[-1])
)
prompt_embeds = torch.cat([clip_prompt_embeds, t5_embeds], dim=-2)
pooled_prompt_embeds = torch.cat(
[sd3_conditioning.clip_l_pooled_embeds, sd3_conditioning.clip_g_pooled_embeds], dim=-1
)
return prompt_embeds, pooled_prompt_embeds
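For reference, the padding and concatenation above follow the usual SD3 prompt-embedding layout. With typical SD3 dimensions (treat these as assumed values, not read from the diff: CLIP-L hidden size 768, CLIP-G hidden size 1280, joint_attention_dim 4096, T5 sequence length 256), the shapes work out as follows:

import torch

clip_l_embeds = torch.zeros(1, 77, 768)
clip_g_embeds = torch.zeros(1, 77, 1280)
t5_embeds = torch.zeros(1, 256, 4096)

clip_prompt_embeds = torch.cat([clip_l_embeds, clip_g_embeds], dim=-1)  # (1, 77, 2048)
clip_prompt_embeds = torch.nn.functional.pad(
    clip_prompt_embeds, (0, t5_embeds.shape[-1] - clip_prompt_embeds.shape[-1])
)  # zero-padded on the channel dim to (1, 77, 4096)
prompt_embeds = torch.cat([clip_prompt_embeds, t5_embeds], dim=-2)  # (1, 333, 4096)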
def _get_noise(
self,
num_samples: int,
num_channels_latents: int,
height: int,
width: int,
dtype: torch.dtype,
device: torch.device,
seed: int,
) -> torch.Tensor:
# We always generate noise on the same device and dtype then cast to ensure consistency across devices/dtypes.
rand_device = "cpu"
rand_dtype = torch.float16
return torch.randn(
num_samples,
num_channels_latents,
int(height) // LATENT_SCALE_FACTOR,
int(width) // LATENT_SCALE_FACTOR,
device=rand_device,
dtype=rand_dtype,
generator=torch.Generator(device=rand_device).manual_seed(seed),
).to(device=device, dtype=dtype)
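The comment in _get_noise explains the reproducibility trick: noise is always sampled on a fixed device and dtype and then cast, because PyTorch's CPU and CUDA random streams differ, so sampling directly on the target device would make the same seed produce different starting noise on different hardware. A minimal sketch of the pattern (the shape and helper name are illustrative):

import torch

def reproducible_noise(seed: int, device: torch.device, dtype: torch.dtype) -> torch.Tensor:
    # Sample on CPU in float16, then cast, so a given seed yields the same
    # starting noise regardless of the target device/dtype.
    generator = torch.Generator(device="cpu").manual_seed(seed)
    noise = torch.randn(1, 16, 128, 128, device="cpu", dtype=torch.float16, generator=generator)
    return noise.to(device=device, dtype=dtype)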
def _prepare_cfg_scale(self, num_timesteps: int) -> list[float]:
"""Prepare the CFG scale list.
Args:
num_timesteps (int): The number of timesteps in the scheduler. Could be different from num_steps depending
on the scheduler used (e.g. higher order schedulers).
Returns:
list[float]: The CFG scale value to apply at each timestep.
"""
if isinstance(self.cfg_scale, float):
cfg_scale = [self.cfg_scale] * num_timesteps
elif isinstance(self.cfg_scale, list):
assert len(self.cfg_scale) == num_timesteps
cfg_scale = self.cfg_scale
else:
raise ValueError(f"Invalid CFG scale type: {type(self.cfg_scale)}")
return cfg_scale
def _run_diffusion(
self,
context: InvocationContext,
):
inference_dtype = TorchDevice.choose_torch_dtype()
device = TorchDevice.choose_torch_device()
transformer_info = context.models.load(self.transformer.transformer)
# Load/process the conditioning data.
# TODO(ryand): Make CFG optional.
do_classifier_free_guidance = True
pos_prompt_embeds, pos_pooled_prompt_embeds = self._load_text_conditioning(
context=context,
conditioning_name=self.positive_conditioning.conditioning_name,
joint_attention_dim=transformer_info.model.config.joint_attention_dim,
dtype=inference_dtype,
device=device,
)
neg_prompt_embeds, neg_pooled_prompt_embeds = self._load_text_conditioning(
context=context,
conditioning_name=self.negative_conditioning.conditioning_name,
joint_attention_dim=transformer_info.model.config.joint_attention_dim,
dtype=inference_dtype,
device=device,
)
# TODO(ryand): Support both sequential and batched CFG inference.
prompt_embeds = torch.cat([neg_prompt_embeds, pos_prompt_embeds], dim=0)
pooled_prompt_embeds = torch.cat([neg_pooled_prompt_embeds, pos_pooled_prompt_embeds], dim=0)
# Prepare the timestep schedule.
# We add an extra step to the end to account for the final timestep of 0.0.
timesteps: list[float] = torch.linspace(1, 0, self.steps + 1).tolist()
# Clip the timesteps schedule based on denoising_start and denoising_end.
timesteps = clip_timestep_schedule_fractional(timesteps, self.denoising_start, self.denoising_end)
total_steps = len(timesteps) - 1
# Prepare the CFG scale list.
cfg_scale = self._prepare_cfg_scale(total_steps)
# Load the input latents, if provided.
init_latents = context.tensors.load(self.latents.latents_name) if self.latents else None
if init_latents is not None:
init_latents = init_latents.to(device=device, dtype=inference_dtype)
# Generate initial latent noise.
num_channels_latents = transformer_info.model.config.in_channels
assert isinstance(num_channels_latents, int)
noise = self._get_noise(
num_samples=1,
num_channels_latents=num_channels_latents,
height=self.height,
width=self.width,
dtype=inference_dtype,
device=device,
seed=self.seed,
)
# Prepare input latent image.
if init_latents is not None:
# Noise the init_latents by the appropriate amount for the first timestep.
t_0 = timesteps[0]
latents = t_0 * noise + (1.0 - t_0) * init_latents
else:
# init_latents are not provided, so we are not doing image-to-image (i.e. we are starting from pure noise).
if self.denoising_start > 1e-5:
raise ValueError("denoising_start should be 0 when initial latents are not provided.")
latents = noise
# If len(timesteps) == 1, then short-circuit. We are just noising the input latents, but not taking any
# denoising steps.
if len(timesteps) <= 1:
return latents
# Prepare inpaint extension.
inpaint_mask = self._prep_inpaint_mask(context, latents)
inpaint_extension: InpaintExtension | None = None
if inpaint_mask is not None:
assert init_latents is not None
inpaint_extension = InpaintExtension(
init_latents=init_latents,
inpaint_mask=inpaint_mask,
noise=noise,
)
step_callback = self._build_step_callback(context)
step_callback(
PipelineIntermediateState(
step=0,
order=1,
total_steps=total_steps,
timestep=int(timesteps[0]),
latents=latents,
),
)
with transformer_info.model_on_device() as (cached_weights, transformer):
assert isinstance(transformer, SD3Transformer2DModel)
# 6. Denoising loop
for step_idx, (t_curr, t_prev) in tqdm(list(enumerate(zip(timesteps[:-1], timesteps[1:], strict=True)))):
# Expand the latents if we are doing CFG.
latent_model_input = torch.cat([latents] * 2) if do_classifier_free_guidance else latents
# Expand the timestep to match the latent model input.
# Multiply by 1000 to match the default FlowMatchEulerDiscreteScheduler num_train_timesteps.
timestep = torch.tensor([t_curr * 1000], device=device).expand(latent_model_input.shape[0])
noise_pred = transformer(
hidden_states=latent_model_input,
timestep=timestep,
encoder_hidden_states=prompt_embeds,
pooled_projections=pooled_prompt_embeds,
joint_attention_kwargs=None,
return_dict=False,
)[0]
# Apply CFG.
if do_classifier_free_guidance:
noise_pred_uncond, noise_pred_cond = noise_pred.chunk(2)
noise_pred = noise_pred_uncond + cfg_scale[step_idx] * (noise_pred_cond - noise_pred_uncond)
# Compute the previous noisy sample x_t -> x_t-1.
latents_dtype = latents.dtype
latents = latents.to(dtype=torch.float32)
latents = latents + (t_prev - t_curr) * noise_pred
latents = latents.to(dtype=latents_dtype)
if inpaint_extension is not None:
latents = inpaint_extension.merge_intermediate_latents_with_init_latents(latents, t_prev)
step_callback(
PipelineIntermediateState(
step=step_idx + 1,
order=1,
total_steps=total_steps,
timestep=int(t_curr),
latents=latents,
),
)
return latents
def _build_step_callback(self, context: InvocationContext) -> Callable[[PipelineIntermediateState], None]:
def step_callback(state: PipelineIntermediateState) -> None:
context.util.sd_step_callback(state, BaseModelType.StableDiffusion3)
return step_callback
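A compact restatement of the per-step update inside the denoising loop above, with toy tensors, assuming the same batched-CFG convention (unconditional prediction first, conditional second along dim 0):

import torch

# Toy shapes; the real loop uses latents of shape (1, C, H // 8, W // 8).
latents = torch.randn(1, 16, 4, 4)
t_curr, t_prev, cfg = 1.0, 0.75, 3.5

# Batched CFG: the transformer is run once on [uncond, cond] stacked along dim 0.
noise_pred = torch.randn(2, 16, 4, 4)
noise_pred_uncond, noise_pred_cond = noise_pred.chunk(2)
guided = noise_pred_uncond + cfg * (noise_pred_cond - noise_pred_uncond)

# Flow-matching Euler step: move from t_curr toward t_prev along the predicted velocity.
latents = latents + (t_prev - t_curr) * guided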

View File

@@ -0,0 +1,65 @@
import einops
import torch
from diffusers.models.autoencoders.autoencoder_kl import AutoencoderKL
from invokeai.app.invocations.baseinvocation import BaseInvocation, Classification, invocation
from invokeai.app.invocations.fields import (
FieldDescriptions,
ImageField,
Input,
InputField,
WithBoard,
WithMetadata,
)
from invokeai.app.invocations.model import VAEField
from invokeai.app.invocations.primitives import LatentsOutput
from invokeai.app.services.shared.invocation_context import InvocationContext
from invokeai.backend.model_manager.load.load_base import LoadedModel
from invokeai.backend.stable_diffusion.diffusers_pipeline import image_resized_to_grid_as_tensor
@invocation(
"sd3_i2l",
title="SD3 Image to Latents",
tags=["image", "latents", "vae", "i2l", "sd3"],
category="image",
version="1.0.0",
classification=Classification.Prototype,
)
class SD3ImageToLatentsInvocation(BaseInvocation, WithMetadata, WithBoard):
"""Generates latents from an image."""
image: ImageField = InputField(description="The image to encode")
vae: VAEField = InputField(description=FieldDescriptions.vae, input=Input.Connection)
@staticmethod
def vae_encode(vae_info: LoadedModel, image_tensor: torch.Tensor) -> torch.Tensor:
with vae_info as vae:
assert isinstance(vae, AutoencoderKL)
vae.disable_tiling()
image_tensor = image_tensor.to(device=vae.device, dtype=vae.dtype)
with torch.inference_mode():
image_tensor_dist = vae.encode(image_tensor).latent_dist
# TODO: Use seed to make sampling reproducible.
latents: torch.Tensor = image_tensor_dist.sample().to(dtype=vae.dtype)
latents = vae.config.scaling_factor * latents
return latents
@torch.no_grad()
def invoke(self, context: InvocationContext) -> LatentsOutput:
image = context.images.get_pil(self.image.image_name)
image_tensor = image_resized_to_grid_as_tensor(image.convert("RGB"))
if image_tensor.dim() == 3:
image_tensor = einops.rearrange(image_tensor, "c h w -> 1 c h w")
vae_info = context.models.load(self.vae.vae)
latents = self.vae_encode(vae_info=vae_info, image_tensor=image_tensor)
latents = latents.to("cpu")
name = context.tensors.save(tensor=latents)
return LatentsOutput.build(latents_name=name, latents=latents, seed=None)

View File

@@ -0,0 +1,74 @@
from contextlib import nullcontext
import torch
from diffusers.models.autoencoders.autoencoder_kl import AutoencoderKL
from einops import rearrange
from PIL import Image
from invokeai.app.invocations.baseinvocation import BaseInvocation, invocation
from invokeai.app.invocations.fields import (
FieldDescriptions,
Input,
InputField,
LatentsField,
WithBoard,
WithMetadata,
)
from invokeai.app.invocations.model import VAEField
from invokeai.app.invocations.primitives import ImageOutput
from invokeai.app.services.shared.invocation_context import InvocationContext
from invokeai.backend.stable_diffusion.extensions.seamless import SeamlessExt
from invokeai.backend.util.devices import TorchDevice
@invocation(
"sd3_l2i",
title="SD3 Latents to Image",
tags=["latents", "image", "vae", "l2i", "sd3"],
category="latents",
version="1.3.0",
)
class SD3LatentsToImageInvocation(BaseInvocation, WithMetadata, WithBoard):
"""Generates an image from latents."""
latents: LatentsField = InputField(
description=FieldDescriptions.latents,
input=Input.Connection,
)
vae: VAEField = InputField(
description=FieldDescriptions.vae,
input=Input.Connection,
)
@torch.no_grad()
def invoke(self, context: InvocationContext) -> ImageOutput:
latents = context.tensors.load(self.latents.latents_name)
vae_info = context.models.load(self.vae.vae)
assert isinstance(vae_info.model, (AutoencoderKL))
with SeamlessExt.static_patch_model(vae_info.model, self.vae.seamless_axes), vae_info as vae:
context.util.signal_progress("Running VAE")
assert isinstance(vae, (AutoencoderKL))
latents = latents.to(vae.device)
vae.disable_tiling()
tiling_context = nullcontext()
# clear memory as vae decode can request a lot
TorchDevice.empty_cache()
with torch.inference_mode(), tiling_context:
# copied from diffusers pipeline
latents = latents / vae.config.scaling_factor
img = vae.decode(latents, return_dict=False)[0]
img = img.clamp(-1, 1)
img = rearrange(img[0], "c h w -> h w c") # noqa: F821
img_pil = Image.fromarray((127.5 * (img + 1.0)).byte().cpu().numpy())
TorchDevice.empty_cache()
image_dto = context.images.save(image=img_pil)
return ImageOutput.build(image_dto)

View File

@@ -0,0 +1,108 @@
from typing import Optional
from invokeai.app.invocations.baseinvocation import (
BaseInvocation,
BaseInvocationOutput,
Classification,
invocation,
invocation_output,
)
from invokeai.app.invocations.fields import FieldDescriptions, Input, InputField, OutputField, UIType
from invokeai.app.invocations.model import CLIPField, ModelIdentifierField, T5EncoderField, TransformerField, VAEField
from invokeai.app.services.shared.invocation_context import InvocationContext
from invokeai.backend.model_manager.config import SubModelType
@invocation_output("sd3_model_loader_output")
class Sd3ModelLoaderOutput(BaseInvocationOutput):
"""SD3 base model loader output."""
transformer: TransformerField = OutputField(description=FieldDescriptions.transformer, title="Transformer")
clip_l: CLIPField = OutputField(description=FieldDescriptions.clip, title="CLIP L")
clip_g: CLIPField = OutputField(description=FieldDescriptions.clip, title="CLIP G")
t5_encoder: T5EncoderField = OutputField(description=FieldDescriptions.t5_encoder, title="T5 Encoder")
vae: VAEField = OutputField(description=FieldDescriptions.vae, title="VAE")
@invocation(
"sd3_model_loader",
title="SD3 Main Model",
tags=["model", "sd3"],
category="model",
version="1.0.0",
classification=Classification.Prototype,
)
class Sd3ModelLoaderInvocation(BaseInvocation):
"""Loads a SD3 base model, outputting its submodels."""
model: ModelIdentifierField = InputField(
description=FieldDescriptions.sd3_model,
ui_type=UIType.SD3MainModel,
input=Input.Direct,
)
t5_encoder_model: Optional[ModelIdentifierField] = InputField(
description=FieldDescriptions.t5_encoder,
ui_type=UIType.T5EncoderModel,
input=Input.Direct,
title="T5 Encoder",
default=None,
)
clip_l_model: Optional[ModelIdentifierField] = InputField(
description=FieldDescriptions.clip_embed_model,
ui_type=UIType.CLIPLEmbedModel,
input=Input.Direct,
title="CLIP L Encoder",
default=None,
)
clip_g_model: Optional[ModelIdentifierField] = InputField(
description=FieldDescriptions.clip_g_model,
ui_type=UIType.CLIPGEmbedModel,
input=Input.Direct,
title="CLIP G Encoder",
default=None,
)
vae_model: Optional[ModelIdentifierField] = InputField(
description=FieldDescriptions.vae_model, ui_type=UIType.VAEModel, title="VAE", default=None
)
def invoke(self, context: InvocationContext) -> Sd3ModelLoaderOutput:
transformer = self.model.model_copy(update={"submodel_type": SubModelType.Transformer})
vae = (
self.vae_model.model_copy(update={"submodel_type": SubModelType.VAE})
if self.vae_model
else self.model.model_copy(update={"submodel_type": SubModelType.VAE})
)
tokenizer_l = self.model.model_copy(update={"submodel_type": SubModelType.Tokenizer})
clip_encoder_l = (
self.clip_l_model.model_copy(update={"submodel_type": SubModelType.TextEncoder})
if self.clip_l_model
else self.model.model_copy(update={"submodel_type": SubModelType.TextEncoder})
)
tokenizer_g = self.model.model_copy(update={"submodel_type": SubModelType.Tokenizer2})
clip_encoder_g = (
self.clip_g_model.model_copy(update={"submodel_type": SubModelType.TextEncoder2})
if self.clip_g_model
else self.model.model_copy(update={"submodel_type": SubModelType.TextEncoder2})
)
tokenizer_t5 = (
self.t5_encoder_model.model_copy(update={"submodel_type": SubModelType.Tokenizer3})
if self.t5_encoder_model
else self.model.model_copy(update={"submodel_type": SubModelType.Tokenizer3})
)
t5_encoder = (
self.t5_encoder_model.model_copy(update={"submodel_type": SubModelType.TextEncoder3})
if self.t5_encoder_model
else self.model.model_copy(update={"submodel_type": SubModelType.TextEncoder3})
)
return Sd3ModelLoaderOutput(
transformer=TransformerField(transformer=transformer, loras=[]),
clip_l=CLIPField(tokenizer=tokenizer_l, text_encoder=clip_encoder_l, loras=[], skipped_layers=0),
clip_g=CLIPField(tokenizer=tokenizer_g, text_encoder=clip_encoder_g, loras=[], skipped_layers=0),
t5_encoder=T5EncoderField(tokenizer=tokenizer_t5, text_encoder=t5_encoder),
vae=VAEField(vae=vae),
)
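The loader above repeats one pattern: copy a ModelIdentifierField with a submodel_type override, preferring an explicitly supplied sub-model and falling back to the main SD3 checkpoint. A rough sketch of that pattern as a standalone helper (illustrative only, not part of the diff; the helper name is invented):

def _submodel(
    main: ModelIdentifierField,
    override: Optional[ModelIdentifierField],
    submodel_type: SubModelType,
) -> ModelIdentifierField:
    # Prefer an explicitly selected model (e.g. a standalone VAE or T5 encoder); otherwise
    # pull the submodel out of the main SD3 checkpoint by tagging it with a submodel_type.
    source = override if override is not None else main
    return source.model_copy(update={"submodel_type": submodel_type})

# e.g. vae = _submodel(self.model, self.vae_model, SubModelType.VAE)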

View File

@@ -0,0 +1,203 @@
from contextlib import ExitStack
from typing import Iterator, Tuple
import torch
from transformers import (
CLIPTextModel,
CLIPTextModelWithProjection,
CLIPTokenizer,
T5EncoderModel,
T5Tokenizer,
T5TokenizerFast,
)
from invokeai.app.invocations.baseinvocation import BaseInvocation, Classification, invocation
from invokeai.app.invocations.fields import FieldDescriptions, Input, InputField
from invokeai.app.invocations.model import CLIPField, T5EncoderField
from invokeai.app.invocations.primitives import SD3ConditioningOutput
from invokeai.app.services.shared.invocation_context import InvocationContext
from invokeai.backend.lora.conversions.flux_lora_constants import FLUX_LORA_CLIP_PREFIX
from invokeai.backend.lora.lora_model_raw import LoRAModelRaw
from invokeai.backend.lora.lora_patcher import LoRAPatcher
from invokeai.backend.model_manager.config import ModelFormat
from invokeai.backend.stable_diffusion.diffusion.conditioning_data import ConditioningFieldData, SD3ConditioningInfo
from invokeai.backend.util.devices import TorchDevice
# The SD3 T5 max sequence length is set based on the default in diffusers.
SD3_T5_MAX_SEQ_LEN = 256
@invocation(
"sd3_text_encoder",
title="SD3 Text Encoding",
tags=["prompt", "conditioning", "sd3"],
category="conditioning",
version="1.0.0",
classification=Classification.Prototype,
)
class Sd3TextEncoderInvocation(BaseInvocation):
"""Encodes and preps a prompt for a SD3 image."""
clip_l: CLIPField = InputField(
title="CLIP L",
description=FieldDescriptions.clip,
input=Input.Connection,
)
clip_g: CLIPField = InputField(
title="CLIP G",
description=FieldDescriptions.clip,
input=Input.Connection,
)
# The SD3 models were trained with text encoder dropout, so the T5 encoder can be omitted to save time/memory.
t5_encoder: T5EncoderField | None = InputField(
title="T5Encoder",
default=None,
description=FieldDescriptions.t5_encoder,
input=Input.Connection,
)
prompt: str = InputField(description="Text prompt to encode.")
@torch.no_grad()
def invoke(self, context: InvocationContext) -> SD3ConditioningOutput:
# Note: The text encoding models are run in separate functions to ensure that all model references are locally
# scoped. This ensures that earlier models can be freed and gc'd before loading later models (if necessary).
clip_l_embeddings, clip_l_pooled_embeddings = self._clip_encode(context, self.clip_l)
clip_g_embeddings, clip_g_pooled_embeddings = self._clip_encode(context, self.clip_g)
t5_embeddings: torch.Tensor | None = None
if self.t5_encoder is not None:
t5_embeddings = self._t5_encode(context, SD3_T5_MAX_SEQ_LEN)
conditioning_data = ConditioningFieldData(
conditionings=[
SD3ConditioningInfo(
clip_l_embeds=clip_l_embeddings,
clip_l_pooled_embeds=clip_l_pooled_embeddings,
clip_g_embeds=clip_g_embeddings,
clip_g_pooled_embeds=clip_g_pooled_embeddings,
t5_embeds=t5_embeddings,
)
]
)
conditioning_name = context.conditioning.save(conditioning_data)
return SD3ConditioningOutput.build(conditioning_name)
def _t5_encode(self, context: InvocationContext, max_seq_len: int) -> torch.Tensor:
assert self.t5_encoder is not None
t5_tokenizer_info = context.models.load(self.t5_encoder.tokenizer)
t5_text_encoder_info = context.models.load(self.t5_encoder.text_encoder)
prompt = [self.prompt]
with (
t5_text_encoder_info as t5_text_encoder,
t5_tokenizer_info as t5_tokenizer,
):
context.util.signal_progress("Running T5 encoder")
assert isinstance(t5_text_encoder, T5EncoderModel)
assert isinstance(t5_tokenizer, (T5Tokenizer, T5TokenizerFast))
text_inputs = t5_tokenizer(
prompt,
padding="max_length",
max_length=max_seq_len,
truncation=True,
add_special_tokens=True,
return_tensors="pt",
)
text_input_ids = text_inputs.input_ids
untruncated_ids = t5_tokenizer(prompt, padding="longest", return_tensors="pt").input_ids
assert isinstance(text_input_ids, torch.Tensor)
assert isinstance(untruncated_ids, torch.Tensor)
if untruncated_ids.shape[-1] >= text_input_ids.shape[-1] and not torch.equal(
text_input_ids, untruncated_ids
):
removed_text = t5_tokenizer.batch_decode(untruncated_ids[:, max_seq_len - 1 : -1])
context.logger.warning(
"The following part of your input was truncated because `max_sequence_length` is set to "
f" {max_seq_len} tokens: {removed_text}"
)
prompt_embeds = t5_text_encoder(text_input_ids.to(t5_text_encoder.device))[0]
assert isinstance(prompt_embeds, torch.Tensor)
return prompt_embeds
def _clip_encode(
self, context: InvocationContext, clip_model: CLIPField, tokenizer_max_length: int = 77
) -> Tuple[torch.Tensor, torch.Tensor]:
clip_tokenizer_info = context.models.load(clip_model.tokenizer)
clip_text_encoder_info = context.models.load(clip_model.text_encoder)
prompt = [self.prompt]
with (
clip_text_encoder_info.model_on_device() as (cached_weights, clip_text_encoder),
clip_tokenizer_info as clip_tokenizer,
ExitStack() as exit_stack,
):
context.util.signal_progress("Running CLIP encoder")
assert isinstance(clip_text_encoder, (CLIPTextModel, CLIPTextModelWithProjection))
assert isinstance(clip_tokenizer, CLIPTokenizer)
clip_text_encoder_config = clip_text_encoder_info.config
assert clip_text_encoder_config is not None
# Apply LoRA models to the CLIP encoder.
# Note: We apply the LoRA after the text encoder has been moved to its target device for faster patching.
if clip_text_encoder_config.format in [ModelFormat.Diffusers]:
# The model is non-quantized, so we can apply the LoRA weights directly into the model.
exit_stack.enter_context(
LoRAPatcher.apply_smart_lora_patches(
model=clip_text_encoder,
patches=self._clip_lora_iterator(context, clip_model),
prefix=FLUX_LORA_CLIP_PREFIX,
dtype=TorchDevice.choose_torch_dtype(),
cached_weights=cached_weights,
)
)
else:
# There are currently no supported CLIP quantized models. Add support here if needed.
raise ValueError(f"Unsupported model format: {clip_text_encoder_config.format}")
clip_text_encoder = clip_text_encoder.eval().requires_grad_(False)
text_inputs = clip_tokenizer(
prompt,
padding="max_length",
max_length=tokenizer_max_length,
truncation=True,
return_tensors="pt",
)
text_input_ids = text_inputs.input_ids
untruncated_ids = clip_tokenizer(prompt, padding="longest", return_tensors="pt").input_ids
assert isinstance(text_input_ids, torch.Tensor)
assert isinstance(untruncated_ids, torch.Tensor)
if untruncated_ids.shape[-1] >= text_input_ids.shape[-1] and not torch.equal(
text_input_ids, untruncated_ids
):
removed_text = clip_tokenizer.batch_decode(untruncated_ids[:, tokenizer_max_length - 1 : -1])
context.logger.warning(
"The following part of your input was truncated because CLIP can only handle sequences up to"
f" {tokenizer_max_length} tokens: {removed_text}"
)
prompt_embeds = clip_text_encoder(
input_ids=text_input_ids.to(clip_text_encoder.device), output_hidden_states=True
)
pooled_prompt_embeds = prompt_embeds[0]
prompt_embeds = prompt_embeds.hidden_states[-2]
return prompt_embeds, pooled_prompt_embeds
def _clip_lora_iterator(
self, context: InvocationContext, clip_model: CLIPField
) -> Iterator[Tuple[LoRAModelRaw, float]]:
for lora in clip_model.loras:
lora_info = context.models.load(lora.lora)
assert isinstance(lora_info.model, LoRAModelRaw)
yield (lora_info.model, lora.weight)
del lora_info
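The "locally scoped" comment in invoke() above is the key memory trick here: each encoder is touched only inside its own helper, so its loaded-model reference is dropped before the next model is loaded. A minimal sketch of the pattern, using hypothetical names (not InvokeAI APIs):

def _run_encoder(cache, key: str, prompt: str):
    model = cache.load(key)      # the loaded-model reference lives only inside this function...
    return model.encode(prompt)  # ...so it can be offloaded/garbage-collected after returning

def encode_all(cache, prompt: str):
    clip_out = _run_encoder(cache, "clip")  # CLIP is loaded, used, then goes out of scope here,
    t5_out = _run_encoder(cache, "t5")      # before the (much larger) T5 encoder is loaded
    return clip_out, t5_out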

View File

@@ -1,9 +1,11 @@
from enum import Enum
from pathlib import Path
from typing import Literal
import numpy as np
import torch
from PIL import Image
from pydantic import BaseModel, Field
from transformers import AutoModelForMaskGeneration, AutoProcessor
from transformers.models.sam import SamModel
from transformers.models.sam.processing_sam import SamProcessor
@@ -23,12 +25,31 @@ SEGMENT_ANYTHING_MODEL_IDS: dict[SegmentAnythingModelKey, str] = {
}
class SAMPointLabel(Enum):
negative = -1
neutral = 0
positive = 1
class SAMPoint(BaseModel):
x: int = Field(..., description="The x-coordinate of the point")
y: int = Field(..., description="The y-coordinate of the point")
label: SAMPointLabel = Field(..., description="The label of the point")
class SAMPointsField(BaseModel):
points: list[SAMPoint] = Field(..., description="The points of the object")
def to_list(self) -> list[list[int]]:
return [[point.x, point.y, point.label.value] for point in self.points]
@invocation(
"segment_anything",
title="Segment Anything",
tags=["prompt", "segmentation"],
category="segmentation",
version="1.0.0",
version="1.1.0",
)
class SegmentAnythingInvocation(BaseInvocation):
"""Runs a Segment Anything Model."""
@@ -40,7 +61,13 @@ class SegmentAnythingInvocation(BaseInvocation):
model: SegmentAnythingModelKey = InputField(description="The Segment Anything model to use.")
image: ImageField = InputField(description="The image to segment.")
bounding_boxes: list[BoundingBoxField] = InputField(description="The bounding boxes to prompt the SAM model with.")
bounding_boxes: list[BoundingBoxField] | None = InputField(
default=None, description="The bounding boxes to prompt the SAM model with."
)
point_lists: list[SAMPointsField] | None = InputField(
default=None,
description="The list of point lists to prompt the SAM model with. Each list of points represents a single object.",
)
apply_polygon_refinement: bool = InputField(
description="Whether to apply polygon refinement to the masks. This will smooth the edges of the masks slightly and ensure that each mask consists of a single closed polygon (before merging).",
default=True,
@@ -55,7 +82,12 @@ class SegmentAnythingInvocation(BaseInvocation):
# The models expect a 3-channel RGB image.
image_pil = context.images.get_pil(self.image.image_name, mode="RGB")
if len(self.bounding_boxes) == 0:
if self.point_lists is not None and self.bounding_boxes is not None:
raise ValueError("Only one of point_lists or bounding_box can be provided.")
if (not self.bounding_boxes or len(self.bounding_boxes) == 0) and (
not self.point_lists or len(self.point_lists) == 0
):
combined_mask = torch.zeros(image_pil.size[::-1], dtype=torch.bool)
else:
masks = self._segment(context=context, image=image_pil)
@@ -83,14 +115,13 @@ class SegmentAnythingInvocation(BaseInvocation):
assert isinstance(sam_processor, SamProcessor)
return SegmentAnythingPipeline(sam_model=sam_model, sam_processor=sam_processor)
def _segment(
self,
context: InvocationContext,
image: Image.Image,
) -> list[torch.Tensor]:
def _segment(self, context: InvocationContext, image: Image.Image) -> list[torch.Tensor]:
"""Use Segment Anything (SAM) to generate masks given an image + a set of bounding boxes."""
# Convert the bounding boxes to the SAM input format.
sam_bounding_boxes = [[bb.x_min, bb.y_min, bb.x_max, bb.y_max] for bb in self.bounding_boxes]
sam_bounding_boxes = (
[[bb.x_min, bb.y_min, bb.x_max, bb.y_max] for bb in self.bounding_boxes] if self.bounding_boxes else None
)
sam_points = [p.to_list() for p in self.point_lists] if self.point_lists else None
with (
context.models.load_remote_model(
@@ -98,7 +129,7 @@ class SegmentAnythingInvocation(BaseInvocation):
) as sam_pipeline,
):
assert isinstance(sam_pipeline, SegmentAnythingPipeline)
masks = sam_pipeline.segment(image=image, bounding_boxes=sam_bounding_boxes)
masks = sam_pipeline.segment(image=image, bounding_boxes=sam_bounding_boxes, point_lists=sam_points)
masks = self._process_masks(masks)
if self.apply_polygon_refinement:
@@ -141,9 +172,10 @@ class SegmentAnythingInvocation(BaseInvocation):
return masks
def _filter_masks(self, masks: list[torch.Tensor], bounding_boxes: list[BoundingBoxField]) -> list[torch.Tensor]:
def _filter_masks(
self, masks: list[torch.Tensor], bounding_boxes: list[BoundingBoxField] | None
) -> list[torch.Tensor]:
"""Filter the detected masks based on the specified mask filter."""
assert len(masks) == len(bounding_boxes)
if self.mask_filter == "all":
return masks
@@ -151,6 +183,10 @@ class SegmentAnythingInvocation(BaseInvocation):
# Find the largest mask.
return [max(masks, key=lambda x: float(x.sum()))]
elif self.mask_filter == "highest_box_score":
assert (
bounding_boxes is not None
), "Bounding boxes must be provided to use the 'highest_box_score' mask filter."
assert len(masks) == len(bounding_boxes)
# Find the index of the bounding box with the highest score.
# Note that we fallback to -1.0 if the score is None. This is mainly to satisfy the type checker. In most
# cases the scores should all be non-None when using this filtering mode. That being said, -1.0 is a
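A quick illustration (not part of the diff) of the new point-prompt input added above: each SAMPointsField describes one object, and to_list() flattens it into the [[x, y, label]] form handed to the SAM pipeline.

points = SAMPointsField(
    points=[
        SAMPoint(x=120, y=80, label=SAMPointLabel.positive),   # a point inside the object
        SAMPoint(x=300, y=40, label=SAMPointLabel.negative),   # a background hint
    ]
)
assert points.to_list() == [[120, 80, 1], [300, 40, -1]]  # format passed to SegmentAnythingPipeline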

View File

@@ -207,7 +207,9 @@ class TiledMultiDiffusionDenoiseLatents(BaseInvocation):
with (
ExitStack() as exit_stack,
unet_info as unet,
LoRAPatcher.apply_lora_patches(model=unet, patches=_lora_loader(), prefix="lora_unet_"),
LoRAPatcher.apply_smart_lora_patches(
model=unet, patches=_lora_loader(), prefix="lora_unet_", dtype=unet.dtype
),
):
assert isinstance(unet, UNet2DConditionModel)
latents = latents.to(device=unet.device, dtype=unet.dtype)

View File

@@ -110,15 +110,26 @@ class DiskImageFileStorage(ImageFileStorageBase):
except Exception as e:
raise ImageFileDeleteException from e
# TODO: make this a bit more flexible for e.g. cloud storage
def get_path(self, image_name: str, thumbnail: bool = False) -> Path:
path = self.__output_folder / image_name
base_folder = self.__thumbnails_folder if thumbnail else self.__output_folder
filename = get_thumbnail_name(image_name) if thumbnail else image_name
if thumbnail:
thumbnail_name = get_thumbnail_name(image_name)
path = self.__thumbnails_folder / thumbnail_name
# Strip any path information from the filename
basename = Path(filename).name
return path
if basename != filename:
raise ValueError("Invalid image name, potential directory traversal detected")
image_path = base_folder / basename
# Ensure the image path is within the base folder to prevent directory traversal
resolved_base = base_folder.resolve()
resolved_image_path = image_path.resolve()
if not resolved_image_path.is_relative_to(resolved_base):
raise ValueError("Image path outside outputs folder, potential directory traversal detected")
return resolved_image_path
def validate_path(self, path: Union[str, Path]) -> bool:
"""Validates the path given for an image or thumbnail."""

View File

@@ -20,7 +20,7 @@ from invokeai.app.services.invocation_stats.invocation_stats_common import (
NodeExecutionStatsSummary,
)
from invokeai.app.services.invoker import Invoker
from invokeai.backend.model_manager.load.model_cache import CacheStats
from invokeai.backend.model_manager.load.model_cache.cache_stats import CacheStats
# Size of 1GB in bytes.
GB = 2**30

View File

@@ -7,7 +7,7 @@ from typing import Callable, Optional
from invokeai.backend.model_manager import AnyModel, AnyModelConfig, SubModelType
from invokeai.backend.model_manager.load import LoadedModel, LoadedModelWithoutConfig
from invokeai.backend.model_manager.load.model_cache.model_cache_base import ModelCacheBase
from invokeai.backend.model_manager.load.model_cache.model_cache import ModelCache
class ModelLoadServiceBase(ABC):
@@ -24,7 +24,7 @@ class ModelLoadServiceBase(ABC):
@property
@abstractmethod
def ram_cache(self) -> ModelCacheBase[AnyModel]:
def ram_cache(self) -> ModelCache:
"""Return the RAM cache used by this loader."""
@abstractmethod

View File

@@ -18,7 +18,7 @@ from invokeai.backend.model_manager.load import (
ModelLoaderRegistry,
ModelLoaderRegistryBase,
)
from invokeai.backend.model_manager.load.model_cache.model_cache_base import ModelCacheBase
from invokeai.backend.model_manager.load.model_cache.model_cache import ModelCache
from invokeai.backend.model_manager.load.model_loaders.generic_diffusers import GenericDiffusersLoader
from invokeai.backend.util.devices import TorchDevice
from invokeai.backend.util.logging import InvokeAILogger
@@ -30,7 +30,7 @@ class ModelLoadService(ModelLoadServiceBase):
def __init__(
self,
app_config: InvokeAIAppConfig,
ram_cache: ModelCacheBase[AnyModel],
ram_cache: ModelCache,
registry: Optional[Type[ModelLoaderRegistryBase]] = ModelLoaderRegistry,
):
"""Initialize the model load service."""
@@ -45,7 +45,7 @@ class ModelLoadService(ModelLoadServiceBase):
self._invoker = invoker
@property
def ram_cache(self) -> ModelCacheBase[AnyModel]:
def ram_cache(self) -> ModelCache:
"""Return the RAM cache used by this loader."""
return self._ram_cache
@@ -78,15 +78,14 @@ class ModelLoadService(ModelLoadServiceBase):
self, model_path: Path, loader: Optional[Callable[[Path], AnyModel]] = None
) -> LoadedModelWithoutConfig:
cache_key = str(model_path)
ram_cache = self.ram_cache
try:
return LoadedModelWithoutConfig(_locker=ram_cache.get(key=cache_key))
return LoadedModelWithoutConfig(cache_record=self._ram_cache.get(key=cache_key), cache=self._ram_cache)
except IndexError:
pass
def torch_load_file(checkpoint: Path) -> AnyModel:
scan_result = scan_file_path(checkpoint)
if scan_result.infected_files != 0:
if scan_result.infected_files != 0 or scan_result.scan_err:
raise Exception("The model at {checkpoint} is potentially infected by malware. Aborting load.")
result = torch_load(checkpoint, map_location="cpu")
return result
@@ -109,5 +108,5 @@ class ModelLoadService(ModelLoadServiceBase):
)
assert loader is not None
raw_model = loader(model_path)
ram_cache.put(key=cache_key, model=raw_model)
return LoadedModelWithoutConfig(_locker=ram_cache.get(key=cache_key))
self._ram_cache.put(key=cache_key, model=raw_model)
return LoadedModelWithoutConfig(cache_record=self._ram_cache.get(key=cache_key), cache=self._ram_cache)

View File

@@ -16,7 +16,8 @@ from invokeai.app.services.model_load.model_load_base import ModelLoadServiceBas
from invokeai.app.services.model_load.model_load_default import ModelLoadService
from invokeai.app.services.model_manager.model_manager_base import ModelManagerServiceBase
from invokeai.app.services.model_records.model_records_base import ModelRecordServiceBase
from invokeai.backend.model_manager.load import ModelCache, ModelLoaderRegistry
from invokeai.backend.model_manager.load.model_cache.model_cache import ModelCache
from invokeai.backend.model_manager.load.model_loader_registry import ModelLoaderRegistry
from invokeai.backend.util.devices import TorchDevice
from invokeai.backend.util.logging import InvokeAILogger

View File

@@ -15,6 +15,7 @@ from invokeai.app.util.model_exclude_null import BaseModelExcludeNull
from invokeai.backend.model_manager.config import (
AnyModelConfig,
BaseModelType,
ClipVariantType,
ControlAdapterDefaultSettings,
MainModelDefaultSettings,
ModelFormat,
@@ -85,7 +86,7 @@ class ModelRecordChanges(BaseModelExcludeNull):
# Checkpoint-specific changes
# TODO(MM2): Should we expose these? Feels footgun-y...
variant: Optional[ModelVariantType] = Field(description="The variant of the model.", default=None)
variant: Optional[ModelVariantType | ClipVariantType] = Field(description="The variant of the model.", default=None)
prediction_type: Optional[SchedulerPredictionType] = Field(
description="The prediction type of the model.", default=None
)

View File

@@ -16,6 +16,7 @@ from pydantic import (
from pydantic_core import to_jsonable_python
from invokeai.app.invocations.baseinvocation import BaseInvocation
from invokeai.app.invocations.fields import ImageField
from invokeai.app.services.shared.graph import Graph, GraphExecutionState, NodeNotFoundError
from invokeai.app.services.workflow_records.workflow_records_common import (
WorkflowWithoutID,
@@ -51,11 +52,7 @@ class SessionQueueItemNotFoundError(ValueError):
# region Batch
BatchDataType = Union[
StrictStr,
float,
int,
]
BatchDataType = Union[StrictStr, float, int, ImageField]
class NodeFieldValue(BaseModel):

View File

@@ -1,3 +1,4 @@
from copy import deepcopy
from dataclasses import dataclass
from pathlib import Path
from typing import TYPE_CHECKING, Callable, Optional, Union
@@ -159,6 +160,10 @@ class LoggerInterface(InvocationContextInterface):
class ImagesInterface(InvocationContextInterface):
def __init__(self, services: InvocationServices, data: InvocationContextData, util: "UtilInterface") -> None:
super().__init__(services, data)
self._util = util
def save(
self,
image: Image,
@@ -185,6 +190,8 @@ class ImagesInterface(InvocationContextInterface):
The saved image DTO.
"""
self._util.signal_progress("Saving image")
# If `metadata` is provided directly, use that. Else, use the metadata provided by `WithMetadata`, falling back to None.
metadata_ = None
if metadata:
@@ -221,7 +228,7 @@ class ImagesInterface(InvocationContextInterface):
)
def get_pil(self, image_name: str, mode: IMAGE_MODES | None = None) -> Image:
"""Gets an image as a PIL Image object.
"""Gets an image as a PIL Image object. This method returns a copy of the image.
Args:
image_name: The name of the image to get.
@@ -233,11 +240,15 @@ class ImagesInterface(InvocationContextInterface):
image = self._services.images.get_pil_image(image_name)
if mode and mode != image.mode:
try:
# convert makes a copy!
image = image.convert(mode)
except ValueError:
self._services.logger.warning(
f"Could not convert image from {image.mode} to {mode}. Using original mode instead."
)
else:
# copy the image to prevent the user from modifying the original
image = image.copy()
return image
def get_metadata(self, image_name: str) -> Optional[MetadataField]:
@@ -290,15 +301,15 @@ class TensorsInterface(InvocationContextInterface):
return name
def load(self, name: str) -> Tensor:
"""Loads a tensor by name.
"""Loads a tensor by name. This method returns a copy of the tensor.
Args:
name: The name of the tensor to load.
Returns:
The loaded tensor.
The tensor.
"""
return self._services.tensors.load(name)
return self._services.tensors.load(name).clone()
class ConditioningInterface(InvocationContextInterface):
@@ -316,21 +327,25 @@ class ConditioningInterface(InvocationContextInterface):
return name
def load(self, name: str) -> ConditioningFieldData:
"""Loads conditioning data by name.
"""Loads conditioning data by name. This method returns a copy of the conditioning data.
Args:
name: The name of the conditioning data to load.
Returns:
The loaded conditioning data.
The conditioning data.
"""
return self._services.conditioning.load(name)
return deepcopy(self._services.conditioning.load(name))
class ModelsInterface(InvocationContextInterface):
"""Common API for loading, downloading and managing models."""
def __init__(self, services: InvocationServices, data: InvocationContextData, util: "UtilInterface") -> None:
super().__init__(services, data)
self._util = util
def exists(self, identifier: Union[str, "ModelIdentifierField"]) -> bool:
"""Check if a model exists.
@@ -363,11 +378,15 @@ class ModelsInterface(InvocationContextInterface):
if isinstance(identifier, str):
model = self._services.model_manager.store.get_model(identifier)
return self._services.model_manager.load.load_model(model, submodel_type)
else:
_submodel_type = submodel_type or identifier.submodel_type
submodel_type = submodel_type or identifier.submodel_type
model = self._services.model_manager.store.get_model(identifier.key)
return self._services.model_manager.load.load_model(model, _submodel_type)
message = f"Loading model {model.name}"
if submodel_type:
message += f" ({submodel_type.value})"
self._util.signal_progress(message)
return self._services.model_manager.load.load_model(model, submodel_type)
def load_by_attrs(
self, name: str, base: BaseModelType, type: ModelType, submodel_type: Optional[SubModelType] = None
@@ -392,6 +411,10 @@ class ModelsInterface(InvocationContextInterface):
if len(configs) > 1:
raise ValueError(f"More than one model found with name {name}, base {base}, and type {type}")
message = f"Loading model {name}"
if submodel_type:
message += f" ({submodel_type.value})"
self._util.signal_progress(message)
return self._services.model_manager.load.load_model(configs[0], submodel_type)
def get_config(self, identifier: Union[str, "ModelIdentifierField"]) -> AnyModelConfig:
@@ -462,6 +485,7 @@ class ModelsInterface(InvocationContextInterface):
Returns:
Path to the downloaded model
"""
self._util.signal_progress(f"Downloading model {source}")
return self._services.model_manager.install.download_and_cache_model(source=source)
def load_local_model(
@@ -484,6 +508,8 @@ class ModelsInterface(InvocationContextInterface):
Returns:
A LoadedModelWithoutConfig object.
"""
self._util.signal_progress(f"Loading model {model_path.name}")
return self._services.model_manager.load.load_model_from_path(model_path=model_path, loader=loader)
def load_remote_model(
@@ -509,6 +535,8 @@ class ModelsInterface(InvocationContextInterface):
A LoadedModelWithoutConfig object.
"""
model_path = self._services.model_manager.install.download_and_cache_model(source=str(source))
self._util.signal_progress(f"Loading model {source}")
return self._services.model_manager.load.load_model_from_path(model_path=model_path, loader=loader)
@@ -702,12 +730,12 @@ def build_invocation_context(
"""
logger = LoggerInterface(services=services, data=data)
images = ImagesInterface(services=services, data=data)
tensors = TensorsInterface(services=services, data=data)
models = ModelsInterface(services=services, data=data)
config = ConfigInterface(services=services, data=data)
util = UtilInterface(services=services, data=data, is_canceled=is_canceled)
conditioning = ConditioningInterface(services=services, data=data)
models = ModelsInterface(services=services, data=data, util=util)
images = ImagesInterface(services=services, data=data, util=util)
boards = BoardsInterface(services=services, data=data)
ctx = InvocationContext(
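The copy-on-load changes above (tensors.load() returns a clone, conditioning.load() returns a deepcopy, get_pil() returns a copy) mean a node can mutate what it loads without corrupting the stored data. Roughly, in a hypothetical node body (names are illustrative, not from the diff):

latents = context.tensors.load(latents_name)         # returns a clone of the stored tensor
latents.mul_(0.18215)                                # in-place edits no longer corrupt the cache
conditioning = context.conditioning.load(cond_name)  # returns a deepcopy, with the same guarantee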

View File

@@ -0,0 +1,382 @@
{
"name": "SD3.5 Text to Image",
"author": "InvokeAI",
"description": "Sample text to image workflow for Stable Diffusion 3.5",
"version": "1.0.0",
"contact": "invoke@invoke.ai",
"tags": "text2image, SD3.5, default",
"notes": "",
"exposedFields": [
{
"nodeId": "3f22f668-0e02-4fde-a2bb-c339586ceb4c",
"fieldName": "model"
},
{
"nodeId": "e17d34e7-6ed1-493c-9a85-4fcd291cb084",
"fieldName": "prompt"
}
],
"meta": {
"version": "3.0.0",
"category": "default"
},
"id": "e3a51d6b-8208-4d6d-b187-fcfe8b32934c",
"nodes": [
{
"id": "3f22f668-0e02-4fde-a2bb-c339586ceb4c",
"type": "invocation",
"data": {
"id": "3f22f668-0e02-4fde-a2bb-c339586ceb4c",
"type": "sd3_model_loader",
"version": "1.0.0",
"label": "",
"notes": "",
"isOpen": true,
"isIntermediate": true,
"useCache": true,
"nodePack": "invokeai",
"inputs": {
"model": {
"name": "model",
"label": "",
"value": {
"key": "f7b20be9-92a8-4cfb-bca4-6c3b5535c10b",
"hash": "placeholder",
"name": "stable-diffusion-3.5-medium",
"base": "sd-3",
"type": "main"
}
},
"t5_encoder_model": {
"name": "t5_encoder_model",
"label": ""
},
"clip_l_model": {
"name": "clip_l_model",
"label": ""
},
"clip_g_model": {
"name": "clip_g_model",
"label": ""
},
"vae_model": {
"name": "vae_model",
"label": ""
}
}
},
"position": {
"x": -55.58689609637031,
"y": -111.53602444662268
}
},
{
"id": "f7e394ac-6394-4096-abcb-de0d346506b3",
"type": "invocation",
"data": {
"id": "f7e394ac-6394-4096-abcb-de0d346506b3",
"type": "rand_int",
"version": "1.0.1",
"label": "",
"notes": "",
"isOpen": true,
"isIntermediate": true,
"useCache": false,
"nodePack": "invokeai",
"inputs": {
"low": {
"name": "low",
"label": "",
"value": 0
},
"high": {
"name": "high",
"label": "",
"value": 2147483647
}
}
},
"position": {
"x": 470.45870147220353,
"y": 350.3141781644303
}
},
{
"id": "9eb72af0-dd9e-4ec5-ad87-d65e3c01f48b",
"type": "invocation",
"data": {
"id": "9eb72af0-dd9e-4ec5-ad87-d65e3c01f48b",
"type": "sd3_l2i",
"version": "1.3.0",
"label": "",
"notes": "",
"isOpen": true,
"isIntermediate": false,
"useCache": true,
"nodePack": "invokeai",
"inputs": {
"board": {
"name": "board",
"label": ""
},
"metadata": {
"name": "metadata",
"label": ""
},
"latents": {
"name": "latents",
"label": ""
},
"vae": {
"name": "vae",
"label": ""
}
}
},
"position": {
"x": 1192.3097009334897,
"y": -366.0994675072209
}
},
{
"id": "3b4f7f27-cfc0-4373-a009-99c5290d0cd6",
"type": "invocation",
"data": {
"id": "3b4f7f27-cfc0-4373-a009-99c5290d0cd6",
"type": "sd3_text_encoder",
"version": "1.0.0",
"label": "",
"notes": "",
"isOpen": true,
"isIntermediate": true,
"useCache": true,
"nodePack": "invokeai",
"inputs": {
"clip_l": {
"name": "clip_l",
"label": ""
},
"clip_g": {
"name": "clip_g",
"label": ""
},
"t5_encoder": {
"name": "t5_encoder",
"label": ""
},
"prompt": {
"name": "prompt",
"label": "",
"value": ""
}
}
},
"position": {
"x": 408.16054647924784,
"y": 65.06415352118786
}
},
{
"id": "e17d34e7-6ed1-493c-9a85-4fcd291cb084",
"type": "invocation",
"data": {
"id": "e17d34e7-6ed1-493c-9a85-4fcd291cb084",
"type": "sd3_text_encoder",
"version": "1.0.0",
"label": "",
"notes": "",
"isOpen": true,
"isIntermediate": true,
"useCache": true,
"nodePack": "invokeai",
"inputs": {
"clip_l": {
"name": "clip_l",
"label": ""
},
"clip_g": {
"name": "clip_g",
"label": ""
},
"t5_encoder": {
"name": "t5_encoder",
"label": ""
},
"prompt": {
"name": "prompt",
"label": "",
"value": ""
}
}
},
"position": {
"x": 378.9283412440941,
"y": -302.65777497352553
}
},
{
"id": "c7539f7b-7ac5-49b9-93eb-87ede611409f",
"type": "invocation",
"data": {
"id": "c7539f7b-7ac5-49b9-93eb-87ede611409f",
"type": "sd3_denoise",
"version": "1.0.0",
"label": "",
"notes": "",
"isOpen": true,
"isIntermediate": true,
"useCache": true,
"nodePack": "invokeai",
"inputs": {
"board": {
"name": "board",
"label": ""
},
"metadata": {
"name": "metadata",
"label": ""
},
"transformer": {
"name": "transformer",
"label": ""
},
"positive_conditioning": {
"name": "positive_conditioning",
"label": ""
},
"negative_conditioning": {
"name": "negative_conditioning",
"label": ""
},
"cfg_scale": {
"name": "cfg_scale",
"label": "",
"value": 3.5
},
"width": {
"name": "width",
"label": "",
"value": 1024
},
"height": {
"name": "height",
"label": "",
"value": 1024
},
"steps": {
"name": "steps",
"label": "",
"value": 30
},
"seed": {
"name": "seed",
"label": "",
"value": 0
}
}
},
"position": {
"x": 813.7814762740603,
"y": -142.20529727605867
}
}
],
"edges": [
{
"id": "reactflow__edge-3f22f668-0e02-4fde-a2bb-c339586ceb4cvae-9eb72af0-dd9e-4ec5-ad87-d65e3c01f48bvae",
"type": "default",
"source": "3f22f668-0e02-4fde-a2bb-c339586ceb4c",
"target": "9eb72af0-dd9e-4ec5-ad87-d65e3c01f48b",
"sourceHandle": "vae",
"targetHandle": "vae"
},
{
"id": "reactflow__edge-3f22f668-0e02-4fde-a2bb-c339586ceb4ct5_encoder-3b4f7f27-cfc0-4373-a009-99c5290d0cd6t5_encoder",
"type": "default",
"source": "3f22f668-0e02-4fde-a2bb-c339586ceb4c",
"target": "3b4f7f27-cfc0-4373-a009-99c5290d0cd6",
"sourceHandle": "t5_encoder",
"targetHandle": "t5_encoder"
},
{
"id": "reactflow__edge-3f22f668-0e02-4fde-a2bb-c339586ceb4ct5_encoder-e17d34e7-6ed1-493c-9a85-4fcd291cb084t5_encoder",
"type": "default",
"source": "3f22f668-0e02-4fde-a2bb-c339586ceb4c",
"target": "e17d34e7-6ed1-493c-9a85-4fcd291cb084",
"sourceHandle": "t5_encoder",
"targetHandle": "t5_encoder"
},
{
"id": "reactflow__edge-3f22f668-0e02-4fde-a2bb-c339586ceb4cclip_g-3b4f7f27-cfc0-4373-a009-99c5290d0cd6clip_g",
"type": "default",
"source": "3f22f668-0e02-4fde-a2bb-c339586ceb4c",
"target": "3b4f7f27-cfc0-4373-a009-99c5290d0cd6",
"sourceHandle": "clip_g",
"targetHandle": "clip_g"
},
{
"id": "reactflow__edge-3f22f668-0e02-4fde-a2bb-c339586ceb4cclip_g-e17d34e7-6ed1-493c-9a85-4fcd291cb084clip_g",
"type": "default",
"source": "3f22f668-0e02-4fde-a2bb-c339586ceb4c",
"target": "e17d34e7-6ed1-493c-9a85-4fcd291cb084",
"sourceHandle": "clip_g",
"targetHandle": "clip_g"
},
{
"id": "reactflow__edge-3f22f668-0e02-4fde-a2bb-c339586ceb4cclip_l-3b4f7f27-cfc0-4373-a009-99c5290d0cd6clip_l",
"type": "default",
"source": "3f22f668-0e02-4fde-a2bb-c339586ceb4c",
"target": "3b4f7f27-cfc0-4373-a009-99c5290d0cd6",
"sourceHandle": "clip_l",
"targetHandle": "clip_l"
},
{
"id": "reactflow__edge-3f22f668-0e02-4fde-a2bb-c339586ceb4cclip_l-e17d34e7-6ed1-493c-9a85-4fcd291cb084clip_l",
"type": "default",
"source": "3f22f668-0e02-4fde-a2bb-c339586ceb4c",
"target": "e17d34e7-6ed1-493c-9a85-4fcd291cb084",
"sourceHandle": "clip_l",
"targetHandle": "clip_l"
},
{
"id": "reactflow__edge-3f22f668-0e02-4fde-a2bb-c339586ceb4ctransformer-c7539f7b-7ac5-49b9-93eb-87ede611409ftransformer",
"type": "default",
"source": "3f22f668-0e02-4fde-a2bb-c339586ceb4c",
"target": "c7539f7b-7ac5-49b9-93eb-87ede611409f",
"sourceHandle": "transformer",
"targetHandle": "transformer"
},
{
"id": "reactflow__edge-f7e394ac-6394-4096-abcb-de0d346506b3value-c7539f7b-7ac5-49b9-93eb-87ede611409fseed",
"type": "default",
"source": "f7e394ac-6394-4096-abcb-de0d346506b3",
"target": "c7539f7b-7ac5-49b9-93eb-87ede611409f",
"sourceHandle": "value",
"targetHandle": "seed"
},
{
"id": "reactflow__edge-c7539f7b-7ac5-49b9-93eb-87ede611409flatents-9eb72af0-dd9e-4ec5-ad87-d65e3c01f48blatents",
"type": "default",
"source": "c7539f7b-7ac5-49b9-93eb-87ede611409f",
"target": "9eb72af0-dd9e-4ec5-ad87-d65e3c01f48b",
"sourceHandle": "latents",
"targetHandle": "latents"
},
{
"id": "reactflow__edge-e17d34e7-6ed1-493c-9a85-4fcd291cb084conditioning-c7539f7b-7ac5-49b9-93eb-87ede611409fpositive_conditioning",
"type": "default",
"source": "e17d34e7-6ed1-493c-9a85-4fcd291cb084",
"target": "c7539f7b-7ac5-49b9-93eb-87ede611409f",
"sourceHandle": "conditioning",
"targetHandle": "positive_conditioning"
},
{
"id": "reactflow__edge-3b4f7f27-cfc0-4373-a009-99c5290d0cd6conditioning-c7539f7b-7ac5-49b9-93eb-87ede611409fnegative_conditioning",
"type": "default",
"source": "3b4f7f27-cfc0-4373-a009-99c5290d0cd6",
"target": "c7539f7b-7ac5-49b9-93eb-87ede611409f",
"sourceHandle": "conditioning",
"targetHandle": "negative_conditioning"
}
]
}

View File

@@ -34,6 +34,25 @@ SD1_5_LATENT_RGB_FACTORS = [
[-0.1307, -0.1874, -0.7445], # L4
]
SD3_5_LATENT_RGB_FACTORS = [
[-0.05240681, 0.03251581, 0.0749016],
[-0.0580572, 0.00759826, 0.05729818],
[0.16144888, 0.01270368, -0.03768577],
[0.14418615, 0.08460266, 0.15941818],
[0.04894035, 0.0056485, -0.06686988],
[0.05187166, 0.19222395, 0.06261094],
[0.1539433, 0.04818359, 0.07103094],
[-0.08601796, 0.09013458, 0.10893912],
[-0.12398469, -0.06766567, 0.0033688],
[-0.0439737, 0.07825329, 0.02258823],
[0.03101129, 0.06382551, 0.07753657],
[-0.01315361, 0.08554491, -0.08772475],
[0.06464487, 0.05914605, 0.13262741],
[-0.07863674, -0.02261737, -0.12761454],
[-0.09923835, -0.08010759, -0.06264447],
[-0.03392309, -0.0804029, -0.06078822],
]
FLUX_LATENT_RGB_FACTORS = [
[-0.0412, 0.0149, 0.0521],
[0.0056, 0.0291, 0.0768],
@@ -110,6 +129,9 @@ def stable_diffusion_step_callback(
sdxl_latent_rgb_factors = torch.tensor(SDXL_LATENT_RGB_FACTORS, dtype=sample.dtype, device=sample.device)
sdxl_smooth_matrix = torch.tensor(SDXL_SMOOTH_MATRIX, dtype=sample.dtype, device=sample.device)
image = sample_to_lowres_estimated_image(sample, sdxl_latent_rgb_factors, sdxl_smooth_matrix)
elif base_model == BaseModelType.StableDiffusion3:
sd3_latent_rgb_factors = torch.tensor(SD3_5_LATENT_RGB_FACTORS, dtype=sample.dtype, device=sample.device)
image = sample_to_lowres_estimated_image(sample, sd3_latent_rgb_factors)
else:
v1_5_latent_rgb_factors = torch.tensor(SD1_5_LATENT_RGB_FACTORS, dtype=sample.dtype, device=sample.device)
image = sample_to_lowres_estimated_image(sample, v1_5_latent_rgb_factors)
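The SD3.5 factor table added above maps the 16 latent channels to RGB the same way the SD1.5/SDXL tables do. Conceptually the preview is a per-pixel matrix product; a rough sketch of that projection (assuming the factor list above is in scope; the real sample_to_lowres_estimated_image helper also handles batching and smoothing):

import torch

latents = torch.randn(16, 128, 128)                               # (channels, h, w) latent sample
factors = torch.tensor(SD3_5_LATENT_RGB_FACTORS)                  # (16, 3)
rgb = torch.einsum("chw,cr->rhw", latents, factors)               # (3, h, w) low-res RGB estimate
rgb = ((rgb - rgb.min()) / (rgb.max() - rgb.min()) * 255).byte()  # normalize to a displayable range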

View File

@@ -0,0 +1,138 @@
import einops
import torch
from invokeai.backend.flux.extensions.regional_prompting_extension import RegionalPromptingExtension
from invokeai.backend.flux.extensions.xlabs_ip_adapter_extension import XLabsIPAdapterExtension
from invokeai.backend.flux.math import attention
from invokeai.backend.flux.modules.layers import DoubleStreamBlock, SingleStreamBlock
class CustomDoubleStreamBlockProcessor:
"""A class containing a custom implementation of DoubleStreamBlock.forward() with additional features
(IP-Adapter, etc.).
"""
@staticmethod
def _double_stream_block_forward(
block: DoubleStreamBlock,
img: torch.Tensor,
txt: torch.Tensor,
vec: torch.Tensor,
pe: torch.Tensor,
attn_mask: torch.Tensor | None = None,
) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
"""This function is a direct copy of DoubleStreamBlock.forward(), but it returns some of the intermediate
values.
"""
img_mod1, img_mod2 = block.img_mod(vec)
txt_mod1, txt_mod2 = block.txt_mod(vec)
# prepare image for attention
img_modulated = block.img_norm1(img)
img_modulated = (1 + img_mod1.scale) * img_modulated + img_mod1.shift
img_qkv = block.img_attn.qkv(img_modulated)
img_q, img_k, img_v = einops.rearrange(img_qkv, "B L (K H D) -> K B H L D", K=3, H=block.num_heads)
img_q, img_k = block.img_attn.norm(img_q, img_k, img_v)
# prepare txt for attention
txt_modulated = block.txt_norm1(txt)
txt_modulated = (1 + txt_mod1.scale) * txt_modulated + txt_mod1.shift
txt_qkv = block.txt_attn.qkv(txt_modulated)
txt_q, txt_k, txt_v = einops.rearrange(txt_qkv, "B L (K H D) -> K B H L D", K=3, H=block.num_heads)
txt_q, txt_k = block.txt_attn.norm(txt_q, txt_k, txt_v)
# run actual attention
q = torch.cat((txt_q, img_q), dim=2)
k = torch.cat((txt_k, img_k), dim=2)
v = torch.cat((txt_v, img_v), dim=2)
attn = attention(q, k, v, pe=pe, attn_mask=attn_mask)
txt_attn, img_attn = attn[:, : txt.shape[1]], attn[:, txt.shape[1] :]
# calculate the img blocks
img = img + img_mod1.gate * block.img_attn.proj(img_attn)
img = img + img_mod2.gate * block.img_mlp((1 + img_mod2.scale) * block.img_norm2(img) + img_mod2.shift)
# calculate the txt blocks
txt = txt + txt_mod1.gate * block.txt_attn.proj(txt_attn)
txt = txt + txt_mod2.gate * block.txt_mlp((1 + txt_mod2.scale) * block.txt_norm2(txt) + txt_mod2.shift)
return img, txt, img_q
@staticmethod
def custom_double_block_forward(
timestep_index: int,
total_num_timesteps: int,
block_index: int,
block: DoubleStreamBlock,
img: torch.Tensor,
txt: torch.Tensor,
vec: torch.Tensor,
pe: torch.Tensor,
ip_adapter_extensions: list[XLabsIPAdapterExtension],
regional_prompting_extension: RegionalPromptingExtension,
) -> tuple[torch.Tensor, torch.Tensor]:
"""A custom implementation of DoubleStreamBlock.forward() with additional features:
- IP-Adapter support
"""
attn_mask = regional_prompting_extension.get_double_stream_attn_mask(block_index)
img, txt, img_q = CustomDoubleStreamBlockProcessor._double_stream_block_forward(
block, img, txt, vec, pe, attn_mask=attn_mask
)
# Apply IP-Adapter conditioning.
for ip_adapter_extension in ip_adapter_extensions:
img = ip_adapter_extension.run_ip_adapter(
timestep_index=timestep_index,
total_num_timesteps=total_num_timesteps,
block_index=block_index,
block=block,
img_q=img_q,
img=img,
)
return img, txt
class CustomSingleStreamBlockProcessor:
"""A class containing a custom implementation of SingleStreamBlock.forward() with additional features (masking,
etc.)
"""
@staticmethod
def _single_stream_block_forward(
block: SingleStreamBlock,
x: torch.Tensor,
vec: torch.Tensor,
pe: torch.Tensor,
attn_mask: torch.Tensor | None = None,
) -> torch.Tensor:
"""This function is a direct copy of SingleStreamBlock.forward()."""
mod, _ = block.modulation(vec)
x_mod = (1 + mod.scale) * block.pre_norm(x) + mod.shift
qkv, mlp = torch.split(block.linear1(x_mod), [3 * block.hidden_size, block.mlp_hidden_dim], dim=-1)
q, k, v = einops.rearrange(qkv, "B L (K H D) -> K B H L D", K=3, H=block.num_heads)
q, k = block.norm(q, k, v)
# compute attention
attn = attention(q, k, v, pe=pe, attn_mask=attn_mask)
# compute activation in mlp stream, cat again and run second linear layer
output = block.linear2(torch.cat((attn, block.mlp_act(mlp)), 2))
return x + mod.gate * output
@staticmethod
def custom_single_block_forward(
timestep_index: int,
total_num_timesteps: int,
block_index: int,
block: SingleStreamBlock,
img: torch.Tensor,
vec: torch.Tensor,
pe: torch.Tensor,
regional_prompting_extension: RegionalPromptingExtension,
) -> torch.Tensor:
"""A custom implementation of SingleStreamBlock.forward() with additional features:
- Masking
"""
attn_mask = regional_prompting_extension.get_single_stream_attn_mask(block_index)
return CustomSingleStreamBlockProcessor._single_stream_block_forward(block, img, vec, pe, attn_mask=attn_mask)

View File

@@ -1,3 +1,4 @@
import math
from typing import Callable
import torch
@@ -6,7 +7,9 @@ from tqdm import tqdm
from invokeai.backend.flux.controlnet.controlnet_flux_output import ControlNetFluxOutput, sum_controlnet_flux_outputs
from invokeai.backend.flux.extensions.inpaint_extension import InpaintExtension
from invokeai.backend.flux.extensions.instantx_controlnet_extension import InstantXControlNetExtension
from invokeai.backend.flux.extensions.regional_prompting_extension import RegionalPromptingExtension
from invokeai.backend.flux.extensions.xlabs_controlnet_extension import XLabsControlNetExtension
from invokeai.backend.flux.extensions.xlabs_ip_adapter_extension import XLabsIPAdapterExtension
from invokeai.backend.flux.model import Flux
from invokeai.backend.stable_diffusion.diffusers_pipeline import PipelineIntermediateState
@@ -16,15 +19,17 @@ def denoise(
# model input
img: torch.Tensor,
img_ids: torch.Tensor,
txt: torch.Tensor,
txt_ids: torch.Tensor,
vec: torch.Tensor,
pos_regional_prompting_extension: RegionalPromptingExtension,
neg_regional_prompting_extension: RegionalPromptingExtension | None,
# sampling parameters
timesteps: list[float],
step_callback: Callable[[PipelineIntermediateState], None],
guidance: float,
cfg_scale: list[float],
inpaint_extension: InpaintExtension | None,
controlnet_extensions: list[XLabsControlNetExtension | InstantXControlNetExtension],
pos_ip_adapter_extensions: list[XLabsIPAdapterExtension],
neg_ip_adapter_extensions: list[XLabsIPAdapterExtension],
):
# step 0 is the initial state
total_steps = len(timesteps) - 1
@@ -37,10 +42,9 @@ def denoise(
latents=img,
),
)
step = 1
# guidance_vec is ignored for schnell.
guidance_vec = torch.full((img.shape[0],), guidance, device=img.device, dtype=img.dtype)
for t_curr, t_prev in tqdm(list(zip(timesteps[:-1], timesteps[1:], strict=True))):
for step_index, (t_curr, t_prev) in tqdm(list(enumerate(zip(timesteps[:-1], timesteps[1:], strict=True)))):
t_vec = torch.full((img.shape[0],), t_curr, dtype=img.dtype, device=img.device)
# Run ControlNet models.
@@ -48,20 +52,20 @@ def denoise(
for controlnet_extension in controlnet_extensions:
controlnet_residuals.append(
controlnet_extension.run_controlnet(
timestep_index=step - 1,
timestep_index=step_index,
total_num_timesteps=total_steps,
img=img,
img_ids=img_ids,
txt=txt,
txt_ids=txt_ids,
y=vec,
txt=pos_regional_prompting_extension.regional_text_conditioning.t5_embeddings,
txt_ids=pos_regional_prompting_extension.regional_text_conditioning.t5_txt_ids,
y=pos_regional_prompting_extension.regional_text_conditioning.clip_embeddings,
timesteps=t_vec,
guidance=guidance_vec,
)
)
# Merge the ControlNet residuals from multiple ControlNets.
# TODO(ryand): We may want to alculate the sum just-in-time to keep peak memory low. Keep in mind, that the
# TODO(ryand): We may want to calculate the sum just-in-time to keep peak memory low. Keep in mind, that the
# controlnet_residuals datastructure is efficient in that it likely contains multiple references to the same
# tensors. Calculating the sum materializes each tensor into its own instance.
merged_controlnet_residuals = sum_controlnet_flux_outputs(controlnet_residuals)
@@ -69,15 +73,46 @@ def denoise(
pred = model(
img=img,
img_ids=img_ids,
txt=txt,
txt_ids=txt_ids,
y=vec,
txt=pos_regional_prompting_extension.regional_text_conditioning.t5_embeddings,
txt_ids=pos_regional_prompting_extension.regional_text_conditioning.t5_txt_ids,
y=pos_regional_prompting_extension.regional_text_conditioning.clip_embeddings,
timesteps=t_vec,
guidance=guidance_vec,
timestep_index=step_index,
total_num_timesteps=total_steps,
controlnet_double_block_residuals=merged_controlnet_residuals.double_block_residuals,
controlnet_single_block_residuals=merged_controlnet_residuals.single_block_residuals,
ip_adapter_extensions=pos_ip_adapter_extensions,
regional_prompting_extension=pos_regional_prompting_extension,
)
step_cfg_scale = cfg_scale[step_index]
# If step_cfg_scale is 1.0, then we don't need to run the negative prediction.
if not math.isclose(step_cfg_scale, 1.0):
# TODO(ryand): Add option to run positive and negative predictions in a single batch for better performance
# on systems with sufficient VRAM.
if neg_regional_prompting_extension is None:
raise ValueError("Negative text conditioning is required when cfg_scale is not 1.0.")
neg_pred = model(
img=img,
img_ids=img_ids,
txt=neg_regional_prompting_extension.regional_text_conditioning.t5_embeddings,
txt_ids=neg_regional_prompting_extension.regional_text_conditioning.t5_txt_ids,
y=neg_regional_prompting_extension.regional_text_conditioning.clip_embeddings,
timesteps=t_vec,
guidance=guidance_vec,
timestep_index=step_index,
total_num_timesteps=total_steps,
controlnet_double_block_residuals=None,
controlnet_single_block_residuals=None,
ip_adapter_extensions=neg_ip_adapter_extensions,
regional_prompting_extension=neg_regional_prompting_extension,
)
pred = neg_pred + step_cfg_scale * (pred - neg_pred)
preview_img = img - t_curr * pred
img = img + (t_prev - t_curr) * pred
@@ -87,13 +122,12 @@ def denoise(
step_callback(
PipelineIntermediateState(
step=step,
step=step_index + 1,
order=1,
total_steps=total_steps,
timestep=int(t_curr),
latents=preview_img,
),
)
step += 1
return img
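The per-step math added above is standard classifier-free guidance followed by a rectified-flow Euler update. A small numeric sketch of one step (pure-tensor illustration, not part of the diff):

import torch

img = torch.zeros(1, 4)
pred = torch.full((1, 4), 2.0)
neg_pred = torch.full((1, 4), 1.0)
cfg_scale, t_curr, t_prev = 3.5, 1.0, 0.9

guided = neg_pred + cfg_scale * (pred - neg_pred)  # CFG: 1.0 + 3.5 * (2.0 - 1.0) = 4.5
preview = img - t_curr * guided                    # fully denoised estimate used for previews: -4.5
img = img + (t_prev - t_curr) * guided             # Euler step toward t_prev: 0.0 + (-0.1) * 4.5 = -0.45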

View File

@@ -0,0 +1,276 @@
from typing import Optional
import torch
import torchvision
from invokeai.backend.flux.text_conditioning import FluxRegionalTextConditioning, FluxTextConditioning
from invokeai.backend.stable_diffusion.diffusion.conditioning_data import Range
from invokeai.backend.util.devices import TorchDevice
from invokeai.backend.util.mask import to_standard_float_mask
class RegionalPromptingExtension:
"""A class for managing regional prompting with FLUX.
This implementation is inspired by https://arxiv.org/pdf/2411.02395 (though there are significant differences).
"""
def __init__(
self,
regional_text_conditioning: FluxRegionalTextConditioning,
restricted_attn_mask: torch.Tensor | None = None,
):
self.regional_text_conditioning = regional_text_conditioning
self.restricted_attn_mask = restricted_attn_mask
def get_double_stream_attn_mask(self, block_index: int) -> torch.Tensor | None:
order = [self.restricted_attn_mask, None]
return order[block_index % len(order)]
def get_single_stream_attn_mask(self, block_index: int) -> torch.Tensor | None:
order = [self.restricted_attn_mask, None]
return order[block_index % len(order)]
@classmethod
def from_text_conditioning(cls, text_conditioning: list[FluxTextConditioning], img_seq_len: int):
"""Create a RegionalPromptingExtension from a list of text conditionings.
Args:
text_conditioning (list[FluxTextConditioning]): The text conditionings to use for regional prompting.
img_seq_len (int): The image sequence length (i.e. packed_height * packed_width).
"""
regional_text_conditioning = cls._concat_regional_text_conditioning(text_conditioning)
attn_mask_with_restricted_img_self_attn = cls._prepare_restricted_attn_mask(
regional_text_conditioning, img_seq_len
)
return cls(
regional_text_conditioning=regional_text_conditioning,
restricted_attn_mask=attn_mask_with_restricted_img_self_attn,
)
# Keeping _prepare_unrestricted_attn_mask for reference as an alternative masking strategy:
#
# @classmethod
# def _prepare_unrestricted_attn_mask(
# cls,
# regional_text_conditioning: FluxRegionalTextConditioning,
# img_seq_len: int,
# ) -> torch.Tensor:
# """Prepare an 'unrestricted' attention mask. In this context, 'unrestricted' means that:
# - img self-attention is not masked.
# - img regions attend to both txt within their own region and to global prompts.
# """
# device = TorchDevice.choose_torch_device()
# # Infer txt_seq_len from the t5_embeddings tensor.
# txt_seq_len = regional_text_conditioning.t5_embeddings.shape[1]
# # In the attention blocks, the txt seq and img seq are concatenated and then attention is applied.
# # Concatenation happens in the following order: [txt_seq, img_seq].
# # There are 4 portions of the attention mask to consider as we prepare it:
# # 1. txt attends to itself
# # 2. txt attends to corresponding regional img
# # 3. regional img attends to corresponding txt
# # 4. regional img attends to itself
# # Initialize empty attention mask.
# regional_attention_mask = torch.zeros(
# (txt_seq_len + img_seq_len, txt_seq_len + img_seq_len), device=device, dtype=torch.float16
# )
# for image_mask, t5_embedding_range in zip(
# regional_text_conditioning.image_masks, regional_text_conditioning.t5_embedding_ranges, strict=True
# ):
# # 1. txt attends to itself
# regional_attention_mask[
# t5_embedding_range.start : t5_embedding_range.end, t5_embedding_range.start : t5_embedding_range.end
# ] = 1.0
# # 2. txt attends to corresponding regional img
# # Note that we reshape to (1, img_seq_len) to ensure broadcasting works as desired.
# fill_value = image_mask.view(1, img_seq_len) if image_mask is not None else 1.0
# regional_attention_mask[t5_embedding_range.start : t5_embedding_range.end, txt_seq_len:] = fill_value
# # 3. regional img attends to corresponding txt
# # Note that we reshape to (img_seq_len, 1) to ensure broadcasting works as desired.
# fill_value = image_mask.view(img_seq_len, 1) if image_mask is not None else 1.0
# regional_attention_mask[txt_seq_len:, t5_embedding_range.start : t5_embedding_range.end] = fill_value
# # 4. regional img attends to itself
# # Allow unrestricted img self attention.
# regional_attention_mask[txt_seq_len:, txt_seq_len:] = 1.0
# # Convert attention mask to boolean.
# regional_attention_mask = regional_attention_mask > 0.5
# return regional_attention_mask
@classmethod
def _prepare_restricted_attn_mask(
cls,
regional_text_conditioning: FluxRegionalTextConditioning,
img_seq_len: int,
) -> torch.Tensor | None:
"""Prepare a 'restricted' attention mask. In this context, 'restricted' means that:
- img self-attention is only allowed within regions.
- img regions only attend to txt within their own region, not to global prompts.
"""
# Identify the background region, i.e. the region that is not covered by any of the region masks.
background_region_mask: None | torch.Tensor = None
for image_mask in regional_text_conditioning.image_masks:
if image_mask is not None:
if background_region_mask is None:
background_region_mask = torch.ones_like(image_mask)
background_region_mask *= 1 - image_mask
if background_region_mask is None:
# There are no region masks, short-circuit and return None.
# TODO(ryand): We could restrict txt-txt attention across multiple global prompts, but this is a rare
# use case and would make the logic here significantly more complicated.
return None
device = TorchDevice.choose_torch_device()
# Infer txt_seq_len from the t5_embeddings tensor.
txt_seq_len = regional_text_conditioning.t5_embeddings.shape[1]
# In the attention blocks, the txt seq and img seq are concatenated and then attention is applied.
# Concatenation happens in the following order: [txt_seq, img_seq].
# There are 4 portions of the attention mask to consider as we prepare it:
# 1. txt attends to itself
# 2. txt attends to corresponding regional img
# 3. regional img attends to corresponding txt
# 4. regional img attends to itself
# Initialize empty attention mask.
regional_attention_mask = torch.zeros(
(txt_seq_len + img_seq_len, txt_seq_len + img_seq_len), device=device, dtype=torch.float16
)
for image_mask, t5_embedding_range in zip(
regional_text_conditioning.image_masks, regional_text_conditioning.t5_embedding_ranges, strict=True
):
# 1. txt attends to itself
regional_attention_mask[
t5_embedding_range.start : t5_embedding_range.end, t5_embedding_range.start : t5_embedding_range.end
] = 1.0
if image_mask is not None:
# 2. txt attends to corresponding regional img
# Note that we reshape to (1, img_seq_len) to ensure broadcasting works as desired.
regional_attention_mask[t5_embedding_range.start : t5_embedding_range.end, txt_seq_len:] = (
image_mask.view(1, img_seq_len)
)
# 3. regional img attends to corresponding txt
# Note that we reshape to (img_seq_len, 1) to ensure broadcasting works as desired.
regional_attention_mask[txt_seq_len:, t5_embedding_range.start : t5_embedding_range.end] = (
image_mask.view(img_seq_len, 1)
)
# 4. regional img attends to itself
image_mask = image_mask.view(img_seq_len, 1)
regional_attention_mask[txt_seq_len:, txt_seq_len:] += image_mask @ image_mask.T
else:
# We don't allow attention between non-background image regions and global prompts. This helps to ensure
# that regions focus on their local prompts. We do, however, allow attention between background regions
# and global prompts. If we didn't do this, then the background regions would not attend to any txt
# embeddings, which we found experimentally to cause artifacts.
# 2. global txt attends to background region
# Note that we reshape to (1, img_seq_len) to ensure broadcasting works as desired.
regional_attention_mask[t5_embedding_range.start : t5_embedding_range.end, txt_seq_len:] = (
background_region_mask.view(1, img_seq_len)
)
# 3. background region attends to global txt
# Note that we reshape to (img_seq_len, 1) to ensure broadcasting works as desired.
regional_attention_mask[txt_seq_len:, t5_embedding_range.start : t5_embedding_range.end] = (
background_region_mask.view(img_seq_len, 1)
)
# Allow background regions to attend to themselves.
regional_attention_mask[txt_seq_len:, txt_seq_len:] += background_region_mask.view(img_seq_len, 1)
regional_attention_mask[txt_seq_len:, txt_seq_len:] += background_region_mask.view(1, img_seq_len)
# Convert attention mask to boolean.
regional_attention_mask = regional_attention_mask > 0.5
return regional_attention_mask
@classmethod
def _concat_regional_text_conditioning(
cls,
text_conditionings: list[FluxTextConditioning],
) -> FluxRegionalTextConditioning:
"""Concatenate regional text conditioning data into a single conditioning tensor (with associated masks)."""
concat_t5_embeddings: list[torch.Tensor] = []
concat_t5_embedding_ranges: list[Range] = []
image_masks: list[torch.Tensor | None] = []
# Choose global CLIP embedding.
# Use the first global prompt's CLIP embedding as the global CLIP embedding. If there is no global prompt, use
# the first prompt's CLIP embedding.
global_clip_embedding: torch.Tensor = text_conditionings[0].clip_embeddings
for text_conditioning in text_conditionings:
if text_conditioning.mask is None:
global_clip_embedding = text_conditioning.clip_embeddings
break
cur_t5_embedding_len = 0
for text_conditioning in text_conditionings:
concat_t5_embeddings.append(text_conditioning.t5_embeddings)
concat_t5_embedding_ranges.append(
Range(start=cur_t5_embedding_len, end=cur_t5_embedding_len + text_conditioning.t5_embeddings.shape[1])
)
image_masks.append(text_conditioning.mask)
cur_t5_embedding_len += text_conditioning.t5_embeddings.shape[1]
t5_embeddings = torch.cat(concat_t5_embeddings, dim=1)
# Initialize the txt_ids tensor.
pos_bs, pos_t5_seq_len, _ = t5_embeddings.shape
t5_txt_ids = torch.zeros(
pos_bs, pos_t5_seq_len, 3, dtype=t5_embeddings.dtype, device=TorchDevice.choose_torch_device()
)
return FluxRegionalTextConditioning(
t5_embeddings=t5_embeddings,
clip_embeddings=global_clip_embedding,
t5_txt_ids=t5_txt_ids,
image_masks=image_masks,
t5_embedding_ranges=concat_t5_embedding_ranges,
)
@staticmethod
def preprocess_regional_prompt_mask(
mask: Optional[torch.Tensor], packed_height: int, packed_width: int, dtype: torch.dtype, device: torch.device
) -> torch.Tensor:
"""Preprocess a regional prompt mask to match the target height and width.
If mask is None, returns a mask of all ones with the target height and width.
If mask is not None, resizes the mask to the target height and width using 'nearest' interpolation.
packed_height and packed_width are the target height and width of the mask in the 'packed' latent space.
Returns:
torch.Tensor: The processed mask. shape: (1, 1, packed_height * packed_width).
"""
if mask is None:
return torch.ones((1, 1, packed_height * packed_width), dtype=dtype, device=device)
mask = to_standard_float_mask(mask, out_dtype=dtype)
tf = torchvision.transforms.Resize(
(packed_height, packed_width), interpolation=torchvision.transforms.InterpolationMode.NEAREST
)
# Add a batch dimension to the mask, because torchvision expects shape (batch, channels, h, w).
mask = mask.unsqueeze(0) # Shape: (1, h, w) -> (1, 1, h, w)
resized_mask = tf(mask)
# Flatten the height and width dimensions into a single image_seq_len dimension.
return resized_mask.flatten(start_dim=2)
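A small usage sketch for the mask helper above, where the shapes are the important part (assuming a (1, h, w) mask on input, as the helper's comments suggest):

import torch

mask = torch.zeros(1, 64, 64)   # one region mask, (1, h, w)
mask[:, 16:48, 16:48] = 1.0
packed = RegionalPromptingExtension.preprocess_regional_prompt_mask(
    mask, packed_height=32, packed_width=32, dtype=torch.float16, device=torch.device("cpu")
)
assert packed.shape == (1, 1, 32 * 32)  # flattened to one img_seq_len dimension per region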

View File

@@ -0,0 +1,89 @@
import math
from typing import List, Union
import einops
import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection
from invokeai.backend.flux.ip_adapter.xlabs_ip_adapter_flux import XlabsIpAdapterFlux
from invokeai.backend.flux.modules.layers import DoubleStreamBlock
class XLabsIPAdapterExtension:
def __init__(
self,
model: XlabsIpAdapterFlux,
image_prompt_clip_embed: torch.Tensor,
weight: Union[float, List[float]],
begin_step_percent: float,
end_step_percent: float,
):
self._model = model
self._image_prompt_clip_embed = image_prompt_clip_embed
self._weight = weight
self._begin_step_percent = begin_step_percent
self._end_step_percent = end_step_percent
self._image_proj: torch.Tensor | None = None
def _get_weight(self, timestep_index: int, total_num_timesteps: int) -> float:
first_step = math.floor(self._begin_step_percent * total_num_timesteps)
last_step = math.ceil(self._end_step_percent * total_num_timesteps)
if timestep_index < first_step or timestep_index > last_step:
return 0.0
if isinstance(self._weight, list):
return self._weight[timestep_index]
return self._weight
@staticmethod
def run_clip_image_encoder(
pil_image: List[Image.Image], image_encoder: CLIPVisionModelWithProjection
) -> torch.Tensor:
clip_image_processor = CLIPImageProcessor()
clip_image: torch.Tensor = clip_image_processor(images=pil_image, return_tensors="pt").pixel_values
clip_image = clip_image.to(device=image_encoder.device, dtype=image_encoder.dtype)
clip_image_embeds = image_encoder(clip_image).image_embeds
return clip_image_embeds
def run_image_proj(self, dtype: torch.dtype):
image_prompt_clip_embed = self._image_prompt_clip_embed.to(dtype=dtype)
self._image_proj = self._model.image_proj(image_prompt_clip_embed)
def run_ip_adapter(
self,
timestep_index: int,
total_num_timesteps: int,
block_index: int,
block: DoubleStreamBlock,
img_q: torch.Tensor,
img: torch.Tensor,
) -> torch.Tensor:
"""The logic in this function is based on:
https://github.com/XLabs-AI/x-flux/blob/47495425dbed499be1e8e5a6e52628b07349cba2/src/flux/modules/layers.py#L245-L301
"""
weight = self._get_weight(timestep_index=timestep_index, total_num_timesteps=total_num_timesteps)
if weight < 1e-6:
return img
ip_adapter_block = self._model.ip_adapter_double_blocks.double_blocks[block_index]
ip_key = ip_adapter_block.ip_adapter_double_stream_k_proj(self._image_proj)
ip_value = ip_adapter_block.ip_adapter_double_stream_v_proj(self._image_proj)
# Reshape projections for multi-head attention.
ip_key = einops.rearrange(ip_key, "B L (H D) -> B H L D", H=block.num_heads)
ip_value = einops.rearrange(ip_value, "B L (H D) -> B H L D", H=block.num_heads)
# Compute attention between IP projections and the latent query.
ip_attn = torch.nn.functional.scaled_dot_product_attention(
img_q, ip_key, ip_value, dropout_p=0.0, is_causal=False
)
ip_attn = einops.rearrange(ip_attn, "B H L D -> B L (H D)", H=block.num_heads)
img = img + weight * ip_attn
return img
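The begin/end step gating in _get_weight(...) can be illustrated with a small standalone sketch (scalar weight only; the per-step weight list case is omitted):
import math

def ip_adapter_weight(timestep_index: int, total_num_timesteps: int, weight: float,
                      begin_step_percent: float, end_step_percent: float) -> float:
    # Zero outside the [begin, end] window, mirroring the gating above.
    first_step = math.floor(begin_step_percent * total_num_timesteps)
    last_step = math.ceil(end_step_percent * total_num_timesteps)
    if timestep_index < first_step or timestep_index > last_step:
        return 0.0
    return weight

# With begin=0.0 and end=0.5 over 30 steps, the adapter is active for steps 0..15 only.
print([ip_adapter_weight(i, 30, 0.6, 0.0, 0.5) for i in (0, 10, 15, 16, 29)])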

View File

@@ -0,0 +1,93 @@
# This file is based on:
# https://github.com/XLabs-AI/x-flux/blob/47495425dbed499be1e8e5a6e52628b07349cba2/src/flux/modules/layers.py#L221
import einops
import torch
from invokeai.backend.flux.math import attention
from invokeai.backend.flux.modules.layers import DoubleStreamBlock
class IPDoubleStreamBlockProcessor(torch.nn.Module):
"""Attention processor for handling IP-adapter with double stream block."""
def __init__(self, context_dim: int, hidden_dim: int):
super().__init__()
# Ensure context_dim matches the dimension of image_proj
self.context_dim = context_dim
self.hidden_dim = hidden_dim
# Initialize projections for IP-adapter
self.ip_adapter_double_stream_k_proj = torch.nn.Linear(context_dim, hidden_dim, bias=True)
self.ip_adapter_double_stream_v_proj = torch.nn.Linear(context_dim, hidden_dim, bias=True)
torch.nn.init.zeros_(self.ip_adapter_double_stream_k_proj.weight)
torch.nn.init.zeros_(self.ip_adapter_double_stream_k_proj.bias)
torch.nn.init.zeros_(self.ip_adapter_double_stream_v_proj.weight)
torch.nn.init.zeros_(self.ip_adapter_double_stream_v_proj.bias)
def __call__(
self,
attn: DoubleStreamBlock,
img: torch.Tensor,
txt: torch.Tensor,
vec: torch.Tensor,
pe: torch.Tensor,
image_proj: torch.Tensor,
ip_scale: float = 1.0,
):
# Prepare image for attention
img_mod1, img_mod2 = attn.img_mod(vec)
txt_mod1, txt_mod2 = attn.txt_mod(vec)
img_modulated = attn.img_norm1(img)
img_modulated = (1 + img_mod1.scale) * img_modulated + img_mod1.shift
img_qkv = attn.img_attn.qkv(img_modulated)
img_q, img_k, img_v = einops.rearrange(
img_qkv, "B L (K H D) -> K B H L D", K=3, H=attn.num_heads, D=attn.head_dim
)
img_q, img_k = attn.img_attn.norm(img_q, img_k, img_v)
txt_modulated = attn.txt_norm1(txt)
txt_modulated = (1 + txt_mod1.scale) * txt_modulated + txt_mod1.shift
txt_qkv = attn.txt_attn.qkv(txt_modulated)
txt_q, txt_k, txt_v = einops.rearrange(
txt_qkv, "B L (K H D) -> K B H L D", K=3, H=attn.num_heads, D=attn.head_dim
)
txt_q, txt_k = attn.txt_attn.norm(txt_q, txt_k, txt_v)
q = torch.cat((txt_q, img_q), dim=2)
k = torch.cat((txt_k, img_k), dim=2)
v = torch.cat((txt_v, img_v), dim=2)
attn1 = attention(q, k, v, pe=pe)
txt_attn, img_attn = attn1[:, : txt.shape[1]], attn1[:, txt.shape[1] :]
# print(f"txt_attn shape: {txt_attn.size()}")
# print(f"img_attn shape: {img_attn.size()}")
img = img + img_mod1.gate * attn.img_attn.proj(img_attn)
img = img + img_mod2.gate * attn.img_mlp((1 + img_mod2.scale) * attn.img_norm2(img) + img_mod2.shift)
txt = txt + txt_mod1.gate * attn.txt_attn.proj(txt_attn)
txt = txt + txt_mod2.gate * attn.txt_mlp((1 + txt_mod2.scale) * attn.txt_norm2(txt) + txt_mod2.shift)
# IP-adapter processing
ip_query = img_q # latent sample query
ip_key = self.ip_adapter_double_stream_k_proj(image_proj)
ip_value = self.ip_adapter_double_stream_v_proj(image_proj)
# Reshape projections for multi-head attention
ip_key = einops.rearrange(ip_key, "B L (H D) -> B H L D", H=attn.num_heads, D=attn.head_dim)
ip_value = einops.rearrange(ip_value, "B L (H D) -> B H L D", H=attn.num_heads, D=attn.head_dim)
# Compute attention between IP projections and the latent query
ip_attention = torch.nn.functional.scaled_dot_product_attention(
ip_query, ip_key, ip_value, dropout_p=0.0, is_causal=False
)
ip_attention = einops.rearrange(ip_attention, "B H L D -> B L (H D)", H=attn.num_heads, D=attn.head_dim)
img = img + ip_scale * ip_attention
return img, txt
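Because the k/v projections above are zero-initialized, the IP branch contributes nothing until real adapter weights are loaded. A quick standalone check of that identity:
import torch

B, H, L, D = 1, 2, 8, 16
img_q = torch.randn(B, H, L, D)
ip_key = torch.zeros(B, H, 4, D)    # what a zero-initialized k projection produces
ip_value = torch.zeros(B, H, 4, D)  # what a zero-initialized v projection produces
ip_attn = torch.nn.functional.scaled_dot_product_attention(img_q, ip_key, ip_value)
assert torch.allclose(ip_attn, torch.zeros_like(ip_attn))  # the residual added to img is zero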

View File

@@ -0,0 +1,52 @@
from typing import Any, Dict
import torch
from invokeai.backend.flux.ip_adapter.xlabs_ip_adapter_flux import XlabsIpAdapterParams
def is_state_dict_xlabs_ip_adapter(sd: Dict[str, Any]) -> bool:
"""Is the state dict for an XLabs FLUX IP-Adapter model?
This is intended to be a reasonably high-precision detector, but it is not guaranteed to have perfect precision.
"""
# If all of the expected keys are present, then this is very likely an XLabs IP-Adapter model.
expected_keys = {
"double_blocks.0.processor.ip_adapter_double_stream_k_proj.bias",
"double_blocks.0.processor.ip_adapter_double_stream_k_proj.weight",
"double_blocks.0.processor.ip_adapter_double_stream_v_proj.bias",
"double_blocks.0.processor.ip_adapter_double_stream_v_proj.weight",
"ip_adapter_proj_model.norm.bias",
"ip_adapter_proj_model.norm.weight",
"ip_adapter_proj_model.proj.bias",
"ip_adapter_proj_model.proj.weight",
}
if expected_keys.issubset(sd.keys()):
return True
return False
def infer_xlabs_ip_adapter_params_from_state_dict(state_dict: dict[str, torch.Tensor]) -> XlabsIpAdapterParams:
num_double_blocks = 0
context_dim = 0
hidden_dim = 0
# Count the number of double blocks.
double_block_index = 0
while f"double_blocks.{double_block_index}.processor.ip_adapter_double_stream_k_proj.weight" in state_dict:
double_block_index += 1
num_double_blocks = double_block_index
hidden_dim = state_dict["double_blocks.0.processor.ip_adapter_double_stream_k_proj.weight"].shape[0]
context_dim = state_dict["double_blocks.0.processor.ip_adapter_double_stream_k_proj.weight"].shape[1]
clip_embeddings_dim = state_dict["ip_adapter_proj_model.proj.weight"].shape[1]
clip_extra_context_tokens = state_dict["ip_adapter_proj_model.proj.weight"].shape[0] // context_dim
return XlabsIpAdapterParams(
num_double_blocks=num_double_blocks,
context_dim=context_dim,
hidden_dim=hidden_dim,
clip_embeddings_dim=clip_embeddings_dim,
clip_extra_context_tokens=clip_extra_context_tokens,
)
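The shape-based inference above can be exercised with a minimal fake state dict (illustrative shapes only: hidden_dim 3072, context_dim 4096, CLIP dim 768, 4 extra context tokens):
import torch

sd = {
    "double_blocks.0.processor.ip_adapter_double_stream_k_proj.weight": torch.zeros(3072, 4096),
    "ip_adapter_proj_model.proj.weight": torch.zeros(4 * 4096, 768),
}
hidden_dim, context_dim = sd["double_blocks.0.processor.ip_adapter_double_stream_k_proj.weight"].shape
clip_embeddings_dim = sd["ip_adapter_proj_model.proj.weight"].shape[1]
clip_extra_context_tokens = sd["ip_adapter_proj_model.proj.weight"].shape[0] // context_dim
print(hidden_dim, context_dim, clip_embeddings_dim, clip_extra_context_tokens)  # 3072 4096 768 4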

View File

@@ -0,0 +1,70 @@
from dataclasses import dataclass
import torch
from invokeai.backend.ip_adapter.ip_adapter import ImageProjModel
class IPDoubleStreamBlock(torch.nn.Module):
def __init__(self, context_dim: int, hidden_dim: int):
super().__init__()
self.context_dim = context_dim
self.hidden_dim = hidden_dim
self.ip_adapter_double_stream_k_proj = torch.nn.Linear(context_dim, hidden_dim, bias=True)
self.ip_adapter_double_stream_v_proj = torch.nn.Linear(context_dim, hidden_dim, bias=True)
class IPAdapterDoubleBlocks(torch.nn.Module):
def __init__(self, num_double_blocks: int, context_dim: int, hidden_dim: int):
super().__init__()
self.double_blocks = torch.nn.ModuleList(
[IPDoubleStreamBlock(context_dim, hidden_dim) for _ in range(num_double_blocks)]
)
@dataclass
class XlabsIpAdapterParams:
num_double_blocks: int
context_dim: int
hidden_dim: int
clip_embeddings_dim: int
clip_extra_context_tokens: int
class XlabsIpAdapterFlux(torch.nn.Module):
def __init__(self, params: XlabsIpAdapterParams):
super().__init__()
self.image_proj = ImageProjModel(
cross_attention_dim=params.context_dim,
clip_embeddings_dim=params.clip_embeddings_dim,
clip_extra_context_tokens=params.clip_extra_context_tokens,
)
self.ip_adapter_double_blocks = IPAdapterDoubleBlocks(
num_double_blocks=params.num_double_blocks, context_dim=params.context_dim, hidden_dim=params.hidden_dim
)
def load_xlabs_state_dict(self, state_dict: dict[str, torch.Tensor], assign: bool = False):
"""We need this custom function to load state dicts rather than using .load_state_dict(...) because the model
structure does not match the state_dict structure.
"""
# Split the state_dict into the image projection model and the double blocks.
image_proj_sd: dict[str, torch.Tensor] = {}
double_blocks_sd: dict[str, torch.Tensor] = {}
for k, v in state_dict.items():
if k.startswith("ip_adapter_proj_model."):
image_proj_sd[k] = v
elif k.startswith("double_blocks."):
double_blocks_sd[k] = v
else:
raise ValueError(f"Unexpected key: {k}")
# Initialize the image projection model.
image_proj_sd = {k.replace("ip_adapter_proj_model.", ""): v for k, v in image_proj_sd.items()}
self.image_proj.load_state_dict(image_proj_sd, assign=assign)
# Initialize the double blocks.
double_blocks_sd = {k.replace("processor.", ""): v for k, v in double_blocks_sd.items()}
self.ip_adapter_double_blocks.load_state_dict(double_blocks_sd, assign=assign)

View File

@@ -5,10 +5,10 @@ from einops import rearrange
from torch import Tensor
def attention(q: Tensor, k: Tensor, v: Tensor, pe: Tensor) -> Tensor:
def attention(q: Tensor, k: Tensor, v: Tensor, pe: Tensor, attn_mask: Tensor | None = None) -> Tensor:
q, k = apply_rope(q, k, pe)
x = torch.nn.functional.scaled_dot_product_attention(q, k, v)
x = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
x = rearrange(x, "B H L D -> B L (H D)")
return x
@@ -24,12 +24,12 @@ def rope(pos: Tensor, dim: int, theta: int) -> Tensor:
out = torch.einsum("...n,d->...nd", pos, omega)
out = torch.stack([torch.cos(out), -torch.sin(out), torch.sin(out), torch.cos(out)], dim=-1)
out = rearrange(out, "b n d (i j) -> b n d i j", i=2, j=2)
return out.float()
return out.to(dtype=pos.dtype, device=pos.device)
def apply_rope(xq: Tensor, xk: Tensor, freqs_cis: Tensor) -> tuple[Tensor, Tensor]:
xq_ = xq.float().reshape(*xq.shape[:-1], -1, 1, 2)
xk_ = xk.float().reshape(*xk.shape[:-1], -1, 1, 2)
xq_ = xq.view(*xq.shape[:-1], -1, 1, 2)
xk_ = xk.view(*xk.shape[:-1], -1, 1, 2)
xq_out = freqs_cis[..., 0] * xq_[..., 0] + freqs_cis[..., 1] * xq_[..., 1]
xk_out = freqs_cis[..., 0] * xk_[..., 0] + freqs_cis[..., 1] * xk_[..., 1]
return xq_out.reshape(*xq.shape).type_as(xq), xk_out.reshape(*xk.shape).type_as(xk)
return xq_out.view(*xq.shape), xk_out.view(*xk.shape)
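The new attn_mask parameter threads a broadcastable boolean mask into scaled_dot_product_attention. A standalone sketch of the call pattern, with illustrative shapes:
import torch

B, H, L, D = 1, 4, 6, 8
q, k, v = (torch.randn(B, H, L, D) for _ in range(3))
attn_mask = torch.ones(1, 1, L, L, dtype=torch.bool)  # broadcasts over batch and heads
attn_mask[..., :3, 3:] = False  # block the first 3 queries from attending to the last 3 keys
out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
print(out.shape)  # torch.Size([1, 4, 6, 8])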

View File

@@ -5,6 +5,12 @@ from dataclasses import dataclass
import torch
from torch import Tensor, nn
from invokeai.backend.flux.custom_block_processor import (
CustomDoubleStreamBlockProcessor,
CustomSingleStreamBlockProcessor,
)
from invokeai.backend.flux.extensions.regional_prompting_extension import RegionalPromptingExtension
from invokeai.backend.flux.extensions.xlabs_ip_adapter_extension import XLabsIPAdapterExtension
from invokeai.backend.flux.modules.layers import (
DoubleStreamBlock,
EmbedND,
@@ -88,8 +94,12 @@ class Flux(nn.Module):
timesteps: Tensor,
y: Tensor,
guidance: Tensor | None,
timestep_index: int,
total_num_timesteps: int,
controlnet_double_block_residuals: list[Tensor] | None,
controlnet_single_block_residuals: list[Tensor] | None,
ip_adapter_extensions: list[XLabsIPAdapterExtension],
regional_prompting_extension: RegionalPromptingExtension,
) -> Tensor:
if img.ndim != 3 or txt.ndim != 3:
raise ValueError("Input img and txt tensors must have 3 dimensions.")
@@ -111,7 +121,19 @@ class Flux(nn.Module):
if controlnet_double_block_residuals is not None:
assert len(controlnet_double_block_residuals) == len(self.double_blocks)
for block_index, block in enumerate(self.double_blocks):
img, txt = block(img=img, txt=txt, vec=vec, pe=pe)
assert isinstance(block, DoubleStreamBlock)
img, txt = CustomDoubleStreamBlockProcessor.custom_double_block_forward(
timestep_index=timestep_index,
total_num_timesteps=total_num_timesteps,
block_index=block_index,
block=block,
img=img,
txt=txt,
vec=vec,
pe=pe,
ip_adapter_extensions=ip_adapter_extensions,
regional_prompting_extension=regional_prompting_extension,
)
if controlnet_double_block_residuals is not None:
img += controlnet_double_block_residuals[block_index]
@@ -123,7 +145,17 @@ class Flux(nn.Module):
assert len(controlnet_single_block_residuals) == len(self.single_blocks)
for block_index, block in enumerate(self.single_blocks):
img = block(img, vec=vec, pe=pe)
assert isinstance(block, SingleStreamBlock)
img = CustomSingleStreamBlockProcessor.custom_single_block_forward(
timestep_index=timestep_index,
total_num_timesteps=total_num_timesteps,
block_index=block_index,
block=block,
img=img,
vec=vec,
pe=pe,
regional_prompting_extension=regional_prompting_extension,
)
if controlnet_single_block_residuals is not None:
img[:, txt.shape[1] :, ...] += controlnet_single_block_residuals[block_index]

View File

@@ -66,10 +66,7 @@ class RMSNorm(torch.nn.Module):
self.scale = nn.Parameter(torch.ones(dim))
def forward(self, x: Tensor):
x_dtype = x.dtype
x = x.float()
rrms = torch.rsqrt(torch.mean(x**2, dim=-1, keepdim=True) + 1e-6)
return (x * rrms).to(dtype=x_dtype) * self.scale
return torch.nn.functional.rms_norm(x, self.scale.shape, self.scale, eps=1e-6)
class QKNorm(torch.nn.Module):
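Note that torch.nn.functional.rms_norm is only available in recent PyTorch releases (2.4+). A quick standalone check, in float32, that it matches the hand-rolled version removed above:
import torch

x = torch.randn(2, 16)
scale = torch.ones(16)

rrms = torch.rsqrt(torch.mean(x**2, dim=-1, keepdim=True) + 1e-6)
manual = (x * rrms) * scale

fused = torch.nn.functional.rms_norm(x, scale.shape, scale, eps=1e-6)
assert torch.allclose(manual, fused, atol=1e-5)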

View File

@@ -168,8 +168,17 @@ def generate_img_ids(h: int, w: int, batch_size: int, device: torch.device, dtyp
Returns:
torch.Tensor: Image position ids.
"""
if device.type == "mps":
orig_dtype = dtype
dtype = torch.float16
img_ids = torch.zeros(h // 2, w // 2, 3, device=device, dtype=dtype)
img_ids[..., 1] = img_ids[..., 1] + torch.arange(h // 2, device=device, dtype=dtype)[:, None]
img_ids[..., 2] = img_ids[..., 2] + torch.arange(w // 2, device=device, dtype=dtype)[None, :]
img_ids = repeat(img_ids, "h w c -> b (h w) c", b=batch_size)
if device.type == "mps":
img_ids = img_ids.to(orig_dtype)
return img_ids

View File

@@ -0,0 +1,36 @@
from dataclasses import dataclass
import torch
from invokeai.backend.stable_diffusion.diffusion.conditioning_data import Range
@dataclass
class FluxTextConditioning:
t5_embeddings: torch.Tensor
clip_embeddings: torch.Tensor
# If mask is None, the prompt is a global prompt.
mask: torch.Tensor | None
@dataclass
class FluxRegionalTextConditioning:
# Concatenated text embeddings.
# Shape: (1, concatenated_txt_seq_len, 4096)
t5_embeddings: torch.Tensor
# Shape: (1, concatenated_txt_seq_len, 3)
t5_txt_ids: torch.Tensor
# Global CLIP embeddings.
# Shape: (1, 768)
clip_embeddings: torch.Tensor
# A binary mask indicating the regions of the image that the prompt should be applied to. If None, the prompt is a
# global prompt.
# image_masks[i] is the mask for the ith prompt.
# image_masks[i] has shape (1, image_seq_len) and dtype torch.bool.
image_masks: list[torch.Tensor | None]
# List of ranges that represent the embedding ranges for each mask.
# t5_embedding_ranges[i] contains the range of the t5 embeddings that correspond to image_masks[i].
t5_embedding_ranges: list[Range]
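A small standalone sketch of how t5_embedding_ranges maps slices of the concatenated embedding back to individual prompts (Range here is a local stand-in for the imported dataclass):
from dataclasses import dataclass

import torch

@dataclass
class Range:  # stand-in for invokeai's Range, for illustration only
    start: int
    end: int

# Two prompts with T5 sequence lengths 3 and 5, concatenated along the sequence axis.
t5_a = torch.randn(1, 3, 4096)
t5_b = torch.randn(1, 5, 4096)
t5_embeddings = torch.cat([t5_a, t5_b], dim=1)  # shape: (1, 8, 4096)
t5_embedding_ranges = [Range(0, 3), Range(3, 8)]

for r in t5_embedding_ranges:
    print(t5_embeddings[:, r.start : r.end].shape)  # (1, 3, 4096) then (1, 5, 4096)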

Binary file not shown.

File diff suppressed because it is too large.

View File

@@ -1,4 +1,4 @@
from typing import Optional
from typing import Optional, TypeAlias
import torch
from PIL import Image
@@ -7,6 +7,14 @@ from transformers.models.sam.processing_sam import SamProcessor
from invokeai.backend.raw_model import RawModel
# Type aliases for the inputs to the SAM model.
ListOfBoundingBoxes: TypeAlias = list[list[int]]
"""A list of bounding boxes. Each bounding box is in the format [xmin, ymin, xmax, ymax]."""
ListOfPoints: TypeAlias = list[list[int]]
"""A list of points. Each point is in the format [x, y]."""
ListOfPointLabels: TypeAlias = list[int]
"""A list of SAM point labels. Each label is an integer where -1 is background, 0 is neutral, and 1 is foreground."""
class SegmentAnythingPipeline(RawModel):
"""A wrapper class for the transformers SAM model and processor that makes it compatible with the model manager."""
@@ -27,20 +35,53 @@ class SegmentAnythingPipeline(RawModel):
return calc_module_size(self._sam_model)
def segment(self, image: Image.Image, bounding_boxes: list[list[int]]) -> torch.Tensor:
def segment(
self,
image: Image.Image,
bounding_boxes: list[list[int]] | None = None,
point_lists: list[list[list[int]]] | None = None,
) -> torch.Tensor:
"""Run the SAM model.
Either bounding_boxes or point_lists must be provided. If both are provided, bounding_boxes will be used and
point_lists will be ignored.
Args:
image (Image.Image): The image to segment.
bounding_boxes (list[list[int]]): The bounding box prompts. Each bounding box is in the format
[xmin, ymin, xmax, ymax].
point_lists (list[list[list[int]]]): The points prompts. Each point is in the format [x, y, label].
`label` is an integer where -1 is background, 0 is neutral, and 1 is foreground.
Returns:
torch.Tensor: The segmentation masks. dtype: torch.bool. shape: [num_masks, channels, height, width].
"""
# Add batch dimension of 1 to the bounding boxes.
boxes = [bounding_boxes]
inputs = self._sam_processor(images=image, input_boxes=boxes, return_tensors="pt").to(self._sam_model.device)
# Prep the inputs:
# - Create a list of bounding boxes or points and labels.
# - Add a batch dimension of 1 to the inputs.
if bounding_boxes:
input_boxes: list[ListOfBoundingBoxes] | None = [bounding_boxes]
input_points: list[ListOfPoints] | None = None
input_labels: list[ListOfPointLabels] | None = None
elif point_lists:
input_boxes: list[ListOfBoundingBoxes] | None = None
input_points: list[ListOfPoints] | None = []
input_labels: list[ListOfPointLabels] | None = []
for point_list in point_lists:
input_points.append([[p[0], p[1]] for p in point_list])
input_labels.append([p[2] for p in point_list])
else:
raise ValueError("Either bounding_boxes or point_lists must be provided.")
inputs = self._sam_processor(
images=image,
input_boxes=input_boxes,
input_points=input_points,
input_labels=input_labels,
return_tensors="pt",
).to(self._sam_model.device)
outputs = self._sam_model(**inputs)
masks = self._sam_processor.post_process_masks(
masks=outputs.pred_masks,

View File

@@ -45,8 +45,9 @@ def lora_model_from_flux_diffusers_state_dict(state_dict: Dict[str, torch.Tensor
# Constants for FLUX.1
num_double_layers = 19
num_single_layers = 38
# inner_dim = 3072
# mlp_ratio = 4.0
hidden_size = 3072
mlp_ratio = 4.0
mlp_hidden_dim = int(hidden_size * mlp_ratio)
layers: dict[str, AnyLoRALayer] = {}
@@ -62,30 +63,43 @@ def lora_model_from_flux_diffusers_state_dict(state_dict: Dict[str, torch.Tensor
layers[dst_key] = LoRALayer.from_state_dict_values(values=value)
assert len(src_layer_dict) == 0
def add_qkv_lora_layer_if_present(src_keys: list[str], dst_qkv_key: str) -> None:
def add_qkv_lora_layer_if_present(
src_keys: list[str],
src_weight_shapes: list[tuple[int, int]],
dst_qkv_key: str,
allow_missing_keys: bool = False,
) -> None:
"""Handle the Q, K, V matrices for a transformer block. We need special handling because the diffusers format
stores them in separate matrices, whereas the BFL format used internally by InvokeAI concatenates them.
"""
# We expect that either all src keys are present or none of them are. Verify this.
keys_present = [key in grouped_state_dict for key in src_keys]
assert all(keys_present) or not any(keys_present)
# If none of the keys are present, return early.
keys_present = [key in grouped_state_dict for key in src_keys]
if not any(keys_present):
return
src_layer_dicts = [grouped_state_dict.pop(key) for key in src_keys]
sub_layers: list[LoRALayer] = []
for src_layer_dict in src_layer_dicts:
values = {
"lora_down.weight": src_layer_dict.pop("lora_A.weight"),
"lora_up.weight": src_layer_dict.pop("lora_B.weight"),
}
if alpha is not None:
values["alpha"] = torch.tensor(alpha)
sub_layers.append(LoRALayer.from_state_dict_values(values=values))
assert len(src_layer_dict) == 0
layers[dst_qkv_key] = ConcatenatedLoRALayer(lora_layers=sub_layers, concat_axis=0)
for src_key, src_weight_shape in zip(src_keys, src_weight_shapes, strict=True):
src_layer_dict = grouped_state_dict.pop(src_key, None)
if src_layer_dict is not None:
values = {
"lora_down.weight": src_layer_dict.pop("lora_A.weight"),
"lora_up.weight": src_layer_dict.pop("lora_B.weight"),
}
if alpha is not None:
values["alpha"] = torch.tensor(alpha)
assert values["lora_down.weight"].shape[1] == src_weight_shape[1]
assert values["lora_up.weight"].shape[0] == src_weight_shape[0]
sub_layers.append(LoRALayer.from_state_dict_values(values=values))
assert len(src_layer_dict) == 0
else:
if not allow_missing_keys:
raise ValueError(f"Missing LoRA layer: '{src_key}'.")
values = {
"lora_up.weight": torch.zeros((src_weight_shape[0], 1)),
"lora_down.weight": torch.zeros((1, src_weight_shape[1])),
}
sub_layers.append(LoRALayer.from_state_dict_values(values=values))
layers[dst_qkv_key] = ConcatenatedLoRALayer(lora_layers=sub_layers)
# time_text_embed.timestep_embedder -> time_in.
add_lora_layer_if_present("time_text_embed.timestep_embedder.linear_1", "time_in.in_layer")
@@ -118,6 +132,7 @@ def lora_model_from_flux_diffusers_state_dict(state_dict: Dict[str, torch.Tensor
f"transformer_blocks.{i}.attn.to_k",
f"transformer_blocks.{i}.attn.to_v",
],
[(hidden_size, hidden_size), (hidden_size, hidden_size), (hidden_size, hidden_size)],
f"double_blocks.{i}.img_attn.qkv",
)
add_qkv_lora_layer_if_present(
@@ -126,6 +141,7 @@ def lora_model_from_flux_diffusers_state_dict(state_dict: Dict[str, torch.Tensor
f"transformer_blocks.{i}.attn.add_k_proj",
f"transformer_blocks.{i}.attn.add_v_proj",
],
[(hidden_size, hidden_size), (hidden_size, hidden_size), (hidden_size, hidden_size)],
f"double_blocks.{i}.txt_attn.qkv",
)
@@ -175,7 +191,14 @@ def lora_model_from_flux_diffusers_state_dict(state_dict: Dict[str, torch.Tensor
f"single_transformer_blocks.{i}.attn.to_v",
f"single_transformer_blocks.{i}.proj_mlp",
],
[
(hidden_size, hidden_size),
(hidden_size, hidden_size),
(hidden_size, hidden_size),
(mlp_hidden_dim, hidden_size),
],
f"single_blocks.{i}.linear1",
allow_missing_keys=True,
)
# Output projections.
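The zero-filled placeholders above exist so that the concatenated BFL layer still has the full fused output size when some diffusers sub-layers are missing from the LoRA. A conceptual standalone sketch (not the ConcatenatedLoRALayer implementation) of why the shapes work out for single_blocks.i.linear1:
import torch

hidden_size = 3072
mlp_hidden_dim = int(hidden_size * 4.0)

# Suppose only to_q carries a real LoRA (rank 8); to_k, to_v, and proj_mlp are zero placeholders.
rank = 8
up_q, down_q = torch.randn(hidden_size, rank), torch.randn(rank, hidden_size)

def zero_delta(out_dim: int) -> torch.Tensor:
    # Shape-matching zero block, like the (out_dim, 1) x (1, hidden_size) placeholders above.
    return torch.zeros(out_dim, 1) @ torch.zeros(1, hidden_size)

deltas = [up_q @ down_q, zero_delta(hidden_size), zero_delta(hidden_size), zero_delta(mlp_hidden_dim)]
full_delta = torch.cat(deltas, dim=0)
print(full_delta.shape)  # torch.Size([21504, 3072]) == (3 * 3072 + 12288, 3072), matching fused linear1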

View File

@@ -0,0 +1,133 @@
import torch
from invokeai.backend.lora.layers.any_lora_layer import AnyLoRALayer
from invokeai.backend.lora.layers.concatenated_lora_layer import ConcatenatedLoRALayer
from invokeai.backend.lora.layers.lora_layer import LoRALayer
class LoRASidecarWrapper(torch.nn.Module):
def __init__(self, orig_module: torch.nn.Module, lora_layers: list[AnyLoRALayer], lora_weights: list[float]):
super().__init__()
self._orig_module = orig_module
self._lora_layers = lora_layers
self._lora_weights = lora_weights
@property
def orig_module(self) -> torch.nn.Module:
return self._orig_module
def add_lora_layer(self, lora_layer: AnyLoRALayer, lora_weight: float):
self._lora_layers.append(lora_layer)
self._lora_weights.append(lora_weight)
@torch.no_grad()
def _get_lora_patched_parameters(
self, orig_params: dict[str, torch.Tensor], lora_layers: list[AnyLoRALayer], lora_weights: list[float]
) -> dict[str, torch.Tensor]:
params: dict[str, torch.Tensor] = {}
for lora_layer, lora_weight in zip(lora_layers, lora_weights, strict=True):
layer_params = lora_layer.get_parameters(self._orig_module)
for param_name, param_weight in layer_params.items():
if orig_params[param_name].shape != param_weight.shape:
param_weight = param_weight.reshape(orig_params[param_name].shape)
if param_name not in params:
params[param_name] = param_weight * (lora_layer.scale() * lora_weight)
else:
params[param_name] += param_weight * (lora_layer.scale() * lora_weight)
return params
class LoRALinearWrapper(LoRASidecarWrapper):
def _lora_linear_forward(self, input: torch.Tensor, lora_layer: LoRALayer, lora_weight: float) -> torch.Tensor:
"""An optimized implementation of the residual calculation for a Linear LoRALayer."""
x = torch.nn.functional.linear(input, lora_layer.down)
if lora_layer.mid is not None:
x = torch.nn.functional.linear(x, lora_layer.mid)
x = torch.nn.functional.linear(x, lora_layer.up, bias=lora_layer.bias)
x *= lora_weight * lora_layer.scale()
return x
def _concatenated_lora_forward(
self, input: torch.Tensor, concatenated_lora_layer: ConcatenatedLoRALayer, lora_weight: float
) -> torch.Tensor:
"""An optimized implementation of the residual calculation for a Linear ConcatenatedLoRALayer."""
x_chunks: list[torch.Tensor] = []
for lora_layer in concatenated_lora_layer.lora_layers:
x_chunk = torch.nn.functional.linear(input, lora_layer.down)
if lora_layer.mid is not None:
x_chunk = torch.nn.functional.linear(x_chunk, lora_layer.mid)
x_chunk = torch.nn.functional.linear(x_chunk, lora_layer.up, bias=lora_layer.bias)
x_chunk *= lora_weight * lora_layer.scale()
x_chunks.append(x_chunk)
# TODO(ryand): Generalize to support concat_axis != 0.
assert concatenated_lora_layer.concat_axis == 0
x = torch.cat(x_chunks, dim=-1)
return x
def forward(self, input: torch.Tensor) -> torch.Tensor:
# Split the LoRA layers into those that have optimized implementations and those that don't.
optimized_layer_types = (LoRALayer, ConcatenatedLoRALayer)
optimized_layers = [
(layer, weight)
for layer, weight in zip(self._lora_layers, self._lora_weights, strict=True)
if isinstance(layer, optimized_layer_types)
]
non_optimized_layers = [
(layer, weight)
for layer, weight in zip(self._lora_layers, self._lora_weights, strict=True)
if not isinstance(layer, optimized_layer_types)
]
# First, calculate the residual for LoRA layers for which there is an optimized implementation.
residual = None
for lora_layer, lora_weight in optimized_layers:
if isinstance(lora_layer, LoRALayer):
added_residual = self._lora_linear_forward(input, lora_layer, lora_weight)
elif isinstance(lora_layer, ConcatenatedLoRALayer):
added_residual = self._concatenated_lora_forward(input, lora_layer, lora_weight)
else:
raise ValueError(f"Unsupported LoRA layer type: {type(lora_layer)}")
if residual is None:
residual = added_residual
else:
residual += added_residual
# Next, calculate the residuals for the LoRA layers for which there is no optimized implementation.
if non_optimized_layers:
unoptimized_layers, unoptimized_weights = zip(*non_optimized_layers, strict=True)
params = self._get_lora_patched_parameters(
orig_params={"weight": self._orig_module.weight, "bias": self._orig_module.bias},
lora_layers=unoptimized_layers,
lora_weights=unoptimized_weights,
)
added_residual = torch.nn.functional.linear(input, params["weight"], params.get("bias", None))
if residual is None:
residual = added_residual
else:
residual += added_residual
return self.orig_module(input) + residual
class LoRAConv1dWrapper(LoRASidecarWrapper):
def forward(self, input: torch.Tensor) -> torch.Tensor:
params = self._get_lora_patched_parameters(
orig_params={"weight": self._orig_module.weight, "bias": self._orig_module.bias},
lora_layers=self._lora_layers,
lora_weights=self._lora_weights,
)
return self.orig_module(input) + torch.nn.functional.conv1d(input, params["weight"], params.get("bias", None))
class LoRAConv2dWrapper(LoRASidecarWrapper):
def forward(self, input: torch.Tensor) -> torch.Tensor:
params = self._get_lora_patched_parameters(
orig_params={"weight": self._orig_module.weight, "bias": self._orig_module.bias},
lora_layers=self._lora_layers,
lora_weights=self._lora_weights,
)
return self.orig_module(input) + torch.nn.functional.conv2d(input, params["weight"], params.get("bias", None))
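Conceptually, the Linear wrapper above returns orig_module(x) plus a low-rank residual per LoRA layer. A minimal standalone sketch of that residual, using plain torch rather than the wrapper classes:
import torch

in_features, out_features, rank = 64, 32, 4
orig = torch.nn.Linear(in_features, out_features)
down = torch.randn(rank, in_features)  # lora_down.weight
up = torch.randn(out_features, rank)   # lora_up.weight
patch_weight, scale = 0.75, 1.0

x = torch.randn(2, in_features)
residual = torch.nn.functional.linear(torch.nn.functional.linear(x, down), up) * (patch_weight * scale)
y = orig(x) + residual  # what LoRALinearWrapper.forward(...) computes for a single LoRALayer
print(y.shape)  # torch.Size([2, 32])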

View File

@@ -4,19 +4,126 @@ from typing import Dict, Iterable, Optional, Tuple
import torch
from invokeai.backend.lora.layers.any_lora_layer import AnyLoRALayer
from invokeai.backend.lora.layers.concatenated_lora_layer import ConcatenatedLoRALayer
from invokeai.backend.lora.layers.lora_layer import LoRALayer
from invokeai.backend.lora.lora_model_raw import LoRAModelRaw
from invokeai.backend.lora.sidecar_layers.concatenated_lora.concatenated_lora_linear_sidecar_layer import (
ConcatenatedLoRALinearSidecarLayer,
from invokeai.backend.lora.lora_layer_wrappers import (
LoRAConv1dWrapper,
LoRAConv2dWrapper,
LoRALinearWrapper,
LoRASidecarWrapper,
)
from invokeai.backend.lora.sidecar_layers.lora.lora_linear_sidecar_layer import LoRALinearSidecarLayer
from invokeai.backend.lora.sidecar_layers.lora_sidecar_module import LoRASidecarModule
from invokeai.backend.lora.lora_model_raw import LoRAModelRaw
from invokeai.backend.util.devices import TorchDevice
from invokeai.backend.util.original_weights_storage import OriginalWeightsStorage
class LoRAPatcher:
@staticmethod
@torch.no_grad()
@contextmanager
def apply_smart_lora_patches(
model: torch.nn.Module,
patches: Iterable[Tuple[LoRAModelRaw, float]],
prefix: str,
dtype: torch.dtype,
cached_weights: Optional[Dict[str, torch.Tensor]] = None,
):
"""Apply 'smart' LoRA patching that chooses whether to use direct patching or a sidecar wrapper for each module."""
# original_weights are stored for unpatching layers that are directly patched.
original_weights = OriginalWeightsStorage(cached_weights)
# original_modules are stored for unpatching layers that are wrapped in a LoRASidecarWrapper.
original_modules: dict[str, torch.nn.Module] = {}
try:
for patch, patch_weight in patches:
LoRAPatcher._apply_smart_lora_patch(
model=model,
prefix=prefix,
patch=patch,
patch_weight=patch_weight,
original_weights=original_weights,
original_modules=original_modules,
dtype=dtype,
)
yield
finally:
# Restore directly patched layers.
for param_key, weight in original_weights.get_changed_weights():
model.get_parameter(param_key).copy_(weight)
# Restore LoRASidecarWrapper modules.
# Note: This logic assumes no nested modules in original_modules.
for module_key, orig_module in original_modules.items():
module_parent_key, module_name = LoRAPatcher._split_parent_key(module_key)
parent_module = model.get_submodule(module_parent_key)
LoRAPatcher._set_submodule(parent_module, module_name, orig_module)
@staticmethod
@torch.no_grad()
def _apply_smart_lora_patch(
model: torch.nn.Module,
prefix: str,
patch: LoRAModelRaw,
patch_weight: float,
original_weights: OriginalWeightsStorage,
original_modules: dict[str, torch.nn.Module],
dtype: torch.dtype,
):
"""Apply a single LoRA patch to a model using the 'smart' patching strategy that chooses whether to use direct
patching or a sidecar wrapper for each module.
"""
if patch_weight == 0:
return
# If the layer keys contain a dot, then they are not flattened, and can be directly used to access model
# submodules. If the layer keys do not contain a dot, then they are flattened, meaning that all '.' have been
# replaced with '_'. Non-flattened keys are preferred, because they allow submodules to be accessed directly
# without searching, but some legacy code still uses flattened keys.
layer_keys_are_flattened = "." not in next(iter(patch.layers.keys()))
prefix_len = len(prefix)
for layer_key, layer in patch.layers.items():
if not layer_key.startswith(prefix):
continue
module_key, module = LoRAPatcher._get_submodule(
model, layer_key[prefix_len:], layer_key_is_flattened=layer_keys_are_flattened
)
# Decide whether to use direct patching or a sidecar wrapper.
# Direct patching is preferred, because it results in better runtime speed.
# Reasons to use sidecar patching:
# - The module is already wrapped in a LoRASidecarWrapper.
# - The module is quantized.
# - The module is on the CPU (and we don't want to store a second full copy of the original weights on the
# CPU, since this would double the RAM usage)
# NOTE: For now, we don't check if the layer is quantized here. We assume that this is checked in the caller
# and that the caller will use the 'apply_lora_wrapper_patches' method if the layer is quantized.
# TODO(ryand): Handle the case where we are running without a GPU. Should we set a config flag that allows
# forcing full patching even on the CPU?
if isinstance(module, LoRASidecarWrapper) or LoRAPatcher._is_any_part_of_layer_on_cpu(module):
LoRAPatcher._apply_lora_layer_wrapper_patch(
model=model,
module_to_patch=module,
module_to_patch_key=module_key,
patch=layer,
patch_weight=patch_weight,
original_modules=original_modules,
dtype=dtype,
)
else:
LoRAPatcher._apply_lora_layer_patch(
module_to_patch=module,
module_to_patch_key=module_key,
patch=layer,
patch_weight=patch_weight,
original_weights=original_weights,
)
@staticmethod
def _is_any_part_of_layer_on_cpu(layer: torch.nn.Module) -> bool:
return any(p.device.type == "cpu" for p in layer.parameters())
@staticmethod
@torch.no_grad()
@contextmanager
@@ -40,7 +147,7 @@ class LoRAPatcher:
original_weights = OriginalWeightsStorage(cached_weights)
try:
for patch, patch_weight in patches:
LoRAPatcher.apply_lora_patch(
LoRAPatcher._apply_lora_patch(
model=model,
prefix=prefix,
patch=patch,
@@ -56,7 +163,7 @@ class LoRAPatcher:
@staticmethod
@torch.no_grad()
def apply_lora_patch(
def _apply_lora_patch(
model: torch.nn.Module,
prefix: str,
patch: LoRAModelRaw,
@@ -91,48 +198,67 @@ class LoRAPatcher:
model, layer_key[prefix_len:], layer_key_is_flattened=layer_keys_are_flattened
)
# All of the LoRA weight calculations will be done on the same device as the module weight.
# (Performance will be best if this is a CUDA device.)
device = module.weight.device
dtype = module.weight.dtype
LoRAPatcher._apply_lora_layer_patch(
module_to_patch=module,
module_to_patch_key=module_key,
patch=layer,
patch_weight=patch_weight,
original_weights=original_weights,
)
layer_scale = layer.scale()
@staticmethod
@torch.no_grad()
def _apply_lora_layer_patch(
module_to_patch: torch.nn.Module,
module_to_patch_key: str,
patch: AnyLoRALayer,
patch_weight: float,
original_weights: OriginalWeightsStorage,
):
# All of the LoRA weight calculations will be done on the same device as the module weight.
# (Performance will be best if this is a CUDA device.)
device = module_to_patch.weight.device
dtype = module_to_patch.weight.dtype
# We intentionally move to the target device first, then cast. Experimentally, this was found to
# be significantly faster for 16-bit CPU tensors being moved to a CUDA device than doing the
# same thing in a single call to '.to(...)'.
layer.to(device=device)
layer.to(dtype=torch.float32)
layer_scale = patch.scale()
# TODO(ryand): Using torch.autocast(...) over explicit casting may offer a speed benefit on CUDA
# devices here. Experimentally, it was found to be very slow on CPU. More investigation needed.
for param_name, lora_param_weight in layer.get_parameters(module).items():
param_key = module_key + "." + param_name
module_param = module.get_parameter(param_name)
# We intentionally move to the target device first, then cast. Experimentally, this was found to
# be significantly faster for 16-bit CPU tensors being moved to a CUDA device than doing the
# same thing in a single call to '.to(...)'.
patch.to(device=device)
patch.to(dtype=torch.float32)
# Save original weight
original_weights.save(param_key, module_param)
# TODO(ryand): Using torch.autocast(...) over explicit casting may offer a speed benefit on CUDA
# devices here. Experimentally, it was found to be very slow on CPU. More investigation needed.
for param_name, lora_param_weight in patch.get_parameters(module_to_patch).items():
param_key = module_to_patch_key + "." + param_name
module_param = module_to_patch.get_parameter(param_name)
if module_param.shape != lora_param_weight.shape:
lora_param_weight = lora_param_weight.reshape(module_param.shape)
# Save original weight
original_weights.save(param_key, module_param)
lora_param_weight *= patch_weight * layer_scale
module_param += lora_param_weight.to(dtype=dtype)
if module_param.shape != lora_param_weight.shape:
lora_param_weight = lora_param_weight.reshape(module_param.shape)
layer.to(device=TorchDevice.CPU_DEVICE)
lora_param_weight *= patch_weight * layer_scale
module_param += lora_param_weight.to(dtype=dtype)
patch.to(device=TorchDevice.CPU_DEVICE)
@staticmethod
@torch.no_grad()
@contextmanager
def apply_lora_sidecar_patches(
def apply_lora_wrapper_patches(
model: torch.nn.Module,
patches: Iterable[Tuple[LoRAModelRaw, float]],
prefix: str,
dtype: torch.dtype,
):
"""Apply one or more LoRA sidecar patches to a model within a context manager. Sidecar patches incur some
overhead compared to normal LoRA patching, but they allow for LoRA layers to applied to base layers in any
quantization format.
"""Apply one or more LoRA wrapper patches to a model within a context manager. Wrapper patches incur some
runtime overhead compared to normal LoRA patching, but they enable:
- LoRA layers to be applied to quantized models
- LoRA layers to be applied to CPU layers without needing to store a full copy of the original weights (i.e.
avoid doubling the memory requirements).
Args:
model (torch.nn.Module): The model to patch.
@@ -140,14 +266,11 @@ class LoRAPatcher:
associated weights. An iterator is used so that the LoRA patches do not need to be loaded into memory
all at once.
prefix (str): The keys in the patches will be filtered to only include weights with this prefix.
dtype (torch.dtype): The compute dtype of the sidecar layers. This cannot easily be inferred from the model,
since the sidecar layers are typically applied on top of quantized layers whose weight dtype is
different from their compute dtype.
"""
original_modules: dict[str, torch.nn.Module] = {}
try:
for patch, patch_weight in patches:
LoRAPatcher._apply_lora_sidecar_patch(
LoRAPatcher._apply_lora_wrapper_patch(
model=model,
prefix=prefix,
patch=patch,
@@ -165,7 +288,7 @@ class LoRAPatcher:
LoRAPatcher._set_submodule(parent_module, module_name, orig_module)
@staticmethod
def _apply_lora_sidecar_patch(
def _apply_lora_wrapper_patch(
model: torch.nn.Module,
patch: LoRAModelRaw,
patch_weight: float,
@@ -173,7 +296,7 @@ class LoRAPatcher:
original_modules: dict[str, torch.nn.Module],
dtype: torch.dtype,
):
"""Apply a single LoRA sidecar patch to a model."""
"""Apply a single LoRA wrapper patch to a model."""
if patch_weight == 0:
return
@@ -194,28 +317,47 @@ class LoRAPatcher:
model, layer_key[prefix_len:], layer_key_is_flattened=layer_keys_are_flattened
)
# Initialize the LoRA sidecar layer.
lora_sidecar_layer = LoRAPatcher._initialize_lora_sidecar_layer(module, layer, patch_weight)
LoRAPatcher._apply_lora_layer_wrapper_patch(
model=model,
module_to_patch=module,
module_to_patch_key=module_key,
patch=layer,
patch_weight=patch_weight,
original_modules=original_modules,
dtype=dtype,
)
# Replace the original module with a LoRASidecarModule if it has not already been done.
if module_key in original_modules:
# The module has already been patched with a LoRASidecarModule. Append to it.
assert isinstance(module, LoRASidecarModule)
lora_sidecar_module = module
else:
# The module has not yet been patched with a LoRASidecarModule. Create one.
lora_sidecar_module = LoRASidecarModule(module, [])
original_modules[module_key] = module
module_parent_key, module_name = LoRAPatcher._split_parent_key(module_key)
module_parent = model.get_submodule(module_parent_key)
LoRAPatcher._set_submodule(module_parent, module_name, lora_sidecar_module)
@staticmethod
@torch.no_grad()
def _apply_lora_layer_wrapper_patch(
model: torch.nn.Module,
module_to_patch: torch.nn.Module,
module_to_patch_key: str,
patch: AnyLoRALayer,
patch_weight: float,
original_modules: dict[str, torch.nn.Module],
dtype: torch.dtype,
):
"""Apply a single LoRA wrapper patch to a model."""
# Move the LoRA sidecar layer to the same device/dtype as the orig module.
# TODO(ryand): Experiment with moving to the device first, then casting. This could be faster.
lora_sidecar_layer.to(device=lora_sidecar_module.orig_module.weight.device, dtype=dtype)
# Replace the original module with a LoRASidecarWrapper if it has not already been done.
if not isinstance(module_to_patch, LoRASidecarWrapper):
lora_wrapper_layer = LoRAPatcher._initialize_lora_wrapper_layer(module_to_patch)
original_modules[module_to_patch_key] = module_to_patch
module_parent_key, module_name = LoRAPatcher._split_parent_key(module_to_patch_key)
module_parent = model.get_submodule(module_parent_key)
LoRAPatcher._set_submodule(module_parent, module_name, lora_wrapper_layer)
orig_module = module_to_patch
else:
assert module_to_patch_key in original_modules
lora_wrapper_layer = module_to_patch
orig_module = module_to_patch.orig_module
# Add the LoRA sidecar layer to the LoRASidecarModule.
lora_sidecar_module.add_lora_layer(lora_sidecar_layer)
# Move the LoRA layer to the same device/dtype as the orig module.
patch.to(device=orig_module.weight.device, dtype=dtype)
# Add the LoRA wrapper layer to the LoRASidecarWrapper.
lora_wrapper_layer.add_lora_layer(patch, patch_weight)
@staticmethod
def _split_parent_key(module_key: str) -> tuple[str, str]:
@@ -236,17 +378,13 @@ class LoRAPatcher:
raise ValueError(f"Invalid module key: {module_key}")
@staticmethod
def _initialize_lora_sidecar_layer(orig_layer: torch.nn.Module, lora_layer: AnyLoRALayer, patch_weight: float):
# TODO(ryand): Add support for more original layer types and LoRA layer types.
if isinstance(orig_layer, torch.nn.Linear) or (
isinstance(orig_layer, LoRASidecarModule) and isinstance(orig_layer.orig_module, torch.nn.Linear)
):
if isinstance(lora_layer, LoRALayer):
return LoRALinearSidecarLayer(lora_layer=lora_layer, weight=patch_weight)
elif isinstance(lora_layer, ConcatenatedLoRALayer):
return ConcatenatedLoRALinearSidecarLayer(concatenated_lora_layer=lora_layer, weight=patch_weight)
else:
raise ValueError(f"Unsupported Linear LoRA layer type: {type(lora_layer)}")
def _initialize_lora_wrapper_layer(orig_layer: torch.nn.Module):
if isinstance(orig_layer, torch.nn.Linear):
return LoRALinearWrapper(orig_layer, [], [])
elif isinstance(orig_layer, torch.nn.Conv1d):
return LoRAConv1dWrapper(orig_layer, [], [])
elif isinstance(orig_layer, torch.nn.Conv2d):
return LoRAConv2dWrapper(orig_layer, [], [])
else:
raise ValueError(f"Unsupported layer type: {type(orig_layer)}")

View File

@@ -1,34 +0,0 @@
import torch
from invokeai.backend.lora.layers.concatenated_lora_layer import ConcatenatedLoRALayer
class ConcatenatedLoRALinearSidecarLayer(torch.nn.Module):
def __init__(
self,
concatenated_lora_layer: ConcatenatedLoRALayer,
weight: float,
):
super().__init__()
self._concatenated_lora_layer = concatenated_lora_layer
self._weight = weight
def forward(self, input: torch.Tensor) -> torch.Tensor:
x_chunks: list[torch.Tensor] = []
for lora_layer in self._concatenated_lora_layer.lora_layers:
x_chunk = torch.nn.functional.linear(input, lora_layer.down)
if lora_layer.mid is not None:
x_chunk = torch.nn.functional.linear(x_chunk, lora_layer.mid)
x_chunk = torch.nn.functional.linear(x_chunk, lora_layer.up, bias=lora_layer.bias)
x_chunk *= self._weight * lora_layer.scale()
x_chunks.append(x_chunk)
# TODO(ryand): Generalize to support concat_axis != 0.
assert self._concatenated_lora_layer.concat_axis == 0
x = torch.cat(x_chunks, dim=-1)
return x
def to(self, device: torch.device | None = None, dtype: torch.dtype | None = None):
self._concatenated_lora_layer.to(device=device, dtype=dtype)
return self

View File

@@ -1,27 +0,0 @@
import torch
from invokeai.backend.lora.layers.lora_layer import LoRALayer
class LoRALinearSidecarLayer(torch.nn.Module):
def __init__(
self,
lora_layer: LoRALayer,
weight: float,
):
super().__init__()
self._lora_layer = lora_layer
self._weight = weight
def forward(self, x: torch.Tensor) -> torch.Tensor:
x = torch.nn.functional.linear(x, self._lora_layer.down)
if self._lora_layer.mid is not None:
x = torch.nn.functional.linear(x, self._lora_layer.mid)
x = torch.nn.functional.linear(x, self._lora_layer.up, bias=self._lora_layer.bias)
x *= self._weight * self._lora_layer.scale()
return x
def to(self, device: torch.device | None = None, dtype: torch.dtype | None = None):
self._lora_layer.to(device=device, dtype=dtype)
return self

View File

@@ -1,24 +0,0 @@
import torch
class LoRASidecarModule(torch.nn.Module):
"""A LoRA sidecar module that wraps an original module and adds LoRA layers to it."""
def __init__(self, orig_module: torch.nn.Module, lora_layers: list[torch.nn.Module]):
super().__init__()
self.orig_module = orig_module
self._lora_layers = lora_layers
def add_lora_layer(self, lora_layer: torch.nn.Module):
self._lora_layers.append(lora_layer)
def forward(self, input: torch.Tensor) -> torch.Tensor:
x = self.orig_module(input)
for lora_layer in self._lora_layers:
x += lora_layer(input)
return x
def to(self, device: torch.device | None = None, dtype: torch.dtype | None = None):
self._orig_module.to(device=device, dtype=dtype)
for lora_layer in self._lora_layers:
lora_layer.to(device=device, dtype=dtype)

View File

@@ -53,6 +53,7 @@ class BaseModelType(str, Enum):
Any = "any"
StableDiffusion1 = "sd-1"
StableDiffusion2 = "sd-2"
StableDiffusion3 = "sd-3"
StableDiffusionXL = "sdxl"
StableDiffusionXLRefiner = "sdxl-refiner"
Flux = "flux"
@@ -83,8 +84,10 @@ class SubModelType(str, Enum):
Transformer = "transformer"
TextEncoder = "text_encoder"
TextEncoder2 = "text_encoder_2"
TextEncoder3 = "text_encoder_3"
Tokenizer = "tokenizer"
Tokenizer2 = "tokenizer_2"
Tokenizer3 = "tokenizer_3"
VAE = "vae"
VAEDecoder = "vae_decoder"
VAEEncoder = "vae_encoder"
@@ -92,6 +95,13 @@ class SubModelType(str, Enum):
SafetyChecker = "safety_checker"
class ClipVariantType(str, Enum):
"""Variant type."""
L = "large"
G = "gigantic"
class ModelVariantType(str, Enum):
"""Variant type."""
@@ -147,6 +157,17 @@ class ModelSourceType(str, Enum):
DEFAULTS_PRECISION = Literal["fp16", "fp32"]
AnyVariant: TypeAlias = Union[ModelVariantType, ClipVariantType, None]
class SubmodelDefinition(BaseModel):
path_or_prefix: str
model_type: ModelType
variant: AnyVariant = None
model_config = ConfigDict(protected_namespaces=())
class MainModelDefaultSettings(BaseModel):
vae: str | None = Field(default=None, description="Default VAE for this model (model key)")
vae_precision: DEFAULTS_PRECISION | None = Field(default=None, description="Default VAE precision for this model")
@@ -193,6 +214,9 @@ class ModelConfigBase(BaseModel):
schema["required"].extend(["key", "type", "format"])
model_config = ConfigDict(validate_assignment=True, json_schema_extra=json_schema_extra)
submodels: Optional[Dict[SubModelType, SubmodelDefinition]] = Field(
description="Loadable submodels in this model", default=None
)
class CheckpointConfigBase(ModelConfigBase):
@@ -335,7 +359,7 @@ class MainConfigBase(ModelConfigBase):
default_settings: Optional[MainModelDefaultSettings] = Field(
description="Default settings for this model", default=None
)
variant: ModelVariantType = ModelVariantType.Normal
variant: AnyVariant = ModelVariantType.Normal
class MainCheckpointConfig(CheckpointConfigBase, MainConfigBase):
@@ -394,6 +418,8 @@ class IPAdapterBaseConfig(ModelConfigBase):
class IPAdapterInvokeAIConfig(IPAdapterBaseConfig):
"""Model config for IP Adapter diffusers format models."""
# TODO(ryand): Should we deprecate this field? From what I can tell, it hasn't been probed correctly for a long
# time. Need to go through the history to make sure I'm understanding this fully.
image_encoder_model_id: str
format: Literal[ModelFormat.InvokeAI]
@@ -417,12 +443,33 @@ class CLIPEmbedDiffusersConfig(DiffusersConfigBase):
type: Literal[ModelType.CLIPEmbed] = ModelType.CLIPEmbed
format: Literal[ModelFormat.Diffusers] = ModelFormat.Diffusers
variant: ClipVariantType = ClipVariantType.L
@staticmethod
def get_tag() -> Tag:
return Tag(f"{ModelType.CLIPEmbed.value}.{ModelFormat.Diffusers.value}")
class CLIPGEmbedDiffusersConfig(CLIPEmbedDiffusersConfig):
"""Model config for CLIP-G Embeddings."""
variant: ClipVariantType = ClipVariantType.G
@staticmethod
def get_tag() -> Tag:
return Tag(f"{ModelType.CLIPEmbed.value}.{ModelFormat.Diffusers.value}.{ClipVariantType.G}")
class CLIPLEmbedDiffusersConfig(CLIPEmbedDiffusersConfig):
"""Model config for CLIP-L Embeddings."""
variant: ClipVariantType = ClipVariantType.L
@staticmethod
def get_tag() -> Tag:
return Tag(f"{ModelType.CLIPEmbed.value}.{ModelFormat.Diffusers.value}.{ClipVariantType.L}")
class CLIPVisionDiffusersConfig(DiffusersConfigBase):
"""Model config for CLIPVision."""
@@ -499,6 +546,8 @@ AnyModelConfig = Annotated[
Annotated[SpandrelImageToImageConfig, SpandrelImageToImageConfig.get_tag()],
Annotated[CLIPVisionDiffusersConfig, CLIPVisionDiffusersConfig.get_tag()],
Annotated[CLIPEmbedDiffusersConfig, CLIPEmbedDiffusersConfig.get_tag()],
Annotated[CLIPLEmbedDiffusersConfig, CLIPLEmbedDiffusersConfig.get_tag()],
Annotated[CLIPGEmbedDiffusersConfig, CLIPGEmbedDiffusersConfig.get_tag()],
],
Discriminator(get_model_discriminator_value),
]

View File

@@ -8,7 +8,7 @@ from pathlib import Path
from invokeai.backend.model_manager.load.load_base import LoadedModel, LoadedModelWithoutConfig, ModelLoaderBase
from invokeai.backend.model_manager.load.load_default import ModelLoader
from invokeai.backend.model_manager.load.model_cache.model_cache_default import ModelCache
from invokeai.backend.model_manager.load.model_cache.model_cache import ModelCache
from invokeai.backend.model_manager.load.model_loader_registry import ModelLoaderRegistry, ModelLoaderRegistryBase
# This registers the subclasses that implement loaders of specific model types

View File

@@ -5,7 +5,6 @@ Base class for model loading in InvokeAI.
from abc import ABC, abstractmethod
from contextlib import contextmanager
from dataclasses import dataclass
from logging import Logger
from pathlib import Path
from typing import Any, Dict, Generator, Optional, Tuple
@@ -18,19 +17,17 @@ from invokeai.backend.model_manager.config import (
AnyModelConfig,
SubModelType,
)
from invokeai.backend.model_manager.load.model_cache.model_cache_base import ModelCacheBase, ModelLockerBase
from invokeai.backend.model_manager.load.model_cache.cache_record import CacheRecord
from invokeai.backend.model_manager.load.model_cache.model_cache import ModelCache
@dataclass
class LoadedModelWithoutConfig:
"""
Context manager object that mediates transfer from RAM<->VRAM.
"""Context manager object that mediates transfer from RAM<->VRAM.
This is a context manager object that has two distinct APIs:
1. Older API (deprecated):
Use the LoadedModel object directly as a context manager.
It will move the model into VRAM (on CUDA devices), and
Use the LoadedModel object directly as a context manager. It will move the model into VRAM (on CUDA devices), and
return the model in a form suitable for passing to torch.
Example:
```
@@ -40,13 +37,9 @@ class LoadedModelWithoutConfig:
```
2. Newer API (recommended):
Call the LoadedModel's `model_on_device()` method in a
context. It returns a tuple consisting of a copy of
the model's state dict in CPU RAM followed by a copy
of the model in VRAM. The state dict is provided to allow
LoRAs and other model patchers to return the model to
its unpatched state without expensive copy and restore
operations.
Call the LoadedModel's `model_on_device()` method in a context. It returns a tuple consisting of a copy of the
model's state dict in CPU RAM followed by a copy of the model in VRAM. The state dict is provided to allow LoRAs and
other model patchers to return the model to its unpatched state without expensive copy and restore operations.
Example:
```
@@ -55,43 +48,42 @@ class LoadedModelWithoutConfig:
image = vae.decode(latents)[0]
```
The state_dict should be treated as a read-only object and
never modified. Also be aware that some loadable models do
not have a state_dict, in which case this value will be None.
The state_dict should be treated as a read-only object and never modified. Also be aware that some loadable models
do not have a state_dict, in which case this value will be None.
"""
_locker: ModelLockerBase
def __init__(self, cache_record: CacheRecord, cache: ModelCache):
self._cache_record = cache_record
self._cache = cache
def __enter__(self) -> AnyModel:
"""Context entry."""
self._locker.lock()
self._cache.lock(self._cache_record.key)
return self.model
def __exit__(self, *args: Any, **kwargs: Any) -> None:
"""Context exit."""
self._locker.unlock()
self._cache.unlock(self._cache_record.key)
@contextmanager
def model_on_device(self) -> Generator[Tuple[Optional[Dict[str, torch.Tensor]], AnyModel], None, None]:
"""Return a tuple consisting of the model's state dict (if it exists) and the locked model on execution device."""
locked_model = self._locker.lock()
self._cache.lock(self._cache_record.key)
try:
state_dict = self._locker.get_state_dict()
yield (state_dict, locked_model)
yield (self._cache_record.cached_model.get_cpu_state_dict(), self._cache_record.cached_model.model)
finally:
self._locker.unlock()
self._cache.unlock(self._cache_record.key)
@property
def model(self) -> AnyModel:
"""Return the model without locking it."""
return self._locker.model
return self._cache_record.cached_model.model
@dataclass
class LoadedModel(LoadedModelWithoutConfig):
"""Context manager object that mediates transfer from RAM<->VRAM."""
config: Optional[AnyModelConfig] = None
def __init__(self, config: Optional[AnyModelConfig], cache_record: CacheRecord, cache: ModelCache):
super().__init__(cache_record=cache_record, cache=cache)
self.config = config
# TODO(MM2):
@@ -110,7 +102,7 @@ class ModelLoaderBase(ABC):
self,
app_config: InvokeAIAppConfig,
logger: Logger,
ram_cache: ModelCacheBase[AnyModel],
ram_cache: ModelCache,
):
"""Initialize the loader."""
pass
@@ -138,6 +130,6 @@ class ModelLoaderBase(ABC):
@property
@abstractmethod
def ram_cache(self) -> ModelCacheBase[AnyModel]:
def ram_cache(self) -> ModelCache:
"""Return the ram cache associated with this loader."""
pass

View File

@@ -14,7 +14,8 @@ from invokeai.backend.model_manager import (
)
from invokeai.backend.model_manager.config import DiffusersConfigBase
from invokeai.backend.model_manager.load.load_base import LoadedModel, ModelLoaderBase
from invokeai.backend.model_manager.load.model_cache.model_cache_base import ModelCacheBase, ModelLockerBase
from invokeai.backend.model_manager.load.model_cache.cache_record import CacheRecord
from invokeai.backend.model_manager.load.model_cache.model_cache import ModelCache, get_model_cache_key
from invokeai.backend.model_manager.load.model_util import calc_model_size_by_fs
from invokeai.backend.model_manager.load.optimizations import skip_torch_weight_init
from invokeai.backend.util.devices import TorchDevice
@@ -28,13 +29,14 @@ class ModelLoader(ModelLoaderBase):
self,
app_config: InvokeAIAppConfig,
logger: Logger,
ram_cache: ModelCacheBase[AnyModel],
ram_cache: ModelCache,
):
"""Initialize the loader."""
self._app_config = app_config
self._logger = logger
self._ram_cache = ram_cache
self._torch_dtype = TorchDevice.choose_torch_dtype()
self._torch_device = TorchDevice.choose_torch_device()
def load_model(self, model_config: AnyModelConfig, submodel_type: Optional[SubModelType] = None) -> LoadedModel:
"""
@@ -53,11 +55,11 @@ class ModelLoader(ModelLoaderBase):
raise InvalidModelConfigException(f"Files for model '{model_config.name}' not found at {model_path}")
with skip_torch_weight_init():
locker = self._load_and_cache(model_config, submodel_type)
return LoadedModel(config=model_config, _locker=locker)
cache_record = self._load_and_cache(model_config, submodel_type)
return LoadedModel(config=model_config, cache_record=cache_record, cache=self._ram_cache)
@property
def ram_cache(self) -> ModelCacheBase[AnyModel]:
def ram_cache(self) -> ModelCache:
"""Return the ram cache associated with this loader."""
return self._ram_cache
@@ -65,10 +67,10 @@ class ModelLoader(ModelLoaderBase):
model_base = self._app_config.models_path
return (model_base / config.path).resolve()
def _load_and_cache(self, config: AnyModelConfig, submodel_type: Optional[SubModelType] = None) -> ModelLockerBase:
def _load_and_cache(self, config: AnyModelConfig, submodel_type: Optional[SubModelType] = None) -> CacheRecord:
stats_name = ":".join([config.base, config.type, config.name, (submodel_type or "")])
try:
return self._ram_cache.get(config.key, submodel_type, stats_name=stats_name)
return self._ram_cache.get(key=get_model_cache_key(config.key, submodel_type), stats_name=stats_name)
except IndexError:
pass
@@ -77,16 +79,11 @@ class ModelLoader(ModelLoaderBase):
loaded_model = self._load_model(config, submodel_type)
self._ram_cache.put(
config.key,
submodel_type=submodel_type,
get_model_cache_key(config.key, submodel_type),
model=loaded_model,
)
return self._ram_cache.get(
key=config.key,
submodel_type=submodel_type,
stats_name=stats_name,
)
return self._ram_cache.get(key=get_model_cache_key(config.key, submodel_type), stats_name=stats_name)
def get_size_fs(
self, config: AnyModelConfig, model_path: Path, submodel_type: Optional[SubModelType] = None

View File

@@ -1,6 +0,0 @@
"""Init file for ModelCache."""
from .model_cache_base import ModelCacheBase, CacheStats # noqa F401
from .model_cache_default import ModelCache # noqa F401
_all__ = ["ModelCacheBase", "ModelCache", "CacheStats"]

View File

@@ -0,0 +1,31 @@
from dataclasses import dataclass
from invokeai.backend.model_manager.load.model_cache.cached_model.cached_model_only_full_load import (
CachedModelOnlyFullLoad,
)
from invokeai.backend.model_manager.load.model_cache.cached_model.cached_model_with_partial_load import (
CachedModelWithPartialLoad,
)
@dataclass
class CacheRecord:
"""A class that represents a model in the model cache."""
# Cache key.
key: str
# Model in memory.
cached_model: CachedModelWithPartialLoad | CachedModelOnlyFullLoad
# If locks > 0, the model is actively being used, so we should do our best to keep it on the compute device.
_locks: int = 0
def lock(self) -> None:
self._locks += 1
def unlock(self) -> None:
self._locks -= 1
assert self._locks >= 0
@property
def is_locked(self) -> bool:
return self._locks > 0
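A quick, hypothetical illustration of the reference-counted lock; assume `wrapped` is one of the cached-model wrappers introduced in this diff and the key is illustrative:
```
record = CacheRecord(key="sd15:unet", cached_model=wrapped)

record.lock()
record.lock()                  # locks are counted, not boolean
assert record.is_locked

record.unlock()
assert record.is_locked        # one caller still holds the model

record.unlock()
assert not record.is_locked    # now eligible for offload/eviction
```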

View File

@@ -0,0 +1,15 @@
from dataclasses import dataclass, field
from typing import Dict
@dataclass
class CacheStats(object):
"""Collect statistics on cache performance."""
hits: int = 0 # cache hits
misses: int = 0 # cache misses
high_watermark: int = 0 # amount of cache used
in_cache: int = 0 # number of models in cache
cleared: int = 0 # number of models cleared to make space
cache_size: int = 0 # total size of cache
loaded_model_sizes: Dict[str, int] = field(default_factory=dict)
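A hedged sketch of how these statistics get wired up; `cache` is assumed to be a ModelCache instance (see model_cache.py later in this diff), which fills the fields in during get() and make_room():
```
MB = 2**20

stats = CacheStats()
cache.stats = stats            # the cache writes into this object from now on

# ... load and run some models ...

print(f"hits={stats.hits}, misses={stats.misses}, models cleared={stats.cleared}")
print(f"high watermark: {stats.high_watermark / MB:.1f} MB of {stats.cache_size / MB:.1f} MB")
```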

View File

@@ -0,0 +1,81 @@
from typing import Any
import torch
class CachedModelOnlyFullLoad:
"""A wrapper around a PyTorch model to handle full loads and unloads between the CPU and the compute device.
Note: "VRAM" is used throughout this class to refer to the memory on the compute device. It could be CUDA memory,
MPS memory, etc.
"""
def __init__(self, model: torch.nn.Module | Any, compute_device: torch.device, total_bytes: int):
"""Initialize a CachedModelOnlyFullLoad.
Args:
model (torch.nn.Module | Any): The model to wrap. Should be on the CPU.
compute_device (torch.device): The compute device to move the model to.
total_bytes (int): The total size (in bytes) of all the weights in the model.
"""
# model is often a torch.nn.Module, but could be any model type. Throughout this class, we handle both cases.
self._model = model
self._compute_device = compute_device
self._total_bytes = total_bytes
self._is_in_vram = False
@property
def model(self) -> torch.nn.Module:
return self._model
def get_cpu_state_dict(self) -> dict[str, torch.Tensor] | None:
"""Get a read-only copy of the model's state dict in RAM."""
# TODO(ryand): Document this better and implement it.
return None
def total_bytes(self) -> int:
"""Get the total size (in bytes) of all the weights in the model."""
return self._total_bytes
def cur_vram_bytes(self) -> int:
"""Get the size (in bytes) of the weights that are currently in VRAM."""
if self._is_in_vram:
return self._total_bytes
else:
return 0
def is_in_vram(self) -> bool:
"""Return true if the model is currently in VRAM."""
return self._is_in_vram
def full_load_to_vram(self) -> int:
"""Load all weights into VRAM (if supported by the model).
Returns:
The number of bytes loaded into VRAM.
"""
if self._is_in_vram:
# Already in VRAM.
return 0
if not hasattr(self._model, "to"):
# Model doesn't support moving to a device.
return 0
self._model.to(self._compute_device)
self._is_in_vram = True
return self._total_bytes
def full_unload_from_vram(self) -> int:
"""Unload all weights from VRAM.
Returns:
The number of bytes unloaded from VRAM.
"""
if not self._is_in_vram:
# Already in RAM.
return 0
self._model.to("cpu")
self._is_in_vram = False
return self._total_bytes
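A minimal usage sketch, assuming a CUDA compute device is available and CachedModelOnlyFullLoad is importable from the new module above; the toy nn.Linear stands in for a real model:
```
import torch

model = torch.nn.Linear(16, 16)
total_bytes = sum(p.numel() * p.element_size() for p in model.parameters())

cached = CachedModelOnlyFullLoad(model, compute_device=torch.device("cuda"), total_bytes=total_bytes)
assert cached.cur_vram_bytes() == 0

loaded = cached.full_load_to_vram()      # all-or-nothing: returns total_bytes (or 0 if already loaded)
assert loaded == total_bytes and cached.is_in_vram()

freed = cached.full_unload_from_vram()   # moves the model back to the CPU
assert freed == total_bytes and not cached.is_in_vram()
```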

View File

@@ -0,0 +1,150 @@
import itertools
import torch
from invokeai.backend.model_manager.load.model_cache.torch_function_autocast_context import (
add_autocast_to_module_forward,
)
from invokeai.backend.util.calc_tensor_size import calc_tensor_size
def set_nested_attr(obj: object, attr: str, value: object):
"""A helper function that extends setattr() to support nested attributes.
Example:
set_nested_attr(model, "module.encoder.conv1.weight", new_conv1_weight)
"""
attrs = attr.split(".")
for attr in attrs[:-1]:
obj = getattr(obj, attr)
setattr(obj, attrs[-1], value)
class CachedModelWithPartialLoad:
"""A wrapper around a PyTorch model to handle partial loads and unloads between the CPU and the compute device.
Note: "VRAM" is used throughout this class to refer to the memory on the compute device. It could be CUDA memory,
MPS memory, etc.
"""
def __init__(self, model: torch.nn.Module, compute_device: torch.device):
self._model = model
self._compute_device = compute_device
# A CPU read-only copy of the model's state dict.
self._cpu_state_dict: dict[str, torch.Tensor] = model.state_dict()
# Monkey-patch the model to add autocasting to the model's forward method.
add_autocast_to_module_forward(model, compute_device)
self._total_bytes = sum(
calc_tensor_size(p) for p in itertools.chain(self._model.parameters(), self._model.buffers())
)
self._cur_vram_bytes: int | None = None
@property
def model(self) -> torch.nn.Module:
return self._model
def get_cpu_state_dict(self) -> dict[str, torch.Tensor] | None:
"""Get a read-only copy of the model's state dict in RAM."""
# TODO(ryand): Document this better.
return self._cpu_state_dict
def total_bytes(self) -> int:
"""Get the total size (in bytes) of all the weights in the model."""
return self._total_bytes
def cur_vram_bytes(self) -> int:
"""Get the size (in bytes) of the weights that are currently in VRAM."""
if self._cur_vram_bytes is None:
self._cur_vram_bytes = sum(
calc_tensor_size(p)
for p in itertools.chain(self._model.parameters(), self._model.buffers())
if p.device.type == self._compute_device.type
)
return self._cur_vram_bytes
def full_load_to_vram(self) -> int:
"""Load all weights into VRAM."""
return self.partial_load_to_vram(self.total_bytes())
def full_unload_from_vram(self) -> int:
"""Unload all weights from VRAM."""
return self.partial_unload_from_vram(self.total_bytes())
@torch.no_grad()
def partial_load_to_vram(self, vram_bytes_to_load: int) -> int:
"""Load more weights into VRAM without exceeding vram_bytes_to_load.
Returns:
The number of bytes loaded into VRAM.
"""
vram_bytes_loaded = 0
for key, param in itertools.chain(self._model.named_parameters(), self._model.named_buffers()):
# Skip parameters that are already on the compute device.
if param.device.type == self._compute_device.type:
continue
# Check the size of the parameter.
param_size = calc_tensor_size(param)
if vram_bytes_loaded + param_size > vram_bytes_to_load:
# TODO(ryand): Should we just break here? If we couldn't fit this parameter into VRAM, is it really
# worth continuing to search for a smaller parameter that would fit?
continue
# Copy the parameter to the compute device.
# We use the 'overwrite' strategy from torch.nn.Module._apply().
# TODO(ryand): For some edge cases (e.g. quantized models?), we may need to support other strategies (e.g.
# swap).
if isinstance(param, torch.nn.Parameter):
assert param.is_leaf
out_param = torch.nn.Parameter(
param.to(self._compute_device, copy=True), requires_grad=param.requires_grad
)
set_nested_attr(self._model, key, out_param)
# We did not port the param.grad handling from torch.nn.Module._apply(), because we do not expect to be
# handling gradients. We assert that this assumption is true.
assert param.grad is None
else:
# Handle buffers.
set_nested_attr(self._model, key, param.to(self._compute_device, copy=True))
vram_bytes_loaded += param_size
if self._cur_vram_bytes is not None:
self._cur_vram_bytes += vram_bytes_loaded
return vram_bytes_loaded
@torch.no_grad()
def partial_unload_from_vram(self, vram_bytes_to_free: int) -> int:
"""Unload weights from VRAM until vram_bytes_to_free bytes are freed. Or the entire model is unloaded.
Returns:
The number of bytes unloaded from VRAM.
"""
vram_bytes_freed = 0
for key, param in itertools.chain(self._model.named_parameters(), self._model.named_buffers()):
if vram_bytes_freed >= vram_bytes_to_free:
break
if param.device.type != self._compute_device.type:
continue
if isinstance(param, torch.nn.Parameter):
# Create a new parameter, but inject the existing CPU tensor into it.
out_param = torch.nn.Parameter(self._cpu_state_dict[key], requires_grad=param.requires_grad)
set_nested_attr(self._model, key, out_param)
else:
# Handle buffers.
set_nested_attr(self._model, key, self._cpu_state_dict[key])
vram_bytes_freed += calc_tensor_size(param)
if self._cur_vram_bytes is not None:
self._cur_vram_bytes -= vram_bytes_freed
return vram_bytes_freed
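A rough usage sketch, assuming a CUDA device and that CachedModelWithPartialLoad is importable from the module above; the 4 MB budget and the toy model are illustrative only:
```
import torch

MB = 2**20
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.Linear(1024, 1024))
cached = CachedModelWithPartialLoad(model, compute_device=torch.device("cuda"))

loaded = cached.partial_load_to_vram(4 * MB)      # moves params/buffers until the byte budget is hit
print(f"{cached.cur_vram_bytes()} / {cached.total_bytes()} bytes on the compute device")

freed = cached.partial_unload_from_vram(loaded)   # swap the moved tensors back to their CPU copies
```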

View File

@@ -0,0 +1,538 @@
import gc
from logging import Logger
from typing import Dict, List, Optional
import torch
from invokeai.backend.model_manager import AnyModel, SubModelType
from invokeai.backend.model_manager.load.memory_snapshot import MemorySnapshot
from invokeai.backend.model_manager.load.model_cache.cache_record import CacheRecord
from invokeai.backend.model_manager.load.model_cache.cache_stats import CacheStats
from invokeai.backend.model_manager.load.model_cache.cached_model.cached_model_only_full_load import (
CachedModelOnlyFullLoad,
)
from invokeai.backend.model_manager.load.model_cache.cached_model.cached_model_with_partial_load import (
CachedModelWithPartialLoad,
)
from invokeai.backend.model_manager.load.model_util import calc_model_size_by_data
from invokeai.backend.util.devices import TorchDevice
from invokeai.backend.util.logging import InvokeAILogger
from invokeai.backend.util.prefix_logger_adapter import PrefixedLoggerAdapter
# Size of a GB in bytes.
GB = 2**30
# Size of a MB in bytes.
MB = 2**20
# TODO(ryand): Where should this go? The ModelCache shouldn't be concerned with submodels.
def get_model_cache_key(model_key: str, submodel_type: Optional[SubModelType] = None) -> str:
"""Get the cache key for a model based on the optional submodel type."""
if submodel_type:
return f"{model_key}:{submodel_type.value}"
else:
return model_key
class ModelCache:
"""A cache for managing models in memory.
The cache is based on two levels of model storage:
- execution_device: The device where most models are executed (typically "cuda", "mps", or "cpu").
- storage_device: The device where models are offloaded when not in active use (typically "cpu").
The model cache is based on the following assumptions:
- storage_device_mem_size > execution_device_mem_size
- disk_to_storage_device_transfer_time >> storage_device_to_execution_device_transfer_time
A copy of all models in the cache is always kept on the storage_device. A subset of the models also have a copy on
the execution_device.
Models are moved between the storage_device and the execution_device as necessary. Cache size limits are enforced
on both the storage_device and the execution_device. The execution_device cache uses a smallest-first offload
policy. The storage_device cache uses a least-recently-used (LRU) offload policy.
Note: Neither of these offload policies has really been compared against alternatives. It's likely that different
policies would be better, although the optimal policies are likely heavily dependent on usage patterns and HW
configuration.
The cache returns context manager generators designed to load the model into the execution device (often GPU) within
the context, and unload outside the context.
Example usage:
```
cache = ModelCache(max_cache_size=7.5, max_vram_cache_size=6.0)
with cache.get_model('runwayml/stable-diffusion-1-5') as SD1:
do_something_on_gpu(SD1)
```
"""
def __init__(
self,
max_cache_size: float,
max_vram_cache_size: float,
execution_device: torch.device = torch.device("cuda"),
storage_device: torch.device = torch.device("cpu"),
lazy_offloading: bool = True,
log_memory_usage: bool = False,
logger: Optional[Logger] = None,
):
"""
Initialize the model RAM cache.
:param max_cache_size: Maximum size of the storage_device cache in GBs.
:param max_vram_cache_size: Maximum size of the execution_device cache in GBs.
:param execution_device: Torch device to load active model into [torch.device('cuda')]
:param storage_device: Torch device to save inactive model in [torch.device('cpu')]
:param lazy_offloading: Keep model in VRAM until another model needs to be loaded
:param log_memory_usage: If True, a memory snapshot will be captured before and after every model cache
operation, and the result will be logged (at debug level). There is a time cost to capturing the memory
snapshots, so it is recommended to disable this feature unless you are actively inspecting the model cache's
behaviour.
:param logger: InvokeAILogger to use (otherwise creates one)
"""
# allow lazy offloading only when vram cache enabled
# TODO(ryand): Think about what lazy_offloading should mean in the new model cache.
self._lazy_offloading = lazy_offloading and max_vram_cache_size > 0
self._max_cache_size: float = max_cache_size
self._max_vram_cache_size: float = max_vram_cache_size
self._execution_device: torch.device = execution_device
self._storage_device: torch.device = storage_device
self._logger = PrefixedLoggerAdapter(
logger or InvokeAILogger.get_logger(self.__class__.__name__), "MODEL CACHE"
)
self._log_memory_usage = log_memory_usage
self._stats: Optional[CacheStats] = None
self._cached_models: Dict[str, CacheRecord] = {}
self._cache_stack: List[str] = []
@property
def max_cache_size(self) -> float:
"""Return the cap on cache size."""
return self._max_cache_size
@max_cache_size.setter
def max_cache_size(self, value: float) -> None:
"""Set the cap on cache size."""
self._max_cache_size = value
@property
def max_vram_cache_size(self) -> float:
"""Return the cap on vram cache size."""
return self._max_vram_cache_size
@max_vram_cache_size.setter
def max_vram_cache_size(self, value: float) -> None:
"""Set the cap on vram cache size."""
self._max_vram_cache_size = value
@property
def stats(self) -> Optional[CacheStats]:
"""Return collected CacheStats object."""
return self._stats
@stats.setter
def stats(self, stats: CacheStats) -> None:
"""Set the CacheStats object for collecting cache statistics."""
self._stats = stats
def put(self, key: str, model: AnyModel) -> None:
"""Add a model to the cache."""
if key in self._cached_models:
self._logger.debug(
f"Attempted to add model {key} ({model.__class__.__name__}), but it already exists in the cache. No action necessary."
)
return
size = calc_model_size_by_data(self._logger, model)
self.make_room(size)
# Wrap model.
if isinstance(model, torch.nn.Module):
wrapped_model = CachedModelWithPartialLoad(model, self._execution_device)
else:
wrapped_model = CachedModelOnlyFullLoad(model, self._execution_device, size)
# running_on_cpu = self._execution_device == torch.device("cpu")
# state_dict = model.state_dict() if isinstance(model, torch.nn.Module) and not running_on_cpu else None
cache_record = CacheRecord(key=key, cached_model=wrapped_model)
self._cached_models[key] = cache_record
self._cache_stack.append(key)
self._logger.debug(
f"Added model {key} (Type: {model.__class__.__name__}, Wrap mode: {wrapped_model.__class__.__name__}, Model size: {size/MB:.2f}MB)"
)
def get(self, key: str, stats_name: Optional[str] = None) -> CacheRecord:
"""Retrieve a model from the cache.
:param key: Model key
:param stats_name: A human-readable id for the model for the purposes of stats reporting.
Raises IndexError if the model is not in the cache.
"""
if key in self._cached_models:
if self.stats:
self.stats.hits += 1
else:
if self.stats:
self.stats.misses += 1
self._logger.debug(f"Cache miss: {key}")
raise IndexError(f"The model with key {key} is not in the cache.")
cache_entry = self._cached_models[key]
# more stats
if self.stats:
stats_name = stats_name or key
self.stats.cache_size = int(self._max_cache_size * GB)
self.stats.high_watermark = max(self.stats.high_watermark, self._get_ram_in_use())
self.stats.in_cache = len(self._cached_models)
self.stats.loaded_model_sizes[stats_name] = max(
self.stats.loaded_model_sizes.get(stats_name, 0), cache_entry.cached_model.total_bytes()
)
# this moves the entry to the top (right end) of the stack
self._cache_stack = [k for k in self._cache_stack if k != key]
self._cache_stack.append(key)
self._logger.debug(f"Cache hit: {key} (Type: {cache_entry.cached_model.model.__class__.__name__})")
return cache_entry
def lock(self, key: str) -> None:
"""Lock a model for use and move it into VRAM."""
cache_entry = self._cached_models[key]
cache_entry.lock()
self._logger.debug(f"Locking model {key} (Type: {cache_entry.cached_model.model.__class__.__name__})")
try:
self._load_locked_model(cache_entry)
self._logger.debug(
f"Finished locking model {key} (Type: {cache_entry.cached_model.model.__class__.__name__})"
)
except torch.cuda.OutOfMemoryError:
self._logger.warning("Insufficient GPU memory to load model. Aborting")
cache_entry.unlock()
raise
except Exception:
cache_entry.unlock()
raise
self._log_cache_state()
def unlock(self, key: str) -> None:
"""Unlock a model."""
cache_entry = self._cached_models[key]
cache_entry.unlock()
self._logger.debug(f"Unlocked model {key} (Type: {cache_entry.cached_model.model.__class__.__name__})")
def _load_locked_model(self, cache_entry: CacheRecord) -> None:
"""Helper function for self.lock(). Loads a locked model into VRAM."""
vram_available = self._get_vram_available()
# Calculate model_vram_needed, the amount of additional VRAM that will be used if we fully load the model into
# VRAM.
model_cur_vram_bytes = cache_entry.cached_model.cur_vram_bytes()
model_total_bytes = cache_entry.cached_model.total_bytes()
model_vram_needed = model_total_bytes - model_cur_vram_bytes
# The amount of VRAM that must be freed to make room for model_vram_needed.
vram_bytes_to_free = max(0, model_vram_needed - vram_available)
self._logger.debug(
f"Before unloading: {self._get_vram_state_str(model_cur_vram_bytes, model_total_bytes, vram_available)}"
)
# Make room for the model in VRAM.
# 1. If the model can fit entirely in VRAM, then make enough room for it to be loaded fully.
# 2. If the model can't fit fully into VRAM, then unload all other models and load as much of the model as
# possible.
vram_bytes_freed = self._offload_unlocked_models(vram_bytes_to_free)
self._logger.debug(f"Unloaded models (if necessary): vram_bytes_freed={(vram_bytes_freed/MB):.2f}MB")
# Check the updated vram_available after offloading.
vram_available = self._get_vram_available()
self._logger.debug(
f"After unloading: {self._get_vram_state_str(model_cur_vram_bytes, model_total_bytes, vram_available)}"
)
# Move as much of the model as possible into VRAM.
model_bytes_loaded = 0
if isinstance(cache_entry.cached_model, CachedModelWithPartialLoad):
model_bytes_loaded = cache_entry.cached_model.partial_load_to_vram(vram_available)
elif isinstance(cache_entry.cached_model, CachedModelOnlyFullLoad): # type: ignore
# Partial load is not supported, so we have no choice but to try to fit it all into VRAM.
model_bytes_loaded = cache_entry.cached_model.full_load_to_vram()
else:
raise ValueError(f"Unsupported cached model type: {type(cache_entry.cached_model)}")
model_cur_vram_bytes = cache_entry.cached_model.cur_vram_bytes()
vram_available = self._get_vram_available()
self._logger.debug(f"Loaded model onto execution device: model_bytes_loaded={(model_bytes_loaded/MB):.2f}MB, ")
self._logger.debug(
f"After loading: {self._get_vram_state_str(model_cur_vram_bytes, model_total_bytes, vram_available)}"
)
def _get_vram_available(self) -> int:
"""Get the amount of VRAM available in the cache."""
return int(self._max_vram_cache_size * GB) - self._get_vram_in_use()
def _get_vram_in_use(self) -> int:
"""Get the amount of VRAM currently in use."""
return sum(ce.cached_model.cur_vram_bytes() for ce in self._cached_models.values())
def _get_ram_available(self) -> int:
"""Get the amount of RAM available in the cache."""
return int(self._max_cache_size * GB) - self._get_ram_in_use()
def _get_ram_in_use(self) -> int:
"""Get the amount of RAM currently in use."""
return sum(ce.cached_model.total_bytes() for ce in self._cached_models.values())
def _capture_memory_snapshot(self) -> Optional[MemorySnapshot]:
if self._log_memory_usage:
return MemorySnapshot.capture()
return None
def _get_vram_state_str(self, model_cur_vram_bytes: int, model_total_bytes: int, vram_available: int) -> str:
"""Helper function for preparing a VRAM state log string."""
model_cur_vram_bytes_percent = model_cur_vram_bytes / model_total_bytes if model_total_bytes > 0 else 0
return (
f"model_total={model_total_bytes/MB:.0f} MB, "
+ f"model_vram={model_cur_vram_bytes/MB:.0f} MB ({model_cur_vram_bytes_percent:.1%} %), "
+ f"vram_total={int(self._max_vram_cache_size * GB)/MB:.0f} MB, "
+ f"vram_available={(vram_available/MB):.0f} MB, "
)
def _offload_unlocked_models(self, vram_bytes_to_free: int) -> int:
"""Offload models from the execution_device until vram_bytes_to_free bytes are freed, or all models are
offloaded. Of course, locked models are not offloaded.
Returns:
int: The number of bytes freed.
"""
self._logger.debug(f"Offloading unlocked models with goal of freeing {vram_bytes_to_free/MB:.2f}MB of VRAM.")
vram_bytes_freed = 0
# TODO(ryand): Give more thought to the offloading policy used here.
cache_entries_increasing_size = sorted(self._cached_models.values(), key=lambda x: x.cached_model.total_bytes())
for cache_entry in cache_entries_increasing_size:
if vram_bytes_freed >= vram_bytes_to_free:
break
if cache_entry.is_locked:
continue
if isinstance(cache_entry.cached_model, CachedModelWithPartialLoad):
cache_entry_bytes_freed = cache_entry.cached_model.partial_unload_from_vram(
vram_bytes_to_free - vram_bytes_freed
)
elif isinstance(cache_entry.cached_model, CachedModelOnlyFullLoad): # type: ignore
cache_entry_bytes_freed = cache_entry.cached_model.full_unload_from_vram()
else:
raise ValueError(f"Unsupported cached model type: {type(cache_entry.cached_model)}")
if cache_entry_bytes_freed > 0:
self._logger.debug(
f"Unloaded {cache_entry.key} from VRAM to free {(cache_entry_bytes_freed/MB):.0f} MB."
)
vram_bytes_freed += cache_entry_bytes_freed
TorchDevice.empty_cache()
return vram_bytes_freed
# def _move_model_to_device(self, cache_entry: CacheRecord, target_device: torch.device) -> None:
# """Move model into the indicated device.
# :param cache_entry: The CacheRecord for the model
# :param target_device: The torch.device to move the model into
# May raise a torch.cuda.OutOfMemoryError
# """
# self._logger.debug(f"Called to move {cache_entry.key} to {target_device}")
# source_device = cache_entry.device
# # Note: We compare device types only so that 'cuda' == 'cuda:0'.
# # This would need to be revised to support multi-GPU.
# if torch.device(source_device).type == torch.device(target_device).type:
# return
# # Some models don't have a `to` method, in which case they run in RAM/CPU.
# if not hasattr(cache_entry.model, "to"):
# return
# # This roundabout method for moving the model around is done to avoid
# # the cost of moving the model from RAM to VRAM and then back from VRAM to RAM.
# # When moving to VRAM, we copy (not move) each element of the state dict from
# # RAM to a new state dict in VRAM, and then inject it into the model.
# # This operation is slightly faster than running `to()` on the whole model.
# #
# # When the model needs to be removed from VRAM we simply delete the copy
# # of the state dict in VRAM, and reinject the state dict that is cached
# # in RAM into the model. So this operation is very fast.
# start_model_to_time = time.time()
# snapshot_before = self._capture_memory_snapshot()
# try:
# if cache_entry.state_dict is not None:
# assert hasattr(cache_entry.model, "load_state_dict")
# if target_device == self._storage_device:
# cache_entry.model.load_state_dict(cache_entry.state_dict, assign=True)
# else:
# new_dict: Dict[str, torch.Tensor] = {}
# for k, v in cache_entry.state_dict.items():
# new_dict[k] = v.to(target_device, copy=True)
# cache_entry.model.load_state_dict(new_dict, assign=True)
# cache_entry.model.to(target_device)
# cache_entry.device = target_device
# except Exception as e: # blow away cache entry
# self._delete_cache_entry(cache_entry)
# raise e
# snapshot_after = self._capture_memory_snapshot()
# end_model_to_time = time.time()
# self._logger.debug(
# f"Moved model '{cache_entry.key}' from {source_device} to"
# f" {target_device} in {(end_model_to_time-start_model_to_time):.2f}s."
# f"Estimated model size: {(cache_entry.size/GB):.3f} GB."
# f"{get_pretty_snapshot_diff(snapshot_before, snapshot_after)}"
# )
# if (
# snapshot_before is not None
# and snapshot_after is not None
# and snapshot_before.vram is not None
# and snapshot_after.vram is not None
# ):
# vram_change = abs(snapshot_before.vram - snapshot_after.vram)
# # If the estimated model size does not match the change in VRAM, log a warning.
# if not math.isclose(
# vram_change,
# cache_entry.size,
# rel_tol=0.1,
# abs_tol=10 * MB,
# ):
# self._logger.debug(
# f"Moving model '{cache_entry.key}' from {source_device} to"
# f" {target_device} caused an unexpected change in VRAM usage. The model's"
# " estimated size may be incorrect. Estimated model size:"
# f" {(cache_entry.size/GB):.3f} GB.\n"
# f"{get_pretty_snapshot_diff(snapshot_before, snapshot_after)}"
# )
def _log_cache_state(self, title: str = "Model cache state:", include_entry_details: bool = True):
ram_size_bytes = self._max_cache_size * GB
ram_in_use_bytes = self._get_ram_in_use()
ram_in_use_bytes_percent = ram_in_use_bytes / ram_size_bytes if ram_size_bytes > 0 else 0
ram_available_bytes = self._get_ram_available()
ram_available_bytes_percent = ram_available_bytes / ram_size_bytes if ram_size_bytes > 0 else 0
vram_size_bytes = self._max_vram_cache_size * GB
vram_in_use_bytes = self._get_vram_in_use()
vram_in_use_bytes_percent = vram_in_use_bytes / vram_size_bytes if vram_size_bytes > 0 else 0
vram_available_bytes = self._get_vram_available()
vram_available_bytes_percent = vram_available_bytes / vram_size_bytes if vram_size_bytes > 0 else 0
log = f"{title}\n"
log_format = " {:<30} Limit: {:>7.1f} MB, Used: {:>7.1f} MB ({:>5.1%}), Available: {:>7.1f} MB ({:>5.1%})\n"
log += log_format.format(
f"Storage Device ({self._storage_device.type})",
ram_size_bytes / MB,
ram_in_use_bytes / MB,
ram_in_use_bytes_percent,
ram_available_bytes / MB,
ram_available_bytes_percent,
)
log += log_format.format(
f"Compute Device ({self._execution_device.type})",
vram_size_bytes / MB,
vram_in_use_bytes / MB,
vram_in_use_bytes_percent,
vram_available_bytes / MB,
vram_available_bytes_percent,
)
if torch.cuda.is_available():
log += " {:<30} {} MB\n".format("CUDA Memory Allocated:", torch.cuda.memory_allocated() / MB)
log += " {:<30} {}\n".format("Total models:", len(self._cached_models))
if include_entry_details and len(self._cached_models) > 0:
log += " Models:\n"
log_format = (
" {:<80} total={:>7.1f} MB, vram={:>7.1f} MB ({:>5.1%}), ram={:>7.1f} MB ({:>5.1%}), locked={}\n"
)
for cache_record in self._cached_models.values():
total_bytes = cache_record.cached_model.total_bytes()
cur_vram_bytes = cache_record.cached_model.cur_vram_bytes()
cur_vram_bytes_percent = cur_vram_bytes / total_bytes if total_bytes > 0 else 0
cur_ram_bytes = total_bytes - cur_vram_bytes
cur_ram_bytes_percent = cur_ram_bytes / total_bytes if total_bytes > 0 else 0
log += log_format.format(
f"{cache_record.key} ({cache_record.cached_model.model.__class__.__name__}):",
total_bytes / MB,
cur_vram_bytes / MB,
cur_vram_bytes_percent,
cur_ram_bytes / MB,
cur_ram_bytes_percent,
cache_record.is_locked,
)
self._logger.debug(log)
def make_room(self, bytes_needed: int) -> None:
"""Make enough room in the cache to accommodate a new model of indicated size.
Note: This function deletes all of the cache's internal references to a model in order to free it. If there are
external references to the model, there's nothing that the cache can do about it, and those models will not be
garbage-collected.
"""
self._logger.debug(f"Making room for {bytes_needed/MB:.2f}MB of RAM.")
self._log_cache_state(title="Before dropping models:")
ram_bytes_available = self._get_ram_available()
ram_bytes_to_free = max(0, bytes_needed - ram_bytes_available)
ram_bytes_freed = 0
pos = 0
models_cleared = 0
while ram_bytes_freed < ram_bytes_to_free and pos < len(self._cache_stack):
model_key = self._cache_stack[pos]
cache_entry = self._cached_models[model_key]
if not cache_entry.is_locked:
ram_bytes_freed += cache_entry.cached_model.total_bytes()
self._logger.debug(
f"Dropping {model_key} from RAM cache to free {(cache_entry.cached_model.total_bytes()/MB):.2f}MB."
)
self._delete_cache_entry(cache_entry)
del cache_entry
models_cleared += 1
else:
pos += 1
if models_cleared > 0:
# There would likely be some 'garbage' to be collected regardless of whether a model was cleared or not, but
# there is a significant time cost to calling `gc.collect()`, so we want to use it sparingly. (The time cost
# is high even if no garbage gets collected.)
#
# Calling gc.collect(...) when a model is cleared seems like a good middle-ground:
# - If models had to be cleared, it's a signal that we are close to our memory limit.
# - If models were cleared, there's a good chance that there's a significant amount of garbage to be
# collected.
#
# Keep in mind that gc is only responsible for handling reference cycles. Most objects should be cleaned up
# immediately when their reference count hits 0.
if self.stats:
self.stats.cleared = models_cleared
gc.collect()
TorchDevice.empty_cache()
self._logger.debug(f"Dropped {models_cleared} models to free {ram_bytes_freed/MB:.2f}MB of RAM.")
self._log_cache_state(title="After dropping models:")
def _delete_cache_entry(self, cache_entry: CacheRecord) -> None:
self._cache_stack.remove(cache_entry.key)
del self._cached_models[cache_entry.key]
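Putting the pieces of the new ModelCache together, a hedged end-to-end sketch (the cache sizes, model key, and run_inference() helper are illustrative; the API names come from the file above):
```
import torch

cache = ModelCache(
    max_cache_size=8.0,                      # GB of RAM
    max_vram_cache_size=4.0,                 # GB of VRAM
    execution_device=torch.device("cuda"),
    storage_device=torch.device("cpu"),
)

key = get_model_cache_key("my-model-key")    # no submodel type, so the key is returned unchanged
cache.put(key, model=torch.nn.Linear(8, 8))  # wrapped as a CachedModelWithPartialLoad internally

record = cache.get(key)                      # raises IndexError on a cache miss
cache.lock(record.key)                       # move as much as fits into VRAM and pin it
try:
    run_inference(record.cached_model.model) # hypothetical caller-side work
finally:
    cache.unlock(record.key)                 # unpin; the model may be offloaded again
```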

View File

@@ -1,221 +0,0 @@
# Copyright (c) 2024 Lincoln D. Stein and the InvokeAI Development team
# TODO: Add Stalker's proper name to copyright
"""
Manage a RAM cache of diffusion/transformer models for fast switching.
They are moved between GPU VRAM and CPU RAM as necessary. If the cache
grows larger than a preset maximum, then the least recently used
model will be cleared and (re)loaded from disk when next needed.
"""
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from logging import Logger
from typing import Dict, Generic, Optional, TypeVar
import torch
from invokeai.backend.model_manager.config import AnyModel, SubModelType
class ModelLockerBase(ABC):
"""Base class for the model locker used by the loader."""
@abstractmethod
def lock(self) -> AnyModel:
"""Lock the contained model and move it into VRAM."""
pass
@abstractmethod
def unlock(self) -> None:
"""Unlock the contained model, and remove it from VRAM."""
pass
@abstractmethod
def get_state_dict(self) -> Optional[Dict[str, torch.Tensor]]:
"""Return the state dict (if any) for the cached model."""
pass
@property
@abstractmethod
def model(self) -> AnyModel:
"""Return the model."""
pass
T = TypeVar("T")
@dataclass
class CacheRecord(Generic[T]):
"""
Elements of the cache:
key: Unique key for each model, same as used in the models database.
model: Model in memory.
state_dict: A read-only copy of the model's state dict in RAM. It will be
used as a template for creating a copy in the VRAM.
size: Size of the model
loaded: True if the model's state dict is currently in VRAM
Before a model is executed, the state_dict template is copied into VRAM,
and then injected into the model. When the model is finished, the VRAM
copy of the state dict is deleted, and the RAM version is reinjected
into the model.
The state_dict should be treated as a read-only attribute. Do not attempt
to patch or otherwise modify it. Instead, patch the copy of the state_dict
after it is loaded into the execution device (e.g. CUDA) using the `LoadedModel`
context manager call `model_on_device()`.
"""
key: str
model: T
device: torch.device
state_dict: Optional[Dict[str, torch.Tensor]]
size: int
loaded: bool = False
_locks: int = 0
def lock(self) -> None:
"""Lock this record."""
self._locks += 1
def unlock(self) -> None:
"""Unlock this record."""
self._locks -= 1
assert self._locks >= 0
@property
def locked(self) -> bool:
"""Return true if record is locked."""
return self._locks > 0
@dataclass
class CacheStats(object):
"""Collect statistics on cache performance."""
hits: int = 0 # cache hits
misses: int = 0 # cache misses
high_watermark: int = 0 # amount of cache used
in_cache: int = 0 # number of models in cache
cleared: int = 0 # number of models cleared to make space
cache_size: int = 0 # total size of cache
loaded_model_sizes: Dict[str, int] = field(default_factory=dict)
class ModelCacheBase(ABC, Generic[T]):
"""Virtual base class for RAM model cache."""
@property
@abstractmethod
def storage_device(self) -> torch.device:
"""Return the storage device (e.g. "CPU" for RAM)."""
pass
@property
@abstractmethod
def execution_device(self) -> torch.device:
"""Return the exection device (e.g. "cuda" for VRAM)."""
pass
@property
@abstractmethod
def lazy_offloading(self) -> bool:
"""Return true if the cache is configured to lazily offload models in VRAM."""
pass
@property
@abstractmethod
def max_cache_size(self) -> float:
"""Return the maximum size the RAM cache can grow to."""
pass
@max_cache_size.setter
@abstractmethod
def max_cache_size(self, value: float) -> None:
"""Set the cap on vram cache size."""
@property
@abstractmethod
def max_vram_cache_size(self) -> float:
"""Return the maximum size the VRAM cache can grow to."""
pass
@max_vram_cache_size.setter
@abstractmethod
def max_vram_cache_size(self, value: float) -> float:
"""Set the maximum size the VRAM cache can grow to."""
pass
@abstractmethod
def offload_unlocked_models(self, size_required: int) -> None:
"""Offload from VRAM any models not actively in use."""
pass
@abstractmethod
def move_model_to_device(self, cache_entry: CacheRecord[AnyModel], target_device: torch.device) -> None:
"""Move model into the indicated device."""
pass
@property
@abstractmethod
def stats(self) -> Optional[CacheStats]:
"""Return collected CacheStats object."""
pass
@stats.setter
@abstractmethod
def stats(self, stats: CacheStats) -> None:
"""Set the CacheStats object for collectin cache statistics."""
pass
@property
@abstractmethod
def logger(self) -> Logger:
"""Return the logger used by the cache."""
pass
@abstractmethod
def make_room(self, size: int) -> None:
"""Make enough room in the cache to accommodate a new model of indicated size."""
pass
@abstractmethod
def put(
self,
key: str,
model: T,
submodel_type: Optional[SubModelType] = None,
) -> None:
"""Store model under key and optional submodel_type."""
pass
@abstractmethod
def get(
self,
key: str,
submodel_type: Optional[SubModelType] = None,
stats_name: Optional[str] = None,
) -> ModelLockerBase:
"""
Retrieve model using key and optional submodel_type.
:param key: Opaque model key
:param submodel_type: Type of the submodel to fetch
:param stats_name: A human-readable id for the model for the purposes of
stats reporting.
This may raise an IndexError if the model is not in the cache.
"""
pass
@abstractmethod
def cache_size(self) -> int:
"""Get the total size of the models currently cached."""
pass
@abstractmethod
def print_cuda_stats(self) -> None:
"""Log debugging information on CUDA usage."""
pass

View File

@@ -1,426 +0,0 @@
# Copyright (c) 2024 Lincoln D. Stein and the InvokeAI Development team
# TODO: Add Stalker's proper name to copyright
""" """
import gc
import math
import time
from contextlib import suppress
from logging import Logger
from typing import Dict, List, Optional
import torch
from invokeai.backend.model_manager import AnyModel, SubModelType
from invokeai.backend.model_manager.load.memory_snapshot import MemorySnapshot, get_pretty_snapshot_diff
from invokeai.backend.model_manager.load.model_cache.model_cache_base import (
CacheRecord,
CacheStats,
ModelCacheBase,
ModelLockerBase,
)
from invokeai.backend.model_manager.load.model_cache.model_locker import ModelLocker
from invokeai.backend.model_manager.load.model_util import calc_model_size_by_data
from invokeai.backend.util.devices import TorchDevice
from invokeai.backend.util.logging import InvokeAILogger
# Size of a GB in bytes.
GB = 2**30
# Size of a MB in bytes.
MB = 2**20
class ModelCache(ModelCacheBase[AnyModel]):
"""A cache for managing models in memory.
The cache is based on two levels of model storage:
- execution_device: The device where most models are executed (typically "cuda", "mps", or "cpu").
- storage_device: The device where models are offloaded when not in active use (typically "cpu").
The model cache is based on the following assumptions:
- storage_device_mem_size > execution_device_mem_size
- disk_to_storage_device_transfer_time >> storage_device_to_execution_device_transfer_time
A copy of all models in the cache is always kept on the storage_device. A subset of the models also have a copy on
the execution_device.
Models are moved between the storage_device and the execution_device as necessary. Cache size limits are enforced
on both the storage_device and the execution_device. The execution_device cache uses a smallest-first offload
policy. The storage_device cache uses a least-recently-used (LRU) offload policy.
Note: Neither of these offload policies has really been compared against alternatives. It's likely that different
policies would be better, although the optimal policies are likely heavily dependent on usage patterns and HW
configuration.
The cache returns context manager generators designed to load the model into the execution device (often GPU) within
the context, and unload outside the context.
Example usage:
```
cache = ModelCache(max_cache_size=7.5, max_vram_cache_size=6.0)
with cache.get_model('runwayml/stable-diffusion-1-5') as SD1:
do_something_on_gpu(SD1)
```
"""
def __init__(
self,
max_cache_size: float,
max_vram_cache_size: float,
execution_device: torch.device = torch.device("cuda"),
storage_device: torch.device = torch.device("cpu"),
precision: torch.dtype = torch.float16,
lazy_offloading: bool = True,
log_memory_usage: bool = False,
logger: Optional[Logger] = None,
):
"""
Initialize the model RAM cache.
:param max_cache_size: Maximum size of the storage_device cache in GBs.
:param max_vram_cache_size: Maximum size of the execution_device cache in GBs.
:param execution_device: Torch device to load active model into [torch.device('cuda')]
:param storage_device: Torch device to save inactive model in [torch.device('cpu')]
:param precision: Precision for loaded models [torch.float16]
:param lazy_offloading: Keep model in VRAM until another model needs to be loaded
:param log_memory_usage: If True, a memory snapshot will be captured before and after every model cache
operation, and the result will be logged (at debug level). There is a time cost to capturing the memory
snapshots, so it is recommended to disable this feature unless you are actively inspecting the model cache's
behaviour.
:param logger: InvokeAILogger to use (otherwise creates one)
"""
# allow lazy offloading only when vram cache enabled
self._lazy_offloading = lazy_offloading and max_vram_cache_size > 0
self._max_cache_size: float = max_cache_size
self._max_vram_cache_size: float = max_vram_cache_size
self._execution_device: torch.device = execution_device
self._storage_device: torch.device = storage_device
self._logger = logger or InvokeAILogger.get_logger(self.__class__.__name__)
self._log_memory_usage = log_memory_usage
self._stats: Optional[CacheStats] = None
self._cached_models: Dict[str, CacheRecord[AnyModel]] = {}
self._cache_stack: List[str] = []
@property
def logger(self) -> Logger:
"""Return the logger used by the cache."""
return self._logger
@property
def lazy_offloading(self) -> bool:
"""Return true if the cache is configured to lazily offload models in VRAM."""
return self._lazy_offloading
@property
def storage_device(self) -> torch.device:
"""Return the storage device (e.g. "CPU" for RAM)."""
return self._storage_device
@property
def execution_device(self) -> torch.device:
"""Return the exection device (e.g. "cuda" for VRAM)."""
return self._execution_device
@property
def max_cache_size(self) -> float:
"""Return the cap on cache size."""
return self._max_cache_size
@max_cache_size.setter
def max_cache_size(self, value: float) -> None:
"""Set the cap on cache size."""
self._max_cache_size = value
@property
def max_vram_cache_size(self) -> float:
"""Return the cap on vram cache size."""
return self._max_vram_cache_size
@max_vram_cache_size.setter
def max_vram_cache_size(self, value: float) -> None:
"""Set the cap on vram cache size."""
self._max_vram_cache_size = value
@property
def stats(self) -> Optional[CacheStats]:
"""Return collected CacheStats object."""
return self._stats
@stats.setter
def stats(self, stats: CacheStats) -> None:
"""Set the CacheStats object for collectin cache statistics."""
self._stats = stats
def cache_size(self) -> int:
"""Get the total size of the models currently cached."""
total = 0
for cache_record in self._cached_models.values():
total += cache_record.size
return total
def put(
self,
key: str,
model: AnyModel,
submodel_type: Optional[SubModelType] = None,
) -> None:
"""Store model under key and optional submodel_type."""
key = self._make_cache_key(key, submodel_type)
if key in self._cached_models:
return
size = calc_model_size_by_data(self.logger, model)
self.make_room(size)
running_on_cpu = self.execution_device == torch.device("cpu")
state_dict = model.state_dict() if isinstance(model, torch.nn.Module) and not running_on_cpu else None
cache_record = CacheRecord(key=key, model=model, device=self.storage_device, state_dict=state_dict, size=size)
self._cached_models[key] = cache_record
self._cache_stack.append(key)
def get(
self,
key: str,
submodel_type: Optional[SubModelType] = None,
stats_name: Optional[str] = None,
) -> ModelLockerBase:
"""
Retrieve model using key and optional submodel_type.
:param key: Opaque model key
:param submodel_type: Type of the submodel to fetch
:param stats_name: A human-readable id for the model for the purposes of
stats reporting.
This may raise an IndexError if the model is not in the cache.
"""
key = self._make_cache_key(key, submodel_type)
if key in self._cached_models:
if self.stats:
self.stats.hits += 1
else:
if self.stats:
self.stats.misses += 1
raise IndexError(f"The model with key {key} is not in the cache.")
cache_entry = self._cached_models[key]
# more stats
if self.stats:
stats_name = stats_name or key
self.stats.cache_size = int(self._max_cache_size * GB)
self.stats.high_watermark = max(self.stats.high_watermark, self.cache_size())
self.stats.in_cache = len(self._cached_models)
self.stats.loaded_model_sizes[stats_name] = max(
self.stats.loaded_model_sizes.get(stats_name, 0), cache_entry.size
)
# this moves the entry to the top (right end) of the stack
with suppress(Exception):
self._cache_stack.remove(key)
self._cache_stack.append(key)
return ModelLocker(
cache=self,
cache_entry=cache_entry,
)
def _capture_memory_snapshot(self) -> Optional[MemorySnapshot]:
if self._log_memory_usage:
return MemorySnapshot.capture()
return None
def _make_cache_key(self, model_key: str, submodel_type: Optional[SubModelType] = None) -> str:
if submodel_type:
return f"{model_key}:{submodel_type.value}"
else:
return model_key
def offload_unlocked_models(self, size_required: int) -> None:
"""Offload models from the execution_device to make room for size_required.
:param size_required: The amount of space to clear in the execution_device cache, in bytes.
"""
reserved = self._max_vram_cache_size * GB
vram_in_use = torch.cuda.memory_allocated() + size_required
self.logger.debug(f"{(vram_in_use/GB):.2f}GB VRAM needed for models; max allowed={(reserved/GB):.2f}GB")
for _, cache_entry in sorted(self._cached_models.items(), key=lambda x: x[1].size):
if vram_in_use <= reserved:
break
if not cache_entry.loaded:
continue
if not cache_entry.locked:
self.move_model_to_device(cache_entry, self.storage_device)
cache_entry.loaded = False
vram_in_use = torch.cuda.memory_allocated() + size_required
self.logger.debug(
f"Removing {cache_entry.key} from VRAM to free {(cache_entry.size/GB):.2f}GB; vram free = {(torch.cuda.memory_allocated()/GB):.2f}GB"
)
TorchDevice.empty_cache()
def move_model_to_device(self, cache_entry: CacheRecord[AnyModel], target_device: torch.device) -> None:
"""Move model into the indicated device.
:param cache_entry: The CacheRecord for the model
:param target_device: The torch.device to move the model into
May raise a torch.cuda.OutOfMemoryError
"""
self.logger.debug(f"Called to move {cache_entry.key} to {target_device}")
source_device = cache_entry.device
# Note: We compare device types only so that 'cuda' == 'cuda:0'.
# This would need to be revised to support multi-GPU.
if torch.device(source_device).type == torch.device(target_device).type:
return
# Some models don't have a `to` method, in which case they run in RAM/CPU.
if not hasattr(cache_entry.model, "to"):
return
# This roundabout method for moving the model around is done to avoid
# the cost of moving the model from RAM to VRAM and then back from VRAM to RAM.
# When moving to VRAM, we copy (not move) each element of the state dict from
# RAM to a new state dict in VRAM, and then inject it into the model.
# This operation is slightly faster than running `to()` on the whole model.
#
# When the model needs to be removed from VRAM we simply delete the copy
# of the state dict in VRAM, and reinject the state dict that is cached
# in RAM into the model. So this operation is very fast.
start_model_to_time = time.time()
snapshot_before = self._capture_memory_snapshot()
try:
if cache_entry.state_dict is not None:
assert hasattr(cache_entry.model, "load_state_dict")
if target_device == self.storage_device:
cache_entry.model.load_state_dict(cache_entry.state_dict, assign=True)
else:
new_dict: Dict[str, torch.Tensor] = {}
for k, v in cache_entry.state_dict.items():
new_dict[k] = v.to(target_device, copy=True)
cache_entry.model.load_state_dict(new_dict, assign=True)
cache_entry.model.to(target_device)
cache_entry.device = target_device
except Exception as e: # blow away cache entry
self._delete_cache_entry(cache_entry)
raise e
snapshot_after = self._capture_memory_snapshot()
end_model_to_time = time.time()
self.logger.debug(
f"Moved model '{cache_entry.key}' from {source_device} to"
f" {target_device} in {(end_model_to_time-start_model_to_time):.2f}s."
f"Estimated model size: {(cache_entry.size/GB):.3f} GB."
f"{get_pretty_snapshot_diff(snapshot_before, snapshot_after)}"
)
if (
snapshot_before is not None
and snapshot_after is not None
and snapshot_before.vram is not None
and snapshot_after.vram is not None
):
vram_change = abs(snapshot_before.vram - snapshot_after.vram)
# If the estimated model size does not match the change in VRAM, log a warning.
if not math.isclose(
vram_change,
cache_entry.size,
rel_tol=0.1,
abs_tol=10 * MB,
):
self.logger.debug(
f"Moving model '{cache_entry.key}' from {source_device} to"
f" {target_device} caused an unexpected change in VRAM usage. The model's"
" estimated size may be incorrect. Estimated model size:"
f" {(cache_entry.size/GB):.3f} GB.\n"
f"{get_pretty_snapshot_diff(snapshot_before, snapshot_after)}"
)
def print_cuda_stats(self) -> None:
"""Log CUDA diagnostics."""
vram = "%4.2fG" % (torch.cuda.memory_allocated() / GB)
ram = "%4.2fG" % (self.cache_size() / GB)
in_ram_models = 0
in_vram_models = 0
locked_in_vram_models = 0
for cache_record in self._cached_models.values():
if hasattr(cache_record.model, "device"):
if cache_record.model.device == self.storage_device:
in_ram_models += 1
else:
in_vram_models += 1
if cache_record.locked:
locked_in_vram_models += 1
self.logger.debug(
f"Current VRAM/RAM usage: {vram}/{ram}; models_in_ram/models_in_vram(locked) ="
f" {in_ram_models}/{in_vram_models}({locked_in_vram_models})"
)
def make_room(self, size: int) -> None:
"""Make enough room in the cache to accommodate a new model of indicated size.
Note: This function deletes all of the cache's internal references to a model in order to free it. If there are
external references to the model, there's nothing that the cache can do about it, and those models will not be
garbage-collected.
"""
bytes_needed = size
maximum_size = self.max_cache_size * GB # stored in GB, convert to bytes
current_size = self.cache_size()
if current_size + bytes_needed > maximum_size:
self.logger.debug(
f"Max cache size exceeded: {(current_size/GB):.2f}/{self.max_cache_size:.2f} GB, need an additional"
f" {(bytes_needed/GB):.2f} GB"
)
self.logger.debug(f"Before making_room: cached_models={len(self._cached_models)}")
pos = 0
models_cleared = 0
while current_size + bytes_needed > maximum_size and pos < len(self._cache_stack):
model_key = self._cache_stack[pos]
cache_entry = self._cached_models[model_key]
device = cache_entry.model.device if hasattr(cache_entry.model, "device") else None
self.logger.debug(
f"Model: {model_key}, locks: {cache_entry._locks}, device: {device}, loaded: {cache_entry.loaded}"
)
if not cache_entry.locked:
self.logger.debug(
f"Removing {model_key} from RAM cache to free at least {(size/GB):.2f} GB (-{(cache_entry.size/GB):.2f} GB)"
)
current_size -= cache_entry.size
models_cleared += 1
self._delete_cache_entry(cache_entry)
del cache_entry
else:
pos += 1
if models_cleared > 0:
# There would likely be some 'garbage' to be collected regardless of whether a model was cleared or not, but
# there is a significant time cost to calling `gc.collect()`, so we want to use it sparingly. (The time cost
# is high even if no garbage gets collected.)
#
# Calling gc.collect(...) when a model is cleared seems like a good middle-ground:
# - If models had to be cleared, it's a signal that we are close to our memory limit.
# - If models were cleared, there's a good chance that there's a significant amount of garbage to be
# collected.
#
# Keep in mind that gc is only responsible for handling reference cycles. Most objects should be cleaned up
# immediately when their reference count hits 0.
if self.stats:
self.stats.cleared = models_cleared
gc.collect()
TorchDevice.empty_cache()
self.logger.debug(f"After making room: cached_models={len(self._cached_models)}")
def _delete_cache_entry(self, cache_entry: CacheRecord[AnyModel]) -> None:
self._cache_stack.remove(cache_entry.key)
del self._cached_models[cache_entry.key]

View File

@@ -1,64 +0,0 @@
"""
Base class and implementation of a class that moves models in and out of VRAM.
"""
from typing import Dict, Optional
import torch
from invokeai.backend.model_manager import AnyModel
from invokeai.backend.model_manager.load.model_cache.model_cache_base import (
CacheRecord,
ModelCacheBase,
ModelLockerBase,
)
class ModelLocker(ModelLockerBase):
"""Internal class that mediates movement in and out of GPU."""
def __init__(self, cache: ModelCacheBase[AnyModel], cache_entry: CacheRecord[AnyModel]):
"""
Initialize the model locker.
:param cache: The ModelCache object
:param cache_entry: The entry in the model cache
"""
self._cache = cache
self._cache_entry = cache_entry
@property
def model(self) -> AnyModel:
"""Return the model without moving it around."""
return self._cache_entry.model
def get_state_dict(self) -> Optional[Dict[str, torch.Tensor]]:
"""Return the state dict (if any) for the cached model."""
return self._cache_entry.state_dict
def lock(self) -> AnyModel:
"""Move the model into the execution device (GPU) and lock it."""
self._cache_entry.lock()
try:
if self._cache.lazy_offloading:
self._cache.offload_unlocked_models(self._cache_entry.size)
self._cache.move_model_to_device(self._cache_entry, self._cache.execution_device)
self._cache_entry.loaded = True
self._cache.logger.debug(f"Locking {self._cache_entry.key} in {self._cache.execution_device}")
self._cache.print_cuda_stats()
except torch.cuda.OutOfMemoryError:
self._cache.logger.warning("Insufficient GPU memory to load model. Aborting")
self._cache_entry.unlock()
raise
except Exception:
self._cache_entry.unlock()
raise
return self.model
def unlock(self) -> None:
"""Call upon exit from context."""
self._cache_entry.unlock()
if not self._cache.lazy_offloading:
self._cache.offload_unlocked_models(0)
self._cache.print_cuda_stats()

View File

@@ -0,0 +1,33 @@
from typing import Any, Callable
import torch
from torch.overrides import TorchFunctionMode
def add_autocast_to_module_forward(m: torch.nn.Module, to_device: torch.device):
"""Monkey-patch m.forward(...) with a new forward(...) method that activates device autocasting for its duration."""
old_forward = m.forward
def new_forward(*args: Any, **kwargs: Any):
with TorchFunctionAutocastDeviceContext(to_device):
return old_forward(*args, **kwargs)
m.forward = new_forward
def _cast_to_device_and_run(
func: Callable[..., Any], args: tuple[Any, ...], kwargs: dict[str, Any], to_device: torch.device
):
args_on_device = [a.to(to_device) if isinstance(a, torch.Tensor) else a for a in args]
kwargs_on_device = {k: v.to(to_device) if isinstance(v, torch.Tensor) else v for k, v in kwargs.items()}
return func(*args_on_device, **kwargs_on_device)
class TorchFunctionAutocastDeviceContext(TorchFunctionMode):
def __init__(self, to_device: torch.device):
self._to_device = to_device
def __torch_function__(
self, func: Callable[..., Any], types, args: tuple[Any, ...] = (), kwargs: dict[str, Any] | None = None
):
return _cast_to_device_and_run(func, args, kwargs or {}, self._to_device)
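A minimal sketch of the autocast hook in use, assuming a CUDA device: the module's weights stay on the CPU, and the patched forward() casts operands to the compute device on the fly.
```
import torch

model = torch.nn.Linear(8, 8)                        # parameters remain on the CPU
add_autocast_to_module_forward(model, torch.device("cuda"))

x = torch.randn(1, 8, device="cuda")
y = model(x)        # __torch_function__ casts the CPU weight/bias to "cuda" per op
print(y.device)     # cuda:0
```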

View File

@@ -0,0 +1,41 @@
from pathlib import Path
from typing import Optional
from transformers import CLIPVisionModelWithProjection
from invokeai.backend.model_manager.config import (
AnyModel,
AnyModelConfig,
BaseModelType,
DiffusersConfigBase,
ModelFormat,
ModelType,
SubModelType,
)
from invokeai.backend.model_manager.load.load_default import ModelLoader
from invokeai.backend.model_manager.load.model_loader_registry import ModelLoaderRegistry
@ModelLoaderRegistry.register(base=BaseModelType.Any, type=ModelType.CLIPVision, format=ModelFormat.Diffusers)
class ClipVisionLoader(ModelLoader):
"""Class to load CLIPVision models."""
def _load_model(
self,
config: AnyModelConfig,
submodel_type: Optional[SubModelType] = None,
) -> AnyModel:
if not isinstance(config, DiffusersConfigBase):
raise ValueError("Only DiffusersConfigBase models are currently supported here.")
if submodel_type is not None:
raise Exception("There are no submodels in CLIP Vision models.")
model_path = Path(config.path)
model = CLIPVisionModelWithProjection.from_pretrained(
model_path, torch_dtype=self._torch_dtype, local_files_only=True
)
assert isinstance(model, CLIPVisionModelWithProjection)
return model

View File

@@ -19,6 +19,10 @@ from invokeai.backend.flux.controlnet.state_dict_utils import (
is_state_dict_xlabs_controlnet,
)
from invokeai.backend.flux.controlnet.xlabs_controlnet_flux import XLabsControlNetFlux
from invokeai.backend.flux.ip_adapter.state_dict_utils import infer_xlabs_ip_adapter_params_from_state_dict
from invokeai.backend.flux.ip_adapter.xlabs_ip_adapter_flux import (
XlabsIpAdapterFlux,
)
from invokeai.backend.flux.model import Flux
from invokeai.backend.flux.modules.autoencoder import AutoEncoder
from invokeai.backend.flux.util import ae_params, params
@@ -35,6 +39,7 @@ from invokeai.backend.model_manager.config import (
CLIPEmbedDiffusersConfig,
ControlNetCheckpointConfig,
ControlNetDiffusersConfig,
IPAdapterCheckpointConfig,
MainBnbQuantized4bCheckpointConfig,
MainCheckpointConfig,
MainGGUFCheckpointConfig,
@@ -79,7 +84,15 @@ class FluxVAELoader(ModelLoader):
model = AutoEncoder(ae_params[config.config_path])
sd = load_file(model_path)
model.load_state_dict(sd, assign=True)
model.to(dtype=self._torch_dtype)
# VAE is broken in float16, which mps defaults to
if self._torch_dtype == torch.float16:
try:
vae_dtype = torch.tensor([1.0], dtype=torch.bfloat16, device=self._torch_device).dtype
except TypeError:
vae_dtype = torch.float32
else:
vae_dtype = self._torch_dtype
model.to(vae_dtype)
return model
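The float16 special case above is, in effect, a runtime probe for bfloat16 support (allocating a bfloat16 tensor raises TypeError on devices that lack it). A standalone sketch of the same idea; the helper name is hypothetical:
```
import torch

def pick_vae_dtype(requested: torch.dtype, device: torch.device) -> torch.dtype:
    if requested != torch.float16:
        return requested
    try:
        # Succeeds and yields torch.bfloat16 where the device supports bf16.
        return torch.tensor([1.0], dtype=torch.bfloat16, device=device).dtype
    except TypeError:
        return torch.float32
```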
@@ -123,9 +136,9 @@ class BnbQuantizedLlmInt8bCheckpointModel(ModelLoader):
"The bnb modules are not available. Please install bitsandbytes if available on your platform."
)
match submodel_type:
case SubModelType.Tokenizer2:
case SubModelType.Tokenizer2 | SubModelType.Tokenizer3:
return T5Tokenizer.from_pretrained(Path(config.path) / "tokenizer_2", max_length=512)
case SubModelType.TextEncoder2:
case SubModelType.TextEncoder2 | SubModelType.TextEncoder3:
te2_model_path = Path(config.path) / "text_encoder_2"
model_config = AutoConfig.from_pretrained(te2_model_path)
with accelerate.init_empty_weights():
@@ -167,10 +180,10 @@ class T5EncoderCheckpointModel(ModelLoader):
            raise ValueError("Only T5EncoderConfig models are currently supported here.")
        match submodel_type:
-            case SubModelType.Tokenizer2:
+            case SubModelType.Tokenizer2 | SubModelType.Tokenizer3:
                return T5Tokenizer.from_pretrained(Path(config.path) / "tokenizer_2", max_length=512)
-            case SubModelType.TextEncoder2:
-                return T5EncoderModel.from_pretrained(Path(config.path) / "text_encoder_2")
+            case SubModelType.TextEncoder2 | SubModelType.TextEncoder3:
+                return T5EncoderModel.from_pretrained(Path(config.path) / "text_encoder_2", torch_dtype="auto")

        raise ValueError(
            f"Only Tokenizer and TextEncoder submodels are currently supported. Received: {submodel_type.value if submodel_type else 'None'}"
@@ -352,3 +365,26 @@ class FluxControlnetModel(ModelLoader):
        model.load_state_dict(sd, assign=True)
        return model


+@ModelLoaderRegistry.register(base=BaseModelType.Flux, type=ModelType.IPAdapter, format=ModelFormat.Checkpoint)
+class FluxIpAdapterModel(ModelLoader):
+    """Class to load FLUX IP-Adapter models."""
+
+    def _load_model(
+        self,
+        config: AnyModelConfig,
+        submodel_type: Optional[SubModelType] = None,
+    ) -> AnyModel:
+        if not isinstance(config, IPAdapterCheckpointConfig):
+            raise ValueError(f"Unexpected model config type: {type(config)}.")
+
+        sd = load_file(Path(config.path))
+        params = infer_xlabs_ip_adapter_params_from_state_dict(sd)
+
+        with accelerate.init_empty_weights():
+            model = XlabsIpAdapterFlux(params=params)
+
+        model.load_xlabs_state_dict(sd, assign=True)
+        return model
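The loader above relies on the meta-device initialization pattern: build the module under accelerate.init_empty_weights() so no memory is allocated, then materialize it by loading a state dict with assign=True. A hedged sketch of that pattern in isolation (the toy Linear module and state dict are illustrative, not part of the diff):

# Hedged sketch of the init_empty_weights + load_state_dict(assign=True) pattern (toy module).
import accelerate
import torch

with accelerate.init_empty_weights():
    # Parameters are created on the meta device, so no real memory is allocated here.
    layer = torch.nn.Linear(8, 8)

# With assign=True (torch >= 2.1), the meta parameters are replaced by the tensors
# from the state dict instead of being copied into (non-existent) storage.
layer.load_state_dict({"weight": torch.randn(8, 8), "bias": torch.zeros(8)}, assign=True)

print(layer.weight.device)  # cpu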

View File

@@ -22,7 +22,6 @@ from invokeai.backend.model_manager.load.load_default import ModelLoader
from invokeai.backend.model_manager.load.model_loader_registry import ModelLoaderRegistry


-@ModelLoaderRegistry.register(base=BaseModelType.Any, type=ModelType.CLIPVision, format=ModelFormat.Diffusers)
@ModelLoaderRegistry.register(base=BaseModelType.Any, type=ModelType.T2IAdapter, format=ModelFormat.Diffusers)
class GenericDiffusersLoader(ModelLoader):
    """Class to load simple diffusers models."""

View File

@@ -26,7 +26,7 @@ from invokeai.backend.model_manager import (
    SubModelType,
)
from invokeai.backend.model_manager.load.load_default import ModelLoader
-from invokeai.backend.model_manager.load.model_cache.model_cache_base import ModelCacheBase
+from invokeai.backend.model_manager.load.model_cache.model_cache import ModelCache
from invokeai.backend.model_manager.load.model_loader_registry import ModelLoaderRegistry
@@ -40,7 +40,7 @@ class LoRALoader(ModelLoader):
        self,
        app_config: InvokeAIAppConfig,
        logger: Logger,
-        ram_cache: ModelCacheBase[AnyModel],
+        ram_cache: ModelCache,
    ):
        """Initialize the loader."""
        super().__init__(app_config, logger, ram_cache)

View File

@@ -25,6 +25,7 @@ from invokeai.backend.model_manager.config import (
    DiffusersConfigBase,
    MainCheckpointConfig,
)
+from invokeai.backend.model_manager.load.model_cache.model_cache import get_model_cache_key
from invokeai.backend.model_manager.load.model_loader_registry import ModelLoaderRegistry
from invokeai.backend.model_manager.load.model_loaders.generic_diffusers import GenericDiffusersLoader
from invokeai.backend.util.silence_warnings import SilenceWarnings
@@ -42,6 +43,7 @@ VARIANT_TO_IN_CHANNEL_MAP = {
@ModelLoaderRegistry.register(
    base=BaseModelType.StableDiffusionXLRefiner, type=ModelType.Main, format=ModelFormat.Diffusers
)
+@ModelLoaderRegistry.register(base=BaseModelType.StableDiffusion3, type=ModelType.Main, format=ModelFormat.Diffusers)
@ModelLoaderRegistry.register(base=BaseModelType.StableDiffusion1, type=ModelType.Main, format=ModelFormat.Checkpoint)
@ModelLoaderRegistry.register(base=BaseModelType.StableDiffusion2, type=ModelType.Main, format=ModelFormat.Checkpoint)
@ModelLoaderRegistry.register(base=BaseModelType.StableDiffusionXL, type=ModelType.Main, format=ModelFormat.Checkpoint)
@@ -51,13 +53,6 @@ VARIANT_TO_IN_CHANNEL_MAP = {
class StableDiffusionDiffusersModel(GenericDiffusersLoader):
    """Class to load main models."""

-    model_base_to_model_type = {
-        BaseModelType.StableDiffusion1: "FrozenCLIPEmbedder",
-        BaseModelType.StableDiffusion2: "FrozenOpenCLIPEmbedder",
-        BaseModelType.StableDiffusionXL: "SDXL",
-        BaseModelType.StableDiffusionXLRefiner: "SDXL-Refiner",
-    }
-
    def _load_model(
        self,
        config: AnyModelConfig,
@@ -117,8 +112,6 @@ class StableDiffusionDiffusersModel(GenericDiffusersLoader):
            load_class = load_classes[config.base][config.variant]
        except KeyError as e:
            raise Exception(f"No diffusers pipeline known for base={config.base}, variant={config.variant}") from e

-        prediction_type = config.prediction_type.value
-        upcast_attention = config.upcast_attention

        # Without SilenceWarnings we get log messages like this:
        # site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
@@ -129,13 +122,7 @@ class StableDiffusionDiffusersModel(GenericDiffusersLoader):
        # ['text_model.embeddings.position_ids']
        with SilenceWarnings():
-            pipeline = load_class.from_single_file(
-                config.path,
-                torch_dtype=self._torch_dtype,
-                prediction_type=prediction_type,
-                upcast_attention=upcast_attention,
-                load_safety_checker=False,
-            )
+            pipeline = load_class.from_single_file(config.path, torch_dtype=self._torch_dtype)

        if not submodel_type:
            return pipeline
@@ -146,5 +133,5 @@ class StableDiffusionDiffusersModel(GenericDiffusersLoader):
            if subtype == submodel_type:
                continue
            if submodel := getattr(pipeline, subtype.value, None):
-                self._ram_cache.put(config.key, submodel_type=subtype, model=submodel)
+                self._ram_cache.put(get_model_cache_key(config.key, subtype), model=submodel)
        return getattr(pipeline, submodel_type.value)
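The put(...) call now takes a single pre-composed cache key instead of a separate submodel_type argument, keeping the cache itself unaware of submodels. Purely as an illustration of that shift, a hypothetical stand-in for the key helper (not the real get_model_cache_key, which lives in invokeai.backend.model_manager.load.model_cache.model_cache):

# Hypothetical stand-in for illustration only; the real helper is in the model_cache module.
from typing import Optional

def make_cache_key(model_key: str, submodel_type: Optional[str] = None) -> str:
    # Fold the submodel type into the key so the cache stores flat keys only.
    return f"{model_key}:{submodel_type}" if submodel_type else model_key

print(make_cache_key("abc123", "vae"))  # abc123:vae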

View File

@@ -20,7 +20,7 @@ from typing import Optional
import requests
from huggingface_hub import HfApi, configure_http_backend, hf_hub_url
-from huggingface_hub.utils._errors import RepositoryNotFoundError, RevisionNotFoundError
+from huggingface_hub.errors import RepositoryNotFoundError, RevisionNotFoundError
from pydantic.networks import AnyHttpUrl
from requests.sessions import Session
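The exception classes now come from the public huggingface_hub.errors module rather than the private huggingface_hub.utils._errors path. A hedged sketch of catching them from the new location (the helper function and repo id are illustrative, not part of the diff):

# Hedged sketch: the exceptions are now imported from huggingface_hub.errors (public path).
from huggingface_hub import HfApi
from huggingface_hub.errors import RepositoryNotFoundError, RevisionNotFoundError

def repo_is_reachable(repo_id: str, revision: str | None = None) -> bool:
    try:
        HfApi().model_info(repo_id, revision=revision)
        return True
    except (RepositoryNotFoundError, RevisionNotFoundError):
        return False

print(repo_is_reachable("black-forest-labs/FLUX.1-schnell"))  # True if the repo is reachable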

Some files were not shown because too many files have changed in this diff.