Compare commits

...

338 Commits

Author SHA1 Message Date
Lincoln Stein
a81acacb2a fix ruff error 2024-12-02 19:37:51 -05:00
Lincoln Stein
f4f6296183 resolved conflicts with main 2024-12-02 19:35:43 -05:00
psychedelicious
13703d8f55 chore: bump version to v5.4.3rc2 2024-12-02 15:02:30 -08:00
psychedelicious
60d838d0a5 chore(ui): update whats new copy 2024-12-02 15:02:30 -08:00
Riccardo Giovanetti
2a157a44bf translationBot(ui): update translation (Italian)
Currently translated at 99.3% (1633 of 1643 strings)

Co-authored-by: Riccardo Giovanetti <riccardo.giovanetti@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
Translation: InvokeAI/Web UI
2024-12-02 14:52:05 -08:00
James Reynolds
d61b5833c2 Fix documentation broken links and remove whitespace at end of lines 2024-12-02 14:49:53 -08:00
Jonathan
c094838c6a Update model_util.py 2024-12-02 14:35:02 -08:00
Hosted Weblate
2d334c8dd8 translationBot(ui): update translation files
Updated by "Cleanup translation files" hook in Weblate.

Co-authored-by: Hosted Weblate <hosted@weblate.org>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/
Translation: InvokeAI/Web UI
2024-12-02 14:05:51 -08:00
Mary Hipp
a6be26e174 fix(worker): only apply processor cancel logic if cancel event is for current queue item 2024-12-02 14:03:05 -08:00
psychedelicious
f8c7adddd0 feat(ui): add vietnamese to language picker
Closes #7384
2024-12-02 08:12:14 -05:00
psychedelicious
17da1d92e9 fix(ui): remove "adding to" text on Invoke tooltip on Workflows/Upscaling tabs
The "adding to" text indicates if images are going to the gallery or staging area. This info is relevant only to the canvas tab, but was displayed on Upscaling and Workflows tabs. Removed it from those tabs.
2024-12-02 08:08:16 -05:00
psychedelicious
1cc57a4854 chore(ui): lint 2024-12-02 07:59:12 -05:00
psychedelicious
3993fae331 fix(ui): unable to invoke w/ empty inpaint mask or raster layer
Removed the empty state checks for these layer types - it's always OK to invoke when they are empty.
2024-12-02 07:59:12 -05:00
psychedelicious
1446526d55 tidy(ui): translation keys for canvas layer warnings 2024-12-02 07:59:12 -05:00
psychedelicious
62c024e725 feat(ui): add gallery image ctx menu items to create ref image from image
Appears these actions disappeared at some point. Restoring them.
2024-12-02 07:52:58 -05:00
psychedelicious
1e92bb4e94 fix(ui): ref image defaults to prev ref image's image selection
A redux selector is used to get the "default" IP Adapter. The selector uses the model list query result to select an IP Adapter model to be preset by default.

The selector is memoized, so if we mutate the returned default IP Adapter state, it mutates the result of the selector for all consumers.

For example, the `image` property of the default IP Adapter selector result is `null`. When we set the `image` property of the selector result while creating an IP Adapter, this does not trigger the selector to recompute its result. We end up setting the image for the selector result directly, and all other consumers now have that same image set.

Solution - we need to clone the selector result everywhere it is used. This was missed in a few spots, causing the issue.
2024-12-02 07:48:39 -05:00
psychedelicious
db6398fdf6 feat(ui): less confusing empty state for rg ref images
It was easy to misunderstand the empty state for a regional guidance reference image. There was no label, so it seemed like it was the whole region that was empty.

This small change adds the "Reference Image" heading to the empty state, so it's clear that the empty state messaging refers to this reference image, not the whole regional guidance layer.
2024-12-02 07:46:10 -05:00
Riccardo Giovanetti
ebd73a2ac2 translationBot(ui): update translation (Italian)
Currently translated at 98.7% (1622 of 1643 strings)

Co-authored-by: Riccardo Giovanetti <riccardo.giovanetti@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
Translation: InvokeAI/Web UI
2024-12-02 02:13:51 -08:00
Hosted Weblate
8ee95cab00 translationBot(ui): update translation files
Updated by "Cleanup translation files" hook in Weblate.

Co-authored-by: Hosted Weblate <hosted@weblate.org>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/
Translation: InvokeAI/Web UI
2024-12-02 02:13:51 -08:00
Linos
d1184201a8 translationBot(ui): update translation (Vietnamese)
Currently translated at 100.0% (1643 of 1643 strings)

translationBot(ui): update translation (Vietnamese)

Currently translated at 100.0% (1638 of 1638 strings)

Co-authored-by: Linos <linos.coding@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/vi/
Translation: InvokeAI/Web UI
2024-12-02 02:13:51 -08:00
Nik Nikovsky
5887891654 translationBot(ui): update translation (Polish)
Currently translated at 4.9% (81 of 1638 strings)

Co-authored-by: Nik Nikovsky <zejdzztegomaila@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/pl/
Translation: InvokeAI/Web UI
2024-12-02 02:13:51 -08:00
Riku
765ca4e004 translationBot(ui): update translation (German)
Currently translated at 69.7% (1142 of 1638 strings)

Co-authored-by: Riku <riku.block@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/de/
Translation: InvokeAI/Web UI
2024-12-02 02:13:51 -08:00
Riku
159b00a490 fix(app): adjust session queue api type 2024-12-01 20:06:05 -08:00
Riku
3fbf6f2d2a chore(ui): update typegen schema 2024-12-01 19:56:09 -08:00
Riku
931fca7cd1 fix(ui): call cancel instead of clear queue 2024-12-01 19:53:12 -08:00
Riku
db84a3a5d4 refactor(ui): move clear queue hook to separate file 2024-12-01 19:42:25 -08:00
psychedelicious
ca8313e805 feat(ui): add new layer from image menu items for staging area
The layers are disabled when created so as to not interfere with the canvas state.
2024-12-01 19:37:49 -08:00
psychedelicious
df849035ee feat(ui): allow setting isEnabled, isLocked and name in createNewCanvasEntityFromImage util 2024-12-01 19:37:49 -08:00
psychedelicious
8d97fe69ca feat(ui): use imageDTOToFile in staging area save to gallery button 2024-12-01 19:37:49 -08:00
psychedelicious
9044e53a9b feat(ui): add imageDTOToFile util 2024-12-01 19:37:49 -08:00
Jonathan
6012b0f912 Update flux_text_encoder.py
Updated version number for FLUX Text Encoding.
2024-11-30 08:29:21 -05:00
Jonathan
bb0ed5dc8a Update flux_denoise.py
Updated node version for FLUX Denoise.
2024-11-30 08:29:21 -05:00
Ryan Dick
021552fd81 Avoid unnecessary dtype conversions with rope encodings. 2024-11-29 12:32:50 -05:00
Ryan Dick
be73dbba92 Use view() instead of rearrange() for better performance. 2024-11-29 12:32:50 -05:00
Ryan Dick
db9c0cad7c Replace custom RMSNorm implementation with torch.nn.functional.rms_norm(...) for improved speed. 2024-11-29 12:32:50 -05:00
Ryan Dick
54b7f9a063 FLUX Regional Prompting (#7388)
## Summary

This PR adds support for regional prompting with FLUX.

### Example 1
Global prompt: `An architecture rendering of the reception area of a
corporate office with modern decor.`
<img width="1386" alt="image"
src="https://github.com/user-attachments/assets/c8169bdb-49a9-44bc-bd9e-58d98e09094b">

![image](https://github.com/user-attachments/assets/4a426be9-9d7a-4527-b27c-2d2514ee73fe)

## QA Instructions

- [x] Test that there is no slowdown in the base case with a single
global prompt.
- [x] Test image fully covered by regional masks.
- [x] Test image covered by region masks with small gaps.
- [x] Test region masks with large unmasked ‘background’ regions
- [x] Test region masks with significant overlap
- [x] Test multiple global prompts.
- [x] Test no global prompt.
- [x] Test regional negative prompts (It runs... but results are not
great. Needs more tuning to be useful.)
- Test compatibility with:
    - [x] ControlNet
    - [x] LoRA
    - [x] IP-Adapter

## Remaining TODO

- [x] Disable the following UI features for FLUX prompt regions:
negative prompts, reference images, auto-negative.

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a
changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_
2024-11-29 08:56:42 -05:00
psychedelicious
7d488a5352 feat(ui): add delete button to regional ref image empty state 2024-11-29 15:51:24 +10:00
psychedelicious
4d7667f63d fix(ui): add missing translations 2024-11-29 15:43:49 +10:00
psychedelicious
08704ee8ec feat(ui): use canvas layer validators in control/ip adapter graph builders 2024-11-29 15:32:48 +10:00
psychedelicious
5910892c33 Merge remote-tracking branch 'origin/main' into ryan/flux-regional-prompting 2024-11-29 15:19:39 +10:00
psychedelicious
46a09d9e90 feat(ui): format warnings tooltip 2024-11-29 13:32:51 +10:00
psychedelicious
df0c7d73f3 feat(ui): use regional guidance validation utils in graph builders 2024-11-29 13:26:09 +10:00
psychedelicious
3905c97e32 feat(ui): return translation keys from validation utils instead of translated strings 2024-11-29 13:25:09 +10:00
psychedelicious
0be796a808 feat(ui): use layer validation utils in invoke readiness utils 2024-11-29 13:14:26 +10:00
psychedelicious
7dd33b0f39 feat(ui): add indicator to canvas layer headers, displaying validation warnings
If there are any issues with the layer, the icon is displayed. If the layer is disabled, the icon is greyed out but still visible.
2024-11-29 13:13:47 +10:00
psychedelicious
484aaf1595 feat(ui): add canvas layer validation utils
These helpers consolidate layer validation checks. For example, checking that the layer has content drawn, is compatible with the selected main model, has valid reference images, etc.
2024-11-29 13:12:32 +10:00
psychedelicious
c276b60af9 tidy(ui): use object for addRegions graph builder util arg 2024-11-29 08:49:41 +10:00
Ryan Dick
5d8dd6e26e Fix FLUX regional negative prompts. 2024-11-28 18:49:29 +00:00
Emmanuel Ferdman
5bca68d873 docs: update code of conduct reference
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2024-11-27 17:38:33 -08:00
Ryan Dick
64364e7911 Short-circuit if there are no region masks in FLUX and don't apply attention masking. 2024-11-27 22:40:10 +00:00
Ryan Dick
6565cea039 Comment unused _prepare_unrestricted_attn_mask(...) for future reference. 2024-11-27 22:16:44 +00:00
Ryan Dick
3ebd8d6c07 Delete outdated TODO comment. 2024-11-27 22:13:25 +00:00
Ryan Dick
e970185161 Tweak flux regional prompting attention scheme based on latest experimentation. 2024-11-27 22:13:07 +00:00
Ryan Dick
fa5653cdf7 Remove unused 'denoise' param to addRegions(). 2024-11-27 17:08:42 +00:00
Ryan Dick
9a7b000995 Update frontend to support regional prompts with FLUX in the canvas. 2024-11-27 17:04:43 +00:00
Ryan Dick
3a27242838 Bump transformers. The main motivation for this bump is to ingest a fix for DepthAnything postprocessing artifacts. 2024-11-27 07:46:16 -08:00
Ryan Dick
8cfb032051 Add utility ImagePanelLayoutInvocation for working with In-Context LoRA workflows. 2024-11-26 20:58:31 -08:00
Ryan Dick
06a9d4e2b2 Use a Textarea component for the FluxTextEncoderInvocation prompt field. 2024-11-26 20:58:31 -08:00
Brandon Rising
ed46acee79 fix: Fail scan on InvalidMagicError in picklescan, update default for read_checkpoint_meta to scan unless explicitly told not to 2024-11-26 16:17:12 -05:00
Ryan Dick
b54463d294 Allow regional prompting background regions to attend to themselves and to the entire txt embedding. 2024-11-26 17:57:31 +00:00
Ryan Dick
faee79dc95 Distinguish between restricted and unrestricted attn masks in FLUX regional prompting. 2024-11-26 16:55:52 +00:00
Mary Hipp
965cd76e33 lint fix 2024-11-26 11:25:53 -05:00
Mary Hipp
e5e8cbf34c shorten reference image mode descriptions; 2024-11-26 11:25:53 -05:00
Mary Hipp
3412a52594 (ui): updates various informational tooltips, adds descriptons to IP adapter method options 2024-11-26 11:25:53 -05:00
Ryan Dick
e01f66b026 Apply regional attention masks in the single stream blocks in addition to the double stream blocks. 2024-11-25 22:40:08 +00:00
Ryan Dick
53abdde242 Update Flux RegionalPromptingExtension to prepare both a mask with restricted image self-attention and a mask with unrestricted image self attention. 2024-11-25 22:04:23 +00:00
Ryan Dick
94c088300f Be smarter about selecting the global CLIP embedding for FLUX regional prompting. 2024-11-25 20:15:04 +00:00
Ryan Dick
3741a6f5e0 Fix device handling for regional masks and apply the attention mask in the FLUX double stream block. 2024-11-25 16:02:03 +00:00
Kent Keirsey
059336258f Create SECURITY.md 2024-11-25 04:10:03 -08:00
Ryan Dick
2c23b8414c Use a single global CLIP embedding for FLUX regional guidance. 2024-11-22 23:01:43 +00:00
Mary Hipp
271cc52c80 fix(ui): use token for download if its in store 2024-11-22 12:08:05 -05:00
Ryan Dick
20356c0746 Fixup the logic for preparing FLUX regional prompt attention masks. 2024-11-21 22:46:25 +00:00
psychedelicious
e44458609f chore: bump version to v5.4.3rc1 2024-11-21 10:32:43 -08:00
psychedelicious
69d86a7696 feat(ui): address feedback 2024-11-21 09:54:35 -08:00
Hippalectryon
56db1a9292 Use proxyrect and setEntityPosition to sync transformer position 2024-11-21 09:54:35 -08:00
Hippalectryon
cf50e5eeee Make sure the canvas is focused 2024-11-21 09:54:35 -08:00
Hippalectryon
c9c07968d2 lint 2024-11-21 09:54:35 -08:00
Hippalectryon
97d0757176 use $isInteractable instead of $isDisabled 2024-11-21 09:54:35 -08:00
Hippalectryon
0f51b677a9 refactor 2024-11-21 09:54:35 -08:00
Hippalectryon
56ca94c3a9 Don't move if the layer is disabled
Lint
2024-11-21 09:54:35 -08:00
Hippalectryon
28d169f859 Allow moving layers using the keyboard 2024-11-21 09:54:35 -08:00
psychedelicious
92f71d99ee tweak(ui): use X icon for rg ref image delete button 2024-11-21 08:50:39 -08:00
psychedelicious
0764c02b1d tweak(ui): code style 2024-11-21 08:50:39 -08:00
psychedelicious
081c7569fe feat(ui): add global ref image empty state 2024-11-21 08:50:39 -08:00
psychedelicious
20f6532ee8 feat(ui): add empty state for regional guidance ref image 2024-11-21 08:50:39 -08:00
Mary Hipp
b9e8910478 feat(ui): add actions for video modal clicks 2024-11-21 11:15:55 -05:00
Mary Hipp
ded8391e3c use nanostore for schema parsed instead 2024-11-20 20:13:31 -05:00
Mary Hipp
e9dd2c396a limit to one hook 2024-11-20 20:13:31 -05:00
Mary Hipp
0d86de0cb5 fix(ui): make sure schema has loaded before trying to load any workflows 2024-11-20 20:13:31 -05:00
Ryan Dick
bad1149504 WIP - add rough logic for preparing the FLUX regional prompting attention mask. 2024-11-20 22:29:36 +00:00
Ryan Dick
fda7aaa7ca Pass RegionalPromptingExtension down to the CustomDoubleStreamBlockProcessor in FLUX. 2024-11-20 19:48:04 +00:00
Ryan Dick
85c616fa34 WIP - Pass prompt masks to FLUX model during denoising. 2024-11-20 18:51:43 +00:00
psychedelicious
549f4e9794 feat(ui): set default infill method to lama 2024-11-20 11:19:17 -05:00
psychedelicious
ef8ededd2f fix(ui): disable width and height output on image batch output
There's a technical challenge with outputting these values directly. `ImageField` does not store them, so the batch's `ImageField` collection does not have width and height for each image.

In order to set up the batch and pass along width and height for each image, we'd need to make a network request for each image when the user clicks Invoke. It would often be cached, but this will eventually create a scaling issue and poor user experience.

As a very simple workaround, users can output the batch image output into an `Image Primitive` node to access the width and height.

This change is implemented by adding some simple special handling when parsing the output fields for the `image_batch` node.

I'll keep this situation in mind when extending the batching system to other field types.
2024-11-20 11:16:54 -05:00
Mary Hipp
1948ffe106 make sure Soft Edge Detection has preprocessor applied 2024-11-20 08:46:02 -05:00
psychedelicious
c70f4404c4 fix(ui): special node icon tooltip 2024-11-19 14:29:09 -08:00
psychedelicious
b157ae928c chore(ui): update what's new copy 2024-11-19 14:29:09 -08:00
psychedelicious
7a0871992d chore: bump version to v5.4.2 2024-11-19 14:29:09 -08:00
Hosted Weblate
b38e2e14f4 translationBot(ui): update translation files
Updated by "Cleanup translation files" hook in Weblate.

translationBot(ui): update translation files

Updated by "Cleanup translation files" hook in Weblate.

Co-authored-by: Hosted Weblate <hosted@weblate.org>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/
Translation: InvokeAI/Web UI
2024-11-19 14:12:00 -08:00
psychedelicious
7c0e70ec84 tweak(ui): "Watch on YouTube" -> "Watch" 2024-11-19 14:02:11 -08:00
psychedelicious
a89ae9d2bf feat(ui): add links to studio sessions/discord 2024-11-19 14:02:11 -08:00
psychedelicious
ad1fcb3f07 chore(ui): bump @invoke-ai/ui-library
Brings in a fix for `ExternalLink`
2024-11-19 14:02:11 -08:00
psychedelicious
87d74b910b feat(ui): support videos modal 2024-11-19 14:02:11 -08:00
psychedelicious
7ad1c297a4 feat(ui): add actions for reset canvas layers / generation settings to session menus 2024-11-19 13:55:16 -08:00
psychedelicious
fbc629faa6 feat(ui): change reset canvas button to new session menu 2024-11-19 13:55:16 -08:00
psychedelicious
7baa6b3c09 feat(ui): split up new from image into submenus
- `New Canvas from Image` -> `As Raster Layer`, `As Raster Layer (Resize)`, `As Control Layer`, `As Control Layer (Resize)`
- `New Layer from Image` -> (each layer type)
2024-11-19 10:34:00 -08:00
psychedelicious
53d482bade feat(ui): add image ctx menu new canvas without resize option 2024-11-19 10:34:00 -08:00
psychedelicious
5aca04b51b feat(ui): change reset canvas icon to "empty" 2024-11-19 09:56:25 -08:00
psychedelicious
ea8787c8ff feat(ui): update invoke button tooltip for batching
- Split up logic to determine reason why the user cannot invoke for each tab.
- Fix issue where the workflows tab would show reasons related to canvas/upscale tab. The tooltip now only shows information relevant to the current tab.
- Add calculation for batch size to the queue count prediction.
- Use a constant for the enqueue mutation's fixed cache key, instead of a string. Just some typo protection.
2024-11-19 09:53:59 -08:00
psychedelicious
cead2c4445 feat(ui): split up selector utils for useIsReadyToEnqueue 2024-11-19 09:53:59 -08:00
Mary Hipp
f76ac1808c fix(ui): simplify logic for non-local invocation progress alerts 2024-11-19 12:40:40 -05:00
psychedelicious
f01210861b chore: ruff 2024-11-19 07:02:37 -08:00
psychedelicious
f757f23ef0 chore(ui): typegen 2024-11-19 07:02:37 -08:00
psychedelicious
872a6ef209 tidy(nodes): extract slerp from lblend to util fn 2024-11-19 07:02:37 -08:00
psychedelicious
4267e5ffc4 tidy(nodes): bring masked blend latents masking logic into invoke core 2024-11-19 07:02:37 -08:00
Brandon Rising
a69c5ff9ef Add copyright notice for CIELab_to_UPLab.icc 2024-11-19 07:02:37 -08:00
Brandon Rising
3ebd8d7d1b Fix .icc asset file in pyproject.toml 2024-11-19 07:02:37 -08:00
Brandon Rising
1fd80d54a4 Run Ruff 2024-11-19 07:02:37 -08:00
Brandon Rising
991f63e455 Store CIELab_to_UPLab.icc within the repo 2024-11-19 07:02:37 -08:00
Brandon Rising
6a1efd3527 Add validation to some of the node inputs 2024-11-19 07:02:37 -08:00
Brandon Rising
0eadc0dd9e feat: Support a subset of composition nodes within base invokeai 2024-11-19 07:02:37 -08:00
youjayjeel
481423d678 translationBot(ui): update translation (Chinese (Simplified Han script))
Currently translated at 86.0% (1367 of 1588 strings)

Co-authored-by: youjayjeel <youjayjeel@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/zh_Hans/
Translation: InvokeAI/Web UI
2024-11-18 19:29:29 -08:00
Riccardo Giovanetti
89ede0aef3 translationBot(ui): update translation (Italian)
Currently translated at 99.3% (1578 of 1588 strings)

Co-authored-by: Riccardo Giovanetti <riccardo.giovanetti@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
Translation: InvokeAI/Web UI
2024-11-18 19:29:29 -08:00
gallegonovato
359bdee9c6 translationBot(ui): update translation (Spanish)
Currently translated at 42.3% (672 of 1588 strings)

translationBot(ui): update translation (Spanish)

Currently translated at 28.0% (445 of 1588 strings)

Co-authored-by: gallegonovato <fran-carro@hotmail.es>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/es/
Translation: InvokeAI/Web UI
2024-11-18 19:29:29 -08:00
psychedelicious
0e6fba3763 chore: bump version to v5.4.2rc1 2024-11-18 19:25:39 -08:00
psychedelicious
652502d7a6 fix(ui): add sd-3 grid size of 16px to grid util 2024-11-18 19:15:15 -08:00
psychedelicious
91d981a49e fix(ui): reactflow drag interactions with custom scrollbar 2024-11-18 19:12:27 -08:00
psychedelicious
24f61d21b2 feat(ui): make image field collection scrollable 2024-11-18 19:12:27 -08:00
psychedelicious
eb9a4177c5 feat(ui): allow removing individual images from batch 2024-11-18 19:12:27 -08:00
psychedelicious
3c43351a5b feat(ui): add reset to default value button to field title 2024-11-18 19:12:27 -08:00
psychedelicious
b1359b6dff feat(ui): update field validation logic to handle collection sizes 2024-11-18 19:12:27 -08:00
psychedelicious
bddccf6d2f feat(ui): add graph validation for image collection size 2024-11-18 19:12:27 -08:00
psychedelicious
21ffaab2a2 fix(ui): do not allow invoking when canvas is selectig object 2024-11-18 19:12:27 -08:00
psychedelicious
1e969f938f feat(ui): autosize image collection field grid 2024-11-18 19:12:27 -08:00
psychedelicious
9c6c86ee4f fix(ui): image field collection dnd adds instead of replaces 2024-11-18 19:12:27 -08:00
psychedelicious
6b53a48b48 fix(ui): zod schema refiners must return boolean 2024-11-18 19:12:27 -08:00
psychedelicious
c813fa3fc0 feat(ui): support min and max length for image collections 2024-11-18 19:12:27 -08:00
psychedelicious
a08e61184a chore(ui): typegen 2024-11-18 19:12:27 -08:00
psychedelicious
a0d62a5f41 feat(nodes): add minimum image count to ImageBatchInvocation 2024-11-18 19:12:27 -08:00
psychedelicious
616c0f11e1 feat(ui): image batching in workflows
- Add special handling for `ImageBatchInvocation`
- Add input component for image collections, supporting multi-image upload and dnd
- Minor rework of some hooks for accessing node data
2024-11-18 19:12:27 -08:00
psychedelicious
e1626a4e49 chore(ui): typegen 2024-11-18 19:12:27 -08:00
psychedelicious
6ab891a319 feat(nodes): add ImageBatchInvocation 2024-11-18 19:12:27 -08:00
psychedelicious
492de41316 feat(app): add Classification.Special, used for batch nodes 2024-11-18 19:12:27 -08:00
psychedelicious
c064efc866 feat(app): add ImageField as an allowed batching data type 2024-11-18 19:12:27 -08:00
Ryan Dick
1a0885bfb1 Update FLUX IP-Adapter starter model from XLabs v1 to XLabs v2. 2024-11-18 17:06:53 -08:00
Ryan Dick
e8b202d0a5 Update FLUX IP-Adapter graph construction to optimize for XLabs IP-Adapter v2 over v1. This results in degraded performance with v1 IP-Adapters. 2024-11-18 17:06:53 -08:00
Ryan Dick
c6fc82f756 Infer the clip_extra_context_tokens param from the state dict for FLUX XLabs IP-Adapter V2 models. 2024-11-18 17:06:53 -08:00
Ryan Dick
9a77e951d2 Add unit test for FLUX XLabs IP-Adapter V2 model format. 2024-11-18 17:06:53 -08:00
psychedelicious
8bd4207a27 docs(ui): add docstring to CanvasEntityStateGate 2024-11-18 13:40:08 -08:00
psychedelicious
0bb601aaf7 fix(ui): prevent entity not found errors
The canvas react components pass canvas entity identifiers around, then redux selectors are used to access that entity. This is good for perf - entity states may rapidly change. Passing only the identifiers allows components and other logic to have more granular state updates.

Unfortunately, this design opens the possibility for for an entity identifier to point to an entity that does not exist.

To get around this, I had created a redux selector `selectEntityOrThrow` for canvas entities. As the name implies, it throws if the entity is not found.

While it prevents components/hooks from needing to deal with missing entities, it results in mysterious errors if an entity is missing. Without sourcemaps, it's very difficult to determine what component or hook couldn't find the entity.

Refactoring the app to not depend on this behaviour is tricky. We could pass the entity state around directly as a prop or via context, but as mentioned, this could cause performance issues with rapidly changing entities.

As a workaround, I've made two changes:
- `<CanvasEntityStateGate/>` is a component that takes an entity identifier, returning its children if the entity state exists, or null if not. This component is wraps every usage of `selectEntityOrThrow`.  Theoretically, this should prevent the entity not found errors.
- Add a `caller: string` arg to `selectEntityOrThrow`. This string is now added to the error message when the assertion fails, so we can more easily track the source of the errors.

In the future we can work out a way to not use this throwing selector and retain perf. The app has changed quite a bit since that selector was created - so we may not have to worry about perf at all.
2024-11-18 13:40:08 -08:00
psychedelicious
2da25a0043 fix(ui): progress bar not throbbing when it should (#7332)
When we added more progress events during generation, we indirectly broke the logic that controls when the progress bar throbs.

Co-authored-by: Mary Hipp Rogers <maryhipp@gmail.com>
2024-11-18 14:02:20 +00:00
Mary Hipp
51d0931898 remove GPL-3 licensed package easing-functions 2024-11-18 08:55:17 -05:00
Riccardo Giovanetti
357b68d1ba translationBot(ui): update translation (Italian)
Currently translated at 99.3% (1577 of 1587 strings)

Co-authored-by: Riccardo Giovanetti <riccardo.giovanetti@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
Translation: InvokeAI/Web UI
2024-11-16 05:49:57 +11:00
Mary Hipp
d9ddb6c32e fix(ui): add padding to the metadata recall section so buttons are not blocked 2024-11-16 05:47:45 +11:00
Mary Hipp
ad02a99a83 fix(ui): ignore user setting for commercial, remove unused state 2024-11-16 05:21:30 +11:00
Mary Hipp
b707dafc7b translation 2024-11-16 05:21:30 +11:00
Mary Hipp
02906c8f5d feat(ui): deferred invocation progress details for model loading 2024-11-16 05:21:30 +11:00
psychedelicious
8538e508f1 chore(ui): lint 2024-11-15 12:59:30 +11:00
Hosted Weblate
8c333ffd14 translationBot(ui): update translation files
Updated by "Remove blank strings" hook in Weblate.

Co-authored-by: Hosted Weblate <hosted@weblate.org>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
Riccardo Giovanetti
72ace5fdff translationBot(ui): update translation (Italian)
Currently translated at 99.4% (1575 of 1583 strings)

Co-authored-by: Riccardo Giovanetti <riccardo.giovanetti@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
Gohsuke Shimada
9b7583fc84 translationBot(ui): update translation (Japanese)
Currently translated at 33.6% (533 of 1583 strings)

translationBot(ui): update translation (Japanese)

Currently translated at 30.3% (481 of 1583 strings)

Co-authored-by: Gohsuke Shimada <ghoskay@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/ja/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
dakota2472
989eee338e translationBot(ui): update translation (Italian)
Currently translated at 99.8% (1580 of 1583 strings)

Co-authored-by: dakota2472 <gardaweb.net@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
Linos
acc3d7b91b translationBot(ui): update translation (Vietnamese)
Currently translated at 100.0% (1583 of 1583 strings)

Co-authored-by: Linos <tt250208@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/vi/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
Riccardo Giovanetti
49de868658 translationBot(ui): update translation (Italian)
Currently translated at 99.4% (1575 of 1583 strings)

translationBot(ui): update translation (Italian)

Currently translated at 99.4% (1573 of 1581 strings)

Co-authored-by: Riccardo Giovanetti <riccardo.giovanetti@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
Hosted Weblate
b1702c7d90 translationBot(ui): update translation files
Updated by "Cleanup translation files" hook in Weblate.

translationBot(ui): update translation files

Updated by "Cleanup translation files" hook in Weblate.

Co-authored-by: Hosted Weblate <hosted@weblate.org>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
Riccardo Giovanetti
e49e19ea13 translationBot(ui): update translation (Italian)
Currently translated at 99.4% (1569 of 1577 strings)

Co-authored-by: Riccardo Giovanetti <riccardo.giovanetti@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
gallegonovato
c9f91f391e translationBot(ui): update translation (Spanish)
Currently translated at 17.6% (278 of 1575 strings)

translationBot(ui): update translation (Spanish)

Currently translated at 17.3% (274 of 1575 strings)

Co-authored-by: gallegonovato <fran-carro@hotmail.es>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/es/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
Hosted Weblate
4cb6b2b701 translationBot(ui): update translation files
Updated by "Cleanup translation files" hook in Weblate.

translationBot(ui): update translation files

Updated by "Remove blank strings" hook in Weblate.

Co-authored-by: Hosted Weblate <hosted@weblate.org>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
Riccardo Giovanetti
7d132ea148 translationBot(ui): update translation (Italian)
Currently translated at 99.1% (1564 of 1577 strings)

translationBot(ui): update translation (Italian)

Currently translated at 99.4% (1566 of 1575 strings)

Co-authored-by: Riccardo Giovanetti <riccardo.giovanetti@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
Riku
1088accd91 translationBot(ui): update translation (German)
Currently translated at 71.8% (1131 of 1575 strings)

Co-authored-by: Riku <riku.block@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/de/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
dakota2472
8d237d8f8b translationBot(ui): update translation (Italian)
Currently translated at 99.6% (1569 of 1575 strings)

Co-authored-by: dakota2472 <gardaweb.net@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
Linos
0c86a3232d translationBot(ui): update translation (Vietnamese)
Currently translated at 100.0% (1581 of 1581 strings)

translationBot(ui): update translation (Vietnamese)

Currently translated at 100.0% (1576 of 1576 strings)

translationBot(ui): update translation (Vietnamese)

Currently translated at 100.0% (1575 of 1575 strings)

translationBot(ui): update translation (Vietnamese)

Currently translated at 85.0% (1340 of 1575 strings)

translationBot(ui): update translation (Vietnamese)

Currently translated at 78.7% (1240 of 1575 strings)

translationBot(ui): update translation (Vietnamese)

Currently translated at 73.1% (1152 of 1575 strings)

translationBot(ui): update translation (English)

Currently translated at 99.9% (1574 of 1575 strings)

translationBot(ui): update translation (Vietnamese)

Currently translated at 57.9% (913 of 1575 strings)

translationBot(ui): update translation (Vietnamese)

Currently translated at 37.0% (584 of 1575 strings)

translationBot(ui): update translation (Vietnamese)

Currently translated at 3.2% (51 of 1575 strings)

translationBot(ui): update translation (Vietnamese)

Currently translated at 3.2% (51 of 1575 strings)

Co-authored-by: Linos <tt250208@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/en/
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/vi/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
aidawanglion
dbfb0359cb translationBot(ui): update translation (Chinese (Simplified Han script))
Currently translated at 79.9% (1266 of 1583 strings)

translationBot(ui): update translation (Chinese (Simplified Han script))

Currently translated at 74.4% (1171 of 1573 strings)

Co-authored-by: aidawanglion <youjayjeel@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/zh_Hans/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
Riccardo Giovanetti
b4c2aa596b translationBot(ui): update translation (Italian)
Currently translated at 99.6% (1569 of 1575 strings)

translationBot(ui): update translation (Italian)

Currently translated at 99.4% (1567 of 1575 strings)

translationBot(ui): update translation (Italian)

Currently translated at 99.4% (1565 of 1573 strings)

Co-authored-by: Riccardo Giovanetti <riccardo.giovanetti@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
Translation: InvokeAI/Web UI
2024-11-15 12:59:30 +11:00
psychedelicious
87e89b7995 fix(ui): remove progress message condition for canvas destinations 2024-11-15 12:55:46 +11:00
psychedelicious
9b089430e2 chore: bump version to v5.4.1 2024-11-15 11:51:06 +11:00
psychedelicious
f2b0025958 chore(ui): update what's new 2024-11-15 11:51:06 +11:00
psychedelicious
4b390906bc fix(ui): multiple selection dnd sometimes doesn't get full selection
Turns out a gallery image's `imageDTO` object can actually be a different object by reference. I thought this was not possible thanks to how we have a quasi-normalized cache.

Need to check against image name instead of reference equality when deciding whether or not to use the single image or the gallery selection for the dnd payload.
2024-11-15 11:21:03 +11:00
psychedelicious
c5b8efe03b fix(ui): unable to use text inputs within draggable 2024-11-15 10:25:30 +11:00
psychedelicious
4d08d00ad8 chore(ui): knip 2024-11-14 13:38:40 -08:00
psychedelicious
9b0130262b fix(ui): use silent upload for single-image upload buttons 2024-11-14 13:38:40 -08:00
psychedelicious
878093f64e fix(ui): image uploading handling
Rework uploadImage and uploadImages helpers and the RTK listener, ensuring gallery view isn't changed unexpectedly and preventing extraneous toasts.

Fix staging area save to gallery button to essentially make a copy of the image, instead of changing its intermediate status.
2024-11-14 13:38:40 -08:00
psychedelicious
d5ff7ef250 feat(ui): update output only masked regions
- New name: "Output only Generated Regions"
- New default: true (this was the intention, but at some point the behaviour of the setting was inverted without the default being changed)
2024-11-14 13:35:55 -08:00
psychedelicious
f36583f866 feat(ui): tweak image selection/hover styling
The styling in gallery for selected vs hovered was very similar, leading users to think that the hovered image was also selected.

Reducing the borders for hovered images to a single pixel makes it easier to distinguish between selected and hovered.
2024-11-14 16:28:53 -05:00
psychedelicious
829bc1bc7d feat(ui): progress alert config setting
- Add `invocationProgressAlert` as a disable-able feature. Hide the alert and the setting in system settings when disabled.
- Fix merge conflict
2024-11-15 05:49:05 +11:00
Mary Hipp
17c7b57145 (ui): make detailed progress view a setting that can be hidden 2024-11-15 05:49:05 +11:00
psychedelicious
6a12189542 feat(ui): updated progress event display
- Tweak layout/styling of alerts for consistent spacing
- Add percentage to message if it has percentage
- Only show events if the destination is canvas (so workflows events are hidden for example)
2024-11-15 05:49:05 +11:00
psychedelicious
96a31a5563 feat(app): add more events when loading/running models 2024-11-15 05:49:05 +11:00
psychedelicious
067747eca9 feat(app): tweak model load events
- Pass in the `UtilInterface` to the `ModelsInterface` so we can call the simple `signal_progress` method instead of the complicated `emit_invocation_progress` method.
- Only emit load events when starting to load - not after.
- Add more detail to the messages, like submodel type
2024-11-15 05:49:05 +11:00
Mary Hipp
c7878fddc6 (pytest) mock emit_invocation_progress on events service 2024-11-15 05:49:05 +11:00
maryhipp
54c51e0a06 (worker) add progress images for downloading remote models 2024-11-15 05:49:05 +11:00
Mary Hipp
1640ea0298 (pytest) add missing arg for mocked context 2024-11-15 05:49:05 +11:00
Mary Hipp
0c32ae9775 (pytest) fix import 2024-11-15 05:49:05 +11:00
maryhipp
fdb8ca5165 (worker) use source if name is not available 2024-11-15 05:49:05 +11:00
Mary Hipp
571faf6d7c (pytest) add queue_item and invocation to data in context for test 2024-11-15 05:49:05 +11:00
Mary Hipp
bdbdb22b74 (ui) add Canvas Alert for invocation progress messages 2024-11-15 05:49:05 +11:00
maryhipp
9bbb5644af (worker) add invocation_progress events to model loading 2024-11-15 05:49:05 +11:00
Mary Hipp
e90ad19f22 (ui): update en string for full IP adapter 2024-11-14 10:07:42 -08:00
Ryan Dick
0ba11e8f73 SD3 Image-to-Image and Inpainting (#7295)
## Summary

Add support for SD3 image-to-image and inpainting. Similar to FLUX, the
implementation supports fractional denoise_start/denoise_end for more
fine-grained denoise strength control, and a gradient mask adjustment
schedule for smoother inpainting seams.

## Example
Workflow
<img width="1016" alt="image"
src="https://github.com/user-attachments/assets/ee598d77-be80-4ca7-9355-c3cbefa2ef43">

Result

![image](https://github.com/user-attachments/assets/43953fa7-0e4e-42b5-84e8-85cfeeeee00b)

## QA Instructions

- [x] Regression test of text-to-image
- [x] Test image-to-image without mask
- [x] Test that adjusting denoising_start allows fine-grained control of
amount of change in image-to-image
- [x] Test inpainting with mask
- [x] Smoke test SD1, SDXL, FLUX image-to-image to make sure there was
no regression with the frontend changes.

## Merge Plan

<!--WHEN APPLICABLE: Large PRs, or PRs that touch sensitive things like
DB schemas, may need some care when merging. For example, a careful
rebase by the change author, timing to not interfere with a pending
release, or a message to contributors on discord after merging.-->

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a
changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_
2024-11-14 09:33:51 -08:00
Ryan Dick
1cf7600f5b Merge branch 'main' into ryan/sd3-image-to-image 2024-11-14 09:25:23 -08:00
Ryan Dick
4f9d12b872 Fix FLUX diffusers LoRA models with no .proj_mlp layers (#7313)
## Summary

Add support for FLUX diffusers LoRA models without `.proj_mlp` layers.

## Related Issues / Discussions

Closes #7129 

## QA Instructions

- [x] FLUX diffusers LoRA **without .proj_mlp** layers
- [x] FLUX diffusers LoRA **with .proj_mlp** layers
- [x] FLUX diffusers LoRA **without .proj_mlp** layers, quantized base
model
- [x] FLUX diffusers LoRA **with .proj_mlp** layers, quantized base
model

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a
changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_
2024-11-14 09:09:10 -08:00
Ryan Dick
68c3b0649b Add unit tests for FLUX diffusers LoRA without .proj_mlp layers. 2024-11-14 16:53:49 +00:00
Ryan Dick
8ef8bd4261 Add state dict tensor shapes for existing LoRA unit tests. 2024-11-14 16:53:49 +00:00
Ryan Dick
50897ba066 Add flag to optionally allow missing layer keys in FLUX lora loader. 2024-11-14 16:53:49 +00:00
Ryan Dick
3510643870 Support FLUX LoRAs without .proj_mlp layers. 2024-11-14 16:53:49 +00:00
Ryan Dick
ca9cb1c9ef Flux Vae broke for float16, force bfloat16 or float32 were compatible (#7213)
## Summary

The Flux VAE, like many VAEs, is broken if run using float16 inputs
returning black images due to NaNs
This will fix the issue by forcing the VAE to run in bfloat16 or float32
were compatible

## Related Issues / Discussions

Fix for issue https://github.com/invoke-ai/InvokeAI/issues/7208

## QA Instructions

Tested on MacOS, VAE works with float16 in the invoke.yaml and left to
default.
I also briefly forced it down the float32 route to check that to.
Needs testing on CUDA / ROCm

## Merge Plan

It should be a straight forward merge,
2024-11-13 15:51:40 -08:00
Ryan Dick
b89caa02bd Merge branch 'main' into flux_vae_fp16_broke 2024-11-13 15:33:43 -08:00
Ryan Dick
eaf4e08c44 Use vae.parameters() for more efficient access of the first model parameter. 2024-11-13 23:32:40 +00:00
Darrell
fb19621361 Updated link to flux ip adapter model 2024-11-12 08:11:40 -05:00
Mary Hipp
9179619077 actually use optimized denoising 2024-11-08 20:46:08 -05:00
Mary Hipp
13cb5f0ba2 Merge remote-tracking branch 'origin/main' into ryan/sd3-image-to-image 2024-11-08 20:29:56 -05:00
Mary Hipp
7e52fc1c17 Merge branch 'ryan/sd3-image-to-image' of https://github.com/invoke-ai/InvokeAI into ryan/sd3-image-to-image 2024-11-08 20:14:24 -05:00
Mary Hipp
7f60a4a282 (ui): update more generation settings for SD3 linear UI 2024-11-08 20:14:13 -05:00
psychedelicious
3f880496f7 feat(ui): clarify denoising strength badge text 2024-11-09 08:38:41 +11:00
Ryan Dick
f05efd3270 Fix import for getInfill. 2024-11-08 20:42:44 +00:00
psychedelicious
79eb8172b6 feat(ui): update warnings on upscaling tab based on model arch
When an unsupported model architecture is selected, show that warning only, without the extra warnings (i.e. no "missing tile controlnet" warning)

Update Invoke tooltip warnings accordingly

Closes #7239
Closes #7177
2024-11-09 07:34:03 +11:00
Ryan Dick
7732b5d478 Fix bug related to i2l nodes during graph construction of image-to-image workflows. 2024-11-08 20:15:34 +00:00
Mary Hipp
a2a1934b66 Merge branch 'ryan/sd3-image-to-image' of https://github.com/invoke-ai/InvokeAI into ryan/sd3-image-to-image 2024-11-08 13:43:19 -05:00
Mary Hipp
dff6570078 (ui) SD3 support in linear UI 2024-11-08 13:42:57 -05:00
maryhipp
04e4fb63af add SD3 generation modes for metadata validation 2024-11-08 13:13:58 -05:00
Vargol
83609d5008 Merge branch 'invoke-ai:main' into flux_vae_fp16_broke 2024-11-08 10:37:31 +00:00
David Burnett
2618ed0ae7 ruff complained 2024-11-08 10:31:53 +00:00
David Burnett
bb3cedddd5 Rework change based on comments 2024-11-08 10:27:47 +00:00
psychedelicious
5b3e1593ca fix(ui): restore missing image paste handler
Missed migrating this logic over during dnd migration.
2024-11-08 16:42:39 +11:00
psychedelicious
2d08078a7d fix(ui): fit bbox to layers math 2024-11-08 16:40:24 +11:00
psychedelicious
75acece1f1 fix(ui): excessive toasts when generating on canvas
- Add `withToast` flag to `uploadImage` util
- Skip the toast if this is not set
- Use the flag to disable toasts when canvas does internal image-uploading stuff that should be invisible to user
2024-11-08 10:30:04 +11:00
psychedelicious
a9db2ffefd fix(ui): ensure clip vision model is set correctly for FLUX IP Adapters 2024-11-08 10:02:41 +11:00
psychedelicious
cdd148b4d1 feat(ui): add toast for graph building errors 2024-11-08 10:02:41 +11:00
psychedelicious
730fabe2de feat(ui): add util to extract message from a tsafe AssertionError 2024-11-08 10:02:41 +11:00
psychedelicious
6c59790a7f chore: bump version to v5.4.1rc2 2024-11-08 10:00:20 +11:00
Ryan Dick
0e6cb91863 Update SD3 InpaintExtension with gradient adjustment to match FLUX. 2024-11-07 22:55:30 +00:00
Ryan Dick
a0fefcd43f Switch to using a custom scheduler implementation for SD3 rather than the diffusers FlowMatchEulerDiscreteScheduler. It is easier to work with and enables us to re-use the clip_timestep_schedule_fractional() utility from FLUX. 2024-11-07 22:46:52 +00:00
psychedelicious
c37251d6f7 tweak(ui): workflow linear field styling 2024-11-08 07:39:09 +11:00
psychedelicious
2854210162 fix(ui): dnd autoscroll on elements w/ custom scrollbar
Have to do a bit of fanagling to get it to work and get `pragmatic-drag-and-drop` to not complain.
2024-11-08 07:39:09 +11:00
psychedelicious
5545b980af fix(ui): workflow field sorting doesn't use unique identifier for fields 2024-11-08 07:39:09 +11:00
psychedelicious
0c9434c464 chore(ui): lint 2024-11-08 07:39:09 +11:00
psychedelicious
8771de917d feat(ui): migrate fullscreen drop zone to pdnd 2024-11-08 07:39:09 +11:00
psychedelicious
122946ef4c feat(ui): DndDropOverlay supports react node for label 2024-11-08 07:39:09 +11:00
psychedelicious
2d974f670c feat(ui): restore missing upload buttons 2024-11-08 07:39:09 +11:00
psychedelicious
75f0da9c35 fix(ui): use revised uploader for CL empty state 2024-11-08 07:39:09 +11:00
psychedelicious
5df3c00e28 feat(ui): remove SerializableObject, use type-fest's JsonObject 2024-11-08 07:39:09 +11:00
psychedelicious
b049880502 fix(ui): uploads initiated from canvas 2024-11-08 07:39:09 +11:00
psychedelicious
e5293fdd1a fix(ui): match new default controlnet behaviour 2024-11-08 07:39:09 +11:00
psychedelicious
8883775762 feat(ui): rework image uploads (wip) 2024-11-08 07:39:09 +11:00
psychedelicious
cfadb313d2 fix(ui): ts issues 2024-11-08 07:39:09 +11:00
psychedelicious
b5cadd9a1a fix(ui): scroll issue w/ boards list 2024-11-08 07:39:09 +11:00
psychedelicious
5361b6e014 refactor(ui): image actions sep of concerns 2024-11-08 07:39:09 +11:00
psychedelicious
ff346172af feat(ui): use new image actions system for image menu 2024-11-08 07:39:09 +11:00
psychedelicious
92f660018b refactor(ui): dnd actions to image actions
We don't need a "dnd" image system. We need a "image action" system. We need to execute specific flows with images from various "origins":
- internal dnd e.g. from gallery
- external dnd e.g. user drags an image file into the browser
- direct file upload e.g. user clicks an upload button
- some other internal app button e.g. a context menu

The actions are now generalized to better support these various use-cases.
2024-11-08 07:39:09 +11:00
psychedelicious
1afc2cba4e feat(ui): support different labels for external drop targets (e.g. uploads) 2024-11-08 07:39:09 +11:00
psychedelicious
ee8359242c feat(ui): more dnd cleanup and tidy 2024-11-08 07:39:09 +11:00
psychedelicious
f0c80a8d7a tidy(ui): dnd stuff 2024-11-08 07:39:09 +11:00
psychedelicious
8da9e7c1f6 fix(ui): min height for workflow image field drop target 2024-11-08 07:39:09 +11:00
psychedelicious
6d7a486e5b feat(ui): restore dnd to workflow fields 2024-11-08 07:39:09 +11:00
psychedelicious
57122c6aa3 feat(ui): layer reordering styling 2024-11-08 07:39:09 +11:00
psychedelicious
54abd8d4d1 feat(ui): dnd layer reordering (wip) 2024-11-08 07:39:09 +11:00
psychedelicious
06283cffed feat(ui): use custom drag previews for images 2024-11-08 07:39:09 +11:00
psychedelicious
27fa0e1140 tidy(ui): more efficient dnd overlay styling 2024-11-08 07:39:09 +11:00
psychedelicious
533d48abdb feat(ui): multi-image drag preview 2024-11-08 07:39:09 +11:00
psychedelicious
6845cae4c9 tidy(ui): move new dnd impl into features/dnd 2024-11-08 07:39:09 +11:00
psychedelicious
31c9acb1fa tidy(ui): clean up old dnd stuff 2024-11-08 07:39:09 +11:00
psychedelicious
fb5e462300 tidy(ui): document & clean up dnd 2024-11-08 07:39:09 +11:00
psychedelicious
2f3abc29b1 feat(ui): better types for getData 2024-11-08 07:39:09 +11:00
psychedelicious
c5c071f285 feat(ui): better type name 2024-11-08 07:39:09 +11:00
psychedelicious
93a3ed56e7 feat(ui): simpler dnd typing implementation 2024-11-08 07:39:09 +11:00
psychedelicious
406fc58889 feat(ui): migrate to pragmatic-drag-and-drop (wip 4) 2024-11-08 07:39:09 +11:00
psychedelicious
cf67d084fd feat(ui): migrate to pragmatic-drag-and-drop (wip 3) 2024-11-08 07:39:09 +11:00
psychedelicious
d4a95af14f perf(ui): more gallery perf improvements 2024-11-08 07:39:09 +11:00
psychedelicious
8c8e7102c2 perf(ui): improved gallery perf 2024-11-08 07:39:09 +11:00
psychedelicious
b6b9ea9d70 feat(ui): migrate to pragmatic-drag-and-drop (wip 2) 2024-11-08 07:39:09 +11:00
psychedelicious
63126950bc feat(ui): migrate to pragmatic-drag-and-drop (wip) 2024-11-08 07:39:09 +11:00
psychedelicious
29d63d5dea fix(app): silence pydantic protected namespace warning
Closes #7287
2024-11-08 07:36:50 +11:00
Ryan Dick
a5f8c23dee Add inpainting support for SD3. 2024-11-07 20:21:43 +00:00
Ryan Dick
7bb4ea57c6 Add SD3ImageToLatentsInvocation. 2024-11-07 16:07:57 +00:00
Ryan Dick
75dc961bcb Add image-to-image support for SD3 - WIP. 2024-11-07 15:48:35 +00:00
Vargol
a9a1f6ef21 Merge branch 'invoke-ai:main' into flux_vae_fp16_broke 2024-11-07 14:02:51 +00:00
Jonathan
aa40161f26 Update flux_denoise.py
Added a bool to allow the node user to add noise in to initial latents (default) or to leave them alone.
2024-11-07 14:02:20 +00:00
psychedelicious
6efa812874 chore(ui): bump version to v5.4.1rc1 2024-11-07 14:02:20 +00:00
psychedelicious
8a683f5a3c feat(ui): updated whats new handling and v5.4.1 items 2024-11-07 14:02:20 +00:00
Brandon Rising
f4b0b6a93d fix: Look in known subfolders for configs for clip variants 2024-11-07 14:02:20 +00:00
Brandon Rising
1337c33ad3 fix: Avoid downloading unsafe .bin files if a safetensors file is available 2024-11-07 14:02:20 +00:00
Jonathan
2f6b035138 Update flux_denoise.py
Added a bool to allow the node user to add noise in to initial latents (default) or to leave them alone.
2024-11-07 08:44:10 -05:00
psychedelicious
4f9ae44472 chore(ui): bump version to v5.4.1rc1 2024-11-07 12:19:28 +11:00
psychedelicious
c682330852 feat(ui): updated whats new handling and v5.4.1 items 2024-11-07 12:19:28 +11:00
Brandon Rising
c064257759 fix: Look in known subfolders for configs for clip variants 2024-11-07 12:01:02 +11:00
Brandon Rising
8a4c629576 fix: Avoid downloading unsafe .bin files if a safetensors file is available 2024-11-06 19:31:18 -05:00
David Burnett
496b02a3bc Same issue affects image2image, so do the same again 2024-11-06 17:47:22 -05:00
David Burnett
7b5efc2203 Flux Vae broke for float16, force bfloat16 or float32 were compatible 2024-11-06 17:47:22 -05:00
psychedelicious
a01d44f813 chore(ui): lint 2024-11-06 10:25:46 -05:00
psychedelicious
63fb3a15e9 feat(ui): default to no control model selected for control layers 2024-11-06 10:25:46 -05:00
psychedelicious
4d0837541b feat(ui): add simple mode filtering 2024-11-06 10:25:46 -05:00
psychedelicious
999809b4c7 fix(ui): minor viewer close button styling 2024-11-06 10:25:46 -05:00
psychedelicious
c452edfb9f feat(ui): add control layer empty state 2024-11-06 10:25:46 -05:00
psychedelicious
ad2cdbd8a2 feat(ui): tooltip for canvas preview image 2024-11-06 10:25:46 -05:00
psychedelicious
f15c24bfa7 feat(ui): add " (recommended)" to balanced control mode label 2024-11-06 10:25:46 -05:00
psychedelicious
d1f653f28c feat(ui): make default control end step 0.75 2024-11-06 10:25:46 -05:00
psychedelicious
244465d3a6 feat(ui): make default control weight 0.75 2024-11-06 10:25:46 -05:00
psychedelicious
c6236ab70c feat(ui): add menubar-ish header on comparison 2024-11-06 10:25:46 -05:00
psychedelicious
644d5cb411 feat(ui): add menubar-ish header on viewer 2024-11-06 10:25:46 -05:00
Riku
bb0a630416 fix(ui): adjust knip config to ignore parameter schema exports 2024-11-06 22:51:17 +11:00
Riku
2148ae9287 feat(ui): simplify parameter schema declaration and type inference 2024-11-06 22:51:17 +11:00
psychedelicious
42d242609c chore(gh): update pr template w/ reminder for what's new copy 2024-11-06 19:03:31 +11:00
psychedelicious
fd0a52392b feat(ui): added line about when denoising str is disabled 2024-11-06 19:01:33 +11:00
psychedelicious
e64415d59a feat(ui): revised logic to disable denoising str 2024-11-06 19:01:33 +11:00
psychedelicious
1871e0bdbf feat(ui): tweaked denoise str styling 2024-11-06 19:01:33 +11:00
Mary Hipp
3ae9a965c2 lint 2024-11-06 19:01:33 +11:00
Mary Hipp
85932e35a7 update copy again 2024-11-06 19:01:33 +11:00
Mary Hipp
41b07a56cc update popover copy and add image 2024-11-06 19:01:33 +11:00
Mary Hipp
54064c0cb8 fix(ui): match badge height to slider height so layout does not shift 2024-11-06 19:01:33 +11:00
Mary Hipp
68284b37fa remove opacity logic from WavyLine, add badge explaining disabled state, add translations 2024-11-06 19:01:33 +11:00
Mary Hipp
ae5bc6f5d6 feat(ui): move denoising strength to layers panel w/ visualization of how much change will be applied, only enable if 1+ enabled raster layer 2024-11-06 19:01:33 +11:00
Mary Hipp
6dc16c9f54 wip 2024-11-06 19:01:33 +11:00
Brandon Rising
faa9ac4e15 fix: get_clip_variant_type should never return None 2024-11-06 09:59:50 +11:00
Mary Hipp Rogers
d0460849b0 fix bad merge conflict (#7273)
Co-authored-by: Mary Hipp <maryhipp@Marys-MacBook-Air.local>
2024-11-05 16:02:03 -05:00
Mary Hipp Rogers
bed3c2dd77 update Whats New for 5.3.1 (#7272)
Co-authored-by: Mary Hipp <maryhipp@Marys-MacBook-Air.local>
2024-11-05 15:43:16 -05:00
Mary Hipp
916ddd17d7 fix(ui): fix link for infill method popover 2024-11-05 15:39:03 -05:00
Mary Hipp
accfa7407f fix undefined 2024-11-05 15:30:17 -05:00
Mary Hipp
908db31e48 feat(api,ui): allow Whats New module to get content from back-end 2024-11-05 15:30:17 -05:00
Mary Hipp
b70f632b26 fix(ui): add some feedback while layers are merging 2024-11-05 12:38:50 -05:00
Brandon Rising
d07a6385ab Always default to ClipVariantType.L instead of None 2024-11-05 12:03:40 -05:00
Brandon Rising
68df612fa1 fix: Never throw an exception when finding the clip variant type 2024-11-05 12:03:40 -05:00
Lincoln Stein
b9d8e196a4 remove full path name from controlnet config file 2024-10-26 10:09:44 -04:00
Eugene Brodsky
2916d058d6 fix(package): update package paths for model base config files in pyproject.toml 2024-10-23 18:29:06 +00:00
Brandon
144673771c Merge branch 'main' into lstein/feat/diffusers-v0.30 2024-10-08 23:04:36 -04:00
Lincoln Stein
471dbf8668 tweak name of folder containing model base config files 2024-09-04 18:08:14 -04:00
Lincoln Stein
9e097b6787 Merge branch 'main' into lstein/feat/diffusers-v0.30 2024-09-04 18:03:34 -04:00
Lincoln Stein
478154b930 Merge branch 'main' into lstein/feat/diffusers-v0.30 2024-09-02 12:21:34 -04:00
Lincoln Stein
03210bddcf Merge branch 'lstein/feat/diffusers-v0.30' of github.com:invoke-ai/InvokeAI into lstein/feat/diffusers-v0.30 2024-08-31 13:54:49 -04:00
Lincoln Stein
32cdd42d2b fix formatting errors 2024-08-31 13:53:45 -04:00
Lincoln Stein
63dbf38963 Merge branch 'main' into lstein/feat/diffusers-v0.30 2024-08-31 13:48:55 -04:00
Lincoln Stein
3487def2d8 fixed SDXL controlnet probe 2024-08-31 13:48:37 -04:00
Lincoln Stein
4556343fa6 use correct controlnet config file 2024-08-27 11:39:34 -04:00
Lincoln Stein
5daaaa3b70 Merge remote-tracking branch 'refs/remotes/origin/lstein/feat/diffusers-v0.30' into lstein/feat/diffusers-v0.30 2024-08-17 15:58:58 -04:00
Lincoln Stein
7a9a1694a4 pass configuration templates to from_single_file() using the config option 2024-08-17 15:57:02 -04:00
Lincoln Stein
5b296d3c87 Merge branch 'main' into lstein/feat/diffusers-v0.30 2024-08-17 14:13:33 -04:00
Lincoln Stein
6af84434e0 enable offline loading of main sd-1, sd-2 and sdxl models 2024-08-17 14:06:55 -04:00
Lincoln Stein
5bde4eaa7a renamed deprecated original_config_file argument 2024-08-13 23:14:38 -04:00
Lincoln Stein
b5ec04f10c pass original_config_file to load_single_file() 2024-08-13 21:12:40 -04:00
436 changed files with 612634 additions and 8847 deletions

View File

@@ -19,3 +19,4 @@
- [ ] _The PR has a short but descriptive title, suitable for a changelog_
- [ ] _Tests added / updated (if applicable)_
- [ ] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_

14
SECURITY.md Normal file
View File

@@ -0,0 +1,14 @@
# Security Policy
## Supported Versions
Only the latest version of Invoke will receive security updates.
We do not currently maintain multiple versions of the application with updates.
## Reporting a Vulnerability
To report a vulnerability, contact the Invoke team directly at security@invoke.ai
At this time, we do not maintain a formal bug bounty program.
You can also share identified security issues with our team on huntr.com

View File

@@ -50,7 +50,7 @@ Applications are built on top of the invoke framework. They should construct `in
### Web UI
The Web UI is built on top of an HTTP API built with [FastAPI](https://fastapi.tiangolo.com/) and [Socket.IO](https://socket.io/). The frontend code is found in `/frontend` and the backend code is found in `/ldm/invoke/app/api_app.py` and `/ldm/invoke/app/api/`. The code is further organized as such:
The Web UI is built on top of an HTTP API built with [FastAPI](https://fastapi.tiangolo.com/) and [Socket.IO](https://socket.io/). The frontend code is found in `/invokeai/frontend` and the backend code is found in `/invokeai/app/api_app.py` and `/invokeai/app/api/`. The code is further organized as such:
| Component | Description |
| --- | --- |
@@ -62,7 +62,7 @@ The Web UI is built on top of an HTTP API built with [FastAPI](https://fastapi.t
### CLI
The CLI is built automatically from invocation metadata, and also supports invocation piping and auto-linking. Code is available in `/ldm/invoke/app/cli_app.py`.
The CLI is built automatically from invocation metadata, and also supports invocation piping and auto-linking. Code is available in `/invokeai/frontend/cli`.
## Invoke
@@ -70,7 +70,7 @@ The Invoke framework provides the interface to the underlying AI systems and is
### Invoker
The invoker (`/ldm/invoke/app/services/invoker.py`) is the primary interface through which applications interact with the framework. Its primary purpose is to create, manage, and invoke sessions. It also maintains two sets of services:
The invoker (`/invokeai/app/services/invoker.py`) is the primary interface through which applications interact with the framework. Its primary purpose is to create, manage, and invoke sessions. It also maintains two sets of services:
- **invocation services**, which are used by invocations to interact with core functionality.
- **invoker services**, which are used by the invoker to manage sessions and manage the invocation queue.
@@ -82,12 +82,12 @@ The session graph does not support looping. This is left as an application probl
### Invocations
Invocations represent individual units of execution, with inputs and outputs. All invocations are located in `/ldm/invoke/app/invocations`, and are all automatically discovered and made available in the applications. These are the primary way to expose new functionality in Invoke.AI, and the [implementation guide](INVOCATIONS.md) explains how to add new invocations.
Invocations represent individual units of execution, with inputs and outputs. All invocations are located in `/invokeai/app/invocations`, and are all automatically discovered and made available in the applications. These are the primary way to expose new functionality in Invoke.AI, and the [implementation guide](INVOCATIONS.md) explains how to add new invocations.
### Services
Services provide invocations access AI Core functionality and other necessary functionality (e.g. image storage). These are available in `/ldm/invoke/app/services`. As a general rule, new services should provide an interface as an abstract base class, and may provide a lightweight local implementation by default in their module. The goal for all services should be to enable the usage of different implementations (e.g. using cloud storage for image storage), but should not load any module dependencies unless that implementation has been used (i.e. don't import anything that won't be used, especially if it's expensive to import).
Services provide invocations access AI Core functionality and other necessary functionality (e.g. image storage). These are available in `/invokeai/app/services`. As a general rule, new services should provide an interface as an abstract base class, and may provide a lightweight local implementation by default in their module. The goal for all services should be to enable the usage of different implementations (e.g. using cloud storage for image storage), but should not load any module dependencies unless that implementation has been used (i.e. don't import anything that won't be used, especially if it's expensive to import).
## AI Core
The AI Core is represented by the rest of the code base (i.e. the code outside of `/ldm/invoke/app/`).
The AI Core is represented by the rest of the code base (i.e. the code outside of `/invokeai/app/`).

View File

@@ -287,8 +287,8 @@ new Invocation ready to be used.
Once you've created a Node, the next step is to share it with the community! The
best way to do this is to submit a Pull Request to add the Node to the
[Community Nodes](nodes/communityNodes) list. If you're not sure how to do that,
take a look a at our [contributing nodes overview](contributingNodes).
[Community Nodes](../nodes/communityNodes.md) list. If you're not sure how to do that,
take a look a at our [contributing nodes overview](../nodes/contributingNodes.md).
## Advanced

View File

@@ -9,20 +9,20 @@ model. These are the:
configuration information. Among other things, the record service
tracks the type of the model, its provenance, and where it can be
found on disk.
* _ModelInstallServiceBase_ A service for installing models to
disk. It uses `DownloadQueueServiceBase` to download models and
their metadata, and `ModelRecordServiceBase` to store that
information. It is also responsible for managing the InvokeAI
`models` directory and its contents.
* _DownloadQueueServiceBase_
A multithreaded downloader responsible
for downloading models from a remote source to disk. The download
queue has special methods for downloading repo_id folders from
Hugging Face, as well as discriminating among model versions in
Civitai, but can be used for arbitrary content.
* _ModelLoadServiceBase_
Responsible for loading a model from disk
into RAM and VRAM and getting it ready for inference.
@@ -207,9 +207,9 @@ for use in the InvokeAI web server. Its signature is:
```
def open(
cls,
config: InvokeAIAppConfig,
conn: Optional[sqlite3.Connection] = None,
cls,
config: InvokeAIAppConfig,
conn: Optional[sqlite3.Connection] = None,
lock: Optional[threading.Lock] = None
) -> Union[ModelRecordServiceSQL, ModelRecordServiceFile]:
```
@@ -363,7 +363,7 @@ functionality:
* Registering a model config record for a model already located on the
local filesystem, without moving it or changing its path.
* Installing a model alreadiy located on the local filesystem, by
moving it into the InvokeAI root directory under the
`models` folder (or wherever config parameter `models_dir`
@@ -371,21 +371,21 @@ functionality:
* Probing of models to determine their type, base type and other key
information.
* Interface with the InvokeAI event bus to provide status updates on
the download, installation and registration process.
* Downloading a model from an arbitrary URL and installing it in
`models_dir`.
* Special handling for HuggingFace repo_ids to recursively download
the contents of the repository, paying attention to alternative
variants such as fp16.
* Saving tags and other metadata about the model into the invokeai database
when fetching from a repo that provides that type of information,
(currently only HuggingFace).
### Initializing the installer
A default installer is created at InvokeAI api startup time and stored
@@ -461,7 +461,7 @@ revision.
`config` is an optional dict of values that will override the
autoprobed values for model type, base, scheduler prediction type, and
so forth. See [Model configuration and
probing](#Model-configuration-and-probing) for details.
probing](#model-configuration-and-probing) for details.
`access_token` is an optional access token for accessing resources
that need authentication.
@@ -494,7 +494,7 @@ source8 = URLModelSource(url='https://civitai.com/api/download/models/63006', ac
for source in [source1, source2, source3, source4, source5, source6, source7]:
install_job = installer.install_model(source)
source2job = installer.wait_for_installs(timeout=120)
for source in sources:
job = source2job[source]
@@ -504,7 +504,7 @@ for source in sources:
print(f"{source} installed as {model_key}")
elif job.errored:
print(f"{source}: {job.error_type}.\nStack trace:\n{job.error}")
```
As shown here, the `import_model()` method accepts a variety of

View File

@@ -1,6 +1,6 @@
# InvokeAI Backend Tests
We use `pytest` to run the backend python tests. (See [pyproject.toml](/pyproject.toml) for the default `pytest` options.)
We use `pytest` to run the backend python tests. (See [pyproject.toml](https://github.com/invoke-ai/InvokeAI/blob/main/pyproject.toml) for the default `pytest` options.)
## Fast vs. Slow
All tests are categorized as either 'fast' (no test annotation) or 'slow' (annotated with the `@pytest.mark.slow` decorator).
@@ -33,7 +33,7 @@ pytest tests -m ""
## Test Organization
All backend tests are in the [`tests/`](/tests/) directory. This directory mirrors the organization of the `invokeai/` directory. For example, tests for `invokeai/model_management/model_manager.py` would be found in `tests/model_management/test_model_manager.py`.
All backend tests are in the [`tests/`](https://github.com/invoke-ai/InvokeAI/tree/main/tests) directory. This directory mirrors the organization of the `invokeai/` directory. For example, tests for `invokeai/model_management/model_manager.py` would be found in `tests/model_management/test_model_manager.py`.
TODO: The above statement is aspirational. A re-organization of legacy tests is required to make it true.

View File

@@ -2,7 +2,7 @@
## **What do I need to know to help?**
If you are looking to help with a code contribution, InvokeAI uses several different technologies under the hood: Python (Pydantic, FastAPI, diffusers) and Typescript (React, Redux Toolkit, ChakraUI, Mantine, Konva). Familiarity with StableDiffusion and image generation concepts is helpful, but not essential.
If you are looking to help with a code contribution, InvokeAI uses several different technologies under the hood: Python (Pydantic, FastAPI, diffusers) and Typescript (React, Redux Toolkit, ChakraUI, Mantine, Konva). Familiarity with StableDiffusion and image generation concepts is helpful, but not essential.
## **Get Started**
@@ -12,7 +12,7 @@ To get started, take a look at our [new contributors checklist](newContributorCh
Once you're setup, for more information, you can review the documentation specific to your area of interest:
* #### [InvokeAI Architecure](../ARCHITECTURE.md)
* #### [Frontend Documentation](https://github.com/invoke-ai/InvokeAI/tree/main/invokeai/frontend/web)
* #### [Frontend Documentation](../frontend/index.md)
* #### [Node Documentation](../INVOCATIONS.md)
* #### [Local Development](../LOCAL_DEVELOPMENT.md)
@@ -20,15 +20,15 @@ Once you're setup, for more information, you can review the documentation specif
If you don't feel ready to make a code contribution yet, no problem! You can also help out in other ways, such as [documentation](documentation.md), [translation](translation.md) or helping support other users and triage issues as they're reported in GitHub.
There are two paths to making a development contribution:
There are two paths to making a development contribution:
1. Choosing an open issue to address. Open issues can be found in the [Issues](https://github.com/invoke-ai/InvokeAI/issues?q=is%3Aissue+is%3Aopen) section of the InvokeAI repository. These are tagged by the issue type (bug, enhancement, etc.) along with the “good first issues” tag denoting if they are suitable for first time contributors.
1. Additional items can be found on our [roadmap](https://github.com/orgs/invoke-ai/projects/7). The roadmap is organized in terms of priority, and contains features of varying size and complexity. If there is an inflight item youd like to help with, reach out to the contributor assigned to the item to see how you can help.
1. Additional items can be found on our [roadmap](https://github.com/orgs/invoke-ai/projects/7). The roadmap is organized in terms of priority, and contains features of varying size and complexity. If there is an inflight item youd like to help with, reach out to the contributor assigned to the item to see how you can help.
2. Opening a new issue or feature to add. **Please make sure you have searched through existing issues before creating new ones.**
*Regardless of what you choose, please post in the [#dev-chat](https://discord.com/channels/1020123559063990373/1049495067846524939) channel of the Discord before you start development in order to confirm that the issue or feature is aligned with the current direction of the project. We value our contributors time and effort and want to ensure that no ones time is being misspent.*
## Best Practices:
## Best Practices:
* Keep your pull requests small. Smaller pull requests are more likely to be accepted and merged
* Comments! Commenting your code helps reviewers easily understand your contribution
* Use Python and Typescripts typing systems, and consider using an editor with [LSP](https://microsoft.github.io/language-server-protocol/) support to streamline development
@@ -38,7 +38,7 @@ There are two paths to making a development contribution:
If you need help, you can ask questions in the [#dev-chat](https://discord.com/channels/1020123559063990373/1049495067846524939) channel of the Discord.
For frontend related work, **@psychedelicious** is the best person to reach out to.
For frontend related work, **@psychedelicious** is the best person to reach out to.
For backend related work, please reach out to **@blessedcoolant**, **@lstein**, **@StAlKeR7779** or **@psychedelicious**.

View File

@@ -22,15 +22,15 @@ Before starting these steps, ensure you have your local environment [configured
2. Fork the [InvokeAI](https://github.com/invoke-ai/InvokeAI) repository to your GitHub profile. This means that you will have a copy of the repository under **your-GitHub-username/InvokeAI**.
3. Clone the repository to your local machine using:
```bash
git clone https://github.com/your-GitHub-username/InvokeAI.git
```
```bash
git clone https://github.com/your-GitHub-username/InvokeAI.git
```
If you're unfamiliar with using Git through the commandline, [GitHub Desktop](https://desktop.github.com) is a easy-to-use alternative with a UI. You can do all the same steps listed here, but through the interface. 4. Create a new branch for your fix using:
```bash
git checkout -b branch-name-here
```
```bash
git checkout -b branch-name-here
```
5. Make the appropriate changes for the issue you are trying to address or the feature that you want to add.
6. Add the file contents of the changed files to the "snapshot" git uses to manage the state of the project, also known as the index:

View File

@@ -27,9 +27,9 @@ If you just want to use Invoke, you should use the [installer][installer link].
5. Activate the venv (you'll need to do this every time you want to run the app):
```sh
source .venv/bin/activate
```
```sh
source .venv/bin/activate
```
6. Install the repo as an [editable install][editable install link]:
@@ -37,7 +37,7 @@ If you just want to use Invoke, you should use the [installer][installer link].
pip install -e ".[dev,test,xformers]" --use-pep517 --extra-index-url https://download.pytorch.org/whl/cu121
```
Refer to the [manual installation][manual install link]] instructions for more determining the correct install options. `xformers` is optional, but `dev` and `test` are not.
Refer to the [manual installation][manual install link] instructions for more determining the correct install options. `xformers` is optional, but `dev` and `test` are not.
7. Install the frontend dev toolchain:

View File

@@ -34,11 +34,11 @@ Please reach out to @hipsterusername on [Discord](https://discord.gg/ZmtBAhwWhy)
## Contributors
This project is a combined effort of dedicated people from across the world. [Check out the list of all these amazing people](https://invoke-ai.github.io/InvokeAI/other/CONTRIBUTORS/). We thank them for their time, hard work and effort.
This project is a combined effort of dedicated people from across the world. [Check out the list of all these amazing people](contributors.md). We thank them for their time, hard work and effort.
## Code of Conduct
The InvokeAI community is a welcoming place, and we want your help in maintaining that. Please review our [Code of Conduct](https://github.com/invoke-ai/InvokeAI/blob/main/CODE_OF_CONDUCT.md) to learn more - it's essential to maintaining a respectful and inclusive environment.
The InvokeAI community is a welcoming place, and we want your help in maintaining that. Please review our [Code of Conduct](../CODE_OF_CONDUCT.md) to learn more - it's essential to maintaining a respectful and inclusive environment.
By making a contribution to this project, you certify that:

View File

@@ -99,7 +99,6 @@ their descriptions.
| Scale Latents | Scales latents by a given factor. |
| Segment Anything Processor | Applies segment anything processing to image |
| Show Image | Displays a provided image, and passes it forward in the pipeline. |
| Step Param Easing | Experimental per-step parameter easing for denoising steps |
| String Primitive Collection | A collection of string primitive values |
| String Primitive | A string primitive value |
| Subtract Integers | Subtracts two numbers |

View File

@@ -40,6 +40,8 @@ class AppVersion(BaseModel):
version: str = Field(description="App version")
highlights: Optional[list[str]] = Field(default=None, description="Highlights of release")
class AppDependencyVersions(BaseModel):
"""App depencency Versions Response"""

View File

@@ -110,7 +110,7 @@ async def cancel_by_batch_ids(
@session_queue_router.put(
"/{queue_id}/cancel_by_destination",
operation_id="cancel_by_destination",
responses={200: {"model": CancelByBatchIDsResult}},
responses={200: {"model": CancelByDestinationResult}},
)
async def cancel_by_destination(
queue_id: str = Path(description="The queue id to perform this operation on"),

View File

@@ -63,6 +63,7 @@ class Classification(str, Enum, metaclass=MetaEnum):
- `Prototype`: The invocation is not yet stable and may be removed from the application at any time. Workflows built around this invocation may break, and we are *not* committed to supporting this invocation.
- `Deprecated`: The invocation is deprecated and may be removed in a future version.
- `Internal`: The invocation is not intended for use by end-users. It may be changed or removed at any time, but is exposed for users to play with.
- `Special`: The invocation is a special case and does not fit into any of the other classifications.
"""
Stable = "stable"
@@ -70,6 +71,7 @@ class Classification(str, Enum, metaclass=MetaEnum):
Prototype = "prototype"
Deprecated = "deprecated"
Internal = "internal"
Special = "special"
class UIConfigBase(BaseModel):

View File

@@ -1,98 +1,120 @@
from typing import Any, Union
from typing import Optional, Union
import numpy as np
import numpy.typing as npt
import torch
import torchvision.transforms as T
from PIL import Image
from torchvision.transforms.functional import resize as tv_resize
from invokeai.app.invocations.baseinvocation import BaseInvocation, invocation
from invokeai.app.invocations.fields import FieldDescriptions, Input, InputField, LatentsField
from invokeai.app.invocations.fields import FieldDescriptions, ImageField, Input, InputField, LatentsField
from invokeai.app.invocations.primitives import LatentsOutput
from invokeai.app.services.shared.invocation_context import InvocationContext
from invokeai.backend.stable_diffusion.diffusers_pipeline import image_resized_to_grid_as_tensor
from invokeai.backend.util.devices import TorchDevice
def slerp(
t: Union[float, np.ndarray],
v0: Union[torch.Tensor, np.ndarray],
v1: Union[torch.Tensor, np.ndarray],
device: torch.device,
DOT_THRESHOLD: float = 0.9995,
):
"""
Spherical linear interpolation
Args:
t (float/np.ndarray): Float value between 0.0 and 1.0
v0 (np.ndarray): Starting vector
v1 (np.ndarray): Final vector
DOT_THRESHOLD (float): Threshold for considering the two vectors as
colineal. Not recommended to alter this.
Returns:
v2 (np.ndarray): Interpolation vector between v0 and v1
"""
inputs_are_torch = False
if not isinstance(v0, np.ndarray):
inputs_are_torch = True
v0 = v0.detach().cpu().numpy()
if not isinstance(v1, np.ndarray):
inputs_are_torch = True
v1 = v1.detach().cpu().numpy()
dot = np.sum(v0 * v1 / (np.linalg.norm(v0) * np.linalg.norm(v1)))
if np.abs(dot) > DOT_THRESHOLD:
v2 = (1 - t) * v0 + t * v1
else:
theta_0 = np.arccos(dot)
sin_theta_0 = np.sin(theta_0)
theta_t = theta_0 * t
sin_theta_t = np.sin(theta_t)
s0 = np.sin(theta_0 - theta_t) / sin_theta_0
s1 = sin_theta_t / sin_theta_0
v2 = s0 * v0 + s1 * v1
if inputs_are_torch:
v2 = torch.from_numpy(v2).to(device)
return v2
@invocation(
"lblend",
title="Blend Latents",
tags=["latents", "blend"],
tags=["latents", "blend", "mask"],
category="latents",
version="1.0.3",
version="1.1.0",
)
class BlendLatentsInvocation(BaseInvocation):
"""Blend two latents using a given alpha. Latents must have same size."""
"""Blend two latents using a given alpha. If a mask is provided, the second latents will be masked before blending.
Latents must have same size. Masking functionality added by @dwringer."""
latents_a: LatentsField = InputField(
description=FieldDescriptions.latents,
input=Input.Connection,
)
latents_b: LatentsField = InputField(
description=FieldDescriptions.latents,
input=Input.Connection,
)
alpha: float = InputField(default=0.5, description=FieldDescriptions.blend_alpha)
latents_a: LatentsField = InputField(description=FieldDescriptions.latents, input=Input.Connection)
latents_b: LatentsField = InputField(description=FieldDescriptions.latents, input=Input.Connection)
mask: Optional[ImageField] = InputField(default=None, description="Mask for blending in latents B")
alpha: float = InputField(ge=0, default=0.5, description=FieldDescriptions.blend_alpha)
def prep_mask_tensor(self, mask_image: Image.Image) -> torch.Tensor:
if mask_image.mode != "L":
mask_image = mask_image.convert("L")
mask_tensor = image_resized_to_grid_as_tensor(mask_image, normalize=False)
if mask_tensor.dim() == 3:
mask_tensor = mask_tensor.unsqueeze(0)
return mask_tensor
def replace_tensor_from_masked_tensor(
self, tensor: torch.Tensor, other_tensor: torch.Tensor, mask_tensor: torch.Tensor
):
output = tensor.clone()
mask_tensor = mask_tensor.expand(output.shape)
if output.dtype != torch.float16:
output = torch.add(output, mask_tensor * torch.sub(other_tensor, tensor))
else:
output = torch.add(output, mask_tensor.half() * torch.sub(other_tensor, tensor))
return output
def invoke(self, context: InvocationContext) -> LatentsOutput:
latents_a = context.tensors.load(self.latents_a.latents_name)
latents_b = context.tensors.load(self.latents_b.latents_name)
if self.mask is None:
mask_tensor = torch.zeros(latents_a.shape[-2:])
else:
mask_tensor = self.prep_mask_tensor(context.images.get_pil(self.mask.image_name))
mask_tensor = tv_resize(mask_tensor, latents_a.shape[-2:], T.InterpolationMode.BILINEAR, antialias=False)
latents_b = self.replace_tensor_from_masked_tensor(latents_b, latents_a, mask_tensor)
if latents_a.shape != latents_b.shape:
raise Exception("Latents to blend must be the same size.")
raise ValueError("Latents to blend must be the same size.")
device = TorchDevice.choose_torch_device()
def slerp(
t: Union[float, npt.NDArray[Any]], # FIXME: maybe use np.float32 here?
v0: Union[torch.Tensor, npt.NDArray[Any]],
v1: Union[torch.Tensor, npt.NDArray[Any]],
DOT_THRESHOLD: float = 0.9995,
) -> Union[torch.Tensor, npt.NDArray[Any]]:
"""
Spherical linear interpolation
Args:
t (float/np.ndarray): Float value between 0.0 and 1.0
v0 (np.ndarray): Starting vector
v1 (np.ndarray): Final vector
DOT_THRESHOLD (float): Threshold for considering the two vectors as
colineal. Not recommended to alter this.
Returns:
v2 (np.ndarray): Interpolation vector between v0 and v1
"""
inputs_are_torch = False
if not isinstance(v0, np.ndarray):
inputs_are_torch = True
v0 = v0.detach().cpu().numpy()
if not isinstance(v1, np.ndarray):
inputs_are_torch = True
v1 = v1.detach().cpu().numpy()
dot = np.sum(v0 * v1 / (np.linalg.norm(v0) * np.linalg.norm(v1)))
if np.abs(dot) > DOT_THRESHOLD:
v2 = (1 - t) * v0 + t * v1
else:
theta_0 = np.arccos(dot)
sin_theta_0 = np.sin(theta_0)
theta_t = theta_0 * t
sin_theta_t = np.sin(theta_t)
s0 = np.sin(theta_0 - theta_t) / sin_theta_0
s1 = sin_theta_t / sin_theta_0
v2 = s0 * v0 + s1 * v1
if inputs_are_torch:
v2_torch: torch.Tensor = torch.from_numpy(v2).to(device)
return v2_torch
else:
assert isinstance(v2, np.ndarray)
return v2
# blend
bl = slerp(self.alpha, latents_a, latents_b)
assert isinstance(bl, torch.Tensor)
blended_latents: torch.Tensor = bl # for type checking convenience
blended_latents = slerp(self.alpha, latents_a, latents_b, device)
# https://discuss.huggingface.co/t/memory-usage-by-later-pipeline-stages/23699
blended_latents = blended_latents.to("cpu")
TorchDevice.empty_cache()
torch.cuda.empty_cache()
name = context.tensors.save(tensor=blended_latents)
return LatentsOutput.build(latents_name=name, latents=blended_latents, seed=self.latents_a.seed)
return LatentsOutput.build(latents_name=name, latents=blended_latents)

View File

@@ -95,6 +95,7 @@ class CompelInvocation(BaseInvocation):
ti_manager,
),
):
context.util.signal_progress("Building conditioning")
assert isinstance(text_encoder, CLIPTextModel)
assert isinstance(tokenizer, CLIPTokenizer)
compel = Compel(
@@ -191,6 +192,7 @@ class SDXLPromptInvocationBase:
ti_manager,
),
):
context.util.signal_progress("Building conditioning")
assert isinstance(text_encoder, (CLIPTextModel, CLIPTextModelWithProjection))
assert isinstance(tokenizer, CLIPTokenizer)

File diff suppressed because it is too large Load Diff

View File

@@ -65,6 +65,7 @@ class CreateDenoiseMaskInvocation(BaseInvocation):
img_mask = tv_resize(mask, image_tensor.shape[-2:], T.InterpolationMode.BILINEAR, antialias=False)
masked_image = image_tensor * torch.where(img_mask < 0.5, 0.0, 1.0)
# TODO:
context.util.signal_progress("Running VAE encoder")
masked_latents = ImageToLatentsInvocation.vae_encode(vae_info, self.fp32, self.tiled, masked_image.clone())
masked_latents_name = context.tensors.save(tensor=masked_latents)

View File

@@ -131,6 +131,7 @@ class CreateGradientMaskInvocation(BaseInvocation):
image_tensor = image_tensor.unsqueeze(0)
img_mask = tv_resize(mask, image_tensor.shape[-2:], T.InterpolationMode.BILINEAR, antialias=False)
masked_image = image_tensor * torch.where(img_mask < 0.5, 0.0, 1.0)
context.util.signal_progress("Running VAE encoder")
masked_latents = ImageToLatentsInvocation.vae_encode(
vae_info, self.fp32, self.tiled, masked_image.clone()
)

View File

@@ -250,6 +250,11 @@ class FluxConditioningField(BaseModel):
"""A conditioning tensor primitive value"""
conditioning_name: str = Field(description="The name of conditioning tensor")
mask: Optional[TensorField] = Field(
default=None,
description="The mask associated with this conditioning tensor. Excluded regions should be set to False, "
"included regions should be set to True.",
)
class SD3ConditioningField(BaseModel):

View File

@@ -30,6 +30,7 @@ from invokeai.backend.flux.controlnet.xlabs_controlnet_flux import XLabsControlN
from invokeai.backend.flux.denoise import denoise
from invokeai.backend.flux.extensions.inpaint_extension import InpaintExtension
from invokeai.backend.flux.extensions.instantx_controlnet_extension import InstantXControlNetExtension
from invokeai.backend.flux.extensions.regional_prompting_extension import RegionalPromptingExtension
from invokeai.backend.flux.extensions.xlabs_controlnet_extension import XLabsControlNetExtension
from invokeai.backend.flux.extensions.xlabs_ip_adapter_extension import XLabsIPAdapterExtension
from invokeai.backend.flux.ip_adapter.xlabs_ip_adapter_flux import XlabsIpAdapterFlux
@@ -42,6 +43,7 @@ from invokeai.backend.flux.sampling_utils import (
pack,
unpack,
)
from invokeai.backend.flux.text_conditioning import FluxTextConditioning
from invokeai.backend.lora.conversions.flux_lora_constants import FLUX_LORA_TRANSFORMER_PREFIX
from invokeai.backend.lora.lora_model_raw import LoRAModelRaw
from invokeai.backend.lora.lora_patcher import LoRAPatcher
@@ -56,7 +58,7 @@ from invokeai.backend.util.devices import TorchDevice
title="FLUX Denoise",
tags=["image", "flux"],
category="image",
version="3.2.0",
version="3.2.2",
classification=Classification.Prototype,
)
class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
@@ -81,15 +83,16 @@ class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
description=FieldDescriptions.denoising_start,
)
denoising_end: float = InputField(default=1.0, ge=0, le=1, description=FieldDescriptions.denoising_end)
add_noise: bool = InputField(default=True, description="Add noise based on denoising start.")
transformer: TransformerField = InputField(
description=FieldDescriptions.flux_model,
input=Input.Connection,
title="Transformer",
)
positive_text_conditioning: FluxConditioningField = InputField(
positive_text_conditioning: FluxConditioningField | list[FluxConditioningField] = InputField(
description=FieldDescriptions.positive_cond, input=Input.Connection
)
negative_text_conditioning: FluxConditioningField | None = InputField(
negative_text_conditioning: FluxConditioningField | list[FluxConditioningField] | None = InputField(
default=None,
description="Negative conditioning tensor. Can be None if cfg_scale is 1.0.",
input=Input.Connection,
@@ -138,36 +141,12 @@ class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
name = context.tensors.save(tensor=latents)
return LatentsOutput.build(latents_name=name, latents=latents, seed=None)
def _load_text_conditioning(
self, context: InvocationContext, conditioning_name: str, dtype: torch.dtype
) -> Tuple[torch.Tensor, torch.Tensor]:
# Load the conditioning data.
cond_data = context.conditioning.load(conditioning_name)
assert len(cond_data.conditionings) == 1
flux_conditioning = cond_data.conditionings[0]
assert isinstance(flux_conditioning, FLUXConditioningInfo)
flux_conditioning = flux_conditioning.to(dtype=dtype)
t5_embeddings = flux_conditioning.t5_embeds
clip_embeddings = flux_conditioning.clip_embeds
return t5_embeddings, clip_embeddings
def _run_diffusion(
self,
context: InvocationContext,
):
inference_dtype = torch.bfloat16
# Load the conditioning data.
pos_t5_embeddings, pos_clip_embeddings = self._load_text_conditioning(
context, self.positive_text_conditioning.conditioning_name, inference_dtype
)
neg_t5_embeddings: torch.Tensor | None = None
neg_clip_embeddings: torch.Tensor | None = None
if self.negative_text_conditioning is not None:
neg_t5_embeddings, neg_clip_embeddings = self._load_text_conditioning(
context, self.negative_text_conditioning.conditioning_name, inference_dtype
)
# Load the input latents, if provided.
init_latents = context.tensors.load(self.latents.latents_name) if self.latents else None
if init_latents is not None:
@@ -182,15 +161,45 @@ class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
dtype=inference_dtype,
seed=self.seed,
)
b, _c, latent_h, latent_w = noise.shape
packed_h = latent_h // 2
packed_w = latent_w // 2
# Load the conditioning data.
pos_text_conditionings = self._load_text_conditioning(
context=context,
cond_field=self.positive_text_conditioning,
packed_height=packed_h,
packed_width=packed_w,
dtype=inference_dtype,
device=TorchDevice.choose_torch_device(),
)
neg_text_conditionings: list[FluxTextConditioning] | None = None
if self.negative_text_conditioning is not None:
neg_text_conditionings = self._load_text_conditioning(
context=context,
cond_field=self.negative_text_conditioning,
packed_height=packed_h,
packed_width=packed_w,
dtype=inference_dtype,
device=TorchDevice.choose_torch_device(),
)
pos_regional_prompting_extension = RegionalPromptingExtension.from_text_conditioning(
pos_text_conditionings, img_seq_len=packed_h * packed_w
)
neg_regional_prompting_extension = (
RegionalPromptingExtension.from_text_conditioning(neg_text_conditionings, img_seq_len=packed_h * packed_w)
if neg_text_conditionings
else None
)
transformer_info = context.models.load(self.transformer.transformer)
is_schnell = "schnell" in transformer_info.config.config_path
# Calculate the timestep schedule.
image_seq_len = noise.shape[-1] * noise.shape[-2] // 4
timesteps = get_schedule(
num_steps=self.num_steps,
image_seq_len=image_seq_len,
image_seq_len=packed_h * packed_w,
shift=not is_schnell,
)
@@ -207,9 +216,12 @@ class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
"to be poor. Consider using a FLUX dev model instead."
)
# Noise the orig_latents by the appropriate amount for the first timestep.
t_0 = timesteps[0]
x = t_0 * noise + (1.0 - t_0) * init_latents
if self.add_noise:
# Noise the orig_latents by the appropriate amount for the first timestep.
t_0 = timesteps[0]
x = t_0 * noise + (1.0 - t_0) * init_latents
else:
x = init_latents
else:
# init_latents are not provided, so we are not doing image-to-image (i.e. we are starting from pure noise).
if self.denoising_start > 1e-5:
@@ -224,28 +236,17 @@ class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
inpaint_mask = self._prep_inpaint_mask(context, x)
b, _c, latent_h, latent_w = x.shape
img_ids = generate_img_ids(h=latent_h, w=latent_w, batch_size=b, device=x.device, dtype=x.dtype)
pos_bs, pos_t5_seq_len, _ = pos_t5_embeddings.shape
pos_txt_ids = torch.zeros(
pos_bs, pos_t5_seq_len, 3, dtype=inference_dtype, device=TorchDevice.choose_torch_device()
)
neg_txt_ids: torch.Tensor | None = None
if neg_t5_embeddings is not None:
neg_bs, neg_t5_seq_len, _ = neg_t5_embeddings.shape
neg_txt_ids = torch.zeros(
neg_bs, neg_t5_seq_len, 3, dtype=inference_dtype, device=TorchDevice.choose_torch_device()
)
# Pack all latent tensors.
init_latents = pack(init_latents) if init_latents is not None else None
inpaint_mask = pack(inpaint_mask) if inpaint_mask is not None else None
noise = pack(noise)
x = pack(x)
# Now that we have 'packed' the latent tensors, verify that we calculated the image_seq_len correctly.
assert image_seq_len == x.shape[1]
# Now that we have 'packed' the latent tensors, verify that we calculated the image_seq_len, packed_h, and
# packed_w correctly.
assert packed_h * packed_w == x.shape[1]
# Prepare inpaint extension.
inpaint_extension: InpaintExtension | None = None
@@ -334,12 +335,8 @@ class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
model=transformer,
img=x,
img_ids=img_ids,
txt=pos_t5_embeddings,
txt_ids=pos_txt_ids,
vec=pos_clip_embeddings,
neg_txt=neg_t5_embeddings,
neg_txt_ids=neg_txt_ids,
neg_vec=neg_clip_embeddings,
pos_regional_prompting_extension=pos_regional_prompting_extension,
neg_regional_prompting_extension=neg_regional_prompting_extension,
timesteps=timesteps,
step_callback=self._build_step_callback(context),
guidance=self.guidance,
@@ -353,6 +350,43 @@ class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
x = unpack(x.float(), self.height, self.width)
return x
def _load_text_conditioning(
self,
context: InvocationContext,
cond_field: FluxConditioningField | list[FluxConditioningField],
packed_height: int,
packed_width: int,
dtype: torch.dtype,
device: torch.device,
) -> list[FluxTextConditioning]:
"""Load text conditioning data from a FluxConditioningField or a list of FluxConditioningFields."""
# Normalize to a list of FluxConditioningFields.
cond_list = [cond_field] if isinstance(cond_field, FluxConditioningField) else cond_field
text_conditionings: list[FluxTextConditioning] = []
for cond_field in cond_list:
# Load the text embeddings.
cond_data = context.conditioning.load(cond_field.conditioning_name)
assert len(cond_data.conditionings) == 1
flux_conditioning = cond_data.conditionings[0]
assert isinstance(flux_conditioning, FLUXConditioningInfo)
flux_conditioning = flux_conditioning.to(dtype=dtype, device=device)
t5_embeddings = flux_conditioning.t5_embeds
clip_embeddings = flux_conditioning.clip_embeds
# Load the mask, if provided.
mask: Optional[torch.Tensor] = None
if cond_field.mask is not None:
mask = context.tensors.load(cond_field.mask.tensor_name)
mask = mask.to(device=device)
mask = RegionalPromptingExtension.preprocess_regional_prompt_mask(
mask, packed_height, packed_width, dtype, device
)
text_conditionings.append(FluxTextConditioning(t5_embeddings, clip_embeddings, mask))
return text_conditionings
@classmethod
def prep_cfg_scale(
cls, cfg_scale: float | list[float], timesteps: list[float], cfg_scale_start_step: int, cfg_scale_end_step: int

View File

@@ -1,11 +1,18 @@
from contextlib import ExitStack
from typing import Iterator, Literal, Tuple
from typing import Iterator, Literal, Optional, Tuple
import torch
from transformers import CLIPTextModel, CLIPTokenizer, T5EncoderModel, T5Tokenizer
from invokeai.app.invocations.baseinvocation import BaseInvocation, Classification, invocation
from invokeai.app.invocations.fields import FieldDescriptions, Input, InputField
from invokeai.app.invocations.fields import (
FieldDescriptions,
FluxConditioningField,
Input,
InputField,
TensorField,
UIComponent,
)
from invokeai.app.invocations.model import CLIPField, T5EncoderField
from invokeai.app.invocations.primitives import FluxConditioningOutput
from invokeai.app.services.shared.invocation_context import InvocationContext
@@ -22,7 +29,7 @@ from invokeai.backend.stable_diffusion.diffusion.conditioning_data import Condit
title="FLUX Text Encoding",
tags=["prompt", "conditioning", "flux"],
category="conditioning",
version="1.1.0",
version="1.1.1",
classification=Classification.Prototype,
)
class FluxTextEncoderInvocation(BaseInvocation):
@@ -41,7 +48,10 @@ class FluxTextEncoderInvocation(BaseInvocation):
t5_max_seq_len: Literal[256, 512] = InputField(
description="Max sequence length for the T5 encoder. Expected to be 256 for FLUX schnell models and 512 for FLUX dev models."
)
prompt: str = InputField(description="Text prompt to encode.")
prompt: str = InputField(description="Text prompt to encode.", ui_component=UIComponent.Textarea)
mask: Optional[TensorField] = InputField(
default=None, description="A mask defining the region that this conditioning prompt applies to."
)
@torch.no_grad()
def invoke(self, context: InvocationContext) -> FluxConditioningOutput:
@@ -54,7 +64,9 @@ class FluxTextEncoderInvocation(BaseInvocation):
)
conditioning_name = context.conditioning.save(conditioning_data)
return FluxConditioningOutput.build(conditioning_name)
return FluxConditioningOutput(
conditioning=FluxConditioningField(conditioning_name=conditioning_name, mask=self.mask)
)
def _t5_encode(self, context: InvocationContext) -> torch.Tensor:
t5_tokenizer_info = context.models.load(self.t5_encoder.tokenizer)
@@ -71,6 +83,7 @@ class FluxTextEncoderInvocation(BaseInvocation):
t5_encoder = HFEncoder(t5_text_encoder, t5_tokenizer, False, self.t5_max_seq_len)
context.util.signal_progress("Running T5 encoder")
prompt_embeds = t5_encoder(prompt)
assert isinstance(prompt_embeds, torch.Tensor)
@@ -111,6 +124,7 @@ class FluxTextEncoderInvocation(BaseInvocation):
clip_encoder = HFEncoder(clip_text_encoder, clip_tokenizer, True, 77)
context.util.signal_progress("Running CLIP encoder")
pooled_prompt_embeds = clip_encoder(prompt)
assert isinstance(pooled_prompt_embeds, torch.Tensor)

View File

@@ -41,7 +41,8 @@ class FluxVaeDecodeInvocation(BaseInvocation, WithMetadata, WithBoard):
def _vae_decode(self, vae_info: LoadedModel, latents: torch.Tensor) -> Image.Image:
with vae_info as vae:
assert isinstance(vae, AutoEncoder)
latents = latents.to(device=TorchDevice.choose_torch_device(), dtype=TorchDevice.choose_torch_dtype())
vae_dtype = next(iter(vae.parameters())).dtype
latents = latents.to(device=TorchDevice.choose_torch_device(), dtype=vae_dtype)
img = vae.decode(latents)
img = img.clamp(-1, 1)
@@ -53,6 +54,7 @@ class FluxVaeDecodeInvocation(BaseInvocation, WithMetadata, WithBoard):
def invoke(self, context: InvocationContext) -> ImageOutput:
latents = context.tensors.load(self.latents.latents_name)
vae_info = context.models.load(self.vae.vae)
context.util.signal_progress("Running VAE")
image = self._vae_decode(vae_info=vae_info, latents=latents)
TorchDevice.empty_cache()

View File

@@ -44,9 +44,8 @@ class FluxVaeEncodeInvocation(BaseInvocation):
generator = torch.Generator(device=TorchDevice.choose_torch_device()).manual_seed(0)
with vae_info as vae:
assert isinstance(vae, AutoEncoder)
image_tensor = image_tensor.to(
device=TorchDevice.choose_torch_device(), dtype=TorchDevice.choose_torch_dtype()
)
vae_dtype = next(iter(vae.parameters())).dtype
image_tensor = image_tensor.to(device=TorchDevice.choose_torch_device(), dtype=vae_dtype)
latents = vae.encode(image_tensor, sample=True, generator=generator)
return latents
@@ -60,6 +59,7 @@ class FluxVaeEncodeInvocation(BaseInvocation):
if image_tensor.dim() == 3:
image_tensor = einops.rearrange(image_tensor, "c h w -> 1 c h w")
context.util.signal_progress("Running VAE")
latents = self.vae_encode(vae_info=vae_info, image_tensor=image_tensor)
latents = latents.to("cpu")

View File

@@ -0,0 +1,59 @@
from pydantic import ValidationInfo, field_validator
from invokeai.app.invocations.baseinvocation import (
BaseInvocation,
BaseInvocationOutput,
Classification,
invocation,
invocation_output,
)
from invokeai.app.invocations.fields import InputField, OutputField
from invokeai.app.services.shared.invocation_context import InvocationContext
@invocation_output("image_panel_coordinate_output")
class ImagePanelCoordinateOutput(BaseInvocationOutput):
x_left: int = OutputField(description="The left x-coordinate of the panel.")
y_top: int = OutputField(description="The top y-coordinate of the panel.")
width: int = OutputField(description="The width of the panel.")
height: int = OutputField(description="The height of the panel.")
@invocation(
"image_panel_layout",
title="Image Panel Layout",
tags=["image", "panel", "layout"],
category="image",
version="1.0.0",
classification=Classification.Prototype,
)
class ImagePanelLayoutInvocation(BaseInvocation):
"""Get the coordinates of a single panel in a grid. (If the full image shape cannot be divided evenly into panels,
then the grid may not cover the entire image.)
"""
width: int = InputField(description="The width of the entire grid.")
height: int = InputField(description="The height of the entire grid.")
num_cols: int = InputField(ge=1, default=1, description="The number of columns in the grid.")
num_rows: int = InputField(ge=1, default=1, description="The number of rows in the grid.")
panel_col_idx: int = InputField(ge=0, default=0, description="The column index of the panel to be processed.")
panel_row_idx: int = InputField(ge=0, default=0, description="The row index of the panel to be processed.")
@field_validator("panel_col_idx")
def validate_panel_col_idx(cls, v: int, info: ValidationInfo) -> int:
if v < 0 or v >= info.data["num_cols"]:
raise ValueError(f"panel_col_idx must be between 0 and {info.data['num_cols'] - 1}")
return v
@field_validator("panel_row_idx")
def validate_panel_row_idx(cls, v: int, info: ValidationInfo) -> int:
if v < 0 or v >= info.data["num_rows"]:
raise ValueError(f"panel_row_idx must be between 0 and {info.data['num_rows'] - 1}")
return v
def invoke(self, context: InvocationContext) -> ImagePanelCoordinateOutput:
x_left = self.panel_col_idx * (self.width // self.num_cols)
y_top = self.panel_row_idx * (self.height // self.num_rows)
width = self.width // self.num_cols
height = self.height // self.num_rows
return ImagePanelCoordinateOutput(x_left=x_left, y_top=y_top, width=width, height=height)

View File

@@ -117,6 +117,7 @@ class ImageToLatentsInvocation(BaseInvocation):
if image_tensor.dim() == 3:
image_tensor = einops.rearrange(image_tensor, "c h w -> 1 c h w")
context.util.signal_progress("Running VAE encoder")
latents = self.vae_encode(
vae_info=vae_info, upcast=self.fp32, tiled=self.tiled, image_tensor=image_tensor, tile_size=self.tile_size
)

View File

@@ -60,6 +60,7 @@ class LatentsToImageInvocation(BaseInvocation, WithMetadata, WithBoard):
vae_info = context.models.load(self.vae.vae)
assert isinstance(vae_info.model, (AutoencoderKL, AutoencoderTiny))
with SeamlessExt.static_patch_model(vae_info.model, self.vae.seamless_axes), vae_info as vae:
context.util.signal_progress("Running VAE decoder")
assert isinstance(vae, (AutoencoderKL, AutoencoderTiny))
latents = latents.to(vae.device)
if self.fp32:

View File

@@ -147,6 +147,10 @@ GENERATION_MODES = Literal[
"flux_img2img",
"flux_inpaint",
"flux_outpaint",
"sd3_txt2img",
"sd3_img2img",
"sd3_inpaint",
"sd3_outpaint",
]

View File

@@ -1,43 +1,4 @@
import io
from typing import Literal, Optional
import matplotlib.pyplot as plt
import numpy as np
import PIL.Image
from easing_functions import (
BackEaseIn,
BackEaseInOut,
BackEaseOut,
BounceEaseIn,
BounceEaseInOut,
BounceEaseOut,
CircularEaseIn,
CircularEaseInOut,
CircularEaseOut,
CubicEaseIn,
CubicEaseInOut,
CubicEaseOut,
ElasticEaseIn,
ElasticEaseInOut,
ElasticEaseOut,
ExponentialEaseIn,
ExponentialEaseInOut,
ExponentialEaseOut,
LinearInOut,
QuadEaseIn,
QuadEaseInOut,
QuadEaseOut,
QuarticEaseIn,
QuarticEaseInOut,
QuarticEaseOut,
QuinticEaseIn,
QuinticEaseInOut,
QuinticEaseOut,
SineEaseIn,
SineEaseInOut,
SineEaseOut,
)
from matplotlib.ticker import MaxNLocator
from invokeai.app.invocations.baseinvocation import BaseInvocation, invocation
from invokeai.app.invocations.fields import InputField
@@ -65,191 +26,3 @@ class FloatLinearRangeInvocation(BaseInvocation):
def invoke(self, context: InvocationContext) -> FloatCollectionOutput:
param_list = list(np.linspace(self.start, self.stop, self.steps))
return FloatCollectionOutput(collection=param_list)
EASING_FUNCTIONS_MAP = {
"Linear": LinearInOut,
"QuadIn": QuadEaseIn,
"QuadOut": QuadEaseOut,
"QuadInOut": QuadEaseInOut,
"CubicIn": CubicEaseIn,
"CubicOut": CubicEaseOut,
"CubicInOut": CubicEaseInOut,
"QuarticIn": QuarticEaseIn,
"QuarticOut": QuarticEaseOut,
"QuarticInOut": QuarticEaseInOut,
"QuinticIn": QuinticEaseIn,
"QuinticOut": QuinticEaseOut,
"QuinticInOut": QuinticEaseInOut,
"SineIn": SineEaseIn,
"SineOut": SineEaseOut,
"SineInOut": SineEaseInOut,
"CircularIn": CircularEaseIn,
"CircularOut": CircularEaseOut,
"CircularInOut": CircularEaseInOut,
"ExponentialIn": ExponentialEaseIn,
"ExponentialOut": ExponentialEaseOut,
"ExponentialInOut": ExponentialEaseInOut,
"ElasticIn": ElasticEaseIn,
"ElasticOut": ElasticEaseOut,
"ElasticInOut": ElasticEaseInOut,
"BackIn": BackEaseIn,
"BackOut": BackEaseOut,
"BackInOut": BackEaseInOut,
"BounceIn": BounceEaseIn,
"BounceOut": BounceEaseOut,
"BounceInOut": BounceEaseInOut,
}
EASING_FUNCTION_KEYS = Literal[tuple(EASING_FUNCTIONS_MAP.keys())]
# actually I think for now could just use CollectionOutput (which is list[Any]
@invocation(
"step_param_easing",
title="Step Param Easing",
tags=["step", "easing"],
category="step",
version="1.0.2",
)
class StepParamEasingInvocation(BaseInvocation):
"""Experimental per-step parameter easing for denoising steps"""
easing: EASING_FUNCTION_KEYS = InputField(default="Linear", description="The easing function to use")
num_steps: int = InputField(default=20, description="number of denoising steps")
start_value: float = InputField(default=0.0, description="easing starting value")
end_value: float = InputField(default=1.0, description="easing ending value")
start_step_percent: float = InputField(default=0.0, description="fraction of steps at which to start easing")
end_step_percent: float = InputField(default=1.0, description="fraction of steps after which to end easing")
# if None, then start_value is used prior to easing start
pre_start_value: Optional[float] = InputField(default=None, description="value before easing start")
# if None, then end value is used prior to easing end
post_end_value: Optional[float] = InputField(default=None, description="value after easing end")
mirror: bool = InputField(default=False, description="include mirror of easing function")
# FIXME: add alt_mirror option (alternative to default or mirror), or remove entirely
# alt_mirror: bool = InputField(default=False, description="alternative mirroring by dual easing")
show_easing_plot: bool = InputField(default=False, description="show easing plot")
def invoke(self, context: InvocationContext) -> FloatCollectionOutput:
log_diagnostics = False
# convert from start_step_percent to nearest step <= (steps * start_step_percent)
# start_step = int(np.floor(self.num_steps * self.start_step_percent))
start_step = int(np.round(self.num_steps * self.start_step_percent))
# convert from end_step_percent to nearest step >= (steps * end_step_percent)
# end_step = int(np.ceil((self.num_steps - 1) * self.end_step_percent))
end_step = int(np.round((self.num_steps - 1) * self.end_step_percent))
# end_step = int(np.ceil(self.num_steps * self.end_step_percent))
num_easing_steps = end_step - start_step + 1
# num_presteps = max(start_step - 1, 0)
num_presteps = start_step
num_poststeps = self.num_steps - (num_presteps + num_easing_steps)
prelist = list(num_presteps * [self.pre_start_value])
postlist = list(num_poststeps * [self.post_end_value])
if log_diagnostics:
context.logger.debug("start_step: " + str(start_step))
context.logger.debug("end_step: " + str(end_step))
context.logger.debug("num_easing_steps: " + str(num_easing_steps))
context.logger.debug("num_presteps: " + str(num_presteps))
context.logger.debug("num_poststeps: " + str(num_poststeps))
context.logger.debug("prelist size: " + str(len(prelist)))
context.logger.debug("postlist size: " + str(len(postlist)))
context.logger.debug("prelist: " + str(prelist))
context.logger.debug("postlist: " + str(postlist))
easing_class = EASING_FUNCTIONS_MAP[self.easing]
if log_diagnostics:
context.logger.debug("easing class: " + str(easing_class))
easing_list = []
if self.mirror: # "expected" mirroring
# if number of steps is even, squeeze duration down to (number_of_steps)/2
# and create reverse copy of list to append
# if number of steps is odd, squeeze duration down to ceil(number_of_steps/2)
# and create reverse copy of list[1:end-1]
# but if even then number_of_steps/2 === ceil(number_of_steps/2), so can just use ceil always
base_easing_duration = int(np.ceil(num_easing_steps / 2.0))
if log_diagnostics:
context.logger.debug("base easing duration: " + str(base_easing_duration))
even_num_steps = num_easing_steps % 2 == 0 # even number of steps
easing_function = easing_class(
start=self.start_value,
end=self.end_value,
duration=base_easing_duration - 1,
)
base_easing_vals = []
for step_index in range(base_easing_duration):
easing_val = easing_function.ease(step_index)
base_easing_vals.append(easing_val)
if log_diagnostics:
context.logger.debug("step_index: " + str(step_index) + ", easing_val: " + str(easing_val))
if even_num_steps:
mirror_easing_vals = list(reversed(base_easing_vals))
else:
mirror_easing_vals = list(reversed(base_easing_vals[0:-1]))
if log_diagnostics:
context.logger.debug("base easing vals: " + str(base_easing_vals))
context.logger.debug("mirror easing vals: " + str(mirror_easing_vals))
easing_list = base_easing_vals + mirror_easing_vals
# FIXME: add alt_mirror option (alternative to default or mirror), or remove entirely
# elif self.alt_mirror: # function mirroring (unintuitive behavior (at least to me))
# # half_ease_duration = round(num_easing_steps - 1 / 2)
# half_ease_duration = round((num_easing_steps - 1) / 2)
# easing_function = easing_class(start=self.start_value,
# end=self.end_value,
# duration=half_ease_duration,
# )
#
# mirror_function = easing_class(start=self.end_value,
# end=self.start_value,
# duration=half_ease_duration,
# )
# for step_index in range(num_easing_steps):
# if step_index <= half_ease_duration:
# step_val = easing_function.ease(step_index)
# else:
# step_val = mirror_function.ease(step_index - half_ease_duration)
# easing_list.append(step_val)
# if log_diagnostics: logger.debug(step_index, step_val)
#
else: # no mirroring (default)
easing_function = easing_class(
start=self.start_value,
end=self.end_value,
duration=num_easing_steps - 1,
)
for step_index in range(num_easing_steps):
step_val = easing_function.ease(step_index)
easing_list.append(step_val)
if log_diagnostics:
context.logger.debug("step_index: " + str(step_index) + ", easing_val: " + str(step_val))
if log_diagnostics:
context.logger.debug("prelist size: " + str(len(prelist)))
context.logger.debug("easing_list size: " + str(len(easing_list)))
context.logger.debug("postlist size: " + str(len(postlist)))
param_list = prelist + easing_list + postlist
if self.show_easing_plot:
plt.figure()
plt.xlabel("Step")
plt.ylabel("Param Value")
plt.title("Per-Step Values Based On Easing: " + self.easing)
plt.bar(range(len(param_list)), param_list)
# plt.plot(param_list)
ax = plt.gca()
ax.xaxis.set_major_locator(MaxNLocator(integer=True))
buf = io.BytesIO()
plt.savefig(buf, format="png")
buf.seek(0)
im = PIL.Image.open(buf)
im.show()
buf.close()
# output array of size steps, each entry list[i] is param value for step i
return FloatCollectionOutput(collection=param_list)

View File

@@ -4,7 +4,13 @@ from typing import Optional
import torch
from invokeai.app.invocations.baseinvocation import BaseInvocation, BaseInvocationOutput, invocation, invocation_output
from invokeai.app.invocations.baseinvocation import (
BaseInvocation,
BaseInvocationOutput,
Classification,
invocation,
invocation_output,
)
from invokeai.app.invocations.constants import LATENT_SCALE_FACTOR
from invokeai.app.invocations.fields import (
BoundingBoxField,
@@ -533,3 +539,23 @@ class BoundingBoxInvocation(BaseInvocation):
# endregion
@invocation(
"image_batch",
title="Image Batch",
tags=["primitives", "image", "batch", "internal"],
category="primitives",
version="1.0.0",
classification=Classification.Special,
)
class ImageBatchInvocation(BaseInvocation):
"""Create a batched generation, where the workflow is executed once for each image in the batch."""
images: list[ImageField] = InputField(min_length=1, description="The images to batch over", input=Input.Direct)
def __init__(self):
raise NotImplementedError("This class should never be executed or instantiated directly.")
def invoke(self, context: InvocationContext) -> ImageOutput:
raise NotImplementedError("This class should never be executed or instantiated directly.")

View File

@@ -1,16 +1,19 @@
from typing import Callable, Tuple
from typing import Callable, Optional, Tuple
import torch
import torchvision.transforms as tv_transforms
from diffusers.models.transformers.transformer_sd3 import SD3Transformer2DModel
from diffusers.schedulers.scheduling_flow_match_euler_discrete import FlowMatchEulerDiscreteScheduler
from torchvision.transforms.functional import resize as tv_resize
from tqdm import tqdm
from invokeai.app.invocations.baseinvocation import BaseInvocation, Classification, invocation
from invokeai.app.invocations.constants import LATENT_SCALE_FACTOR
from invokeai.app.invocations.fields import (
DenoiseMaskField,
FieldDescriptions,
Input,
InputField,
LatentsField,
SD3ConditioningField,
WithBoard,
WithMetadata,
@@ -19,7 +22,9 @@ from invokeai.app.invocations.model import TransformerField
from invokeai.app.invocations.primitives import LatentsOutput
from invokeai.app.invocations.sd3_text_encoder import SD3_T5_MAX_SEQ_LEN
from invokeai.app.services.shared.invocation_context import InvocationContext
from invokeai.backend.flux.sampling_utils import clip_timestep_schedule_fractional
from invokeai.backend.model_manager.config import BaseModelType
from invokeai.backend.sd3.extensions.inpaint_extension import InpaintExtension
from invokeai.backend.stable_diffusion.diffusers_pipeline import PipelineIntermediateState
from invokeai.backend.stable_diffusion.diffusion.conditioning_data import SD3ConditioningInfo
from invokeai.backend.util.devices import TorchDevice
@@ -30,16 +35,24 @@ from invokeai.backend.util.devices import TorchDevice
title="SD3 Denoise",
tags=["image", "sd3"],
category="image",
version="1.0.0",
version="1.1.0",
classification=Classification.Prototype,
)
class SD3DenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
"""Run denoising process with a SD3 model."""
# If latents is provided, this means we are doing image-to-image.
latents: Optional[LatentsField] = InputField(
default=None, description=FieldDescriptions.latents, input=Input.Connection
)
# denoise_mask is used for image-to-image inpainting. Only the masked region is modified.
denoise_mask: Optional[DenoiseMaskField] = InputField(
default=None, description=FieldDescriptions.denoise_mask, input=Input.Connection
)
denoising_start: float = InputField(default=0.0, ge=0, le=1, description=FieldDescriptions.denoising_start)
denoising_end: float = InputField(default=1.0, ge=0, le=1, description=FieldDescriptions.denoising_end)
transformer: TransformerField = InputField(
description=FieldDescriptions.sd3_model,
input=Input.Connection,
title="Transformer",
description=FieldDescriptions.sd3_model, input=Input.Connection, title="Transformer"
)
positive_conditioning: SD3ConditioningField = InputField(
description=FieldDescriptions.positive_cond, input=Input.Connection
@@ -61,6 +74,41 @@ class SD3DenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
name = context.tensors.save(tensor=latents)
return LatentsOutput.build(latents_name=name, latents=latents, seed=None)
def _prep_inpaint_mask(self, context: InvocationContext, latents: torch.Tensor) -> torch.Tensor | None:
"""Prepare the inpaint mask.
- Loads the mask
- Resizes if necessary
- Casts to same device/dtype as latents
Args:
context (InvocationContext): The invocation context, for loading the inpaint mask.
latents (torch.Tensor): A latent image tensor. Used to determine the target shape, device, and dtype for the
inpaint mask.
Returns:
torch.Tensor | None: Inpaint mask. Values of 0.0 represent the regions to be fully denoised, and 1.0
represent the regions to be preserved.
"""
if self.denoise_mask is None:
return None
mask = context.tensors.load(self.denoise_mask.mask_name)
# The input denoise_mask contains values in [0, 1], where 0.0 represents the regions to be fully denoised, and
# 1.0 represents the regions to be preserved.
# We invert the mask so that the regions to be preserved are 0.0 and the regions to be denoised are 1.0.
mask = 1.0 - mask
_, _, latent_height, latent_width = latents.shape
mask = tv_resize(
img=mask,
size=[latent_height, latent_width],
interpolation=tv_transforms.InterpolationMode.BILINEAR,
antialias=False,
)
mask = mask.to(device=latents.device, dtype=latents.dtype)
return mask
def _load_text_conditioning(
self,
context: InvocationContext,
@@ -170,14 +218,20 @@ class SD3DenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
prompt_embeds = torch.cat([neg_prompt_embeds, pos_prompt_embeds], dim=0)
pooled_prompt_embeds = torch.cat([neg_pooled_prompt_embeds, pos_pooled_prompt_embeds], dim=0)
# Prepare the scheduler.
scheduler = FlowMatchEulerDiscreteScheduler()
scheduler.set_timesteps(num_inference_steps=self.steps, device=device)
timesteps = scheduler.timesteps
assert isinstance(timesteps, torch.Tensor)
# Prepare the timestep schedule.
# We add an extra step to the end to account for the final timestep of 0.0.
timesteps: list[float] = torch.linspace(1, 0, self.steps + 1).tolist()
# Clip the timesteps schedule based on denoising_start and denoising_end.
timesteps = clip_timestep_schedule_fractional(timesteps, self.denoising_start, self.denoising_end)
total_steps = len(timesteps) - 1
# Prepare the CFG scale list.
cfg_scale = self._prepare_cfg_scale(len(timesteps))
cfg_scale = self._prepare_cfg_scale(total_steps)
# Load the input latents, if provided.
init_latents = context.tensors.load(self.latents.latents_name) if self.latents else None
if init_latents is not None:
init_latents = init_latents.to(device=device, dtype=inference_dtype)
# Generate initial latent noise.
num_channels_latents = transformer_info.model.config.in_channels
@@ -191,9 +245,34 @@ class SD3DenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
device=device,
seed=self.seed,
)
latents: torch.Tensor = noise
total_steps = len(timesteps)
# Prepare input latent image.
if init_latents is not None:
# Noise the init_latents by the appropriate amount for the first timestep.
t_0 = timesteps[0]
latents = t_0 * noise + (1.0 - t_0) * init_latents
else:
# init_latents are not provided, so we are not doing image-to-image (i.e. we are starting from pure noise).
if self.denoising_start > 1e-5:
raise ValueError("denoising_start should be 0 when initial latents are not provided.")
latents = noise
# If len(timesteps) == 1, then short-circuit. We are just noising the input latents, but not taking any
# denoising steps.
if len(timesteps) <= 1:
return latents
# Prepare inpaint extension.
inpaint_mask = self._prep_inpaint_mask(context, latents)
inpaint_extension: InpaintExtension | None = None
if inpaint_mask is not None:
assert init_latents is not None
inpaint_extension = InpaintExtension(
init_latents=init_latents,
inpaint_mask=inpaint_mask,
noise=noise,
)
step_callback = self._build_step_callback(context)
step_callback(
@@ -210,11 +289,12 @@ class SD3DenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
assert isinstance(transformer, SD3Transformer2DModel)
# 6. Denoising loop
for step_idx, t in tqdm(list(enumerate(timesteps))):
for step_idx, (t_curr, t_prev) in tqdm(list(enumerate(zip(timesteps[:-1], timesteps[1:], strict=True)))):
# Expand the latents if we are doing CFG.
latent_model_input = torch.cat([latents] * 2) if do_classifier_free_guidance else latents
# Expand the timestep to match the latent model input.
timestep = t.expand(latent_model_input.shape[0])
# Multiply by 1000 to match the default FlowMatchEulerDiscreteScheduler num_train_timesteps.
timestep = torch.tensor([t_curr * 1000], device=device).expand(latent_model_input.shape[0])
noise_pred = transformer(
hidden_states=latent_model_input,
@@ -232,21 +312,19 @@ class SD3DenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
# Compute the previous noisy sample x_t -> x_t-1.
latents_dtype = latents.dtype
latents = scheduler.step(model_output=noise_pred, timestep=t, sample=latents, return_dict=False)[0]
latents = latents.to(dtype=torch.float32)
latents = latents + (t_prev - t_curr) * noise_pred
latents = latents.to(dtype=latents_dtype)
# TODO(ryand): This MPS dtype handling was copied from diffusers, I haven't tested to see if it's
# needed.
if latents.dtype != latents_dtype:
if torch.backends.mps.is_available():
# some platforms (eg. apple mps) misbehave due to a pytorch bug: https://github.com/pytorch/pytorch/pull/99272
latents = latents.to(latents_dtype)
if inpaint_extension is not None:
latents = inpaint_extension.merge_intermediate_latents_with_init_latents(latents, t_prev)
step_callback(
PipelineIntermediateState(
step=step_idx + 1,
order=1,
total_steps=total_steps,
timestep=int(t),
timestep=int(t_curr),
latents=latents,
),
)

View File

@@ -0,0 +1,65 @@
import einops
import torch
from diffusers.models.autoencoders.autoencoder_kl import AutoencoderKL
from invokeai.app.invocations.baseinvocation import BaseInvocation, Classification, invocation
from invokeai.app.invocations.fields import (
FieldDescriptions,
ImageField,
Input,
InputField,
WithBoard,
WithMetadata,
)
from invokeai.app.invocations.model import VAEField
from invokeai.app.invocations.primitives import LatentsOutput
from invokeai.app.services.shared.invocation_context import InvocationContext
from invokeai.backend.model_manager.load.load_base import LoadedModel
from invokeai.backend.stable_diffusion.diffusers_pipeline import image_resized_to_grid_as_tensor
@invocation(
"sd3_i2l",
title="SD3 Image to Latents",
tags=["image", "latents", "vae", "i2l", "sd3"],
category="image",
version="1.0.0",
classification=Classification.Prototype,
)
class SD3ImageToLatentsInvocation(BaseInvocation, WithMetadata, WithBoard):
"""Generates latents from an image."""
image: ImageField = InputField(description="The image to encode")
vae: VAEField = InputField(description=FieldDescriptions.vae, input=Input.Connection)
@staticmethod
def vae_encode(vae_info: LoadedModel, image_tensor: torch.Tensor) -> torch.Tensor:
with vae_info as vae:
assert isinstance(vae, AutoencoderKL)
vae.disable_tiling()
image_tensor = image_tensor.to(device=vae.device, dtype=vae.dtype)
with torch.inference_mode():
image_tensor_dist = vae.encode(image_tensor).latent_dist
# TODO: Use seed to make sampling reproducible.
latents: torch.Tensor = image_tensor_dist.sample().to(dtype=vae.dtype)
latents = vae.config.scaling_factor * latents
return latents
@torch.no_grad()
def invoke(self, context: InvocationContext) -> LatentsOutput:
image = context.images.get_pil(self.image.image_name)
image_tensor = image_resized_to_grid_as_tensor(image.convert("RGB"))
if image_tensor.dim() == 3:
image_tensor = einops.rearrange(image_tensor, "c h w -> 1 c h w")
vae_info = context.models.load(self.vae.vae)
latents = self.vae_encode(vae_info=vae_info, image_tensor=image_tensor)
latents = latents.to("cpu")
name = context.tensors.save(tensor=latents)
return LatentsOutput.build(latents_name=name, latents=latents, seed=None)

View File

@@ -47,6 +47,7 @@ class SD3LatentsToImageInvocation(BaseInvocation, WithMetadata, WithBoard):
vae_info = context.models.load(self.vae.vae)
assert isinstance(vae_info.model, (AutoencoderKL))
with SeamlessExt.static_patch_model(vae_info.model, self.vae.seamless_axes), vae_info as vae:
context.util.signal_progress("Running VAE")
assert isinstance(vae, (AutoencoderKL))
latents = latents.to(vae.device)

View File

@@ -95,6 +95,7 @@ class Sd3TextEncoderInvocation(BaseInvocation):
t5_text_encoder_info as t5_text_encoder,
t5_tokenizer_info as t5_tokenizer,
):
context.util.signal_progress("Running T5 encoder")
assert isinstance(t5_text_encoder, T5EncoderModel)
assert isinstance(t5_tokenizer, (T5Tokenizer, T5TokenizerFast))
@@ -137,6 +138,7 @@ class Sd3TextEncoderInvocation(BaseInvocation):
clip_tokenizer_info as clip_tokenizer,
ExitStack() as exit_stack,
):
context.util.signal_progress("Running CLIP encoder")
assert isinstance(clip_text_encoder, (CLIPTextModel, CLIPTextModelWithProjection))
assert isinstance(clip_tokenizer, CLIPTokenizer)

View File

@@ -86,7 +86,7 @@ class ModelLoadService(ModelLoadServiceBase):
def torch_load_file(checkpoint: Path) -> AnyModel:
scan_result = scan_file_path(checkpoint)
if scan_result.infected_files != 0:
if scan_result.infected_files != 0 or scan_result.scan_err:
raise Exception("The model at {checkpoint} is potentially infected by malware. Aborting load.")
result = torch_load(checkpoint, map_location="cpu")
return result

View File

@@ -378,6 +378,9 @@ class DefaultSessionProcessor(SessionProcessorBase):
self._poll_now()
async def _on_queue_item_status_changed(self, event: FastAPIEvent[QueueItemStatusChangedEvent]) -> None:
# Make sure the cancel event is for the currently processing queue item
if self._queue_item and self._queue_item.item_id is not event[1].item_id:
return
if self._queue_item and event[1].status in ["completed", "failed", "canceled"]:
# When the queue item is canceled via HTTP, the queue item status is set to `"canceled"` and this event is
# emitted. We need to respond to this event and stop graph execution. This is done by setting the cancel

View File

@@ -16,6 +16,7 @@ from pydantic import (
from pydantic_core import to_jsonable_python
from invokeai.app.invocations.baseinvocation import BaseInvocation
from invokeai.app.invocations.fields import ImageField
from invokeai.app.services.shared.graph import Graph, GraphExecutionState, NodeNotFoundError
from invokeai.app.services.workflow_records.workflow_records_common import (
WorkflowWithoutID,
@@ -51,11 +52,7 @@ class SessionQueueItemNotFoundError(ValueError):
# region Batch
BatchDataType = Union[
StrictStr,
float,
int,
]
BatchDataType = Union[StrictStr, float, int, ImageField]
class NodeFieldValue(BaseModel):

View File

@@ -160,6 +160,10 @@ class LoggerInterface(InvocationContextInterface):
class ImagesInterface(InvocationContextInterface):
def __init__(self, services: InvocationServices, data: InvocationContextData, util: "UtilInterface") -> None:
super().__init__(services, data)
self._util = util
def save(
self,
image: Image,
@@ -186,6 +190,8 @@ class ImagesInterface(InvocationContextInterface):
The saved image DTO.
"""
self._util.signal_progress("Saving image")
# If `metadata` is provided directly, use that. Else, use the metadata provided by `WithMetadata`, falling back to None.
metadata_ = None
if metadata:
@@ -336,6 +342,10 @@ class ConditioningInterface(InvocationContextInterface):
class ModelsInterface(InvocationContextInterface):
"""Common API for loading, downloading and managing models."""
def __init__(self, services: InvocationServices, data: InvocationContextData, util: "UtilInterface") -> None:
super().__init__(services, data)
self._util = util
def exists(self, identifier: Union[str, "ModelIdentifierField"]) -> bool:
"""Check if a model exists.
@@ -368,11 +378,15 @@ class ModelsInterface(InvocationContextInterface):
if isinstance(identifier, str):
model = self._services.model_manager.store.get_model(identifier)
return self._services.model_manager.load.load_model(model, submodel_type)
else:
_submodel_type = submodel_type or identifier.submodel_type
submodel_type = submodel_type or identifier.submodel_type
model = self._services.model_manager.store.get_model(identifier.key)
return self._services.model_manager.load.load_model(model, _submodel_type)
message = f"Loading model {model.name}"
if submodel_type:
message += f" ({submodel_type.value})"
self._util.signal_progress(message)
return self._services.model_manager.load.load_model(model, submodel_type)
def load_by_attrs(
self, name: str, base: BaseModelType, type: ModelType, submodel_type: Optional[SubModelType] = None
@@ -397,6 +411,10 @@ class ModelsInterface(InvocationContextInterface):
if len(configs) > 1:
raise ValueError(f"More than one model found with name {name}, base {base}, and type {type}")
message = f"Loading model {name}"
if submodel_type:
message += f" ({submodel_type.value})"
self._util.signal_progress(message)
return self._services.model_manager.load.load_model(configs[0], submodel_type)
def get_config(self, identifier: Union[str, "ModelIdentifierField"]) -> AnyModelConfig:
@@ -467,6 +485,7 @@ class ModelsInterface(InvocationContextInterface):
Returns:
Path to the downloaded model
"""
self._util.signal_progress(f"Downloading model {source}")
return self._services.model_manager.install.download_and_cache_model(source=source)
def load_local_model(
@@ -489,6 +508,8 @@ class ModelsInterface(InvocationContextInterface):
Returns:
A LoadedModelWithoutConfig object.
"""
self._util.signal_progress(f"Loading model {model_path.name}")
return self._services.model_manager.load.load_model_from_path(model_path=model_path, loader=loader)
def load_remote_model(
@@ -514,6 +535,8 @@ class ModelsInterface(InvocationContextInterface):
A LoadedModelWithoutConfig object.
"""
model_path = self._services.model_manager.install.download_and_cache_model(source=str(source))
self._util.signal_progress(f"Loading model {source}")
return self._services.model_manager.load.load_model_from_path(model_path=model_path, loader=loader)
@@ -707,12 +730,12 @@ def build_invocation_context(
"""
logger = LoggerInterface(services=services, data=data)
images = ImagesInterface(services=services, data=data)
tensors = TensorsInterface(services=services, data=data)
models = ModelsInterface(services=services, data=data)
config = ConfigInterface(services=services, data=data)
util = UtilInterface(services=services, data=data, is_canceled=is_canceled)
conditioning = ConditioningInterface(services=services, data=data)
models = ModelsInterface(services=services, data=data, util=util)
images = ImagesInterface(services=services, data=data, util=util)
boards = BoardsInterface(services=services, data=data)
ctx = InvocationContext(

View File

@@ -0,0 +1,42 @@
{
"_class_name": "ControlNetModel",
"_diffusers_version": "0.16.0.dev0",
"_name_or_path": "controlnet_v1_1/control_v11p_sd15_canny",
"act_fn": "silu",
"attention_head_dim": 8,
"block_out_channels": [
320,
640,
1280,
1280
],
"class_embed_type": null,
"conditioning_embedding_out_channels": [
16,
32,
96,
256
],
"controlnet_conditioning_channel_order": "rgb",
"cross_attention_dim": 768,
"down_block_types": [
"CrossAttnDownBlock2D",
"CrossAttnDownBlock2D",
"CrossAttnDownBlock2D",
"DownBlock2D"
],
"downsample_padding": 1,
"flip_sin_to_cos": true,
"freq_shift": 0,
"in_channels": 4,
"layers_per_block": 2,
"mid_block_scale_factor": 1,
"norm_eps": 1e-05,
"norm_num_groups": 32,
"num_class_embeds": null,
"only_cross_attention": false,
"projection_class_embeddings_input_dim": null,
"resnet_time_scale_shift": "default",
"upcast_attention": false,
"use_linear_projection": false
}

View File

@@ -0,0 +1,56 @@
{
"_class_name": "ControlNetModel",
"_diffusers_version": "0.19.3",
"act_fn": "silu",
"addition_embed_type": "text_time",
"addition_embed_type_num_heads": 64,
"addition_time_embed_dim": 256,
"attention_head_dim": [
5,
10,
20
],
"block_out_channels": [
320,
640,
1280
],
"class_embed_type": null,
"conditioning_channels": 3,
"conditioning_embedding_out_channels": [
16,
32,
96,
256
],
"controlnet_conditioning_channel_order": "rgb",
"cross_attention_dim": 2048,
"down_block_types": [
"DownBlock2D",
"CrossAttnDownBlock2D",
"CrossAttnDownBlock2D"
],
"downsample_padding": 1,
"encoder_hid_dim": null,
"encoder_hid_dim_type": null,
"flip_sin_to_cos": true,
"freq_shift": 0,
"global_pool_conditions": false,
"in_channels": 4,
"layers_per_block": 2,
"mid_block_scale_factor": 1,
"norm_eps": 1e-05,
"norm_num_groups": 32,
"num_attention_heads": null,
"num_class_embeds": null,
"only_cross_attention": false,
"projection_class_embeddings_input_dim": 2816,
"resnet_time_scale_shift": "default",
"transformer_layers_per_block": [
1,
2,
10
],
"upcast_attention": null,
"use_linear_projection": true
}

View File

@@ -0,0 +1,20 @@
{
"crop_size": 224,
"do_center_crop": true,
"do_convert_rgb": true,
"do_normalize": true,
"do_resize": true,
"feature_extractor_type": "CLIPFeatureExtractor",
"image_mean": [
0.48145466,
0.4578275,
0.40821073
],
"image_std": [
0.26862954,
0.26130258,
0.27577711
],
"resample": 3,
"size": 224
}

View File

@@ -0,0 +1,32 @@
{
"_class_name": "StableDiffusionPipeline",
"_diffusers_version": "0.6.0",
"feature_extractor": [
"transformers",
"CLIPImageProcessor"
],
"safety_checker": [
"stable_diffusion",
"StableDiffusionSafetyChecker"
],
"scheduler": [
"diffusers",
"PNDMScheduler"
],
"text_encoder": [
"transformers",
"CLIPTextModel"
],
"tokenizer": [
"transformers",
"CLIPTokenizer"
],
"unet": [
"diffusers",
"UNet2DConditionModel"
],
"vae": [
"diffusers",
"AutoencoderKL"
]
}

View File

@@ -0,0 +1,175 @@
{
"_commit_hash": "4bb648a606ef040e7685bde262611766a5fdd67b",
"_name_or_path": "CompVis/stable-diffusion-safety-checker",
"architectures": [
"StableDiffusionSafetyChecker"
],
"initializer_factor": 1.0,
"logit_scale_init_value": 2.6592,
"model_type": "clip",
"projection_dim": 768,
"text_config": {
"_name_or_path": "",
"add_cross_attention": false,
"architectures": null,
"attention_dropout": 0.0,
"bad_words_ids": null,
"bos_token_id": 0,
"chunk_size_feed_forward": 0,
"cross_attention_hidden_size": null,
"decoder_start_token_id": null,
"diversity_penalty": 0.0,
"do_sample": false,
"dropout": 0.0,
"early_stopping": false,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": 2,
"exponential_decay_length_penalty": null,
"finetuning_task": null,
"forced_bos_token_id": null,
"forced_eos_token_id": null,
"hidden_act": "quick_gelu",
"hidden_size": 768,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"initializer_factor": 1.0,
"initializer_range": 0.02,
"intermediate_size": 3072,
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"layer_norm_eps": 1e-05,
"length_penalty": 1.0,
"max_length": 20,
"max_position_embeddings": 77,
"min_length": 0,
"model_type": "clip_text_model",
"no_repeat_ngram_size": 0,
"num_attention_heads": 12,
"num_beam_groups": 1,
"num_beams": 1,
"num_hidden_layers": 12,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_scores": false,
"pad_token_id": 1,
"prefix": null,
"problem_type": null,
"pruned_heads": {},
"remove_invalid_values": false,
"repetition_penalty": 1.0,
"return_dict": true,
"return_dict_in_generate": false,
"sep_token_id": null,
"task_specific_params": null,
"temperature": 1.0,
"tf_legacy_loss": false,
"tie_encoder_decoder": false,
"tie_word_embeddings": true,
"tokenizer_class": null,
"top_k": 50,
"top_p": 1.0,
"torch_dtype": null,
"torchscript": false,
"transformers_version": "4.22.0.dev0",
"typical_p": 1.0,
"use_bfloat16": false,
"vocab_size": 49408
},
"text_config_dict": {
"hidden_size": 768,
"intermediate_size": 3072,
"num_attention_heads": 12,
"num_hidden_layers": 12
},
"torch_dtype": "float32",
"transformers_version": null,
"vision_config": {
"_name_or_path": "",
"add_cross_attention": false,
"architectures": null,
"attention_dropout": 0.0,
"bad_words_ids": null,
"bos_token_id": null,
"chunk_size_feed_forward": 0,
"cross_attention_hidden_size": null,
"decoder_start_token_id": null,
"diversity_penalty": 0.0,
"do_sample": false,
"dropout": 0.0,
"early_stopping": false,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": null,
"exponential_decay_length_penalty": null,
"finetuning_task": null,
"forced_bos_token_id": null,
"forced_eos_token_id": null,
"hidden_act": "quick_gelu",
"hidden_size": 1024,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"image_size": 224,
"initializer_factor": 1.0,
"initializer_range": 0.02,
"intermediate_size": 4096,
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"layer_norm_eps": 1e-05,
"length_penalty": 1.0,
"max_length": 20,
"min_length": 0,
"model_type": "clip_vision_model",
"no_repeat_ngram_size": 0,
"num_attention_heads": 16,
"num_beam_groups": 1,
"num_beams": 1,
"num_channels": 3,
"num_hidden_layers": 24,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_scores": false,
"pad_token_id": null,
"patch_size": 14,
"prefix": null,
"problem_type": null,
"pruned_heads": {},
"remove_invalid_values": false,
"repetition_penalty": 1.0,
"return_dict": true,
"return_dict_in_generate": false,
"sep_token_id": null,
"task_specific_params": null,
"temperature": 1.0,
"tf_legacy_loss": false,
"tie_encoder_decoder": false,
"tie_word_embeddings": true,
"tokenizer_class": null,
"top_k": 50,
"top_p": 1.0,
"torch_dtype": null,
"torchscript": false,
"transformers_version": "4.22.0.dev0",
"typical_p": 1.0,
"use_bfloat16": false
},
"vision_config_dict": {
"hidden_size": 1024,
"intermediate_size": 4096,
"num_attention_heads": 16,
"num_hidden_layers": 24,
"patch_size": 14
}
}

View File

@@ -0,0 +1,13 @@
{
"_class_name": "PNDMScheduler",
"_diffusers_version": "0.6.0",
"beta_end": 0.012,
"beta_schedule": "scaled_linear",
"beta_start": 0.00085,
"num_train_timesteps": 1000,
"set_alpha_to_one": false,
"skip_prk_steps": true,
"steps_offset": 1,
"trained_betas": null,
"clip_sample": false
}

View File

@@ -0,0 +1,25 @@
{
"_name_or_path": "openai/clip-vit-large-patch14",
"architectures": [
"CLIPTextModel"
],
"attention_dropout": 0.0,
"bos_token_id": 0,
"dropout": 0.0,
"eos_token_id": 2,
"hidden_act": "quick_gelu",
"hidden_size": 768,
"initializer_factor": 1.0,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-05,
"max_position_embeddings": 77,
"model_type": "clip_text_model",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 1,
"projection_dim": 768,
"torch_dtype": "float32",
"transformers_version": "4.22.0.dev0",
"vocab_size": 49408
}

View File

@@ -0,0 +1,24 @@
{
"bos_token": {
"content": "<|startoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"pad_token": "<|endoftext|>",
"unk_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
}
}

View File

@@ -0,0 +1,34 @@
{
"add_prefix_space": false,
"bos_token": {
"__type": "AddedToken",
"content": "<|startoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"do_lower_case": true,
"eos_token": {
"__type": "AddedToken",
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"errors": "replace",
"model_max_length": 77,
"name_or_path": "openai/clip-vit-large-patch14",
"pad_token": "<|endoftext|>",
"special_tokens_map_file": "./special_tokens_map.json",
"tokenizer_class": "CLIPTokenizer",
"unk_token": {
"__type": "AddedToken",
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
}
}

View File

@@ -0,0 +1,36 @@
{
"_class_name": "UNet2DConditionModel",
"_diffusers_version": "0.6.0",
"act_fn": "silu",
"attention_head_dim": 8,
"block_out_channels": [
320,
640,
1280,
1280
],
"center_input_sample": false,
"cross_attention_dim": 768,
"down_block_types": [
"CrossAttnDownBlock2D",
"CrossAttnDownBlock2D",
"CrossAttnDownBlock2D",
"DownBlock2D"
],
"downsample_padding": 1,
"flip_sin_to_cos": true,
"freq_shift": 0,
"in_channels": 4,
"layers_per_block": 2,
"mid_block_scale_factor": 1,
"norm_eps": 1e-05,
"norm_num_groups": 32,
"out_channels": 4,
"sample_size": 64,
"up_block_types": [
"UpBlock2D",
"CrossAttnUpBlock2D",
"CrossAttnUpBlock2D",
"CrossAttnUpBlock2D"
]
}

View File

@@ -0,0 +1,29 @@
{
"_class_name": "AutoencoderKL",
"_diffusers_version": "0.6.0",
"act_fn": "silu",
"block_out_channels": [
128,
256,
512,
512
],
"down_block_types": [
"DownEncoderBlock2D",
"DownEncoderBlock2D",
"DownEncoderBlock2D",
"DownEncoderBlock2D"
],
"in_channels": 3,
"latent_channels": 4,
"layers_per_block": 2,
"norm_num_groups": 32,
"out_channels": 3,
"sample_size": 512,
"up_block_types": [
"UpDecoderBlock2D",
"UpDecoderBlock2D",
"UpDecoderBlock2D",
"UpDecoderBlock2D"
]
}

View File

@@ -0,0 +1,28 @@
{
"crop_size": {
"height": 224,
"width": 224
},
"do_center_crop": true,
"do_convert_rgb": true,
"do_normalize": true,
"do_rescale": true,
"do_resize": true,
"feature_extractor_type": "CLIPFeatureExtractor",
"image_mean": [
0.48145466,
0.4578275,
0.40821073
],
"image_processor_type": "CLIPFeatureExtractor",
"image_std": [
0.26862954,
0.26130258,
0.27577711
],
"resample": 3,
"rescale_factor": 0.00392156862745098,
"size": {
"shortest_edge": 224
}
}

View File

@@ -0,0 +1,33 @@
{
"_class_name": "StableDiffusionPipeline",
"_diffusers_version": "0.18.0.dev0",
"feature_extractor": [
"transformers",
"CLIPFeatureExtractor"
],
"requires_safety_checker": true,
"safety_checker": [
"stable_diffusion",
"StableDiffusionSafetyChecker"
],
"scheduler": [
"diffusers",
"DPMSolverMultistepScheduler"
],
"text_encoder": [
"transformers",
"CLIPTextModel"
],
"tokenizer": [
"transformers",
"CLIPTokenizer"
],
"unet": [
"diffusers",
"UNet2DConditionModel"
],
"vae": [
"diffusers",
"AutoencoderKL"
]
}

View File

@@ -0,0 +1,168 @@
{
"_commit_hash": "cb41f3a270d63d454d385fc2e4f571c487c253c5",
"_name_or_path": "CompVis/stable-diffusion-safety-checker",
"architectures": [
"StableDiffusionSafetyChecker"
],
"initializer_factor": 1.0,
"logit_scale_init_value": 2.6592,
"model_type": "clip",
"projection_dim": 768,
"text_config": {
"_name_or_path": "",
"add_cross_attention": false,
"architectures": null,
"attention_dropout": 0.0,
"bad_words_ids": null,
"begin_suppress_tokens": null,
"bos_token_id": 0,
"chunk_size_feed_forward": 0,
"cross_attention_hidden_size": null,
"decoder_start_token_id": null,
"diversity_penalty": 0.0,
"do_sample": false,
"dropout": 0.0,
"early_stopping": false,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": 2,
"exponential_decay_length_penalty": null,
"finetuning_task": null,
"forced_bos_token_id": null,
"forced_eos_token_id": null,
"hidden_act": "quick_gelu",
"hidden_size": 768,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"initializer_factor": 1.0,
"initializer_range": 0.02,
"intermediate_size": 3072,
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"layer_norm_eps": 1e-05,
"length_penalty": 1.0,
"max_length": 20,
"max_position_embeddings": 77,
"min_length": 0,
"model_type": "clip_text_model",
"no_repeat_ngram_size": 0,
"num_attention_heads": 12,
"num_beam_groups": 1,
"num_beams": 1,
"num_hidden_layers": 12,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_scores": false,
"pad_token_id": 1,
"prefix": null,
"problem_type": null,
"projection_dim": 512,
"pruned_heads": {},
"remove_invalid_values": false,
"repetition_penalty": 1.0,
"return_dict": true,
"return_dict_in_generate": false,
"sep_token_id": null,
"suppress_tokens": null,
"task_specific_params": null,
"temperature": 1.0,
"tf_legacy_loss": false,
"tie_encoder_decoder": false,
"tie_word_embeddings": true,
"tokenizer_class": null,
"top_k": 50,
"top_p": 1.0,
"torch_dtype": null,
"torchscript": false,
"transformers_version": "4.30.2",
"typical_p": 1.0,
"use_bfloat16": false,
"vocab_size": 49408
},
"torch_dtype": "float16",
"transformers_version": null,
"vision_config": {
"_name_or_path": "",
"add_cross_attention": false,
"architectures": null,
"attention_dropout": 0.0,
"bad_words_ids": null,
"begin_suppress_tokens": null,
"bos_token_id": null,
"chunk_size_feed_forward": 0,
"cross_attention_hidden_size": null,
"decoder_start_token_id": null,
"diversity_penalty": 0.0,
"do_sample": false,
"dropout": 0.0,
"early_stopping": false,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": null,
"exponential_decay_length_penalty": null,
"finetuning_task": null,
"forced_bos_token_id": null,
"forced_eos_token_id": null,
"hidden_act": "quick_gelu",
"hidden_size": 1024,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"image_size": 224,
"initializer_factor": 1.0,
"initializer_range": 0.02,
"intermediate_size": 4096,
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"layer_norm_eps": 1e-05,
"length_penalty": 1.0,
"max_length": 20,
"min_length": 0,
"model_type": "clip_vision_model",
"no_repeat_ngram_size": 0,
"num_attention_heads": 16,
"num_beam_groups": 1,
"num_beams": 1,
"num_channels": 3,
"num_hidden_layers": 24,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_scores": false,
"pad_token_id": null,
"patch_size": 14,
"prefix": null,
"problem_type": null,
"projection_dim": 512,
"pruned_heads": {},
"remove_invalid_values": false,
"repetition_penalty": 1.0,
"return_dict": true,
"return_dict_in_generate": false,
"sep_token_id": null,
"suppress_tokens": null,
"task_specific_params": null,
"temperature": 1.0,
"tf_legacy_loss": false,
"tie_encoder_decoder": false,
"tie_word_embeddings": true,
"tokenizer_class": null,
"top_k": 50,
"top_p": 1.0,
"torch_dtype": null,
"torchscript": false,
"transformers_version": "4.30.2",
"typical_p": 1.0,
"use_bfloat16": false
}
}

View File

@@ -0,0 +1,26 @@
{
"_class_name": "DPMSolverMultistepScheduler",
"_diffusers_version": "0.18.0.dev0",
"algorithm_type": "dpmsolver++",
"beta_end": 0.012,
"beta_schedule": "scaled_linear",
"beta_start": 0.00085,
"clip_sample": false,
"clip_sample_range": 1.0,
"dynamic_thresholding_ratio": 0.995,
"lambda_min_clipped": -Infinity,
"lower_order_final": true,
"num_train_timesteps": 1000,
"prediction_type": "v_prediction",
"rescale_betas_zero_snr": false,
"sample_max_value": 1.0,
"set_alpha_to_one": false,
"solver_order": 2,
"solver_type": "midpoint",
"steps_offset": 1,
"thresholding": false,
"timestep_spacing": "leading",
"trained_betas": null,
"use_karras_sigmas": false,
"variance_type": null
}

View File

@@ -0,0 +1,25 @@
{
"_name_or_path": "openai/clip-vit-large-patch14",
"architectures": [
"CLIPTextModel"
],
"attention_dropout": 0.0,
"bos_token_id": 0,
"dropout": 0.0,
"eos_token_id": 2,
"hidden_act": "quick_gelu",
"hidden_size": 768,
"initializer_factor": 1.0,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-05,
"max_position_embeddings": 77,
"model_type": "clip_text_model",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 1,
"projection_dim": 768,
"torch_dtype": "float16",
"transformers_version": "4.30.2",
"vocab_size": 49408
}

View File

@@ -0,0 +1,24 @@
{
"bos_token": {
"content": "<|startoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"pad_token": "<|endoftext|>",
"unk_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
}
}

View File

@@ -0,0 +1,33 @@
{
"add_prefix_space": false,
"bos_token": {
"__type": "AddedToken",
"content": "<|startoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"clean_up_tokenization_spaces": true,
"do_lower_case": true,
"eos_token": {
"__type": "AddedToken",
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"errors": "replace",
"model_max_length": 77,
"pad_token": "<|endoftext|>",
"tokenizer_class": "CLIPTokenizer",
"unk_token": {
"__type": "AddedToken",
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
}
}

View File

@@ -0,0 +1,62 @@
{
"_class_name": "UNet2DConditionModel",
"_diffusers_version": "0.18.0.dev0",
"act_fn": "silu",
"addition_embed_type": null,
"addition_embed_type_num_heads": 64,
"attention_head_dim": 8,
"block_out_channels": [
320,
640,
1280,
1280
],
"center_input_sample": false,
"class_embed_type": null,
"class_embeddings_concat": false,
"conv_in_kernel": 3,
"conv_out_kernel": 3,
"cross_attention_dim": 768,
"cross_attention_norm": null,
"down_block_types": [
"CrossAttnDownBlock2D",
"CrossAttnDownBlock2D",
"CrossAttnDownBlock2D",
"DownBlock2D"
],
"downsample_padding": 1,
"dual_cross_attention": false,
"encoder_hid_dim": null,
"encoder_hid_dim_type": null,
"flip_sin_to_cos": true,
"freq_shift": 0,
"in_channels": 4,
"layers_per_block": 2,
"mid_block_only_cross_attention": null,
"mid_block_scale_factor": 1,
"mid_block_type": "UNetMidBlock2DCrossAttn",
"norm_eps": 1e-05,
"norm_num_groups": 32,
"num_attention_heads": null,
"num_class_embeds": null,
"only_cross_attention": false,
"out_channels": 4,
"projection_class_embeddings_input_dim": null,
"resnet_out_scale_factor": 1.0,
"resnet_skip_time_act": false,
"resnet_time_scale_shift": "default",
"sample_size": 96,
"time_cond_proj_dim": null,
"time_embedding_act_fn": null,
"time_embedding_dim": null,
"time_embedding_type": "positional",
"timestep_post_act": null,
"up_block_types": [
"UpBlock2D",
"CrossAttnUpBlock2D",
"CrossAttnUpBlock2D",
"CrossAttnUpBlock2D"
],
"upcast_attention": null,
"use_linear_projection": false
}

View File

@@ -0,0 +1,30 @@
{
"_class_name": "AutoencoderKL",
"_diffusers_version": "0.18.0.dev0",
"act_fn": "silu",
"block_out_channels": [
128,
256,
512,
512
],
"down_block_types": [
"DownEncoderBlock2D",
"DownEncoderBlock2D",
"DownEncoderBlock2D",
"DownEncoderBlock2D"
],
"in_channels": 3,
"latent_channels": 4,
"layers_per_block": 2,
"norm_num_groups": 32,
"out_channels": 3,
"sample_size": 768,
"scaling_factor": 0.18215,
"up_block_types": [
"UpDecoderBlock2D",
"UpDecoderBlock2D",
"UpDecoderBlock2D",
"UpDecoderBlock2D"
]
}

View File

@@ -0,0 +1,20 @@
{
"crop_size": 224,
"do_center_crop": true,
"do_convert_rgb": true,
"do_normalize": true,
"do_resize": true,
"feature_extractor_type": "CLIPFeatureExtractor",
"image_mean": [
0.48145466,
0.4578275,
0.40821073
],
"image_std": [
0.26862954,
0.26130258,
0.27577711
],
"resample": 3,
"size": 224
}

View File

@@ -0,0 +1,33 @@
{
"_class_name": "StableDiffusionPipeline",
"_diffusers_version": "0.8.0",
"feature_extractor": [
"transformers",
"CLIPImageProcessor"
],
"requires_safety_checker": false,
"safety_checker": [
null,
null
],
"scheduler": [
"diffusers",
"DDIMScheduler"
],
"text_encoder": [
"transformers",
"CLIPTextModel"
],
"tokenizer": [
"transformers",
"CLIPTokenizer"
],
"unet": [
"diffusers",
"UNet2DConditionModel"
],
"vae": [
"diffusers",
"AutoencoderKL"
]
}

View File

@@ -0,0 +1,14 @@
{
"_class_name": "DDIMScheduler",
"_diffusers_version": "0.8.0",
"beta_end": 0.012,
"beta_schedule": "scaled_linear",
"beta_start": 0.00085,
"clip_sample": false,
"num_train_timesteps": 1000,
"prediction_type": "v_prediction",
"set_alpha_to_one": false,
"skip_prk_steps": true,
"steps_offset": 1,
"trained_betas": null
}

View File

@@ -0,0 +1,25 @@
{
"_name_or_path": "hf-models/stable-diffusion-v2-768x768/text_encoder",
"architectures": [
"CLIPTextModel"
],
"attention_dropout": 0.0,
"bos_token_id": 0,
"dropout": 0.0,
"eos_token_id": 2,
"hidden_act": "gelu",
"hidden_size": 1024,
"initializer_factor": 1.0,
"initializer_range": 0.02,
"intermediate_size": 4096,
"layer_norm_eps": 1e-05,
"max_position_embeddings": 77,
"model_type": "clip_text_model",
"num_attention_heads": 16,
"num_hidden_layers": 23,
"pad_token_id": 1,
"projection_dim": 512,
"torch_dtype": "float32",
"transformers_version": "4.25.0.dev0",
"vocab_size": 49408
}

View File

@@ -0,0 +1,24 @@
{
"bos_token": {
"content": "<|startoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"pad_token": "!",
"unk_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
}
}

View File

@@ -0,0 +1,34 @@
{
"add_prefix_space": false,
"bos_token": {
"__type": "AddedToken",
"content": "<|startoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"do_lower_case": true,
"eos_token": {
"__type": "AddedToken",
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"errors": "replace",
"model_max_length": 77,
"name_or_path": "hf-models/stable-diffusion-v2-768x768/tokenizer",
"pad_token": "<|endoftext|>",
"special_tokens_map_file": "./special_tokens_map.json",
"tokenizer_class": "CLIPTokenizer",
"unk_token": {
"__type": "AddedToken",
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
}
}

View File

@@ -0,0 +1,46 @@
{
"_class_name": "UNet2DConditionModel",
"_diffusers_version": "0.10.0.dev0",
"act_fn": "silu",
"attention_head_dim": [
5,
10,
20,
20
],
"block_out_channels": [
320,
640,
1280,
1280
],
"center_input_sample": false,
"cross_attention_dim": 1024,
"down_block_types": [
"CrossAttnDownBlock2D",
"CrossAttnDownBlock2D",
"CrossAttnDownBlock2D",
"DownBlock2D"
],
"downsample_padding": 1,
"dual_cross_attention": false,
"flip_sin_to_cos": true,
"freq_shift": 0,
"in_channels": 4,
"layers_per_block": 2,
"mid_block_scale_factor": 1,
"norm_eps": 1e-05,
"norm_num_groups": 32,
"num_class_embeds": null,
"only_cross_attention": false,
"out_channels": 4,
"sample_size": 96,
"up_block_types": [
"UpBlock2D",
"CrossAttnUpBlock2D",
"CrossAttnUpBlock2D",
"CrossAttnUpBlock2D"
],
"use_linear_projection": true,
"upcast_attention": true
}

View File

@@ -0,0 +1,30 @@
{
"_class_name": "AutoencoderKL",
"_diffusers_version": "0.8.0",
"_name_or_path": "hf-models/stable-diffusion-v2-768x768/vae",
"act_fn": "silu",
"block_out_channels": [
128,
256,
512,
512
],
"down_block_types": [
"DownEncoderBlock2D",
"DownEncoderBlock2D",
"DownEncoderBlock2D",
"DownEncoderBlock2D"
],
"in_channels": 3,
"latent_channels": 4,
"layers_per_block": 2,
"norm_num_groups": 32,
"out_channels": 3,
"sample_size": 768,
"up_block_types": [
"UpDecoderBlock2D",
"UpDecoderBlock2D",
"UpDecoderBlock2D",
"UpDecoderBlock2D"
]
}

View File

@@ -0,0 +1,34 @@
{
"_class_name": "StableDiffusionXLPipeline",
"_diffusers_version": "0.19.0.dev0",
"force_zeros_for_empty_prompt": true,
"add_watermarker": null,
"scheduler": [
"diffusers",
"EulerDiscreteScheduler"
],
"text_encoder": [
"transformers",
"CLIPTextModel"
],
"text_encoder_2": [
"transformers",
"CLIPTextModelWithProjection"
],
"tokenizer": [
"transformers",
"CLIPTokenizer"
],
"tokenizer_2": [
"transformers",
"CLIPTokenizer"
],
"unet": [
"diffusers",
"UNet2DConditionModel"
],
"vae": [
"diffusers",
"AutoencoderKL"
]
}

View File

@@ -0,0 +1,18 @@
{
"_class_name": "EulerDiscreteScheduler",
"_diffusers_version": "0.19.0.dev0",
"beta_end": 0.012,
"beta_schedule": "scaled_linear",
"beta_start": 0.00085,
"clip_sample": false,
"interpolation_type": "linear",
"num_train_timesteps": 1000,
"prediction_type": "epsilon",
"sample_max_value": 1.0,
"set_alpha_to_one": false,
"skip_prk_steps": true,
"steps_offset": 1,
"timestep_spacing": "leading",
"trained_betas": null,
"use_karras_sigmas": false
}

View File

@@ -0,0 +1,24 @@
{
"architectures": [
"CLIPTextModel"
],
"attention_dropout": 0.0,
"bos_token_id": 0,
"dropout": 0.0,
"eos_token_id": 2,
"hidden_act": "quick_gelu",
"hidden_size": 768,
"initializer_factor": 1.0,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-05,
"max_position_embeddings": 77,
"model_type": "clip_text_model",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 1,
"projection_dim": 768,
"torch_dtype": "float16",
"transformers_version": "4.32.0.dev0",
"vocab_size": 49408
}

View File

@@ -0,0 +1,24 @@
{
"architectures": [
"CLIPTextModelWithProjection"
],
"attention_dropout": 0.0,
"bos_token_id": 0,
"dropout": 0.0,
"eos_token_id": 2,
"hidden_act": "gelu",
"hidden_size": 1280,
"initializer_factor": 1.0,
"initializer_range": 0.02,
"intermediate_size": 5120,
"layer_norm_eps": 1e-05,
"max_position_embeddings": 77,
"model_type": "clip_text_model",
"num_attention_heads": 20,
"num_hidden_layers": 32,
"pad_token_id": 1,
"projection_dim": 1280,
"torch_dtype": "float16",
"transformers_version": "4.32.0.dev0",
"vocab_size": 49408
}

View File

@@ -0,0 +1,24 @@
{
"bos_token": {
"content": "<|startoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"pad_token": "<|endoftext|>",
"unk_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
}
}

View File

@@ -0,0 +1,33 @@
{
"add_prefix_space": false,
"bos_token": {
"__type": "AddedToken",
"content": "<|startoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"clean_up_tokenization_spaces": true,
"do_lower_case": true,
"eos_token": {
"__type": "AddedToken",
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"errors": "replace",
"model_max_length": 77,
"pad_token": "<|endoftext|>",
"tokenizer_class": "CLIPTokenizer",
"unk_token": {
"__type": "AddedToken",
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
}
}

View File

@@ -0,0 +1,24 @@
{
"bos_token": {
"content": "<|startoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"pad_token": "!",
"unk_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
}
}

View File

@@ -0,0 +1,33 @@
{
"add_prefix_space": false,
"bos_token": {
"__type": "AddedToken",
"content": "<|startoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"clean_up_tokenization_spaces": true,
"do_lower_case": true,
"eos_token": {
"__type": "AddedToken",
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"errors": "replace",
"model_max_length": 77,
"pad_token": "!",
"tokenizer_class": "CLIPTokenizer",
"unk_token": {
"__type": "AddedToken",
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
}
}

View File

@@ -0,0 +1,69 @@
{
"_class_name": "UNet2DConditionModel",
"_diffusers_version": "0.19.0.dev0",
"act_fn": "silu",
"addition_embed_type": "text_time",
"addition_embed_type_num_heads": 64,
"addition_time_embed_dim": 256,
"attention_head_dim": [
5,
10,
20
],
"block_out_channels": [
320,
640,
1280
],
"center_input_sample": false,
"class_embed_type": null,
"class_embeddings_concat": false,
"conv_in_kernel": 3,
"conv_out_kernel": 3,
"cross_attention_dim": 2048,
"cross_attention_norm": null,
"down_block_types": [
"DownBlock2D",
"CrossAttnDownBlock2D",
"CrossAttnDownBlock2D"
],
"downsample_padding": 1,
"dual_cross_attention": false,
"encoder_hid_dim": null,
"encoder_hid_dim_type": null,
"flip_sin_to_cos": true,
"freq_shift": 0,
"in_channels": 4,
"layers_per_block": 2,
"mid_block_only_cross_attention": null,
"mid_block_scale_factor": 1,
"mid_block_type": "UNetMidBlock2DCrossAttn",
"norm_eps": 1e-05,
"norm_num_groups": 32,
"num_attention_heads": null,
"num_class_embeds": null,
"only_cross_attention": false,
"out_channels": 4,
"projection_class_embeddings_input_dim": 2816,
"resnet_out_scale_factor": 1.0,
"resnet_skip_time_act": false,
"resnet_time_scale_shift": "default",
"sample_size": 128,
"time_cond_proj_dim": null,
"time_embedding_act_fn": null,
"time_embedding_dim": null,
"time_embedding_type": "positional",
"timestep_post_act": null,
"transformer_layers_per_block": [
1,
2,
10
],
"up_block_types": [
"CrossAttnUpBlock2D",
"CrossAttnUpBlock2D",
"UpBlock2D"
],
"upcast_attention": null,
"use_linear_projection": true
}

View File

@@ -0,0 +1,32 @@
{
"_class_name": "AutoencoderKL",
"_diffusers_version": "0.20.0.dev0",
"_name_or_path": "../sdxl-vae/",
"act_fn": "silu",
"block_out_channels": [
128,
256,
512,
512
],
"down_block_types": [
"DownEncoderBlock2D",
"DownEncoderBlock2D",
"DownEncoderBlock2D",
"DownEncoderBlock2D"
],
"force_upcast": true,
"in_channels": 3,
"latent_channels": 4,
"layers_per_block": 2,
"norm_num_groups": 32,
"out_channels": 3,
"sample_size": 1024,
"scaling_factor": 0.13025,
"up_block_types": [
"UpDecoderBlock2D",
"UpDecoderBlock2D",
"UpDecoderBlock2D",
"UpDecoderBlock2D"
]
}

View File

@@ -0,0 +1,31 @@
{
"_class_name": "AutoencoderKL",
"_diffusers_version": "0.19.0.dev0",
"act_fn": "silu",
"block_out_channels": [
128,
256,
512,
512
],
"down_block_types": [
"DownEncoderBlock2D",
"DownEncoderBlock2D",
"DownEncoderBlock2D",
"DownEncoderBlock2D"
],
"force_upcast": true,
"in_channels": 3,
"latent_channels": 4,
"layers_per_block": 2,
"norm_num_groups": 32,
"out_channels": 3,
"sample_size": 1024,
"scaling_factor": 0.13025,
"up_block_types": [
"UpDecoderBlock2D",
"UpDecoderBlock2D",
"UpDecoderBlock2D",
"UpDecoderBlock2D"
]
}

View File

@@ -0,0 +1,31 @@
{
"_class_name": "AutoencoderKL",
"_diffusers_version": "0.19.0.dev0",
"act_fn": "silu",
"block_out_channels": [
128,
256,
512,
512
],
"down_block_types": [
"DownEncoderBlock2D",
"DownEncoderBlock2D",
"DownEncoderBlock2D",
"DownEncoderBlock2D"
],
"force_upcast": true,
"in_channels": 3,
"latent_channels": 4,
"layers_per_block": 2,
"norm_num_groups": 32,
"out_channels": 3,
"sample_size": 1024,
"scaling_factor": 0.13025,
"up_block_types": [
"UpDecoderBlock2D",
"UpDecoderBlock2D",
"UpDecoderBlock2D",
"UpDecoderBlock2D"
]
}

View File

@@ -0,0 +1,31 @@
{
"_class_name": "AutoencoderKL",
"_diffusers_version": "0.19.0.dev0",
"act_fn": "silu",
"block_out_channels": [
128,
256,
512,
512
],
"down_block_types": [
"DownEncoderBlock2D",
"DownEncoderBlock2D",
"DownEncoderBlock2D",
"DownEncoderBlock2D"
],
"force_upcast": true,
"in_channels": 3,
"latent_channels": 4,
"layers_per_block": 2,
"norm_num_groups": 32,
"out_channels": 3,
"sample_size": 1024,
"scaling_factor": 0.13025,
"up_block_types": [
"UpDecoderBlock2D",
"UpDecoderBlock2D",
"UpDecoderBlock2D",
"UpDecoderBlock2D"
]
}

View File

@@ -0,0 +1,35 @@
{
"_class_name": "StableDiffusionXLImg2ImgPipeline",
"_diffusers_version": "0.19.0.dev0",
"force_zeros_for_empty_prompt": false,
"add_watermarker": null,
"requires_aesthetics_score": true,
"scheduler": [
"diffusers",
"EulerDiscreteScheduler"
],
"text_encoder": [
null,
null
],
"text_encoder_2": [
"transformers",
"CLIPTextModelWithProjection"
],
"tokenizer": [
null,
null
],
"tokenizer_2": [
"transformers",
"CLIPTokenizer"
],
"unet": [
"diffusers",
"UNet2DConditionModel"
],
"vae": [
"diffusers",
"AutoencoderKL"
]
}

View File

@@ -0,0 +1,18 @@
{
"_class_name": "EulerDiscreteScheduler",
"_diffusers_version": "0.19.0.dev0",
"beta_end": 0.012,
"beta_schedule": "scaled_linear",
"beta_start": 0.00085,
"clip_sample": false,
"interpolation_type": "linear",
"num_train_timesteps": 1000,
"prediction_type": "epsilon",
"sample_max_value": 1.0,
"set_alpha_to_one": false,
"skip_prk_steps": true,
"steps_offset": 1,
"timestep_spacing": "leading",
"trained_betas": null,
"use_karras_sigmas": false
}

View File

@@ -0,0 +1,24 @@
{
"architectures": [
"CLIPTextModelWithProjection"
],
"attention_dropout": 0.0,
"bos_token_id": 0,
"dropout": 0.0,
"eos_token_id": 2,
"hidden_act": "gelu",
"hidden_size": 1280,
"initializer_factor": 1.0,
"initializer_range": 0.02,
"intermediate_size": 5120,
"layer_norm_eps": 1e-05,
"max_position_embeddings": 77,
"model_type": "clip_text_model",
"num_attention_heads": 20,
"num_hidden_layers": 32,
"pad_token_id": 1,
"projection_dim": 1280,
"torch_dtype": "float16",
"transformers_version": "4.32.0.dev0",
"vocab_size": 49408
}

View File

@@ -0,0 +1,24 @@
{
"bos_token": {
"content": "<|startoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"pad_token": "!",
"unk_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
}
}

View File

@@ -0,0 +1,33 @@
{
"add_prefix_space": false,
"bos_token": {
"__type": "AddedToken",
"content": "<|startoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"clean_up_tokenization_spaces": true,
"do_lower_case": true,
"eos_token": {
"__type": "AddedToken",
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"errors": "replace",
"model_max_length": 77,
"pad_token": "!",
"tokenizer_class": "CLIPTokenizer",
"unk_token": {
"__type": "AddedToken",
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
}
}

View File

@@ -0,0 +1,69 @@
{
"_class_name": "UNet2DConditionModel",
"_diffusers_version": "0.19.0.dev0",
"act_fn": "silu",
"addition_embed_type": "text_time",
"addition_embed_type_num_heads": 64,
"addition_time_embed_dim": 256,
"attention_head_dim": [
6,
12,
24,
24
],
"block_out_channels": [
384,
768,
1536,
1536
],
"center_input_sample": false,
"class_embed_type": null,
"class_embeddings_concat": false,
"conv_in_kernel": 3,
"conv_out_kernel": 3,
"cross_attention_dim": 1280,
"cross_attention_norm": null,
"down_block_types": [
"DownBlock2D",
"CrossAttnDownBlock2D",
"CrossAttnDownBlock2D",
"DownBlock2D"
],
"downsample_padding": 1,
"dual_cross_attention": false,
"encoder_hid_dim": null,
"encoder_hid_dim_type": null,
"flip_sin_to_cos": true,
"freq_shift": 0,
"in_channels": 4,
"layers_per_block": 2,
"mid_block_only_cross_attention": null,
"mid_block_scale_factor": 1,
"mid_block_type": "UNetMidBlock2DCrossAttn",
"norm_eps": 1e-05,
"norm_num_groups": 32,
"num_attention_heads": null,
"num_class_embeds": null,
"only_cross_attention": false,
"out_channels": 4,
"projection_class_embeddings_input_dim": 2560,
"resnet_out_scale_factor": 1.0,
"resnet_skip_time_act": false,
"resnet_time_scale_shift": "default",
"sample_size": 128,
"time_cond_proj_dim": null,
"time_embedding_act_fn": null,
"time_embedding_dim": null,
"time_embedding_type": "positional",
"timestep_post_act": null,
"transformer_layers_per_block": 4,
"up_block_types": [
"UpBlock2D",
"CrossAttnUpBlock2D",
"CrossAttnUpBlock2D",
"UpBlock2D"
],
"upcast_attention": null,
"use_linear_projection": true
}

View File

@@ -0,0 +1,32 @@
{
"_class_name": "AutoencoderKL",
"_diffusers_version": "0.20.0.dev0",
"_name_or_path": "../sdxl-vae/",
"act_fn": "silu",
"block_out_channels": [
128,
256,
512,
512
],
"down_block_types": [
"DownEncoderBlock2D",
"DownEncoderBlock2D",
"DownEncoderBlock2D",
"DownEncoderBlock2D"
],
"force_upcast": true,
"in_channels": 3,
"latent_channels": 4,
"layers_per_block": 2,
"norm_num_groups": 32,
"out_channels": 3,
"sample_size": 1024,
"scaling_factor": 0.13025,
"up_block_types": [
"UpDecoderBlock2D",
"UpDecoderBlock2D",
"UpDecoderBlock2D",
"UpDecoderBlock2D"
]
}

View File

@@ -0,0 +1,31 @@
{
"_class_name": "AutoencoderKL",
"_diffusers_version": "0.19.0.dev0",
"act_fn": "silu",
"block_out_channels": [
128,
256,
512,
512
],
"down_block_types": [
"DownEncoderBlock2D",
"DownEncoderBlock2D",
"DownEncoderBlock2D",
"DownEncoderBlock2D"
],
"force_upcast": true,
"in_channels": 3,
"latent_channels": 4,
"layers_per_block": 2,
"norm_num_groups": 32,
"out_channels": 3,
"sample_size": 1024,
"scaling_factor": 0.13025,
"up_block_types": [
"UpDecoderBlock2D",
"UpDecoderBlock2D",
"UpDecoderBlock2D",
"UpDecoderBlock2D"
]
}

View File

@@ -1,9 +1,10 @@
import einops
import torch
from invokeai.backend.flux.extensions.regional_prompting_extension import RegionalPromptingExtension
from invokeai.backend.flux.extensions.xlabs_ip_adapter_extension import XLabsIPAdapterExtension
from invokeai.backend.flux.math import attention
from invokeai.backend.flux.modules.layers import DoubleStreamBlock
from invokeai.backend.flux.modules.layers import DoubleStreamBlock, SingleStreamBlock
class CustomDoubleStreamBlockProcessor:
@@ -13,7 +14,12 @@ class CustomDoubleStreamBlockProcessor:
@staticmethod
def _double_stream_block_forward(
block: DoubleStreamBlock, img: torch.Tensor, txt: torch.Tensor, vec: torch.Tensor, pe: torch.Tensor
block: DoubleStreamBlock,
img: torch.Tensor,
txt: torch.Tensor,
vec: torch.Tensor,
pe: torch.Tensor,
attn_mask: torch.Tensor | None = None,
) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
"""This function is a direct copy of DoubleStreamBlock.forward(), but it returns some of the intermediate
values.
@@ -40,7 +46,7 @@ class CustomDoubleStreamBlockProcessor:
k = torch.cat((txt_k, img_k), dim=2)
v = torch.cat((txt_v, img_v), dim=2)
attn = attention(q, k, v, pe=pe)
attn = attention(q, k, v, pe=pe, attn_mask=attn_mask)
txt_attn, img_attn = attn[:, : txt.shape[1]], attn[:, txt.shape[1] :]
# calculate the img bloks
@@ -63,11 +69,15 @@ class CustomDoubleStreamBlockProcessor:
vec: torch.Tensor,
pe: torch.Tensor,
ip_adapter_extensions: list[XLabsIPAdapterExtension],
regional_prompting_extension: RegionalPromptingExtension,
) -> tuple[torch.Tensor, torch.Tensor]:
"""A custom implementation of DoubleStreamBlock.forward() with additional features:
- IP-Adapter support
"""
img, txt, img_q = CustomDoubleStreamBlockProcessor._double_stream_block_forward(block, img, txt, vec, pe)
attn_mask = regional_prompting_extension.get_double_stream_attn_mask(block_index)
img, txt, img_q = CustomDoubleStreamBlockProcessor._double_stream_block_forward(
block, img, txt, vec, pe, attn_mask=attn_mask
)
# Apply IP-Adapter conditioning.
for ip_adapter_extension in ip_adapter_extensions:
@@ -81,3 +91,48 @@ class CustomDoubleStreamBlockProcessor:
)
return img, txt
class CustomSingleStreamBlockProcessor:
"""A class containing a custom implementation of SingleStreamBlock.forward() with additional features (masking,
etc.)
"""
@staticmethod
def _single_stream_block_forward(
block: SingleStreamBlock,
x: torch.Tensor,
vec: torch.Tensor,
pe: torch.Tensor,
attn_mask: torch.Tensor | None = None,
) -> torch.Tensor:
"""This function is a direct copy of SingleStreamBlock.forward()."""
mod, _ = block.modulation(vec)
x_mod = (1 + mod.scale) * block.pre_norm(x) + mod.shift
qkv, mlp = torch.split(block.linear1(x_mod), [3 * block.hidden_size, block.mlp_hidden_dim], dim=-1)
q, k, v = einops.rearrange(qkv, "B L (K H D) -> K B H L D", K=3, H=block.num_heads)
q, k = block.norm(q, k, v)
# compute attention
attn = attention(q, k, v, pe=pe, attn_mask=attn_mask)
# compute activation in mlp stream, cat again and run second linear layer
output = block.linear2(torch.cat((attn, block.mlp_act(mlp)), 2))
return x + mod.gate * output
@staticmethod
def custom_single_block_forward(
timestep_index: int,
total_num_timesteps: int,
block_index: int,
block: SingleStreamBlock,
img: torch.Tensor,
vec: torch.Tensor,
pe: torch.Tensor,
regional_prompting_extension: RegionalPromptingExtension,
) -> torch.Tensor:
"""A custom implementation of SingleStreamBlock.forward() with additional features:
- Masking
"""
attn_mask = regional_prompting_extension.get_single_stream_attn_mask(block_index)
return CustomSingleStreamBlockProcessor._single_stream_block_forward(block, img, vec, pe, attn_mask=attn_mask)

Some files were not shown because too many files have changed in this diff Show More