Improve FLUX image-to-image (Trajectory Guidance) (#6900)

## Summary

This PR improves the FLUX image-to-image and inpainting behaviours.

Changes:
- Expand the inpainting region at a cutoff timestep. This improves seam
coherence around inpainted regions (see the sketch below).
- Add Trajectory Guidance to give finer control over how much an image
gets modified during image-to-image/inpainting (see the code for a more
technical explanation - it's well-documented).
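
For illustration, the seam-coherence change snaps all non-zero inpaint-mask values to 1.0 once denoising passes a cutoff timestep. A minimal sketch of that behaviour, simplified from the new `TrajectoryGuidanceExtension` (the `0.5` cutoff and epsilon match the code in the diff):

```python
import torch

def promote_mask(inpaint_mask: torch.Tensor, t_prev: float) -> torch.Tensor:
    """Promote soft mask gradients to 1.0 late in denoising for coherent seams."""
    eps = 1e-4  # tolerance for "effectively zero" mask values
    mask_gradient_t_cutoff = 0.5
    if t_prev > mask_gradient_t_cutoff:
        # Early in denoising (high t), use the soft mask as-is.
        return inpaint_mask
    # After the cutoff, promote every non-zero value to fully inpainted.
    return torch.where(inpaint_mask <= eps, inpaint_mask, torch.ones_like(inpaint_mask))
```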

## `trajectory_guidance_strength` Usage

- The `trajectory_guidance_strength` param has been added to the `FLUX
Denoise` invocation.
- `trajectory_guidance_strength` defaults to `0.0` and should be in the
range [0, 1].
- `trajectory_guidance_strength = 0.0` has no effect on the denoising
process.
- `trajectory_guidance_strength = 1.0` will guide strongly towards the
original image.
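
Under the hood, the strength maps to a guidance "change schedule" via two manually-tuned linear ramps (the `build_line` mappings in the diff reduce to the closed forms below):

```python
# Sketch of the schedule endpoints, mirroring TrajectoryGuidanceExtension.__init__.
for s in (0.0, 0.5, 1.0):  # trajectory_guidance_strength
    change_ratio_at_t_1 = 1.0 - s  # change allowed at t=1.0 (1.0 = vanilla, 0.0 = hold trajectory)
    t_cutoff = 1.0 - 0.5 * s       # timestep at/below which change_ratio is pinned to 1.0
    print(f"strength={s}: change_ratio at t=1.0 is {change_ratio_at_t_1}, full change for t <= {t_cutoff}")
```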

## FLUX image-to-image usage tips

- As always, prompt matters a lot.
- If you are trying to make minor perturbations to an image, use
vanilla image-to-image by setting the `denoising_start` param.
- If you are trying to make significant changes to an image, using
trajectory guidance will give more control than using vanilla
image-to-image. Set `denoising_start=0.0` and adjust
`trajectory_guidance_strength` to control the amount of change in the
image.
- The 'transition point' where the image changes the most as you adjust
`trajectory_guidance_strength` or `denoising_start` varies depending on
the noise. So, set a fixed noise seed, then tune those params.
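
As a concrete starting point (the values below are illustrative, not tuned recommendations), the two regimes differ on the `FLUX Denoise` node roughly like this:

```python
# Minor perturbations: vanilla image-to-image, no trajectory guidance.
minor_tweaks = {"denoising_start": 0.8, "trajectory_guidance_strength": 0.0, "seed": 42}

# Large-but-controlled changes: denoise from t=1.0 and let TGS pull toward the original.
big_changes = {"denoising_start": 0.0, "trajectory_guidance_strength": 0.6, "seed": 42}
```

Holding the seed fixed while sweeping `trajectory_guidance_strength` makes the transition point much easier to find.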


## QA Instructions

- [x] Vanilla image-to-image - No change in output
- [x] Vanilla inpainting - No change in output
- [x] Vanilla outpainting - No change in output
- Trajectory Guidance image-to-image
    - [x] TGS = 0.0 is identical to Vanilla case
    - [x] TGS = 1.0 guides close to the original image
      - Not as close as I'd like, but it's not broken.
    - [x] Smooth transition as TGS varies
    - [x] Smoke test: TGS with denoising_start > 0.0
- TG inpainting
    - [x] TGS = 0.0 is identical to Vanilla case
    - [x] TGS = 1.0 guides close to the original image
      - Not as close as I'd like, but it's not broken.
    - [x] Smooth transition as TGS varies
    - [x] Smoke test: TGS with denoising_start > 0.0
- TG outpainting
    - [x] TGS = 0.0 is identical to Vanilla case
    - [x] Smoke test TGS outpainting
- [x] Smoke test FLUX text-to-image
- [x] Preview images look OK for all of the above.

## Known issues (will be addressed in follow-up PRs)

- The current TGS scale biases towards creating more change than desired
in the image. More tuning of the TG change schedule is required.
- TGS does not work very well for outpainting right now. This _might_ be
solvable, but more likely we'll just want to discourage it in the Linear
UI.

## Merge Plan

No special instructions.

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a
changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
Ryan Dick committed 2024-09-20 18:47:32 -04:00 (committed by GitHub)

14 changed files with 425 additions and 138 deletions

View File

@@ -20,7 +20,6 @@ from invokeai.app.invocations.model import TransformerField
from invokeai.app.invocations.primitives import LatentsOutput
from invokeai.app.services.shared.invocation_context import InvocationContext
from invokeai.backend.flux.denoise import denoise
from invokeai.backend.flux.inpaint_extension import InpaintExtension
from invokeai.backend.flux.model import Flux
from invokeai.backend.flux.sampling_utils import (
clip_timestep_schedule,
@@ -30,6 +29,7 @@ from invokeai.backend.flux.sampling_utils import (
pack,
unpack,
)
from invokeai.backend.flux.trajectory_guidance_extension import TrajectoryGuidanceExtension
from invokeai.backend.lora.lora_model_raw import LoRAModelRaw
from invokeai.backend.lora.lora_patcher import LoRAPatcher
from invokeai.backend.model_manager.config import ModelFormat
@@ -43,7 +43,7 @@ from invokeai.backend.util.devices import TorchDevice
title="FLUX Denoise",
tags=["image", "flux"],
category="image",
version="2.0.0",
version="2.1.0",
classification=Classification.Prototype,
)
class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
@@ -68,6 +68,12 @@ class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
description=FieldDescriptions.denoising_start,
)
denoising_end: float = InputField(default=1.0, ge=0, le=1, description=FieldDescriptions.denoising_end)
trajectory_guidance_strength: float = InputField(
default=0.0,
ge=0.0,
le=1.0,
description="Value indicating how strongly to guide the denoising process towards the initial latents (during image-to-image). Range [0, 1]. A value of 0.0 is equivalent to vanilla image-to-image. A value of 1.0 will guide the denoising process very close to the original latents.",
)
transformer: TransformerField = InputField(
description=FieldDescriptions.flux_model,
input=Input.Connection,
@@ -181,14 +187,13 @@ class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
# Now that we have 'packed' the latent tensors, verify that we calculated the image_seq_len correctly.
assert image_seq_len == x.shape[1]
# Prepare inpaint extension.
inpaint_extension: InpaintExtension | None = None
if inpaint_mask is not None:
assert init_latents is not None
inpaint_extension = InpaintExtension(
# Prepare trajectory guidance extension.
traj_guidance_extension: TrajectoryGuidanceExtension | None = None
if init_latents is not None:
traj_guidance_extension = TrajectoryGuidanceExtension(
init_latents=init_latents,
inpaint_mask=inpaint_mask,
noise=noise,
trajectory_guidance_strength=self.trajectory_guidance_strength,
)
with (
@@ -236,7 +241,7 @@ class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
timesteps=timesteps,
step_callback=self._build_step_callback(context),
guidance=self.guidance,
inpaint_extension=inpaint_extension,
traj_guidance_extension=traj_guidance_extension,
)
x = unpack(x.float(), self.height, self.width)

View File

@@ -2,7 +2,7 @@
"name": "FLUX Image to Image",
"author": "InvokeAI",
"description": "A simple image-to-image workflow using a FLUX dev model. ",
"version": "1.0.4",
"version": "1.1.0",
"contact": "",
"tags": "image2image, flux, image-to-image",
"notes": "Prerequisite model downloads: T5 Encoder, CLIP-L Encoder, and FLUX VAE. Quantized and un-quantized versions can be found in the starter models tab within your Model Manager. We recommend using FLUX dev models for image-to-image workflows. The image-to-image performance with FLUX schnell models is poor.",
@@ -23,17 +23,13 @@
"nodeId": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"fieldName": "vae_model"
},
{
"nodeId": "ace0258f-67d7-4eee-a218-6fff27065214",
"fieldName": "denoising_start"
},
{
"nodeId": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
"fieldName": "prompt"
},
{
"nodeId": "ace0258f-67d7-4eee-a218-6fff27065214",
"fieldName": "num_steps"
"nodeId": "2981a67c-480f-4237-9384-26b68dbf912b",
"fieldName": "image"
}
],
"meta": {
@@ -42,48 +38,18 @@
},
"nodes": [
{
"id": "2981a67c-480f-4237-9384-26b68dbf912b",
"id": "eebd7252-0bd8-401a-bb26-2b8bc64892fa",
"type": "invocation",
"data": {
"id": "2981a67c-480f-4237-9384-26b68dbf912b",
"type": "flux_vae_encode",
"version": "1.0.0",
"label": "",
"notes": "",
"isOpen": true,
"isIntermediate": true,
"useCache": true,
"inputs": {
"image": {
"name": "image",
"label": "",
"value": {
"image_name": "8a5c62aa-9335-45d2-9c71-89af9fc1f8d4.png"
}
},
"vae": {
"name": "vae",
"label": ""
}
}
},
"position": {
"x": 732.7680166609682,
"y": -24.37398171806909
}
},
{
"id": "ace0258f-67d7-4eee-a218-6fff27065214",
"type": "invocation",
"data": {
"id": "ace0258f-67d7-4eee-a218-6fff27065214",
"id": "eebd7252-0bd8-401a-bb26-2b8bc64892fa",
"type": "flux_denoise",
"version": "1.0.0",
"version": "2.1.0",
"label": "",
"notes": "",
"isOpen": true,
"isIntermediate": true,
"useCache": true,
"nodePack": "invokeai",
"inputs": {
"board": {
"name": "board",
@@ -111,6 +77,11 @@
"label": "",
"value": 1
},
"trajectory_guidance_strength": {
"name": "trajectory_guidance_strength",
"label": "",
"value": 0.0
},
"transformer": {
"name": "transformer",
"label": ""
@@ -131,7 +102,7 @@
},
"num_steps": {
"name": "num_steps",
"label": "Steps (Recommend 30 for Dev, 4 for Schnell)",
"label": "",
"value": 30
},
"guidance": {
@@ -147,8 +118,36 @@
}
},
"position": {
"x": 1182.8836633018684,
"y": -251.38882958913183
"x": 1159.584057771928,
"y": -175.90561201366845
}
},
{
"id": "2981a67c-480f-4237-9384-26b68dbf912b",
"type": "invocation",
"data": {
"id": "2981a67c-480f-4237-9384-26b68dbf912b",
"type": "flux_vae_encode",
"version": "1.0.0",
"label": "",
"notes": "",
"isOpen": true,
"isIntermediate": true,
"useCache": true,
"inputs": {
"image": {
"name": "image",
"label": ""
},
"vae": {
"name": "vae",
"label": ""
}
}
},
"position": {
"x": 732.7680166609682,
"y": -24.37398171806909
}
},
{
@@ -202,18 +201,32 @@
"inputs": {
"model": {
"name": "model",
"label": "Model (dev variant recommended for Image-to-Image)"
"label": "Model (dev variant recommended for Image-to-Image)",
"value": {
"key": "b4990a6c-0899-48e9-969b-d6f3801acc6a",
"hash": "random:aad8f7bc19ce76541dfb394b62a30f77722542b66e48064a9f25453263b45fba",
"name": "FLUX Dev (Quantized)_2",
"base": "flux",
"type": "main"
}
},
"t5_encoder_model": {
"name": "t5_encoder_model",
"label": ""
"label": "",
"value": {
"key": "d18d5575-96b6-4da3-b3d8-eb58308d6705",
"hash": "random:f2f9ed74acdfb4bf6fec200e780f6c25f8dd8764a35e65d425d606912fdf573a",
"name": "t5_bnb_int8_quantized_encoder",
"base": "any",
"type": "t5_encoder"
}
},
"clip_embed_model": {
"name": "clip_embed_model",
"label": "",
"value": {
"key": "fa23a584-b623-415d-832a-21b5098ff1a1",
"hash": "blake3:17c19f0ef941c3b7609a9c94a659ca5364de0be364a91d4179f0e39ba17c3b70",
"key": "5a19d7e5-8d98-43cd-8a81-87515e4b3b4e",
"hash": "random:4bd08514c08fb6ff04088db9aeb45def3c488e8b5fd09a35f2cc4f2dc346f99f",
"name": "clip-vit-large-patch14",
"base": "any",
"type": "clip_embed"
@@ -223,8 +236,8 @@
"name": "vae_model",
"label": "",
"value": {
"key": "74fc82ba-c0a8-479d-a890-2126f82da758",
"hash": "blake3:ce21cb76364aa6e2421311cf4a4b5eb052a76c4f1cd207b50703d8978198a068",
"key": "9172beab-5c1d-43f0-b2f0-6e0b956710d9",
"hash": "random:c54dde288e5fa2e6137f1c92e9d611f598049e6f16e360207b6d96c9f5a67ba0",
"name": "FLUX.1-schnell_ae",
"base": "flux",
"type": "vae"
@@ -308,28 +321,60 @@
],
"edges": [
{
"id": "reactflow__edge-2981a67c-480f-4237-9384-26b68dbf912bheight-ace0258f-67d7-4eee-a218-6fff27065214height",
"id": "reactflow__edge-eebd7252-0bd8-401a-bb26-2b8bc64892falatents-7e5172eb-48c1-44db-a770-8fd83e1435d1latents",
"type": "default",
"source": "2981a67c-480f-4237-9384-26b68dbf912b",
"target": "ace0258f-67d7-4eee-a218-6fff27065214",
"sourceHandle": "height",
"targetHandle": "height"
"source": "eebd7252-0bd8-401a-bb26-2b8bc64892fa",
"target": "7e5172eb-48c1-44db-a770-8fd83e1435d1",
"sourceHandle": "latents",
"targetHandle": "latents"
},
{
"id": "reactflow__edge-2981a67c-480f-4237-9384-26b68dbf912bwidth-ace0258f-67d7-4eee-a218-6fff27065214width",
"id": "reactflow__edge-f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90transformer-eebd7252-0bd8-401a-bb26-2b8bc64892fatransformer",
"type": "default",
"source": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"target": "eebd7252-0bd8-401a-bb26-2b8bc64892fa",
"sourceHandle": "transformer",
"targetHandle": "transformer"
},
{
"id": "reactflow__edge-01f674f8-b3d1-4df1-acac-6cb8e0bfb63cconditioning-eebd7252-0bd8-401a-bb26-2b8bc64892fapositive_text_conditioning",
"type": "default",
"source": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
"target": "eebd7252-0bd8-401a-bb26-2b8bc64892fa",
"sourceHandle": "conditioning",
"targetHandle": "positive_text_conditioning"
},
{
"id": "reactflow__edge-2981a67c-480f-4237-9384-26b68dbf912blatents-eebd7252-0bd8-401a-bb26-2b8bc64892falatents",
"type": "default",
"source": "2981a67c-480f-4237-9384-26b68dbf912b",
"target": "ace0258f-67d7-4eee-a218-6fff27065214",
"target": "eebd7252-0bd8-401a-bb26-2b8bc64892fa",
"sourceHandle": "latents",
"targetHandle": "latents"
},
{
"id": "reactflow__edge-2981a67c-480f-4237-9384-26b68dbf912bwidth-eebd7252-0bd8-401a-bb26-2b8bc64892fawidth",
"type": "default",
"source": "2981a67c-480f-4237-9384-26b68dbf912b",
"target": "eebd7252-0bd8-401a-bb26-2b8bc64892fa",
"sourceHandle": "width",
"targetHandle": "width"
},
{
"id": "reactflow__edge-2981a67c-480f-4237-9384-26b68dbf912blatents-ace0258f-67d7-4eee-a218-6fff27065214latents",
"id": "reactflow__edge-2981a67c-480f-4237-9384-26b68dbf912bheight-eebd7252-0bd8-401a-bb26-2b8bc64892faheight",
"type": "default",
"source": "2981a67c-480f-4237-9384-26b68dbf912b",
"target": "ace0258f-67d7-4eee-a218-6fff27065214",
"sourceHandle": "latents",
"targetHandle": "latents"
"target": "eebd7252-0bd8-401a-bb26-2b8bc64892fa",
"sourceHandle": "height",
"targetHandle": "height"
},
{
"id": "reactflow__edge-4754c534-a5f3-4ad0-9382-7887985e668cvalue-eebd7252-0bd8-401a-bb26-2b8bc64892faseed",
"type": "default",
"source": "4754c534-a5f3-4ad0-9382-7887985e668c",
"target": "eebd7252-0bd8-401a-bb26-2b8bc64892fa",
"sourceHandle": "value",
"targetHandle": "seed"
},
{
"id": "reactflow__edge-f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90vae-2981a67c-480f-4237-9384-26b68dbf912bvae",
@@ -339,38 +384,6 @@
"sourceHandle": "vae",
"targetHandle": "vae"
},
{
"id": "reactflow__edge-ace0258f-67d7-4eee-a218-6fff27065214latents-7e5172eb-48c1-44db-a770-8fd83e1435d1latents",
"type": "default",
"source": "ace0258f-67d7-4eee-a218-6fff27065214",
"target": "7e5172eb-48c1-44db-a770-8fd83e1435d1",
"sourceHandle": "latents",
"targetHandle": "latents"
},
{
"id": "reactflow__edge-4754c534-a5f3-4ad0-9382-7887985e668cvalue-ace0258f-67d7-4eee-a218-6fff27065214seed",
"type": "default",
"source": "4754c534-a5f3-4ad0-9382-7887985e668c",
"target": "ace0258f-67d7-4eee-a218-6fff27065214",
"sourceHandle": "value",
"targetHandle": "seed"
},
{
"id": "reactflow__edge-f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90transformer-ace0258f-67d7-4eee-a218-6fff27065214transformer",
"type": "default",
"source": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"target": "ace0258f-67d7-4eee-a218-6fff27065214",
"sourceHandle": "transformer",
"targetHandle": "transformer"
},
{
"id": "reactflow__edge-01f674f8-b3d1-4df1-acac-6cb8e0bfb63cconditioning-ace0258f-67d7-4eee-a218-6fff27065214positive_text_conditioning",
"type": "default",
"source": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
"target": "ace0258f-67d7-4eee-a218-6fff27065214",
"sourceHandle": "conditioning",
"targetHandle": "positive_text_conditioning"
},
{
"id": "reactflow__edge-f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90vae-7e5172eb-48c1-44db-a770-8fd83e1435d1vae",
"type": "default",

View File

@@ -2,7 +2,7 @@
"name": "FLUX Text to Image",
"author": "InvokeAI",
"description": "A simple text-to-image workflow using FLUX dev or schnell models.",
"version": "1.0.4",
"version": "1.1.0",
"contact": "",
"tags": "text2image, flux",
"notes": "Prerequisite model downloads: T5 Encoder, CLIP-L Encoder, and FLUX VAE. Quantized and un-quantized versions can be found in the starter models tab within your Model Manager. We recommend 4 steps for FLUX schnell models and 30 steps for FLUX dev models.",
@@ -26,10 +26,6 @@
{
"nodeId": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
"fieldName": "prompt"
},
{
"nodeId": "4fe24f07-f906-4f55-ab2c-9beee56ef5bd",
"fieldName": "num_steps"
}
],
"meta": {
@@ -38,17 +34,18 @@
},
"nodes": [
{
"id": "4fe24f07-f906-4f55-ab2c-9beee56ef5bd",
"id": "4ecda92d-ee0e-45ca-aa35-6e9410ac39b9",
"type": "invocation",
"data": {
"id": "4fe24f07-f906-4f55-ab2c-9beee56ef5bd",
"id": "4ecda92d-ee0e-45ca-aa35-6e9410ac39b9",
"type": "flux_denoise",
"version": "1.0.0",
"version": "2.1.0",
"label": "",
"notes": "",
"isOpen": true,
"isIntermediate": true,
"useCache": true,
"nodePack": "invokeai",
"inputs": {
"board": {
"name": "board",
@@ -76,6 +73,11 @@
"label": "",
"value": 1
},
"trajectory_guidance_strength": {
"name": "trajectory_guidance_strength",
"label": "",
"value": 0
},
"transformer": {
"name": "transformer",
"label": ""
@@ -96,8 +98,8 @@
},
"num_steps": {
"name": "num_steps",
"label": "Steps (Recommend 30 for Dev, 4 for Schnell)",
"value": 30
"label": "",
"value": 4
},
"guidance": {
"name": "guidance",
@@ -112,8 +114,8 @@
}
},
"position": {
"x": 1186.1868226120378,
"y": -214.9459927686657
"x": 1161.0101524413685,
"y": -223.33548695623742
}
},
{
@@ -167,19 +169,47 @@
"inputs": {
"model": {
"name": "model",
"label": ""
"label": "",
"value": {
"key": "b4990a6c-0899-48e9-969b-d6f3801acc6a",
"hash": "random:aad8f7bc19ce76541dfb394b62a30f77722542b66e48064a9f25453263b45fba",
"name": "FLUX Dev (Quantized)_2",
"base": "flux",
"type": "main"
}
},
"t5_encoder_model": {
"name": "t5_encoder_model",
"label": ""
"label": "",
"value": {
"key": "d18d5575-96b6-4da3-b3d8-eb58308d6705",
"hash": "random:f2f9ed74acdfb4bf6fec200e780f6c25f8dd8764a35e65d425d606912fdf573a",
"name": "t5_bnb_int8_quantized_encoder",
"base": "any",
"type": "t5_encoder"
}
},
"clip_embed_model": {
"name": "clip_embed_model",
"label": ""
"label": "",
"value": {
"key": "5a19d7e5-8d98-43cd-8a81-87515e4b3b4e",
"hash": "random:4bd08514c08fb6ff04088db9aeb45def3c488e8b5fd09a35f2cc4f2dc346f99f",
"name": "clip-vit-large-patch14",
"base": "any",
"type": "clip_embed"
}
},
"vae_model": {
"name": "vae_model",
"label": ""
"label": "",
"value": {
"key": "9172beab-5c1d-43f0-b2f0-6e0b956710d9",
"hash": "random:c54dde288e5fa2e6137f1c92e9d611f598049e6f16e360207b6d96c9f5a67ba0",
"name": "FLUX.1-schnell_ae",
"base": "flux",
"type": "vae"
}
}
}
},
@@ -259,33 +289,33 @@
],
"edges": [
{
"id": "reactflow__edge-f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90transformer-4fe24f07-f906-4f55-ab2c-9beee56ef5bdtransformer",
"id": "reactflow__edge-f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90transformer-4ecda92d-ee0e-45ca-aa35-6e9410ac39b9transformer",
"type": "default",
"source": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"target": "4fe24f07-f906-4f55-ab2c-9beee56ef5bd",
"target": "4ecda92d-ee0e-45ca-aa35-6e9410ac39b9",
"sourceHandle": "transformer",
"targetHandle": "transformer"
},
{
"id": "reactflow__edge-01f674f8-b3d1-4df1-acac-6cb8e0bfb63cconditioning-4fe24f07-f906-4f55-ab2c-9beee56ef5bdpositive_text_conditioning",
"id": "reactflow__edge-01f674f8-b3d1-4df1-acac-6cb8e0bfb63cconditioning-4ecda92d-ee0e-45ca-aa35-6e9410ac39b9positive_text_conditioning",
"type": "default",
"source": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
"target": "4fe24f07-f906-4f55-ab2c-9beee56ef5bd",
"target": "4ecda92d-ee0e-45ca-aa35-6e9410ac39b9",
"sourceHandle": "conditioning",
"targetHandle": "positive_text_conditioning"
},
{
"id": "reactflow__edge-4754c534-a5f3-4ad0-9382-7887985e668cvalue-4fe24f07-f906-4f55-ab2c-9beee56ef5bdseed",
"id": "reactflow__edge-4754c534-a5f3-4ad0-9382-7887985e668cvalue-4ecda92d-ee0e-45ca-aa35-6e9410ac39b9seed",
"type": "default",
"source": "4754c534-a5f3-4ad0-9382-7887985e668c",
"target": "4fe24f07-f906-4f55-ab2c-9beee56ef5bd",
"target": "4ecda92d-ee0e-45ca-aa35-6e9410ac39b9",
"sourceHandle": "value",
"targetHandle": "seed"
},
{
"id": "reactflow__edge-4fe24f07-f906-4f55-ab2c-9beee56ef5bdlatents-7e5172eb-48c1-44db-a770-8fd83e1435d1latents",
"id": "reactflow__edge-4ecda92d-ee0e-45ca-aa35-6e9410ac39b9latents-7e5172eb-48c1-44db-a770-8fd83e1435d1latents",
"type": "default",
"source": "4fe24f07-f906-4f55-ab2c-9beee56ef5bd",
"source": "4ecda92d-ee0e-45ca-aa35-6e9410ac39b9",
"target": "7e5172eb-48c1-44db-a770-8fd83e1435d1",
"sourceHandle": "latents",
"targetHandle": "latents"

View File

@@ -3,8 +3,8 @@ from typing import Callable
import torch
from tqdm import tqdm
from invokeai.backend.flux.inpaint_extension import InpaintExtension
from invokeai.backend.flux.model import Flux
from invokeai.backend.flux.trajectory_guidance_extension import TrajectoryGuidanceExtension
from invokeai.backend.stable_diffusion.diffusers_pipeline import PipelineIntermediateState
@@ -20,7 +20,7 @@ def denoise(
timesteps: list[float],
step_callback: Callable[[PipelineIntermediateState], None],
guidance: float,
inpaint_extension: InpaintExtension | None,
traj_guidance_extension: TrajectoryGuidanceExtension | None, # noqa: F821
):
step = 0
# guidance_vec is ignored for schnell.
@@ -36,13 +36,15 @@ def denoise(
timesteps=t_vec,
guidance=guidance_vec,
)
if traj_guidance_extension is not None:
pred = traj_guidance_extension.update_noise(
t_curr_latents=img, pred_noise=pred, t_curr=t_curr, t_prev=t_prev
)
preview_img = img - t_curr * pred
img = img + (t_prev - t_curr) * pred
if inpaint_extension is not None:
img = inpaint_extension.merge_intermediate_latents_with_init_latents(img, t_prev)
preview_img = inpaint_extension.merge_intermediate_latents_with_init_latents(preview_img, 0.0)
step_callback(
PipelineIntermediateState(
step=step,

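For reference, this loop follows the flow-matching convention where `pred` is the velocity pointing from image toward noise: one Euler step is `img + (t_prev - t_curr) * pred`, and extrapolating straight to `t=0` gives the preview estimate `img - t_curr * pred`. A minimal sketch of those two updates:

```python
import torch

x_t = torch.randn(8)   # current latents at time t_curr
pred = torch.randn(8)  # model-predicted velocity (image -> noise direction)
t_curr, t_prev = 0.7, 0.6

x_prev = x_t + (t_prev - t_curr) * pred  # one Euler step toward t=0
x0_estimate = x_t - t_curr * pred        # jump all the way to t=0 for previews
```
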
View File

@@ -0,0 +1,134 @@
import torch
from invokeai.backend.util.build_line import build_line
class TrajectoryGuidanceExtension:
"""An implementation of trajectory guidance for FLUX.
What is trajectory guidance?
----------------------------
With SD 1 and SDXL, the amount of change in image-to-image denoising is largely controlled by the denoising_start
parameter. Doing the same thing with the FLUX model does not work as well, because the FLUX model converges very
quickly (roughly time 1.0 to 0.9) to the structure of the final image. The result of this model characteristic is
that you typically get one of two outcomes:
1) a result that is very similar to the original image
2) a result that is very different from the original image, as though it was generated from the text prompt with
pure noise.
To address this issue with image-to-image workflows with FLUX, we employ the concept of trajectory guidance. The
idea is that in addition to controlling the denoising_start parameter (i.e. the amount of noise added to the
original image), we can also guide the denoising process to stay close to the trajectory that would reproduce the
original. By controlling the strength of the trajectory guidance throughout the denoising process, we can achieve
FLUX image-to-image behavior with the same level of control offered by SD1 and SDXL.
What is the trajectory_guidance_strength?
-----------------------------------------
In the limit, we could apply a different trajectory guidance 'strength' for every latent value in every timestep.
This would be impractical for a user, so instead we have engineered a strength schedule that is more convenient to
use. The `trajectory_guidance_strength` parameter is a single scalar value that maps to a schedule. The engineered
schedule is defined as:
1) An initial change_ratio at t=1.0.
2) A linear ramp up to change_ratio=1.0 at t = t_cutoff.
3) A constant change_ratio=1.0 after t = t_cutoff.
"""
def __init__(
self, init_latents: torch.Tensor, inpaint_mask: torch.Tensor | None, trajectory_guidance_strength: float
):
"""Initialize TrajectoryGuidanceExtension.
Args:
init_latents (torch.Tensor): The initial latents (i.e. un-noised at timestep 0). In 'packed' format.
inpaint_mask (torch.Tensor | None): A mask specifying which elements to inpaint. Range [0, 1]. Values of 1
will be re-generated. Values of 0 will remain unchanged. Values between 0 and 1 can be used to blend the
inpainted region with the background. In 'packed' format. If None, will be treated as a mask of all 1s.
trajectory_guidance_strength (float): A value in [0, 1] specifying the strength of the trajectory guidance.
A value of 0.0 is equivalent to vanilla image-to-image. A value of 1.0 will guide the denoising process
very close to the original latents.
"""
assert 0.0 <= trajectory_guidance_strength <= 1.0
self._init_latents = init_latents
if inpaint_mask is None:
# The inpaint mask is None, so we initialize a mask with a single value of 1.0.
# This value will be broadcasted and treated as a mask of all 1s.
self._inpaint_mask = torch.ones(1, device=init_latents.device, dtype=init_latents.dtype)
else:
self._inpaint_mask = inpaint_mask
# Calculate the params that define the trajectory guidance schedule.
# These mappings from trajectory_guidance_strength have no theoretical basis - they were tuned manually.
self._trajectory_guidance_strength = trajectory_guidance_strength
self._change_ratio_at_t_1 = build_line(x1=0.0, y1=1.0, x2=1.0, y2=0.0)(self._trajectory_guidance_strength)
self._change_ratio_at_cutoff = 1.0
self._t_cutoff = build_line(x1=0.0, y1=1.0, x2=1.0, y2=0.5)(self._trajectory_guidance_strength)
def _apply_mask_gradient_adjustment(self, t_prev: float) -> torch.Tensor:
"""Applies inpaint mask gradient adjustment and returns the inpaint mask to be used at the current timestep."""
# As we progress through the denoising process, we promote gradient regions of the mask to have a full weight of
# 1.0. This helps to produce more coherent seams around the inpainted region. We experimented with a (small)
# number of promotion strategies (e.g. gradual promotion based on timestep), but found that a simple cutoff
# threshold worked well.
# We use a small epsilon to avoid any potential issues with floating point precision.
eps = 1e-4
mask_gradient_t_cutoff = 0.5
if t_prev > mask_gradient_t_cutoff:
# Early in the denoising process, use the inpaint mask as-is.
return self._inpaint_mask
else:
# After the cut-off, promote all non-zero mask values to 1.0.
mask = self._inpaint_mask.where(self._inpaint_mask <= (0.0 + eps), 1.0)
return mask
def _get_change_ratio(self, t_prev: float) -> float:
"""Get the change_ratio for t_prev based on the change schedule."""
change_ratio = 1.0
if t_prev > self._t_cutoff:
# If we are before the cutoff, linearly interpolate between the change_ratio at t=1.0 and the change_ratio
# at the cutoff.
change_ratio = build_line(
x1=1.0, y1=self._change_ratio_at_t_1, x2=self._t_cutoff, y2=self._change_ratio_at_cutoff
)(t_prev)
# The change_ratio should be in the range [0, 1]. Assert that we didn't make any mistakes.
eps = 1e-5
assert 0.0 - eps <= change_ratio <= 1.0 + eps
return change_ratio
def update_noise(
self, t_curr_latents: torch.Tensor, pred_noise: torch.Tensor, t_curr: float, t_prev: float
) -> torch.Tensor:
# Handle gradient cutoff.
mask = self._apply_mask_gradient_adjustment(t_prev)
mask = mask * self._get_change_ratio(t_prev)
# NOTE(ryand): During inpainting, it is common to guide the denoising process by noising the initial latents for
# the current timestep and then blending the predicted intermediate latents with the noised initial latents.
# For example:
# ```
# noised_init_latents = self._noise * t_prev + (1.0 - t_prev) * self._init_latents
# return t_prev_latents * self._inpaint_mask + noised_init_latents * (1.0 - self._inpaint_mask)
# ```
# Instead of guiding based on the noised initial latents, we have decided to guide based on the noise prediction
# that points towards the initial latents. The difference between these guidance strategies is minor, but
# qualitatively we found the latter to produce slightly better results. When change_ratio is 0.0 or 1.0 there is
# no difference between the two strategies.
#
# We experimented with a number of related guidance strategies, but not exhaustively. It's entirely possible
# that there's a much better way to do this.
# Calculate noise guidance
# What noise should the model have predicted at this timestep to step towards self._init_latents?
# Derivation:
# > t_prev_latents = t_curr_latents + (t_prev - t_curr) * pred_noise
# > t_0_latents = t_curr_latents + (0 - t_curr) * init_traj_noise
# > t_0_latents = t_curr_latents - t_curr * init_traj_noise
# > init_traj_noise = (t_curr_latents - t_0_latents) / t_curr
init_traj_noise = (t_curr_latents - self._init_latents) / t_curr
# Blend the init_traj_noise with the pred_noise according to the inpaint mask and the trajectory guidance.
noise = pred_noise * mask + init_traj_noise * (1.0 - mask)
return noise
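
To sanity-check the derivation in `update_noise`: `init_traj_noise` is exactly the velocity that, stepped from `t_curr` straight to `t=0`, reproduces the initial latents - which is what makes it "the noise the model should have predicted". A minimal sketch, reusing the mappings from `__init__` above:

```python
import torch

def build_line(x1: float, y1: float, x2: float, y2: float):
    return lambda x: (y2 - y1) / (x2 - x1) * (x - x1) + y1

# Schedule endpoints for a mid-range strength (matches __init__ above).
s = 0.5
change_ratio_at_t_1 = build_line(0.0, 1.0, 1.0, 0.0)(s)  # -> 0.5
t_cutoff = build_line(0.0, 1.0, 1.0, 0.5)(s)             # -> 0.75

# init_traj_noise reaches init_latents exactly at t=0.
x_t = torch.randn(8)           # current latents at t_curr
init_latents = torch.randn(8)  # image-to-image source latents
t_curr = 0.7
init_traj_noise = (x_t - init_latents) / t_curr
assert torch.allclose(x_t - t_curr * init_traj_noise, init_latents, atol=1e-6)

# Blend: mask * change_ratio of the model's prediction, remainder from the
# original-image trajectory. mask * change_ratio == 1.0 reduces to vanilla denoising.
pred_noise = torch.randn(8)
mask = 1.0
noise = pred_noise * (mask * change_ratio_at_t_1) + init_traj_noise * (1.0 - mask * change_ratio_at_t_1)
```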

View File

@@ -0,0 +1,6 @@
from typing import Callable
def build_line(x1: float, y1: float, x2: float, y2: float) -> Callable[[float], float]:
"""Build a linear function given two points on the line (x1, y1) and (x2, y2)."""
return lambda x: (y2 - y1) / (x2 - x1) * (x - x1) + y1

View File

@@ -1042,6 +1042,7 @@
"strength": "Strength",
"symmetry": "Symmetry",
"tileSize": "Tile Size",
"optimizedInpainting": "Optimized Inpainting",
"type": "Type",
"postProcessing": "Post-Processing (Shift + U)",
"processImage": "Process Image",
@@ -1547,6 +1548,12 @@
"paragraphs": [
"FLUX.1 [dev] models are licensed under the FLUX [dev] non-commercial license. To use this model type for commercial purposes in Invoke, visit our website to learn more."
]
},
"optimizedDenoising": {
"heading": "Optimized Inpainting",
"paragraphs": [
"Enable optimized denoising for enhanced inpainting transformations with Flux models. This setting improves detail and clarity during generation, but may be turned off to preserve more of your original image."
]
}
},
"unifiedCanvas": {

View File

@@ -60,6 +60,7 @@ export type Feature =
| 'scale'
| 'creativity'
| 'structure'
| 'optimizedDenoising'
| 'fluxDevLicense';
export type PopoverData = PopoverProps & {

View File

@@ -40,6 +40,7 @@ export type ParamsState = {
cfgRescaleMultiplier: ParameterCFGRescaleMultiplier;
guidance: ParameterGuidance;
img2imgStrength: ParameterStrength;
optimizedDenoisingEnabled: boolean;
iterations: number;
scheduler: ParameterScheduler;
seed: ParameterSeed;
@@ -83,6 +84,7 @@ const initialState: ParamsState = {
cfgRescaleMultiplier: 0,
guidance: 4,
img2imgStrength: 0.75,
optimizedDenoisingEnabled: true,
iterations: 1,
scheduler: 'euler',
seed: 0,
@@ -141,6 +143,9 @@ export const paramsSlice = createSlice({
setImg2imgStrength: (state, action: PayloadAction<number>) => {
state.img2imgStrength = action.payload;
},
setOptimizedDenoisingEnabled: (state, action: PayloadAction<boolean>) => {
state.optimizedDenoisingEnabled = action.payload;
},
setSeamlessXAxis: (state, action: PayloadAction<boolean>) => {
state.seamlessXAxis = action.payload;
},
@@ -273,6 +278,7 @@ export const {
setScheduler,
setSeed,
setImg2imgStrength,
setOptimizedDenoisingEnabled,
setSeamlessXAxis,
setSeamlessYAxis,
setShouldRandomizeSeed,
@@ -341,6 +347,7 @@ export const selectInfillPatchmatchDownscaleSize = createParamsSelector(
);
export const selectInfillColorValue = createParamsSelector((params) => params.infillColorValue);
export const selectImg2imgStrength = createParamsSelector((params) => params.img2imgStrength);
export const selectOptimizedDenoisingEnabled = createParamsSelector((params) => params.optimizedDenoisingEnabled);
export const selectPositivePrompt = createParamsSelector((params) => params.positivePrompt);
export const selectNegativePrompt = createParamsSelector((params) => params.negativePrompt);
export const selectPositivePrompt2 = createParamsSelector((params) => params.positivePrompt2);

View File

@@ -37,7 +37,17 @@ export const buildFLUXGraph = async (
const { originalSize, scaledSize } = getSizes(bbox);
const { model, guidance, seed, steps, fluxVAE, t5EncoderModel, clipEmbedModel, img2imgStrength } = params;
const {
model,
guidance,
seed,
steps,
fluxVAE,
t5EncoderModel,
clipEmbedModel,
img2imgStrength,
optimizedDenoisingEnabled,
} = params;
assert(model, 'No model found in state');
assert(t5EncoderModel, 'No T5 Encoder model found in state');
@@ -68,7 +78,8 @@ export const buildFLUXGraph = async (
guidance,
num_steps: steps,
seed,
denoising_start: 0, // denoising_start should be 0 when latents are not provided
trajectory_guidance_strength: 0,
denoising_start: 0,
denoising_end: 1,
width: scaledSize.width,
height: scaledSize.height,
@@ -113,6 +124,8 @@ export const buildFLUXGraph = async (
clip_embed_model: clipEmbedModel,
});
const denoisingValue = 1 - img2imgStrength;
if (generationMode === 'txt2img') {
canvasOutput = addTextToImage(g, l2i, originalSize, scaledSize);
} else if (generationMode === 'img2img') {
@@ -125,7 +138,7 @@ export const buildFLUXGraph = async (
originalSize,
scaledSize,
bbox,
1 - img2imgStrength,
denoisingValue,
false
);
} else if (generationMode === 'inpaint') {
@@ -139,9 +152,15 @@ export const buildFLUXGraph = async (
modelLoader,
originalSize,
scaledSize,
1 - img2imgStrength,
denoisingValue,
false
);
if (optimizedDenoisingEnabled) {
g.updateNode(noise, {
denoising_start: 0,
trajectory_guidance_strength: denoisingValue,
});
}
} else if (generationMode === 'outpaint') {
canvasOutput = await addOutpaint(
state,
@@ -153,7 +172,7 @@ export const buildFLUXGraph = async (
modelLoader,
originalSize,
scaledSize,
1 - img2imgStrength,
denoisingValue,
false
);
}
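
For clarity, when optimized denoising is enabled for FLUX inpainting, the existing strength slider is remapped onto trajectory guidance instead of `denoising_start`. A sketch of that mapping (the helper name is hypothetical; the field values match the `updateNode` call above):

```python
def flux_inpaint_denoise_params(img2img_strength: float, optimized: bool) -> dict:
    denoising_value = 1.0 - img2img_strength
    if optimized:
        # Denoise from pure noise; the strength slider drives trajectory guidance.
        return {"denoising_start": 0.0, "trajectory_guidance_strength": denoising_value}
    # Vanilla behaviour: strength sets where denoising starts.
    return {"denoising_start": denoising_value, "trajectory_guidance_strength": 0.0}
```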

View File

@@ -0,0 +1,35 @@
import { FormControl, FormLabel, Switch } from '@invoke-ai/ui-library';
import { useAppDispatch, useAppSelector } from 'app/store/storeHooks';
import { InformationalPopover } from 'common/components/InformationalPopover/InformationalPopover';
import {
selectOptimizedDenoisingEnabled,
setOptimizedDenoisingEnabled,
} from 'features/controlLayers/store/paramsSlice';
import type { ChangeEvent } from 'react';
import { memo, useCallback } from 'react';
import { useTranslation } from 'react-i18next';
export const ParamOptimizedDenoisingToggle = memo(() => {
const optimizedDenoisingEnabled = useAppSelector(selectOptimizedDenoisingEnabled);
const dispatch = useAppDispatch();
const onChange = useCallback(
(event: ChangeEvent<HTMLInputElement>) => {
dispatch(setOptimizedDenoisingEnabled(event.target.checked));
},
[dispatch]
);
const { t } = useTranslation();
return (
<FormControl w="min-content">
<InformationalPopover feature="optimizedDenoising">
<FormLabel m={0}>{t('parameters.optimizedInpainting')}</FormLabel>
</InformationalPopover>
<Switch isChecked={optimizedDenoisingEnabled} onChange={onChange} />
</FormControl>
);
});
ParamOptimizedDenoisingToggle.displayName = 'ParamOptimizedDenoisingToggle';

View File

@@ -3,8 +3,9 @@ import { Expander, Flex, FormControlGroup, StandaloneAccordion } from '@invoke-a
import { EMPTY_ARRAY } from 'app/store/constants';
import { createMemoizedSelector } from 'app/store/createMemoizedSelector';
import { useAppSelector } from 'app/store/storeHooks';
import { selectParamsSlice } from 'features/controlLayers/store/paramsSlice';
import { selectIsFLUX, selectParamsSlice } from 'features/controlLayers/store/paramsSlice';
import { selectCanvasSlice, selectScaleMethod } from 'features/controlLayers/store/selectors';
import { ParamOptimizedDenoisingToggle } from 'features/parameters/components/Advanced/ParamOptimizedDenoisingToggle';
import BboxScaledHeight from 'features/parameters/components/Bbox/BboxScaledHeight';
import BboxScaledWidth from 'features/parameters/components/Bbox/BboxScaledWidth';
import BboxScaleMethod from 'features/parameters/components/Bbox/BboxScaleMethod';
@@ -59,6 +60,7 @@ export const ImageSettingsAccordion = memo(() => {
id: 'image-settings-advanced',
defaultIsOpen: false,
});
const isFLUX = useAppSelector(selectIsFLUX);
return (
<StandaloneAccordion
@@ -77,6 +79,7 @@ export const ImageSettingsAccordion = memo(() => {
<ParamDenoisingStrength />
<Expander label={t('accordions.advanced.options')} isOpen={isOpenExpander} onToggle={onToggleExpander}>
<Flex gap={4} pb={4} flexDir="column">
{isFLUX && <ParamOptimizedDenoisingToggle />}
<BboxScaleMethod />
{scaleMethod !== 'none' && (
<FormControlGroup formLabelProps={scalingLabelProps}>

View File

@@ -6340,6 +6340,12 @@ export type components = {
* @default 1
*/
denoising_end?: number;
/**
* Trajectory Guidance Strength
* @description Value indicating how strongly to guide the denoising process towards the initial latents (during image-to-image). Range [0, 1]. A value of 0.0 is equivalent to vanilla image-to-image. A value of 1.0 will guide the denoising process very close to the original latents.
* @default 0
*/
trajectory_guidance_strength?: number;
/**
* Transformer
* @description Flux model (Transformer) to load

View File

@@ -0,0 +1,19 @@
import math
import pytest
from invokeai.backend.util.build_line import build_line
@pytest.mark.parametrize(
["x1", "y1", "x2", "y2", "x3", "y3"],
[
(0, 0, 1, 1, 2, 2), # y = x
(0, 1, 1, 2, 2, 3), # y = x + 1
(0, 0, 1, 2, 2, 4), # y = 2x
(0, 1, 1, 0, 2, -1), # y = -x + 1
(0, 5, 1, 5, 2, 5), # y = 0
],
)
def test_build_line(x1: float, y1: float, x2: float, y2: float, x3: float, y3: float):
assert math.isclose(build_line(x1, y1, x2, y2)(x3), y3, rel_tol=1e-9)