Compare commits

...

53 Commits

Author SHA1 Message Date
Brandon Rising
bda579577c chore: 4.2.9 version bump 2024-09-05 16:17:48 -04:00
Brandon Rising
a16b555d47 Simplify flux model dtype conversion in model loader 2024-09-05 15:47:14 -04:00
Brandon Rising
6667c39c73 Remove dependency of asizeof 2024-09-05 15:47:14 -04:00
Brandon Rising
5219ac12a6 Add comment explaining the cache make room call 2024-09-05 15:47:14 -04:00
Brandon Rising
445f813fb9 Update flux transformer loader to more efficiently use and release memory during upcasting 2024-09-05 15:47:14 -04:00
Brandon Rising
87f9e59cfb Cast tensors in unquantized flux models to bfloat16 during loading 2024-09-05 15:47:14 -04:00
Phrixus2023
8b03b39aa8 translationBot(ui): update translation (Chinese (Simplified Han script))
Currently translated at 97.6% (1342 of 1374 strings)

Co-authored-by: Phrixus2023 <920414016@qq.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/zh_Hans/
Translation: InvokeAI/Web UI
2024-09-05 15:34:13 -04:00
Tobias Lechner
e59b6bb971 translationBot(ui): update translation (German)
Currently translated at 63.3% (870 of 1374 strings)

Co-authored-by: Tobias Lechner <me@tobias-lechner.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/de/
Translation: InvokeAI/Web UI
2024-09-05 15:34:13 -04:00
Riccardo Giovanetti
24a7ed467c translationBot(ui): update translation (Italian)
Currently translated at 98.2% (1350 of 1374 strings)

translationBot(ui): update translation (Italian)

Currently translated at 98.2% (1350 of 1374 strings)

translationBot(ui): update translation (Italian)

Currently translated at 98.2% (1350 of 1374 strings)

translationBot(ui): update translation (Italian)

Currently translated at 98.4% (1349 of 1370 strings)

translationBot(ui): update translation (Italian)

Currently translated at 98.4% (1348 of 1369 strings)

Co-authored-by: Riccardo Giovanetti <riccardo.giovanetti@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
Translation: InvokeAI/Web UI
2024-09-05 15:34:13 -04:00
Васянатор
f01f1033ac translationBot(ui): update translation (Russian)
Currently translated at 100.0% (1370 of 1370 strings)

translationBot(ui): update translation (Russian)

Currently translated at 100.0% (1369 of 1369 strings)

Co-authored-by: Васянатор <ilabulanov339@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/ru/
Translation: InvokeAI/Web UI
2024-09-05 15:34:13 -04:00
smk-e
d35f515413 translationBot(ui): update translation (Spanish)
Currently translated at 33.0% (452 of 1369 strings)

Co-authored-by: smk-e <jit-r8@outlook.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/es/
Translation: InvokeAI/Web UI
2024-09-05 15:34:13 -04:00
Brandon Rising
125b459e56 chore: 4.2.9rc2 version bump 2024-09-04 10:42:16 -04:00
Brandon Rising
33edee1ba6 Delete all flux bundle state dict keys when extracting the transformer state dict 2024-09-04 09:36:23 -04:00
Brandon Rising
d20335dabc convert_bundle_to_flux_transformer_checkpoint now removes processed keys to decrease memory usage 2024-09-04 09:36:23 -04:00
Brandon Rising
d10d258213 Add a comment for why we're converting scale tensors in flux models to bfloat16 2024-09-04 09:36:23 -04:00
Brandon
d57ba1ed8b Update invokeai/backend/model_manager/probe.py
Co-authored-by: Ryan Dick <ryanjdick3@gmail.com>
2024-09-04 09:36:23 -04:00
Brandon Rising
2d0e34e57b Support non-quantized bundles 2024-09-04 09:36:23 -04:00
Brandon Rising
a005d06255 feat: support checkpoint bundles containing more than just the transformer 2024-09-04 09:36:23 -04:00
Eugene Brodsky
a301ef5a5a chore(ci): update github action version pins in container build workflow 2024-09-03 16:01:58 -04:00
Eugene Brodsky
9422df2737 feat(ci): enable a checkbox to push the container image when manually building via workflow dispatch 2024-09-03 16:01:58 -04:00
Lincoln Stein
6dabe4d3ca assign T5 encoder to base type "Any" 2024-09-03 15:55:51 -04:00
Lincoln Stein
00e4652d30 add more reliable fallback method for determining BnbQuantizedLlmInt8b 2024-09-03 15:55:51 -04:00
Lincoln Stein
b6434c5318 correct modelformat probe for t5 encoders 2024-09-03 15:55:51 -04:00
Lincoln Stein
3f7f9f8d61 add probes for T5_encoder and ClipTextModel 2024-09-03 15:55:51 -04:00
Brandon Rising
f3bb592544 Update latents used for preview images in flux 2024-09-03 14:04:16 -04:00
Brandon Rising
69f080fb75 Move flux step callback code into the step_callback util scripts, use other services within the invocation context 2024-09-03 14:04:16 -04:00
Brandon Rising
04272a7cc8 Initial attempt at preview images 2024-09-03 14:04:16 -04:00
Lincoln Stein
8d35af946e [MM] add API routes for getting & setting MM cache sizes (#6523)
* [MM] add API routes for getting & setting MM cache sizes, and retrieving MM stats

* Update invokeai/app/api/routers/model_manager.py

Co-authored-by: Ryan Dick <ryanjdick3@gmail.com>

* code cleanup after @ryand review

* Update invokeai/app/api/routers/model_manager.py

Co-authored-by: Ryan Dick <ryanjdick3@gmail.com>

* fix merge conflicts; tested and working

---------

Co-authored-by: Lincoln Stein <lstein@gmail.com>
Co-authored-by: Ryan Dick <ryanjdick3@gmail.com>
2024-09-02 12:18:21 -04:00
Ryan Dick
24065ec6b6 Add FLUX image-to-image and inpainting (#6798)
## Summary

This PR adds support for Image-to-Image and inpainting workflows with
the FLUX model.

Full changelog:
- Split out `FLUX VAE Encode` and `FLUX VAE Decode` nodes
- Renamed `FLUX Text-to-Image` node to `FLUX Denoise` (since it now
supports image-to-image too). This is a workflow-breaking change.
- Added support for FLUX image-to-image via the `Latents` param on the
FLUX denoising node.
- Added support for FLUX masked inpainting via the `Denoise Mask` param
on the FLUX denoising node.
- Added "Denoise Start" and "Denoise End" params to the "FLUX Denoise"
node.
- Updated the "FLUX Text to Image" default workflow.
- Added a "FLUX Image to Image" default workflow.

### Example

FLUX inpainting workflow
<img width="1282" alt="image"
src="https://github.com/user-attachments/assets/86fc1170-e620-4412-8fd8-e119f875fc2e">

Input image

![image](https://github.com/user-attachments/assets/9c381b86-9f87-4257-bd2e-da22c56ca26c)

Mask

![image](https://github.com/user-attachments/assets/8f774c5c-2a25-45fe-9d4b-b233e3d58d2c)

Output image

![image](https://github.com/user-attachments/assets/8576a630-24ce-4a00-8052-e86bab59c855)


### Callouts for reviewers:
- I renamed FLUXTextToImageInvocation -> FLUXDenoisingInvocation. This
is, of course, a breaking change. It feels like the right move and now
is the right time to do it. Any objection?
- I added new `FLUX VAE Encode` and `FLUX VAE Decode` nodes.
Alternatively, I could have tried to match these names to the
corresponding SD nodes (e.g. `FLUX Image to Latents`, `FLUX Latents to
Image`). Personally, I prefer the current names, but want to hear other
opinions.

### Usage notes:
- With the default dev timestep scheduler, the image structure is
largely determined in the first ~3 steps. A consequence of this is that
the denoise_start parameter provides limited 'granularity' of control.
This will likely be improved in the future as we add more scheduler
options. In the meantime, you will likely want to use small values for
`denoise_start` (e.g. 0.03) to start denoising on step ~1-4 out of ~30.
- Currently, there is no 'noise' parameter on the `FLUX Denoise` node,
so the `denoise_end` parameter has limited utility. This will be added
in the future.

## QA Instructions

Test the following workflows:
- [x] Vanilla FLUX text-to-image behaviour is unchanged
- [x] Image-to-image with FLUX dev, no mask
- [x] Image-to-image with FLUX dev, with mask
- [x] Image-to-image with FLUX schnell, no mask (smoke test, not
expected to work well)

## Merge Plan

No special instructions.

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a
changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
2024-09-02 09:50:31 -04:00
Ryan Dick
627b0bf644 Expose all FLUX model params in the default FLUX models. 2024-09-02 09:38:17 -04:00
Ryan Dick
b43da46b82 Rename 'FLUX VAE Encode'/'FLUX VAE Decode' to 'FLUX Image to Latents'/'FLUX Latents to Image' 2024-09-02 09:38:17 -04:00
Ryan Dick
4255a01c64 Restore line that was accidentally removed during development. 2024-09-02 09:38:17 -04:00
Ryan Dick
23adbd4002 Update schema.ts. 2024-09-02 09:38:17 -04:00
Ryan Dick
fb5a24fcc6 Update default workflows for FLUX. 2024-09-02 09:38:17 -04:00
Ryan Dick
cfdd5a1900 Rename flux_text_to_image.py -> flex_denoise.py 2024-09-02 09:38:17 -04:00
Ryan Dick
2313f326df Add denoise_end param to FluxDenoiseInvocation. 2024-09-02 09:38:17 -04:00
Ryan Dick
2e092a2313 Rename FluxTextToImageInvocation -> FluxDenoiseInvocation. 2024-09-02 09:38:17 -04:00
Ryan Dick
763ef06c18 Use the existence of initial latents to decide whether we are doing image-to-image in the FLUX denoising node. Previously we were using the denoising_start value, but in some cases with an inpaintin mask you may want to run image-to-image from densoising_start=0. 2024-09-02 09:38:17 -04:00
Ryan Dick
8292f6cd42 Code cleanup and documentation around FLUX inpainting. 2024-09-02 09:38:17 -04:00
Ryan Dick
278bba499e Split FLUX VAE decoding out into its own node from LatentsToImageInvocation. 2024-09-02 09:38:17 -04:00
Ryan Dick
dd99ed28e0 Split FLUX VAE encoding out into its own node from ImageToLatentsInvocation. 2024-09-02 09:38:17 -04:00
Ryan Dick
9a8aca69bf Get a rough version of FLUX inpainting working. 2024-09-02 09:38:17 -04:00
Ryan Dick
7ad62512eb Update MaskTensorToImageInvocation to support input mask tensors with or without a channel dimension. 2024-09-02 09:38:17 -04:00
Ryan Dick
bd466661ec Remove unused vae field from FLUXTextToImageInvocation. 2024-09-02 09:38:17 -04:00
Ryan Dick
7ebb509d05 Bump FLUX node versions after splitting out VAE encode/decode. 2024-09-02 09:38:17 -04:00
Ryan Dick
0aa13c046c Split VAE decoding out from the FLUXTextToImageInvocation. 2024-09-02 09:38:17 -04:00
Ryan Dick
a7a33d73f5 Get FLUX non-masked image-to-image working - still rough. 2024-09-02 09:38:17 -04:00
Ryan Dick
ffa39857d3 Add FLUX VAE decoding support to LatentsToImageInvocation. 2024-09-02 09:38:17 -04:00
Ryan Dick
e85c3bc465 Add FLUX VAE support to ImageToLatentsInvocation. 2024-09-02 09:38:17 -04:00
psychedelicious
8185ba7054 scripts: add allocate_vram script
Allocates the specified amount of VRAM, or allocates enough VRAM such that you have the specified amount of VRAM free.

Useful to simulate an environment with a specific amount of VRAM.
2024-09-02 18:18:26 +10:00
Lincoln Stein
d501865bec add a new FAQ for converting safetensors (#6736)
Co-authored-by: Lincoln Stein <lstein@gmail.com>
2024-08-31 18:56:08 +00:00
Brandon Rising
d62310bb5f Support HF repos with subfolders in source on windows OS 2024-08-30 19:31:42 -04:00
Brandon Rising
1835bff196 Fix source string in hugging face installs with subfolders 2024-08-30 19:31:42 -04:00
35 changed files with 1895 additions and 568 deletions

View File

@@ -13,6 +13,12 @@ on:
tags:
- 'v*.*.*'
workflow_dispatch:
inputs:
push-to-registry:
description: Push the built image to the container registry
required: false
type: boolean
default: false
permissions:
contents: write
@@ -50,16 +56,15 @@ jobs:
df -h
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
- name: Docker meta
id: meta
uses: docker/metadata-action@v4
uses: docker/metadata-action@v5
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
images: |
ghcr.io/${{ github.repository }}
${{ env.DOCKERHUB_REPOSITORY }}
tags: |
type=ref,event=branch
type=ref,event=tag
@@ -72,49 +77,33 @@ jobs:
suffix=-${{ matrix.gpu-driver }},onlatest=false
- name: Set up QEMU
uses: docker/setup-qemu-action@v2
uses: docker/setup-qemu-action@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
uses: docker/setup-buildx-action@v3
with:
platforms: ${{ env.PLATFORMS }}
- name: Login to GitHub Container Registry
if: github.event_name != 'pull_request'
uses: docker/login-action@v2
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.repository_owner }}
password: ${{ secrets.GITHUB_TOKEN }}
# - name: Login to Docker Hub
# if: github.event_name != 'pull_request' && vars.DOCKERHUB_REPOSITORY != ''
# uses: docker/login-action@v2
# with:
# username: ${{ secrets.DOCKERHUB_USERNAME }}
# password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Build container
timeout-minutes: 40
id: docker_build
uses: docker/build-push-action@v4
uses: docker/build-push-action@v6
with:
context: .
file: docker/Dockerfile
platforms: ${{ env.PLATFORMS }}
push: ${{ github.ref == 'refs/heads/main' || github.ref_type == 'tag' }}
push: ${{ github.ref == 'refs/heads/main' || github.ref_type == 'tag' || github.event.inputs.push-to-registry }}
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: |
type=gha,scope=${{ github.ref_name }}-${{ matrix.gpu-driver }}
type=gha,scope=main-${{ matrix.gpu-driver }}
cache-to: type=gha,mode=max,scope=${{ github.ref_name }}-${{ matrix.gpu-driver }}
# - name: Docker Hub Description
# if: github.ref == 'refs/heads/main' || github.ref == 'refs/tags/*' && vars.DOCKERHUB_REPOSITORY != ''
# uses: peter-evans/dockerhub-description@v3
# with:
# username: ${{ secrets.DOCKERHUB_USERNAME }}
# password: ${{ secrets.DOCKERHUB_TOKEN }}
# repository: ${{ vars.DOCKERHUB_REPOSITORY }}
# short-description: ${{ github.event.repository.description }}

View File

@@ -196,6 +196,22 @@ tips to reduce the problem:
=== "12GB VRAM GPU"
This should be sufficient to generate larger images up to about 1280x1280.
## Checkpoint Models Load Slowly or Use Too Much RAM
The difference between diffusers models (a folder containing multiple
subfolders) and checkpoint models (a file ending with .safetensors or
.ckpt) is that InvokeAI is able to load diffusers models into memory
incrementally, while checkpoint models must be loaded all at
once. With very large models, or systems with limited RAM, you may
experience slowdowns and other memory-related issues when loading
checkpoint models.
To solve this, go to the Model Manager tab (the cube), select the
checkpoint model that's giving you trouble, and press the "Convert"
button in the upper right of your browser window. This will conver the
checkpoint into a diffusers model, after which loading should be
faster and less memory-intensive.
## Memory Leak (Linux)

View File

@@ -3,8 +3,10 @@
import io
import pathlib
import shutil
import traceback
from copy import deepcopy
from enum import Enum
from tempfile import TemporaryDirectory
from typing import List, Optional, Type
@@ -17,6 +19,7 @@ from starlette.exceptions import HTTPException
from typing_extensions import Annotated
from invokeai.app.api.dependencies import ApiDependencies
from invokeai.app.services.config import get_config
from invokeai.app.services.model_images.model_images_common import ModelImageFileNotFoundException
from invokeai.app.services.model_install.model_install_common import ModelInstallJob
from invokeai.app.services.model_records import (
@@ -31,6 +34,7 @@ from invokeai.backend.model_manager.config import (
ModelFormat,
ModelType,
)
from invokeai.backend.model_manager.load.model_cache.model_cache_base import CacheStats
from invokeai.backend.model_manager.metadata.fetch.huggingface import HuggingFaceMetadataFetch
from invokeai.backend.model_manager.metadata.metadata_base import ModelMetadataWithFiles, UnknownMetadataException
from invokeai.backend.model_manager.search import ModelSearch
@@ -50,6 +54,13 @@ class ModelsList(BaseModel):
model_config = ConfigDict(use_enum_values=True)
class CacheType(str, Enum):
"""Cache type - one of vram or ram."""
RAM = "RAM"
VRAM = "VRAM"
def add_cover_image_to_model_config(config: AnyModelConfig, dependencies: Type[ApiDependencies]) -> AnyModelConfig:
"""Add a cover image URL to a model configuration."""
cover_image = dependencies.invoker.services.model_images.get_url(config.key)
@@ -797,3 +808,83 @@ async def get_starter_models() -> list[StarterModel]:
model.dependencies = missing_deps
return starter_models
@model_manager_router.get(
"/model_cache",
operation_id="get_cache_size",
response_model=float,
summary="Get maximum size of model manager RAM or VRAM cache.",
)
async def get_cache_size(cache_type: CacheType = Query(description="The cache type", default=CacheType.RAM)) -> float:
"""Return the current RAM or VRAM cache size setting (in GB)."""
cache = ApiDependencies.invoker.services.model_manager.load.ram_cache
value = 0.0
if cache_type == CacheType.RAM:
value = cache.max_cache_size
elif cache_type == CacheType.VRAM:
value = cache.max_vram_cache_size
return value
@model_manager_router.put(
"/model_cache",
operation_id="set_cache_size",
response_model=float,
summary="Set maximum size of model manager RAM or VRAM cache, optionally writing new value out to invokeai.yaml config file.",
)
async def set_cache_size(
value: float = Query(description="The new value for the maximum cache size"),
cache_type: CacheType = Query(description="The cache type", default=CacheType.RAM),
persist: bool = Query(description="Write new value out to invokeai.yaml", default=False),
) -> float:
"""Set the current RAM or VRAM cache size setting (in GB). ."""
cache = ApiDependencies.invoker.services.model_manager.load.ram_cache
app_config = get_config()
# Record initial state.
vram_old = app_config.vram
ram_old = app_config.ram
# Prepare target state.
vram_new = vram_old
ram_new = ram_old
if cache_type == CacheType.RAM:
ram_new = value
elif cache_type == CacheType.VRAM:
vram_new = value
else:
raise ValueError(f"Unexpected {cache_type=}.")
config_path = app_config.config_file_path
new_config_path = config_path.with_suffix(".yaml.new")
try:
# Try to apply the target state.
cache.max_vram_cache_size = vram_new
cache.max_cache_size = ram_new
app_config.ram = ram_new
app_config.vram = vram_new
if persist:
app_config.write_file(new_config_path)
shutil.move(new_config_path, config_path)
except Exception as e:
# If there was a failure, restore the initial state.
cache.max_cache_size = ram_old
cache.max_vram_cache_size = vram_old
app_config.ram = ram_old
app_config.vram = vram_old
raise RuntimeError("Failed to update cache size") from e
return value
@model_manager_router.get(
"/stats",
operation_id="get_stats",
response_model=Optional[CacheStats],
summary="Get model manager RAM cache performance statistics.",
)
async def get_stats() -> Optional[CacheStats]:
"""Return performance statistics on the model manager's RAM cache. Will return null if no models have been loaded."""
return ApiDependencies.invoker.services.model_manager.load.ram_cache.stats

View File

@@ -185,7 +185,7 @@ class DenoiseLatentsInvocation(BaseInvocation):
)
denoise_mask: Optional[DenoiseMaskField] = InputField(
default=None,
description=FieldDescriptions.mask,
description=FieldDescriptions.denoise_mask,
input=Input.Connection,
ui_order=8,
)

View File

@@ -181,7 +181,7 @@ class FieldDescriptions:
)
num_1 = "The first number"
num_2 = "The second number"
mask = "The mask to use for the operation"
denoise_mask = "A mask of the region to apply the denoising process to."
board = "The board to save the image to"
image = "The image to process"
tile_size = "Tile size"

View File

@@ -0,0 +1,249 @@
from typing import Callable, Optional
import torch
import torchvision.transforms as tv_transforms
from torchvision.transforms.functional import resize as tv_resize
from invokeai.app.invocations.baseinvocation import BaseInvocation, Classification, invocation
from invokeai.app.invocations.fields import (
DenoiseMaskField,
FieldDescriptions,
FluxConditioningField,
Input,
InputField,
LatentsField,
WithBoard,
WithMetadata,
)
from invokeai.app.invocations.model import TransformerField
from invokeai.app.invocations.primitives import LatentsOutput
from invokeai.app.services.shared.invocation_context import InvocationContext
from invokeai.backend.flux.denoise import denoise
from invokeai.backend.flux.inpaint_extension import InpaintExtension
from invokeai.backend.flux.model import Flux
from invokeai.backend.flux.sampling_utils import (
clip_timestep_schedule,
generate_img_ids,
get_noise,
get_schedule,
pack,
unpack,
)
from invokeai.backend.stable_diffusion.diffusers_pipeline import PipelineIntermediateState
from invokeai.backend.stable_diffusion.diffusion.conditioning_data import FLUXConditioningInfo
from invokeai.backend.util.devices import TorchDevice
@invocation(
"flux_denoise",
title="FLUX Denoise",
tags=["image", "flux"],
category="image",
version="1.0.0",
classification=Classification.Prototype,
)
class FluxDenoiseInvocation(BaseInvocation, WithMetadata, WithBoard):
"""Run denoising process with a FLUX transformer model."""
# If latents is provided, this means we are doing image-to-image.
latents: Optional[LatentsField] = InputField(
default=None,
description=FieldDescriptions.latents,
input=Input.Connection,
)
# denoise_mask is used for image-to-image inpainting. Only the masked region is modified.
denoise_mask: Optional[DenoiseMaskField] = InputField(
default=None,
description=FieldDescriptions.denoise_mask,
input=Input.Connection,
)
denoising_start: float = InputField(
default=0.0,
ge=0,
le=1,
description=FieldDescriptions.denoising_start,
)
denoising_end: float = InputField(default=1.0, ge=0, le=1, description=FieldDescriptions.denoising_end)
transformer: TransformerField = InputField(
description=FieldDescriptions.flux_model,
input=Input.Connection,
title="Transformer",
)
positive_text_conditioning: FluxConditioningField = InputField(
description=FieldDescriptions.positive_cond, input=Input.Connection
)
width: int = InputField(default=1024, multiple_of=16, description="Width of the generated image.")
height: int = InputField(default=1024, multiple_of=16, description="Height of the generated image.")
num_steps: int = InputField(
default=4, description="Number of diffusion steps. Recommended values are schnell: 4, dev: 50."
)
guidance: float = InputField(
default=4.0,
description="The guidance strength. Higher values adhere more strictly to the prompt, and will produce less diverse images. FLUX dev only, ignored for schnell.",
)
seed: int = InputField(default=0, description="Randomness seed for reproducibility.")
@torch.no_grad()
def invoke(self, context: InvocationContext) -> LatentsOutput:
latents = self._run_diffusion(context)
latents = latents.detach().to("cpu")
name = context.tensors.save(tensor=latents)
return LatentsOutput.build(latents_name=name, latents=latents, seed=None)
def _run_diffusion(
self,
context: InvocationContext,
):
inference_dtype = torch.bfloat16
# Load the conditioning data.
cond_data = context.conditioning.load(self.positive_text_conditioning.conditioning_name)
assert len(cond_data.conditionings) == 1
flux_conditioning = cond_data.conditionings[0]
assert isinstance(flux_conditioning, FLUXConditioningInfo)
flux_conditioning = flux_conditioning.to(dtype=inference_dtype)
t5_embeddings = flux_conditioning.t5_embeds
clip_embeddings = flux_conditioning.clip_embeds
# Load the input latents, if provided.
init_latents = context.tensors.load(self.latents.latents_name) if self.latents else None
if init_latents is not None:
init_latents = init_latents.to(device=TorchDevice.choose_torch_device(), dtype=inference_dtype)
# Prepare input noise.
noise = get_noise(
num_samples=1,
height=self.height,
width=self.width,
device=TorchDevice.choose_torch_device(),
dtype=inference_dtype,
seed=self.seed,
)
transformer_info = context.models.load(self.transformer.transformer)
is_schnell = "schnell" in transformer_info.config.config_path
# Calculate the timestep schedule.
image_seq_len = noise.shape[-1] * noise.shape[-2] // 4
timesteps = get_schedule(
num_steps=self.num_steps,
image_seq_len=image_seq_len,
shift=not is_schnell,
)
# Clip the timesteps schedule based on denoising_start and denoising_end.
timesteps = clip_timestep_schedule(timesteps, self.denoising_start, self.denoising_end)
# Prepare input latent image.
if init_latents is not None:
# If init_latents is provided, we are doing image-to-image.
if is_schnell:
context.logger.warning(
"Running image-to-image with a FLUX schnell model. This is not recommended. The results are likely "
"to be poor. Consider using a FLUX dev model instead."
)
# Noise the orig_latents by the appropriate amount for the first timestep.
t_0 = timesteps[0]
x = t_0 * noise + (1.0 - t_0) * init_latents
else:
# init_latents are not provided, so we are not doing image-to-image (i.e. we are starting from pure noise).
if self.denoising_start > 1e-5:
raise ValueError("denoising_start should be 0 when initial latents are not provided.")
x = noise
# If len(timesteps) == 1, then short-circuit. We are just noising the input latents, but not taking any
# denoising steps.
if len(timesteps) <= 1:
return x
inpaint_mask = self._prep_inpaint_mask(context, x)
b, _c, h, w = x.shape
img_ids = generate_img_ids(h=h, w=w, batch_size=b, device=x.device, dtype=x.dtype)
bs, t5_seq_len, _ = t5_embeddings.shape
txt_ids = torch.zeros(bs, t5_seq_len, 3, dtype=inference_dtype, device=TorchDevice.choose_torch_device())
# Pack all latent tensors.
init_latents = pack(init_latents) if init_latents is not None else None
inpaint_mask = pack(inpaint_mask) if inpaint_mask is not None else None
noise = pack(noise)
x = pack(x)
# Now that we have 'packed' the latent tensors, verify that we calculated the image_seq_len correctly.
assert image_seq_len == x.shape[1]
# Prepare inpaint extension.
inpaint_extension: InpaintExtension | None = None
if inpaint_mask is not None:
assert init_latents is not None
inpaint_extension = InpaintExtension(
init_latents=init_latents,
inpaint_mask=inpaint_mask,
noise=noise,
)
with transformer_info as transformer:
assert isinstance(transformer, Flux)
x = denoise(
model=transformer,
img=x,
img_ids=img_ids,
txt=t5_embeddings,
txt_ids=txt_ids,
vec=clip_embeddings,
timesteps=timesteps,
step_callback=self._build_step_callback(context),
guidance=self.guidance,
inpaint_extension=inpaint_extension,
)
x = unpack(x.float(), self.height, self.width)
return x
def _prep_inpaint_mask(self, context: InvocationContext, latents: torch.Tensor) -> torch.Tensor | None:
"""Prepare the inpaint mask.
- Loads the mask
- Resizes if necessary
- Casts to same device/dtype as latents
- Expands mask to the same shape as latents so that they line up after 'packing'
Args:
context (InvocationContext): The invocation context, for loading the inpaint mask.
latents (torch.Tensor): A latent image tensor. In 'unpacked' format. Used to determine the target shape,
device, and dtype for the inpaint mask.
Returns:
torch.Tensor | None: Inpaint mask.
"""
if self.denoise_mask is None:
return None
mask = context.tensors.load(self.denoise_mask.mask_name)
_, _, latent_height, latent_width = latents.shape
mask = tv_resize(
img=mask,
size=[latent_height, latent_width],
interpolation=tv_transforms.InterpolationMode.BILINEAR,
antialias=False,
)
mask = mask.to(device=latents.device, dtype=latents.dtype)
# Expand the inpaint mask to the same shape as `latents` so that when we 'pack' `mask` it lines up with
# `latents`.
return mask.expand_as(latents)
def _build_step_callback(self, context: InvocationContext) -> Callable[[PipelineIntermediateState], None]:
def step_callback(state: PipelineIntermediateState) -> None:
state.latents = unpack(state.latents.float(), self.height, self.width).squeeze()
context.util.flux_step_callback(state)
return step_callback

View File

@@ -1,169 +0,0 @@
import torch
from einops import rearrange
from PIL import Image
from invokeai.app.invocations.baseinvocation import BaseInvocation, Classification, invocation
from invokeai.app.invocations.fields import (
FieldDescriptions,
FluxConditioningField,
Input,
InputField,
WithBoard,
WithMetadata,
)
from invokeai.app.invocations.model import TransformerField, VAEField
from invokeai.app.invocations.primitives import ImageOutput
from invokeai.app.services.session_processor.session_processor_common import CanceledException
from invokeai.app.services.shared.invocation_context import InvocationContext
from invokeai.backend.flux.model import Flux
from invokeai.backend.flux.modules.autoencoder import AutoEncoder
from invokeai.backend.flux.sampling import denoise, get_noise, get_schedule, prepare_latent_img_patches, unpack
from invokeai.backend.stable_diffusion.diffusion.conditioning_data import FLUXConditioningInfo
from invokeai.backend.util.devices import TorchDevice
@invocation(
"flux_text_to_image",
title="FLUX Text to Image",
tags=["image", "flux"],
category="image",
version="1.0.0",
classification=Classification.Prototype,
)
class FluxTextToImageInvocation(BaseInvocation, WithMetadata, WithBoard):
"""Text-to-image generation using a FLUX model."""
transformer: TransformerField = InputField(
description=FieldDescriptions.flux_model,
input=Input.Connection,
title="Transformer",
)
vae: VAEField = InputField(
description=FieldDescriptions.vae,
input=Input.Connection,
)
positive_text_conditioning: FluxConditioningField = InputField(
description=FieldDescriptions.positive_cond, input=Input.Connection
)
width: int = InputField(default=1024, multiple_of=16, description="Width of the generated image.")
height: int = InputField(default=1024, multiple_of=16, description="Height of the generated image.")
num_steps: int = InputField(
default=4, description="Number of diffusion steps. Recommend values are schnell: 4, dev: 50."
)
guidance: float = InputField(
default=4.0,
description="The guidance strength. Higher values adhere more strictly to the prompt, and will produce less diverse images. FLUX dev only, ignored for schnell.",
)
seed: int = InputField(default=0, description="Randomness seed for reproducibility.")
@torch.no_grad()
def invoke(self, context: InvocationContext) -> ImageOutput:
latents = self._run_diffusion(context)
image = self._run_vae_decoding(context, latents)
image_dto = context.images.save(image=image)
return ImageOutput.build(image_dto)
def _run_diffusion(
self,
context: InvocationContext,
):
inference_dtype = torch.bfloat16
# Load the conditioning data.
cond_data = context.conditioning.load(self.positive_text_conditioning.conditioning_name)
assert len(cond_data.conditionings) == 1
flux_conditioning = cond_data.conditionings[0]
assert isinstance(flux_conditioning, FLUXConditioningInfo)
flux_conditioning = flux_conditioning.to(dtype=inference_dtype)
t5_embeddings = flux_conditioning.t5_embeds
clip_embeddings = flux_conditioning.clip_embeds
transformer_info = context.models.load(self.transformer.transformer)
# Prepare input noise.
x = get_noise(
num_samples=1,
height=self.height,
width=self.width,
device=TorchDevice.choose_torch_device(),
dtype=inference_dtype,
seed=self.seed,
)
x, img_ids = prepare_latent_img_patches(x)
is_schnell = "schnell" in transformer_info.config.config_path
timesteps = get_schedule(
num_steps=self.num_steps,
image_seq_len=x.shape[1],
shift=not is_schnell,
)
bs, t5_seq_len, _ = t5_embeddings.shape
txt_ids = torch.zeros(bs, t5_seq_len, 3, dtype=inference_dtype, device=TorchDevice.choose_torch_device())
with transformer_info as transformer:
assert isinstance(transformer, Flux)
def step_callback() -> None:
if context.util.is_canceled():
raise CanceledException
# TODO: Make this look like the image before re-enabling
# latent_image = unpack(img.float(), self.height, self.width)
# latent_image = latent_image.squeeze() # Remove unnecessary dimensions
# flattened_tensor = latent_image.reshape(-1) # Flatten to shape [48*128*128]
# # Create a new tensor of the required shape [255, 255, 3]
# latent_image = flattened_tensor[: 255 * 255 * 3].reshape(255, 255, 3) # Reshape to RGB format
# # Convert to a NumPy array and then to a PIL Image
# image = Image.fromarray(latent_image.cpu().numpy().astype(np.uint8))
# (width, height) = image.size
# width *= 8
# height *= 8
# dataURL = image_to_dataURL(image, image_format="JPEG")
# # TODO: move this whole function to invocation context to properly reference these variables
# context._services.events.emit_invocation_denoise_progress(
# context._data.queue_item,
# context._data.invocation,
# state,
# ProgressImage(dataURL=dataURL, width=width, height=height),
# )
x = denoise(
model=transformer,
img=x,
img_ids=img_ids,
txt=t5_embeddings,
txt_ids=txt_ids,
vec=clip_embeddings,
timesteps=timesteps,
step_callback=step_callback,
guidance=self.guidance,
)
x = unpack(x.float(), self.height, self.width)
return x
def _run_vae_decoding(
self,
context: InvocationContext,
latents: torch.Tensor,
) -> Image.Image:
vae_info = context.models.load(self.vae.vae)
with vae_info as vae:
assert isinstance(vae, AutoEncoder)
latents = latents.to(dtype=TorchDevice.choose_torch_dtype())
img = vae.decode(latents)
img = img.clamp(-1, 1)
img = rearrange(img[0], "c h w -> h w c")
img_pil = Image.fromarray((127.5 * (img + 1.0)).byte().cpu().numpy())
return img_pil

View File

@@ -0,0 +1,60 @@
import torch
from einops import rearrange
from PIL import Image
from invokeai.app.invocations.baseinvocation import BaseInvocation, invocation
from invokeai.app.invocations.fields import (
FieldDescriptions,
Input,
InputField,
LatentsField,
WithBoard,
WithMetadata,
)
from invokeai.app.invocations.model import VAEField
from invokeai.app.invocations.primitives import ImageOutput
from invokeai.app.services.shared.invocation_context import InvocationContext
from invokeai.backend.flux.modules.autoencoder import AutoEncoder
from invokeai.backend.model_manager.load.load_base import LoadedModel
from invokeai.backend.util.devices import TorchDevice
@invocation(
"flux_vae_decode",
title="FLUX Latents to Image",
tags=["latents", "image", "vae", "l2i", "flux"],
category="latents",
version="1.0.0",
)
class FluxVaeDecodeInvocation(BaseInvocation, WithMetadata, WithBoard):
"""Generates an image from latents."""
latents: LatentsField = InputField(
description=FieldDescriptions.latents,
input=Input.Connection,
)
vae: VAEField = InputField(
description=FieldDescriptions.vae,
input=Input.Connection,
)
def _vae_decode(self, vae_info: LoadedModel, latents: torch.Tensor) -> Image.Image:
with vae_info as vae:
assert isinstance(vae, AutoEncoder)
latents = latents.to(device=TorchDevice.choose_torch_device(), dtype=TorchDevice.choose_torch_dtype())
img = vae.decode(latents)
img = img.clamp(-1, 1)
img = rearrange(img[0], "c h w -> h w c") # noqa: F821
img_pil = Image.fromarray((127.5 * (img + 1.0)).byte().cpu().numpy())
return img_pil
@torch.no_grad()
def invoke(self, context: InvocationContext) -> ImageOutput:
latents = context.tensors.load(self.latents.latents_name)
vae_info = context.models.load(self.vae.vae)
image = self._vae_decode(vae_info=vae_info, latents=latents)
TorchDevice.empty_cache()
image_dto = context.images.save(image=image)
return ImageOutput.build(image_dto)

View File

@@ -0,0 +1,67 @@
import einops
import torch
from invokeai.app.invocations.baseinvocation import BaseInvocation, invocation
from invokeai.app.invocations.fields import (
FieldDescriptions,
ImageField,
Input,
InputField,
)
from invokeai.app.invocations.model import VAEField
from invokeai.app.invocations.primitives import LatentsOutput
from invokeai.app.services.shared.invocation_context import InvocationContext
from invokeai.backend.flux.modules.autoencoder import AutoEncoder
from invokeai.backend.model_manager import LoadedModel
from invokeai.backend.stable_diffusion.diffusers_pipeline import image_resized_to_grid_as_tensor
from invokeai.backend.util.devices import TorchDevice
@invocation(
"flux_vae_encode",
title="FLUX Image to Latents",
tags=["latents", "image", "vae", "i2l", "flux"],
category="latents",
version="1.0.0",
)
class FluxVaeEncodeInvocation(BaseInvocation):
"""Encodes an image into latents."""
image: ImageField = InputField(
description="The image to encode.",
)
vae: VAEField = InputField(
description=FieldDescriptions.vae,
input=Input.Connection,
)
@staticmethod
def vae_encode(vae_info: LoadedModel, image_tensor: torch.Tensor) -> torch.Tensor:
# TODO(ryand): Expose seed parameter at the invocation level.
# TODO(ryand): Write a util function for generating random tensors that is consistent across devices / dtypes.
# There's a starting point in get_noise(...), but it needs to be extracted and generalized. This function
# should be used for VAE encode sampling.
generator = torch.Generator(device=TorchDevice.choose_torch_device()).manual_seed(0)
with vae_info as vae:
assert isinstance(vae, AutoEncoder)
image_tensor = image_tensor.to(
device=TorchDevice.choose_torch_device(), dtype=TorchDevice.choose_torch_dtype()
)
latents = vae.encode(image_tensor, sample=True, generator=generator)
return latents
@torch.no_grad()
def invoke(self, context: InvocationContext) -> LatentsOutput:
image = context.images.get_pil(self.image.image_name)
vae_info = context.models.load(self.vae.vae)
image_tensor = image_resized_to_grid_as_tensor(image.convert("RGB"))
if image_tensor.dim() == 3:
image_tensor = einops.rearrange(image_tensor, "c h w -> 1 c h w")
latents = self.vae_encode(vae_info=vae_info, image_tensor=image_tensor)
latents = latents.to("cpu")
name = context.tensors.save(tensor=latents)
return LatentsOutput.build(latents_name=name, latents=latents, seed=None)

View File

@@ -126,7 +126,7 @@ class ImageMaskToTensorInvocation(BaseInvocation, WithMetadata):
title="Tensor Mask to Image",
tags=["mask"],
category="mask",
version="1.0.0",
version="1.1.0",
)
class MaskTensorToImageInvocation(BaseInvocation, WithMetadata, WithBoard):
"""Convert a mask tensor to an image."""
@@ -135,6 +135,11 @@ class MaskTensorToImageInvocation(BaseInvocation, WithMetadata, WithBoard):
def invoke(self, context: InvocationContext) -> ImageOutput:
mask = context.tensors.load(self.mask.tensor_name)
# Squeeze the channel dimension if it exists.
if mask.dim() == 3:
mask = mask.squeeze(0)
# Ensure that the mask is binary.
if mask.dtype != torch.bool:
mask = mask > 0.5

View File

@@ -103,7 +103,7 @@ class HFModelSource(StringLikeSource):
if self.variant:
base += f":{self.variant or ''}"
if self.subfolder:
base += f":{self.subfolder}"
base += f"::{self.subfolder.as_posix()}"
return base

View File

@@ -14,7 +14,7 @@ from invokeai.app.services.image_records.image_records_common import ImageCatego
from invokeai.app.services.images.images_common import ImageDTO
from invokeai.app.services.invocation_services import InvocationServices
from invokeai.app.services.model_records.model_records_base import UnknownModelException
from invokeai.app.util.step_callback import stable_diffusion_step_callback
from invokeai.app.util.step_callback import flux_step_callback, stable_diffusion_step_callback
from invokeai.backend.model_manager.config import (
AnyModel,
AnyModelConfig,
@@ -557,6 +557,24 @@ class UtilInterface(InvocationContextInterface):
is_canceled=self.is_canceled,
)
def flux_step_callback(self, intermediate_state: PipelineIntermediateState) -> None:
"""
The step callback emits a progress event with the current step, the total number of
steps, a preview image, and some other internal metadata.
This should be called after each denoising step.
Args:
intermediate_state: The intermediate state of the diffusion pipeline.
"""
flux_step_callback(
context_data=self._data,
intermediate_state=intermediate_state,
events=self._services.events,
is_canceled=self.is_canceled,
)
class InvocationContext:
"""Provides access to various services and data for the current invocation.

View File

@@ -0,0 +1,407 @@
{
"name": "FLUX Image to Image",
"author": "InvokeAI",
"description": "A simple image-to-image workflow using a FLUX dev model. ",
"version": "1.0.4",
"contact": "",
"tags": "image2image, flux, image-to-image",
"notes": "Prerequisite model downloads: T5 Encoder, CLIP-L Encoder, and FLUX VAE. Quantized and un-quantized versions can be found in the starter models tab within your Model Manager. We recommend using FLUX dev models for image-to-image workflows. The image-to-image performance with FLUX schnell models is poor.",
"exposedFields": [
{
"nodeId": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"fieldName": "model"
},
{
"nodeId": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"fieldName": "t5_encoder_model"
},
{
"nodeId": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"fieldName": "clip_embed_model"
},
{
"nodeId": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"fieldName": "vae_model"
},
{
"nodeId": "ace0258f-67d7-4eee-a218-6fff27065214",
"fieldName": "denoising_start"
},
{
"nodeId": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
"fieldName": "prompt"
},
{
"nodeId": "ace0258f-67d7-4eee-a218-6fff27065214",
"fieldName": "num_steps"
}
],
"meta": {
"version": "3.0.0",
"category": "default"
},
"nodes": [
{
"id": "2981a67c-480f-4237-9384-26b68dbf912b",
"type": "invocation",
"data": {
"id": "2981a67c-480f-4237-9384-26b68dbf912b",
"type": "flux_vae_encode",
"version": "1.0.0",
"label": "",
"notes": "",
"isOpen": true,
"isIntermediate": true,
"useCache": true,
"inputs": {
"image": {
"name": "image",
"label": "",
"value": {
"image_name": "8a5c62aa-9335-45d2-9c71-89af9fc1f8d4.png"
}
},
"vae": {
"name": "vae",
"label": ""
}
}
},
"position": {
"x": 732.7680166609682,
"y": -24.37398171806909
}
},
{
"id": "ace0258f-67d7-4eee-a218-6fff27065214",
"type": "invocation",
"data": {
"id": "ace0258f-67d7-4eee-a218-6fff27065214",
"type": "flux_denoise",
"version": "1.0.0",
"label": "",
"notes": "",
"isOpen": true,
"isIntermediate": true,
"useCache": true,
"inputs": {
"board": {
"name": "board",
"label": ""
},
"metadata": {
"name": "metadata",
"label": ""
},
"latents": {
"name": "latents",
"label": ""
},
"denoise_mask": {
"name": "denoise_mask",
"label": ""
},
"denoising_start": {
"name": "denoising_start",
"label": "",
"value": 0.04
},
"denoising_end": {
"name": "denoising_end",
"label": "",
"value": 1
},
"transformer": {
"name": "transformer",
"label": ""
},
"positive_text_conditioning": {
"name": "positive_text_conditioning",
"label": ""
},
"width": {
"name": "width",
"label": "",
"value": 1024
},
"height": {
"name": "height",
"label": "",
"value": 1024
},
"num_steps": {
"name": "num_steps",
"label": "Steps (Recommend 30 for Dev, 4 for Schnell)",
"value": 30
},
"guidance": {
"name": "guidance",
"label": "",
"value": 4
},
"seed": {
"name": "seed",
"label": "",
"value": 0
}
}
},
"position": {
"x": 1182.8836633018684,
"y": -251.38882958913183
}
},
{
"id": "7e5172eb-48c1-44db-a770-8fd83e1435d1",
"type": "invocation",
"data": {
"id": "7e5172eb-48c1-44db-a770-8fd83e1435d1",
"type": "flux_vae_decode",
"version": "1.0.0",
"label": "",
"notes": "",
"isOpen": true,
"isIntermediate": false,
"useCache": true,
"inputs": {
"board": {
"name": "board",
"label": ""
},
"metadata": {
"name": "metadata",
"label": ""
},
"latents": {
"name": "latents",
"label": ""
},
"vae": {
"name": "vae",
"label": ""
}
}
},
"position": {
"x": 1575.5797431839133,
"y": -209.00150975507415
}
},
{
"id": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"type": "invocation",
"data": {
"id": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"type": "flux_model_loader",
"version": "1.0.4",
"label": "",
"notes": "",
"isOpen": true,
"isIntermediate": true,
"useCache": false,
"inputs": {
"model": {
"name": "model",
"label": "Model (dev variant recommended for Image-to-Image)"
},
"t5_encoder_model": {
"name": "t5_encoder_model",
"label": ""
},
"clip_embed_model": {
"name": "clip_embed_model",
"label": "",
"value": {
"key": "fa23a584-b623-415d-832a-21b5098ff1a1",
"hash": "blake3:17c19f0ef941c3b7609a9c94a659ca5364de0be364a91d4179f0e39ba17c3b70",
"name": "clip-vit-large-patch14",
"base": "any",
"type": "clip_embed"
}
},
"vae_model": {
"name": "vae_model",
"label": "",
"value": {
"key": "74fc82ba-c0a8-479d-a890-2126f82da758",
"hash": "blake3:ce21cb76364aa6e2421311cf4a4b5eb052a76c4f1cd207b50703d8978198a068",
"name": "FLUX.1-schnell_ae",
"base": "flux",
"type": "vae"
}
}
}
},
"position": {
"x": 328.1809894659957,
"y": -90.2241133566946
}
},
{
"id": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
"type": "invocation",
"data": {
"id": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
"type": "flux_text_encoder",
"version": "1.0.0",
"label": "",
"notes": "",
"isOpen": true,
"isIntermediate": true,
"useCache": true,
"inputs": {
"clip": {
"name": "clip",
"label": ""
},
"t5_encoder": {
"name": "t5_encoder",
"label": ""
},
"t5_max_seq_len": {
"name": "t5_max_seq_len",
"label": "T5 Max Seq Len",
"value": 256
},
"prompt": {
"name": "prompt",
"label": "",
"value": "a cat wearing a birthday hat"
}
}
},
"position": {
"x": 745.8823365057267,
"y": -299.60249175851914
}
},
{
"id": "4754c534-a5f3-4ad0-9382-7887985e668c",
"type": "invocation",
"data": {
"id": "4754c534-a5f3-4ad0-9382-7887985e668c",
"type": "rand_int",
"version": "1.0.1",
"label": "",
"notes": "",
"isOpen": true,
"isIntermediate": true,
"useCache": false,
"inputs": {
"low": {
"name": "low",
"label": "",
"value": 0
},
"high": {
"name": "high",
"label": "",
"value": 2147483647
}
}
},
"position": {
"x": 725.834098928012,
"y": 496.2710031089931
}
}
],
"edges": [
{
"id": "reactflow__edge-2981a67c-480f-4237-9384-26b68dbf912bheight-ace0258f-67d7-4eee-a218-6fff27065214height",
"type": "default",
"source": "2981a67c-480f-4237-9384-26b68dbf912b",
"target": "ace0258f-67d7-4eee-a218-6fff27065214",
"sourceHandle": "height",
"targetHandle": "height"
},
{
"id": "reactflow__edge-2981a67c-480f-4237-9384-26b68dbf912bwidth-ace0258f-67d7-4eee-a218-6fff27065214width",
"type": "default",
"source": "2981a67c-480f-4237-9384-26b68dbf912b",
"target": "ace0258f-67d7-4eee-a218-6fff27065214",
"sourceHandle": "width",
"targetHandle": "width"
},
{
"id": "reactflow__edge-2981a67c-480f-4237-9384-26b68dbf912blatents-ace0258f-67d7-4eee-a218-6fff27065214latents",
"type": "default",
"source": "2981a67c-480f-4237-9384-26b68dbf912b",
"target": "ace0258f-67d7-4eee-a218-6fff27065214",
"sourceHandle": "latents",
"targetHandle": "latents"
},
{
"id": "reactflow__edge-f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90vae-2981a67c-480f-4237-9384-26b68dbf912bvae",
"type": "default",
"source": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"target": "2981a67c-480f-4237-9384-26b68dbf912b",
"sourceHandle": "vae",
"targetHandle": "vae"
},
{
"id": "reactflow__edge-ace0258f-67d7-4eee-a218-6fff27065214latents-7e5172eb-48c1-44db-a770-8fd83e1435d1latents",
"type": "default",
"source": "ace0258f-67d7-4eee-a218-6fff27065214",
"target": "7e5172eb-48c1-44db-a770-8fd83e1435d1",
"sourceHandle": "latents",
"targetHandle": "latents"
},
{
"id": "reactflow__edge-4754c534-a5f3-4ad0-9382-7887985e668cvalue-ace0258f-67d7-4eee-a218-6fff27065214seed",
"type": "default",
"source": "4754c534-a5f3-4ad0-9382-7887985e668c",
"target": "ace0258f-67d7-4eee-a218-6fff27065214",
"sourceHandle": "value",
"targetHandle": "seed"
},
{
"id": "reactflow__edge-f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90transformer-ace0258f-67d7-4eee-a218-6fff27065214transformer",
"type": "default",
"source": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"target": "ace0258f-67d7-4eee-a218-6fff27065214",
"sourceHandle": "transformer",
"targetHandle": "transformer"
},
{
"id": "reactflow__edge-01f674f8-b3d1-4df1-acac-6cb8e0bfb63cconditioning-ace0258f-67d7-4eee-a218-6fff27065214positive_text_conditioning",
"type": "default",
"source": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
"target": "ace0258f-67d7-4eee-a218-6fff27065214",
"sourceHandle": "conditioning",
"targetHandle": "positive_text_conditioning"
},
{
"id": "reactflow__edge-f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90vae-7e5172eb-48c1-44db-a770-8fd83e1435d1vae",
"type": "default",
"source": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"target": "7e5172eb-48c1-44db-a770-8fd83e1435d1",
"sourceHandle": "vae",
"targetHandle": "vae"
},
{
"id": "reactflow__edge-f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90max_seq_len-01f674f8-b3d1-4df1-acac-6cb8e0bfb63ct5_max_seq_len",
"type": "default",
"source": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"target": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
"sourceHandle": "max_seq_len",
"targetHandle": "t5_max_seq_len"
},
{
"id": "reactflow__edge-f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90t5_encoder-01f674f8-b3d1-4df1-acac-6cb8e0bfb63ct5_encoder",
"type": "default",
"source": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"target": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
"sourceHandle": "t5_encoder",
"targetHandle": "t5_encoder"
},
{
"id": "reactflow__edge-f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90clip-01f674f8-b3d1-4df1-acac-6cb8e0bfb63cclip",
"type": "default",
"source": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"target": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
"sourceHandle": "clip",
"targetHandle": "clip"
}
]
}

View File

@@ -1,7 +1,7 @@
{
"name": "FLUX Text to Image",
"author": "InvokeAI",
"description": "A simple text-to-image workflow using FLUX dev or schnell models. Prerequisite model downloads: T5 Encoder, CLIP-L Encoder, and FLUX VAE. Quantized and un-quantized versions can be found in the starter models tab within your Model Manager. We recommend 4 steps for FLUX schnell models and 30 steps for FLUX dev models.",
"description": "A simple text-to-image workflow using FLUX dev or schnell models.",
"version": "1.0.4",
"contact": "",
"tags": "text2image, flux",
@@ -11,17 +11,25 @@
"nodeId": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"fieldName": "model"
},
{
"nodeId": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"fieldName": "t5_encoder_model"
},
{
"nodeId": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"fieldName": "clip_embed_model"
},
{
"nodeId": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"fieldName": "vae_model"
},
{
"nodeId": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
"fieldName": "prompt"
},
{
"nodeId": "159bdf1b-79e7-4174-b86e-d40e646964c8",
"nodeId": "4fe24f07-f906-4f55-ab2c-9beee56ef5bd",
"fieldName": "num_steps"
},
{
"nodeId": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"fieldName": "t5_encoder_model"
}
],
"meta": {
@@ -29,6 +37,121 @@
"category": "default"
},
"nodes": [
{
"id": "4fe24f07-f906-4f55-ab2c-9beee56ef5bd",
"type": "invocation",
"data": {
"id": "4fe24f07-f906-4f55-ab2c-9beee56ef5bd",
"type": "flux_denoise",
"version": "1.0.0",
"label": "",
"notes": "",
"isOpen": true,
"isIntermediate": true,
"useCache": true,
"inputs": {
"board": {
"name": "board",
"label": ""
},
"metadata": {
"name": "metadata",
"label": ""
},
"latents": {
"name": "latents",
"label": ""
},
"denoise_mask": {
"name": "denoise_mask",
"label": ""
},
"denoising_start": {
"name": "denoising_start",
"label": "",
"value": 0
},
"denoising_end": {
"name": "denoising_end",
"label": "",
"value": 1
},
"transformer": {
"name": "transformer",
"label": ""
},
"positive_text_conditioning": {
"name": "positive_text_conditioning",
"label": ""
},
"width": {
"name": "width",
"label": "",
"value": 1024
},
"height": {
"name": "height",
"label": "",
"value": 1024
},
"num_steps": {
"name": "num_steps",
"label": "Steps (Recommend 30 for Dev, 4 for Schnell)",
"value": 30
},
"guidance": {
"name": "guidance",
"label": "",
"value": 4
},
"seed": {
"name": "seed",
"label": "",
"value": 0
}
}
},
"position": {
"x": 1186.1868226120378,
"y": -214.9459927686657
}
},
{
"id": "7e5172eb-48c1-44db-a770-8fd83e1435d1",
"type": "invocation",
"data": {
"id": "7e5172eb-48c1-44db-a770-8fd83e1435d1",
"type": "flux_vae_decode",
"version": "1.0.0",
"label": "",
"notes": "",
"isOpen": true,
"isIntermediate": false,
"useCache": true,
"inputs": {
"board": {
"name": "board",
"label": ""
},
"metadata": {
"name": "metadata",
"label": ""
},
"latents": {
"name": "latents",
"label": ""
},
"vae": {
"name": "vae",
"label": ""
}
}
},
"position": {
"x": 1575.5797431839133,
"y": -209.00150975507415
}
},
{
"id": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"type": "invocation",
@@ -99,8 +222,8 @@
}
},
"position": {
"x": 824.1970602278849,
"y": 146.98251001061735
"x": 778.4899149328337,
"y": -100.36469216659502
}
},
{
@@ -129,77 +252,52 @@
}
},
"position": {
"x": 822.9899179655476,
"y": 360.9657214885052
}
},
{
"id": "159bdf1b-79e7-4174-b86e-d40e646964c8",
"type": "invocation",
"data": {
"id": "159bdf1b-79e7-4174-b86e-d40e646964c8",
"type": "flux_text_to_image",
"version": "1.0.0",
"label": "",
"notes": "",
"isOpen": true,
"isIntermediate": false,
"useCache": true,
"inputs": {
"board": {
"name": "board",
"label": ""
},
"metadata": {
"name": "metadata",
"label": ""
},
"transformer": {
"name": "transformer",
"label": ""
},
"vae": {
"name": "vae",
"label": ""
},
"positive_text_conditioning": {
"name": "positive_text_conditioning",
"label": ""
},
"width": {
"name": "width",
"label": "",
"value": 1024
},
"height": {
"name": "height",
"label": "",
"value": 1024
},
"num_steps": {
"name": "num_steps",
"label": "Steps (Recommend 30 for Dev, 4 for Schnell)",
"value": 30
},
"guidance": {
"name": "guidance",
"label": "",
"value": 4
},
"seed": {
"name": "seed",
"label": "",
"value": 0
}
}
},
"position": {
"x": 1216.3900791301849,
"y": 5.500841807102248
"x": 800.9667463219505,
"y": 285.8297267547506
}
}
],
"edges": [
{
"id": "reactflow__edge-f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90transformer-4fe24f07-f906-4f55-ab2c-9beee56ef5bdtransformer",
"type": "default",
"source": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"target": "4fe24f07-f906-4f55-ab2c-9beee56ef5bd",
"sourceHandle": "transformer",
"targetHandle": "transformer"
},
{
"id": "reactflow__edge-01f674f8-b3d1-4df1-acac-6cb8e0bfb63cconditioning-4fe24f07-f906-4f55-ab2c-9beee56ef5bdpositive_text_conditioning",
"type": "default",
"source": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
"target": "4fe24f07-f906-4f55-ab2c-9beee56ef5bd",
"sourceHandle": "conditioning",
"targetHandle": "positive_text_conditioning"
},
{
"id": "reactflow__edge-4754c534-a5f3-4ad0-9382-7887985e668cvalue-4fe24f07-f906-4f55-ab2c-9beee56ef5bdseed",
"type": "default",
"source": "4754c534-a5f3-4ad0-9382-7887985e668c",
"target": "4fe24f07-f906-4f55-ab2c-9beee56ef5bd",
"sourceHandle": "value",
"targetHandle": "seed"
},
{
"id": "reactflow__edge-4fe24f07-f906-4f55-ab2c-9beee56ef5bdlatents-7e5172eb-48c1-44db-a770-8fd83e1435d1latents",
"type": "default",
"source": "4fe24f07-f906-4f55-ab2c-9beee56ef5bd",
"target": "7e5172eb-48c1-44db-a770-8fd83e1435d1",
"sourceHandle": "latents",
"targetHandle": "latents"
},
{
"id": "reactflow__edge-f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90vae-7e5172eb-48c1-44db-a770-8fd83e1435d1vae",
"type": "default",
"source": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"target": "7e5172eb-48c1-44db-a770-8fd83e1435d1",
"sourceHandle": "vae",
"targetHandle": "vae"
},
{
"id": "reactflow__edge-f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90max_seq_len-01f674f8-b3d1-4df1-acac-6cb8e0bfb63ct5_max_seq_len",
"type": "default",
@@ -208,14 +306,6 @@
"sourceHandle": "max_seq_len",
"targetHandle": "t5_max_seq_len"
},
{
"id": "reactflow__edge-f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90vae-159bdf1b-79e7-4174-b86e-d40e646964c8vae",
"type": "default",
"source": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"target": "159bdf1b-79e7-4174-b86e-d40e646964c8",
"sourceHandle": "vae",
"targetHandle": "vae"
},
{
"id": "reactflow__edge-f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90t5_encoder-01f674f8-b3d1-4df1-acac-6cb8e0bfb63ct5_encoder",
"type": "default",
@@ -231,30 +321,6 @@
"target": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
"sourceHandle": "clip",
"targetHandle": "clip"
},
{
"id": "reactflow__edge-f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90transformer-159bdf1b-79e7-4174-b86e-d40e646964c8transformer",
"type": "default",
"source": "f8d9d7c8-9ed7-4bd7-9e42-ab0e89bfac90",
"target": "159bdf1b-79e7-4174-b86e-d40e646964c8",
"sourceHandle": "transformer",
"targetHandle": "transformer"
},
{
"id": "reactflow__edge-01f674f8-b3d1-4df1-acac-6cb8e0bfb63cconditioning-159bdf1b-79e7-4174-b86e-d40e646964c8positive_text_conditioning",
"type": "default",
"source": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
"target": "159bdf1b-79e7-4174-b86e-d40e646964c8",
"sourceHandle": "conditioning",
"targetHandle": "positive_text_conditioning"
},
{
"id": "reactflow__edge-4754c534-a5f3-4ad0-9382-7887985e668cvalue-159bdf1b-79e7-4174-b86e-d40e646964c8seed",
"type": "default",
"source": "4754c534-a5f3-4ad0-9382-7887985e668c",
"target": "159bdf1b-79e7-4174-b86e-d40e646964c8",
"sourceHandle": "value",
"targetHandle": "seed"
}
]
}

View File

@@ -38,6 +38,25 @@ SD1_5_LATENT_RGB_FACTORS = [
[-0.1307, -0.1874, -0.7445], # L4
]
FLUX_LATENT_RGB_FACTORS = [
[-0.0412, 0.0149, 0.0521],
[0.0056, 0.0291, 0.0768],
[0.0342, -0.0681, -0.0427],
[-0.0258, 0.0092, 0.0463],
[0.0863, 0.0784, 0.0547],
[-0.0017, 0.0402, 0.0158],
[0.0501, 0.1058, 0.1152],
[-0.0209, -0.0218, -0.0329],
[-0.0314, 0.0083, 0.0896],
[0.0851, 0.0665, -0.0472],
[-0.0534, 0.0238, -0.0024],
[0.0452, -0.0026, 0.0048],
[0.0892, 0.0831, 0.0881],
[-0.1117, -0.0304, -0.0789],
[0.0027, -0.0479, -0.0043],
[-0.1146, -0.0827, -0.0598],
]
def sample_to_lowres_estimated_image(
samples: torch.Tensor, latent_rgb_factors: torch.Tensor, smooth_matrix: Optional[torch.Tensor] = None
@@ -94,3 +113,32 @@ def stable_diffusion_step_callback(
intermediate_state,
ProgressImage(dataURL=dataURL, width=width, height=height),
)
def flux_step_callback(
context_data: "InvocationContextData",
intermediate_state: PipelineIntermediateState,
events: "EventServiceBase",
is_canceled: Callable[[], bool],
) -> None:
if is_canceled():
raise CanceledException
sample = intermediate_state.latents
latent_rgb_factors = torch.tensor(FLUX_LATENT_RGB_FACTORS, dtype=sample.dtype, device=sample.device)
latent_image_perm = sample.permute(1, 2, 0).to(dtype=sample.dtype, device=sample.device)
latent_image = latent_image_perm @ latent_rgb_factors
latents_ubyte = (
((latent_image + 1) / 2).clamp(0, 1).mul(0xFF) # change scale from -1..1 to 0..1 # to 0..255
).to(device="cpu", dtype=torch.uint8)
image = Image.fromarray(latents_ubyte.cpu().numpy())
(width, height) = image.size
width *= 8
height *= 8
dataURL = image_to_dataURL(image, image_format="JPEG")
events.emit_invocation_denoise_progress(
context_data.queue_item,
context_data.invocation,
intermediate_state,
ProgressImage(dataURL=dataURL, width=width, height=height),
)

View File

@@ -0,0 +1,56 @@
from typing import Callable
import torch
from tqdm import tqdm
from invokeai.backend.flux.inpaint_extension import InpaintExtension
from invokeai.backend.flux.model import Flux
from invokeai.backend.stable_diffusion.diffusers_pipeline import PipelineIntermediateState
def denoise(
model: Flux,
# model input
img: torch.Tensor,
img_ids: torch.Tensor,
txt: torch.Tensor,
txt_ids: torch.Tensor,
vec: torch.Tensor,
# sampling parameters
timesteps: list[float],
step_callback: Callable[[PipelineIntermediateState], None],
guidance: float,
inpaint_extension: InpaintExtension | None,
):
step = 0
# guidance_vec is ignored for schnell.
guidance_vec = torch.full((img.shape[0],), guidance, device=img.device, dtype=img.dtype)
for t_curr, t_prev in tqdm(list(zip(timesteps[:-1], timesteps[1:], strict=True))):
t_vec = torch.full((img.shape[0],), t_curr, dtype=img.dtype, device=img.device)
pred = model(
img=img,
img_ids=img_ids,
txt=txt,
txt_ids=txt_ids,
y=vec,
timesteps=t_vec,
guidance=guidance_vec,
)
preview_img = img - t_curr * pred
img = img + (t_prev - t_curr) * pred
if inpaint_extension is not None:
img = inpaint_extension.merge_intermediate_latents_with_init_latents(img, t_prev)
step_callback(
PipelineIntermediateState(
step=step,
order=1,
total_steps=len(timesteps),
timestep=int(t_curr),
latents=preview_img,
),
)
step += 1
return img

View File

@@ -0,0 +1,35 @@
import torch
class InpaintExtension:
"""A class for managing inpainting with FLUX."""
def __init__(self, init_latents: torch.Tensor, inpaint_mask: torch.Tensor, noise: torch.Tensor):
"""Initialize InpaintExtension.
Args:
init_latents (torch.Tensor): The initial latents (i.e. un-noised at timestep 0). In 'packed' format.
inpaint_mask (torch.Tensor): A mask specifying which elements to inpaint. Range [0, 1]. Values of 1 will be
re-generated. Values of 0 will remain unchanged. Values between 0 and 1 can be used to blend the
inpainted region with the background. In 'packed' format.
noise (torch.Tensor): The noise tensor used to noise the init_latents. In 'packed' format.
"""
assert init_latents.shape == inpaint_mask.shape == noise.shape
self._init_latents = init_latents
self._inpaint_mask = inpaint_mask
self._noise = noise
def merge_intermediate_latents_with_init_latents(
self, intermediate_latents: torch.Tensor, timestep: float
) -> torch.Tensor:
"""Merge the intermediate latents with the initial latents for the current timestep using the inpaint mask. I.e.
update the intermediate latents to keep the regions that are not being inpainted on the correct noise
trajectory.
This function should be called after each denoising step.
"""
# Noise the init latents for the current timestep.
noised_init_latents = self._noise * timestep + (1.0 - timestep) * self._init_latents
# Merge the intermediate latents with the noised_init_latents using the inpaint_mask.
return intermediate_latents * self._inpaint_mask + noised_init_latents * (1.0 - self._inpaint_mask)

View File

@@ -258,16 +258,17 @@ class Decoder(nn.Module):
class DiagonalGaussian(nn.Module):
def __init__(self, sample: bool = True, chunk_dim: int = 1):
def __init__(self, chunk_dim: int = 1):
super().__init__()
self.sample = sample
self.chunk_dim = chunk_dim
def forward(self, z: Tensor) -> Tensor:
def forward(self, z: Tensor, sample: bool = True, generator: torch.Generator | None = None) -> Tensor:
mean, logvar = torch.chunk(z, 2, dim=self.chunk_dim)
if self.sample:
if sample:
std = torch.exp(0.5 * logvar)
return mean + std * torch.randn_like(mean)
# Unfortunately, torch.randn_like(...) does not accept a generator argument at the time of writing, so we
# have to use torch.randn(...) instead.
return mean + std * torch.randn(size=mean.size(), generator=generator, dtype=mean.dtype, device=mean.device)
else:
return mean
@@ -297,8 +298,21 @@ class AutoEncoder(nn.Module):
self.scale_factor = params.scale_factor
self.shift_factor = params.shift_factor
def encode(self, x: Tensor) -> Tensor:
z = self.reg(self.encoder(x))
def encode(self, x: Tensor, sample: bool = True, generator: torch.Generator | None = None) -> Tensor:
"""Run VAE encoding on input tensor x.
Args:
x (Tensor): Input image tensor. Shape: (batch_size, in_channels, height, width).
sample (bool, optional): If True, sample from the encoded distribution, else, return the distribution mean.
Defaults to True.
generator (torch.Generator | None, optional): Optional random number generator for reproducibility.
Defaults to None.
Returns:
Tensor: Encoded latent tensor. Shape: (batch_size, z_channels, latent_height, latent_width).
"""
z = self.reg(self.encoder(x), sample=sample, generator=generator)
z = self.scale_factor * (z - self.shift_factor)
return z

View File

@@ -1,167 +0,0 @@
# Initially pulled from https://github.com/black-forest-labs/flux
import math
from typing import Callable
import torch
from einops import rearrange, repeat
from torch import Tensor
from tqdm import tqdm
from invokeai.backend.flux.model import Flux
from invokeai.backend.flux.modules.conditioner import HFEncoder
def get_noise(
num_samples: int,
height: int,
width: int,
device: torch.device,
dtype: torch.dtype,
seed: int,
):
# We always generate noise on the same device and dtype then cast to ensure consistency across devices/dtypes.
rand_device = "cpu"
rand_dtype = torch.float16
return torch.randn(
num_samples,
16,
# allow for packing
2 * math.ceil(height / 16),
2 * math.ceil(width / 16),
device=rand_device,
dtype=rand_dtype,
generator=torch.Generator(device=rand_device).manual_seed(seed),
).to(device=device, dtype=dtype)
def prepare(t5: HFEncoder, clip: HFEncoder, img: Tensor, prompt: str | list[str]) -> dict[str, Tensor]:
bs, c, h, w = img.shape
if bs == 1 and not isinstance(prompt, str):
bs = len(prompt)
img = rearrange(img, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=2, pw=2)
if img.shape[0] == 1 and bs > 1:
img = repeat(img, "1 ... -> bs ...", bs=bs)
img_ids = torch.zeros(h // 2, w // 2, 3)
img_ids[..., 1] = img_ids[..., 1] + torch.arange(h // 2)[:, None]
img_ids[..., 2] = img_ids[..., 2] + torch.arange(w // 2)[None, :]
img_ids = repeat(img_ids, "h w c -> b (h w) c", b=bs)
if isinstance(prompt, str):
prompt = [prompt]
txt = t5(prompt)
if txt.shape[0] == 1 and bs > 1:
txt = repeat(txt, "1 ... -> bs ...", bs=bs)
txt_ids = torch.zeros(bs, txt.shape[1], 3)
vec = clip(prompt)
if vec.shape[0] == 1 and bs > 1:
vec = repeat(vec, "1 ... -> bs ...", bs=bs)
return {
"img": img,
"img_ids": img_ids.to(img.device),
"txt": txt.to(img.device),
"txt_ids": txt_ids.to(img.device),
"vec": vec.to(img.device),
}
def time_shift(mu: float, sigma: float, t: Tensor):
return math.exp(mu) / (math.exp(mu) + (1 / t - 1) ** sigma)
def get_lin_function(x1: float = 256, y1: float = 0.5, x2: float = 4096, y2: float = 1.15) -> Callable[[float], float]:
m = (y2 - y1) / (x2 - x1)
b = y1 - m * x1
return lambda x: m * x + b
def get_schedule(
num_steps: int,
image_seq_len: int,
base_shift: float = 0.5,
max_shift: float = 1.15,
shift: bool = True,
) -> list[float]:
# extra step for zero
timesteps = torch.linspace(1, 0, num_steps + 1)
# shifting the schedule to favor high timesteps for higher signal images
if shift:
# eastimate mu based on linear estimation between two points
mu = get_lin_function(y1=base_shift, y2=max_shift)(image_seq_len)
timesteps = time_shift(mu, 1.0, timesteps)
return timesteps.tolist()
def denoise(
model: Flux,
# model input
img: Tensor,
img_ids: Tensor,
txt: Tensor,
txt_ids: Tensor,
vec: Tensor,
# sampling parameters
timesteps: list[float],
step_callback: Callable[[], None],
guidance: float = 4.0,
):
# guidance_vec is ignored for schnell.
guidance_vec = torch.full((img.shape[0],), guidance, device=img.device, dtype=img.dtype)
for t_curr, t_prev in tqdm(list(zip(timesteps[:-1], timesteps[1:], strict=True))):
t_vec = torch.full((img.shape[0],), t_curr, dtype=img.dtype, device=img.device)
pred = model(
img=img,
img_ids=img_ids,
txt=txt,
txt_ids=txt_ids,
y=vec,
timesteps=t_vec,
guidance=guidance_vec,
)
img = img + (t_prev - t_curr) * pred
step_callback()
return img
def unpack(x: Tensor, height: int, width: int) -> Tensor:
return rearrange(
x,
"b (h w) (c ph pw) -> b c (h ph) (w pw)",
h=math.ceil(height / 16),
w=math.ceil(width / 16),
ph=2,
pw=2,
)
def prepare_latent_img_patches(latent_img: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
"""Convert an input image in latent space to patches for diffusion.
This implementation was extracted from:
https://github.com/black-forest-labs/flux/blob/c00d7c60b085fce8058b9df845e036090873f2ce/src/flux/sampling.py#L32
Returns:
tuple[Tensor, Tensor]: (img, img_ids), as defined in the original flux repo.
"""
bs, c, h, w = latent_img.shape
# Pixel unshuffle with a scale of 2, and flatten the height/width dimensions to get an array of patches.
img = rearrange(latent_img, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=2, pw=2)
if img.shape[0] == 1 and bs > 1:
img = repeat(img, "1 ... -> bs ...", bs=bs)
# Generate patch position ids.
img_ids = torch.zeros(h // 2, w // 2, 3, device=img.device, dtype=img.dtype)
img_ids[..., 1] = img_ids[..., 1] + torch.arange(h // 2, device=img.device, dtype=img.dtype)[:, None]
img_ids[..., 2] = img_ids[..., 2] + torch.arange(w // 2, device=img.device, dtype=img.dtype)[None, :]
img_ids = repeat(img_ids, "h w c -> b (h w) c", b=bs)
return img, img_ids

View File

@@ -0,0 +1,135 @@
# Initially pulled from https://github.com/black-forest-labs/flux
import math
from typing import Callable
import torch
from einops import rearrange, repeat
def get_noise(
num_samples: int,
height: int,
width: int,
device: torch.device,
dtype: torch.dtype,
seed: int,
):
# We always generate noise on the same device and dtype then cast to ensure consistency across devices/dtypes.
rand_device = "cpu"
rand_dtype = torch.float16
return torch.randn(
num_samples,
16,
# allow for packing
2 * math.ceil(height / 16),
2 * math.ceil(width / 16),
device=rand_device,
dtype=rand_dtype,
generator=torch.Generator(device=rand_device).manual_seed(seed),
).to(device=device, dtype=dtype)
def time_shift(mu: float, sigma: float, t: torch.Tensor) -> torch.Tensor:
return math.exp(mu) / (math.exp(mu) + (1 / t - 1) ** sigma)
def get_lin_function(x1: float = 256, y1: float = 0.5, x2: float = 4096, y2: float = 1.15) -> Callable[[float], float]:
m = (y2 - y1) / (x2 - x1)
b = y1 - m * x1
return lambda x: m * x + b
def get_schedule(
num_steps: int,
image_seq_len: int,
base_shift: float = 0.5,
max_shift: float = 1.15,
shift: bool = True,
) -> list[float]:
# extra step for zero
timesteps = torch.linspace(1, 0, num_steps + 1)
# shifting the schedule to favor high timesteps for higher signal images
if shift:
# estimate mu based on linear estimation between two points
mu = get_lin_function(y1=base_shift, y2=max_shift)(image_seq_len)
timesteps = time_shift(mu, 1.0, timesteps)
return timesteps.tolist()
def _find_last_index_ge_val(timesteps: list[float], val: float, eps: float = 1e-6) -> int:
"""Find the last index in timesteps that is >= val.
We use epsilon-close equality to avoid potential floating point errors.
"""
idx = len(list(filter(lambda t: t >= (val - eps), timesteps))) - 1
assert idx >= 0
return idx
def clip_timestep_schedule(timesteps: list[float], denoising_start: float, denoising_end: float) -> list[float]:
"""Clip the timestep schedule to the denoising range.
Args:
timesteps (list[float]): The original timestep schedule: [1.0, ..., 0.0].
denoising_start (float): A value in [0, 1] specifying the start of the denoising process. E.g. a value of 0.2
would mean that the denoising process start at the last timestep in the schedule >= 0.8.
denoising_end (float): A value in [0, 1] specifying the end of the denoising process. E.g. a value of 0.8 would
mean that the denoising process end at the last timestep in the schedule >= 0.2.
Returns:
list[float]: The clipped timestep schedule.
"""
assert 0.0 <= denoising_start <= 1.0
assert 0.0 <= denoising_end <= 1.0
assert denoising_start <= denoising_end
t_start_val = 1.0 - denoising_start
t_end_val = 1.0 - denoising_end
t_start_idx = _find_last_index_ge_val(timesteps, t_start_val)
t_end_idx = _find_last_index_ge_val(timesteps, t_end_val)
clipped_timesteps = timesteps[t_start_idx : t_end_idx + 1]
return clipped_timesteps
def unpack(x: torch.Tensor, height: int, width: int) -> torch.Tensor:
"""Unpack flat array of patch embeddings to latent image."""
return rearrange(
x,
"b (h w) (c ph pw) -> b c (h ph) (w pw)",
h=math.ceil(height / 16),
w=math.ceil(width / 16),
ph=2,
pw=2,
)
def pack(x: torch.Tensor) -> torch.Tensor:
"""Pack latent image to flattented array of patch embeddings."""
# Pixel unshuffle with a scale of 2, and flatten the height/width dimensions to get an array of patches.
return rearrange(x, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=2, pw=2)
def generate_img_ids(h: int, w: int, batch_size: int, device: torch.device, dtype: torch.dtype) -> torch.Tensor:
"""Generate tensor of image position ids.
Args:
h (int): Height of image in latent space.
w (int): Width of image in latent space.
batch_size (int): Batch size.
device (torch.device): Device.
dtype (torch.dtype): dtype.
Returns:
torch.Tensor: Image position ids.
"""
img_ids = torch.zeros(h // 2, w // 2, 3, device=device, dtype=dtype)
img_ids[..., 1] = img_ids[..., 1] + torch.arange(h // 2, device=device, dtype=dtype)[:, None]
img_ids[..., 2] = img_ids[..., 2] + torch.arange(w // 2, device=device, dtype=dtype)[None, :]
img_ids = repeat(img_ids, "h w c -> b (h w) c", b=batch_size)
return img_ids

View File

@@ -66,8 +66,9 @@ class ModelLoader(ModelLoaderBase):
return (model_base / config.path).resolve()
def _load_and_cache(self, config: AnyModelConfig, submodel_type: Optional[SubModelType] = None) -> ModelLockerBase:
stats_name = ":".join([config.base, config.type, config.name, (submodel_type or "")])
try:
return self._ram_cache.get(config.key, submodel_type)
return self._ram_cache.get(config.key, submodel_type, stats_name=stats_name)
except IndexError:
pass
@@ -84,7 +85,7 @@ class ModelLoader(ModelLoaderBase):
return self._ram_cache.get(
key=config.key,
submodel_type=submodel_type,
stats_name=":".join([config.base, config.type, config.name, (submodel_type or "")]),
stats_name=stats_name,
)
def get_size_fs(

View File

@@ -128,7 +128,24 @@ class ModelCacheBase(ABC, Generic[T]):
@property
@abstractmethod
def max_cache_size(self) -> float:
"""Return true if the cache is configured to lazily offload models in VRAM."""
"""Return the maximum size the RAM cache can grow to."""
pass
@max_cache_size.setter
@abstractmethod
def max_cache_size(self, value: float) -> None:
"""Set the cap on vram cache size."""
@property
@abstractmethod
def max_vram_cache_size(self) -> float:
"""Return the maximum size the VRAM cache can grow to."""
pass
@max_vram_cache_size.setter
@abstractmethod
def max_vram_cache_size(self, value: float) -> float:
"""Set the maximum size the VRAM cache can grow to."""
pass
@abstractmethod

View File

@@ -70,6 +70,7 @@ class ModelCache(ModelCacheBase[AnyModel]):
max_vram_cache_size: float,
execution_device: torch.device = torch.device("cuda"),
storage_device: torch.device = torch.device("cpu"),
precision: torch.dtype = torch.float16,
lazy_offloading: bool = True,
log_memory_usage: bool = False,
logger: Optional[Logger] = None,
@@ -81,11 +82,13 @@ class ModelCache(ModelCacheBase[AnyModel]):
:param max_vram_cache_size: Maximum size of the execution_device cache in GBs.
:param execution_device: Torch device to load active model into [torch.device('cuda')]
:param storage_device: Torch device to save inactive model in [torch.device('cpu')]
:param lazy_offloading: Keep model in VRAM until another model needs to be loaded.
:param precision: Precision for loaded models [torch.float16]
:param lazy_offloading: Keep model in VRAM until another model needs to be loaded
:param log_memory_usage: If True, a memory snapshot will be captured before and after every model cache
operation, and the result will be logged (at debug level). There is a time cost to capturing the memory
snapshots, so it is recommended to disable this feature unless you are actively inspecting the model cache's
behaviour.
:param logger: InvokeAILogger to use (otherwise creates one)
"""
# allow lazy offloading only when vram cache enabled
self._lazy_offloading = lazy_offloading and max_vram_cache_size > 0
@@ -130,6 +133,16 @@ class ModelCache(ModelCacheBase[AnyModel]):
"""Set the cap on cache size."""
self._max_cache_size = value
@property
def max_vram_cache_size(self) -> float:
"""Return the cap on vram cache size."""
return self._max_vram_cache_size
@max_vram_cache_size.setter
def max_vram_cache_size(self, value: float) -> None:
"""Set the cap on vram cache size."""
self._max_vram_cache_size = value
@property
def stats(self) -> Optional[CacheStats]:
"""Return collected CacheStats object."""

View File

@@ -32,6 +32,9 @@ from invokeai.backend.model_manager.config import (
)
from invokeai.backend.model_manager.load.load_default import ModelLoader
from invokeai.backend.model_manager.load.model_loader_registry import ModelLoaderRegistry
from invokeai.backend.model_manager.util.model_util import (
convert_bundle_to_flux_transformer_checkpoint,
)
from invokeai.backend.util.silence_warnings import SilenceWarnings
try:
@@ -190,6 +193,13 @@ class FluxCheckpointModel(ModelLoader):
with SilenceWarnings():
model = Flux(params[config.config_path])
sd = load_file(model_path)
if "model.diffusion_model.double_blocks.0.img_attn.norm.key_norm.scale" in sd:
sd = convert_bundle_to_flux_transformer_checkpoint(sd)
new_sd_size = sum([ten.nelement() * torch.bfloat16.itemsize for ten in sd.values()])
self._ram_cache.make_room(new_sd_size)
for k in sd.keys():
# We need to cast to bfloat16 due to it being the only currently supported dtype for inference
sd[k] = sd[k].to(torch.bfloat16)
model.load_state_dict(sd, assign=True)
return model
@@ -230,5 +240,7 @@ class FluxBnbQuantizednf4bCheckpointModel(ModelLoader):
model = Flux(params[config.config_path])
model = quantize_model_nf4(model, modules_to_not_convert=set(), compute_dtype=torch.bfloat16)
sd = load_file(model_path)
if "model.diffusion_model.double_blocks.0.img_attn.norm.key_norm.scale" in sd:
sd = convert_bundle_to_flux_transformer_checkpoint(sd)
model.load_state_dict(sd, assign=True)
return model

View File

@@ -108,6 +108,8 @@ class ModelProbe(object):
"CLIPVisionModelWithProjection": ModelType.CLIPVision,
"T2IAdapter": ModelType.T2IAdapter,
"CLIPModel": ModelType.CLIPEmbed,
"CLIPTextModel": ModelType.CLIPEmbed,
"T5EncoderModel": ModelType.T5Encoder,
}
@classmethod
@@ -224,7 +226,18 @@ class ModelProbe(object):
ckpt = ckpt.get("state_dict", ckpt)
for key in [str(k) for k in ckpt.keys()]:
if key.startswith(("cond_stage_model.", "first_stage_model.", "model.diffusion_model.", "double_blocks.")):
if key.startswith(
(
"cond_stage_model.",
"first_stage_model.",
"model.diffusion_model.",
# FLUX models in the official BFL format contain keys with the "double_blocks." prefix.
"double_blocks.",
# Some FLUX checkpoint files contain transformer keys prefixed with "model.diffusion_model".
# This prefix is typically used to distinguish between multiple models bundled in a single file.
"model.diffusion_model.double_blocks.",
)
):
# Keys starting with double_blocks are associated with Flux models
return ModelType.Main
elif key.startswith(("encoder.conv_in", "decoder.conv_in")):
@@ -283,9 +296,16 @@ class ModelProbe(object):
if (folder_path / "image_encoder.txt").exists():
return ModelType.IPAdapter
i = folder_path / "model_index.json"
c = folder_path / "config.json"
config_path = i if i.exists() else c if c.exists() else None
config_path = None
for p in [
folder_path / "model_index.json", # pipeline
folder_path / "config.json", # most diffusers
folder_path / "text_encoder_2" / "config.json", # T5 text encoder
folder_path / "text_encoder" / "config.json", # T5 CLIP
]:
if p.exists():
config_path = p
break
if config_path:
with open(config_path, "r") as file:
@@ -328,7 +348,10 @@ class ModelProbe(object):
# TODO: Decide between dev/schnell
checkpoint = ModelProbe._scan_and_load_checkpoint(model_path)
state_dict = checkpoint.get("state_dict") or checkpoint
if "guidance_in.out_layer.weight" in state_dict:
if (
"guidance_in.out_layer.weight" in state_dict
or "model.diffusion_model.guidance_in.out_layer.weight" in state_dict
):
# For flux, this is a key in invokeai.backend.flux.util.params
# Due to model type and format being the descriminator for model configs this
# is used rather than attempting to support flux with separate model types and format
@@ -336,7 +359,7 @@ class ModelProbe(object):
config_file = "flux-dev"
else:
# For flux, this is a key in invokeai.backend.flux.util.params
# Due to model type and format being the descriminator for model configs this
# Due to model type and format being the discriminator for model configs this
# is used rather than attempting to support flux with separate model types and format
# If changed in the future, please fix me
config_file = "flux-schnell"
@@ -443,7 +466,10 @@ class CheckpointProbeBase(ProbeBase):
def get_format(self) -> ModelFormat:
state_dict = self.checkpoint.get("state_dict") or self.checkpoint
if "double_blocks.0.img_attn.proj.weight.quant_state.bitsandbytes__nf4" in state_dict:
if (
"double_blocks.0.img_attn.proj.weight.quant_state.bitsandbytes__nf4" in state_dict
or "model.diffusion_model.double_blocks.0.img_attn.proj.weight.quant_state.bitsandbytes__nf4" in state_dict
):
return ModelFormat.BnbQuantizednf4b
return ModelFormat("checkpoint")
@@ -470,7 +496,10 @@ class PipelineCheckpointProbe(CheckpointProbeBase):
def get_base_type(self) -> BaseModelType:
checkpoint = self.checkpoint
state_dict = self.checkpoint.get("state_dict") or checkpoint
if "double_blocks.0.img_attn.norm.key_norm.scale" in state_dict:
if (
"double_blocks.0.img_attn.norm.key_norm.scale" in state_dict
or "model.diffusion_model.double_blocks.0.img_attn.norm.key_norm.scale" in state_dict
):
return BaseModelType.Flux
key_name = "model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_k.weight"
if key_name in state_dict and state_dict[key_name].shape[-1] == 768:
@@ -747,8 +776,27 @@ class TextualInversionFolderProbe(FolderProbeBase):
class T5EncoderFolderProbe(FolderProbeBase):
def get_base_type(self) -> BaseModelType:
return BaseModelType.Any
def get_format(self) -> ModelFormat:
return ModelFormat.T5Encoder
path = self.model_path / "text_encoder_2"
if (path / "model.safetensors.index.json").exists():
return ModelFormat.T5Encoder
files = list(path.glob("*.safetensors"))
if len(files) == 0:
raise InvalidModelConfigException(f"{self.model_path.as_posix()}: no .safetensors files found")
# shortcut: look for the quantization in the name
if any(x for x in files if "llm_int8" in x.as_posix()):
return ModelFormat.BnbQuantizedLlmInt8b
# more reliable path: probe contents for a 'SCB' key
ckpt = read_checkpoint_meta(files[0], scan=True)
if any("SCB" in x for x in ckpt.keys()):
return ModelFormat.BnbQuantizedLlmInt8b
raise InvalidModelConfigException(f"{self.model_path.as_posix()}: unknown model format")
class ONNXFolderProbe(PipelineFolderProbe):

View File

@@ -133,3 +133,29 @@ def lora_token_vector_length(checkpoint: Dict[str, torch.Tensor]) -> Optional[in
break
return lora_token_vector_length
def convert_bundle_to_flux_transformer_checkpoint(
transformer_state_dict: dict[str, torch.Tensor],
) -> dict[str, torch.Tensor]:
original_state_dict: dict[str, torch.Tensor] = {}
keys_to_remove: list[str] = []
for k, v in transformer_state_dict.items():
if not k.startswith("model.diffusion_model"):
keys_to_remove.append(k) # This can be removed in the future if we only want to delete transformer keys
continue
if k.endswith("scale"):
# Scale math must be done at bfloat16 due to our current flux model
# support limitations at inference time
v = v.to(dtype=torch.bfloat16)
new_key = k.replace("model.diffusion_model.", "")
original_state_dict[new_key] = v
keys_to_remove.append(k)
# Remove processed keys from the original dictionary, leaving others in case
# other model state dicts need to be pulled
for k in keys_to_remove:
del transformer_state_dict[k]
return original_state_dict

View File

@@ -127,7 +127,14 @@
"bulkDownloadRequestedDesc": "Dein Download wird vorbereitet. Dies kann ein paar Momente dauern.",
"bulkDownloadRequestFailed": "Problem beim Download vorbereiten",
"bulkDownloadFailed": "Download fehlgeschlagen",
"alwaysShowImageSizeBadge": "Zeige immer Bilder Größe Abzeichen"
"alwaysShowImageSizeBadge": "Zeige immer Bilder Größe Abzeichen",
"selectForCompare": "Zum Vergleichen auswählen",
"compareImage": "Bilder vergleichen",
"exitSearch": "Suche beenden",
"newestFirst": "Neueste zuerst",
"oldestFirst": "Älteste zuerst",
"openInViewer": "Im Viewer öffnen",
"swapImages": "Bilder tauschen"
},
"hotkeys": {
"keyboardShortcuts": "Tastenkürzel",
@@ -631,7 +638,8 @@
"archived": "Archiviert",
"noBoards": "Kein {boardType}} Ordner",
"hideBoards": "Ordner verstecken",
"viewBoards": "Ordner ansehen"
"viewBoards": "Ordner ansehen",
"deletedPrivateBoardsCannotbeRestored": "Gelöschte Boards können nicht wiederhergestellt werden. Wenn Sie „Nur Board löschen“ wählen, werden die Bilder in einen privaten, nicht kategorisierten Status für den Ersteller des Bildes versetzt."
},
"controlnet": {
"showAdvanced": "Zeige Erweitert",
@@ -781,7 +789,9 @@
"batchFieldValues": "Stapelverarbeitungswerte",
"batchQueued": "Stapelverarbeitung eingereiht",
"graphQueued": "Graph eingereiht",
"graphFailedToQueue": "Fehler beim Einreihen des Graphen"
"graphFailedToQueue": "Fehler beim Einreihen des Graphen",
"generations_one": "Generation",
"generations_other": "Generationen"
},
"metadata": {
"negativePrompt": "Negativ Beschreibung",
@@ -1146,5 +1156,10 @@
"noMatchingTriggers": "Keine passenden Trigger",
"addPromptTrigger": "Prompt-Trigger hinzufügen",
"compatibleEmbeddings": "Kompatible Einbettungen"
},
"ui": {
"tabs": {
"queue": "Warteschlange"
}
}
}

View File

@@ -86,15 +86,15 @@
"loadMore": "Cargar más",
"noImagesInGallery": "No hay imágenes para mostrar",
"deleteImage_one": "Eliminar Imagen",
"deleteImage_many": "",
"deleteImage_other": "",
"deleteImage_many": "Eliminar {{count}} Imágenes",
"deleteImage_other": "Eliminar {{count}} Imágenes",
"deleteImagePermanent": "Las imágenes eliminadas no se pueden restaurar.",
"assets": "Activos",
"autoAssignBoardOnClick": "Asignación automática de tableros al hacer clic"
},
"hotkeys": {
"keyboardShortcuts": "Atajos de teclado",
"appHotkeys": "Atajos de applicación",
"appHotkeys": "Atajos de aplicación",
"generalHotkeys": "Atajos generales",
"galleryHotkeys": "Atajos de galería",
"unifiedCanvasHotkeys": "Atajos de lienzo unificado",
@@ -535,7 +535,7 @@
"bottomMessage": "Al eliminar este panel y las imágenes que contiene, se restablecerán las funciones que los estén utilizando actualmente.",
"deleteBoardAndImages": "Borrar el panel y las imágenes",
"loading": "Cargando...",
"deletedBoardsCannotbeRestored": "Los paneles eliminados no se pueden restaurar",
"deletedBoardsCannotbeRestored": "Los paneles eliminados no se pueden restaurar. Al Seleccionar 'Borrar Solo el Panel' transferirá las imágenes a un estado sin categorizar.",
"move": "Mover",
"menuItemAutoAdd": "Agregar automáticamente a este panel",
"searchBoard": "Buscando paneles…",
@@ -549,7 +549,13 @@
"imagesWithCount_other": "{{count}} imágenes",
"assetsWithCount_one": "{{count}} activo",
"assetsWithCount_many": "{{count}} activos",
"assetsWithCount_other": "{{count}} activos"
"assetsWithCount_other": "{{count}} activos",
"hideBoards": "Ocultar Paneles",
"addPrivateBoard": "Agregar un tablero privado",
"addSharedBoard": "Agregar Panel Compartido",
"boards": "Paneles",
"archiveBoard": "Archivar Panel",
"archived": "Archivado"
},
"accordions": {
"compositing": {

View File

@@ -496,7 +496,9 @@
"main": "Principali",
"noModelsInstalledDesc1": "Installa i modelli con",
"ipAdapters": "Adattatori IP",
"noMatchingModels": "Nessun modello corrispondente"
"noMatchingModels": "Nessun modello corrispondente",
"starterModelsInModelManager": "I modelli iniziali possono essere trovati in Gestione Modelli",
"spandrelImageToImage": "Immagine a immagine (Spandrel)"
},
"parameters": {
"images": "Immagini",
@@ -510,7 +512,7 @@
"perlinNoise": "Rumore Perlin",
"type": "Tipo",
"strength": "Forza",
"upscaling": "Ampliamento",
"upscaling": "Amplia",
"scale": "Scala",
"imageFit": "Adatta l'immagine iniziale alle dimensioni di output",
"scaleBeforeProcessing": "Scala prima dell'elaborazione",
@@ -593,7 +595,7 @@
"globalPositivePromptPlaceholder": "Prompt positivo globale",
"globalNegativePromptPlaceholder": "Prompt negativo globale",
"processImage": "Elabora Immagine",
"sendToUpscale": "Invia a Ampliare",
"sendToUpscale": "Invia a Amplia",
"postProcessing": "Post-elaborazione (Shift + U)"
},
"settings": {
@@ -1420,7 +1422,7 @@
"paramUpscaleMethod": {
"heading": "Metodo di ampliamento",
"paragraphs": [
"Metodo utilizzato per eseguire l'ampliamento dell'immagine per la correzione ad alta risoluzione."
"Metodo utilizzato per ampliare l'immagine per la correzione ad alta risoluzione."
]
},
"patchmatchDownScaleSize": {
@@ -1528,7 +1530,7 @@
},
"upscaleModel": {
"paragraphs": [
"Il modello di ampliamento (Upscale), scala l'immagine alle dimensioni di uscita prima di aggiungere i dettagli. È possibile utilizzare qualsiasi modello di ampliamento supportato, ma alcuni sono specializzati per diversi tipi di immagini, come foto o disegni al tratto."
"Il modello di ampliamento, scala l'immagine alle dimensioni di uscita prima di aggiungere i dettagli. È possibile utilizzare qualsiasi modello di ampliamento supportato, ma alcuni sono specializzati per diversi tipi di immagini, come foto o disegni al tratto."
],
"heading": "Modello di ampliamento"
},
@@ -1720,26 +1722,27 @@
"modelsTab": "$t(ui.tabs.models) $t(common.tab)",
"queue": "Coda",
"queueTab": "$t(ui.tabs.queue) $t(common.tab)",
"upscaling": "Ampliamento",
"upscaling": "Amplia",
"upscalingTab": "$t(ui.tabs.upscaling) $t(common.tab)"
}
},
"upscaling": {
"creativity": "Creatività",
"structure": "Struttura",
"upscaleModel": "Modello di Ampliamento",
"upscaleModel": "Modello di ampliamento",
"scale": "Scala",
"missingModelsWarning": "Visita <LinkComponent>Gestione modelli</LinkComponent> per installare i modelli richiesti:",
"mainModelDesc": "Modello principale (architettura SD1.5 o SDXL)",
"tileControlNetModelDesc": "Modello Tile ControlNet per l'architettura del modello principale scelto",
"upscaleModelDesc": "Modello per l'ampliamento (da immagine a immagine)",
"upscaleModelDesc": "Modello per l'ampliamento (immagine a immagine)",
"missingUpscaleInitialImage": "Immagine iniziale mancante per l'ampliamento",
"missingUpscaleModel": "Modello per lampliamento mancante",
"missingTileControlNetModel": "Nessun modello ControlNet Tile valido installato",
"postProcessingModel": "Modello di post-elaborazione",
"postProcessingMissingModelWarning": "Visita <LinkComponent>Gestione modelli</LinkComponent> per installare un modello di post-elaborazione (da immagine a immagine).",
"exceedsMaxSize": "Le impostazioni di ampliamento superano il limite massimo delle dimensioni",
"exceedsMaxSizeDetails": "Il limite massimo di ampliamento è {{maxUpscaleDimension}}x{{maxUpscaleDimension}} pixel. Prova un'immagine più piccola o diminuisci la scala selezionata."
"exceedsMaxSizeDetails": "Il limite massimo di ampliamento è {{maxUpscaleDimension}}x{{maxUpscaleDimension}} pixel. Prova un'immagine più piccola o diminuisci la scala selezionata.",
"upscale": "Amplia"
},
"upsell": {
"inviteTeammates": "Invita collaboratori",
@@ -1789,6 +1792,7 @@
"positivePromptColumn": "'prompt' o 'positive_prompt'",
"noTemplates": "Nessun modello",
"acceptedColumnsKeys": "Colonne/chiavi accettate:",
"templateActions": "Azioni modello"
"templateActions": "Azioni modello",
"promptTemplateCleared": "Modello di prompt cancellato"
}
}

View File

@@ -501,7 +501,8 @@
"noModelsInstalled": "Нет установленных моделей",
"noModelsInstalledDesc1": "Установите модели с помощью",
"noMatchingModels": "Нет подходящих моделей",
"ipAdapters": "IP адаптеры"
"ipAdapters": "IP адаптеры",
"starterModelsInModelManager": "Стартовые модели можно найти в Менеджере моделей"
},
"parameters": {
"images": "Изображения",
@@ -1758,7 +1759,8 @@
"postProcessingModel": "Модель постобработки",
"tileControlNetModelDesc": "Модель ControlNet для выбранной архитектуры основной модели",
"missingModelsWarning": "Зайдите в <LinkComponent>Менеджер моделей</LinkComponent> чтоб установить необходимые модели:",
"postProcessingMissingModelWarning": "Посетите <LinkComponent>Менеджер моделей</LinkComponent>, чтобы установить модель постобработки (img2img)."
"postProcessingMissingModelWarning": "Посетите <LinkComponent>Менеджер моделей</LinkComponent>, чтобы установить модель постобработки (img2img).",
"upscale": "Увеличить"
},
"stylePresets": {
"noMatchingTemplates": "Нет подходящих шаблонов",
@@ -1804,7 +1806,8 @@
"noTemplates": "Нет шаблонов",
"promptTemplatesDesc2": "Используйте строку-заполнитель <Pre>{{placeholder}}</Pre>, чтобы указать место, куда должен быть включен ваш запрос в шаблоне.",
"searchByName": "Поиск по имени",
"shared": "Общий"
"shared": "Общий",
"promptTemplateCleared": "Шаблон запроса создан"
},
"upsell": {
"inviteTeammates": "Пригласите членов команды",

View File

@@ -154,7 +154,8 @@
"displaySearch": "显示搜索",
"stretchToFit": "拉伸以适应",
"exitCompare": "退出对比",
"compareHelp1": "在点击图库中的图片或使用箭头键切换比较图片时,请按住<Kbd>Alt</Kbd> 键。"
"compareHelp1": "在点击图库中的图片或使用箭头键切换比较图片时,请按住<Kbd>Alt</Kbd> 键。",
"go": "运行"
},
"hotkeys": {
"keyboardShortcuts": "快捷键",
@@ -494,7 +495,9 @@
"huggingFacePlaceholder": "所有者或模型名称",
"huggingFaceRepoID": "HuggingFace仓库ID",
"loraTriggerPhrases": "LoRA 触发词",
"ipAdapters": "IP适配器"
"ipAdapters": "IP适配器",
"spandrelImageToImage": "图生图(Spandrel)",
"starterModelsInModelManager": "您可以在模型管理器中找到初始模型"
},
"parameters": {
"images": "图像",
@@ -695,7 +698,9 @@
"outOfMemoryErrorDesc": "您当前的生成设置已超出系统处理能力.请调整设置后再次尝试.",
"parametersSet": "参数已恢复",
"errorCopied": "错误信息已复制",
"modelImportCanceled": "模型导入已取消"
"modelImportCanceled": "模型导入已取消",
"importFailed": "导入失败",
"importSuccessful": "导入成功"
},
"unifiedCanvas": {
"layer": "图层",
@@ -1705,12 +1710,55 @@
"missingModelsWarning": "请访问<LinkComponent>模型管理器</LinkComponent> 安装所需的模型:",
"mainModelDesc": "主模型SD1.5或SDXL架构",
"exceedsMaxSize": "放大设置超出了最大尺寸限制",
"exceedsMaxSizeDetails": "最大放大限制是 {{maxUpscaleDimension}}x{{maxUpscaleDimension}} 像素.请尝试一个较小的图像或减少您的缩放选择."
"exceedsMaxSizeDetails": "最大放大限制是 {{maxUpscaleDimension}}x{{maxUpscaleDimension}} 像素.请尝试一个较小的图像或减少您的缩放选择.",
"upscale": "放大"
},
"upsell": {
"inviteTeammates": "邀请团队成员",
"professional": "专业",
"professionalUpsell": "可在 Invoke 的专业版中使用.点击此处或访问 invoke.com/pricing 了解更多详情.",
"shareAccess": "共享访问权限"
},
"stylePresets": {
"positivePrompt": "正向提示词",
"preview": "预览",
"deleteImage": "删除图像",
"deleteTemplate": "删除模版",
"deleteTemplate2": "您确定要删除这个模板吗?请注意,删除后无法恢复.",
"importTemplates": "导入提示模板支持CSV或JSON格式",
"insertPlaceholder": "插入一个占位符",
"myTemplates": "我的模版",
"name": "名称",
"type": "类型",
"unableToDeleteTemplate": "无法删除提示模板",
"updatePromptTemplate": "更新提示词模版",
"exportPromptTemplates": "导出我的提示模板为CSV格式",
"exportDownloaded": "导出已下载",
"noMatchingTemplates": "无匹配的模版",
"promptTemplatesDesc1": "提示模板可以帮助您在编写提示时添加预设的文本内容.",
"promptTemplatesDesc3": "如果您没有使用占位符,那么模板的内容将会被添加到您提示的末尾.",
"searchByName": "按名称搜索",
"shared": "已分享",
"sharedTemplates": "已分享的模版",
"templateActions": "模版操作",
"templateDeleted": "提示模版已删除",
"toggleViewMode": "切换显示模式",
"uploadImage": "上传图像",
"active": "激活",
"choosePromptTemplate": "选择提示词模板",
"clearTemplateSelection": "清除模版选择",
"copyTemplate": "拷贝模版",
"createPromptTemplate": "创建提示词模版",
"defaultTemplates": "默认模版",
"editTemplate": "编辑模版",
"exportFailed": "无法生成并下载CSV文件",
"flatten": "将选定的模板内容合并到当前提示中",
"negativePrompt": "反向提示词",
"promptTemplateCleared": "提示模板已清除",
"useForTemplate": "用于提示词模版",
"viewList": "预览模版列表",
"viewModeTooltip": "这是您的提示在当前选定的模板下的预览效果。如需编辑提示,请直接在文本框中点击进行修改.",
"noTemplates": "无模版",
"private": "私密"
}
}

File diff suppressed because one or more lines are too long

View File

@@ -1 +1 @@
__version__ = "4.2.9rc1"
__version__ = "4.2.9"

63
scripts/allocate_vram.py Normal file
View File

@@ -0,0 +1,63 @@
import argparse
import torch
def display_vram_usage():
"""Displays the total, allocated, and free VRAM on the current CUDA device."""
assert torch.cuda.is_available(), "CUDA is not available"
device = torch.device("cuda")
total_vram = torch.cuda.get_device_properties(device).total_memory
allocated_vram = torch.cuda.memory_allocated(device)
free_vram = total_vram - allocated_vram
print(f"Total VRAM: {total_vram / (1024 * 1024 * 1024):.2f} GB")
print(f"Allocated VRAM: {allocated_vram / (1024 * 1024 * 1024):.2f} GB")
print(f"Free VRAM: {free_vram / (1024 * 1024 * 1024):.2f} GB")
def allocate_vram(target_gb: float, target_free: bool = False):
"""Allocates VRAM on the current CUDA device. After allocation, the script will pause until the user presses Enter
or ends the script, at which point the VRAM will be released.
Args:
target_gb (float): Amount of VRAM to allocate in GB.
target_free (bool, optional): Instead of allocating <target_gb> VRAM, enough VRAM will be allocated so the system has <target_gb> of VRAM free. For example, if <target_gb> is 2 GB, the script will allocate VRAM until the free VRAM is 2 GB.
"""
assert torch.cuda.is_available(), "CUDA is not available"
device = torch.device("cuda")
if target_free:
total_vram = torch.cuda.get_device_properties(device).total_memory
free_vram = total_vram - torch.cuda.memory_allocated(device)
target_free_bytes = target_gb * 1024 * 1024 * 1024
bytes_to_allocate = free_vram - target_free_bytes
if bytes_to_allocate <= 0:
print(f"Already at or below the target free VRAM of {target_gb} GB")
return
else:
bytes_to_allocate = target_gb * 1024 * 1024 * 1024
# FloatTensor (4 bytes per element)
_tensor = torch.empty(int(bytes_to_allocate / 4), dtype=torch.float, device="cuda")
display_vram_usage()
input("Press Enter to release VRAM allocation and exit...")
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Allocate VRAM for testing purposes. Only works on CUDA devices.")
parser.add_argument("target_gb", type=float, help="Amount of VRAM to allocate in GB.")
parser.add_argument(
"--target-free",
action="store_true",
help="Instead of allocating <target_gb> VRAM, enough VRAM will be allocated so the system has <target_gb> of VRAM free. For example, if <target_gb> is 2 GB, the script will allocate VRAM until the free VRAM is 2 GB.",
)
args = parser.parse_args()
allocate_vram(target_gb=args.target_gb, target_free=args.target_free)

View File

@@ -0,0 +1,42 @@
import pytest
import torch
from invokeai.backend.flux.sampling_utils import clip_timestep_schedule
def float_lists_almost_equal(list1: list[float], list2: list[float], tol: float = 1e-6) -> bool:
return all(abs(a - b) < tol for a, b in zip(list1, list2, strict=True))
@pytest.mark.parametrize(
["denoising_start", "denoising_end", "expected_timesteps", "raises"],
[
(0.0, 1.0, [1.0, 0.75, 0.5, 0.25, 0.0], False), # Default case.
(-0.1, 1.0, [], True), # Negative denoising_start should raise.
(0.0, 1.1, [], True), # denoising_end > 1 should raise.
(0.5, 0.0, [], True), # denoising_start > denoising_end should raise.
(0.0, 0.0, [1.0], False), # denoising_end == 0.
(1.0, 1.0, [0.0], False), # denoising_start == 1.
(0.2, 0.8, [1.0, 0.75, 0.5, 0.25], False), # Middle of the schedule.
# If we denoise from 0.0 to x, then from x to 1.0, it is important that denoise_end = x and denoise_start = x
# map to the same timestep. We test this first when x is equal to a timestep, then when it falls between two
# timesteps.
# x = 0.5
(0.0, 0.5, [1.0, 0.75, 0.5], False),
(0.5, 1.0, [0.5, 0.25, 0.0], False),
# x = 0.3
(0.0, 0.3, [1.0, 0.75], False),
(0.3, 1.0, [0.75, 0.5, 0.25, 0.0], False),
],
)
def test_clip_timestep_schedule(
denoising_start: float, denoising_end: float, expected_timesteps: list[float], raises: bool
):
timesteps = torch.linspace(1, 0, 5).tolist()
if raises:
with pytest.raises(AssertionError):
clip_timestep_schedule(timesteps, denoising_start, denoising_end)
else:
assert float_lists_almost_equal(
clip_timestep_schedule(timesteps, denoising_start, denoising_end), expected_timesteps
)