InvokeAI/docs/installation/requirements.md
Alexander Eichhorn b92c6ae633 feat(flux2): add FLUX.2 klein model support (#8768)
* WIP: feat(flux2): add FLUX 2 Kontext model support

- Add new invocation nodes for FLUX 2:
  - flux2_denoise: Denoising invocation for FLUX 2
  - flux2_klein_model_loader: Model loader for Klein architecture
  - flux2_klein_text_encoder: Text encoder for Qwen3-based encoding
  - flux2_vae_decode: VAE decoder for FLUX 2

- Add backend support:
  - New flux2 module with denoise and sampling utilities
  - Extended model manager configs for FLUX 2 models
  - Updated model loaders for Klein architecture

- Update frontend:
  - Extended graph builder for FLUX 2 support
  - Added FLUX 2 model types and configurations
  - Updated readiness checks and UI components

* fix(flux2): correct VAE decode with proper BN denormalization

FLUX.2 VAE uses Batch Normalization in the patchified latent space
(128 channels). The decode must:
1. Patchify latents from (B, 32, H, W) to (B, 128, H/2, W/2)
2. Apply BN denormalization using running_mean/running_var
3. Unpatchify back to (B, 32, H, W) for VAE decode

Also fixed image normalization from [-1, 1] to [0, 255].

This fixes washed-out colors in generated FLUX.2 Klein images.
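A minimal sketch of the decode path described above, assuming the BN statistics are stored as 128-channel `running_mean`/`running_var` tensors and that PyTorch's `pixel_unshuffle`/`pixel_shuffle` matches the checkpoint's 2×2 patch ordering (both are assumptions):

```python
import torch
import torch.nn.functional as F

def bn_denormalize_latents(latents, running_mean, running_var, eps=1e-5):
    # 1. Patchify: (B, 32, H, W) -> (B, 128, H/2, W/2) by folding each 2x2 patch into channels.
    patched = F.pixel_unshuffle(latents, downscale_factor=2)

    # 2. BN denormalization per channel: x = x_norm * sqrt(var + eps) + mean.
    std = torch.sqrt(running_var + eps).view(1, -1, 1, 1)
    mean = running_mean.view(1, -1, 1, 1)
    patched = patched * std + mean

    # 3. Unpatchify back to (B, 32, H, W) for the VAE decoder.
    return F.pixel_shuffle(patched, upscale_factor=2)

def to_uint8(decoded):
    # Map the decoder output from roughly [-1, 1] to [0, 255].
    return ((decoded + 1.0) * 127.5).clamp(0, 255).to(torch.uint8)
```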

* feat(flux2): add FLUX.2 Klein model support with ComfyUI checkpoint compatibility

- Add FLUX.2 transformer loader with BFL-to-diffusers weight conversion
- Fix AdaLayerNorm scale-shift swap for final_layer.adaLN_modulation weights (see the sketch after this list)
- Add VAE batch normalization handling for FLUX.2 latent normalization
- Add Qwen3 text encoder loader with ComfyUI FP8 quantization support
- Add frontend components for FLUX.2 Klein model selection
- Update configs and schema for FLUX.2 model types
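The scale-shift swap might look roughly like the sketch below; which half comes first in the BFL checkpoint versus the converted diffusers module is inferred from the commit message, not verified here.

```python
import torch

def swap_adaln_scale_shift(weight: torch.Tensor) -> torch.Tensor:
    # BFL's final layer emits (shift, scale) while the converted diffusers module
    # expects (scale, shift), so the two halves are swapped along the output dim.
    shift, scale = weight.chunk(2, dim=0)
    return torch.cat([scale, shift], dim=0)

# Typically applied to the modulation linear's weight (and bias, if present) during conversion.
```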

* Chore Ruff

* Fix Flux1 vae probing

* Fix Windows Paths schema.ts

* Add 4B and 9B Klein to Starter Models.

* feat(flux2): add non-commercial license indicator for FLUX.2 Klein 9B

- Add isFlux2Klein9BMainModelConfig and isNonCommercialMainModelConfig functions
- Update MainModelPicker and InitialStateMainModelPicker to show license icon
- Update license tooltip text to include FLUX.2 Klein 9B

* feat(flux2): add Klein/Qwen3 variant support and encoder filtering

Backend:
- Add klein_4b/klein_9b variants for FLUX.2 Klein models
- Add qwen3_4b/qwen3_8b variants for Qwen3 encoder models
- Validate encoder variant matches Klein model (4B↔4B, 9B↔8B; sketched below)
- Auto-detect Qwen3 variant from hidden_size during probing

Frontend:
- Show variant field for all model types in ModelView
- Filter Qwen3 encoder dropdown to only show compatible variants
- Update variant type definitions (zFlux2VariantType, zQwen3VariantType)
- Remove unused exports (isFluxDevMainModelConfig, isFlux2Klein9BMainModelConfig)
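A sketch of the pairing rule from the backend list above; the variant identifiers follow the naming in this commit, while the validation wording is an assumption:

```python
# Klein model variant -> required Qwen3 encoder variant (4B pairs with 4B, 9B with 8B).
KLEIN_TO_QWEN3 = {"klein_4b": "qwen3_4b", "klein_9b": "qwen3_8b"}

def validate_encoder_variant(klein_variant: str, qwen3_variant: str) -> None:
    expected = KLEIN_TO_QWEN3.get(klein_variant)
    if expected is not None and qwen3_variant != expected:
        raise ValueError(
            f"FLUX.2 {klein_variant} requires a {expected} text encoder, got {qwen3_variant}"
        )
```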

* Chore Ruff

* feat(flux2): add Klein 9B Base (undistilled) variant support

Distinguish between FLUX.2 Klein 9B (distilled) and Klein 9B Base (undistilled)
models by checking guidance_embeds in diffusers config or guidance_in keys in
safetensors. Klein 9B Base requires more steps but offers higher quality.
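One way this check could look, using only the config field and key prefix named above (everything else is an assumption):

```python
def is_klein_undistilled(diffusers_config: dict | None, state_dict_keys) -> bool:
    # Undistilled ("Base") checkpoints carry guidance embedding weights;
    # distilled Klein checkpoints do not.
    if diffusers_config is not None:
        return bool(diffusers_config.get("guidance_embeds", False))
    return any(key.startswith("guidance_in") for key in state_dict_keys)
```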

* feat(flux2): improve diffusers compatibility and distilled model support

Backend changes:
- Update text encoder layers from [9,18,27] to (10,20,30) matching diffusers
- Use apply_chat_template with system message instead of manual formatting (see the sketch below)
- Change position IDs from ones to zeros to match diffusers implementation
- Add get_schedule_flux2() with empirical mu computation for proper schedule shifting
- Add txt_embed_scale parameter for Qwen3 embedding magnitude control
- Add shift_schedule toggle for base (28+ steps) vs distilled (4 steps) models
- Zero out guidance_embedder weights for Klein models without guidance_embeds

UI changes:
- Clear Klein VAE and Qwen3 encoder when switching away from flux2 base
- Clear Qwen3 encoder when switching between different Klein model variants
- Add toast notification informing user to select compatible encoder
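A sketch of the Qwen3 text-encoding path implied by the first two backend bullets; the model id, the system prompt wording, and the way the three hidden states are combined are all assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")  # model id is an assumption
enc = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B", torch_dtype=torch.bfloat16)

messages = [
    {"role": "system", "content": "Describe the image to be generated."},  # wording assumed
    {"role": "user", "content": "a watercolor fox in the snow"},
]
text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(text, return_tensors="pt")

with torch.no_grad():
    out = enc(**inputs, output_hidden_states=True)

# Take the intermediate hidden states named in the commit message (layers 10, 20, 30).
# Whether they are concatenated or combined some other way for conditioning is an assumption.
prompt_embeds = torch.cat([out.hidden_states[i] for i in (10, 20, 30)], dim=-1)
```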

* feat(flux2): fix distilled model scheduling with proper dynamic shifting

- Configure scheduler with FLUX.2 Klein parameters from scheduler_config.json
  (use_dynamic_shifting=True, shift=3.0, time_shift_type="exponential")
- Pass mu parameter to scheduler.set_timesteps() for resolution-aware shifting
- Remove manual shift_schedule parameter (scheduler handles this automatically)
- Simplify get_schedule_flux2() to return linear sigmas only
- Remove txt_embed_scale parameter (no longer needed)

This matches the diffusers Flux2KleinPipeline behavior where the
FlowMatchEulerDiscreteScheduler applies dynamic timestep shifting
based on image resolution via the mu parameter.

Fixes 4-step distilled Klein 9B model quality issues.
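A minimal sketch of that wiring, assuming a recent diffusers FlowMatchEulerDiscreteScheduler; the mu interpolation constants and the packed-token computation follow the usual FLUX-style convention and are assumptions here:

```python
import numpy as np
from diffusers import FlowMatchEulerDiscreteScheduler

# Values quoted above from the FLUX.2 Klein scheduler_config.json.
scheduler = FlowMatchEulerDiscreteScheduler(
    shift=3.0,
    use_dynamic_shifting=True,
    time_shift_type="exponential",
)

def compute_mu(image_seq_len, base_len=256, max_len=4096, base_shift=0.5, max_shift=1.15):
    # Resolution-aware shift, linearly interpolated from the packed token count.
    slope = (max_shift - base_shift) / (max_len - base_len)
    return base_shift + slope * (image_seq_len - base_len)

num_steps = 4  # distilled Klein
height = width = 1024
image_seq_len = (height // 16) * (width // 16)  # 2x2-packed latent tokens

sigmas = np.linspace(1.0, 1.0 / num_steps, num_steps)  # linear sigmas; the scheduler applies the shift
scheduler.set_timesteps(sigmas=sigmas, mu=compute_mu(image_seq_len))
```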

* fix(ui): fix FLUX.1 graph building with posCondCollect node lookup

The posCondCollect node was created with getPrefixedId() which generates
a random suffix (e.g., 'pos_cond_collect:abc123'), but g.getNode() was
called with the plain string 'pos_cond_collect', causing a node lookup
failure.

Fix by declaring posCondCollect as a module-scoped variable and
referencing it directly instead of using g.getNode().

* Remove Flux2 Klein Base from Starter Models

* Remove Logging

* Add Default Values for Flux2 Klein and add variant as additional info to from_base

* Add migrations for the z-image qwen3 encoder without a variant value

* Add img2img, inpainting and outpainting support for FLUX.2 Klein

- Add flux2_vae_encode invocation for encoding images to FLUX.2 latents
- Integrate inpaint_extension into FLUX.2 denoise loop for proper mask handling
- Apply BN normalization to init_latents and noise for consistency in inpainting
- Use manual Euler stepping for img2img/inpaint to preserve exact timestep schedule (sketched after this list)
- Add flux2_img2img, flux2_inpaint, flux2_outpaint generation modes
- Expand starter models with FP8 variants, standalone transformers, and separate VAE/encoders
- Fix outpainting to always use full denoising (0-1) since strength doesn't apply
- Improve error messages in model loader with clear guidance for standalone models
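The manual Euler stepping could look like the minimal flow-matching loop below; the `model(...)` call stands in for the transformer, the mask convention (1 = region to regenerate) and the strength-to-start-index mapping are simplified assumptions:

```python
import torch

def euler_img2img(model, init_latents, noise, sigmas, strength, mask=None):
    # Start partway down the schedule according to denoising strength.
    start = int(len(sigmas) * (1.0 - strength))
    latents = sigmas[start] * noise + (1.0 - sigmas[start]) * init_latents

    for i in range(start, len(sigmas) - 1):
        velocity = model(latents, sigma=sigmas[i])                  # flow-matching velocity prediction
        latents = latents + (sigmas[i + 1] - sigmas[i]) * velocity  # explicit Euler step

        if mask is not None:
            # Re-inject the unmasked region at the current noise level (inpaint extension).
            noised_init = sigmas[i + 1] * noise + (1.0 - sigmas[i + 1]) * init_latents
            latents = mask * latents + (1.0 - mask) * noised_init

    return latents
```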

* Add GGUF quantized model support and Diffusers VAE loader for FLUX.2 Klein

- Add Main_GGUF_Flux2_Config for GGUF-quantized FLUX.2 transformer models
- Add VAE_Diffusers_Flux2_Config for FLUX.2 VAE in diffusers format
- Add Flux2GGUFCheckpointModel loader with BFL-to-diffusers conversion
- Add Flux2VAEDiffusersLoader for AutoencoderKLFlux2
- Add FLUX.2 Klein 4B/9B hardware requirements to documentation
- Update starter model descriptions to clarify dependencies install together
- Update frontend schema for new model configs

* Fix FLUX.2 model detection and add FP8 weight dequantization support

- Improve FLUX.2 variant detection for GGUF/checkpoint models (BFL format keys)
- Fix guidance_embeds logic: distilled=False, undistilled=True
- Add FP8 weight dequantization for ComfyUI-style quantized models (sketched below)
- Prevent FLUX.2 models from being misidentified as FLUX.1
- Preserve user-editable fields (name, description, etc.) on model reidentify
- Improve Qwen3Encoder detection by variant in starter models
- Add defensive checks for tensor operations
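A rough sketch of per-tensor FP8 dequantization; how ComfyUI-style checkpoints name and store the scale tensor is an assumption here:

```python
import torch

def dequantize_fp8(weight_fp8: torch.Tensor, scale: torch.Tensor,
                   dtype: torch.dtype = torch.bfloat16) -> torch.Tensor:
    # Upcast the FP8 tensor, then multiply by its stored per-tensor scale.
    return weight_fp8.to(dtype) * scale.to(dtype)
```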

* Chore ruff format

* Chore Typegen

* Fix FLUX.2 Klein 9B model loading by detecting hidden_size from weights

Previously num_attention_heads was hardcoded to 24, which is correct for
Klein 4B but causes size mismatches when loading Klein 9B checkpoints.

Now dynamically calculates num_attention_heads from the hidden_size
dimension of context_embedder weights:
- Klein 4B: hidden_size=3072 → num_attention_heads=24
- Klein 9B: hidden_size=4096 → num_attention_heads=32

Fixes both Checkpoint and GGUF loaders for FLUX.2 models.
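A sketch of that calculation; the state-dict key name and the assumed 128-dim attention head size are inferred from the numbers above:

```python
def infer_num_attention_heads(state_dict) -> int:
    # context_embedder projects text features to the transformer's hidden size,
    # so its output dimension reveals the model width.
    hidden_size = state_dict["context_embedder.weight"].shape[0]
    head_dim = 128  # assumed constant across Klein 4B and 9B
    return hidden_size // head_dim  # 3072 -> 24 heads (4B), 4096 -> 32 heads (9B)
```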

* Only clear Qwen3 encoder when FLUX.2 Klein variant changes

Previously the encoder was cleared whenever switching between any Klein
models, even if they had the same variant. Now compares the variant of
the old and new model and only clears the encoder when switching between
different variants (e.g., klein_4b to klein_9b).

This allows users to switch between different Klein 9B models without
having to re-select the Qwen3 encoder each time.

* Add metadata recall support for FLUX.2 Klein parameters

The scheduler, VAE model, and Qwen3 encoder model were not being
recalled correctly for FLUX.2 Klein images. This adds dedicated
metadata handlers for the Klein-specific parameters.

* Fix FLUX.2 Klein denoising scaling and Z-Image VAE compatibility

- Apply exponential denoising scaling (exponent 0.2) to FLUX.2 Klein,
  matching FLUX.1 behavior for more intuitive inpainting strength
- Add isFlux1VAEModelConfig type guard to filter FLUX 1.0 VAEs only
- Restrict Z-Image VAE selection to FLUX 1.0 VAEs, excluding FLUX.2
  Klein 32-channel VAEs which are incompatible

* chore pnpm fix

* Add FLUX.2 Klein to starter bundles and documentation

- Add FLUX.2 Klein hardware requirements to quick start guide
- Create flux2_klein_bundle with GGUF Q4 model, VAE, and Qwen3 encoder
- Add "What's New" entry announcing FLUX.2 Klein support

* Add FLUX.2 Klein built-in reference image editing support

FLUX.2 Klein has native multi-reference image editing without requiring
a separate model (unlike FLUX.1 which needs a Kontext model).

Backend changes:
- Add Flux2RefImageExtension for encoding reference images with FLUX.2 VAE
- Apply BN normalization to reference image latents for correct scaling
- Use T-coordinate offset scale=10 like diffusers (T=10, 20, 30...)
- Concatenate reference latents with generated image during denoising
- Extract only generated portion in step callback for correct preview

Frontend changes:
- Add flux2_reference_image config type without model field
- Hide model selector for FLUX.2 reference images (built-in support)
- Add type guards to handle configs without model property
- Update validators to skip model validation for FLUX.2
- Add 'flux2' to SUPPORTS_REF_IMAGES_BASE_MODELS

* Chore windows path fix

* Add reference image resizing for FLUX.2 Klein

Resize large reference images to match BFL FLUX.2 sampling.py limits:
- Single reference: max 2024² pixels (~4.1M)
- Multiple references: max 1024² pixels (~1M)

Uses same scaling approach as BFL's cap_pixels() function.
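A minimal re-implementation of that cap, assuming a plain aspect-preserving downscale (BFL's actual helper may also round dimensions):

```python
import math
from PIL import Image

def cap_pixels(image: Image.Image, max_pixels: int) -> Image.Image:
    w, h = image.size
    if w * h <= max_pixels:
        return image
    scale = math.sqrt(max_pixels / (w * h))
    return image.resize((max(1, round(w * scale)), max(1, round(h * scale))), Image.LANCZOS)

# Per the limits above: 2024**2 pixels for a single reference, 1024**2 each when several are used.
```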

# Requirements

Invoke runs on Windows 10+, macOS 14+ and Linux (Ubuntu 20.04+ is well-tested).

## Hardware

Hardware requirements vary significantly depending on model and image output size.

The requirements below are rough guidelines for best performance. GPUs with less VRAM typically still work, if a bit slower. Follow the Low-VRAM mode guide to optimize performance.

  • All Apple Silicon Macs (M1, M2, etc.) work, but 16GB+ memory is recommended.
  • AMD GPUs are supported on Linux only. The VRAM requirements are the same as for Nvidia GPUs.

!!! info "Hardware Requirements (Windows/Linux)"

=== "SD1.5 - 512×512"

    - GPU: Nvidia 10xx series or later, 4GB+ VRAM.
    - Memory: At least 8GB RAM.
    - Disk: 10GB for base installation plus 30GB for models.

=== "SDXL - 1024×1024"

    - GPU: Nvidia 20xx series or later, 8GB+ VRAM.
    - Memory: At least 16GB RAM.
    - Disk: 10GB for base installation plus 100GB for models.

=== "FLUX.1 - 1024×1024"

    - GPU: Nvidia 20xx series or later, 10GB+ VRAM.
    - Memory: At least 32GB RAM.
    - Disk: 10GB for base installation plus 200GB for models.

=== "FLUX.2 Klein 4B - 1024×1024"

    - GPU: Nvidia 30xx series or later, 12GB+ VRAM (e.g. RTX 3090, RTX 4070). FP8 version works with 8GB+ VRAM.
    - Memory: At least 16GB RAM.
    - Disk: 10GB for base installation plus 20GB for models (Diffusers format with encoder).

=== "FLUX.2 Klein 9B - 1024×1024"

    - GPU: Nvidia 40xx series, 24GB+ VRAM (e.g. RTX 4090). FP8 version works with 12GB+ VRAM.
    - Memory: At least 32GB RAM.
    - Disk: 10GB for base installation plus 40GB for models (Diffusers format with encoder).

=== "Z-Image Turbo - 1024x1024"
    - GPU: Nvidia 20xx series or later, 8GB+ VRAM for the Q4_K quantized model. 16GB+ needed for the Q8 or BF16 models.
    - Memory: At least 16GB RAM.
    - Disk: 10GB for base installation plus 35GB for models.

!!! info "tmpfs on Linux"

If your temporary directory is mounted as a `tmpfs`, ensure it has sufficient space.

## Python

!!! tip "The launcher installs python for you"

You don't need to do this if you are installing with the [Invoke Launcher](./quick_start.md).

Invoke requires Python 3.11 or 3.12. If you don't already have one of these versions installed, we suggest installing 3.12, as it will be supported for longer.

Check that your system has an up-to-date Python installed by running `python3 --version` in the terminal (Linux, macOS) or cmd/PowerShell (Windows).

!!! info "Installing Python"

=== "Windows"

    - Install python with [an official installer].
    - The installer includes an option to add python to your PATH. Be sure to enable this. If you missed it, re-run the installer, choose to modify an existing installation, and tick that checkbox.
    - You may need to install [Microsoft Visual C++ Redistributable].

=== "macOS"

    - Install python with [an official installer].
    - If model installs fail with a certificate error, you may need to run this command (changing the python version to match what you have installed): `/Applications/Python\ 3.11/Install\ Certificates.command`
    - If you haven't already, you will need to install the XCode CLI Tools by running `xcode-select --install` in a terminal.

=== "Linux"

    - Installing python varies depending on your system. We recommend [using `uv` to manage your python installation](https://docs.astral.sh/uv/concepts/python-versions/#installing-a-python-version).
    - You'll need to install `libglib2.0-0` and `libgl1-mesa-glx` for OpenCV to work. For example, on a Debian system: `sudo apt update && sudo apt install -y libglib2.0-0 libgl1-mesa-glx`

## Drivers

If you have an Nvidia or AMD GPU, you may need to manually install drivers or other support packages for things to work well or at all.

### Nvidia

Run `nvidia-smi` on your system's command line to verify that drivers and CUDA are installed. If this command fails, or doesn't report versions, you will need to install drivers.

Go to the CUDA Toolkit Downloads and carefully follow the instructions for your system to get everything installed.

Confirm that `nvidia-smi` displays driver and CUDA versions after installation.

#### Linux - via Nvidia Container Runtime

An alternative to installing CUDA locally is to use the Nvidia Container Runtime to run the application in a container.

#### Windows - Nvidia cuDNN DLLs

An out-of-date cuDNN library can greatly hamper performance on 30-series and 40-series cards. Check with the community on Discord to compare your it/s if you think you may need this fix.

First, locate the destination for the DLL files and make a quick backup:

  1. Find your InvokeAI installation folder, e.g. `C:\Users\Username\InvokeAI\`.
  2. Open the `.venv` folder, e.g. `C:\Users\Username\InvokeAI\.venv` (you may need to show hidden files to see it).
  3. Navigate deeper to the `torch` package, e.g. `C:\Users\Username\InvokeAI\.venv\Lib\site-packages\torch`.
  4. Copy the `lib` folder inside `torch` and back it up somewhere.

Next, download and copy the updated cuDNN DLLs:

  1. Go to https://developer.nvidia.com/cudnn.
  2. Create an account if needed and log in.
  3. Choose the newest version of cuDNN that works with your GPU architecture. Consult the cuDNN support matrix to determine the correct version for your GPU.
  4. Download the latest version and extract it.
  5. Find the `bin` folder, e.g. `cudnn-windows-x86_64-SOME_VERSION\bin`.
  6. Copy and paste the `.dll` files into the `lib` folder you located earlier. Replace files when prompted.

If, after restarting the app, this doesn't improve your performance, either restore your backup or re-run the installer to reset torch back to its original state.
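For reference, the manual copy above could also be scripted. This is just an example; the paths mirror the ones in the steps and must be adjusted to your own install and cuDNN download location:

```python
# Example only: back up torch's lib folder, then copy the downloaded cuDNN DLLs over it.
import shutil
from pathlib import Path

torch_lib = Path(r"C:\Users\Username\InvokeAI\.venv\Lib\site-packages\torch\lib")
cudnn_bin = Path(r"C:\path\to\cudnn-windows-x86_64-SOME_VERSION\bin")

shutil.copytree(torch_lib, torch_lib.with_name("lib_backup"), dirs_exist_ok=True)  # back up first
for dll in cudnn_bin.glob("*.dll"):
    shutil.copy2(dll, torch_lib / dll.name)  # overwrite torch's bundled cuDNN DLLs
```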

### AMD

!!! info "Linux Only"

AMD GPUs are supported on Linux only, due to ROCm (the AMD equivalent of CUDA) support being Linux only.

!!! warning "Bumps Ahead"

While the application does run on AMD GPUs, there are occasional bumps related to spotty torch support.

Run `rocm-smi` on your system's command line to verify that drivers and ROCm are installed. If this command fails, or doesn't report versions, you will need to install them.

Go to the ROCm Documentation and carefully follow the instructions for your system to get everything installed.

Confirm that `rocm-smi` displays driver and ROCm versions after installation.

#### Linux - via Docker Container

An alternative to installing ROCm locally is to use a ROCm docker container to run the application in a container.