Files
CypherNaugh_0x 9deb545cc1 External models (Gemini Nano Banana & OpenAI GPT Image) (#8633) (#8884)
* feat: initial external model support

* feat: support reference images for external models

* fix: sorting lint error

* chore: hide Reidentify button for external models

* review: enable auto-install/remove fro external models

* feat: show external mode name during install

* review: model descriptions

* review: implemented review comments

* review: added optional seed control for external models

* chore: fix linter warning

* review: save api keys to a seperate file

* docs: updated external model docs

* chore: fix linter errors

* fix: sync configured external starter models on startup

* feat(ui): add provider-specific external generation nodes

* feat: expose external panel schemas in model configs

* feat(ui): drive external panels from panel schema

* docs: sync app config docstring order

* feat: add gemini 3.1 flash image preview starter model

* feat: update gemini image model limits

* fix: resolve TypeScript errors and move external provider config to api_keys.yaml

Add 'external', 'external_image_generator', and 'external_api' to Zod
enum schemas (zBaseModelType, zModelType, zModelFormat) to match the
generated OpenAPI types. Remove redundant union workarounds from
component prop types and Record definitions.

Fix type errors in ModelEdit (react-hook-form Control invariance),
parsing.tsx (model identifier narrowing), buildExternalGraph (edge
typing), and ModelSettings import/export buttons.

Move external_gemini_base_url and external_openai_base_url into
api_keys.yaml alongside the API keys so all external provider config
lives in one dedicated file, separate from invokeai.yaml.

* feat: add resolution presets and imageConfig support for Gemini 3 models

Add combined resolution preset selector for external models that maps
aspect ratio + image size to fixed dimensions. Gemini 3 Pro and 3.1 Flash
now send imageConfig (aspectRatio + imageSize) via generationConfig instead
of text-based aspect ratio hints used by Gemini 2.5 Flash.

Backend: ExternalResolutionPreset model, resolution_presets capability field,
image_size on ExternalGenerationRequest, and Gemini provider imageConfig logic.

Frontend: ExternalSettingsAccordion with combo resolution select, dimension
slider disabling for fixed-size models, and panel schema constraint wiring
for Steps/Guidance/Seed controls.

* Remove unused external model fields and add provider-specific parameters

- Remove negative_prompt, steps, guidance, reference_image_weights,
  reference_image_modes from external model nodes (unused by any provider)
- Remove supports_negative_prompt, supports_steps, supports_guidance
  from ExternalModelCapabilities
- Add provider_options dict to ExternalGenerationRequest for
  provider-specific parameters
- Add OpenAI-specific fields: quality, background, input_fidelity
- Add Gemini-specific fields: temperature, thinking_level
- Add new OpenAI starter models: GPT Image 1.5, GPT Image 1 Mini,
  DALL-E 3, DALL-E 2
- Fix OpenAI provider to use output_format (GPT Image) vs
  response_format (DALL-E) and send model ID in requests
- Add fixed aspect ratio sizes for OpenAI models (bucketing)
- Add ExternalProviderRateLimitError with retry logic for 429 responses
- Add provider-specific UI components in ExternalSettingsAccordion
- Simplify ParamSteps/ParamGuidance by removing dead external overrides
- Update all backend and frontend tests

* Chore Ruff check & format

* Chore typegen

* feat: full canvas workflow integration for external models

- Add missing aspect ratios (4:5, 5:4, 8:1, 4:1, 1:4, 1:8) to type
  system for external model support
- Sync canvas bbox when external model resolution preset is selected
- Use params preset dimensions in buildExternalGraph to prevent
  "unsupported aspect ratio" errors
- Lock all bbox controls (resize handles, aspect ratio select,
  width/height sliders, swap/optimal buttons) for external models
  with fixed dimension presets
- Disable denoise strength slider for external models (not applicable)
- Sync bbox aspect ratio changes back to paramsSlice for external models
- Initialize bbox dimensions when switching to an external model

* Chore typegen Linux seperator

* feat: full canvas workflow integration for external models
- Update buildExternalGraph test to include dimensions in mock params

* Merge remote-tracking branch 'upstream/main' into external-models

* Chore pnpm fix

* add missing parameter

* docs: add External Models guide with Gemini and OpenAI provider pages

* fix(external-models): address PR review feedback

- Gemini recall: write temperature, thinking_level, image_size to image metadata;
  wire external graph as metadata receiver; add recall handlers.
- Canvas: gate regional guidance, inpaint mask, and control layer for external models.
- Canvas: throw a clear error on outpainting for external models (was falling back to
  inpaint and hitting an API-side mask/image size mismatch).
- Workflow editor: add ui_model_provider_id filter so OpenAI and Gemini nodes only
  list their own provider's models.
- Workflow editor: silently drop seed when the selected model does not support it
  instead of raising a capability error.
- Remove the legacy external_image_generation invocation and the graph-builder
  fallback; providers must register a dedicated node.
- Regenerate schema.ts.
- remove Gemini debug dumps to outputs/external_debug

* fix(external-models): resolve TSC errors in metadata parsing and external graph

- Export imageSizeChanged from paramsSlice (required by the new ImageSize
  recall handler).
- Emit the external graph's metadata model entry via zModelIdentifierField
  since ExternalApiModelConfig is not part of the AnyModelConfig union.

* chore: prettier format ModelIdentifierFieldInputComponent

* fix: remove unsupported thinkingConfig from Gemini image models and restrict GPT Image models to txt2img

* chore typegen

* chore(docs): regenerate settings.json for external provider fields

* fix(external): fix mask handling and mode support for external providers

- Remove img2img and inpaint modes from Gemini models (Gemini has no
  bitmap mask or dedicated edit API; image editing works via reference
  images in the UI)
- Fix DALL-E 2 inpainting: convert grayscale mask to RGBA with alpha
  channel transparency (OpenAI expects transparent=edit area) and
  convert init image to RGBA when mask is present

* fix(external): update mode support and UI for external providers

- Remove DALL-E 2 from starter models (deprecated, shutdown May 12 2026)
- Enable img2img for GPT Image 1/1.5/1-mini (supports edits endpoint)
- Set Gemini models to txt2img only (no mask/edit API; editing via
  ref images)
- Hide mode/init_image/mask_image fields on Gemini node (not usable)
- Hide mask_image field on OpenAI node (no model supports inpaint)

* Chore typegen

* fix(external): improve OpenAI node UX and disable cache by default

- Hide OpenAI node's mode and init_image fields: OpenAI's API has no
  img2img/inpaint distinction (the edits endpoint is invoked
  automatically when reference images are provided). init_image is
  functionally identical to a reference image and was misleading users.
- Default use_cache to False for external image generation nodes:
  external API calls are non-deterministic and incur usage costs.
  Cache hits returned stale image references that did not produce new
  gallery entries on repeat invokes.

* fix(external): duplicate cached images on cache hit instead of skipping

External image generation nodes use the standard invocation cache, but
returning the cached output (with stale image_name references) on cache
hits resulted in no new gallery entries — the Invoke button would spin
indefinitely on repeat invokes with identical parameters.

Override invoke_internal so that on cache hit, the cached images are
loaded and re-saved as new gallery entries. The expensive API call is
still skipped (cost saving), but the user sees a new image as expected.

* Chore typegen + ruff

* CHore ruff format

* fix(external): restore OpenAI advanced settings on Remix recall

Remix recall iterates through ImageMetadataHandlers but only Gemini's
temperature handler was wired up — OpenAI's quality, background, and
input_fidelity were stored in image metadata but never parsed back into
the params slice. Add the three missing handlers so Remix restores
these settings as expected.

---------

Co-authored-by: Alexander Eichhorn <alex@eichhorn.dev>
Co-authored-by: Alexander Eichhorn <alex@code-with.us>
Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>
2026-04-20 17:13:26 +00:00

264 lines
7.8 KiB
Python

from enum import Enum
from typing import Dict, TypeAlias, Union
import onnxruntime as ort
import torch
from diffusers.models.modeling_utils import ModelMixin
from diffusers.pipelines.pipeline_utils import DiffusionPipeline
from pydantic import TypeAdapter
from invokeai.backend.raw_model import RawModel
# ModelMixin is the base class for all diffusers and transformers models
# RawModel is the InvokeAI wrapper class for ip_adapters, loras, textual_inversion and onnx runtime
AnyModel: TypeAlias = Union[
ModelMixin,
RawModel,
torch.nn.Module,
Dict[str, torch.Tensor],
DiffusionPipeline,
ort.InferenceSession,
]
"""Type alias for any kind of runtime, in-memory model representation. For example, a torch module or diffusers pipeline."""
class BaseModelType(str, Enum):
"""An enumeration of base model architectures. For example, Stable Diffusion 1.x, Stable Diffusion 2.x, FLUX, etc.
Every model config must have a base architecture type.
Not all models are associated with a base architecture. For example, CLIP models are their own thing, not related
to any particular model architecture. To simplify internal APIs and make it easier to work with models, we use a
fallback/null value `BaseModelType.Any` for these models, instead of making the model base optional."""
Any = "any"
"""`Any` is essentially a fallback/null value for models with no base architecture association.
For example, CLIP models are not related to Stable Diffusion, FLUX, or any other model arch."""
StableDiffusion1 = "sd-1"
"""Indicates the model is associated with the Stable Diffusion 1.x model architecture, including 1.4 and 1.5."""
StableDiffusion2 = "sd-2"
"""Indicates the model is associated with the Stable Diffusion 2.x model architecture, including 2.0 and 2.1."""
StableDiffusion3 = "sd-3"
"""Indicates the model is associated with the Stable Diffusion 3.5 model architecture."""
StableDiffusionXL = "sdxl"
"""Indicates the model is associated with the Stable Diffusion XL model architecture."""
StableDiffusionXLRefiner = "sdxl-refiner"
"""Indicates the model is associated with the Stable Diffusion XL Refiner model architecture."""
Flux = "flux"
"""Indicates the model is associated with FLUX.1 model architecture, including FLUX Dev, Schnell and Fill."""
Flux2 = "flux2"
"""Indicates the model is associated with FLUX.2 model architecture, including FLUX2 Klein."""
CogView4 = "cogview4"
"""Indicates the model is associated with CogView 4 model architecture."""
ZImage = "z-image"
"""Indicates the model is associated with Z-Image model architecture, including Z-Image-Turbo."""
External = "external"
"""Indicates the model is hosted by an external provider."""
QwenImage = "qwen-image"
"""Indicates the model is associated with Qwen Image Edit 2511 model architecture."""
Anima = "anima"
"""Indicates the model is associated with Anima model architecture (Cosmos Predict2 DiT + LLM Adapter)."""
Unknown = "unknown"
"""Indicates the model's base architecture is unknown."""
class ModelType(str, Enum):
"""Model type."""
ONNX = "onnx"
Main = "main"
VAE = "vae"
LoRA = "lora"
ControlLoRa = "control_lora"
ControlNet = "controlnet" # used by model_probe
TextualInversion = "embedding"
IPAdapter = "ip_adapter"
CLIPVision = "clip_vision"
CLIPEmbed = "clip_embed"
T2IAdapter = "t2i_adapter"
T5Encoder = "t5_encoder"
Qwen3Encoder = "qwen3_encoder"
SpandrelImageToImage = "spandrel_image_to_image"
SigLIP = "siglip"
FluxRedux = "flux_redux"
LlavaOnevision = "llava_onevision"
ExternalImageGenerator = "external_image_generator"
Unknown = "unknown"
class SubModelType(str, Enum):
"""Submodel type."""
UNet = "unet"
Transformer = "transformer"
TextEncoder = "text_encoder"
TextEncoder2 = "text_encoder_2"
TextEncoder3 = "text_encoder_3"
Tokenizer = "tokenizer"
Tokenizer2 = "tokenizer_2"
Tokenizer3 = "tokenizer_3"
VAE = "vae"
VAEDecoder = "vae_decoder"
VAEEncoder = "vae_encoder"
Scheduler = "scheduler"
SafetyChecker = "safety_checker"
class ClipVariantType(str, Enum):
"""Variant type."""
L = "large"
G = "gigantic"
class ModelVariantType(str, Enum):
"""Variant type."""
Normal = "normal"
Inpaint = "inpaint"
Depth = "depth"
class FluxVariantType(str, Enum):
"""FLUX.1 model variants."""
Schnell = "schnell"
Dev = "dev"
DevFill = "dev_fill"
class Flux2VariantType(str, Enum):
"""FLUX.2 model variants."""
Klein4B = "klein_4b"
"""Flux2 Klein 4B variant using Qwen3 4B text encoder."""
Klein9B = "klein_9b"
"""Flux2 Klein 9B variant using Qwen3 8B text encoder (distilled)."""
Klein9BBase = "klein_9b_base"
"""Flux2 Klein 9B Base variant - undistilled foundation model using Qwen3 8B text encoder."""
class ZImageVariantType(str, Enum):
"""Z-Image model variants."""
Turbo = "turbo"
"""Z-Image Turbo - distilled model optimized for 8 steps, no CFG support."""
ZBase = "zbase"
"""Z-Image Base - undistilled foundation model with full CFG and negative prompt support."""
class QwenImageVariantType(str, Enum):
"""Qwen Image model variants."""
Generate = "generate"
"""Qwen Image - text-to-image generation model."""
Edit = "edit"
"""Qwen Image Edit - image editing model with reference image support."""
class Qwen3VariantType(str, Enum):
"""Qwen3 text encoder variants based on model size."""
Qwen3_4B = "qwen3_4b"
"""Qwen3 4B text encoder (hidden_size=2560). Used by FLUX.2 Klein 4B and Z-Image."""
Qwen3_8B = "qwen3_8b"
"""Qwen3 8B text encoder (hidden_size=4096). Used by FLUX.2 Klein 9B."""
Qwen3_06B = "qwen3_06b"
"""Qwen3 0.6B text encoder (hidden_size=1024). Used by Anima."""
class ModelFormat(str, Enum):
"""Storage format of model."""
OMI = "omi"
Diffusers = "diffusers"
Checkpoint = "checkpoint"
LyCORIS = "lycoris"
ONNX = "onnx"
Olive = "olive"
EmbeddingFile = "embedding_file"
EmbeddingFolder = "embedding_folder"
InvokeAI = "invokeai"
T5Encoder = "t5_encoder"
Qwen3Encoder = "qwen3_encoder"
BnbQuantizedLlmInt8b = "bnb_quantized_int8b"
BnbQuantizednf4b = "bnb_quantized_nf4b"
GGUFQuantized = "gguf_quantized"
ExternalApi = "external_api"
Unknown = "unknown"
class SchedulerPredictionType(str, Enum):
"""Scheduler prediction type."""
Epsilon = "epsilon"
VPrediction = "v_prediction"
Sample = "sample"
class ModelRepoVariant(str, Enum):
"""Various hugging face variants on the diffusers format."""
Default = "" # model files without "fp16" or other qualifier
FP16 = "fp16"
FP32 = "fp32"
ONNX = "onnx"
OpenVINO = "openvino"
Flax = "flax"
class ModelSourceType(str, Enum):
"""Model source type."""
Path = "path"
Url = "url"
HFRepoID = "hf_repo_id"
External = "external"
class FluxLoRAFormat(str, Enum):
"""Flux LoRA formats."""
Diffusers = "flux.diffusers"
Kohya = "flux.kohya"
OneTrainer = "flux.onetrainer"
Control = "flux.control"
AIToolkit = "flux.aitoolkit"
XLabs = "flux.xlabs"
BflPeft = "flux.bfl_peft"
OneTrainerBfl = "flux.onetrainer_bfl"
AnyVariant: TypeAlias = Union[
ModelVariantType,
ClipVariantType,
FluxVariantType,
Flux2VariantType,
ZImageVariantType,
QwenImageVariantType,
Qwen3VariantType,
]
variant_type_adapter = TypeAdapter[
ModelVariantType
| ClipVariantType
| FluxVariantType
| Flux2VariantType
| ZImageVariantType
| QwenImageVariantType
| Qwen3VariantType
](
ModelVariantType
| ClipVariantType
| FluxVariantType
| Flux2VariantType
| ZImageVariantType
| QwenImageVariantType
| Qwen3VariantType
)