mirror of
https://github.com/invoke-ai/InvokeAI.git
synced 2026-04-23 03:00:31 -04:00
* feat: initial external model support
* feat: support reference images for external models
* fix: sorting lint error
* chore: hide Reidentify button for external models
* review: enable auto-install/remove for external models
* feat: show external model name during install
* review: model descriptions
* review: implemented review comments
* review: added optional seed control for external models
* chore: fix linter warning
* review: save API keys to a separate file
* docs: updated external model docs
* chore: fix linter errors
* fix: sync configured external starter models on startup
* feat(ui): add provider-specific external generation nodes
* feat: expose external panel schemas in model configs
* feat(ui): drive external panels from panel schema
* docs: sync app config docstring order
* feat: add Gemini 3.1 Flash image preview starter model
* feat: update Gemini image model limits
* fix: resolve TypeScript errors and move external provider config to api_keys.yaml
  Add 'external', 'external_image_generator', and 'external_api' to the Zod enum schemas (zBaseModelType, zModelType, zModelFormat) to match the generated OpenAPI types. Remove redundant union workarounds from component prop types and Record definitions. Fix type errors in ModelEdit (react-hook-form Control invariance), parsing.tsx (model identifier narrowing), buildExternalGraph (edge typing), and the ModelSettings import/export buttons.
  Move external_gemini_base_url and external_openai_base_url into api_keys.yaml alongside the API keys so all external provider config lives in one dedicated file, separate from invokeai.yaml.
* feat: add resolution presets and imageConfig support for Gemini 3 models
  Add a combined resolution preset selector for external models that maps aspect ratio + image size to fixed dimensions. Gemini 3 Pro and 3.1 Flash now send imageConfig (aspectRatio + imageSize) via generationConfig instead of the text-based aspect ratio hints used by Gemini 2.5 Flash.
  Backend: ExternalResolutionPreset model, resolution_presets capability field, image_size on ExternalGenerationRequest, and Gemini provider imageConfig logic. Frontend: ExternalSettingsAccordion with a combined resolution select, dimension slider disabling for fixed-size models, and panel schema constraint wiring for the Steps/Guidance/Seed controls.
* Remove unused external model fields and add provider-specific parameters
  - Remove negative_prompt, steps, guidance, reference_image_weights, reference_image_modes from external model nodes (unused by any provider)
  - Remove supports_negative_prompt, supports_steps, supports_guidance from ExternalModelCapabilities
  - Add provider_options dict to ExternalGenerationRequest for provider-specific parameters
  - Add OpenAI-specific fields: quality, background, input_fidelity
  - Add Gemini-specific fields: temperature, thinking_level
  - Add new OpenAI starter models: GPT Image 1.5, GPT Image 1 Mini, DALL-E 3, DALL-E 2
  - Fix the OpenAI provider to use output_format (GPT Image) vs response_format (DALL-E) and send the model ID in requests
  - Add fixed aspect ratio sizes for OpenAI models (bucketing)
  - Add ExternalProviderRateLimitError with retry logic for 429 responses
  - Add provider-specific UI components in ExternalSettingsAccordion
  - Simplify ParamSteps/ParamGuidance by removing dead external overrides
  - Update all backend and frontend tests
* Chore Ruff check & format
* Chore typegen
* feat: full canvas workflow integration for external models
  - Add missing aspect ratios (4:5, 5:4, 8:1, 4:1, 1:4, 1:8) to the type system for external model support
  - Sync the canvas bbox when an external model resolution preset is selected
  - Use params preset dimensions in buildExternalGraph to prevent "unsupported aspect ratio" errors
  - Lock all bbox controls (resize handles, aspect ratio select, width/height sliders, swap/optimal buttons) for external models with fixed dimension presets
  - Disable the denoise strength slider for external models (not applicable)
  - Sync bbox aspect ratio changes back to paramsSlice for external models
  - Initialize bbox dimensions when switching to an external model
* Chore typegen Linux separator
* feat: full canvas workflow integration for external models
  - Update the buildExternalGraph test to include dimensions in mock params
* Merge remote-tracking branch 'upstream/main' into external-models
* Chore pnpm fix
* add missing parameter
* docs: add External Models guide with Gemini and OpenAI provider pages
* fix(external-models): address PR review feedback
  - Gemini recall: write temperature, thinking_level, and image_size to image metadata; wire the external graph as a metadata receiver; add recall handlers.
  - Canvas: gate regional guidance, inpaint mask, and control layers for external models.
  - Canvas: throw a clear error on outpainting for external models (it was falling back to inpaint and hitting an API-side mask/image size mismatch).
  - Workflow editor: add a ui_model_provider_id filter so the OpenAI and Gemini nodes only list their own provider's models.
  - Workflow editor: silently drop the seed when the selected model does not support it instead of raising a capability error.
  - Remove the legacy external_image_generation invocation and the graph-builder fallback; providers must register a dedicated node.
  - Regenerate schema.ts.
  - Remove Gemini debug dumps to outputs/external_debug
* fix(external-models): resolve TSC errors in metadata parsing and the external graph
  - Export imageSizeChanged from paramsSlice (required by the new ImageSize recall handler).
  - Emit the external graph's metadata model entry via zModelIdentifierField, since ExternalApiModelConfig is not part of the AnyModelConfig union.
* chore: prettier format ModelIdentifierFieldInputComponent
* fix: remove unsupported thinkingConfig from Gemini image models and restrict GPT Image models to txt2img
* chore typegen
* chore(docs): regenerate settings.json for external provider fields
* fix(external): fix mask handling and mode support for external providers
  - Remove img2img and inpaint modes from Gemini models (Gemini has no bitmap mask or dedicated edit API; image editing works via reference images in the UI)
  - Fix DALL-E 2 inpainting: convert the grayscale mask to RGBA with alpha channel transparency (OpenAI expects transparent = edit area) and convert the init image to RGBA when a mask is present
* fix(external): update mode support and UI for external providers
  - Remove DALL-E 2 from starter models (deprecated, shutdown May 12, 2026)
  - Enable img2img for GPT Image 1/1.5/1-mini (supports the edits endpoint)
  - Set Gemini models to txt2img only (no mask/edit API; editing via reference images)
  - Hide the mode/init_image/mask_image fields on the Gemini node (not usable)
  - Hide the mask_image field on the OpenAI node (no model supports inpaint)
* Chore typegen
* fix(external): improve OpenAI node UX and disable cache by default
  - Hide the OpenAI node's mode and init_image fields: OpenAI's API has no img2img/inpaint distinction (the edits endpoint is invoked automatically when reference images are provided). init_image is functionally identical to a reference image and was misleading users.
  - Default use_cache to False for external image generation nodes: external API calls are non-deterministic and incur usage costs. Cache hits returned stale image references that did not produce new gallery entries on repeat invokes.
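The DALL-E 2 inpainting fix above hinges on a convention mismatch: the mask arrives as grayscale, while OpenAI's edits endpoint expects an RGBA image whose transparent pixels (alpha 0) mark the editable region. A minimal sketch of that conversion rule, assuming white (255) marks the area to edit (hypothetical helper in plain Python; the actual implementation works on PIL images):

```python
def mask_to_rgba(gray_mask):
    """Convert a grayscale mask (2D list of 0-255 ints, white = edit area)
    into RGBA pixels where alpha 0 (fully transparent) marks the editable
    region, matching what OpenAI's edits endpoint expects."""
    # Invert the gray value into alpha: white -> transparent, black -> opaque.
    return [[(0, 0, 0, 255 - v) for v in row] for row in gray_mask]


# A 1x3 mask: keep (0), partial (128), edit (255).
rgba = mask_to_rgba([[0, 128, 255]])
```

The key design point is that only the alpha channel matters to the API; the RGB values of the mask are ignored.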
* fix(external): duplicate cached images on cache hit instead of skipping
  External image generation nodes use the standard invocation cache, but returning the cached output (with stale image_name references) on cache hits resulted in no new gallery entries; the Invoke button would spin indefinitely on repeat invokes with identical parameters. Override invoke_internal so that on a cache hit, the cached images are loaded and re-saved as new gallery entries. The expensive API call is still skipped (cost saving), but the user sees a new image as expected.
* Chore typegen + ruff
* Chore ruff format
* fix(external): restore OpenAI advanced settings on Remix recall
  Remix recall iterates through ImageMetadataHandlers, but only Gemini's temperature handler was wired up; OpenAI's quality, background, and input_fidelity were stored in image metadata but never parsed back into the params slice. Add the three missing handlers so Remix restores these settings as expected.

---------

Co-authored-by: Alexander Eichhorn <alex@eichhorn.dev>
Co-authored-by: Alexander Eichhorn <alex@code-with.us>
Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>
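The cache-hit behaviour described in that last fix can be illustrated with a toy memoizing node (a generic sketch with hypothetical names; the real code overrides invoke_internal on the invocation base class and goes through the image service):

```python
class ExternalNodeSketch:
    """Toy model of the cache-hit fix: on a hit the expensive external API
    call is skipped, but the cached pixels are re-saved so a fresh gallery
    entry still appears on every invoke."""

    def __init__(self, generate, save):
        self._generate = generate  # expensive external API call
        self._save = save          # persists pixels, returns a new image name
        self._cache = {}

    def invoke(self, **params):
        key = tuple(sorted(params.items()))
        if key not in self._cache:
            # Cache miss: pay for the external call and remember the result.
            self._cache[key] = self._generate(**params)
        # Hit or miss, always save a new gallery entry for the user to see.
        return self._save(self._cache[key])
```

Repeating an invoke with identical parameters then produces a second, distinct image name while the generate function runs only once.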
264 lines
7.8 KiB
Python
from enum import Enum
from typing import Dict, TypeAlias, Union

import onnxruntime as ort
import torch
from diffusers.models.modeling_utils import ModelMixin
from diffusers.pipelines.pipeline_utils import DiffusionPipeline
from pydantic import TypeAdapter

from invokeai.backend.raw_model import RawModel

# ModelMixin is the base class for all diffusers and transformers models
# RawModel is the InvokeAI wrapper class for ip_adapters, loras, textual_inversion and onnx runtime
AnyModel: TypeAlias = Union[
    ModelMixin,
    RawModel,
    torch.nn.Module,
    Dict[str, torch.Tensor],
    DiffusionPipeline,
    ort.InferenceSession,
]
"""Type alias for any kind of runtime, in-memory model representation. For example, a torch module or diffusers pipeline."""


class BaseModelType(str, Enum):
    """An enumeration of base model architectures. For example, Stable Diffusion 1.x, Stable Diffusion 2.x, FLUX, etc.

    Every model config must have a base architecture type.

    Not all models are associated with a base architecture. For example, CLIP models are their own thing, not related
    to any particular model architecture. To simplify internal APIs and make it easier to work with models, we use a
    fallback/null value `BaseModelType.Any` for these models, instead of making the model base optional."""

    Any = "any"
    """`Any` is essentially a fallback/null value for models with no base architecture association.
    For example, CLIP models are not related to Stable Diffusion, FLUX, or any other model arch."""
    StableDiffusion1 = "sd-1"
    """Indicates the model is associated with the Stable Diffusion 1.x model architecture, including 1.4 and 1.5."""
    StableDiffusion2 = "sd-2"
    """Indicates the model is associated with the Stable Diffusion 2.x model architecture, including 2.0 and 2.1."""
    StableDiffusion3 = "sd-3"
    """Indicates the model is associated with the Stable Diffusion 3.5 model architecture."""
    StableDiffusionXL = "sdxl"
    """Indicates the model is associated with the Stable Diffusion XL model architecture."""
    StableDiffusionXLRefiner = "sdxl-refiner"
    """Indicates the model is associated with the Stable Diffusion XL Refiner model architecture."""
    Flux = "flux"
    """Indicates the model is associated with the FLUX.1 model architecture, including FLUX Dev, Schnell and Fill."""
    Flux2 = "flux2"
    """Indicates the model is associated with the FLUX.2 model architecture, including FLUX.2 Klein."""
    CogView4 = "cogview4"
    """Indicates the model is associated with the CogView 4 model architecture."""
    ZImage = "z-image"
    """Indicates the model is associated with the Z-Image model architecture, including Z-Image-Turbo."""
    External = "external"
    """Indicates the model is hosted by an external provider."""
    QwenImage = "qwen-image"
    """Indicates the model is associated with the Qwen Image Edit 2511 model architecture."""
    Anima = "anima"
    """Indicates the model is associated with the Anima model architecture (Cosmos Predict2 DiT + LLM Adapter)."""
    Unknown = "unknown"
    """Indicates the model's base architecture is unknown."""


class ModelType(str, Enum):
    """Model type."""

    ONNX = "onnx"
    Main = "main"
    VAE = "vae"
    LoRA = "lora"
    ControlLoRa = "control_lora"
    ControlNet = "controlnet"  # used by model_probe
    TextualInversion = "embedding"
    IPAdapter = "ip_adapter"
    CLIPVision = "clip_vision"
    CLIPEmbed = "clip_embed"
    T2IAdapter = "t2i_adapter"
    T5Encoder = "t5_encoder"
    Qwen3Encoder = "qwen3_encoder"
    SpandrelImageToImage = "spandrel_image_to_image"
    SigLIP = "siglip"
    FluxRedux = "flux_redux"
    LlavaOnevision = "llava_onevision"
    ExternalImageGenerator = "external_image_generator"
    Unknown = "unknown"


class SubModelType(str, Enum):
    """Submodel type."""

    UNet = "unet"
    Transformer = "transformer"
    TextEncoder = "text_encoder"
    TextEncoder2 = "text_encoder_2"
    TextEncoder3 = "text_encoder_3"
    Tokenizer = "tokenizer"
    Tokenizer2 = "tokenizer_2"
    Tokenizer3 = "tokenizer_3"
    VAE = "vae"
    VAEDecoder = "vae_decoder"
    VAEEncoder = "vae_encoder"
    Scheduler = "scheduler"
    SafetyChecker = "safety_checker"


class ClipVariantType(str, Enum):
    """CLIP variant type."""

    L = "large"
    G = "gigantic"


class ModelVariantType(str, Enum):
    """Variant type."""

    Normal = "normal"
    Inpaint = "inpaint"
    Depth = "depth"


class FluxVariantType(str, Enum):
    """FLUX.1 model variants."""

    Schnell = "schnell"
    Dev = "dev"
    DevFill = "dev_fill"


class Flux2VariantType(str, Enum):
    """FLUX.2 model variants."""

    Klein4B = "klein_4b"
    """FLUX.2 Klein 4B variant using the Qwen3 4B text encoder."""

    Klein9B = "klein_9b"
    """FLUX.2 Klein 9B variant using the Qwen3 8B text encoder (distilled)."""

    Klein9BBase = "klein_9b_base"
    """FLUX.2 Klein 9B Base variant - undistilled foundation model using the Qwen3 8B text encoder."""


class ZImageVariantType(str, Enum):
    """Z-Image model variants."""

    Turbo = "turbo"
    """Z-Image Turbo - distilled model optimized for 8 steps, no CFG support."""

    ZBase = "zbase"
    """Z-Image Base - undistilled foundation model with full CFG and negative prompt support."""


class QwenImageVariantType(str, Enum):
    """Qwen Image model variants."""

    Generate = "generate"
    """Qwen Image - text-to-image generation model."""

    Edit = "edit"
    """Qwen Image Edit - image editing model with reference image support."""


class Qwen3VariantType(str, Enum):
    """Qwen3 text encoder variants based on model size."""

    Qwen3_4B = "qwen3_4b"
    """Qwen3 4B text encoder (hidden_size=2560). Used by FLUX.2 Klein 4B and Z-Image."""

    Qwen3_8B = "qwen3_8b"
    """Qwen3 8B text encoder (hidden_size=4096). Used by FLUX.2 Klein 9B."""

    Qwen3_06B = "qwen3_06b"
    """Qwen3 0.6B text encoder (hidden_size=1024). Used by Anima."""


class ModelFormat(str, Enum):
    """Storage format of the model."""

    OMI = "omi"
    Diffusers = "diffusers"
    Checkpoint = "checkpoint"
    LyCORIS = "lycoris"
    ONNX = "onnx"
    Olive = "olive"
    EmbeddingFile = "embedding_file"
    EmbeddingFolder = "embedding_folder"
    InvokeAI = "invokeai"
    T5Encoder = "t5_encoder"
    Qwen3Encoder = "qwen3_encoder"
    BnbQuantizedLlmInt8b = "bnb_quantized_int8b"
    BnbQuantizednf4b = "bnb_quantized_nf4b"
    GGUFQuantized = "gguf_quantized"
    ExternalApi = "external_api"
    Unknown = "unknown"


class SchedulerPredictionType(str, Enum):
    """Scheduler prediction type."""

    Epsilon = "epsilon"
    VPrediction = "v_prediction"
    Sample = "sample"


class ModelRepoVariant(str, Enum):
    """Various Hugging Face variants on the diffusers format."""

    Default = ""  # model files without "fp16" or other qualifier
    FP16 = "fp16"
    FP32 = "fp32"
    ONNX = "onnx"
    OpenVINO = "openvino"
    Flax = "flax"


class ModelSourceType(str, Enum):
    """Model source type."""

    Path = "path"
    Url = "url"
    HFRepoID = "hf_repo_id"
    External = "external"


class FluxLoRAFormat(str, Enum):
    """FLUX LoRA formats."""

    Diffusers = "flux.diffusers"
    Kohya = "flux.kohya"
    OneTrainer = "flux.onetrainer"
    Control = "flux.control"
    AIToolkit = "flux.aitoolkit"
    XLabs = "flux.xlabs"
    BflPeft = "flux.bfl_peft"
    OneTrainerBfl = "flux.onetrainer_bfl"


AnyVariant: TypeAlias = Union[
    ModelVariantType,
    ClipVariantType,
    FluxVariantType,
    Flux2VariantType,
    ZImageVariantType,
    QwenImageVariantType,
    Qwen3VariantType,
]
variant_type_adapter = TypeAdapter[
    ModelVariantType
    | ClipVariantType
    | FluxVariantType
    | Flux2VariantType
    | ZImageVariantType
    | QwenImageVariantType
    | Qwen3VariantType
](
    ModelVariantType
    | ClipVariantType
    | FluxVariantType
    | Flux2VariantType
    | ZImageVariantType
    | QwenImageVariantType
    | Qwen3VariantType
)
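For illustration, the union-of-enums validation that variant_type_adapter provides can be sketched with a cut-down pair of enums (hypothetical "Mini" names to avoid clashing with the module above; assumes pydantic v2 is installed — the real adapter covers all seven variant enums):

```python
from enum import Enum
from typing import Union

from pydantic import TypeAdapter


class MiniModelVariantType(str, Enum):
    Normal = "normal"
    Inpaint = "inpaint"


class MiniFluxVariantType(str, Enum):
    Schnell = "schnell"
    Dev = "dev"


# Validating a raw string resolves to whichever enum member accepts the value;
# a string no enum accepts raises a pydantic ValidationError.
mini_adapter = TypeAdapter(Union[MiniModelVariantType, MiniFluxVariantType])

assert mini_adapter.validate_python("dev") is MiniFluxVariantType.Dev
assert mini_adapter.validate_python("inpaint") is MiniModelVariantType.Inpaint
```

Because the variant value strings are disjoint across the enums, the union is unambiguous and a stored config string round-trips to exactly one enum member.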