InvokeAI

mirror of https://github.com/invoke-ai/InvokeAI.git synced 2026-01-30 16:38:02 -05:00

Author	SHA1	Message	Date
psychedelicious	203fa04295	feat(nodes): support bottleneck flag for nodes	2025-05-13 11:56:40 +10:00
psychedelicious	1e85184c62	feat(nodes): add imagen3/chatgpt-4o field types	2025-05-06 09:07:52 -04:00
psychedelicious	cc54466db9	fix(nodes): default value for UIConfigBase.tags	2025-04-28 13:31:26 -04:00
psychedelicious	cbdafe7e38	feat(nodes): allow node clobbering	2025-04-28 13:31:26 -04:00
psychedelicious	8ed5585285	feat(nodes): move output metadata to BaseInvocationOutput	2025-04-28 09:19:43 -04:00
Mary Hipp	4a0df6b865	add optional output_metadata to baseinvocation	2025-04-28 09:19:43 -04:00
psychedelicious	814406d98a	feat(mm): siglip model loading supports partial loading In the previous commit, the LLaVA model was updated to support partial loading. In this commit, the SigLIP model is updated in the same way. This model is used for FLUX Redux. It's <4GB and only ever run in isolation, so it won't benefit from partial loading for the vast majority of users. Regardless, I think it is best if we make _all_ models work with partial loading. PS: I also fixed the initial load dtype issue, described in the prev commit. It's probably a non-issue for this model, but we may as well fix it.	2025-04-18 10:12:03 +10:00
psychedelicious	c054501103	feat(mm): llava model loading supports partial loading; fix OOM crash on initial load The model manager has two types of model cache entries: - `CachedModelOnlyFullLoad`: The model may only ever be loaded and unloaded as a single object. - `CachedModelWithPartialLoad`: The model may be partially loaded and unloaded. Partial loading is enabled by overwriting certain torch layer classes, adding the ability to autocast the layer to a device on-the-fly. See `CustomLinear` for an example. So, to take advantage of partial loading and be cached as a `CachedModelWithPartialLoad`, the model must inherit from `torch.nn.Module`. The LLaVA classes provided by `transformers` do inherit from `torch.nn.Module`, but we wrap those classes in a separate class called `LlavaOnevisionModel`. The wrapper encapsulates both the LLaVA model and its "processor" - a lightweight class that prepares model inputs like text and images. While it is more elegant to encapsulate both model and processor classes in a single entity, this prevents the model cache from enabling partial loading for the chunky vLLM model. Fixing this involved a few changes. - Update the `LlavaOnevisionModelLoader` class to operate on the vLLM model directly, instead of the `LlavaOnevisionModel` wrapper class. - Instantiate the processor directly in the node. The processor is lightweight and does its business on the CPU. We don't need to worry about caching in the model manager. - Remove caching support code from the `LlavaOnevisionModel` wrapper class. It's not needed, because we do not cache this class. The class now only handles running the models provided to it. - Rename `LlavaOnevisionModel` to `LlavaOnevisionPipeline` to better represent its purpose. These changes have a bonus effect of fixing an OOM crash when initially loading the models. This was most apparent when loading LLaVA 7B, which is pretty chunky. The initial load is onto CPU RAM. 
In the old version of the loaders, we ignored the loader's target dtype for the initial load. Instead, we loaded the model at `transformers`'s "default" dtype of fp32. LLaVA 7B is fp16 and weighs ~17GB. Loading as fp32 means we need double that amount (~34GB) of CPU RAM. Many users only have 32GB RAM, so this causes a _CPU_ OOM - which is a hard crash of the whole process. With the updated loaders, the initial load logic now uses the target dtype for the initial load. LLaVA now needs the expected ~17GB RAM for its initial load. PS: If we didn't make the accompanying partial loading changes, we still could have solved this OOM. We'd just need to pass the initial load dtype to the wrapper class and have it load on that dtype. But we may as well fix both issues. PPS: There are other models whose model classes are wrappers around a torch module class, and thus cannot be partially loaded. However, these models are typically fairly small and/or are run only on their own, so they don't benefit as much from partial loading. It's the really big models (like LLaVA 7B) that benefit most from the partial loading.	2025-04-18 10:12:03 +10:00
skunkworxdark	566282bff0	Update metadata_linked.py added metadata_to_string_collection, metadata_to_integer_collection, metadata_to_float_collection, metadata_to_bool_collection	2025-04-16 06:28:22 +10:00
psychedelicious	a5bc21cf50	feat(nodes): extract LaMa model url to constant	2025-04-15 07:13:25 +10:00
psychedelicious	ae8d1f26d6	fix(app): import CogView4Transformer2DModel from the module that exports it	2025-04-10 10:50:13 +10:00
psychedelicious	ad582c8cc5	feat(nodes): rename CogView4 nodes to match naming format	2025-04-10 10:50:13 +10:00
maryhipp	305c5761d0	add generation modes for cogview linear	2025-04-10 10:50:13 +10:00
Ryan Dick	d86cd66994	Add CogView4 VAE approximation for progress images.	2025-04-10 10:50:13 +10:00
Ryan Dick	13850271ab	Add inpainting to CogView4DenoiseInvocation.	2025-04-10 10:50:13 +10:00
Ryan Dick	7e894ffe83	Consolidate InpaintExtension implementations for SD3 and FLUX.	2025-04-10 10:50:13 +10:00
Ryan Dick	0939030324	Support cfg_scale list in CogView4Denoise.	2025-04-10 10:50:13 +10:00
Ryan Dick	30f19dc37a	Update CogView4Denoise to support image-to-image.	2025-04-10 10:50:13 +10:00
Ryan Dick	ace5e748f4	Simplify CogView4 timesteps schedule generation in preparation for timestep schedule slipping.	2025-04-10 10:50:13 +10:00
Ryan Dick	4fae8ad163	Add CogView4ImageToLatentsInvocation.	2025-04-10 10:50:13 +10:00
Ryan Dick	5e75bc570a	Fix bug in CogView4 noise schedule handling that was resulting in low-quality images.	2025-04-10 10:50:13 +10:00
Ryan Dick	3166b5d2ea	Switch to sequential CFG for CogView4 (for now, until I sort out the padding).	2025-04-10 10:50:13 +10:00
Ryan Dick	321c2d358c	Add CogView4 model loader. And various other fixes to get a CogView4 workflow running (though quality is still below expectations).	2025-04-10 10:50:13 +10:00
Ryan Dick	cf76a0b575	Add CogView4ModelLoaderInvocation. (Not wired up with frontend yet.)	2025-04-10 10:50:13 +10:00
Ryan Dick	67bfd63c73	Require the cogview4 height/width are multiples of 32. This requirement is documented here: https://huggingface.co/THUDM/CogView4-6B . I haven't tracked down the underlying source of this requirement.	2025-04-10 10:50:13 +10:00
Ryan Dick	cdad8a4fd1	Add CogView4LatentsToImageInvocation.	2025-04-10 10:50:13 +10:00
Ryan Dick	5d9797945b	Completed first pass of CogView4Denoise.	2025-04-10 10:50:13 +10:00
Ryan Dick	78159c3200	Simplify CogView4 timestep schedule initialization.	2025-04-10 10:50:13 +10:00
Ryan Dick	1320c4fa13	WIP - CogView4DenoiseInvocation.	2025-04-10 10:50:13 +10:00
Ryan Dick	bac05a7885	Add CogView4TextEncoderInvocation	2025-04-10 10:50:13 +10:00
psychedelicious	49622c37ed	fix(nodes): logic bug in flux redux node	2025-04-08 10:33:45 +10:00
skunkworxdark	e1538af219	Update flux_redux.py Add down sampling and weight to redux node	2025-04-08 10:33:45 +10:00
psychedelicious	8d3743c6f2	tidy(nodes): rename `controlnet_image_processors.py` -> `controlnet.py`	2025-04-04 18:42:13 +11:00
psychedelicious	986b7426d2	tidy(nodes): remove unused old dw openpose detector class	2025-04-04 18:42:13 +11:00
psychedelicious	8d8150b47e	tidy(nodes): remove deprecated controlnet "processor" nodes	2025-04-04 18:42:13 +11:00
psychedelicious	595133463e	feat(nodes): add methods to invalidate invocation typeadapters	2025-03-31 19:15:59 +11:00
psychedelicious	6155f9ff9e	feat(nodes): move invocation/output registration to separate class	2025-03-31 19:15:59 +11:00
psychedelicious	7be87c8048	refactor(nodes): simpler logic for baseinvocation typeadapter handling	2025-03-31 19:15:59 +11:00
psychedelicious	4109ea5324	fix(nodes): expanded masks not 100% transparent outside the fade out region The polynomial fit isn't perfect and we end up with alpha values of 1 instead of 0 when applying the mask. This in turn causes issues on canvas where outputs aren't 100% transparent and individual layer bbox calculations are incorrect.	2025-03-31 11:17:00 +11:00
psychedelicious	258bf736da	fix(nodes): handle zero fade size (e.g. mask blur 0) Closes #7850	2025-03-28 08:14:06 +11:00
psychedelicious	9ca071819b	chore(nodes): remove beta/prototype flag from a lot of stable nodes	2025-03-27 08:08:44 +11:00
psychedelicious	b14d8e8192	chore(nodes): mark llava_onevision_vllm as beta	2025-03-27 08:08:44 +11:00
Billy	182580ff69	Imports	2025-03-26 12:55:10 +11:00
psychedelicious	5127a07cf9	feat(nodes): clean up lora node names I had named them wonkily and caused some user confusion.	2025-03-24 12:45:46 +11:00
psychedelicious	c013a6e38d	feat(nodes): deprecate canvas_v2_mask_and_crop	2025-03-21 10:24:03 +11:00
psychedelicious	6cfeb71bed	feat(nodes): add expand_mask_with_fade to better handle canvas compositing needs Previously we used erode/dilate and a Gaussian blur to expand and fade the edges of Canvas masks. The implementation had a number of problems: - Erode/dilate kernel sizes were not calculated correctly, and extra iterations were run to compensate. The result is the blur size, which should have been pixels, was very inaccurate and unreliable. - What we want is to add a "soft bleed" - like a drop shadow with no offset - starting from the edge of the mask, extending out by however many pixels. But Gaussian blur does not do this. The blurred area starts _inside_ the mask and extends outside it. So it kinda blurs inwards and outwards. We compensated for this by expanding the mask. - Using a Gaussian blur can cause banding artifacts. Gaussian blur doesn't have a "size" or "radius" parameter in the sense that you think it should. It's a convolution matrix and there are _no non-zero values in the result_. This means that, far away from the mask, once compositing completes, we have some values that are very close to zero but not quite zero. These values are quantized by HTML Canvas, resulting in banding artifacts where you'd expect the blur to have faded to 0% alpha. At least, that is my understanding of why the banding artifacts occur. The new node uses a better strategy to expand the mask and add the fade out effect: - Calculate the distance from each white pixel to the nearest black pixel. - Normalize this distance by dividing by the fade size in px, then clip the values to 0 - 1. The result represents the distance of each white pixel to its nearest black pixel as a percentage of the fade size. At this point, it is a linear distribution. - Create a polynomial to describe the fade's intensity so that we can have a smooth transition from the masked region (black) to unmasked (white). There are some magic numbers here, determined experimentally. 
- Evaluate the polynomial over the normalized distances, so we now have a matrix representing the fade intensity for every pixel. - Convert this matrix back to uint8 and apply it to the mask. This works soooo much better than the previous method. Not only does it fix the banding issues, but when we enable "output only generated regions", we get a much smaller image. Will add images to the PR to clarify.	2025-03-21 10:24:03 +11:00
psychedelicious	534f993023	feat(nodes): add `apply_mask_to_image` node It simply applies the mask to an image.	2025-03-21 10:24:03 +11:00
psychedelicious	67f9b6420c	fix(nodes): ensure alpha mask is opened as RGBA	2025-03-21 10:24:03 +11:00
psychedelicious	61bf065237	feat(nodes): rename "FLUX Fill" -> "FLUX Fill Conditioning"	2025-03-21 10:24:03 +11:00
Ryan Dick	9cc2232b6f	Bump FluxDenoise invocation version and typegen.	2025-03-19 14:45:18 +11:00
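
The fade strategy described in commit 6cfeb71bed (distance to the nearest black pixel, normalize by fade size, clip to 0 - 1, evaluate a polynomial, convert back to 8-bit) can be sketched roughly as follows. This is an illustration under assumptions, not the node's actual code: the function names are hypothetical, the distance is computed by brute force in pure Python, and the smoothstep polynomial stands in for the commit's experimentally determined coefficients, which are not given in the log. The early return for a zero fade size mirrors the case mentioned in commit 258bf736da.

```python
import math


def smoothstep(t: float) -> float:
    # Stand-in polynomial (assumption): 3t^2 - 2t^3. The real node uses
    # experimentally fitted "magic numbers" that the commit log does not list.
    return t * t * (3.0 - 2.0 * t)


def expand_mask_with_fade_sketch(mask, fade_size_px):
    """mask is a list of rows holding 0 (black, masked) or 255 (white, unmasked)."""
    # A zero (or negative) fade size means there is nothing to fade.
    if fade_size_px <= 0:
        return [row[:] for row in mask]

    # Coordinates of every black pixel; the fade starts at their edge.
    black = [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v == 0]
    if not black:
        return [row[:] for row in mask]

    out = []
    for y, row in enumerate(mask):
        out_row = []
        for x, value in enumerate(row):
            if value == 0:
                out_row.append(0)
                continue
            # Distance from this white pixel to the nearest black pixel.
            dist = min(math.hypot(y - by, x - bx) for by, bx in black)
            # Normalize by the fade size and clip to 0 - 1.
            t = min(max(dist / fade_size_px, 0.0), 1.0)
            # Evaluate the polynomial and convert back to an 8-bit value.
            out_row.append(int(round(255 * smoothstep(t))))
        out.append(out_row)
    return out


# One-row example: a single black pixel followed by white pixels.
mask = [[0, 255, 255, 255, 255, 255]]
faded = expand_mask_with_fade_sketch(mask, 4)
```

Unlike a Gaussian blur, every pixel at least `fade_size_px` away from the black region ends up exactly 255 here, so there are no near-boundary values left over to be quantized into bands.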


1521 Commits