This feature added a lot of unexpected complexity in graph building /
metadata recall and made for an unintuitive user experience. 99% of the
time, the style prompt should be exactly the main prompt.
You can still use style prompts in workflows, but in an effort to reduce
complexity in the linear UI, we are removing this rarely-used feature.
When installing a model, the previous logic would gracefully increment
a suffix on the destination path until it found a free path for the
model.
But because model file installation and record creation are not wrapped
in a transaction, we could end up moving the file successfully and then
failing to create the record (sketched below):
- User attempts to install an already-installed model
- Attempt to move the downloaded model from download tempdir to
destination path
- The path already exists
- Add `_1` or similar to the path until we find a path that is free
- Move the model
- Create the model record
- FK constraint violation, because we already have a model with that
  name, but the model file has already been moved into the invokeai dir.
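A minimal sketch of the failure mode, assuming hypothetical helpers
`find_free_path`, `install_model`, and `create_record` (the real
installer differs):

```python
from pathlib import Path

def find_free_path(dest: Path) -> Path:
    """Append _1, _2, ... to the destination until the path is unused."""
    candidate = dest
    counter = 1
    while candidate.exists():
        candidate = dest.with_name(f"{dest.stem}_{counter}{dest.suffix}")
        counter += 1
    return candidate

def install_model(tmp_path: Path, dest: Path, create_record) -> None:
    # The file move and the record creation are two separate steps with
    # no surrounding transaction.
    final_path = find_free_path(dest)
    tmp_path.rename(final_path)  # step 1: the move succeeds
    create_record(final_path)    # step 2: FK violation leaves the moved
                                 # file orphaned in the invokeai dir
```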
Closes #8416
Prevents a large spike in VRAM when preparing to denoise w/ multiple ref
images.
There doesn't appear to be any difference in image quality / ref
adherence when concatenating in latent space vs image space, though the
images _are_ different.
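A rough sketch of the idea; `vae_encode` and the concatenation axis are
placeholders, not the actual implementation:

```python
import torch

def encode_refs_latent_concat(vae_encode, ref_images: list[torch.Tensor]) -> torch.Tensor:
    # Encode each reference image on its own, then concatenate the much
    # smaller latents. Peak VRAM stays near a single image's encode.
    latents = [vae_encode(img) for img in ref_images]
    return torch.cat(latents, dim=-1)

def encode_refs_image_concat(vae_encode, ref_images: list[torch.Tensor]) -> torch.Tensor:
    # Image-space concatenation (the source of the spike): build one large
    # image tensor first, then run a single, much bigger VAE encode.
    return vae_encode(torch.cat(ref_images, dim=-1))
```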
If the transformer fills up VRAM, then when we VAE encode kontext
latents, we'll need to first offload the transformer (partially, if
partial loading is enabled).
No need to do this - we can encode kontext latents before loading the
transformer to reduce model thrashing.
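Illustrative ordering only; `model_manager`, `vae_encode`, and `denoise`
are placeholder names rather than the project's API:

```python
def prepare_and_denoise(model_manager, vae_encode, denoise, ref_images, transformer_key):
    # Encode the kontext reference latents first, while the transformer is
    # not yet resident, so nothing has to be (partially) offloaded for it.
    kontext_latents = vae_encode(ref_images)

    # Then load the transformer once; it can stay resident through denoising.
    transformer = model_manager.load(transformer_key)
    return denoise(transformer, kontext_latents)
```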
Tell the model manager that we need some extra working memory for VAE
encoding operations to prevent OOMs.
See previous commit for investigation and determination of the magic
numbers used.
This safety measure is especially relevant now that we have FLUX Kontext
and may be encoding rather large ref images. Without the working memory
estimation we can OOM as we prepare for denoising.
See #8405 for an example of this issue on a very low VRAM system. It's
possible we can have the same issue on any GPU, though - just a matter
of hitting the right combination of models loaded.
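A hedged sketch of the kind of estimate involved; the multiplier below
is an illustrative placeholder, not one of the measured magic numbers:

```python
def estimate_vae_encode_working_memory_bytes(
    height: int, width: int, channels: int = 3, dtype_bytes: int = 2
) -> int:
    # Working memory for a VAE encode scales with the input image size;
    # intermediate activations dominate, hence the per-element multiplier.
    element_count = height * width * channels
    activation_multiplier = 10  # placeholder value for illustration
    return element_count * dtype_bytes * activation_multiplier
```

The estimate would be passed to the model manager when loading the VAE
so it can free enough VRAM (offloading other models if needed) before
the encode runs.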
This commit includes a task delegated to Claude to investigate our VAE
working memory calculations, along with the results of that
investigation.
See VAE_INVESTIGATION.md for motivation and detail. Everything else is
Claude's output.
Result data includes empirical measurements for all supported model
architectures at a variety of resolutions and fp16/fp32 precision.
Testing conducted on a 4090.
The summarized conclusion is that our working memory estimations for
decoding are spot-on, but encoding also needs some extra working memory.
Empirical measurements suggest roughly 45% of the amount needed for
decoding.
A followup commit will implement working memory estimations for VAE
encoding with the goal of preventing unexpected OOMs during encode.
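For illustration, assuming that ~45% relationship holds, the encode
estimate could be derived from the existing decode estimate (names
hypothetical):

```python
VAE_ENCODE_TO_DECODE_RATIO = 0.45  # from the empirical measurements above

def estimate_encode_working_memory(decode_estimate_bytes: int) -> int:
    # Reuse the well-calibrated decode estimate and scale it for encoding.
    return int(decode_estimate_bytes * VAE_ENCODE_TO_DECODE_RATIO)
```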