There was a really confusing aspect of the SAM pipeline classes where
they accepted deeply nested lists of different dimensions (bbox, points,
and labels).
The lengths of the lists are related; each point must have a
corresponding label, and if bboxes are provided with points, they must
be same length.
I've refactored the backend API to take a single list of SAMInput
objects. This class has a bbox and/or a list of points, making it much
simpler to provide the right shape of inputs.
Internally, the pipeline classes take rejigger these input classes to
have the correct nesting.
The Nodes still have an awkward API where you can provide both bboxes
and points of different lengths, so I added a pydantic validator that
enforces correct lenghts.
Fixes errors like `AttributeError: module 'cv2.ximgproc' has no
attribute 'thinning'` which occur because there is a conflict between
our own `opencv-contrib-python` dependency and the `invisible-watermark`
library's `opencv-python`.
- Move the estimation logic to utility functions
- Estimate memory _within_ the encode and decode methods, ensuring we
_always_ estimate working memory when running a VAE
Tell the model manager that we need some extra working memory for VAE
encoding operations to prevent OOMs.
See previous commit for investigation and determination of the magic
numbers used.
This safety measure is especially relevant now that we have FLUX Kontext
and may be encoding rather large ref images. Without the working memory
estimation we can OOM as we prepare for denoising.
See #8405 for an example of this issue on a very low VRAM system. It's
possible we can have the same issue on any GPU, though - just a matter
of hitting the right combination of models loaded.
When unsafe_disable_picklescan is enabled, instead of erroring on
detections or scan failures, a warning is logged.
A warning is also logged on app startup when this setting is enabled.
The setting is disabled by default and there is no change in behaviour
when disabled.
Implements intelligent spatial tiling that arranges multiple reference
images in a virtual canvas, choosing between horizontal and vertical
placement to maintain a square-like aspect ratio