Previously we used erode/dilate and a Gaussian blur to expand and fade the edges of Canvas masks. The implementation had a number of problems:
- Erode/dilate kernel sizes were not calculated correctly, and extra iterations were run to compensate. As a result, the blur size, which should have been specified in pixels, was very inaccurate and unreliable.
- What we want is to add a "soft bleed" - like a drop shadow with no offset - starting from the edge of the mask, extending out by however many pixels. But Gaussian blur does not do this. The blurred area starts _inside_ the mask and extends outside it. So it kinda blurs inwards and outwards. We compensated for this by expanding the mask.
- Using a Gaussian blur can cause banding artifacts. Gaussian blur doesn't have a "size" or "radius" parameter in the sense you might expect. It's a convolution, and there are _no truly zero values in the result_. This means that, far away from the mask, once compositing completes, we have some values that are very close to zero but not quite zero. These values are quantized by HTML Canvas, resulting in banding artifacts where you'd expect the blur to have faded to 0% alpha. At least, that is my understanding of why the banding artifacts occur.
The new node uses a better strategy to expand the mask and add the fade-out effect (a minimal sketch follows the list):
- Calculate the distance from each white pixel to the nearest black pixel.
- Normalize this distance by dividing by the fade size in px, then clip the values to 0 - 1. The result represents the distance of each white pixel to its nearest black pixel as a percentage of the fade size. At this point, it is a linear distribution.
- Create a polynomial to describe the fade's intensity so that we can have a smooth transition from the masked region (black) to unmasked (white). There are some magic numbers here, determined experimentally.
- Evaluate the polynomial over the normalized distances, so we now have a matrix representing the fade intensity for every pixel.
- Convert this matrix back to uint8 and apply it to the mask.
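Here's a rough sketch of that pipeline, assuming an OpenCV distance transform and a smoothstep-style polynomial as a stand-in for the experimentally determined one. The function name and coefficients are illustrative, not the node's actual implementation.
```py
import cv2
import numpy as np


def expand_mask_with_fade(mask: np.ndarray, fade_size_px: int) -> np.ndarray:
    """Add a soft fade to a single-channel uint8 mask (0 = masked/black, 255 = unmasked/white).

    Hypothetical sketch only - names and polynomial coefficients are placeholders.
    """
    if fade_size_px <= 0:
        return mask

    # Distance from each white pixel to the nearest black pixel.
    # distanceTransform measures the distance to the nearest zero pixel.
    binary = (mask > 127).astype(np.uint8)
    dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)

    # Normalize by the fade size and clip to 0..1 (linear falloff).
    d = np.clip(dist / fade_size_px, 0.0, 1.0)

    # Smooth the linear falloff with a polynomial easing curve
    # (smoothstep here as a placeholder for the experimentally tuned polynomial).
    fade = 3 * d**2 - 2 * d**3

    # Back to uint8, then apply to the original mask.
    return np.minimum(mask, (fade * 255).astype(np.uint8))
```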
This works soooo much better than the previous method. Not only does it fix the banding issues, but when we enable "output only generated regions", we get a much smaller image. Will add images to the PR to clarify.
In #7688 we optimized queuing preparation logic. This inadvertently broke retrying queue items.
Previously, a `NamedTuple` was used to store the values to insert in the DB when enqueuing. This handy class provides an API similar to a dataclass, where you can instantiate it with kwargs in any order. The resultant tuple re-orders the kwargs to match the order in the class definition.
For example, consider this `NamedTuple`:
```py
from typing import NamedTuple

class SessionQueueValueToInsert(NamedTuple):
    foo: str
    bar: str
```
When instantiating it, no matter the order of the kwargs, if you make a normal tuple out of it, the tuple values are in the same order as in the class definition:
```py
t1 = SessionQueueValueToInsert(foo="foo", bar="bar")
print(tuple(t1)) # -> ('foo', 'bar')
t2 = SessionQueueValueToInsert(bar="bar", foo="foo")
print(tuple(t2)) # -> ('foo', 'bar')
```
So, in the old code, when we used the `NamedTuple`, it implicitly normalized the order of the values we insert into the DB.
In the retry logic, the values of the tuple were not ordered correctly, but the use of `NamedTuple` had secretly fixed the order for us.
In the linked PR, `NamedTuple` was dropped for a normal tuple, after profiling showed `NamedTuple` to be meaningfully slower than a normal tuple.
The implicit order normalization behaviour wasn't understood, and the order wasn't fixed when changing the retry logic to use a normal tuple instead of `NamedTuple`. This resulted in a bug where we incorrectly create queue items in the DB. For example, we stored the `destination` in the `field_values` column.
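A minimal sketch of the difference, using illustrative field names rather than the real session queue columns:
```py
from typing import NamedTuple


# Illustrative only - not the real session queue schema or column set.
class ValueToInsert(NamedTuple):
    destination: str | None
    field_values: str | None


# NamedTuple: kwargs in any order are re-ordered to match the class definition,
# so the DB insert always received its values in column order.
good = tuple(ValueToInsert(field_values=None, destination="gallery"))
print(good)  # ('gallery', None)

# Plain tuple: position is all that matters. The retry path built the tuple in
# the wrong order, so e.g. `destination` ended up in the `field_values` column.
bad = (None, "gallery")
print(bad)  # (None, 'gallery')
```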
When such an incorrectly-created queue item is dequeued, it fails pydantic validation and causes what appears to be an endless loop of errors.
The only user-facing solution is to add this line to `invokeai.yaml` and restart the app:
```yaml
clear_queue_on_startup: true
```
On next startup, the queue is forcibly cleared before the error loop is triggered. Then the user should remove this line so their queue is persisted across app launches as usual.
The fix is simple: correct the ordering of the tuple. I also added a type annotation and a comment to the tuple type alias definition.
Note: The endless error loop, as a general problem, will take some thinking to fix. The queue service methods to cancel and fail a queue item still retrieve it and parse it, and the list queue items methods parse the queue items. It's a bit of a catch-22; maybe the solution is to simply delete totally borked queue items and log an error.
- Replace `get_counts` method with `get_tag_counts_with_filter` which gets the counts for a list of tags, filtering by a list of selected tags
- Update `get_many` logic to apply tag filtering with AND logic, to match the new `get_tag_counts_with_filter` method (see the sketch after this list)
- Update workflow library router
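For illustration, here is one common way to express AND-style tag filtering in SQL. The table and column names are hypothetical and not the actual workflow library schema; the real queries may look quite different.
```py
def build_and_filter_query(selected_tags: list[str]) -> tuple[str, list]:
    """Return a query matching only workflows that have *every* selected tag.

    Hypothetical sketch - assumes a many-to-many `workflow_tags` table.
    """
    placeholders = ", ".join("?" for _ in selected_tags)
    query = f"""
        SELECT w.workflow_id
        FROM workflows w
        JOIN workflow_tags t ON t.workflow_id = w.workflow_id
        WHERE t.tag IN ({placeholders})
        GROUP BY w.workflow_id
        -- AND semantics: the workflow must match as many distinct tags as were selected.
        HAVING COUNT(DISTINCT t.tag) = ?;
    """
    return query, [*selected_tags, len(selected_tags)]
```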
This follows the same pattern for IP Adapter w/ its CLIP Vision model. The SigLIP model is unlikely to ever change and we don't want to force the user to select it anywhere. Hardcoding it is safe and makes the UX much nicer.
The alternative is a model dropdown that will likely only ever have one valid choice in it.
- We don't need to copy the init file. Just crawl the custom nodes dir for modules and import them all. Dunno why I didn't do this initially.
- Pass the logger in as an arg (see the sketch after this list). There was a race condition: if we got the logger directly inside the `load_custom_nodes` function, the config might not have been fully loaded yet, and we'd end up with the wrong custom nodes path!
- Remove permissions-setting logic; I do not believe it is relevant for custom nodes
- Minor cleanup of the utility
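A rough sketch of the crawl-and-import approach with the logger passed in; the real function's name, signature, and error handling may differ.
```py
import importlib
import pkgutil
import sys
from logging import Logger
from pathlib import Path


def load_custom_nodes(custom_nodes_path: Path, logger: Logger) -> None:
    """Import every module found in the custom nodes directory.

    Hypothetical sketch. The logger is passed in as an argument so we never
    grab it before the app config has finished loading (which previously
    could yield the wrong custom nodes path).
    """
    # Make the custom nodes dir importable, then import each module in it.
    sys.path.insert(0, str(custom_nodes_path))
    for module_info in pkgutil.iter_modules([str(custom_nodes_path)]):
        try:
            importlib.import_module(module_info.name)
            logger.info(f"Loaded custom node module: {module_info.name}")
        except Exception:
            logger.exception(f"Failed to load custom node module: {module_info.name}")
```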
There's a pydantic thing that causes the graphs to fail validation erroneously. Details in the comments - not a high priority to fix but we should figure it out someday.
This method simply sets the `opened_at` attribute to the current time.
Previously, `opened_at` was set when calling `get`, but that is not correct: we `get` workflows often, even when not opening them. So this needs to be a separate method.
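A minimal sketch of what such a method might look like, assuming a SQLite-backed store; the table and column names are illustrative, not the actual schema.
```py
import sqlite3
from datetime import datetime, timezone


def update_opened_at(conn: sqlite3.Connection, workflow_id: str) -> None:
    # Set opened_at to the current time for a single workflow.
    conn.execute(
        "UPDATE workflow_library SET opened_at = ? WHERE workflow_id = ?;",
        (datetime.now(timezone.utc).isoformat(), workflow_id),
    )
    conn.commit()
```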