This allows tag invalidation to trigger refetches while mutations are still executing, resolving an issue in this situation:
- A long-running mutation starts.
- A tag is invalidated; for example, the user edits a board name, and the boards list query tag is invalidated.
- Because the long-running mutation is still in flight, the boards list query isn't refetched, and the board name isn't updated.
- The long-running mutation finishes.
- Finally, the boards list query fires and the board name is updated.
This is the "delayed" behaviour. The "immediately" behaviour has the fires requests from tag invalidation immediately, without waiting for all mutations to finish.
It may cause extra network requests and stale data if we are mutating a lot of things very quickly. I don't think it will be an issue in practice and the improved responsiveness will be a net benefit.
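This corresponds to RTK Query's `invalidationBehavior` option on `createApi` (assuming that is the mechanism in play). A minimal sketch, with illustrative endpoint names and URLs rather than the app's actual API slice:

```ts
import { createApi, fetchBaseQuery } from '@reduxjs/toolkit/query/react';

export const api = createApi({
  baseQuery: fetchBaseQuery({ baseUrl: '/api/v1/' }),
  tagTypes: ['BoardList'],
  // 'delayed' (the default) holds refetches triggered by tag invalidation until
  // every in-flight mutation has settled; 'immediately' fires them right away.
  invalidationBehavior: 'immediately',
  endpoints: (build) => ({
    listBoards: build.query<unknown, void>({
      query: () => 'boards/',
      providesTags: ['BoardList'],
    }),
    updateBoard: build.mutation<unknown, { board_id: string; board_name: string }>({
      query: ({ board_id, board_name }) => ({
        url: `boards/${board_id}`,
        method: 'PATCH',
        body: { board_name },
      }),
      invalidatesTags: ['BoardList'],
    }),
  }),
});
```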
Rely on WAL mode and the busy timeout.
Also changed:
- Remove extraneous rollbacks when we were only doing a `SELECT`
- Remove try/catch blocks that became redundant once those rollbacks were removed
This allows for read and write concurrency without using a global mutex. Operations may still fail if they wait longer than the busy timeout (5s) for the database lock.
If we get a database lock error after waiting 5s for an operation, we have a problem. So, I think it's actually better to use a busy timeout instead of a global mutex.
Alternatively, we could add a timeout to the global mutex.
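For illustration, the gist of the connection setup; this sketch uses the better-sqlite3 TypeScript driver and a hypothetical `boards` table rather than the app's actual storage layer, but the pragmas are the relevant part:

```ts
import Database from 'better-sqlite3';

// Each connection opts into WAL and sets a busy timeout. WAL lets readers run
// concurrently with a single writer; busy_timeout makes a connection wait (up
// to 5s here) for the write lock instead of failing immediately.
const openConnection = (dbPath: string) => {
  const db = new Database(dbPath);
  db.pragma('journal_mode = WAL');
  db.pragma('busy_timeout = 5000');
  return db;
};

const db = openConnection('app.db');
db.exec('CREATE TABLE IF NOT EXISTS boards (board_id TEXT PRIMARY KEY, board_name TEXT)');

// A plain SELECT needs no explicit transaction, so there is nothing to roll
// back and no try/catch wrapper just for that.
const boards = db.prepare('SELECT board_id, board_name FROM boards').all();

// Writes still use transactions; if the write lock is held by another
// connection for longer than the busy timeout, this throws a "database is
// locked" error.
const renameBoard = db.transaction((id: string, name: string) => {
  db.prepare('UPDATE boards SET board_name = ? WHERE board_id = ?').run(name, id);
});
renameBoard('some-board-id', 'New name');
```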
Fixes an issue where fields like control weight on ControlNet nodes and image on IP Adapter nodes didn't render.
These are "single or collection" fields. They accept a single input object, or collection. They are supposed to render the UI input for a single object.
In a7a71ca935 a performance optimisation for a hot code-path inadvertently broke this.
The determination of which UI component to render for a given field is done using a type guard function for the field's template. Previously, these type guards used a zod schema to parse the template, which was very slow, especially when the template was not the expected type.
The optimisation changed the type guards to check the field's type name (integer, image, etc.) and its cardinality directly, without any zod parsing.
This is much faster, but it subtly changed the behaviour by being a bit stricter: for some fields, it rejected the "single or collection" cardinality when it should have accepted it.
When these fields - like the aforementioned Control Weight and Image - were being rendered, none of the type guards passed and they rendered nothing.
The fix here updates the type guard functions to support multiple cardinalities. So now, when we go to render a "single or collection" field, we will render the "single" input component as it should be.
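A simplified sketch of the shape of the fix; the template shape, type names, and cardinality values here are illustrative, not the app's actual field template types:

```ts
// Illustrative template shape; the real field templates carry more information.
type Cardinality = 'SINGLE' | 'COLLECTION' | 'SINGLE_OR_COLLECTION';

interface FieldTemplate {
  type: { name: string; cardinality: Cardinality };
}

// Before: the guard only accepted the exact SINGLE cardinality, so
// SINGLE_OR_COLLECTION fields fell through every guard and rendered nothing.
const isSingleImageFieldBefore = (t: FieldTemplate): boolean =>
  t.type.name === 'ImageField' && t.type.cardinality === 'SINGLE';

// After: the guard accepts every cardinality that should get the "single"
// input component.
const SINGLE_LIKE: Cardinality[] = ['SINGLE', 'SINGLE_OR_COLLECTION'];

const isSingleImageField = (t: FieldTemplate): boolean =>
  t.type.name === 'ImageField' && SINGLE_LIKE.includes(t.type.cardinality);
```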
## Summary
This PR adds a `pytorch_cuda_alloc_conf` config flag to control the
torch memory allocator behavior.
- `pytorch_cuda_alloc_conf` defaults to `None`, preserving the current
behavior.
- The configuration options are explained here:
https://pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf.
Tuning this configuration can reduce peak reserved VRAM and improve
performance.
- Setting `pytorch_cuda_alloc_conf: "backend:cudaMallocAsync"` in
`invokeai.yaml` is expected to work well on many systems. This is a good
first step for those looking to tune this config. (We may make this the
default in the future.)
- The optimal configuration seems to be dependent on a number of factors
such as device version, VRAM, CUDA kernel version, etc. For now, users
will have to experiment with this config to see if it hurts or helps on
their systems. In most cases, I expect it to help.
### Memory Tests
```
VAE decode memory usage comparison:
- SDXL, fp16, 1024x1024:
- `cudaMallocAsync`: allocated=2593 MB, reserved=3200 MB
- `native`: allocated=2595 MB, reserved=4418 MB
- SDXL, fp32, 1024x1024:
- `cudaMallocAsync`: allocated=3982 MB, reserved=5536 MB
- `native`: allocated=3982 MB, reserved=7276 MB
- SDXL, fp32, 1536x1536:
- `cudaMallocAsync`: allocated=8643 MB, reserved=12032 MB
- `native`: allocated=8643 MB, reserved=15900 MB
```
## Related Issues / Discussions
N/A
## QA Instructions
- [x] Performance tests with `pytorch_cuda_alloc_conf` unset.
- [x] Performance tests with `pytorch_cuda_alloc_conf:
"backend:cudaMallocAsync"`.
## Merge Plan
- [x] Merge #7668 first and change target branch to `main`
## Checklist
- [x] _The PR has a short but descriptive title, suitable for a
changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_
## Summary
Prior to this PR, most of the app setup was being done in `api_app.py`
at import time. This PR cleans this up by:
- Splitting app setup into more modular functions
- Narrower responsibility for the `api_app.py` file - it just
initializes the `FastAPI` app
The main motivation for these changes is to make it easier to support an
upcoming torch configuration feature that requires more careful ordering
of app initialization steps.
## Related Issues / Discussions
N/A
## QA Instructions
- [x] Launch the app via invokeai-web.py and smoke test it.
- [ ] Launch the app via the installer and smoke test it.
- [x] Test that generate_openapi_schema.py produces the same result
before and after the change.
- [x] No regression in unit tests that directly interact with the app.
(test_images.py)
## Merge Plan
- [x] Check to see if there are any commercial implications to modifying
the app entrypoint.
## Checklist
- [x] _The PR has a short but descriptive title, suitable for a
changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_
On the Canvas tab, when we made the network request to enqueue a batch, we were immediately resetting the request. This effectively disabled RTKQ's tracking of the request - including the loading state.
As a result, when you clicked the Invoke button on the Canvas tab, it didn't show a spinner, and it was not clear that anything was happening.
The solution is simple - just await the enqueue request before resetting the tracking, the same as we already do on the workflows and upscaling tabs.
I also added some extra logging messages for enqueuing, so we get the same JS console logs for each tab on success or failure.
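Roughly, the pattern looks like the sketch below; the import paths, endpoint name, and logger are assumptions standing in for the app's real modules:

```ts
import { queueApi } from 'services/api/endpoints/queue'; // assumed API slice module
import { logger } from 'app/logging/logger';              // assumed logger module
import type { AppDispatch } from 'app/store/store';       // assumed store types

const log = logger('queue');

type EnqueueArg = Parameters<typeof queueApi.endpoints.enqueueBatch.initiate>[0];

// Hypothetical helper showing the pattern: await the enqueue request, log the
// outcome, and only reset RTKQ's tracking afterwards, so the loading state (and
// the Invoke button's spinner) stays accurate while the request is in flight.
export const enqueueBatch = async (dispatch: AppDispatch, arg: EnqueueArg) => {
  const req = dispatch(queueApi.endpoints.enqueueBatch.initiate(arg));
  try {
    const enqueueResult = await req.unwrap();
    log.debug({ enqueueResult }, 'Batch enqueued');
    return enqueueResult;
  } catch (error) {
    log.error({ error }, 'Failed to enqueue batch');
    throw error;
  } finally {
    // Before the fix, reset() was called immediately after initiate(), which
    // discarded the request's tracking (and its loading state).
    req.reset();
  }
};
```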