65 Commits

Author SHA1 Message Date
Mary Hipp
5885db4ab5 ruff 2025-09-19 11:07:36 -04:00
Mary Hipp
36ed9b750d restore list_queue_items method 2025-09-19 11:07:36 -04:00
Mary Hipp
aa35a5083b remove completed_at from queue list so that created_at is only sort option, restore field values in UI 2025-09-11 12:41:56 +10:00
psychedelicious
0bb5d647b5 tidy(app): method naming snake case 2025-09-08 20:41:36 +10:00
Attila Cseh
74e1047870 build errors fixed 2025-09-08 20:41:36 +10:00
Attila Cseh
3c2f654da8 queue api listQueueItems removed 2025-09-08 20:41:36 +10:00
Attila Cseh
9788735d6b code review fixes 2025-09-08 20:41:36 +10:00
Attila Cseh
486b333cef queue list virtualized 2025-09-08 20:41:36 +10:00
Attila Cseh
6fa437af03 get_queue_itemIds endpoint created 2025-09-08 20:41:36 +10:00
psychedelicious
fc71849c24 feat(app): expose a cursor, not a connection in db util 2025-07-11 08:20:06 +10:00
psychedelicious
a19aa3b032 feat(app): db abstraction to prevent threading conflicts
- Add a context manager to the SqliteDatabase class which abstracts away
creating a transaction, committing it on success and rolling back on
error.
- Use it everywhere. The context manager should be exited before
returning results. No business logic changes should be present.
2025-07-11 08:20:06 +10:00
psychedelicious
4229377532 fix(app): ensure cancel events are emitted for current item when bulk canceling
There was a bug where bulk cancel operations would cancel the current
queue item in the DB but not emit the status changed events correctly.
2025-07-08 12:12:55 +10:00
psychedelicious
e129525306 fix(app): handle None in queue count queries 2025-07-07 22:05:49 +10:00
psychedelicious
c1937b1379 chore: ruff 2025-06-30 12:56:51 +10:00
psychedelicious
5c66dfed8e fix(app): remove errant comment from prev impl 2025-06-30 12:56:51 +10:00
psychedelicious
3604dcfdd1 feat(api): return list of enqueued item ids when enqueuing 2025-06-30 12:56:51 +10:00
psychedelicious
2ddcde13ff refactor(ui): migrate from canceling queue items to deleteing, make queue hook APIs consistent 2025-06-26 19:51:36 +10:00
psychedelicious
5d80642ea4 feat(app): support deleting queue items by id or destination 2025-06-26 19:50:37 +10:00
psychedelicious
7ec511da01 feat(app): do not prune queue on startup
With the new canvas design, this will result in loss of staging area images.
2025-06-26 19:50:36 +10:00
psychedelicious
0af20b03e5 feat(api): remove status from list all queue items query 2025-06-26 19:50:36 +10:00
psychedelicious
8a78e37634 feat: canvas flow rework (wip) 2025-06-26 19:50:35 +10:00
psychedelicious
1ff3d44dba fix(app): guard against possible race conditions during enqueue
In #7724 we made a number of perf optimisations related to enqueuing. One of these optimisations included moving the enqueue logic - including expensive prep work and db writes - to a separate thread.

At the same time manual DB locking was abandoned in favor of WAL mode.

Finally, we set `check_same_thread=False` to allow multiple threads to access the connection at a given time.

I think this may be the cause of #7950:
- We start an enqueue in a thread (running in bg)
- We dequeue
- Dequeue pulls a partially-written queue item from DB and we get the errors in the linked issue

To be honest, I don't understand enough about SQLite to confidently say that this kind of race condition is actually possible. But:
- The error started popping up around the time we made this change.
- I have reviewed the logic from enqueue to dequeue very carefully _many_ times over the past month or so, and I am confident that the error is only possible if we are getting unexpectedly `NULL` values from the DB.
- The DB schema includes `NOT NULL` constraints for the column that is apparently returning `NULL`.
- Therefore, without some kind of race condition or schema issue, the error should not be possible.
- The `enqueue_batch` call is the only place I can find where we have the possibility of a race condition due to async logic. Everywhere else, all DB interaction for the queue is synchronous, as far as I can tell.

This change retains the perf benefits by running the heavy enqueue prep logic in a separate thread, but moves back to the main thread for the DB write. It also uses an explicit transaction for the write.

Will just have to wait and see if this fixes the issue.
2025-06-13 23:51:47 +10:00
psychedelicious
971c425734 fix(app): incorrect values inserted when retrying queue item
In #7688 we optimized queuing preparation logic. This inadvertently broke retrying queue items.

Previously, a `NamedTuple` was used to store the values to insert in the DB when enqueuing. This handy class provides an API similar to a dataclass, where you can instantiate it with kwargs in any order. The resultant tuple re-orders the kwargs to match the order in the class definition.

For example, consider this `NamedTuple`:
```py
class SessionQueueValueToInsert(NamedTuple):
    foo: str
    bar: str
```

When instantiating it, no matter the order of the kwargs, if you make a normal tuple out of it, the tuple values are in the same order as in the class definition:

```
t1 = SessionQueueValueToInsert(foo="foo", bar="bar")
print(tuple(t1)) # -> ('foo', 'bar')

t2 = SessionQueueValueToInsert(bar="bar", foo="foo")
print(tuple(t2)) # -> ('foo', 'bar')
```

So, in the old code, when we used the `NamedTuple`, it implicitly normalized the order of the values we insert into the DB.

In the retry logic, the values of the tuple were not ordered correctly, but the use of `NamedTuple` had secretly fixed the order for us.

In the linked PR, `NamedTuple` was dropped for a normal tuple, after profiling showed `NamedTuple` to be meaningfully slower than a normal tuple.

The implicit order normalization behaviour wasn't understood, and the order wasn't fixed when changin the retry logic to use a normal tuple instead of `NamedTuple`. This results in a bug where we incorrectly create queue items in the DB. For example, we stored the `destination` in the `field_values` column.

When such an incorrectly-created queue item is dequeued, it fails pydantic validation and causes what appears to be an endless loop of errors.

The only user-facing solution is to add this line to `invokeai.yaml` and restart the app:
```yaml
clear_queue_on_startup: true
```

On next startup, the queue is forcibly cleared before the error loop is triggered. Then the user should remove this line so their queue is persisted across app launches per usual.

The solution is simple - fix the ordering of the tuple. I also added a type annotation and comment to the tuple type alias definition.

Note: The endless error loop, as a general problem, will take some thinking to fix. The queue service methods to cancel and fail a queue item still retrieve it and parse it. And the list queue items methods parse the queue items. Bit of a catch 22, maybe the solution is to simply delete totally borked queue items and log an error.
2025-03-18 08:00:51 +11:00
psychedelicious
e57f0ff055 experiment(app): avoid nested cursors in session_queue service
SQLite cursors are meant to be lightweight and not reused. For whatever reason, we reuse one per service for the entire app lifecycle.

This can cause issues where a cursor is used twice at the same time in different transactions.

This experiment makes the session queue use a fresh cursor for each method, hopefully fixing the issue.
2025-03-04 08:33:42 +11:00
psychedelicious
7399909029 feat(app): use simpler syntax for enqueue_batch threaded execution 2025-03-03 14:40:48 +11:00
psychedelicious
c8aaf5e76b tidy(app): remove extraneous class attr type annotations 2025-03-03 14:40:48 +11:00
psychedelicious
0cdf7a7048 Revert "experiment(app): simulate very long enqueue operations (15s)"
This reverts commit eb6a323d0b70004732de493d6530e08eb5ca8acf.
2025-03-03 14:40:48 +11:00
psychedelicious
14f9d5b6bc experiment(app): remove db locking logic
Rely on WAL mode and the busy timeout.

Also changed:
- Remove extraneous rollbacks when we were only doing a `SELECT`
- Remove try/catch blocks that were made extraneous when removing the extraneous rollbacks
2025-03-03 14:40:48 +11:00
psychedelicious
f3dd44044a experiment(app): run enqueue_batch async in a thread 2025-03-03 14:40:48 +11:00
psychedelicious
03ca83fe13 experiment(app): simulate very long enqueue operations (15s) 2025-03-03 14:40:48 +11:00
psychedelicious
d1e03aa1c5 tidy(app): remove timing debug logs 2025-02-26 21:04:23 +11:00
psychedelicious
a3e78f0db6 perf(app): optimise batch prep logic
- Avoid pydantic models when dict manipulation works
- Avoid extraneous deep copies when we can safely mutate
- Avoid NamedTuple construct and its overhead
- Fix tests to use altered function signatures
- Remove extraneous populate_graph function
2025-02-26 21:04:23 +11:00
psychedelicious
675ac348de feat(app): add retry queue item functionality
Retrying a queue item means cloning it, resetting all execution-related state. Retried queue items reference the item they were retried from by id. This relationship is not enforced by any DB constraints.

- Add `retried_from_item_id` to `session_queue` table in DB in a migration.
- Add `retry_items_by_id` method to session queue service. Accepts a list of queue item IDs and clones them (minus execution state). Returns a list of retried items. Items that are not in a canceled or failed state are skipped.
- Add `retry_items_by_id` HTTP endpoint that maps 1-to-1 to the queue service method.
- Add `queue_items_retried` event, which includes the list of retried items.
2025-02-18 09:14:03 +11:00
Riku
47dc954385 feat(app): add cancel all except current queue item functionality 2025-02-04 12:23:23 +11:00
psychedelicious
9b0dd52792 feat(app): add get_queue_counts_by_destination
This allows the frontend to check if there are, for example, pending canvas generations.
2024-09-18 06:40:47 +03:00
psychedelicious
480856a528 feat(app): cancel by destination, not origin
When resetting the canvas or staging area, we don't want to cancel generations that are going to the gallery - only those going to the canvas.

Thus the method should not cancel by origin, but instead cancel by destination.

Update the queue method and route.
2024-09-06 22:56:24 +10:00
psychedelicious
6877db12c9 feat(app): add destination column to session_queue
The frontend needs to know where queue items came from (i.e. which tab), and where results are going to (i.e. send images to gallery or canvas). The `origin` column is not quite enough to represent this cleanly.

A `destination` column provides the frontend what it needs to handle incoming generations.
2024-09-06 22:56:24 +10:00
psychedelicious
257b18230a tidy(app): clean up app changes for canvas v2 2024-09-06 22:56:24 +10:00
psychedelicious
03809763a6 feat(app): add origin to session queue
The origin is an optional field indicating the queue item's origin. For example, "canvas" when the queue item originated from the canvas or "workflows" when the queue item originated from the workflows tab. If omitted, we assume the queue item originated from the API directly.

- Add migration to add the nullable column to the `session_queue` table.
- Update relevant event payloads with the new field.
- Add `cancel_by_origin` method to `session_queue` service and corresponding route. This is required for the canvas to bail out early when staging images.
- Add `origin` to both `SessionQueueItem` and `Batch` - it needs to be provided initially via the batch and then passed onto the queue item.
-
2024-09-06 22:56:24 +10:00
steffylo
a43d602f16 fix(queue): add clear_queue_on_startup config to clear problematic queues 2024-06-19 11:39:25 +10:00
psychedelicious
084cf26ed6 refactor: remove all session events
There's no longer any need for session-scoped events now that we have the session queue. Session started/completed/canceled map 1-to-1 to queue item status events, but queue item status events also have an event for failed state.

We can simplify queue and processor handling substantially by removing session events and instead using queue item events.

- Remove the session-scoped events entirely.
- Remove all event handling from session queue. The processor still needs to respond to some events from the queue: `QueueClearedEvent`, `BatchEnqueuedEvent` and `QueueItemStatusChangedEvent`.
- Pass an `is_canceled` callback to the invocation context instead of the cancel event
- Update processor logic to ensure the local instance of the current queue item is synced with the instance in the database. This prevents race conditions and ensures lifecycle callback do not get stale callbacks.
- Update docstrings and comments
- Add `complete_queue_item` method to session queue service as an explicit way to mark a queue item as successfully completed. Previously, the queue listened for session complete events to do this.

Closes #6442
2024-05-27 09:06:02 +10:00
psychedelicious
368127bd25 feat(events): register_events supports single event 2024-05-27 09:06:02 +10:00
psychedelicious
c0aabcd8ea tidy(events): use tuple index access for event payloads 2024-05-27 09:06:02 +10:00
psychedelicious
9bd78823a3 refactor(events): use pydantic schemas for events
Our events handling and implementation has a couple pain points:
- Adding or removing data from event payloads requires changes wherever the events are dispatched from.
- We have no type safety for events and need to rely on string matching and dict access when interacting with events.
- Frontend types for socket events must be manually typed. This has caused several bugs.

`fastapi-events` has a neat feature where you can create a pydantic model as an event payload, give it an `__event_name__` attr, and then dispatch the model directly.

This allows us to eliminate a layer of indirection and some unpleasant complexity:
- Event handler callbacks get type hints for their event payloads, and can use `isinstance` on them if needed.
- Event payload construction is now the responsibility of the event itself (a pydantic model), not the service. Every event model has a `build` class method, encapsulating this logic. The build methods are provided as few args as possible. For example, `InvocationStartedEvent.build()` gets the invocation instance and queue item, and can choose the data it wants to include in the event payload.
- Frontend event types may be autogenerated from the OpenAPI schema. We use the payload registry feature of `fastapi-events` to collect all payload models into one place, making it trivial to keep our schema and frontend types in sync.

This commit moves the backend over to this improved event handling setup.
2024-05-27 09:06:02 +10:00
psychedelicious
9117db2673 tidy(queue): delete unused delete_queue_item method 2024-05-24 20:02:24 +10:00
psychedelicious
25954ea750 feat(queue): session queue error handling
- Add handling for new error columns `error_type`, `error_message`, `error_traceback`.
- Update queue item model to include the new data. The `error_traceback` field has an alias of `error` for backwards compatibility.
- Add `fail_queue_item` method. This was previously handled by `cancel_queue_item`. Splitting this functionality makes failing a queue item a bit more explicit. We also don't need to handle multiple optional error args.
-
2024-05-24 20:02:24 +10:00
psychedelicious
93e4c3dbc2 feat(app): update queue item's session on session completion
The session is never updated in the queue after it is first enqueued. As a result, the queue detail view in the frontend never never updates and the session itself doesn't show outputs, execution graph, etc.

We need a new method on the queue service to update a queue item's session, then call it before updating the queue item's status.

Queue item status may be updated via a session-type event _or_ queue-type event. Adding the updated session to all these events is a hairy - simpler to just update the session before we do anything that could trigger a queue item status change event:
- Before calling `emit_session_complete` in the processor (handles session error, completed and cancel events and the corresponding queue events)
- Before calling `cancel_queue_item` in the processor (handles another way queue items can be canceled, outside the session execution loop)

When serializing the session, both in the new service method and the `get_queue_item` endpoint, we need to use `exclude_none=True` to prevent unexpected validation errors.
2024-05-24 08:59:49 +10:00
psychedelicious
897fe497dc fix(config): use new get_config across the app, use correct settings 2024-03-19 09:24:28 +11:00
psychedelicious
725c03cf87 refactor(nodes): merge processors
Consolidate graph processing logic into session processor.

With graphs as the unit of work, and the session queue distributing graphs, we no longer need the invocation queue or processor.

Instead, the session processor dequeues the next session and processes it in a simple loop, greatly simplifying the app.

- Remove `graph_execution_manager` service.
- Remove `queue` (invocation queue) service.
- Remove `processor` (invocation processor) service.
- Remove queue-related logic from `Invoker`. It now only starts and stops the services, providing them with access to other services.
- Remove unused `invocation_retrieval_error` and `session_retrieval_error` events, these are no longer needed.
- Clean up stats service now that it is less coupled to the rest of the app.
- Refactor cancellation logic - cancellations now originate from session queue (i.e. HTTP cancel endpoint) and are emitted as events. Processor gets the events and sets the canceled event. Access to this event is provided to the invocation context for e.g. the step callback.
- Remove `sessions` router; it provided access to `graph_executions` but that no longer exists.
2024-03-01 10:42:33 +11:00
psychedelicious
f2c6819d68 feat(db): add SQLiteMigrator to perform db migrations 2023-12-11 16:14:25 +11:00