## Summary

This PR enables RAM/VRAM cache size limits to be determined dynamically based on availability.

**Config Changes**

This PR modifies the app configs in the following ways:

- A new `device_working_mem_gb` config was added. This is the amount of non-model working memory to keep available on the execution device (i.e. the GPU) when using dynamic cache limits. It defaults to 3 GB.
- The `ram` and `vram` configs now default to `None`. If these configs are set, they take precedence over the dynamic limits. **Note: Some users may have previously overridden the `ram` and `vram` values in their `invokeai.yaml`. They will need to remove these configs to enable the new dynamic limit feature.**

**Working Memory**

In addition to the new `device_working_mem_gb` config described above, memory-intensive operations can estimate the amount of working memory that they will need and request it from the model cache. This is currently applied to the VAE decoding step for all models. In the future, we may apply this to other operations as we work out which ops tend to exceed the default working memory reservation.

**Mitigations for https://github.com/invoke-ai/InvokeAI/issues/7513**

This PR includes some mitigations for the issue described in https://github.com/invoke-ai/InvokeAI/issues/7513. Without these mitigations, the issue would occur with higher frequency when dynamic RAM limits are used and RAM is close to maxed out.

## Limitations / Future Work

- Only _models_ can be offloaded to RAM to conserve VRAM. I.e. if VAE decoding requires more working VRAM than is available, the best we can do is keep the full model on the CPU, but we will still hit an OOM error. In the future, we could detect this ahead of time and switch to running inference on the CPU for those ops.
- There is often a non-negligible amount of VRAM 'reserved' by the torch CUDA allocator but not used by any allocated tensors. We may be able to tune the torch CUDA allocator to work better for our use case.
  Reference: https://pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf
- There may be some ops that require high working memory that haven't been updated to request extra memory yet. We will update these as we uncover them.
- If a model is 'locked' in VRAM, it won't be partially unloaded if a later model load requests extra working memory. This should be uncommon, but I can think of cases where it would matter.

## Related Issues / Discussions

- #7492
- #7494
- #7500
- #7505

## QA Instructions

Run a variety of models near the cache limits to ensure that model switching works properly for the following configurations:

- [x] CUDA, `enable_partial_loading=true`, all other configs default (i.e. dynamic memory limits)
- [x] CUDA, `enable_partial_loading=true`, CPU and CUDA memory reserved in another process so there is limited RAM/VRAM remaining, all other configs default (i.e. dynamic memory limits)
- [x] CUDA, `enable_partial_loading=false`, all other configs default (i.e. dynamic memory limits)
- [x] CUDA, ram/vram limits set (these should take precedence over the dynamic limits)
- [x] MPS, all other configs default (i.e. dynamic memory limits)
- [x] CPU, all other configs default (i.e. dynamic memory limits)

## Merge Plan

- [x] Merge #7505 first and change the target branch to main

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_
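As an illustration of the working-memory mechanism described above, the sketch below shows one way an op could estimate the memory a VAE decode will need before requesting it from the model cache. This is a hypothetical helper, not InvokeAI's actual implementation: the function name, the overhead multiplier, and the byte accounting are assumptions for illustration. The 8x spatial upscale and 3-channel output match typical SD-family VAEs.

```python
def estimate_vae_decode_working_memory(
    latent_height: int,
    latent_width: int,
    batch_size: int = 1,
    bytes_per_element: int = 2,  # fp16
    scale_factor: int = 8,  # SD-family VAEs upsample latents 8x per dimension
    out_channels: int = 3,  # RGB output
    overhead_multiplier: float = 2.5,  # assumed safety margin for intermediates
) -> int:
    """Rough estimate (in bytes) of the working memory needed to decode latents.

    The dominant cost scales with the decoded image resolution: the output
    tensor has shape (batch, 3, H*8, W*8), and intermediate activations are
    some multiple of that. The multiplier is an illustrative assumption,
    not a measured value.
    """
    output_bytes = (
        batch_size
        * out_channels
        * (latent_height * scale_factor)
        * (latent_width * scale_factor)
        * bytes_per_element
    )
    return int(output_bytes * overhead_multiplier)


# Example: a 512x512 image has 64x64 latents. An op could request this
# amount (on top of the configured device_working_mem_gb headroom) before
# running VAE decode.
working_mem = estimate_vae_decode_working_memory(64, 64)
```

In this sketch, an op that exceeds the default `device_working_mem_gb` reservation would pass its estimate to the cache, which could then partially offload models to free the requested VRAM.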
# Invoke - Professional Creative AI Tools for Visual Media
To learn more about Invoke, or implement our Business solutions, visit invoke.com
Invoke is a leading creative engine built to empower professionals and enthusiasts alike. Generate and create stunning visual media using the latest AI-driven technologies. Invoke offers an industry-leading web-based UI, and serves as the foundation for multiple commercial products.
Invoke is available in two editions:
| Community Edition | Professional Edition |
|---|---|
| For users looking for a locally installed, self-hosted and self-managed service | For users or teams looking for a cloud-hosted, fully managed service |
| - Free to use under a commercially-friendly license | - Monthly subscription fee with three different plan levels |
| - Download and install on compatible hardware | - Offers additional benefits, including multi-user support, improved model training, and more |
| - Includes all core studio features: generate, refine, iterate on images, and build workflows | - Hosted in the cloud for easy, secure model access and scalability |
| Quick Start -> Installation and Updates | More Information -> www.invoke.com/pricing |
## Documentation
| Quick Links |
|---|
| Installation and Updates - Documentation and Tutorials - Bug Reports - Contributing |
## Installation

To get started with Invoke, download the Installer.

For detailed step-by-step instructions, or for instructions on manual/Docker installations, visit our documentation on Installation and Updates.
## Troubleshooting, FAQ and Support
Please review our FAQ for solutions to common installation problems and other issues.
For more help, please join our Discord.
## Features
Full details on features can be found in our documentation.
### Web Server & UI
Invoke runs a locally hosted web server & React UI with an industry-leading user experience.
### Unified Canvas
The Unified Canvas is a fully integrated canvas implementation with support for all core generation capabilities, in/out-painting, brush tools, and more. This creative tool unlocks the capability for artists to create with AI as a creative collaborator, and can be used to augment AI-generated imagery, sketches, photography, renders, and more.
### Workflows & Nodes

Invoke offers a fully featured workflow management solution, enabling users to combine the power of node-based workflows with the ease of a UI. This allows customizable generation pipelines to be developed and shared by users looking to create specific workflows to support their production use cases.
### Board & Gallery Management

Invoke features an organized gallery system for easily storing, accessing, and remixing your content in the Invoke workspace. Images can be dragged/dropped onto any image-based UI element in the application, and rich metadata within the image allows for easy recall of key prompts or settings used in your workflow.
### Other features
- Support for both ckpt and diffusers models
- SD1.5, SD2.0, SDXL, and FLUX support
- Upscaling Tools
- Embedding Manager & Support
- Model Manager & Support
- Workflow creation & management
- Node-Based Architecture
## Contributing
Anyone who wishes to contribute to this project - whether documentation, features, bug fixes, code cleanup, testing, or code reviews - is very much encouraged to do so.
Get started with contributing by reading our contribution documentation, joining the #dev-chat channel on Discord, or the GitHub discussion board.
We hope you enjoy using Invoke as much as we enjoy creating it, and we hope you will elect to become part of our community.
## Thanks
Invoke is a combined effort of passionate and talented people from across the world. We thank them for their time, hard work, and effort.
Original portions of the software are Copyright © 2024 by respective contributors.