mirror of https://github.com/invoke-ai/InvokeAI.git synced 2026-04-23 03:00:31 -04:00

Ryan Dick 87fdcb7f6f Partial Loading PR4: Enable partial loading (behind config flag) (#7505)

## Summary

This PR adds support for partial loading of models onto the GPU. This
enables models to run with much lower peak VRAM requirements (e.g. full
FLUX dev with 8GB of VRAM).

The partial loading feature is enabled behind a new config flag:
`enable_partial_loading=True`. This flag defaults to `False`.

**Note about performance:**
The `ram` and `vram` config limits are still applied when
`enable_partial_loading=True` is set. This can result in significant
slowdowns compared to the 'old' behaviour. Consider the case where the
VRAM limit is set to `vram=0.75` (GB) and we are trying to run an 8GB
model. When `enable_partial_loading=False`, we attempt to load the
entire model into VRAM, and if it fits (no OOM error) then it will run
at full speed. When `enable_partial_loading=True`, since we have the
option to partially load the model we will only load 0.75 GB into VRAM
and leave the remaining 7.25 GB in RAM. This will cause inference to be
much slower than before. To work around this, it is important that your
`ram` and `vram` configs are carefully tuned. In a future PR, we will
add the ability to dynamically set the RAM/VRAM limits based on the
available memory / VRAM.
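
The arithmetic in the note above can be sketched as follows. This is a
minimal illustration only, not InvokeAI's actual model cache logic: the
function `split_model` and its parameters are hypothetical names, and the
real loader works on individual weights rather than a single size figure.

```python
def split_model(
    model_size_gb: float, vram_limit_gb: float, enable_partial_loading: bool
) -> tuple[float, float]:
    """Return (gb_in_vram, gb_in_ram) for one model (hypothetical helper)."""
    if not enable_partial_loading:
        # Old behaviour: attempt to load the whole model into VRAM.
        # It either fits and runs at full speed, or fails with an OOM error.
        return model_size_gb, 0.0
    # Partial loading: respect the `vram` limit and keep the rest in RAM.
    in_vram = min(model_size_gb, vram_limit_gb)
    return in_vram, model_size_gb - in_vram


# The 8 GB model with vram=0.75 from the note above.
print(split_model(8.0, 0.75, enable_partial_loading=True))   # (0.75, 7.25)
print(split_model(8.0, 0.75, enable_partial_loading=False))  # (8.0, 0.0)
```

With a `vram` limit at or above the model size, both modes place the whole
model in VRAM, which is why tuning the limit matters.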

## Related Issues / Discussions

- #7492 
- #7494 
- #7500

## QA Instructions

Tests with `enable_partial_loading=True`, `vram=2`, on CUDA device:
For all tests, we expect model memory to stay below 2 GB. Peak working
memory will be higher.
- [x] SD1 inference
- [x] SDXL inference
- [x] FLUX non-quantized inference
- [x] FLUX GGML-quantized inference
- [x] FLUX BnB quantized inference
- [x] Variety of ControlNet / IP-Adapter / LoRA smoke tests

Tests with `enable_partial_loading=True`, and hack to force all models
to load 10%, on CUDA device:
- [x] SD1 inference
- [x] SDXL inference
- [x] FLUX non-quantized inference
- [x] FLUX GGML-quantized inference
- [x] FLUX BnB quantized inference
- [x] Variety of ControlNet / IP-Adapter / LoRA smoke tests

Tests with `enable_partial_loading=False`, `vram=30`:
We expect no change in behaviour when `enable_partial_loading=False`.
- [x] SD1 inference
- [x] SDXL inference
- [x] FLUX non-quantized inference
- [x] FLUX GGML-quantized inference
- [x] FLUX BnB quantized inference
- [x] Variety of ControlNet / IP-Adapter / LoRA smoke tests

Other platforms:
- [x] No change in behavior on MPS, even if
`enable_partial_loading=True`.
- [x] No change in behavior on CPU-only systems, even if
`enable_partial_loading=True`.

## Merge Plan

- [x] Merge #7500 first, and change the target branch to main

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a
changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_

2025-01-06 23:18:31 -05:00

.dev_scripts

Apply black

2023-07-27 10:54:01 -04:00

.github

feat(ci): add typegen check workflow

2024-12-22 06:05:17 +11:00

coverage

combine pytest.ini with pyproject.toml

2023-03-05 17:00:08 +00:00

docker

(docker) add comments in docker-entrypoint.sh and ensure variables are not null in bash expansion

2024-12-04 17:02:08 +00:00

docs

docs: add blurb about setting a HF token when downloading HF models by URL and not repo id

2025-01-03 11:21:23 -05:00

installer

removing periods from update link to prevent page not found error

2024-11-01 07:42:31 +11:00

invokeai

Change definition of VRAM in use for the ModelCache from sum of model weights to the total torch.cuda.memory_allocated().

2025-01-07 00:31:53 +00:00

scripts

Add scripts/extract_sd_keys_and_shapes.py

2024-10-10 07:59:29 -04:00

tests

Add seed to flaky unit test.

2025-01-07 00:31:00 +00:00

.dockerignore

Update dockerignore, set venv to 3.10, pass cache to yarn vite buidl

2023-07-12 16:51:15 -04:00

.editorconfig

Merge dev into main for 2.2.0 (#1642)

2022-11-30 16:12:23 -05:00

.git-blame-ignore-revs

(meta) hide the 'black' formatting commit from git blame

2023-07-27 11:29:22 -04:00

.gitattributes

Enforce Unix line endings in container (#4990)

2023-10-30 12:34:30 -04:00

.gitignore

feat: no frontend build in repo

2023-12-11 12:30:13 +11:00

.gitmodules

remove src directory, which is gumming up conda installs; addresses issue #77

2022-08-25 10:43:05 -04:00

.pre-commit-config.yaml

Adding isort GHA and pre-commit hooks

2023-09-12 13:01:58 -04:00

.prettierrc.yaml

feat: automated releases via github action

2024-02-29 21:57:20 -05:00

flake.lock

update flake (#7032)

2024-10-08 10:55:49 +11:00

flake.nix

update flake (#7032)

2024-10-08 10:55:49 +11:00

InvokeAI_Statement_of_Values.md

Add @ebr to Contributors (#2095)

2022-12-21 14:33:08 -05:00

LICENSE

Update LICENSE

2023-07-05 23:46:27 -04:00

LICENSE-SD1+SD2.txt

updated LICENSE files and added information about watermarking

2023-07-26 17:27:33 -04:00

LICENSE-SDXL.txt

updated LICENSE files and added information about watermarking

2023-07-26 17:27:33 -04:00

Makefile

build: fix Makefile docs target

2024-09-22 17:10:14 +03:00

mkdocs.yml

docs: fix installation docs home again

2024-12-20 17:35:50 +11:00

pyproject.toml

Bump bitsandbytes. The new verson contains improvements to state_dict loading/saving for LLM.int8 and promises improved speed on some HW.

2024-12-24 14:32:11 +00:00

README.md

Update README.md

2024-12-20 17:01:34 +11:00

SECURITY.md

Create SECURITY.md

2024-11-25 04:10:03 -08:00

Stable_Diffusion_v1_Model_Card.md

Global replace [ \t]+$, add "GB" (#1751)

2022-12-19 16:36:39 +00:00

README.md

Invoke - Professional Creative AI Tools for Visual Media

To learn more about Invoke, or implement our Business solutions, visit invoke.com

Invoke is a leading creative engine built to empower professionals and enthusiasts alike. Generate and create stunning visual media using the latest AI-driven technologies. Invoke offers an industry-leading web-based UI, and serves as the foundation for multiple commercial products.

Invoke is available in two editions:

| Community Edition | Professional Edition |
| --- | --- |
| For users looking for a locally installed, self-hosted and self-managed service | For users or teams looking for a cloud-hosted, fully managed service |
| - Free to use under a commercially-friendly license | - Monthly subscription fee with three different plan levels |
| - Download and install on compatible hardware | - Offers additional benefits, including multi-user support, improved model training, and more |
| - Includes all core studio features: generate, refine, iterate on images, and build workflows | - Hosted in the cloud for easy, secure model access and scalability |
| Quick Start -> Installation and Updates | More Information -> www.invoke.com/pricing |

Documentation

Quick Links
Installation and Updates - Documentation and Tutorials - Bug Reports - Contributing

Installation

To get started with Invoke, download the installer.

For detailed step-by-step instructions, or for instructions on manual/docker installations, visit our documentation on Installation and Updates.

Troubleshooting, FAQ and Support

Please review our FAQ for solutions to common installation problems and other issues.

For more help, please join our Discord.

Features

Full details on features can be found in our documentation.

Web Server & UI

Invoke runs a locally hosted web server & React UI with an industry-leading user experience.

Unified Canvas

The Unified Canvas is a fully integrated canvas implementation with support for all core generation capabilities, in/out-painting, brush tools, and more. This creative tool unlocks the capability for artists to create with AI as a creative collaborator, and can be used to augment AI-generated imagery, sketches, photography, renders, and more.

Workflows & Nodes

Invoke offers a fully featured workflow management solution, enabling users to combine the power of node-based workflows with the ease of a UI. This allows for customizable generation pipelines to be developed and shared by users looking to create specific workflows to support their production use-cases.

Board & Gallery Management

Invoke features an organized gallery system for easily storing, accessing, and remixing your content in the Invoke workspace. Images can be dragged and dropped onto any image-based UI element in the application, and rich metadata within the image allows for easy recall of key prompts or settings used in your workflow.

Other features

- Support for both ckpt and diffusers models
- SD1.5, SD2.0, SDXL, and FLUX support
- Upscaling Tools
- Embedding Manager & Support
- Model Manager & Support
- Workflow creation & management
- Node-Based Architecture

Contributing

Anyone who wishes to contribute to this project - whether documentation, features, bug fixes, code cleanup, testing, or code reviews - is very much encouraged to do so.

Get started with contributing by reading our contribution documentation, or by joining #dev-chat or the GitHub discussion board.

We hope you enjoy using Invoke as much as we enjoy creating it, and we hope you will elect to become part of our community.

Thanks

Invoke is a combined effort of passionate and talented people from across the world. We thank them for their time, hard work and effort.