Ryan Dick 87fdcb7f6f Partial Loading PR4: Enable partial loading (behind config flag) (#7505)
## Summary

This PR adds support for partial loading of models onto the GPU. This
enables models to run with much lower peak VRAM requirements (e.g. full
FLUX dev with 8GB of VRAM).

The partial loading feature is enabled behind a new config flag:
`enable_partial_loading=True`. This flag defaults to `False`.

**Note about performance:**
The `ram` and `vram` config limits are still applied when
`enable_partial_loading=True` is set. This can result in significant
slowdowns compared to the 'old' behaviour. Consider the case where the
VRAM limit is set to `vram=0.75` (GB) and we are trying to run an 8GB
model. When `enable_partial_loading=False`, we attempt to load the
entire model into VRAM, and if it fits (no OOM error) then it will run
at full speed. When `enable_partial_loading=True`, since we have the
option to partially load the model we will only load 0.75 GB into VRAM
and leave the remaining 7.25 GB in RAM. This will cause inference to be
much slower than before. To workaround this, it is important that your
`ram` and `vram` configs are carefully tuned. In a future PR, we will
add the ability to dynamically set the RAM/VRAM limits based on the
available memory / VRAM.

## Related Issues / Discussions

- #7492 
- #7494 
- #7500

## QA Instructions

Tests with `enable_partial_loading=True`, `vram=2`, on CUDA device:
For all tests, we expect model memory to stay below 2 GB. Peak working
memory will be higher.
- [x] SD1 inference
- [x] SDXL inference
- [x] FLUX non-quantized inference
- [x] FLUX GGML-quantized inference
- [x] FLUX BnB quantized inference
- [x] Variety of ControlNet / IP-Adapter / LoRA smoke tests

Tests with `enable_partial_loading=True`, and hack to force all models
to load 10%, on CUDA device:
- [x] SD1 inference
- [x] SDXL inference
- [x] FLUX non-quantized inference
- [x] FLUX GGML-quantized inference
- [x] FLUX BnB quantized inference
- [x] Variety of ControlNet / IP-Adapter / LoRA smoke tests

Tests with `enable_partial_loading=False`, `vram=30`:
We expect no change in behaviour when  `enable_partial_loading=False`.
- [x] SD1 inference
- [x] SDXL inference
- [x] FLUX non-quantized inference
- [x] FLUX GGML-quantized inference
- [x] FLUX BnB quantized inference
- [x] Variety of ControlNet / IP-Adapter / LoRA smoke tests

Other platforms:
- [x] No change in behavior on MPS, even if
`enable_partial_loading=True`.
- [x] No change in behavior on CPU-only systems, even if
`enable_partial_loading=True`.

## Merge Plan

- [x] Merge #7500 first, and change the target branch to main

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a
changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_
2025-01-06 23:18:31 -05:00
2023-07-27 10:54:01 -04:00
2025-01-07 00:31:00 +00:00
2023-12-11 12:30:13 +11:00
2024-10-08 10:55:49 +11:00
2024-10-08 10:55:49 +11:00
2023-07-05 23:46:27 -04:00
2024-09-22 17:10:14 +03:00
2024-12-20 17:01:34 +11:00
2024-11-25 04:10:03 -08:00

project hero

Invoke - Professional Creative AI Tools for Visual Media

To learn more about Invoke, or implement our Business solutions, visit invoke.com

discord badge latest release badge github stars badge github forks badge CI checks on main badge latest commit to main badge github open issues badge github open prs badge translation status badge

Invoke is a leading creative engine built to empower professionals and enthusiasts alike. Generate and create stunning visual media using the latest AI-driven technologies. Invoke offers an industry leading web-based UI, and serves as the foundation for multiple commercial products.

Invoke is available in two editions:

Community Edition Professional Edition
For users looking for a locally installed, self-hosted and self-managed service For users or teams looking for a cloud-hosted, fully managed service
- Free to use under a commercially-friendly license - Monthly subscription fee with three different plan levels
- Download and install on compatible hardware - Offers additional benefits, including multi-user support, improved model training, and more
- Includes all core studio features: generate, refine, iterate on images, and build workflows - Hosted in the cloud for easy, secure model access and scalability
Quick Start -> Installation and Updates More Information -> www.invoke.com/pricing

Highlighted Features - Canvas and Workflows

Documentation

Quick Links
Installation and Updates - Documentation and Tutorials - Bug Reports - Contributing

Installation

To get started with Invoke, Download the Installer.

For detailed step by step instructions, or for instructions on manual/docker installations, visit our documentation on Installation and Updates

Troubleshooting, FAQ and Support

Please review our FAQ for solutions to common installation problems and other issues.

For more help, please join our Discord.

Features

Full details on features can be found in our documentation.

Web Server & UI

Invoke runs a locally hosted web server & React UI with an industry-leading user experience.

Unified Canvas

The Unified Canvas is a fully integrated canvas implementation with support for all core generation capabilities, in/out-painting, brush tools, and more. This creative tool unlocks the capability for artists to create with AI as a creative collaborator, and can be used to augment AI-generated imagery, sketches, photography, renders, and more.

Workflows & Nodes

Invoke offers a fully featured workflow management solution, enabling users to combine the power of node-based workflows with the easy of a UI. This allows for customizable generation pipelines to be developed and shared by users looking to create specific workflows to support their production use-cases.

Invoke features an organized gallery system for easily storing, accessing, and remixing your content in the Invoke workspace. Images can be dragged/dropped onto any Image-base UI element in the application, and rich metadata within the Image allows for easy recall of key prompts or settings used in your workflow.

Other features

  • Support for both ckpt and diffusers models
  • SD1.5, SD2.0, SDXL, and FLUX support
  • Upscaling Tools
  • Embedding Manager & Support
  • Model Manager & Support
  • Workflow creation & management
  • Node-Based Architecture

Contributing

Anyone who wishes to contribute to this project - whether documentation, features, bug fixes, code cleanup, testing, or code reviews - is very much encouraged to do so.

Get started with contributing by reading our contribution documentation, joining the #dev-chat or the GitHub discussion board.

We hope you enjoy using Invoke as much as we enjoy creating it, and we hope you will elect to become part of our community.

Thanks

Invoke is a combined effort of passionate and talented people from across the world. We thank them for their time, hard work and effort.

Original portions of the software are Copyright © 2024 by respective contributors.

Description
No description provided
Readme 479 MiB
Languages
Python 52.3%
TypeScript 47.4%