Compare commits

...

6 Commits

Author SHA1 Message Date
Lincoln Stein
4d5b208601 multiple fixes in response to preflight testing bug reports
- updated environment-mac.yml #932
- use the upstream GFPGAN library now that issues with color-changing fixed
  and facial recognition improved #905
- preload_models fixed to download additional models needed by gfpgan
2022-10-05 12:44:16 -04:00
Lincoln Stein
488890e6bb preload script more robust for downloading gfpgan 2022-10-05 11:29:48 -04:00
Damian at mba
3feda31d82 Improve IMG2IMG docs with deeper explanation of what is happening under the hood 2022-10-05 10:21:40 -04:00
Lincoln Stein
c4b4a0e56e Update PROMPTS.md
fix broken img tags
2022-10-05 10:10:17 -04:00
Lincoln Stein
95c7742c9c change "prompt weighting" to "prompt blending"; Issue #931 2022-10-05 10:08:56 -04:00
Lincoln Stein
44e3995425 remove dangling -V from normalized dream command 2022-10-05 00:48:17 -04:00
32 changed files with 239 additions and 70 deletions

.gitignore (3 lines changed)

@@ -196,3 +196,6 @@ checkpoints
.vscode/
gfpgan/
models/ldm/stable-diffusion-v1/model.sha256
# GFPGAN model files
gfpgan/

@@ -114,7 +114,7 @@ you can try starting `dream.py` with the `--precision=float32` flag:
- [Web Server](docs/features/WEB.md)
- [Reading Prompts From File](docs/features/PROMPTS.md#reading-prompts-from-a-file)
- [Shortcut: Reusing Seeds](docs/features/OTHER.md#shortcuts-reusing-seeds)
- [Weighted Prompts](docs/features/PROMPTS.md#weighted-prompts)
- [Prompt Blending](docs/features/PROMPTS.md#prompt-blending)
- [Thresholding and Perlin Noise Initialization Options](/docs/features/OTHER.md#thresholding-and-perlin-noise-initialization-options)
- [Negative/Unconditioned Prompts](docs/features/PROMPTS.md#negative-and-unconditioned-prompts)
- [Variations](docs/features/VARIATIONS.md)

22 binary image files added (previews not shown; sizes range from 6.6 KiB to 618 KiB).

@@ -9,7 +9,7 @@ drawing or photo. This is a really cool feature that tells stable diffusion to b
top of the image you provide, preserving the original's basic shape and layout. To use it, provide
the `--init_img` option as shown here:
```bash
```commandline
dream> "waterfall and rainbow" --init_img=./init-images/crude_drawing.png --strength=0.5 -s100 -n4
```
@@ -26,5 +26,99 @@ If the initial image contains transparent regions, then Stable Diffusion will on
transparent regions, a process called "inpainting". However, for this to work correctly, the color
information underneath the transparent regions needs to be preserved, not erased.
More Details can be found here:
More details can be found here:
[Creating Transparent Images For Inpainting](./INPAINTING.md#creating-transparent-regions-for-inpainting)
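If you're not sure whether your init image actually contains transparent regions, a quick sanity check is to look at its alpha channel before handing it to `dream>`. A minimal sketch using Pillow (the file path is just an example, not a file shipped with the repo):
```python
from PIL import Image

img = Image.open("init-images/crude_drawing.png")  # example path, substitute your own

# Inpainting only applies where the image is genuinely transparent; an image
# with no alpha channel (or an all-opaque one) is used in its entirety.
if img.mode in ("RGBA", "LA"):
    min_alpha, _ = img.getchannel("A").getextrema()
    print("has transparent regions:", min_alpha < 255)
else:
    print("no alpha channel - the whole image will be used by img2img, not inpainted")
```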
## How does it actually work, though?
The main difference between `img2img` and `prompt2img` is the starting point. While `prompt2img` always starts with pure
gaussian noise and progressively refines it over the requested number of steps, `img2img` skips some of those earlier steps
(how many it skips is indirectly controlled by the `--strength` parameter) and instead uses your initial image, mixed with gaussian noise, as the starting point.
**Let's start** by thinking about vanilla `prompt2img`, just generating an image from a prompt. If the step count is 10, then the "latent space" (Stable Diffusion's internal representation of the image) for the prompt "fire" with seed `1592514025` develops something like this:
```commandline
dream> "fire" -s10 -W384 -H384 -S1592514025
```
![latent steps](../assets/img2img/000019.steps.png)
Put simply: starting from a frame of fuzz/static, SD finds details in each frame that it thinks look like "fire" and brings them a little bit more into focus, gradually scrubbing out the fuzz until a clear image remains.
**When you use `img2img`**, some of the earlier steps are cut, and an initial image of your choice is used instead. But because of how the maths behind Stable Diffusion works, this image needs to be mixed with just the right amount of noise (fuzz/static) for the point at which it is inserted. This is where the strength parameter comes in: depending on the strength you set, your image is inserted into the sequence at the appropriate point, with just the right amount of noise.
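In diffusion-model terms, "just the right amount of noise" means blending the encoded image with gaussian noise according to the sampler's noise schedule at the step where it gets inserted. Here is a hand-wavy sketch of that blend; the schedule values are invented for illustration and are not the real `k_lms` schedule:
```python
import numpy as np

rng = np.random.default_rng(1592514025)        # same seed as the examples below

# Stand-in schedule: fraction of original signal kept at each of 10 steps.
# A real sampler derives these values from its alpha/sigma schedule.
signal_kept = np.linspace(0.05, 0.95, 10)

init_latent = rng.normal(size=(4, 48, 48))     # pretend "encoded image"
noise = rng.normal(size=init_latent.shape)

insert_at = 3                                  # e.g. strength 0.7 on a 10-step run
keep = signal_kept[insert_at]
noised_start = np.sqrt(keep) * init_latent + np.sqrt(1 - keep) * noise
print(f"starting latent: {keep:.0%} image, {1 - keep:.0%} noise (by variance)")
```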
### A concrete example
Say I want SD to draw a fire based on this hand-drawn image:
![drawing of a fireplace](../assets/img2img/fire-drawing.png)
Let's only do 10 steps, to make it easier to see what's happening. If strength is `0.7`, here is what the internal steps the algorithm takes look like:
![](../assets/img2img/000032.steps.gravity.png)
With strength `0.4`, the steps look more like this:
![](../assets/img2img/000030.steps.gravity.png)
Notice how much fuzzier the starting image is for strength `0.7` compared to `0.4`, and also how much longer the sequence is with `0.7`:
| | strength = 0.7 | strength = 0.4 |
| -- | -- | -- |
| initial image that SD sees | ![](../assets/img2img/000032.step-0.png) | ![](../assets/img2img/000030.step-0.png) |
| steps argument to `dream>` | `-s10` | `-s10` |
| steps actually taken | 7 | 4 |
| latent space at each step | ![](../assets/img2img/000032.steps.gravity.png) | ![](../assets/img2img/000030.steps.gravity.png) |
| output | ![](../assets/img2img/000032.1592514025.png) | ![](../assets/img2img/000030.1592514025.png) |
Both of the outputs look kind of like what I was thinking of. With the strength higher, my input becomes more vague, *and* Stable Diffusion has more steps to refine its output. But it's not really making what I want, which is a picture of a cheery open fire. With the strength lower, my input is clearer, *but* Stable Diffusion has less opportunity to refine it, so the result ends up inheriting all the problems of my bad drawing.
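The "steps actually taken" row falls out of a very simple relationship between `--strength` and the step count. A rough sketch of the idea (the exact rounding and scheduler handling inside InvokeAI may differ):
```python
def img2img_steps(total_steps: int, strength: float):
    """Roughly how img2img trades strength for denoising steps."""
    steps_run = round(total_steps * strength)   # steps actually taken
    steps_skipped = total_steps - steps_run     # early steps that get cut
    return steps_skipped, steps_run

for s in (0.4, 0.7):
    skipped, run = img2img_steps(10, s)
    print(f"strength {s}: skip the first {skipped} steps, run the remaining {run}")
```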
If you want to try this out yourself, all of these are using a seed of `1592514025` with a width/height of `384`, step count `10`, the default sampler (`k_lms`), and the single-word prompt `fire`:
```commandline
dream> "fire" -s10 -W384 -H384 -S1592514025 -I /tmp/fire-drawing.png --strength 0.7
```
The code for rendering intermediates is on my (damian0815's) branch [document-img2img](https://github.com/damian0815/InvokeAI/tree/document-img2img) - run `dream.py` and check your `outputs/img-samples/intermediates` folder while generating an image.
### Compensating for the reduced step count
After putting this guide together I was curious to see what the difference would be if I increased the step count to compensate, so that SD could have the same number of steps to develop the image regardless of the strength. So I ran the generation again using the same seed, but this time adapting the step count to give each generation 20 steps.
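Put another way, if you want roughly `N` effective steps from your image, you can back-solve the `-s` value from the strength. A small helper illustrating the arithmetic (the exact rounding InvokeAI applies may differ):
```python
import math

def steps_argument(effective_steps: int, strength: float) -> int:
    """-s value that should give about `effective_steps` denoising steps
    once img2img has skipped the early ones (the rounding here is a guess)."""
    return math.ceil(effective_steps / strength)

print(steps_argument(20, 0.4))   # 50, matching the -s50 example below
print(steps_argument(20, 0.7))   # 29, close to the -s30 used below
```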
Here's strength `0.4` (note step count `50`, which is `20 ÷ 0.4` to make sure SD does `20` steps from my image):
```commandline
dream> "fire" -s50 -W384 -H384 -S1592514025 -I /tmp/fire-drawing.png -f 0.4
```
![](../assets/img2img/000035.1592514025.png)
and strength `0.7` (note step count `30`, which is roughly `20 ÷ 0.7` to make sure SD does `20` steps from my image):
```commandline
dream> "fire" -s30 -W384 -H384 -S1592514025 -I /tmp/fire-drawing.png -f 0.7
```
![](../assets/img2img/000046.1592514025.png)
In both cases the image is nice and clean and "finished", but because at strength `0.7` Stable Diffusion has been given so much more freedom to improve on my badly-drawn flames, they've come out looking much better. You can really see the difference when looking at the latent steps. There's more noise on the first image with strength `0.7`:
![](../assets/img2img/000046.steps.gravity.png)
than there is for strength `0.4`:
![](../assets/img2img/000035.steps.gravity.png)
and that extra noise gives the algorithm more choices when it is evaluating how to denoise any particular pixel in the image.
Unfortunately, it seems that `img2img` is very sensitive to the step count. Here's strength `0.7` with a step count of `29` (SD did 19 steps from my image):
![](../assets/img2img/000045.1592514025.png)
By comparing the latents we can sort of see that something got interpreted differently enough on the third or fourth step to lead to a rather different interpretation of the flames.
![](../assets/img2img/000046.steps.gravity.png)
![](../assets/img2img/000045.steps.gravity.png)
This is the result of a difference in the de-noising "schedule" - basically the noise has to be cleaned by a certain degree each step or the model won't "converge" on the image properly (see https://huggingface.co/blog/stable_diffusion for more about that). A different step count means a different schedule, which means things get interpreted slightly differently at every step.
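To get a feel for why a one-step change matters, compare the timesteps a simple linear schedule would visit for 30 vs. 29 total steps: none of the intermediate noise levels line up, so every step of the run sees a slightly different amount of noise to remove. (This is a toy schedule for illustration, not the actual `k_lms` sigma schedule.)
```python
import numpy as np

def toy_schedule(num_steps: int, t_max: float = 1000.0):
    # evenly spaced "timesteps" from most-noisy down to clean - a stand-in
    # for the real sampler's sigma schedule
    return np.linspace(t_max, 0.0, num_steps)

print(np.round(toy_schedule(30)[:4], 1))   # 1000, 965.5, 931.0, 896.6, ...
print(np.round(toy_schedule(29)[:4], 1))   # 1000, 964.3, 928.6, 892.9, ...
```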

@@ -27,28 +27,12 @@ You may read a series of prompts from standard input by providing a filename of
```bash
(ldm) ~/stable-diffusion$ echo "a beautiful day" | python3 scripts/dream.py --from_file -
```
---
## **Weighted Prompts**
You may weight different sections of the prompt to tell the sampler to attach different levels of
priority to them, by adding `:(number)` to the end of the section you wish to up- or downweight. For
example consider this prompt:
```bash
tabby cat:0.25 white duck:0.75 hybrid
```
This will tell the sampler to invest 25% of its effort on the tabby cat aspect of the image and 75%
on the white duck aspect (surprisingly, this example actually works). The prompt weights can use any
combination of integers and floating point numbers, and they do not need to add up to 1.
---
## **Negative and Unconditioned Prompts**
Any words between a pair of square brackets will try and be ignored by Stable Diffusion's model during generation of images.
Any words between a pair of square brackets will instruct Stable
Diffusion to attempt to ban the concept from the generated image.
```bash
this is a test prompt [not really] to make you understand [cool] how this works.
@@ -88,3 +72,78 @@ Getting close - but there's no sense in having a saddle when our horse doesn't h
* You can provide multiple words within the same bracket.
* You can provide multiple brackets with multiple words in different places of your prompt. That works just fine.
* To improve typical anatomy problems, you can add negative prompts like `[bad anatomy, extra legs, extra arms, extra fingers, poorly drawn hands, poorly drawn feet, disfigured, out of frame, tiling, bad art, deformed, mutated]`.
---
## **Prompt Blending**
You may blend together different sections of the prompt to explore the
AI's latent semantic space and generate interesting (and often
surprising!) variations. The syntax is:
```bash
blue sphere:0.25 red cube:0.75 hybrid
```
This will tell the sampler to blend 25% of the concept of a blue
sphere with 75% of the concept of a red cube. The blend weights can
use any combination of integers and floating point numbers, and they
do not need to add up to 1. Everything to the left of the `:XX` up to
the previous `:XX` is used for merging, so the overall effect is:
```bash
0.25 * "blue sphere" + 0.75 * "white duck" + hybrid
```
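Conceptually, a blend like this is a weighted combination of the text-encoder conditioning produced for each sub-prompt, rather than a blend of finished images. The sketch below illustrates only the idea; the function name, shapes, and normalization are assumptions made for this example, not InvokeAI's actual API:
```python
import numpy as np

def blend_conditionings(conditionings, weights):
    """Weighted sum of per-prompt conditioning tensors.

    `conditionings` stands in for the text-encoder output of each sub-prompt
    ("blue sphere", "red cube", ...); real code would obtain these from CLIP.
    The weights need not sum to 1 - they are normalized here (an assumption).
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    stacked = np.stack(conditionings)              # (n_prompts, tokens, dim)
    return np.tensordot(weights, stacked, axes=1)  # (tokens, dim)

rng = np.random.default_rng(0)
blue_sphere = rng.normal(size=(77, 768))           # placeholder CLIP-sized embeddings
red_cube = rng.normal(size=(77, 768))
blended = blend_conditionings([blue_sphere, red_cube], [0.25, 0.75])
print(blended.shape)                               # (77, 768)
```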
Because you are exploring the "mind" of the AI, the AI's way of mixing
two concepts may not match yours, leading to surprising effects. To
illustrate, here are three images generated using various combinations
of blend weights. As usual, unless you fix the seed, the prompts will give you
different results each time you run them.
### "blue sphere, red cube, hybrid"
This example doesn't use blending at all and represents the default way
of mixing concepts.
<img src="../assets/prompt-blending/blue-sphere-red-cube-hybrid.png" width=256>
It's interesting to see how the AI expressed the concept of "cube" as
the four quadrants of the enclosing frame. If you look closely, there
is depth there, so the enclosing frame is actually a cube.
### "blue sphere:0.25 red cube:0.75 hybrid"
<img src="../assets/prompt-blending/blue-sphere:0.25-red-cube:0.75-hybrid.png" width=256>
Now that's interesting. We get neither a blue sphere nor a red cube,
but a red sphere embedded in a brick wall, which represents a melding
of concepts within the AI's "latent space" of semantic
representations. Where is Ludwig Wittgenstein when you need him?
### "blue sphere:0.75 red cube:0.25 hybrid"
<img src="../assets/prompt-blending/blue-sphere:0.75-red-cube:0.25-hybrid.png" width=256>
Definitely more blue-spherey. The cube is gone entirely, but it's
really cool abstract art.
### "blue sphere:0.5 red cube:0.5 hybrid"
<img src="../assets/prompt-blending/blue-sphere:0.5-red-cube:0.5-hybrid.png" width=256>
Whoa...! I see blue and red, but no spheres or cubes. Is the word
"hybrid" summoning up the concept of some sort of scifi creature?
Let's find out.
### "blue sphere:0.5 red cube:0.5"
<img src="../assets/prompt-blending/blue-sphere:0.5-red-cube:0.5.png" width=256>
Indeed, removing the word "hybrid" produces an image that is more like
what we'd expect.
In conclusion, prompt blending is great for exploring creative space,
but can be difficult to direct. A forthcoming release of InvokeAI will
feature more deterministic prompt weighting.

@@ -3,12 +3,12 @@ channels:
- pytorch
- conda-forge
dependencies:
- python==3.10.5
- python==3.9.13
- pip==22.2.2
# pytorch left unpinned
- pytorch
- torchvision
- pytorch==1.12.1
- torchvision==0.13.1
# I suggest keeping the other deps sorted for convenience.
# To determine what the latest versions should be, run:
@@ -27,13 +27,12 @@ dependencies:
- imgaug==0.4.0
- kornia==0.6.7
- mpmath==1.2.1
- nomkl
- nomkl=1.0
- numpy==1.23.2
- omegaconf==2.1.1
- openh264==2.3.0
- onnx==1.12.0
- onnxruntime==1.12.1
- protobuf==3.19.4
- pudb==2022.1
- pytorch-lightning==1.7.5
- scipy==1.9.1
@@ -42,22 +41,22 @@ dependencies:
- tensorboard==2.10.0
- torchmetrics==0.9.3
- pip:
- flask==2.1.3
- flask_socketio==5.3.0
- flask_cors==3.0.10
- dependency_injector==4.40.0
- eventlet
- opencv-python==4.6.0
- protobuf==3.20.1
- realesrgan==0.2.5.0
- send2trash==1.8.0
- test-tube==0.7.5
- transformers==4.21.2
- torch-fidelity==0.3.0
- -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
- -e git+https://github.com/openai/CLIP.git@main#egg=clip
- -e git+https://github.com/Birch-san/k-diffusion.git@mps#egg=k_diffusion
- -e git+https://github.com/lstein/GFPGAN@fix-dark-cast-images#egg=gfpgan
- -e .
- flask==2.1.3
- flask_socketio==5.3.0
- flask_cors==3.0.10
- dependency_injector==4.40.0
- eventlet==0.33.1
- opencv-python==4.6.0
- protobuf==3.19.5
- realesrgan==0.2.5.0
- send2trash==1.8.0
- test-tube==0.7.5
- transformers==4.21.2
- torch-fidelity==0.3.0
- -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
- -e git+https://github.com/openai/CLIP.git@main#egg=clip
- -e git+https://github.com/Birch-san/k-diffusion.git@mps#egg=k_diffusion
- -e git+https://github.com/TencentARC/GFPGAN.git#egg=gfpgan
- -e .
variables:
PYTORCH_ENABLE_MPS_FALLBACK: 1

@@ -36,5 +36,5 @@ dependencies:
- -e git+https://github.com/openai/CLIP.git@main#egg=clip
- -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
- -e git+https://github.com/Birch-san/k-diffusion.git@mps#egg=k_diffusion
- -e git+https://github.com/lstein/GFPGAN@fix-dark-cast-images#egg=gfpgan
- -e git+https://github.com/TencentARC/GFPGAN.git#egg=gfpgan
- -e .

@@ -238,7 +238,7 @@ class Args(object):
if a['with_variations']:
formatted_variations = ','.join(f'{seed}:{weight}' for seed, weight in (a["with_variations"]))
switches.append(f'-V {formatted_variations}')
if 'variations' in a:
if 'variations' in a and len(a['variations'])>0:
switches.append(f'-V {a["variations"]}')
return ' '.join(switches)

@@ -20,6 +20,6 @@ torchmetrics==0.6.0
transformers==4.19.2
-e git+https://github.com/openai/CLIP.git@main#egg=clip
-e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
git+https://github.com/lstein/k-diffusion.git@master#egg=k-diffusion
-e git+https://github.com/lstein/GFPGAN@fix-dark-cast-images#egg=gfpgan
-e .
-e git+https://github.com/lstein/k-diffusion.git@master#egg=k-diffusion
-e git+https://github.com/TencentARC/GFPGAN.git#egg=gfpgan
-e .

@@ -33,4 +33,4 @@ dependency_injector==4.40.0
eventlet
git+https://github.com/openai/CLIP.git@main#egg=clip
git+https://github.com/Birch-san/k-diffusion.git@mps#egg=k-diffusion
git+https://github.com/lstein/GFPGAN@fix-dark-cast-images#egg=gfpgan
git+https://github.com/TencentARC/GFPGAN.git#egg=gfpgan

@@ -15,13 +15,13 @@ import urllib.request
transformers.logging.set_verbosity_error()
# this will preload the Bert tokenizer files
print('preloading bert tokenizer...')
print('preloading bert tokenizer...', end='')
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
print('...success')
# this will download requirements for Kornia
print('preloading Kornia requirements (ignore the deprecation warnings)...')
print('preloading Kornia requirements...', end='')
with warnings.catch_warnings():
warnings.filterwarnings('ignore', category=DeprecationWarning)
import kornia
@@ -29,12 +29,12 @@ print('...success')
version = 'openai/clip-vit-large-patch14'
print('preloading CLIP model (Ignore the deprecation warnings)...')
print('preloading CLIP model...',end='')
sys.stdout.flush()
tokenizer = CLIPTokenizer.from_pretrained(version)
transformer = CLIPTextModel.from_pretrained(version)
print('\n\n...success')
print('...success')
# In the event that the user has installed GFPGAN and also elected to use
# RealESRGAN, this will attempt to download the model needed by RealESRGANer
@@ -47,7 +47,7 @@ except ModuleNotFoundError:
pass
if gfpgan:
print('Loading models from RealESRGAN and facexlib')
print('Loading models from RealESRGAN and facexlib...',end='')
try:
from realesrgan.archs.srvgg_arch import SRVGGNetCompact
from facexlib.utils.face_restoration_helper import FaceRestoreHelper
@@ -65,27 +65,41 @@ if gfpgan:
print('Error loading ESRGAN:')
print(traceback.format_exc())
try:
import urllib.request
model_url = 'https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.4.pth'
model_dest = 'src/gfpgan/experiments/pretrained_models/GFPGANv1.4.pth'
print('Loading models from GFPGAN')
import urllib.request
for model in (
[
'https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.4.pth',
'src/gfpgan/experiments/pretrained_models/GFPGANv1.4.pth'
],
[
'https://github.com/xinntao/facexlib/releases/download/v0.1.0/detection_Resnet50_Final.pth',
'./gfpgan/weights/detection_Resnet50_Final.pth'
],
[
'https://github.com/xinntao/facexlib/releases/download/v0.2.2/parsing_parsenet.pth',
'./gfpgan/weights/parsing_parsenet.pth'
],
):
model_url,model_dest = model
try:
if not os.path.exists(model_dest):
print(f'Downloading gfpgan model file {model_url}...',end='')
os.makedirs(os.path.dirname(model_dest), exist_ok=True)
urllib.request.urlretrieve(model_url,model_dest)
print('...success')
except Exception:
import traceback
print('Error loading GFPGAN:')
print(traceback.format_exc())
if not os.path.exists(model_dest):
print('downloading gfpgan model file...')
urllib.request.urlretrieve(model_url,model_dest)
except Exception:
import traceback
print('Error loading GFPGAN:')
print(traceback.format_exc())
print('...success')
print('preloading CodeFormer model file...')
print('preloading CodeFormer model file...',end='')
try:
import urllib.request
model_url = 'https://github.com/sczhou/CodeFormer/releases/download/v0.1.0/codeformer.pth'
model_dest = 'ldm/dream/restoration/codeformer/weights/codeformer.pth'
if not os.path.exists(model_dest):
print('downloading codeformer model file...')
print('Downloading codeformer model file...')
os.makedirs(os.path.dirname(model_dest), exist_ok=True)
urllib.request.urlretrieve(model_url,model_dest)
except Exception: