Compare commits

...

501 Commits

Author SHA1 Message Date
Lincoln Stein
bf1beaa607 revert 49a96b90 due to conflicts during training 2022-09-12 14:34:36 -04:00
Lincoln Stein
7dee9efb24 Merge branch 'development' into main 2022-09-12 14:33:05 -04:00
Lincoln Stein
9d6d728b51 Squashed commit of the following:
commit 1c649e4663
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Mon Sep 12 13:29:16 2022 -0400

    fix torchvision dependency version #511

commit 4d197f699e
Merge: a3e07fb 190ba78
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Mon Sep 12 07:29:19 2022 -0400

    Merge branch 'development' of github.com:lstein/stable-diffusion into development

commit a3e07fb84a
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Mon Sep 12 07:28:58 2022 -0400

    fix grid crash

commit 9fa1f31bf2
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Mon Sep 12 07:07:05 2022 -0400

    fix opencv and realesrgan dependencies in mac install

commit 190ba78960
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Mon Sep 12 01:50:58 2022 -0400

    Update requirements-mac.txt

    Fixed dangling dash on last line.

commit 25d9ccc509
Author: Any-Winter-4079 <50542132+Any-Winter-4079@users.noreply.github.com>
Date:   Mon Sep 12 03:17:29 2022 +0200

    Update model.py

commit 9cdf3aca7d
Author: Any-Winter-4079 <50542132+Any-Winter-4079@users.noreply.github.com>
Date:   Mon Sep 12 02:52:36 2022 +0200

    Update attention.py

    Performance improvements to generate larger images in M1 #431

    Update attention.py

    Added dtype=r1.dtype to softmax

commit 49a96b90d8
Author: Mihai <299015+mh-dm@users.noreply.github.com>
Date:   Sat Sep 10 16:58:07 2022 +0300

    ~7% speedup (1.57 to 1.69it/s) from switch to += in ldm.modules.attention. (#482)

    Tested on 8GB eGPU nvidia setup so YMMV.
    512x512 output, max VRAM stays same.

commit aba94b85e8
Author: Niek van der Maas <mail@niekvandermaas.nl>
Date:   Fri Sep 9 15:01:37 2022 +0200

    Fix macOS `pyenv` instructions, add code block highlight (#441)

    Fix: `anaconda3-latest` does not work, specify the correct virtualenv, add missing init.

commit aac5102cf3
Author: Henry van Megen <h.vanmegen@gmail.com>
Date:   Thu Sep 8 05:16:35 2022 +0200

    Disabled debug output (#436)

    Co-authored-by: Henry van Megen <hvanmegen@gmail.com>

commit 0ab5a36464
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 11 17:19:46 2022 -0400

    fix missing lines in outputs

commit 5e433728b5
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 11 16:20:14 2022 -0400

    upped max_steps in v1-finetune.yaml and fixed TI docs to address #493

commit 7708f4fb98
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 11 16:03:37 2022 -0400

    slight efficiency gain by using += in attention.py

commit b86a1deb00
Author: blessedcoolant <54517381+blessedcoolant@users.noreply.github.com>
Date:   Mon Sep 12 07:47:12 2022 +1200

    Remove print statement styling (#504)

    Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>

commit 4951e66103
Author: chromaticist <mhostick@gmail.com>
Date:   Sun Sep 11 12:44:26 2022 -0700

    Adding support for .bin files from huggingface concepts (#498)

    * Adding support for .bin files from huggingface concepts

    * Updating documentation to include huggingface .bin info

commit 79b445b0ca
Merge: a323070 f7662c1
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 11 15:39:38 2022 -0400

    Merge branch 'development' of github.com:lstein/stable-diffusion into development

commit a323070a4d
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 11 15:28:57 2022 -0400

    update requirements for new location of gfpgan

commit f7662c1808
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 11 15:00:24 2022 -0400

    update requirements for changed location of gfpgan

commit 93c242c9fb
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 11 14:47:58 2022 -0400

    make gfpgan_model_exists flag available to web interface

commit c7c6cd7735
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 11 14:43:07 2022 -0400

    Update UPSCALE.md

    New instructions needed to accommodate fact that the ESRGAN and GFPGAN packages are now installed by environment.yaml.

commit 77ca83e103
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 11 14:31:56 2022 -0400

    Update CLI.md

    Final documentation tweak.

commit 0ea145d188
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 11 14:29:26 2022 -0400

    Update CLI.md

    More doc fixes.

commit 162285ae86
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 11 14:28:45 2022 -0400

    Update CLI.md

    Minor documentation fix

commit 37c921dfe2
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 11 14:26:41 2022 -0400

    documentation enhancements

commit 4f72cb44ad
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 11 13:05:38 2022 -0400

    moved the notebook files into their own directory

commit 878ef2e9e0
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 11 12:58:06 2022 -0400

    documentation tweaks

commit 4923118610
Merge: 16f6a67 defafc0
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 11 12:51:25 2022 -0400

    Merge branch 'development' of github.com:lstein/stable-diffusion into development

commit defafc0e8e
Author: Dominic Letz <dominic@diode.io>
Date:   Sun Sep 11 18:51:01 2022 +0200

    Enable upscaling on m1 (#474)

commit 16f6a6731d
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 11 12:47:26 2022 -0400

    install GFPGAN inside SD repository in order to fix 'dark cast' issue #169

commit 0881d429f2
Author: blessedcoolant <54517381+blessedcoolant@users.noreply.github.com>
Date:   Mon Sep 12 03:52:43 2022 +1200

    Docs Update (#466)

    Authored-by: @blessedcoolant
    Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>

commit 9a29d442b4
Author: Gérald LONLAS <gerald@lonlas.com>
Date:   Sun Sep 11 23:23:18 2022 +0800

    Revert "Add 3x Upscale option on the Web UI (#442)" (#488)

    This reverts commit f8a540881c.

commit d301836fbd
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 11 10:52:19 2022 -0400

    can select prior output for init_img using -1, -2, etc

commit 70aa674e9e
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 11 10:34:06 2022 -0400

    merge PR #495 - keep using float16 in ldm.modules.attention

commit 8748370f44
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 11 10:22:32 2022 -0400

    negative -S indexing recovers correct previous seed; closes issue #476

commit 839e30e4b8
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 11 10:02:44 2022 -0400

    improve CUDA VRAM monitoring

    extra check that device==cuda before getting VRAM stats

commit bfb2781279
Author: tildebyte <337875+tildebyte@users.noreply.github.com>
Date:   Sat Sep 10 10:15:56 2022 -0400

    fix(readme): add note about updating env via conda (#475)

commit 5c43988862
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sat Sep 10 10:02:43 2022 -0400

    reduce VRAM memory usage by half during model loading

    * This moves the call to half() before model.to(device) to avoid GPU
    copy of full model. Improves speed and reduces memory usage dramatically

    * This fix contributed by @mh-dm (Mihai)

commit 99122708ca
Merge: 817c4a2 ecc6b75
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sat Sep 10 09:54:34 2022 -0400

    Merge branch 'development' of github.com:lstein/stable-diffusion into development

commit 817c4a26de
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sat Sep 10 09:53:27 2022 -0400

    remove -F option from normalized prompt; closes #483

commit ecc6b75a3e
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sat Sep 10 09:53:27 2022 -0400

    remove -F option from normalized prompt

commit 723d074442
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Fri Sep 9 18:49:51 2022 -0400

    Allow ctrl c when using --from_file (#472)

    * added ansi escapes to highlight key parts of CLI session

    * adjust exception handling so that ^C will abort when reading prompts from a file

commit 75f633cda8
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Fri Sep 9 12:03:45 2022 -0400

    re-add new logo

commit 10db192cc4
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Fri Sep 9 09:26:10 2022 -0400

    changes to dogettx optimizations to run on m1
    * Author @any-winter-4079
    * Author @dogettx
    Thanks to many individuals who contributed time and hardware to
    benchmarking and debugging these changes.

commit c85ae00b33
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Thu Sep 8 23:57:45 2022 -0400

    fix bug which caused seed to get "stuck" on previous image even when UI specified -1

commit 1b5aae3ef3
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Thu Sep 8 22:36:47 2022 -0400

    add icon to dream web server

commit 6abf739315
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Thu Sep 8 22:25:09 2022 -0400

    add favicon to web server

commit db825b8138
Merge: 33874ba afee7f9
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Thu Sep 8 22:17:37 2022 -0400

    Merge branch 'deNULL-development' into development

commit 33874bae8d
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Thu Sep 8 22:16:29 2022 -0400

    Squashed commit of the following:

    commit afee7f9cea
    Merge: 6531446 171f8db
    Author: Lincoln Stein <lincoln.stein@gmail.com>
    Date:   Thu Sep 8 22:14:32 2022 -0400

        Merge branch 'development' of github.com:deNULL/stable-diffusion into deNULL-development

    commit 171f8db742
    Author: Denis Olshin <me@denull.ru>
    Date:   Thu Sep 8 03:15:20 2022 +0300

        saving full prompt to metadata when using web ui

    commit d7e67b62f0
    Author: Denis Olshin <me@denull.ru>
    Date:   Thu Sep 8 01:51:47 2022 +0300

        better logic for clicking to make variations

commit afee7f9cea
Merge: 6531446 171f8db
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Thu Sep 8 22:14:32 2022 -0400

    Merge branch 'development' of github.com:deNULL/stable-diffusion into deNULL-development

commit 653144694f
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Thu Sep 8 20:41:37 2022 -0400

    work around unexplained crash when timesteps=1000 (#440)

    * work around unexplained crash when timesteps=1000

    * this fix seems to work

commit c33a84cdfd
Author: blessedcoolant <54517381+blessedcoolant@users.noreply.github.com>
Date:   Fri Sep 9 12:39:51 2022 +1200

    Add New Logo (#454)

    * Add instructions on how to install alongside pyenv (#393)

    Like probably many others, I have a lot of different virtualenvs, one for each project. Most of them are handled by `pyenv`.
    After installing according to these instructions I had issues with ´pyenv`and `miniconda` fighting over the $PATH of my system.
    But then I stumbled upon this nice solution on SO: https://stackoverflow.com/a/73139031 , upon which I have based my suggested changes.

    It runs perfectly on my M1 setup, with the anaconda setup as a virtual environment handled by pyenv.

    Feel free to incorporate these instructions as you see fit.

    Thanks a million for all your hard work.

    * Disabled debug output (#436)

    Co-authored-by: Henry van Megen <hvanmegen@gmail.com>

    * Add New Logo

    Co-authored-by: Håvard Gulldahl <havard@lurtgjort.no>
    Co-authored-by: Henry van Megen <h.vanmegen@gmail.com>
    Co-authored-by: Henry van Megen <hvanmegen@gmail.com>
    Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>

commit f8a540881c
Author: Gérald LONLAS <gerald@lonlas.com>
Date:   Fri Sep 9 01:45:54 2022 +0800

    Add 3x Upscale option on the Web UI (#442)

commit 244239e5f6
Author: James Reynolds <magnusviri@users.noreply.github.com>
Date:   Thu Sep 8 05:36:33 2022 -0600

    macOS CI workflow, dream.py exits with an error, but the workflow com… (#396)

    * macOS CI workflow, dream.py exits with an error, but the workflow completes.

    * Files for testing

    Co-authored-by: James Reynolds <magnsuviri@me.com>
    Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>

commit 711d49ed30
Author: James Reynolds <magnusviri@users.noreply.github.com>
Date:   Thu Sep 8 05:35:08 2022 -0600

    Cache model workflow (#394)

    * Add workflow that caches the model, step 1 for CI

    * Change name of workflow job

    Co-authored-by: James Reynolds <magnsuviri@me.com>
    Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>

commit 7996a30e3a
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Thu Sep 8 07:34:03 2022 -0400

    add auto-creation of mask for inpainting (#438)

    * now use a single init image for both image and mask

    * turn on debugging for now to write out mask and image

    * add back -M option as a fallback

commit a69ca31f34
Author: elliotsayes <elliotsayes@gmail.com>
Date:   Thu Sep 8 15:30:06 2022 +1200

    .gitignore WebUI temp files (#430)

    * Add instructions on how to install alongside pyenv (#393)

    Like probably many others, I have a lot of different virtualenvs, one for each project. Most of them are handled by `pyenv`.
    After installing according to these instructions I had issues with ´pyenv`and `miniconda` fighting over the $PATH of my system.
    But then I stumbled upon this nice solution on SO: https://stackoverflow.com/a/73139031 , upon which I have based my suggested changes.

    It runs perfectly on my M1 setup, with the anaconda setup as a virtual environment handled by pyenv.

    Feel free to incorporate these instructions as you see fit.

    Thanks a million for all your hard work.

    * .gitignore WebUI temp files

    Co-authored-by: Håvard Gulldahl <havard@lurtgjort.no>

commit 5c6b612a72
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Wed Sep 7 22:50:55 2022 -0400

    fix bug that caused same seed to be redisplayed repeatedly

commit 56f155c590
Author: Johan Roxendal <johan@roxendal.com>
Date:   Thu Sep 8 04:50:06 2022 +0200

    added support for parsing run log and displaying images in the frontend init state (#410)

    Co-authored-by: Johan Roxendal <johan.roxendal@litteraturbanken.se>
    Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>

commit 41687746be
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Wed Sep 7 20:24:35 2022 -0400

    added missing initialization of latent_noise to None

commit 171f8db742
Author: Denis Olshin <me@denull.ru>
Date:   Thu Sep 8 03:15:20 2022 +0300

    saving full prompt to metadata when using web ui

commit d7e67b62f0
Author: Denis Olshin <me@denull.ru>
Date:   Thu Sep 8 01:51:47 2022 +0300

    better logic for clicking to make variations

commit d1d044aa87
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Wed Sep 7 17:56:59 2022 -0400

    actual image seed now written into web log rather than -1 (#428)

commit edada042b3
Author: Arturo Mendivil <60411196+artmen1516@users.noreply.github.com>
Date:   Wed Sep 7 10:42:26 2022 -0700

    Improve notebook and add requirements file (#422)

commit 29ab3c2028
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Wed Sep 7 13:28:11 2022 -0400

    disable neonpixel optimizations on M1 hardware (#414)

    * disable neonpixel optimizations on M1 hardware

    * fix typo that was causing random noise images on m1

commit 7670ecc63f
Author: cody <cnmizell@gmail.com>
Date:   Wed Sep 7 12:24:41 2022 -0500

    add more keyboard support on the web server (#391)

    add ability to submit prompts with the "enter" key
    add ability to cancel generations with the "escape" key

commit dd2aedacaf
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Wed Sep 7 13:23:53 2022 -0400

    report VRAM usage stats during initial model loading (#419)

commit f6284777e6
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Tue Sep 6 17:12:39 2022 -0400

    Squashed commit of the following:

    commit 7d1344282d942a33dcecda4d5144fc154ec82915
    Merge: caf4ea3 ebeb556
    Author: Lincoln Stein <lincoln.stein@gmail.com>
    Date:   Mon Sep 5 10:07:27 2022 -0400

        Merge branch 'development' of github.com:WebDev9000/stable-diffusion into WebDev9000-development

    commit ebeb556af9
    Author: Web Dev 9000 <rirath@gmail.com>
    Date:   Sun Sep 4 18:05:15 2022 -0700

        Fixed unintentionally removed lines

    commit ff2c4b9a1b
    Author: Web Dev 9000 <rirath@gmail.com>
    Date:   Sun Sep 4 17:50:13 2022 -0700

        Add ability to recreate variations via image click

    commit c012929cda
    Author: Web Dev 9000 <rirath@gmail.com>
    Date:   Sun Sep 4 14:35:33 2022 -0700

        Add files via upload

    commit 02a6018992
    Author: Web Dev 9000 <rirath@gmail.com>
    Date:   Sun Sep 4 14:35:07 2022 -0700

        Add files via upload

commit eef788981c
Author: Olivier Louvignes <olivier@mg-crea.com>
Date:   Tue Sep 6 12:41:08 2022 +0200

    feat(txt2img): allow from_file to work with len(lines) < batch_size (#349)

commit 720e5cd651
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Mon Sep 5 20:40:10 2022 -0400

    Refactoring simplet2i (#387)

    * start refactoring -not yet functional

    * first phase of refactor done - not sure weighted prompts working

    * Second phase of refactoring. Everything mostly working.
    * The refactoring has moved all the hard-core inference work into
    ldm.dream.generator.*, where there are submodules for txt2img and
    img2img. inpaint will go in there as well.
    * Some additional refactoring will be done soon, but relatively
    minor work.

    * fix -save_orig flag to actually work

    * add @neonsecret attention.py memory optimization

    * remove unneeded imports

    * move token logging into conditioning.py

    * add placeholder version of inpaint; porting in progress

    * fix crash in img2img

    * inpainting working; not tested on variations

    * fix crashes in img2img

    * ported attention.py memory optimization #117 from basujindal branch

    * added @torch_no_grad() decorators to img2img, txt2img, inpaint closures

    * Final commit prior to PR against development
    * fixup crash when generating intermediate images in web UI
    * rename ldm.simplet2i to ldm.generate
    * add backward-compatibility simplet2i shell with deprecation warning

    * add back in mps exception, addresses @vargol comment in #354

    * replaced Conditioning class with exported functions

    * fix wrong type of with_variations attribute during intialization

    * changed "image_iterator()" to "get_make_image()"

    * raise NotImplementedError for calling get_make_image() in parent class

    * Update ldm/generate.py

    better error message

    Co-authored-by: Kevin Gibbons <bakkot@gmail.com>

    * minor stylistic fixes and assertion checks from code review

    * moved get_noise() method into img2img class

    * break get_noise() into two methods, one for txt2img and the other for img2img

    * inpainting works on non-square images now

    * make get_noise() an abstract method in base class

    * much improved inpainting

    Co-authored-by: Kevin Gibbons <bakkot@gmail.com>

commit 1ad2a8e567
Author: thealanle <35761977+thealanle@users.noreply.github.com>
Date:   Mon Sep 5 17:35:04 2022 -0700

    Fix --outdir function for web (#373)

    * Fix --outdir function for web

    * Removed unnecessary hardcoded path

commit 52d8bb2836
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Mon Sep 5 10:31:59 2022 -0400

    Squashed commit of the following:

    commit 0cd48e932f1326e000c46f4140f98697eb9bdc79
    Author: Lincoln Stein <lincoln.stein@gmail.com>
    Date:   Mon Sep 5 10:27:43 2022 -0400

        resolve conflicts with development

    commit d7bc8c12e0
    Author: Scott McMillin <scott@scottmcmillin.com>
    Date:   Sun Sep 4 18:52:09 2022 -0500

        Add title attribute back to img tag

    commit 5397c89184
    Author: Scott McMillin <scott@scottmcmillin.com>
    Date:   Sun Sep 4 13:49:46 2022 -0500

        Remove temp code

    commit 1da080b509
    Author: Scott McMillin <scott@scottmcmillin.com>
    Date:   Sun Sep 4 13:33:56 2022 -0500

        Cleaned up HTML; small style changes; image click opens image; add seed to figcaption beneath image

commit caf4ea3d89
Author: Adam Rice <adam@askadam.io>
Date:   Mon Sep 5 10:05:39 2022 -0400

    Add a 'Remove Image' button to clear the file upload field (#382)

    * added "remove image" button

    * styled a new "remove image" button

    * Update index.js

commit 95c088b303
Author: Kevin Gibbons <bakkot@gmail.com>
Date:   Sun Sep 4 19:04:14 2022 -0700

    Revert "Add CORS headers to dream server to ease integration with third-party web interfaces" (#371)

    This reverts commit 91e826e5f4.

commit a20113d5a3
Author: Kevin Gibbons <bakkot@gmail.com>
Date:   Sun Sep 4 18:59:12 2022 -0700

    put no_grad decorator on make_image closures (#375)

commit 0f93dadd6a
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 4 21:39:15 2022 -0400

    fix several dangling references to --gfpgan option, which no longer exists

commit f4004f660e
Author: tildebyte <337875+tildebyte@users.noreply.github.com>
Date:   Sun Sep 4 19:43:04 2022 -0400

    TOIL(requirements): Split requirements to per-platform (#355)

    * toil(reqs): split requirements to per-platform

    Signed-off-by: Ben Alkov <ben.alkov@gmail.com>

    * toil(reqs): fix for Win and Lin...

    ...allow pip to resolve latest torch, numpy

    Signed-off-by: Ben Alkov <ben.alkov@gmail.com>

    * toil(install): update reqs in Win install notebook

    Signed-off-by: Ben Alkov <ben.alkov@gmail.com>

    Signed-off-by: Ben Alkov <ben.alkov@gmail.com>

commit 4406fd138d
Merge: 5116c81 fd7a72e
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 4 08:23:53 2022 -0400

    Merge branch 'SebastianAigner-main' into development
    Add support for full CORS headers for dream server.

commit fd7a72e147
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 4 08:23:11 2022 -0400

    remove debugging message

commit 3a2be621f3
Merge: 91e826e 5116c81
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 4 08:15:51 2022 -0400

    Merge branch 'development' into main

commit 5116c8178c
Author: Justin Wong <1584142+wongjustin99@users.noreply.github.com>
Date:   Sun Sep 4 07:17:58 2022 -0400

    fix save_original flag saving to the same filename (#360)

    * Update README.md with new Anaconda install steps (#347)

    pip3 version did not work for me and this is the recommended way to install Anaconda now it seems

    * fix save_original flag saving to the same filename

    Before this, the `--save_orig` flag was not working. The upscaled/GFPGAN would overwrite the original output image.

    Co-authored-by: greentext2 <112735219+greentext2@users.noreply.github.com>

commit 91e826e5f4
Author: Sebastian Aigner <SebastianAigner@users.noreply.github.com>
Date:   Sun Sep 4 10:22:54 2022 +0200

    Add CORS headers to dream server to ease integration with third-party web interfaces

commit 6266d9e8d6
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sat Sep 3 15:45:20 2022 -0400

    remove stray debugging message

commit 138956e516
Author: greentext2 <112735219+greentext2@users.noreply.github.com>
Date:   Sat Sep 3 13:38:57 2022 -0500

    Update README.md with new Anaconda install steps (#347)

    pip3 version did not work for me and this is the recommended way to install Anaconda now it seems

commit 60be735e80
Author: Cora Johnson-Roberson <cora.johnson.roberson@gmail.com>
Date:   Sat Sep 3 14:28:34 2022 -0400

    Switch to regular pytorch channel and restore Python 3.10 for Macs. (#301)

    * Switch to regular pytorch channel and restore Python 3.10 for Macs.

    Although pytorch-nightly should in theory be faster, it is currently
    causing increased memory usage and slower iterations:

    https://github.com/lstein/stable-diffusion/pull/283#issuecomment-1234784885

    This changes the environment-mac.yaml file back to the regular pytorch
    channel and moves the `transformers` dep into pip for now (since it
    cannot be satisfied until tokenizers>=0.11 is built for Python 3.10).

    * Specify versions for Pip packages as well.

commit d0d95d3a2a
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sat Sep 3 14:10:31 2022 -0400

    make initimg appear in web log

commit b90a215000
Merge: 1eee811 6270e31
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sat Sep 3 13:47:15 2022 -0400

    Merge branch 'prixt-seamless' into development

commit 6270e313b8
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sat Sep 3 13:46:29 2022 -0400

    add credit to prixt for seamless circular tiling

commit a01b7bdc40
Merge: 1eee811 9d88abe
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sat Sep 3 13:43:04 2022 -0400

    add web interface for seamless option

commit 1eee8111b9
Merge: 64eca42 fb857f0
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sat Sep 3 12:33:39 2022 -0400

    Merge branch 'development' of github.com:lstein/stable-diffusion into development

commit 64eca42610
Merge: 9130ad7 21a1f68
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sat Sep 3 12:33:05 2022 -0400

    Merge branch 'main' into development
    * brings in small documentation fixes that were
    added directly to main during release tweaking.

commit fb857f05ba
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sat Sep 3 12:07:07 2022 -0400

    fix typo in docs

commit 9d88abe2ea
Author: prixt <paraxite@naver.com>
Date:   Sat Sep 3 22:42:16 2022 +0900

    fixed typo

commit a61e49bc97
Author: prixt <paraxite@naver.com>
Date:   Sat Sep 3 22:39:35 2022 +0900

    * Removed unnecessary code
    * Added description about --seamless

commit 02bee4fdb1
Author: prixt <paraxite@naver.com>
Date:   Sat Sep 3 16:08:03 2022 +0900

    added --seamless tag logging to normalize_prompt

commit d922b53c26
Author: prixt <paraxite@naver.com>
Date:   Sat Sep 3 15:13:31 2022 +0900

    added seamless tiling mode and commands
2022-09-12 14:31:48 -04:00
Lincoln Stein
1c649e4663 fix torchvision dependency version #511 2022-09-12 13:29:16 -04:00
Lincoln Stein
4d197f699e Merge branch 'development' of github.com:lstein/stable-diffusion into development 2022-09-12 07:29:19 -04:00
Lincoln Stein
a3e07fb84a fix grid crash 2022-09-12 07:28:58 -04:00
Lincoln Stein
9fa1f31bf2 fix opencv and realesrgan dependencies in mac install 2022-09-12 07:07:05 -04:00
Lincoln Stein
190ba78960 Update requirements-mac.txt
Fixed dangling dash on last line.
2022-09-12 01:50:58 -04:00
Any-Winter-4079
25d9ccc509 Update model.py 2022-09-11 22:37:45 -04:00
Any-Winter-4079
9cdf3aca7d Update attention.py
Performance improvements to generate larger images in M1 #431

Update attention.py

Added dtype=r1.dtype to softmax
2022-09-11 22:36:58 -04:00
Mihai
49a96b90d8 ~7% speedup (1.57 to 1.69it/s) from switch to += in ldm.modules.attention. (#482)
Tested on 8GB eGPU nvidia setup so YMMV.
512x512 output, max VRAM stays same.
2022-09-11 22:35:37 -04:00
Niek van der Maas
aba94b85e8 Fix macOS pyenv instructions, add code block highlight (#441)
Fix: `anaconda3-latest` does not work, specify the correct virtualenv, add missing init.
2022-09-11 22:34:39 -04:00
Henry van Megen
aac5102cf3 Disabled debug output (#436)
Co-authored-by: Henry van Megen <hvanmegen@gmail.com>
2022-09-11 22:34:35 -04:00
Lincoln Stein
0ab5a36464 fix missing lines in outputs 2022-09-11 17:19:46 -04:00
Lincoln Stein
5e433728b5 upped max_steps in v1-finetune.yaml and fixed TI docs to address #493 2022-09-11 16:20:14 -04:00
Lincoln Stein
7708f4fb98 slight efficiency gain by using += in attention.py 2022-09-11 16:03:54 -04:00
blessedcoolant
b86a1deb00 Remove print statement styling (#504)
Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>
2022-09-11 15:47:12 -04:00
chromaticist
4951e66103 Adding support for .bin files from huggingface concepts (#498)
* Adding support for .bin files from huggingface concepts

* Updating documentation to include huggingface .bin info
2022-09-11 15:44:26 -04:00
Lincoln Stein
79b445b0ca Merge branch 'development' of github.com:lstein/stable-diffusion into development 2022-09-11 15:39:38 -04:00
Lincoln Stein
a323070a4d update requirements for new location of gfpgan 2022-09-11 15:28:57 -04:00
Lincoln Stein
f7662c1808 update requirements for changed location of gfpgan 2022-09-11 15:00:24 -04:00
Lincoln Stein
93c242c9fb make gfpgan_model_exists flag available to web interface 2022-09-11 14:47:58 -04:00
Lincoln Stein
c7c6cd7735 Update UPSCALE.md
New instructions needed to accommodate fact that the ESRGAN and GFPGAN packages are now installed by environment.yaml.
2022-09-11 14:43:07 -04:00
Lincoln Stein
77ca83e103 Update CLI.md
Final documentation tweak.
2022-09-11 14:31:56 -04:00
Lincoln Stein
0ea145d188 Update CLI.md
More doc fixes.
2022-09-11 14:29:26 -04:00
Lincoln Stein
162285ae86 Update CLI.md
Minor documentation fix
2022-09-11 14:28:45 -04:00
Lincoln Stein
37c921dfe2 documentation enhancements 2022-09-11 14:26:41 -04:00
Lincoln Stein
4f72cb44ad moved the notebook files into their own directory 2022-09-11 13:05:38 -04:00
Lincoln Stein
878ef2e9e0 documentation tweaks 2022-09-11 12:58:06 -04:00
Lincoln Stein
4923118610 Merge branch 'development' of github.com:lstein/stable-diffusion into development 2022-09-11 12:51:25 -04:00
Dominic Letz
defafc0e8e Enable upscaling on m1 (#474) 2022-09-11 12:51:01 -04:00
Lincoln Stein
16f6a6731d install GFPGAN inside SD repository in order to fix 'dark cast' issue #169 2022-09-11 12:47:26 -04:00
blessedcoolant
0881d429f2 Docs Update (#466)
Authored-by: @blessedcoolant 
Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>
2022-09-11 11:52:43 -04:00
Gérald LONLAS
9a29d442b4 Revert "Add 3x Upscale option on the Web UI (#442)" (#488)
This reverts commit f8a540881c.
2022-09-11 11:23:18 -04:00
Lincoln Stein
d301836fbd can select prior output for init_img using -1, -2, etc 2022-09-11 10:52:19 -04:00
Lincoln Stein
70aa674e9e merge PR #495 - keep using float16 in ldm.modules.attention 2022-09-11 10:34:06 -04:00
Lincoln Stein
8748370f44 negative -S indexing recovers correct previous seed; closes issue #476 2022-09-11 10:22:32 -04:00
Lincoln Stein
839e30e4b8 improve CUDA VRAM monitoring
extra check that device==cuda before getting VRAM stats
2022-09-11 10:10:24 -04:00
tildebyte
bfb2781279 fix(readme): add note about updating env via conda (#475) 2022-09-10 10:15:56 -04:00
Lincoln Stein
5c43988862 reduce VRAM memory usage by half during model loading
* This moves the call to half() before model.to(device) to avoid GPU
copy of full model. Improves speed and reduces memory usage dramatically

* This fix contributed by @mh-dm (Mihai)
2022-09-10 10:02:43 -04:00
Mihai
62863ac586 ~7% speedup (1.57 to 1.69it/s) from switch to += in ldm.modules.attention. (#482)
Tested on 8GB eGPU nvidia setup so YMMV.
512x512 output, max VRAM stays same.
2022-09-10 09:58:07 -04:00
Lincoln Stein
99122708ca Merge branch 'development' of github.com:lstein/stable-diffusion into development 2022-09-10 09:54:34 -04:00
Lincoln Stein
817c4a26de remove -F option from normalized prompt; closes #483 2022-09-10 09:54:26 -04:00
Lincoln Stein
ecc6b75a3e remove -F option from normalized prompt 2022-09-10 09:53:27 -04:00
Lincoln Stein
723d074442 Allow ctrl c when using --from_file (#472)
* added ansi escapes to highlight key parts of CLI session

* adjust exception handling so that ^C will abort when reading prompts from a file
2022-09-09 18:49:51 -04:00
Lincoln Stein
75f633cda8 re-add new logo 2022-09-09 12:03:45 -04:00
Lincoln Stein
10db192cc4 changes to dogettx optimizations to run on m1
* Author @any-winter-4079
* Author @dogettx
Thanks to many individuals who contributed time and hardware to
benchmarking and debugging these changes.
2022-09-09 09:51:41 -04:00
Niek van der Maas
79ac0f3420 Fix macOS pyenv instructions, add code block highlight (#441)
Fix: `anaconda3-latest` does not work, specify the correct virtualenv, add missing init.
2022-09-09 09:01:37 -04:00
Lincoln Stein
c85ae00b33 fix bug which caused seed to get "stuck" on previous image even when UI specified -1 2022-09-08 23:57:45 -04:00
Lincoln Stein
1b5aae3ef3 add icon to dream web server 2022-09-08 22:36:47 -04:00
Lincoln Stein
6abf739315 add favicon to web server 2022-09-08 22:25:09 -04:00
Lincoln Stein
db825b8138 Merge branch 'deNULL-development' into development 2022-09-08 22:17:37 -04:00
Lincoln Stein
33874bae8d Squashed commit of the following:
commit afee7f9cea
Merge: 6531446 171f8db
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Thu Sep 8 22:14:32 2022 -0400

    Merge branch 'development' of github.com:deNULL/stable-diffusion into deNULL-development

commit 171f8db742
Author: Denis Olshin <me@denull.ru>
Date:   Thu Sep 8 03:15:20 2022 +0300

    saving full prompt to metadata when using web ui

commit d7e67b62f0
Author: Denis Olshin <me@denull.ru>
Date:   Thu Sep 8 01:51:47 2022 +0300

    better logic for clicking to make variations
2022-09-08 22:16:29 -04:00
Lincoln Stein
afee7f9cea Merge branch 'development' of github.com:deNULL/stable-diffusion into deNULL-development 2022-09-08 22:14:32 -04:00
Lincoln Stein
653144694f work around unexplained crash when timesteps=1000 (#440)
* work around unexplained crash when timesteps=1000

* this fix seems to work
2022-09-08 20:41:37 -04:00
blessedcoolant
c33a84cdfd Add New Logo (#454)
* Add instructions on how to install alongside pyenv (#393)

Like probably many others, I have a lot of different virtualenvs, one for each project. Most of them are handled by `pyenv`. 
After installing according to these instructions I had issues with ´pyenv`and `miniconda` fighting over the $PATH of my system.
But then I stumbled upon this nice solution on SO: https://stackoverflow.com/a/73139031 , upon which I have based my suggested changes.

It runs perfectly on my M1 setup, with the anaconda setup as a virtual environment handled by pyenv. 

Feel free to incorporate these instructions as you see fit. 

Thanks a million for all your hard work.

* Disabled debug output (#436)

Co-authored-by: Henry van Megen <hvanmegen@gmail.com>

* Add New Logo

Co-authored-by: Håvard Gulldahl <havard@lurtgjort.no>
Co-authored-by: Henry van Megen <h.vanmegen@gmail.com>
Co-authored-by: Henry van Megen <hvanmegen@gmail.com>
Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>
2022-09-08 20:39:51 -04:00
Gérald LONLAS
f8a540881c Add 3x Upscale option on the Web UI (#442) 2022-09-08 13:45:54 -04:00
James Reynolds
244239e5f6 macOS CI workflow, dream.py exits with an error, but the workflow com… (#396)
* macOS CI workflow, dream.py exits with an error, but the workflow completes.

* Files for testing

Co-authored-by: James Reynolds <magnsuviri@me.com>
Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>
2022-09-08 07:36:33 -04:00
James Reynolds
711d49ed30 Cache model workflow (#394)
* Add workflow that caches the model, step 1 for CI

* Change name of workflow job

Co-authored-by: James Reynolds <magnsuviri@me.com>
Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>
2022-09-08 07:35:08 -04:00
Lincoln Stein
7996a30e3a add auto-creation of mask for inpainting (#438)
* now use a single init image for both image and mask

* turn on debugging for now to write out mask and image

* add back -M option as a fallback
2022-09-08 07:34:03 -04:00
elliotsayes
a69ca31f34 .gitignore WebUI temp files (#430)
* Add instructions on how to install alongside pyenv (#393)

Like probably many others, I have a lot of different virtualenvs, one for each project. Most of them are handled by `pyenv`. 
After installing according to these instructions I had issues with ´pyenv`and `miniconda` fighting over the $PATH of my system.
But then I stumbled upon this nice solution on SO: https://stackoverflow.com/a/73139031 , upon which I have based my suggested changes.

It runs perfectly on my M1 setup, with the anaconda setup as a virtual environment handled by pyenv. 

Feel free to incorporate these instructions as you see fit. 

Thanks a million for all your hard work.

* .gitignore WebUI temp files

Co-authored-by: Håvard Gulldahl <havard@lurtgjort.no>
2022-09-07 23:30:06 -04:00
Henry van Megen
049ea02fc7 Disabled debug output (#436)
Co-authored-by: Henry van Megen <hvanmegen@gmail.com>
2022-09-07 23:16:35 -04:00
Lincoln Stein
5c6b612a72 fix bug that caused same seed to be redisplayed repeatedly 2022-09-07 22:50:55 -04:00
Johan Roxendal
56f155c590 added support for parsing run log and displaying images in the frontend init state (#410)
Co-authored-by: Johan Roxendal <johan.roxendal@litteraturbanken.se>
Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>
2022-09-07 22:50:06 -04:00
Lincoln Stein
41687746be added missing initialization of latent_noise to None 2022-09-07 20:24:35 -04:00
Denis Olshin
171f8db742 saving full prompt to metadata when using web ui 2022-09-08 03:15:20 +03:00
Denis Olshin
d7e67b62f0 better logic for clicking to make variations 2022-09-08 01:51:47 +03:00
Lincoln Stein
d1d044aa87 actual image seed now written into web log rather than -1 (#428) 2022-09-07 17:56:59 -04:00
Arturo Mendivil
edada042b3 Improve notebook and add requirements file (#422) 2022-09-07 13:42:26 -04:00
Lincoln Stein
29ab3c2028 disable neonpixel optimizations on M1 hardware (#414)
* disable neonpixel optimizations on M1 hardware

* fix typo that was causing random noise images on m1
2022-09-07 13:28:11 -04:00
cody
7670ecc63f add more keyboard support on the web server (#391)
add ability to submit prompts with the "enter" key
add ability to cancel generations with the "escape" key
2022-09-07 13:24:41 -04:00
Lincoln Stein
dd2aedacaf report VRAM usage stats during initial model loading (#419) 2022-09-07 13:23:53 -04:00
Lincoln Stein
f6284777e6 Squashed commit of the following:
commit 7d1344282d942a33dcecda4d5144fc154ec82915
Merge: caf4ea3 ebeb556
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Mon Sep 5 10:07:27 2022 -0400

    Merge branch 'development' of github.com:WebDev9000/stable-diffusion into WebDev9000-development

commit ebeb556af9
Author: Web Dev 9000 <rirath@gmail.com>
Date:   Sun Sep 4 18:05:15 2022 -0700

    Fixed unintentionally removed lines

commit ff2c4b9a1b
Author: Web Dev 9000 <rirath@gmail.com>
Date:   Sun Sep 4 17:50:13 2022 -0700

    Add ability to recreate variations via image click

commit c012929cda
Author: Web Dev 9000 <rirath@gmail.com>
Date:   Sun Sep 4 14:35:33 2022 -0700

    Add files via upload

commit 02a6018992
Author: Web Dev 9000 <rirath@gmail.com>
Date:   Sun Sep 4 14:35:07 2022 -0700

    Add files via upload
2022-09-06 17:12:39 -04:00
Håvard Gulldahl
dbc8fc7900 Add instructions on how to install alongside pyenv (#393)
Like probably many others, I have a lot of different virtualenvs, one for each project. Most of them are handled by `pyenv`. 
After installing according to these instructions I had issues with ´pyenv`and `miniconda` fighting over the $PATH of my system.
But then I stumbled upon this nice solution on SO: https://stackoverflow.com/a/73139031 , upon which I have based my suggested changes.

It runs perfectly on my M1 setup, with the anaconda setup as a virtual environment handled by pyenv. 

Feel free to incorporate these instructions as you see fit. 

Thanks a million for all your hard work.
2022-09-06 07:07:10 -04:00
Olivier Louvignes
eef788981c feat(txt2img): allow from_file to work with len(lines) < batch_size (#349) 2022-09-06 06:41:08 -04:00
Lincoln Stein
720e5cd651 Refactoring simplet2i (#387)
* start refactoring -not yet functional

* first phase of refactor done - not sure weighted prompts working

* Second phase of refactoring. Everything mostly working.
* The refactoring has moved all the hard-core inference work into
ldm.dream.generator.*, where there are submodules for txt2img and
img2img. inpaint will go in there as well.
* Some additional refactoring will be done soon, but relatively
minor work.

* fix -save_orig flag to actually work

* add @neonsecret attention.py memory optimization

* remove unneeded imports

* move token logging into conditioning.py

* add placeholder version of inpaint; porting in progress

* fix crash in img2img

* inpainting working; not tested on variations

* fix crashes in img2img

* ported attention.py memory optimization #117 from basujindal branch

* added @torch_no_grad() decorators to img2img, txt2img, inpaint closures

* Final commit prior to PR against development
* fixup crash when generating intermediate images in web UI
* rename ldm.simplet2i to ldm.generate
* add backward-compatibility simplet2i shell with deprecation warning

* add back in mps exception, addresses @vargol comment in #354

* replaced Conditioning class with exported functions

* fix wrong type of with_variations attribute during intialization

* changed "image_iterator()" to "get_make_image()"

* raise NotImplementedError for calling get_make_image() in parent class

* Update ldm/generate.py

better error message

Co-authored-by: Kevin Gibbons <bakkot@gmail.com>

* minor stylistic fixes and assertion checks from code review

* moved get_noise() method into img2img class

* break get_noise() into two methods, one for txt2img and the other for img2img

* inpainting works on non-square images now

* make get_noise() an abstract method in base class

* much improved inpainting

Co-authored-by: Kevin Gibbons <bakkot@gmail.com>
2022-09-05 20:40:10 -04:00
thealanle
1ad2a8e567 Fix --outdir function for web (#373)
* Fix --outdir function for web

* Removed unnecessary hardcoded path
2022-09-05 20:35:04 -04:00
Lincoln Stein
52d8bb2836 Squashed commit of the following:
commit 0cd48e932f1326e000c46f4140f98697eb9bdc79
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Mon Sep 5 10:27:43 2022 -0400

    resolve conflicts with development

commit d7bc8c12e0
Author: Scott McMillin <scott@scottmcmillin.com>
Date:   Sun Sep 4 18:52:09 2022 -0500

    Add title attribute back to img tag

commit 5397c89184
Author: Scott McMillin <scott@scottmcmillin.com>
Date:   Sun Sep 4 13:49:46 2022 -0500

    Remove temp code

commit 1da080b509
Author: Scott McMillin <scott@scottmcmillin.com>
Date:   Sun Sep 4 13:33:56 2022 -0500

    Cleaned up HTML; small style changes; image click opens image; add seed to figcaption beneath image
2022-09-05 10:31:59 -04:00
Adam Rice
caf4ea3d89 Add a 'Remove Image' button to clear the file upload field (#382)
* added "remove image" button

* styled a new "remove image" button

* Update index.js
2022-09-05 10:05:39 -04:00
Kevin Gibbons
95c088b303 Revert "Add CORS headers to dream server to ease integration with third-party web interfaces" (#371)
This reverts commit 91e826e5f4.
2022-09-04 22:04:14 -04:00
Kevin Gibbons
a20113d5a3 put no_grad decorator on make_image closures (#375) 2022-09-04 21:59:12 -04:00
Lincoln Stein
0f93dadd6a fix several dangling references to --gfpgan option, which no longer exists 2022-09-04 21:39:15 -04:00
tildebyte
f4004f660e TOIL(requirements): Split requirements to per-platform (#355)
* toil(reqs): split requirements to per-platform

Signed-off-by: Ben Alkov <ben.alkov@gmail.com>

* toil(reqs): fix for Win and Lin...

...allow pip to resolve latest torch, numpy

Signed-off-by: Ben Alkov <ben.alkov@gmail.com>

* toil(install): update reqs in Win install notebook

Signed-off-by: Ben Alkov <ben.alkov@gmail.com>

Signed-off-by: Ben Alkov <ben.alkov@gmail.com>
2022-09-04 19:43:04 -04:00
Lincoln Stein
4406fd138d Merge branch 'SebastianAigner-main' into development
Add support for full CORS headers for dream server.
2022-09-04 08:23:53 -04:00
Lincoln Stein
fd7a72e147 remove debugging message 2022-09-04 08:23:11 -04:00
Lincoln Stein
3a2be621f3 Merge branch 'development' into main 2022-09-04 08:15:51 -04:00
Justin Wong
5116c8178c fix save_original flag saving to the same filename (#360)
* Update README.md with new Anaconda install steps (#347)

pip3 version did not work for me and this is the recommended way to install Anaconda now it seems

* fix save_original flag saving to the same filename

Before this, the `--save_orig` flag was not working. The upscaled/GFPGAN would overwrite the original output image.

Co-authored-by: greentext2 <112735219+greentext2@users.noreply.github.com>
2022-09-04 07:17:58 -04:00
Sebastian Aigner
91e826e5f4 Add CORS headers to dream server to ease integration with third-party web interfaces 2022-09-04 10:22:54 +02:00
Kevin Gibbons
751283a2de fix img2img variations/MPS (#353)
* fix img2img variations

* fix assert for variation_amount
2022-09-04 00:34:20 -06:00
Lincoln Stein
6266d9e8d6 remove stray debugging message 2022-09-03 15:45:20 -04:00
greentext2
c22c3dec56 Update README.md with new Anaconda install steps (#347)
pip3 version did not work for me and this is the recommended way to install Anaconda now it seems
2022-09-03 14:41:25 -04:00
greentext2
138956e516 Update README.md with new Anaconda install steps (#347)
pip3 version did not work for me and this is the recommended way to install Anaconda now it seems
2022-09-03 14:38:57 -04:00
Cora Johnson-Roberson
60be735e80 Switch to regular pytorch channel and restore Python 3.10 for Macs. (#301)
* Switch to regular pytorch channel and restore Python 3.10 for Macs.

Although pytorch-nightly should in theory be faster, it is currently
causing increased memory usage and slower iterations:

https://github.com/lstein/stable-diffusion/pull/283#issuecomment-1234784885

This changes the environment-mac.yaml file back to the regular pytorch
channel and moves the `transformers` dep into pip for now (since it
cannot be satisfied until tokenizers>=0.11 is built for Python 3.10).

* Specify versions for Pip packages as well.
2022-09-03 14:28:34 -04:00
Lincoln Stein
d0d95d3a2a make initimg appear in web log 2022-09-03 14:10:31 -04:00
Lincoln Stein
b90a215000 Merge branch 'prixt-seamless' into development 2022-09-03 13:47:15 -04:00
Lincoln Stein
6270e313b8 add credit to prixt for seamless circular tiling 2022-09-03 13:46:29 -04:00
Lincoln Stein
a01b7bdc40 add web interface for seamless option 2022-09-03 13:43:04 -04:00
Lincoln Stein
1eee8111b9 Merge branch 'development' of github.com:lstein/stable-diffusion into development 2022-09-03 12:33:39 -04:00
Lincoln Stein
64eca42610 Merge branch 'main' into development
* brings in small documentation fixes that were
added directly to main during release tweaking.
2022-09-03 12:33:05 -04:00
Lincoln Stein
21a1f681dc added attributions for several contributors; let me know if I've missed someone 2022-09-03 12:19:54 -04:00
Lincoln Stein
2882c2d0a6 documentation update 2022-09-03 12:11:10 -04:00
Lincoln Stein
fb857f05ba fix typo in docs 2022-09-03 12:07:07 -04:00
Lincoln Stein
4ffdf73412 Merge branch 'development' into main
This merge adds the following major features:

* Support for image variations.

* Security fix for webGUI (binds to localhost by default, use
--host=0.0.0.0 to allow access from external interface.

* Scalable configs/models.yaml configuration file for adding more
models as they become available.

* More tuning and exception handling for M1 hardware running MPS.

* Various documentation fixes.
2022-09-03 11:58:46 -04:00
Lincoln Stein
9130ad7e08 make results section of webgui full width 2022-09-03 11:58:05 -04:00
tildebyte
d66010410c FEAT: add notebook for Windows for from-zero install and run (#164)
* Update README.md

Those []() link pairs get me every time.

* New issue template

* Added issue templates

* feat(install+run): add notebook for Windows for from-zero install...

...and run

Tested with JupyterLab and VSCode

Signed-off-by: Ben Alkov <ben.alkov@gmail.com>

Signed-off-by: Ben Alkov <ben.alkov@gmail.com>
Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>
Co-authored-by: James Reynolds <magnusviri@users.noreply.github.com>
Co-authored-by: James Reynolds <magnsuviri@me.com>
2022-09-03 11:49:37 -04:00
Lincoln Stein
6566c2298c add scalable support for new models using a configs/models.yaml file 2022-09-03 11:45:21 -04:00
Lincoln Stein
063b4a1995 add ability to specify location of config file (models.yaml) 2022-09-03 11:36:04 -04:00
Lincoln Stein
18cdb556bd update requirements.txt to run on m1 w/pip 2022-09-03 10:56:06 -04:00
Lincoln Stein
8d16a69b80 Merge branch 'erickhun-patch-1' into development 2022-09-03 10:45:12 -04:00
Lincoln Stein
a406b588b4 Merge branch 'development' into patch-1 2022-09-03 10:43:59 -04:00
Lincoln Stein
5454a0edc2 code cleanup
* check that fixed side provided when requesting variant parameter sweep
(-v)
* move _get_noise() into outer scope to improve readability -
refactoring of big method call needed
2022-09-03 10:40:20 -04:00
Lincoln Stein
fe5cc79249 fixes dream.py mps seed 2022-09-03 10:11:46 -04:00
Ben Alkov
361cc42829 fix(readme): Add individual OS links to TOC; fix TOC changelog link 2022-09-03 09:46:06 -04:00
Lincoln Stein
91cce6b4c3 move special-casing test for precision on mps into T2I class 2022-09-03 09:43:18 -04:00
prixt
9d88abe2ea fixed typo 2022-09-03 22:42:16 +09:00
prixt
a61e49bc97 * Removed unnecessary code
* Added description about --seamless
2022-09-03 22:39:35 +09:00
Lincoln Stein
d0df894c9f Merge branch 'cgodley-web-host-port' into development 2022-09-03 09:33:46 -04:00
Lincoln Stein
f46916d521 Add warning message about change in default host 2022-09-03 09:33:02 -04:00
Lincoln Stein
12755c6ef6 Merge branch 'web-host-port' of github.com:cgodley/stable-diffusion into cgodley-web-host-port
this allows host and port to be set on --web command line.
changes default binding from 0.0.0.0 to 127.0.0.1
2022-09-03 09:12:32 -04:00
Lincoln Stein
cc4f33bf3a Merge branch 'bakkot-variant-commas' into development
Change the image variation weighting syntax to match the prompt weighting
syntax
2022-09-03 09:08:20 -04:00
Lincoln Stein
d8c0d020eb remove space between -V and its value in generated prompt, for consistency with other switches 2022-09-03 09:08:10 -04:00
prixt
02bee4fdb1 added --seamless tag logging to normalize_prompt 2022-09-03 16:08:03 +09:00
Kevin Gibbons
e918cb1a8a replace list delimiters in variations syntax 2022-09-02 23:51:22 -07:00
prixt
d922b53c26 added seamless tiling mode and commands 2022-09-03 15:13:31 +09:00
Eric Khun
0163310a47 Merge branch 'development' into patch-1 2022-09-03 03:27:25 +00:00
Lincoln Stein
423d25716d Fix unclosed code section 2022-09-02 18:04:15 -04:00
Lincoln Stein
1d999ba974 Fix reference to variations walkthru
Had wrong href for the VARIATIONS.md file
2022-09-02 18:01:52 -04:00
Lincoln Stein
27d4bb5624 Merge branch 'development' of github.com:lstein/stable-diffusion into development
Synchronize with earlier changes in development
2022-09-02 18:00:02 -04:00
Lincoln Stein
c78b496da6 Merge branch 'bakkot-seed-fuzz' into development
This adds support for variations and mixtures of weighted variations
2022-09-02 17:58:42 -04:00
Lincoln Stein
dd2af3f93c added walkthru, small code fixes 2022-09-02 17:54:55 -04:00
Lincoln Stein
2d65b03f05 Merge branch 'seed-fuzz' of github.com:bakkot/stable-diffusion into bakkot-seed-fuzz 2022-09-02 16:17:51 -04:00
Cragin Godley
2288412ef2 dream.py: fix indentation 2022-09-02 15:00:07 -04:00
Cragin Godley
6bff985496 dream.py: include 0.0.0.0 in --host help
Co-authored-by: Kevin Gibbons <bakkot@gmail.com>
2022-09-02 14:58:57 -04:00
Cragin Godley
918ade12ed dream.py: use localhost in url when host is 0.0.0.0
Co-authored-by: Kevin Gibbons <bakkot@gmail.com>
2022-09-02 14:56:52 -04:00
Cragin Godley
68f62c8352 web: allow custom host/port, default to 127.0.0.1 for security reasons 2022-09-02 12:27:12 -04:00
James Reynolds
33936430d0 Added issue templates 2022-09-02 12:16:09 -04:00
James Reynolds
81b3de9c65 New issue template 2022-09-02 12:16:09 -04:00
Simon Vans-Colina
ad6cf6f2f7 Update readme to make it clearer for Windows users 2022-09-02 12:12:48 -04:00
Eric Khun
ecef72ca39 Merge branch 'development' into patch-1 2022-09-02 15:22:30 +00:00
Lincoln Stein
92d1ed744a Merge branch 'magnusviri-readme-mac-update-take2' into development 2022-09-02 10:42:39 -04:00
Lincoln Stein
da4bf95fbc Merge branch 'development' into readme-mac-update-take2 2022-09-02 10:41:10 -04:00
Lincoln Stein
d43c5c01e3 Update README.md
Those []() link pairs get me every time.
2022-09-02 10:38:59 -04:00
Lincoln Stein
51278c7a10 add brief contribution instructions in lieu of full code-of-conduct and contribution guidelines 2022-09-02 10:37:35 -04:00
James Reynolds
6ef7c1ad4e Added psychedelicious' changes 2022-09-02 08:29:28 -06:00
Lincoln Stein
33cc16473f Merge branch 'main' into development
Synchronizing dev with legacy pulls to main.
2022-09-02 10:22:57 -04:00
James Reynolds
1701c2ea94 README-Mac update 2022-09-02 10:22:26 -04:00
Lincoln Stein
2e299a1daf Merge branch 'gabrielrotbart-fix_img2img_m1' into main
This may improve the black image problem when using img2img with some
samplers on M1 hardware.
2022-09-02 10:18:06 -04:00
Lincoln Stein
0b582a40d0 add developer's guidance for refactoring this change 2022-09-02 10:17:51 -04:00
James Reynolds
1306457b27 README-Mac update 2022-09-02 08:17:19 -06:00
gabrielrotbart
f4a19af04f fix scope being set to autocast even for m1 2022-09-02 14:55:24 +03:00
Eric Khun
58545ba057 Update README-Mac-MPS.md 2022-09-02 19:52:13 +08:00
Kevin Gibbons
4fe265735a support generating variations
Co-authored-by: xra <mail@xra.dev>
2022-09-01 23:48:53 -07:00
James Reynolds
2b7f32502c Merge branch 'lstein:main' into main 2022-09-01 19:41:14 -06:00
Lincoln Stein
3ee82d8a3b Merge branch 'toffaletti-dream-m1' into main
This provides support for Apple M1 hardware
2022-09-01 17:55:36 -04:00
Lincoln Stein
629ca09fda Merge branch 'dream-m1' of github.com:toffaletti/stable-diffusion into toffaletti-dream-m1
* Fix conflicts with main branch changes
* Fix logic error in choose_autocast_device() that was causing crashes
on CUDA systems.
2022-09-01 17:54:01 -04:00
Lincoln Stein
833de06299 fix InitImageResizer not found error, closes #294 2022-09-01 16:16:46 -04:00
David Wager
68eabab2af Deprecate --laion400m and --weights arguments
Removes functionality for the --laion400m and --weights arguments and notifies user to use the --model argument instead.
2022-09-01 20:46:53 +01:00
David Wager
a4f69e62d7 Set sensible default for 1.4
Use the file that already exists for the majority of users for the default value.
2022-09-01 20:21:39 +01:00
David Wager
7db51d0171 Merge branch 'main' into main 2022-09-01 19:27:38 +01:00
Lincoln Stein
1b3c7acce3 fix ambiguous naming of self.device 2022-09-01 14:18:17 -04:00
Lincoln Stein
e6b2c15fc5 Merge branch 'main' into fit-init-img
add a --fit option to limit the size of the initial image to the
maximum boundaries specified by width and height.
2022-09-01 14:09:46 -04:00
David Wager
d319b8a762 Reference model from configs/models.yaml
By supplying --model (defaulting to stable-diffusion-1.4) a user can specify which model to load.
Width/Height/Config Location/Weights Location are referenced from configs/models.yaml
2022-09-01 19:04:31 +01:00
David Wager
db580ccefd Create models.yaml
models.yaml can serve as a base for expanding our support for other versions of Latent/Stable Diffusion.
Contained are parameters for default width/height, as well as where to find the config and weights for this model.
Adding a new model is as simple as adding to this file.
2022-09-01 19:02:57 +01:00
Brent Ozar
9e99fcbc16 README.md - fixing "further reading" formatting
Fixing typo in header and hyperlinking a file.
2022-09-01 10:27:58 -04:00
Lincoln Stein
346c9b66ec Merge branch 'corajr-main' into main
This improves Mac M1 installation instructions and makes the
environment easier to install.
2022-09-01 10:25:59 -04:00
Lincoln Stein
a52870684a Merge branch 'main' of https://github.com/corajr/stable-diffusion into corajr-main 2022-09-01 10:25:43 -04:00
Lincoln Stein
2455bb38a4 Remove redundant chain of types
torch->cuda and cuda->torch, so torch.cuda.torch.cuda actually works. However it looks like (and probably is) a typo.
2022-09-01 10:23:45 -04:00
Lincoln Stein
01e05a98de this fixes the inconsistent use of self.device, sometimes a str and sometimes an obj 2022-09-01 10:16:05 -04:00
Cora Johnson-Roberson
2cac4697aa Correct some verbiage in Mac readme. 2022-09-01 10:11:14 -04:00
Lincoln Stein
c5e95adb49 closes #273, crash on M1 machines 2022-09-01 10:01:41 -04:00
Cora Johnson-Roberson
91565970c2 Move environment-mac.yaml to Python 3.9 and patch dream.py for Macs.
I'm using stable-diffusion on a 2022 Macbook M2 Air with 24 GB unified memory.
I see this taking about 2.0s/it.

I've moved many deps from pip to conda-forge, to take advantage of the
precompiled binaries. Some notes for Mac users, since I've seen a lot of
confusion about this:

One doesn't need the `apple` channel to run this on a Mac-- that's only
used by `tensorflow-deps`, required for running tensorflow-metal. For
that, I have an example environment.yml here:

https://developer.apple.com/forums/thread/711792?answerId=723276022#723276022

However, the `CONDA_ENV=osx-arm64` environment variable *is* needed to
ensure that you do not run any Intel-specific packages such as `mkl`,
which will fail with [cryptic errors](https://github.com/CompVis/stable-diffusion/issues/25#issuecomment-1226702274)
on the ARM architecture and cause the environment to break.

I've also added a comment in the env file about 3.10 not working yet.
When it becomes possible to update, those commands run on an osx-arm64
machine should work to determine the new version set.

Here's what a successful run of dream.py should look like:

```
$ python scripts/dream.py --full_precision                                                                                                           SIGABRT(6) ↵  08:42:59
* Initializing, be patient...

Loading model from models/ldm/stable-diffusion-v1/model.ckpt
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Using slower but more accurate full-precision math (--full_precision)
>> Setting Sampler to k_lms
model loaded in 6.12s

* Initialization done! Awaiting your command (-h for help, 'q' to quit)
dream> "an astronaut riding a horse"
Generating:   0%|                                                                                                                                                                         | 0/1 [00:00<?, ?it/s]/Users/corajr/Documents/lstein/ldm/modules/embedding_manager.py:152: UserWarning: The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/_temp/anaconda/conda-bld/pytorch_1662016319283/work/aten/src/ATen/mps/MPSFallback.mm:11.)
  placeholder_idx = torch.where(
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [01:37<00:00,  1.95s/it]
Generating: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [01:38<00:00, 98.55s/it]
Usage stats:
   1 image(s) generated in 98.60s
   Max VRAM used for this generation: 0.00G
Outputs:
outputs/img-samples/000001.1525943180.png: "an astronaut riding a horse" -s50 -W512 -H512 -C7.5 -Ak_lms -F -S1525943180
```
2022-09-01 09:04:30 -04:00
Jason Toffaletti
09bd9fa47e move autocast device selection to a function 2022-08-31 22:21:14 -07:00
Lincoln Stein
dc30adfbb4 closes #273, crash on M1 machines 2022-09-01 01:09:56 -04:00
Jason Toffaletti
fa98601bfb better error reporting for load_model 2022-08-31 22:03:50 -07:00
Jason Toffaletti
66fe110148 default full_prevision to True for mps device 2022-08-31 22:03:50 -07:00
Jason Toffaletti
bf50ab9dd6 changes to get dream.py working on M1
- move all device init logic to T2I.__init__
- handle m1 specific edge case with autocast device type
- check torch.cuda.is_available before using cuda
2022-08-31 22:03:42 -07:00
James Reynolds
70119602a0 Issue 270 fix (#274)
* check if torch.backends has mps before calling it

* Fixes issue 270

Co-authored-by: James Reynolds <magnsuviri@me.com>
2022-09-01 00:59:20 -04:00
Lincoln Stein
28fe84177e optionally scale initial image to fit box defined by width x height
* This functionality is triggered by the --fit option in the CLI (default
false), and by the "fit" checkbox in the WebGUI (default True)

* In addition, this commit contains a number of whitespace changes to
make the code more readable, as well as an attempt to unify the visual
appearance of info and warning messages.
2022-09-01 00:52:43 -04:00
James Reynolds
35d3f0ed90 Merge branch 'lstein:main' into main 2022-08-31 21:42:12 -06:00
blessedcoolant
0433b3d625 Add Warning When Image Is Too Large (#271)
* Add Warning When Image Is Too Large

* fix incomprehensible formatting introduced by "blue"

Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>
2022-08-31 23:13:21 -04:00
Lincoln Stein
4b560b50c2 fix AttributeError crash when running on non-CUDA systems (#256)
* fix AttributeError crash when running on non-CUDA systems; closes issue #234 and issue #250
* although this prevents dream.py script from crashing immediately on MPS systems, MPS support still very much a work in progress.
2022-08-31 16:59:27 -04:00
Lincoln Stein
9ad79207c2 Merge branch 'main' of github.com:lstein/stable-diffusion into main 2022-08-31 14:44:18 -04:00
Lincoln Stein
0be2351c97 Merge branch 'resolution-checker' of https://github.com/blessedcoolant/stable-diffusion into main 2022-08-31 14:43:17 -04:00
David Wager
ed513397b2 Allow configuration of which SD model to use (#263)
* Allow configuration of which SD model to use

Closes https://github.com/lstein/stable-diffusion/issues/49 The syntax isn't quite the same (opting for --weights over --model), although --weights is more in-line with the existing naming convention.
This method also locks us into models in the models/ldm/stable-diffusion-v1/ directory. Personally, I'm not averse to this, although a secondary solution may be necessary if we wish to supply weights from an external directory.

* Fix typo

* Allow either filename OR filepath input for arg

This approach allows both
--weights SD13 
--weights C:/StableDiffusion/models/ldm/stable-diffusion-v1/SD13.ckpt
2022-08-31 14:20:28 -04:00
_nderscore
c52ba1b022 feat: simplify and enhance prompt weight splitting (#258)
* feat: simplify and enhance prompt weight splitting

* fix: don't shadow the prompt variable

* feat: enable backslash-escaped colons in prompts
2022-08-31 14:00:10 -04:00
Lincoln Stein
d022d0dd11 continue to display in-progress image until the post-processing is done, for better esthetics (#255) 2022-08-31 12:32:56 -04:00
Kevin Gibbons
a14fd69a5a fix progress bar in webui when using strength parameter (#254) 2022-08-31 11:28:11 -04:00
James Reynolds
0d2e6f90c8 Readme update (#253)
* check if torch.backends has mps before calling it

* Updated Mac Readme with latest debugging info

Co-authored-by: James Reynolds <magnsuviri@me.com>
2022-08-31 11:27:13 -04:00
David Ford
58e3562652 Fix merging embeddings (#226)
Fixed merging embeddings based on the changes made in textual inversion. Tested and working. Inverted their logic to prioritize Stable Diffusion implementation over alternatives, but left the option for alternatives to still be used.
2022-08-31 11:24:11 -04:00
Mikhail Tishin
b622819051 Expose img2img strength parameter in Web UI (#239)
* Expose img2img strength parameter in Web UI

* Fix strength label id

Co-authored-by: Mikhail Tishin <michail.tishin@fayrix.com>
Co-authored-by: Kevin Gibbons https://github.com/bakkot
2022-08-31 11:18:32 -04:00
James Reynolds
a547c33327 check if torch.backends has mps before calling it (#245)
Co-authored-by: James Reynolds <magnsuviri@me.com>
2022-08-31 10:56:38 -04:00
Brent Ozar
31b77dbaf8 Readme.md - fix hyperlink to Mac docs (#246)
The square brackets & curly brackets were mixed up.
2022-08-31 10:53:21 -04:00
Tom Elovi Spruce
4280788c18 Fix link to Mac instructions in README (#235) 2022-08-31 10:51:25 -04:00
Lincoln Stein
146e75a1de Merge branch 'bakkot-refactor-pngwriter-2' into main
This fixes regressions in the WebGUI and makes maintenance of pngwriter
easier.
2022-08-31 10:07:57 -04:00
Lincoln Stein
8a2b849620 fix regression in WebGUI progress bar and WebGUI crashes, closes issue #236. Closes issue #249 2022-08-31 10:07:19 -04:00
Lincoln Stein
462a1961e4 fix infinite hang during GFPGAN duration inadvertently introduced during batch_size cleanup 2022-08-31 08:21:49 -04:00
James Reynolds
84c10346fb check if torch.backends has mps before calling it 2022-08-31 03:29:37 -06:00
Jason Toffaletti
2aa8393272 set PYTORCH_ENABLE_MPS_FALLBACK in mac environment (#232)
- this enables cpu fallback for op not yet implemented for m1 gpu
2022-08-31 02:00:40 -04:00
Lincoln Stein
c83d01b369 fix hang during GFPGAN processing due to bug introduced by recent removal of batch_size arg from pngwriter 2022-08-31 01:41:15 -04:00
Lincoln Stein
5354122094 Merge branch 'main' into refactor-pngwriter-2 2022-08-31 01:24:17 -04:00
spezialspezial
64444025a9 Update simplet2i.py (#228)
Typo causing bug when preinitializing the model. Unsupported Sampler: klms, Defaulting to plms
2022-08-31 01:08:46 -04:00
Kevin Gibbons
d566ee092a move make_grid into image_utils 2022-08-30 22:03:53 -07:00
Kevin Gibbons
b983d61e93 tweak format of "result" event in web ui 2022-08-30 22:03:53 -07:00
Kevin Gibbons
153c93bdd4 refactor pngwriter 2022-08-30 22:03:51 -07:00
Lincoln Stein
3be1cee17c avoid crash due to dangling batch_size reference 2022-08-31 00:56:12 -04:00
Lincoln Stein
bdb0651eb2 add support for Apple hardware using MPS acceleration 2022-08-31 00:33:23 -04:00
blessedcoolant
1480ef84dc Add Resolution Checker 2022-08-31 14:54:16 +12:00
Kevin Gibbons
1714816fe2 remove support for batch_size from dream.py (#227)
* remove dream.py support for batch_size

* expect to get a single image
2022-08-30 22:30:12 -04:00
David Ford
b5565d2c82 Update .gitignore (#225)
Include log folders in git ignore.
2022-08-30 20:29:26 -04:00
David Ford
4fad71cd8c Training optimizations (#217)
* Optimizations to the training model

Based on the changes made in
textual_inversion I carried over the relevant changes that improve model training. These changes reduce the amount of memory used, significantly improve the speed at which training runs, and improves the quality of the results.

It also fixes the problem where the model trainer wouldn't automatically stop when it hit the set number of steps.

* Update main.py

Cleaned up whitespace
2022-08-30 15:59:32 -04:00
Lincoln Stein
d126db2413 Update README.md 2022-08-30 15:57:54 -04:00
blessedcoolant
7811d20f21 Add Badges to README.md and add CHANGELOG.md (#205)
* Update README.md - Add Badges

* Add CHANGELOG.md
2022-08-30 15:40:56 -04:00
Yosuke Shinya
d524e5797d Add regression test (#136)
* Add regression test
* fix regression test with full_precision
2022-08-30 15:39:14 -04:00
Kevin Gibbons
8ca4d6542d support progress for img2img (#215)
WebGUI shows progress bar when an initial image is provided.
2022-08-30 15:36:12 -04:00
Lincoln Stein
a51e18ea98 resize initial image to match requested width and height, preserving aspect ratio. Closes #210. Closes #207 (#214) 2022-08-30 15:26:02 -04:00
Lincoln Stein
8bf321f6ae Merge pull request #182 from bakkot/webui-cancel
webui: support cancelation
2022-08-30 12:02:05 -04:00
Kevin Gibbons
5d13207aa6 webui: support cancelation 2022-08-30 08:55:40 -07:00
Lincoln Stein
dae2b26765 remove message about GFPGAN being required, since it is no longer displayed if GFPGAN missing 2022-08-30 09:50:39 -04:00
Lincoln Stein
713b2a03dc Merge branch 'bakkot-sw-drop' into main
This adds a checkbox that shows the intermediate images fpr,omg as txt2img()
goes through its denoising steps.
2022-08-30 09:31:19 -04:00
Lincoln Stein
186d0f9d10 Merge branch 'sw-drop' of https://github.com/bakkot/stable-diffusion into bakkot-sw-drop 2022-08-30 09:17:07 -04:00
Lincoln Stein
55b448818e Update README.md
Highlighted Colab notebook addition.
2022-08-29 23:49:42 -04:00
Lincoln Stein
b4babf7680 add a screenshot to description of command-line utility 2022-08-29 23:27:44 -04:00
Lincoln Stein
85f32752fe promote most headings by one level 2022-08-29 23:16:41 -04:00
Lincoln Stein
b757384aba promote most headings by one level 2022-08-29 23:16:21 -04:00
Lincoln Stein
a5d21d7c94 Update README.md
Added a table of contents and a troubleshooting guide.
2022-08-29 23:15:49 -04:00
Lincoln Stein
8f3520e2d5 add troubleshooting guide to README 2022-08-29 23:08:04 -04:00
Lincoln Stein
19e4298cf9 Merge branch 'BlueAmulet-prompt_as_dir' into main
This adds the frequently-requested feature of naming the output
directory after the text prompt.
2022-08-29 22:34:48 -04:00
Lincoln Stein
42ffcd7204 add the recently added commands to the readline command-line-completion list; fix command-line documentation bug, closing issue #188 2022-08-29 22:34:09 -04:00
Lincoln Stein
d48299e56c Merge branch 'prompt_as_dir' of https://github.com/BlueAmulet/stable-diffusion into BlueAmulet-prompt_as_dir 2022-08-29 22:13:37 -04:00
BlueAmulet
2e22d9ecf1 Address bakkot review 2022-08-29 18:10:15 -06:00
Kevin Gibbons
18597ad1d9 fix bug in pngwriter 2022-08-29 16:33:32 -07:00
Kevin Gibbons
0173d3a8fc stream images 2022-08-29 16:33:31 -07:00
Lincoln Stein
e7658b941e Merge pull request #187 from bakkot/webui-upscalers-optional
webui: hide gfpgan part if not installed
2022-08-29 19:29:16 -04:00
Kevin Gibbons
a7a62d39d4 webui: hide gfpgan if not installed 2022-08-29 16:27:44 -07:00
Lincoln Stein
24ce56b3db Merge branch 'webui-sampler-fix' into main 2022-08-29 19:25:49 -04:00
Lincoln Stein
3220f73f0a add missing dropdown element for K_EULER_A 2022-08-29 19:24:52 -04:00
Lincoln Stein
27a1044e65 Merge pull request #199 from david-ford/gitattributes
Create .gitattributes
2022-08-29 19:15:25 -04:00
Lincoln Stein
39c56f20be Merge pull request #200 from david-ford/html-minor-changes
Minor updates to index.html
2022-08-29 19:14:51 -04:00
Lincoln Stein
f6b2ec61b2 Merge pull request #201 from blessedcoolant/placeholder-logo
Add Temp Logo To Repo
2022-08-29 19:13:54 -04:00
blessedcoolant
e57d6fd1a6 Add Temp Logo To Repo 2022-08-30 11:00:11 +12:00
David Ford
1b40a31a89 Update .gitattributes
Co-authored-by: Kevin Gibbons <bakkot@gmail.com>
2022-08-29 16:58:41 -05:00
David Ford
4fce1063c4 Minor updates to index.html
Some minor tweaks to index.html for accessibility and browsers.
2022-08-29 16:55:20 -05:00
David Ford
f9862a3d88 Create .gitattributes
Added a .gitattributes file to autoresolve line endings across different environments.
2022-08-29 16:50:28 -05:00
Lincoln Stein
81ad239197 Merge pull request #192 from david-ford/working-branch
Fix case sensitive check to be case insensitive
2022-08-29 17:48:12 -04:00
David Ford
ed38c97ed8 Removed unrelated changes and updated based on recommendation
Removed the changes to the index.html and .gitattributes for this PR. Will add them in separate PRs.

Applied recommended change for resolving the case issue.
2022-08-29 16:43:34 -05:00
BlueAmulet
4f8e7356b3 Add prompt as output directory feature
Based on previous code by czyz
2022-08-29 14:52:02 -06:00
Lincoln Stein
c363f033e8 Merge pull request #198 from bakkot/instructions-typo
fix path to dream server in readme
2022-08-29 16:41:24 -04:00
Kevin Gibbons
22c25b3615 fix path to dream server in readme 2022-08-29 13:38:52 -07:00
Lincoln Stein
7fe7cdc8c9 Merge pull request #176 from xraxra/show-tokenization
Print out tokenization data during image generation, allowing truncated prompts to be visible.
2022-08-29 15:36:10 -04:00
Lincoln Stein
e26fee78b5 Merge pull request #158 from Cubox/patch-1
Fix wrong help message
2022-08-29 15:12:24 -04:00
Lincoln Stein
63178c6a8c Merge branch 'main' into patch-1 2022-08-29 15:12:14 -04:00
Lincoln Stein
6fb2f1ed6e fixes WebUI so that the selected sampler is actually applied 2022-08-29 14:06:18 -04:00
Lincoln Stein
38701a6d7b Fix IndexError when generating grid; --grid option can now be passed on shell command line 2022-08-29 13:52:44 -04:00
David Ford
31fa92a83f Fix case sensitive check to be case insensitive
Case sensitivity between os.getcwd and os.realpath can fail due to different drive letter casing. C:\ vs c:\. This change addresses that by normalizing the strings before comparing.
2022-08-29 12:46:24 -05:00
Lincoln Stein
0abfc3cac6 Merge branch 'main' of github.com:lstein/stable-diffusion into main
This fixes issue with grid generation.
2022-08-29 13:39:59 -04:00
Lincoln Stein
d483fcb53a Merge pull request #166 from SMUsamaShah/patch-1
Bug fix in grid
2022-08-29 13:39:50 -04:00
Lincoln Stein
c7db038c96 grid is broken, needs the grid-fix PR#166 to fix 2022-08-29 13:39:20 -04:00
Lincoln Stein
132d23e55d Merge pull request #186 from lstein/reset-properly
Fix form reset logic
2022-08-29 13:03:30 -04:00
Lincoln Stein
90cbc6362c Display new features more prominently in the README 2022-08-29 12:59:34 -04:00
Lincoln Stein
f33ae1bdf4 Display new features more prominently in the README 2022-08-29 12:58:48 -04:00
Kevin Gibbons
754525be82 fix reset logic 2022-08-29 09:47:56 -07:00
Kevin Gibbons
d9eab7f383 use LF not CRLF for files, oh god 2022-08-29 09:47:45 -07:00
Kevin Gibbons
f695988915 fix whitespace in index.html 2022-08-29 09:47:20 -07:00
Lincoln Stein
5d19294810 Merge pull request #128 from artmen1516/feature/colab-notebook
Feature: Add Colab Notebook
2022-08-29 12:37:11 -04:00
Lincoln Stein
77803cf233 Merge branch 'bakkot-rebase-streaming-web' into main
This adds correct treatment of upscaling/face-fixing within the WebUI.
Also adds a basic status message so that the user knows what's happening
during the post-processing steps.
2022-08-29 12:09:44 -04:00
Lincoln Stein
4acfb76be6 correctly handle upscaling in webUI, including displaying status messages during GFPGAN/ESRGAN postprocessing 2022-08-29 12:08:18 -04:00
Lincoln Stein
fd13526454 Merge pull request #175 from bakkot/unicode
read/write plain text files in utf-8, not ascii
2022-08-29 07:03:55 -04:00
Lincoln Stein
7718af041c Merge pull request #178 from lstein/dream-web-upscaling
FEAT: Dream web upscaling
2022-08-29 06:59:21 -04:00
Lincoln Stein
30dbf0e589 Merge pull request #177 from lstein/bugfixes
Bugfixes to image generation logic
2022-08-29 06:58:42 -04:00
tesseractcat
070795a3b4 webui: stream progress events to page 2022-08-28 21:54:10 -07:00
Lincoln Stein
e351d6ffe5 set correct default values for scaling and sampler; closes issues #167 #157 2022-08-29 00:13:18 -04:00
Lincoln Stein
46464ac677 remove unused metadatastr variable 2022-08-28 23:45:50 -04:00
Lincoln Stein
03d8eb19e0 when no callback used, modify results list so that upscaled/face-fixed image replaces the old one 2022-08-28 23:40:04 -04:00
xra
fef632e0e1 tokenization logging (take 2)
This adds an option -t argument that will print out color-coded tokenization, SD has a maximum of 77 tokens, it silently discards tokens over the limit if your prompt is too long.
By using -t you can see how your prompt is being tokenized which helps prompt crafting.
2022-08-29 12:28:49 +09:00
Lincoln Stein
05061a70b3 report errors on non-cuda systems rather than failing silently 2022-08-28 23:13:23 -04:00
Lincoln Stein
617a029ae7 pass outdir from txt2img() and img2img() to prompt2img() correctly 2022-08-28 23:12:49 -04:00
Kevin Gibbons
7ae79b350e write log files in utf-8, not ascii 2022-08-28 20:00:11 -07:00
Lincoln Stein
9a8cd9684e Merge pull request #173 from warner-benjamin/bug-fixes-improvements
Fix grid image saving & sampler selection, log to outdir path, display sampler options once
2022-08-28 22:46:38 -04:00
Lincoln Stein
18899be4ae working, but there is a bug in underlying txt2png() call that is preventing upscaled images from being returned 2022-08-28 22:42:31 -04:00
Benjamin Warner
3ea505bc2d Merge branch 'lstein:main' into bug-fixes-improvements 2022-08-28 21:34:13 -05:00
Lincoln Stein
e2ae6d288d added reset to defaults button and sampler selection 2022-08-28 21:35:52 -04:00
blessedcoolant
41b26e0520 Merge pull request #171 from blessedcoolant/sampler-bug-fix
Fix sampler changer not working.
2022-08-29 13:06:30 +12:00
blessedcoolant
b6053108c1 Merge pull request #168 from blessedcoolant/bug-fixes
Fixed grid image not saving
2022-08-29 13:05:53 +12:00
Lincoln Stein
22365a3f12 begin adding fields for GFPGAN and ESRGAN adjustment; only making public because need to switch computers 2022-08-28 21:04:32 -04:00
Benjamin Warner
594c0eeb8c check in rest of sampler fix 2022-08-28 19:53:25 -05:00
Benjamin Warner
529040708b Fix grid image saving, log to outdir path, display sampler options once 2022-08-28 19:34:55 -05:00
blessedcoolant
f0e2fa781f Grid image not saving after recent changes has been fixed. 2022-08-29 11:29:45 +12:00
blessedcoolant
87b7446228 Fix unique filename bug 2022-08-29 11:28:16 +12:00
blessedcoolant
8a517fdc17 Fix sampler changer not working. 2022-08-29 11:26:19 +12:00
Lincoln Stein
373a2d9c32 Merge branch 'main' of github.com:lstein/stable-diffusion into main 2022-08-28 19:03:45 -04:00
Lincoln Stein
1f8bc9482a added support for changing sampler on prompt line 2022-08-28 19:03:38 -04:00
Lincoln Stein
b85773f332 resolved conflicts and write properly-formatted prompt string (with sampler & upscaling) into image file 2022-08-28 19:01:45 -04:00
Lincoln Stein
ddc0e9b4d8 Merge pull request #133 from bakkot/dir-traversal
prevent directory traversal in the web UI
2022-08-28 18:32:12 -04:00
Lincoln Stein
44a48d0981 Merge pull request #130 from bakkot/patch-1
make web ui default to 512x512
2022-08-28 18:30:32 -04:00
Lincoln Stein
8bbe7936bd close Issue #165 2022-08-28 18:21:20 -04:00
Kevin Gibbons
9e7865704a prevent directory traversal in the web UI 2022-08-28 14:33:30 -07:00
Lincoln Stein
ac02a775e4 moved server.py into right location 2022-08-28 17:27:43 -04:00
Lincoln Stein
7c485a1a4a adjusted -U upscaling argument so that it defaults to upscaling strength 0.75 if the second argument is not given 2022-08-28 17:26:39 -04:00
Lincoln Stein
36bc989a27 Merge branch 'blessedcoolant-gfpgan-optimization' into main
This reduces VRAM requirements when GFPGAN face fixing and Real-ESRGAN
upscaling are used. --gfpgan flag is no longer needed (or accepted)
2022-08-28 17:06:49 -04:00
Lincoln Stein
ea2ee33be8 cosmetic fixup to how the outputs are reported 2022-08-28 17:06:33 -04:00
Lincoln Stein
5d67986997 Merge branch 'dagf2101-main' into main
This consolidates dream.py with dream_web.py, such that you now invoke
the web server by executing "scripts/dream.py --web (+other options)"
2022-08-28 16:37:50 -04:00
Lincoln Stein
7dfca3dcb5 moved scripts/dream_server.py into ldm/dream/server.py 2022-08-28 16:37:27 -04:00
blessedcoolant
e0de42bd03 Update README.md 2022-08-29 08:25:55 +12:00
blessedcoolant
614974a8e8 Merge branch 'main' into gfpgan-optimization 2022-08-29 08:22:26 +12:00
blessedcoolant
6e49c070bb Optimize and Improve GFPGAN and Real-ESRGAN Pipeline 2022-08-29 08:14:29 +12:00
Lincoln Stein
08a9702b73 Merge branch 'main' of github.com:lstein/stable-diffusion into main 2022-08-28 15:54:33 -04:00
Lincoln Stein
042a9043d1 got rid of the cd and pwd commands, and just allow user to specify --outdir on the command 2022-08-28 15:54:12 -04:00
Lincoln Stein
a7ac93a899 Merge pull request #110 from sajattack/half-precision-embeddings
Support full-precision embeddings in half precision inference mode
2022-08-28 15:36:26 -04:00
Lincoln Stein
3b2569ebdd Merge branch 'yunsaki-main' into main 2022-08-28 14:20:48 -04:00
Lincoln Stein
8b9a520c5c adjusted handling of from_file 2022-08-28 14:20:34 -04:00
Lincoln Stein
ba03289c14 print current and max VRAM usage stats after each round of generation 2022-08-28 13:05:01 -04:00
blessedcoolant
d1551b1bd4 Enable users to set sampler using prompts 2022-08-29 04:27:54 +12:00
Andy Pilate
fab9e1a423 Fix wrong help message 2022-08-28 17:11:24 +02:00
Muhammad Usama
59be6c815d bug fix in grid
In case of 6 images 3rd image was also copied to 4th box missing the last image in the grid.
2022-08-28 15:39:09 +01:00
artmen1516
ff6c11406b add notebook and readme section 2022-08-27 17:20:44 -07:00
Kevin Gibbons
6f90c7daf6 make web ui default to 512x512 2022-08-26 18:26:22 -07:00
Lincoln Stein
38ed6393fa updated TODO 2022-08-26 14:59:53 -04:00
Lincoln Stein
a5a3300fc6 Merge pull request #114 from TesseractCat/main
Replace numerical size inputs with dropdowns
2022-08-26 14:42:57 -04:00
tesseractcat
0ab03a5fde Replace numerical size inputs with dropdowns 2022-08-26 13:35:27 -04:00
Lincoln Stein
800132970e Merge pull request #105 from shusso/select-device
Move torch.device selection to it's own function
2022-08-26 12:23:21 -04:00
Paul Sajna
555f13e469 Merge branch 'main' into half-precision-embeddings 2022-08-26 08:33:46 -07:00
Paul Sajna
9b5101cd8d support full-precision embeddings in half precision mode 2022-08-26 08:30:58 -07:00
yun saki
7040995ceb fixed variable name error 2022-08-26 14:25:49 +02:00
yun saki
5129f256a3 simplet2i: changed image file handling to work as stated in the [docs](https://pillow.readthedocs.io/en/stable/reference/open_files.html) 2022-08-26 14:13:16 +02:00
Lincoln Stein
b0b4ccf521 Merge pull request #101 from BaristaLabs/remove-gpfgan
Set default to none for gfpgan_strength
2022-08-26 07:55:47 -04:00
Samuel Husso
ed72ff3268 Move torch.device selection to it's own function 2022-08-26 14:43:18 +03:00
yun saki
89805a5239 fixed mistake in comment 2022-08-26 13:25:12 +02:00
yun saki
e00397f9ca refactored logfile handling; minimised time spent in context managers (with open) 2022-08-26 13:22:53 +02:00
yun saki
12f59e1daa removed log.close(); 'with open' automatically closes the file 2022-08-26 13:12:56 +02:00
yun saki
cf750f62db refactored infile handling 2022-08-26 13:10:37 +02:00
yun saki
0f28663805 remove redundant None check (if var does the same thing) 2022-08-26 12:43:13 +02:00
Sean McLellan
f3fad22cb6 Fix 2022-08-26 05:27:34 -04:00
Sean McLellan
7bf0bc5208 fix comment 2022-08-26 04:08:18 -04:00
Sean McLellan
4e5aa7e714 fix comment 2022-08-26 04:07:01 -04:00
Sean McLellan
46a223f229 Double check for null and 0, and add a comment to indicate intent 2022-08-26 04:05:09 -04:00
Sean McLellan
eb9f0be91a Set default to none for gfpgan_strength 2022-08-26 03:53:55 -04:00
Lincoln Stein
4f02b72c9c prettified all the code using "blue" at the urging of @tildebyte 2022-08-26 03:15:42 -04:00
Lincoln Stein
dd670200bb documentation tweaks for installation and running of the GFPGAN extension; now you have the ability to specify the previous image's seed with -S -1, the one before that with -S -2, and so forth 2022-08-26 02:17:14 -04:00
Lincoln Stein
8f89a2456a something is not quite right; when providing -G1 option on one prompt, and then omitting it on the next, I see a "images do not match" error from GFPGAN 2022-08-26 01:20:01 -04:00
Sean McLellan
407d70a987 Fix backwards logic 2022-08-26 00:49:12 -04:00
Sean McLellan
f1ffb5b51b Fix blend if the target image has been upscaled 2022-08-26 00:45:19 -04:00
Sean McLellan
4f1664ec4f remove params 2022-08-26 00:41:41 -04:00
Sean McLellan
fcdd95b652 Refactor so that behavior is consolidated at top level 2022-08-26 00:39:57 -04:00
Sean McLellan
470a62dbbe Merge branch 'main' of https://github.com/BaristaLabs/stable-diffusion-dream into add-gfpgan-option 2022-08-26 00:26:03 -04:00
Lincoln Stein
2c08cf7175 Merge branch 'more-refactoring' into main
This breaks up the dream utility modules in a more
sensible manner.
2022-08-25 23:59:30 -04:00
Lincoln Stein
539c15966d Update README.md
Put in a plug for Yansuki's morphing code.
2022-08-25 23:54:44 -04:00
Lincoln Stein
5f844807cb Update README.md
Removed a bit of an uncaught merge conflict warning.
2022-08-25 23:50:56 -04:00
Sean McLellan
cb86b9ae6e Remove the redundancy, better logging 2022-08-25 23:48:35 -04:00
Sean McLellan
3a30a8f2d2 Fix not being able to disable bgupscaler; update readme 2022-08-25 23:39:03 -04:00
Sean McLellan
60ed004328 Update readme, fix defaults for case-sensitive fs's 2022-08-25 23:31:08 -04:00
Sean McLellan
dbb9132f4d Merge branch 'main' of https://github.com/BaristaLabs/stable-diffusion-dream into add-gfpgan-option 2022-08-25 23:19:17 -04:00
Sean McLellan
5711b6d611 Add optional GFPGAN support 2022-08-25 22:57:30 -04:00
Lincoln Stein
f1bed52530 moved dream utilities into their own subfolder 2022-08-25 22:49:15 -04:00
Lincoln Stein
23fb4a72bb Merge branch 'bakkot-more-refactor' into main 2022-08-25 22:19:27 -04:00
Lincoln Stein
c38b6964b4 improved inline error messages slightly 2022-08-25 22:19:12 -04:00
Lincoln Stein
e202441f0c Merge branch 'more-refactor' of https://github.com/bakkot/stable-diffusion into bakkot-more-refactor 2022-08-25 21:55:08 -04:00
Lincoln Stein
d051d86df6 Merge pull request #96 from TesseractCat/main
Keep a log of requests for dream_web
2022-08-25 21:42:17 -04:00
tesseractcat
b49475a54f Keep a log of requests for dream_web 2022-08-25 21:06:17 -04:00
Kevin Gibbons
797de3257c fix batch_size 2022-08-25 17:28:52 -07:00
Kevin Gibbons
31b22e057d switch to generators 2022-08-25 17:06:06 -07:00
Kevin Gibbons
078859207d factor out loop 2022-08-25 16:51:39 -07:00
Kevin Gibbons
a10baf5808 factor out exception handler 2022-08-25 15:13:07 -07:00
Lincoln Stein
0eba55ddbc Merge pull request #91 from veprogames/update-ignorance
Properly remove src from repository, add ignored directory for initial images
2022-08-25 17:32:10 -04:00
Lincoln Stein
19fa222810 refactoring complete; please test carefully! 2022-08-25 17:30:08 -04:00
Lincoln Stein
b3e3b0e861 feature complete; looks like ready for merge 2022-08-25 17:26:48 -04:00
veprogames
dde2994d10 add inputs/ to .gitignore (a place for initial images) 2022-08-25 22:31:24 +02:00
veprogames
888ca39ce2 remove k-diffusion from repository (git rm --cached)
should fix conda environment hanging
2022-08-25 22:29:12 +02:00
Lincoln Stein
f4c95bfec0 Update README.md 2022-08-25 15:33:49 -04:00
Lincoln Stein
91d3e4605e Update README.md 2022-08-25 15:32:48 -04:00
Lincoln Stein
652c67c90e Update README.md 2022-08-25 15:29:41 -04:00
Lincoln Stein
2114c386ad moved index.js .html and .css files into static/dream_web/; changed batch to iterations again 2022-08-25 15:27:43 -04:00
Lincoln Stein
6d2b4cbda1 Merge branch 'main' of github.com:lstein/stable-diffusion into main 2022-08-25 15:15:07 -04:00
Lincoln Stein
562831fc4b Merge branch 'TesseractCat-main' into main 2022-08-25 15:14:50 -04:00
Lincoln Stein
d04518e65e resolved conflicts in use of batch vs iterations 2022-08-25 15:14:38 -04:00
Lincoln Stein
d598b6c79d Update README.md 2022-08-25 15:11:06 -04:00
Lincoln Stein
4ec21a5423 resolved conflicts 2022-08-25 15:09:55 -04:00
Lincoln Stein
b64c902354 added missing image 2022-08-25 15:06:10 -04:00
Lincoln Stein
2ada3288e7 Small cleanups.
- Quenched tokenizer warnings during model initialization.
- Changed "batch" to "iterations" for generating multiple images in
  order to conserve vram.
- Updated README.
- Moved static folder from under scripts to top level. Can store other
  static content there in future.
- Added screenshot of web server in action (to static folder).
2022-08-25 15:03:40 -04:00
tesseractcat
91966e9ffa Fix appearance on mobile 2022-08-25 15:01:08 -04:00
tesseractcat
2ad73246f9 Normalize working directory 2022-08-25 14:27:33 -04:00
tesseractcat
d3a802db69 Fix horizontal divider 2022-08-25 14:18:29 -04:00
tesseractcat
b95908daec Move style and script to individual files 2022-08-25 14:15:08 -04:00
Lincoln Stein
79add5f0b6 Merge branch 'main' of https://github.com/TesseractCat/stable-diffusion into TesseractCat-main 2022-08-25 13:52:44 -04:00
Lincoln Stein
650ae3eb13 Merge pull request #89 from BlueAmulet/remove-accelerate
Remove accelerate library
2022-08-25 13:48:48 -04:00
Lincoln Stein
0e3059728c Merge pull request #85 from JigenD/VRAMutilizationFix
fix VRAM utilization
2022-08-25 13:47:49 -04:00
BlueAmulet
b7735b3788 Fix attribution 2022-08-25 11:13:12 -06:00
BlueAmulet
39b55ae016 Remove accelerate library
This library is not required to use k-diffusion
Make k-diffusion wrapper closer to the other samplers
2022-08-25 11:04:57 -06:00
JigenD
e82c5eba18 PR revision: replace cuda call with dynamic type 2022-08-25 12:18:35 -04:00
Lincoln Stein
1c8ecacddf remove src directory, which is gumming up conda installs; addresses issue #77 2022-08-25 10:43:05 -04:00
Lincoln Stein
26dc05e0e0 document --from_file flag, closes issue #82 2022-08-25 09:47:27 -04:00
Lincoln Stein
49247b4aa4 fix performance regression; closes issue #42 2022-08-25 09:41:12 -04:00
JigenD
eb58276a2c fix VRAM utilization 2022-08-25 08:34:51 -04:00
tesseractcat
72a9d75330 404 on missing file 2022-08-25 01:25:22 -04:00
Lincoln Stein
1a7743f3c2 Merge pull request #79 from BaristaLabs/update-readme-with-variant-disc
Update readme with variant disc
2022-08-25 00:44:45 -04:00
Lincoln Stein
0b4459b707 mostly back to full functionality; just missing grid generation code 2022-08-25 00:42:37 -04:00
Sean McLellan
c521ac08ee Another update 2022-08-25 00:00:39 -04:00
Sean McLellan
29727f3e12 Another update 2022-08-24 23:59:37 -04:00
Sean McLellan
51b9a1d8d3 Update readme.md 2022-08-24 23:55:31 -04:00
tesseractcat
ab131cb55e Add img2img support, fix naming conventions 2022-08-24 23:03:02 -04:00
tesseractcat
269fcf92d9 Reapply prompt config on image click 2022-08-24 21:38:47 -04:00
Lincoln Stein
8b682ac83b Merge pull request #75 from tildebyte/docs-readme-update-109-notes
DOCS: update release features for v1.09 in README - add k_diffusion samplers note
2022-08-24 19:55:10 -04:00
Lincoln Stein
36e4130f1c Merge pull request #72 from BaristaLabs/fix-dependencies
Various fixes in requirements and variant counting.
2022-08-24 19:54:38 -04:00
Lincoln Stein
b978536385 code is reorganized and mostly functional. Grid needs to be brought back online, as well as naming of img2img variants (currently the variants get written but not logged) 2022-08-24 19:47:59 -04:00
tesseractcat
0a7fe6f2d9 Switch to ThreadingHTTPServer 2022-08-24 18:19:50 -04:00
Lincoln Stein
b12955c963 remove unneeded imports from dream.py 2022-08-24 17:57:44 -04:00
Lincoln Stein
9133087850 first draft at big refactoring; will be broken 2022-08-24 17:52:34 -04:00
Ben Alkov
25fa0ad1f2 docs(readme): update release features for v1.09 2022-08-24 17:50:29 -04:00
tesseractcat
df9f088eb4 Preserve prompt across generations 2022-08-24 17:28:59 -04:00
tesseractcat
b1600d4ca3 Update seed on click 2022-08-24 17:26:22 -04:00
tesseractcat
0efc3bf780 Add bare bones web UI 2022-08-24 17:04:30 -04:00
Sean McLellan
dd16fe16bb Fix issue where more than the expected number of variants are generated 2022-08-24 16:26:58 -04:00
Sean McLellan
4d72644db4 Housekeeping 2022-08-24 15:54:49 -04:00
Lincoln Stein
7ea168227c Update README.md
Added a few features that were missed in initial 1.09 commit.
2022-08-24 15:35:10 -04:00
Lincoln Stein
ef8ddffe46 updated README 2022-08-24 15:28:19 -04:00
Lincoln Stein
81cbcb919e add ability to generate variants on an img2img; warnings quieted 2022-08-24 15:26:59 -04:00
Lincoln Stein
1eec6b776b tweaked documentation and comments slightly 2022-08-24 15:25:52 -04:00
Lincoln Stein
776c747978 added warning message when width/height specified along with init img 2022-08-24 14:04:27 -04:00
Lincoln Stein
caf4dd4155 added a TODO list to keep track of user requests 2022-08-24 13:54:59 -04:00
Sean McLellan
ee10021ea2 bikeshedding 2022-08-24 13:36:27 -04:00
Sean McLellan
ca82acfd3b Remove unnecessary print, small optmi 2022-08-24 13:33:19 -04:00
Sean McLellan
feea5fb063 Merge branch 'main' of https://github.com/BaristaLabs/stable-diffusion-dream into add-simple-variant-mechanism 2022-08-24 13:16:15 -04:00
Sean McLellan
b5cdbd3b0b Fixes issue with cuda/current mismatch 2022-08-24 13:14:08 -04:00
Lincoln Stein
e043f238af backed out change from PR #44 that was causing ddim sampler to fail with the message 'sqrt _vml_cpu not implemented for 'Half' 2022-08-24 13:10:42 -04:00
Lincoln Stein
47a5da25b7 runtime errors now produce a stack trace 2022-08-24 12:57:04 -04:00
Lincoln Stein
f55f4d7156 squelch warnings about modified untracked content due to the src/* submodules 2022-08-24 12:31:48 -04:00
Lincoln Stein
5055e9e1d5 corrected double clip reference from requirements.txt 2022-08-24 12:09:22 -04:00
Sean McLellan
c6b5e930dc Merge branch 'main' of https://github.com/BaristaLabs/stable-diffusion-dream into add-simple-variant-mechanism 2022-08-24 12:06:29 -04:00
Sean McLellan
d33e1bf563 Add simple way to make variants 2022-08-24 12:02:36 -04:00
Lincoln Stein
923466387f minor tweak to .gitignore 2022-08-24 11:52:13 -04:00
Lincoln Stein
56f7b0f434 Merge branch 'warner-benjamin-small-improvements' into main 2022-08-24 11:51:16 -04:00
Lincoln Stein
c24a16ccb0 resolved merge conflicts 2022-08-24 11:50:48 -04:00
Lincoln Stein
ab8ee9bbb6 Merge branch 'BaristaLabs-ensure-image-exists' into main
This catches the case in which user specifies an img2img init img that
doesn't exist.
2022-08-24 11:42:55 -04:00
Lincoln Stein
37609d6e53 resolved merge conflicts 2022-08-24 11:42:44 -04:00
Lincoln Stein
fb9b845fda Merge branch 'BaristaLabs-add-textual-inversion' into main
This allows the use of a custom dataset to fine tune model outputs.
2022-08-24 11:33:39 -04:00
Lincoln Stein
9050ce152b Fixed up a few merge conflicts, looks good so far 2022-08-24 11:29:32 -04:00
Lincoln Stein
73901a2777 Merge pull request #58 from nicolai256/main
init img didn't work in textual inversion, now it does :)
2022-08-24 11:24:30 -04:00
Lincoln Stein
decd1a58d2 Merge branch 'escape-single-quotes' into main
This prevents single quotes in the prompt from generating a parse error.
2022-08-24 11:21:09 -04:00
Lincoln Stein
7f4a5e946d Merge branch 'tildebyte-feat-samplers-add-remaining-k' into main
This adds the remaining k_* samplers to the dream.py script.
2022-08-24 11:19:45 -04:00
Lincoln Stein
4bc64a6aff sampler now written to PNG metadata 2022-08-24 11:18:51 -04:00
Lincoln Stein
02cf5879a1 Merge branch 'feat-samplers-add-remaining-k' of https://github.com/tildebyte/stable-diffusion into tildebyte-feat-samplers-add-remaining-k
This adds all the remaining k_* sampling algorithms.
2022-08-24 10:56:24 -04:00
Lincoln Stein
d495bac307 updated README for version release 2022-08-24 09:31:17 -04:00
Lincoln Stein
3393b8cad1 added assertion checks for out-of-bound arguments; added various copyright and license agreement files 2022-08-24 09:22:27 -04:00
Benjamin Warner
886f1c0138 Undo more 'cuda' hardcoding 2022-08-24 00:39:25 -05:00
nicolai256
9588444f0e textual inversion + init img fix 2022-08-24 05:16:01 +02:00
Sean McLellan
24b11ecf9f Fix silent SystemExit if embedding_path is not specified 2022-08-23 22:45:02 -04:00
Sean McLellan
84989f0d05 Remote token output on startup 2022-08-23 22:39:10 -04:00
Sean McLellan
a93a79568d Tweak save_top_k setting 2022-08-23 21:56:05 -04:00
Sean McLellan
7081a84600 Fix GPU parameter in readme, add SD style finetune config 2022-08-23 20:14:40 -04:00
Sean McLellan
1df1e5c38b Test for the presence of the specified img2img 2022-08-23 19:22:35 -04:00
Sean McLellan
5a513426bd Remove test .ps1 file 2022-08-23 18:32:38 -04:00
Sean McLellan
611ccb991e Remove another duplicate file 2022-08-23 18:31:41 -04:00
Sean McLellan
bde956647f Remove duplicate t2i file 2022-08-23 18:29:50 -04:00
Sean McLellan
8952196bbf Add personalization 2022-08-23 18:26:28 -04:00
Ben Alkov
050dffd269 feat(samplers): add ability use all k_* samplers
Signed-off-by: Ben Alkov <ben.alkov@gmail.com>
2022-08-23 17:26:22 -04:00
Sean McLellan
0cdf5e61b0 Merge pull request #1 from lstein/main
Upstream changes
2022-08-23 15:44:36 -04:00
Benjamin Warner
de1cea92ce Small QoL imporvements 2022-08-23 12:49:17 -05:00
Lincoln Stein
3a58988e4a escape single quotes in the command stream so as not to confuse the shlex parser 2022-08-23 13:46:50 -04:00
Lincoln Stein
7a67d3d837 confirmed that pip install -r requirements.txt is working as expected 2022-08-23 13:15:03 -04:00
Lincoln Stein
9050f3d399 Merge branch 'main' into pip-install-test 2022-08-23 11:00:57 -04:00
Lincoln Stein
a21156e3e3 removed the reference to the mod-deleted Reddit walkthru of prompt weighting usage 2022-08-23 10:44:33 -04:00
Lincoln Stein
716dbbdf8c resolved conflicts in README changelog 2022-08-23 10:40:22 -04:00
Lincoln Stein
1f2e52a1d6 fixed filename generation so that newer files are always chronologically later 2022-08-23 10:39:18 -04:00
Lincoln Stein
dc788f92b3 in output directory, new image files always start with the number higher than the previous maximum filename to ensure alphabetic sort==chronological sort 2022-08-23 10:19:11 -04:00
Lincoln Stein
13774912f4 generated requirements.txt to test pip-only (non-conda) install 2022-08-23 09:46:27 -04:00
Lincoln Stein
cb9e6d544a Update README.md
Fixed a forward/backward slash error.
2022-08-23 09:35:28 -04:00
Lincoln Stein
a6d6bafd13 updated README with info on weighted partial prompts 2022-08-23 01:58:47 -04:00
Lincoln Stein
9d1343dce3 resolved conflicts and tested 2022-08-23 01:44:43 -04:00
Lincoln Stein
11c0df07b7 prompt weighting not working 2022-08-23 01:23:14 -04:00
Lincoln Stein
ca8a799373 Merge pull request #24 from bakkot/patch-1
Fix usage of simplified API in readme
2022-08-23 01:02:13 -04:00
Lincoln Stein
710b908290 Keyboard interrupt retains seed and log information in files produced prior to interrupt. Closes #21 2022-08-23 00:51:38 -04:00
Kevin Gibbons
c80ce4fff5 fix default config to match docs / dream.py 2022-08-22 21:46:22 -07:00
Lincoln Stein
bc7b1fdd37 Added --from_file argument to load input from a file. Closes #23 2022-08-23 00:30:06 -04:00
Kevin Gibbons
1b7d414784 Fix usage of simplified API in readme 2022-08-22 21:01:15 -07:00
Lincoln Stein
6d1219deec fixed filenames 2022-08-22 23:56:36 -04:00
Lincoln Stein
e019de34ac can now change output directories in mid-session using cd and pwd commands 2022-08-22 21:14:31 -04:00
Lincoln Stein
88563fd27a added support for cd command in path completer 2022-08-22 21:01:06 -04:00
Lincoln Stein
18289dabcb better exception handling for out of memory errors and badly formatted prompts 2022-08-22 16:55:18 -04:00
Lincoln Stein
e70169257e better exception handling for out of memory errors and badly formatted prompts 2022-08-22 16:55:06 -04:00
Lincoln Stein
2afa87e911 Update README.md 2022-08-22 15:45:44 -04:00
Lincoln Stein
281e381cfc clarify use of preload_models.py 2022-08-22 15:42:06 -04:00
Lincoln Stein
9a121f6190 updated changelog 2022-08-22 15:34:57 -04:00
Lincoln Stein
a20827697c adjusted instructions for the released stable-diffusion-v1 weights 2022-08-22 15:33:27 -04:00
Lincoln Stein
9391eaff0e Merge branch 'prompt-in-png' into main 2022-08-22 13:24:12 -04:00
Lincoln Stein
e1d52822c5 fixed crash that occurs if you type an empty prompt at the dream> prompt 2022-08-22 12:40:54 -04:00
xra
e4eb775b63 added optional parameter to skip subprompt weight normalization
allows more control when fine-tuning
2022-08-23 00:03:32 +09:00
xra
a3632f5b4f improved comments & added warning if value couldn't be parsed correctly 2022-08-22 23:32:01 +09:00
Lincoln Stein
63989ce6ff tidied up scripts directory by moving the original CompViz scripts into a subfolder 2022-08-22 10:11:54 -04:00
Lincoln Stein
24b88c6fc5 Update README.md 2022-08-22 10:01:06 -04:00
xra
2736d7e15e optional weighting for creative blending of prompts
example: "an apple: a banana:0 a watermelon:0.5"
        the above example turns into 3 sub-prompts:
        "an apple" 1.0 (default if no value)
        "a banana" 0.0
        "a watermelon" 0.5
        The weights are added and normalized
        The resulting image will be: apple 66%, banana 0%, watermelon 33%
2022-08-22 22:59:06 +09:00
Lincoln Stein
7cb5149a02 Update README.md
Fixed typo in the change log (wrong version #)
2022-08-22 00:27:48 -04:00
Lincoln Stein
ea3501a8c4 Merge branch 'main' of github.com:lstein/stable-diffusion into main 2022-08-22 00:26:11 -04:00
Lincoln Stein
8caa27bef0 Close #2 2022-08-22 00:26:03 -04:00
Lincoln Stein
ddf0ef3af1 updated README for image metadata storage 2022-08-22 00:22:12 -04:00
Lincoln Stein
aa2729d868 user's prompt is now normalized for reproducibility and written into the destination PNG file as a tEXt metadata chunk named "Dream". You can retrieve the prompt with an image editing program that supports browsing the full metadata, or with the images2prompt.py script located in 'scripts' 2022-08-22 00:12:16 -04:00
Lincoln Stein
5f352aec87 test of normalization of prompt 2022-08-21 22:48:40 -04:00
Lincoln Stein
c4c4974b39 Update README.md
Fixed formatting in changelog.
2022-08-21 21:48:02 -04:00
Lincoln Stein
194f43f00b Update README.md
Add acknowledges for those who sent pull requests.
2022-08-21 21:46:00 -04:00
Lincoln Stein
325bc5280e Updated README.md
Fix the path for where to install the LIAON-400m model.
2022-08-21 20:48:44 -04:00
Lincoln Stein
11cc8e545b Clarified the required Python version (3.8.5) 2022-08-21 20:30:21 -04:00
Lincoln Stein
9adac56f4e Fixed incorrect conda env update command 2022-08-21 20:27:25 -04:00
Lincoln Stein
5d5307dcb4 Update README.md 2022-08-21 20:20:22 -04:00
114 changed files with 12578 additions and 3247 deletions

View File

@@ -0,0 +1,32 @@
import argparse
import numpy as np
from PIL import Image
def read_image_int16(image_path):
image = Image.open(image_path)
return np.array(image).astype(np.int16)
def calc_images_mean_L1(image1_path, image2_path):
image1 = read_image_int16(image1_path)
image2 = read_image_int16(image2_path)
assert image1.shape == image2.shape
mean_L1 = np.abs(image1 - image2).mean()
return mean_L1
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument('image1_path')
parser.add_argument('image2_path')
args = parser.parse_args()
return args
if __name__ == '__main__':
args = parse_args()
mean_L1 = calc_images_mean_L1(args.image1_path, args.image2_path)
print(mean_L1)

Binary file not shown.

After

Width:  |  Height:  |  Size: 416 KiB

View File

@@ -0,0 +1 @@
"a photograph of an astronaut riding a horse" -s50 -S42

View File

@@ -0,0 +1,20 @@
# generate an image
PROMPT_FILE=".dev_scripts/sample_command.txt"
OUT_DIR="outputs/img-samples/test_regression_txt2img_v1_4"
SAMPLES_DIR=${OUT_DIR}
python scripts/dream.py \
--from_file ${PROMPT_FILE} \
--outdir ${OUT_DIR} \
--sampler plms \
--full_precision
# original output by CompVis/stable-diffusion
IMAGE1=".dev_scripts/images/v1_4_astronaut_rides_horse_plms_step50_seed42.png"
# new output
IMAGE2=`ls -A ${SAMPLES_DIR}/*.png | sort | tail -n 1`
echo ""
echo "comparing the following two images"
echo "IMAGE1: ${IMAGE1}"
echo "IMAGE2: ${IMAGE2}"
python .dev_scripts/diff_images.py ${IMAGE1} ${IMAGE2}

View File

@@ -0,0 +1,23 @@
# generate an image
PROMPT="a photograph of an astronaut riding a horse"
OUT_DIR="outputs/txt2img-samples/test_regression_txt2img_v1_4"
SAMPLES_DIR="outputs/txt2img-samples/test_regression_txt2img_v1_4/samples"
python scripts/orig_scripts/txt2img.py \
--prompt "${PROMPT}" \
--outdir ${OUT_DIR} \
--plms \
--ddim_steps 50 \
--n_samples 1 \
--n_iter 1 \
--seed 42
# original output by CompVis/stable-diffusion
IMAGE1=".dev_scripts/images/v1_4_astronaut_rides_horse_plms_step50_seed42.png"
# new output
IMAGE2=`ls -A ${SAMPLES_DIR}/*.png | sort | tail -n 1`
echo ""
echo "comparing the following two images"
echo "IMAGE1: ${IMAGE1}"
echo "IMAGE2: ${IMAGE2}"
python .dev_scripts/diff_images.py ${IMAGE1} ${IMAGE2}

4
.gitattributes vendored Normal file
View File

@@ -0,0 +1,4 @@
# Auto normalizes line endings on commit so devs don't need to change local settings.
# Only affects text files and ignores other file types.
# For more info see: https://www.aleksandrhovhannisyan.com/blog/crlf-vs-lf-normalizing-line-endings-in-git/
* text=auto

36
.github/ISSUE_TEMPLATE/bug_report.md vendored Normal file
View File

@@ -0,0 +1,36 @@
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: ''
assignees: ''
---
**Describe your environment**
- GPU: [cuda/amd/mps/cpu]
- VRAM: [if known]
- CPU arch: [x86/arm]
- OS: [Linux/Windows/macOS]
- Python: [Anaconda/miniconda/miniforge/pyenv/other (explain)]
- Branch: [if `git status` says anything other than "On branch main" paste it here]
- Commit: [run `git show` and paste the line that starts with "Merge" here]
**Describe the bug**
A clear and concise description of what the bug is.
**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error
**Expected behavior**
A clear and concise description of what you expected to happen.
**Screenshots**
If applicable, add screenshots to help explain your problem.
**Additional context**
Add any other context about the problem here.

View File

@@ -0,0 +1,20 @@
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: ''
assignees: ''
---
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
**Describe the solution you'd like**
A clear and concise description of what you want to happen.
**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.
**Additional context**
Add any other context or screenshots about the feature request here.

64
.github/workflows/cache-model.yml vendored Normal file
View File

@@ -0,0 +1,64 @@
name: Cache Model
on:
workflow_dispatch
jobs:
build:
strategy:
matrix:
os: [ macos-12 ]
name: Create Caches using ${{ matrix.os }}
runs-on: ${{ matrix.os }}
steps:
- name: Checkout sources
uses: actions/checkout@v3
- name: Cache model
id: cache-sd-v1-4
uses: actions/cache@v3
env:
cache-name: cache-sd-v1-4
with:
path: models/ldm/stable-diffusion-v1/model.ckpt
key: ${{ env.cache-name }}
restore-keys: |
${{ env.cache-name }}
- name: Download Stable Diffusion v1.4 model
if: ${{ steps.cache-sd-v1-4.outputs.cache-hit != 'true' }}
continue-on-error: true
run: |
if [ ! -e models/ldm/stable-diffusion-v1 ]; then
mkdir -p models/ldm/stable-diffusion-v1
fi
if [ ! -e models/ldm/stable-diffusion-v1/model.ckpt ]; then
curl -o models/ldm/stable-diffusion-v1/model.ckpt ${{ secrets.SD_V1_4_URL }}
fi
# Uncomment this when we no longer make changes to environment-mac.yaml
# - name: Cache environment
# id: cache-conda-env-ldm
# uses: actions/cache@v3
# env:
# cache-name: cache-conda-env-ldm
# with:
# path: ~/.conda/envs/ldm
# key: ${{ env.cache-name }}
# restore-keys: |
# ${{ env.cache-name }}
- name: Install dependencies
# if: ${{ steps.cache-conda-env-ldm.outputs.cache-hit != 'true' }}
run: |
conda env create -f environment-mac.yaml
- name: Cache hugginface and torch models
id: cache-hugginface-torch
uses: actions/cache@v3
env:
cache-name: cache-hugginface-torch
with:
path: ~/.cache
key: ${{ env.cache-name }}
restore-keys: |
${{ env.cache-name }}
- name: Download Huggingface and Torch models
if: ${{ steps.cache-hugginface-torch.outputs.cache-hit != 'true' }}
continue-on-error: true
run: |
export PYTHON_BIN=/usr/local/miniconda/envs/ldm/bin/python
$PYTHON_BIN scripts/preload_models.py

80
.github/workflows/macos12-miniconda.yml vendored Normal file
View File

@@ -0,0 +1,80 @@
name: Build
on:
push:
branches: [ main ]
pull_request:
branches: [ main ]
jobs:
build:
strategy:
matrix:
os: [ macos-12 ]
name: Build on ${{ matrix.os }} miniconda
runs-on: ${{ matrix.os }}
steps:
- name: Checkout sources
uses: actions/checkout@v3
- name: Cache model
id: cache-sd-v1-4
uses: actions/cache@v3
env:
cache-name: cache-sd-v1-4
with:
path: models/ldm/stable-diffusion-v1/model.ckpt
key: ${{ env.cache-name }}
restore-keys: |
${{ env.cache-name }}
- name: Download Stable Diffusion v1.4 model
if: ${{ steps.cache-sd-v1-4.outputs.cache-hit != 'true' }}
continue-on-error: true
run: |
if [ ! -e models/ldm/stable-diffusion-v1 ]; then
mkdir -p models/ldm/stable-diffusion-v1
fi
if [ ! -e models/ldm/stable-diffusion-v1/model.ckpt ]; then
curl -o models/ldm/stable-diffusion-v1/model.ckpt ${{ secrets.SD_V1_4_URL }}
fi
# Uncomment this when we no longer make changes to environment-mac.yaml
# - name: Cache environment
# id: cache-conda-env-ldm
# uses: actions/cache@v3
# env:
# cache-name: cache-conda-env-ldm
# with:
# path: ~/.conda/envs/ldm
# key: ${{ env.cache-name }}
# restore-keys: |
# ${{ env.cache-name }}
- name: Install dependencies
# if: ${{ steps.cache-conda-env-ldm.outputs.cache-hit != 'true' }}
run: |
conda env create -f environment-mac.yaml
- name: Cache hugginface and torch models
id: cache-hugginface-torch
uses: actions/cache@v3
env:
cache-name: cache-hugginface-torch
with:
path: ~/.cache
key: ${{ env.cache-name }}
restore-keys: |
${{ env.cache-name }}
- name: Download Huggingface and Torch models
if: ${{ steps.cache-hugginface-torch.outputs.cache-hit != 'true' }}
continue-on-error: true
run: |
export PYTHON_BIN=/usr/local/miniconda/envs/ldm/bin/python
$PYTHON_BIN scripts/preload_models.py
- name: Run the tests
run: |
# Note, can't "activate" via automation, and activation is just env vars and path
export PYTHON_BIN=/usr/local/miniconda/envs/ldm/bin/python
export PYTORCH_ENABLE_MPS_FALLBACK=1
$PYTHON_BIN scripts/preload_models.py
mkdir -p outputs/img-samples
time $PYTHON_BIN scripts/dream.py --from_file tests/prompts.txt </dev/null 2> outputs/img-samples/err.log > outputs/img-samples/out.log
- name: Archive results
uses: actions/upload-artifact@v3
with:
name: results
path: outputs/img-samples

188
.gitignore vendored Normal file
View File

@@ -0,0 +1,188 @@
# ignore default image save location and model symbolic link
outputs/
models/ldm/stable-diffusion-v1/model.ckpt
# ignore a directory which serves as a place for initial images
inputs/
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# emacs autosave and recovery files
*~
.#*
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# WebUI temp files:
img2img-tmp.png
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
.python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
src
**/__pycache__/
outputs
# Logs and associated folders
# created from generated embeddings.
logs
testtube
checkpoints
# If it's a Mac
.DS_Store

0
.gitmodules vendored Normal file
View File

30
LICENSE
View File

@@ -1,9 +1,27 @@
All rights reserved by the authors.
You must not distribute the weights provided to you directly or indirectly without explicit consent of the authors.
You must not distribute harmful, offensive, dehumanizing content or otherwise harmful representations of people or their environments, cultures, religions, etc. produced with the model weights
or other generated content described in the "Misuse and Malicious Use" section in the model card.
The model weights are provided for research purposes only.
MIT License
Copyright (c) 2022 Lincoln D. Stein (https://github.com/lstein)
This software is derived from a fork of the source code available from
https://github.com/pesser/stable-diffusion and
https://github.com/CompViz/stable-diffusion. They carry the following
copyrights:
Copyright (c) 2022 Machine Vision and Learning Group, LMU Munich
Copyright (c) 2022 Robin Rombach and Patrick Esser and contributors
Please see individual source code files for copyright and authorship
attributions.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
@@ -11,4 +29,4 @@ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
SOFTWARE.

294
LICENSE-ModelWeights.txt Normal file
View File

@@ -0,0 +1,294 @@
Copyright (c) 2022 Robin Rombach and Patrick Esser and contributors
CreativeML Open RAIL-M
dated August 22, 2022
Section I: PREAMBLE
Multimodal generative models are being widely adopted and used, and
have the potential to transform the way artists, among other
individuals, conceive and benefit from AI or ML technologies as a tool
for content creation.
Notwithstanding the current and potential benefits that these
artifacts can bring to society at large, there are also concerns about
potential misuses of them, either due to their technical limitations
or ethical considerations.
In short, this license strives for both the open and responsible
downstream use of the accompanying model. When it comes to the open
character, we took inspiration from open source permissive licenses
regarding the grant of IP rights. Referring to the downstream
responsible use, we added use-based restrictions not permitting the
use of the Model in very specific scenarios, in order for the licensor
to be able to enforce the license in case potential misuses of the
Model may occur. At the same time, we strive to promote open and
responsible research on generative models for art and content
generation.
Even though downstream derivative versions of the model could be
released under different licensing terms, the latter will always have
to include - at minimum - the same use-based restrictions as the ones
in the original license (this license). We believe in the intersection
between open and responsible AI development; thus, this License aims
to strike a balance between both in order to enable responsible
open-science in the field of AI.
This License governs the use of the model (and its derivatives) and is
informed by the model card associated with the model.
NOW THEREFORE, You and Licensor agree as follows:
1. Definitions
- "License" means the terms and conditions for use, reproduction, and
Distribution as defined in this document.
- "Data" means a collection of information and/or content extracted
from the dataset used with the Model, including to train, pretrain,
or otherwise evaluate the Model. The Data is not licensed under this
License.
- "Output" means the results of operating a Model as embodied in
informational content resulting therefrom.
- "Model" means any accompanying machine-learning based assemblies
(including checkpoints), consisting of learnt weights, parameters
(including optimizer states), corresponding to the model
architecture as embodied in the Complementary Material, that have
been trained or tuned, in whole or in part on the Data, using the
Complementary Material.
- "Derivatives of the Model" means all modifications to the Model,
works based on the Model, or any other model which is created or
initialized by transfer of patterns of the weights, parameters,
activations or output of the Model, to the other model, in order to
cause the other model to perform similarly to the Model, including -
but not limited to - distillation methods entailing the use of
intermediate data representations or methods based on the generation
of synthetic data by the Model for training the other model.
- "Complementary Material" means the accompanying source code and
scripts used to define, run, load, benchmark or evaluate the Model,
and used to prepare data for training or evaluation, if any. This
includes any accompanying documentation, tutorials, examples, etc,
if any.
- "Distribution" means any transmission, reproduction, publication or
other sharing of the Model or Derivatives of the Model to a third
party, including providing the Model as a hosted service made
available by electronic or other remote means - e.g. API-based or
web access.
- "Licensor" means the copyright owner or entity authorized by the
copyright owner that is granting the License, including the persons
or entities that may have rights in the Model and/or distributing
the Model.
- "You" (or "Your") means an individual or Legal Entity exercising
permissions granted by this License and/or making use of the Model
for whichever purpose and in any field of use, including usage of
the Model in an end-use application - e.g. chatbot, translator,
image generator.
- "Third Parties" means individuals or legal entities that are not
under common control with Licensor or You.
- "Contribution" means any work of authorship, including the original
version of the Model and any modifications or additions to that
Model or Derivatives of the Model thereof, that is intentionally
submitted to Licensor for inclusion in the Model by the copyright
owner or by an individual or Legal Entity authorized to submit on
behalf of the copyright owner. For the purposes of this definition,
"submitted" means any form of electronic, verbal, or written
communication sent to the Licensor or its representatives, including
but not limited to communication on electronic mailing lists, source
code control systems, and issue tracking systems that are managed
by, or on behalf of, the Licensor for the purpose of discussing and
improving the Model, but excluding communication that is
conspicuously marked or otherwise designated in writing by the
copyright owner as "Not a Contribution."
- "Contributor" means Licensor and any individual or Legal Entity on
behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Model.
Section II: INTELLECTUAL PROPERTY RIGHTS
Both copyright and patent grants apply to the Model, Derivatives of
the Model and Complementary Material. The Model and Derivatives of the
Model are subject to additional terms as described in Section III.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare, publicly display, publicly
perform, sublicense, and distribute the Complementary Material, the
Model, and Derivatives of the Model.
3. Grant of Patent License. Subject to the terms and conditions of
this License and where and as applicable, each Contributor hereby
grants to You a perpetual, worldwide, non-exclusive, no-charge,
royalty-free, irrevocable (except as stated in this paragraph) patent
license to make, have made, use, offer to sell, sell, import, and
otherwise transfer the Model and the Complementary Material, where
such license applies only to those patent claims licensable by such
Contributor that are necessarily infringed by their Contribution(s)
alone or by combination of their Contribution(s) with the Model to
which such Contribution(s) was submitted. If You institute patent
litigation against any entity (including a cross-claim or counterclaim
in a lawsuit) alleging that the Model and/or Complementary Material or
a Contribution incorporated within the Model and/or Complementary
Material constitutes direct or contributory patent infringement, then
any patent licenses granted to You under this License for the Model
and/or Work shall terminate as of the date such litigation is asserted
or filed.
Section III: CONDITIONS OF USAGE, DISTRIBUTION AND REDISTRIBUTION
4. Distribution and Redistribution. You may host for Third Party
remote access purposes (e.g. software-as-a-service), reproduce and
distribute copies of the Model or Derivatives of the Model thereof in
any medium, with or without modifications, provided that You meet the
following conditions: Use-based restrictions as referenced in
paragraph 5 MUST be included as an enforceable provision by You in any
type of legal agreement (e.g. a license) governing the use and/or
distribution of the Model or Derivatives of the Model, and You shall
give notice to subsequent users You Distribute to, that the Model or
Derivatives of the Model are subject to paragraph 5. This provision
does not apply to the use of Complementary Material. You must give
any Third Party recipients of the Model or Derivatives of the Model a
copy of this License; You must cause any modified files to carry
prominent notices stating that You changed the files; You must retain
all copyright, patent, trademark, and attribution notices excluding
those notices that do not pertain to any part of the Model,
Derivatives of the Model. You may add Your own copyright statement to
Your modifications and may provide additional or different license
terms and conditions - respecting paragraph 4.a. - for use,
reproduction, or Distribution of Your modifications, or for any such
Derivatives of the Model as a whole, provided Your use, reproduction,
and Distribution of the Model otherwise complies with the conditions
stated in this License.
5. Use-based restrictions. The restrictions set forth in Attachment A
are considered Use-based restrictions. Therefore You cannot use the
Model and the Derivatives of the Model for the specified restricted
uses. You may use the Model subject to this License, including only
for lawful purposes and in accordance with the License. Use may
include creating any content with, finetuning, updating, running,
training, evaluating and/or reparametrizing the Model. You shall
require all of Your users who use the Model or a Derivative of the
Model to comply with the terms of this paragraph (paragraph 5).
6. The Output You Generate. Except as set forth herein, Licensor
claims no rights in the Output You generate using the Model. You are
accountable for the Output you generate and its subsequent uses. No
use of the output can contravene any provision as stated in the
License.
Section IV: OTHER PROVISIONS
7. Updates and Runtime Restrictions. To the maximum extent permitted
by law, Licensor reserves the right to restrict (remotely or
otherwise) usage of the Model in violation of this License, update the
Model through electronic means, or modify the Output of the Model
based on updates. You shall undertake reasonable efforts to use the
latest version of the Model.
8. Trademarks and related. Nothing in this License permits You to make
use of Licensors trademarks, trade names, logos or to otherwise
suggest endorsement or misrepresent the relationship between the
parties; and any rights not expressly granted herein are reserved by
the Licensors.
9. Disclaimer of Warranty. Unless required by applicable law or agreed
to in writing, Licensor provides the Model and the Complementary
Material (and each Contributor provides its Contributions) on an "AS
IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
express or implied, including, without limitation, any warranties or
conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR
A PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Model, Derivatives of
the Model, and the Complementary Material and assume any risks
associated with Your exercise of permissions under this License.
10. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise, unless
required by applicable law (such as deliberate and grossly negligent
acts) or agreed to in writing, shall any Contributor be liable to You
for damages, including any direct, indirect, special, incidental, or
consequential damages of any character arising as a result of this
License or out of the use or inability to use the Model and the
Complementary Material (including but not limited to damages for loss
of goodwill, work stoppage, computer failure or malfunction, or any
and all other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
11. Accepting Warranty or Additional Liability. While redistributing
the Model, Derivatives of the Model and the Complementary Material
thereof, You may choose to offer, and charge a fee for, acceptance of
support, warranty, indemnity, or other liability obligations and/or
rights consistent with this License. However, in accepting such
obligations, You may act only on Your own behalf and on Your sole
responsibility, not on behalf of any other Contributor, and only if
You agree to indemnify, defend, and hold each Contributor harmless for
any liability incurred by, or claims asserted against, such
Contributor by reason of your accepting any such warranty or
additional liability.
12. If any provision of this License is held to be invalid, illegal or
unenforceable, the remaining provisions shall be unaffected thereby
and remain valid as if such provision had not been set forth herein.
END OF TERMS AND CONDITIONS
Attachment A
Use Restrictions
You agree not to use the Model or Derivatives of the Model:
- In any way that violates any applicable national, federal, state,
local or international law or regulation;
- For the purpose of exploiting, harming or attempting to exploit or
harm minors in any way;
- To generate or disseminate verifiably false information and/or
content with the purpose of harming others;
- To generate or disseminate personal identifiable information that
can be used to harm an individual;
- To defame, disparage or otherwise harass others;
- For fully automated decision making that adversely impacts an
individuals legal rights or otherwise creates or modifies a
binding, enforceable obligation;
pp- For any use intended to or which has the effect of discriminating
against or harming individuals or groups based on online or offline
social behavior or known or predicted personal or personality
characteristics;
- To exploit any of the vulnerabilities of a specific group of persons
based on their age, social, physical or mental characteristics, in
order to materially distort the behavior of a person pertaining to
that group in a manner that causes or is likely to cause that person
or another person physical or psychological harm;
- For any use intended to or which has the effect of discriminating
against individuals or groups based on legally protected
characteristics or categories;
- To provide medical advice and medical results interpretation;
- To generate or disseminate information for the purpose to be used
for administration of justice, law enforcement, immigration or
asylum processes, such as predicting an individual will commit
fraud/crime commitment (e.g. by text profiling, drawing causal
relationships between assertions made in documents, indiscriminate
and arbitrarily-targeted use).

527
README.md
View File

@@ -1,488 +1,165 @@
# Stable Diffusion Dream Script
<h1 align='center'><b>Stable Diffusion Dream Script</b></h1>
This is a fork of CompVis/stable-diffusion, the wonderful open source
text-to-image generator.
<p align='center'>
<img src="docs/assets/logo.png"/>
</p>
The original has been modified in several ways:
<p align="center">
<img src="https://img.shields.io/github/last-commit/lstein/stable-diffusion?logo=Python&logoColor=green&style=for-the-badge" alt="last-commit"/>
<img src="https://img.shields.io/github/stars/lstein/stable-diffusion?logo=GitHub&style=for-the-badge" alt="stars"/>
<br>
<img src="https://img.shields.io/github/issues/lstein/stable-diffusion?logo=GitHub&style=for-the-badge" alt="issues"/>
<img src="https://img.shields.io/github/issues-pr/lstein/stable-diffusion?logo=GitHub&style=for-the-badge" alt="pull-requests"/>
</p>
## Interactive command-line interface similar to the Discord bot
# **Stable Diffusion Dream Script**
The *dream.py* script, located in scripts/dream.py,
provides an interactive interface to image generation similar to
the "dream mothership" bot that Stable AI provided on its Discord
server. Unlike the txt2img.py and img2img.py scripts provided in the
original CompViz/stable-diffusion source code repository, the
time-consuming initialization of the AI model
initialization only happens once. After that image generation
from the command-line interface is very fast.
This is a fork of
[CompVis/stable-diffusion](https://github.com/CompVis/stable-diffusion),
the open source text-to-image generator. It provides a streamlined
process with various new features and options to aid the image
generation process. It runs on Windows, Mac and Linux machines,
and runs on GPU cards with as little as 4 GB or RAM.
The script uses the readline library to allow for in-line editing,
command history (up and down arrows), autocompletion, and more.
_Note: This fork is rapidly evolving. Please use the
[Issues](https://github.com/lstein/stable-diffusion/issues) tab to
report bugs and make feature requests. Be sure to use the provided
templates. They will help aid diagnose issues faster._
The script is confirmed to work on Linux and Windows systems. It should
work on MacOSX as well, but this is not confirmed. Note that this script
runs from the command-line (CMD or Terminal window), and does not have a GUI.
# **Table of Contents**
~~~~
(ldm) ~/stable-diffusion$ python3 ./scripts/dream.py
* Initializing, be patient...
Loading model from models/ldm/text2img-large/model.ckpt
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 872.30 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Loading Bert tokenizer from "models/bert"
setting sampler to plms
1. [Installation](#installation)
2. [Major Features](#features)
3. [Changelog](#latest-changes)
4. [Troubleshooting](#troubleshooting)
5. [Contributing](#contributing)
6. [Support](#support)
* Initialization done! Awaiting your command...
dream> ashley judd riding a camel -n2 -s150
Outputs:
outputs/txt2img-samples/00009.png: "ashley judd riding a camel" -n2 -s150 -S 416354203
outputs/txt2img-samples/00010.png: "ashley judd riding a camel" -n2 -s150-S 1362479620
# Installation
dream> "there's a fly in my soup" -n6 -g
outputs/txt2img-samples/00041.png: "there's a fly in my soup" -n6 -g -S 2685670268
seeds for individual rows: [2685670268, 1216708065, 2335773498, 822223658, 714542046, 3395302430]
~~~~
This fork is supported across multiple platforms. You can find individual installation instructions below.
The dream> prompt's arguments are pretty much
identical to those used in the Discord bot, except you don't need to
type "!dream" (it doesn't hurt if you do). A significant change is that creation of individual images
is now the default
unless --grid (-g) is given. For backward compatibility, the -i switch is recognized.
For command-line help type -h (or --help) at the dream> prompt.
- ## [Linux](docs/installation/INSTALL_LINUX.md)
- ## [Windows](docs/installation/INSTALL_WINDOWS.md)
- ## [Macintosh](docs/installation/INSTALL_MAC.md)
The script itself also recognizes a series of command-line switches that will change
important global defaults, such as the directory for image outputs and the location
of the model weight files.
## **Hardware Requirements**
## Image-to-Image
**System**
This script also provides an img2img feature that lets you seed your
creations with a drawing or photo. This is a really cool feature that tells
stable diffusion to build the prompt on top of the image you provide, preserving
the original's basic shape and layout. To use it, provide the --init_img
option as shown here:
You wil need one of the following:
~~~~
dream> "waterfall and rainbow" --init_img=./init-images/crude_drawing.png --strength=0.5 -s100 -n4
~~~~
- An NVIDIA-based graphics card with 4 GB or more VRAM memory.
- An Apple computer with an M1 chip.
The --init_img (-I) option gives the path to the seed picture. --strength (-f) controls how much
the original will be modified, ranging from 0.0 (keep the original intact), to 1.0 (ignore the original
completely). The default is 0.75, and ranges from 0.25-0.75 give interesting results.
**Memory**
## Changes
- At least 12 GB Main Memory RAM.
- v1.01 (21 August 2022)
* added k_lms sampling **Please run "conda update -f environment.yaml" to load the k_lms dependencies**
* use half precision arithmetic by default, resulting in faster execution and lower memory requirements
Pass argument --full_precision to dream.py to get slower but more accurate image generation
**Disk**
- At least 6 GB of free disk space for the machine learning model, Python, and all its dependencies.
## Installation
**Note**
### Linux/Mac
If you are have a Nvidia 10xx series card (e.g. the 1080ti), please
run the dream script in full-precision mode as shown below.
1. You will need to install the following prerequisites if they are not already available. Use your
operating system's preferred installer
* Python (version 3.8 or higher)
* git
Similarly, specify full-precision mode on Apple M1 hardware.
To run in full-precision mode, start `dream.py` with the
`--full_precision` flag:
2. Install the Python Anaconda environment manager using pip3.
```
~$ pip3 install anaconda
```
After installing anaconda, you should log out of your system and log back in. If the installation
worked, your command prompt will be prefixed by the name of the current anaconda environment, "(base)".
3. Copy the stable-diffusion source code from GitHub:
```
(base) ~$ git clone https://github.com/lstein/stable-diffusion.git
```
This will create stable-diffusion folder where you will follow the rest of the steps.
4. Enter the newly-created stable-diffusion folder. From this step forward make sure that you are working in the stable-diffusion directory!
```
(base) ~$ cd stable-diffusion
(base) ~/stable-diffusion$
```
5. Use anaconda to copy necessary python packages, create a new python environment named "ldm",
and activate the environment.
```
(base) ~/stable-diffusion$ conda env create -f environment.yaml
(base) ~/stable-diffusion$ conda activate ldm
(ldm) ~/stable-diffusion$
```
After these steps, your command prompt will be prefixed by "(ldm)" as shown above.
6. Load a couple of small machine-learning models required by stable diffusion:
```
(ldm) ~/stable-diffusion$ python3 scripts/preload_models.py
(ldm) ~/stable-diffusion$ python scripts/dream.py --full_precision
```
7. Now you need to install the weights for the stable diffusion model.
# Features
For testing prior to the release of the real weights, you can use an older weight file that produces low-quality images. Create a directory within stable-diffusion named "models/ldm/text2img.large", and use the wget URL downloader tool to copy the weight file into it:
```
(ldm) ~/stable-diffusion$ mkdir -p models/ldm/text2img-large
(ldm) ~/stable-diffusion$ wget -O models/ldm/text2img-large/model.ckpt https://ommer-lab.com/files/latent-diffusion/nitro/txt2img-f8-large/model.ckpt
```
For testing with the released weighs, you will do something similar, but with a directory named "models/ldm/stable-diffusion-v1"
```
(ldm) ~/stable-diffusion$ mkdir -p models/ldm/stable-diffusion-v1
(ldm) ~/stable-diffusion$ wget -O models/ldm/stable-diffusion-v1/model.ckpt <ENTER URL HERE>
```
These weight files are ~5 GB in size, so downloading may take a while.
## **Major Features**
8. Start generating images!
```
# for the pre-release weights use the -l or --liaon400m switch
(ldm) ~/stable-diffusion$ python3 scripts/dream.py -l
- ## [Interactive Command Line Interface](docs/features/CLI.md)
# for the post-release weights do not use the switch
(ldm) ~/stable-diffusion$ python3 scripts/dream.py
- ## [Image To Image](docs/features/IMG2IMG.md)
# for additional configuration switches and arguments, use -h or --help
(ldm) ~/stable-diffusion$ python3 scripts/dream.py -h
```
9. Subsequently, to relaunch the script, be sure to run "conda activate ldm" (step 5, second command), enter the "stable-diffusion"
directory, and then launch the dream script (step 8). If you forget to activate the ldm environment, the script will fail with multiple ModuleNotFound errors.
- ## [Inpainting Support](docs/features/INPAINTING.md)
### Updating to newer versions of the script
- ## [GFPGAN and Real-ESRGAN Support](docs/features/UPSCALE.md)
This distribution is changing rapidly. If you used the "git clone" method (step 5) to download the stable-diffusion directory, then to update to the latest and greatest version, launch the Anaconda window, enter "stable-diffusion", and type:
```
(ldm) ~/stable-diffusion$ git pull
```
This will bring your local copy into sync with the remote one.
- ## [Seamless Tiling](docs/features/OTHER.md#seamless-tiling)
### Windows
- ## [Google Colab](docs/features/OTHER.md#google-colab)
1. Install the most recent Python from here: https://www.python.org/downloads/windows/
- ## [Web Server](docs/features/WEB.md)
2. Install Anaconda3 (miniconda3 version) from here: https://docs.anaconda.com/anaconda/install/windows/
- ## [Reading Prompts From File](docs/features/OTHER.md#reading-prompts-from-a-file)
3. Install Git from here: https://git-scm.com/download/win
- ## [Shortcut: Reusing Seeds](docs/features/OTHER.md#shortcuts-reusing-seeds)
4. Launch Anaconda from the Windows Start menu. This will bring up a command window. Type all the remaining commands in this window.
- ## [Weighted Prompts](docs/features/OTHER.md#weighted-prompts)
5. Run the command:
```
git clone https://github.com/lstein/stable-diffusion.git
```
This will create stable-diffusion folder where you will follow the rest of the steps.
- ## [Variations](docs/features/VARIATIONS.md)
6. Enter the newly-created stable-diffusion folder. From this step forward make sure that you are working in the stable-diffusion directory!
```
cd stable-diffusion
```
- ## [Personalizing Text-to-Image Generation](docs/features/TEXTUAL_INVERSION.md)
7. Run the following two commands:
```
conda env create -f environment.yaml (step 7a)
conda activate ldm (step 7b)
```
This will install all python requirements and activate the "ldm" environment which sets PATH and other environment variables properly.
- ## [Simplified API for text to image generation](docs/features/OTHER.md#simplified-api)
8. Run the command:
```
python scripts\preload_models.py
```
This installs two machine learning models that stable diffusion requires.
## **Other Features**
9. Now you need to install the weights for the big stable diffusion model.
- ### [Creating Transparent Regions for Inpainting](docs/features/INPAINTING.md#creating-transparent-regions-for-inpainting)
For testing prior to the release of the real weights, create a directory within stable-diffusion named "models\ldm\text2img.large".
- ### [Preload Models](docs/features/OTHER.md#preload-models)
For testing with the released weights, create a directory within stable-diffusion named "models\ldm\stable-diffusion-v1".
# Latest Changes
Then use a web browser to copy model.ckpt into the appropriate directory. For the text2img.large (pre-release) model, the weights are at https://ommer-lab.com/files/latent-diffusion/nitro/txt2img-f8-large/model.ckpt. Check back here later for the release URL.
- v1.14 (11 September 2022)
10. Start generating images!
```
# for the pre-release weights
python scripts\dream.py -l
- Memory optimizations for small-RAM cards. 512x512 now possible on 4 GB GPUs.
- Full support for Apple hardware with M1 or M2 chips.
- Add "seamless mode" for circular tiling of image. Generates beautiful effects. ([prixt](https://github.com/prixt)).
- Inpainting support.
- Improved web server GUI.
- Lots of code and documentation cleanups.
# for the post-release weights
python scripts\dream.py
```
11. Subsequently, to relaunch the script, first activate the Anaconda command window (step 4), run "conda activate ldm" (step 7b), and then launch the dream script (step 10).
- v1.13 (3 September 2022
### Updating to newer versions of the script
- Support image variations (see [VARIATIONS](docs/features/VARIATIONS.md) ([Kevin Gibbons](https://github.com/bakkot) and many contributors and reviewers)
- Supports a Google Colab notebook for a standalone server running on Google hardware [Arturo Mendivil](https://github.com/artmen1516)
- WebUI supports GFPGAN/ESRGAN facial reconstruction and upscaling [Kevin Gibbons](https://github.com/bakkot)
- WebUI supports incremental display of in-progress images during generation [Kevin Gibbons](https://github.com/bakkot)
- A new configuration file scheme that allows new models (including upcoming stable-diffusion-v1.5)
to be added without altering the code. ([David Wager](https://github.com/maddavid12))
- Can specify --grid on dream.py command line as the default.
- Miscellaneous internal bug and stability fixes.
- Works on M1 Apple hardware.
- Multiple bug fixes.
This distribution is changing rapidly. If you used the "git clone" method (step 5) to download the stable-diffusion directory, then to update to the latest and greatest version, launch the Anaconda window, enter "stable-diffusion", and type:
```
git pull
```
This will bring your local copy into sync with the remote one.
For older changelogs, please visit **[CHANGELOGS](docs/CHANGELOG.md)**.
## Simplified API for text to image generation
# Troubleshooting
For programmers who wish to incorporate stable-diffusion into other
products, this repository includes a simplified API for text to image generation, which
lets you create images from a prompt in just three lines of code:
Please check out our **[Q&A](docs/help/TROUBLESHOOT.md)** to get solutions for common installation problems and other issues.
~~~~
from ldm.simplet2i import T2I
model = T2I()
outputs = model.text2image("a unicorn in manhattan")
~~~~
# Contributing
Outputs is a list of lists in the format [[filename1,seed1],[filename2,seed2]...]
Please see ldm/simplet2i.py for more information.
Anyone who wishes to contribute to this project, whether documentation, features, bug fixes, code cleanup, testing, or code reviews, is very much encouraged to do so. If you are unfamiliar with
how to contribute to GitHub projects, here is a [Getting Started Guide](https://opensource.com/article/19/7/create-pull-request-github).
A full set of contribution guidelines, along with templates, are in progress, but for now the most important thing is to **make your pull request against the "development" branch**, and not against "main". This will help keep public breakage to a minimum and will allow you to propose more radical changes.
## Workaround for machines with limited internet connectivity
## **Contributors**
My development machine is a GPU node in a high-performance compute
cluster which has no connection to the internet. During model
initialization, stable-diffusion tries to download the Bert tokenizer
and a file needed by the kornia library. This obviously didn't work
for me.
This fork is a combined effort of various people from across the world. [Check out the list of all these amazing people](docs/CONTRIBUTORS.md). We thank them for their time, hard work and effort.
To work around this, I have modified ldm/modules/encoders/modules.py
to look for locally cached Bert files rather than attempting to
download them. For this to work, you must run
"scripts/preload_models.py" once from an internet-connected machine
prior to running the code on an isolated one. This assumes that both
machines share a common network-mounted filesystem with a common
.cache directory.
~~~~
(ldm) ~/stable-diffusion$ python3 ./scripts/preload_models.py
preloading bert tokenizer...
Downloading: 100%|██████████████████████████████████| 28.0/28.0 [00:00<00:00, 49.3kB/s]
Downloading: 100%|██████████████████████████████████| 226k/226k [00:00<00:00, 2.79MB/s]
Downloading: 100%|██████████████████████████████████| 455k/455k [00:00<00:00, 4.36MB/s]
Downloading: 100%|██████████████████████████████████| 570/570 [00:00<00:00, 477kB/s]
...success
preloading kornia requirements...
Downloading: "https://github.com/DagnyT/hardnet/raw/master/pretrained/train_liberty_with_aug/checkpoint_liberty_with_aug.pth" to /u/lstein/.cache/torch/hub/checkpoints/checkpoint_liberty_with_aug.pth
100%|███████████████████████████████████████████████| 5.10M/5.10M [00:00<00:00, 101MB/s]
...success
~~~~
If you don't need this change and want to download the files just in
time, copy over the file ldm/modules/encoders/modules.py from the
CompVis/stable-diffusion repository. Or you can run preload_models.py
on the target machine.
## Support
# Support
For support,
please use this repository's GitHub Issues tracking service. Feel free
to send me an email if you use and like the script.
*Author:* Lincoln D. Stein <lincoln.stein@gmail.com>
# Original README from CompViz/stable-diffusion
*Stable Diffusion was made possible thanks to a collaboration with [Stability AI](https://stability.ai/) and [Runway](https://runwayml.com/) and builds upon our previous work:*
[**High-Resolution Image Synthesis with Latent Diffusion Models**](https://ommer-lab.com/research/latent-diffusion-models/)<br/>
[Robin Rombach](https://github.com/rromb)\*,
[Andreas Blattmann](https://github.com/ablattmann)\*,
[Dominik Lorenz](https://github.com/qp-qp)\,
[Patrick Esser](https://github.com/pesser),
[Björn Ommer](https://hci.iwr.uni-heidelberg.de/Staff/bommer)<br/>
**CVPR '22 Oral**
which is available on [GitHub](https://github.com/CompVis/latent-diffusion). PDF at [arXiv](https://arxiv.org/abs/2112.10752). Please also visit our [Project page](https://ommer-lab.com/research/latent-diffusion-models/).
![txt2img-stable2](assets/stable-samples/txt2img/merged-0006.png)
[Stable Diffusion](#stable-diffusion-v1) is a latent text-to-image diffusion
model.
Thanks to a generous compute donation from [Stability AI](https://stability.ai/) and support from [LAION](https://laion.ai/), we were able to train a Latent Diffusion Model on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database.
Similar to Google's [Imagen](https://arxiv.org/abs/2205.11487),
this model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts.
With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM.
See [this section](#stable-diffusion-v1) below and the [model card](https://huggingface.co/CompVis/stable-diffusion).
## Requirements
A suitable [conda](https://conda.io/) environment named `ldm` can be created
and activated with:
```
conda env create -f environment.yaml
conda activate ldm
```
You can also update an existing [latent diffusion](https://github.com/CompVis/latent-diffusion) environment by running
```
conda install pytorch torchvision -c pytorch
pip install transformers==4.19.2
pip install -e .
```
## Stable Diffusion v1
Stable Diffusion v1 refers to a specific configuration of the model
architecture that uses a downsampling-factor 8 autoencoder with an 860M UNet
and CLIP ViT-L/14 text encoder for the diffusion model. The model was pretrained on 256x256 images and
then finetuned on 512x512 images.
*Note: Stable Diffusion v1 is a general text-to-image diffusion model and therefore mirrors biases and (mis-)conceptions that are present
in its training data.
Details on the training procedure and data, as well as the intended use of the model can be found in the corresponding [model card](https://huggingface.co/CompVis/stable-diffusion).
Research into the safe deployment of general text-to-image models is an ongoing effort. To prevent misuse and harm, we currently provide access to the checkpoints only for [academic research purposes upon request](https://stability.ai/academia-access-form).
**This is an experiment in safe and community-driven publication of a capable and general text-to-image model. We are working on a public release with a more permissive license that also incorporates ethical considerations.***
[Request access to Stable Diffusion v1 checkpoints for academic research](https://stability.ai/academia-access-form)
### Weights
We currently provide three checkpoints, `sd-v1-1.ckpt`, `sd-v1-2.ckpt` and `sd-v1-3.ckpt`,
which were trained as follows,
- `sd-v1-1.ckpt`: 237k steps at resolution `256x256` on [laion2B-en](https://huggingface.co/datasets/laion/laion2B-en).
194k steps at resolution `512x512` on [laion-high-resolution](https://huggingface.co/datasets/laion/laion-high-resolution) (170M examples from LAION-5B with resolution `>= 1024x1024`).
- `sd-v1-2.ckpt`: Resumed from `sd-v1-1.ckpt`.
515k steps at resolution `512x512` on "laion-improved-aesthetics" (a subset of laion2B-en,
filtered to images with an original size `>= 512x512`, estimated aesthetics score `> 5.0`, and an estimated watermark probability `< 0.5`. The watermark estimate is from the LAION-5B metadata, the aesthetics score is estimated using an [improved aesthetics estimator](https://github.com/christophschuhmann/improved-aesthetic-predictor)).
- `sd-v1-3.ckpt`: Resumed from `sd-v1-2.ckpt`. 195k steps at resolution `512x512` on "laion-improved-aesthetics" and 10\% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).
Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0,
5.0, 6.0, 7.0, 8.0) and 50 PLMS sampling
steps show the relative improvements of the checkpoints:
![sd evaluation results](assets/v1-variants-scores.jpg)
### Text-to-Image with Stable Diffusion
![txt2img-stable2](assets/stable-samples/txt2img/merged-0005.png)
![txt2img-stable2](assets/stable-samples/txt2img/merged-0007.png)
Stable Diffusion is a latent diffusion model conditioned on the (non-pooled) text embeddings of a CLIP ViT-L/14 text encoder.
#### Sampling Script
After [obtaining the weights](#weights), link them
```
mkdir -p models/ldm/stable-diffusion-v1/
ln -s <path/to/model.ckpt> models/ldm/stable-diffusion-v1/model.ckpt
```
and sample with
```
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
```
By default, this uses a guidance scale of `--scale 7.5`, [Katherine Crowson's implementation](https://github.com/CompVis/latent-diffusion/pull/51) of the [PLMS](https://arxiv.org/abs/2202.09778) sampler,
and renders images of size 512x512 (which it was trained on) in 50 steps. All supported arguments are listed below (type `python scripts/txt2img.py --help`).
```commandline
usage: txt2img.py [-h] [--prompt [PROMPT]] [--outdir [OUTDIR]] [--skip_grid] [--skip_save] [--ddim_steps DDIM_STEPS] [--plms] [--laion400m] [--fixed_code] [--ddim_eta DDIM_ETA] [--n_iter N_ITER] [--H H] [--W W] [--C C] [--f F] [--n_samples N_SAMPLES] [--n_rows N_ROWS]
[--scale SCALE] [--from-file FROM_FILE] [--config CONFIG] [--ckpt CKPT] [--seed SEED] [--precision {full,autocast}]
optional arguments:
-h, --help show this help message and exit
--prompt [PROMPT] the prompt to render
--outdir [OUTDIR] dir to write results to
--skip_grid do not save a grid, only individual samples. Helpful when evaluating lots of samples
--skip_save do not save individual samples. For speed measurements.
--ddim_steps DDIM_STEPS
number of ddim sampling steps
--plms use plms sampling
--laion400m uses the LAION400M model
--fixed_code if enabled, uses the same starting code across samples
--ddim_eta DDIM_ETA ddim eta (eta=0.0 corresponds to deterministic sampling
--n_iter N_ITER sample this often
--H H image height, in pixel space
--W W image width, in pixel space
--C C latent channels
--f F downsampling factor
--n_samples N_SAMPLES
how many samples to produce for each given prompt. A.k.a. batch size
--n_rows N_ROWS rows in the grid (default: n_samples)
--scale SCALE unconditional guidance scale: eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty))
--from-file FROM_FILE
if specified, load prompts from this file
--config CONFIG path to config which constructs model
--ckpt CKPT path to checkpoint of model
--seed SEED the seed (for reproducible sampling)
--precision {full,autocast}
evaluate at this precision
```
Note: The inference config for all v1 versions is designed to be used with EMA-only checkpoints.
For this reason `use_ema=False` is set in the configuration, otherwise the code will try to switch from
non-EMA to EMA weights. If you want to examine the effect of EMA vs no EMA, we provide "full" checkpoints
which contain both types of weights. For these, `use_ema=False` will load and use the non-EMA weights.
#### Diffusers Integration
Another way to download and sample Stable Diffusion is by using the [diffusers library](https://github.com/huggingface/diffusers/tree/main#new--stable-diffusion-is-now-fully-compatible-with-diffusers)
```py
# make sure you're logged in with `huggingface-cli login`
from torch import autocast
from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler
pipe = StableDiffusionPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-3-diffusers",
use_auth_token=True
)
prompt = "a photo of an astronaut riding a horse on mars"
with autocast("cuda"):
image = pipe(prompt)["sample"][0]
image.save("astronaut_rides_horse.png")
```
### Image Modification with Stable Diffusion
By using a diffusion-denoising mechanism as first proposed by [SDEdit](https://arxiv.org/abs/2108.01073), the model can be used for different
tasks such as text-guided image-to-image translation and upscaling. Similar to the txt2img sampling script,
we provide a script to perform image modification with Stable Diffusion.
The following describes an example where a rough sketch made in [Pinta](https://www.pinta-project.com/) is converted into a detailed artwork.
```
python scripts/img2img.py --prompt "A fantasy landscape, trending on artstation" --init-img <path-to-img.jpg> --strength 0.8
```
Here, strength is a value between 0.0 and 1.0, that controls the amount of noise that is added to the input image.
Values that approach 1.0 allow for lots of variations but will also produce images that are not semantically consistent with the input. See the following example.
**Input**
![sketch-in](assets/stable-samples/img2img/sketch-mountains-input.jpg)
**Outputs**
![out3](assets/stable-samples/img2img/mountains-3.png)
![out2](assets/stable-samples/img2img/mountains-2.png)
This procedure can, for example, also be used to upscale samples from the base model.
## Comments
- Our codebase for the diffusion models builds heavily on [OpenAI's ADM codebase](https://github.com/openai/guided-diffusion)
and [https://github.com/lucidrains/denoising-diffusion-pytorch](https://github.com/lucidrains/denoising-diffusion-pytorch).
Thanks for open-sourcing!
- The implementation of the transformer encoder is from [x-transformers](https://github.com/lucidrains/x-transformers) by [lucidrains](https://github.com/lucidrains?tab=repositories).
## BibTeX
```
@misc{rombach2021highresolution,
title={High-Resolution Image Synthesis with Latent Diffusion Models},
author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
year={2021},
eprint={2112.10752},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
Original portions of the software are Copyright (c) 2020 Lincoln D. Stein (https://github.com/lstein)
# Further Reading
Please see the original README for more information on this software
and underlying algorithm, located in the file [README-CompViz.md](docs/README-CompViz.md).

18
configs/models.yaml Normal file
View File

@@ -0,0 +1,18 @@
# This file describes the alternative machine learning models
# available to the dream script.
#
# To add a new model, follow the examples below. Each
# model requires a model config file, a weights file,
# and the width and height of the images it
# was trained on.
laion400m:
config: configs/latent-diffusion/txt2img-1p4B-eval.yaml
weights: models/ldm/text2img-large/model.ckpt
width: 256
height: 256
stable-diffusion-1.4:
config: configs/stable-diffusion/v1-inference.yaml
weights: models/ldm/stable-diffusion-v1/model.ckpt
width: 512
height: 512

View File

@@ -0,0 +1,110 @@
model:
base_learning_rate: 5.0e-03
target: ldm.models.diffusion.ddpm.LatentDiffusion
params:
linear_start: 0.00085
linear_end: 0.0120
num_timesteps_cond: 1
log_every_t: 200
timesteps: 1000
first_stage_key: image
cond_stage_key: caption
image_size: 64
channels: 4
cond_stage_trainable: true # Note: different from the one we trained before
conditioning_key: crossattn
monitor: val/loss_simple_ema
scale_factor: 0.18215
use_ema: False
embedding_reg_weight: 0.0
personalization_config:
target: ldm.modules.embedding_manager.EmbeddingManager
params:
placeholder_strings: ["*"]
initializer_words: ["sculpture"]
per_image_tokens: false
num_vectors_per_token: 1
progressive_words: False
unet_config:
target: ldm.modules.diffusionmodules.openaimodel.UNetModel
params:
image_size: 32 # unused
in_channels: 4
out_channels: 4
model_channels: 320
attention_resolutions: [ 4, 2, 1 ]
num_res_blocks: 2
channel_mult: [ 1, 2, 4, 4 ]
num_heads: 8
use_spatial_transformer: True
transformer_depth: 1
context_dim: 768
use_checkpoint: True
legacy: False
first_stage_config:
target: ldm.models.autoencoder.AutoencoderKL
params:
embed_dim: 4
monitor: val/rec_loss
ddconfig:
double_z: true
z_channels: 4
resolution: 256
in_channels: 3
out_ch: 3
ch: 128
ch_mult:
- 1
- 2
- 4
- 4
num_res_blocks: 2
attn_resolutions: []
dropout: 0.0
lossconfig:
target: torch.nn.Identity
cond_stage_config:
target: ldm.modules.encoders.modules.FrozenCLIPEmbedder
data:
target: main.DataModuleFromConfig
params:
batch_size: 1
num_workers: 2
wrap: false
train:
target: ldm.data.personalized.PersonalizedBase
params:
size: 512
set: train
per_image_tokens: false
repeats: 100
validation:
target: ldm.data.personalized.PersonalizedBase
params:
size: 512
set: val
per_image_tokens: false
repeats: 10
lightning:
modelcheckpoint:
params:
every_n_train_steps: 500
callbacks:
image_logger:
target: main.ImageLogger
params:
batch_frequency: 500
max_images: 8
increase_log_steps: False
trainer:
benchmark: True
max_steps: 4000000
# max_steps: 4000

View File

@@ -0,0 +1,103 @@
model:
base_learning_rate: 5.0e-03
target: ldm.models.diffusion.ddpm.LatentDiffusion
params:
linear_start: 0.00085
linear_end: 0.0120
num_timesteps_cond: 1
log_every_t: 200
timesteps: 1000
first_stage_key: image
cond_stage_key: caption
image_size: 64
channels: 4
cond_stage_trainable: true # Note: different from the one we trained before
conditioning_key: crossattn
monitor: val/loss_simple_ema
scale_factor: 0.18215
use_ema: False
embedding_reg_weight: 0.0
personalization_config:
target: ldm.modules.embedding_manager.EmbeddingManager
params:
placeholder_strings: ["*"]
initializer_words: ["painting"]
per_image_tokens: false
num_vectors_per_token: 1
unet_config:
target: ldm.modules.diffusionmodules.openaimodel.UNetModel
params:
image_size: 32 # unused
in_channels: 4
out_channels: 4
model_channels: 320
attention_resolutions: [ 4, 2, 1 ]
num_res_blocks: 2
channel_mult: [ 1, 2, 4, 4 ]
num_heads: 8
use_spatial_transformer: True
transformer_depth: 1
context_dim: 768
use_checkpoint: True
legacy: False
first_stage_config:
target: ldm.models.autoencoder.AutoencoderKL
params:
embed_dim: 4
monitor: val/rec_loss
ddconfig:
double_z: true
z_channels: 4
resolution: 256
in_channels: 3
out_ch: 3
ch: 128
ch_mult:
- 1
- 2
- 4
- 4
num_res_blocks: 2
attn_resolutions: []
dropout: 0.0
lossconfig:
target: torch.nn.Identity
cond_stage_config:
target: ldm.modules.encoders.modules.FrozenCLIPEmbedder
data:
target: main.DataModuleFromConfig
params:
batch_size: 2
num_workers: 16
wrap: false
train:
target: ldm.data.personalized_style.PersonalizedBase
params:
size: 512
set: train
per_image_tokens: false
repeats: 100
validation:
target: ldm.data.personalized_style.PersonalizedBase
params:
size: 512
set: val
per_image_tokens: false
repeats: 10
lightning:
callbacks:
image_logger:
target: main.ImageLogger
params:
batch_frequency: 500
max_images: 8
increase_log_steps: False
trainer:
benchmark: True

View File

@@ -26,6 +26,15 @@ model:
f_max: [ 1. ]
f_min: [ 1. ]
personalization_config:
target: ldm.modules.embedding_manager.EmbeddingManager
params:
placeholder_strings: ["*"]
initializer_words: ["sculpture"]
per_image_tokens: false
num_vectors_per_token: 1
progressive_words: False
unet_config:
target: ldm.modules.diffusionmodules.openaimodel.UNetModel
params:

137
docs/CHANGELOG.md Normal file
View File

@@ -0,0 +1,137 @@
# **Changelog**
## v1.13 (in process)
- Supports a Google Colab notebook for a standalone server running on Google hardware [Arturo Mendivil](https://github.com/artmen1516)
- WebUI supports GFPGAN/ESRGAN facial reconstruction and upscaling [Kevin Gibbons](https://github.com/bakkot)
- WebUI supports incremental display of in-progress images during generation [Kevin Gibbons](https://github.com/bakkot)
- Output directory can be specified on the dream> command line.
- The grid was displaying duplicated images when not enough images to fill the final row [Muhammad Usama](https://github.com/SMUsamaShah)
- Can specify --grid on dream.py command line as the default.
- Miscellaneous internal bug and stability fixes.
---
## v1.12 (28 August 2022)
- Improved file handling, including ability to read prompts from standard input.
(kudos to [Yunsaki](https://github.com/yunsaki)
- The web server is now integrated with the dream.py script. Invoke by adding --web to
the dream.py command arguments.
- Face restoration and upscaling via GFPGAN and Real-ESGAN are now automatically
enabled if the GFPGAN directory is located as a sibling to Stable Diffusion.
VRAM requirements are modestly reduced. Thanks to both [Blessedcoolant](https://github.com/blessedcoolant) and
[Oceanswave](https://github.com/oceanswave) for their work on this.
- You can now swap samplers on the dream> command line. [Blessedcoolant](https://github.com/blessedcoolant)
---
## v1.11 (26 August 2022)
- NEW FEATURE: Support upscaling and face enhancement using the GFPGAN module. (kudos to [Oceanswave](https://github.com/Oceanswave)
- You now can specify a seed of -1 to use the previous image's seed, -2 to use the seed for the image generated before that, etc.
Seed memory only extends back to the previous command, but will work on all images generated with the -n# switch.
- Variant generation support temporarily disabled pending more general solution.
- Created a feature branch named **yunsaki-morphing-dream** which adds experimental support for
iteratively modifying the prompt and its parameters. Please see[ Pull Request #86](https://github.com/lstein/stable-diffusion/pull/86)
for a synopsis of how this works. Note that when this feature is eventually added to the main branch, it will may be modified
significantly.
---
## v1.10 (25 August 2022)
- A barebones but fully functional interactive web server for online generation of txt2img and img2img.
---
## v1.09 (24 August 2022)
- A new -v option allows you to generate multiple variants of an initial image
in img2img mode. (kudos to [Oceanswave](https://github.com/Oceanswave). [
See this discussion in the PR for examples and details on use](https://github.com/lstein/stable-diffusion/pull/71#issuecomment-1226700810))
- Added ability to personalize text to image generation (kudos to [Oceanswave](https://github.com/Oceanswave) and [nicolai256](https://github.com/nicolai256))
- Enabled all of the samplers from k_diffusion
---
## v1.08 (24 August 2022)
- Escape single quotes on the dream> command before trying to parse. This avoids
parse errors.
- Removed instruction to get Python3.8 as first step in Windows install.
Anaconda3 does it for you.
- Added bounds checks for numeric arguments that could cause crashes.
- Cleaned up the copyright and license agreement files.
---
## v1.07 (23 August 2022)
- Image filenames will now never fill gaps in the sequence, but will be assigned the
next higher name in the chosen directory. This ensures that the alphabetic and chronological
sort orders are the same.
---
## v1.06 (23 August 2022)
- Added weighted prompt support contributed by [xraxra](https://github.com/xraxra)
- Example of using weighted prompts to tweak a demonic figure contributed by [bmaltais](https://github.com/bmaltais)
---
## v1.05 (22 August 2022 - after the drop)
- Filenames now use the following formats:
000010.95183149.png -- Two files produced by the same command (e.g. -n2),
000010.26742632.png -- distinguished by a different seed.
000011.455191342.01.png -- Two files produced by the same command using
000011.455191342.02.png -- a batch size>1 (e.g. -b2). They have the same seed.
000011.4160627868.grid#1-4.png -- a grid of four images (-g); the whole grid can
be regenerated with the indicated key
- It should no longer be possible for one image to overwrite another
- You can use the "cd" and "pwd" commands at the dream> prompt to set and retrieve
the path of the output directory.
---
## v1.04 (22 August 2022 - after the drop)
- Updated README to reflect installation of the released weights.
- Suppressed very noisy and inconsequential warning when loading the frozen CLIP
tokenizer.
---
## v1.03 (22 August 2022)
- The original txt2img and img2img scripts from the CompViz repository have been moved into
a subfolder named "orig_scripts", to reduce confusion.
---
## v1.02 (21 August 2022)
- A copy of the prompt and all of its switches and options is now stored in the corresponding
image in a tEXt metadata field named "Dream". You can read the prompt using scripts/images2prompt.py,
or an image editor that allows you to explore the full metadata.
**Please run "conda env update -f environment.yaml" to load the k_lms dependencies!!**
---
## v1.01 (21 August 2022)
- added k_lms sampling.
**Please run "conda env update -f environment.yaml" to load the k_lms dependencies!!**
- use half precision arithmetic by default, resulting in faster execution and lower memory requirements
Pass argument --full_precision to dream.py to get slower but more accurate image generation
---
## Links
- **[Read Me](../readme.md)**

61
docs/CONTRIBUTORS.md Normal file
View File

@@ -0,0 +1,61 @@
# Contributors
The list of all the amazing people who have contributed to the various features that you get to experience in this fork.
We thank them for all of their time and hard work.
_Original Author:_
- Lincoln D. Stein <lincoln.stein@gmail.com>
_Contributions by:_
- [Sean McLellan](https://github.com/Oceanswave)
- [Kevin Gibbons](https://github.com/bakkot)
- [Tesseract Cat](https://github.com/TesseractCat)
- [blessedcoolant](https://github.com/blessedcoolant)
- [David Ford](https://github.com/david-ford)
- [yunsaki](https://github.com/yunsaki)
- [James Reynolds](https://github.com/magnusviri)
- [David Wager](https://github.com/maddavid123)
- [Jason Toffaletti](https://github.com/toffaletti)
- [tildebyte](https://github.com/tildebyte)
- [Cragin Godley](https://github.com/cgodley)
- [BlueAmulet](https://github.com/BlueAmulet)
- [Benjamin Warner](https://github.com/warner-benjamin)
- [Cora Johnson-Roberson](https://github.com/corajr)
- [veprogames](https://github.com/veprogames)
- [JigenD](https://github.com/JigenD)
- [Niek van der Maas](https://github.com/Niek)
- [Henry van Megen](https://github.com/hvanmegen)
- [Håvard Gulldahl](https://github.com/havardgulldahl)
- [greentext2](https://github.com/greentext2)
- [Simon Vans-Colina](https://github.com/simonvc)
- [Gabriel Rotbart](https://github.com/gabrielrotbart)
- [Eric Khun](https://github.com/erickhun)
- [Brent Ozar](https://github.com/BrentOzar)
- [nderscore](https://github.com/nderscore)
- [Mikhail Tishin](https://github.com/tishin)
- [Tom Elovi Spruce](https://github.com/ilovecomputers)
- [spezialspezial](https://github.com/spezialspezial)
- [Yosuke Shinya](https://github.com/shinya7y)
- [Andy Pilate](https://github.com/Cubox)
- [Muhammad Usama](https://github.com/SMUsamaShah)
- [Arturo Mendivil](https://github.com/artmen1516)
- [Paul Sajna](https://github.com/sajattack)
- [Samuel Husso](https://github.com/shusso)
- [nicolai256](https://github.com/nicolai256)
_Original CompVis Authors:_
- [Robin Rombach](https://github.com/rromb)
- [Patrick von Platen](https://github.com/patrickvonplaten)
- [ablattmann](https://github.com/ablattmann)
- [Patrick Esser](https://github.com/pesser)
- [owenvincent](https://github.com/owenvincent)
- [apolinario](https://github.com/apolinario)
- [Charles Packer](https://github.com/cpacker)
---
_If you have contributed and don't see your name on the list of contributors, please let one of the collaborators know about the omission, or feel free to make a pull request._

208
docs/README-CompViz.md Normal file
View File

@@ -0,0 +1,208 @@
# Original README from CompViz/stable-diffusion
_Stable Diffusion was made possible thanks to a collaboration with [Stability AI](https://stability.ai/) and [Runway](https://runwayml.com/) and builds upon our previous work:_
[**High-Resolution Image Synthesis with Latent Diffusion Models**](https://ommer-lab.com/research/latent-diffusion-models/)<br/>
[Robin Rombach](https://github.com/rromb)\*,
[Andreas Blattmann](https://github.com/ablattmann)\*,
[Dominik Lorenz](https://github.com/qp-qp)\,
[Patrick Esser](https://github.com/pesser),
[Björn Ommer](https://hci.iwr.uni-heidelberg.de/Staff/bommer)<br/>
**CVPR '22 Oral**
which is available on [GitHub](https://github.com/CompVis/latent-diffusion). PDF at [arXiv](https://arxiv.org/abs/2112.10752). Please also visit our [Project page](https://ommer-lab.com/research/latent-diffusion-models/).
![txt2img-stable2](../assets/stable-samples/txt2img/merged-0006.png)
[Stable Diffusion](#stable-diffusion-v1) is a latent text-to-image diffusion
model.
Thanks to a generous compute donation from [Stability AI](https://stability.ai/) and support from [LAION](https://laion.ai/), we were able to train a Latent Diffusion Model on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database.
Similar to Google's [Imagen](https://arxiv.org/abs/2205.11487),
this model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts.
With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM.
See [this section](#stable-diffusion-v1) below and the [model card](https://huggingface.co/CompVis/stable-diffusion).
## Requirements
A suitable [conda](https://conda.io/) environment named `ldm` can be created
and activated with:
```
conda env create -f environment.yaml
conda activate ldm
```
You can also update an existing [latent diffusion](https://github.com/CompVis/latent-diffusion) environment by running
```
conda install pytorch torchvision -c pytorch
pip install transformers==4.19.2
pip install -e .
```
## Stable Diffusion v1
Stable Diffusion v1 refers to a specific configuration of the model
architecture that uses a downsampling-factor 8 autoencoder with an 860M UNet
and CLIP ViT-L/14 text encoder for the diffusion model. The model was pretrained on 256x256 images and
then finetuned on 512x512 images.
\*Note: Stable Diffusion v1 is a general text-to-image diffusion model and therefore mirrors biases and (mis-)conceptions that are present
in its training data.
Details on the training procedure and data, as well as the intended use of the model can be found in the corresponding [model card](https://huggingface.co/CompVis/stable-diffusion).
Research into the safe deployment of general text-to-image models is an ongoing effort. To prevent misuse and harm, we currently provide access to the checkpoints only for [academic research purposes upon request](https://stability.ai/academia-access-form).
**This is an experiment in safe and community-driven publication of a capable and general text-to-image model. We are working on a public release with a more permissive license that also incorporates ethical considerations.\***
[Request access to Stable Diffusion v1 checkpoints for academic research](https://stability.ai/academia-access-form)
### Weights
We currently provide three checkpoints, `sd-v1-1.ckpt`, `sd-v1-2.ckpt` and `sd-v1-3.ckpt`,
which were trained as follows,
- `sd-v1-1.ckpt`: 237k steps at resolution `256x256` on [laion2B-en](https://huggingface.co/datasets/laion/laion2B-en).
194k steps at resolution `512x512` on [laion-high-resolution](https://huggingface.co/datasets/laion/laion-high-resolution) (170M examples from LAION-5B with resolution `>= 1024x1024`).
- `sd-v1-2.ckpt`: Resumed from `sd-v1-1.ckpt`.
515k steps at resolution `512x512` on "laion-improved-aesthetics" (a subset of laion2B-en,
filtered to images with an original size `>= 512x512`, estimated aesthetics score `> 5.0`, and an estimated watermark probability `< 0.5`. The watermark estimate is from the LAION-5B metadata, the aesthetics score is estimated using an [improved aesthetics estimator](https://github.com/christophschuhmann/improved-aesthetic-predictor)).
- `sd-v1-3.ckpt`: Resumed from `sd-v1-2.ckpt`. 195k steps at resolution `512x512` on "laion-improved-aesthetics" and 10\% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).
Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0,
5.0, 6.0, 7.0, 8.0) and 50 PLMS sampling
steps show the relative improvements of the checkpoints:
![sd evaluation results](../assets/v1-variants-scores.jpg)
### Text-to-Image with Stable Diffusion
![txt2img-stable2](../assets/stable-samples/txt2img/merged-0005.png)
![txt2img-stable2](../assets/stable-samples/txt2img/merged-0007.png)
Stable Diffusion is a latent diffusion model conditioned on the (non-pooled) text embeddings of a CLIP ViT-L/14 text encoder.
#### Sampling Script
After [obtaining the weights](#weights), link them
```
mkdir -p models/ldm/stable-diffusion-v1/
ln -s <path/to/model.ckpt> models/ldm/stable-diffusion-v1/model.ckpt
```
and sample with
```
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
```
By default, this uses a guidance scale of `--scale 7.5`, [Katherine Crowson's implementation](https://github.com/CompVis/latent-diffusion/pull/51) of the [PLMS](https://arxiv.org/abs/2202.09778) sampler,
and renders images of size 512x512 (which it was trained on) in 50 steps. All supported arguments are listed below (type `python scripts/txt2img.py --help`).
```commandline
usage: txt2img.py [-h] [--prompt [PROMPT]] [--outdir [OUTDIR]] [--skip_grid] [--skip_save] [--ddim_steps DDIM_STEPS] [--plms] [--laion400m] [--fixed_code] [--ddim_eta DDIM_ETA] [--n_iter N_ITER] [--H H] [--W W] [--C C] [--f F] [--n_samples N_SAMPLES] [--n_rows N_ROWS]
[--scale SCALE] [--from-file FROM_FILE] [--config CONFIG] [--ckpt CKPT] [--seed SEED] [--precision {full,autocast}]
optional arguments:
-h, --help show this help message and exit
--prompt [PROMPT] the prompt to render
--outdir [OUTDIR] dir to write results to
--skip_grid do not save a grid, only individual samples. Helpful when evaluating lots of samples
--skip_save do not save individual samples. For speed measurements.
--ddim_steps DDIM_STEPS
number of ddim sampling steps
--plms use plms sampling
--laion400m uses the LAION400M model
--fixed_code if enabled, uses the same starting code across samples
--ddim_eta DDIM_ETA ddim eta (eta=0.0 corresponds to deterministic sampling
--n_iter N_ITER sample this often
--H H image height, in pixel space
--W W image width, in pixel space
--C C latent channels
--f F downsampling factor
--n_samples N_SAMPLES
how many samples to produce for each given prompt. A.k.a. batch size
(note that the seeds for each image in the batch will be unavailable)
--n_rows N_ROWS rows in the grid (default: n_samples)
--scale SCALE unconditional guidance scale: eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty))
--from-file FROM_FILE
if specified, load prompts from this file
--config CONFIG path to config which constructs model
--ckpt CKPT path to checkpoint of model
--seed SEED the seed (for reproducible sampling)
--precision {full,autocast}
evaluate at this precision
```
Note: The inference config for all v1 versions is designed to be used with EMA-only checkpoints.
For this reason `use_ema=False` is set in the configuration, otherwise the code will try to switch from
non-EMA to EMA weights. If you want to examine the effect of EMA vs no EMA, we provide "full" checkpoints
which contain both types of weights. For these, `use_ema=False` will load and use the non-EMA weights.
#### Diffusers Integration
Another way to download and sample Stable Diffusion is by using the [diffusers library](https://github.com/huggingface/diffusers/tree/main#new--stable-diffusion-is-now-fully-compatible-with-diffusers)
```py
# make sure you're logged in with `huggingface-cli login`
from torch import autocast
from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler
pipe = StableDiffusionPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-3-diffusers",
use_auth_token=True
)
prompt = "a photo of an astronaut riding a horse on mars"
with autocast("cuda"):
image = pipe(prompt)["sample"][0]
image.save("astronaut_rides_horse.png")
```
### Image Modification with Stable Diffusion
By using a diffusion-denoising mechanism as first proposed by [SDEdit](https://arxiv.org/abs/2108.01073), the model can be used for different
tasks such as text-guided image-to-image translation and upscaling. Similar to the txt2img sampling script,
we provide a script to perform image modification with Stable Diffusion.
The following describes an example where a rough sketch made in [Pinta](https://www.pinta-project.com/) is converted into a detailed artwork.
```
python scripts/img2img.py --prompt "A fantasy landscape, trending on artstation" --init-img <path-to-img.jpg> --strength 0.8
```
Here, strength is a value between 0.0 and 1.0, that controls the amount of noise that is added to the input image.
Values that approach 1.0 allow for lots of variations but will also produce images that are not semantically consistent with the input. See the following example.
**Input**
![sketch-in](../assets/stable-samples/img2img/sketch-mountains-input.jpg)
**Outputs**
![out3](../assets/stable-samples/img2img/mountains-3.png)
![out2](../assets/stable-samples/img2img/mountains-2.png)
This procedure can, for example, also be used to upscale samples from the base model.
## Comments
- Our codebase for the diffusion models builds heavily on [OpenAI's ADM codebase](https://github.com/openai/guided-diffusion)
and [https://github.com/lucidrains/denoising-diffusion-pytorch](https://github.com/lucidrains/denoising-diffusion-pytorch).
Thanks for open-sourcing!
- The implementation of the transformer encoder is from [x-transformers](https://github.com/lucidrains/x-transformers) by [lucidrains](https://github.com/lucidrains?tab=repositories).
## BibTeX
```
@misc{rombach2021highresolution,
title={High-Resolution Image Synthesis with Latent Diffusion Models},
author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
year={2021},
eprint={2112.10752},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```

Binary file not shown.

After

Width:  |  Height:  |  Size: 799 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 499 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 536 KiB

BIN
docs/assets/logo.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 22 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 429 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 445 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 426 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 427 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 424 KiB

228
docs/features/CLI.md Normal file
View File

@@ -0,0 +1,228 @@
# **Interactive Command-Line Interface**
The `dream.py` script, located in `scripts/dream.py`, provides an interactive interface to image generation similar to the "dream mothership" bot that Stable AI provided on its Discord server.
Unlike the txt2img.py and img2img.py scripts provided in the original CompViz/stable-diffusion source code repository, the time-consuming initialization of the AI model initialization only happens once. After that image generation
from the command-line interface is very fast.
The script uses the readline library to allow for in-line editing, command history (up and down arrows), autocompletion, and more. To help keep track of which prompts generated which images, the script writes a log file of image names and prompts to the selected output directory.
In addition, as of version 1.02, it also writes the prompt into the PNG file's metadata where it can be retrieved using scripts/images2prompt.py
The script is confirmed to work on Linux, Windows and Mac systems.
_Note:_ This script runs from the command-line or can be used as a Web application. The Web GUI is currently rudimentary, but a much better replacement is on its way.
```
(ldm) ~/stable-diffusion$ python3 ./scripts/dream.py
* Initializing, be patient...
Loading model from models/ldm/text2img-large/model.ckpt
(...more initialization messages...)
* Initialization done! Awaiting your command...
dream> ashley judd riding a camel -n2 -s150
Outputs:
outputs/img-samples/00009.png: "ashley judd riding a camel" -n2 -s150 -S 416354203
outputs/img-samples/00010.png: "ashley judd riding a camel" -n2 -s150 -S 1362479620
dream> "there's a fly in my soup" -n6 -g
outputs/img-samples/00011.png: "there's a fly in my soup" -n6 -g -S 2685670268
seeds for individual rows: [2685670268, 1216708065, 2335773498, 822223658, 714542046, 3395302430]
dream> q
# this shows how to retrieve the prompt stored in the saved image's metadata
(ldm) ~/stable-diffusion$ python ./scripts/images2prompt.py outputs/img_samples/*.png
00009.png: "ashley judd riding a camel" -s150 -S 416354203
00010.png: "ashley judd riding a camel" -s150 -S 1362479620
00011.png: "there's a fly in my soup" -n6 -g -S 2685670268
```
<p align='center'>
<img src="../assets/dream-py-demo.png"/>
</p>
The `dream>` prompt's arguments are pretty much identical to those
used in the Discord bot, except you don't need to type "!dream" (it
doesn't hurt if you do). A significant change is that creation of
individual images is now the default unless --grid (-g) is given. A
full list is given in [List of prompt arguments]
(#list-of-prompt-arguments).
# Arguments
The script itself also recognizes a series of command-line switches
that will change important global defaults, such as the directory for
image outputs and the location of the model weight files.
## List of arguments recognized at the command line:
These command-line arguments can be passed to dream.py when you first
run it from the Windows, Mac or Linux command line. Some set defaults
that can be overridden on a per-prompt basis (see [List of prompt
arguments] (#list-of-prompt-arguments). Others
| Argument | Shortcut | Default | Description |
|--------------------|------------|---------------------|--------------|
| --help | -h | | Print a concise help message. |
| --outdir <path> | -o<path> | outputs/img_samples | Location for generated images. |
| --prompt_as_dir | -p | False | Name output directories using the prompt text. |
| --from_file <path> | | None | Read list of prompts from a file. Use "-" to read from standard input |
| --model <modelname>| | stable-diffusion-1.4| Loads model specified in configs/models.yaml. Currently one of "stable-diffusion-1.4" or "laion400m"|
| --full_precision | -F | False | Run in slower full-precision mode. Needed for Macintosh M1/M2 hardware and some older video cards. |
| --web | | False | Start in web server mode |
| --host <ip addr> | | localhost | Which network interface web server should listen on. Set to 0.0.0.0 to listen on any. |
| --port <port> | | 9090 | Which port web server should listen for requests on. |
| --config <path> | | configs/models.yaml | Configuration file for models and their weights. |
| --iterations <int> | -n<int> | 1 | How many images to generate per prompt. |
| --grid | -g | False | Save all image series as a grid rather than individually. |
| --sampler <sampler>| -A<sampler>| k_lms | Sampler to use. Use -h to get list of available samplers. |
| --seamless | | False | Create interesting effects by tiling elements of the image. |
| --embedding_path <path>| | None | Path to pre-trained embedding manager checkpoints, for custom models |
| --gfpgan_dir | | src/gfpgan | Path to where GFPGAN is installed. |
| --gfpgan_model_path| | experiments/pretrained_models/GFPGANv1.3.pth| Path to GFPGAN model file, relative to --gfpgan_dir. |
| --device <device> | -d<device>| torch.cuda.current_device() | Device to run SD on, e.g. "cuda:0" |
These arguments are deprecated but still work:
| Argument | Shortcut | Default | Description |
|--------------------|------------|---------------------|--------------|
| --weights <path> | | None | Pth to weights file; use `--model stable-diffusion-1.4` instead |
| --laion400m | -l | False | Use older LAION400m weights; use `--model=laion400m` instead |
**A note on path names:** On Windows systems, you may run into
problems when passing the dream script standard backslashed path
names because the Python interpreter treats "\" as an escape.
You can either double your slashes (ick): C:\\\\path\\\\to\\\\my\\\\file, or
use Linux/Mac style forward slashes (better): C:/path/to/my/file.
## List of prompt arguments
After the dream.py script initializes, it will present you with a
**dream>** prompt. Here you can enter information to generate images
from text (txt2img), to embellish an existing image or sketch
(img2img), or to selectively alter chosen regions of the image
(inpainting).
### This is an example of txt2img:
~~~~
dream> waterfall and rainbow -W640 -H480
~~~~
This will create the requested image with the dimensions 640 (width)
and 480 (height).
Here are the dream> command that apply to txt2img:
| Argument | Shortcut | Default | Description |
|--------------------|------------|---------------------|--------------|
| "my prompt" | | | Text prompt to use. The quotation marks are optional. |
| --width <int> | -W<int> | 512 | Width of generated image |
| --height <int> | -H<int> | 512 | Height of generated image |
| --iterations <int> | -n<int> | 1 | How many images to generate from this prompt |
| --steps <int> | -s<int> | 50 | How many steps of refinement to apply |
| --cfg_scale <float>| -C<float> | 7.5 | How hard to try to match the prompt to the generated image; any number greater than 0.0 works, but the useful range is roughly 5.0 to 20.0 |
| --seed <int> | -S<int> | None | Set the random seed for the next series of images. This can be used to recreate an image generated previously.|
| --sampler <sampler>| -A<sampler>| k_lms | Sampler to use. Use -h to get list of available samplers. |
| --grid | -g | False | Turn on grid mode to return a single image combining all the images generated by this prompt |
| --individual | -i | True | Turn off grid mode (deprecated; leave off --grid instead) |
| --outdir <path> | -o<path> | outputs/img_samples | Temporarily change the location of these images |
| --seamless | | False | Activate seamless tiling for interesting effects |
| --log_tokenization | -t | False | Display a color-coded list of the parsed tokens derived from the prompt |
| --skip_normalization| -x | False | Weighted subprompts will not be normalized. See [Weighted Prompts](./OTHER.md#weighted-prompts) |
| --upscale <int> <float> | -U <int> <float> | -U 1 0.75| Upscale image by magnification factor (2, 4), and set strength of upscaling (0.0-1.0). If strength not set, will default to 0.75. |
| --gfpgan_strength <float> | -G <float> | -G0 | Fix faces using the GFPGAN algorithm; argument indicates how hard the algorithm should try (0.0-1.0) |
| --save_original | -save_orig| False | When upscaling or fixing faces, this will cause the original image to be saved rather than replaced. |
| --variation <float> |-v<float>| 0.0 | Add a bit of noise (0.0=none, 1.0=high) to the image in order to generate a series of variations. Usually used in combination with -S<seed> and -n<int> to generate a series a riffs on a starting image. See [Variations](./VARIATIONS.md). |
| --with_variations <pattern> | -V<pattern>| None | Combine two or more variations. See [Variations](./VARIATIONS.md) for now to use this. |
Note that the width and height of the image must be multiples of
64. You can provide different values, but they will be rounded down to
the nearest multiple of 64.
### This is an example of img2img:
~~~~
dream> waterfall and rainbow -I./vacation-photo.png -W640 -H480 --fit
~~~~
This will modify the indicated vacation photograph by making it more
like the prompt. Results will vary greatly depending on what is in the
image. We also ask to --fit the image into a box no bigger than
640x480. Otherwise the image size will be identical to the provided
photo and you may run out of memory if it is large.
In addition to the command-line options recognized by txt2img, img2img
accepts additional options:
| Argument | Shortcut | Default | Description |
|--------------------|------------|---------------------|--------------|
| --init_img <path> | -I<path> | None | Path to the initialization image |
| --fit | -F | False | Scale the image to fit into the specified -H and -W dimensions |
| --strength <float> | -s<float> | 0.75 | How hard to try to match the prompt to the initial image. Ranges from 0.0-0.99, with higher values replacing the initial image completely.|
### This is an example of inpainting:
~~~~
dream> waterfall and rainbow -I./vacation-photo.png -M./vacation-mask.png -W640 -H480 --fit
~~~~
This will do the same thing as img2img, but image alterations will
only occur within transparent areas defined by the mask file specified
by -M. You may also supply just a single initial image with the areas
to overpaint made transparent, but you must be careful not to destroy
the pixels underneath when you create the transparent areas. See
[Inpainting](./INPAINTING.md) for details.
inpainting accepts all the arguments used for txt2img and img2img, as
well as the --mask (-M) argument:
| Argument | Shortcut | Default | Description |
|--------------------|------------|---------------------|--------------|
| --init_mask <path> | -M<path> | None |Path to an image the same size as the initial_image, with areas for inpainting made transparent.|
# Command-line editing and completion
If you are on a Macintosh or Linux machine, the command-line offers
convenient history tracking, editing, and command completion.
- To scroll through previous commands and potentially edit/reuse them, use the up and down cursor keys.
- To edit the current command, use the left and right cursor keys to position the cursor, and then backspace, delete or insert characters.
- To move to the very beginning of the command, type CTRL-A (or command-A on the Mac)
- To move to the end of the command, type CTRL-E.
- To cut a section of the command, position the cursor where you want to start cutting and type CTRL-K.
- To paste a cut section back in, position the cursor where you want to paste, and type CTRL-Y
Windows users can get similar, but more limited, functionality if they
launch dream.py with the "winpty" program:
~~~
> winpty python scripts\dream.py
~~~
On the Mac and Linux platforms, when you exit dream.py, the last 1000
lines of your command-line history will be saved. When you restart
dream.py, you can access the saved history using the up-arrow key.
In addition, limited command-line completion is installed. In various
contexts, you can start typing your command and press tab. A list of
potential completions will be presented to you. You can then type a
little more, hit tab again, and eventually autocomplete what you want.
When specifying file paths using the one-letter shortcuts, the CLI
will attempt to complete pathnames for you. This is most handy for the
-I (init image) and -M (init mask) paths. To initiate completion, start
the path with a slash ("/") or "./". For example:
~~~
dream> zebra with a mustache -I./test-pictures<TAB>
-I./test-pictures/Lincoln-and-Parrot.png -I./test-pictures/zebra.jpg -I./test-pictures/madonna.png
-I./test-pictures/bad-sketch.png -I./test-pictures/man_with_eagle/
~~~
You can then type "z", hit tab again, and it will autofill to "zebra.jpg".
More text completion features (such as autocompleting seeds) are on their way.

30
docs/features/IMG2IMG.md Normal file
View File

@@ -0,0 +1,30 @@
# **Image-to-Image**
This script also provides an img2img feature that lets you seed your
creations with an initial drawing or photo. This is a really cool
feature that tells stable diffusion to build the prompt on top of the
image you provide, preserving the original's basic shape and
layout. To use it, provide the `--init_img` option as shown here:
```
dream> "waterfall and rainbow" --init_img=./init-images/crude_drawing.png --strength=0.5 -s100 -n4
```
The `--init_img (-I)` option gives the path to the seed
picture. `--strength (-f)` controls how much the original will be
modified, ranging from `0.0` (keep the original intact), to `1.0`
(ignore the original completely). The default is `0.75`, and ranges
from `0.25-0.75` give interesting results.
You may also pass a `-v<count>` option to generate count variants on
the original image. This is done by passing the first generated image
back into img2img the requested number of times. It generates
interesting variants.
If the initial image contains transparent regions, then Stable
Diffusion will only draw within the transparent regions, a process
called "inpainting". However, for this to work correctly, the color
information underneath the transparent needs to be preserved, not
erased. See [Creating Transparent Images For
Inpainting](./INPAINTING.md#creating-transparent-regions-for-inpainting)
for details.

View File

@@ -0,0 +1,41 @@
# **Creating Transparent Regions for Inpainting**
Inpainting is really cool. To do it, you start with an initial image
and use a photoeditor to make one or more regions transparent
(i.e. they have a "hole" in them). You then provide the path to this
image at the dream> command line using the `-I` switch. Stable
Diffusion will only paint within the transparent region.
There's a catch. In the current implementation, you have to prepare
the initial image correctly so that the underlying colors are
preserved under the transparent area. Many imaging editing
applications will by default erase the color information under the
transparent pixels and replace them with white or black, which will
lead to suboptimal inpainting. You also must take care to export the
PNG file in such a way that the color information is preserved.
If your photoeditor is erasing the underlying color information,
`dream.py` will give you a big fat warning. If you can't find a way to
coax your photoeditor to retain color values under transparent areas,
then you can combine the `-I` and `-M` switches to provide both the
original unedited image and the masked (partially transparent) image:
```
dream> man with cat on shoulder -I./images/man.png -M./images/man-transparent.png
```
We are hoping to get rid of the need for this workaround in an upcoming release.
## Recipe for GIMP
[GIMP](https://www.gimp.org/) is a popular Linux photoediting tool.
1. Open image in GIMP.
2. Layer->Transparency->Add Alpha Channel
3. Use lasoo tool to select region to mask
4. Choose Select -> Float to create a floating selection
5. Open the Layers toolbar (^L) and select "Floating Selection"
6. Set opacity to 0%
7. Export as PNG
8. In the export dialogue, Make sure the "Save colour values from
transparent pixels" checkbox is selected.

133
docs/features/OTHER.md Normal file
View File

@@ -0,0 +1,133 @@
## **Google Colab**
Stable Diffusion AI Notebook: <a
href="https://colab.research.google.com/github/lstein/stable-diffusion/blob/main/notebooks/Stable_Diffusion_AI_Notebook.ipynb"
target="_parent"><img
src="https://colab.research.google.com/assets/colab-badge.svg"
alt="Open In Colab"/></a> <br> Open and follow instructions to use an
isolated environment running Dream.<br>
Output Example:
![Colab Notebook](../assets/colab_notebook.png)
---
## **Seamless Tiling**
The seamless tiling mode causes generated images to seamlessly tile
with itself. To use it, add the `--seamless` option when starting the
script which will result in all generated images to tile, or for each
`dream>` prompt as shown here:
```
dream> "pond garden with lotus by claude monet" --seamless -s100 -n4
```
---
## **Reading Prompts from a File**
You can automate `dream.py` by providing a text file with the prompts
you want to run, one line per prompt. The text file must be composed
with a text editor (e.g. Notepad) and not a word processor. Each line
should look like what you would type at the dream> prompt:
```
a beautiful sunny day in the park, children playing -n4 -C10
stormy weather on a mountain top, goats grazing -s100
innovative packaging for a squid's dinner -S137038382
```
Then pass this file's name to `dream.py` when you invoke it:
```
(ldm) ~/stable-diffusion$ python3 scripts/dream.py --from_file "path/to/prompts.txt"
```
You may read a series of prompts from standard input by providing a filename of `-`:
```
(ldm) ~/stable-diffusion$ echo "a beautiful day" | python3 scripts/dream.py --from_file -
```
---
## **Shortcuts: Reusing Seeds**
Since it is so common to reuse seeds while refining a prompt, there is now a shortcut as of version 1.11. Provide a `**-S**` (or `**--seed**`)
switch of `-1` to use the seed of the most recent image generated. If you produced multiple images with the `**-n**` switch, then you can go back further using -2, -3, etc. up to the first image generated by the previous command. Sorry, but you can't go back further than one command.
Here's an example of using this to do a quick refinement. It also illustrates using the new `**-G**` switch to turn on upscaling and face enhancement (see previous section):
```
dream> a cute child playing hopscotch -G0.5
[...]
outputs/img-samples/000039.3498014304.png: "a cute child playing hopscotch" -s50 -W512 -H512 -C7.5 -mk_lms -S3498014304
# I wonder what it will look like if I bump up the steps and set facial enhancement to full strength?
dream> a cute child playing hopscotch -G1.0 -s100 -S -1
reusing previous seed 3498014304
[...]
outputs/img-samples/000040.3498014304.png: "a cute child playing hopscotch" -G1.0 -s100 -W512 -H512 -C7.5 -mk_lms -S3498014304
```
---
## **Weighted Prompts**
You may weight different sections of the prompt to tell the sampler to attach different levels of
priority to them, by adding `:(number)` to the end of the section you wish to up- or downweight.
For example consider this prompt:
```
tabby cat:0.25 white duck:0.75 hybrid
```
This will tell the sampler to invest 25% of its effort on the tabby
cat aspect of the image and 75% on the white duck aspect
(surprisingly, this example actually works). The prompt weights can
use any combination of integers and floating point numbers, and they
do not need to add up to 1.
---
## **Simplified API**
For programmers who wish to incorporate stable-diffusion into other products, this repository includes a simplified API for text to image generation, which lets you create images from a prompt in just three lines of code:
```
from ldm.generate import Generate
g = Generate()
outputs = g.txt2img("a unicorn in manhattan")
```
Outputs is a list of lists in the format [filename1,seed1],[filename2,seed2]...].
Please see ldm/generate.py for more information. A set of example scripts is coming RSN.
---
## **Preload Models**
In situations where you have limited internet connectivity or are
blocked behind a firewall, you can use the preload script to preload
the required files for Stable Diffusion to run.
The preload script `scripts/preload_models.py` needs to be run once at
least while connected to the internet. In the following runs, it will
load up the cached versions of the required files from the `.cache`
directory of the system.
```
(ldm) ~/stable-diffusion$ python3 ./scripts/preload_models.py
preloading bert tokenizer...
Downloading: 100%|██████████████████████████████████| 28.0/28.0 [00:00<00:00, 49.3kB/s]
Downloading: 100%|██████████████████████████████████| 226k/226k [00:00<00:00, 2.79MB/s]
Downloading: 100%|██████████████████████████████████| 455k/455k [00:00<00:00, 4.36MB/s]
Downloading: 100%|██████████████████████████████████| 570/570 [00:00<00:00, 477kB/s]
...success
preloading kornia requirements...
Downloading: "https://github.com/DagnyT/hardnet/raw/master/pretrained/train_liberty_with_aug/checkpoint_liberty_with_aug.pth" to /u/lstein/.cache/torch/hub/checkpoints/checkpoint_liberty_with_aug.pth
100%|███████████████████████████████████████████████| 5.10M/5.10M [00:00<00:00, 101MB/s]
...success
```

View File

@@ -0,0 +1,70 @@
# **Personalizing Text-to-Image Generation**
You may personalize the generated images to provide your own styles or objects by training a new LDM checkpoint and introducing a new vocabulary to the fixed model as a (.pt) embeddings file. Alternatively, you may use or train HuggingFace Concepts embeddings files (.bin) from https://huggingface.co/sd-concepts-library and its associated notebooks.
**Training**
To train, prepare a folder that contains images sized at 512x512 and execute the following:
**WINDOWS**: As the default backend is not available on Windows, if you're using that platform, set the environment variable `PL_TORCH_DISTRIBUTED_BACKEND=gloo`
```
(ldm) ~/stable-diffusion$ python3 ./main.py --base ./configs/stable-diffusion/v1-finetune.yaml \
-t \
--actual_resume ./models/ldm/stable-diffusion-v1/model.ckpt \
-n my_cat \
--gpus 0, \
--data_root D:/textual-inversion/my_cat \
--init_word 'cat'
```
During the training process, files will be created in
/logs/[project][time][project]/ where you can see the process.
Conditioning contains the training prompts inputs, reconstruction the
input images for the training epoch samples, samples scaled for a
sample of the prompt and one with the init word provided.
On a RTX3090, the process for SD will take ~1h @1.6 iterations/sec.
_Note_: According to the associated paper, the optimal number of
images is 3-5. Your model may not converge if you use more images than
that.
Training will run indefinitely, but you may wish to stop it (with
ctrl-c) before the heat death of the universe, when you find a low
loss epoch or around ~5000 iterations. Note that you can set a fixed
limit on the number of training steps by decreasing the "max_steps"
option in configs/stable_diffusion/v1-finetune.yaml (currently set to
4000000)
**Running**
Once the model is trained, specify the trained .pt or .bin file when
starting dream using
```
(ldm) ~/stable-diffusion$ python3 ./scripts/dream.py --embedding_path /path/to/embedding.pt --full_precision
```
Then, to utilize your subject at the dream prompt
```
dream> "a photo of *"
```
This also works with image2image
```
dream> "waterfall and rainbow in the style of *" --init_img=./init-images/crude_drawing.png --strength=0.5 -s100 -n4
```
For .pt files it's also possible to train multiple tokens (modify the placeholder string in `configs/stable-diffusion/v1-finetune.yaml`) and combine LDM checkpoints using:
```
(ldm) ~/stable-diffusion$ python3 ./scripts/merge_embeddings.py \
--manager_ckpts /path/to/first/embedding.pt /path/to/second/embedding.pt [...] \
--output_path /path/to/output/embedding.pt
```
Credit goes to rinongal and the repository located at https://github.com/rinongal/textual_inversion Please see the repository and associated paper for details and limitations.

105
docs/features/UPSCALE.md Normal file
View File

@@ -0,0 +1,105 @@
# **GFPGAN and Real-ESRGAN Support**
The script also provides the ability to do face restoration and
upscaling with the help of GFPGAN and Real-ESRGAN respectively.
As of version 1.14, environment.yaml will install the Real-ESRGAN package into the
standard install location for python packages, and will put GFPGAN into a subdirectory of "src"
in the stable-diffusion directory.
(The reason for this is that the standard GFPGAN distribution has a minor bug that adversely affects image
color.) Upscaling with Real-ESRGAN should "just work" without further intervention. Simply pass the --upscale (-U)
option on the dream> command line, or indicate the desired scale on the popup in the Web GUI.
For **GFPGAN** to work, there is one additional step needed. You will need to download and
copy the GFPGAN [models file](https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.3.pth)
into **src/gfpgan/experiments/pretrained_models**. On Mac and Linux systems, here's how you'd do it using
**wget**:
~~~~
> wget https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.3.pth src/gfpgan/experiments/pretrained_models/
~~~~
Make sure that you're in the stable-diffusion directory when you do this.
Alternatively, if you have GFPGAN installed elsewhere, or if you are using
an earlier version of this package which asked you to install GFPGAN in a
sibling directory, you may use the `--gfpgan_dir` argument with `dream.py` to set a
custom path to your GFPGAN directory. _There are other GFPGAN related
boot arguments if you wish to customize further._
**Note: Internet connection needed:**
Users whose GPU machines are isolated from the Internet (e.g. on a
University cluster) should be aware that the first time you run
dream.py with GFPGAN and Real-ESRGAN turned on, it will try to
download model files from the Internet. To rectify this, you may run
`python3 scripts/preload_models.py` after you have installed GFPGAN
and all its dependencies.
**Usage**
You will now have access to two new prompt arguments.
**Upscaling**
`-U : <upscaling_factor> <upscaling_strength>`
The upscaling prompt argument takes two values. The first value is a
scaling factor and should be set to either `2` or `4` only. This will
either scale the image 2x or 4x respectively using different models.
You can set the scaling stength between `0` and `1.0` to control
intensity of the of the scaling. This is handy because AI upscalers
generally tend to smooth out texture details. If you wish to retain
some of those for natural looking results, we recommend using values
between `0.5 to 0.8`.
If you do not explicitly specify an upscaling_strength, it will
default to 0.75.
**Face Restoration**
`-G : <gfpgan_strength>`
This prompt argument controls the strength of the face restoration
that is being applied. Similar to upscaling, values between `0.5 to 0.8` are recommended.
You can use either one or both without any conflicts. In cases where
you use both, the image will be first upscaled and then the face
restoration process will be executed to ensure you get the highest
quality facial features.
`--save_orig`
When you use either `-U` or `-G`, the final result you get is upscaled
or face modified. If you want to save the original Stable Diffusion
generation, you can use the `-save_orig` prompt argument to save the
original unaffected version too.
**Example Usage**
```
dream > superman dancing with a panda bear -U 2 0.6 -G 0.4
```
This also works with img2img:
```
dream> a man wearing a pineapple hat -I path/to/your/file.png -U 2 0.5 -G 0.6
```
**Note**
GFPGAN and Real-ESRGAN are both memory intensive. In order to avoid
crashes and memory overloads during the Stable Diffusion process,
these effects are applied after Stable Diffusion has completed its
work.
In single image generations, you will see the output right away but
when you are using multiple iterations, the images will first be
generated and then upscaled and face restored after that process is
complete. While the image generation is taking place, you will still
be able to preview the base images.
If you wish to stop during the image generation but want to upscale or
face restore a particular generated image, pass it again with the same
prompt and generated seed along with the `-U` and `-G` prompt
arguments to perform those actions.

104
docs/features/VARIATIONS.md Normal file
View File

@@ -0,0 +1,104 @@
# **Variations**
Release 1.13 of SD-Dream adds support for image variations.
You are able to do the following:
1. Generate a series of systematic variations of an image, given a prompt. The amount of variation from one image to the next can be controlled.
2. Given two or more variations that you like, you can combine them in a weighted fashion.
---
This cheat sheet provides a quick guide for how this works in practice, using variations to create the desired image of Xena, Warrior Princess.
---
## Step 1 -- Find a base image that you like
The prompt we will use throughout is `lucy lawless as xena, warrior princess, character portrait, high resolution.`
This will be indicated as `prompt` in the examples below.
First we let SD create a series of images in the usual way, in this case requesting six iterations:
```
dream> lucy lawless as xena, warrior princess, character portrait, high resolution -n6
...
Outputs:
./outputs/Xena/000001.1579445059.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -S1579445059
./outputs/Xena/000001.1880768722.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -S1880768722
./outputs/Xena/000001.332057179.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -S332057179
./outputs/Xena/000001.2224800325.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -S2224800325
./outputs/Xena/000001.465250761.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -S465250761
./outputs/Xena/000001.3357757885.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -S3357757885
```
The one with seed 3357757885 looks nice:
<img src="assets/variation_walkthru/000001.3357757885.png"/>
---
## Step 2 - Generating Variations
Let's try to generate some variations. Using the same seed, we pass the argument `-v0.1` (or --variant_amount), which generates a series of
variations each differing by a variation amount of 0.2. This number ranges from `0` to `1.0`, with higher numbers being larger amounts of
variation.
```
dream> "prompt" -n6 -S3357757885 -v0.2
...
Outputs:
./outputs/Xena/000002.784039624.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -V 784039624:0.2 -S3357757885
./outputs/Xena/000002.3647897225.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -V 3647897225:0.2 -S3357757885
./outputs/Xena/000002.917731034.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -V 917731034:0.2 -S3357757885
./outputs/Xena/000002.4116285959.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -V 4116285959:0.2 -S3357757885
./outputs/Xena/000002.1614299449.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -V 1614299449:0.2 -S3357757885
./outputs/Xena/000002.1335553075.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -V 1335553075:0.2 -S3357757885
```
### **Variation Sub Seeding**
Note that the output for each image has a `-V` option giving the "variant subseed" for that image, consisting of a seed followed by the
variation amount used to generate it.
This gives us a series of closely-related variations, including the two shown here.
<img src="assets/variation_walkthru/000002.3647897225.png">
<img src="assets/variation_walkthru/000002.1614299449.png">
I like the expression on Xena's face in the first one (subseed 3647897225), and the armor on her shoulder in the second one (subseed 1614299449). Can we combine them to get the best of both worlds?
We combine the two variations using `-V` (--with_variations). Again, we must provide the seed for the originally-chosen image in order for
this to work.
```
dream> "prompt" -S3357757885 -V3647897225,0.1;1614299449,0.1
Outputs:
./outputs/Xena/000003.1614299449.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -V 3647897225:0.1,1614299449:0.1 -S3357757885
```
Here we are providing equal weights (0.1 and 0.1) for both the subseeds. The resulting image is close, but not exactly what I wanted:
<img src="assets/variation_walkthru/000003.1614299449.png">
We could either try combining the images with different weights, or we can generate more variations around the almost-but-not-quite image. We do the latter, using both the `-V` (combining) and `-v` (variation strength) options. Note that we use `-n6` to generate 6 variations:
```
dream> "prompt" -S3357757885 -V3647897225,0.1;1614299449,0.1 -v0.05 -n6
Outputs:
./outputs/Xena/000004.3279757577.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -V 3647897225:0.1,1614299449:0.1,3279757577:0.05 -S3357757885
./outputs/Xena/000004.2853129515.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -V 3647897225:0.1,1614299449:0.1,2853129515:0.05 -S3357757885
./outputs/Xena/000004.3747154981.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -V 3647897225:0.1,1614299449:0.1,3747154981:0.05 -S3357757885
./outputs/Xena/000004.2664260391.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -V 3647897225:0.1,1614299449:0.1,2664260391:0.05 -S3357757885
./outputs/Xena/000004.1642517170.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -V 3647897225:0.1,1614299449:0.1,1642517170:0.05 -S3357757885
./outputs/Xena/000004.2183375608.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -V 3647897225:0.1,1614299449:0.1,2183375608:0.05 -S3357757885
```
This produces six images, all slight variations on the combination of the chosen two images. Here's the one I like best:
<img src="assets/variation_walkthru/000004.3747154981.png">
As you can see, this is a very powerful tool, which when combined with subprompt weighting, gives you great control over the content and
quality of your generated images.

13
docs/features/WEB.md Normal file
View File

@@ -0,0 +1,13 @@
# Barebones Web Server
As of version 1.10, this distribution comes with a bare bones web server (see screenshot). To use it, run the `dream.py` script by adding the `**--web**` option.
```
(ldm) ~/stable-diffusion$ python3 scripts/dream.py --web
```
You can then connect to the server by pointing your web browser at http://localhost:9090, or to the network name or IP address of the server.
Kudos to [Tesseract Cat](https://github.com/TesseractCat) for contributing this code, and to [dagf2101](https://github.com/dagf2101) for refining it.
![Dream Web Server](../assets/dream_web_server.png)

68
docs/help/TROUBLESHOOT.md Normal file
View File

@@ -0,0 +1,68 @@
# **Frequently Asked Questions**
Here are a few common installation problems and their solutions. Often these are caused by incomplete installations or crashes during the
install process.
---
**QUESTION**
During `conda env create -f environment.yaml`, conda hangs indefinitely.
**SOLUTION**
Enter the stable-diffusion directory and completely remove the `src` directory and all its contents. The safest way to do this is to enter the stable-diffusion directory and give the command `git clean -f`. If this still doesn't fix the problem, try "conda clean -all" and then restart at the `conda env create` step.
---
**QUESTION**
`dream.py` crashes with the complaint that it can't find `ldm.simplet2i.py`. Or it complains that function is being passed incorrect parameters.
**SOLUTION**
Reinstall the stable diffusion modules. Enter the `stable-diffusion` directory and give the command `pip install -e .`
---
**QUESTION**
`dream.py` dies, complaining of various missing modules, none of which starts with `ldm``.
**SOLUTION**
From within the `stable-diffusion` directory, run `conda env update -f environment.yaml` This is also frequently the solution to
complaints about an unknown function in a module.
---
**QUESTION**
There's a feature or bugfix in the Stable Diffusion GitHub that you want to try out.
**SOLUTION**
**Main Branch**
If the fix/feature is on the `main` branch, enter the stable-diffusion directory and do a `git pull`.
Usually this will be sufficient, but if you start to see errors about missing or incorrect modules, use the command `pip install -e .` and/or `conda env update -f environment.yaml` (These commands won't break anything.)
**Sub Branch**
If the feature/fix is on a branch (e.g. "_foo-bugfix_"), the recipe is similar, but do a `git pull <name of branch>`.
**Not Committed**
If the feature/fix is in a pull request that has not yet been made part of the main branch or a feature/bugfix branch, then from the page for the desired pull request, look for the line at the top that reads "_xxxx wants to merge xx commits into lstein:main from YYYYYY_". Copy the URL in YYYY. It should have the format `https://github.com/<name of contributor>/stable-diffusion/tree/<name of branch>`
Then **go to the directory above stable-diffusion** and rename the directory to "_stable-diffusion.lstein_", "_stable-diffusion.old_", or anything else. You can then git clone the branch that contains the pull request:
```
git clone https://github.com/<name of contributor>/stable-diffusion/tree/<name
of branch>
```
You will need to go through the install procedure again, but it should be fast because all the dependencies are already loaded.
---

View File

@@ -0,0 +1,89 @@
# **Linux Installation**
1. You will need to install the following prerequisites if they are not already available. Use your operating system's preferred installer
- Python (version 3.8.5 recommended; higher may work)
- git
2. Install the Python Anaconda environment manager.
```
~$ wget https://repo.anaconda.com/archive/Anaconda3-2022.05-Linux-x86_64.sh
~$ chmod +x Anaconda3-2022.05-Linux-x86_64.sh
~$ ./Anaconda3-2022.05-Linux-x86_64.sh
```
After installing anaconda, you should log out of your system and log back in. If the installation
worked, your command prompt will be prefixed by the name of the current anaconda environment - `(base)`.
3. Copy the stable-diffusion source code from GitHub:
```
(base) ~$ git clone https://github.com/lstein/stable-diffusion.git
```
This will create stable-diffusion folder where you will follow the rest of the steps.
4. Enter the newly-created stable-diffusion folder. From this step forward make sure that you are working in the stable-diffusion directory!
```
(base) ~$ cd stable-diffusion
(base) ~/stable-diffusion$
```
5. Use anaconda to copy necessary python packages, create a new python environment named `ldm` and activate the environment.
```
(base) ~/stable-diffusion$ conda env create -f environment.yaml
(base) ~/stable-diffusion$ conda activate ldm
(ldm) ~/stable-diffusion$
```
After these steps, your command prompt will be prefixed by `(ldm)` as shown above.
6. Load a couple of small machine-learning models required by stable diffusion:
```
(ldm) ~/stable-diffusion$ python3 scripts/preload_models.py
```
Note that this step is necessary because I modified the original just-in-time model loading scheme to allow the script to work on GPU machines that are not internet connected. See [Preload Models](../features/OTHER.md#preload-models)
7. Now you need to install the weights for the stable diffusion model.
- For running with the released weights, you will first need to set up an acount with Hugging Face (https://huggingface.co).
- Use your credentials to log in, and then point your browser at https://huggingface.co/CompVis/stable-diffusion-v-1-4-original.
- You may be asked to sign a license agreement at this point.
- Click on "Files and versions" near the top of the page, and then click on the file named "sd-v1-4.ckpt". You'll be taken to a page that prompts you to click the "download" link. Save the file somewhere safe on your local machine.
Now run the following commands from within the stable-diffusion directory. This will create a symbolic link from the stable-diffusion model.ckpt file, to the true location of the sd-v1-4.ckpt file.
```
(ldm) ~/stable-diffusion$ mkdir -p models/ldm/stable-diffusion-v1
(ldm) ~/stable-diffusion$ ln -sf /path/to/sd-v1-4.ckpt models/ldm/stable-diffusion-v1/model.ckpt
```
8. Start generating images!
```
# for the pre-release weights use the -l or --liaon400m switch
(ldm) ~/stable-diffusion$ python3 scripts/dream.py -l
# for the post-release weights do not use the switch
(ldm) ~/stable-diffusion$ python3 scripts/dream.py
# for additional configuration switches and arguments, use -h or --help
(ldm) ~/stable-diffusion$ python3 scripts/dream.py -h
```
9. Subsequently, to relaunch the script, be sure to run "conda activate ldm" (step 5, second command), enter the `stable-diffusion` directory, and then launch the dream script (step 8). If you forget to activate the ldm environment, the script will fail with multiple `ModuleNotFound` errors.
### Updating to newer versions of the script
This distribution is changing rapidly. If you used the `git clone` method (step 5) to download the stable-diffusion directory, then to update to the latest and greatest version, launch the Anaconda window, enter `stable-diffusion` and type:
```
(ldm) ~/stable-diffusion$ git pull
```
This will bring your local copy into sync with the remote one.

View File

@@ -0,0 +1,352 @@
# **macOS Instructions**
Requirements
- macOS 12.3 Monterey or later
- Python
- Patience
- Apple Silicon\*
\*I haven't tested any of this on Intel Macs but I have read that one person got it to work, so Apple Silicon might not be requried.
Things have moved really fast and so these instructions change often
and are often out-of-date. One of the problems is that there are so
many different ways to run this.
We are trying to build a testing setup so that when we make changes it
doesn't always break.
How to (this hasn't been 100% tested yet):
First get the weights checkpoint download started - it's big:
1. Sign up at https://huggingface.co
2. Go to the [Stable diffusion diffusion model page](https://huggingface.co/CompVis/stable-diffusion-v-1-4-original)
3. Accept the terms and click Access Repository:
4. Download [sd-v1-4.ckpt (4.27 GB)](https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/blob/main/sd-v1-4.ckpt) and note where you have saved it (probably the Downloads folder)
While that is downloading, open Terminal and run the following commands one at a time.
```bash
# install brew (and Xcode command line tools):
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
#
# Now there are two different routes to get the Python (miniconda) environment up and running:
# 1. Alongside pyenv
# 2. No pyenv
#
# If you don't know what we are talking about, choose 2.
#
# NOW EITHER DO
# 1. Installing alongside pyenv
brew install pyenv-virtualenv # you might have this from before, no problem
pyenv install anaconda3-2022.05
pyenv virtualenv anaconda3-2022.05
eval "$(pyenv init -)"
pyenv activate anaconda3-2022.05
# OR,
# 2. Installing standalone
# install python 3, git, cmake, protobuf:
brew install cmake protobuf rust
# install miniconda (M1 arm64 version):
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh -o Miniconda3-latest-MacOSX-arm64.sh
/bin/bash Miniconda3-latest-MacOSX-arm64.sh
# EITHER WAY,
# continue from here
# clone the repo
git clone https://github.com/lstein/stable-diffusion.git
cd stable-diffusion
#
# wait until the checkpoint file has downloaded, then proceed
#
# create symlink to checkpoint
mkdir -p models/ldm/stable-diffusion-v1/
PATH_TO_CKPT="$HOME/Downloads" # or wherever you saved sd-v1-4.ckpt
ln -s "$PATH_TO_CKPT/sd-v1-4.ckpt" models/ldm/stable-diffusion-v1/model.ckpt
# install packages
PIP_EXISTS_ACTION=w CONDA_SUBDIR=osx-arm64 conda env create -f environment-mac.yaml
conda activate ldm
# only need to do this once
python scripts/preload_models.py
# run SD!
python scripts/dream.py --full_precision # half-precision requires autocast and won't work
```
The original scripts should work as well.
```
python scripts/orig_scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
```
Note, `export PIP_EXISTS_ACTION=w` is a precaution to fix `conda env
create -f environment-mac.yaml` never finishing in some situations. So
it isn't required but wont hurt.
After you follow all the instructions and run dream.py you might get several errors. Here's the errors I've seen and found solutions for.
### Is it slow?
Be sure to specify 1 sample and 1 iteration.
python ./scripts/orig_scripts/txt2img.py --prompt "ocean" --ddim_steps 5 --n_samples 1 --n_iter 1
### Doesn't work anymore?
PyTorch nightly includes support for MPS. Because of this, this setup is inherently unstable. One morning I woke up and it no longer worked no matter what I did until I switched to miniforge. However, I have another Mac that works just fine with Anaconda. If you can't get it to work, please search a little first because many of the errors will get posted and solved. If you can't find a solution please [create an issue](https://github.com/lstein/stable-diffusion/issues).
One debugging step is to update to the latest version of PyTorch nightly.
conda install pytorch torchvision torchaudio -c pytorch-nightly
If `conda env create -f environment-mac.yaml` takes forever run this.
git clean -f
And run this.
conda clean --yes --all
Or you could reset Anaconda.
conda update --force-reinstall -y -n base -c defaults conda
### "No module named cv2", torch, 'ldm', 'transformers', 'taming', etc.
There are several causes of these errors.
First, did you remember to `conda activate ldm`? If your terminal prompt
begins with "(ldm)" then you activated it. If it begins with "(base)"
or something else you haven't.
Second, you might've run `./scripts/preload_models.py` or `./scripts/dream.py`
instead of `python ./scripts/preload_models.py` or `python ./scripts/dream.py`.
The cause of this error is long so it's below.
Third, if it says you're missing taming you need to rebuild your virtual
environment.
conda env remove -n ldm
conda env create -f environment-mac.yaml
Fourth, If you have activated the ldm virtual environment and tried rebuilding it, maybe the problem could be that I have something installed that you don't and you'll just need to manually install it. Make sure you activate the virtual environment so it installs there instead of
globally.
conda activate ldm
pip install *name*
You might also need to install Rust (I mention this again below).
### How many snakes are living in your computer?
Here's the reason why you have to specify which python to use.
There are several versions of python on macOS and the computer is
picking the wrong one. More specifically, preload_models.py and dream.py says to
find the first `python3` in the path environment variable. You can see which one
it is picking with `which python3`. These are the mostly likely paths you'll see.
% which python3
/usr/bin/python3
The above path is part of the OS. However, that path is a stub that asks you if
you want to install Xcode. If you have Xcode installed already,
/usr/bin/python3 will execute /Library/Developer/CommandLineTools/usr/bin/python3 or
/Applications/Xcode.app/Contents/Developer/usr/bin/python3 (depending on which
Xcode you've selected with `xcode-select`).
% which python3
/opt/homebrew/bin/python3
If you installed python3 with Homebrew and you've modified your path to search
for Homebrew binaries before system ones, you'll see the above path.
% which python
/opt/anaconda3/bin/python
If you drop the "3" you get an entirely different python. Note: starting in
macOS 12.3, /usr/bin/python no longer exists (it was python 2 anyway).
If you have Anaconda installed, this is what you'll see. There is a
/opt/anaconda3/bin/python3 also.
(ldm) % which python
/Users/name/miniforge3/envs/ldm/bin/python
This is what you'll see if you have miniforge and you've correctly activated
the ldm environment. This is the goal.
It's all a mess and you should know [how to modify the path environment variable](https://support.apple.com/guide/terminal/use-environment-variables-apd382cc5fa-4f58-4449-b20a-41c53c006f8f/mac)
if you want to fix it. Here's a brief hint of all the ways you can modify it
(don't really have the time to explain it all here).
- ~/.zshrc
- ~/.bash_profile
- ~/.bashrc
- /etc/paths.d
- /etc/path
Which one you use will depend on what you have installed except putting a file
in /etc/paths.d is what I prefer to do.
### Debugging?
Tired of waiting for your renders to finish before you can see if it
works? Reduce the steps! The image quality will be horrible but at least you'll
get quick feedback.
python ./scripts/txt2img.py --prompt "ocean" --ddim_steps 5 --n_samples 1 --n_iter 1
### OSError: Can't load tokenizer for 'openai/clip-vit-large-patch14'...
python scripts/preload_models.py
### "The operator [name] is not current implemented for the MPS device." (sic)
Example error.
```
...
NotImplementedError: The operator 'aten::_index_put_impl_' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on [https://github.com/pytorch/pytorch/issues/77764](https://github.com/pytorch/pytorch/issues/77764). As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
```
The lstein branch includes this fix in [environment-mac.yaml](https://github.com/lstein/stable-diffusion/blob/main/environment-mac.yaml).
### "Could not build wheels for tokenizers"
I have not seen this error because I had Rust installed on my computer before I started playing with Stable Diffusion. The fix is to install Rust.
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
### How come `--seed` doesn't work?
First this:
> Completely reproducible results are not guaranteed across PyTorch
> releases, individual commits, or different platforms. Furthermore,
> results may not be reproducible between CPU and GPU executions, even
> when using identical seeds.
[PyTorch docs](https://pytorch.org/docs/stable/notes/randomness.html)
Second, we might have a fix that at least gets a consistent seed sort of. We're
still working on it.
### libiomp5.dylib error?
OMP: Error #15: Initializing libiomp5.dylib, but found libomp.dylib already initialized.
You are likely using an Intel package by mistake. Be sure to run conda with
the environment variable `CONDA_SUBDIR=osx-arm64`, like so:
`CONDA_SUBDIR=osx-arm64 conda install ...`
This error happens with Anaconda on Macs when the Intel-only `mkl` is pulled in by
a dependency. [nomkl](https://stackoverflow.com/questions/66224879/what-is-the-nomkl-python-package-used-for)
is a metapackage designed to prevent this, by making it impossible to install
`mkl`, but if your environment is already broken it may not work.
Do _not_ use `os.environ['KMP_DUPLICATE_LIB_OK']='True'` or equivalents as this
masks the underlying issue of using Intel packages.
### Not enough memory.
This seems to be a common problem and is probably the underlying
problem for a lot of symptoms (listed below). The fix is to lower your
image size or to add `model.half()` right after the model is loaded. I
should probably test it out. I've read that the reason this fixes
problems is because it converts the model from 32-bit to 16-bit and
that leaves more RAM for other things. I have no idea how that would
affect the quality of the images though.
See [this issue](https://github.com/CompVis/stable-diffusion/issues/71).
### "Error: product of dimension sizes > 2\*\*31'"
This error happens with img2img, which I haven't played with too much
yet. But I know it's because your image is too big or the resolution
isn't a multiple of 32x32. Because the stable-diffusion model was
trained on images that were 512 x 512, it's always best to use that
output size (which is the default). However, if you're using that size
and you get the above error, try 256 x 256 or 512 x 256 or something
as the source image.
BTW, 2\*\*31-1 = [2,147,483,647](https://en.wikipedia.org/wiki/2,147,483,647#In_computing), which is also 32-bit signed [LONG_MAX](https://en.wikipedia.org/wiki/C_data_types) in C.
### I just got Rickrolled! Do I have a virus?
You don't have a virus. It's part of the project. Here's
[Rick](https://github.com/lstein/stable-diffusion/blob/main/assets/rick.jpeg)
and here's [the
code](https://github.com/lstein/stable-diffusion/blob/69ae4b35e0a0f6ee1af8bb9a5d0016ccb27e36dc/scripts/txt2img.py#L79)
that swaps him in. It's a NSFW filter, which IMO, doesn't work very
good (and we call this "computer vision", sheesh).
Actually, this could be happening because there's not enough RAM. You could try the `model.half()` suggestion or specify smaller output images.
### My images come out black
We might have this fixed, we are still testing.
There's a [similar issue](https://github.com/CompVis/stable-diffusion/issues/69)
on CUDA GPU's where the images come out green. Maybe it's the same issue?
Someone in that issue says to use "--precision full", but this fork
actually disables that flag. I don't know why, someone else provided
that code and I don't know what it does. Maybe the `model.half()`
suggestion above would fix this issue too. I should probably test it.
### "view size is not compatible with input tensor's size and stride"
```
File "/opt/anaconda3/envs/ldm/lib/python3.10/site-packages/torch/nn/functional.py", line 2511, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
```
Update to the latest version of lstein/stable-diffusion. We were
patching pytorch but we found a file in stable-diffusion that we could
change instead. This is a 32-bit vs 16-bit problem.
### The processor must support the Intel bla bla bla
What? Intel? On an Apple Silicon?
Intel MKL FATAL ERROR: This system does not meet the minimum requirements for use of the Intel(R) Math Kernel Library.
The processor must support the Intel(R) Supplemental Streaming SIMD Extensions 3 (Intel(R) SSSE3) instructions.
The processor must support the Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) instructions.
The processor must support the Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.
This is due to the Intel `mkl` package getting picked up when you try to install
something that depends on it-- Rosetta can translate some Intel instructions but
not the specialized ones here. To avoid this, make sure to use the environment
variable `CONDA_SUBDIR=osx-arm64`, which restricts the Conda environment to only
use ARM packages, and use `nomkl` as described above.
### input types 'tensor<2x1280xf32>' and 'tensor<\*xf16>' are not broadcast compatible
May appear when just starting to generate, e.g.:
```
dream> clouds
Generating: 0%| | 0/1 [00:00<?, ?it/s]/Users/[...]/dev/stable-diffusion/ldm/modules/embedding_manager.py:152: UserWarning: The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/_temp/anaconda/conda-bld/pytorch_1662016319283/work/aten/src/ATen/mps/MPSFallback.mm:11.)
placeholder_idx = torch.where(
loc("mps_add"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/20d6c351-ee94-11ec-bcaf-7247572f23b4/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":219:0)): error: input types 'tensor<2x1280xf32>' and 'tensor<*xf16>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
Abort trap: 6
/Users/[...]/opt/anaconda3/envs/ldm/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
```
Macs do not support autocast/mixed-precision. Supply `--full_precision` to use float32 everywhere.

View File

@@ -0,0 +1,112 @@
# **Windows Installation**
## **Notebook install (semi-automated)**
We have a [Jupyter
notebook](https://github.com/lstein/stable-diffusion/blob/main/notebooks/Stable-Diffusion-local-Windows.ipynb)
with cell-by-cell installation steps. It will download the code in
this repo as one of the steps, so instead of cloning this repo, simply
download the notebook from the link above and load it up in VSCode
(with the appropriate extensions installed)/Jupyter/JupyterLab and
start running the cells one-by-one.
Note that you will need NVIDIA drivers, Python 3.10, and Git installed
beforehand - simplified [step-by-step
instructions](https://github.com/lstein/stable-diffusion/wiki/Easy-peasy-Windows-install)
are available in the wiki (you'll only need steps 1, 2, & 3 ).
## **Manual Install**
### **pip**
See [Easy-peasy Windows install](https://github.com/lstein/stable-diffusion/wiki/Easy-peasy-Windows-install)
in the wiki
### **Conda**
1. Install Anaconda3 (miniconda3 version) from here: https://docs.anaconda.com/anaconda/install/windows/
2. Install Git from here: https://git-scm.com/download/win
3. Launch Anaconda from the Windows Start menu. This will bring up a command window. Type all the remaining commands in this window.
4. Run the command:
```
git clone https://github.com/lstein/stable-diffusion.git
```
This will create stable-diffusion folder where you will follow the rest of the steps.
5. Enter the newly-created stable-diffusion folder. From this step forward make sure that you are working in the stable-diffusion directory!
```
cd stable-diffusion
```
6. Run the following two commands:
```
conda env create -f environment.yaml (step 6a)
conda activate ldm (step 6b)
```
This will install all python requirements and activate the "ldm"
environment which sets PATH and other environment variables properly.
7. Run the command:
```
python scripts\preload_models.py
```
This installs several machine learning models that stable diffusion requires.
Note: This step is required. This was done because some users may might be blocked by firewalls or have limited internet connectivity for the models to be downloaded just-in-time.
8. Now you need to install the weights for the big stable diffusion model.
- For running with the released weights, you will first need to set up an acount with Hugging Face (https://huggingface.co).
- Use your credentials to log in, and then point your browser at https://huggingface.co/CompVis/stable-diffusion-v-1-4-original.
- You may be asked to sign a license agreement at this point.
- Click on "Files and versions" near the top of the page, and then click on the file named `sd-v1-4.ckpt`. You'll be taken to a page that
prompts you to click the "download" link. Now save the file somewhere safe on your local machine.
- The weight file is >4 GB in size, so
downloading may take a while.
Now run the following commands from **within the stable-diffusion directory** to copy the weights file to the right place:
```
mkdir -p models\ldm\stable-diffusion-v1
copy C:\path\to\sd-v1-4.ckpt models\ldm\stable-diffusion-v1\model.ckpt
```
Please replace `C:\path\to\sd-v1.4.ckpt` with the correct path to wherever you stashed this file. If you prefer not to copy or move the .ckpt file,
you may instead create a shortcut to it from within `models\ldm\stable-diffusion-v1\`.
9. Start generating images!
```
# for the pre-release weights
python scripts\dream.py -l
# for the post-release weights
python scripts\dream.py
```
10. Subsequently, to relaunch the script, first activate the Anaconda command window (step 3),enter the stable-diffusion directory (step 5, `cd \path\to\stable-diffusion`), run `conda activate ldm` (step 6b), and then launch the dream script (step 9).
**Note:** Tildebyte has written an alternative ["Easy peasy Windows
install"](https://github.com/lstein/stable-diffusion/wiki/Easy-peasy-Windows-install)
which uses the Windows Powershell and pew. If you are having trouble with Anaconda on Windows, give this a try (or try it first!)
### Updating to newer versions of the script
This distribution is changing rapidly. If you used the `git clone` method (step 5) to download the stable-diffusion directory, then to update to the latest and greatest version, launch the Anaconda window, enter `stable-diffusion`, and type:
```
git pull
conda env update -f environment.yaml
```
This will bring your local copy into sync with the remote one.

54
environment-mac.yaml Normal file
View File

@@ -0,0 +1,54 @@
name: ldm
channels:
- pytorch
- conda-forge
dependencies:
- python==3.10.5
- pip==22.2.2
# pytorch left unpinned
- pytorch
- torchvision
# I suggest to keep the other deps sorted for convenience.
# To determine what the latest versions should be, run:
#
# ```shell
# sed -E 's/ldm/ldm-updated/;20,99s/- ([^=]+)==.+/- \1/' environment-mac.yaml > environment-mac-updated.yml
# CONDA_SUBDIR=osx-arm64 conda env create -f environment-mac-updated.yml && conda list -n ldm-updated | awk ' {print " - " $1 "==" $2;} '
# ```
- albumentations==1.2.1
- coloredlogs==15.0.1
- einops==0.4.1
- grpcio==1.46.4
- humanfriendly==10.0
- imageio==2.21.2
- imageio-ffmpeg==0.4.7
- imgaug==0.4.0
- kornia==0.6.7
- mpmath==1.2.1
- nomkl
- numpy==1.23.2
- omegaconf==2.1.1
- onnx==1.12.0
- onnxruntime==1.12.1
- pudb==2022.1
- pytorch-lightning==1.6.5
- scipy==1.9.1
- streamlit==1.12.2
- sympy==1.10.1
- tensorboard==2.9.0
- torchmetrics==0.9.3
- pip:
- opencv-python==4.6.0
- realesrgan==0.2.5.0
- test-tube==0.7.5
- transformers==4.21.2
- torch-fidelity==0.3.0
- -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
- -e git+https://github.com/openai/CLIP.git@main#egg=clip
- -e git+https://github.com/Birch-san/k-diffusion.git@mps#egg=k_diffusion
- -e git+https://github.com/lstein/GFPGAN@fix-dark-cast-images#egg=gfpgan
- -e .
variables:
PYTORCH_ENABLE_MPS_FALLBACK: 1

View File

@@ -1,4 +1,4 @@
name: ldm
name: sd-ldm
channels:
- pytorch
- defaults
@@ -11,21 +11,23 @@ dependencies:
- numpy=1.19.2
- pip:
- albumentations==0.4.3
- opencv-python==4.1.2.30
- opencv-python==4.5.5.64
- pudb==2019.2
- imageio==2.9.0
- imageio-ffmpeg==0.4.2
- pytorch-lightning==1.4.2
- omegaconf==2.1.1
- realesrgan==0.2.5.0
- test-tube>=0.7.5
- streamlit>=0.73.1
- streamlit==1.12.0
- pillow==9.2.0
- einops==0.3.0
- torch-fidelity==0.3.0
- transformers==4.19.2
- torchmetrics==0.6.0
- kornia==0.6
- accelerate==0.12.0
- kornia==0.6.0
- -e git+https://github.com/openai/CLIP.git@main#egg=clip
- -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
- -e git+https://github.com/lstein/k-diffusion.git@master#egg=k-diffusion
- -e git+https://github.com/lstein/GFPGAN@fix-dark-cast-images#egg=gfpgan
- -e .

View File

@@ -1,11 +1,17 @@
from abc import abstractmethod
from torch.utils.data import Dataset, ConcatDataset, ChainDataset, IterableDataset
from torch.utils.data import (
Dataset,
ConcatDataset,
ChainDataset,
IterableDataset,
)
class Txt2ImgIterableBaseDataset(IterableDataset):
'''
"""
Define an interface to make the IterableDatasets for text2img data chainable
'''
"""
def __init__(self, num_records=0, valid_ids=None, size=256):
super().__init__()
self.num_records = num_records
@@ -13,11 +19,13 @@ class Txt2ImgIterableBaseDataset(IterableDataset):
self.sample_ids = valid_ids
self.size = size
print(f'{self.__class__.__name__} dataset contains {self.__len__()} examples.')
print(
f'{self.__class__.__name__} dataset contains {self.__len__()} examples.'
)
def __len__(self):
return self.num_records
@abstractmethod
def __iter__(self):
pass
pass

View File

@@ -11,24 +11,34 @@ from tqdm import tqdm
from torch.utils.data import Dataset, Subset
import taming.data.utils as tdu
from taming.data.imagenet import str_to_indices, give_synsets_from_indices, download, retrieve
from taming.data.imagenet import (
str_to_indices,
give_synsets_from_indices,
download,
retrieve,
)
from taming.data.imagenet import ImagePaths
from ldm.modules.image_degradation import degradation_fn_bsr, degradation_fn_bsr_light
from ldm.modules.image_degradation import (
degradation_fn_bsr,
degradation_fn_bsr_light,
)
def synset2idx(path_to_yaml="data/index_synset.yaml"):
def synset2idx(path_to_yaml='data/index_synset.yaml'):
with open(path_to_yaml) as f:
di2s = yaml.load(f)
return dict((v,k) for k,v in di2s.items())
return dict((v, k) for k, v in di2s.items())
class ImageNetBase(Dataset):
def __init__(self, config=None):
self.config = config or OmegaConf.create()
if not type(self.config)==dict:
if not type(self.config) == dict:
self.config = OmegaConf.to_container(self.config)
self.keep_orig_class_label = self.config.get("keep_orig_class_label", False)
self.keep_orig_class_label = self.config.get(
'keep_orig_class_label', False
)
self.process_images = True # if False we skip loading & processing images and self.data contains filepaths
self._prepare()
self._prepare_synset_to_human()
@@ -46,17 +56,23 @@ class ImageNetBase(Dataset):
raise NotImplementedError()
def _filter_relpaths(self, relpaths):
ignore = set([
"n06596364_9591.JPEG",
])
relpaths = [rpath for rpath in relpaths if not rpath.split("/")[-1] in ignore]
if "sub_indices" in self.config:
indices = str_to_indices(self.config["sub_indices"])
synsets = give_synsets_from_indices(indices, path_to_yaml=self.idx2syn) # returns a list of strings
ignore = set(
[
'n06596364_9591.JPEG',
]
)
relpaths = [
rpath for rpath in relpaths if not rpath.split('/')[-1] in ignore
]
if 'sub_indices' in self.config:
indices = str_to_indices(self.config['sub_indices'])
synsets = give_synsets_from_indices(
indices, path_to_yaml=self.idx2syn
) # returns a list of strings
self.synset2idx = synset2idx(path_to_yaml=self.idx2syn)
files = []
for rpath in relpaths:
syn = rpath.split("/")[0]
syn = rpath.split('/')[0]
if syn in synsets:
files.append(rpath)
return files
@@ -65,78 +81,89 @@ class ImageNetBase(Dataset):
def _prepare_synset_to_human(self):
SIZE = 2655750
URL = "https://heibox.uni-heidelberg.de/f/9f28e956cd304264bb82/?dl=1"
self.human_dict = os.path.join(self.root, "synset_human.txt")
if (not os.path.exists(self.human_dict) or
not os.path.getsize(self.human_dict)==SIZE):
URL = 'https://heibox.uni-heidelberg.de/f/9f28e956cd304264bb82/?dl=1'
self.human_dict = os.path.join(self.root, 'synset_human.txt')
if (
not os.path.exists(self.human_dict)
or not os.path.getsize(self.human_dict) == SIZE
):
download(URL, self.human_dict)
def _prepare_idx_to_synset(self):
URL = "https://heibox.uni-heidelberg.de/f/d835d5b6ceda4d3aa910/?dl=1"
self.idx2syn = os.path.join(self.root, "index_synset.yaml")
if (not os.path.exists(self.idx2syn)):
URL = 'https://heibox.uni-heidelberg.de/f/d835d5b6ceda4d3aa910/?dl=1'
self.idx2syn = os.path.join(self.root, 'index_synset.yaml')
if not os.path.exists(self.idx2syn):
download(URL, self.idx2syn)
def _prepare_human_to_integer_label(self):
URL = "https://heibox.uni-heidelberg.de/f/2362b797d5be43b883f6/?dl=1"
self.human2integer = os.path.join(self.root, "imagenet1000_clsidx_to_labels.txt")
if (not os.path.exists(self.human2integer)):
URL = 'https://heibox.uni-heidelberg.de/f/2362b797d5be43b883f6/?dl=1'
self.human2integer = os.path.join(
self.root, 'imagenet1000_clsidx_to_labels.txt'
)
if not os.path.exists(self.human2integer):
download(URL, self.human2integer)
with open(self.human2integer, "r") as f:
with open(self.human2integer, 'r') as f:
lines = f.read().splitlines()
assert len(lines) == 1000
self.human2integer_dict = dict()
for line in lines:
value, key = line.split(":")
value, key = line.split(':')
self.human2integer_dict[key] = int(value)
def _load(self):
with open(self.txt_filelist, "r") as f:
with open(self.txt_filelist, 'r') as f:
self.relpaths = f.read().splitlines()
l1 = len(self.relpaths)
self.relpaths = self._filter_relpaths(self.relpaths)
print("Removed {} files from filelist during filtering.".format(l1 - len(self.relpaths)))
print(
'Removed {} files from filelist during filtering.'.format(
l1 - len(self.relpaths)
)
)
self.synsets = [p.split("/")[0] for p in self.relpaths]
self.synsets = [p.split('/')[0] for p in self.relpaths]
self.abspaths = [os.path.join(self.datadir, p) for p in self.relpaths]
unique_synsets = np.unique(self.synsets)
class_dict = dict((synset, i) for i, synset in enumerate(unique_synsets))
class_dict = dict(
(synset, i) for i, synset in enumerate(unique_synsets)
)
if not self.keep_orig_class_label:
self.class_labels = [class_dict[s] for s in self.synsets]
else:
self.class_labels = [self.synset2idx[s] for s in self.synsets]
with open(self.human_dict, "r") as f:
with open(self.human_dict, 'r') as f:
human_dict = f.read().splitlines()
human_dict = dict(line.split(maxsplit=1) for line in human_dict)
self.human_labels = [human_dict[s] for s in self.synsets]
labels = {
"relpath": np.array(self.relpaths),
"synsets": np.array(self.synsets),
"class_label": np.array(self.class_labels),
"human_label": np.array(self.human_labels),
'relpath': np.array(self.relpaths),
'synsets': np.array(self.synsets),
'class_label': np.array(self.class_labels),
'human_label': np.array(self.human_labels),
}
if self.process_images:
self.size = retrieve(self.config, "size", default=256)
self.data = ImagePaths(self.abspaths,
labels=labels,
size=self.size,
random_crop=self.random_crop,
)
self.size = retrieve(self.config, 'size', default=256)
self.data = ImagePaths(
self.abspaths,
labels=labels,
size=self.size,
random_crop=self.random_crop,
)
else:
self.data = self.abspaths
class ImageNetTrain(ImageNetBase):
NAME = "ILSVRC2012_train"
URL = "http://www.image-net.org/challenges/LSVRC/2012/"
AT_HASH = "a306397ccf9c2ead27155983c254227c0fd938e2"
NAME = 'ILSVRC2012_train'
URL = 'http://www.image-net.org/challenges/LSVRC/2012/'
AT_HASH = 'a306397ccf9c2ead27155983c254227c0fd938e2'
FILES = [
"ILSVRC2012_img_train.tar",
'ILSVRC2012_img_train.tar',
]
SIZES = [
147897477120,
@@ -151,57 +178,64 @@ class ImageNetTrain(ImageNetBase):
if self.data_root:
self.root = os.path.join(self.data_root, self.NAME)
else:
cachedir = os.environ.get("XDG_CACHE_HOME", os.path.expanduser("~/.cache"))
self.root = os.path.join(cachedir, "autoencoders/data", self.NAME)
cachedir = os.environ.get(
'XDG_CACHE_HOME', os.path.expanduser('~/.cache')
)
self.root = os.path.join(cachedir, 'autoencoders/data', self.NAME)
self.datadir = os.path.join(self.root, "data")
self.txt_filelist = os.path.join(self.root, "filelist.txt")
self.datadir = os.path.join(self.root, 'data')
self.txt_filelist = os.path.join(self.root, 'filelist.txt')
self.expected_length = 1281167
self.random_crop = retrieve(self.config, "ImageNetTrain/random_crop",
default=True)
self.random_crop = retrieve(
self.config, 'ImageNetTrain/random_crop', default=True
)
if not tdu.is_prepared(self.root):
# prep
print("Preparing dataset {} in {}".format(self.NAME, self.root))
print('Preparing dataset {} in {}'.format(self.NAME, self.root))
datadir = self.datadir
if not os.path.exists(datadir):
path = os.path.join(self.root, self.FILES[0])
if not os.path.exists(path) or not os.path.getsize(path)==self.SIZES[0]:
if (
not os.path.exists(path)
or not os.path.getsize(path) == self.SIZES[0]
):
import academictorrents as at
atpath = at.get(self.AT_HASH, datastore=self.root)
assert atpath == path
print("Extracting {} to {}".format(path, datadir))
print('Extracting {} to {}'.format(path, datadir))
os.makedirs(datadir, exist_ok=True)
with tarfile.open(path, "r:") as tar:
with tarfile.open(path, 'r:') as tar:
tar.extractall(path=datadir)
print("Extracting sub-tars.")
subpaths = sorted(glob.glob(os.path.join(datadir, "*.tar")))
print('Extracting sub-tars.')
subpaths = sorted(glob.glob(os.path.join(datadir, '*.tar')))
for subpath in tqdm(subpaths):
subdir = subpath[:-len(".tar")]
subdir = subpath[: -len('.tar')]
os.makedirs(subdir, exist_ok=True)
with tarfile.open(subpath, "r:") as tar:
with tarfile.open(subpath, 'r:') as tar:
tar.extractall(path=subdir)
filelist = glob.glob(os.path.join(datadir, "**", "*.JPEG"))
filelist = glob.glob(os.path.join(datadir, '**', '*.JPEG'))
filelist = [os.path.relpath(p, start=datadir) for p in filelist]
filelist = sorted(filelist)
filelist = "\n".join(filelist)+"\n"
with open(self.txt_filelist, "w") as f:
filelist = '\n'.join(filelist) + '\n'
with open(self.txt_filelist, 'w') as f:
f.write(filelist)
tdu.mark_prepared(self.root)
class ImageNetValidation(ImageNetBase):
NAME = "ILSVRC2012_validation"
URL = "http://www.image-net.org/challenges/LSVRC/2012/"
AT_HASH = "5d6d0df7ed81efd49ca99ea4737e0ae5e3a5f2e5"
VS_URL = "https://heibox.uni-heidelberg.de/f/3e0f6e9c624e45f2bd73/?dl=1"
NAME = 'ILSVRC2012_validation'
URL = 'http://www.image-net.org/challenges/LSVRC/2012/'
AT_HASH = '5d6d0df7ed81efd49ca99ea4737e0ae5e3a5f2e5'
VS_URL = 'https://heibox.uni-heidelberg.de/f/3e0f6e9c624e45f2bd73/?dl=1'
FILES = [
"ILSVRC2012_img_val.tar",
"validation_synset.txt",
'ILSVRC2012_img_val.tar',
'validation_synset.txt',
]
SIZES = [
6744924160,
@@ -217,39 +251,49 @@ class ImageNetValidation(ImageNetBase):
if self.data_root:
self.root = os.path.join(self.data_root, self.NAME)
else:
cachedir = os.environ.get("XDG_CACHE_HOME", os.path.expanduser("~/.cache"))
self.root = os.path.join(cachedir, "autoencoders/data", self.NAME)
self.datadir = os.path.join(self.root, "data")
self.txt_filelist = os.path.join(self.root, "filelist.txt")
cachedir = os.environ.get(
'XDG_CACHE_HOME', os.path.expanduser('~/.cache')
)
self.root = os.path.join(cachedir, 'autoencoders/data', self.NAME)
self.datadir = os.path.join(self.root, 'data')
self.txt_filelist = os.path.join(self.root, 'filelist.txt')
self.expected_length = 50000
self.random_crop = retrieve(self.config, "ImageNetValidation/random_crop",
default=False)
self.random_crop = retrieve(
self.config, 'ImageNetValidation/random_crop', default=False
)
if not tdu.is_prepared(self.root):
# prep
print("Preparing dataset {} in {}".format(self.NAME, self.root))
print('Preparing dataset {} in {}'.format(self.NAME, self.root))
datadir = self.datadir
if not os.path.exists(datadir):
path = os.path.join(self.root, self.FILES[0])
if not os.path.exists(path) or not os.path.getsize(path)==self.SIZES[0]:
if (
not os.path.exists(path)
or not os.path.getsize(path) == self.SIZES[0]
):
import academictorrents as at
atpath = at.get(self.AT_HASH, datastore=self.root)
assert atpath == path
print("Extracting {} to {}".format(path, datadir))
print('Extracting {} to {}'.format(path, datadir))
os.makedirs(datadir, exist_ok=True)
with tarfile.open(path, "r:") as tar:
with tarfile.open(path, 'r:') as tar:
tar.extractall(path=datadir)
vspath = os.path.join(self.root, self.FILES[1])
if not os.path.exists(vspath) or not os.path.getsize(vspath)==self.SIZES[1]:
if (
not os.path.exists(vspath)
or not os.path.getsize(vspath) == self.SIZES[1]
):
download(self.VS_URL, vspath)
with open(vspath, "r") as f:
with open(vspath, 'r') as f:
synset_dict = f.read().splitlines()
synset_dict = dict(line.split() for line in synset_dict)
print("Reorganizing into synset folders")
print('Reorganizing into synset folders')
synsets = np.unique(list(synset_dict.values()))
for s in synsets:
os.makedirs(os.path.join(datadir, s), exist_ok=True)
@@ -258,21 +302,26 @@ class ImageNetValidation(ImageNetBase):
dst = os.path.join(datadir, v)
shutil.move(src, dst)
filelist = glob.glob(os.path.join(datadir, "**", "*.JPEG"))
filelist = glob.glob(os.path.join(datadir, '**', '*.JPEG'))
filelist = [os.path.relpath(p, start=datadir) for p in filelist]
filelist = sorted(filelist)
filelist = "\n".join(filelist)+"\n"
with open(self.txt_filelist, "w") as f:
filelist = '\n'.join(filelist) + '\n'
with open(self.txt_filelist, 'w') as f:
f.write(filelist)
tdu.mark_prepared(self.root)
class ImageNetSR(Dataset):
def __init__(self, size=None,
degradation=None, downscale_f=4, min_crop_f=0.5, max_crop_f=1.,
random_crop=True):
def __init__(
self,
size=None,
degradation=None,
downscale_f=4,
min_crop_f=0.5,
max_crop_f=1.0,
random_crop=True,
):
"""
Imagenet Superresolution Dataloader
Performs following ops in order:
@@ -296,67 +345,86 @@ class ImageNetSR(Dataset):
self.LR_size = int(size / downscale_f)
self.min_crop_f = min_crop_f
self.max_crop_f = max_crop_f
assert(max_crop_f <= 1.)
assert max_crop_f <= 1.0
self.center_crop = not random_crop
self.image_rescaler = albumentations.SmallestMaxSize(max_size=size, interpolation=cv2.INTER_AREA)
self.image_rescaler = albumentations.SmallestMaxSize(
max_size=size, interpolation=cv2.INTER_AREA
)
self.pil_interpolation = False # gets reset later if incase interp_op is from pillow
self.pil_interpolation = (
False # gets reset later if incase interp_op is from pillow
)
if degradation == "bsrgan":
self.degradation_process = partial(degradation_fn_bsr, sf=downscale_f)
if degradation == 'bsrgan':
self.degradation_process = partial(
degradation_fn_bsr, sf=downscale_f
)
elif degradation == "bsrgan_light":
self.degradation_process = partial(degradation_fn_bsr_light, sf=downscale_f)
elif degradation == 'bsrgan_light':
self.degradation_process = partial(
degradation_fn_bsr_light, sf=downscale_f
)
else:
interpolation_fn = {
"cv_nearest": cv2.INTER_NEAREST,
"cv_bilinear": cv2.INTER_LINEAR,
"cv_bicubic": cv2.INTER_CUBIC,
"cv_area": cv2.INTER_AREA,
"cv_lanczos": cv2.INTER_LANCZOS4,
"pil_nearest": PIL.Image.NEAREST,
"pil_bilinear": PIL.Image.BILINEAR,
"pil_bicubic": PIL.Image.BICUBIC,
"pil_box": PIL.Image.BOX,
"pil_hamming": PIL.Image.HAMMING,
"pil_lanczos": PIL.Image.LANCZOS,
'cv_nearest': cv2.INTER_NEAREST,
'cv_bilinear': cv2.INTER_LINEAR,
'cv_bicubic': cv2.INTER_CUBIC,
'cv_area': cv2.INTER_AREA,
'cv_lanczos': cv2.INTER_LANCZOS4,
'pil_nearest': PIL.Image.NEAREST,
'pil_bilinear': PIL.Image.BILINEAR,
'pil_bicubic': PIL.Image.BICUBIC,
'pil_box': PIL.Image.BOX,
'pil_hamming': PIL.Image.HAMMING,
'pil_lanczos': PIL.Image.LANCZOS,
}[degradation]
self.pil_interpolation = degradation.startswith("pil_")
self.pil_interpolation = degradation.startswith('pil_')
if self.pil_interpolation:
self.degradation_process = partial(TF.resize, size=self.LR_size, interpolation=interpolation_fn)
self.degradation_process = partial(
TF.resize,
size=self.LR_size,
interpolation=interpolation_fn,
)
else:
self.degradation_process = albumentations.SmallestMaxSize(max_size=self.LR_size,
interpolation=interpolation_fn)
self.degradation_process = albumentations.SmallestMaxSize(
max_size=self.LR_size, interpolation=interpolation_fn
)
def __len__(self):
return len(self.base)
def __getitem__(self, i):
example = self.base[i]
image = Image.open(example["file_path_"])
image = Image.open(example['file_path_'])
if not image.mode == "RGB":
image = image.convert("RGB")
if not image.mode == 'RGB':
image = image.convert('RGB')
image = np.array(image).astype(np.uint8)
min_side_len = min(image.shape[:2])
crop_side_len = min_side_len * np.random.uniform(self.min_crop_f, self.max_crop_f, size=None)
crop_side_len = min_side_len * np.random.uniform(
self.min_crop_f, self.max_crop_f, size=None
)
crop_side_len = int(crop_side_len)
if self.center_crop:
self.cropper = albumentations.CenterCrop(height=crop_side_len, width=crop_side_len)
self.cropper = albumentations.CenterCrop(
height=crop_side_len, width=crop_side_len
)
else:
self.cropper = albumentations.RandomCrop(height=crop_side_len, width=crop_side_len)
self.cropper = albumentations.RandomCrop(
height=crop_side_len, width=crop_side_len
)
image = self.cropper(image=image)["image"]
image = self.image_rescaler(image=image)["image"]
image = self.cropper(image=image)['image']
image = self.image_rescaler(image=image)['image']
if self.pil_interpolation:
image_pil = PIL.Image.fromarray(image)
@@ -364,10 +432,10 @@ class ImageNetSR(Dataset):
LR_image = np.array(LR_image).astype(np.uint8)
else:
LR_image = self.degradation_process(image=image)["image"]
LR_image = self.degradation_process(image=image)['image']
example["image"] = (image/127.5 - 1.0).astype(np.float32)
example["LR_image"] = (LR_image/127.5 - 1.0).astype(np.float32)
example['image'] = (image / 127.5 - 1.0).astype(np.float32)
example['LR_image'] = (LR_image / 127.5 - 1.0).astype(np.float32)
return example
@@ -377,9 +445,11 @@ class ImageNetSRTrain(ImageNetSR):
super().__init__(**kwargs)
def get_base(self):
with open("data/imagenet_train_hr_indices.p", "rb") as f:
with open('data/imagenet_train_hr_indices.p', 'rb') as f:
indices = pickle.load(f)
dset = ImageNetTrain(process_images=False,)
dset = ImageNetTrain(
process_images=False,
)
return Subset(dset, indices)
@@ -388,7 +458,9 @@ class ImageNetSRValidation(ImageNetSR):
super().__init__(**kwargs)
def get_base(self):
with open("data/imagenet_val_hr_indices.p", "rb") as f:
with open('data/imagenet_val_hr_indices.p', 'rb') as f:
indices = pickle.load(f)
dset = ImageNetValidation(process_images=False,)
dset = ImageNetValidation(
process_images=False,
)
return Subset(dset, indices)

View File

@@ -7,30 +7,33 @@ from torchvision import transforms
class LSUNBase(Dataset):
def __init__(self,
txt_file,
data_root,
size=None,
interpolation="bicubic",
flip_p=0.5
):
def __init__(
self,
txt_file,
data_root,
size=None,
interpolation='bicubic',
flip_p=0.5,
):
self.data_paths = txt_file
self.data_root = data_root
with open(self.data_paths, "r") as f:
with open(self.data_paths, 'r') as f:
self.image_paths = f.read().splitlines()
self._length = len(self.image_paths)
self.labels = {
"relative_file_path_": [l for l in self.image_paths],
"file_path_": [os.path.join(self.data_root, l)
for l in self.image_paths],
'relative_file_path_': [l for l in self.image_paths],
'file_path_': [
os.path.join(self.data_root, l) for l in self.image_paths
],
}
self.size = size
self.interpolation = {"linear": PIL.Image.LINEAR,
"bilinear": PIL.Image.BILINEAR,
"bicubic": PIL.Image.BICUBIC,
"lanczos": PIL.Image.LANCZOS,
}[interpolation]
self.interpolation = {
'linear': PIL.Image.LINEAR,
'bilinear': PIL.Image.BILINEAR,
'bicubic': PIL.Image.BICUBIC,
'lanczos': PIL.Image.LANCZOS,
}[interpolation]
self.flip = transforms.RandomHorizontalFlip(p=flip_p)
def __len__(self):
@@ -38,55 +41,86 @@ class LSUNBase(Dataset):
def __getitem__(self, i):
example = dict((k, self.labels[k][i]) for k in self.labels)
image = Image.open(example["file_path_"])
if not image.mode == "RGB":
image = image.convert("RGB")
image = Image.open(example['file_path_'])
if not image.mode == 'RGB':
image = image.convert('RGB')
# default to score-sde preprocessing
img = np.array(image).astype(np.uint8)
crop = min(img.shape[0], img.shape[1])
h, w, = img.shape[0], img.shape[1]
img = img[(h - crop) // 2:(h + crop) // 2,
(w - crop) // 2:(w + crop) // 2]
h, w, = (
img.shape[0],
img.shape[1],
)
img = img[
(h - crop) // 2 : (h + crop) // 2,
(w - crop) // 2 : (w + crop) // 2,
]
image = Image.fromarray(img)
if self.size is not None:
image = image.resize((self.size, self.size), resample=self.interpolation)
image = image.resize(
(self.size, self.size), resample=self.interpolation
)
image = self.flip(image)
image = np.array(image).astype(np.uint8)
example["image"] = (image / 127.5 - 1.0).astype(np.float32)
example['image'] = (image / 127.5 - 1.0).astype(np.float32)
return example
class LSUNChurchesTrain(LSUNBase):
def __init__(self, **kwargs):
super().__init__(txt_file="data/lsun/church_outdoor_train.txt", data_root="data/lsun/churches", **kwargs)
super().__init__(
txt_file='data/lsun/church_outdoor_train.txt',
data_root='data/lsun/churches',
**kwargs
)
class LSUNChurchesValidation(LSUNBase):
def __init__(self, flip_p=0., **kwargs):
super().__init__(txt_file="data/lsun/church_outdoor_val.txt", data_root="data/lsun/churches",
flip_p=flip_p, **kwargs)
def __init__(self, flip_p=0.0, **kwargs):
super().__init__(
txt_file='data/lsun/church_outdoor_val.txt',
data_root='data/lsun/churches',
flip_p=flip_p,
**kwargs
)
class LSUNBedroomsTrain(LSUNBase):
def __init__(self, **kwargs):
super().__init__(txt_file="data/lsun/bedrooms_train.txt", data_root="data/lsun/bedrooms", **kwargs)
super().__init__(
txt_file='data/lsun/bedrooms_train.txt',
data_root='data/lsun/bedrooms',
**kwargs
)
class LSUNBedroomsValidation(LSUNBase):
def __init__(self, flip_p=0.0, **kwargs):
super().__init__(txt_file="data/lsun/bedrooms_val.txt", data_root="data/lsun/bedrooms",
flip_p=flip_p, **kwargs)
super().__init__(
txt_file='data/lsun/bedrooms_val.txt',
data_root='data/lsun/bedrooms',
flip_p=flip_p,
**kwargs
)
class LSUNCatsTrain(LSUNBase):
def __init__(self, **kwargs):
super().__init__(txt_file="data/lsun/cat_train.txt", data_root="data/lsun/cats", **kwargs)
super().__init__(
txt_file='data/lsun/cat_train.txt',
data_root='data/lsun/cats',
**kwargs
)
class LSUNCatsValidation(LSUNBase):
def __init__(self, flip_p=0., **kwargs):
super().__init__(txt_file="data/lsun/cat_val.txt", data_root="data/lsun/cats",
flip_p=flip_p, **kwargs)
def __init__(self, flip_p=0.0, **kwargs):
super().__init__(
txt_file='data/lsun/cat_val.txt',
data_root='data/lsun/cats',
flip_p=flip_p,
**kwargs
)

202
ldm/data/personalized.py Normal file
View File

@@ -0,0 +1,202 @@
import os
import numpy as np
import PIL
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms
import random
imagenet_templates_smallest = [
'a photo of a {}',
]
imagenet_templates_small = [
'a photo of a {}',
'a rendering of a {}',
'a cropped photo of the {}',
'the photo of a {}',
'a photo of a clean {}',
'a photo of a dirty {}',
'a dark photo of the {}',
'a photo of my {}',
'a photo of the cool {}',
'a close-up photo of a {}',
'a bright photo of the {}',
'a cropped photo of a {}',
'a photo of the {}',
'a good photo of the {}',
'a photo of one {}',
'a close-up photo of the {}',
'a rendition of the {}',
'a photo of the clean {}',
'a rendition of a {}',
'a photo of a nice {}',
'a good photo of a {}',
'a photo of the nice {}',
'a photo of the small {}',
'a photo of the weird {}',
'a photo of the large {}',
'a photo of a cool {}',
'a photo of a small {}',
]
imagenet_dual_templates_small = [
'a photo of a {} with {}',
'a rendering of a {} with {}',
'a cropped photo of the {} with {}',
'the photo of a {} with {}',
'a photo of a clean {} with {}',
'a photo of a dirty {} with {}',
'a dark photo of the {} with {}',
'a photo of my {} with {}',
'a photo of the cool {} with {}',
'a close-up photo of a {} with {}',
'a bright photo of the {} with {}',
'a cropped photo of a {} with {}',
'a photo of the {} with {}',
'a good photo of the {} with {}',
'a photo of one {} with {}',
'a close-up photo of the {} with {}',
'a rendition of the {} with {}',
'a photo of the clean {} with {}',
'a rendition of a {} with {}',
'a photo of a nice {} with {}',
'a good photo of a {} with {}',
'a photo of the nice {} with {}',
'a photo of the small {} with {}',
'a photo of the weird {} with {}',
'a photo of the large {} with {}',
'a photo of a cool {} with {}',
'a photo of a small {} with {}',
]
per_img_token_list = [
'א',
'ב',
'ג',
'ד',
'ה',
'ו',
'ז',
'ח',
'ט',
'י',
'כ',
'ל',
'מ',
'נ',
'ס',
'ע',
'פ',
'צ',
'ק',
'ר',
'ש',
'ת',
]
class PersonalizedBase(Dataset):
def __init__(
self,
data_root,
size=None,
repeats=100,
interpolation='bicubic',
flip_p=0.5,
set='train',
placeholder_token='*',
per_image_tokens=False,
center_crop=False,
mixing_prob=0.25,
coarse_class_text=None,
):
self.data_root = data_root
self.image_paths = [
os.path.join(self.data_root, file_path)
for file_path in os.listdir(self.data_root)
]
# self._length = len(self.image_paths)
self.num_images = len(self.image_paths)
self._length = self.num_images
self.placeholder_token = placeholder_token
self.per_image_tokens = per_image_tokens
self.center_crop = center_crop
self.mixing_prob = mixing_prob
self.coarse_class_text = coarse_class_text
if per_image_tokens:
assert self.num_images < len(
per_img_token_list
), f"Can't use per-image tokens when the training set contains more than {len(per_img_token_list)} tokens. To enable larger sets, add more tokens to 'per_img_token_list'."
if set == 'train':
self._length = self.num_images * repeats
self.size = size
self.interpolation = {
'linear': PIL.Image.LINEAR,
'bilinear': PIL.Image.BILINEAR,
'bicubic': PIL.Image.BICUBIC,
'lanczos': PIL.Image.LANCZOS,
}[interpolation]
self.flip = transforms.RandomHorizontalFlip(p=flip_p)
def __len__(self):
return self._length
def __getitem__(self, i):
example = {}
image = Image.open(self.image_paths[i % self.num_images])
if not image.mode == 'RGB':
image = image.convert('RGB')
placeholder_string = self.placeholder_token
if self.coarse_class_text:
placeholder_string = (
f'{self.coarse_class_text} {placeholder_string}'
)
if self.per_image_tokens and np.random.uniform() < self.mixing_prob:
text = random.choice(imagenet_dual_templates_small).format(
placeholder_string, per_img_token_list[i % self.num_images]
)
else:
text = random.choice(imagenet_templates_small).format(
placeholder_string
)
example['caption'] = text
# default to score-sde preprocessing
img = np.array(image).astype(np.uint8)
if self.center_crop:
crop = min(img.shape[0], img.shape[1])
h, w, = (
img.shape[0],
img.shape[1],
)
img = img[
(h - crop) // 2 : (h + crop) // 2,
(w - crop) // 2 : (w + crop) // 2,
]
image = Image.fromarray(img)
if self.size is not None:
image = image.resize(
(self.size, self.size), resample=self.interpolation
)
image = self.flip(image)
image = np.array(image).astype(np.uint8)
example['image'] = (image / 127.5 - 1.0).astype(np.float32)
return example

View File

@@ -0,0 +1,169 @@
import os
import numpy as np
import PIL
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms
import random
imagenet_templates_small = [
'a painting in the style of {}',
'a rendering in the style of {}',
'a cropped painting in the style of {}',
'the painting in the style of {}',
'a clean painting in the style of {}',
'a dirty painting in the style of {}',
'a dark painting in the style of {}',
'a picture in the style of {}',
'a cool painting in the style of {}',
'a close-up painting in the style of {}',
'a bright painting in the style of {}',
'a cropped painting in the style of {}',
'a good painting in the style of {}',
'a close-up painting in the style of {}',
'a rendition in the style of {}',
'a nice painting in the style of {}',
'a small painting in the style of {}',
'a weird painting in the style of {}',
'a large painting in the style of {}',
]
imagenet_dual_templates_small = [
'a painting in the style of {} with {}',
'a rendering in the style of {} with {}',
'a cropped painting in the style of {} with {}',
'the painting in the style of {} with {}',
'a clean painting in the style of {} with {}',
'a dirty painting in the style of {} with {}',
'a dark painting in the style of {} with {}',
'a cool painting in the style of {} with {}',
'a close-up painting in the style of {} with {}',
'a bright painting in the style of {} with {}',
'a cropped painting in the style of {} with {}',
'a good painting in the style of {} with {}',
'a painting of one {} in the style of {}',
'a nice painting in the style of {} with {}',
'a small painting in the style of {} with {}',
'a weird painting in the style of {} with {}',
'a large painting in the style of {} with {}',
]
per_img_token_list = [
'א',
'ב',
'ג',
'ד',
'ה',
'ו',
'ז',
'ח',
'ט',
'י',
'כ',
'ל',
'מ',
'נ',
'ס',
'ע',
'פ',
'צ',
'ק',
'ר',
'ש',
'ת',
]
class PersonalizedBase(Dataset):
def __init__(
self,
data_root,
size=None,
repeats=100,
interpolation='bicubic',
flip_p=0.5,
set='train',
placeholder_token='*',
per_image_tokens=False,
center_crop=False,
):
self.data_root = data_root
self.image_paths = [
os.path.join(self.data_root, file_path)
for file_path in os.listdir(self.data_root)
]
# self._length = len(self.image_paths)
self.num_images = len(self.image_paths)
self._length = self.num_images
self.placeholder_token = placeholder_token
self.per_image_tokens = per_image_tokens
self.center_crop = center_crop
if per_image_tokens:
assert self.num_images < len(
per_img_token_list
), f"Can't use per-image tokens when the training set contains more than {len(per_img_token_list)} tokens. To enable larger sets, add more tokens to 'per_img_token_list'."
if set == 'train':
self._length = self.num_images * repeats
self.size = size
self.interpolation = {
'linear': PIL.Image.LINEAR,
'bilinear': PIL.Image.BILINEAR,
'bicubic': PIL.Image.BICUBIC,
'lanczos': PIL.Image.LANCZOS,
}[interpolation]
self.flip = transforms.RandomHorizontalFlip(p=flip_p)
def __len__(self):
return self._length
def __getitem__(self, i):
example = {}
image = Image.open(self.image_paths[i % self.num_images])
if not image.mode == 'RGB':
image = image.convert('RGB')
if self.per_image_tokens and np.random.uniform() < 0.25:
text = random.choice(imagenet_dual_templates_small).format(
self.placeholder_token, per_img_token_list[i % self.num_images]
)
else:
text = random.choice(imagenet_templates_small).format(
self.placeholder_token
)
example['caption'] = text
# default to score-sde preprocessing
img = np.array(image).astype(np.uint8)
if self.center_crop:
crop = min(img.shape[0], img.shape[1])
h, w, = (
img.shape[0],
img.shape[1],
)
img = img[
(h - crop) // 2 : (h + crop) // 2,
(w - crop) // 2 : (w + crop) // 2,
]
image = Image.fromarray(img)
if self.size is not None:
image = image.resize(
(self.size, self.size), resample=self.interpolation
)
image = self.flip(image)
image = np.array(image).astype(np.uint8)
example['image'] = (image / 127.5 - 1.0).astype(np.float32)
return example

96
ldm/dream/conditioning.py Normal file
View File

@@ -0,0 +1,96 @@
'''
This module handles the generation of the conditioning tensors, including management of
weighted subprompts.
Useful function exports:
get_uc_and_c() get the conditioned and unconditioned latent
split_weighted_subpromopts() split subprompts, normalize and weight them
log_tokenization() print out colour-coded tokens and warn if truncated
'''
import re
import torch
def get_uc_and_c(prompt, model, log_tokens=False, skip_normalize=False):
uc = model.get_learned_conditioning([''])
# get weighted sub-prompts
weighted_subprompts = split_weighted_subprompts(
prompt, skip_normalize
)
if len(weighted_subprompts) > 1:
# i dont know if this is correct.. but it works
c = torch.zeros_like(uc)
# normalize each "sub prompt" and add it
for subprompt, weight in weighted_subprompts:
log_tokenization(subprompt, model, log_tokens)
c = torch.add(
c,
model.get_learned_conditioning([subprompt]),
alpha=weight,
)
else: # just standard 1 prompt
log_tokenization(prompt, model, log_tokens)
c = model.get_learned_conditioning([prompt])
return (uc, c)
def split_weighted_subprompts(text, skip_normalize=False)->list:
"""
grabs all text up to the first occurrence of ':'
uses the grabbed text as a sub-prompt, and takes the value following ':' as weight
if ':' has no value defined, defaults to 1.0
repeats until no text remaining
"""
prompt_parser = re.compile("""
(?P<prompt> # capture group for 'prompt'
(?:\\\:|[^:])+ # match one or more non ':' characters or escaped colons '\:'
) # end 'prompt'
(?: # non-capture group
:+ # match one or more ':' characters
(?P<weight> # capture group for 'weight'
-?\d+(?:\.\d+)? # match positive or negative integer or decimal number
)? # end weight capture group, make optional
\s* # strip spaces after weight
| # OR
$ # else, if no ':' then match end of line
) # end non-capture group
""", re.VERBOSE)
parsed_prompts = [(match.group("prompt").replace("\\:", ":"), float(
match.group("weight") or 1)) for match in re.finditer(prompt_parser, text)]
if skip_normalize:
return parsed_prompts
weight_sum = sum(map(lambda x: x[1], parsed_prompts))
if weight_sum == 0:
print(
"Warning: Subprompt weights add up to zero. Discarding and using even weights instead.")
equal_weight = 1 / len(parsed_prompts)
return [(x[0], equal_weight) for x in parsed_prompts]
return [(x[0], x[1] / weight_sum) for x in parsed_prompts]
# shows how the prompt is tokenized
# usually tokens have '</w>' to indicate end-of-word,
# but for readability it has been replaced with ' '
def log_tokenization(text, model, log=False):
if not log:
return
tokens = model.cond_stage_model.tokenizer._tokenize(text)
tokenized = ""
discarded = ""
usedTokens = 0
totalTokens = len(tokens)
for i in range(0, totalTokens):
token = tokens[i].replace('</w>', ' ')
# alternate color
s = (usedTokens % 6) + 1
if i < model.cond_stage_model.max_length:
tokenized = tokenized + f"\x1b[0;3{s};40m{token}"
usedTokens += 1
else: # over max token length
discarded = discarded + f"\x1b[0;3{s};40m{token}"
print(f"\n>> Tokens ({usedTokens}):\n{tokenized}\x1b[0m")
if discarded != "":
print(
f">> Tokens Discarded ({totalTokens-usedTokens}):\n{discarded}\x1b[0m"
)

20
ldm/dream/devices.py Normal file
View File

@@ -0,0 +1,20 @@
import torch
from torch import autocast
from contextlib import contextmanager, nullcontext
def choose_torch_device() -> str:
'''Convenience routine for guessing which GPU device to run model on'''
if torch.cuda.is_available():
return 'cuda'
if hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
return 'mps'
return 'cpu'
def choose_autocast_device(device):
'''Returns an autocast compatible device from a torch device'''
device_type = device.type # this returns 'mps' on M1
# autocast only supports cuda or cpu
if device_type in ('cuda','cpu'):
return device_type,autocast
else:
return 'cpu',nullcontext

View File

@@ -0,0 +1,4 @@
'''
Initialization file for the ldm.dream.generator package
'''
from .base import Generator

158
ldm/dream/generator/base.py Normal file
View File

@@ -0,0 +1,158 @@
'''
Base class for ldm.dream.generator.*
including img2img, txt2img, and inpaint
'''
import torch
import numpy as np
import random
from tqdm import tqdm, trange
from PIL import Image
from einops import rearrange, repeat
from pytorch_lightning import seed_everything
from ldm.dream.devices import choose_autocast_device
downsampling = 8
class Generator():
def __init__(self,model):
self.model = model
self.seed = None
self.latent_channels = model.channels
self.downsampling_factor = downsampling # BUG: should come from model or config
self.variation_amount = 0
self.with_variations = []
# this is going to be overridden in img2img.py, txt2img.py and inpaint.py
def get_make_image(self,prompt,**kwargs):
"""
Returns a function returning an image derived from the prompt and the initial image
Return value depends on the seed at the time you call it
"""
raise NotImplementedError("image_iterator() must be implemented in a descendent class")
def set_variation(self, seed, variation_amount, with_variations):
self.seed = seed
self.variation_amount = variation_amount
self.with_variations = with_variations
def generate(self,prompt,init_image,width,height,iterations=1,seed=None,
image_callback=None, step_callback=None,
**kwargs):
device_type,scope = choose_autocast_device(self.model.device)
make_image = self.get_make_image(
prompt,
init_image = init_image,
width = width,
height = height,
step_callback = step_callback,
**kwargs
)
results = []
seed = seed if seed else self.new_seed()
seed, initial_noise = self.generate_initial_noise(seed, width, height)
with scope(device_type), self.model.ema_scope():
for n in trange(iterations, desc='Generating'):
x_T = None
if self.variation_amount > 0:
seed_everything(seed)
target_noise = self.get_noise(width,height)
x_T = self.slerp(self.variation_amount, initial_noise, target_noise)
elif initial_noise is not None:
# i.e. we specified particular variations
x_T = initial_noise
else:
seed_everything(seed)
if self.model.device.type == 'mps':
x_T = self.get_noise(width,height)
# make_image will do the equivalent of get_noise itself
image = make_image(x_T)
results.append([image, seed])
if image_callback is not None:
image_callback(image, seed)
seed = self.new_seed()
return results
def sample_to_image(self,samples):
"""
Returns a function returning an image derived from the prompt and the initial image
Return value depends on the seed at the time you call it
"""
x_samples = self.model.decode_first_stage(samples)
x_samples = torch.clamp((x_samples + 1.0) / 2.0, min=0.0, max=1.0)
if len(x_samples) != 1:
raise Exception(
f'>> expected to get a single image, but got {len(x_samples)}')
x_sample = 255.0 * rearrange(
x_samples[0].cpu().numpy(), 'c h w -> h w c'
)
return Image.fromarray(x_sample.astype(np.uint8))
def generate_initial_noise(self, seed, width, height):
initial_noise = None
if self.variation_amount > 0 or len(self.with_variations) > 0:
# use fixed initial noise plus random noise per iteration
seed_everything(seed)
initial_noise = self.get_noise(width,height)
for v_seed, v_weight in self.with_variations:
seed = v_seed
seed_everything(seed)
next_noise = self.get_noise(width,height)
initial_noise = self.slerp(v_weight, initial_noise, next_noise)
if self.variation_amount > 0:
random.seed() # reset RNG to an actually random state, so we can get a random seed for variations
seed = random.randrange(0,np.iinfo(np.uint32).max)
return (seed, initial_noise)
else:
return (seed, None)
# returns a tensor filled with random numbers from a normal distribution
def get_noise(self,width,height):
"""
Returns a tensor filled with random numbers, either form a normal distribution
(txt2img) or from the latent image (img2img, inpaint)
"""
raise NotImplementedError("get_noise() must be implemented in a descendent class")
def new_seed(self):
self.seed = random.randrange(0, np.iinfo(np.uint32).max)
return self.seed
def slerp(self, t, v0, v1, DOT_THRESHOLD=0.9995):
'''
Spherical linear interpolation
Args:
t (float/np.ndarray): Float value between 0.0 and 1.0
v0 (np.ndarray): Starting vector
v1 (np.ndarray): Final vector
DOT_THRESHOLD (float): Threshold for considering the two vectors as
colineal. Not recommended to alter this.
Returns:
v2 (np.ndarray): Interpolation vector between v0 and v1
'''
inputs_are_torch = False
if not isinstance(v0, np.ndarray):
inputs_are_torch = True
v0 = v0.detach().cpu().numpy()
if not isinstance(v1, np.ndarray):
inputs_are_torch = True
v1 = v1.detach().cpu().numpy()
dot = np.sum(v0 * v1 / (np.linalg.norm(v0) * np.linalg.norm(v1)))
if np.abs(dot) > DOT_THRESHOLD:
v2 = (1 - t) * v0 + t * v1
else:
theta_0 = np.arccos(dot)
sin_theta_0 = np.sin(theta_0)
theta_t = theta_0 * t
sin_theta_t = np.sin(theta_t)
s0 = np.sin(theta_0 - theta_t) / sin_theta_0
s1 = sin_theta_t / sin_theta_0
v2 = s0 * v0 + s1 * v1
if inputs_are_torch:
v2 = torch.from_numpy(v2).to(self.model.device)
return v2

View File

@@ -0,0 +1,72 @@
'''
ldm.dream.generator.txt2img descends from ldm.dream.generator
'''
import torch
import numpy as np
from ldm.dream.devices import choose_autocast_device
from ldm.dream.generator.base import Generator
from ldm.models.diffusion.ddim import DDIMSampler
class Img2Img(Generator):
def __init__(self,model):
super().__init__(model)
self.init_latent = None # by get_noise()
@torch.no_grad()
def get_make_image(self,prompt,sampler,steps,cfg_scale,ddim_eta,
conditioning,init_image,strength,step_callback=None,**kwargs):
"""
Returns a function returning an image derived from the prompt and the initial image
Return value depends on the seed at the time you call it.
"""
# PLMS sampler not supported yet, so ignore previous sampler
if not isinstance(sampler,DDIMSampler):
print(
f">> sampler '{sampler.__class__.__name__}' is not yet supported. Using DDIM sampler"
)
sampler = DDIMSampler(self.model, device=self.model.device)
sampler.make_schedule(
ddim_num_steps=steps, ddim_eta=ddim_eta, verbose=False
)
device_type,scope = choose_autocast_device(self.model.device)
with scope(device_type):
self.init_latent = self.model.get_first_stage_encoding(
self.model.encode_first_stage(init_image)
) # move to latent space
t_enc = int(strength * steps)
uc, c = conditioning
@torch.no_grad()
def make_image(x_T):
# encode (scaled latent)
z_enc = sampler.stochastic_encode(
self.init_latent,
torch.tensor([t_enc]).to(self.model.device),
noise=x_T
)
# decode it
samples = sampler.decode(
z_enc,
c,
t_enc,
img_callback = step_callback,
unconditional_guidance_scale=cfg_scale,
unconditional_conditioning=uc,
)
return self.sample_to_image(samples)
return make_image
def get_noise(self,width,height):
device = self.model.device
init_latent = self.init_latent
assert init_latent is not None,'call to get_noise() when init_latent not set'
if device.type == 'mps':
return torch.randn_like(init_latent, device='cpu').to(device)
else:
return torch.randn_like(init_latent, device=device)

View File

@@ -0,0 +1,77 @@
'''
ldm.dream.generator.inpaint descends from ldm.dream.generator
'''
import torch
import numpy as np
from einops import rearrange, repeat
from ldm.dream.devices import choose_autocast_device
from ldm.dream.generator.img2img import Img2Img
from ldm.models.diffusion.ddim import DDIMSampler
class Inpaint(Img2Img):
def __init__(self,model):
self.init_latent = None
super().__init__(model)
@torch.no_grad()
def get_make_image(self,prompt,sampler,steps,cfg_scale,ddim_eta,
conditioning,init_image,mask_image,strength,
step_callback=None,**kwargs):
"""
Returns a function returning an image derived from the prompt and
the initial image + mask. Return value depends on the seed at
the time you call it. kwargs are 'init_latent' and 'strength'
"""
mask_image = mask_image[0][0].unsqueeze(0).repeat(4,1,1).unsqueeze(0)
mask_image = repeat(mask_image, '1 ... -> b ...', b=1)
# PLMS sampler not supported yet, so ignore previous sampler
if not isinstance(sampler,DDIMSampler):
print(
f">> sampler '{sampler.__class__.__name__}' is not yet supported. Using DDIM sampler"
)
sampler = DDIMSampler(self.model, device=self.model.device)
sampler.make_schedule(
ddim_num_steps=steps, ddim_eta=ddim_eta, verbose=False
)
device_type,scope = choose_autocast_device(self.model.device)
with scope(device_type):
self.init_latent = self.model.get_first_stage_encoding(
self.model.encode_first_stage(init_image)
) # move to latent space
t_enc = int(strength * steps)
uc, c = conditioning
print(f">> target t_enc is {t_enc} steps")
@torch.no_grad()
def make_image(x_T):
# encode (scaled latent)
z_enc = sampler.stochastic_encode(
self.init_latent,
torch.tensor([t_enc]).to(self.model.device),
noise=x_T
)
# decode it
samples = sampler.decode(
z_enc,
c,
t_enc,
img_callback = step_callback,
unconditional_guidance_scale = cfg_scale,
unconditional_conditioning = uc,
mask = mask_image,
init_latent = self.init_latent
)
return self.sample_to_image(samples)
return make_image

View File

@@ -0,0 +1,61 @@
'''
ldm.dream.generator.txt2img inherits from ldm.dream.generator
'''
import torch
import numpy as np
from ldm.dream.generator.base import Generator
class Txt2Img(Generator):
def __init__(self,model):
super().__init__(model)
@torch.no_grad()
def get_make_image(self,prompt,sampler,steps,cfg_scale,ddim_eta,
conditioning,width,height,step_callback=None,**kwargs):
"""
Returns a function returning an image derived from the prompt and the initial image
Return value depends on the seed at the time you call it
kwargs are 'width' and 'height'
"""
uc, c = conditioning
@torch.no_grad()
def make_image(x_T):
shape = [
self.latent_channels,
height // self.downsampling_factor,
width // self.downsampling_factor,
]
samples, _ = sampler.sample(
batch_size = 1,
S = steps,
x_T = x_T,
conditioning = c,
shape = shape,
verbose = False,
unconditional_guidance_scale = cfg_scale,
unconditional_conditioning = uc,
eta = ddim_eta,
img_callback = step_callback
)
return self.sample_to_image(samples)
return make_image
# returns a tensor filled with random numbers from a normal distribution
def get_noise(self,width,height):
device = self.model.device
if device.type == 'mps':
return torch.randn([1,
self.latent_channels,
height // self.downsampling_factor,
width // self.downsampling_factor],
device='cpu').to(device)
else:
return torch.randn([1,
self.latent_channels,
height // self.downsampling_factor,
width // self.downsampling_factor],
device=device)

70
ldm/dream/image_util.py Normal file
View File

@@ -0,0 +1,70 @@
from math import sqrt, floor, ceil
from PIL import Image
class InitImageResizer():
"""Simple class to create resized copies of an Image while preserving the aspect ratio."""
def __init__(self,Image):
self.image = Image
def resize(self,width=None,height=None) -> Image:
"""
Return a copy of the image resized to fit within
a box width x height. The aspect ratio is
maintained. If neither width nor height are provided,
then returns a copy of the original image. If one or the other is
provided, then the other will be calculated from the
aspect ratio.
Everything is floored to the nearest multiple of 64 so
that it can be passed to img2img()
"""
im = self.image
ar = im.width/float(im.height)
# Infer missing values from aspect ratio
if not(width or height): # both missing
width = im.width
height = im.height
elif not height: # height missing
height = int(width/ar)
elif not width: # width missing
width = int(height*ar)
# rw and rh are the resizing width and height for the image
# they maintain the aspect ratio, but may not completelyl fill up
# the requested destination size
(rw,rh) = (width,int(width/ar)) if im.width>=im.height else (int(height*ar),height)
#round everything to multiples of 64
width,height,rw,rh = map(
lambda x: x-x%64, (width,height,rw,rh)
)
# no resize necessary, but return a copy
if im.width == width and im.height == height:
return im.copy()
# otherwise resize the original image so that it fits inside the bounding box
resized_image = self.image.resize((rw,rh),resample=Image.Resampling.LANCZOS)
return resized_image
def make_grid(image_list, rows=None, cols=None):
image_cnt = len(image_list)
if None in (rows, cols):
rows = floor(sqrt(image_cnt)) # try to make it square
cols = ceil(image_cnt / rows)
width = image_list[0].width
height = image_list[0].height
grid_img = Image.new('RGB', (width * cols, height * rows))
i = 0
for r in range(0, rows):
for c in range(0, cols):
if i >= len(image_list):
break
grid_img.paste(image_list[i], (c * width, r * height))
i = i + 1
return grid_img

81
ldm/dream/pngwriter.py Normal file
View File

@@ -0,0 +1,81 @@
"""
Two helper classes for dealing with PNG images and their path names.
PngWriter -- Converts Images generated by T2I into PNGs, finds
appropriate names for them, and writes prompt metadata
into the PNG.
PromptFormatter -- Utility for converting a Namespace of prompt parameters
back into a formatted prompt string with command-line switches.
"""
import os
import re
from PIL import PngImagePlugin
# -------------------image generation utils-----
class PngWriter:
def __init__(self, outdir):
self.outdir = outdir
os.makedirs(outdir, exist_ok=True)
# gives the next unique prefix in outdir
def unique_prefix(self):
# sort reverse alphabetically until we find max+1
dirlist = sorted(os.listdir(self.outdir), reverse=True)
# find the first filename that matches our pattern or return 000000.0.png
existing_name = next(
(f for f in dirlist if re.match('^(\d+)\..*\.png', f)),
'0000000.0.png',
)
basecount = int(existing_name.split('.', 1)[0]) + 1
return f'{basecount:06}'
# saves image named _image_ to outdir/name, writing metadata from prompt
# returns full path of output
def save_image_and_prompt_to_png(self, image, prompt, name):
path = os.path.join(self.outdir, name)
info = PngImagePlugin.PngInfo()
info.add_text('Dream', prompt)
image.save(path, 'PNG', pnginfo=info)
return path
class PromptFormatter:
def __init__(self, t2i, opt):
self.t2i = t2i
self.opt = opt
# note: the t2i object should provide all these values.
# there should be no need to or against opt values
def normalize_prompt(self):
"""Normalize the prompt and switches"""
t2i = self.t2i
opt = self.opt
switches = list()
switches.append(f'"{opt.prompt}"')
switches.append(f'-s{opt.steps or t2i.steps}')
switches.append(f'-W{opt.width or t2i.width}')
switches.append(f'-H{opt.height or t2i.height}')
switches.append(f'-C{opt.cfg_scale or t2i.cfg_scale}')
switches.append(f'-A{opt.sampler_name or t2i.sampler_name}')
# to do: put model name into the t2i object
# switches.append(f'--model{t2i.model_name}')
if opt.seamless or t2i.seamless:
switches.append(f'--seamless')
if opt.init_img:
switches.append(f'-I{opt.init_img}')
if opt.fit:
switches.append(f'--fit')
if opt.strength and opt.init_img is not None:
switches.append(f'-f{opt.strength or t2i.strength}')
if opt.gfpgan_strength:
switches.append(f'-G{opt.gfpgan_strength}')
if opt.upscale:
switches.append(f'-U {" ".join([str(u) for u in opt.upscale])}')
if opt.variation_amount > 0:
switches.append(f'-v{opt.variation_amount}')
if opt.with_variations:
formatted_variations = ','.join(f'{seed}:{weight}' for seed, weight in opt.with_variations)
switches.append(f'-V{formatted_variations}')
return ' '.join(switches)

127
ldm/dream/readline.py Normal file
View File

@@ -0,0 +1,127 @@
"""
Readline helper functions for dream.py (linux and mac only).
"""
import os
import re
import atexit
# ---------------readline utilities---------------------
try:
import readline
readline_available = True
except:
readline_available = False
class Completer:
def __init__(self, options):
self.options = sorted(options)
return
def complete(self, text, state):
buffer = readline.get_line_buffer()
if text.startswith(('-I', '--init_img','-M','--init_mask')):
return self._path_completions(text, state, ('.png','.jpg','.jpeg'))
if buffer.strip().endswith('cd') or text.startswith(('.', '/')):
return self._path_completions(text, state, ())
response = None
if state == 0:
# This is the first time for this text, so build a match list.
if text:
self.matches = [
s for s in self.options if s and s.startswith(text)
]
else:
self.matches = self.options[:]
# Return the state'th item from the match list,
# if we have that many.
try:
response = self.matches[state]
except IndexError:
response = None
return response
def _path_completions(self, text, state, extensions):
# get the path so far
# TODO: replace this mess with a regular expression match
if text.startswith('-I'):
path = text.replace('-I', '', 1).lstrip()
elif text.startswith('--init_img='):
path = text.replace('--init_img=', '', 1).lstrip()
elif text.startswith('--init_mask='):
path = text.replace('--init_mask=', '', 1).lstrip()
elif text.startswith('-M'):
path = text.replace('-M', '', 1).lstrip()
else:
path = text
matches = list()
path = os.path.expanduser(path)
if len(path) == 0:
matches.append(text + './')
else:
dir = os.path.dirname(path)
dir_list = os.listdir(dir)
for n in dir_list:
if n.startswith('.') and len(n) > 1:
continue
full_path = os.path.join(dir, n)
if full_path.startswith(path):
if os.path.isdir(full_path):
matches.append(
os.path.join(os.path.dirname(text), n) + '/'
)
elif n.endswith(extensions):
matches.append(os.path.join(os.path.dirname(text), n))
try:
response = matches[state]
except IndexError:
response = None
return response
if readline_available:
readline.set_completer(
Completer(
[
'--steps','-s',
'--seed','-S',
'--iterations','-n',
'--width','-W','--height','-H',
'--cfg_scale','-C',
'--grid','-g',
'--individual','-i',
'--init_img','-I',
'--init_mask','-M',
'--strength','-f',
'--variants','-v',
'--outdir','-o',
'--sampler','-A','-m',
'--embedding_path',
'--device',
'--grid','-g',
'--gfpgan_strength','-G',
'--upscale','-U',
'-save_orig','--save_original',
'--skip_normalize','-x',
'--log_tokenization','t',
]
).complete
)
readline.set_completer_delims(' ')
readline.parse_and_bind('tab: complete')
histfile = os.path.join(os.path.expanduser('~'), '.dream_history')
try:
readline.read_history_file(histfile)
readline.set_history_length(1000)
except FileNotFoundError:
pass
atexit.register(readline.write_history_file, histfile)

243
ldm/dream/server.py Normal file
View File

@@ -0,0 +1,243 @@
import argparse
import json
import base64
import mimetypes
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from ldm.dream.pngwriter import PngWriter, PromptFormatter
from threading import Event
def build_opt(post_data, seed, gfpgan_model_exists):
opt = argparse.Namespace()
setattr(opt, 'prompt', post_data['prompt'])
setattr(opt, 'init_img', post_data['initimg'])
setattr(opt, 'strength', float(post_data['strength']))
setattr(opt, 'iterations', int(post_data['iterations']))
setattr(opt, 'steps', int(post_data['steps']))
setattr(opt, 'width', int(post_data['width']))
setattr(opt, 'height', int(post_data['height']))
setattr(opt, 'seamless', 'seamless' in post_data)
setattr(opt, 'fit', 'fit' in post_data)
setattr(opt, 'mask', 'mask' in post_data)
setattr(opt, 'invert_mask', 'invert_mask' in post_data)
setattr(opt, 'cfg_scale', float(post_data['cfg_scale']))
setattr(opt, 'sampler_name', post_data['sampler_name'])
setattr(opt, 'gfpgan_strength', float(post_data['gfpgan_strength']) if gfpgan_model_exists else 0)
setattr(opt, 'upscale', [int(post_data['upscale_level']), float(post_data['upscale_strength'])] if post_data['upscale_level'] != '' else None)
setattr(opt, 'progress_images', 'progress_images' in post_data)
setattr(opt, 'seed', None if int(post_data['seed']) == -1 else int(post_data['seed']))
setattr(opt, 'variation_amount', float(post_data['variation_amount']) if int(post_data['seed']) != -1 else 0)
setattr(opt, 'with_variations', [])
broken = False
if int(post_data['seed']) != -1 and post_data['with_variations'] != '':
for part in post_data['with_variations'].split(','):
seed_and_weight = part.split(':')
if len(seed_and_weight) != 2:
print(f'could not parse with_variation part "{part}"')
broken = True
break
try:
seed = int(seed_and_weight[0])
weight = float(seed_and_weight[1])
except ValueError:
print(f'could not parse with_variation part "{part}"')
broken = True
break
opt.with_variations.append([seed, weight])
if broken:
raise CanceledException
if len(opt.with_variations) == 0:
opt.with_variations = None
return opt
class CanceledException(Exception):
pass
class DreamServer(BaseHTTPRequestHandler):
model = None
outdir = None
canceled = Event()
def do_GET(self):
if self.path == "/":
self.send_response(200)
self.send_header("Content-type", "text/html")
self.end_headers()
with open("./static/dream_web/index.html", "rb") as content:
self.wfile.write(content.read())
elif self.path == "/config.js":
# unfortunately this import can't be at the top level, since that would cause a circular import
from ldm.gfpgan.gfpgan_tools import gfpgan_model_exists
self.send_response(200)
self.send_header("Content-type", "application/javascript")
self.end_headers()
config = {
'gfpgan_model_exists': gfpgan_model_exists
}
self.wfile.write(bytes("let config = " + json.dumps(config) + ";\n", "utf-8"))
elif self.path == "/run_log.json":
self.send_response(200)
self.send_header("Content-type", "application/json")
self.end_headers()
output = []
log_file = os.path.join(self.outdir, "dream_web_log.txt")
if os.path.exists(log_file):
with open(log_file, "r") as log:
for line in log:
url, config = line.split(": {", maxsplit=1)
config = json.loads("{" + config)
config["url"] = url.lstrip(".")
if os.path.exists(url):
output.append(config)
self.wfile.write(bytes(json.dumps({"run_log": output}), "utf-8"))
elif self.path == "/cancel":
self.canceled.set()
self.send_response(200)
self.send_header("Content-type", "application/json")
self.end_headers()
self.wfile.write(bytes('{}', 'utf8'))
else:
path = "." + self.path
cwd = os.path.realpath(os.getcwd())
is_in_cwd = os.path.commonprefix((os.path.realpath(path), cwd)) == cwd
if not (is_in_cwd and os.path.exists(path)):
self.send_response(404)
return
mime_type = mimetypes.guess_type(path)[0]
if mime_type is not None:
self.send_response(200)
self.send_header("Content-type", mime_type)
self.end_headers()
with open("." + self.path, "rb") as content:
self.wfile.write(content.read())
else:
self.send_response(404)
def do_POST(self):
self.send_response(200)
self.send_header("Content-type", "application/json")
self.end_headers()
# unfortunately this import can't be at the top level, since that would cause a circular import
from ldm.gfpgan.gfpgan_tools import gfpgan_model_exists
content_length = int(self.headers['Content-Length'])
post_data = json.loads(self.rfile.read(content_length))
opt = build_opt(post_data, self.model.seed, gfpgan_model_exists)
self.canceled.clear()
print(f">> Request to generate with prompt: {opt.prompt}")
# In order to handle upscaled images, the PngWriter needs to maintain state
# across images generated by each call to prompt2img(), so we define it in
# the outer scope of image_done()
config = post_data.copy() # Shallow copy
config['initimg'] = config.pop('initimg_name', '')
images_generated = 0 # helps keep track of when upscaling is started
images_upscaled = 0 # helps keep track of when upscaling is completed
pngwriter = PngWriter(self.outdir)
prefix = pngwriter.unique_prefix()
# if upscaling is requested, then this will be called twice, once when
# the images are first generated, and then again when after upscaling
# is complete. The upscaling replaces the original file, so the second
# entry should not be inserted into the image list.
def image_done(image, seed, upscaled=False):
name = f'{prefix}.{seed}.png'
iter_opt = argparse.Namespace(**vars(opt)) # copy
if opt.variation_amount > 0:
this_variation = [[seed, opt.variation_amount]]
if opt.with_variations is None:
iter_opt.with_variations = this_variation
else:
iter_opt.with_variations = opt.with_variations + this_variation
iter_opt.variation_amount = 0
elif opt.with_variations is None:
iter_opt.seed = seed
normalized_prompt = PromptFormatter(self.model, iter_opt).normalize_prompt()
path = pngwriter.save_image_and_prompt_to_png(image, f'{normalized_prompt} -S{iter_opt.seed}', name)
if int(config['seed']) == -1:
config['seed'] = seed
# Append post_data to log, but only once!
if not upscaled:
with open(os.path.join(self.outdir, "dream_web_log.txt"), "a") as log:
log.write(f"{path}: {json.dumps(config)}\n")
self.wfile.write(bytes(json.dumps(
{'event': 'result', 'url': path, 'seed': seed, 'config': config}
) + '\n',"utf-8"))
# control state of the "postprocessing..." message
upscaling_requested = opt.upscale or opt.gfpgan_strength > 0
nonlocal images_generated # NB: Is this bad python style? It is typical usage in a perl closure.
nonlocal images_upscaled # NB: Is this bad python style? It is typical usage in a perl closure.
if upscaled:
images_upscaled += 1
else:
images_generated += 1
if upscaling_requested:
action = None
if images_generated >= opt.iterations:
if images_upscaled < opt.iterations:
action = 'upscaling-started'
else:
action = 'upscaling-done'
if action:
x = images_upscaled + 1
self.wfile.write(bytes(json.dumps(
{'event': action, 'processed_file_cnt': f'{x}/{opt.iterations}'}
) + '\n',"utf-8"))
step_writer = PngWriter(os.path.join(self.outdir, "intermediates"))
step_index = 1
def image_progress(sample, step):
if self.canceled.is_set():
self.wfile.write(bytes(json.dumps({'event':'canceled'}) + '\n', 'utf-8'))
raise CanceledException
path = None
# since rendering images is moderately expensive, only render every 5th image
# and don't bother with the last one, since it'll render anyway
nonlocal step_index
if opt.progress_images and step % 5 == 0 and step < opt.steps - 1:
image = self.model.sample_to_image(sample)
name = f'{prefix}.{opt.seed}.{step_index}.png'
metadata = f'{opt.prompt} -S{opt.seed} [intermediate]'
path = step_writer.save_image_and_prompt_to_png(image, metadata, name)
step_index += 1
self.wfile.write(bytes(json.dumps(
{'event': 'step', 'step': step + 1, 'url': path}
) + '\n',"utf-8"))
try:
if opt.init_img is None:
# Run txt2img
self.model.prompt2image(**vars(opt), step_callback=image_progress, image_callback=image_done)
else:
# Decode initimg as base64 to temp file
with open("./img2img-tmp.png", "wb") as f:
initimg = opt.init_img.split(",")[1] # Ignore mime type
f.write(base64.b64decode(initimg))
opt1 = argparse.Namespace(**vars(opt))
opt1.init_img = "./img2img-tmp.png"
try:
# Run img2img
self.model.prompt2image(**vars(opt1), step_callback=image_progress, image_callback=image_done)
finally:
# Remove the temp file
os.remove("./img2img-tmp.png")
except CanceledException:
print(f"Canceled.")
return
class ThreadingDreamServer(ThreadingHTTPServer):
def __init__(self, server_address):
super(ThreadingDreamServer, self).__init__(server_address, DreamServer)

695
ldm/generate.py Normal file
View File

@@ -0,0 +1,695 @@
# Copyright (c) 2022 Lincoln D. Stein (https://github.com/lstein)
# Derived from source code carrying the following copyrights
# Copyright (c) 2022 Machine Vision and Learning Group, LMU Munich
# Copyright (c) 2022 Robin Rombach and Patrick Esser and contributors
import torch
import numpy as np
import random
import os
import time
import re
import sys
import traceback
import transformers
from omegaconf import OmegaConf
from PIL import Image, ImageOps
from torch import nn
from pytorch_lightning import seed_everything
from ldm.util import instantiate_from_config
from ldm.models.diffusion.ddim import DDIMSampler
from ldm.models.diffusion.plms import PLMSSampler
from ldm.models.diffusion.ksampler import KSampler
from ldm.dream.pngwriter import PngWriter
from ldm.dream.image_util import InitImageResizer
from ldm.dream.devices import choose_torch_device
from ldm.dream.conditioning import get_uc_and_c
"""Simplified text to image API for stable diffusion/latent diffusion
Example Usage:
from ldm.generate import Generate
# Create an object with default values
gr = Generate()
# do the slow model initialization
gr.load_model()
# Do the fast inference & image generation. Any options passed here
# override the default values assigned during class initialization
# Will call load_model() if the model was not previously loaded and so
# may be slow at first.
# The method returns a list of images. Each row of the list is a sub-list of [filename,seed]
results = gr.prompt2png(prompt = "an astronaut riding a horse",
outdir = "./outputs/samples",
iterations = 3)
for row in results:
print(f'filename={row[0]}')
print(f'seed ={row[1]}')
# Same thing, but using an initial image.
results = gr.prompt2png(prompt = "an astronaut riding a horse",
outdir = "./outputs/,
iterations = 3,
init_img = "./sketches/horse+rider.png")
for row in results:
print(f'filename={row[0]}')
print(f'seed ={row[1]}')
# Same thing, but we return a series of Image objects, which lets you manipulate them,
# combine them, and save them under arbitrary names
results = gr.prompt2image(prompt = "an astronaut riding a horse"
outdir = "./outputs/")
for row in results:
im = row[0]
seed = row[1]
im.save(f'./outputs/samples/an_astronaut_riding_a_horse-{seed}.png')
im.thumbnail(100,100).save('./outputs/samples/astronaut_thumb.jpg')
Note that the old txt2img() and img2img() calls are deprecated but will
still work.
The full list of arguments to Generate() are:
gr = Generate(
weights = path to model weights ('models/ldm/stable-diffusion-v1/model.ckpt')
config = path to model configuraiton ('configs/stable-diffusion/v1-inference.yaml')
iterations = <integer> // how many times to run the sampling (1)
steps = <integer> // 50
seed = <integer> // current system time
sampler_name= ['ddim', 'k_dpm_2_a', 'k_dpm_2', 'k_euler_a', 'k_euler', 'k_heun', 'k_lms', 'plms'] // k_lms
grid = <boolean> // false
width = <integer> // image width, multiple of 64 (512)
height = <integer> // image height, multiple of 64 (512)
cfg_scale = <float> // condition-free guidance scale (7.5)
)
"""
class Generate:
"""Generate class
Stores default values for multiple configuration items
"""
def __init__(
self,
iterations = 1,
steps = 50,
cfg_scale = 7.5,
weights = 'models/ldm/stable-diffusion-v1/model.ckpt',
config = 'configs/stable-diffusion/v1-inference.yaml',
grid = False,
width = 512,
height = 512,
sampler_name = 'k_lms',
ddim_eta = 0.0, # deterministic
precision = 'autocast',
full_precision = False,
strength = 0.75, # default in scripts/img2img.py
seamless = False,
embedding_path = None,
device_type = 'cuda',
ignore_ctrl_c = False,
):
self.iterations = iterations
self.width = width
self.height = height
self.steps = steps
self.cfg_scale = cfg_scale
self.weights = weights
self.config = config
self.sampler_name = sampler_name
self.grid = grid
self.ddim_eta = ddim_eta
self.precision = precision
self.full_precision = True if choose_torch_device() == 'mps' else full_precision
self.strength = strength
self.seamless = seamless
self.embedding_path = embedding_path
self.device_type = device_type
self.ignore_ctrl_c = ignore_ctrl_c # note, this logic probably doesn't belong here...
self.model = None # empty for now
self.sampler = None
self.device = None
self.generators = {}
self.base_generator = None
self.seed = None
if device_type == 'cuda' and not torch.cuda.is_available():
device_type = choose_torch_device()
print(">> cuda not available, using device", device_type)
self.device = torch.device(device_type)
# for VRAM usage statistics
device_type = choose_torch_device()
self.session_peakmem = torch.cuda.max_memory_allocated() if device_type == 'cuda' else None
transformers.logging.set_verbosity_error()
def prompt2png(self, prompt, outdir, **kwargs):
"""
Takes a prompt and an output directory, writes out the requested number
of PNG files, and returns an array of [[filename,seed],[filename,seed]...]
Optional named arguments are the same as those passed to Generate and prompt2image()
"""
results = self.prompt2image(prompt, **kwargs)
pngwriter = PngWriter(outdir)
prefix = pngwriter.unique_prefix()
outputs = []
for image, seed in results:
name = f'{prefix}.{seed}.png'
path = pngwriter.save_image_and_prompt_to_png(
image, f'{prompt} -S{seed}', name)
outputs.append([path, seed])
return outputs
def txt2img(self, prompt, **kwargs):
outdir = kwargs.pop('outdir', 'outputs/img-samples')
return self.prompt2png(prompt, outdir, **kwargs)
def img2img(self, prompt, **kwargs):
outdir = kwargs.pop('outdir', 'outputs/img-samples')
assert (
'init_img' in kwargs
), 'call to img2img() must include the init_img argument'
return self.prompt2png(prompt, outdir, **kwargs)
def prompt2image(
self,
# these are common
prompt,
iterations = None,
steps = None,
seed = None,
cfg_scale = None,
ddim_eta = None,
skip_normalize = False,
image_callback = None,
step_callback = None,
width = None,
height = None,
sampler_name = None,
seamless = False,
log_tokenization= False,
with_variations = None,
variation_amount = 0.0,
# these are specific to img2img and inpaint
init_img = None,
init_mask = None,
fit = False,
strength = None,
# these are specific to GFPGAN/ESRGAN
gfpgan_strength= 0,
save_original = False,
upscale = None,
**args,
): # eat up additional cruft
"""
ldm.generate.prompt2image() is the common entry point for txt2img() and img2img()
It takes the following arguments:
prompt // prompt string (no default)
iterations // iterations (1); image count=iterations
steps // refinement steps per iteration
seed // seed for random number generator
width // width of image, in multiples of 64 (512)
height // height of image, in multiples of 64 (512)
cfg_scale // how strongly the prompt influences the image (7.5) (must be >1)
seamless // whether the generated image should tile
init_img // path to an initial image
strength // strength for noising/unnoising init_img. 0.0 preserves image exactly, 1.0 replaces it completely
gfpgan_strength // strength for GFPGAN. 0.0 preserves image exactly, 1.0 replaces it completely
ddim_eta // image randomness (eta=0.0 means the same seed always produces the same image)
step_callback // a function or method that will be called each step
image_callback // a function or method that will be called each time an image is generated
with_variations // a weighted list [(seed_1, weight_1), (seed_2, weight_2), ...] of variations which should be applied before doing any generation
variation_amount // optional 0-1 value to slerp from -S noise to random noise (allows variations on an image)
To use the step callback, define a function that receives two arguments:
- Image GPU data
- The step number
To use the image callback, define a function of method that receives two arguments, an Image object
and the seed. You can then do whatever you like with the image, including converting it to
different formats and manipulating it. For example:
def process_image(image,seed):
image.save(f{'images/seed.png'})
The callback used by the prompt2png() can be found in ldm/dream_util.py. It contains code
to create the requested output directory, select a unique informative name for each image, and
write the prompt into the PNG metadata.
"""
# TODO: convert this into a getattr() loop
steps = steps or self.steps
width = width or self.width
height = height or self.height
seamless = seamless or self.seamless
cfg_scale = cfg_scale or self.cfg_scale
ddim_eta = ddim_eta or self.ddim_eta
iterations = iterations or self.iterations
strength = strength or self.strength
self.seed = seed
self.log_tokenization = log_tokenization
with_variations = [] if with_variations is None else with_variations
model = (
self.load_model()
) # will instantiate the model or return it from cache
for m in model.modules():
if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
m.padding_mode = 'circular' if seamless else m._orig_padding_mode
assert cfg_scale > 1.0, 'CFG_Scale (-C) must be >1.0'
assert (
0.0 < strength < 1.0
), 'img2img and inpaint strength can only work with 0.0 < strength < 1.0'
assert (
0.0 <= variation_amount <= 1.0
), '-v --variation_amount must be in [0.0, 1.0]'
# check this logic - doesn't look right
if len(with_variations) > 0 or variation_amount > 1.0:
assert seed is not None,\
'seed must be specified when using with_variations'
if variation_amount == 0.0:
assert iterations == 1,\
'when using --with_variations, multiple iterations are only possible when using --variation_amount'
assert all(0 <= weight <= 1 for _, weight in with_variations),\
f'variation weights must be in [0.0, 1.0]: got {[weight for _, weight in with_variations]}'
width, height, _ = self._resolution_check(width, height, log=True)
if sampler_name and (sampler_name != self.sampler_name):
self.sampler_name = sampler_name
self._set_sampler()
tic = time.time()
if torch.cuda.is_available():
torch.cuda.reset_peak_memory_stats()
results = list()
init_image = None
mask_image = None
try:
uc, c = get_uc_and_c(
prompt, model=self.model,
skip_normalize=skip_normalize,
log_tokens=self.log_tokenization
)
(init_image,mask_image) = self._make_images(init_img,init_mask, width, height, fit)
if (init_image is not None) and (mask_image is not None):
generator = self._make_inpaint()
elif init_image is not None:
generator = self._make_img2img()
else:
generator = self._make_txt2img()
generator.set_variation(self.seed, variation_amount, with_variations)
results = generator.generate(
prompt,
iterations = iterations,
seed = self.seed,
sampler = self.sampler,
steps = steps,
cfg_scale = cfg_scale,
conditioning = (uc,c),
ddim_eta = ddim_eta,
image_callback = image_callback, # called after the final image is generated
step_callback = step_callback, # called after each intermediate image is generated
width = width,
height = height,
init_image = init_image, # notice that init_image is different from init_img
mask_image = mask_image,
strength = strength,
)
if upscale is not None or gfpgan_strength > 0:
self.upscale_and_reconstruct(results,
upscale = upscale,
strength = gfpgan_strength,
save_original = save_original,
image_callback = image_callback)
except KeyboardInterrupt:
print('*interrupted*')
if not self.ignore_ctrl_c:
raise KeyboardInterrupt
print(
'>> Partial results will be returned; if --grid was requested, nothing will be returned.'
)
except RuntimeError as e:
print(traceback.format_exc(), file=sys.stderr)
print('>> Could not generate image.')
toc = time.time()
print('>> Usage stats:')
print(
f'>> {len(results)} image(s) generated in', '%4.2fs' % (toc - tic)
)
if torch.cuda.is_available() and self.device.type == 'cuda':
print(
f'>> Max VRAM used for this generation:',
'%4.2fG.' % (torch.cuda.max_memory_allocated() / 1e9),
'Current VRAM utilization:'
'%4.2fG' % (torch.cuda.memory_allocated() / 1e9),
)
self.session_peakmem = max(
self.session_peakmem, torch.cuda.max_memory_allocated()
)
print(
f'>> Max VRAM used since script start: ',
'%4.2fG' % (self.session_peakmem / 1e9),
)
return results
def _make_images(self, img_path, mask_path, width, height, fit=False):
init_image = None
init_mask = None
if not img_path:
return None,None
image = self._load_img(img_path, width, height, fit=fit) # this returns an Image
init_image = self._create_init_image(image) # this returns a torch tensor
if self._has_transparency(image) and not mask_path: # if image has a transparent area and no mask was provided, then try to generate mask
print('>> Initial image has transparent areas. Will inpaint in these regions.')
if self._check_for_erasure(image):
print(
'>> WARNING: Colors underneath the transparent region seem to have been erased.\n',
'>> Inpainting will be suboptimal. Please preserve the colors when making\n',
'>> a transparency mask, or provide mask explicitly using --init_mask (-M).'
)
init_mask = self._create_init_mask(image) # this returns a torch tensor
if mask_path:
mask_image = self._load_img(mask_path, width, height, fit=fit) # this returns an Image
init_mask = self._create_init_mask(mask_image)
return init_image,init_mask
def _make_img2img(self):
if not self.generators.get('img2img'):
from ldm.dream.generator.img2img import Img2Img
self.generators['img2img'] = Img2Img(self.model)
return self.generators['img2img']
def _make_txt2img(self):
if not self.generators.get('txt2img'):
from ldm.dream.generator.txt2img import Txt2Img
self.generators['txt2img'] = Txt2Img(self.model)
return self.generators['txt2img']
def _make_inpaint(self):
if not self.generators.get('inpaint'):
from ldm.dream.generator.inpaint import Inpaint
self.generators['inpaint'] = Inpaint(self.model)
return self.generators['inpaint']
def load_model(self):
"""Load and initialize the model from configuration variables passed at object creation time"""
if self.model is None:
seed_everything(random.randrange(0, np.iinfo(np.uint32).max))
try:
config = OmegaConf.load(self.config)
model = self._load_model_from_config(config, self.weights)
if self.embedding_path is not None:
model.embedding_manager.load(
self.embedding_path, self.full_precision
)
self.model = model.to(self.device)
# model.to doesn't change the cond_stage_model.device used to move the tokenizer output, so set it here
self.model.cond_stage_model.device = self.device
except AttributeError as e:
print(f'>> Error loading model. {str(e)}', file=sys.stderr)
print(traceback.format_exc(), file=sys.stderr)
raise SystemExit from e
self._set_sampler()
for m in self.model.modules():
if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
m._orig_padding_mode = m.padding_mode
return self.model
def upscale_and_reconstruct(self,
image_list,
upscale = None,
strength = 0.0,
save_original = False,
image_callback = None):
try:
if upscale is not None:
from ldm.gfpgan.gfpgan_tools import real_esrgan_upscale
if strength > 0:
from ldm.gfpgan.gfpgan_tools import run_gfpgan
except (ModuleNotFoundError, ImportError):
print(traceback.format_exc(), file=sys.stderr)
print('>> You may need to install the ESRGAN and/or GFPGAN modules')
return
for r in image_list:
image, seed = r
try:
if upscale is not None:
if len(upscale) < 2:
upscale.append(0.75)
image = real_esrgan_upscale(
image,
upscale[1],
int(upscale[0]),
seed,
)
if strength > 0:
image = run_gfpgan(
image, strength, seed, 1
)
except Exception as e:
print(
f'>> Error running RealESRGAN or GFPGAN. Your image was not upscaled.\n{e}'
)
if image_callback is not None:
image_callback(image, seed, upscaled=True)
else:
r[0] = image
# to help WebGUI - front end to generator util function
def sample_to_image(self,samples):
return self._sample_to_image(samples)
def _sample_to_image(self,samples):
if not self.base_generator:
from ldm.dream.generator import Generator
self.base_generator = Generator(self.model)
return self.base_generator.sample_to_image(samples)
def _set_sampler(self):
msg = f'>> Setting Sampler to {self.sampler_name}'
if self.sampler_name == 'plms':
self.sampler = PLMSSampler(self.model, device=self.device)
elif self.sampler_name == 'ddim':
self.sampler = DDIMSampler(self.model, device=self.device)
elif self.sampler_name == 'k_dpm_2_a':
self.sampler = KSampler(
self.model, 'dpm_2_ancestral', device=self.device
)
elif self.sampler_name == 'k_dpm_2':
self.sampler = KSampler(self.model, 'dpm_2', device=self.device)
elif self.sampler_name == 'k_euler_a':
self.sampler = KSampler(
self.model, 'euler_ancestral', device=self.device
)
elif self.sampler_name == 'k_euler':
self.sampler = KSampler(self.model, 'euler', device=self.device)
elif self.sampler_name == 'k_heun':
self.sampler = KSampler(self.model, 'heun', device=self.device)
elif self.sampler_name == 'k_lms':
self.sampler = KSampler(self.model, 'lms', device=self.device)
else:
msg = f'>> Unsupported Sampler: {self.sampler_name}, Defaulting to plms'
self.sampler = PLMSSampler(self.model, device=self.device)
print(msg)
def _load_model_from_config(self, config, ckpt):
print(f'>> Loading model from {ckpt}')
# for usage statistics
device_type = choose_torch_device()
if device_type == 'cuda':
torch.cuda.reset_peak_memory_stats()
tic = time.time()
# this does the work
pl_sd = torch.load(ckpt, map_location='cpu')
sd = pl_sd['state_dict']
model = instantiate_from_config(config.model)
m, u = model.load_state_dict(sd, strict=False)
if self.full_precision:
print(
'>> Using slower but more accurate full-precision math (--full_precision)'
)
else:
print(
'>> Using half precision math. Call with --full_precision to use more accurate but VRAM-intensive full precision.'
)
model.half()
model.to(self.device)
model.eval()
# usage statistics
toc = time.time()
print(
f'>> Model loaded in', '%4.2fs' % (toc - tic)
)
if device_type == 'cuda':
print(
'>> Max VRAM used to load the model:',
'%4.2fG' % (torch.cuda.max_memory_allocated() / 1e9),
'\n>> Current VRAM usage:'
'%4.2fG' % (torch.cuda.memory_allocated() / 1e9),
)
return model
def _load_img(self, path, width, height, fit=False):
assert os.path.exists(path), f'>> {path}: File not found'
# with Image.open(path) as img:
# image = img.convert('RGBA')
image = Image.open(path)
print(
f'>> loaded input image of size {image.width}x{image.height} from {path}'
)
if fit:
image = self._fit_image(image,(width,height))
else:
image = self._squeeze_image(image)
return image
def _create_init_image(self,image):
image = image.convert('RGB')
# print(
# f'>> DEBUG: writing the image to img.png'
# )
# image.save('img.png')
image = np.array(image).astype(np.float32) / 255.0
image = image[None].transpose(0, 3, 1, 2)
image = torch.from_numpy(image)
image = 2.0 * image - 1.0
return image.to(self.device)
def _create_init_mask(self, image):
# convert into a black/white mask
image = self._image_to_mask(image)
image = image.convert('RGB')
# BUG: We need to use the model's downsample factor rather than hardcoding "8"
from ldm.dream.generator.base import downsampling
image = image.resize((image.width//downsampling, image.height//downsampling), resample=Image.Resampling.LANCZOS)
# print(
# f'>> DEBUG: writing the mask to mask.png'
# )
# image.save('mask.png')
image = np.array(image)
image = image.astype(np.float32) / 255.0
image = image[None].transpose(0, 3, 1, 2)
image = torch.from_numpy(image)
return image.to(self.device)
# The mask is expected to have the region to be inpainted
# with alpha transparency. It converts it into a black/white
# image with the transparent part black.
def _image_to_mask(self, mask_image, invert=False) -> Image:
# Obtain the mask from the transparency channel
mask = Image.new(mode="L", size=mask_image.size, color=255)
mask.putdata(mask_image.getdata(band=3))
if invert:
mask = ImageOps.invert(mask)
return mask
def _has_transparency(self,image):
if image.info.get("transparency", None) is not None:
return True
if image.mode == "P":
transparent = image.info.get("transparency", -1)
for _, index in image.getcolors():
if index == transparent:
return True
elif image.mode == "RGBA":
extrema = image.getextrema()
if extrema[3][0] < 255:
return True
return False
def _check_for_erasure(self,image):
width, height = image.size
pixdata = image.load()
colored = 0
for y in range(height):
for x in range(width):
if pixdata[x, y][3] == 0:
r, g, b, _ = pixdata[x, y]
if (r, g, b) != (0, 0, 0) and \
(r, g, b) != (255, 255, 255):
colored += 1
return colored == 0
def _squeeze_image(self,image):
x,y,resize_needed = self._resolution_check(image.width,image.height)
if resize_needed:
return InitImageResizer(image).resize(x,y)
return image
def _fit_image(self,image,max_dimensions):
w,h = max_dimensions
print(
f'>> image will be resized to fit inside a box {w}x{h} in size.'
)
if image.width > image.height:
h = None # by setting h to none, we tell InitImageResizer to fit into the width and calculate height
elif image.height > image.width:
w = None # ditto for w
else:
pass
image = InitImageResizer(image).resize(w,h) # note that InitImageResizer does the multiple of 64 truncation internally
print(
f'>> after adjusting image dimensions to be multiples of 64, init image is {image.width}x{image.height}'
)
return image
def _resolution_check(self, width, height, log=False):
resize_needed = False
w, h = map(
lambda x: x - x % 64, (width, height)
) # resize to integer multiple of 64
if h != height or w != width:
if log:
print(
f'>> Provided width and height must be multiples of 64. Auto-resizing to {w}x{h}'
)
height = h
width = w
resize_needed = True
if (width * height) > (self.width * self.height):
print(">> This input is larger than your defaults. If you run out of memory, please use a smaller image.")
return width, height, resize_needed

167
ldm/gfpgan/gfpgan_tools.py Normal file
View File

@@ -0,0 +1,167 @@
import torch
import warnings
import os
import sys
import numpy as np
from PIL import Image
from scripts.dream import create_argv_parser
arg_parser = create_argv_parser()
opt = arg_parser.parse_args()
model_path = os.path.join(opt.gfpgan_dir, opt.gfpgan_model_path)
gfpgan_model_exists = os.path.isfile(model_path)
def run_gfpgan(image, strength, seed, upsampler_scale=4):
print(f'>> GFPGAN - Restoring Faces for image seed:{seed}')
gfpgan = None
with warnings.catch_warnings():
warnings.filterwarnings('ignore', category=DeprecationWarning)
warnings.filterwarnings('ignore', category=UserWarning)
try:
if not gfpgan_model_exists:
raise Exception('GFPGAN model not found at path ' + model_path)
sys.path.append(os.path.abspath(opt.gfpgan_dir))
from gfpgan import GFPGANer
bg_upsampler = _load_gfpgan_bg_upsampler(
opt.gfpgan_bg_upsampler, upsampler_scale, opt.gfpgan_bg_tile
)
gfpgan = GFPGANer(
model_path=model_path,
upscale=upsampler_scale,
arch='clean',
channel_multiplier=2,
bg_upsampler=bg_upsampler,
)
except Exception:
import traceback
print('>> Error loading GFPGAN:', file=sys.stderr)
print(traceback.format_exc(), file=sys.stderr)
if gfpgan is None:
print(
f'>> WARNING: GFPGAN not initialized.'
)
print(
f'>> Download https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.3.pth to {model_path}, \nor change GFPGAN directory with --gfpgan_dir.'
)
return image
image = image.convert('RGB')
cropped_faces, restored_faces, restored_img = gfpgan.enhance(
np.array(image, dtype=np.uint8),
has_aligned=False,
only_center_face=False,
paste_back=True,
)
res = Image.fromarray(restored_img)
if strength < 1.0:
# Resize the image to the new image if the sizes have changed
if restored_img.size != image.size:
image = image.resize(res.size)
res = Image.blend(image, res, strength)
if torch.cuda.is_available():
torch.cuda.empty_cache()
gfpgan = None
return res
def _load_gfpgan_bg_upsampler(bg_upsampler, upsampler_scale, bg_tile=400):
if bg_upsampler == 'realesrgan':
if not torch.cuda.is_available(): # CPU or MPS on M1
use_half_precision = False
else:
use_half_precision = True
model_path = {
2: 'https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.1/RealESRGAN_x2plus.pth',
4: 'https://github.com/xinntao/Real-ESRGAN/releases/download/v0.1.0/RealESRGAN_x4plus.pth',
}
if upsampler_scale not in model_path:
return None
from basicsr.archs.rrdbnet_arch import RRDBNet
from realesrgan import RealESRGANer
if upsampler_scale == 4:
model = RRDBNet(
num_in_ch=3,
num_out_ch=3,
num_feat=64,
num_block=23,
num_grow_ch=32,
scale=4,
)
if upsampler_scale == 2:
model = RRDBNet(
num_in_ch=3,
num_out_ch=3,
num_feat=64,
num_block=23,
num_grow_ch=32,
scale=2,
)
bg_upsampler = RealESRGANer(
scale=upsampler_scale,
model_path=model_path[upsampler_scale],
model=model,
tile=bg_tile,
tile_pad=10,
pre_pad=0,
half=use_half_precision,
)
else:
bg_upsampler = None
return bg_upsampler
def real_esrgan_upscale(image, strength, upsampler_scale, seed):
print(
f'>> Real-ESRGAN Upscaling seed:{seed} : scale:{upsampler_scale}x'
)
with warnings.catch_warnings():
warnings.filterwarnings('ignore', category=DeprecationWarning)
warnings.filterwarnings('ignore', category=UserWarning)
try:
upsampler = _load_gfpgan_bg_upsampler(
opt.gfpgan_bg_upsampler, upsampler_scale, opt.gfpgan_bg_tile
)
except Exception:
import traceback
print('>> Error loading Real-ESRGAN:', file=sys.stderr)
print(traceback.format_exc(), file=sys.stderr)
output, img_mode = upsampler.enhance(
np.array(image, dtype=np.uint8),
outscale=upsampler_scale,
alpha_upsampler=opt.gfpgan_bg_upsampler,
)
res = Image.fromarray(output)
if strength < 1.0:
# Resize the image to the new image if the sizes have changed
if output.size != image.size:
image = image.resize(res.size)
res = Image.blend(image, res, strength)
if torch.cuda.is_available():
torch.cuda.empty_cache()
upsampler = None
return res

View File

@@ -5,32 +5,49 @@ class LambdaWarmUpCosineScheduler:
"""
note: use with a base_lr of 1.0
"""
def __init__(self, warm_up_steps, lr_min, lr_max, lr_start, max_decay_steps, verbosity_interval=0):
def __init__(
self,
warm_up_steps,
lr_min,
lr_max,
lr_start,
max_decay_steps,
verbosity_interval=0,
):
self.lr_warm_up_steps = warm_up_steps
self.lr_start = lr_start
self.lr_min = lr_min
self.lr_max = lr_max
self.lr_max_decay_steps = max_decay_steps
self.last_lr = 0.
self.last_lr = 0.0
self.verbosity_interval = verbosity_interval
def schedule(self, n, **kwargs):
if self.verbosity_interval > 0:
if n % self.verbosity_interval == 0: print(f"current step: {n}, recent lr-multiplier: {self.last_lr}")
if n % self.verbosity_interval == 0:
print(
f'current step: {n}, recent lr-multiplier: {self.last_lr}'
)
if n < self.lr_warm_up_steps:
lr = (self.lr_max - self.lr_start) / self.lr_warm_up_steps * n + self.lr_start
lr = (
self.lr_max - self.lr_start
) / self.lr_warm_up_steps * n + self.lr_start
self.last_lr = lr
return lr
else:
t = (n - self.lr_warm_up_steps) / (self.lr_max_decay_steps - self.lr_warm_up_steps)
t = (n - self.lr_warm_up_steps) / (
self.lr_max_decay_steps - self.lr_warm_up_steps
)
t = min(t, 1.0)
lr = self.lr_min + 0.5 * (self.lr_max - self.lr_min) * (
1 + np.cos(t * np.pi))
1 + np.cos(t * np.pi)
)
self.last_lr = lr
return lr
def __call__(self, n, **kwargs):
return self.schedule(n,**kwargs)
return self.schedule(n, **kwargs)
class LambdaWarmUpCosineScheduler2:
@@ -38,15 +55,30 @@ class LambdaWarmUpCosineScheduler2:
supports repeated iterations, configurable via lists
note: use with a base_lr of 1.0.
"""
def __init__(self, warm_up_steps, f_min, f_max, f_start, cycle_lengths, verbosity_interval=0):
assert len(warm_up_steps) == len(f_min) == len(f_max) == len(f_start) == len(cycle_lengths)
def __init__(
self,
warm_up_steps,
f_min,
f_max,
f_start,
cycle_lengths,
verbosity_interval=0,
):
assert (
len(warm_up_steps)
== len(f_min)
== len(f_max)
== len(f_start)
== len(cycle_lengths)
)
self.lr_warm_up_steps = warm_up_steps
self.f_start = f_start
self.f_min = f_min
self.f_max = f_max
self.cycle_lengths = cycle_lengths
self.cum_cycles = np.cumsum([0] + list(self.cycle_lengths))
self.last_f = 0.
self.last_f = 0.0
self.verbosity_interval = verbosity_interval
def find_in_interval(self, n):
@@ -60,17 +92,25 @@ class LambdaWarmUpCosineScheduler2:
cycle = self.find_in_interval(n)
n = n - self.cum_cycles[cycle]
if self.verbosity_interval > 0:
if n % self.verbosity_interval == 0: print(f"current step: {n}, recent lr-multiplier: {self.last_f}, "
f"current cycle {cycle}")
if n % self.verbosity_interval == 0:
print(
f'current step: {n}, recent lr-multiplier: {self.last_f}, '
f'current cycle {cycle}'
)
if n < self.lr_warm_up_steps[cycle]:
f = (self.f_max[cycle] - self.f_start[cycle]) / self.lr_warm_up_steps[cycle] * n + self.f_start[cycle]
f = (
self.f_max[cycle] - self.f_start[cycle]
) / self.lr_warm_up_steps[cycle] * n + self.f_start[cycle]
self.last_f = f
return f
else:
t = (n - self.lr_warm_up_steps[cycle]) / (self.cycle_lengths[cycle] - self.lr_warm_up_steps[cycle])
t = (n - self.lr_warm_up_steps[cycle]) / (
self.cycle_lengths[cycle] - self.lr_warm_up_steps[cycle]
)
t = min(t, 1.0)
f = self.f_min[cycle] + 0.5 * (self.f_max[cycle] - self.f_min[cycle]) * (
1 + np.cos(t * np.pi))
f = self.f_min[cycle] + 0.5 * (
self.f_max[cycle] - self.f_min[cycle]
) * (1 + np.cos(t * np.pi))
self.last_f = f
return f
@@ -79,20 +119,25 @@ class LambdaWarmUpCosineScheduler2:
class LambdaLinearScheduler(LambdaWarmUpCosineScheduler2):
def schedule(self, n, **kwargs):
cycle = self.find_in_interval(n)
n = n - self.cum_cycles[cycle]
if self.verbosity_interval > 0:
if n % self.verbosity_interval == 0: print(f"current step: {n}, recent lr-multiplier: {self.last_f}, "
f"current cycle {cycle}")
if n % self.verbosity_interval == 0:
print(
f'current step: {n}, recent lr-multiplier: {self.last_f}, '
f'current cycle {cycle}'
)
if n < self.lr_warm_up_steps[cycle]:
f = (self.f_max[cycle] - self.f_start[cycle]) / self.lr_warm_up_steps[cycle] * n + self.f_start[cycle]
f = (
self.f_max[cycle] - self.f_start[cycle]
) / self.lr_warm_up_steps[cycle] * n + self.f_start[cycle]
self.last_f = f
return f
else:
f = self.f_min[cycle] + (self.f_max[cycle] - self.f_min[cycle]) * (self.cycle_lengths[cycle] - n) / (self.cycle_lengths[cycle])
f = self.f_min[cycle] + (self.f_max[cycle] - self.f_min[cycle]) * (
self.cycle_lengths[cycle] - n
) / (self.cycle_lengths[cycle])
self.last_f = f
return f

View File

@@ -6,29 +6,32 @@ from contextlib import contextmanager
from taming.modules.vqvae.quantize import VectorQuantizer2 as VectorQuantizer
from ldm.modules.diffusionmodules.model import Encoder, Decoder
from ldm.modules.distributions.distributions import DiagonalGaussianDistribution
from ldm.modules.distributions.distributions import (
DiagonalGaussianDistribution,
)
from ldm.util import instantiate_from_config
class VQModel(pl.LightningModule):
def __init__(self,
ddconfig,
lossconfig,
n_embed,
embed_dim,
ckpt_path=None,
ignore_keys=[],
image_key="image",
colorize_nlabels=None,
monitor=None,
batch_resize_range=None,
scheduler_config=None,
lr_g_factor=1.0,
remap=None,
sane_index_shape=False, # tell vector quantizer to return indices as bhw
use_ema=False
):
def __init__(
self,
ddconfig,
lossconfig,
n_embed,
embed_dim,
ckpt_path=None,
ignore_keys=[],
image_key='image',
colorize_nlabels=None,
monitor=None,
batch_resize_range=None,
scheduler_config=None,
lr_g_factor=1.0,
remap=None,
sane_index_shape=False, # tell vector quantizer to return indices as bhw
use_ema=False,
):
super().__init__()
self.embed_dim = embed_dim
self.n_embed = n_embed
@@ -36,24 +39,34 @@ class VQModel(pl.LightningModule):
self.encoder = Encoder(**ddconfig)
self.decoder = Decoder(**ddconfig)
self.loss = instantiate_from_config(lossconfig)
self.quantize = VectorQuantizer(n_embed, embed_dim, beta=0.25,
remap=remap,
sane_index_shape=sane_index_shape)
self.quant_conv = torch.nn.Conv2d(ddconfig["z_channels"], embed_dim, 1)
self.post_quant_conv = torch.nn.Conv2d(embed_dim, ddconfig["z_channels"], 1)
self.quantize = VectorQuantizer(
n_embed,
embed_dim,
beta=0.25,
remap=remap,
sane_index_shape=sane_index_shape,
)
self.quant_conv = torch.nn.Conv2d(ddconfig['z_channels'], embed_dim, 1)
self.post_quant_conv = torch.nn.Conv2d(
embed_dim, ddconfig['z_channels'], 1
)
if colorize_nlabels is not None:
assert type(colorize_nlabels)==int
self.register_buffer("colorize", torch.randn(3, colorize_nlabels, 1, 1))
assert type(colorize_nlabels) == int
self.register_buffer(
'colorize', torch.randn(3, colorize_nlabels, 1, 1)
)
if monitor is not None:
self.monitor = monitor
self.batch_resize_range = batch_resize_range
if self.batch_resize_range is not None:
print(f"{self.__class__.__name__}: Using per-batch resizing in range {batch_resize_range}.")
print(
f'{self.__class__.__name__}: Using per-batch resizing in range {batch_resize_range}.'
)
self.use_ema = use_ema
if self.use_ema:
self.model_ema = LitEma(self)
print(f"Keeping EMAs of {len(list(self.model_ema.buffers()))}.")
print(f'Keeping EMAs of {len(list(self.model_ema.buffers()))}.')
if ckpt_path is not None:
self.init_from_ckpt(ckpt_path, ignore_keys=ignore_keys)
@@ -66,28 +79,30 @@ class VQModel(pl.LightningModule):
self.model_ema.store(self.parameters())
self.model_ema.copy_to(self)
if context is not None:
print(f"{context}: Switched to EMA weights")
print(f'{context}: Switched to EMA weights')
try:
yield None
finally:
if self.use_ema:
self.model_ema.restore(self.parameters())
if context is not None:
print(f"{context}: Restored training weights")
print(f'{context}: Restored training weights')
def init_from_ckpt(self, path, ignore_keys=list()):
sd = torch.load(path, map_location="cpu")["state_dict"]
sd = torch.load(path, map_location='cpu')['state_dict']
keys = list(sd.keys())
for k in keys:
for ik in ignore_keys:
if k.startswith(ik):
print("Deleting key {} from state_dict.".format(k))
print('Deleting key {} from state_dict.'.format(k))
del sd[k]
missing, unexpected = self.load_state_dict(sd, strict=False)
print(f"Restored from {path} with {len(missing)} missing and {len(unexpected)} unexpected keys")
print(
f'Restored from {path} with {len(missing)} missing and {len(unexpected)} unexpected keys'
)
if len(missing) > 0:
print(f"Missing Keys: {missing}")
print(f"Unexpected Keys: {unexpected}")
print(f'Missing Keys: {missing}')
print(f'Unexpected Keys: {unexpected}')
def on_train_batch_end(self, *args, **kwargs):
if self.use_ema:
@@ -115,7 +130,7 @@ class VQModel(pl.LightningModule):
return dec
def forward(self, input, return_pred_indices=False):
quant, diff, (_,_,ind) = self.encode(input)
quant, diff, (_, _, ind) = self.encode(input)
dec = self.decode(quant)
if return_pred_indices:
return dec, diff, ind
@@ -125,7 +140,11 @@ class VQModel(pl.LightningModule):
x = batch[k]
if len(x.shape) == 3:
x = x[..., None]
x = x.permute(0, 3, 1, 2).to(memory_format=torch.contiguous_format).float()
x = (
x.permute(0, 3, 1, 2)
.to(memory_format=torch.contiguous_format)
.float()
)
if self.batch_resize_range is not None:
lower_size = self.batch_resize_range[0]
upper_size = self.batch_resize_range[1]
@@ -133,9 +152,11 @@ class VQModel(pl.LightningModule):
# do the first few batches with max size to avoid later oom
new_resize = upper_size
else:
new_resize = np.random.choice(np.arange(lower_size, upper_size+16, 16))
new_resize = np.random.choice(
np.arange(lower_size, upper_size + 16, 16)
)
if new_resize != x.shape[2]:
x = F.interpolate(x, size=new_resize, mode="bicubic")
x = F.interpolate(x, size=new_resize, mode='bicubic')
x = x.detach()
return x
@@ -147,81 +168,139 @@ class VQModel(pl.LightningModule):
if optimizer_idx == 0:
# autoencode
aeloss, log_dict_ae = self.loss(qloss, x, xrec, optimizer_idx, self.global_step,
last_layer=self.get_last_layer(), split="train",
predicted_indices=ind)
aeloss, log_dict_ae = self.loss(
qloss,
x,
xrec,
optimizer_idx,
self.global_step,
last_layer=self.get_last_layer(),
split='train',
predicted_indices=ind,
)
self.log_dict(log_dict_ae, prog_bar=False, logger=True, on_step=True, on_epoch=True)
self.log_dict(
log_dict_ae,
prog_bar=False,
logger=True,
on_step=True,
on_epoch=True,
)
return aeloss
if optimizer_idx == 1:
# discriminator
discloss, log_dict_disc = self.loss(qloss, x, xrec, optimizer_idx, self.global_step,
last_layer=self.get_last_layer(), split="train")
self.log_dict(log_dict_disc, prog_bar=False, logger=True, on_step=True, on_epoch=True)
discloss, log_dict_disc = self.loss(
qloss,
x,
xrec,
optimizer_idx,
self.global_step,
last_layer=self.get_last_layer(),
split='train',
)
self.log_dict(
log_dict_disc,
prog_bar=False,
logger=True,
on_step=True,
on_epoch=True,
)
return discloss
def validation_step(self, batch, batch_idx):
log_dict = self._validation_step(batch, batch_idx)
with self.ema_scope():
log_dict_ema = self._validation_step(batch, batch_idx, suffix="_ema")
log_dict_ema = self._validation_step(
batch, batch_idx, suffix='_ema'
)
return log_dict
def _validation_step(self, batch, batch_idx, suffix=""):
def _validation_step(self, batch, batch_idx, suffix=''):
x = self.get_input(batch, self.image_key)
xrec, qloss, ind = self(x, return_pred_indices=True)
aeloss, log_dict_ae = self.loss(qloss, x, xrec, 0,
self.global_step,
last_layer=self.get_last_layer(),
split="val"+suffix,
predicted_indices=ind
)
aeloss, log_dict_ae = self.loss(
qloss,
x,
xrec,
0,
self.global_step,
last_layer=self.get_last_layer(),
split='val' + suffix,
predicted_indices=ind,
)
discloss, log_dict_disc = self.loss(qloss, x, xrec, 1,
self.global_step,
last_layer=self.get_last_layer(),
split="val"+suffix,
predicted_indices=ind
)
rec_loss = log_dict_ae[f"val{suffix}/rec_loss"]
self.log(f"val{suffix}/rec_loss", rec_loss,
prog_bar=True, logger=True, on_step=False, on_epoch=True, sync_dist=True)
self.log(f"val{suffix}/aeloss", aeloss,
prog_bar=True, logger=True, on_step=False, on_epoch=True, sync_dist=True)
discloss, log_dict_disc = self.loss(
qloss,
x,
xrec,
1,
self.global_step,
last_layer=self.get_last_layer(),
split='val' + suffix,
predicted_indices=ind,
)
rec_loss = log_dict_ae[f'val{suffix}/rec_loss']
self.log(
f'val{suffix}/rec_loss',
rec_loss,
prog_bar=True,
logger=True,
on_step=False,
on_epoch=True,
sync_dist=True,
)
self.log(
f'val{suffix}/aeloss',
aeloss,
prog_bar=True,
logger=True,
on_step=False,
on_epoch=True,
sync_dist=True,
)
if version.parse(pl.__version__) >= version.parse('1.4.0'):
del log_dict_ae[f"val{suffix}/rec_loss"]
del log_dict_ae[f'val{suffix}/rec_loss']
self.log_dict(log_dict_ae)
self.log_dict(log_dict_disc)
return self.log_dict
def configure_optimizers(self):
lr_d = self.learning_rate
lr_g = self.lr_g_factor*self.learning_rate
print("lr_d", lr_d)
print("lr_g", lr_g)
opt_ae = torch.optim.Adam(list(self.encoder.parameters())+
list(self.decoder.parameters())+
list(self.quantize.parameters())+
list(self.quant_conv.parameters())+
list(self.post_quant_conv.parameters()),
lr=lr_g, betas=(0.5, 0.9))
opt_disc = torch.optim.Adam(self.loss.discriminator.parameters(),
lr=lr_d, betas=(0.5, 0.9))
lr_g = self.lr_g_factor * self.learning_rate
print('lr_d', lr_d)
print('lr_g', lr_g)
opt_ae = torch.optim.Adam(
list(self.encoder.parameters())
+ list(self.decoder.parameters())
+ list(self.quantize.parameters())
+ list(self.quant_conv.parameters())
+ list(self.post_quant_conv.parameters()),
lr=lr_g,
betas=(0.5, 0.9),
)
opt_disc = torch.optim.Adam(
self.loss.discriminator.parameters(), lr=lr_d, betas=(0.5, 0.9)
)
if self.scheduler_config is not None:
scheduler = instantiate_from_config(self.scheduler_config)
print("Setting up LambdaLR scheduler...")
print('Setting up LambdaLR scheduler...')
scheduler = [
{
'scheduler': LambdaLR(opt_ae, lr_lambda=scheduler.schedule),
'scheduler': LambdaLR(
opt_ae, lr_lambda=scheduler.schedule
),
'interval': 'step',
'frequency': 1
'frequency': 1,
},
{
'scheduler': LambdaLR(opt_disc, lr_lambda=scheduler.schedule),
'scheduler': LambdaLR(
opt_disc, lr_lambda=scheduler.schedule
),
'interval': 'step',
'frequency': 1
'frequency': 1,
},
]
return [opt_ae, opt_disc], scheduler
@@ -235,7 +314,7 @@ class VQModel(pl.LightningModule):
x = self.get_input(batch, self.image_key)
x = x.to(self.device)
if only_inputs:
log["inputs"] = x
log['inputs'] = x
return log
xrec, _ = self(x)
if x.shape[1] > 3:
@@ -243,21 +322,24 @@ class VQModel(pl.LightningModule):
assert xrec.shape[1] > 3
x = self.to_rgb(x)
xrec = self.to_rgb(xrec)
log["inputs"] = x
log["reconstructions"] = xrec
log['inputs'] = x
log['reconstructions'] = xrec
if plot_ema:
with self.ema_scope():
xrec_ema, _ = self(x)
if x.shape[1] > 3: xrec_ema = self.to_rgb(xrec_ema)
log["reconstructions_ema"] = xrec_ema
if x.shape[1] > 3:
xrec_ema = self.to_rgb(xrec_ema)
log['reconstructions_ema'] = xrec_ema
return log
def to_rgb(self, x):
assert self.image_key == "segmentation"
if not hasattr(self, "colorize"):
self.register_buffer("colorize", torch.randn(3, x.shape[1], 1, 1).to(x))
assert self.image_key == 'segmentation'
if not hasattr(self, 'colorize'):
self.register_buffer(
'colorize', torch.randn(3, x.shape[1], 1, 1).to(x)
)
x = F.conv2d(x, weight=self.colorize)
x = 2.*(x-x.min())/(x.max()-x.min()) - 1.
x = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
return x
@@ -283,43 +365,50 @@ class VQModelInterface(VQModel):
class AutoencoderKL(pl.LightningModule):
def __init__(self,
ddconfig,
lossconfig,
embed_dim,
ckpt_path=None,
ignore_keys=[],
image_key="image",
colorize_nlabels=None,
monitor=None,
):
def __init__(
self,
ddconfig,
lossconfig,
embed_dim,
ckpt_path=None,
ignore_keys=[],
image_key='image',
colorize_nlabels=None,
monitor=None,
):
super().__init__()
self.image_key = image_key
self.encoder = Encoder(**ddconfig)
self.decoder = Decoder(**ddconfig)
self.loss = instantiate_from_config(lossconfig)
assert ddconfig["double_z"]
self.quant_conv = torch.nn.Conv2d(2*ddconfig["z_channels"], 2*embed_dim, 1)
self.post_quant_conv = torch.nn.Conv2d(embed_dim, ddconfig["z_channels"], 1)
assert ddconfig['double_z']
self.quant_conv = torch.nn.Conv2d(
2 * ddconfig['z_channels'], 2 * embed_dim, 1
)
self.post_quant_conv = torch.nn.Conv2d(
embed_dim, ddconfig['z_channels'], 1
)
self.embed_dim = embed_dim
if colorize_nlabels is not None:
assert type(colorize_nlabels)==int
self.register_buffer("colorize", torch.randn(3, colorize_nlabels, 1, 1))
assert type(colorize_nlabels) == int
self.register_buffer(
'colorize', torch.randn(3, colorize_nlabels, 1, 1)
)
if monitor is not None:
self.monitor = monitor
if ckpt_path is not None:
self.init_from_ckpt(ckpt_path, ignore_keys=ignore_keys)
def init_from_ckpt(self, path, ignore_keys=list()):
sd = torch.load(path, map_location="cpu")["state_dict"]
sd = torch.load(path, map_location='cpu')['state_dict']
keys = list(sd.keys())
for k in keys:
for ik in ignore_keys:
if k.startswith(ik):
print("Deleting key {} from state_dict.".format(k))
print('Deleting key {} from state_dict.'.format(k))
del sd[k]
self.load_state_dict(sd, strict=False)
print(f"Restored from {path}")
print(f'Restored from {path}')
def encode(self, x):
h = self.encoder(x)
@@ -345,7 +434,11 @@ class AutoencoderKL(pl.LightningModule):
x = batch[k]
if len(x.shape) == 3:
x = x[..., None]
x = x.permute(0, 3, 1, 2).to(memory_format=torch.contiguous_format).float()
x = (
x.permute(0, 3, 1, 2)
.to(memory_format=torch.contiguous_format)
.float()
)
return x
def training_step(self, batch, batch_idx, optimizer_idx):
@@ -354,44 +447,102 @@ class AutoencoderKL(pl.LightningModule):
if optimizer_idx == 0:
# train encoder+decoder+logvar
aeloss, log_dict_ae = self.loss(inputs, reconstructions, posterior, optimizer_idx, self.global_step,
last_layer=self.get_last_layer(), split="train")
self.log("aeloss", aeloss, prog_bar=True, logger=True, on_step=True, on_epoch=True)
self.log_dict(log_dict_ae, prog_bar=False, logger=True, on_step=True, on_epoch=False)
aeloss, log_dict_ae = self.loss(
inputs,
reconstructions,
posterior,
optimizer_idx,
self.global_step,
last_layer=self.get_last_layer(),
split='train',
)
self.log(
'aeloss',
aeloss,
prog_bar=True,
logger=True,
on_step=True,
on_epoch=True,
)
self.log_dict(
log_dict_ae,
prog_bar=False,
logger=True,
on_step=True,
on_epoch=False,
)
return aeloss
if optimizer_idx == 1:
# train the discriminator
discloss, log_dict_disc = self.loss(inputs, reconstructions, posterior, optimizer_idx, self.global_step,
last_layer=self.get_last_layer(), split="train")
discloss, log_dict_disc = self.loss(
inputs,
reconstructions,
posterior,
optimizer_idx,
self.global_step,
last_layer=self.get_last_layer(),
split='train',
)
self.log("discloss", discloss, prog_bar=True, logger=True, on_step=True, on_epoch=True)
self.log_dict(log_dict_disc, prog_bar=False, logger=True, on_step=True, on_epoch=False)
self.log(
'discloss',
discloss,
prog_bar=True,
logger=True,
on_step=True,
on_epoch=True,
)
self.log_dict(
log_dict_disc,
prog_bar=False,
logger=True,
on_step=True,
on_epoch=False,
)
return discloss
def validation_step(self, batch, batch_idx):
inputs = self.get_input(batch, self.image_key)
reconstructions, posterior = self(inputs)
aeloss, log_dict_ae = self.loss(inputs, reconstructions, posterior, 0, self.global_step,
last_layer=self.get_last_layer(), split="val")
aeloss, log_dict_ae = self.loss(
inputs,
reconstructions,
posterior,
0,
self.global_step,
last_layer=self.get_last_layer(),
split='val',
)
discloss, log_dict_disc = self.loss(inputs, reconstructions, posterior, 1, self.global_step,
last_layer=self.get_last_layer(), split="val")
discloss, log_dict_disc = self.loss(
inputs,
reconstructions,
posterior,
1,
self.global_step,
last_layer=self.get_last_layer(),
split='val',
)
self.log("val/rec_loss", log_dict_ae["val/rec_loss"])
self.log('val/rec_loss', log_dict_ae['val/rec_loss'])
self.log_dict(log_dict_ae)
self.log_dict(log_dict_disc)
return self.log_dict
def configure_optimizers(self):
lr = self.learning_rate
opt_ae = torch.optim.Adam(list(self.encoder.parameters())+
list(self.decoder.parameters())+
list(self.quant_conv.parameters())+
list(self.post_quant_conv.parameters()),
lr=lr, betas=(0.5, 0.9))
opt_disc = torch.optim.Adam(self.loss.discriminator.parameters(),
lr=lr, betas=(0.5, 0.9))
opt_ae = torch.optim.Adam(
list(self.encoder.parameters())
+ list(self.decoder.parameters())
+ list(self.quant_conv.parameters())
+ list(self.post_quant_conv.parameters()),
lr=lr,
betas=(0.5, 0.9),
)
opt_disc = torch.optim.Adam(
self.loss.discriminator.parameters(), lr=lr, betas=(0.5, 0.9)
)
return [opt_ae, opt_disc], []
def get_last_layer(self):
@@ -409,17 +560,19 @@ class AutoencoderKL(pl.LightningModule):
assert xrec.shape[1] > 3
x = self.to_rgb(x)
xrec = self.to_rgb(xrec)
log["samples"] = self.decode(torch.randn_like(posterior.sample()))
log["reconstructions"] = xrec
log["inputs"] = x
log['samples'] = self.decode(torch.randn_like(posterior.sample()))
log['reconstructions'] = xrec
log['inputs'] = x
return log
def to_rgb(self, x):
assert self.image_key == "segmentation"
if not hasattr(self, "colorize"):
self.register_buffer("colorize", torch.randn(3, x.shape[1], 1, 1).to(x))
assert self.image_key == 'segmentation'
if not hasattr(self, 'colorize'):
self.register_buffer(
'colorize', torch.randn(3, x.shape[1], 1, 1).to(x)
)
x = F.conv2d(x, weight=self.colorize)
x = 2.*(x-x.min())/(x.max()-x.min()) - 1.
x = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
return x

View File

@@ -10,13 +10,13 @@ from einops import rearrange
from glob import glob
from natsort import natsorted
from ldm.modules.diffusionmodules.openaimodel import EncoderUNetModel, UNetModel
from ldm.modules.diffusionmodules.openaimodel import (
EncoderUNetModel,
UNetModel,
)
from ldm.util import log_txt_as_img, default, ismap, instantiate_from_config
__models__ = {
'class_label': EncoderUNetModel,
'segmentation': UNetModel
}
__models__ = {'class_label': EncoderUNetModel, 'segmentation': UNetModel}
def disabled_train(self, mode=True):
@@ -26,37 +26,49 @@ def disabled_train(self, mode=True):
class NoisyLatentImageClassifier(pl.LightningModule):
def __init__(self,
diffusion_path,
num_classes,
ckpt_path=None,
pool='attention',
label_key=None,
diffusion_ckpt_path=None,
scheduler_config=None,
weight_decay=1.e-2,
log_steps=10,
monitor='val/loss',
*args,
**kwargs):
def __init__(
self,
diffusion_path,
num_classes,
ckpt_path=None,
pool='attention',
label_key=None,
diffusion_ckpt_path=None,
scheduler_config=None,
weight_decay=1.0e-2,
log_steps=10,
monitor='val/loss',
*args,
**kwargs,
):
super().__init__(*args, **kwargs)
self.num_classes = num_classes
# get latest config of diffusion model
diffusion_config = natsorted(glob(os.path.join(diffusion_path, 'configs', '*-project.yaml')))[-1]
diffusion_config = natsorted(
glob(os.path.join(diffusion_path, 'configs', '*-project.yaml'))
)[-1]
self.diffusion_config = OmegaConf.load(diffusion_config).model
self.diffusion_config.params.ckpt_path = diffusion_ckpt_path
self.load_diffusion()
self.monitor = monitor
self.numd = self.diffusion_model.first_stage_model.encoder.num_resolutions - 1
self.log_time_interval = self.diffusion_model.num_timesteps // log_steps
self.numd = (
self.diffusion_model.first_stage_model.encoder.num_resolutions - 1
)
self.log_time_interval = (
self.diffusion_model.num_timesteps // log_steps
)
self.log_steps = log_steps
self.label_key = label_key if not hasattr(self.diffusion_model, 'cond_stage_key') \
self.label_key = (
label_key
if not hasattr(self.diffusion_model, 'cond_stage_key')
else self.diffusion_model.cond_stage_key
)
assert self.label_key is not None, 'label_key neither in diffusion model nor in model.params'
assert (
self.label_key is not None
), 'label_key neither in diffusion model nor in model.params'
if self.label_key not in __models__:
raise NotImplementedError()
@@ -68,22 +80,27 @@ class NoisyLatentImageClassifier(pl.LightningModule):
self.weight_decay = weight_decay
def init_from_ckpt(self, path, ignore_keys=list(), only_model=False):
sd = torch.load(path, map_location="cpu")
if "state_dict" in list(sd.keys()):
sd = sd["state_dict"]
sd = torch.load(path, map_location='cpu')
if 'state_dict' in list(sd.keys()):
sd = sd['state_dict']
keys = list(sd.keys())
for k in keys:
for ik in ignore_keys:
if k.startswith(ik):
print("Deleting key {} from state_dict.".format(k))
print('Deleting key {} from state_dict.'.format(k))
del sd[k]
missing, unexpected = self.load_state_dict(sd, strict=False) if not only_model else self.model.load_state_dict(
sd, strict=False)
print(f"Restored from {path} with {len(missing)} missing and {len(unexpected)} unexpected keys")
missing, unexpected = (
self.load_state_dict(sd, strict=False)
if not only_model
else self.model.load_state_dict(sd, strict=False)
)
print(
f'Restored from {path} with {len(missing)} missing and {len(unexpected)} unexpected keys'
)
if len(missing) > 0:
print(f"Missing Keys: {missing}")
print(f'Missing Keys: {missing}')
if len(unexpected) > 0:
print(f"Unexpected Keys: {unexpected}")
print(f'Unexpected Keys: {unexpected}')
def load_diffusion(self):
model = instantiate_from_config(self.diffusion_config)
@@ -93,17 +110,25 @@ class NoisyLatentImageClassifier(pl.LightningModule):
param.requires_grad = False
def load_classifier(self, ckpt_path, pool):
model_config = deepcopy(self.diffusion_config.params.unet_config.params)
model_config.in_channels = self.diffusion_config.params.unet_config.params.out_channels
model_config = deepcopy(
self.diffusion_config.params.unet_config.params
)
model_config.in_channels = (
self.diffusion_config.params.unet_config.params.out_channels
)
model_config.out_channels = self.num_classes
if self.label_key == 'class_label':
model_config.pool = pool
self.model = __models__[self.label_key](**model_config)
if ckpt_path is not None:
print('#####################################################################')
print(
'#####################################################################'
)
print(f'load from ckpt "{ckpt_path}"')
print('#####################################################################')
print(
'#####################################################################'
)
self.init_from_ckpt(ckpt_path)
@torch.no_grad()
@@ -111,11 +136,19 @@ class NoisyLatentImageClassifier(pl.LightningModule):
noise = default(noise, lambda: torch.randn_like(x))
continuous_sqrt_alpha_cumprod = None
if self.diffusion_model.use_continuous_noise:
continuous_sqrt_alpha_cumprod = self.diffusion_model.sample_continuous_noise_level(x.shape[0], t + 1)
continuous_sqrt_alpha_cumprod = (
self.diffusion_model.sample_continuous_noise_level(
x.shape[0], t + 1
)
)
# todo: make sure t+1 is correct here
return self.diffusion_model.q_sample(x_start=x, t=t, noise=noise,
continuous_sqrt_alpha_cumprod=continuous_sqrt_alpha_cumprod)
return self.diffusion_model.q_sample(
x_start=x,
t=t,
noise=noise,
continuous_sqrt_alpha_cumprod=continuous_sqrt_alpha_cumprod,
)
def forward(self, x_noisy, t, *args, **kwargs):
return self.model(x_noisy, t)
@@ -141,17 +174,21 @@ class NoisyLatentImageClassifier(pl.LightningModule):
targets = rearrange(targets, 'b h w c -> b c h w')
for down in range(self.numd):
h, w = targets.shape[-2:]
targets = F.interpolate(targets, size=(h // 2, w // 2), mode='nearest')
targets = F.interpolate(
targets, size=(h // 2, w // 2), mode='nearest'
)
# targets = rearrange(targets,'b c h w -> b h w c')
return targets
def compute_top_k(self, logits, labels, k, reduction="mean"):
def compute_top_k(self, logits, labels, k, reduction='mean'):
_, top_ks = torch.topk(logits, k, dim=1)
if reduction == "mean":
return (top_ks == labels[:, None]).float().sum(dim=-1).mean().item()
elif reduction == "none":
if reduction == 'mean':
return (
(top_ks == labels[:, None]).float().sum(dim=-1).mean().item()
)
elif reduction == 'none':
return (top_ks == labels[:, None]).float().sum(dim=-1)
def on_train_epoch_start(self):
@@ -162,29 +199,59 @@ class NoisyLatentImageClassifier(pl.LightningModule):
def write_logs(self, loss, logits, targets):
log_prefix = 'train' if self.training else 'val'
log = {}
log[f"{log_prefix}/loss"] = loss.mean()
log[f"{log_prefix}/acc@1"] = self.compute_top_k(
logits, targets, k=1, reduction="mean"
log[f'{log_prefix}/loss'] = loss.mean()
log[f'{log_prefix}/acc@1'] = self.compute_top_k(
logits, targets, k=1, reduction='mean'
)
log[f"{log_prefix}/acc@5"] = self.compute_top_k(
logits, targets, k=5, reduction="mean"
log[f'{log_prefix}/acc@5'] = self.compute_top_k(
logits, targets, k=5, reduction='mean'
)
self.log_dict(log, prog_bar=False, logger=True, on_step=self.training, on_epoch=True)
self.log('loss', log[f"{log_prefix}/loss"], prog_bar=True, logger=False)
self.log('global_step', self.global_step, logger=False, on_epoch=False, prog_bar=True)
self.log_dict(
log,
prog_bar=False,
logger=True,
on_step=self.training,
on_epoch=True,
)
self.log(
'loss', log[f'{log_prefix}/loss'], prog_bar=True, logger=False
)
self.log(
'global_step',
self.global_step,
logger=False,
on_epoch=False,
prog_bar=True,
)
lr = self.optimizers().param_groups[0]['lr']
self.log('lr_abs', lr, on_step=True, logger=True, on_epoch=False, prog_bar=True)
self.log(
'lr_abs',
lr,
on_step=True,
logger=True,
on_epoch=False,
prog_bar=True,
)
def shared_step(self, batch, t=None):
x, *_ = self.diffusion_model.get_input(batch, k=self.diffusion_model.first_stage_key)
x, *_ = self.diffusion_model.get_input(
batch, k=self.diffusion_model.first_stage_key
)
targets = self.get_conditioning(batch)
if targets.dim() == 4:
targets = targets.argmax(dim=1)
if t is None:
t = torch.randint(0, self.diffusion_model.num_timesteps, (x.shape[0],), device=self.device).long()
t = torch.randint(
0,
self.diffusion_model.num_timesteps,
(x.shape[0],),
device=self.device,
).long()
else:
t = torch.full(size=(x.shape[0],), fill_value=t, device=self.device).long()
t = torch.full(
size=(x.shape[0],), fill_value=t, device=self.device
).long()
x_noisy = self.get_x_noisy(x, t)
logits = self(x_noisy, t)
@@ -200,8 +267,14 @@ class NoisyLatentImageClassifier(pl.LightningModule):
return loss
def reset_noise_accs(self):
self.noisy_acc = {t: {'acc@1': [], 'acc@5': []} for t in
range(0, self.diffusion_model.num_timesteps, self.diffusion_model.log_every_t)}
self.noisy_acc = {
t: {'acc@1': [], 'acc@5': []}
for t in range(
0,
self.diffusion_model.num_timesteps,
self.diffusion_model.log_every_t,
)
}
def on_validation_start(self):
self.reset_noise_accs()
@@ -212,24 +285,35 @@ class NoisyLatentImageClassifier(pl.LightningModule):
for t in self.noisy_acc:
_, logits, _, targets = self.shared_step(batch, t)
self.noisy_acc[t]['acc@1'].append(self.compute_top_k(logits, targets, k=1, reduction='mean'))
self.noisy_acc[t]['acc@5'].append(self.compute_top_k(logits, targets, k=5, reduction='mean'))
self.noisy_acc[t]['acc@1'].append(
self.compute_top_k(logits, targets, k=1, reduction='mean')
)
self.noisy_acc[t]['acc@5'].append(
self.compute_top_k(logits, targets, k=5, reduction='mean')
)
return loss
def configure_optimizers(self):
optimizer = AdamW(self.model.parameters(), lr=self.learning_rate, weight_decay=self.weight_decay)
optimizer = AdamW(
self.model.parameters(),
lr=self.learning_rate,
weight_decay=self.weight_decay,
)
if self.use_scheduler:
scheduler = instantiate_from_config(self.scheduler_config)
print("Setting up LambdaLR scheduler...")
print('Setting up LambdaLR scheduler...')
scheduler = [
{
'scheduler': LambdaLR(optimizer, lr_lambda=scheduler.schedule),
'scheduler': LambdaLR(
optimizer, lr_lambda=scheduler.schedule
),
'interval': 'step',
'frequency': 1
}]
'frequency': 1,
}
]
return [optimizer], scheduler
return optimizer
@@ -243,7 +327,7 @@ class NoisyLatentImageClassifier(pl.LightningModule):
y = self.get_conditioning(batch)
if self.label_key == 'class_label':
y = log_txt_as_img((x.shape[2], x.shape[3]), batch["human_label"])
y = log_txt_as_img((x.shape[2], x.shape[3]), batch['human_label'])
log['labels'] = y
if ismap(y):
@@ -256,10 +340,14 @@ class NoisyLatentImageClassifier(pl.LightningModule):
log[f'inputs@t{current_time}'] = x_noisy
pred = F.one_hot(logits.argmax(dim=1), num_classes=self.num_classes)
pred = F.one_hot(
logits.argmax(dim=1), num_classes=self.num_classes
)
pred = rearrange(pred, 'b h w c -> b c h w')
log[f'pred@t{current_time}'] = self.diffusion_model.to_rgb(pred)
log[f'pred@t{current_time}'] = self.diffusion_model.to_rgb(
pred
)
for key in log:
log[key] = log[key][:N]

View File

@@ -4,88 +4,146 @@ import torch
import numpy as np
from tqdm import tqdm
from functools import partial
from ldm.dream.devices import choose_torch_device
from ldm.modules.diffusionmodules.util import make_ddim_sampling_parameters, make_ddim_timesteps, noise_like, \
extract_into_tensor
from ldm.modules.diffusionmodules.util import (
make_ddim_sampling_parameters,
make_ddim_timesteps,
noise_like,
extract_into_tensor,
)
class DDIMSampler(object):
def __init__(self, model, schedule="linear", **kwargs):
def __init__(self, model, schedule='linear', device=None, **kwargs):
super().__init__()
self.model = model
self.ddpm_num_timesteps = model.num_timesteps
self.schedule = schedule
self.device = device or choose_torch_device()
def register_buffer(self, name, attr):
if type(attr) == torch.Tensor:
if attr.device != torch.device("cuda"):
attr = attr.to(torch.device("cuda"))
if attr.device != torch.device(self.device):
attr = attr.to(dtype=torch.float32, device=self.device)
setattr(self, name, attr)
def make_schedule(self, ddim_num_steps, ddim_discretize="uniform", ddim_eta=0., verbose=True):
self.ddim_timesteps = make_ddim_timesteps(ddim_discr_method=ddim_discretize, num_ddim_timesteps=ddim_num_steps,
num_ddpm_timesteps=self.ddpm_num_timesteps,verbose=verbose)
def make_schedule(
self,
ddim_num_steps,
ddim_discretize='uniform',
ddim_eta=0.0,
verbose=True,
):
self.ddim_timesteps = make_ddim_timesteps(
ddim_discr_method=ddim_discretize,
num_ddim_timesteps=ddim_num_steps,
num_ddpm_timesteps=self.ddpm_num_timesteps,
verbose=verbose,
)
alphas_cumprod = self.model.alphas_cumprod
assert alphas_cumprod.shape[0] == self.ddpm_num_timesteps, 'alphas have to be defined for each timestep'
to_torch = lambda x: x.clone().detach().to(torch.float32).to(self.model.device)
assert (
alphas_cumprod.shape[0] == self.ddpm_num_timesteps
), 'alphas have to be defined for each timestep'
to_torch = (
lambda x: x.clone()
.detach()
.to(torch.float32)
.to(self.model.device)
)
self.register_buffer('betas', to_torch(self.model.betas))
self.register_buffer('alphas_cumprod', to_torch(alphas_cumprod))
self.register_buffer('alphas_cumprod_prev', to_torch(self.model.alphas_cumprod_prev))
self.register_buffer(
'alphas_cumprod_prev', to_torch(self.model.alphas_cumprod_prev)
)
# calculations for diffusion q(x_t | x_{t-1}) and others
self.register_buffer('sqrt_alphas_cumprod', to_torch(np.sqrt(alphas_cumprod.cpu())))
self.register_buffer('sqrt_one_minus_alphas_cumprod', to_torch(np.sqrt(1. - alphas_cumprod.cpu())))
self.register_buffer('log_one_minus_alphas_cumprod', to_torch(np.log(1. - alphas_cumprod.cpu())))
self.register_buffer('sqrt_recip_alphas_cumprod', to_torch(np.sqrt(1. / alphas_cumprod.cpu())))
self.register_buffer('sqrt_recipm1_alphas_cumprod', to_torch(np.sqrt(1. / alphas_cumprod.cpu() - 1)))
self.register_buffer(
'sqrt_alphas_cumprod', to_torch(np.sqrt(alphas_cumprod.cpu()))
)
self.register_buffer(
'sqrt_one_minus_alphas_cumprod',
to_torch(np.sqrt(1.0 - alphas_cumprod.cpu())),
)
self.register_buffer(
'log_one_minus_alphas_cumprod',
to_torch(np.log(1.0 - alphas_cumprod.cpu())),
)
self.register_buffer(
'sqrt_recip_alphas_cumprod',
to_torch(np.sqrt(1.0 / alphas_cumprod.cpu())),
)
self.register_buffer(
'sqrt_recipm1_alphas_cumprod',
to_torch(np.sqrt(1.0 / alphas_cumprod.cpu() - 1)),
)
# ddim sampling parameters
ddim_sigmas, ddim_alphas, ddim_alphas_prev = make_ddim_sampling_parameters(alphacums=alphas_cumprod.cpu(),
ddim_timesteps=self.ddim_timesteps,
eta=ddim_eta,verbose=verbose)
(
ddim_sigmas,
ddim_alphas,
ddim_alphas_prev,
) = make_ddim_sampling_parameters(
alphacums=alphas_cumprod.cpu(),
ddim_timesteps=self.ddim_timesteps,
eta=ddim_eta,
verbose=verbose,
)
self.register_buffer('ddim_sigmas', ddim_sigmas)
self.register_buffer('ddim_alphas', ddim_alphas)
self.register_buffer('ddim_alphas_prev', ddim_alphas_prev)
self.register_buffer('ddim_sqrt_one_minus_alphas', np.sqrt(1. - ddim_alphas))
self.register_buffer(
'ddim_sqrt_one_minus_alphas', np.sqrt(1.0 - ddim_alphas)
)
sigmas_for_original_sampling_steps = ddim_eta * torch.sqrt(
(1 - self.alphas_cumprod_prev) / (1 - self.alphas_cumprod) * (
1 - self.alphas_cumprod / self.alphas_cumprod_prev))
self.register_buffer('ddim_sigmas_for_original_num_steps', sigmas_for_original_sampling_steps)
(1 - self.alphas_cumprod_prev)
/ (1 - self.alphas_cumprod)
* (1 - self.alphas_cumprod / self.alphas_cumprod_prev)
)
self.register_buffer(
'ddim_sigmas_for_original_num_steps',
sigmas_for_original_sampling_steps,
)
@torch.no_grad()
def sample(self,
S,
batch_size,
shape,
conditioning=None,
callback=None,
normals_sequence=None,
img_callback=None,
quantize_x0=False,
eta=0.,
mask=None,
x0=None,
temperature=1.,
noise_dropout=0.,
score_corrector=None,
corrector_kwargs=None,
verbose=True,
x_T=None,
log_every_t=100,
unconditional_guidance_scale=1.,
unconditional_conditioning=None,
# this has to come in the same format as the conditioning, # e.g. as encoded tokens, ...
**kwargs
):
def sample(
self,
S,
batch_size,
shape,
conditioning=None,
callback=None,
normals_sequence=None,
img_callback=None,
quantize_x0=False,
eta=0.0,
mask=None,
x0=None,
temperature=1.0,
noise_dropout=0.0,
score_corrector=None,
corrector_kwargs=None,
verbose=True,
x_T=None,
log_every_t=100,
unconditional_guidance_scale=1.0,
unconditional_conditioning=None,
# this has to come in the same format as the conditioning, # e.g. as encoded tokens, ...
**kwargs,
):
if conditioning is not None:
if isinstance(conditioning, dict):
cbs = conditioning[list(conditioning.keys())[0]].shape[0]
if cbs != batch_size:
print(f"Warning: Got {cbs} conditionings but batch-size is {batch_size}")
print(
f'Warning: Got {cbs} conditionings but batch-size is {batch_size}'
)
else:
if conditioning.shape[0] != batch_size:
print(f"Warning: Got {conditioning.shape[0]} conditionings but batch-size is {batch_size}")
print(
f'Warning: Got {conditioning.shape[0]} conditionings but batch-size is {batch_size}'
)
self.make_schedule(ddim_num_steps=S, ddim_eta=eta, verbose=verbose)
# sampling
@@ -93,30 +151,48 @@ class DDIMSampler(object):
size = (batch_size, C, H, W)
print(f'Data shape for DDIM sampling is {size}, eta {eta}')
samples, intermediates = self.ddim_sampling(conditioning, size,
callback=callback,
img_callback=img_callback,
quantize_denoised=quantize_x0,
mask=mask, x0=x0,
ddim_use_original_steps=False,
noise_dropout=noise_dropout,
temperature=temperature,
score_corrector=score_corrector,
corrector_kwargs=corrector_kwargs,
x_T=x_T,
log_every_t=log_every_t,
unconditional_guidance_scale=unconditional_guidance_scale,
unconditional_conditioning=unconditional_conditioning,
)
samples, intermediates = self.ddim_sampling(
conditioning,
size,
callback=callback,
img_callback=img_callback,
quantize_denoised=quantize_x0,
mask=mask,
x0=x0,
ddim_use_original_steps=False,
noise_dropout=noise_dropout,
temperature=temperature,
score_corrector=score_corrector,
corrector_kwargs=corrector_kwargs,
x_T=x_T,
log_every_t=log_every_t,
unconditional_guidance_scale=unconditional_guidance_scale,
unconditional_conditioning=unconditional_conditioning,
)
return samples, intermediates
# This routine gets called from img2img
@torch.no_grad()
def ddim_sampling(self, cond, shape,
x_T=None, ddim_use_original_steps=False,
callback=None, timesteps=None, quantize_denoised=False,
mask=None, x0=None, img_callback=None, log_every_t=100,
temperature=1., noise_dropout=0., score_corrector=None, corrector_kwargs=None,
unconditional_guidance_scale=1., unconditional_conditioning=None,):
def ddim_sampling(
self,
cond,
shape,
x_T=None,
ddim_use_original_steps=False,
callback=None,
timesteps=None,
quantize_denoised=False,
mask=None,
x0=None,
img_callback=None,
log_every_t=100,
temperature=1.0,
noise_dropout=0.0,
score_corrector=None,
corrector_kwargs=None,
unconditional_guidance_scale=1.0,
unconditional_conditioning=None,
):
device = self.model.betas.device
b = shape[0]
if x_T is None:
@@ -125,17 +201,38 @@ class DDIMSampler(object):
img = x_T
if timesteps is None:
timesteps = self.ddpm_num_timesteps if ddim_use_original_steps else self.ddim_timesteps
timesteps = (
self.ddpm_num_timesteps
if ddim_use_original_steps
else self.ddim_timesteps
)
elif timesteps is not None and not ddim_use_original_steps:
subset_end = int(min(timesteps / self.ddim_timesteps.shape[0], 1) * self.ddim_timesteps.shape[0]) - 1
subset_end = (
int(
min(timesteps / self.ddim_timesteps.shape[0], 1)
* self.ddim_timesteps.shape[0]
)
- 1
)
timesteps = self.ddim_timesteps[:subset_end]
intermediates = {'x_inter': [img], 'pred_x0': [img]}
time_range = reversed(range(0,timesteps)) if ddim_use_original_steps else np.flip(timesteps)
total_steps = timesteps if ddim_use_original_steps else timesteps.shape[0]
print(f"Running DDIM Sampling with {total_steps} timesteps")
time_range = (
reversed(range(0, timesteps))
if ddim_use_original_steps
else np.flip(timesteps)
)
total_steps = (
timesteps if ddim_use_original_steps else timesteps.shape[0]
)
print(f'Running DDIM Sampling with {total_steps} timesteps')
iterator = tqdm(time_range, desc='DDIM Sampler', total=total_steps, dynamic_ncols=True)
iterator = tqdm(
time_range,
desc='DDIM Sampler',
total=total_steps,
dynamic_ncols=True,
)
for i, step in enumerate(iterator):
index = total_steps - i - 1
@@ -143,18 +240,30 @@ class DDIMSampler(object):
if mask is not None:
assert x0 is not None
img_orig = self.model.q_sample(x0, ts) # TODO: deterministic forward pass?
img = img_orig * mask + (1. - mask) * img
img_orig = self.model.q_sample(
x0, ts
) # TODO: deterministic forward pass?
img = img_orig * mask + (1.0 - mask) * img
outs = self.p_sample_ddim(img, cond, ts, index=index, use_original_steps=ddim_use_original_steps,
quantize_denoised=quantize_denoised, temperature=temperature,
noise_dropout=noise_dropout, score_corrector=score_corrector,
corrector_kwargs=corrector_kwargs,
unconditional_guidance_scale=unconditional_guidance_scale,
unconditional_conditioning=unconditional_conditioning)
outs = self.p_sample_ddim(
img,
cond,
ts,
index=index,
use_original_steps=ddim_use_original_steps,
quantize_denoised=quantize_denoised,
temperature=temperature,
noise_dropout=noise_dropout,
score_corrector=score_corrector,
corrector_kwargs=corrector_kwargs,
unconditional_guidance_scale=unconditional_guidance_scale,
unconditional_conditioning=unconditional_conditioning,
)
img, pred_x0 = outs
if callback: callback(i)
if img_callback: img_callback(pred_x0, i)
if callback:
callback(i)
if img_callback:
img_callback(pred_x0, i)
if index % log_every_t == 0 or index == total_steps - 1:
intermediates['x_inter'].append(img)
@@ -162,43 +271,84 @@ class DDIMSampler(object):
return img, intermediates
# This routine gets called from ddim_sampling() and decode()
@torch.no_grad()
def p_sample_ddim(self, x, c, t, index, repeat_noise=False, use_original_steps=False, quantize_denoised=False,
temperature=1., noise_dropout=0., score_corrector=None, corrector_kwargs=None,
unconditional_guidance_scale=1., unconditional_conditioning=None):
def p_sample_ddim(
self,
x,
c,
t,
index,
repeat_noise=False,
use_original_steps=False,
quantize_denoised=False,
temperature=1.0,
noise_dropout=0.0,
score_corrector=None,
corrector_kwargs=None,
unconditional_guidance_scale=1.0,
unconditional_conditioning=None,
):
b, *_, device = *x.shape, x.device
if unconditional_conditioning is None or unconditional_guidance_scale == 1.:
if (
unconditional_conditioning is None
or unconditional_guidance_scale == 1.0
):
e_t = self.model.apply_model(x, t, c)
else:
x_in = torch.cat([x] * 2)
t_in = torch.cat([t] * 2)
c_in = torch.cat([unconditional_conditioning, c])
e_t_uncond, e_t = self.model.apply_model(x_in, t_in, c_in).chunk(2)
e_t = e_t_uncond + unconditional_guidance_scale * (e_t - e_t_uncond)
e_t = e_t_uncond + unconditional_guidance_scale * (
e_t - e_t_uncond
)
if score_corrector is not None:
assert self.model.parameterization == "eps"
e_t = score_corrector.modify_score(self.model, e_t, x, t, c, **corrector_kwargs)
assert self.model.parameterization == 'eps'
e_t = score_corrector.modify_score(
self.model, e_t, x, t, c, **corrector_kwargs
)
alphas = self.model.alphas_cumprod if use_original_steps else self.ddim_alphas
alphas_prev = self.model.alphas_cumprod_prev if use_original_steps else self.ddim_alphas_prev
sqrt_one_minus_alphas = self.model.sqrt_one_minus_alphas_cumprod if use_original_steps else self.ddim_sqrt_one_minus_alphas
sigmas = self.model.ddim_sigmas_for_original_num_steps if use_original_steps else self.ddim_sigmas
alphas = (
self.model.alphas_cumprod
if use_original_steps
else self.ddim_alphas
)
alphas_prev = (
self.model.alphas_cumprod_prev
if use_original_steps
else self.ddim_alphas_prev
)
sqrt_one_minus_alphas = (
self.model.sqrt_one_minus_alphas_cumprod
if use_original_steps
else self.ddim_sqrt_one_minus_alphas
)
sigmas = (
self.model.ddim_sigmas_for_original_num_steps
if use_original_steps
else self.ddim_sigmas
)
# select parameters corresponding to the currently considered timestep
a_t = torch.full((b, 1, 1, 1), alphas[index], device=device)
a_prev = torch.full((b, 1, 1, 1), alphas_prev[index], device=device)
sigma_t = torch.full((b, 1, 1, 1), sigmas[index], device=device)
sqrt_one_minus_at = torch.full((b, 1, 1, 1), sqrt_one_minus_alphas[index],device=device)
sqrt_one_minus_at = torch.full(
(b, 1, 1, 1), sqrt_one_minus_alphas[index], device=device
)
# current prediction for x_0
pred_x0 = (x - sqrt_one_minus_at * e_t) / a_t.sqrt()
if quantize_denoised:
pred_x0, _, *_ = self.model.first_stage_model.quantize(pred_x0)
# direction pointing to x_t
dir_xt = (1. - a_prev - sigma_t**2).sqrt() * e_t
noise = sigma_t * noise_like(x.shape, device, repeat_noise) * temperature
if noise_dropout > 0.:
dir_xt = (1.0 - a_prev - sigma_t**2).sqrt() * e_t
noise = (
sigma_t * noise_like(x.shape, device, repeat_noise) * temperature
)
if noise_dropout > 0.0:
noise = torch.nn.functional.dropout(noise, p=noise_dropout)
x_prev = a_prev.sqrt() * pred_x0 + dir_xt + noise
return x_prev, pred_x0
@@ -216,26 +366,68 @@ class DDIMSampler(object):
if noise is None:
noise = torch.randn_like(x0)
return (extract_into_tensor(sqrt_alphas_cumprod, t, x0.shape) * x0 +
extract_into_tensor(sqrt_one_minus_alphas_cumprod, t, x0.shape) * noise)
return (
extract_into_tensor(sqrt_alphas_cumprod, t, x0.shape) * x0
+ extract_into_tensor(sqrt_one_minus_alphas_cumprod, t, x0.shape)
* noise
)
@torch.no_grad()
def decode(self, x_latent, cond, t_start, unconditional_guidance_scale=1.0, unconditional_conditioning=None,
use_original_steps=False):
def decode(
self,
x_latent,
cond,
t_start,
img_callback=None,
unconditional_guidance_scale=1.0,
unconditional_conditioning=None,
use_original_steps=False,
init_latent = None,
mask = None,
):
timesteps = np.arange(self.ddpm_num_timesteps) if use_original_steps else self.ddim_timesteps
timesteps = (
np.arange(self.ddpm_num_timesteps)
if use_original_steps
else self.ddim_timesteps
)
timesteps = timesteps[:t_start]
time_range = np.flip(timesteps)
total_steps = timesteps.shape[0]
print(f"Running DDIM Sampling with {total_steps} timesteps")
print(f'Running DDIM Sampling with {total_steps} timesteps')
iterator = tqdm(time_range, desc='Decoding image', total=total_steps)
x_dec = x_latent
x0 = init_latent
for i, step in enumerate(iterator):
index = total_steps - i - 1
ts = torch.full((x_latent.shape[0],), step, device=x_latent.device, dtype=torch.long)
x_dec, _ = self.p_sample_ddim(x_dec, cond, ts, index=index, use_original_steps=use_original_steps,
unconditional_guidance_scale=unconditional_guidance_scale,
unconditional_conditioning=unconditional_conditioning)
ts = torch.full(
(x_latent.shape[0],),
step,
device=x_latent.device,
dtype=torch.long,
)
if mask is not None:
assert x0 is not None
xdec_orig = self.model.q_sample(
x0, ts
) # TODO: deterministic forward pass?
x_dec = xdec_orig * mask + (1.0 - mask) * x_dec
x_dec, _ = self.p_sample_ddim(
x_dec,
cond,
ts,
index=index,
use_original_steps=use_original_steps,
unconditional_guidance_scale=unconditional_guidance_scale,
unconditional_conditioning=unconditional_conditioning,
)
if img_callback:
img_callback(x_dec, i)
return x_dec

File diff suppressed because it is too large Load Diff

View File

@@ -1,8 +1,8 @@
'''wrapper around part of Karen Crownson's k-duffsion library, making it call compatible with other Samplers'''
"""wrapper around part of Katherine Crowson's k-diffusion library, making it call compatible with other Samplers"""
import k_diffusion as K
import torch
import torch.nn as nn
import accelerate
from ldm.dream.devices import choose_torch_device
class CFGDenoiser(nn.Module):
def __init__(self, model):
@@ -16,59 +16,73 @@ class CFGDenoiser(nn.Module):
uncond, cond = self.inner_model(x_in, sigma_in, cond=cond_in).chunk(2)
return uncond + (cond - uncond) * cond_scale
class KSampler(object):
def __init__(self,model,schedule="lms", **kwargs):
def __init__(self, model, schedule='lms', device=None, **kwargs):
super().__init__()
self.model = K.external.CompVisDenoiser(model)
self.accelerator = accelerate.Accelerator()
self.device = self.accelerator.device
self.model = K.external.CompVisDenoiser(model)
self.schedule = schedule
self.device = device or choose_torch_device()
def forward(self, x, sigma, uncond, cond, cond_scale):
x_in = torch.cat([x] * 2)
sigma_in = torch.cat([sigma] * 2)
cond_in = torch.cat([uncond, cond])
uncond, cond = self.inner_model(x_in, sigma_in, cond=cond_in).chunk(2)
uncond, cond = self.inner_model(
x_in, sigma_in, cond=cond_in
).chunk(2)
return uncond + (cond - uncond) * cond_scale
# most of these arguments are ignored and are only present for compatibility with
# other samples
@torch.no_grad()
def sample(self,
S,
batch_size,
shape,
conditioning=None,
callback=None,
normals_sequence=None,
img_callback=None,
quantize_x0=False,
eta=0.,
mask=None,
x0=None,
temperature=1.,
noise_dropout=0.,
score_corrector=None,
corrector_kwargs=None,
verbose=True,
x_T=None,
log_every_t=100,
unconditional_guidance_scale=1.,
unconditional_conditioning=None,
# this has to come in the same format as the conditioning, # e.g. as encoded tokens, ...
**kwargs
):
def sample(
self,
S,
batch_size,
shape,
conditioning=None,
callback=None,
normals_sequence=None,
img_callback=None,
quantize_x0=False,
eta=0.0,
mask=None,
x0=None,
temperature=1.0,
noise_dropout=0.0,
score_corrector=None,
corrector_kwargs=None,
verbose=True,
x_T=None,
log_every_t=100,
unconditional_guidance_scale=1.0,
unconditional_conditioning=None,
# this has to come in the same format as the conditioning, # e.g. as encoded tokens, ...
**kwargs,
):
def route_callback(k_callback_values):
if img_callback is not None:
img_callback(k_callback_values['x'], k_callback_values['i'])
sigmas = self.model.get_sigmas(S)
if x_T:
x = x_T
if x_T is not None:
x = x_T * sigmas[0]
else:
x = torch.randn([batch_size, *shape], device=self.device) * sigmas[0] # for GPU draw
x = (
torch.randn([batch_size, *shape], device=self.device)
* sigmas[0]
) # for GPU draw
model_wrap_cfg = CFGDenoiser(self.model)
extra_args = {'cond': conditioning, 'uncond': unconditional_conditioning, 'cond_scale': unconditional_guidance_scale}
return (K.sampling.sample_lms(model_wrap_cfg, x, sigmas, extra_args=extra_args, disable=not self.accelerator.is_main_process),
None)
def gather(samples_ddim):
return self.accelerator.gather(samples_ddim)
extra_args = {
'cond': conditioning,
'uncond': unconditional_conditioning,
'cond_scale': unconditional_guidance_scale,
}
return (
K.sampling.__dict__[f'sample_{self.schedule}'](
model_wrap_cfg, x, sigmas, extra_args=extra_args,
callback=route_callback
),
None,
)

View File

@@ -4,120 +4,195 @@ import torch
import numpy as np
from tqdm import tqdm
from functools import partial
from ldm.dream.devices import choose_torch_device
from ldm.modules.diffusionmodules.util import make_ddim_sampling_parameters, make_ddim_timesteps, noise_like
from ldm.modules.diffusionmodules.util import (
make_ddim_sampling_parameters,
make_ddim_timesteps,
noise_like,
)
class PLMSSampler(object):
def __init__(self, model, schedule="linear", **kwargs):
def __init__(self, model, schedule='linear', device=None, **kwargs):
super().__init__()
self.model = model
self.ddpm_num_timesteps = model.num_timesteps
self.schedule = schedule
self.device = device if device else choose_torch_device()
def register_buffer(self, name, attr):
if type(attr) == torch.Tensor:
if attr.device != torch.device("cuda"):
attr = attr.to(torch.device("cuda"))
if attr.device != torch.device(self.device):
attr = attr.to(torch.float32).to(torch.device(self.device))
setattr(self, name, attr)
def make_schedule(self, ddim_num_steps, ddim_discretize="uniform", ddim_eta=0., verbose=True):
def make_schedule(
self,
ddim_num_steps,
ddim_discretize='uniform',
ddim_eta=0.0,
verbose=True,
):
if ddim_eta != 0:
raise ValueError('ddim_eta must be 0 for PLMS')
self.ddim_timesteps = make_ddim_timesteps(ddim_discr_method=ddim_discretize, num_ddim_timesteps=ddim_num_steps,
num_ddpm_timesteps=self.ddpm_num_timesteps,verbose=verbose)
self.ddim_timesteps = make_ddim_timesteps(
ddim_discr_method=ddim_discretize,
num_ddim_timesteps=ddim_num_steps,
num_ddpm_timesteps=self.ddpm_num_timesteps,
verbose=verbose,
)
alphas_cumprod = self.model.alphas_cumprod
assert alphas_cumprod.shape[0] == self.ddpm_num_timesteps, 'alphas have to be defined for each timestep'
to_torch = lambda x: x.clone().detach().to(torch.float32).to(self.model.device)
assert (
alphas_cumprod.shape[0] == self.ddpm_num_timesteps
), 'alphas have to be defined for each timestep'
to_torch = (
lambda x: x.clone()
.detach()
.to(torch.float32)
.to(self.model.device)
)
self.register_buffer('betas', to_torch(self.model.betas))
self.register_buffer('alphas_cumprod', to_torch(alphas_cumprod))
self.register_buffer('alphas_cumprod_prev', to_torch(self.model.alphas_cumprod_prev))
self.register_buffer(
'alphas_cumprod_prev', to_torch(self.model.alphas_cumprod_prev)
)
# calculations for diffusion q(x_t | x_{t-1}) and others
self.register_buffer('sqrt_alphas_cumprod', to_torch(np.sqrt(alphas_cumprod.cpu())))
self.register_buffer('sqrt_one_minus_alphas_cumprod', to_torch(np.sqrt(1. - alphas_cumprod.cpu())))
self.register_buffer('log_one_minus_alphas_cumprod', to_torch(np.log(1. - alphas_cumprod.cpu())))
self.register_buffer('sqrt_recip_alphas_cumprod', to_torch(np.sqrt(1. / alphas_cumprod.cpu())))
self.register_buffer('sqrt_recipm1_alphas_cumprod', to_torch(np.sqrt(1. / alphas_cumprod.cpu() - 1)))
self.register_buffer(
'sqrt_alphas_cumprod', to_torch(np.sqrt(alphas_cumprod.cpu()))
)
self.register_buffer(
'sqrt_one_minus_alphas_cumprod',
to_torch(np.sqrt(1.0 - alphas_cumprod.cpu())),
)
self.register_buffer(
'log_one_minus_alphas_cumprod',
to_torch(np.log(1.0 - alphas_cumprod.cpu())),
)
self.register_buffer(
'sqrt_recip_alphas_cumprod',
to_torch(np.sqrt(1.0 / alphas_cumprod.cpu())),
)
self.register_buffer(
'sqrt_recipm1_alphas_cumprod',
to_torch(np.sqrt(1.0 / alphas_cumprod.cpu() - 1)),
)
# ddim sampling parameters
ddim_sigmas, ddim_alphas, ddim_alphas_prev = make_ddim_sampling_parameters(alphacums=alphas_cumprod.cpu(),
ddim_timesteps=self.ddim_timesteps,
eta=ddim_eta,verbose=verbose)
(
ddim_sigmas,
ddim_alphas,
ddim_alphas_prev,
) = make_ddim_sampling_parameters(
alphacums=alphas_cumprod.cpu(),
ddim_timesteps=self.ddim_timesteps,
eta=ddim_eta,
verbose=verbose,
)
self.register_buffer('ddim_sigmas', ddim_sigmas)
self.register_buffer('ddim_alphas', ddim_alphas)
self.register_buffer('ddim_alphas_prev', ddim_alphas_prev)
self.register_buffer('ddim_sqrt_one_minus_alphas', np.sqrt(1. - ddim_alphas))
self.register_buffer(
'ddim_sqrt_one_minus_alphas', np.sqrt(1.0 - ddim_alphas)
)
sigmas_for_original_sampling_steps = ddim_eta * torch.sqrt(
(1 - self.alphas_cumprod_prev) / (1 - self.alphas_cumprod) * (
1 - self.alphas_cumprod / self.alphas_cumprod_prev))
self.register_buffer('ddim_sigmas_for_original_num_steps', sigmas_for_original_sampling_steps)
(1 - self.alphas_cumprod_prev)
/ (1 - self.alphas_cumprod)
* (1 - self.alphas_cumprod / self.alphas_cumprod_prev)
)
self.register_buffer(
'ddim_sigmas_for_original_num_steps',
sigmas_for_original_sampling_steps,
)
@torch.no_grad()
def sample(self,
S,
batch_size,
shape,
conditioning=None,
callback=None,
normals_sequence=None,
img_callback=None,
quantize_x0=False,
eta=0.,
mask=None,
x0=None,
temperature=1.,
noise_dropout=0.,
score_corrector=None,
corrector_kwargs=None,
verbose=True,
x_T=None,
log_every_t=100,
unconditional_guidance_scale=1.,
unconditional_conditioning=None,
# this has to come in the same format as the conditioning, # e.g. as encoded tokens, ...
**kwargs
):
def sample(
self,
S,
batch_size,
shape,
conditioning=None,
callback=None,
normals_sequence=None,
img_callback=None,
quantize_x0=False,
eta=0.0,
mask=None,
x0=None,
temperature=1.0,
noise_dropout=0.0,
score_corrector=None,
corrector_kwargs=None,
verbose=True,
x_T=None,
log_every_t=100,
unconditional_guidance_scale=1.0,
unconditional_conditioning=None,
# this has to come in the same format as the conditioning, # e.g. as encoded tokens, ...
**kwargs,
):
if conditioning is not None:
if isinstance(conditioning, dict):
cbs = conditioning[list(conditioning.keys())[0]].shape[0]
if cbs != batch_size:
print(f"Warning: Got {cbs} conditionings but batch-size is {batch_size}")
print(
f'Warning: Got {cbs} conditionings but batch-size is {batch_size}'
)
else:
if conditioning.shape[0] != batch_size:
print(f"Warning: Got {conditioning.shape[0]} conditionings but batch-size is {batch_size}")
print(
f'Warning: Got {conditioning.shape[0]} conditionings but batch-size is {batch_size}'
)
self.make_schedule(ddim_num_steps=S, ddim_eta=eta, verbose=verbose)
# sampling
C, H, W = shape
size = (batch_size, C, H, W)
# print(f'Data shape for PLMS sampling is {size}')
# print(f'Data shape for PLMS sampling is {size}')
samples, intermediates = self.plms_sampling(conditioning, size,
callback=callback,
img_callback=img_callback,
quantize_denoised=quantize_x0,
mask=mask, x0=x0,
ddim_use_original_steps=False,
noise_dropout=noise_dropout,
temperature=temperature,
score_corrector=score_corrector,
corrector_kwargs=corrector_kwargs,
x_T=x_T,
log_every_t=log_every_t,
unconditional_guidance_scale=unconditional_guidance_scale,
unconditional_conditioning=unconditional_conditioning,
)
samples, intermediates = self.plms_sampling(
conditioning,
size,
callback=callback,
img_callback=img_callback,
quantize_denoised=quantize_x0,
mask=mask,
x0=x0,
ddim_use_original_steps=False,
noise_dropout=noise_dropout,
temperature=temperature,
score_corrector=score_corrector,
corrector_kwargs=corrector_kwargs,
x_T=x_T,
log_every_t=log_every_t,
unconditional_guidance_scale=unconditional_guidance_scale,
unconditional_conditioning=unconditional_conditioning,
)
return samples, intermediates
@torch.no_grad()
def plms_sampling(self, cond, shape,
x_T=None, ddim_use_original_steps=False,
callback=None, timesteps=None, quantize_denoised=False,
mask=None, x0=None, img_callback=None, log_every_t=100,
temperature=1., noise_dropout=0., score_corrector=None, corrector_kwargs=None,
unconditional_guidance_scale=1., unconditional_conditioning=None,):
def plms_sampling(
self,
cond,
shape,
x_T=None,
ddim_use_original_steps=False,
callback=None,
timesteps=None,
quantize_denoised=False,
mask=None,
x0=None,
img_callback=None,
log_every_t=100,
temperature=1.0,
noise_dropout=0.0,
score_corrector=None,
corrector_kwargs=None,
unconditional_guidance_scale=1.0,
unconditional_conditioning=None,
):
device = self.model.betas.device
b = shape[0]
if x_T is None:
@@ -126,42 +201,81 @@ class PLMSSampler(object):
img = x_T
if timesteps is None:
timesteps = self.ddpm_num_timesteps if ddim_use_original_steps else self.ddim_timesteps
timesteps = (
self.ddpm_num_timesteps
if ddim_use_original_steps
else self.ddim_timesteps
)
elif timesteps is not None and not ddim_use_original_steps:
subset_end = int(min(timesteps / self.ddim_timesteps.shape[0], 1) * self.ddim_timesteps.shape[0]) - 1
subset_end = (
int(
min(timesteps / self.ddim_timesteps.shape[0], 1)
* self.ddim_timesteps.shape[0]
)
- 1
)
timesteps = self.ddim_timesteps[:subset_end]
intermediates = {'x_inter': [img], 'pred_x0': [img]}
time_range = list(reversed(range(0,timesteps))) if ddim_use_original_steps else np.flip(timesteps)
total_steps = timesteps if ddim_use_original_steps else timesteps.shape[0]
# print(f"Running PLMS Sampling with {total_steps} timesteps")
time_range = (
list(reversed(range(0, timesteps)))
if ddim_use_original_steps
else np.flip(timesteps)
)
total_steps = (
timesteps if ddim_use_original_steps else timesteps.shape[0]
)
# print(f"Running PLMS Sampling with {total_steps} timesteps")
iterator = tqdm(time_range, desc='PLMS Sampler', total=total_steps, dynamic_ncols=True)
iterator = tqdm(
time_range,
desc='PLMS Sampler',
total=total_steps,
dynamic_ncols=True,
)
old_eps = []
for i, step in enumerate(iterator):
index = total_steps - i - 1
ts = torch.full((b,), step, device=device, dtype=torch.long)
ts_next = torch.full((b,), time_range[min(i + 1, len(time_range) - 1)], device=device, dtype=torch.long)
ts_next = torch.full(
(b,),
time_range[min(i + 1, len(time_range) - 1)],
device=device,
dtype=torch.long,
)
if mask is not None:
assert x0 is not None
img_orig = self.model.q_sample(x0, ts) # TODO: deterministic forward pass?
img = img_orig * mask + (1. - mask) * img
img_orig = self.model.q_sample(
x0, ts
) # TODO: deterministic forward pass?
img = img_orig * mask + (1.0 - mask) * img
outs = self.p_sample_plms(img, cond, ts, index=index, use_original_steps=ddim_use_original_steps,
quantize_denoised=quantize_denoised, temperature=temperature,
noise_dropout=noise_dropout, score_corrector=score_corrector,
corrector_kwargs=corrector_kwargs,
unconditional_guidance_scale=unconditional_guidance_scale,
unconditional_conditioning=unconditional_conditioning,
old_eps=old_eps, t_next=ts_next)
outs = self.p_sample_plms(
img,
cond,
ts,
index=index,
use_original_steps=ddim_use_original_steps,
quantize_denoised=quantize_denoised,
temperature=temperature,
noise_dropout=noise_dropout,
score_corrector=score_corrector,
corrector_kwargs=corrector_kwargs,
unconditional_guidance_scale=unconditional_guidance_scale,
unconditional_conditioning=unconditional_conditioning,
old_eps=old_eps,
t_next=ts_next,
)
img, pred_x0, e_t = outs
old_eps.append(e_t)
if len(old_eps) >= 4:
old_eps.pop(0)
if callback: callback(i)
if img_callback: img_callback(pred_x0, i)
if callback:
callback(i)
if img_callback:
img_callback(pred_x0, i)
if index % log_every_t == 0 or index == total_steps - 1:
intermediates['x_inter'].append(img)
@@ -170,47 +284,95 @@ class PLMSSampler(object):
return img, intermediates
@torch.no_grad()
def p_sample_plms(self, x, c, t, index, repeat_noise=False, use_original_steps=False, quantize_denoised=False,
temperature=1., noise_dropout=0., score_corrector=None, corrector_kwargs=None,
unconditional_guidance_scale=1., unconditional_conditioning=None, old_eps=None, t_next=None):
def p_sample_plms(
self,
x,
c,
t,
index,
repeat_noise=False,
use_original_steps=False,
quantize_denoised=False,
temperature=1.0,
noise_dropout=0.0,
score_corrector=None,
corrector_kwargs=None,
unconditional_guidance_scale=1.0,
unconditional_conditioning=None,
old_eps=None,
t_next=None,
):
b, *_, device = *x.shape, x.device
def get_model_output(x, t):
if unconditional_conditioning is None or unconditional_guidance_scale == 1.:
if (
unconditional_conditioning is None
or unconditional_guidance_scale == 1.0
):
e_t = self.model.apply_model(x, t, c)
else:
x_in = torch.cat([x] * 2)
t_in = torch.cat([t] * 2)
c_in = torch.cat([unconditional_conditioning, c])
e_t_uncond, e_t = self.model.apply_model(x_in, t_in, c_in).chunk(2)
e_t = e_t_uncond + unconditional_guidance_scale * (e_t - e_t_uncond)
e_t_uncond, e_t = self.model.apply_model(
x_in, t_in, c_in
).chunk(2)
e_t = e_t_uncond + unconditional_guidance_scale * (
e_t - e_t_uncond
)
if score_corrector is not None:
assert self.model.parameterization == "eps"
e_t = score_corrector.modify_score(self.model, e_t, x, t, c, **corrector_kwargs)
assert self.model.parameterization == 'eps'
e_t = score_corrector.modify_score(
self.model, e_t, x, t, c, **corrector_kwargs
)
return e_t
alphas = self.model.alphas_cumprod if use_original_steps else self.ddim_alphas
alphas_prev = self.model.alphas_cumprod_prev if use_original_steps else self.ddim_alphas_prev
sqrt_one_minus_alphas = self.model.sqrt_one_minus_alphas_cumprod if use_original_steps else self.ddim_sqrt_one_minus_alphas
sigmas = self.model.ddim_sigmas_for_original_num_steps if use_original_steps else self.ddim_sigmas
alphas = (
self.model.alphas_cumprod
if use_original_steps
else self.ddim_alphas
)
alphas_prev = (
self.model.alphas_cumprod_prev
if use_original_steps
else self.ddim_alphas_prev
)
sqrt_one_minus_alphas = (
self.model.sqrt_one_minus_alphas_cumprod
if use_original_steps
else self.ddim_sqrt_one_minus_alphas
)
sigmas = (
self.model.ddim_sigmas_for_original_num_steps
if use_original_steps
else self.ddim_sigmas
)
def get_x_prev_and_pred_x0(e_t, index):
# select parameters corresponding to the currently considered timestep
a_t = torch.full((b, 1, 1, 1), alphas[index], device=device)
a_prev = torch.full((b, 1, 1, 1), alphas_prev[index], device=device)
a_prev = torch.full(
(b, 1, 1, 1), alphas_prev[index], device=device
)
sigma_t = torch.full((b, 1, 1, 1), sigmas[index], device=device)
sqrt_one_minus_at = torch.full((b, 1, 1, 1), sqrt_one_minus_alphas[index],device=device)
sqrt_one_minus_at = torch.full(
(b, 1, 1, 1), sqrt_one_minus_alphas[index], device=device
)
# current prediction for x_0
pred_x0 = (x - sqrt_one_minus_at * e_t) / a_t.sqrt()
if quantize_denoised:
pred_x0, _, *_ = self.model.first_stage_model.quantize(pred_x0)
# direction pointing to x_t
dir_xt = (1. - a_prev - sigma_t**2).sqrt() * e_t
noise = sigma_t * noise_like(x.shape, device, repeat_noise) * temperature
if noise_dropout > 0.:
dir_xt = (1.0 - a_prev - sigma_t**2).sqrt() * e_t
noise = (
sigma_t
* noise_like(x.shape, device, repeat_noise)
* temperature
)
if noise_dropout > 0.0:
noise = torch.nn.functional.dropout(noise, p=noise_dropout)
x_prev = a_prev.sqrt() * pred_x0 + dir_xt + noise
return x_prev, pred_x0
@@ -229,7 +391,12 @@ class PLMSSampler(object):
e_t_prime = (23 * e_t - 16 * old_eps[-1] + 5 * old_eps[-2]) / 12
elif len(old_eps) >= 3:
# 4nd order Pseudo Linear Multistep (Adams-Bashforth)
e_t_prime = (55 * e_t - 59 * old_eps[-1] + 37 * old_eps[-2] - 9 * old_eps[-3]) / 24
e_t_prime = (
55 * e_t
- 59 * old_eps[-1]
+ 37 * old_eps[-2]
- 9 * old_eps[-3]
) / 24
x_prev, pred_x0 = get_x_prev_and_pred_x0(e_t_prime, index)

View File

@@ -7,6 +7,7 @@ from einops import rearrange, repeat
from ldm.modules.diffusionmodules.util import checkpoint
import psutil
def exists(val):
return val is not None
@@ -167,30 +168,116 @@ class CrossAttention(nn.Module):
nn.Dropout(dropout)
)
if not torch.cuda.is_available():
mem_av = psutil.virtual_memory().available / (1024**3)
if mem_av > 32:
self.einsum_op = self.einsum_op_v1
elif mem_av > 12:
self.einsum_op = self.einsum_op_v2
else:
self.einsum_op = self.einsum_op_v3
del mem_av
else:
self.einsum_op = self.einsum_op_v4
# mps 64-128 GB
def einsum_op_v1(self, q, k, v, r1):
if q.shape[1] <= 4096: # for 512x512: the max q.shape[1] is 4096
s1 = einsum('b i d, b j d -> b i j', q, k) * self.scale # aggressive/faster: operation in one go
s2 = s1.softmax(dim=-1, dtype=q.dtype)
del s1
r1 = einsum('b i j, b j d -> b i d', s2, v)
del s2
else:
# q.shape[0] * q.shape[1] * slice_size >= 2**31 throws err
# needs around half of that slice_size to not generate noise
slice_size = math.floor(2**30 / (q.shape[0] * q.shape[1]))
for i in range(0, q.shape[1], slice_size):
end = i + slice_size
s1 = einsum('b i d, b j d -> b i j', q[:, i:end], k) * self.scale
s2 = s1.softmax(dim=-1, dtype=r1.dtype)
del s1
r1[:, i:end] = einsum('b i j, b j d -> b i d', s2, v)
del s2
return r1
# mps 16-32 GB (can be optimized)
def einsum_op_v2(self, q, k, v, r1):
slice_size = math.floor(2**30 / (q.shape[0] * q.shape[1]))
for i in range(0, q.shape[1], slice_size): # conservative/less mem: operation in steps
end = i + slice_size
s1 = einsum('b i d, b j d -> b i j', q[:, i:end], k) * self.scale
s2 = s1.softmax(dim=-1, dtype=r1.dtype)
del s1
r1[:, i:end] = einsum('b i j, b j d -> b i d', s2, v)
del s2
return r1
# mps 8 GB
def einsum_op_v3(self, q, k, v, r1):
slice_size = 1
for i in range(0, q.shape[0], slice_size): # iterate over q.shape[0]
end = min(q.shape[0], i + slice_size)
s1 = einsum('b i d, b j d -> b i j', q[i:end], k[i:end]) # adapted einsum for mem
s1 *= self.scale
s2 = s1.softmax(dim=-1, dtype=r1.dtype)
del s1
r1[i:end] = einsum('b i j, b j d -> b i d', s2, v[i:end]) # adapted einsum for mem
del s2
return r1
# cuda
def einsum_op_v4(self, q, k, v, r1):
stats = torch.cuda.memory_stats(q.device)
mem_active = stats['active_bytes.all.current']
mem_reserved = stats['reserved_bytes.all.current']
mem_free_cuda, _ = torch.cuda.mem_get_info(torch.cuda.current_device())
mem_free_torch = mem_reserved - mem_active
mem_free_total = mem_free_cuda + mem_free_torch
gb = 1024 ** 3
tensor_size = q.shape[0] * q.shape[1] * k.shape[1] * 4
mem_required = tensor_size * 2.5
steps = 1
if mem_required > mem_free_total:
steps = 2**(math.ceil(math.log(mem_required / mem_free_total, 2)))
if steps > 64:
max_res = math.floor(math.sqrt(math.sqrt(mem_free_total / 2.5)) / 8) * 64
raise RuntimeError(f'Not enough memory, use lower resolution (max approx. {max_res}x{max_res}). '
f'Need: {mem_required/64/gb:0.1f}GB free, Have:{mem_free_total/gb:0.1f}GB free')
slice_size = q.shape[1] // steps if (q.shape[1] % steps) == 0 else q.shape[1]
for i in range(0, q.shape[1], slice_size):
end = min(q.shape[1], i + slice_size)
s1 = einsum('b i d, b j d -> b i j', q[:, i:end], k) * self.scale
s2 = s1.softmax(dim=-1, dtype=r1.dtype)
del s1
r1[:, i:end] = einsum('b i j, b j d -> b i d', s2, v)
del s2
return r1
def forward(self, x, context=None, mask=None):
h = self.heads
q = self.to_q(x)
q_in = self.to_q(x)
context = default(context, x)
k = self.to_k(context)
v = self.to_v(context)
k_in = self.to_k(context)
v_in = self.to_v(context)
device_type = 'mps' if x.device.type == 'mps' else 'cuda'
del context, x
q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> (b h) n d', h=h), (q, k, v))
q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> (b h) n d', h=h), (q_in, k_in, v_in))
del q_in, k_in, v_in
r1 = torch.zeros(q.shape[0], q.shape[1], v.shape[2], device=q.device, dtype=q.dtype)
r1 = self.einsum_op(q, k, v, r1)
del q, k, v
sim = einsum('b i d, b j d -> b i j', q, k) * self.scale
r2 = rearrange(r1, '(b h) n d -> b n (h d)', h=h)
del r1
if exists(mask):
mask = rearrange(mask, 'b ... -> b (...)')
max_neg_value = -torch.finfo(sim.dtype).max
mask = repeat(mask, 'b j -> (b h) () j', h=h)
sim.masked_fill_(~mask, max_neg_value)
# attention, what we cannot get enough of
attn = sim.softmax(dim=-1)
out = einsum('b i j, b j d -> b i d', attn, v)
out = rearrange(out, '(b h) n d -> b n (h d)', h=h)
return self.to_out(out)
return self.to_out(r2)
class BasicTransformerBlock(nn.Module):
@@ -209,6 +296,7 @@ class BasicTransformerBlock(nn.Module):
return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
def _forward(self, x, context=None):
x = x.contiguous() if x.device.type == 'mps' else x
x = self.attn1(self.norm1(x)) + x
x = self.attn2(self.norm2(x), context=context) + x
x = self.ff(self.norm3(x)) + x
@@ -258,4 +346,4 @@ class SpatialTransformer(nn.Module):
x = block(x, context=context)
x = rearrange(x, 'b (h w) c -> b c h w', h=h, w=w)
x = self.proj_out(x)
return x + x_in
return x + x_in

View File

@@ -1,4 +1,5 @@
# pytorch_diffusion + derived encoder decoder
import gc
import math
import torch
import torch.nn as nn
@@ -8,6 +9,7 @@ from einops import rearrange
from ldm.util import instantiate_from_config
from ldm.modules.attention import LinearAttention
import psutil
def get_timestep_embedding(timesteps, embedding_dim):
"""
@@ -119,18 +121,30 @@ class ResnetBlock(nn.Module):
padding=0)
def forward(self, x, temb):
h = x
h = self.norm1(h)
h = nonlinearity(h)
h = self.conv1(h)
h1 = x
h2 = self.norm1(h1)
del h1
h3 = nonlinearity(h2)
del h2
h4 = self.conv1(h3)
del h3
if temb is not None:
h = h + self.temb_proj(nonlinearity(temb))[:,:,None,None]
h4 = h4 + self.temb_proj(nonlinearity(temb))[:,:,None,None]
h = self.norm2(h)
h = nonlinearity(h)
h = self.dropout(h)
h = self.conv2(h)
h5 = self.norm2(h4)
del h4
h6 = nonlinearity(h5)
del h5
h7 = self.dropout(h6)
del h6
h8 = self.conv2(h7)
del h7
if self.in_channels != self.out_channels:
if self.use_conv_shortcut:
@@ -138,8 +152,7 @@ class ResnetBlock(nn.Module):
else:
x = self.nin_shortcut(x)
return x+h
return x + h8
class LinAttnBlock(LinearAttention):
"""to match AttnBlock usage"""
@@ -178,28 +191,74 @@ class AttnBlock(nn.Module):
def forward(self, x):
h_ = x
h_ = self.norm(h_)
q = self.q(h_)
k = self.k(h_)
q1 = self.q(h_)
k1 = self.k(h_)
v = self.v(h_)
# compute attention
b,c,h,w = q.shape
q = q.reshape(b,c,h*w)
q = q.permute(0,2,1) # b,hw,c
k = k.reshape(b,c,h*w) # b,c,hw
w_ = torch.bmm(q,k) # b,hw,hw w[b,i,j]=sum_c q[b,i,c]k[b,c,j]
w_ = w_ * (int(c)**(-0.5))
w_ = torch.nn.functional.softmax(w_, dim=2)
b, c, h, w = q1.shape
# attend to values
v = v.reshape(b,c,h*w)
w_ = w_.permute(0,2,1) # b,hw,hw (first hw of k, second of q)
h_ = torch.bmm(v,w_) # b, c,hw (hw of q) h_[b,c,j] = sum_i v[b,c,i] w_[b,i,j]
h_ = h_.reshape(b,c,h,w)
q2 = q1.reshape(b, c, h*w)
del q1
h_ = self.proj_out(h_)
q = q2.permute(0, 2, 1) # b,hw,c
del q2
return x+h_
k = k1.reshape(b, c, h*w) # b,c,hw
del k1
h_ = torch.zeros_like(k, device=q.device)
device_type = 'mps' if q.device.type == 'mps' else 'cuda'
if device_type == 'cuda':
stats = torch.cuda.memory_stats(q.device)
mem_active = stats['active_bytes.all.current']
mem_reserved = stats['reserved_bytes.all.current']
mem_free_cuda, _ = torch.cuda.mem_get_info(torch.cuda.current_device())
mem_free_torch = mem_reserved - mem_active
mem_free_total = mem_free_cuda + mem_free_torch
tensor_size = q.shape[0] * q.shape[1] * k.shape[2] * 4
mem_required = tensor_size * 2.5
steps = 1
if mem_required > mem_free_total:
steps = 2**(math.ceil(math.log(mem_required / mem_free_total, 2)))
slice_size = q.shape[1] // steps if (q.shape[1] % steps) == 0 else q.shape[1]
else:
if psutil.virtual_memory().available / (1024**3) < 12:
slice_size = 1
else:
slice_size = min(q.shape[1], math.floor(2**30 / (q.shape[0] * q.shape[1])))
for i in range(0, q.shape[1], slice_size):
end = i + slice_size
w1 = torch.bmm(q[:, i:end], k) # b,hw,hw w[b,i,j]=sum_c q[b,i,c]k[b,c,j]
w2 = w1 * (int(c)**(-0.5))
del w1
w3 = torch.nn.functional.softmax(w2, dim=2)
del w2
# attend to values
v1 = v.reshape(b, c, h*w)
w4 = w3.permute(0, 2, 1) # b,hw,hw (first hw of k, second of q)
del w3
h_[:, :, i:end] = torch.bmm(v1, w4) # b, c,hw (hw of q) h_[b,c,j] = sum_i v[b,c,i] w_[b,i,j]
del v1, w4
h2 = h_.reshape(b, c, h, w)
del h_
h3 = self.proj_out(h2)
del h2
h3 += x
return h3
def make_attn(in_channels, attn_type="vanilla"):
@@ -540,31 +599,56 @@ class Decoder(nn.Module):
temb = None
# z to block_in
h = self.conv_in(z)
h1 = self.conv_in(z)
# middle
h = self.mid.block_1(h, temb)
h = self.mid.attn_1(h)
h = self.mid.block_2(h, temb)
h2 = self.mid.block_1(h1, temb)
del h1
h3 = self.mid.attn_1(h2)
del h2
h = self.mid.block_2(h3, temb)
del h3
# prepare for up sampling
device_type = 'mps' if h.device.type == 'mps' else 'cuda'
gc.collect()
if device_type == 'cuda':
torch.cuda.empty_cache()
# upsampling
for i_level in reversed(range(self.num_resolutions)):
for i_block in range(self.num_res_blocks+1):
h = self.up[i_level].block[i_block](h, temb)
if len(self.up[i_level].attn) > 0:
h = self.up[i_level].attn[i_block](h)
t = h
h = self.up[i_level].attn[i_block](t)
del t
if i_level != 0:
h = self.up[i_level].upsample(h)
t = h
h = self.up[i_level].upsample(t)
del t
# end
if self.give_pre_end:
return h
h = self.norm_out(h)
h = nonlinearity(h)
h = self.conv_out(h)
h1 = self.norm_out(h)
del h
h2 = nonlinearity(h1)
del h1
h = self.conv_out(h2)
del h2
if self.tanh_out:
h = torch.tanh(h)
t = h
h = torch.tanh(t)
del t
return h
@@ -832,4 +916,3 @@ class FirstStagePostProcessor(nn.Module):
if self.do_reshape:
z = rearrange(z,'b c h w -> b (h w) c')
return z

View File

@@ -24,6 +24,7 @@ from ldm.modules.attention import SpatialTransformer
def convert_module_to_f16(x):
pass
def convert_module_to_f32(x):
pass
@@ -42,7 +43,9 @@ class AttentionPool2d(nn.Module):
output_dim: int = None,
):
super().__init__()
self.positional_embedding = nn.Parameter(th.randn(embed_dim, spacial_dim ** 2 + 1) / embed_dim ** 0.5)
self.positional_embedding = nn.Parameter(
th.randn(embed_dim, spacial_dim**2 + 1) / embed_dim**0.5
)
self.qkv_proj = conv_nd(1, embed_dim, 3 * embed_dim, 1)
self.c_proj = conv_nd(1, embed_dim, output_dim or embed_dim, 1)
self.num_heads = embed_dim // num_heads_channels
@@ -97,37 +100,45 @@ class Upsample(nn.Module):
upsampling occurs in the inner-two dimensions.
"""
def __init__(self, channels, use_conv, dims=2, out_channels=None, padding=1):
def __init__(
self, channels, use_conv, dims=2, out_channels=None, padding=1
):
super().__init__()
self.channels = channels
self.out_channels = out_channels or channels
self.use_conv = use_conv
self.dims = dims
if use_conv:
self.conv = conv_nd(dims, self.channels, self.out_channels, 3, padding=padding)
self.conv = conv_nd(
dims, self.channels, self.out_channels, 3, padding=padding
)
def forward(self, x):
assert x.shape[1] == self.channels
if self.dims == 3:
x = F.interpolate(
x, (x.shape[2], x.shape[3] * 2, x.shape[4] * 2), mode="nearest"
x, (x.shape[2], x.shape[3] * 2, x.shape[4] * 2), mode='nearest'
)
else:
x = F.interpolate(x, scale_factor=2, mode="nearest")
x = F.interpolate(x, scale_factor=2, mode='nearest')
if self.use_conv:
x = self.conv(x)
return x
class TransposedUpsample(nn.Module):
'Learned 2x upsampling without padding'
"""Learned 2x upsampling without padding"""
def __init__(self, channels, out_channels=None, ks=5):
super().__init__()
self.channels = channels
self.out_channels = out_channels or channels
self.up = nn.ConvTranspose2d(self.channels,self.out_channels,kernel_size=ks,stride=2)
self.up = nn.ConvTranspose2d(
self.channels, self.out_channels, kernel_size=ks, stride=2
)
def forward(self,x):
def forward(self, x):
return self.up(x)
@@ -140,7 +151,9 @@ class Downsample(nn.Module):
downsampling occurs in the inner-two dimensions.
"""
def __init__(self, channels, use_conv, dims=2, out_channels=None,padding=1):
def __init__(
self, channels, use_conv, dims=2, out_channels=None, padding=1
):
super().__init__()
self.channels = channels
self.out_channels = out_channels or channels
@@ -149,7 +162,12 @@ class Downsample(nn.Module):
stride = 2 if dims != 3 else (1, 2, 2)
if use_conv:
self.op = conv_nd(
dims, self.channels, self.out_channels, 3, stride=stride, padding=padding
dims,
self.channels,
self.out_channels,
3,
stride=stride,
padding=padding,
)
else:
assert self.channels == self.out_channels
@@ -219,7 +237,9 @@ class ResBlock(TimestepBlock):
nn.SiLU(),
linear(
emb_channels,
2 * self.out_channels if use_scale_shift_norm else self.out_channels,
2 * self.out_channels
if use_scale_shift_norm
else self.out_channels,
),
)
self.out_layers = nn.Sequential(
@@ -227,7 +247,9 @@ class ResBlock(TimestepBlock):
nn.SiLU(),
nn.Dropout(p=dropout),
zero_module(
conv_nd(dims, self.out_channels, self.out_channels, 3, padding=1)
conv_nd(
dims, self.out_channels, self.out_channels, 3, padding=1
)
),
)
@@ -238,7 +260,9 @@ class ResBlock(TimestepBlock):
dims, channels, self.out_channels, 3, padding=1
)
else:
self.skip_connection = conv_nd(dims, channels, self.out_channels, 1)
self.skip_connection = conv_nd(
dims, channels, self.out_channels, 1
)
def forward(self, x, emb):
"""
@@ -251,7 +275,6 @@ class ResBlock(TimestepBlock):
self._forward, (x, emb), self.parameters(), self.use_checkpoint
)
def _forward(self, x, emb):
if self.updown:
in_rest, in_conv = self.in_layers[:-1], self.in_layers[-1]
@@ -297,7 +320,7 @@ class AttentionBlock(nn.Module):
else:
assert (
channels % num_head_channels == 0
), f"q,k,v channels {channels} is not divisible by num_head_channels {num_head_channels}"
), f'q,k,v channels {channels} is not divisible by num_head_channels {num_head_channels}'
self.num_heads = channels // num_head_channels
self.use_checkpoint = use_checkpoint
self.norm = normalization(channels)
@@ -312,8 +335,10 @@ class AttentionBlock(nn.Module):
self.proj_out = zero_module(conv_nd(1, channels, channels, 1))
def forward(self, x):
return checkpoint(self._forward, (x,), self.parameters(), True) # TODO: check checkpoint usage, is True # TODO: fix the .half call!!!
#return pt_checkpoint(self._forward, x) # pytorch
return checkpoint(
self._forward, (x,), self.parameters(), True
) # TODO: check checkpoint usage, is True # TODO: fix the .half call!!!
# return pt_checkpoint(self._forward, x) # pytorch
def _forward(self, x):
b, c, *spatial = x.shape
@@ -340,7 +365,7 @@ def count_flops_attn(model, _x, y):
# We perform two matmuls with the same number of ops.
# The first computes the weight matrix, the second computes
# the combination of the value vectors.
matmul_ops = 2 * b * (num_spatial ** 2) * c
matmul_ops = 2 * b * (num_spatial**2) * c
model.total_ops += th.DoubleTensor([matmul_ops])
@@ -362,13 +387,15 @@ class QKVAttentionLegacy(nn.Module):
bs, width, length = qkv.shape
assert width % (3 * self.n_heads) == 0
ch = width // (3 * self.n_heads)
q, k, v = qkv.reshape(bs * self.n_heads, ch * 3, length).split(ch, dim=1)
q, k, v = qkv.reshape(bs * self.n_heads, ch * 3, length).split(
ch, dim=1
)
scale = 1 / math.sqrt(math.sqrt(ch))
weight = th.einsum(
"bct,bcs->bts", q * scale, k * scale
'bct,bcs->bts', q * scale, k * scale
) # More stable with f16 than dividing afterwards
weight = th.softmax(weight.float(), dim=-1).type(weight.dtype)
a = th.einsum("bts,bcs->bct", weight, v)
a = th.einsum('bts,bcs->bct', weight, v)
return a.reshape(bs, -1, length)
@staticmethod
@@ -397,12 +424,14 @@ class QKVAttention(nn.Module):
q, k, v = qkv.chunk(3, dim=1)
scale = 1 / math.sqrt(math.sqrt(ch))
weight = th.einsum(
"bct,bcs->bts",
'bct,bcs->bts',
(q * scale).view(bs * self.n_heads, ch, length),
(k * scale).view(bs * self.n_heads, ch, length),
) # More stable with f16 than dividing afterwards
weight = th.softmax(weight.float(), dim=-1).type(weight.dtype)
a = th.einsum("bts,bcs->bct", weight, v.reshape(bs * self.n_heads, ch, length))
a = th.einsum(
'bts,bcs->bct', weight, v.reshape(bs * self.n_heads, ch, length)
)
return a.reshape(bs, -1, length)
@staticmethod
@@ -461,19 +490,24 @@ class UNetModel(nn.Module):
use_scale_shift_norm=False,
resblock_updown=False,
use_new_attention_order=False,
use_spatial_transformer=False, # custom transformer support
transformer_depth=1, # custom transformer support
context_dim=None, # custom transformer support
n_embed=None, # custom support for prediction of discrete ids into codebook of first stage vq model
use_spatial_transformer=False, # custom transformer support
transformer_depth=1, # custom transformer support
context_dim=None, # custom transformer support
n_embed=None, # custom support for prediction of discrete ids into codebook of first stage vq model
legacy=True,
):
super().__init__()
if use_spatial_transformer:
assert context_dim is not None, 'Fool!! You forgot to include the dimension of your cross-attention conditioning...'
assert (
context_dim is not None
), 'Fool!! You forgot to include the dimension of your cross-attention conditioning...'
if context_dim is not None:
assert use_spatial_transformer, 'Fool!! You forgot to use the spatial transformer for your cross-attention conditioning...'
assert (
use_spatial_transformer
), 'Fool!! You forgot to use the spatial transformer for your cross-attention conditioning...'
from omegaconf.listconfig import ListConfig
if type(context_dim) == ListConfig:
context_dim = list(context_dim)
@@ -481,10 +515,14 @@ class UNetModel(nn.Module):
num_heads_upsample = num_heads
if num_heads == -1:
assert num_head_channels != -1, 'Either num_heads or num_head_channels has to be set'
assert (
num_head_channels != -1
), 'Either num_heads or num_head_channels has to be set'
if num_head_channels == -1:
assert num_heads != -1, 'Either num_heads or num_head_channels has to be set'
assert (
num_heads != -1
), 'Either num_heads or num_head_channels has to be set'
self.image_size = image_size
self.in_channels = in_channels
@@ -545,8 +583,12 @@ class UNetModel(nn.Module):
num_heads = ch // num_head_channels
dim_head = num_head_channels
if legacy:
#num_heads = 1
dim_head = ch // num_heads if use_spatial_transformer else num_head_channels
# num_heads = 1
dim_head = (
ch // num_heads
if use_spatial_transformer
else num_head_channels
)
layers.append(
AttentionBlock(
ch,
@@ -554,8 +596,14 @@ class UNetModel(nn.Module):
num_heads=num_heads,
num_head_channels=dim_head,
use_new_attention_order=use_new_attention_order,
) if not use_spatial_transformer else SpatialTransformer(
ch, num_heads, dim_head, depth=transformer_depth, context_dim=context_dim
)
if not use_spatial_transformer
else SpatialTransformer(
ch,
num_heads,
dim_head,
depth=transformer_depth,
context_dim=context_dim,
)
)
self.input_blocks.append(TimestepEmbedSequential(*layers))
@@ -592,8 +640,12 @@ class UNetModel(nn.Module):
num_heads = ch // num_head_channels
dim_head = num_head_channels
if legacy:
#num_heads = 1
dim_head = ch // num_heads if use_spatial_transformer else num_head_channels
# num_heads = 1
dim_head = (
ch // num_heads
if use_spatial_transformer
else num_head_channels
)
self.middle_block = TimestepEmbedSequential(
ResBlock(
ch,
@@ -609,9 +661,15 @@ class UNetModel(nn.Module):
num_heads=num_heads,
num_head_channels=dim_head,
use_new_attention_order=use_new_attention_order,
) if not use_spatial_transformer else SpatialTransformer(
ch, num_heads, dim_head, depth=transformer_depth, context_dim=context_dim
),
)
if not use_spatial_transformer
else SpatialTransformer(
ch,
num_heads,
dim_head,
depth=transformer_depth,
context_dim=context_dim,
),
ResBlock(
ch,
time_embed_dim,
@@ -646,8 +704,12 @@ class UNetModel(nn.Module):
num_heads = ch // num_head_channels
dim_head = num_head_channels
if legacy:
#num_heads = 1
dim_head = ch // num_heads if use_spatial_transformer else num_head_channels
# num_heads = 1
dim_head = (
ch // num_heads
if use_spatial_transformer
else num_head_channels
)
layers.append(
AttentionBlock(
ch,
@@ -655,8 +717,14 @@ class UNetModel(nn.Module):
num_heads=num_heads_upsample,
num_head_channels=dim_head,
use_new_attention_order=use_new_attention_order,
) if not use_spatial_transformer else SpatialTransformer(
ch, num_heads, dim_head, depth=transformer_depth, context_dim=context_dim
)
if not use_spatial_transformer
else SpatialTransformer(
ch,
num_heads,
dim_head,
depth=transformer_depth,
context_dim=context_dim,
)
)
if level and i == num_res_blocks:
@@ -673,7 +741,9 @@ class UNetModel(nn.Module):
up=True,
)
if resblock_updown
else Upsample(ch, conv_resample, dims=dims, out_channels=out_ch)
else Upsample(
ch, conv_resample, dims=dims, out_channels=out_ch
)
)
ds //= 2
self.output_blocks.append(TimestepEmbedSequential(*layers))
@@ -682,14 +752,16 @@ class UNetModel(nn.Module):
self.out = nn.Sequential(
normalization(ch),
nn.SiLU(),
zero_module(conv_nd(dims, model_channels, out_channels, 3, padding=1)),
zero_module(
conv_nd(dims, model_channels, out_channels, 3, padding=1)
),
)
if self.predict_codebook_ids:
self.id_predictor = nn.Sequential(
normalization(ch),
conv_nd(dims, model_channels, n_embed, 1),
#nn.LogSoftmax(dim=1) # change to cross_entropy and produce non-normalized logits
)
normalization(ch),
conv_nd(dims, model_channels, n_embed, 1),
# nn.LogSoftmax(dim=1) # change to cross_entropy and produce non-normalized logits
)
def convert_to_fp16(self):
"""
@@ -707,7 +779,7 @@ class UNetModel(nn.Module):
self.middle_block.apply(convert_module_to_f32)
self.output_blocks.apply(convert_module_to_f32)
def forward(self, x, timesteps=None, context=None, y=None,**kwargs):
def forward(self, x, timesteps=None, context=None, y=None, **kwargs):
"""
Apply the model to an input batch.
:param x: an [N x C x ...] Tensor of inputs.
@@ -718,9 +790,11 @@ class UNetModel(nn.Module):
"""
assert (y is not None) == (
self.num_classes is not None
), "must specify y if and only if the model is class-conditional"
), 'must specify y if and only if the model is class-conditional'
hs = []
t_emb = timestep_embedding(timesteps, self.model_channels, repeat_only=False)
t_emb = timestep_embedding(
timesteps, self.model_channels, repeat_only=False
)
emb = self.time_embed(t_emb)
if self.num_classes is not None:
@@ -768,9 +842,9 @@ class EncoderUNetModel(nn.Module):
use_scale_shift_norm=False,
resblock_updown=False,
use_new_attention_order=False,
pool="adaptive",
pool='adaptive',
*args,
**kwargs
**kwargs,
):
super().__init__()
@@ -888,7 +962,7 @@ class EncoderUNetModel(nn.Module):
)
self._feature_size += ch
self.pool = pool
if pool == "adaptive":
if pool == 'adaptive':
self.out = nn.Sequential(
normalization(ch),
nn.SiLU(),
@@ -896,7 +970,7 @@ class EncoderUNetModel(nn.Module):
zero_module(conv_nd(dims, ch, out_channels, 1)),
nn.Flatten(),
)
elif pool == "attention":
elif pool == 'attention':
assert num_head_channels != -1
self.out = nn.Sequential(
normalization(ch),
@@ -905,13 +979,13 @@ class EncoderUNetModel(nn.Module):
(image_size // ds), ch, num_head_channels, out_channels
),
)
elif pool == "spatial":
elif pool == 'spatial':
self.out = nn.Sequential(
nn.Linear(self._feature_size, 2048),
nn.ReLU(),
nn.Linear(2048, self.out_channels),
)
elif pool == "spatial_v2":
elif pool == 'spatial_v2':
self.out = nn.Sequential(
nn.Linear(self._feature_size, 2048),
normalization(2048),
@@ -919,7 +993,7 @@ class EncoderUNetModel(nn.Module):
nn.Linear(2048, self.out_channels),
)
else:
raise NotImplementedError(f"Unexpected {pool} pooling")
raise NotImplementedError(f'Unexpected {pool} pooling')
def convert_to_fp16(self):
"""
@@ -942,20 +1016,21 @@ class EncoderUNetModel(nn.Module):
:param timesteps: a 1-D batch of timesteps.
:return: an [N x K] Tensor of outputs.
"""
emb = self.time_embed(timestep_embedding(timesteps, self.model_channels))
emb = self.time_embed(
timestep_embedding(timesteps, self.model_channels)
)
results = []
h = x.type(self.dtype)
for module in self.input_blocks:
h = module(h, emb)
if self.pool.startswith("spatial"):
if self.pool.startswith('spatial'):
results.append(h.type(x.dtype).mean(dim=(2, 3)))
h = self.middle_block(h, emb)
if self.pool.startswith("spatial"):
if self.pool.startswith('spatial'):
results.append(h.type(x.dtype).mean(dim=(2, 3)))
h = th.cat(results, axis=-1)
return self.out(h)
else:
h = h.type(x.dtype)
return self.out(h)

View File

@@ -18,15 +18,24 @@ from einops import repeat
from ldm.util import instantiate_from_config
def make_beta_schedule(schedule, n_timestep, linear_start=1e-4, linear_end=2e-2, cosine_s=8e-3):
if schedule == "linear":
def make_beta_schedule(
schedule, n_timestep, linear_start=1e-4, linear_end=2e-2, cosine_s=8e-3
):
if schedule == 'linear':
betas = (
torch.linspace(linear_start ** 0.5, linear_end ** 0.5, n_timestep, dtype=torch.float64) ** 2
torch.linspace(
linear_start**0.5,
linear_end**0.5,
n_timestep,
dtype=torch.float64,
)
** 2
)
elif schedule == "cosine":
elif schedule == 'cosine':
timesteps = (
torch.arange(n_timestep + 1, dtype=torch.float64) / n_timestep + cosine_s
torch.arange(n_timestep + 1, dtype=torch.float64) / n_timestep
+ cosine_s
)
alphas = timesteps / (1 + cosine_s) * np.pi / 2
alphas = torch.cos(alphas).pow(2)
@@ -34,43 +43,73 @@ def make_beta_schedule(schedule, n_timestep, linear_start=1e-4, linear_end=2e-2,
betas = 1 - alphas[1:] / alphas[:-1]
betas = np.clip(betas, a_min=0, a_max=0.999)
elif schedule == "sqrt_linear":
betas = torch.linspace(linear_start, linear_end, n_timestep, dtype=torch.float64)
elif schedule == "sqrt":
betas = torch.linspace(linear_start, linear_end, n_timestep, dtype=torch.float64) ** 0.5
elif schedule == 'sqrt_linear':
betas = torch.linspace(
linear_start, linear_end, n_timestep, dtype=torch.float64
)
elif schedule == 'sqrt':
betas = (
torch.linspace(
linear_start, linear_end, n_timestep, dtype=torch.float64
)
** 0.5
)
else:
raise ValueError(f"schedule '{schedule}' unknown.")
return betas.numpy()
def make_ddim_timesteps(ddim_discr_method, num_ddim_timesteps, num_ddpm_timesteps, verbose=True):
def make_ddim_timesteps(
ddim_discr_method, num_ddim_timesteps, num_ddpm_timesteps, verbose=True
):
if ddim_discr_method == 'uniform':
c = num_ddpm_timesteps // num_ddim_timesteps
ddim_timesteps = np.asarray(list(range(0, num_ddpm_timesteps, c)))
elif ddim_discr_method == 'quad':
ddim_timesteps = ((np.linspace(0, np.sqrt(num_ddpm_timesteps * .8), num_ddim_timesteps)) ** 2).astype(int)
ddim_timesteps = (
(
np.linspace(
0, np.sqrt(num_ddpm_timesteps * 0.8), num_ddim_timesteps
)
)
** 2
).astype(int)
else:
raise NotImplementedError(f'There is no ddim discretization method called "{ddim_discr_method}"')
raise NotImplementedError(
f'There is no ddim discretization method called "{ddim_discr_method}"'
)
# assert ddim_timesteps.shape[0] == num_ddim_timesteps
# add one to get the final alpha values right (the ones from first scale to data during sampling)
steps_out = ddim_timesteps + 1
# steps_out = ddim_timesteps + 1
steps_out = ddim_timesteps
if verbose:
print(f'Selected timesteps for ddim sampler: {steps_out}')
return steps_out
def make_ddim_sampling_parameters(alphacums, ddim_timesteps, eta, verbose=True):
def make_ddim_sampling_parameters(
alphacums, ddim_timesteps, eta, verbose=True
):
# select alphas for computing the variance schedule
alphas = alphacums[ddim_timesteps]
alphas_prev = np.asarray([alphacums[0]] + alphacums[ddim_timesteps[:-1]].tolist())
alphas_prev = np.asarray(
[alphacums[0]] + alphacums[ddim_timesteps[:-1]].tolist()
)
# according the the formula provided in https://arxiv.org/abs/2010.02502
sigmas = eta * np.sqrt((1 - alphas_prev) / (1 - alphas) * (1 - alphas / alphas_prev))
sigmas = eta * np.sqrt(
(1 - alphas_prev) / (1 - alphas) * (1 - alphas / alphas_prev)
)
if verbose:
print(f'Selected alphas for ddim sampler: a_t: {alphas}; a_(t-1): {alphas_prev}')
print(f'For the chosen value of eta, which is {eta}, '
f'this results in the following sigma_t schedule for ddim sampler {sigmas}')
print(
f'Selected alphas for ddim sampler: a_t: {alphas}; a_(t-1): {alphas_prev}'
)
print(
f'For the chosen value of eta, which is {eta}, '
f'this results in the following sigma_t schedule for ddim sampler {sigmas}'
)
return sigmas, alphas, alphas_prev
@@ -109,7 +148,9 @@ def checkpoint(func, inputs, params, flag):
explicitly take as arguments.
:param flag: if False, disable gradient checkpointing.
"""
if flag:
if (
False
): # disabled checkpointing to allow requires_grad = False for main model
args = tuple(inputs) + tuple(params)
return CheckpointFunction.apply(func, len(inputs), *args)
else:
@@ -129,7 +170,9 @@ class CheckpointFunction(torch.autograd.Function):
@staticmethod
def backward(ctx, *output_grads):
ctx.input_tensors = [x.detach().requires_grad_(True) for x in ctx.input_tensors]
ctx.input_tensors = [
x.detach().requires_grad_(True) for x in ctx.input_tensors
]
with torch.enable_grad():
# Fixes a bug where the first op in run_function modifies the
# Tensor storage in place, which is not allowed for detach()'d
@@ -160,12 +203,16 @@ def timestep_embedding(timesteps, dim, max_period=10000, repeat_only=False):
if not repeat_only:
half = dim // 2
freqs = torch.exp(
-math.log(max_period) * torch.arange(start=0, end=half, dtype=torch.float32) / half
-math.log(max_period)
* torch.arange(start=0, end=half, dtype=torch.float32)
/ half
).to(device=timesteps.device)
args = timesteps[:, None].float() * freqs[None]
embedding = torch.cat([torch.cos(args), torch.sin(args)], dim=-1)
if dim % 2:
embedding = torch.cat([embedding, torch.zeros_like(embedding[:, :1])], dim=-1)
embedding = torch.cat(
[embedding, torch.zeros_like(embedding[:, :1])], dim=-1
)
else:
embedding = repeat(timesteps, 'b -> b d', d=dim)
return embedding
@@ -215,6 +262,7 @@ class GroupNorm32(nn.GroupNorm):
def forward(self, x):
return super().forward(x.float()).type(x.dtype)
def conv_nd(dims, *args, **kwargs):
"""
Create a 1D, 2D, or 3D convolution module.
@@ -225,7 +273,7 @@ def conv_nd(dims, *args, **kwargs):
return nn.Conv2d(*args, **kwargs)
elif dims == 3:
return nn.Conv3d(*args, **kwargs)
raise ValueError(f"unsupported dimensions: {dims}")
raise ValueError(f'unsupported dimensions: {dims}')
def linear(*args, **kwargs):
@@ -245,15 +293,16 @@ def avg_pool_nd(dims, *args, **kwargs):
return nn.AvgPool2d(*args, **kwargs)
elif dims == 3:
return nn.AvgPool3d(*args, **kwargs)
raise ValueError(f"unsupported dimensions: {dims}")
raise ValueError(f'unsupported dimensions: {dims}')
class HybridConditioner(nn.Module):
def __init__(self, c_concat_config, c_crossattn_config):
super().__init__()
self.concat_conditioner = instantiate_from_config(c_concat_config)
self.crossattn_conditioner = instantiate_from_config(c_crossattn_config)
self.crossattn_conditioner = instantiate_from_config(
c_crossattn_config
)
def forward(self, c_concat, c_crossattn):
c_concat = self.concat_conditioner(c_concat)
@@ -262,6 +311,8 @@ class HybridConditioner(nn.Module):
def noise_like(shape, device, repeat=False):
repeat_noise = lambda: torch.randn((1, *shape[1:]), device=device).repeat(shape[0], *((1,) * (len(shape) - 1)))
repeat_noise = lambda: torch.randn((1, *shape[1:]), device=device).repeat(
shape[0], *((1,) * (len(shape) - 1))
)
noise = lambda: torch.randn(shape, device=device)
return repeat_noise() if repeat else noise()
return repeat_noise() if repeat else noise()

View File

@@ -30,33 +30,45 @@ class DiagonalGaussianDistribution(object):
self.std = torch.exp(0.5 * self.logvar)
self.var = torch.exp(self.logvar)
if self.deterministic:
self.var = self.std = torch.zeros_like(self.mean).to(device=self.parameters.device)
self.var = self.std = torch.zeros_like(self.mean).to(
device=self.parameters.device
)
def sample(self):
x = self.mean + self.std * torch.randn(self.mean.shape).to(device=self.parameters.device)
x = self.mean + self.std * torch.randn(self.mean.shape).to(
device=self.parameters.device
)
return x
def kl(self, other=None):
if self.deterministic:
return torch.Tensor([0.])
return torch.Tensor([0.0])
else:
if other is None:
return 0.5 * torch.sum(torch.pow(self.mean, 2)
+ self.var - 1.0 - self.logvar,
dim=[1, 2, 3])
return 0.5 * torch.sum(
torch.pow(self.mean, 2) + self.var - 1.0 - self.logvar,
dim=[1, 2, 3],
)
else:
return 0.5 * torch.sum(
torch.pow(self.mean - other.mean, 2) / other.var
+ self.var / other.var - 1.0 - self.logvar + other.logvar,
dim=[1, 2, 3])
+ self.var / other.var
- 1.0
- self.logvar
+ other.logvar,
dim=[1, 2, 3],
)
def nll(self, sample, dims=[1,2,3]):
def nll(self, sample, dims=[1, 2, 3]):
if self.deterministic:
return torch.Tensor([0.])
return torch.Tensor([0.0])
logtwopi = np.log(2.0 * np.pi)
return 0.5 * torch.sum(
logtwopi + self.logvar + torch.pow(sample - self.mean, 2) / self.var,
dim=dims)
logtwopi
+ self.logvar
+ torch.pow(sample - self.mean, 2) / self.var,
dim=dims,
)
def mode(self):
return self.mean
@@ -74,7 +86,7 @@ def normal_kl(mean1, logvar1, mean2, logvar2):
if isinstance(obj, torch.Tensor):
tensor = obj
break
assert tensor is not None, "at least one argument must be a Tensor"
assert tensor is not None, 'at least one argument must be a Tensor'
# Force variances to be Tensors. Broadcasting helps convert scalars to
# Tensors, but it does not work for torch.exp().

View File

@@ -10,24 +10,30 @@ class LitEma(nn.Module):
self.m_name2s_name = {}
self.register_buffer('decay', torch.tensor(decay, dtype=torch.float32))
self.register_buffer('num_updates', torch.tensor(0,dtype=torch.int) if use_num_upates
else torch.tensor(-1,dtype=torch.int))
self.register_buffer(
'num_updates',
torch.tensor(0, dtype=torch.int)
if use_num_upates
else torch.tensor(-1, dtype=torch.int),
)
for name, p in model.named_parameters():
if p.requires_grad:
#remove as '.'-character is not allowed in buffers
s_name = name.replace('.','')
self.m_name2s_name.update({name:s_name})
self.register_buffer(s_name,p.clone().detach().data)
# remove as '.'-character is not allowed in buffers
s_name = name.replace('.', '')
self.m_name2s_name.update({name: s_name})
self.register_buffer(s_name, p.clone().detach().data)
self.collected_params = []
def forward(self,model):
def forward(self, model):
decay = self.decay
if self.num_updates >= 0:
self.num_updates += 1
decay = min(self.decay,(1 + self.num_updates) / (10 + self.num_updates))
decay = min(
self.decay, (1 + self.num_updates) / (10 + self.num_updates)
)
one_minus_decay = 1.0 - decay
@@ -38,8 +44,12 @@ class LitEma(nn.Module):
for key in m_param:
if m_param[key].requires_grad:
sname = self.m_name2s_name[key]
shadow_params[sname] = shadow_params[sname].type_as(m_param[key])
shadow_params[sname].sub_(one_minus_decay * (shadow_params[sname] - m_param[key]))
shadow_params[sname] = shadow_params[sname].type_as(
m_param[key]
)
shadow_params[sname].sub_(
one_minus_decay * (shadow_params[sname] - m_param[key])
)
else:
assert not key in self.m_name2s_name
@@ -48,7 +58,9 @@ class LitEma(nn.Module):
shadow_params = dict(self.named_buffers())
for key in m_param:
if m_param[key].requires_grad:
m_param[key].data.copy_(shadow_params[self.m_name2s_name[key]].data)
m_param[key].data.copy_(
shadow_params[self.m_name2s_name[key]].data
)
else:
assert not key in self.m_name2s_name

View File

@@ -0,0 +1,272 @@
from cmath import log
import torch
from torch import nn
import sys
from ldm.data.personalized import per_img_token_list
from transformers import CLIPTokenizer
from functools import partial
DEFAULT_PLACEHOLDER_TOKEN = ['*']
PROGRESSIVE_SCALE = 2000
def get_clip_token_for_string(tokenizer, string):
batch_encoding = tokenizer(
string,
truncation=True,
max_length=77,
return_length=True,
return_overflowing_tokens=False,
padding='max_length',
return_tensors='pt',
)
tokens = batch_encoding['input_ids']
""" assert (
torch.count_nonzero(tokens - 49407) == 2
), f"String '{string}' maps to more than a single token. Please use another string" """
return tokens[0, 1]
def get_bert_token_for_string(tokenizer, string):
token = tokenizer(string)
# assert torch.count_nonzero(token) == 3, f"String '{string}' maps to more than a single token. Please use another string"
token = token[0, 1]
return token
def get_embedding_for_clip_token(embedder, token):
return embedder(token.unsqueeze(0))[0, 0]
class EmbeddingManager(nn.Module):
def __init__(
self,
embedder,
placeholder_strings=None,
initializer_words=None,
per_image_tokens=False,
num_vectors_per_token=1,
progressive_words=False,
**kwargs,
):
super().__init__()
self.embedder = embedder
self.string_to_token_dict = {}
self.string_to_param_dict = nn.ParameterDict()
self.initial_embeddings = (
nn.ParameterDict()
) # These should not be optimized
self.progressive_words = progressive_words
self.progressive_counter = 0
self.max_vectors_per_token = num_vectors_per_token
if hasattr(
embedder, 'tokenizer'
): # using Stable Diffusion's CLIP encoder
self.is_clip = True
get_token_for_string = partial(
get_clip_token_for_string, embedder.tokenizer
)
get_embedding_for_tkn = partial(
get_embedding_for_clip_token,
embedder.transformer.text_model.embeddings,
)
token_dim = 1280
else: # using LDM's BERT encoder
self.is_clip = False
get_token_for_string = partial(
get_bert_token_for_string, embedder.tknz_fn
)
get_embedding_for_tkn = embedder.transformer.token_emb
token_dim = 1280
if per_image_tokens:
placeholder_strings.extend(per_img_token_list)
for idx, placeholder_string in enumerate(placeholder_strings):
token = get_token_for_string(placeholder_string)
if initializer_words and idx < len(initializer_words):
init_word_token = get_token_for_string(initializer_words[idx])
with torch.no_grad():
init_word_embedding = get_embedding_for_tkn(
init_word_token.cpu()
)
token_params = torch.nn.Parameter(
init_word_embedding.unsqueeze(0).repeat(
num_vectors_per_token, 1
),
requires_grad=True,
)
self.initial_embeddings[
placeholder_string
] = torch.nn.Parameter(
init_word_embedding.unsqueeze(0).repeat(
num_vectors_per_token, 1
),
requires_grad=False,
)
else:
token_params = torch.nn.Parameter(
torch.rand(
size=(num_vectors_per_token, token_dim),
requires_grad=True,
)
)
self.string_to_token_dict[placeholder_string] = token
self.string_to_param_dict[placeholder_string] = token_params
def forward(
self,
tokenized_text,
embedded_text,
):
b, n, device = *tokenized_text.shape, tokenized_text.device
for (
placeholder_string,
placeholder_token,
) in self.string_to_token_dict.items():
placeholder_embedding = self.string_to_param_dict[
placeholder_string
].to(device)
if (
self.max_vectors_per_token == 1
): # If there's only one vector per token, we can do a simple replacement
placeholder_idx = torch.where(
tokenized_text == placeholder_token.to(device)
)
embedded_text[placeholder_idx] = placeholder_embedding
else: # otherwise, need to insert and keep track of changing indices
if self.progressive_words:
self.progressive_counter += 1
max_step_tokens = (
1 + self.progressive_counter // PROGRESSIVE_SCALE
)
else:
max_step_tokens = self.max_vectors_per_token
num_vectors_for_token = min(
placeholder_embedding.shape[0], max_step_tokens
)
placeholder_rows, placeholder_cols = torch.where(
tokenized_text == placeholder_token.to(device)
)
if placeholder_rows.nelement() == 0:
continue
sorted_cols, sort_idx = torch.sort(
placeholder_cols, descending=True
)
sorted_rows = placeholder_rows[sort_idx]
for idx in range(len(sorted_rows)):
row = sorted_rows[idx]
col = sorted_cols[idx]
new_token_row = torch.cat(
[
tokenized_text[row][:col],
placeholder_token.repeat(num_vectors_for_token).to(
device
),
tokenized_text[row][col + 1 :],
],
axis=0,
)[:n]
new_embed_row = torch.cat(
[
embedded_text[row][:col],
placeholder_embedding[:num_vectors_for_token],
embedded_text[row][col + 1 :],
],
axis=0,
)[:n]
embedded_text[row] = new_embed_row
tokenized_text[row] = new_token_row
return embedded_text
def save(self, ckpt_path):
torch.save(
{
'string_to_token': self.string_to_token_dict,
'string_to_param': self.string_to_param_dict,
},
ckpt_path,
)
def load(self, ckpt_path, full=True):
ckpt = torch.load(ckpt_path, map_location='cpu')
# Handle .pt textual inversion files
if 'string_to_token' in ckpt and 'string_to_param' in ckpt:
self.string_to_token_dict = ckpt["string_to_token"]
self.string_to_param_dict = ckpt["string_to_param"]
# Handle .bin textual inversion files from Huggingface Concepts
# https://huggingface.co/sd-concepts-library
else:
for token_str in list(ckpt.keys()):
token = get_clip_token_for_string(self.embedder.tokenizer, token_str)
self.string_to_token_dict[token_str] = token
ckpt[token_str] = torch.nn.Parameter(ckpt[token_str])
self.string_to_param_dict.update(ckpt)
if not full:
for key, value in self.string_to_param_dict.items():
self.string_to_param_dict[key] = torch.nn.Parameter(value.half())
print(f'Added terms: {", ".join(self.string_to_param_dict.keys())}')
def get_embedding_norms_squared(self):
all_params = torch.cat(
list(self.string_to_param_dict.values()), axis=0
) # num_placeholders x embedding_dim
param_norm_squared = (all_params * all_params).sum(
axis=-1
) # num_placeholders
return param_norm_squared
def embedding_parameters(self):
return self.string_to_param_dict.parameters()
def embedding_to_coarse_loss(self):
loss = 0.0
num_embeddings = len(self.initial_embeddings)
for key in self.initial_embeddings:
optimized = self.string_to_param_dict[key]
coarse = self.initial_embeddings[key].clone().to(optimized.device)
loss = (
loss
+ (optimized - coarse)
@ (optimized - coarse).T
/ num_embeddings
)
return loss

View File

@@ -5,8 +5,40 @@ import clip
from einops import rearrange, repeat
from transformers import CLIPTokenizer, CLIPTextModel
import kornia
from ldm.dream.devices import choose_torch_device
from ldm.modules.x_transformer import Encoder, TransformerWrapper # TODO: can we directly rely on lucidrains code and simply add this as a reuirement? --> test
from ldm.modules.x_transformer import (
Encoder,
TransformerWrapper,
) # TODO: can we directly rely on lucidrains code and simply add this as a reuirement? --> test
def _expand_mask(mask, dtype, tgt_len=None):
"""
Expands attention_mask from `[bsz, seq_len]` to `[bsz, 1, tgt_seq_len, src_seq_len]`.
"""
bsz, src_len = mask.size()
tgt_len = tgt_len if tgt_len is not None else src_len
expanded_mask = (
mask[:, None, None, :].expand(bsz, 1, tgt_len, src_len).to(dtype)
)
inverted_mask = 1.0 - expanded_mask
return inverted_mask.masked_fill(
inverted_mask.to(torch.bool), torch.finfo(dtype).min
)
def _build_causal_attention_mask(bsz, seq_len, dtype):
# lazily create causal attention mask, with full attention between the vision tokens
# pytorch uses additive attention mask; fill with -inf
mask = torch.empty(bsz, seq_len, seq_len, dtype=dtype)
mask.fill_(torch.tensor(torch.finfo(dtype).min))
mask.triu_(1) # zero out the lower diagonal
mask = mask.unsqueeze(1) # expand mask
return mask
class AbstractEncoder(nn.Module):
@@ -17,7 +49,6 @@ class AbstractEncoder(nn.Module):
raise NotImplementedError
class ClassEmbedder(nn.Module):
def __init__(self, embed_dim, n_classes=1000, key='class'):
super().__init__()
@@ -35,11 +66,22 @@ class ClassEmbedder(nn.Module):
class TransformerEmbedder(AbstractEncoder):
"""Some transformer encoder layers"""
def __init__(self, n_embed, n_layer, vocab_size, max_seq_len=77, device="cuda"):
def __init__(
self,
n_embed,
n_layer,
vocab_size,
max_seq_len=77,
device=choose_torch_device(),
):
super().__init__()
self.device = device
self.transformer = TransformerWrapper(num_tokens=vocab_size, max_seq_len=max_seq_len,
attn_layers=Encoder(dim=n_embed, depth=n_layer))
self.transformer = TransformerWrapper(
num_tokens=vocab_size,
max_seq_len=max_seq_len,
attn_layers=Encoder(dim=n_embed, depth=n_layer),
)
def forward(self, tokens):
tokens = tokens.to(self.device) # meh
@@ -51,27 +93,44 @@ class TransformerEmbedder(AbstractEncoder):
class BERTTokenizer(AbstractEncoder):
""" Uses a pretrained BERT tokenizer by huggingface. Vocab size: 30522 (?)"""
def __init__(self, device="cuda", vq_interface=True, max_length=77):
"""Uses a pretrained BERT tokenizer by huggingface. Vocab size: 30522 (?)"""
def __init__(
self, device=choose_torch_device(), vq_interface=True, max_length=77
):
super().__init__()
from transformers import BertTokenizerFast # TODO: add to reuquirements
from transformers import (
BertTokenizerFast,
) # TODO: add to reuquirements
# Modified to allow to run on non-internet connected compute nodes.
# Model needs to be loaded into cache from an internet-connected machine
# by running:
# from transformers import BertTokenizerFast
# BertTokenizerFast.from_pretrained("bert-base-uncased")
try:
self.tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased",local_files_only=True)
self.tokenizer = BertTokenizerFast.from_pretrained(
'bert-base-uncased', local_files_only=True
)
except OSError:
raise SystemExit("* Couldn't load Bert tokenizer files. Try running scripts/preload_models.py from an internet-conected machine.")
raise SystemExit(
"* Couldn't load Bert tokenizer files. Try running scripts/preload_models.py from an internet-conected machine."
)
self.device = device
self.vq_interface = vq_interface
self.max_length = max_length
def forward(self, text):
batch_encoding = self.tokenizer(text, truncation=True, max_length=self.max_length, return_length=True,
return_overflowing_tokens=False, padding="max_length", return_tensors="pt")
tokens = batch_encoding["input_ids"].to(self.device)
batch_encoding = self.tokenizer(
text,
truncation=True,
max_length=self.max_length,
return_length=True,
return_overflowing_tokens=False,
padding='max_length',
return_tensors='pt',
)
tokens = batch_encoding['input_ids'].to(self.device)
return tokens
@torch.no_grad()
@@ -87,54 +146,84 @@ class BERTTokenizer(AbstractEncoder):
class BERTEmbedder(AbstractEncoder):
"""Uses the BERT tokenizr model and add some transformer encoder layers"""
def __init__(self, n_embed, n_layer, vocab_size=30522, max_seq_len=77,
device="cuda",use_tokenizer=True, embedding_dropout=0.0):
def __init__(
self,
n_embed,
n_layer,
vocab_size=30522,
max_seq_len=77,
device=choose_torch_device(),
use_tokenizer=True,
embedding_dropout=0.0,
):
super().__init__()
self.use_tknz_fn = use_tokenizer
if self.use_tknz_fn:
self.tknz_fn = BERTTokenizer(vq_interface=False, max_length=max_seq_len)
self.tknz_fn = BERTTokenizer(
vq_interface=False, max_length=max_seq_len
)
self.device = device
self.transformer = TransformerWrapper(num_tokens=vocab_size, max_seq_len=max_seq_len,
attn_layers=Encoder(dim=n_embed, depth=n_layer),
emb_dropout=embedding_dropout)
self.transformer = TransformerWrapper(
num_tokens=vocab_size,
max_seq_len=max_seq_len,
attn_layers=Encoder(dim=n_embed, depth=n_layer),
emb_dropout=embedding_dropout,
)
def forward(self, text):
def forward(self, text, embedding_manager=None):
if self.use_tknz_fn:
tokens = self.tknz_fn(text)#.to(self.device)
tokens = self.tknz_fn(text) # .to(self.device)
else:
tokens = text
z = self.transformer(tokens, return_embeddings=True)
z = self.transformer(
tokens, return_embeddings=True, embedding_manager=embedding_manager
)
return z
def encode(self, text):
def encode(self, text, **kwargs):
# output of length 77
return self(text)
return self(text, **kwargs)
class SpatialRescaler(nn.Module):
def __init__(self,
n_stages=1,
method='bilinear',
multiplier=0.5,
in_channels=3,
out_channels=None,
bias=False):
def __init__(
self,
n_stages=1,
method='bilinear',
multiplier=0.5,
in_channels=3,
out_channels=None,
bias=False,
):
super().__init__()
self.n_stages = n_stages
assert self.n_stages >= 0
assert method in ['nearest','linear','bilinear','trilinear','bicubic','area']
assert method in [
'nearest',
'linear',
'bilinear',
'trilinear',
'bicubic',
'area',
]
self.multiplier = multiplier
self.interpolator = partial(torch.nn.functional.interpolate, mode=method)
self.interpolator = partial(
torch.nn.functional.interpolate, mode=method
)
self.remap_output = out_channels is not None
if self.remap_output:
print(f'Spatial Rescaler mapping from {in_channels} to {out_channels} channels after resizing.')
self.channel_mapper = nn.Conv2d(in_channels,out_channels,1,bias=bias)
print(
f'Spatial Rescaler mapping from {in_channels} to {out_channels} channels after resizing.'
)
self.channel_mapper = nn.Conv2d(
in_channels, out_channels, 1, bias=bias
)
def forward(self,x):
def forward(self, x):
for stage in range(self.n_stages):
x = self.interpolator(x, scale_factor=self.multiplier)
if self.remap_output:
x = self.channel_mapper(x)
return x
@@ -142,41 +231,245 @@ class SpatialRescaler(nn.Module):
def encode(self, x):
return self(x)
class FrozenCLIPEmbedder(AbstractEncoder):
"""Uses the CLIP transformer encoder for text (from Hugging Face)"""
def __init__(self, version="openai/clip-vit-large-patch14", device="cuda", max_length=77):
def __init__(
self,
version='openai/clip-vit-large-patch14',
device=choose_torch_device(),
max_length=77,
):
super().__init__()
self.tokenizer = CLIPTokenizer.from_pretrained(version)
self.transformer = CLIPTextModel.from_pretrained(version)
self.tokenizer = CLIPTokenizer.from_pretrained(
version, local_files_only=True
)
self.transformer = CLIPTextModel.from_pretrained(
version, local_files_only=True
)
self.device = device
self.max_length = max_length
self.freeze()
def embedding_forward(
self,
input_ids=None,
position_ids=None,
inputs_embeds=None,
embedding_manager=None,
) -> torch.Tensor:
seq_length = (
input_ids.shape[-1]
if input_ids is not None
else inputs_embeds.shape[-2]
)
if position_ids is None:
position_ids = self.position_ids[:, :seq_length]
if inputs_embeds is None:
inputs_embeds = self.token_embedding(input_ids)
if embedding_manager is not None:
inputs_embeds = embedding_manager(input_ids, inputs_embeds)
position_embeddings = self.position_embedding(position_ids)
embeddings = inputs_embeds + position_embeddings
return embeddings
self.transformer.text_model.embeddings.forward = (
embedding_forward.__get__(self.transformer.text_model.embeddings)
)
def encoder_forward(
self,
inputs_embeds,
attention_mask=None,
causal_attention_mask=None,
output_attentions=None,
output_hidden_states=None,
return_dict=None,
):
output_attentions = (
output_attentions
if output_attentions is not None
else self.config.output_attentions
)
output_hidden_states = (
output_hidden_states
if output_hidden_states is not None
else self.config.output_hidden_states
)
return_dict = (
return_dict
if return_dict is not None
else self.config.use_return_dict
)
encoder_states = () if output_hidden_states else None
all_attentions = () if output_attentions else None
hidden_states = inputs_embeds
for idx, encoder_layer in enumerate(self.layers):
if output_hidden_states:
encoder_states = encoder_states + (hidden_states,)
layer_outputs = encoder_layer(
hidden_states,
attention_mask,
causal_attention_mask,
output_attentions=output_attentions,
)
hidden_states = layer_outputs[0]
if output_attentions:
all_attentions = all_attentions + (layer_outputs[1],)
if output_hidden_states:
encoder_states = encoder_states + (hidden_states,)
return hidden_states
self.transformer.text_model.encoder.forward = encoder_forward.__get__(
self.transformer.text_model.encoder
)
def text_encoder_forward(
self,
input_ids=None,
attention_mask=None,
position_ids=None,
output_attentions=None,
output_hidden_states=None,
return_dict=None,
embedding_manager=None,
):
output_attentions = (
output_attentions
if output_attentions is not None
else self.config.output_attentions
)
output_hidden_states = (
output_hidden_states
if output_hidden_states is not None
else self.config.output_hidden_states
)
return_dict = (
return_dict
if return_dict is not None
else self.config.use_return_dict
)
if input_ids is None:
raise ValueError('You have to specify either input_ids')
input_shape = input_ids.size()
input_ids = input_ids.view(-1, input_shape[-1])
hidden_states = self.embeddings(
input_ids=input_ids,
position_ids=position_ids,
embedding_manager=embedding_manager,
)
bsz, seq_len = input_shape
# CLIP's text model uses causal mask, prepare it here.
# https://github.com/openai/CLIP/blob/cfcffb90e69f37bf2ff1e988237a0fbe41f33c04/clip/model.py#L324
causal_attention_mask = _build_causal_attention_mask(
bsz, seq_len, hidden_states.dtype
).to(hidden_states.device)
# expand attention_mask
if attention_mask is not None:
# [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
attention_mask = _expand_mask(
attention_mask, hidden_states.dtype
)
last_hidden_state = self.encoder(
inputs_embeds=hidden_states,
attention_mask=attention_mask,
causal_attention_mask=causal_attention_mask,
output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
return_dict=return_dict,
)
last_hidden_state = self.final_layer_norm(last_hidden_state)
return last_hidden_state
self.transformer.text_model.forward = text_encoder_forward.__get__(
self.transformer.text_model
)
def transformer_forward(
self,
input_ids=None,
attention_mask=None,
position_ids=None,
output_attentions=None,
output_hidden_states=None,
return_dict=None,
embedding_manager=None,
):
return self.text_model(
input_ids=input_ids,
attention_mask=attention_mask,
position_ids=position_ids,
output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
return_dict=return_dict,
embedding_manager=embedding_manager,
)
self.transformer.forward = transformer_forward.__get__(
self.transformer
)
def freeze(self):
self.transformer = self.transformer.eval()
for param in self.parameters():
param.requires_grad = False
def forward(self, text):
batch_encoding = self.tokenizer(text, truncation=True, max_length=self.max_length, return_length=True,
return_overflowing_tokens=False, padding="max_length", return_tensors="pt")
tokens = batch_encoding["input_ids"].to(self.device)
outputs = self.transformer(input_ids=tokens)
def forward(self, text, **kwargs):
batch_encoding = self.tokenizer(
text,
truncation=True,
max_length=self.max_length,
return_length=True,
return_overflowing_tokens=False,
padding='max_length',
return_tensors='pt',
)
tokens = batch_encoding['input_ids'].to(self.device)
z = self.transformer(input_ids=tokens, **kwargs)
z = outputs.last_hidden_state
return z
def encode(self, text):
return self(text)
def encode(self, text, **kwargs):
return self(text, **kwargs)
class FrozenCLIPTextEmbedder(nn.Module):
"""
Uses the CLIP transformer encoder for text.
"""
def __init__(self, version='ViT-L/14', device="cuda", max_length=77, n_repeat=1, normalize=True):
def __init__(
self,
version='ViT-L/14',
device=choose_torch_device(),
max_length=77,
n_repeat=1,
normalize=True,
):
super().__init__()
self.model, _ = clip.load(version, jit=False, device="cpu")
self.model, _ = clip.load(version, jit=False, device=device)
self.device = device
self.max_length = max_length
self.n_repeat = n_repeat
@@ -196,7 +489,7 @@ class FrozenCLIPTextEmbedder(nn.Module):
def encode(self, text):
z = self(text)
if z.ndim==2:
if z.ndim == 2:
z = z[:, None, :]
z = repeat(z, 'b 1 d -> b k d', k=self.n_repeat)
return z
@@ -204,29 +497,42 @@ class FrozenCLIPTextEmbedder(nn.Module):
class FrozenClipImageEmbedder(nn.Module):
"""
Uses the CLIP image encoder.
"""
Uses the CLIP image encoder.
"""
def __init__(
self,
model,
jit=False,
device='cuda' if torch.cuda.is_available() else 'cpu',
antialias=False,
):
self,
model,
jit=False,
device=choose_torch_device(),
antialias=False,
):
super().__init__()
self.model, _ = clip.load(name=model, device=device, jit=jit)
self.antialias = antialias
self.register_buffer('mean', torch.Tensor([0.48145466, 0.4578275, 0.40821073]), persistent=False)
self.register_buffer('std', torch.Tensor([0.26862954, 0.26130258, 0.27577711]), persistent=False)
self.register_buffer(
'mean',
torch.Tensor([0.48145466, 0.4578275, 0.40821073]),
persistent=False,
)
self.register_buffer(
'std',
torch.Tensor([0.26862954, 0.26130258, 0.27577711]),
persistent=False,
)
def preprocess(self, x):
# normalize to [0,1]
x = kornia.geometry.resize(x, (224, 224),
interpolation='bicubic',align_corners=True,
antialias=self.antialias)
x = (x + 1.) / 2.
x = kornia.geometry.resize(
x,
(224, 224),
interpolation='bicubic',
align_corners=True,
antialias=self.antialias,
)
x = (x + 1.0) / 2.0
# renormalize according to clip
x = kornia.enhance.normalize(x, self.mean, self.std)
return x
@@ -236,7 +542,8 @@ class FrozenClipImageEmbedder(nn.Module):
return self.model.encode_image(self.preprocess(x))
if __name__ == "__main__":
if __name__ == '__main__':
from ldm.util import count_params
model = FrozenCLIPEmbedder()
count_params(model, verbose=True)

View File

@@ -1,2 +1,6 @@
from ldm.modules.image_degradation.bsrgan import degradation_bsrgan_variant as degradation_fn_bsr
from ldm.modules.image_degradation.bsrgan_light import degradation_bsrgan_variant as degradation_fn_bsr_light
from ldm.modules.image_degradation.bsrgan import (
degradation_bsrgan_variant as degradation_fn_bsr,
)
from ldm.modules.image_degradation.bsrgan_light import (
degradation_bsrgan_variant as degradation_fn_bsr_light,
)

View File

@@ -27,16 +27,16 @@ import ldm.modules.image_degradation.utils_image as util
def modcrop_np(img, sf):
'''
"""
Args:
img: numpy image, WxH or WxHxC
sf: scale factor
Return:
cropped image
'''
"""
w, h = img.shape[:2]
im = np.copy(img)
return im[:w - w % sf, :h - h % sf, ...]
return im[: w - w % sf, : h - h % sf, ...]
"""
@@ -54,7 +54,9 @@ def analytic_kernel(k):
# Loop over the small kernel to fill the big one
for r in range(k_size):
for c in range(k_size):
big_k[2 * r:2 * r + k_size, 2 * c:2 * c + k_size] += k[r, c] * k
big_k[2 * r : 2 * r + k_size, 2 * c : 2 * c + k_size] += (
k[r, c] * k
)
# Crop the edges of the big kernel to ignore very small values and increase run time of SR
crop = k_size // 2
cropped_big_k = big_k[crop:-crop, crop:-crop]
@@ -63,7 +65,7 @@ def analytic_kernel(k):
def anisotropic_Gaussian(ksize=15, theta=np.pi, l1=6, l2=6):
""" generate an anisotropic Gaussian kernel
"""generate an anisotropic Gaussian kernel
Args:
ksize : e.g., 15, kernel size
theta : [0, pi], rotation angle range
@@ -74,7 +76,12 @@ def anisotropic_Gaussian(ksize=15, theta=np.pi, l1=6, l2=6):
k : kernel
"""
v = np.dot(np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]), np.array([1., 0.]))
v = np.dot(
np.array(
[[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]
),
np.array([1.0, 0.0]),
)
V = np.array([[v[0], v[1]], [v[1], -v[0]]])
D = np.array([[l1, 0], [0, l2]])
Sigma = np.dot(np.dot(V, D), np.linalg.inv(V))
@@ -126,24 +133,32 @@ def shift_pixel(x, sf, upper_left=True):
def blur(x, k):
'''
"""
x: image, NxcxHxW
k: kernel, Nx1xhxw
'''
"""
n, c = x.shape[:2]
p1, p2 = (k.shape[-2] - 1) // 2, (k.shape[-1] - 1) // 2
x = torch.nn.functional.pad(x, pad=(p1, p2, p1, p2), mode='replicate')
k = k.repeat(1, c, 1, 1)
k = k.view(-1, 1, k.shape[2], k.shape[3])
x = x.view(1, -1, x.shape[2], x.shape[3])
x = torch.nn.functional.conv2d(x, k, bias=None, stride=1, padding=0, groups=n * c)
x = torch.nn.functional.conv2d(
x, k, bias=None, stride=1, padding=0, groups=n * c
)
x = x.view(n, c, x.shape[2], x.shape[3])
return x
def gen_kernel(k_size=np.array([15, 15]), scale_factor=np.array([4, 4]), min_var=0.6, max_var=10., noise_level=0):
""""
def gen_kernel(
k_size=np.array([15, 15]),
scale_factor=np.array([4, 4]),
min_var=0.6,
max_var=10.0,
noise_level=0,
):
""" "
# modified version of https://github.com/assafshocher/BlindSR_dataset_generator
# Kai Zhang
# min_var = 0.175 * sf # variance of the gaussian kernel will be sampled between min_var and max_var
@@ -157,13 +172,16 @@ def gen_kernel(k_size=np.array([15, 15]), scale_factor=np.array([4, 4]), min_var
# Set COV matrix using Lambdas and Theta
LAMBDA = np.diag([lambda_1, lambda_2])
Q = np.array([[np.cos(theta), -np.sin(theta)],
[np.sin(theta), np.cos(theta)]])
Q = np.array(
[[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]
)
SIGMA = Q @ LAMBDA @ Q.T
INV_SIGMA = np.linalg.inv(SIGMA)[None, None, :, :]
# Set expectation position (shifting kernel for aligned image)
MU = k_size // 2 - 0.5 * (scale_factor - 1) # - 0.5 * (scale_factor - k_size % 2)
MU = k_size // 2 - 0.5 * (
scale_factor - 1
) # - 0.5 * (scale_factor - k_size % 2)
MU = MU[None, None, :, None]
# Create meshgrid for Gaussian
@@ -188,7 +206,9 @@ def fspecial_gaussian(hsize, sigma):
hsize = [hsize, hsize]
siz = [(hsize[0] - 1.0) / 2.0, (hsize[1] - 1.0) / 2.0]
std = sigma
[x, y] = np.meshgrid(np.arange(-siz[1], siz[1] + 1), np.arange(-siz[0], siz[0] + 1))
[x, y] = np.meshgrid(
np.arange(-siz[1], siz[1] + 1), np.arange(-siz[0], siz[0] + 1)
)
arg = -(x * x + y * y) / (2 * std * std)
h = np.exp(arg)
h[h < scipy.finfo(float).eps * h.max()] = 0
@@ -208,10 +228,10 @@ def fspecial_laplacian(alpha):
def fspecial(filter_type, *args, **kwargs):
'''
"""
python code from:
https://github.com/ronaldosena/imagens-medicas-2/blob/40171a6c259edec7827a6693a93955de2bd39e76/Aulas/aula_2_-_uniform_filter/matlab_fspecial.py
'''
"""
if filter_type == 'gaussian':
return fspecial_gaussian(*args, **kwargs)
if filter_type == 'laplacian':
@@ -226,19 +246,19 @@ def fspecial(filter_type, *args, **kwargs):
def bicubic_degradation(x, sf=3):
'''
"""
Args:
x: HxWxC image, [0, 1]
sf: down-scale factor
Return:
bicubicly downsampled LR image
'''
"""
x = util.imresize_np(x, scale=1 / sf)
return x
def srmd_degradation(x, k, sf=3):
''' blur + bicubic downsampling
"""blur + bicubic downsampling
Args:
x: HxWxC image, [0, 1]
k: hxw, double
@@ -253,14 +273,16 @@ def srmd_degradation(x, k, sf=3):
pages={3262--3271},
year={2018}
}
'''
x = ndimage.filters.convolve(x, np.expand_dims(k, axis=2), mode='wrap') # 'nearest' | 'mirror'
"""
x = ndimage.filters.convolve(
x, np.expand_dims(k, axis=2), mode='wrap'
) # 'nearest' | 'mirror'
x = bicubic_degradation(x, sf=sf)
return x
def dpsr_degradation(x, k, sf=3):
''' bicubic downsampling + blur
"""bicubic downsampling + blur
Args:
x: HxWxC image, [0, 1]
k: hxw, double
@@ -275,21 +297,21 @@ def dpsr_degradation(x, k, sf=3):
pages={1671--1681},
year={2019}
}
'''
"""
x = bicubic_degradation(x, sf=sf)
x = ndimage.filters.convolve(x, np.expand_dims(k, axis=2), mode='wrap')
return x
def classical_degradation(x, k, sf=3):
''' blur + downsampling
"""blur + downsampling
Args:
x: HxWxC image, [0, 1]/[0, 255]
k: hxw, double
sf: down-scale factor
Return:
downsampled LR image
'''
"""
x = ndimage.filters.convolve(x, np.expand_dims(k, axis=2), mode='wrap')
# x = filters.correlate(x, np.expand_dims(np.flip(k), axis=2))
st = 0
@@ -328,10 +350,19 @@ def add_blur(img, sf=4):
if random.random() < 0.5:
l1 = wd2 * random.random()
l2 = wd2 * random.random()
k = anisotropic_Gaussian(ksize=2 * random.randint(2, 11) + 3, theta=random.random() * np.pi, l1=l1, l2=l2)
k = anisotropic_Gaussian(
ksize=2 * random.randint(2, 11) + 3,
theta=random.random() * np.pi,
l1=l1,
l2=l2,
)
else:
k = fspecial('gaussian', 2 * random.randint(2, 11) + 3, wd * random.random())
img = ndimage.filters.convolve(img, np.expand_dims(k, axis=2), mode='mirror')
k = fspecial(
'gaussian', 2 * random.randint(2, 11) + 3, wd * random.random()
)
img = ndimage.filters.convolve(
img, np.expand_dims(k, axis=2), mode='mirror'
)
return img
@@ -344,7 +375,11 @@ def add_resize(img, sf=4):
sf1 = random.uniform(0.5 / sf, 1)
else:
sf1 = 1.0
img = cv2.resize(img, (int(sf1 * img.shape[1]), int(sf1 * img.shape[0])), interpolation=random.choice([1, 2, 3]))
img = cv2.resize(
img,
(int(sf1 * img.shape[1]), int(sf1 * img.shape[0])),
interpolation=random.choice([1, 2, 3]),
)
img = np.clip(img, 0.0, 1.0)
return img
@@ -366,19 +401,26 @@ def add_resize(img, sf=4):
# img = np.clip(img, 0.0, 1.0)
# return img
def add_Gaussian_noise(img, noise_level1=2, noise_level2=25):
noise_level = random.randint(noise_level1, noise_level2)
rnum = np.random.rand()
if rnum > 0.6: # add color Gaussian noise
img = img + np.random.normal(0, noise_level / 255.0, img.shape).astype(np.float32)
img = img + np.random.normal(0, noise_level / 255.0, img.shape).astype(
np.float32
)
elif rnum < 0.4: # add grayscale Gaussian noise
img = img + np.random.normal(0, noise_level / 255.0, (*img.shape[:2], 1)).astype(np.float32)
img = img + np.random.normal(
0, noise_level / 255.0, (*img.shape[:2], 1)
).astype(np.float32)
else: # add noise
L = noise_level2 / 255.
L = noise_level2 / 255.0
D = np.diag(np.random.rand(3))
U = orth(np.random.rand(3, 3))
conv = np.dot(np.dot(np.transpose(U), D), U)
img = img + np.random.multivariate_normal([0, 0, 0], np.abs(L ** 2 * conv), img.shape[:2]).astype(np.float32)
img = img + np.random.multivariate_normal(
[0, 0, 0], np.abs(L**2 * conv), img.shape[:2]
).astype(np.float32)
img = np.clip(img, 0.0, 1.0)
return img
@@ -388,28 +430,37 @@ def add_speckle_noise(img, noise_level1=2, noise_level2=25):
img = np.clip(img, 0.0, 1.0)
rnum = random.random()
if rnum > 0.6:
img += img * np.random.normal(0, noise_level / 255.0, img.shape).astype(np.float32)
img += img * np.random.normal(
0, noise_level / 255.0, img.shape
).astype(np.float32)
elif rnum < 0.4:
img += img * np.random.normal(0, noise_level / 255.0, (*img.shape[:2], 1)).astype(np.float32)
img += img * np.random.normal(
0, noise_level / 255.0, (*img.shape[:2], 1)
).astype(np.float32)
else:
L = noise_level2 / 255.
L = noise_level2 / 255.0
D = np.diag(np.random.rand(3))
U = orth(np.random.rand(3, 3))
conv = np.dot(np.dot(np.transpose(U), D), U)
img += img * np.random.multivariate_normal([0, 0, 0], np.abs(L ** 2 * conv), img.shape[:2]).astype(np.float32)
img += img * np.random.multivariate_normal(
[0, 0, 0], np.abs(L**2 * conv), img.shape[:2]
).astype(np.float32)
img = np.clip(img, 0.0, 1.0)
return img
def add_Poisson_noise(img):
img = np.clip((img * 255.0).round(), 0, 255) / 255.
img = np.clip((img * 255.0).round(), 0, 255) / 255.0
vals = 10 ** (2 * random.random() + 2.0) # [2, 4]
if random.random() < 0.5:
img = np.random.poisson(img * vals).astype(np.float32) / vals
else:
img_gray = np.dot(img[..., :3], [0.299, 0.587, 0.114])
img_gray = np.clip((img_gray * 255.0).round(), 0, 255) / 255.
noise_gray = np.random.poisson(img_gray * vals).astype(np.float32) / vals - img_gray
img_gray = np.clip((img_gray * 255.0).round(), 0, 255) / 255.0
noise_gray = (
np.random.poisson(img_gray * vals).astype(np.float32) / vals
- img_gray
)
img += noise_gray[:, :, np.newaxis]
img = np.clip(img, 0.0, 1.0)
return img
@@ -418,7 +469,9 @@ def add_Poisson_noise(img):
def add_JPEG_noise(img):
quality_factor = random.randint(30, 95)
img = cv2.cvtColor(util.single2uint(img), cv2.COLOR_RGB2BGR)
result, encimg = cv2.imencode('.jpg', img, [int(cv2.IMWRITE_JPEG_QUALITY), quality_factor])
result, encimg = cv2.imencode(
'.jpg', img, [int(cv2.IMWRITE_JPEG_QUALITY), quality_factor]
)
img = cv2.imdecode(encimg, 1)
img = cv2.cvtColor(util.uint2single(img), cv2.COLOR_BGR2RGB)
return img
@@ -428,10 +481,14 @@ def random_crop(lq, hq, sf=4, lq_patchsize=64):
h, w = lq.shape[:2]
rnd_h = random.randint(0, h - lq_patchsize)
rnd_w = random.randint(0, w - lq_patchsize)
lq = lq[rnd_h:rnd_h + lq_patchsize, rnd_w:rnd_w + lq_patchsize, :]
lq = lq[rnd_h : rnd_h + lq_patchsize, rnd_w : rnd_w + lq_patchsize, :]
rnd_h_H, rnd_w_H = int(rnd_h * sf), int(rnd_w * sf)
hq = hq[rnd_h_H:rnd_h_H + lq_patchsize * sf, rnd_w_H:rnd_w_H + lq_patchsize * sf, :]
hq = hq[
rnd_h_H : rnd_h_H + lq_patchsize * sf,
rnd_w_H : rnd_w_H + lq_patchsize * sf,
:,
]
return lq, hq
@@ -452,7 +509,7 @@ def degradation_bsrgan(img, sf=4, lq_patchsize=72, isp_model=None):
sf_ori = sf
h1, w1 = img.shape[:2]
img = img.copy()[:w1 - w1 % sf, :h1 - h1 % sf, ...] # mod crop
img = img.copy()[: w1 - w1 % sf, : h1 - h1 % sf, ...] # mod crop
h, w = img.shape[:2]
if h < lq_patchsize * sf or w < lq_patchsize * sf:
@@ -462,8 +519,11 @@ def degradation_bsrgan(img, sf=4, lq_patchsize=72, isp_model=None):
if sf == 4 and random.random() < scale2_prob: # downsample1
if np.random.rand() < 0.5:
img = cv2.resize(img, (int(1 / 2 * img.shape[1]), int(1 / 2 * img.shape[0])),
interpolation=random.choice([1, 2, 3]))
img = cv2.resize(
img,
(int(1 / 2 * img.shape[1]), int(1 / 2 * img.shape[0])),
interpolation=random.choice([1, 2, 3]),
)
else:
img = util.imresize_np(img, 1 / 2, True)
img = np.clip(img, 0.0, 1.0)
@@ -472,7 +532,10 @@ def degradation_bsrgan(img, sf=4, lq_patchsize=72, isp_model=None):
shuffle_order = random.sample(range(7), 7)
idx1, idx2 = shuffle_order.index(2), shuffle_order.index(3)
if idx1 > idx2: # keep downsample3 last
shuffle_order[idx1], shuffle_order[idx2] = shuffle_order[idx2], shuffle_order[idx1]
shuffle_order[idx1], shuffle_order[idx2] = (
shuffle_order[idx2],
shuffle_order[idx1],
)
for i in shuffle_order:
@@ -487,19 +550,30 @@ def degradation_bsrgan(img, sf=4, lq_patchsize=72, isp_model=None):
# downsample2
if random.random() < 0.75:
sf1 = random.uniform(1, 2 * sf)
img = cv2.resize(img, (int(1 / sf1 * img.shape[1]), int(1 / sf1 * img.shape[0])),
interpolation=random.choice([1, 2, 3]))
img = cv2.resize(
img,
(int(1 / sf1 * img.shape[1]), int(1 / sf1 * img.shape[0])),
interpolation=random.choice([1, 2, 3]),
)
else:
k = fspecial('gaussian', 25, random.uniform(0.1, 0.6 * sf))
k_shifted = shift_pixel(k, sf)
k_shifted = k_shifted / k_shifted.sum() # blur with shifted kernel
img = ndimage.filters.convolve(img, np.expand_dims(k_shifted, axis=2), mode='mirror')
k_shifted = (
k_shifted / k_shifted.sum()
) # blur with shifted kernel
img = ndimage.filters.convolve(
img, np.expand_dims(k_shifted, axis=2), mode='mirror'
)
img = img[0::sf, 0::sf, ...] # nearest downsampling
img = np.clip(img, 0.0, 1.0)
elif i == 3:
# downsample3
img = cv2.resize(img, (int(1 / sf * a), int(1 / sf * b)), interpolation=random.choice([1, 2, 3]))
img = cv2.resize(
img,
(int(1 / sf * a), int(1 / sf * b)),
interpolation=random.choice([1, 2, 3]),
)
img = np.clip(img, 0.0, 1.0)
elif i == 4:
@@ -544,15 +618,18 @@ def degradation_bsrgan_variant(image, sf=4, isp_model=None):
sf_ori = sf
h1, w1 = image.shape[:2]
image = image.copy()[:w1 - w1 % sf, :h1 - h1 % sf, ...] # mod crop
image = image.copy()[: w1 - w1 % sf, : h1 - h1 % sf, ...] # mod crop
h, w = image.shape[:2]
hq = image.copy()
if sf == 4 and random.random() < scale2_prob: # downsample1
if np.random.rand() < 0.5:
image = cv2.resize(image, (int(1 / 2 * image.shape[1]), int(1 / 2 * image.shape[0])),
interpolation=random.choice([1, 2, 3]))
image = cv2.resize(
image,
(int(1 / 2 * image.shape[1]), int(1 / 2 * image.shape[0])),
interpolation=random.choice([1, 2, 3]),
)
else:
image = util.imresize_np(image, 1 / 2, True)
image = np.clip(image, 0.0, 1.0)
@@ -561,7 +638,10 @@ def degradation_bsrgan_variant(image, sf=4, isp_model=None):
shuffle_order = random.sample(range(7), 7)
idx1, idx2 = shuffle_order.index(2), shuffle_order.index(3)
if idx1 > idx2: # keep downsample3 last
shuffle_order[idx1], shuffle_order[idx2] = shuffle_order[idx2], shuffle_order[idx1]
shuffle_order[idx1], shuffle_order[idx2] = (
shuffle_order[idx2],
shuffle_order[idx1],
)
for i in shuffle_order:
@@ -576,19 +656,33 @@ def degradation_bsrgan_variant(image, sf=4, isp_model=None):
# downsample2
if random.random() < 0.75:
sf1 = random.uniform(1, 2 * sf)
image = cv2.resize(image, (int(1 / sf1 * image.shape[1]), int(1 / sf1 * image.shape[0])),
interpolation=random.choice([1, 2, 3]))
image = cv2.resize(
image,
(
int(1 / sf1 * image.shape[1]),
int(1 / sf1 * image.shape[0]),
),
interpolation=random.choice([1, 2, 3]),
)
else:
k = fspecial('gaussian', 25, random.uniform(0.1, 0.6 * sf))
k_shifted = shift_pixel(k, sf)
k_shifted = k_shifted / k_shifted.sum() # blur with shifted kernel
image = ndimage.filters.convolve(image, np.expand_dims(k_shifted, axis=2), mode='mirror')
k_shifted = (
k_shifted / k_shifted.sum()
) # blur with shifted kernel
image = ndimage.filters.convolve(
image, np.expand_dims(k_shifted, axis=2), mode='mirror'
)
image = image[0::sf, 0::sf, ...] # nearest downsampling
image = np.clip(image, 0.0, 1.0)
elif i == 3:
# downsample3
image = cv2.resize(image, (int(1 / sf * a), int(1 / sf * b)), interpolation=random.choice([1, 2, 3]))
image = cv2.resize(
image,
(int(1 / sf * a), int(1 / sf * b)),
interpolation=random.choice([1, 2, 3]),
)
image = np.clip(image, 0.0, 1.0)
elif i == 4:
@@ -609,12 +703,19 @@ def degradation_bsrgan_variant(image, sf=4, isp_model=None):
# add final JPEG compression noise
image = add_JPEG_noise(image)
image = util.single2uint(image)
example = {"image":image}
example = {'image': image}
return example
# TODO incase there is a pickle error one needs to replace a += x with a = a + x in add_speckle_noise etc...
def degradation_bsrgan_plus(img, sf=4, shuffle_prob=0.5, use_sharp=True, lq_patchsize=64, isp_model=None):
def degradation_bsrgan_plus(
img,
sf=4,
shuffle_prob=0.5,
use_sharp=True,
lq_patchsize=64,
isp_model=None,
):
"""
This is an extended degradation model by combining
the degradation models of BSRGAN and Real-ESRGAN
@@ -630,7 +731,7 @@ def degradation_bsrgan_plus(img, sf=4, shuffle_prob=0.5, use_sharp=True, lq_patc
"""
h1, w1 = img.shape[:2]
img = img.copy()[:w1 - w1 % sf, :h1 - h1 % sf, ...] # mod crop
img = img.copy()[: w1 - w1 % sf, : h1 - h1 % sf, ...] # mod crop
h, w = img.shape[:2]
if h < lq_patchsize * sf or w < lq_patchsize * sf:
@@ -645,8 +746,12 @@ def degradation_bsrgan_plus(img, sf=4, shuffle_prob=0.5, use_sharp=True, lq_patc
else:
shuffle_order = list(range(13))
# local shuffle for noise, JPEG is always the last one
shuffle_order[2:6] = random.sample(shuffle_order[2:6], len(range(2, 6)))
shuffle_order[9:13] = random.sample(shuffle_order[9:13], len(range(9, 13)))
shuffle_order[2:6] = random.sample(
shuffle_order[2:6], len(range(2, 6))
)
shuffle_order[9:13] = random.sample(
shuffle_order[9:13], len(range(9, 13))
)
poisson_prob, speckle_prob, isp_prob = 0.1, 0.1, 0.1
@@ -689,8 +794,11 @@ def degradation_bsrgan_plus(img, sf=4, shuffle_prob=0.5, use_sharp=True, lq_patc
print('check the shuffle!')
# resize to desired size
img = cv2.resize(img, (int(1 / sf * hq.shape[1]), int(1 / sf * hq.shape[0])),
interpolation=random.choice([1, 2, 3]))
img = cv2.resize(
img,
(int(1 / sf * hq.shape[1]), int(1 / sf * hq.shape[0])),
interpolation=random.choice([1, 2, 3]),
)
# add final JPEG compression noise
img = add_JPEG_noise(img)
@@ -702,29 +810,37 @@ def degradation_bsrgan_plus(img, sf=4, shuffle_prob=0.5, use_sharp=True, lq_patc
if __name__ == '__main__':
print("hey")
img = util.imread_uint('utils/test.png', 3)
print(img)
img = util.uint2single(img)
print(img)
img = img[:448, :448]
h = img.shape[0] // 4
print("resizing to", h)
sf = 4
deg_fn = partial(degradation_bsrgan_variant, sf=sf)
for i in range(20):
print(i)
img_lq = deg_fn(img)
print(img_lq)
img_lq_bicubic = albumentations.SmallestMaxSize(max_size=h, interpolation=cv2.INTER_CUBIC)(image=img)["image"]
print(img_lq.shape)
print("bicubic", img_lq_bicubic.shape)
print(img_hq.shape)
lq_nearest = cv2.resize(util.single2uint(img_lq), (int(sf * img_lq.shape[1]), int(sf * img_lq.shape[0])),
interpolation=0)
lq_bicubic_nearest = cv2.resize(util.single2uint(img_lq_bicubic), (int(sf * img_lq.shape[1]), int(sf * img_lq.shape[0])),
interpolation=0)
img_concat = np.concatenate([lq_bicubic_nearest, lq_nearest, util.single2uint(img_hq)], axis=1)
util.imsave(img_concat, str(i) + '.png')
print('hey')
img = util.imread_uint('utils/test.png', 3)
print(img)
img = util.uint2single(img)
print(img)
img = img[:448, :448]
h = img.shape[0] // 4
print('resizing to', h)
sf = 4
deg_fn = partial(degradation_bsrgan_variant, sf=sf)
for i in range(20):
print(i)
img_lq = deg_fn(img)
print(img_lq)
img_lq_bicubic = albumentations.SmallestMaxSize(
max_size=h, interpolation=cv2.INTER_CUBIC
)(image=img)['image']
print(img_lq.shape)
print('bicubic', img_lq_bicubic.shape)
print(img_hq.shape)
lq_nearest = cv2.resize(
util.single2uint(img_lq),
(int(sf * img_lq.shape[1]), int(sf * img_lq.shape[0])),
interpolation=0,
)
lq_bicubic_nearest = cv2.resize(
util.single2uint(img_lq_bicubic),
(int(sf * img_lq.shape[1]), int(sf * img_lq.shape[0])),
interpolation=0,
)
img_concat = np.concatenate(
[lq_bicubic_nearest, lq_nearest, util.single2uint(img_hq)], axis=1
)
util.imsave(img_concat, str(i) + '.png')

View File

@@ -27,16 +27,16 @@ import ldm.modules.image_degradation.utils_image as util
def modcrop_np(img, sf):
'''
"""
Args:
img: numpy image, WxH or WxHxC
sf: scale factor
Return:
cropped image
'''
"""
w, h = img.shape[:2]
im = np.copy(img)
return im[:w - w % sf, :h - h % sf, ...]
return im[: w - w % sf, : h - h % sf, ...]
"""
@@ -54,7 +54,9 @@ def analytic_kernel(k):
# Loop over the small kernel to fill the big one
for r in range(k_size):
for c in range(k_size):
big_k[2 * r:2 * r + k_size, 2 * c:2 * c + k_size] += k[r, c] * k
big_k[2 * r : 2 * r + k_size, 2 * c : 2 * c + k_size] += (
k[r, c] * k
)
# Crop the edges of the big kernel to ignore very small values and increase run time of SR
crop = k_size // 2
cropped_big_k = big_k[crop:-crop, crop:-crop]
@@ -63,7 +65,7 @@ def analytic_kernel(k):
def anisotropic_Gaussian(ksize=15, theta=np.pi, l1=6, l2=6):
""" generate an anisotropic Gaussian kernel
"""generate an anisotropic Gaussian kernel
Args:
ksize : e.g., 15, kernel size
theta : [0, pi], rotation angle range
@@ -74,7 +76,12 @@ def anisotropic_Gaussian(ksize=15, theta=np.pi, l1=6, l2=6):
k : kernel
"""
v = np.dot(np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]), np.array([1., 0.]))
v = np.dot(
np.array(
[[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]
),
np.array([1.0, 0.0]),
)
V = np.array([[v[0], v[1]], [v[1], -v[0]]])
D = np.array([[l1, 0], [0, l2]])
Sigma = np.dot(np.dot(V, D), np.linalg.inv(V))
@@ -126,24 +133,32 @@ def shift_pixel(x, sf, upper_left=True):
def blur(x, k):
'''
"""
x: image, NxcxHxW
k: kernel, Nx1xhxw
'''
"""
n, c = x.shape[:2]
p1, p2 = (k.shape[-2] - 1) // 2, (k.shape[-1] - 1) // 2
x = torch.nn.functional.pad(x, pad=(p1, p2, p1, p2), mode='replicate')
k = k.repeat(1, c, 1, 1)
k = k.view(-1, 1, k.shape[2], k.shape[3])
x = x.view(1, -1, x.shape[2], x.shape[3])
x = torch.nn.functional.conv2d(x, k, bias=None, stride=1, padding=0, groups=n * c)
x = torch.nn.functional.conv2d(
x, k, bias=None, stride=1, padding=0, groups=n * c
)
x = x.view(n, c, x.shape[2], x.shape[3])
return x
def gen_kernel(k_size=np.array([15, 15]), scale_factor=np.array([4, 4]), min_var=0.6, max_var=10., noise_level=0):
""""
def gen_kernel(
k_size=np.array([15, 15]),
scale_factor=np.array([4, 4]),
min_var=0.6,
max_var=10.0,
noise_level=0,
):
""" "
# modified version of https://github.com/assafshocher/BlindSR_dataset_generator
# Kai Zhang
# min_var = 0.175 * sf # variance of the gaussian kernel will be sampled between min_var and max_var
@@ -157,13 +172,16 @@ def gen_kernel(k_size=np.array([15, 15]), scale_factor=np.array([4, 4]), min_var
# Set COV matrix using Lambdas and Theta
LAMBDA = np.diag([lambda_1, lambda_2])
Q = np.array([[np.cos(theta), -np.sin(theta)],
[np.sin(theta), np.cos(theta)]])
Q = np.array(
[[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]
)
SIGMA = Q @ LAMBDA @ Q.T
INV_SIGMA = np.linalg.inv(SIGMA)[None, None, :, :]
# Set expectation position (shifting kernel for aligned image)
MU = k_size // 2 - 0.5 * (scale_factor - 1) # - 0.5 * (scale_factor - k_size % 2)
MU = k_size // 2 - 0.5 * (
scale_factor - 1
) # - 0.5 * (scale_factor - k_size % 2)
MU = MU[None, None, :, None]
# Create meshgrid for Gaussian
@@ -188,7 +206,9 @@ def fspecial_gaussian(hsize, sigma):
hsize = [hsize, hsize]
siz = [(hsize[0] - 1.0) / 2.0, (hsize[1] - 1.0) / 2.0]
std = sigma
[x, y] = np.meshgrid(np.arange(-siz[1], siz[1] + 1), np.arange(-siz[0], siz[0] + 1))
[x, y] = np.meshgrid(
np.arange(-siz[1], siz[1] + 1), np.arange(-siz[0], siz[0] + 1)
)
arg = -(x * x + y * y) / (2 * std * std)
h = np.exp(arg)
h[h < scipy.finfo(float).eps * h.max()] = 0
@@ -208,10 +228,10 @@ def fspecial_laplacian(alpha):
def fspecial(filter_type, *args, **kwargs):
'''
"""
python code from:
https://github.com/ronaldosena/imagens-medicas-2/blob/40171a6c259edec7827a6693a93955de2bd39e76/Aulas/aula_2_-_uniform_filter/matlab_fspecial.py
'''
"""
if filter_type == 'gaussian':
return fspecial_gaussian(*args, **kwargs)
if filter_type == 'laplacian':
@@ -226,19 +246,19 @@ def fspecial(filter_type, *args, **kwargs):
def bicubic_degradation(x, sf=3):
'''
"""
Args:
x: HxWxC image, [0, 1]
sf: down-scale factor
Return:
bicubicly downsampled LR image
'''
"""
x = util.imresize_np(x, scale=1 / sf)
return x
def srmd_degradation(x, k, sf=3):
''' blur + bicubic downsampling
"""blur + bicubic downsampling
Args:
x: HxWxC image, [0, 1]
k: hxw, double
@@ -253,14 +273,16 @@ def srmd_degradation(x, k, sf=3):
pages={3262--3271},
year={2018}
}
'''
x = ndimage.filters.convolve(x, np.expand_dims(k, axis=2), mode='wrap') # 'nearest' | 'mirror'
"""
x = ndimage.filters.convolve(
x, np.expand_dims(k, axis=2), mode='wrap'
) # 'nearest' | 'mirror'
x = bicubic_degradation(x, sf=sf)
return x
def dpsr_degradation(x, k, sf=3):
''' bicubic downsampling + blur
"""bicubic downsampling + blur
Args:
x: HxWxC image, [0, 1]
k: hxw, double
@@ -275,21 +297,21 @@ def dpsr_degradation(x, k, sf=3):
pages={1671--1681},
year={2019}
}
'''
"""
x = bicubic_degradation(x, sf=sf)
x = ndimage.filters.convolve(x, np.expand_dims(k, axis=2), mode='wrap')
return x
def classical_degradation(x, k, sf=3):
''' blur + downsampling
"""blur + downsampling
Args:
x: HxWxC image, [0, 1]/[0, 255]
k: hxw, double
sf: down-scale factor
Return:
downsampled LR image
'''
"""
x = ndimage.filters.convolve(x, np.expand_dims(k, axis=2), mode='wrap')
# x = filters.correlate(x, np.expand_dims(np.flip(k), axis=2))
st = 0
@@ -326,16 +348,25 @@ def add_blur(img, sf=4):
wd2 = 4.0 + sf
wd = 2.0 + 0.2 * sf
wd2 = wd2/4
wd = wd/4
wd2 = wd2 / 4
wd = wd / 4
if random.random() < 0.5:
l1 = wd2 * random.random()
l2 = wd2 * random.random()
k = anisotropic_Gaussian(ksize=random.randint(2, 11) + 3, theta=random.random() * np.pi, l1=l1, l2=l2)
k = anisotropic_Gaussian(
ksize=random.randint(2, 11) + 3,
theta=random.random() * np.pi,
l1=l1,
l2=l2,
)
else:
k = fspecial('gaussian', random.randint(2, 4) + 3, wd * random.random())
img = ndimage.filters.convolve(img, np.expand_dims(k, axis=2), mode='mirror')
k = fspecial(
'gaussian', random.randint(2, 4) + 3, wd * random.random()
)
img = ndimage.filters.convolve(
img, np.expand_dims(k, axis=2), mode='mirror'
)
return img
@@ -348,7 +379,11 @@ def add_resize(img, sf=4):
sf1 = random.uniform(0.5 / sf, 1)
else:
sf1 = 1.0
img = cv2.resize(img, (int(sf1 * img.shape[1]), int(sf1 * img.shape[0])), interpolation=random.choice([1, 2, 3]))
img = cv2.resize(
img,
(int(sf1 * img.shape[1]), int(sf1 * img.shape[0])),
interpolation=random.choice([1, 2, 3]),
)
img = np.clip(img, 0.0, 1.0)
return img
@@ -370,19 +405,26 @@ def add_resize(img, sf=4):
# img = np.clip(img, 0.0, 1.0)
# return img
def add_Gaussian_noise(img, noise_level1=2, noise_level2=25):
noise_level = random.randint(noise_level1, noise_level2)
rnum = np.random.rand()
if rnum > 0.6: # add color Gaussian noise
img = img + np.random.normal(0, noise_level / 255.0, img.shape).astype(np.float32)
img = img + np.random.normal(0, noise_level / 255.0, img.shape).astype(
np.float32
)
elif rnum < 0.4: # add grayscale Gaussian noise
img = img + np.random.normal(0, noise_level / 255.0, (*img.shape[:2], 1)).astype(np.float32)
img = img + np.random.normal(
0, noise_level / 255.0, (*img.shape[:2], 1)
).astype(np.float32)
else: # add noise
L = noise_level2 / 255.
L = noise_level2 / 255.0
D = np.diag(np.random.rand(3))
U = orth(np.random.rand(3, 3))
conv = np.dot(np.dot(np.transpose(U), D), U)
img = img + np.random.multivariate_normal([0, 0, 0], np.abs(L ** 2 * conv), img.shape[:2]).astype(np.float32)
img = img + np.random.multivariate_normal(
[0, 0, 0], np.abs(L**2 * conv), img.shape[:2]
).astype(np.float32)
img = np.clip(img, 0.0, 1.0)
return img
@@ -392,28 +434,37 @@ def add_speckle_noise(img, noise_level1=2, noise_level2=25):
img = np.clip(img, 0.0, 1.0)
rnum = random.random()
if rnum > 0.6:
img += img * np.random.normal(0, noise_level / 255.0, img.shape).astype(np.float32)
img += img * np.random.normal(
0, noise_level / 255.0, img.shape
).astype(np.float32)
elif rnum < 0.4:
img += img * np.random.normal(0, noise_level / 255.0, (*img.shape[:2], 1)).astype(np.float32)
img += img * np.random.normal(
0, noise_level / 255.0, (*img.shape[:2], 1)
).astype(np.float32)
else:
L = noise_level2 / 255.
L = noise_level2 / 255.0
D = np.diag(np.random.rand(3))
U = orth(np.random.rand(3, 3))
conv = np.dot(np.dot(np.transpose(U), D), U)
img += img * np.random.multivariate_normal([0, 0, 0], np.abs(L ** 2 * conv), img.shape[:2]).astype(np.float32)
img += img * np.random.multivariate_normal(
[0, 0, 0], np.abs(L**2 * conv), img.shape[:2]
).astype(np.float32)
img = np.clip(img, 0.0, 1.0)
return img
def add_Poisson_noise(img):
img = np.clip((img * 255.0).round(), 0, 255) / 255.
img = np.clip((img * 255.0).round(), 0, 255) / 255.0
vals = 10 ** (2 * random.random() + 2.0) # [2, 4]
if random.random() < 0.5:
img = np.random.poisson(img * vals).astype(np.float32) / vals
else:
img_gray = np.dot(img[..., :3], [0.299, 0.587, 0.114])
img_gray = np.clip((img_gray * 255.0).round(), 0, 255) / 255.
noise_gray = np.random.poisson(img_gray * vals).astype(np.float32) / vals - img_gray
img_gray = np.clip((img_gray * 255.0).round(), 0, 255) / 255.0
noise_gray = (
np.random.poisson(img_gray * vals).astype(np.float32) / vals
- img_gray
)
img += noise_gray[:, :, np.newaxis]
img = np.clip(img, 0.0, 1.0)
return img
@@ -422,7 +473,9 @@ def add_Poisson_noise(img):
def add_JPEG_noise(img):
quality_factor = random.randint(80, 95)
img = cv2.cvtColor(util.single2uint(img), cv2.COLOR_RGB2BGR)
result, encimg = cv2.imencode('.jpg', img, [int(cv2.IMWRITE_JPEG_QUALITY), quality_factor])
result, encimg = cv2.imencode(
'.jpg', img, [int(cv2.IMWRITE_JPEG_QUALITY), quality_factor]
)
img = cv2.imdecode(encimg, 1)
img = cv2.cvtColor(util.uint2single(img), cv2.COLOR_BGR2RGB)
return img
@@ -432,10 +485,14 @@ def random_crop(lq, hq, sf=4, lq_patchsize=64):
h, w = lq.shape[:2]
rnd_h = random.randint(0, h - lq_patchsize)
rnd_w = random.randint(0, w - lq_patchsize)
lq = lq[rnd_h:rnd_h + lq_patchsize, rnd_w:rnd_w + lq_patchsize, :]
lq = lq[rnd_h : rnd_h + lq_patchsize, rnd_w : rnd_w + lq_patchsize, :]
rnd_h_H, rnd_w_H = int(rnd_h * sf), int(rnd_w * sf)
hq = hq[rnd_h_H:rnd_h_H + lq_patchsize * sf, rnd_w_H:rnd_w_H + lq_patchsize * sf, :]
hq = hq[
rnd_h_H : rnd_h_H + lq_patchsize * sf,
rnd_w_H : rnd_w_H + lq_patchsize * sf,
:,
]
return lq, hq
@@ -456,7 +513,7 @@ def degradation_bsrgan(img, sf=4, lq_patchsize=72, isp_model=None):
sf_ori = sf
h1, w1 = img.shape[:2]
img = img.copy()[:w1 - w1 % sf, :h1 - h1 % sf, ...] # mod crop
img = img.copy()[: w1 - w1 % sf, : h1 - h1 % sf, ...] # mod crop
h, w = img.shape[:2]
if h < lq_patchsize * sf or w < lq_patchsize * sf:
@@ -466,8 +523,11 @@ def degradation_bsrgan(img, sf=4, lq_patchsize=72, isp_model=None):
if sf == 4 and random.random() < scale2_prob: # downsample1
if np.random.rand() < 0.5:
img = cv2.resize(img, (int(1 / 2 * img.shape[1]), int(1 / 2 * img.shape[0])),
interpolation=random.choice([1, 2, 3]))
img = cv2.resize(
img,
(int(1 / 2 * img.shape[1]), int(1 / 2 * img.shape[0])),
interpolation=random.choice([1, 2, 3]),
)
else:
img = util.imresize_np(img, 1 / 2, True)
img = np.clip(img, 0.0, 1.0)
@@ -476,7 +536,10 @@ def degradation_bsrgan(img, sf=4, lq_patchsize=72, isp_model=None):
shuffle_order = random.sample(range(7), 7)
idx1, idx2 = shuffle_order.index(2), shuffle_order.index(3)
if idx1 > idx2: # keep downsample3 last
shuffle_order[idx1], shuffle_order[idx2] = shuffle_order[idx2], shuffle_order[idx1]
shuffle_order[idx1], shuffle_order[idx2] = (
shuffle_order[idx2],
shuffle_order[idx1],
)
for i in shuffle_order:
@@ -491,19 +554,30 @@ def degradation_bsrgan(img, sf=4, lq_patchsize=72, isp_model=None):
# downsample2
if random.random() < 0.75:
sf1 = random.uniform(1, 2 * sf)
img = cv2.resize(img, (int(1 / sf1 * img.shape[1]), int(1 / sf1 * img.shape[0])),
interpolation=random.choice([1, 2, 3]))
img = cv2.resize(
img,
(int(1 / sf1 * img.shape[1]), int(1 / sf1 * img.shape[0])),
interpolation=random.choice([1, 2, 3]),
)
else:
k = fspecial('gaussian', 25, random.uniform(0.1, 0.6 * sf))
k_shifted = shift_pixel(k, sf)
k_shifted = k_shifted / k_shifted.sum() # blur with shifted kernel
img = ndimage.filters.convolve(img, np.expand_dims(k_shifted, axis=2), mode='mirror')
k_shifted = (
k_shifted / k_shifted.sum()
) # blur with shifted kernel
img = ndimage.filters.convolve(
img, np.expand_dims(k_shifted, axis=2), mode='mirror'
)
img = img[0::sf, 0::sf, ...] # nearest downsampling
img = np.clip(img, 0.0, 1.0)
elif i == 3:
# downsample3
img = cv2.resize(img, (int(1 / sf * a), int(1 / sf * b)), interpolation=random.choice([1, 2, 3]))
img = cv2.resize(
img,
(int(1 / sf * a), int(1 / sf * b)),
interpolation=random.choice([1, 2, 3]),
)
img = np.clip(img, 0.0, 1.0)
elif i == 4:
@@ -548,15 +622,18 @@ def degradation_bsrgan_variant(image, sf=4, isp_model=None):
sf_ori = sf
h1, w1 = image.shape[:2]
image = image.copy()[:w1 - w1 % sf, :h1 - h1 % sf, ...] # mod crop
image = image.copy()[: w1 - w1 % sf, : h1 - h1 % sf, ...] # mod crop
h, w = image.shape[:2]
hq = image.copy()
if sf == 4 and random.random() < scale2_prob: # downsample1
if np.random.rand() < 0.5:
image = cv2.resize(image, (int(1 / 2 * image.shape[1]), int(1 / 2 * image.shape[0])),
interpolation=random.choice([1, 2, 3]))
image = cv2.resize(
image,
(int(1 / 2 * image.shape[1]), int(1 / 2 * image.shape[0])),
interpolation=random.choice([1, 2, 3]),
)
else:
image = util.imresize_np(image, 1 / 2, True)
image = np.clip(image, 0.0, 1.0)
@@ -565,7 +642,10 @@ def degradation_bsrgan_variant(image, sf=4, isp_model=None):
shuffle_order = random.sample(range(7), 7)
idx1, idx2 = shuffle_order.index(2), shuffle_order.index(3)
if idx1 > idx2: # keep downsample3 last
shuffle_order[idx1], shuffle_order[idx2] = shuffle_order[idx2], shuffle_order[idx1]
shuffle_order[idx1], shuffle_order[idx2] = (
shuffle_order[idx2],
shuffle_order[idx1],
)
for i in shuffle_order:
@@ -583,20 +663,34 @@ def degradation_bsrgan_variant(image, sf=4, isp_model=None):
# downsample2
if random.random() < 0.8:
sf1 = random.uniform(1, 2 * sf)
image = cv2.resize(image, (int(1 / sf1 * image.shape[1]), int(1 / sf1 * image.shape[0])),
interpolation=random.choice([1, 2, 3]))
image = cv2.resize(
image,
(
int(1 / sf1 * image.shape[1]),
int(1 / sf1 * image.shape[0]),
),
interpolation=random.choice([1, 2, 3]),
)
else:
k = fspecial('gaussian', 25, random.uniform(0.1, 0.6 * sf))
k_shifted = shift_pixel(k, sf)
k_shifted = k_shifted / k_shifted.sum() # blur with shifted kernel
image = ndimage.filters.convolve(image, np.expand_dims(k_shifted, axis=2), mode='mirror')
k_shifted = (
k_shifted / k_shifted.sum()
) # blur with shifted kernel
image = ndimage.filters.convolve(
image, np.expand_dims(k_shifted, axis=2), mode='mirror'
)
image = image[0::sf, 0::sf, ...] # nearest downsampling
image = np.clip(image, 0.0, 1.0)
elif i == 3:
# downsample3
image = cv2.resize(image, (int(1 / sf * a), int(1 / sf * b)), interpolation=random.choice([1, 2, 3]))
image = cv2.resize(
image,
(int(1 / sf * a), int(1 / sf * b)),
interpolation=random.choice([1, 2, 3]),
)
image = np.clip(image, 0.0, 1.0)
elif i == 4:
@@ -617,34 +711,41 @@ def degradation_bsrgan_variant(image, sf=4, isp_model=None):
# add final JPEG compression noise
image = add_JPEG_noise(image)
image = util.single2uint(image)
example = {"image": image}
example = {'image': image}
return example
if __name__ == '__main__':
print("hey")
print('hey')
img = util.imread_uint('utils/test.png', 3)
img = img[:448, :448]
h = img.shape[0] // 4
print("resizing to", h)
print('resizing to', h)
sf = 4
deg_fn = partial(degradation_bsrgan_variant, sf=sf)
for i in range(20):
print(i)
img_hq = img
img_lq = deg_fn(img)["image"]
img_lq = deg_fn(img)['image']
img_hq, img_lq = util.uint2single(img_hq), util.uint2single(img_lq)
print(img_lq)
img_lq_bicubic = albumentations.SmallestMaxSize(max_size=h, interpolation=cv2.INTER_CUBIC)(image=img_hq)["image"]
img_lq_bicubic = albumentations.SmallestMaxSize(
max_size=h, interpolation=cv2.INTER_CUBIC
)(image=img_hq)['image']
print(img_lq.shape)
print("bicubic", img_lq_bicubic.shape)
print('bicubic', img_lq_bicubic.shape)
print(img_hq.shape)
lq_nearest = cv2.resize(util.single2uint(img_lq), (int(sf * img_lq.shape[1]), int(sf * img_lq.shape[0])),
interpolation=0)
lq_bicubic_nearest = cv2.resize(util.single2uint(img_lq_bicubic),
(int(sf * img_lq.shape[1]), int(sf * img_lq.shape[0])),
interpolation=0)
img_concat = np.concatenate([lq_bicubic_nearest, lq_nearest, util.single2uint(img_hq)], axis=1)
lq_nearest = cv2.resize(
util.single2uint(img_lq),
(int(sf * img_lq.shape[1]), int(sf * img_lq.shape[0])),
interpolation=0,
)
lq_bicubic_nearest = cv2.resize(
util.single2uint(img_lq_bicubic),
(int(sf * img_lq.shape[1]), int(sf * img_lq.shape[0])),
interpolation=0,
)
img_concat = np.concatenate(
[lq_bicubic_nearest, lq_nearest, util.single2uint(img_hq)], axis=1
)
util.imsave(img_concat, str(i) + '.png')

View File

@@ -6,13 +6,14 @@ import torch
import cv2
from torchvision.utils import make_grid
from datetime import datetime
#import matplotlib.pyplot as plt # TODO: check with Dominik, also bsrgan.py vs bsrgan_light.py
# import matplotlib.pyplot as plt # TODO: check with Dominik, also bsrgan.py vs bsrgan_light.py
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'
'''
"""
# --------------------------------------------
# Kai Zhang (github: https://github.com/cszn)
# 03/Mar/2019
@@ -20,10 +21,22 @@ os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
# https://github.com/twhui/SRGAN-pyTorch
# https://github.com/xinntao/BasicSR
# --------------------------------------------
'''
"""
IMG_EXTENSIONS = ['.jpg', '.JPG', '.jpeg', '.JPEG', '.png', '.PNG', '.ppm', '.PPM', '.bmp', '.BMP', '.tif']
IMG_EXTENSIONS = [
'.jpg',
'.JPG',
'.jpeg',
'.JPEG',
'.png',
'.PNG',
'.ppm',
'.PPM',
'.bmp',
'.BMP',
'.tif',
]
def is_image_file(filename):
@@ -49,19 +62,19 @@ def surf(Z, cmap='rainbow', figsize=None):
ax3 = plt.axes(projection='3d')
w, h = Z.shape[:2]
xx = np.arange(0,w,1)
yy = np.arange(0,h,1)
xx = np.arange(0, w, 1)
yy = np.arange(0, h, 1)
X, Y = np.meshgrid(xx, yy)
ax3.plot_surface(X,Y,Z,cmap=cmap)
#ax3.contour(X,Y,Z, zdim='z',offset=-2cmap=cmap)
ax3.plot_surface(X, Y, Z, cmap=cmap)
# ax3.contour(X,Y,Z, zdim='z',offset=-2cmap=cmap)
plt.show()
'''
"""
# --------------------------------------------
# get image pathes
# --------------------------------------------
'''
"""
def get_image_paths(dataroot):
@@ -83,26 +96,26 @@ def _get_paths_from_images(path):
return images
'''
"""
# --------------------------------------------
# split large images into small images
# --------------------------------------------
'''
"""
def patches_from_image(img, p_size=512, p_overlap=64, p_max=800):
w, h = img.shape[:2]
patches = []
if w > p_max and h > p_max:
w1 = list(np.arange(0, w-p_size, p_size-p_overlap, dtype=np.int))
h1 = list(np.arange(0, h-p_size, p_size-p_overlap, dtype=np.int))
w1.append(w-p_size)
h1.append(h-p_size)
# print(w1)
# print(h1)
w1 = list(np.arange(0, w - p_size, p_size - p_overlap, dtype=np.int))
h1 = list(np.arange(0, h - p_size, p_size - p_overlap, dtype=np.int))
w1.append(w - p_size)
h1.append(h - p_size)
# print(w1)
# print(h1)
for i in w1:
for j in h1:
patches.append(img[i:i+p_size, j:j+p_size,:])
patches.append(img[i : i + p_size, j : j + p_size, :])
else:
patches.append(img)
@@ -118,11 +131,21 @@ def imssave(imgs, img_path):
for i, img in enumerate(imgs):
if img.ndim == 3:
img = img[:, :, [2, 1, 0]]
new_path = os.path.join(os.path.dirname(img_path), img_name+str('_s{:04d}'.format(i))+'.png')
new_path = os.path.join(
os.path.dirname(img_path),
img_name + str('_s{:04d}'.format(i)) + '.png',
)
cv2.imwrite(new_path, img)
def split_imageset(original_dataroot, taget_dataroot, n_channels=3, p_size=800, p_overlap=96, p_max=1000):
def split_imageset(
original_dataroot,
taget_dataroot,
n_channels=3,
p_size=800,
p_overlap=96,
p_max=1000,
):
"""
split the large images from original_dataroot into small overlapped images with size (p_size)x(p_size),
and save them into taget_dataroot; only the images with larger size than (p_max)x(p_max)
@@ -139,15 +162,18 @@ def split_imageset(original_dataroot, taget_dataroot, n_channels=3, p_size=800,
# img_name, ext = os.path.splitext(os.path.basename(img_path))
img = imread_uint(img_path, n_channels=n_channels)
patches = patches_from_image(img, p_size, p_overlap, p_max)
imssave(patches, os.path.join(taget_dataroot,os.path.basename(img_path)))
#if original_dataroot == taget_dataroot:
#del img_path
imssave(
patches, os.path.join(taget_dataroot, os.path.basename(img_path))
)
# if original_dataroot == taget_dataroot:
# del img_path
'''
"""
# --------------------------------------------
# makedir
# --------------------------------------------
'''
"""
def mkdir(path):
@@ -171,12 +197,12 @@ def mkdir_and_rename(path):
os.makedirs(path)
'''
"""
# --------------------------------------------
# read image from path
# opencv is fast, but read BGR numpy image
# --------------------------------------------
'''
"""
# --------------------------------------------
@@ -206,6 +232,7 @@ def imsave(img, img_path):
img = img[:, :, [2, 1, 0]]
cv2.imwrite(img_path, img)
def imwrite(img, img_path):
img = np.squeeze(img)
if img.ndim == 3:
@@ -213,7 +240,6 @@ def imwrite(img, img_path):
cv2.imwrite(img_path, img)
# --------------------------------------------
# get single image of size HxWxn_channles (BGR)
# --------------------------------------------
@@ -221,7 +247,7 @@ def read_img(path):
# read image by cv2
# return: Numpy float32, HWC, BGR, [0,1]
img = cv2.imread(path, cv2.IMREAD_UNCHANGED) # cv2.IMREAD_GRAYSCALE
img = img.astype(np.float32) / 255.
img = img.astype(np.float32) / 255.0
if img.ndim == 2:
img = np.expand_dims(img, axis=2)
# some images have 4 channels
@@ -230,7 +256,7 @@ def read_img(path):
return img
'''
"""
# --------------------------------------------
# image format conversion
# --------------------------------------------
@@ -238,7 +264,7 @@ def read_img(path):
# numpy(single) <---> tensor
# numpy(unit) <---> tensor
# --------------------------------------------
'''
"""
# --------------------------------------------
@@ -248,22 +274,22 @@ def read_img(path):
def uint2single(img):
return np.float32(img/255.)
return np.float32(img / 255.0)
def single2uint(img):
return np.uint8((img.clip(0, 1)*255.).round())
return np.uint8((img.clip(0, 1) * 255.0).round())
def uint162single(img):
return np.float32(img/65535.)
return np.float32(img / 65535.0)
def single2uint16(img):
return np.uint16((img.clip(0, 1)*65535.).round())
return np.uint16((img.clip(0, 1) * 65535.0).round())
# --------------------------------------------
@@ -275,14 +301,25 @@ def single2uint16(img):
def uint2tensor4(img):
if img.ndim == 2:
img = np.expand_dims(img, axis=2)
return torch.from_numpy(np.ascontiguousarray(img)).permute(2, 0, 1).float().div(255.).unsqueeze(0)
return (
torch.from_numpy(np.ascontiguousarray(img))
.permute(2, 0, 1)
.float()
.div(255.0)
.unsqueeze(0)
)
# convert uint to 3-dimensional torch tensor
def uint2tensor3(img):
if img.ndim == 2:
img = np.expand_dims(img, axis=2)
return torch.from_numpy(np.ascontiguousarray(img)).permute(2, 0, 1).float().div(255.)
return (
torch.from_numpy(np.ascontiguousarray(img))
.permute(2, 0, 1)
.float()
.div(255.0)
)
# convert 2/3/4-dimensional torch tensor to uint
@@ -290,7 +327,7 @@ def tensor2uint(img):
img = img.data.squeeze().float().clamp_(0, 1).cpu().numpy()
if img.ndim == 3:
img = np.transpose(img, (1, 2, 0))
return np.uint8((img*255.0).round())
return np.uint8((img * 255.0).round())
# --------------------------------------------
@@ -305,7 +342,12 @@ def single2tensor3(img):
# convert single (HxWxC) to 4-dimensional torch tensor
def single2tensor4(img):
return torch.from_numpy(np.ascontiguousarray(img)).permute(2, 0, 1).float().unsqueeze(0)
return (
torch.from_numpy(np.ascontiguousarray(img))
.permute(2, 0, 1)
.float()
.unsqueeze(0)
)
# convert torch tensor to single
@@ -316,6 +358,7 @@ def tensor2single(img):
return img
# convert torch tensor to single
def tensor2single3(img):
img = img.data.squeeze().float().cpu().numpy()
@@ -327,30 +370,48 @@ def tensor2single3(img):
def single2tensor5(img):
return torch.from_numpy(np.ascontiguousarray(img)).permute(2, 0, 1, 3).float().unsqueeze(0)
return (
torch.from_numpy(np.ascontiguousarray(img))
.permute(2, 0, 1, 3)
.float()
.unsqueeze(0)
)
def single32tensor5(img):
return torch.from_numpy(np.ascontiguousarray(img)).float().unsqueeze(0).unsqueeze(0)
return (
torch.from_numpy(np.ascontiguousarray(img))
.float()
.unsqueeze(0)
.unsqueeze(0)
)
def single42tensor4(img):
return torch.from_numpy(np.ascontiguousarray(img)).permute(2, 0, 1, 3).float()
return (
torch.from_numpy(np.ascontiguousarray(img)).permute(2, 0, 1, 3).float()
)
# from skimage.io import imread, imsave
def tensor2img(tensor, out_type=np.uint8, min_max=(0, 1)):
'''
"""
Converts a torch Tensor into an image Numpy array of BGR channel order
Input: 4D(B,(3/1),H,W), 3D(C,H,W), or 2D(H,W), any range, RGB channel order
Output: 3D(H,W,C) or 2D(H,W), [0,255], np.uint8 (default)
'''
tensor = tensor.squeeze().float().cpu().clamp_(*min_max) # squeeze first, then clamp
tensor = (tensor - min_max[0]) / (min_max[1] - min_max[0]) # to range [0,1]
"""
tensor = (
tensor.squeeze().float().cpu().clamp_(*min_max)
) # squeeze first, then clamp
tensor = (tensor - min_max[0]) / (
min_max[1] - min_max[0]
) # to range [0,1]
n_dim = tensor.dim()
if n_dim == 4:
n_img = len(tensor)
img_np = make_grid(tensor, nrow=int(math.sqrt(n_img)), normalize=False).numpy()
img_np = make_grid(
tensor, nrow=int(math.sqrt(n_img)), normalize=False
).numpy()
img_np = np.transpose(img_np[[2, 1, 0], :, :], (1, 2, 0)) # HWC, BGR
elif n_dim == 3:
img_np = tensor.numpy()
@@ -359,14 +420,17 @@ def tensor2img(tensor, out_type=np.uint8, min_max=(0, 1)):
img_np = tensor.numpy()
else:
raise TypeError(
'Only support 4D, 3D and 2D tensor. But received with dimension: {:d}'.format(n_dim))
'Only support 4D, 3D and 2D tensor. But received with dimension: {:d}'.format(
n_dim
)
)
if out_type == np.uint8:
img_np = (img_np * 255.0).round()
# Important. Unlike matlab, numpy.unit8() WILL NOT round by default.
return img_np.astype(out_type)
'''
"""
# --------------------------------------------
# Augmentation, flipe and/or rotate
# --------------------------------------------
@@ -374,12 +438,11 @@ def tensor2img(tensor, out_type=np.uint8, min_max=(0, 1)):
# (1) augmet_img: numpy image of WxHxC or WxH
# (2) augment_img_tensor4: tensor image 1xCxWxH
# --------------------------------------------
'''
"""
def augment_img(img, mode=0):
'''Kai Zhang (github: https://github.com/cszn)
'''
"""Kai Zhang (github: https://github.com/cszn)"""
if mode == 0:
return img
elif mode == 1:
@@ -399,8 +462,7 @@ def augment_img(img, mode=0):
def augment_img_tensor4(img, mode=0):
'''Kai Zhang (github: https://github.com/cszn)
'''
"""Kai Zhang (github: https://github.com/cszn)"""
if mode == 0:
return img
elif mode == 1:
@@ -420,8 +482,7 @@ def augment_img_tensor4(img, mode=0):
def augment_img_tensor(img, mode=0):
'''Kai Zhang (github: https://github.com/cszn)
'''
"""Kai Zhang (github: https://github.com/cszn)"""
img_size = img.size()
img_np = img.data.cpu().numpy()
if len(img_size) == 3:
@@ -484,11 +545,11 @@ def augment_imgs(img_list, hflip=True, rot=True):
return [_augment(img) for img in img_list]
'''
"""
# --------------------------------------------
# modcrop and shave
# --------------------------------------------
'''
"""
def modcrop(img_in, scale):
@@ -497,11 +558,11 @@ def modcrop(img_in, scale):
if img.ndim == 2:
H, W = img.shape
H_r, W_r = H % scale, W % scale
img = img[:H - H_r, :W - W_r]
img = img[: H - H_r, : W - W_r]
elif img.ndim == 3:
H, W, C = img.shape
H_r, W_r = H % scale, W % scale
img = img[:H - H_r, :W - W_r, :]
img = img[: H - H_r, : W - W_r, :]
else:
raise ValueError('Wrong img ndim: [{:d}].'.format(img.ndim))
return img
@@ -511,11 +572,11 @@ def shave(img_in, border=0):
# img_in: Numpy, HWC or HW
img = np.copy(img_in)
h, w = img.shape[:2]
img = img[border:h-border, border:w-border]
img = img[border : h - border, border : w - border]
return img
'''
"""
# --------------------------------------------
# image processing process on numpy image
# channel_convert(in_c, tar_type, img_list):
@@ -523,74 +584,92 @@ def shave(img_in, border=0):
# bgr2ycbcr(img, only_y=True):
# ycbcr2rgb(img):
# --------------------------------------------
'''
"""
def rgb2ycbcr(img, only_y=True):
'''same as matlab rgb2ycbcr
"""same as matlab rgb2ycbcr
only_y: only return Y channel
Input:
uint8, [0, 255]
float, [0, 1]
'''
"""
in_img_type = img.dtype
img.astype(np.float32)
if in_img_type != np.uint8:
img *= 255.
img *= 255.0
# convert
if only_y:
rlt = np.dot(img, [65.481, 128.553, 24.966]) / 255.0 + 16.0
else:
rlt = np.matmul(img, [[65.481, -37.797, 112.0], [128.553, -74.203, -93.786],
[24.966, 112.0, -18.214]]) / 255.0 + [16, 128, 128]
rlt = np.matmul(
img,
[
[65.481, -37.797, 112.0],
[128.553, -74.203, -93.786],
[24.966, 112.0, -18.214],
],
) / 255.0 + [16, 128, 128]
if in_img_type == np.uint8:
rlt = rlt.round()
else:
rlt /= 255.
rlt /= 255.0
return rlt.astype(in_img_type)
def ycbcr2rgb(img):
'''same as matlab ycbcr2rgb
"""same as matlab ycbcr2rgb
Input:
uint8, [0, 255]
float, [0, 1]
'''
"""
in_img_type = img.dtype
img.astype(np.float32)
if in_img_type != np.uint8:
img *= 255.
img *= 255.0
# convert
rlt = np.matmul(img, [[0.00456621, 0.00456621, 0.00456621], [0, -0.00153632, 0.00791071],
[0.00625893, -0.00318811, 0]]) * 255.0 + [-222.921, 135.576, -276.836]
rlt = np.matmul(
img,
[
[0.00456621, 0.00456621, 0.00456621],
[0, -0.00153632, 0.00791071],
[0.00625893, -0.00318811, 0],
],
) * 255.0 + [-222.921, 135.576, -276.836]
if in_img_type == np.uint8:
rlt = rlt.round()
else:
rlt /= 255.
rlt /= 255.0
return rlt.astype(in_img_type)
def bgr2ycbcr(img, only_y=True):
'''bgr version of rgb2ycbcr
"""bgr version of rgb2ycbcr
only_y: only return Y channel
Input:
uint8, [0, 255]
float, [0, 1]
'''
"""
in_img_type = img.dtype
img.astype(np.float32)
if in_img_type != np.uint8:
img *= 255.
img *= 255.0
# convert
if only_y:
rlt = np.dot(img, [24.966, 128.553, 65.481]) / 255.0 + 16.0
else:
rlt = np.matmul(img, [[24.966, 112.0, -18.214], [128.553, -74.203, -93.786],
[65.481, -37.797, 112.0]]) / 255.0 + [16, 128, 128]
rlt = np.matmul(
img,
[
[24.966, 112.0, -18.214],
[128.553, -74.203, -93.786],
[65.481, -37.797, 112.0],
],
) / 255.0 + [16, 128, 128]
if in_img_type == np.uint8:
rlt = rlt.round()
else:
rlt /= 255.
rlt /= 255.0
return rlt.astype(in_img_type)
@@ -608,11 +687,11 @@ def channel_convert(in_c, tar_type, img_list):
return img_list
'''
"""
# --------------------------------------------
# metric, PSNR and SSIM
# --------------------------------------------
'''
"""
# --------------------------------------------
@@ -620,17 +699,17 @@ def channel_convert(in_c, tar_type, img_list):
# --------------------------------------------
def calculate_psnr(img1, img2, border=0):
# img1 and img2 have range [0, 255]
#img1 = img1.squeeze()
#img2 = img2.squeeze()
# img1 = img1.squeeze()
# img2 = img2.squeeze()
if not img1.shape == img2.shape:
raise ValueError('Input images must have the same dimensions.')
h, w = img1.shape[:2]
img1 = img1[border:h-border, border:w-border]
img2 = img2[border:h-border, border:w-border]
img1 = img1[border : h - border, border : w - border]
img2 = img2[border : h - border, border : w - border]
img1 = img1.astype(np.float64)
img2 = img2.astype(np.float64)
mse = np.mean((img1 - img2)**2)
mse = np.mean((img1 - img2) ** 2)
if mse == 0:
return float('inf')
return 20 * math.log10(255.0 / math.sqrt(mse))
@@ -640,17 +719,17 @@ def calculate_psnr(img1, img2, border=0):
# SSIM
# --------------------------------------------
def calculate_ssim(img1, img2, border=0):
'''calculate SSIM
"""calculate SSIM
the same outputs as MATLAB's
img1, img2: [0, 255]
'''
#img1 = img1.squeeze()
#img2 = img2.squeeze()
"""
# img1 = img1.squeeze()
# img2 = img2.squeeze()
if not img1.shape == img2.shape:
raise ValueError('Input images must have the same dimensions.')
h, w = img1.shape[:2]
img1 = img1[border:h-border, border:w-border]
img2 = img2[border:h-border, border:w-border]
img1 = img1[border : h - border, border : w - border]
img2 = img2[border : h - border, border : w - border]
if img1.ndim == 2:
return ssim(img1, img2)
@@ -658,7 +737,7 @@ def calculate_ssim(img1, img2, border=0):
if img1.shape[2] == 3:
ssims = []
for i in range(3):
ssims.append(ssim(img1[:,:,i], img2[:,:,i]))
ssims.append(ssim(img1[:, :, i], img2[:, :, i]))
return np.array(ssims).mean()
elif img1.shape[2] == 1:
return ssim(np.squeeze(img1), np.squeeze(img2))
@@ -667,8 +746,8 @@ def calculate_ssim(img1, img2, border=0):
def ssim(img1, img2):
C1 = (0.01 * 255)**2
C2 = (0.03 * 255)**2
C1 = (0.01 * 255) ** 2
C2 = (0.03 * 255) ** 2
img1 = img1.astype(np.float64)
img2 = img2.astype(np.float64)
@@ -684,16 +763,17 @@ def ssim(img1, img2):
sigma2_sq = cv2.filter2D(img2**2, -1, window)[5:-5, 5:-5] - mu2_sq
sigma12 = cv2.filter2D(img1 * img2, -1, window)[5:-5, 5:-5] - mu1_mu2
ssim_map = ((2 * mu1_mu2 + C1) * (2 * sigma12 + C2)) / ((mu1_sq + mu2_sq + C1) *
(sigma1_sq + sigma2_sq + C2))
ssim_map = ((2 * mu1_mu2 + C1) * (2 * sigma12 + C2)) / (
(mu1_sq + mu2_sq + C1) * (sigma1_sq + sigma2_sq + C2)
)
return ssim_map.mean()
'''
"""
# --------------------------------------------
# matlab's bicubic imresize (numpy and torch) [0, 1]
# --------------------------------------------
'''
"""
# matlab 'imresize' function, now only support 'bicubic'
@@ -701,11 +781,14 @@ def cubic(x):
absx = torch.abs(x)
absx2 = absx**2
absx3 = absx**3
return (1.5*absx3 - 2.5*absx2 + 1) * ((absx <= 1).type_as(absx)) + \
(-0.5*absx3 + 2.5*absx2 - 4*absx + 2) * (((absx > 1)*(absx <= 2)).type_as(absx))
return (1.5 * absx3 - 2.5 * absx2 + 1) * ((absx <= 1).type_as(absx)) + (
-0.5 * absx3 + 2.5 * absx2 - 4 * absx + 2
) * (((absx > 1) * (absx <= 2)).type_as(absx))
def calculate_weights_indices(in_length, out_length, scale, kernel, kernel_width, antialiasing):
def calculate_weights_indices(
in_length, out_length, scale, kernel, kernel_width, antialiasing
):
if (scale < 1) and (antialiasing):
# Use a modified kernel to simultaneously interpolate and antialias- larger kernel width
kernel_width = kernel_width / scale
@@ -729,8 +812,9 @@ def calculate_weights_indices(in_length, out_length, scale, kernel, kernel_width
# The indices of the input pixels involved in computing the k-th output
# pixel are in row k of the indices matrix.
indices = left.view(out_length, 1).expand(out_length, P) + torch.linspace(0, P - 1, P).view(
1, P).expand(out_length, P)
indices = left.view(out_length, 1).expand(out_length, P) + torch.linspace(
0, P - 1, P
).view(1, P).expand(out_length, P)
# The weights used to compute the k-th output pixel are in row k of the
# weights matrix.
@@ -771,7 +855,11 @@ def imresize(img, scale, antialiasing=True):
if need_squeeze:
img.unsqueeze_(0)
in_C, in_H, in_W = img.size()
out_C, out_H, out_W = in_C, math.ceil(in_H * scale), math.ceil(in_W * scale)
out_C, out_H, out_W = (
in_C,
math.ceil(in_H * scale),
math.ceil(in_W * scale),
)
kernel_width = 4
kernel = 'cubic'
@@ -782,9 +870,11 @@ def imresize(img, scale, antialiasing=True):
# get weights and indices
weights_H, indices_H, sym_len_Hs, sym_len_He = calculate_weights_indices(
in_H, out_H, scale, kernel, kernel_width, antialiasing)
in_H, out_H, scale, kernel, kernel_width, antialiasing
)
weights_W, indices_W, sym_len_Ws, sym_len_We = calculate_weights_indices(
in_W, out_W, scale, kernel, kernel_width, antialiasing)
in_W, out_W, scale, kernel, kernel_width, antialiasing
)
# process H dimension
# symmetric copying
img_aug = torch.FloatTensor(in_C, in_H + sym_len_Hs + sym_len_He, in_W)
@@ -805,7 +895,11 @@ def imresize(img, scale, antialiasing=True):
for i in range(out_H):
idx = int(indices_H[i][0])
for j in range(out_C):
out_1[j, i, :] = img_aug[j, idx:idx + kernel_width, :].transpose(0, 1).mv(weights_H[i])
out_1[j, i, :] = (
img_aug[j, idx : idx + kernel_width, :]
.transpose(0, 1)
.mv(weights_H[i])
)
# process W dimension
# symmetric copying
@@ -827,7 +921,9 @@ def imresize(img, scale, antialiasing=True):
for i in range(out_W):
idx = int(indices_W[i][0])
for j in range(out_C):
out_2[j, :, i] = out_1_aug[j, :, idx:idx + kernel_width].mv(weights_W[i])
out_2[j, :, i] = out_1_aug[j, :, idx : idx + kernel_width].mv(
weights_W[i]
)
if need_squeeze:
out_2.squeeze_()
return out_2
@@ -846,7 +942,11 @@ def imresize_np(img, scale, antialiasing=True):
img.unsqueeze_(2)
in_H, in_W, in_C = img.size()
out_C, out_H, out_W = in_C, math.ceil(in_H * scale), math.ceil(in_W * scale)
out_C, out_H, out_W = (
in_C,
math.ceil(in_H * scale),
math.ceil(in_W * scale),
)
kernel_width = 4
kernel = 'cubic'
@@ -857,9 +957,11 @@ def imresize_np(img, scale, antialiasing=True):
# get weights and indices
weights_H, indices_H, sym_len_Hs, sym_len_He = calculate_weights_indices(
in_H, out_H, scale, kernel, kernel_width, antialiasing)
in_H, out_H, scale, kernel, kernel_width, antialiasing
)
weights_W, indices_W, sym_len_Ws, sym_len_We = calculate_weights_indices(
in_W, out_W, scale, kernel, kernel_width, antialiasing)
in_W, out_W, scale, kernel, kernel_width, antialiasing
)
# process H dimension
# symmetric copying
img_aug = torch.FloatTensor(in_H + sym_len_Hs + sym_len_He, in_W, in_C)
@@ -880,7 +982,11 @@ def imresize_np(img, scale, antialiasing=True):
for i in range(out_H):
idx = int(indices_H[i][0])
for j in range(out_C):
out_1[i, :, j] = img_aug[idx:idx + kernel_width, :, j].transpose(0, 1).mv(weights_H[i])
out_1[i, :, j] = (
img_aug[idx : idx + kernel_width, :, j]
.transpose(0, 1)
.mv(weights_H[i])
)
# process W dimension
# symmetric copying
@@ -902,7 +1008,9 @@ def imresize_np(img, scale, antialiasing=True):
for i in range(out_W):
idx = int(indices_W[i][0])
for j in range(out_C):
out_2[:, i, j] = out_1_aug[:, idx:idx + kernel_width, j].mv(weights_W[i])
out_2[:, i, j] = out_1_aug[:, idx : idx + kernel_width, j].mv(
weights_W[i]
)
if need_squeeze:
out_2.squeeze_()
@@ -913,4 +1021,4 @@ if __name__ == '__main__':
print('---')
# img = imread_uint('test.bmp', 3)
# img = uint2single(img)
# img_bicubic = imresize_np(img, 1/4)
# img_bicubic = imresize_np(img, 1/4)

View File

@@ -1 +1 @@
from ldm.modules.losses.contperceptual import LPIPSWithDiscriminator
from ldm.modules.losses.contperceptual import LPIPSWithDiscriminator

View File

@@ -5,13 +5,24 @@ from taming.modules.losses.vqperceptual import * # TODO: taming dependency yes/
class LPIPSWithDiscriminator(nn.Module):
def __init__(self, disc_start, logvar_init=0.0, kl_weight=1.0, pixelloss_weight=1.0,
disc_num_layers=3, disc_in_channels=3, disc_factor=1.0, disc_weight=1.0,
perceptual_weight=1.0, use_actnorm=False, disc_conditional=False,
disc_loss="hinge"):
def __init__(
self,
disc_start,
logvar_init=0.0,
kl_weight=1.0,
pixelloss_weight=1.0,
disc_num_layers=3,
disc_in_channels=3,
disc_factor=1.0,
disc_weight=1.0,
perceptual_weight=1.0,
use_actnorm=False,
disc_conditional=False,
disc_loss='hinge',
):
super().__init__()
assert disc_loss in ["hinge", "vanilla"]
assert disc_loss in ['hinge', 'vanilla']
self.kl_weight = kl_weight
self.pixel_weight = pixelloss_weight
self.perceptual_loss = LPIPS().eval()
@@ -19,42 +30,68 @@ class LPIPSWithDiscriminator(nn.Module):
# output log variance
self.logvar = nn.Parameter(torch.ones(size=()) * logvar_init)
self.discriminator = NLayerDiscriminator(input_nc=disc_in_channels,
n_layers=disc_num_layers,
use_actnorm=use_actnorm
).apply(weights_init)
self.discriminator = NLayerDiscriminator(
input_nc=disc_in_channels,
n_layers=disc_num_layers,
use_actnorm=use_actnorm,
).apply(weights_init)
self.discriminator_iter_start = disc_start
self.disc_loss = hinge_d_loss if disc_loss == "hinge" else vanilla_d_loss
self.disc_loss = (
hinge_d_loss if disc_loss == 'hinge' else vanilla_d_loss
)
self.disc_factor = disc_factor
self.discriminator_weight = disc_weight
self.disc_conditional = disc_conditional
def calculate_adaptive_weight(self, nll_loss, g_loss, last_layer=None):
if last_layer is not None:
nll_grads = torch.autograd.grad(nll_loss, last_layer, retain_graph=True)[0]
g_grads = torch.autograd.grad(g_loss, last_layer, retain_graph=True)[0]
nll_grads = torch.autograd.grad(
nll_loss, last_layer, retain_graph=True
)[0]
g_grads = torch.autograd.grad(
g_loss, last_layer, retain_graph=True
)[0]
else:
nll_grads = torch.autograd.grad(nll_loss, self.last_layer[0], retain_graph=True)[0]
g_grads = torch.autograd.grad(g_loss, self.last_layer[0], retain_graph=True)[0]
nll_grads = torch.autograd.grad(
nll_loss, self.last_layer[0], retain_graph=True
)[0]
g_grads = torch.autograd.grad(
g_loss, self.last_layer[0], retain_graph=True
)[0]
d_weight = torch.norm(nll_grads) / (torch.norm(g_grads) + 1e-4)
d_weight = torch.clamp(d_weight, 0.0, 1e4).detach()
d_weight = d_weight * self.discriminator_weight
return d_weight
def forward(self, inputs, reconstructions, posteriors, optimizer_idx,
global_step, last_layer=None, cond=None, split="train",
weights=None):
rec_loss = torch.abs(inputs.contiguous() - reconstructions.contiguous())
def forward(
self,
inputs,
reconstructions,
posteriors,
optimizer_idx,
global_step,
last_layer=None,
cond=None,
split='train',
weights=None,
):
rec_loss = torch.abs(
inputs.contiguous() - reconstructions.contiguous()
)
if self.perceptual_weight > 0:
p_loss = self.perceptual_loss(inputs.contiguous(), reconstructions.contiguous())
p_loss = self.perceptual_loss(
inputs.contiguous(), reconstructions.contiguous()
)
rec_loss = rec_loss + self.perceptual_weight * p_loss
nll_loss = rec_loss / torch.exp(self.logvar) + self.logvar
weighted_nll_loss = nll_loss
if weights is not None:
weighted_nll_loss = weights*nll_loss
weighted_nll_loss = torch.sum(weighted_nll_loss) / weighted_nll_loss.shape[0]
weighted_nll_loss = weights * nll_loss
weighted_nll_loss = (
torch.sum(weighted_nll_loss) / weighted_nll_loss.shape[0]
)
nll_loss = torch.sum(nll_loss) / nll_loss.shape[0]
kl_loss = posteriors.kl()
kl_loss = torch.sum(kl_loss) / kl_loss.shape[0]
@@ -67,45 +104,72 @@ class LPIPSWithDiscriminator(nn.Module):
logits_fake = self.discriminator(reconstructions.contiguous())
else:
assert self.disc_conditional
logits_fake = self.discriminator(torch.cat((reconstructions.contiguous(), cond), dim=1))
logits_fake = self.discriminator(
torch.cat((reconstructions.contiguous(), cond), dim=1)
)
g_loss = -torch.mean(logits_fake)
if self.disc_factor > 0.0:
try:
d_weight = self.calculate_adaptive_weight(nll_loss, g_loss, last_layer=last_layer)
d_weight = self.calculate_adaptive_weight(
nll_loss, g_loss, last_layer=last_layer
)
except RuntimeError:
assert not self.training
d_weight = torch.tensor(0.0)
else:
d_weight = torch.tensor(0.0)
disc_factor = adopt_weight(self.disc_factor, global_step, threshold=self.discriminator_iter_start)
loss = weighted_nll_loss + self.kl_weight * kl_loss + d_weight * disc_factor * g_loss
disc_factor = adopt_weight(
self.disc_factor,
global_step,
threshold=self.discriminator_iter_start,
)
loss = (
weighted_nll_loss
+ self.kl_weight * kl_loss
+ d_weight * disc_factor * g_loss
)
log = {"{}/total_loss".format(split): loss.clone().detach().mean(), "{}/logvar".format(split): self.logvar.detach(),
"{}/kl_loss".format(split): kl_loss.detach().mean(), "{}/nll_loss".format(split): nll_loss.detach().mean(),
"{}/rec_loss".format(split): rec_loss.detach().mean(),
"{}/d_weight".format(split): d_weight.detach(),
"{}/disc_factor".format(split): torch.tensor(disc_factor),
"{}/g_loss".format(split): g_loss.detach().mean(),
}
log = {
'{}/total_loss'.format(split): loss.clone().detach().mean(),
'{}/logvar'.format(split): self.logvar.detach(),
'{}/kl_loss'.format(split): kl_loss.detach().mean(),
'{}/nll_loss'.format(split): nll_loss.detach().mean(),
'{}/rec_loss'.format(split): rec_loss.detach().mean(),
'{}/d_weight'.format(split): d_weight.detach(),
'{}/disc_factor'.format(split): torch.tensor(disc_factor),
'{}/g_loss'.format(split): g_loss.detach().mean(),
}
return loss, log
if optimizer_idx == 1:
# second pass for discriminator update
if cond is None:
logits_real = self.discriminator(inputs.contiguous().detach())
logits_fake = self.discriminator(reconstructions.contiguous().detach())
logits_fake = self.discriminator(
reconstructions.contiguous().detach()
)
else:
logits_real = self.discriminator(torch.cat((inputs.contiguous().detach(), cond), dim=1))
logits_fake = self.discriminator(torch.cat((reconstructions.contiguous().detach(), cond), dim=1))
logits_real = self.discriminator(
torch.cat((inputs.contiguous().detach(), cond), dim=1)
)
logits_fake = self.discriminator(
torch.cat(
(reconstructions.contiguous().detach(), cond), dim=1
)
)
disc_factor = adopt_weight(self.disc_factor, global_step, threshold=self.discriminator_iter_start)
disc_factor = adopt_weight(
self.disc_factor,
global_step,
threshold=self.discriminator_iter_start,
)
d_loss = disc_factor * self.disc_loss(logits_real, logits_fake)
log = {"{}/disc_loss".format(split): d_loss.clone().detach().mean(),
"{}/logits_real".format(split): logits_real.detach().mean(),
"{}/logits_fake".format(split): logits_fake.detach().mean()
}
log = {
'{}/disc_loss'.format(split): d_loss.clone().detach().mean(),
'{}/logits_real'.format(split): logits_real.detach().mean(),
'{}/logits_fake'.format(split): logits_fake.detach().mean(),
}
return d_loss, log

View File

@@ -3,21 +3,25 @@ from torch import nn
import torch.nn.functional as F
from einops import repeat
from taming.modules.discriminator.model import NLayerDiscriminator, weights_init
from taming.modules.discriminator.model import (
NLayerDiscriminator,
weights_init,
)
from taming.modules.losses.lpips import LPIPS
from taming.modules.losses.vqperceptual import hinge_d_loss, vanilla_d_loss
def hinge_d_loss_with_exemplar_weights(logits_real, logits_fake, weights):
assert weights.shape[0] == logits_real.shape[0] == logits_fake.shape[0]
loss_real = torch.mean(F.relu(1. - logits_real), dim=[1,2,3])
loss_fake = torch.mean(F.relu(1. + logits_fake), dim=[1,2,3])
loss_real = torch.mean(F.relu(1.0 - logits_real), dim=[1, 2, 3])
loss_fake = torch.mean(F.relu(1.0 + logits_fake), dim=[1, 2, 3])
loss_real = (weights * loss_real).sum() / weights.sum()
loss_fake = (weights * loss_fake).sum() / weights.sum()
d_loss = 0.5 * (loss_real + loss_fake)
return d_loss
def adopt_weight(weight, global_step, threshold=0, value=0.):
def adopt_weight(weight, global_step, threshold=0, value=0.0):
if global_step < threshold:
weight = value
return weight
@@ -26,57 +30,76 @@ def adopt_weight(weight, global_step, threshold=0, value=0.):
def measure_perplexity(predicted_indices, n_embed):
# src: https://github.com/karpathy/deep-vector-quantization/blob/main/model.py
# eval cluster perplexity. when perplexity == num_embeddings then all clusters are used exactly equally
encodings = F.one_hot(predicted_indices, n_embed).float().reshape(-1, n_embed)
encodings = (
F.one_hot(predicted_indices, n_embed).float().reshape(-1, n_embed)
)
avg_probs = encodings.mean(0)
perplexity = (-(avg_probs * torch.log(avg_probs + 1e-10)).sum()).exp()
cluster_use = torch.sum(avg_probs > 0)
return perplexity, cluster_use
def l1(x, y):
return torch.abs(x-y)
return torch.abs(x - y)
def l2(x, y):
return torch.pow((x-y), 2)
return torch.pow((x - y), 2)
class VQLPIPSWithDiscriminator(nn.Module):
def __init__(self, disc_start, codebook_weight=1.0, pixelloss_weight=1.0,
disc_num_layers=3, disc_in_channels=3, disc_factor=1.0, disc_weight=1.0,
perceptual_weight=1.0, use_actnorm=False, disc_conditional=False,
disc_ndf=64, disc_loss="hinge", n_classes=None, perceptual_loss="lpips",
pixel_loss="l1"):
def __init__(
self,
disc_start,
codebook_weight=1.0,
pixelloss_weight=1.0,
disc_num_layers=3,
disc_in_channels=3,
disc_factor=1.0,
disc_weight=1.0,
perceptual_weight=1.0,
use_actnorm=False,
disc_conditional=False,
disc_ndf=64,
disc_loss='hinge',
n_classes=None,
perceptual_loss='lpips',
pixel_loss='l1',
):
super().__init__()
assert disc_loss in ["hinge", "vanilla"]
assert perceptual_loss in ["lpips", "clips", "dists"]
assert pixel_loss in ["l1", "l2"]
assert disc_loss in ['hinge', 'vanilla']
assert perceptual_loss in ['lpips', 'clips', 'dists']
assert pixel_loss in ['l1', 'l2']
self.codebook_weight = codebook_weight
self.pixel_weight = pixelloss_weight
if perceptual_loss == "lpips":
print(f"{self.__class__.__name__}: Running with LPIPS.")
if perceptual_loss == 'lpips':
print(f'{self.__class__.__name__}: Running with LPIPS.')
self.perceptual_loss = LPIPS().eval()
else:
raise ValueError(f"Unknown perceptual loss: >> {perceptual_loss} <<")
raise ValueError(
f'Unknown perceptual loss: >> {perceptual_loss} <<'
)
self.perceptual_weight = perceptual_weight
if pixel_loss == "l1":
if pixel_loss == 'l1':
self.pixel_loss = l1
else:
self.pixel_loss = l2
self.discriminator = NLayerDiscriminator(input_nc=disc_in_channels,
n_layers=disc_num_layers,
use_actnorm=use_actnorm,
ndf=disc_ndf
).apply(weights_init)
self.discriminator = NLayerDiscriminator(
input_nc=disc_in_channels,
n_layers=disc_num_layers,
use_actnorm=use_actnorm,
ndf=disc_ndf,
).apply(weights_init)
self.discriminator_iter_start = disc_start
if disc_loss == "hinge":
if disc_loss == 'hinge':
self.disc_loss = hinge_d_loss
elif disc_loss == "vanilla":
elif disc_loss == 'vanilla':
self.disc_loss = vanilla_d_loss
else:
raise ValueError(f"Unknown GAN loss '{disc_loss}'.")
print(f"VQLPIPSWithDiscriminator running with {disc_loss} loss.")
print(f'VQLPIPSWithDiscriminator running with {disc_loss} loss.')
self.disc_factor = disc_factor
self.discriminator_weight = disc_weight
self.disc_conditional = disc_conditional
@@ -84,31 +107,53 @@ class VQLPIPSWithDiscriminator(nn.Module):
def calculate_adaptive_weight(self, nll_loss, g_loss, last_layer=None):
if last_layer is not None:
nll_grads = torch.autograd.grad(nll_loss, last_layer, retain_graph=True)[0]
g_grads = torch.autograd.grad(g_loss, last_layer, retain_graph=True)[0]
nll_grads = torch.autograd.grad(
nll_loss, last_layer, retain_graph=True
)[0]
g_grads = torch.autograd.grad(
g_loss, last_layer, retain_graph=True
)[0]
else:
nll_grads = torch.autograd.grad(nll_loss, self.last_layer[0], retain_graph=True)[0]
g_grads = torch.autograd.grad(g_loss, self.last_layer[0], retain_graph=True)[0]
nll_grads = torch.autograd.grad(
nll_loss, self.last_layer[0], retain_graph=True
)[0]
g_grads = torch.autograd.grad(
g_loss, self.last_layer[0], retain_graph=True
)[0]
d_weight = torch.norm(nll_grads) / (torch.norm(g_grads) + 1e-4)
d_weight = torch.clamp(d_weight, 0.0, 1e4).detach()
d_weight = d_weight * self.discriminator_weight
return d_weight
def forward(self, codebook_loss, inputs, reconstructions, optimizer_idx,
global_step, last_layer=None, cond=None, split="train", predicted_indices=None):
def forward(
self,
codebook_loss,
inputs,
reconstructions,
optimizer_idx,
global_step,
last_layer=None,
cond=None,
split='train',
predicted_indices=None,
):
if not exists(codebook_loss):
codebook_loss = torch.tensor([0.]).to(inputs.device)
#rec_loss = torch.abs(inputs.contiguous() - reconstructions.contiguous())
rec_loss = self.pixel_loss(inputs.contiguous(), reconstructions.contiguous())
codebook_loss = torch.tensor([0.0]).to(inputs.device)
# rec_loss = torch.abs(inputs.contiguous() - reconstructions.contiguous())
rec_loss = self.pixel_loss(
inputs.contiguous(), reconstructions.contiguous()
)
if self.perceptual_weight > 0:
p_loss = self.perceptual_loss(inputs.contiguous(), reconstructions.contiguous())
p_loss = self.perceptual_loss(
inputs.contiguous(), reconstructions.contiguous()
)
rec_loss = rec_loss + self.perceptual_weight * p_loss
else:
p_loss = torch.tensor([0.0])
nll_loss = rec_loss
#nll_loss = torch.sum(nll_loss) / nll_loss.shape[0]
# nll_loss = torch.sum(nll_loss) / nll_loss.shape[0]
nll_loss = torch.mean(nll_loss)
# now the GAN part
@@ -119,49 +164,77 @@ class VQLPIPSWithDiscriminator(nn.Module):
logits_fake = self.discriminator(reconstructions.contiguous())
else:
assert self.disc_conditional
logits_fake = self.discriminator(torch.cat((reconstructions.contiguous(), cond), dim=1))
logits_fake = self.discriminator(
torch.cat((reconstructions.contiguous(), cond), dim=1)
)
g_loss = -torch.mean(logits_fake)
try:
d_weight = self.calculate_adaptive_weight(nll_loss, g_loss, last_layer=last_layer)
d_weight = self.calculate_adaptive_weight(
nll_loss, g_loss, last_layer=last_layer
)
except RuntimeError:
assert not self.training
d_weight = torch.tensor(0.0)
disc_factor = adopt_weight(self.disc_factor, global_step, threshold=self.discriminator_iter_start)
loss = nll_loss + d_weight * disc_factor * g_loss + self.codebook_weight * codebook_loss.mean()
disc_factor = adopt_weight(
self.disc_factor,
global_step,
threshold=self.discriminator_iter_start,
)
loss = (
nll_loss
+ d_weight * disc_factor * g_loss
+ self.codebook_weight * codebook_loss.mean()
)
log = {"{}/total_loss".format(split): loss.clone().detach().mean(),
"{}/quant_loss".format(split): codebook_loss.detach().mean(),
"{}/nll_loss".format(split): nll_loss.detach().mean(),
"{}/rec_loss".format(split): rec_loss.detach().mean(),
"{}/p_loss".format(split): p_loss.detach().mean(),
"{}/d_weight".format(split): d_weight.detach(),
"{}/disc_factor".format(split): torch.tensor(disc_factor),
"{}/g_loss".format(split): g_loss.detach().mean(),
}
log = {
'{}/total_loss'.format(split): loss.clone().detach().mean(),
'{}/quant_loss'.format(split): codebook_loss.detach().mean(),
'{}/nll_loss'.format(split): nll_loss.detach().mean(),
'{}/rec_loss'.format(split): rec_loss.detach().mean(),
'{}/p_loss'.format(split): p_loss.detach().mean(),
'{}/d_weight'.format(split): d_weight.detach(),
'{}/disc_factor'.format(split): torch.tensor(disc_factor),
'{}/g_loss'.format(split): g_loss.detach().mean(),
}
if predicted_indices is not None:
assert self.n_classes is not None
with torch.no_grad():
perplexity, cluster_usage = measure_perplexity(predicted_indices, self.n_classes)
log[f"{split}/perplexity"] = perplexity
log[f"{split}/cluster_usage"] = cluster_usage
perplexity, cluster_usage = measure_perplexity(
predicted_indices, self.n_classes
)
log[f'{split}/perplexity'] = perplexity
log[f'{split}/cluster_usage'] = cluster_usage
return loss, log
if optimizer_idx == 1:
# second pass for discriminator update
if cond is None:
logits_real = self.discriminator(inputs.contiguous().detach())
logits_fake = self.discriminator(reconstructions.contiguous().detach())
logits_fake = self.discriminator(
reconstructions.contiguous().detach()
)
else:
logits_real = self.discriminator(torch.cat((inputs.contiguous().detach(), cond), dim=1))
logits_fake = self.discriminator(torch.cat((reconstructions.contiguous().detach(), cond), dim=1))
logits_real = self.discriminator(
torch.cat((inputs.contiguous().detach(), cond), dim=1)
)
logits_fake = self.discriminator(
torch.cat(
(reconstructions.contiguous().detach(), cond), dim=1
)
)
disc_factor = adopt_weight(self.disc_factor, global_step, threshold=self.discriminator_iter_start)
disc_factor = adopt_weight(
self.disc_factor,
global_step,
threshold=self.discriminator_iter_start,
)
d_loss = disc_factor * self.disc_loss(logits_real, logits_fake)
log = {"{}/disc_loss".format(split): d_loss.clone().detach().mean(),
"{}/logits_real".format(split): logits_real.detach().mean(),
"{}/logits_fake".format(split): logits_fake.detach().mean()
}
log = {
'{}/disc_loss'.format(split): d_loss.clone().detach().mean(),
'{}/logits_real'.format(split): logits_real.detach().mean(),
'{}/logits_fake'.format(split): logits_fake.detach().mean(),
}
return d_loss, log

View File

@@ -11,15 +11,13 @@ from einops import rearrange, repeat, reduce
DEFAULT_DIM_HEAD = 64
Intermediates = namedtuple('Intermediates', [
'pre_softmax_attn',
'post_softmax_attn'
])
Intermediates = namedtuple(
'Intermediates', ['pre_softmax_attn', 'post_softmax_attn']
)
LayerIntermediates = namedtuple('Intermediates', [
'hiddens',
'attn_intermediates'
])
LayerIntermediates = namedtuple(
'Intermediates', ['hiddens', 'attn_intermediates']
)
class AbsolutePositionalEmbedding(nn.Module):
@@ -39,11 +37,16 @@ class AbsolutePositionalEmbedding(nn.Module):
class FixedPositionalEmbedding(nn.Module):
def __init__(self, dim):
super().__init__()
inv_freq = 1. / (10000 ** (torch.arange(0, dim, 2).float() / dim))
inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2).float() / dim))
self.register_buffer('inv_freq', inv_freq)
def forward(self, x, seq_dim=1, offset=0):
t = torch.arange(x.shape[seq_dim], device=x.device).type_as(self.inv_freq) + offset
t = (
torch.arange(x.shape[seq_dim], device=x.device).type_as(
self.inv_freq
)
+ offset
)
sinusoid_inp = torch.einsum('i , j -> i j', t, self.inv_freq)
emb = torch.cat((sinusoid_inp.sin(), sinusoid_inp.cos()), dim=-1)
return emb[None, :, :]
@@ -51,6 +54,7 @@ class FixedPositionalEmbedding(nn.Module):
# helpers
def exists(val):
return val is not None
@@ -64,18 +68,21 @@ def default(val, d):
def always(val):
def inner(*args, **kwargs):
return val
return inner
def not_equals(val):
def inner(x):
return x != val
return inner
def equals(val):
def inner(x):
return x == val
return inner
@@ -85,6 +92,7 @@ def max_neg_value(tensor):
# keyword argument helpers
def pick_and_pop(keys, d):
values = list(map(lambda key: d.pop(key), keys))
return dict(zip(keys, values))
@@ -108,8 +116,15 @@ def group_by_key_prefix(prefix, d):
def groupby_prefix_and_trim(prefix, d):
kwargs_with_prefix, kwargs = group_dict_by_key(partial(string_begins_with, prefix), d)
kwargs_without_prefix = dict(map(lambda x: (x[0][len(prefix):], x[1]), tuple(kwargs_with_prefix.items())))
kwargs_with_prefix, kwargs = group_dict_by_key(
partial(string_begins_with, prefix), d
)
kwargs_without_prefix = dict(
map(
lambda x: (x[0][len(prefix) :], x[1]),
tuple(kwargs_with_prefix.items()),
)
)
return kwargs_without_prefix, kwargs
@@ -139,7 +154,7 @@ class Rezero(nn.Module):
class ScaleNorm(nn.Module):
def __init__(self, dim, eps=1e-5):
super().__init__()
self.scale = dim ** -0.5
self.scale = dim**-0.5
self.eps = eps
self.g = nn.Parameter(torch.ones(1))
@@ -151,7 +166,7 @@ class ScaleNorm(nn.Module):
class RMSNorm(nn.Module):
def __init__(self, dim, eps=1e-8):
super().__init__()
self.scale = dim ** -0.5
self.scale = dim**-0.5
self.eps = eps
self.g = nn.Parameter(torch.ones(dim))
@@ -173,7 +188,7 @@ class GRUGating(nn.Module):
def forward(self, x, residual):
gated_output = self.gru(
rearrange(x, 'b n d -> (b n) d'),
rearrange(residual, 'b n d -> (b n) d')
rearrange(residual, 'b n d -> (b n) d'),
)
return gated_output.reshape_as(x)
@@ -181,6 +196,7 @@ class GRUGating(nn.Module):
# feedforward
class GEGLU(nn.Module):
def __init__(self, dim_in, dim_out):
super().__init__()
@@ -192,19 +208,18 @@ class GEGLU(nn.Module):
class FeedForward(nn.Module):
def __init__(self, dim, dim_out=None, mult=4, glu=False, dropout=0.):
def __init__(self, dim, dim_out=None, mult=4, glu=False, dropout=0.0):
super().__init__()
inner_dim = int(dim * mult)
dim_out = default(dim_out, dim)
project_in = nn.Sequential(
nn.Linear(dim, inner_dim),
nn.GELU()
) if not glu else GEGLU(dim, inner_dim)
project_in = (
nn.Sequential(nn.Linear(dim, inner_dim), nn.GELU())
if not glu
else GEGLU(dim, inner_dim)
)
self.net = nn.Sequential(
project_in,
nn.Dropout(dropout),
nn.Linear(inner_dim, dim_out)
project_in, nn.Dropout(dropout), nn.Linear(inner_dim, dim_out)
)
def forward(self, x):
@@ -214,23 +229,25 @@ class FeedForward(nn.Module):
# attention.
class Attention(nn.Module):
def __init__(
self,
dim,
dim_head=DEFAULT_DIM_HEAD,
heads=8,
causal=False,
mask=None,
talking_heads=False,
sparse_topk=None,
use_entmax15=False,
num_mem_kv=0,
dropout=0.,
on_attn=False
self,
dim,
dim_head=DEFAULT_DIM_HEAD,
heads=8,
causal=False,
mask=None,
talking_heads=False,
sparse_topk=None,
use_entmax15=False,
num_mem_kv=0,
dropout=0.0,
on_attn=False,
):
super().__init__()
if use_entmax15:
raise NotImplementedError("Check out entmax activation instead of softmax activation!")
self.scale = dim_head ** -0.5
raise NotImplementedError(
'Check out entmax activation instead of softmax activation!'
)
self.scale = dim_head**-0.5
self.heads = heads
self.causal = causal
self.mask = mask
@@ -252,7 +269,7 @@ class Attention(nn.Module):
self.sparse_topk = sparse_topk
# entmax
#self.attn_fn = entmax15 if use_entmax15 else F.softmax
# self.attn_fn = entmax15 if use_entmax15 else F.softmax
self.attn_fn = F.softmax
# add memory key / values
@@ -263,20 +280,29 @@ class Attention(nn.Module):
# attention on attention
self.attn_on_attn = on_attn
self.to_out = nn.Sequential(nn.Linear(inner_dim, dim * 2), nn.GLU()) if on_attn else nn.Linear(inner_dim, dim)
self.to_out = (
nn.Sequential(nn.Linear(inner_dim, dim * 2), nn.GLU())
if on_attn
else nn.Linear(inner_dim, dim)
)
def forward(
self,
x,
context=None,
mask=None,
context_mask=None,
rel_pos=None,
sinusoidal_emb=None,
prev_attn=None,
mem=None
self,
x,
context=None,
mask=None,
context_mask=None,
rel_pos=None,
sinusoidal_emb=None,
prev_attn=None,
mem=None,
):
b, n, _, h, talking_heads, device = *x.shape, self.heads, self.talking_heads, x.device
b, n, _, h, talking_heads, device = (
*x.shape,
self.heads,
self.talking_heads,
x.device,
)
kv_input = default(context, x)
q_input = x
@@ -297,23 +323,35 @@ class Attention(nn.Module):
k = self.to_k(k_input)
v = self.to_v(v_input)
q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> b h n d', h=h), (q, k, v))
q, k, v = map(
lambda t: rearrange(t, 'b n (h d) -> b h n d', h=h), (q, k, v)
)
input_mask = None
if any(map(exists, (mask, context_mask))):
q_mask = default(mask, lambda: torch.ones((b, n), device=device).bool())
q_mask = default(
mask, lambda: torch.ones((b, n), device=device).bool()
)
k_mask = q_mask if not exists(context) else context_mask
k_mask = default(k_mask, lambda: torch.ones((b, k.shape[-2]), device=device).bool())
k_mask = default(
k_mask,
lambda: torch.ones((b, k.shape[-2]), device=device).bool(),
)
q_mask = rearrange(q_mask, 'b i -> b () i ()')
k_mask = rearrange(k_mask, 'b j -> b () () j')
input_mask = q_mask * k_mask
if self.num_mem_kv > 0:
mem_k, mem_v = map(lambda t: repeat(t, 'h n d -> b h n d', b=b), (self.mem_k, self.mem_v))
mem_k, mem_v = map(
lambda t: repeat(t, 'h n d -> b h n d', b=b),
(self.mem_k, self.mem_v),
)
k = torch.cat((mem_k, k), dim=-2)
v = torch.cat((mem_v, v), dim=-2)
if exists(input_mask):
input_mask = F.pad(input_mask, (self.num_mem_kv, 0), value=True)
input_mask = F.pad(
input_mask, (self.num_mem_kv, 0), value=True
)
dots = einsum('b h i d, b h j d -> b h i j', q, k) * self.scale
mask_value = max_neg_value(dots)
@@ -324,7 +362,9 @@ class Attention(nn.Module):
pre_softmax_attn = dots
if talking_heads:
dots = einsum('b h i j, h k -> b k i j', dots, self.pre_softmax_proj).contiguous()
dots = einsum(
'b h i j, h k -> b k i j', dots, self.pre_softmax_proj
).contiguous()
if exists(rel_pos):
dots = rel_pos(dots)
@@ -336,7 +376,9 @@ class Attention(nn.Module):
if self.causal:
i, j = dots.shape[-2:]
r = torch.arange(i, device=device)
mask = rearrange(r, 'i -> () () i ()') < rearrange(r, 'j -> () () () j')
mask = rearrange(r, 'i -> () () i ()') < rearrange(
r, 'j -> () () () j'
)
mask = F.pad(mask, (j - i, 0), value=False)
dots.masked_fill_(mask, mask_value)
del mask
@@ -354,14 +396,16 @@ class Attention(nn.Module):
attn = self.dropout(attn)
if talking_heads:
attn = einsum('b h i j, h k -> b k i j', attn, self.post_softmax_proj).contiguous()
attn = einsum(
'b h i j, h k -> b k i j', attn, self.post_softmax_proj
).contiguous()
out = einsum('b h i j, b h j d -> b h i d', attn, v)
out = rearrange(out, 'b h n d -> b n (h d)')
intermediates = Intermediates(
pre_softmax_attn=pre_softmax_attn,
post_softmax_attn=post_softmax_attn
post_softmax_attn=post_softmax_attn,
)
return self.to_out(out), intermediates
@@ -369,28 +413,28 @@ class Attention(nn.Module):
class AttentionLayers(nn.Module):
def __init__(
self,
dim,
depth,
heads=8,
causal=False,
cross_attend=False,
only_cross=False,
use_scalenorm=False,
use_rmsnorm=False,
use_rezero=False,
rel_pos_num_buckets=32,
rel_pos_max_distance=128,
position_infused_attn=False,
custom_layers=None,
sandwich_coef=None,
par_ratio=None,
residual_attn=False,
cross_residual_attn=False,
macaron=False,
pre_norm=True,
gate_residual=False,
**kwargs
self,
dim,
depth,
heads=8,
causal=False,
cross_attend=False,
only_cross=False,
use_scalenorm=False,
use_rmsnorm=False,
use_rezero=False,
rel_pos_num_buckets=32,
rel_pos_max_distance=128,
position_infused_attn=False,
custom_layers=None,
sandwich_coef=None,
par_ratio=None,
residual_attn=False,
cross_residual_attn=False,
macaron=False,
pre_norm=True,
gate_residual=False,
**kwargs,
):
super().__init__()
ff_kwargs, kwargs = groupby_prefix_and_trim('ff_', kwargs)
@@ -403,10 +447,14 @@ class AttentionLayers(nn.Module):
self.layers = nn.ModuleList([])
self.has_pos_emb = position_infused_attn
self.pia_pos_emb = FixedPositionalEmbedding(dim) if position_infused_attn else None
self.pia_pos_emb = (
FixedPositionalEmbedding(dim) if position_infused_attn else None
)
self.rotary_pos_emb = always(None)
assert rel_pos_num_buckets <= rel_pos_max_distance, 'number of relative position buckets must be less than the relative position max distance'
assert (
rel_pos_num_buckets <= rel_pos_max_distance
), 'number of relative position buckets must be less than the relative position max distance'
self.rel_pos = None
self.pre_norm = pre_norm
@@ -438,15 +486,27 @@ class AttentionLayers(nn.Module):
assert 1 < par_ratio <= par_depth, 'par ratio out of range'
default_block = tuple(filter(not_equals('f'), default_block))
par_attn = par_depth // par_ratio
depth_cut = par_depth * 2 // 3 # 2 / 3 attention layer cutoff suggested by PAR paper
depth_cut = (
par_depth * 2 // 3
) # 2 / 3 attention layer cutoff suggested by PAR paper
par_width = (depth_cut + depth_cut // par_attn) // par_attn
assert len(default_block) <= par_width, 'default block is too large for par_ratio'
par_block = default_block + ('f',) * (par_width - len(default_block))
assert (
len(default_block) <= par_width
), 'default block is too large for par_ratio'
par_block = default_block + ('f',) * (
par_width - len(default_block)
)
par_head = par_block * par_attn
layer_types = par_head + ('f',) * (par_depth - len(par_head))
elif exists(sandwich_coef):
assert sandwich_coef > 0 and sandwich_coef <= depth, 'sandwich coefficient should be less than the depth'
layer_types = ('a',) * sandwich_coef + default_block * (depth - sandwich_coef) + ('f',) * sandwich_coef
assert (
sandwich_coef > 0 and sandwich_coef <= depth
), 'sandwich coefficient should be less than the depth'
layer_types = (
('a',) * sandwich_coef
+ default_block * (depth - sandwich_coef)
+ ('f',) * sandwich_coef
)
else:
layer_types = default_block * depth
@@ -455,7 +515,9 @@ class AttentionLayers(nn.Module):
for layer_type in self.layer_types:
if layer_type == 'a':
layer = Attention(dim, heads=heads, causal=causal, **attn_kwargs)
layer = Attention(
dim, heads=heads, causal=causal, **attn_kwargs
)
elif layer_type == 'c':
layer = Attention(dim, heads=heads, **attn_kwargs)
elif layer_type == 'f':
@@ -472,20 +534,17 @@ class AttentionLayers(nn.Module):
else:
residual_fn = Residual()
self.layers.append(nn.ModuleList([
norm_fn(),
layer,
residual_fn
]))
self.layers.append(nn.ModuleList([norm_fn(), layer, residual_fn]))
def forward(
self,
x,
context=None,
mask=None,
context_mask=None,
mems=None,
return_hiddens=False
self,
x,
context=None,
mask=None,
context_mask=None,
mems=None,
return_hiddens=False,
**kwargs,
):
hiddens = []
intermediates = []
@@ -494,7 +553,9 @@ class AttentionLayers(nn.Module):
mems = mems.copy() if exists(mems) else [None] * self.num_attn_layers
for ind, (layer_type, (norm, block, residual_fn)) in enumerate(zip(self.layer_types, self.layers)):
for ind, (layer_type, (norm, block, residual_fn)) in enumerate(
zip(self.layer_types, self.layers)
):
is_last = ind == (len(self.layers) - 1)
if layer_type == 'a':
@@ -507,10 +568,22 @@ class AttentionLayers(nn.Module):
x = norm(x)
if layer_type == 'a':
out, inter = block(x, mask=mask, sinusoidal_emb=self.pia_pos_emb, rel_pos=self.rel_pos,
prev_attn=prev_attn, mem=layer_mem)
out, inter = block(
x,
mask=mask,
sinusoidal_emb=self.pia_pos_emb,
rel_pos=self.rel_pos,
prev_attn=prev_attn,
mem=layer_mem,
)
elif layer_type == 'c':
out, inter = block(x, context=context, mask=mask, context_mask=context_mask, prev_attn=prev_cross_attn)
out, inter = block(
x,
context=context,
mask=mask,
context_mask=context_mask,
prev_attn=prev_cross_attn,
)
elif layer_type == 'f':
out = block(x)
@@ -529,8 +602,7 @@ class AttentionLayers(nn.Module):
if return_hiddens:
intermediates = LayerIntermediates(
hiddens=hiddens,
attn_intermediates=intermediates
hiddens=hiddens, attn_intermediates=intermediates
)
return x, intermediates
@@ -544,23 +616,24 @@ class Encoder(AttentionLayers):
super().__init__(causal=False, **kwargs)
class TransformerWrapper(nn.Module):
def __init__(
self,
*,
num_tokens,
max_seq_len,
attn_layers,
emb_dim=None,
max_mem_len=0.,
emb_dropout=0.,
num_memory_tokens=None,
tie_embedding=False,
use_pos_emb=True
self,
*,
num_tokens,
max_seq_len,
attn_layers,
emb_dim=None,
max_mem_len=0.0,
emb_dropout=0.0,
num_memory_tokens=None,
tie_embedding=False,
use_pos_emb=True,
):
super().__init__()
assert isinstance(attn_layers, AttentionLayers), 'attention layers must be one of Encoder or Decoder'
assert isinstance(
attn_layers, AttentionLayers
), 'attention layers must be one of Encoder or Decoder'
dim = attn_layers.dim
emb_dim = default(emb_dim, dim)
@@ -570,23 +643,34 @@ class TransformerWrapper(nn.Module):
self.num_tokens = num_tokens
self.token_emb = nn.Embedding(num_tokens, emb_dim)
self.pos_emb = AbsolutePositionalEmbedding(emb_dim, max_seq_len) if (
use_pos_emb and not attn_layers.has_pos_emb) else always(0)
self.pos_emb = (
AbsolutePositionalEmbedding(emb_dim, max_seq_len)
if (use_pos_emb and not attn_layers.has_pos_emb)
else always(0)
)
self.emb_dropout = nn.Dropout(emb_dropout)
self.project_emb = nn.Linear(emb_dim, dim) if emb_dim != dim else nn.Identity()
self.project_emb = (
nn.Linear(emb_dim, dim) if emb_dim != dim else nn.Identity()
)
self.attn_layers = attn_layers
self.norm = nn.LayerNorm(dim)
self.init_()
self.to_logits = nn.Linear(dim, num_tokens) if not tie_embedding else lambda t: t @ self.token_emb.weight.t()
self.to_logits = (
nn.Linear(dim, num_tokens)
if not tie_embedding
else lambda t: t @ self.token_emb.weight.t()
)
# memory tokens (like [cls]) from Memory Transformers paper
num_memory_tokens = default(num_memory_tokens, 0)
self.num_memory_tokens = num_memory_tokens
if num_memory_tokens > 0:
self.memory_tokens = nn.Parameter(torch.randn(num_memory_tokens, dim))
self.memory_tokens = nn.Parameter(
torch.randn(num_memory_tokens, dim)
)
# let funnel encoder know number of memory tokens, if specified
if hasattr(attn_layers, 'num_memory_tokens'):
@@ -596,18 +680,26 @@ class TransformerWrapper(nn.Module):
nn.init.normal_(self.token_emb.weight, std=0.02)
def forward(
self,
x,
return_embeddings=False,
mask=None,
return_mems=False,
return_attn=False,
mems=None,
**kwargs
self,
x,
return_embeddings=False,
mask=None,
return_mems=False,
return_attn=False,
mems=None,
embedding_manager=None,
**kwargs,
):
b, n, device, num_mem = *x.shape, x.device, self.num_memory_tokens
x = self.token_emb(x)
x += self.pos_emb(x)
embedded_x = self.token_emb(x)
if embedding_manager:
x = embedding_manager(x, embedded_x)
else:
x = embedded_x
x = x + self.pos_emb(x)
x = self.emb_dropout(x)
x = self.project_emb(x)
@@ -620,7 +712,9 @@ class TransformerWrapper(nn.Module):
if exists(mask):
mask = F.pad(mask, (num_mem, 0), value=True)
x, intermediates = self.attn_layers(x, mask=mask, mems=mems, return_hiddens=True, **kwargs)
x, intermediates = self.attn_layers(
x, mask=mask, mems=mems, return_hiddens=True, **kwargs
)
x = self.norm(x)
mem, x = x[:, :num_mem], x[:, num_mem:]
@@ -629,13 +723,30 @@ class TransformerWrapper(nn.Module):
if return_mems:
hiddens = intermediates.hiddens
new_mems = list(map(lambda pair: torch.cat(pair, dim=-2), zip(mems, hiddens))) if exists(mems) else hiddens
new_mems = list(map(lambda t: t[..., -self.max_mem_len:, :].detach(), new_mems))
new_mems = (
list(
map(
lambda pair: torch.cat(pair, dim=-2),
zip(mems, hiddens),
)
)
if exists(mems)
else hiddens
)
new_mems = list(
map(
lambda t: t[..., -self.max_mem_len :, :].detach(), new_mems
)
)
return out, new_mems
if return_attn:
attn_maps = list(map(lambda t: t.post_softmax_attn, intermediates.attn_intermediates))
attn_maps = list(
map(
lambda t: t.post_softmax_attn,
intermediates.attn_intermediates,
)
)
return out, attn_maps
return out

View File

@@ -1,428 +1,13 @@
"""Simplified text to image API for stable diffusion/latent diffusion
'''
This module is provided for backward compatibility with the
original (hasty) API.
Example Usage:
Please use ldm.generate instead.
'''
from ldm.simplet2i import T2I
# Create an object with default values
t2i = T2I(outdir = <path> // outputs/txt2img-samples
model = <path> // models/ldm/stable-diffusion-v1/model.ckpt
config = <path> // default="configs/stable-diffusion/v1-inference.yaml
iterations = <integer> // how many times to run the sampling (1)
batch_size = <integer> // how many images to generate per sampling (1)
steps = <integer> // 50
seed = <integer> // current system time
sampler = ['ddim','plms','klms'] // klms
grid = <boolean> // false
width = <integer> // image width, multiple of 64 (512)
height = <integer> // image height, multiple of 64 (512)
cfg_scale = <float> // unconditional guidance scale (7.5)
fixed_code = <boolean> // False
)
from ldm.generate import Generate
# do the slow model initialization
t2i.load_model()
# Do the fast inference & image generation. Any options passed here
# override the default values assigned during class initialization
# Will call load_model() if the model was not previously loaded.
# The method returns a list of images. Each row of the list is a sub-list of [filename,seed]
results = t2i.txt2img(prompt = "an astronaut riding a horse"
outdir = "./outputs/txt2img-samples)
)
for row in results:
print(f'filename={row[0]}')
print(f'seed ={row[1]}')
# Same thing, but using an initial image.
results = t2i.img2img(prompt = "an astronaut riding a horse"
outdir = "./outputs/img2img-samples"
init_img = "./sketches/horse+rider.png")
for row in results:
print(f'filename={row[0]}')
print(f'seed ={row[1]}')
"""
import torch
import numpy as np
import random
import sys
import os
from omegaconf import OmegaConf
from PIL import Image
from tqdm import tqdm, trange
from itertools import islice
from einops import rearrange, repeat
from torchvision.utils import make_grid
from pytorch_lightning import seed_everything
from torch import autocast
from contextlib import contextmanager, nullcontext
import time
import math
from ldm.util import instantiate_from_config
from ldm.models.diffusion.ddim import DDIMSampler
from ldm.models.diffusion.plms import PLMSSampler
from ldm.models.diffusion.ksampler import KSampler
class T2I:
"""T2I class
Attributes
----------
outdir
model
config
iterations
batch_size
steps
seed
sampler
grid
individual
width
height
cfg_scale
fixed_code
latent_channels
downsampling_factor
precision
strength
"""
def __init__(self,
outdir="outputs/txt2img-samples",
batch_size=1,
iterations = 1,
width=512,
height=512,
grid=False,
individual=None, # redundant
steps=50,
seed=None,
cfg_scale=7.5,
weights="models/ldm/stable-diffusion-v1/model.ckpt",
config = "configs/latent-diffusion/txt2img-1p4B-eval.yaml",
sampler="klms",
latent_channels=4,
downsampling_factor=8,
ddim_eta=0.0, # deterministic
fixed_code=False,
precision='autocast',
full_precision=False,
strength=0.75 # default in scripts/img2img.py
):
self.outdir = outdir
self.batch_size = batch_size
self.iterations = iterations
self.width = width
self.height = height
self.grid = grid
self.steps = steps
self.cfg_scale = cfg_scale
self.weights = weights
self.config = config
self.sampler_name = sampler
self.fixed_code = fixed_code
self.latent_channels = latent_channels
self.downsampling_factor = downsampling_factor
self.ddim_eta = ddim_eta
self.precision = precision
self.full_precision = full_precision
self.strength = strength
self.model = None # empty for now
self.sampler = None
if seed is None:
self.seed = self._new_seed()
else:
self.seed = seed
def txt2img(self,prompt,outdir=None,batch_size=None,iterations=None,
steps=None,seed=None,grid=None,individual=None,width=None,height=None,
cfg_scale=None,ddim_eta=None,strength=None,init_img=None):
"""
Generate an image from the prompt, writing iteration images into the outdir
The output is a list of lists in the format: [[filename1,seed1], [filename2,seed2],...]
"""
outdir = outdir or self.outdir
steps = steps or self.steps
seed = seed or self.seed
width = width or self.width
height = height or self.height
cfg_scale = cfg_scale or self.cfg_scale
ddim_eta = ddim_eta or self.ddim_eta
batch_size = batch_size or self.batch_size
iterations = iterations or self.iterations
strength = strength or self.strength # not actually used here, but preserved for code refactoring
model = self.load_model() # will instantiate the model or return it from cache
# grid and individual are mutually exclusive, with individual taking priority.
# not necessary, but needed for compatability with dream bot
if (grid is None):
grid = self.grid
if individual:
grid = False
data = [batch_size * [prompt]]
# make directories and establish names for the output files
os.makedirs(outdir, exist_ok=True)
base_count = len(os.listdir(outdir))-1
start_code = None
if self.fixed_code:
start_code = torch.randn([batch_size,
self.latent_channels,
height // self.downsampling_factor,
width // self.downsampling_factor],
device=self.device)
precision_scope = autocast if self.precision=="autocast" else nullcontext
sampler = self.sampler
images = list()
seeds = list()
tic = time.time()
with torch.no_grad():
with precision_scope("cuda"):
with model.ema_scope():
all_samples = list()
for n in trange(iterations, desc="Sampling"):
seed_everything(seed)
for prompts in tqdm(data, desc="data", dynamic_ncols=True):
uc = None
if cfg_scale != 1.0:
uc = model.get_learned_conditioning(batch_size * [""])
if isinstance(prompts, tuple):
prompts = list(prompts)
c = model.get_learned_conditioning(prompts)
shape = [self.latent_channels, height // self.downsampling_factor, width // self.downsampling_factor]
samples_ddim, _ = sampler.sample(S=steps,
conditioning=c,
batch_size=batch_size,
shape=shape,
verbose=False,
unconditional_guidance_scale=cfg_scale,
unconditional_conditioning=uc,
eta=ddim_eta,
x_T=start_code)
x_samples_ddim = model.decode_first_stage(samples_ddim)
x_samples_ddim = torch.clamp((x_samples_ddim + 1.0) / 2.0, min=0.0, max=1.0)
if not grid:
for x_sample in x_samples_ddim:
x_sample = 255. * rearrange(x_sample.cpu().numpy(), 'c h w -> h w c')
filename = os.path.join(outdir, f"{base_count:05}.png")
Image.fromarray(x_sample.astype(np.uint8)).save(filename)
images.append([filename,seed])
base_count += 1
else:
all_samples.append(x_samples_ddim)
seeds.append(seed)
seed = self._new_seed()
if grid:
images = self._make_grid(samples=all_samples,
seeds=seeds,
batch_size=batch_size,
iterations=iterations,
outdir=outdir)
toc = time.time()
print(f'{batch_size * iterations} images generated in',"%4.2fs"% (toc-tic))
return images
# There is lots of shared code between this and txt2img and should be refactored.
def img2img(self,prompt,outdir=None,init_img=None,batch_size=None,iterations=None,
steps=None,seed=None,grid=None,individual=None,width=None,height=None,
cfg_scale=None,ddim_eta=None,strength=None):
"""
Generate an image from the prompt and the initial image, writing iteration images into the outdir
The output is a list of lists in the format: [[filename1,seed1], [filename2,seed2],...]
"""
outdir = outdir or self.outdir
steps = steps or self.steps
seed = seed or self.seed
cfg_scale = cfg_scale or self.cfg_scale
ddim_eta = ddim_eta or self.ddim_eta
batch_size = batch_size or self.batch_size
iterations = iterations or self.iterations
strength = strength or self.strength
if init_img is None:
print("no init_img provided!")
return []
model = self.load_model() # will instantiate the model or return it from cache
precision_scope = autocast if self.precision=="autocast" else nullcontext
# grid and individual are mutually exclusive, with individual taking priority.
# not necessary, but needed for compatability with dream bot
if (grid is None):
grid = self.grid
if individual:
grid = False
data = [batch_size * [prompt]]
# PLMS sampler not supported yet, so ignore previous sampler
if self.sampler_name!='ddim':
print(f"sampler '{self.sampler_name}' is not yet supported. Using DDM sampler")
sampler = DDIMSampler(model)
else:
sampler = self.sampler
# make directories and establish names for the output files
os.makedirs(outdir, exist_ok=True)
base_count = len(os.listdir(outdir))-1
assert os.path.isfile(init_img)
init_image = self._load_img(init_img).to(self.device)
init_image = repeat(init_image, '1 ... -> b ...', b=batch_size)
with precision_scope("cuda"):
init_latent = model.get_first_stage_encoding(model.encode_first_stage(init_image)) # move to latent space
sampler.make_schedule(ddim_num_steps=steps, ddim_eta=ddim_eta, verbose=False)
try:
assert 0. <= strength <= 1., 'can only work with strength in [0.0, 1.0]'
except AssertionError:
print(f"strength must be between 0.0 and 1.0, but received value {strength}")
return []
t_enc = int(strength * steps)
print(f"target t_enc is {t_enc} steps")
images = list()
seeds = list()
tic = time.time()
with torch.no_grad():
with precision_scope("cuda"):
with model.ema_scope():
all_samples = list()
for n in trange(iterations, desc="Sampling"):
seed_everything(seed)
for prompts in tqdm(data, desc="data", dynamic_ncols=True):
uc = None
if cfg_scale != 1.0:
uc = model.get_learned_conditioning(batch_size * [""])
if isinstance(prompts, tuple):
prompts = list(prompts)
c = model.get_learned_conditioning(prompts)
# encode (scaled latent)
z_enc = sampler.stochastic_encode(init_latent, torch.tensor([t_enc]*batch_size).to(self.device))
# decode it
samples = sampler.decode(z_enc, c, t_enc, unconditional_guidance_scale=cfg_scale,
unconditional_conditioning=uc,)
x_samples = model.decode_first_stage(samples)
x_samples = torch.clamp((x_samples + 1.0) / 2.0, min=0.0, max=1.0)
if not grid:
for x_sample in x_samples:
x_sample = 255. * rearrange(x_sample.cpu().numpy(), 'c h w -> h w c')
filename = os.path.join(outdir, f"{base_count:05}.png")
Image.fromarray(x_sample.astype(np.uint8)).save(filename)
images.append([filename,seed])
base_count += 1
else:
all_samples.append(x_samples)
seeds.append(seed)
seed = self._new_seed()
if grid:
images = self._make_grid(samples=all_samples,
seeds=seeds,
batch_size=batch_size,
iterations=iterations,
outdir=outdir)
toc = time.time()
print(f'{batch_size * iterations} images generated in',"%4.2fs"% (toc-tic))
return images
def _make_grid(self,samples,seeds,batch_size,iterations,outdir):
images = list()
base_count = len(os.listdir(outdir))-1
n_rows = batch_size if batch_size>1 else int(math.sqrt(batch_size * iterations))
# save as grid
grid = torch.stack(samples, 0)
grid = rearrange(grid, 'n b c h w -> (n b) c h w')
grid = make_grid(grid, nrow=n_rows)
# to image
grid = 255. * rearrange(grid, 'c h w -> h w c').cpu().numpy()
filename = os.path.join(outdir, f"{base_count:05}.png")
Image.fromarray(grid.astype(np.uint8)).save(filename)
for s in seeds:
images.append([filename,s])
return images
def _new_seed(self):
self.seed = random.randrange(0,np.iinfo(np.uint32).max)
return self.seed
def load_model(self):
""" Load and initialize the model from configuration variables passed at object creation time """
if self.model is None:
seed_everything(self.seed)
try:
config = OmegaConf.load(self.config)
self.device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model = self._load_model_from_config(config,self.weights)
self.model = model.to(self.device)
except AttributeError:
raise SystemExit
if self.sampler_name=='plms':
print("setting sampler to plms")
self.sampler = PLMSSampler(self.model)
elif self.sampler_name == 'ddim':
print("setting sampler to ddim")
self.sampler = DDIMSampler(self.model)
elif self.sampler_name == 'klms':
print("setting sampler to klms")
self.sampler = KSampler(self.model,'lms')
else:
print(f"unsupported sampler {self.sampler_name}, defaulting to plms")
self.sampler = PLMSSampler(self.model)
return self.model
def _load_model_from_config(self, config, ckpt):
print(f"Loading model from {ckpt}")
pl_sd = torch.load(ckpt, map_location="cpu")
if "global_step" in pl_sd:
print(f"Global Step: {pl_sd['global_step']}")
sd = pl_sd["state_dict"]
model = instantiate_from_config(config.model)
m, u = model.load_state_dict(sd, strict=False)
model.cuda()
model.eval()
if self.full_precision:
print('Using slower but more accurate full-precision math (--full_precision)')
else:
print('Using half precision math. Call with --full_precision to use full precision')
model.half()
return model
def _load_img(self,path):
image = Image.open(path).convert("RGB")
w, h = image.size
print(f"loaded input image of size ({w}, {h}) from {path}")
w, h = map(lambda x: x - x % 32, (w, h)) # resize to integer multiple of 32
image = image.resize((w, h), resample=Image.Resampling.LANCZOS)
image = np.array(image).astype(np.float32) / 255.0
image = image[None].transpose(0, 3, 1, 2)
image = torch.from_numpy(image)
return 2.*image - 1.
class T2I(Generate):
def __init__(self,**kwargs):
print(f'>> The ldm.simplet2i module is deprecated. Use ldm.generate instead. It is a drop-in replacement.')
super().__init__(kwargs)

View File

@@ -12,22 +12,26 @@ from queue import Queue
from inspect import isfunction
from PIL import Image, ImageDraw, ImageFont
def log_txt_as_img(wh, xc, size=10):
# wh a tuple of (width, height)
# xc a list of captions to plot
b = len(xc)
txts = list()
for bi in range(b):
txt = Image.new("RGB", wh, color="white")
txt = Image.new('RGB', wh, color='white')
draw = ImageDraw.Draw(txt)
font = ImageFont.truetype('data/DejaVuSans.ttf', size=size)
font = ImageFont.load_default()
nc = int(40 * (wh[0] / 256))
lines = "\n".join(xc[bi][start:start + nc] for start in range(0, len(xc[bi]), nc))
lines = '\n'.join(
xc[bi][start : start + nc] for start in range(0, len(xc[bi]), nc)
)
try:
draw.text((0, 0), lines, fill="black", font=font)
draw.text((0, 0), lines, fill='black', font=font)
except UnicodeEncodeError:
print("Cant encode string for logging. Skipping.")
print('Cant encode string for logging. Skipping.')
txt = np.array(txt).transpose(2, 0, 1) / 127.5 - 1.0
txts.append(txt)
@@ -69,22 +73,26 @@ def mean_flat(tensor):
def count_params(model, verbose=False):
total_params = sum(p.numel() for p in model.parameters())
if verbose:
print(f"{model.__class__.__name__} has {total_params * 1.e-6:.2f} M params.")
print(
f'{model.__class__.__name__} has {total_params * 1.e-6:.2f} M params.'
)
return total_params
def instantiate_from_config(config):
if not "target" in config:
def instantiate_from_config(config, **kwargs):
if not 'target' in config:
if config == '__is_first_stage__':
return None
elif config == "__is_unconditional__":
elif config == '__is_unconditional__':
return None
raise KeyError("Expected key `target` to instantiate.")
return get_obj_from_str(config["target"])(**config.get("params", dict()))
raise KeyError('Expected key `target` to instantiate.')
return get_obj_from_str(config['target'])(
**config.get('params', dict()), **kwargs
)
def get_obj_from_str(string, reload=False):
module, cls = string.rsplit(".", 1)
module, cls = string.rsplit('.', 1)
if reload:
module_imp = importlib.import_module(module)
importlib.reload(module_imp)
@@ -100,31 +108,36 @@ def _do_parallel_data_prefetch(func, Q, data, idx, idx_to_fn=False):
else:
res = func(data)
Q.put([idx, res])
Q.put("Done")
Q.put('Done')
def parallel_data_prefetch(
func: callable, data, n_proc, target_data_type="ndarray", cpu_intensive=True, use_worker_id=False
func: callable,
data,
n_proc,
target_data_type='ndarray',
cpu_intensive=True,
use_worker_id=False,
):
# if target_data_type not in ["ndarray", "list"]:
# raise ValueError(
# "Data, which is passed to parallel_data_prefetch has to be either of type list or ndarray."
# )
if isinstance(data, np.ndarray) and target_data_type == "list":
raise ValueError("list expected but function got ndarray.")
if isinstance(data, np.ndarray) and target_data_type == 'list':
raise ValueError('list expected but function got ndarray.')
elif isinstance(data, abc.Iterable):
if isinstance(data, dict):
print(
f'WARNING:"data" argument passed to parallel_data_prefetch is a dict: Using only its values and disregarding keys.'
)
data = list(data.values())
if target_data_type == "ndarray":
if target_data_type == 'ndarray':
data = np.asarray(data)
else:
data = list(data)
else:
raise TypeError(
f"The data, that shall be processed parallel has to be either an np.ndarray or an Iterable, but is actually {type(data)}."
f'The data, that shall be processed parallel has to be either an np.ndarray or an Iterable, but is actually {type(data)}.'
)
if cpu_intensive:
@@ -134,7 +147,7 @@ def parallel_data_prefetch(
Q = Queue(1000)
proc = Thread
# spawn processes
if target_data_type == "ndarray":
if target_data_type == 'ndarray':
arguments = [
[func, Q, part, i, use_worker_id]
for i, part in enumerate(np.array_split(data, n_proc))
@@ -148,7 +161,7 @@ def parallel_data_prefetch(
arguments = [
[func, Q, part, i, use_worker_id]
for i, part in enumerate(
[data[i: i + step] for i in range(0, len(data), step)]
[data[i : i + step] for i in range(0, len(data), step)]
)
]
processes = []
@@ -157,7 +170,7 @@ def parallel_data_prefetch(
processes += [p]
# start processes
print(f"Start prefetching...")
print(f'Start prefetching...')
import time
start = time.time()
@@ -170,13 +183,13 @@ def parallel_data_prefetch(
while k < n_proc:
# get result
res = Q.get()
if res == "Done":
if res == 'Done':
k += 1
else:
gather_res[res[0]] = res[1]
except Exception as e:
print("Exception: ", e)
print('Exception: ', e)
for p in processes:
p.terminate()
@@ -184,7 +197,7 @@ def parallel_data_prefetch(
finally:
for p in processes:
p.join()
print(f"Prefetching complete. [{time.time() - start} sec.]")
print(f'Prefetching complete. [{time.time() - start} sec.]')
if target_data_type == 'ndarray':
if not isinstance(gather_res[0], np.ndarray):

723
main.py

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,265 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Easy-peasy Windows install"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that you will need NVIDIA drivers, Python 3.10, and Git installed\n",
"beforehand - simplified\n",
"[step-by-step instructions](https://github.com/lstein/stable-diffusion/wiki/Easy-peasy-Windows-install)\n",
"are available in the wiki (you'll only need steps 1, 2, & 3 )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Run each cell in turn. In VSCode, either hit SHIFT-ENTER, or click on the little ▶️ to the left of the cell. In Jupyter/JupyterLab, you **must** hit SHIFT-ENTER"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install pew"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%cmd\n",
"git clone https://github.com/lstein/stable-diffusion.git"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%cd stable-diffusion"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile requirements.txt\n",
"albumentations==0.4.3\n",
"einops==0.3.0\n",
"huggingface-hub==0.8.1\n",
"imageio-ffmpeg==0.4.2\n",
"imageio==2.9.0\n",
"kornia==0.6.0\n",
"# pip will resolve the version which matches torch\n",
"numpy\n",
"omegaconf==2.1.1\n",
"opencv-python==4.6.0.66\n",
"pillow==9.2.0\n",
"pip>=22\n",
"pudb==2019.2\n",
"pytorch-lightning==1.4.2\n",
"streamlit==1.12.0\n",
"# \"CompVis/taming-transformers\" doesn't work\n",
"# ldm\\models\\autoencoder.py\", line 6, in <module>\n",
"# from taming.modules.vqvae.quantize import VectorQuantizer2 as VectorQuantizer\n",
"# ModuleNotFoundError\n",
"taming-transformers-rom1504==0.0.6\n",
"test-tube>=0.7.5\n",
"torch-fidelity==0.3.0\n",
"torchmetrics==0.6.0\n",
"transformers==4.19.2\n",
"git+https://github.com/openai/CLIP.git@main#egg=clip\n",
"git+https://github.com/lstein/k-diffusion.git@master#egg=k-diffusion\n",
"# No CUDA in PyPi builds\n",
"--extra-index-url https://download.pytorch.org/whl/cu113 --trusted-host https://download.pytorch.org\n",
"torch==1.11.0\n",
"# Same as numpy - let pip do its thing\n",
"torchvision\n",
"-e .\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%cmd\n",
"pew new --python 3.10 -r requirements.txt --dont-activate ldm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Switch the notebook kernel to the new 'ldm' environment!\n",
"\n",
"## VSCode: restart VSCode and come back to this cell\n",
"\n",
"1. Ctrl+Shift+P\n",
"1. Type \"Select Interpreter\" and select \"Jupyter: Select Interpreter to Start Jupyter Server\"\n",
"1. VSCode will say that it needs to install packages. Click the \"Install\" button.\n",
"1. Once the install is finished, do 1 & 2 again\n",
"1. Pick 'ldm'\n",
"1. Run the following cell"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%cd stable-diffusion"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## Jupyter/JupyterLab\n",
"\n",
"1. Run the cell below\n",
"1. Click on the toolbar where it says \"(ipyknel)\" ↗️. You should get a pop-up asking you to \"Select Kernel\". Pick 'ldm' from the drop-down.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### DO NOT RUN THE FOLLOWING CELL IF YOU ARE USING VSCODE!!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# DO NOT RUN THIS CELL IF YOU ARE USING VSCODE!!\n",
"%%cmd\n",
"pew workon ldm\n",
"pip3 install ipykernel\n",
"python -m ipykernel install --name=ldm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### When running the next cell, Jupyter/JupyterLab users might get a warning saying \"IProgress not found\". This can be ignored."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%run \"scripts/preload_models.py\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%cmd\n",
"mkdir \"models/ldm/stable-diffusion-v1\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Now copy the SD model you downloaded from Hugging Face into the above new directory, and (if necessary) rename it to 'model.ckpt'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Now go create some magic!\n",
"\n",
"VSCode\n",
"\n",
"- The actual input box for the 'dream' prompt will appear at the very top of the VSCode window. Type in your commands and hit 'ENTER'.\n",
"- To quit, hit the 'Interrupt' button in the toolbar up there ⬆️ a couple of times, then hit ENTER (you'll probably see a terrifying traceback from Python - just ignore it).\n",
"\n",
"Jupyter/JupyterLab\n",
"\n",
"- The input box for the 'dream' prompt will appear below. Type in your commands and hit 'ENTER'.\n",
"- To quit, hit the interrupt button (⏹️) in the toolbar up there ⬆️ a couple of times, then hit ENTER (you'll probably see a terrifying traceback from Python - just ignore it)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%run \"scripts/dream.py\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Once this seems to be working well, you can try opening a terminal\n",
"\n",
"- VSCode: type ('CTRL+`')\n",
"- Jupyter/JupyterLab: File|New Terminal\n",
"- Or jump out of the notebook entirely, and open Powershell/Command Prompt\n",
"\n",
"Now:\n",
"\n",
"1. `cd` to wherever the 'stable-diffusion' directory is\n",
"1. Run `pew workon ldm`\n",
"1. Run `winpty python scripts\\dream.py`"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.10.6 ('ldm')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
},
"vscode": {
"interpreter": {
"hash": "a05e4574567b7bc2c98f7f9aa579f9ea5b8739b54844ab610ac85881c4be2659"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -0,0 +1,281 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"collapsed_sections": [],
"private_outputs": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU",
"gpuClass": "standard"
},
"cells": [
{
"cell_type": "markdown",
"source": [
"# Stable Diffusion AI Notebook (Release 1.13)\n",
"\n",
"<img src=\"https://user-images.githubusercontent.com/60411196/186547976-d9de378a-9de8-4201-9c25-c057a9c59bad.jpeg\" alt=\"stable-diffusion-ai\" width=\"170px\"/> <br>\n",
"#### Instructions:\n",
"1. Execute each cell in order to mount a Dream bot and create images from text. <br>\n",
"2. Once cells 1-8 were run correctly you'll be executing a terminal in cell #9, you'll need to enter `python scripts/dream.py` command to run Dream bot.<br> \n",
"3. After launching dream bot, you'll see: <br> `Dream > ` in terminal. <br> Insert a command, eg. `Dream > Astronaut floating in a distant galaxy`, or type `-h` for help.\n",
"3. After completion you'll see your generated images in path `stable-diffusion/outputs/img-samples/`, you can also show last generated images in cell #10.\n",
"4. To quit Dream bot use `q` command. <br> \n",
"---\n",
"<font color=\"red\">Note:</font> It takes some time to load, but after installing all dependencies you can use the bot all time you want while colab instance is up. <br>\n",
"<font color=\"red\">Requirements:</font> For this notebook to work you need to have [Stable-Diffusion-v-1-4](https://huggingface.co/CompVis/stable-diffusion-v-1-4-original) stored in your Google Drive, it will be needed in cell #7\n",
"##### For more details visit Github repository: [lstein/stable-diffusion](https://github.com/lstein/stable-diffusion)\n",
"---\n"
],
"metadata": {
"id": "ycYWcsEKc6w7"
}
},
{
"cell_type": "markdown",
"source": [
"## ◢ Installation"
],
"metadata": {
"id": "dr32VLxlnouf"
}
},
{
"cell_type": "code",
"source": [
"#@title 1. Check current GPU assigned\n",
"!nvidia-smi -L\n",
"!nvidia-smi"
],
"metadata": {
"cellView": "form",
"id": "a2Z5Qu_o8VtQ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "vbI9ZsQHzjqF"
},
"outputs": [],
"source": [
"#@title 2. Download stable-diffusion Repository\n",
"from os.path import exists\n",
"\n",
"if exists(\"/content/stable-diffusion/\")==True:\n",
" %cd /content/stable-diffusion/\n",
" print(\"Already downloaded repo\")\n",
"else:\n",
" !git clone --quiet https://github.com/lstein/stable-diffusion.git # Original repo\n",
" %cd /content/stable-diffusion/\n",
" !git checkout --quiet tags/release-1.13"
]
},
{
"cell_type": "code",
"source": [
"#@title 3. Install dependencies\n",
"import gc\n",
"\n",
"if exists(\"/content/stable-diffusion/requirements-colab.txt\")==True:\n",
" %cd /content/stable-diffusion/\n",
" print(\"Already downloaded requirements file\")\n",
"else:\n",
" !wget https://raw.githubusercontent.com/lstein/stable-diffusion/development/requirements-colab.txt\n",
"!pip install colab-xterm\n",
"!pip install -r requirements-colab.txt\n",
"gc.collect()"
],
"metadata": {
"cellView": "form",
"id": "QbXcGXYEFSNB"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"#@title 4. Load small ML models required\n",
"%cd /content/stable-diffusion/\n",
"!python scripts/preload_models.py\n",
"gc.collect()"
],
"metadata": {
"cellView": "form",
"id": "ChIDWxLVHGGJ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"#@title 5. Restart Runtime\n",
"exit()"
],
"metadata": {
"cellView": "form",
"id": "8rSMhgnAttQa"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## ◢ Configuration"
],
"metadata": {
"id": "795x1tMoo8b1"
}
},
{
"cell_type": "code",
"source": [
"#@title 6. Mount google Drive\n",
"from google.colab import drive\n",
"drive.mount('/content/drive')"
],
"metadata": {
"cellView": "form",
"id": "YEWPV-sF1RDM"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"#@title 7. Drive Path to model\n",
"#@markdown Path should start with /content/drive/path-to-your-file <br>\n",
"#@markdown <font color=\"red\">Note:</font> Model should be downloaded from https://huggingface.co <br>\n",
"#@markdown Lastest release: [Stable-Diffusion-v-1-4](https://huggingface.co/CompVis/stable-diffusion-v-1-4-original)\n",
"from os.path import exists\n",
"\n",
"model_path = \"\" #@param {type:\"string\"}\n",
"if exists(model_path)==True:\n",
" print(\"✅ Valid directory\")\n",
"else: \n",
" print(\"❌ File doesn't exist\")"
],
"metadata": {
"cellView": "form",
"id": "zRTJeZ461WGu"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"#@title 8. Symlink to model\n",
"\n",
"from os.path import exists\n",
"import os \n",
"\n",
"# Folder creation if it doesn't exist\n",
"if exists(\"/content/stable-diffusion/models/ldm/stable-diffusion-v1\")==True:\n",
" print(\"❗ Dir stable-diffusion-v1 already exists\")\n",
"else:\n",
" %mkdir /content/stable-diffusion/models/ldm/stable-diffusion-v1\n",
" print(\"✅ Dir stable-diffusion-v1 created\")\n",
"\n",
"# Symbolic link if it doesn't exist\n",
"if exists(\"/content/stable-diffusion/models/ldm/stable-diffusion-v1/model.ckpt\")==True:\n",
" print(\"❗ Symlink already created\")\n",
"else: \n",
" src = model_path\n",
" dst = '/content/stable-diffusion/models/ldm/stable-diffusion-v1/model.ckpt'\n",
" os.symlink(src, dst) \n",
" print(\"✅ Symbolic link created successfully\")"
],
"metadata": {
"id": "UY-NNz4I8_aG",
"cellView": "form"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## ◢ Execution"
],
"metadata": {
"id": "Mc28N0_NrCQH"
}
},
{
"cell_type": "code",
"source": [
"#@title 9. Run Terminal and Execute Dream bot\n",
"#@markdown <font color=\"blue\">Steps:</font> <br>\n",
"#@markdown 1. Execute command `python scripts/dream.py` to run dream bot.<br>\n",
"#@markdown 2. After initialized you'll see `Dream>` line.<br>\n",
"#@markdown 3. Example text: `Astronaut floating in a distant galaxy` <br>\n",
"#@markdown 4. To quit Dream bot use: `q` command.<br>\n",
"\n",
"import gc\n",
"%cd /content/stable-diffusion/\n",
"%load_ext colabxterm\n",
"%xterm\n",
"gc.collect()"
],
"metadata": {
"id": "ir4hCrMIuUpl",
"cellView": "form"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"#@title 10. Show the last 15 generated images\n",
"import gc\n",
"import glob\n",
"import matplotlib.pyplot as plt\n",
"import matplotlib.image as mpimg\n",
"%matplotlib inline\n",
"\n",
"images = []\n",
"for img_path in sorted(glob.glob('/content/stable-diffusion/outputs/img-samples/*.png'), reverse=True):\n",
" images.append(mpimg.imread(img_path))\n",
"\n",
"images = images[:15] \n",
"\n",
"plt.figure(figsize=(20,10))\n",
"\n",
"columns = 5\n",
"for i, image in enumerate(images):\n",
" ax = plt.subplot(len(images) / columns + 1, columns, i + 1)\n",
" ax.axes.xaxis.set_visible(False)\n",
" ax.axes.yaxis.set_visible(False)\n",
" ax.axis('off')\n",
" plt.imshow(image)\n",
" gc.collect()\n",
"\n"
],
"metadata": {
"cellView": "form",
"id": "qnLohSHmKoGk"
},
"execution_count": null,
"outputs": []
}
]
}

View File

@@ -14,7 +14,7 @@ from ldm.models.diffusion.ddim import DDIMSampler
from ldm.util import ismap
import time
from omegaconf import OmegaConf
from ldm.dream.devices import choose_torch_device
def download_models(mode):
@@ -117,7 +117,8 @@ def get_cond(mode, selected_path):
c = rearrange(c, '1 c h w -> 1 h w c')
c = 2. * c - 1.
c = c.to(torch.device("cuda"))
device = choose_torch_device()
c = c.to(device)
example["LR_image"] = c
example["image"] = c_up
@@ -267,4 +268,4 @@ def make_convolutional_sample(batch, model, mode="vanilla", custom_steps=None, e
log["sample"] = x_sample
log["time"] = t1 - t0
return log
return log

26
requirements-colab.txt Normal file
View File

@@ -0,0 +1,26 @@
albumentations==0.4.3
clean-fid==0.1.29
einops==0.3.0
huggingface-hub==0.8.1
imageio-ffmpeg==0.4.2
imageio==2.9.0
kornia==0.6.0
numpy==1.21.6
omegaconf==2.1.1
opencv-python==4.6.0.66
pillow==9.2.0
pip>=22
pudb==2019.2
pytorch-lightning==1.4.2
streamlit==1.12.0
taming-transformers-rom1504==0.0.6
test-tube>=0.7.5
torch-fidelity==0.3.0
torchmetrics==0.6.0
torchtext==0.6.0
transformers==4.19.2
torch==1.12.1+cu113
torchvision==0.13.1+cu113
git+https://github.com/openai/CLIP.git@main#egg=clip
git+https://github.com/lstein/k-diffusion.git@master#egg=k-diffusion
-e .

33
requirements-lin.txt Normal file
View File

@@ -0,0 +1,33 @@
albumentations==0.4.3
einops==0.3.0
huggingface-hub==0.8.1
imageio-ffmpeg==0.4.2
imageio==2.9.0
kornia==0.6.0
# pip will resolve the version which matches torch
numpy
omegaconf==2.1.1
opencv-python==4.6.0.66
pillow==9.2.0
pip>=22
pudb==2019.2
pytorch-lightning==1.4.2
streamlit==1.12.0
# "CompVis/taming-transformers" doesn't work
# ldm\models\autoencoder.py", line 6, in <module>
# from taming.modules.vqvae.quantize import VectorQuantizer2 as VectorQuantizer
# ModuleNotFoundError
taming-transformers-rom1504==0.0.6
test-tube>=0.7.5
torch-fidelity==0.3.0
torchmetrics==0.6.0
transformers==4.19.2
git+https://github.com/openai/CLIP.git@main#egg=clip
git+https://github.com/lstein/k-diffusion.git@master#egg=k-diffusion
git+https://github.com/lstein/GFPGAN@fix-dark-cast-images#egg=gfpgan
# No CUDA in PyPi builds
--extra-index-url https://download.pytorch.org/whl/cu113 --trusted-host https://download.pytorch.org
torch==1.11.0
# Same as numpy - let pip do its thing
torchvision
-e .

24
requirements-mac.txt Normal file
View File

@@ -0,0 +1,24 @@
albumentations==0.4.3
einops==0.3.0
huggingface-hub==0.8.1
imageio==2.9.0
imageio-ffmpeg==0.4.2
kornia==0.6.0
numpy==1.23.1
--pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
omegaconf==2.1.1
opencv-python==4.6.0.66
pillow==9.2.0
pudb==2019.2
torch==1.12.1
torchvision==0.13.0
pytorch-lightning==1.4.2
streamlit==1.12.0
test-tube>=0.7.5
torch-fidelity==0.3.0
torchmetrics==0.6.0
transformers==4.19.2
-e git+https://github.com/openai/CLIP.git@main#egg=clip
-e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
-e git+https://github.com/Birch-san/k-diffusion.git@mps#egg=k-diffusion
-e git+https://github.com/lstein/GFPGAN@fix-dark-cast-images#egg=gfpgan

33
requirements-win.txt Normal file
View File

@@ -0,0 +1,33 @@
albumentations==0.4.3
einops==0.3.0
huggingface-hub==0.8.1
imageio-ffmpeg==0.4.2
imageio==2.9.0
kornia==0.6.0
# pip will resolve the version which matches torch
numpy
omegaconf==2.1.1
opencv-python==4.6.0.66
pillow==9.2.0
pip>=22
pudb==2019.2
pytorch-lightning==1.4.2
streamlit==1.12.0
# "CompVis/taming-transformers" doesn't work
# ldm\models\autoencoder.py", line 6, in <module>
# from taming.modules.vqvae.quantize import VectorQuantizer2 as VectorQuantizer
# ModuleNotFoundError
taming-transformers-rom1504==0.0.6
test-tube>=0.7.5
torch-fidelity==0.3.0
torchmetrics==0.6.0
transformers==4.19.2
git+https://github.com/openai/CLIP.git@main#egg=clip
git+https://github.com/lstein/k-diffusion.git@master#egg=k-diffusion
git+https://github.com/lstein/GFPGAN@fix-dark-cast-images#egg=gfpgan
# No CUDA in PyPi builds
--extra-index-url https://download.pytorch.org/whl/cu113 --trusted-host https://download.pytorch.org
torch==1.11.0
# Same as numpy - let pip do its thing
torchvision
-e .

View File

@@ -1,284 +1,685 @@
#!/usr/bin/env python3
# Copyright (c) 2022 Lincoln D. Stein (https://github.com/lstein)
import argparse
import shlex
import atexit
import os
import re
import sys
import copy
import warnings
import time
import ldm.dream.readline
from ldm.dream.pngwriter import PngWriter, PromptFormatter
from ldm.dream.server import DreamServer, ThreadingDreamServer
from ldm.dream.image_util import make_grid
from omegaconf import OmegaConf
# readline unavailable on windows systems
try:
import readline
readline_available = True
except:
readline_available = False
# Placeholder to be replaced with proper class that tracks the
# outputs and associates with the prompt that generated them.
# Just want to get the formatting look right for now.
output_cntr = 0
debugging = False
def main():
''' Initialize command-line parsers and the diffusion model '''
"""Initialize command-line parsers and the diffusion model"""
arg_parser = create_argv_parser()
opt = arg_parser.parse_args()
opt = arg_parser.parse_args()
if opt.laion400m:
# defaults suitable to the older latent diffusion weights
width = 256
height = 256
config = "configs/latent-diffusion/txt2img-1p4B-eval.yaml"
weights = "models/ldm/text2img-large/model.ckpt"
else:
# some defaults suitable for stable diffusion weights
width = 512
height = 512
config = "configs/stable-diffusion/v1-inference.yaml"
weights = "models/ldm/stable-diffusion-v1/model.ckpt"
print('--laion400m flag has been deprecated. Please use --model laion400m instead.')
sys.exit(-1)
if opt.weights != 'model':
print('--weights argument has been deprecated. Please configure ./configs/models.yaml, and call it using --model instead.')
sys.exit(-1)
# command line history will be stored in a file called "~/.dream_history"
if readline_available:
setup_readline()
try:
models = OmegaConf.load(opt.config)
width = models[opt.model].width
height = models[opt.model].height
config = models[opt.model].config
weights = models[opt.model].weights
except (FileNotFoundError, IOError, KeyError) as e:
print(f'{e}. Aborting.')
sys.exit(-1)
print("* Initializing, be patient...\n")
print('* Initializing, be patient...\n')
sys.path.append('.')
from pytorch_lightning import logging
from ldm.simplet2i import T2I
from ldm.generate import Generate
# these two lines prevent a horrible warning message from appearing
# when the frozen CLIP tokenizer is imported
import transformers
transformers.logging.set_verbosity_error()
# creating a simple text2image object with a handful of
# defaults passed on the command line.
# additional parameters will be added (or overriden) during
# the user input loop
t2i = T2I(width=width,
height=height,
batch_size=opt.batch_size,
outdir=opt.outdir,
sampler=opt.sampler,
weights=weights,
full_precision=opt.full_precision,
config=config)
t2i = Generate(
width=width,
height=height,
sampler_name=opt.sampler_name,
weights=weights,
full_precision=opt.full_precision,
config=config,
grid=opt.grid,
# this is solely for recreating the prompt
seamless=opt.seamless,
embedding_path=opt.embedding_path,
device_type=opt.device,
ignore_ctrl_c=opt.infile is None,
)
# make sure the output directory exists
if not os.path.exists(opt.outdir):
os.makedirs(opt.outdir)
# gets rid of annoying messages about random seed
logging.getLogger("pytorch_lightning").setLevel(logging.ERROR)
logging.getLogger('pytorch_lightning').setLevel(logging.ERROR)
# load the infile as a list of lines
infile = None
if opt.infile:
try:
if os.path.isfile(opt.infile):
infile = open(opt.infile, 'r', encoding='utf-8')
elif opt.infile == '-': # stdin
infile = sys.stdin
else:
raise FileNotFoundError(f'{opt.infile} not found.')
except (FileNotFoundError, IOError) as e:
print(f'{e}. Aborting.')
sys.exit(-1)
if opt.seamless:
print(">> changed to seamless tiling mode")
# preload the model
if not debugging:
t2i.load_model()
print("\n* Initialization done! Awaiting your command (-h for help, q to quit)...")
t2i.load_model()
log_path = os.path.join(opt.outdir,"dream_log.txt")
with open(log_path,'a') as log:
cmd_parser = create_cmd_parser()
main_loop(t2i,cmd_parser,log)
log.close()
if not infile:
print(
"\n* Initialization done! Awaiting your command (-h for help, 'q' to quit)"
)
cmd_parser = create_cmd_parser()
if opt.web:
dream_server_loop(t2i, opt.host, opt.port, opt.outdir)
else:
main_loop(t2i, opt.outdir, opt.prompt_as_dir, cmd_parser, infile)
def main_loop(t2i,parser,log):
''' prompt/read/execute loop '''
def main_loop(t2i, outdir, prompt_as_dir, parser, infile):
"""prompt/read/execute loop"""
done = False
path_filter = re.compile(r'[<>:"/\\|?*]')
last_results = list()
# os.pathconf is not available on Windows
if hasattr(os, 'pathconf'):
path_max = os.pathconf(outdir, 'PC_PATH_MAX')
name_max = os.pathconf(outdir, 'PC_NAME_MAX')
else:
path_max = 260
name_max = 255
while not done:
try:
command = input("dream> ")
command = get_next_command(infile)
except EOFError:
done = True
continue
except KeyboardInterrupt:
done = True
continue
# skip empty lines
if not command.strip():
continue
if command.startswith(('#', '//')):
continue
# before splitting, escape single quotes so as not to mess
# up the parser
command = command.replace("'", "\\'")
try:
elements = shlex.split(command)
except ValueError as e:
print(str(e))
continue
if elements[0] == 'q':
done = True
break
elements = shlex.split(command)
if elements[0]=='q': #
done = True
break
if elements[0].startswith('!dream'): # in case a stored prompt still contains the !dream command
if elements[0].startswith(
'!dream'
): # in case a stored prompt still contains the !dream command
elements.pop(0)
# rearrange the arguments to mimic how it works in the Dream bot.
switches = ['']
switches_started = False
for el in elements:
if el[0]=='-' and not switches_started:
if el[0] == '-' and not switches_started:
switches_started = True
if switches_started:
switches.append(el)
else:
switches[0] += el
switches[0] += ' '
switches[0] = switches[0][:len(switches[0])-1]
switches[0] = switches[0][: len(switches[0]) - 1]
try:
opt = parser.parse_args(switches)
opt = parser.parse_args(switches)
except SystemExit:
parser.print_help()
continue
if len(opt.prompt)==0:
print("Try again with a prompt!")
if len(opt.prompt) == 0:
print('Try again with a prompt!')
continue
# retrieve previous value!
if opt.init_img is not None and re.match('^-\\d+$', opt.init_img):
try:
opt.init_img = last_results[int(opt.init_img)][0]
print(f'>> Reusing previous image {opt.init_img}')
except IndexError:
print(
f'>> No previous initial image at position {opt.init_img} found')
opt.init_img = None
continue
try:
if opt.init_img is None:
results = t2i.txt2img(**vars(opt))
if opt.seed is not None and opt.seed < 0: # retrieve previous value!
try:
opt.seed = last_results[opt.seed][1]
print(f'>> Reusing previous seed {opt.seed}')
except IndexError:
print(f'>> No previous seed at position {opt.seed} found')
opt.seed = None
continue
do_grid = opt.grid or t2i.grid
if opt.with_variations is not None:
# shotgun parsing, woo
parts = []
broken = False # python doesn't have labeled loops...
for part in opt.with_variations.split(','):
seed_and_weight = part.split(':')
if len(seed_and_weight) != 2:
print(f'could not parse with_variation part "{part}"')
broken = True
break
try:
seed = int(seed_and_weight[0])
weight = float(seed_and_weight[1])
except ValueError:
print(f'could not parse with_variation part "{part}"')
broken = True
break
parts.append([seed, weight])
if broken:
continue
if len(parts) > 0:
opt.with_variations = parts
else:
results = t2i.img2img(**vars(opt))
print("Outputs:")
write_log_message(opt,switches,results,log)
except KeyboardInterrupt:
print('*interrupted*')
opt.with_variations = None
if opt.outdir:
if not os.path.exists(opt.outdir):
os.makedirs(opt.outdir)
current_outdir = opt.outdir
elif prompt_as_dir:
# sanitize the prompt to a valid folder name
subdir = path_filter.sub('_', opt.prompt)[:name_max].rstrip(' .')
# truncate path to maximum allowed length
# 27 is the length of '######.##########.##.png', plus two separators and a NUL
subdir = subdir[:(path_max - 27 - len(os.path.abspath(outdir)))]
current_outdir = os.path.join(outdir, subdir)
print('Writing files to directory: "' + current_outdir + '"')
# make sure the output directory exists
if not os.path.exists(current_outdir):
os.makedirs(current_outdir)
else:
current_outdir = outdir
# Here is where the images are actually generated!
last_results = []
try:
file_writer = PngWriter(current_outdir)
prefix = file_writer.unique_prefix()
results = [] # list of filename, prompt pairs
grid_images = dict() # seed -> Image, only used if `do_grid`
def image_writer(image, seed, upscaled=False):
path = None
if do_grid:
grid_images[seed] = image
else:
if upscaled and opt.save_original:
filename = f'{prefix}.{seed}.postprocessed.png'
else:
filename = f'{prefix}.{seed}.png'
if opt.variation_amount > 0:
iter_opt = argparse.Namespace(**vars(opt)) # copy
this_variation = [[seed, opt.variation_amount]]
if opt.with_variations is None:
iter_opt.with_variations = this_variation
else:
iter_opt.with_variations = opt.with_variations + this_variation
iter_opt.variation_amount = 0
normalized_prompt = PromptFormatter(
t2i, iter_opt).normalize_prompt()
metadata_prompt = f'{normalized_prompt} -S{iter_opt.seed}'
elif opt.with_variations is not None:
normalized_prompt = PromptFormatter(
t2i, opt).normalize_prompt()
# use the original seed - the per-iteration value is the last variation-seed
metadata_prompt = f'{normalized_prompt} -S{opt.seed}'
else:
normalized_prompt = PromptFormatter(
t2i, opt).normalize_prompt()
metadata_prompt = f'{normalized_prompt} -S{seed}'
path = file_writer.save_image_and_prompt_to_png(
image, metadata_prompt, filename)
if (not upscaled) or opt.save_original:
# only append to results if we didn't overwrite an earlier output
results.append([path, metadata_prompt])
last_results.append([path, seed])
t2i.prompt2image(image_callback=image_writer, **vars(opt))
if do_grid and len(grid_images) > 0:
grid_img = make_grid(list(grid_images.values()))
grid_seeds = list(grid_images.keys())
first_seed = last_results[0][1]
filename = f'{prefix}.{first_seed}.png'
# TODO better metadata for grid images
normalized_prompt = PromptFormatter(
t2i, opt).normalize_prompt()
metadata_prompt = f'{normalized_prompt} -S{first_seed} --grid -n{len(grid_images)} # {grid_seeds}'
path = file_writer.save_image_and_prompt_to_png(
grid_img, metadata_prompt, filename
)
results = [[path, metadata_prompt]]
except AssertionError as e:
print(e)
continue
print("goodbye!")
except OSError as e:
print(e)
continue
print('Outputs:')
log_path = os.path.join(current_outdir, 'dream_log.txt')
write_log_message(results, log_path)
print()
print('goodbye!')
def write_log_message(opt,switches,results,logfile):
''' logs the name of the output image, its prompt and seed to both the terminal and the log file '''
if opt.grid:
_output_for_grid(switches,results,logfile)
def get_next_command(infile=None) -> str: # command string
if infile is None:
command = input('dream> ')
else:
_output_for_individual(switches,results,logfile)
command = infile.readline()
if not command:
raise EOFError
else:
command = command.strip()
print(f'#{command}')
return command
def _output_for_individual(switches,results,logfile):
for r in results:
log_message = " ".join([' ',str(r[0])+':',
f'"{switches[0]}"',
*switches[1:],f'-S {r[1]}'])
print(log_message)
logfile.write(log_message+"\n")
logfile.flush()
def _output_for_grid(switches,results,logfile):
first_seed = results[0][1]
log_message = " ".join([' ',str(results[0][0])+':',
f'"{switches[0]}"',
*switches[1:],f'-S {results[0][1]}'])
print(log_message)
logfile.write(log_message+"\n")
all_seeds = [row[1] for row in results]
log_message = f' seeds for individual rows: {all_seeds}'
print(log_message)
logfile.write(log_message+"\n")
def dream_server_loop(t2i, host, port, outdir):
print('\n* --web was specified, starting web server...')
# Change working directory to the stable-diffusion directory
os.chdir(
os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))
)
# Start server
DreamServer.model = t2i
DreamServer.outdir = outdir
dream_server = ThreadingDreamServer((host, port))
print(">> Started Stable Diffusion dream server!")
if host == '0.0.0.0':
print(
f"Point your browser at http://localhost:{port} or use the host's DNS name or IP address.")
else:
print(">> Default host address now 127.0.0.1 (localhost). Use --host 0.0.0.0 to bind any address.")
print(f">> Point your browser at http://{host}:{port}.")
try:
dream_server.serve_forever()
except KeyboardInterrupt:
pass
dream_server.server_close()
def write_log_message(results, log_path):
"""logs the name of the output image, prompt, and prompt args to the terminal and log file"""
global output_cntr
log_lines = [f'{path}: {prompt}\n' for path, prompt in results]
for l in log_lines:
output_cntr += 1
print(f'[{output_cntr}] {l}',end='')
with open(log_path, 'a', encoding='utf-8') as file:
file.writelines(log_lines)
SAMPLER_CHOICES = [
'ddim',
'k_dpm_2_a',
'k_dpm_2',
'k_euler_a',
'k_euler',
'k_heun',
'k_lms',
'plms',
]
def create_argv_parser():
parser = argparse.ArgumentParser(description="Parse script's command line args")
parser.add_argument("--laion400m",
"--latent_diffusion",
"-l",
dest='laion400m',
action='store_true',
help="fallback to the latent diffusion (LAION4400M) weights and config")
parser.add_argument('-n','--iterations',
type=int,
default=1,
help="number of images to generate")
parser.add_argument('-F','--full_precision',
dest='full_precision',
action='store_true',
help="use slower full precision math for calculations")
parser.add_argument('-b','--batch_size',
type=int,
default=1,
help="number of images to produce per iteration (currently not working properly - producing too many images)")
parser.add_argument('--sampler',
choices=['plms','ddim', 'klms'],
default='klms',
help="which sampler to use (klms)")
parser.add_argument('-o',
'--outdir',
type=str,
default="outputs/img-samples",
help="directory in which to place generated images and a log of prompts and seeds")
parser = argparse.ArgumentParser(
description="""Generate images using Stable Diffusion.
Use --web to launch the web interface.
Use --from_file to load prompts from a file path or standard input ("-").
Otherwise you will be dropped into an interactive command prompt (type -h for help.)
Other command-line arguments are defaults that can usually be overridden
prompt the command prompt.
"""
)
parser.add_argument(
'--laion400m',
'--latent_diffusion',
'-l',
dest='laion400m',
action='store_true',
help='Fallback to the latent diffusion (laion400m) weights and config',
)
parser.add_argument(
'--from_file',
dest='infile',
type=str,
help='If specified, load prompts from this file',
)
parser.add_argument(
'-n',
'--iterations',
type=int,
default=1,
help='Number of images to generate',
)
parser.add_argument(
'-F',
'--full_precision',
dest='full_precision',
action='store_true',
help='Use more memory-intensive full precision math for calculations',
)
parser.add_argument(
'-g',
'--grid',
action='store_true',
help='Generate a grid instead of individual images',
)
parser.add_argument(
'-A',
'-m',
'--sampler',
dest='sampler_name',
choices=SAMPLER_CHOICES,
metavar='SAMPLER_NAME',
default='k_lms',
help=f'Set the initial sampler. Default: k_lms. Supported samplers: {", ".join(SAMPLER_CHOICES)}',
)
parser.add_argument(
'--outdir',
'-o',
type=str,
default='outputs/img-samples',
help='Directory to save generated images and a log of prompts and seeds. Default: outputs/img-samples',
)
parser.add_argument(
'--seamless',
action='store_true',
help='Change the model to seamless tiling (circular) mode',
)
parser.add_argument(
'--embedding_path',
type=str,
help='Path to a pre-trained embedding manager checkpoint - can only be set on command line',
)
parser.add_argument(
'--prompt_as_dir',
'-p',
action='store_true',
help='Place images in subdirectories named after the prompt.',
)
# GFPGAN related args
parser.add_argument(
'--gfpgan_bg_upsampler',
type=str,
default='realesrgan',
help='Background upsampler. Default: realesrgan. Options: realesrgan, none.',
)
parser.add_argument(
'--gfpgan_bg_tile',
type=int,
default=400,
help='Tile size for background sampler, 0 for no tile during testing. Default: 400.',
)
parser.add_argument(
'--gfpgan_model_path',
type=str,
default='experiments/pretrained_models/GFPGANv1.3.pth',
help='Indicates the path to the GFPGAN model, relative to --gfpgan_dir.',
)
parser.add_argument(
'--gfpgan_dir',
type=str,
default='./src/gfpgan',
help='Indicates the directory containing the GFPGAN code.',
)
parser.add_argument(
'--web',
dest='web',
action='store_true',
help='Start in web server mode.',
)
parser.add_argument(
'--host',
type=str,
default='127.0.0.1',
help='Web server: Host or IP to listen on. Set to 0.0.0.0 to accept traffic from other devices on your network.'
)
parser.add_argument(
'--port',
type=int,
default='9090',
help='Web server: Port to listen on'
)
parser.add_argument(
'--weights',
default='model',
help='Indicates the Stable Diffusion model to use.',
)
parser.add_argument(
'--device',
'-d',
type=str,
default='cuda',
help="device to run stable diffusion on. defaults to cuda `torch.cuda.current_device()` if available"
)
parser.add_argument(
'--model',
default='stable-diffusion-1.4',
help='Indicates which diffusion model to load. (currently "stable-diffusion-1.4" (default) or "laion400m")',
)
parser.add_argument(
'--config',
default='configs/models.yaml',
help='Path to configuration file for alternate models.',
)
return parser
def create_cmd_parser():
parser = argparse.ArgumentParser(description='Example: dream> a fantastic alien landscape -W1024 -H960 -s100 -n12')
parser = argparse.ArgumentParser(
description='Example: dream> a fantastic alien landscape -W1024 -H960 -s100 -n12'
)
parser.add_argument('prompt')
parser.add_argument('-s','--steps',type=int,help="number of steps")
parser.add_argument('-S','--seed',type=int,help="image seed")
parser.add_argument('-n','--iterations',type=int,default=1,help="number of samplings to perform")
parser.add_argument('-b','--batch_size',type=int,default=1,help="number of images to produce per sampling (currently broken)")
parser.add_argument('-W','--width',type=int,help="image width, multiple of 64")
parser.add_argument('-H','--height',type=int,help="image height, multiple of 64")
parser.add_argument('-C','--cfg_scale',default=7.5,type=float,help="prompt configuration scale")
parser.add_argument('-g','--grid',action='store_true',help="generate a grid")
parser.add_argument('-i','--individual',action='store_true',help="generate individual files (default)")
parser.add_argument('-I','--init_img',type=str,help="path to input image (supersedes width and height)")
parser.add_argument('-f','--strength',default=0.75,type=float,help="strength for noising/unnoising. 0.0 preserves image exactly, 1.0 replaces it completely")
parser.add_argument('-s', '--steps', type=int, help='Number of steps')
parser.add_argument(
'-S',
'--seed',
type=int,
help='Image seed; a +ve integer, or use -1 for the previous seed, -2 for the one before that, etc',
)
parser.add_argument(
'-n',
'--iterations',
type=int,
default=1,
help='Number of samplings to perform (slower, but will provide seeds for individual images)',
)
parser.add_argument(
'-W', '--width', type=int, help='Image width, multiple of 64'
)
parser.add_argument(
'-H', '--height', type=int, help='Image height, multiple of 64'
)
parser.add_argument(
'-C',
'--cfg_scale',
default=7.5,
type=float,
help='Classifier free guidance (CFG) scale - higher numbers cause generator to "try" harder.',
)
parser.add_argument(
'-g', '--grid', action='store_true', help='generate a grid'
)
parser.add_argument(
'--outdir',
'-o',
type=str,
default=None,
help='Directory to save generated images and a log of prompts and seeds',
)
parser.add_argument(
'--seamless',
action='store_true',
help='Change the model to seamless tiling (circular) mode',
)
parser.add_argument(
'-i',
'--individual',
action='store_true',
help='Generate individual files (default)',
)
parser.add_argument(
'-I',
'--init_img',
type=str,
help='Path to input image for img2img mode (supersedes width and height)',
)
parser.add_argument(
'-M',
'--init_mask',
type=str,
help='Path to input mask for inpainting mode (supersedes width and height)',
)
parser.add_argument(
'-T',
'-fit',
'--fit',
action='store_true',
help='If specified, will resize the input image to fit within the dimensions of width x height (512x512 default)',
)
parser.add_argument(
'-f',
'--strength',
default=0.75,
type=float,
help='Strength for noising/unnoising. 0.0 preserves image exactly, 1.0 replaces it completely',
)
parser.add_argument(
'-G',
'--gfpgan_strength',
default=0,
type=float,
help='The strength at which to apply the GFPGAN model to the result, in order to improve faces.',
)
parser.add_argument(
'-U',
'--upscale',
nargs='+',
default=None,
type=float,
help='Scale factor (2, 4) for upscaling followed by upscaling strength (0-1.0). If strength not specified, defaults to 0.75'
)
parser.add_argument(
'-save_orig',
'--save_original',
action='store_true',
help='Save original. Use it when upscaling to save both versions.',
)
# variants is going to be superseded by a generalized "prompt-morph" function
# parser.add_argument('-v','--variants',type=int,help="in img2img mode, the first generated image will get passed back to img2img to generate the requested number of variants")
parser.add_argument(
'-x',
'--skip_normalize',
action='store_true',
help='Skip subprompt weight normalization',
)
parser.add_argument(
'-A',
'-m',
'--sampler',
dest='sampler_name',
default=None,
type=str,
choices=SAMPLER_CHOICES,
metavar='SAMPLER_NAME',
help=f'Switch to a different sampler. Supported samplers: {", ".join(SAMPLER_CHOICES)}',
)
parser.add_argument(
'-t',
'--log_tokenization',
action='store_true',
help='shows how the prompt is split into tokens'
)
parser.add_argument(
'-v',
'--variation_amount',
default=0.0,
type=float,
help='If > 0, generates variations on the initial seed instead of random seeds per iteration. Must be between 0 and 1. Higher values will be more different.'
)
parser.add_argument(
'-V',
'--with_variations',
default=None,
type=str,
help='list of variations to apply, in the format `seed:weight,seed:weight,...'
)
return parser
if readline_available:
def setup_readline():
readline.set_completer(Completer(['--steps','-s','--seed','-S','--iterations','-n','--batch_size','-b',
'--width','-W','--height','-H','--cfg_scale','-C','--grid','-g',
'--individual','-i','--init_img','-I','--strength','-f']).complete)
readline.set_completer_delims(" ")
readline.parse_and_bind('tab: complete')
load_history()
def load_history():
histfile = os.path.join(os.path.expanduser('~'),".dream_history")
try:
readline.read_history_file(histfile)
readline.set_history_length(1000)
except FileNotFoundError:
pass
atexit.register(readline.write_history_file,histfile)
class Completer():
def __init__(self,options):
self.options = sorted(options)
return
def complete(self,text,state):
if text.startswith('-I') or text.startswith('--init_img'):
return self._image_completions(text,state)
response = None
if state == 0:
# This is the first time for this text, so build a match list.
if text:
self.matches = [s
for s in self.options
if s and s.startswith(text)]
else:
self.matches = self.options[:]
# Return the state'th item from the match list,
# if we have that many.
try:
response = self.matches[state]
except IndexError:
response = None
return response
def _image_completions(self,text,state):
# get the path so far
if text.startswith('-I'):
path = text.replace('-I','',1).lstrip()
elif text.startswith('--init_img='):
path = text.replace('--init_img=','',1).lstrip()
matches = list()
path = os.path.expanduser(path)
if len(path)==0:
matches.append(text+'./')
else:
dir = os.path.dirname(path)
dir_list = os.listdir(dir)
for n in dir_list:
if n.startswith('.') and len(n)>1:
continue
full_path = os.path.join(dir,n)
if full_path.startswith(path):
if os.path.isdir(full_path):
matches.append(os.path.join(os.path.dirname(text),n)+'/')
elif n.endswith('.png'):
matches.append(os.path.join(os.path.dirname(text),n))
try:
response = matches[state]
except IndexError:
response = None
return response
if __name__ == "__main__":
if __name__ == '__main__':
main()

30
scripts/images2prompt.py Executable file
View File

@@ -0,0 +1,30 @@
#!/usr/bin/env python3
'''This script reads the "Dream" Stable Diffusion prompt embedded in files generated by dream.py'''
import sys
from PIL import Image,PngImagePlugin
if len(sys.argv) < 2:
print("Usage: file2prompt.py <file1.png> <file2.png> <file3.png>...")
print("This script opens up the indicated dream.py-generated PNG file(s) and prints out the prompt used to generate them.")
exit(-1)
filenames = sys.argv[1:]
for f in filenames:
try:
im = Image.open(f)
try:
prompt = im.text['Dream']
except KeyError:
prompt = ''
print(f'{f}: {prompt}')
except FileNotFoundError:
sys.stderr.write(f'{f} not found\n')
continue
except PermissionError:
sys.stderr.write(f'{f} could not be opened due to inadequate permissions\n')
continue

View File

@@ -6,7 +6,7 @@ import numpy as np
import torch
from main import instantiate_from_config
from ldm.models.diffusion.ddim import DDIMSampler
from ldm.dream.devices import choose_torch_device
def make_batch(image, mask, device):
image = np.array(Image.open(image).convert("RGB"))
@@ -61,8 +61,8 @@ if __name__ == "__main__":
model.load_state_dict(torch.load("models/ldm/inpainting_big/last.ckpt")["state_dict"],
strict=False)
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model = model.to(device)
device = choose_torch_device()
model = model.to(device)
sampler = DDIMSampler(model)
os.makedirs(opt.outdir, exist_ok=True)

115
scripts/merge_embeddings.py Normal file
View File

@@ -0,0 +1,115 @@
from ldm.modules.encoders.modules import FrozenCLIPEmbedder, BERTEmbedder
from ldm.modules.embedding_manager import EmbeddingManager
import argparse, os
from functools import partial
import torch
def get_placeholder_loop(placeholder_string, embedder, use_bert):
new_placeholder = None
while True:
if new_placeholder is None:
new_placeholder = input(f"Placeholder string {placeholder_string} was already used. Please enter a replacement string: ")
else:
new_placeholder = input(f"Placeholder string '{new_placeholder}' maps to more than a single token. Please enter another string: ")
token = get_bert_token_for_string(embedder.tknz_fn, new_placeholder) if use_bert else get_clip_token_for_string(embedder.tokenizer, new_placeholder)
if token is not None:
return new_placeholder, token
def get_clip_token_for_string(tokenizer, string):
batch_encoding = tokenizer(
string,
truncation=True,
max_length=77,
return_length=True,
return_overflowing_tokens=False,
padding="max_length",
return_tensors="pt"
)
tokens = batch_encoding["input_ids"]
if torch.count_nonzero(tokens - 49407) == 2:
return tokens[0, 1]
return None
def get_bert_token_for_string(tokenizer, string):
token = tokenizer(string)
if torch.count_nonzero(token) == 3:
return token[0, 1]
return None
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"--manager_ckpts",
type=str,
nargs="+",
required=True,
help="Paths to a set of embedding managers to be merged."
)
parser.add_argument(
"--output_path",
type=str,
required=True,
help="Output path for the merged manager",
)
parser.add_argument(
"-sd", "--use_bert",
action="store_true",
help="Flag to denote that we are not merging stable diffusion embeddings"
)
args = parser.parse_args()
if args.use_bert:
embedder = BERTEmbedder(n_embed=1280, n_layer=32).cuda()
else:
embedder = FrozenCLIPEmbedder().cuda()
EmbeddingManager = partial(EmbeddingManager, embedder, ["*"])
string_to_token_dict = {}
string_to_param_dict = torch.nn.ParameterDict()
placeholder_to_src = {}
for manager_ckpt in args.manager_ckpts:
print(f"Parsing {manager_ckpt}...")
manager = EmbeddingManager()
manager.load(manager_ckpt)
for placeholder_string in manager.string_to_token_dict:
if not placeholder_string in string_to_token_dict:
string_to_token_dict[placeholder_string] = manager.string_to_token_dict[placeholder_string]
string_to_param_dict[placeholder_string] = manager.string_to_param_dict[placeholder_string]
placeholder_to_src[placeholder_string] = manager_ckpt
else:
new_placeholder, new_token = get_placeholder_loop(placeholder_string, embedder, use_bert=args.use_bert)
string_to_token_dict[new_placeholder] = new_token
string_to_param_dict[new_placeholder] = manager.string_to_param_dict[placeholder_string]
placeholder_to_src[new_placeholder] = manager_ckpt
print("Saving combined manager...")
merged_manager = EmbeddingManager()
merged_manager.string_to_param_dict = string_to_param_dict
merged_manager.string_to_token_dict = string_to_token_dict
merged_manager.save(args.output_path)
print("Managers merged. Final list of placeholders: ")
print(placeholder_to_src)

Some files were not shown because too many files have changed in this diff Show More