Commit Graph

91 Commits

George Hotz
a280cfe169 move dtypes to dtype.py (#2964)
* move dtypes to dtype.py

* fix urllib
2024-01-01 14:58:48 -08:00
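After this refactor the dtype definitions live in their own module; a minimal usage sketch (import path taken from the commit title, so treat it as indicative of that version):

    from tinygrad import Tensor
    from tinygrad.dtype import dtypes  # new home after the move out of helpers

    t = Tensor([1, 2, 3], dtype=dtypes.float32)  # pick a dtype explicitly
    print(t.dtype)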
George Hotz
c81ce9643d move globalcounters to ops (#2960)
* move globalcounters to ops

* missed a few

* sick of that failing
2024-01-01 14:21:02 -08:00
chenyu
7dc3352877 increase stable diffusion validation threshold 1e-4 -> 3e-4 (#2897)
saw a flaky CI failure at 1.1e-4, and 3e-4 is a good number
2023-12-21 11:45:25 -05:00
chenyu
a044125c39 validate stable diffusion for seed 0 (#2773)
* validate stable diffusion for seed 0

the closest false positive I can get is with the same setup and one less step: dist = 0.0036.
the same setup with fp16 has dist = 5e-6,
so setting the validation threshold to 1e-4 should be good

* run with --seed 0
2023-12-15 00:07:09 -05:00
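The validation described here compares a generated image against a known-good reference and fails the run when the distance exceeds a tolerance. A hedged sketch of that check (the helper and the max-abs metric are illustrative, not the exact script):

    import numpy as np

    def validate(out: np.ndarray, ref: np.ndarray, threshold: float = 1e-4) -> None:
      dist = np.abs(out - ref).max()  # one plausible distance metric
      assert dist < threshold, f"validation failed: dist={dist:.1e} >= {threshold:.1e}"

With the numbers above, 1e-4 sits well below the closest false positive (3.6e-3) while clearing fp16 noise (5e-6); the later commit widens it to 3e-4 after the flaky failure at 1.1e-4.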
chenyu
9afa8009c1 hot fix: explicitly set arange dtype to float (#2772) 2023-12-14 23:14:38 -05:00
George Hotz
9e07824542 move device to device.py (#2466)
* move device to device.py

* pylint test --disable R,C,W,E --enable E0611

* fix tests
2023-11-27 11:34:37 -08:00
George Hotz
095e2ced61 add name support to fetch (#2407)
* add name support

* use fetch in gpt2

* remove requests from main lib, networkx also optional

* umm, keep that assert

* updates to fetch

* I love the walrus so much

* stop bundling mnist with tinygrad

* err, https

* download cache names

* add DOWNLOAD_CACHE_VERSION

* need env.

* ugh, wrong path

* replace get_child
2023-11-23 14:16:17 -08:00
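The bullets sketch the shape of the new helper: urllib instead of requests, an optional name that overrides the URL-derived filename, and a DOWNLOAD_CACHE_VERSION cache directory. A hedged reconstruction in that spirit (paths and details are illustrative, not the exact tinygrad code):

    import hashlib, pathlib, tempfile, urllib.request
    from typing import Optional

    DOWNLOAD_CACHE_VERSION = "1"

    def fetch(url: str, name: Optional[str] = None) -> pathlib.Path:
      fn = name if name is not None else hashlib.md5(url.encode()).hexdigest()
      dest = pathlib.Path(tempfile.gettempdir()) / f"downloads_v{DOWNLOAD_CACHE_VERSION}" / fn
      if not dest.is_file():  # cache miss: download once, reuse afterwards
        dest.parent.mkdir(parents=True, exist_ok=True)
        with urllib.request.urlopen(url) as r:
          dest.write_bytes(r.read())
      return dest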
chenyu
5ef8d682e3 clean up attentions in stable diffusion (#2275) 2023-11-11 14:25:36 -05:00
Ahmed Harmouche
265304e7fd Stable diffusion WebGPU port (#1370)
* WIP: Stable diffusion WebGPU port

* Load whole model: split safetensor to avoid Chrome allocation limit

* Gitignore .DS_Store, remove debug print

* Clip tokenizer in JS

* WIP: Compile model in parts (text model, diffusor, get_x_prev_and_pred_x0, decoder), and recreate forward logic in JS

* e2e stable diffusion flow

* Create initial random latent tensor in JS

* SD working e2e

* Log if some weights were not loaded properly

* Remove latent_tensor.npy used for debugging

* Cleanup, remove useless logs

* Improve UI

* Add progress bar

* Remove .npy files used for debugging

* Add clip tokenizer as external dependency

* Remove alphas_cumprod.js and load it from safetensors

* Refactor

* Simplify a lot

* Dedup base when limiting elementwise merge (webgpu)

* Add return type to safe_load_metadata

* Do not allow run when webgpu is not supported

* Add progress bar, refactor, fix special names

* Add option to choose from local vs huggingface weights

* lowercase tinygrad :)

* fp16 model dl, decompression client side

* Cache f16 model in browser, better progress

* Cache miss recovery

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-03 18:29:16 -07:00
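Of the steps above, the safetensors split is the one with a hard constraint: Chrome refuses single allocations past a limit, so the model file has to be served in parts. A hedged sketch of the chunking step (chunk size and naming are illustrative):

    import pathlib

    def split_for_browser(path: pathlib.Path, chunk: int = 1 << 28) -> list[pathlib.Path]:
      data, parts = path.read_bytes(), []
      for i in range(0, len(data), chunk):  # 256 MiB pieces stay under the limit
        part = path.with_name(f"{path.name}.part{i // chunk}")
        part.write_bytes(data[i:i + chunk])
        parts.append(part)
      return parts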
George Hotz
6dc8eb5bfd universal disk cache (#2130)
* caching infra for tinygrad

* non-str key

* fix linter

* no shelve in beam search

* beam search caching

* check tensor cores with beam too

* pretty print

* LATEBEAM in stable diffusion
2023-10-22 10:56:57 -07:00
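The cache persists compiled-kernel and beam-search results across runs (replacing shelve, per the bullets). A hedged sketch of a sqlite-backed decorator in that spirit; tinygrad's real diskcache differs in schema and location, and pickling the arguments is what makes non-str keys work:

    import pathlib, pickle, sqlite3, tempfile

    _db = sqlite3.connect(pathlib.Path(tempfile.gettempdir()) / "tinygrad_cache.db")
    _db.execute("CREATE TABLE IF NOT EXISTS cache (k BLOB PRIMARY KEY, v BLOB)")

    def diskcache(table: str):
      def decorator(fn):
        def wrapper(*args):
          key = pickle.dumps((table, args))  # pickling means non-str keys just work
          row = _db.execute("SELECT v FROM cache WHERE k = ?", (key,)).fetchone()
          if row is not None:
            return pickle.loads(row[0])
          out = fn(*args)
          _db.execute("INSERT OR REPLACE INTO cache VALUES (?, ?)", (key, pickle.dumps(out)))
          _db.commit()
          return out
        return wrapper
      return decorator

Decorating an expensive search with @diskcache("beam_search") then memoizes its results across processes, which is what lets BEAM results survive restarts.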
Ahmed Harmouche
0d3410d93f Stable diffusion: Make guidance modifiable (#2077) 2023-10-15 14:36:43 -07:00
Ahmed Harmouche
e27fedfc7b Fix stable diffusion output error on WebGPU (#2032)
* Fix stable diffusion on WebGPU

* Remove hack, numpy cast only on webgpu

* No-copy numpy cast
2023-10-10 06:40:51 -07:00
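A note on the last bullet: numpy's astype only copies when it must, so requesting the dtype the array already has is free. A minimal sketch of the no-copy cast idea (illustrative, not the exact fix):

    import numpy as np

    x = np.zeros((512, 512, 3), dtype=np.float32)
    y = x.astype(np.float32, copy=False)  # dtype already matches: no new buffer
    assert y is x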
George Hotz
adab724caa schedule2, keep the tests working with small changes (#1932)
* lazy cleanups

* ast functions take in LazyOps

* op instead of self.op

* _base for mops

* fix contiguous

* start schedule

* test_schedule

* fix openpilot

* more tests

* bugfix and test skip

* work

* make sure things get freed

* fix zerosized tensors

* fix failing test

* fix ceil and friends

* fix openpilot

* disable training

* disable test collectives
2023-09-28 09:14:43 -07:00
Dat D. Nguyen
ae9529e678 chore: remove redundant noise in stable diffusion example (#1910) 2023-09-24 21:33:45 +08:00
segf00lt
9e8c1dbf34 patch to remove hack from stable_diffusion.py (#1814)
* patch to remove hack from stable_diffusion.py

* sorry linter

* realize after assign?

* float16 broken in llvmlite, use float64 for now

* int32

* idiot forgot to change test array dtype
2023-09-08 09:26:50 -07:00
George Hotz
722823dee1 stable diffusion: force fp16 free 2023-09-06 15:11:05 -07:00
Francis Lam
0379b64ac4 add seed option to stable_diffusion (#1784)
useful for testing correctness of model runs
2023-09-05 19:45:15 -07:00
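A fixed seed makes the initial latent noise deterministic, so two runs can be diffed exactly. A minimal sketch of the wiring (Tensor.manual_seed is tinygrad API; the argparse flag is illustrative):

    import argparse
    from tinygrad import Tensor

    parser = argparse.ArgumentParser()
    parser.add_argument("--seed", type=int, default=None, help="fix RNG for reproducible output")
    args = parser.parse_args()
    if args.seed is not None:
      Tensor.manual_seed(args.seed)  # latent noise is now deterministic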
Karan Handa
a8aa13dc91 [ready] Replacing os with pathlib (#1708)
* replace os.path with pathlib

* safe convert dirnames to pathlib

* replace all os.path.join

* fix cuda error

* change main chunk

* Reviewer fixes

* fix vgg

* Fixed everything

* Final fixes

* ensure consistency

* Change all parent.parent... to parents
2023-08-30 10:41:08 -07:00
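The flavor of the conversion, as a before/after sketch (paths illustrative):

    import os.path
    import pathlib

    # before: string surgery with os.path
    weights = os.path.join(os.path.dirname(__file__), "weights", "model.ckpt")

    # after: / joins, and parents[n] instead of chained .parent.parent
    weights = pathlib.Path(__file__).parent / "weights" / "model.ckpt"
    repo_root = pathlib.Path(__file__).parents[2]  # three levels up in one step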
Umut Zengin
1682e9a38a Fix: Stable Diffusion index (#1713) 2023-08-30 00:21:10 -04:00
George Hotz
aa7c98722b sd timing (#1706) 2023-08-28 20:22:57 -07:00
nimlgen
1c0449e190 add cache collector (#1595)
* init cache collector

* add test_cache_collector.py

* switch GlobalCounters.cache to CacheCollector

* init jit models test

* jitted SD

* add debug msg to print loaded bufs count

* moved cache collector to jit

* clearer SD

* no double device import
2023-08-28 19:59:55 -07:00
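The collector records the GPU kernels a model run enqueues so the JIT can replay them without re-tracing Python, which is what "jitted SD" buys. A usage sketch with tinygrad's TinyJit (the import path has moved between versions, so treat it as indicative):

    from tinygrad import Tensor, TinyJit

    @TinyJit
    def step(x: Tensor) -> Tensor:
      return (x @ x).relu().realize()  # early calls record kernels, later calls replay

    for _ in range(4):
      step(Tensor.randn(64, 64))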
George Hotz
718ced296c move state to nn/state (#1619) 2023-08-22 07:36:24 -07:00
George Hotz
b9feb1b743 fp16 support in stable diffusion 2023-08-20 05:37:21 +00:00
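fp16 halves the weight memory, which is what lets the model fit on smaller GPUs. A hedged sketch of the cast-on-load idea (safe_load and load_state_dict are tinygrad state helpers; casting straight off the disk device may need an extra move in the real loader):

    from tinygrad.nn.state import safe_load, load_state_dict

    def load_fp16(model, path: str) -> None:
      # cast every weight to float16 before binding it to the model
      state = {k: v.half() for k, v in safe_load(path).items()}
      load_state_dict(model, state)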
George Hotz
47f18f4d60 [New] SD: Refactor AttnBlock, CrossAttention, CLIPAttention to share code (#1516) (#1518)
* Refactor AttnBlock, CrossAttention, CLIPAttention to share code

* Reshape and transpose in loop

* Bugfix on attention mask

Co-authored-by: Jacky Lee <39754370+jla524@users.noreply.github.com>
2023-08-10 15:04:18 -07:00
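The core that AttnBlock, CrossAttention, and CLIPAttention share is scaled dot-product attention; a minimal sketch of the factored-out piece (shapes assumed (batch, heads, seq, head_dim); the real refactor also handles the attention mask and the reshape/transpose loop):

    from tinygrad import Tensor

    def attention(q: Tensor, k: Tensor, v: Tensor) -> Tensor:
      scores = (q @ k.transpose(-2, -1)) * (q.shape[-1] ** -0.5)  # scaled similarity
      return scores.softmax(-1) @ v                               # weighted sum of values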
George Hotz
c82bd59b85 Revert "SD: Refactor AttnBlock, CrossAttention, CLIPAttention to share code (#1513)" (#1515)
This reverts commit 85e02311a2.
2023-08-10 09:08:51 -07:00
Jacky Lee
85e02311a2 SD: Refactor AttnBlock, CrossAttention, CLIPAttention to share code (#1513)
* Refactor AttnBlock, CrossAttention, CLIPAttention to share code

* Reshape and transpose in loop
2023-08-10 08:52:33 -07:00
George Hotz
d78fb8f4ed add stable diffusion and llama (#1471)
* add stable diffusion and llama

* pretty in CI

* was CI not true

* that

* CI=true, wtf

* pythonpath

* debug=1

* oops, wrong place

* uops test broken for wgpu

* wgpu tests flaky
2023-08-06 21:31:51 -07:00
Felix
97a6029cf7 Corrected a few misspelled words (#1435) 2023-08-04 16:51:08 -07:00
George Hotz
f27df835a6 delete dead stuff (#1382)
* delete bpe from repo

* remove yolo examples

* Revert "remove yolo examples"

This reverts commit cd1f49d466.

* no windows
2023-07-31 11:17:49 -07:00
George Hotz
bfbb8d3d0f fix ones, BS=2 stable diffusion, caching optimizer (#1312)
* fix ones, BS=2 stable diffusion

* caching optimizer

* print search time

* minor bug fix
2023-07-21 09:55:49 -07:00
George Hotz
f45013f0a3 stable diffusion: remove realizes we don't need 2023-07-20 19:53:07 -07:00
George Hotz
b58dd015e3 stable diffusion: remove import numpy as np 2023-07-20 19:35:44 -07:00
George Hotz
35bc46289c stable diffusion: use new tinygrad primitives 2023-07-20 19:25:49 -07:00
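The flavor of these two cleanups: random latents come from Tensor primitives now, so the example no longer needs numpy at all (illustrative):

    from tinygrad import Tensor

    # before: latent = Tensor(np.random.randn(1, 4, 64, 64).astype(np.float32))
    latent = Tensor.randn(1, 4, 64, 64)  # same shape, no numpy import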
AN Long
f75de602df fix typo in stable diffusion example (#1219) 2023-07-11 15:26:40 -07:00
Diogo
2d4370b487 Adds tril & triu support (#936)
* triu & tril support

* lint and kernel count error

* switched shape indices

* larger shape tests

* reverted numpy removal until #942 is resolved
2023-06-09 22:13:20 -07:00
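Usage sketch of the new ops (they mirror the NumPy/PyTorch names: lower and upper triangle of a matrix, with a diagonal offset):

    from tinygrad import Tensor

    x = Tensor.ones(3, 3)
    print(x.tril().numpy())   # zeros everything above the main diagonal
    print(x.triu(1).numpy())  # keeps only entries strictly above the main diagonal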
Diogo
3bb38c3518 limit split to 1 due to windows path containing : (#944) 2023-06-06 10:27:54 -07:00
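The bug in one line: device strings embed a path after the first colon, and Windows paths contain a colon of their own, so an unlimited split mangles them. Limiting maxsplit keeps the path intact (a sketch; the "disk:" format follows tinygrad's device-string convention):

    device, path = "disk:C:\\weights\\model.ckpt".split(":", 1)
    assert device == "disk" and path == "C:\\weights\\model.ckpt"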
George Hotz
ed1963b899 Fast DiskTensor to other Tensor (#916)
* make disktensors fast

* loading

* loader for sd and llama
2023-06-03 12:25:41 -07:00
George Hotz
46d419060b start on mlperf models 2023-05-10 16:30:49 -07:00
Kirill
0fe5014b1f Use pathlib (#711)
* Use pathlib in llama

* Use pathlib in stablediffusion
2023-03-18 13:49:21 -07:00
Kirill
af7745073f Add comments to SD (#686)
* Add explanation for empty lambdas

* Fix my_unpickle if pytorch_lightning is installed

* oops
2023-03-12 10:56:49 -07:00
George Hotz
b1206bcb18 third try at torch loading (#677)
* third try at torch loading

* numpy fixed

* fix enet compile

* load_single_weight supports empty weights

* oops, CPU wasn't the default

* so many bugs
2023-03-10 19:11:29 -08:00
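The point of the loader is reading torch checkpoints without importing torch. A hedged usage sketch (torch_load is the helper's eventual name in tinygrad's state module; the path is illustrative):

    from tinygrad.nn.state import torch_load

    state_dict = torch_load("sd-v1-4.ckpt")  # pickle+zip parsing, no torch import
    print(sorted(state_dict)[:3])            # peek at a few weight names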
George Hotz
8bf75a7fdd fix stable diffusion and CI 2023-03-10 17:48:12 -08:00
Pankaj Doharey
9d97d97b26 Opens image in default viewer after saving. (#612) 2023-03-03 17:28:49 -08:00
Jacky Lee
c35fcc6964 Replace phrase for prompt (#555) 2023-02-12 09:04:44 -08:00
Kirill
27154db99a Downloads weights in examples/stable_diffusion.py (#537)
* Downloads weights in examples/stable_diffusion.py

* use download_file_if_not_exists in fetch

* make consistent with previous NOCACHE behavior
2023-02-10 14:37:04 -06:00
Jacky Lee
f08187526f Fix examples (#540)
* Fix examples

* Remove training in parameters

* Simplify a bit

* Remove extra import

* Fix linter errors

* factor out Device

* NumPy-like semantics for Tensor.__getitem__ (#506)

* Rewrote Tensor.__getitem__ to fix negative indices and add support for np.newaxis/None

* Fixed pad2d

* mypy doesn't know about mlops methods

* normal python behavior for out-of-bounds slicing

* type: ignore

* inlined idxfix

* added comment for __getitem__

* Better comments, better tests, and fixed bug in np.newaxis

* update cpu and torch to hold buffers (#542)

* update cpu and torch to hold buffers

* save lines, and probably faster

* Mypy fun (#541)

* mypy fun

* things are just faster

* running fast

* mypy is fast

* compile.sh

* no gpu hack

* refactor ops_cpu and ops_torch to not subclass

* make weak buffer work

* tensor works

* fix test failing

* cpu/torch cleanups

* no or operator on dict in python 3.8

* that was junk

* fix warnings

* comment and touchup

* dyn add of math ops

* refactor ops_cpu and ops_torch to not share code

* nn/optim.py compiles now

* Reorder imports

* call mkdir only if directory doesn't exist

---------

Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: Mitchell Goff <mitchellgoffpc@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-02-10 12:09:37 -06:00
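The __getitem__ rewrite folded into this commit gives Tensors NumPy-like indexing; a short sketch of what it enables:

    from tinygrad import Tensor

    t = Tensor.arange(12).reshape(3, 4)
    print(t[-1].numpy())     # negative index: last row
    print(t[:, None].shape)  # None (np.newaxis) adds an axis -> (3, 1, 4)
    print(t[1:10].shape)     # out-of-bounds slices clamp like Python lists -> (2, 4)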
George Hotz
5e37f084db stable diffusion: clean up constant folding 2023-02-01 12:53:16 -08:00
Jacky Lee
486f023e81 Rename Normalize and move to nn (#513)
* Rename Normalize and move to nn

* Match PyTorch for dim>1
2023-02-01 11:55:03 -08:00
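After this change the module matches PyTorch's GroupNorm, including the dim>1 behavior noted above; usage sketch (tinygrad's nn.GroupNorm, values illustrative):

    from tinygrad import Tensor
    from tinygrad.nn import GroupNorm

    gn = GroupNorm(num_groups=8, num_channels=32)  # affine weight/bias by default
    y = gn(Tensor.randn(2, 32, 16, 16))            # normalizes within each channel group
    print(y.shape)  # (2, 32, 16, 16)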
George Hotz
487685919b Revert "Rename Normalize and move to nn (#415)" (#474)
This reverts commit d768acb6a9.
2023-01-25 07:50:04 -08:00
Jacky Lee
d768acb6a9 Rename Normalize and move to nn (#415)
* Rename Normalize and move to nn

* Fix comparison to None error

* Add test for GroupNorm

* Rename test case

* Flip parameters to match PyTorch

* Increase error tolerance

* Fix elementwise_affine on channels

* Match arguments with PyTorch

* Initialize weight and bias only when affine is true

* Is this it?

* A bit cleaner

* Handle case where weight or bias is None
2023-01-25 07:47:59 -08:00