* Refactor AttnBlock, CrossAttention, CLIPAttention to share code
* Reshape and transpose in loop
* Bugfix on attention mask
Co-authored-by: Jacky Lee <39754370+jla524@users.noreply.github.com>
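The refactor above folds AttnBlock, CrossAttention, and CLIPAttention onto one attention core. Below is a minimal NumPy sketch of such a core with an additive mask; the function name and interface are illustrative assumptions, not tinygrad's actual API:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    """Illustrative shared attention core (not tinygrad's real API).

    q, k, v: (batch, heads, seq, head_dim). mask broadcasts over the
    (seq, seq) score matrix; 0 keeps a position, -inf masks it out.
    """
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = scores + mask  # additive mask
    # numerically stable softmax over the last axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```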
* add stable diffusion and llama
* pretty in CI
* CI was not true
* that
* CI=true, wtf
* PYTHONPATH
* debug=1
* oops, wrong place
* uops test broken for wgpu
* wgpu tests flaky
* third try at torch loading
* numpy fixed
* fix enet compile
* load_single_weight supports empty weights
* oops, CPU wasn't the default
* so many bugs
* Fix examples
* Remove training in parameters
* Simplify a bit
* Remove extra import
* Fix linter errors
* factor out Device
* NumPy-like semantics for Tensor.__getitem__ (#506)
* Rewrote Tensor.__getitem__ to fix negative indices and add support for np.newaxis/None
* Fixed pad2d
* mypy doesn't know about mlops methods
* normal Python behavior for out-of-bounds slicing
* type: ignore
* inlined idxfix
* added comment for __getitem__
* Better comments, better tests, and fixed bug in np.newaxis
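For reference, the NumPy semantics the rewritten `Tensor.__getitem__` mirrors, demonstrated with NumPy itself (tinygrad-specific details omitted):

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

print(x[-1])              # negative index counts from the end: [ 8  9 10 11]
print(x[None].shape)      # None / np.newaxis inserts a new axis: (1, 3, 4)
print(x[:, 1:100].shape)  # out-of-bounds slice clamps, normal Python behavior: (3, 3)
```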
* update cpu and torch to hold buffers (#542)
* update cpu and torch to hold buffers
* save lines, and probably faster
* Mypy fun (#541)
* mypy fun
* things are just faster
* running fast
* mypy is fast
* compile.sh
* no gpu hack
* refactor ops_cpu and ops_torch to not subclass
* make weak buffer work
* tensor works
* fix failing test
* cpu/torch cleanups
* no `|` (or) operator on dict in Python 3.8 (see the compat sketch after this list)
* that was junk
* fix warnings
* comment and touchup
* dynamically add math ops
* refactor ops_cpu and ops_torch to not share code
* nn/optim.py compiles now
* Reorder imports
* call mkdir only if directory doesn't exist (see the compat sketch below)
---------
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: Mitchell Goff <mitchellgoffpc@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
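Two of the fixes above in miniature, as a hedged sketch with an invented path. The dict `|` operator only landed in Python 3.9 (PEP 584), and `os.makedirs(..., exist_ok=True)` covers the "mkdir only if missing" case without a pre-check race:

```python
import os

a, b = {"x": 1}, {"y": 2}
merged = {**a, **b}  # Python 3.8-safe merge; `a | b` needs 3.9+ (PEP 584)

os.makedirs("/tmp/example_cache", exist_ok=True)  # hypothetical path; no existence check needed
```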
* Rename Normalize and move to nn
* Fix comparison to None error
* Add test for GroupNorm
* Rename test case
* Flip parameters to match PyTorch
* Increase error tolerance
* Fix elementwise_affine on channels
* Match arguments with PyTorch
* Initialize weight and bias only when affine is true
* Is this it?
* A bit cleaner
* Handle case where weight or bias is None
* does this work?
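The shape these commits converge on matches PyTorch's `nn.GroupNorm(num_groups, num_channels, eps=1e-5, affine=True)`, with weight and bias left as None when affine is false. A minimal NumPy sketch of the computation, not the actual tinygrad module:

```python
import numpy as np

def group_norm(x, num_groups, weight=None, bias=None, eps=1e-5):
    """Sketch of GroupNorm on (N, C, H, W); weight/bias are per-channel
    and may be None (the affine=False case)."""
    n, c, h, w = x.shape
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    x = ((g - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)
    if weight is not None: x = x * weight.reshape(1, c, 1, 1)
    if bias is not None: x = x + bias.reshape(1, c, 1, 1)
    return x
```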
* glorot uniform
* requires_grad broke
* propagate the None correctly
* so this weight init works
* ahh, I think it's this
* can't beat this
* glorot is best for the autoencoder
* remove comments
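Glorot (Xavier) uniform draws from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)); a sketch of the init these commits settle on, function name illustrative:

```python
import numpy as np

def glorot_uniform(fan_in, fan_out):
    # Xavier/Glorot uniform: U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-limit, limit, size=(fan_in, fan_out)).astype(np.float32)
```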
* Added standalone CLIP tokenizer.
* Fixed empty phrase.
* Truncating long prompts.
* Keeping two slots for the start and end token.
* Fixed empty phrase.
* Using tokenizer for empty phrase.
* Typo.
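The truncation rule above, reserving two slots for the start and end tokens, in sketch form. The 77-token context length and the 49406/49407 start/end IDs are CLIP's; the helper itself and its padding choice are illustrative assumptions:

```python
def pack_tokens(token_ids, max_length=77, bos=49406, eos=49407):
    # truncate long prompts, keeping two slots for <start> and <end>;
    # an empty phrase reduces to [bos, eos] plus padding
    ids = [bos] + list(token_ids)[: max_length - 2] + [eos]
    return ids + [eos] * (max_length - len(ids))  # pad token varies by impl; eos assumed here
```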