* refactoring thneed
* continue
* minor update
* looks like it's working
* big refactor
* confirm thneed got the right output
* code is there but it's broken
* works now
* always OPTWG, input -> dat
* fix type issue
* ngrl stuff
* fngrl
* fix typo in compile script
* workflow dispatch
* new models in tests
* don't need to up this threshold
Co-authored-by: HaraldSchafer <harald.the.engineer@gmail.com>
* in progress
* big conv test works
* that's unneeded
* fix opencl with reduce
* rewrite contiguous_view_constant_fold
* clean up mids in loop code
* subidx
* print cl kernel before run
* no reduce, no loop
* Revert "no reduce, no loop"
This reverts commit 92777e40e9.
* Fix OpenCL Metal texture issues
Tile CL images when needed so they fit within Metal's maximum image size of 16384;
gets me to ~4.8s/iteration for SD on M1 Pro with OPENCL=1 FLOAT16=1.
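A minimal sketch of the tiling idea in the Metal fix above. It assumes an oversized image is split into a grid of tiles, each no larger than 16384 on either side. `tile_image` and `MAX_METAL_IMAGE_DIM` are hypothetical names, not tinygrad's actual code, and the real change may fold or reshape images differently.

```python
from typing import List, Tuple

# Per-dimension limit quoted in the commit message (assumed to apply to width and height).
MAX_METAL_IMAGE_DIM = 16384

def tile_image(width: int, height: int,
               max_dim: int = MAX_METAL_IMAGE_DIM) -> List[Tuple[int, int, int, int]]:
    """Hypothetical helper: split a width x height image into (x, y, w, h) tiles,
    each at most max_dim pixels on a side, in row-major (raster) order."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    tiles = []
    for y in range(0, height, max_dim):
        for x in range(0, width, max_dim):
            # edge tiles are clipped to whatever is left of the image
            tiles.append((x, y, min(max_dim, width - x), min(max_dim, height - y)))
    return tiles

if __name__ == "__main__":
    # a 40000 x 1000 image needs three tiles along its width
    for tile in tile_image(40000, 1000):
        print(tile)
```

Keeping the tiles in raster order means a kernel can map a tile index back to its (x, y) offset with one division and one modulo.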
* Minor cleanup
* Fix mish in CI, or no-op?
* Is mish being framed?
* It would help if any of this reproduced locally
* ???
* OPT is reverted; use original mish
* Cleanup post-review
* Fix some shape usage
* Tiler tests, shouldn't oom or overflow either
* Can't CL if there's no CL?
* Run tiler tests even if GPU=1
* relu6 segfault binary chop; revert test
* relu6 segfault binary chop; revert accel
* relu6 segfault binary chop; revert . (???)
* end relu6 segfault binary chop; repo's haunted
* does this work?
* glorot uniform
* requires_grad broke
* propagate the None correctly
* so this weight init works
* ahh, i think it's this
* can't beat this
* glorot is best for ae
* remove comments
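The "glorot uniform" commits above refer to Glorot (Xavier) uniform initialization, which samples each weight from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)). A minimal stdlib-only sketch follows; `glorot_uniform` is a hypothetical helper, not tinygrad's API, and the (fan_out rows, fan_in columns) layout is an assumption.

```python
import math
import random
from typing import List

def glorot_uniform(fan_in: int, fan_out: int, seed: int = 0) -> List[List[float]]:
    """Hypothetical helper: return a fan_out x fan_in matrix drawn from
    U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    rng = random.Random(seed)  # seeded so the init is reproducible
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)] for _ in range(fan_out)]

if __name__ == "__main__":
    w = glorot_uniform(784, 128)
    print(len(w), len(w[0]))
```

Scaling the range by both fan_in and fan_out keeps activation and gradient variance roughly balanced across layers, which is the usual reason it trains an autoencoder better than a fixed-range uniform init.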
* Added standalone CLIP tokenizer.
* Fixed empty phrase.
* Truncating long prompts.
* Keeping two slots for the start and end token.
* Fixed empty phrase.
* Using tokenizer for empty phrase.
* Typo.
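The tokenizer commits above truncate long prompts while keeping two slots for the start and end token, and run the empty phrase through the same path. A minimal sketch of that framing step, assuming CLIP's start-of-text id 49406, end-of-text id 49407, a context length of 77, and padding with the end token; these constants, the padding choice, and the name `frame_tokens` are assumptions, not the committed tokenizer code.

```python
from typing import List

# Assumed CLIP constants (not taken from the commits above).
START_TOKEN = 49406
END_TOKEN = 49407
CONTEXT_LENGTH = 77

def frame_tokens(bpe_ids: List[int], context_length: int = CONTEXT_LENGTH) -> List[int]:
    """Hypothetical helper: truncate BPE ids, wrap them in start/end tokens,
    and pad to context_length."""
    body = bpe_ids[: context_length - 2]  # reserve two slots for start and end
    tokens = [START_TOKEN] + body + [END_TOKEN]
    # pad with the end token (assumed padding id)
    return tokens + [END_TOKEN] * (context_length - len(tokens))

if __name__ == "__main__":
    print(frame_tokens([])[:4])             # empty phrase still gets start and end
    print(len(frame_tokens(list(range(100)))))  # long prompt is truncated to fit
```

Because the empty phrase goes through the same function, it yields a valid start/end pair instead of a special-cased sequence.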