* make POSTOPT=2 the default
* more matching tc
* fix winograd
* fix that test
* add matvec to Scheduler
* flip tc sort order
* similar speed
* fix beam on image
* disable slow tests
* slow
* ** simple kernel to replace Kernel for postopt
* support old
* fix beam
* beaming
* beam on old
* bring tensor cores back
* raise
* postbeam
* test ops passes on mac
* skip that
* postopt default
* gate that
* fix tensor cores
* a few test fixes
* dsp fix
* tc fix
* loop
* support swap
* test_gemv
* fix beam for variable
* test opts from high level stuff
* range annoying
* compile slow
* metal slow
* better beam
* no POSTBEAM
* fix nolocals
* hc opt mostly works
* put that back
* lil
* some work
* fix that
* POSTOPT 2
* fix tests
* no postopt 2
* work
* back
* padded tensors cores
* shift_to
* postopt 0 passes?
* write PADTO
* fix padded tensor cores
* compare hcopt
* 18000 lines
* should pass tests
* fix rangeify
* put types back
* cvar dtype:DType|tuple[DType, ...]|None=None
* fmt
* add a test
* list typeguard as a dep for CI
* extra step to install mypy
* fix venv
* ci fixes
* mv typeguard to testing install group
* simpler TYPED=1 test
* add typeguard to lint group
* ** rangeify, try 3
* bring that over
* bufferize, don't use contig tag
* work
* ish
* fix rangeify
* flash attention is back
* fix rangeify tests
* stuff passes
* fix test_log_softmax
* more stuff passes
* progress children
* new endrange solution
* progress
* progress counter
* basic assign
* contigs only
* symbolic in schedule
* unbind_kernel
* late children
* ops fixed
* beautiful mnist is close
* that seems to work
* mnist works
* improve names
* fix bmnist
* no pcontig
* testing backward
* work
* clone movement ops
* new_range helper
* MBLOCK/MERGE
* ops tests pass
* revert mblock stuff
* cleanups...but it breaks ops
* remove reindex
* hack for relu
* disable the hacks
* more hacks
* upd
* mostly works with cleanups disabled
* ndr
* ops tests pass
* terrible hacks for indexing to work
* context mismatch
* pcontig
* split pcontig v contig
* z3 trunc
* null
* no fuse in rangeify
* ops test passes
* lnorm
* fix assign
* nd rangeify
* both should work
* tests for rangeify
* cleanups
* stores pass the pointer through
* disable pcontig for now
* PARTIAL_CONTIG is a flag
* move device tests to test/device
* test speedups
* test device
* linalg to unit
* upd
* so pytest just works
* more divide and skip
* speed
* test devectorize
* add pillow
* BOOM
* cache extra/huggingface/models/
* why max buffer size is not 0
* override MAX_BUFFER_SIZE
* less models
* remove more models and change cache dir to already cached dir
* only metal
* less is more?
* remove check ops
* why is this not setting the ENVVAR
* ughhhhh just test in models
* only cpu and gpu
* only cpu actually
* just override it idk
* final
* move extra dependencies up top
* simplification
* fix print
* make README better
* revert ops_disk fix for now
* clean up test_onnx
* remove testing fashion clip model cuz sloooowwwwww
* actually let METAL run this
* fix comment mistake
* fix download path in run_models
* does this work?
* cleanup setup and teardown
* contextvar like this?
* prove model is cached
* do I need to increment DOWNLOAD_CACHE_VERSION?
* see if cached with incremented DOWNLOAD_CACHE_VERSION
* use warnings to see if the model exists
* revert DOWNLOAD_CACHE_VERSION stuff and clean up
* add retry to download
* nit
* move view pushing to codegen, try 2
* fix up some linearizer tests
* fix test search
* fix test schedule
* delete that test
* fix test arange
* fix a few tests
* update tests
* push views
* ebs cleanup
* fix local/reg
* test and lint
* fix more tests
* test cleanups
* skipped that one
* cleanup fix_kernel
* early load buffer
* early meta ops
* move those to fix_kernel_ops
* fix tests
* remote metal was flaky
* Revert "fix tests"
This reverts commit a27019383d.
* that hack broke things
* fine for ptx