* no zeroview start
* closer
* stride mask
* st tests pass, delete ZeroView
* byebye zv
* close to working
* not contiguous with mask
* subtract, don't add
* mask on view
* ugh, that shouldn't have been in there
* shape merge
* bugfixes
* fuzzer + 4 fuzzer failures
* fuzzer for symbolic
* more fuzzing and nothing
* that fuzzer doesn't hit either
* fixes padding...ugh
* no more offsets
* working
* rewrite load and store
* all checks
* fix idxs
* progress
* bugfix
* float4_axis
* works
* cleanups
* complex valids_okay
* make maximum split grad
* added test for maximum split grad when equal
* minor expr simplification
* (2-eq)/2 only once
* update test because one more sum output child stays
* Make GPU the default device
* Compile EfficientNet with CPU
* don't print device
* use METAL and CUDA if possible
* Revert some changes to workflow
* Fix import error when checking device availability
* device lookup is now optional
* hopefully fix linter and tests
* fix workflow
* Skip device if not available
* don't change default if CPU=1
* simplify device selection
* Default to CPU if no GPU
* don't print device name...
* No need to change default in llama
* run github workflow
* Fix logic to select default
* pass if an error occurs
* use separate function for try except
* activation ops
* type hints + more testing
* formatting correction + parameter testing
* fixes to shape testing
* hardtanh to use clip + removed type hints
* assign val fix
* fix binop, other tests failure
* that was a bad idea
* better layernorm
* inference kernel count tests
* new style reshape pushing
* fixup replacement
* 199 kernels is okay. fix flops
* push reshape through unaryops only
* GRAPH=2 draws the phantom ops
* found resnet issue
* non working test
* mul is cheaper than div
* OPT inflation
* SHUFFLE_PAD_OPS in OPT=2