* make maximum split grad
* added test for maximum split grad when equal
* minor expr simplification
* (2-eq)/2 only once
* update test because one more sum output child stays
* Make GPU the default device
* Compile EfficientNet with CPU
* don't print device
* use METAL and CUDA if possible
* Revert some changes to workflow
* Fix import error when checking device availability
* device lookup is now optional
* hopefully fix linter and tests
* fix workflow
* Skip device if not available
* don't change default if CPU=1
* simplify device selection
* Default to CPU if no GPU
* don't print device name...
* No need to change default in llama
* run github workflow
* Fix logic to select default
* pass if an error occurs
* use separate function for try except
* activation ops
* type hints + more testing
* formatting correction + parameter testing
* fixes to shape testing
* hardtanh to use clip + removed type hints
* assign val fix
* fix binop, other tests failure
* that was a bad idea
* better layernorm
* inference kernel count tests
* new style reshape pushing
* fixup replacement
* 199 kernels is okay. fix flops
* push reshape through unaryops only
* GRAPH=2 draws the phantom ops
* found resnet issue
* non working test
* mul is cheaper than div
* OPT inflation
* SHUFFLE_PAD_OPS in OPT=2
* linearizer outputs something
* working ish
* cstyle codegen
* clang mostly works
* fix load valid
* fix numberless loop
* fancy gen
* working
* fix enet compiler
* cleanups
* float4 upcasting
* less lines
* supports_float4
* constant folding
* mulacc
* internet tests flaky in CI
* 90% image support
* fix image generic
* bugs exposed with shapetracker and single view
* new llvm
* use vload, remove OLD
* that's really poorly done
* ended up being more lines
* add int64 as supported dtype from numpy
  Without this, examples/transformer.py didn't run. With this change it runs successfully.
* Update helpers.py
* Update transformer.py
* Update training.py