* flake8: Ignore frequent violations, correct infrequent ones
* Ignore some rules in test
* Reorder test ignores
* Lint test + main
* EOF indent
* Include all E71,E72 errors
* Test the failing case in CI
* Revert "Test the failing case in CI"
This reverts commit 110add0a70.
* Push to test!
This reverts commit f317532779.
* ok back to passing
This reverts commit ba5052685f.
* Prove that CI fails when formatting is incorrect.
* Fix formatting
* Remove duplicitous E117 rule
* Use flake8 config for precommit
---------
Co-authored-by: waifairer <waifairer@gmail.com>
* Add additional kernel when reducing multiple dimensions at once.
* Faster for smaller inputs
* Whitespace and naming
* Cleaner, guard for Metal only, and max 1 split rather than N
* Draft of different approach
* One additional kernel call for this test (as expected)
* fold expands that precede a reduce if the reduction is on the same axis as the expansion
* add deterministic test for SIMPLIFY_SUM_RESHAPE_EXPAND_SUM optimization
* add a test case to make sure we don't fold reduce-expand-reduce on different axes
* feat: promote Embedding to nn
* fix: fix failing test
* feat: add test with jit
* feat: rewrite embedding to no longer need stacked for loops
* clean+fix: don't know how that happened
* e2e testing
* min failure
* no affine on bn, still fails
* why did i think i could detach that?
* allow more kernels for bn
* some test issue i don't understand
* make maximum split grad
* added test for maximum split grad when equal
* minor expr simplification
* (2-eq)/2 only once
* update test bc one more sum output child stays
* fix binop, other tests failure
* that was a bad idea
* better layernorm
* inference kernel count tests
* new style reshape pushing
* fixup replacement
* 199 kernels is okay. fix flops
* push reshape through unaryops only
* GRAPH=2 draws the phantom ops
* found resnet issue
* non working test
* mul is cheaper than div
* OPT inflation
* SHUFFLE_PAD_OPS in OPT=2
* runs one metal kernel
* conv2d works
* ops tests are passing
* const folding
* all ops work
* pre commit always passes
* torch works
* working still
* fix graph test
* tests passing
* image almost works
* image conv works
* most images
* fix custom
* fix assignment
* fix compile enet
* clean up comments
* fix realize return value
* include shapetracker in LB repr
* copy should make a copy
* reenable method cache
* fix lna
* dtypes in graph
* forward only for IMAGE=2
* simple realize
* getting close
* fixup new api, it's good except the kernel count
* back to 197 kernels
* tests should pass
* go to a real float
* no type_on_cpu
* fix the docs
* put shapetracker back in it's proper place
* Refactor contraction and add unit tests
* Fix typo; Fix TestConv.test_elu failure due to some ones in old_shape
* Add push permute test cases
* Fix mypy type annotation check error
* Add contraction unit test; Reshape to higher dimension is not contraction