* wip pool
* check CI for remove alternative implementation
* Revert "check CI for remove alternative implementation"
This reverts commit 7b1bb900e5.
* fix test
* tests tests tests
* slap a resolve on it
* fix comment
* a little simpler pool
* check CI for removal again
* Revert "check CI for removal again"
This reverts commit be798b7857.
* small
* update
* some ez tests
* english
* clean up code
* fix ruff
* how did I +25 lines?
* small clean ups
* moar clean ups
* try test_avgpool2d_failure2 in CI
* final clean up
* exclude bug fix
* avg underscore pool
* no more edge case stuff
* add better comments for explanation
* add test cases for decreasing end padding
* address feedback
* improve test coverage
* tiny more polish as we wait for lines :D
* more readable code ordering
* add to documentation
* oops
* set to False instead
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* simple clean ups first
* more work
* kinda have adam
* ooo momentum worked nicely
* almost there
* wow.. is the onnx test wrong
* nicer optim stuff
* just skip that test
* small comment changes
* use naming convention from other parts of codebase
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* Rebase nested div and with const
* Update the ordering
* return None on vectors
Fixes cpu test
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* First version of div_mod folding together
* Working version with old div folding behaviour
* Test is fixed
* Fix linting
* Happy mypy
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* don't mutate the uop/lazybuffer, just the Buffer [pr]
* fix red test
* try different fix
* that
* that's the right fix
* test for fixed behavior
* bump to 3.12
* minor uop cleaner [pr]
* free uop creation speed by removing WeakValueDictionary
* a lil faster
* disable that test
* lines
* and it doesn't print non hit patterns
* alu(c?t0:f0, c?t1:f1) -> c?alu(t0,t1):alu(f0,f1)
only do this if at least one branch is const, so the total ALU count won't increase
* tests and interesting TODO cases
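
A toy sketch of the `alu(c?t0:f0, c?t1:f1)` rewrite above, for illustration only. The `Const`, `Var`, `Where`, `Alu` classes and `fold_where` are hypothetical names, not the codebase's own types or API. The guard is this sketch's reading of "at least one branch is const": both operands on the true side, or both on the false side, are constants, so one of the two new ALU ops can later be constant-folded.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical mini expression tree, NOT the real UOp classes.
@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Where:  # cond ? t : f
    cond: "Expr"
    t: "Expr"
    f: "Expr"

@dataclass(frozen=True)
class Alu:  # binary op such as "add" or "mul"
    op: str
    lhs: "Expr"
    rhs: "Expr"

Expr = Union[Const, Var, Where, Alu]

def fold_where(e: Expr) -> Expr:
    """alu(c?t0:f0, c?t1:f1) -> c?alu(t0,t1):alu(f0,f1), with a const guard."""
    if not (isinstance(e, Alu) and isinstance(e.lhs, Where) and isinstance(e.rhs, Where)):
        return e
    a, b = e.lhs, e.rhs
    if a.cond != b.cond:  # both selects must share the same condition
        return e
    # Assumed guard: one side is fully const, so that side's new ALU can fold away
    # and the total number of ALU ops does not grow.
    true_side_const = isinstance(a.t, Const) and isinstance(b.t, Const)
    false_side_const = isinstance(a.f, Const) and isinstance(b.f, Const)
    if not (true_side_const or false_side_const):
        return e
    return Where(a.cond, Alu(e.op, a.t, b.t), Alu(e.op, a.f, b.f))

c, x, y = Var("c"), Var("x"), Var("y")
# True branches are both const, so the rewrite applies.
folded = fold_where(Alu("add", Where(c, Const(1), x), Where(c, Const(2), y)))
# No const branch on either side, so the expression is left untouched.
unchanged_in = Alu("add", Where(c, Var("a"), x), Where(c, Var("b"), y))
unchanged_out = fold_where(unchanged_in)
```

Here `folded` has the shape `c ? add(1, 2) : add(x, y)`; a separate constant-folding pass (not shown) would reduce the true side to a single constant.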
* script to run regressed sd conv on metal
this and other similar `conv2d + add` kernels contributed to most of the speed regression
* # ruff: noqa: E501
* second try at block linearize
* weeee, works for lil matmul
* it's so beautiful
* test tiny passes
* fix bugs
* combine matching BLOCKENDS
* wrapping
* test lin failures passes
* those failures were fake
* flip sort order
* fix ptx tests
* deal with store better
* dumb ptx fix
* expect less
* reduce lines
* reduce lines
* less lines and cleaner
* no defaultdict
* tighter
* simpler block_parent_count
* update test for gated store
* put gated store rewrite to uopgraph, rm from ptx
* update test
* remove gated st rewrite in llvm
* lint
---------
Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.mail>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>