* unwrap_dtype maybe
* uopgraph stuff that hardcoded None
* test_ops passes
* dtypes.py fixups
* update test_linearizer and friends
* more ast updates
* test_beam and test_schedule too
* add void type to uop [run_process_replay]
* remove dumb casts
* start making it green
* more cast cleanups
* more cls methods to fix
* regenerate dataset
* split UOp and NOp const
* maybe that too
* fix docs
* update test_uop_symbolic
* test_verify_ast
* new sops with no diff
* meh, type_ignore is alright
* remove that assert
---------
Co-authored-by: qazal <qazal.software@gmail.com>
* Revert "late gate creation for STORE [run_process_replay] (#6373)"
This reverts commit c26744de9f.
* Revert "gated store rewrite to UOps.IF (#5976)"
This reverts commit 48061e8400.
* Core change to gate stores in IFs
* Updates to cstyle renderer to handle IFs around STOREs
* Make uops asserts happy
* Add tests and fix newly broken tests
* make ruff happy
* make mypy happy
* Simplify renderer to have all gated stores use IF
* Revert some changes
* Make test_where_fold happy
* Revert unnecessary handling of ifs rendering. Was included before when changes weren't fully built out
* Rewrite graph to have IFs be dependent on RANGEs if STORE is already dependent on RANGE
* Re-change broken test
* Make ifs be grouped together
* get non-merged IFs working. All tests pass except grouping related ifs together
* Fix tests by making the IF UOp dependent on the correct node of the STORE UOp
* Changes to uopgraph
* Simplify graph rewrite logic
* Changes to get test_padto_where_multireduce working
* Simplify uops.store renderer
* Make test_padto_where_multireduce pass but now other tests fail
* Clean up uopgraph from scratch work
* Ignore pseudo IF srcs when rendering
* Attempt to fix llvm tests
* rm comment
* reduce lines
* Add line to make mypy happy :(
* llvmir fix pt 1
* Mods after rebasing to master
* Fix llvmir
* Fix ptx tests
* Fix other ptx tests
* Move changes from uops.py to ops.py
* rm uops.py
* Fix TestGateStoreRewrite tests
* Get multireduce tests working
* reset to remote branch
* Fix linearizer tests
* uop_graph test patch
* Add comment to create_gate
* hotfix: uncomment those tests
* Attempt to fix ptx tests by including whitespace inside if block
* Patch from remote tinybox. Tests passing here
* Min changes to get some ptx tests passing
* Changes after rebase
* Exclude ifs and endifs from ptx
* IF conditional branching within ptx
* Save lines on delete_redundant_gates
* Simplify merge_gates
* rm noqa
* Remove unnecessary checks when merging gates
* Fix ops error msg
* Smarter check for if/endif in llvmir
* simplify delete redundant gates to only have 2 returns
* spacing
* Smarter check at beginning of merge_gates
* patches from comments
* Remove need for merge_gates
* include proper srcs in IF from the get-go
* test expand ifs dumb will result in 4 ifs, not 1 now
* Make tests happy
* Fix uops stats
* rm merge_gates method. Will add back in separate PR
* Spacing
* cleaner error msg
* Fix uops rendering when expanding. test_failure_43
* patch tests
* undo changes in delete_redundant_gates
* process replay attempt
* re-intro deletion of redundant gates
* fix addition of gates when they get nested in stores and loads
* patch tests
* smarter init of IF srcs when adding gate to STORE
* make ruff happy
* Resp to comment
* include all src[2]'s srcs in IF for gated store
* add reference of the storing value to the gate's src
* minor patch after rebasing
* change ptx renderer
---------
Co-authored-by: qazal <qazal.software@gmail.com>
* no UnaryOps.NEG in generated UOp patterns
removed pattern `x * (-1) -> -x` and `x != True`
* those are fine because NEG became CMPNE and True
* fix sd validation L2 norm
* most of the work from the uops2 branch
* schedule
* realize
* kernel
* lowerer
* search
* green
* merge uops with ops
* Revert "merge uops with ops"
This reverts commit 1408a59f12.
* fix benchmark
* remove extra dedup
* rewrite bool ADD to OR and MUL to AND
fixed running `tinyphysics.onnx`, which contains a getitem from a boolean tensor.
can only repro through BEAM_COMPARE, which I think is a different bug in test_linearizer_failure
* fold those, and fix tests
* only for bool
* move dtypes.bool
* expand merge
* merge barriers
* gate_folder
* test_linearizer_failures
* this can be here
* bring the new repr back
* gate_folder2
* gate_creator is better
* gate_folder
* dedup conditions
* early gate folding
* dedup barrier
* fold noop conditions
* all consts can go away
* free lines
* test/test_linearizer_failures: add a new beautiful_mnist one
this one is from a DEPTH=2 fuzz_linearizer search
* add GPU to test_failure_40
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* test/external/fuzz_linearizer: fix for new AST changes
also add beautiful_mnist failures
* add CLANG and LLVM to test_failure_35 failed_platforms
* fix test_linearizer_failure names
* linearizer: fix get_grouping_dims to respect global/local max
* fix lidx variable index offset and unrestrict clang/llvm global len
* test reverse variable indexing when reverse_dims is true
* change the collapse axis to be the right most if reversed
* Create UnaryOps.RECIP and BinaryOps.IDIV and changing uses of BinaryOps.DIV
* Delete unused import
* Add cstyle renderer
* Fix formatting text
* Fix test error due to bad implementation of renderer
* Add PTX support
* Add RECIP to LLVMIR
* Remove BinaryOps.DIV from symbolic test
* Change some tests and fix C floor division
* Change references to DIV for the RECIP or IDIV
* Add mimic idiv for symbolic test
* Restore floor
* Mimic idiv
* cast to int
* Fix some tests and renderer
* Remove DIV for render nodes
* Resolve issue with div
* Add TestRenderer
* Fix test
* fix error
* Fix PAD test
* Fix div implementation
* Remove DIV
* Add upcast to rshift, due to use of MUL and RECIP on DIV
* Fix linter
* Remove BinaryOps.DIV completely
* Fix lint
* Fix some tests
* Revert mul modification
* Fix tests
* Fix CLANG for uops
* Revert IDIV function
* Minor fix
* modify pattern matching rule to support nan
* Fix UNSAFE_PADS_OPS to add UnaryOps.RECIP
* Remove const folding for IDIV and fix PTX
* Completely remove IDIV from extra
* Remove test_div from TestFloatUOps due to test on recip
* Fix linearizer
* fix
* Fix test_22
* Fix llvm
* Apply trunc function for llvmlit
* use floor instead of trunc
* Use correct type
* Generate new fuzz db
* Fix rshift, do not cast to float to support idiv
* Return upcast=false to rshift
* Add to unsafepad BinaryOps.IDIV
* Remove RECIP override for CUDA
* add atol / rtol for the test
* Remove cast to int on IDIV
* Regenerate sops
* delete sops.gz
* regenerate
* regenerate
* regenerate
* Reduce margins
* pass atol and rtol as parameters for _test_metrics
* regenerated dataset
* Regenerate
* Remove duplicated
* Revert changes on extra
* Remove changes extra and NOQA for test
* Remove E501
* Remove and change line
* Remove E501
* Fix atan2
* Revert import and E501
* Remove E501
* Add hrcp to half ops
* Remove 1 of hrcp
* Remove last DIV and add type check on uops for IDIV
* Fix new tests
* Fix tests and custom function
* Regenerate dataset
* Regenerate dataset
* Revert dataset
* Change generate dataset script
* Remove line
* Change IDIV, type checker validates that x, y and z are int
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* basic tests
* cleanup
* pylint
* ruff
* use define acc as a proxy for rendered reductions
* use define acc as a proxy for rendered reductions
* recursive reduceop rendering via ast_parse
* linters + cleanup
* fixing late buf loading
* plus linters
* removing extra line
* linters
* does this break ci?
* added tests and if add end change
* typo in add_ends
* linters
* removing comments
* allow endifs to be inserted before the end of the graph
* find add ENDIF before next BARRIER
* removing tests with manual ENDIF + linters
* specifically the next barrier after the store of the local result
* Revert "specifically the next barrier after the store of the local result"
This reverts commit b288a5c3ce.
* keeping up to date
* linters + merge changes
* cleaning up old bad decisions
* linters and opts
* merged linearizer tests
* fixing merge issues
* removing the big ugly uop test (functionality tested end-to-end by test_linearizer additions)
* small diff fixes
* updating linearizer to work without uops.add( ... cachable)
* linters
* comment in multireduce tests
* skipping tests without locals
* full tests
* linters
* load_cache[key] fix for multiple accs
* linters
* assert only one reduceop
* fix loop_scope test to actually cause an issue
* self.load_cache[key] key for DEFINE_ACC changed to use a string to make sure each acc is unique
* updated tests
* fixing merge
* removing debug prints
* complete merge fix
* linters
* diff cleanup
* adding tests in
* give each reduce its own local buffer
* gpu=1 changes
* store and load locals with upcasting
* modifying test?
* make multireduce_netsted_local_upcast test match single reduce shapes
* removing todo
* cleaning up the diff
* unroll test
* unroll and upcast tests
* fix gpu
* seq and self.load_cache[key] cleaning
* linters
* padto works
* merge fixes
* fixes
* add skips for amd
* linters + seq
* cleaning & more tests
* softmax tests
* linters
* [run_process_replay]
* add new tests back
This reverts commit 19dec22e01.
* more hardcoded -1s
* fix ptx
* Fix name for loop in ptx
* cleaning up the diff
* cleaning up the uops diff
* nv ci is too slow
---------
Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: Szymon Ożóg <58388001+SzymonOzog@users.noreply.github.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>