Commit Graph

96 Commits

Author SHA1 Message Date
George Hotz
50ddd11350 lil cleanup matchers [pr] (#7437)
* move delete_redundant_gates [pr]

* simpler uops test

* addr in delete_redundant_gates

* lines

* correct early delete gates

* shorter find_gate
2024-10-31 17:52:22 +08:00
George Hotz
2e3048fc57 Revert "improve full_graph_rewrite matchers for speed (#7431)" (#7434)
This reverts commit 996152d2de.
2024-10-31 16:16:47 +08:00
George Hotz
996152d2de improve full_graph_rewrite matchers for speed (#7431)
* remove finalize [pr]

* early transcendental

* fix tests

* load store indexing runs with devectorize

* move delete_redundant_gates

* ptx has to wait for the mask to move
2024-10-31 16:13:11 +08:00
George Hotz
17c9a9fde4 pm_render [pr] (#7430)
* pm_render [pr]

* test fixes

* use gep, not src

* ptx only symbolic, not sym

* move cast rules
2024-10-31 15:04:50 +08:00
George Hotz
7039fba406 move indexing first (#7409)
* move indexing first [pr]

* no create gate

* fix create_gate

* fix load/store folding

* fix index folding

* remove comment, no process replay
2024-10-31 00:50:35 +08:00
George Hotz
133fe81cc5 Revert "Revert "move up migrate + new gated fold (#7403)" (#7406)" (#7407)
* Revert "Revert "move up migrate + new gated fold (#7403)" (#7406)"

This reverts commit ea5654a9bc.

* test padded in emulation too

* bring back early folding
2024-10-30 23:25:45 +08:00
chenyu
ea5654a9bc Revert "move up migrate + new gated fold (#7403)" (#7406)
This reverts commit adccfade7f.
2024-10-30 23:02:18 +08:00
George Hotz
adccfade7f move up migrate + new gated fold (#7403)
* move up migrate + new gated fold [pr]

* vcount for const ptr

* move those rules there

* fix openpilot
2024-10-30 22:14:01 +08:00
George Hotz
2cfc7b6695 Index everywhere 2 (#7363)
* indexing everywhere [pr]

* fix tests
2024-10-29 19:29:40 +08:00
George Hotz
0af1212164 use assertEqual with new style uops [pr] (#7360) 2024-10-29 18:43:21 +08:00
George Hotz
aadf688aeb order flipper as *normal* rewrite rule (#7300)
* instant isn't actually used [pr]

* order flipper as *normal* rewrite rule

* fix inf loop

* need simplify now
2024-10-25 21:28:30 +08:00
George Hotz
9a3d498d9c with commutative hack, uops can change. fix that (#7266)
* with commutative hack, uops can change. fix that

* simpler
2024-10-24 18:50:23 +08:00
George Hotz
63048ad880 don't recreate COMMUTATIVE the other way (#7255)
* don't recreate COMMUTATIVE the other way

* add shl and add passing test

* fix tests and move assignment to __new__

* that can stay there

* happy mypy
2024-10-24 14:38:29 +08:00
George Hotz
ded1b38b84 minor dtype cleanup [pr] (#7124)
* minor dtype cleanup [pr]

* use ptr() function
2024-10-17 17:41:23 +08:00
George Hotz
3169cb386d remove graph [pr] (#7085) 2024-10-16 11:40:07 +08:00
George Hotz
85a45164fb remove pyint [pr] (#7016)
* remove pyint

* bump time on tp [pr]

* dont truncate in const fold

* remove dead code

* Revert "dont truncate in const fold"

This reverts commit 29c81db0f7.

* remove define_var
2024-10-12 22:36:24 +08:00
George Hotz
e7a0ffe46a break out linearization [pr] (#6994) 2024-10-11 15:27:33 +08:00
George Hotz
c08521e823 minor cleanups from toonygrad (#6990) 2024-10-11 14:19:10 +08:00
qazal
20d3c2d113 unify UOps.SHAPETRACKER and UOps.SWIZZLE with UOps.VIEW (#6955)
* add UOps.VIEW

* update hardcoded asts

* update sops.gz
2024-10-09 02:00:17 +08:00
qazal
4a4aa69b84 add a better dedup test for DEFINE_VAR with CONST arg (#6813) 2024-09-30 15:43:55 +08:00
George Hotz
eaa1e0eeeb rename constant_folder to sym [run_process_replay] (#6780) 2024-09-27 14:54:54 +08:00
George Hotz
e945fa9c5c put local on the PtrDtype [run_process_replay] (#6656)
* put local on the PtrDtype [run_process_replay]

* those are local too
2024-09-23 10:29:17 +08:00
qazal
391d14438e DEFINE_VAR prereqs for VALID [run_process_replay] (#6637) 2024-09-21 13:28:39 +08:00
George Hotz
ffce3ed896 add some new rules (#6555)
* add some new rules

* fix that

* non controversial
2024-09-17 13:59:55 +08:00
George Hotz
a2239c812e minimum new style expand (#6534)
* minimum new style expand [run_process_replay]

* float4 folding works

* fix uop graph

* if means or

* dype.count idx overload

* fix test arange

* expand nope

* fix expand contract

* fix amd tensor core

* oh, that's a good test with a real failure

* remove prints

* early reduce

* tomorrow, we remove sorted on expand args

* fix wmma issue

* that makes test_arange pass

* vectorized folding

* no check

* broadcast

* fix clang with self assign rule
2024-09-17 13:02:41 +08:00
George Hotz
a532d59bbd gep tuple [run_process_replay] (#6495)
* gep tuple [run_process_replay]

* no inf loop, that goes in expander

* fix ops python

* unbreak gep 0

* fix tests

* fix tests

* VECTORIZE/GEP

* oops, broken
2024-09-12 16:37:31 +08:00
George Hotz
6dfa63cb21 more vconst stuff + gep tuple [run_process_replay] (#6494)
* more vconst stuff [run_process_replay]

* revert that

* fix inf loop
2024-09-12 14:58:14 +08:00
George Hotz
119b0ea4af add UOps.VCONST [run_process_replay] (#6487)
* add UOps.VCONST [run_process_replay]

* VCONST folding

* simpler devectorize

* alu

* revert that type
2024-09-12 14:03:39 +08:00
George Hotz
76487a3533 remove nop, use upat [run_process_replay] (#6489)
* remove nop, use upat [run_process_replay]

* mypy passes

* no wonder nothing worked

* fixes
2024-09-12 12:16:19 +08:00
George Hotz
bdd0c06f29 add void type to uop (#6471)
* unwrap_dtype maybe

* uopgraph stuff that hardcoded None

* test_ops passes

* dtypes.py fixups

* update test_linearizer and friends

* more ast updates

* test_beam and test_schedule too

* add void type to uop [run_process_replay]

* remove dumb casts

* start making it green

* more cast cleanups

* more cls methods to fix

* regenerate dataset

* split UOp and NOp const

* maybe that too

* fix docs

* update test_uop_symbolic

* test_verify_ast

* new sops with no diff

* meh, type_ignore is alright

* remove that assert

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-09-11 18:16:28 +08:00
George Hotz
90fb17304f put rewrite back in ops [run_process_replay] (#6421) 2024-09-09 13:53:51 +08:00
chenyu
26c5d8346a remove Variable from UOp.DEFINE_VAR (#6393)
now it's just arg = (expr as str, min as UOp.const, max as UOp.const)
2024-09-06 05:55:19 -04:00
chenyu
9ed2b8b818 fix DEFINE_VAR setup in test_uop_graph [run_process_replay] (#6392)
making sure arg always have 3 items
2024-09-06 05:32:12 -04:00
chenyu
9a9fea7b8c move DEFINE_VAR min/max from src to arg (#6388)
new arg is (Variable, min as CONST, max as CONST)
2024-09-06 05:01:02 -04:00
George Hotz
c88329244b create rewrite.py [run_process_replay] (#6379)
* create rewrite.py [run_process_replay]

* fix tests

* not in rewrite or ops

* skip flaky test
2024-09-06 10:51:01 +08:00
George Hotz
66e7e51c79 Revert beam failure (#6376)
* Revert "late gate creation for STORE [run_process_replay] (#6373)"

This reverts commit c26744de9f.

* Revert "gated store rewrite to UOps.IF (#5976)"

This reverts commit 48061e8400.
2024-09-06 09:36:44 +08:00
qazal
c26744de9f late gate creation for STORE [run_process_replay] (#6373) 2024-09-06 03:32:19 +08:00
Ian Paul
48061e8400 gated store rewrite to UOps.IF (#5976)
* Core change to gate stores in IFs

* Updates to cstyle renderer to handle IFs around STOREs

* Make uops asserts happy

* Add tests and fix newly broken tests

* make ruff happy

* make mypy happy

* Simplify renderer to have all gated stores use IF

* Revert some changes

* Make test_where_fold happy

* Revert unnecessary handling of ifs rendering. Was included before when changes weren't fully built out

* Rewrite graph to have IFs be dependent on RANGEs if STORE is already dependent on RANGE

* Re-change broken test

* Make ifs be grouped together

* get non-merged IFs working. ALl tests pass except grouping related ifs together

* Fix tests by making the IF UOp dependent on the correct node of the STORE UOp

* Changes to uopgraph

* Simplify graph rewrite logic

* Changes to get test_padto_where_multireduce working

* Simplify uops.store renderer

* Make test_padto_where_multireduce pass but now other tests fail

* Clean up uopgraph from scrach work

* Ignore sudo IF srcs when rendering

* Attempt to fix llvm tests

* rm comment

* reduce lines

* Add line to make mypy happy :(

* llvmir fix pt 1

* Mods after rebasing to master

* Fix llvmir

* Fix ptx tests

* Fix other ptx tests

* Move changes from uops.py to ops.py

* rm uops.py

* Fix TestGateStoreRewrite tests

* Get multireduce tests working

* reset to remote branch

* Fix linearizer tests

* uop_graph test patch

* Add comment to create_gate

* hotfix: uncomment those tests

* Attempt to fix ptx tests by including whitespace inside if block

* Patch from remote tinybox. Tests passing here

* Min changes to get some ptx tests passsing

* Changes after rebase

* Exclude ifs and endifs from ptx

* IF conditional branching within ptx

* Save lines on delete_redundant_gates

* Simplify merge_gates

* rm noqa

* Remove unnecessary checks when merging gates

* Fix ops error msg

* Smarter check for if/endif in llvmir

* simplify delete redundant gates to only have 2 returns

* spacing

* Smarter check at beginning of merge_gates

* patches from comments

* Remove need for merge_gates

* include proper srcs in IF from the get-go

* test expand ifs dumb will result in 4 ifs, not 1 now

* Make tests happy

* Fix uops stats

* rm merge_gates method. Will add back in separate PR

* Spacing

* cleaner error msg

* Fix uops rendering when expanding. test_failure_43

* patch tests

* undo changes in delete_redundant_gates

* process replay attempt

* re-intro deletion of redundant gates

* fix addition of gates when they get nested in stores and loads

* patch tests

* smarter init of IF srcs when adding gate to STORE

* make ruff happy

* Resp to comment

* include all src[2]'s srcs in IF for gated store

* add reference of the storing value to the gate's src

* minor patch after rebasing

* change ptx renderer

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-09-06 01:05:30 +08:00
George Hotz
4a51c28ee7 switch const to const_like [run_process_replay] (#6356)
* const like

* no more _const

* missed one

* mypy ops.py

* file missing

* const_like

* fix image and test uop graph [run_process_replay]

* fix ptx
2024-09-05 13:57:54 +08:00
qazal
e7f6b654ad cleanup uop eq asserts for swizzle [run_process_replay] (#6362)
* cleanup uop eq asserts for swizzle [run_process_replay]

* more stuff
2024-09-05 13:36:36 +08:00
George Hotz
385904526f remove more rules [run_process_replay] (#6326)
* remove more rules [run_process_replay]

* disable invalid test

* ptx needs that str
2024-08-29 16:27:10 -07:00
qazal
bcb2f1caa3 init REDUCE_AXIS with BinaryOps (#6256)
* REDUCE_AXIS arg with BinaryOps

* more work in kernel.py
fixup sops.gz

* fix TestGraphRewriteEfficiency
2024-08-24 11:28:41 +03:00
George Hotz
53a73038e3 hotfix: TestGraphRewriteEfficiency.test_create_many_uops 2024-08-23 15:51:57 -07:00
George Hotz
238896ca02 loooking into graph rewrite speed (#6239)
* loooking into graph rewrite speed

* track, replace is slow

* if all same, no permutations [run_process_replay]

* types so compile works

* no implied comprehension

* TRACK_MATCH_STATS=2
2024-08-22 13:17:55 -07:00
George Hotz
16f420f7a7 split full_graph_rewrite and linearize_uop [run_process_replay] (#6215)
* split full_graph_rewrite and linearize_uop

* fix tests

* graph rewrite in test uops

* add types
2024-08-20 20:12:33 -07:00
George Hotz
912f01ed4b UOpGraph -> linearize_uop [run_process_replay] (#6119) 2024-08-16 19:48:39 -07:00
George Hotz
74ee9febec remove iter from uopgraph (#6110)
* remove iter from uopgraph

* linearize returns uops

* fix tests

* linearize in linearize

* tests fix

* touchup

* test failures
2024-08-16 15:58:29 -07:00
qazal
28c75bf2a6 merge uops with ops (#6111)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-08-16 18:17:57 -04:00
chenyu
9ef82e1f2b UOp pattern DEFINE_VAR with min==max is also CONST (#6095)
* UOp pattern DEFINE_VAR with min==max is also CONST

* fix tests
2024-08-15 12:09:44 -04:00
chenyu
5accfe26a0 rewrite bool ADD to OR and MUL to AND (#6084)
* rewrite bool ADD to OR and MUL to AND

fixed running `tinyphysics.onnx`, which contains a getitem from a boolean tensor.

only can repro through BEAM_COMPARE, which i think is a different bug in test_linearizer_failure

* fold those, and fix tests

* only for bool

* move dtypes.bool
2024-08-15 10:11:57 -04:00