Commit Graph

75 Commits

Author SHA1 Message Date
George Hotz
e945fa9c5c put local on the PtrDtype [run_process_replay] (#6656)
* put local on the PtrDtype [run_process_replay]

* those are local too
2024-09-23 10:29:17 +08:00
qazal
391d14438e DEFINE_VAR prereqs for VALID [run_process_replay] (#6637) 2024-09-21 13:28:39 +08:00
George Hotz
ffce3ed896 add some new rules (#6555)
* add some new rules

* fix that

* non controversial
2024-09-17 13:59:55 +08:00
George Hotz
a2239c812e minimum new style expand (#6534)
* minimum new style expand [run_process_replay]

* float4 folding works

* fix uop graph

* if means or

* dype.count idx overload

* fix test arange

* expand nope

* fix expand contract

* fix amd tensor core

* oh, that's a good test with a real failure

* remove prints

* early reduce

* tomorrow, we remove sorted on expand args

* fix wmma issue

* that makes test_arange pass

* vectorized folding

* no check

* broadcast

* fix clang with self assign rule
2024-09-17 13:02:41 +08:00
George Hotz
a532d59bbd gep tuple [run_process_replay] (#6495)
* gep tuple [run_process_replay]

* no inf loop, that goes in expander

* fix ops python

* unbreak gep 0

* fix tests

* fix tests

* VECTORIZE/GEP

* oops, broken
2024-09-12 16:37:31 +08:00
George Hotz
6dfa63cb21 more vconst stuff + gep tuple [run_process_replay] (#6494)
* more vconst stuff [run_process_replay]

* revert that

* fix inf loop
2024-09-12 14:58:14 +08:00
George Hotz
119b0ea4af add UOps.VCONST [run_process_replay] (#6487)
* add UOps.VCONST [run_process_replay]

* VCONST folding

* simpler devectorize

* alu

* revert that type
2024-09-12 14:03:39 +08:00
George Hotz
76487a3533 remove nop, use upat [run_process_replay] (#6489)
* remove nop, use upat [run_process_replay]

* mypy passes

* no wonder nothing worked

* fixes
2024-09-12 12:16:19 +08:00
George Hotz
bdd0c06f29 add void type to uop (#6471)
* unwrap_dtype maybe

* uopgraph stuff that hardcoded None

* test_ops passes

* dtypes.py fixups

* update test_linearizer and friends

* more ast updates

* test_beam and test_schedule too

* add void type to uop [run_process_replay]

* remove dumb casts

* start making it green

* more cast cleanups

* more cls methods to fix

* regenerate dataset

* split UOp and NOp const

* maybe that too

* fix docs

* update test_uop_symbolic

* test_verify_ast

* new sops with no diff

* meh, type_ignore is alright

* remove that assert

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-09-11 18:16:28 +08:00
George Hotz
90fb17304f put rewrite back in ops [run_process_replay] (#6421) 2024-09-09 13:53:51 +08:00
chenyu
26c5d8346a remove Variable from UOp.DEFINE_VAR (#6393)
now it's just arg = (expr as str, min as UOp.const, max as UOp.const)
2024-09-06 05:55:19 -04:00
chenyu
9ed2b8b818 fix DEFINE_VAR setup in test_uop_graph [run_process_replay] (#6392)
making sure arg always have 3 items
2024-09-06 05:32:12 -04:00
chenyu
9a9fea7b8c move DEFINE_VAR min/max from src to arg (#6388)
new arg is (Variable, min as CONST, max as CONST)
2024-09-06 05:01:02 -04:00
George Hotz
c88329244b create rewrite.py [run_process_replay] (#6379)
* create rewrite.py [run_process_replay]

* fix tests

* not in rewrite or ops

* skip flaky test
2024-09-06 10:51:01 +08:00
George Hotz
66e7e51c79 Revert beam failure (#6376)
* Revert "late gate creation for STORE [run_process_replay] (#6373)"

This reverts commit c26744de9f.

* Revert "gated store rewrite to UOps.IF (#5976)"

This reverts commit 48061e8400.
2024-09-06 09:36:44 +08:00
qazal
c26744de9f late gate creation for STORE [run_process_replay] (#6373) 2024-09-06 03:32:19 +08:00
Ian Paul
48061e8400 gated store rewrite to UOps.IF (#5976)
* Core change to gate stores in IFs

* Updates to cstyle renderer to handle IFs around STOREs

* Make uops asserts happy

* Add tests and fix newly broken tests

* make ruff happy

* make mypy happy

* Simplify renderer to have all gated stores use IF

* Revert some changes

* Make test_where_fold happy

* Revert unnecessary handling of ifs rendering. Was included before when changes weren't fully built out

* Rewrite graph to have IFs be dependent on RANGEs if STORE is already dependent on RANGE

* Re-change broken test

* Make ifs be grouped together

* get non-merged IFs working. ALl tests pass except grouping related ifs together

* Fix tests by making the IF UOp dependent on the correct node of the STORE UOp

* Changes to uopgraph

* Simplify graph rewrite logic

* Changes to get test_padto_where_multireduce working

* Simplify uops.store renderer

* Make test_padto_where_multireduce pass but now other tests fail

* Clean up uopgraph from scrach work

* Ignore sudo IF srcs when rendering

* Attempt to fix llvm tests

* rm comment

* reduce lines

* Add line to make mypy happy :(

* llvmir fix pt 1

* Mods after rebasing to master

* Fix llvmir

* Fix ptx tests

* Fix other ptx tests

* Move changes from uops.py to ops.py

* rm uops.py

* Fix TestGateStoreRewrite tests

* Get multireduce tests working

* reset to remote branch

* Fix linearizer tests

* uop_graph test patch

* Add comment to create_gate

* hotfix: uncomment those tests

* Attempt to fix ptx tests by including whitespace inside if block

* Patch from remote tinybox. Tests passing here

* Min changes to get some ptx tests passsing

* Changes after rebase

* Exclude ifs and endifs from ptx

* IF conditional branching within ptx

* Save lines on delete_redundant_gates

* Simplify merge_gates

* rm noqa

* Remove unnecessary checks when merging gates

* Fix ops error msg

* Smarter check for if/endif in llvmir

* simplify delete redundant gates to only have 2 returns

* spacing

* Smarter check at beginning of merge_gates

* patches from comments

* Remove need for merge_gates

* include proper srcs in IF from the get-go

* test expand ifs dumb will result in 4 ifs, not 1 now

* Make tests happy

* Fix uops stats

* rm merge_gates method. Will add back in separate PR

* Spacing

* cleaner error msg

* Fix uops rendering when expanding. test_failure_43

* patch tests

* undo changes in delete_redundant_gates

* process replay attempt

* re-intro deletion of redundant gates

* fix addition of gates when they get nested in stores and loads

* patch tests

* smarter init of IF srcs when adding gate to STORE

* make ruff happy

* Resp to comment

* include all src[2]'s srcs in IF for gated store

* add reference of the storing value to the gate's src

* minor patch after rebasing

* change ptx renderer

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-09-06 01:05:30 +08:00
George Hotz
4a51c28ee7 switch const to const_like [run_process_replay] (#6356)
* const like

* no more _const

* missed one

* mypy ops.py

* file missing

* const_like

* fix image and test uop graph [run_process_replay]

* fix ptx
2024-09-05 13:57:54 +08:00
qazal
e7f6b654ad cleanup uop eq asserts for swizzle [run_process_replay] (#6362)
* cleanup uop eq asserts for swizzle [run_process_replay]

* more stuff
2024-09-05 13:36:36 +08:00
George Hotz
385904526f remove more rules [run_process_replay] (#6326)
* remove more rules [run_process_replay]

* disable invalid test

* ptx needs that str
2024-08-29 16:27:10 -07:00
qazal
bcb2f1caa3 init REDUCE_AXIS with BinaryOps (#6256)
* REDUCE_AXIS arg with BinaryOps

* more work in kernel.py
fixup sops.gz

* fix TestGraphRewriteEfficiency
2024-08-24 11:28:41 +03:00
George Hotz
53a73038e3 hotfix: TestGraphRewriteEfficiency.test_create_many_uops 2024-08-23 15:51:57 -07:00
George Hotz
238896ca02 loooking into graph rewrite speed (#6239)
* loooking into graph rewrite speed

* track, replace is slow

* if all same, no permutations [run_process_replay]

* types so compile works

* no implied comprehension

* TRACK_MATCH_STATS=2
2024-08-22 13:17:55 -07:00
George Hotz
16f420f7a7 split full_graph_rewrite and linearize_uop [run_process_replay] (#6215)
* split full_graph_rewrite and linearize_uop

* fix tests

* graph rewrite in test uops

* add types
2024-08-20 20:12:33 -07:00
George Hotz
912f01ed4b UOpGraph -> linearize_uop [run_process_replay] (#6119) 2024-08-16 19:48:39 -07:00
George Hotz
74ee9febec remove iter from uopgraph (#6110)
* remove iter from uopgraph

* linearize returns uops

* fix tests

* linearize in linearize

* tests fix

* touchup

* test failures
2024-08-16 15:58:29 -07:00
qazal
28c75bf2a6 merge uops with ops (#6111)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-08-16 18:17:57 -04:00
chenyu
9ef82e1f2b UOp pattern DEFINE_VAR with min==max is also CONST (#6095)
* UOp pattern DEFINE_VAR with min==max is also CONST

* fix tests
2024-08-15 12:09:44 -04:00
chenyu
5accfe26a0 rewrite bool ADD to OR and MUL to AND (#6084)
* rewrite bool ADD to OR and MUL to AND

fixed running `tinyphysics.onnx`, which contains a getitem from a boolean tensor.

only can repro through BEAM_COMPARE, which i think is a different bug in test_linearizer_failure

* fold those, and fix tests

* only for bool

* move dtypes.bool
2024-08-15 10:11:57 -04:00
chenyu
df03dca6e3 move % inside UOp mod_folding and remove deprecated tests (#6085)
[run_process_replay]
2024-08-14 23:25:10 -04:00
qazal
9145ad52ff revert UOps eq, this needs to be isolated in realize.py (#6063)
This reverts commit dccca7f227.
2024-08-13 18:02:34 +03:00
qazal
dccca7f227 test: uop and lazyop have the same compare (#6053)
* test: uop and lazyop have the same compare

* typings

* self.assert_equiv_uops -> assertEqual

* hash dtype

* test nop too

* TestPatternMatcher never used this compare anyway

* nop eq and ne tests
2024-08-13 00:33:19 +03:00
gswangg
df44a4e861 Make vectorization of CONST explicit (#5322)
* remove test_const_vectorize_fold

* remove const folding UPat for VECTORIZE

* refactor cstyle render_const

* remove calls to dtype.scalar() in render_const

* add assert

* add vectorized const to UOp.const

* add UPat GEP-VECTORIZE-CONST -> CONST

* render_vectorize for DEFINE_ACC in cstyle

* add back missing render_cast in render_const

* generate vectorized consts as UOps for DEFINE_ACC

* update asserts for DEFINE_ACC with VECTORIZE src

* add UPats for PHI with VECTORIZE src

* use prev rendered vectorize in DEFINE_ACC render

* update DEFINE_ACC in python runtime

* update vectorized DEFINE_ACC in PTXRenderer

* rebase DEFINE_ACC changes on lowerer

* verbose rewrite of bad UPats

* simplify UOps.CONST implementation in ops_python

* update sum_collapse UPats for DEFINE_ACC-VECTORIZE

* revert linearizer to TOT

* fix DEFINE_ACC implementation in ops_python

* simplify DEFINE_ACC in cstyle

* Fix linter error

* support VECTORIZE in fold gated load/store UPat

* support VECTORIZE in other fold gated load UPats

* rewrite VECTORIZE in UPat for no input DEFINE_ACC

* simplify DEFINE_ACC render in cstyle

* make VECTORIZE rules more concise

* add more vectorize fold tests

* inline VECTORIZE-CONSTs in cstyle render

* revert VECTORIZE/GEP rule refactor

* revert cstyle render_const refactor

* inline VECTORIZE-CONSTs in cstyle render

* implicitly vectorized const rendering -> explicit

* WMMA VECTORIZE CONST process replay hacks

* VECTORIZE CONST NAN process_replay hacks

* more VECTORIZE CONST NAN hacks

* cleanup process_replay hacks

* isnan() -> not isfinite() cstyle VECTORIZE CONST

* tweak isnan and isfinite checks VECTORIZE CONST

* tweak for positive vs negative infinity VECTORIZE CONST

* add assert to PTX CONST render

* process_replay VECTORIZE CONST render parity for PTX STORE

* vmin/vmax for VECTORIZE'd CONST

* update WMMA folding rules

* add tests for WMMA VECTORIZE fold

* hack for cstyle half4 CONST zero process_replay parity

* revert PTX backend changes

* add back minimal DEFINE_ACC PTX change

* remove cstyle process_replay hacks

* remove dead code in PTX CONST render

* cleanup vmin/vmax logic for VECTORIZE'd CONSTs

* update vectorize fold tests to use DEFINE_VAR

* fix long line formatting in test

* remove unwanted merge artifact

* more vmin/vmax cleanup

* remove unnecessary asserts

* yet more vmin/vmax cleanup

* get rid of explicit VECTORIZE CONST logic in _min_max

* reuse CONST instead of creating a new one

* remove unneeded cast

* handle DType correctly in sconst

* improve readability of tests

* save a line

* save another line

* tuplize pats in src

* remove GEP-VECTORIZE pats

* add vec +0 fold

* HACK: fold only vec8 +0

* remove vectorized ALU fold hack

---------

Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-08-08 20:59:05 +03:00
chenyu
794796256c UOp.const_factor [run_process_replay] (#5945)
* UOp.const_factor [run_process_replay]

simplify mod and div folding

* test does not work now
2024-08-06 18:18:29 -04:00
George Hotz
8d1c884e78 capture the const pattern in both directions (#5919)
* capture the const pattern in both directions

* add regression test
2024-08-05 12:15:38 -07:00
George Hotz
d7387d31bf remove useless reduce cases [run_process_replay] (#5907)
* remove useless reduce cases [run_process_replay]

* do_reduce cleanup

* more cleanups + no longer supported tests

* Revert "more cleanups + no longer supported tests"

This reverts commit e9f2f6ba70.

* no longer supported tests

* switch ReduceOps.SUM -> BinaryOps.ADD
2024-08-04 17:11:08 -07:00
George Hotz
be8958e26b use CONTRACT before REDUCE (#5903)
* use CONTRACT before REDUCE [run_process_replay]

* support half expand

* EXPAND GEP
2024-08-04 16:17:33 -07:00
chenyu
d5de44340e UOp add mod folding (#5862)
* UOp add mod folding

* that passes now
2024-08-02 18:31:46 -04:00
George Hotz
877e0b4ba0 define global only has the index [run_process_replay] (#5869)
* define global only has the index [run_process_replay]

* fix that linearizer test

* fix ptx

* stupid ptx fix
2024-08-01 19:01:15 -07:00
qazal
ed556c260e UOps.IF rules more tests (#5831)
* init tests

* split tests

* assert multiple gates simplicity
2024-07-31 00:11:02 -04:00
chenyu
c3da458bc3 UOp if min==max folds to CONST (#5828)
* UOp if min==max folds to CONST

* fix test
2024-07-30 22:14:22 -04:00
George Hotz
693990a346 swap src[2] and src[3] in load [run_process_replay] (#5821)
* swap src[2] and src[3] in load [run_process_replay]

* cleanups + bugfix

* fix ptx
2024-07-30 14:04:13 -07:00
George Hotz
17a2f74412 new style load/store folder (#5784)
* remove old index reorder

* new style folder

* works better

* dedup

* one failure

* this is fine now...

* expander_rewrite

* images broken, but all else should work

* cleanups

* make tests work with old

* fix images

* cleanups + bugfix

* minor fixes

* fix gated store folding

* flip gate_creator and expander

* fix gated store

* remove unneeded rules

* lines getting close

* line count good
2024-07-30 13:17:20 -07:00
George Hotz
76d191ab94 move consts to end of add (#5783)
* move consts to end of add

* better

* fix infinite loop
2024-07-28 17:38:57 -07:00
qazal
95dda8dadf more unmatching vectorize/gep asserts [run_process_replay] (#5760)
* merge vectorize/gep rules [run_process_replay]

* assert dtypes

* src=

* float2=(float4.x,float4.y)
2024-07-28 15:08:54 +08:00
kormann
b0c1dba299 named UOp class "NOP" [run_process_replay] (#5728)
* NOP

* fix const + simplify compile

* rm VAR for NOOP

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-07-26 13:25:53 -07:00
George Hotz
fa14f7b4fd switch contract arg to match expand arg [run_process_replay] (#5667)
* switch contract arg to match expand arg [run_process_replay]

* support multiaxis contract too, it's easy

* cancel contract/expand
2024-07-23 18:08:33 -07:00
George Hotz
a85493bdbe multiaxis contract test 2024-07-23 15:09:15 -07:00
chenyu
16c27ae400 update UOp.SPECIAL arg spec [run_process_replay] (#5661)
* update UOp.SPECIAL arg spec [run_process_replay]

from `(0, "gid0", 4)` to just `("gid0", 4)`. closer to a Variable

* fix ptx
2024-07-23 16:58:12 -04:00
chenyu
24505199fb UOp.const(x.dtype, y) -> x.const(y) [run_process_replay] (#5642) 2024-07-22 17:09:40 -04:00