chenyu
794796256c
UOp.const_factor [run_process_replay] ( #5945 )
...
* UOp.const_factor [run_process_replay]
simplify mod and div folding
* test does not work now
2024-08-06 18:18:29 -04:00
George Hotz
8d1c884e78
capture the const pattern in both directions ( #5919 )
...
* capture the const pattern in both directions
* add regression test
2024-08-05 12:15:38 -07:00
George Hotz
d7387d31bf
remove useless reduce cases [run_process_replay] ( #5907 )
...
* remove useless reduce cases [run_process_replay]
* do_reduce cleanup
* more cleanups + no longer supported tests
* Revert "more cleanups + no longer supported tests"
This reverts commit e9f2f6ba70.
* no longer supported tests
* switch ReduceOps.SUM -> BinaryOps.ADD
2024-08-04 17:11:08 -07:00
George Hotz
be8958e26b
use CONTRACT before REDUCE ( #5903 )
...
* use CONTRACT before REDUCE [run_process_replay]
* support half expand
* EXPAND GEP
2024-08-04 16:17:33 -07:00
chenyu
d5de44340e
UOp add mod folding ( #5862 )
...
* UOp add mod folding
* that passes now
2024-08-02 18:31:46 -04:00
George Hotz
877e0b4ba0
define global only has the index [run_process_replay] ( #5869 )
...
* define global only has the index [run_process_replay]
* fix that linearizer test
* fix ptx
* stupid ptx fix
2024-08-01 19:01:15 -07:00
qazal
ed556c260e
UOps.IF rules more tests ( #5831 )
...
* init tests
* split tests
* assert multiple gates simplicity
2024-07-31 00:11:02 -04:00
chenyu
c3da458bc3
UOp if min==max folds to CONST ( #5828 )
...
* UOp if min==max folds to CONST
* fix test
2024-07-30 22:14:22 -04:00
George Hotz
693990a346
swap src[2] and src[3] in load [run_process_replay] ( #5821 )
...
* swap src[2] and src[3] in load [run_process_replay]
* cleanups + bugfix
* fix ptx
2024-07-30 14:04:13 -07:00
George Hotz
17a2f74412
new style load/store folder ( #5784 )
...
* remove old index reorder
* new style folder
* works better
* dedup
* one failure
* this is fine now...
* expander_rewrite
* images broken, but all else should work
* cleanups
* make tests work with old
* fix images
* cleanups + bugfix
* minor fixes
* fix gated store folding
* flip gate_creator and expander
* fix gated store
* remove unneeded rules
* lines getting close
* line count good
2024-07-30 13:17:20 -07:00
George Hotz
76d191ab94
move consts to end of add ( #5783 )
...
* move consts to end of add
* better
* fix infinite loop
2024-07-28 17:38:57 -07:00
qazal
95dda8dadf
more unmatching vectorize/gep asserts [run_process_replay] ( #5760 )
...
* merge vectorize/gep rules [run_process_replay]
* assert dtypes
* src=
* float2=(float4.x,float4.y)
2024-07-28 15:08:54 +08:00
kormann
b0c1dba299
named UOp class "NOP" [run_process_replay] ( #5728 )
...
* NOP
* fix const + simplify compile
* rm VAR for NOOP
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-07-26 13:25:53 -07:00
George Hotz
fa14f7b4fd
switch contract arg to match expand arg [run_process_replay] ( #5667 )
...
* switch contract arg to match expand arg [run_process_replay]
* support multiaxis contract too, it's easy
* cancel contract/expand
2024-07-23 18:08:33 -07:00
George Hotz
a85493bdbe
multiaxis contract test
2024-07-23 15:09:15 -07:00
chenyu
16c27ae400
update UOp.SPECIAL arg spec [run_process_replay] ( #5661 )
...
* update UOp.SPECIAL arg spec [run_process_replay]
from `(0, "gid0", 4)` to just `("gid0", 4)`. closer to a Variable
* fix ptx
2024-07-23 16:58:12 -04:00
chenyu
24505199fb
UOp.const(x.dtype, y) -> x.const(y) [run_process_replay] ( #5642 )
2024-07-22 17:09:40 -04:00
chenyu
b991097d41
move UPat and PatternMatcher from uopgraph.py to uops.py ( #5597 )
...
* move UPat and PatternMatcher from uopgraph.py to uops.py
towards instant UOps rewrite on UOp.alu
[run_process_replay]
* fix imports
2024-07-19 19:28:24 -04:00
kormann
2c4add6844
pretty print lazy op per default ( #5505 )
...
* pretty lop
* min diff
* walrus
* fix
* min diff
* simplify
* pretty helper function
* ws
* pretty uop upat
* tests
* stricter tests
* test passes
* ws
* stronger upat test
* delete print_tree
* min diff
* stricter exp test
* fix merge
* stronger uops eval test
* +readable and deep upat test
* +readable and deep upat test
* sort inv fix
* fix
* revert allowed_len
2024-07-18 09:34:08 -07:00
George Hotz
1242b302fa
expand UOps with rewrite rules ( #5501 )
...
* expand UOps with rewrite rules [run_process_replay]
* progress
* much closer
* close, way less bugs
* bunch of expander tests
* fix contract
* ops tests pass
* fix barrier
* mostly passing
* bitcast in expanded ops
* support more expand merges
* all tests pass maybe
* fix empty EXPAND
* fix LIN fuzzing
* add ALL_SAME assert
* all same
* all same work
* raise CompileError
* pass fuzz linearizer
* revert whitespace
* fix nv tensor core test
* fix mypy
* bug fix
* fuzzer passes
* put tests back
* expand arg to idx
2024-07-17 10:17:50 -07:00
George Hotz
158221b36b
expand tests from uop_expander [run_process_replay] ( #5524 )
...
* expand tests from uop_expander
* more changes from the branch
2024-07-17 09:22:36 -07:00
qazal
0b3a34e3b1
vectorize folding [run_process_replay] ( #5470 )
...
* test_gep_vec_fold
* remove that
* fix process replay
* lint
2024-07-14 09:41:48 +03:00
George Hotz
d13654a820
move uopgraph to file [run_process_replay] ( #5364 )
...
* move uopgraph to file [run_process_replay]
* fix print tree test
2024-07-10 17:34:50 -07:00
kormann
2349d837fb
Fix scope order in graph toposort [run_process_replay] ( #5330 )
...
* fix
* test
* nothing
2024-07-08 11:46:15 -07:00
greg-niemeyer
77b2ce9fc9
Add UOps.VECTORIZE [run_process_replay] ( #5289 )
...
* Add UOps.VECTORIZE to core
* Update vectorized cast tests
* Addresses code review comments
- Removes VECTORIZE from LLVMRenderer
- Add line breaks to unduly long lines
- Add noop CAST rule back
- Update asserts and add render_vectorize in
CStyleLanguage renderer
* Add missing const folding rule for VECTORIZE
Also adds corresponding test
* Fixes test_const_vectorize_fold and add assert
- Use sane types with VECTORIZE in test_const_vectorize_fold
- Add assert that sanity checks the types for VECTORIZE
* Rename test_cast_vectorized_fold
Renames test_cast_vectorized_fold to test_noop_vectorize_fold
because the test targets a very specific rule and there are
other tests for VECTORIZE.
* Revert unrelated changes
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
Co-authored-by: qazal <qazal.software@gmail.com>
2024-07-07 09:59:57 +03:00
qazal
1cefbb33ab
uop graph tests + type_verify cleanup ( #5292 )
...
* test_cast_alu_fold
* test_double_cast_fold + these should assert
2024-07-05 13:00:01 +03:00
qazal
f374fb77af
assert bool dtype for valid [run_process_replay] ( #5214 )
...
* valid is always bool
* prevent NumNode to begin with
* part 2
* test: disable pattern matchers, asserts should pass
* test: store without cast
* test: if (0)
* cleanup time
* only pattern match bool literal
* better for upstream debug
2024-06-29 21:20:32 +03:00
kormann
6c456b6d66
remove uopgraph dedup + slight speedup ( #5199 )
...
* rm dedup
* rm dedup
* tests
* reduce diff
* oups
* reduce diff
* rm UOp.tuple
2024-06-28 09:26:32 -07:00
George Hotz
345bcc2099
move graph_dedup out of class [run_process_replay] ( #5197 )
2024-06-27 12:04:00 -07:00
George Hotz
d094a6828f
single pass rewrite ( #5159 )
...
* single pass rewrite
* claude cleanups
* claude cleanups
* skip those tests
* restrict that to ints
* comment
* asserts i don't expect to fail do fail
* simplest...rewrite...ever
* simplest...rewrite...ever
* add that rule back
* tests pass?
* only collapse reduce loops
* second SHL/SHR arg must be 4 bytes
* fix verify
* no SHL/SHR in ptx
* put that back
* skip them in PTX...bad tests
2024-06-27 11:36:05 -07:00
qazal
24c89a2a33
move assert_equiv_uops to helpers + use == for dtypes ( #5067 )
...
* dtypes should use ==
* use TestUOps
* should use assertIs
2024-06-20 16:39:34 +03:00
qazal
55e02cdd84
generic gate folding ( #5061 )
...
* add assert
* fold truthy gates [run_process_replay]
* fold falsy gates [run_process_replay] [no_assert]
* redo asserts
* check both barriers
* spec start
* spec end
* assert srcs
* make test_fold_gated_load_local better
* [run_process_replay] [no_assert]
2024-06-20 16:10:08 +03:00
kormann
7c3b877216
rename uop [run_process_replay] ( #5031 )
...
* rename
* fix unittests
* rename vin
* fix test
* fix type [run_process_replay]
* rm pre commit hook change
2024-06-18 21:34:05 +03:00
George Hotz
63a8add2c2
move uops add logic to linearize ( #4952 )
...
* move logic to linearize
* idk how this should work
* empty
2024-06-14 03:52:37 -07:00
George Hotz
9823752397
make uops.add private ( #4950 )
...
* make uops.add private
* modernize all tests
2024-06-14 03:23:25 -07:00
chenyu
3afc914617
CMPEQ -> CMPNE and make it safe to pad ( #4818 )
...
* CMPNE
* new dataset
2024-06-03 18:02:15 -04:00
qazal
0e69b22629
multireduce OptOps tests (start) ( #4733 )
...
* start
* full tests
* add skips
* unrelated
* notes
2024-05-27 12:21:33 +03:00
chenyu
eb714a600d
fix UOps.CAST noop for vectorized dtypes ( #4704 )
...
* ==
* add test
* not lazyop
* use str comparison for PtrDType
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-05-23 17:33:29 -04:00
George Hotz
07b350a8f4
new uops is an actual graph ( #4560 )
...
* new uops is an actual graph
* it's way slower
* simpler
* fix define acc
* render_loop unique
* ops test pass
* add pattern matcher back, there's bugs
* rewrite
* use priority queue
* recursive children
* fix tests
* fix tests with SINK
* fix abstractions
* fix assembly
* simpler
* link define_acc
* fix DEFINE_ACC placement
* type verify
* full cmp
* fix cmp
* ACCESS_ACC
* insert DEFINE_ACC
* fix PHI
* recursive rewrite
* fix many tests
* sum collapse
* more patterns
* correct change
* fold arange
* fix that lin test
* space
* big folding rule works
* close
* has more maxes, meh
* cached node replace
* set changed
* simplest folding yet
* works
* works
* DIV
* all tests pass
* del
* fuzz linearizer fails
* sum_collapse
* test depth 2 cf
* fix lin test 14
* fix clang depth
* disable that
* failure 14 is fixed
* fix ptx
* failure 27 is fixed
* fix llama
* run_cnt
* Revert "Optimize PTX gated loads index calculation (#4304)"
This reverts commit d97d5a7689.
* fix uops loop
* fix ptx bugs
* add barrier
* print
* mem_type in ptx direct
* bypass tests that fail in CI but pass locally
* ptx remove ptr_ar
* more ptx passing
* fix ptx tests
* assert compile support
* remove model inference benchmark from red
2024-05-17 18:00:18 -07:00
qazal
267bbb57f9
Revert "Add insert_before to Linearizer Functions ( #4320 )" ( #4421 )
...
This reverts commit 664b563c91.
2024-05-04 17:50:21 +03:00
Timmy
664b563c91
Add insert_before to Linearizer Functions ( #4320 )
...
* adding insert_before to linearizer functions
* uop insert_before test case
* formatting
* more formatting
* more formatting
* syntax
* removing self.cast
* addressing err
* removing noqa s
2024-04-28 11:38:36 -04:00
George Hotz
2024b24f35
add some graph tests ( #3702 )
...
* add some graph tests
* PatternMatcher class
* speedup
* const cast test
* fix tests
* itertools chain
2024-03-12 09:49:47 -07:00