5748 Commits

Author SHA1 Message Date
ignaciosica
3918f6eea0 refactor amd render_kernel (#6223)
* refactor amd render_kernel

* fix spacing

* add half alias back

* use itemsize * 8 instead of fixed values

* reverting because it broke, as 32 was no longer the default

* remove comment

* remove nested tuples

* hotfix: prefix.append

* hotfix2: is not None

* more diff cleanups

* hotfix 4: spacing changes must not be in the same diff

* revert wmma dtype rendering

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-08-27 00:28:36 +08:00
ignaciosica
3132449086 refactor _make_{cuda/clang}_dtype into render_vector_prefix (#6287) 2024-08-26 09:14:44 -07:00
Max-We
ab2714423b Add einsum tests (#6286)
Co-authored-by: Maximilian Weichart <maximilian.weichart@icloud.com>
2024-08-26 09:09:25 -07:00
chenyu
b76f0c875e lazy const fold idiv 1 (#6285) 2024-08-26 10:29:59 -04:00
chenyu
af7c04ff57 Tensor.__floordiv__ (#6283)
support Tensor.__floordiv__ and friends
2024-08-26 09:43:40 -04:00
qazal
d2f8eeed2e make [compare_schedule] the default [run_process_replay] (#6273)
* make [compare_schedule] the default

* capture ctx

* logging

* set capture to false
2024-08-26 21:40:03 +08:00
qazal
067aeaeb2f single arange fusion with graph rewrite (#6160) 2024-08-26 18:18:16 +08:00
qazal
b4381e9777 uop output_st is Optional [run_process_replay] (#6282) 2024-08-26 17:58:55 +08:00
qazal
1c0456af89 add UOps.SWIZZLE (#6271)
* add UOps.SWIZZLE

* flip swizzle init

* generic st_fixup
2024-08-26 16:08:51 +08:00
CaltropHungerton
002f60b4c3 fix intel wmma flop counting, add flop counting tests for different tensor cores (#6192)
* fix wmma flop counting on intel, add count tests

* half

* add half gemm

* Update test.yml

* one test

* Update test_uops_stats.py

* Update test_uops_stats.py

* Update test_uops_stats.py

* smaller matrix, use unittest skipUnless decorator
2024-08-25 18:37:05 -07:00
Tobias Fischer
331b0f5477 new clip gather (#6277) 2024-08-25 19:27:24 -04:00
qazal
f0cc8ca5f2 generic st_fixup in scheduler graph rewrite [compare_schedule] (#6278) 2024-08-25 11:02:17 +03:00
qazal
70015bd89c move permute_reduces to uop movementops [run_process_replay] (#6272) 2024-08-25 10:25:51 +03:00
chenyu
b86907c6c7 UOp.const(x.dtype, y) -> x.const(y) [run_process_replay] (#6276) 2024-08-24 21:39:50 -04:00
chenyu
00282afa41 identity element of binary ops (#6275)
helper for the number reduce acc is inited to (0 for ADD, 1 for MUL and -inf for MAX)
2024-08-24 18:10:19 -04:00
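The note above (0 for ADD, 1 for MUL, -inf for MAX) can be illustrated with a minimal sketch. This is plain Python, not the helper added in the commit; `IDENTITY`, `OPS`, and `reduce_with_identity` are hypothetical names chosen for illustration.

```python
import math
import operator
from functools import reduce

# Hypothetical tables for illustration only; not tinygrad's actual API.
# The identity element e of a binary op satisfies op(e, x) == x for every x,
# so it is a safe starting value for a reduce accumulator.
IDENTITY = {"ADD": 0, "MUL": 1, "MAX": -math.inf}
OPS = {"ADD": operator.add, "MUL": operator.mul, "MAX": max}

def reduce_with_identity(op, values):
    """Fold `values` with `op`, starting the accumulator at the op's identity."""
    return reduce(OPS[op], values, IDENTITY[op])
```

Starting the accumulator at the identity means an empty reduction returns the identity itself, and a non-empty one is unaffected by the initial value.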
qazal
ee245b48a9 refactor reduceop swizzling (prep for UOps.SWIZZLE) [compare_schedule] (#6269) 2024-08-24 18:17:19 +03:00
gswangg
3cf507ae7f remove extra.ops and LazyOp support from Kernel (#6267)
* remove extra.ops and BufferOps

* remove extra.ops and LazyOp support in Kernel
2024-08-24 16:44:38 +03:00
qazal
ccb05d8baa fixup neg tests [run_process_replay] (#6268) 2024-08-24 16:35:43 +03:00
gswangg
ea76b93814 migrate test_linearizer_dumb.py to UOp AST (#6241)
* add imports and update test_unmerged_ifs to UOp AST

* test_max_simplify_and_cancel

* test_expander_new_srcs

* test_llama_embedding

* test_unaligns_idxs

* test_unrolled_float4_align

* test_upcasted_stores_out_of_order

* remove LazyOp

* remove extra/ops and replace ReduceOps.SUM with BinaryOps.ADD
2024-08-24 16:27:29 +03:00
gswangg
e44653e25a migrate test_linearizer_failures.py to UOp AST (#6240)
* add imports and update test_failure_1 to UOp AST

* update test_failure_2 with UOp AST

* update test_failure_3

* test_failure_5

* test_failure_6

* test_failure_7

* test_failure_8

* test_failure_9

* test_failure_10

* test_failure_11

* test_failure_12

* test_failure_12_multireduce

* uncomment skip and migrate test_failure_13

* test_failure_14

* test_failure_15

* test_failure_16

* test_failure_17

* test_failure_18

* test_failure_19

* test_failure_20

* test_failure_21

* test_failure_22

* test_failure_23

* test_failure_24

* test_failure_25

* test_failure_26

* test_failure_27

* test_failure_28

* test_failure_29

* test_failure_30

* test_failure_31

* test_failure_32

* test_failure_33

* test_failure_34

* test_failure_36

* test_failure_37

* test_failure_38

* test_update_39

* test_failure_40

* test_failure_41

* test_failure_42

* test_failure_43

* test_failure_44

* test_failure_45

* test_failure_46

* test_failure_47

* test_failure_48

* test_failure_49

* test_failure_50

* remove LazyOp

* reskip test_failure_22

* remove extra/ops

* replace ReduceOps with BinaryOps

* fixup that import

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-08-24 16:26:58 +03:00
qazal
1b4ad982e5 share REDUCE_ALU in multi and schedule [run_process_replay] (#6266) 2024-08-24 16:16:38 +03:00
gswangg
1dc6040877 migrate test_search.py to UOp AST (#6245)
* add imports and update test_kernel_count with UOp AST

* test_filter_global_buffer

* remove LazyOp

* remove extra.ops and ReduceOps

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-08-24 16:13:53 +03:00
qazal
ae23540d6e refresh process replay schedule ref in reset.py (#6265) 2024-08-24 16:12:51 +03:00
gswangg
7be5eede71 migrate test_linearizer_overflows.py to UOp AST (#6244)
* add imports, remove ConstBuffer, and update test_overflow_1 with UOp AST

* test_overflow_2

* test_overflow_3

* test_overflow_4

* test_overflow_5

* test_overflow_6

* test_overflow_7

* TestLinearizerOverflowAlt::test_overflow_1

* TestLinearizerOverflowAlt::test_overflow_2

* remove LazyOp

* remove extra.ops

* remove ReduceOps
2024-08-24 16:10:29 +03:00
chenyu
943ab97d24 fix Tensor.prod for multitensor (#6264) 2024-08-24 08:52:24 -04:00
qazal
bcb2f1caa3 init REDUCE_AXIS with BinaryOps (#6256)
* REDUCE_AXIS arg with BinaryOps

* more work in kernel.py
fixup sops.gz

* fix TestGraphRewriteEfficiency
2024-08-24 11:28:41 +03:00
chenyu
da5cf11859 fix acc init value for MUL (#6263) 2024-08-23 23:19:44 -04:00
wozeparrot
a7bf20c7cd feat: updated tinybox docs (#6261)
* feat: updated tinybox docs

* fix: grammar
2024-08-23 18:27:46 -07:00
George Hotz
26498b322e add BEAM to external_benchmark_schedule.py 2024-08-23 18:10:46 -07:00
George Hotz
53a73038e3 hotfix: TestGraphRewriteEfficiency.test_create_many_uops 2024-08-23 15:51:57 -07:00
George Hotz
7c3ba3fa8a improve match stats + custom early reject [run_process_replay] (#6260)
* improve match stats [run_process_replay]

* custom_early_reject
2024-08-23 15:28:57 -07:00
George Hotz
0b0a8829fb allowed_len early stop [run_process_replay] (#6257)
* vectorize single rule [run_process_replay]

* allowed_len gate

* i mean, i guess i like the rule

* cleaner way to write that, and faster
2024-08-23 13:31:07 -07:00
George Hotz
a18744188f more early reject [run_process_replay] (#6254)
* simple matcher in alu [run_process_replay]

* never mind, i don't like simple matcher

* allowed_len == 0 is okay sometimes

* more generic matcher
2024-08-23 12:16:44 -07:00
qazal
0d4887e9df use UOps.WMMA everywhere (#6255)
* add UOps.WMMA_AXIS

* delete ReduceOps.WMMA from ops
2024-08-23 15:03:26 -04:00
chenyu
66d0b14a20 simpler CMPLT UOp _min_max [run_process_replay] (#6251) 2024-08-23 10:36:16 -04:00
chenyu
590c0922b6 Tensor.prod (#6250)
* Tensor.prod

a new reduce op!

* onnx ReduceProd
2024-08-23 10:06:32 -04:00
qazal
78d6bd8b41 start graph rewrite in the scheduler (#6248)
* start graph rewrite in the scheduler

* test: enable it

* test timings

* only fails in multi reduce

* more isolated tests
2024-08-23 13:15:55 +03:00
chenyu
75700edf73 minor bitcast touchup (#6246)
`not A == B` -> `A != B`
2024-08-22 20:25:28 -04:00
chenyu
4d40de867b remove redundant c1-(x+c2) rule [run_process_replay] (#6243) 2024-08-22 16:45:49 -04:00
George Hotz
238896ca02 looking into graph rewrite speed (#6239)
* looking into graph rewrite speed

* track, replace is slow

* if all same, no permutations [run_process_replay]

* types so compile works

* no implied comprehension

* TRACK_MATCH_STATS=2
2024-08-22 13:17:55 -07:00
chenyu
f62c4b3b5f remove redundant -(x*c) pattern [run_process_replay] (#6242)
covered by `x*c0*c1`
2024-08-22 16:11:02 -04:00
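One way to read "covered by `x*c0*c1`": if negation is expressed as multiplication by -1 (an assumption here, not something this log states), then `-(x*c)` is `(x*c)*(-1)`, which a constant-folding rule for `(x*c0)*c1` already handles. A small arithmetic check of that reading, with hypothetical function names:

```python
def fold_mul_consts(x, c0, c1):
    # Sketch of a rule (x * c0) * c1 -> x * (c0 * c1): the two constants
    # are merged into one before multiplying by x.
    return x * (c0 * c1)

def neg_of_scaled(x, c):
    # The pattern a dedicated -(x*c) rule would have matched.
    return -(x * c)
```

For integer inputs the two agree whenever `c1` is -1, which is why a separate negation rule would be redundant under this assumption.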
chenyu
e745e16441 remove UnaryOps.NEG (#6238)
* Remove UnaryOps.NEG

generated new dataset with
```
time JIT=2 PYTHONPATH=. ./extra/optimization/generate_dataset.sh
gzip /tmp/sops
mv /tmp/sops.gz extra/datasets/
```

* fix that
2024-08-22 14:21:39 -04:00
nimlgen
6c4ddd6260 hcq skip tests when no multidev (#6235)
* hcq skip tests when no multidev

* linter

* a bit higher timeout
2024-08-22 18:27:16 +03:00
chenyu
08539f08b0 fix UOp repr with Variable in arg (#6236) 2024-08-22 11:06:33 -04:00
chenyu
3fc8203475 remove NEG from handwritten ast in tests (#6234)
* remove NEG from handwritten ast in tests

* test_linearizer_failures
2024-08-22 09:06:59 -04:00
chenyu
1c5ef5b793 format test_linearizer_failure (#6231)
made it easier to remove NEG
2024-08-21 21:10:56 -04:00
George Hotz
5cdec79469 simpler expand without dont_expand_args [run_process_replay] (#6230)
* simpler expand without dont_expand_args [run_process_replay]

* Revert "simpler expand without dont_expand_args [run_process_replay]"

This reverts commit 81693024c097c31e601f1a199a631e9eda0d9638.

* exclude_args

* why does that fix it

* correct fix

* _swizzle_args should be fast

* add comment

* zip is tuples
2024-08-21 17:48:45 -07:00
nimlgen
78c94abe9c raise time limit for ci in test_profile_multidev_transfer (#6227) 2024-08-21 22:42:03 +03:00
gswangg
c74b318458 migrate test_linearizer.py to UOp AST, pt. 2 (#6228) 2024-08-21 22:16:11 +03:00
George Hotz
c3168952f0 wip: tracking pattern matcher [run_process_replay] (#6225)
* wip: tracking pattern matcher

* better

* proper dedup

* timing

* early reject

* mergable match stats

* TrackedPattenMatcher

* fix TrackedPattenMatcher

* cleanups

* clean that too

* remove early_reject

* Revert "remove early_reject"

This reverts commit dc2aef14b8f5da58f5ec9566daf252513cac394c.

* total

* sort by time

* match_stats cleanup
2024-08-21 11:57:26 -07:00