ignaciosica
3918f6eea0
refactor amd render_kernel (#6223)
* refactor amd render_kernel
* fix spacing
* add half alias back
* use itemsize * 8 instead of fixed values
* revert, because it broke when 32 was no longer the default
* remove comment
* remove nested tuples
* hotfix: prefix.append
* hotfix2: is not None
* more diff cleanups
* hotfix 4: spacing changes must not be in the same diff
* revert wmma dtype rendering
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-08-27 00:28:36 +08:00
ignaciosica
3132449086
refactor _make_{cuda/clang}_dtype into render_vector_prefix (#6287)
2024-08-26 09:14:44 -07:00
Max-We
ab2714423b
Add einsum tests (#6286)
Co-authored-by: Maximilian Weichart <maximilian.weichart@icloud.com>
2024-08-26 09:09:25 -07:00
chenyu
b76f0c875e
lazy const fold idiv 1 (#6285)
2024-08-26 10:29:59 -04:00
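For illustration only: a minimal Python sketch of this kind of fold, using a hypothetical tuple-based expression form `(op, lhs, rhs)` and a hypothetical helper `fold_idiv_one`, not tinygrad's LazyBuffer code. Integer division by the constant 1 returns its operand unchanged, so the node can be replaced by that operand without scheduling any work.

```python
from typing import Union

# Hypothetical expression form: a leaf is a str (buffer name) or an int (constant);
# an interior node is a tuple (op, lhs, rhs). This is NOT tinygrad's LazyBuffer.
Expr = Union[str, int, tuple]


def fold_idiv_one(e: Expr) -> Expr:
    """Rewrite every ("IDIV", x, 1) node to x, bottom-up."""
    if not isinstance(e, tuple):
        return e
    op, lhs, rhs = e
    lhs, rhs = fold_idiv_one(lhs), fold_idiv_one(rhs)
    if op == "IDIV" and rhs == 1:
        return lhs  # x // 1 == x for any integer x
    return (op, lhs, rhs)


print(fold_idiv_one(("ADD", ("IDIV", "a", 1), 3)))  # ('ADD', 'a', 3)
```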
chenyu
af7c04ff57
Tensor.__floordiv__ (#6283)
support Tensor.__floordiv__ and friends
2024-08-26 09:43:40 -04:00
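A hedged sketch of Python's floor-division protocol, assuming "friends" refers to the reflected (`__rfloordiv__`) and in-place (`__ifloordiv__`) variants. `Box` is a hypothetical stand-in for Tensor that wraps a single int; tinygrad's actual implementation is not shown here.

```python
from __future__ import annotations


class Box:
    """Hypothetical stand-in for Tensor: wraps a single int."""

    def __init__(self, v: int) -> None:
        self.v = v

    def _floordiv(self, other: Box | int) -> Box:
        o = other.v if isinstance(other, Box) else other
        return Box(self.v // o)

    # Box(7) // 2
    def __floordiv__(self, other: Box | int) -> Box:
        return self._floordiv(other)

    # 7 // Box(2): int.__floordiv__ returns NotImplemented, so Python tries this next
    def __rfloordiv__(self, other: int) -> Box:
        return Box(other)._floordiv(self)

    # b //= 4 rebinds b to the returned Box (no mutation of the old object)
    def __ifloordiv__(self, other: Box | int) -> Box:
        return self._floordiv(other)


print((Box(7) // 2).v, (7 // Box(2)).v)  # 3 3
b = Box(9)
b //= Box(4)
print(b.v)  # 2
```

Without `__ifloordiv__`, Python would fall back to `__floordiv__` for `//=` anyway; defining it explicitly just makes that behavior visible.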
qazal
d2f8eeed2e
make [compare_schedule] the default [run_process_replay] (#6273)
* make [compare_schedule] the default
* capture ctx
* logging
* set capture to false
2024-08-26 21:40:03 +08:00
qazal
067aeaeb2f
single arange fusion with graph rewrite (#6160)
2024-08-26 18:18:16 +08:00
qazal
b4381e9777
uop output_st is Optional [run_process_replay] (#6282)
2024-08-26 17:58:55 +08:00
qazal
1c0456af89
add UOps.SWIZZLE (#6271)
* add UOps.SWIZZLE
* flip swizzle init
* generic st_fixup
2024-08-26 16:08:51 +08:00
CaltropHungerton
002f60b4c3
fix intel wmma flop counting, add flop counting tests for different tensor cores (#6192)
* fix wmma flop counting on intel, add count tests
* half
* add half gemm
* Update test.yml
* one test
* Update test_uops_stats.py
* Update test_uops_stats.py
* Update test_uops_stats.py
* smaller matrix, use unittest skipUnless decorator
2024-08-25 18:37:05 -07:00
Tobias Fischer
331b0f5477
new clip gather (#6277)
2024-08-25 19:27:24 -04:00
qazal
f0cc8ca5f2
generic st_fixup in scheduler graph rewrite [compare_schedule] (#6278)
2024-08-25 11:02:17 +03:00
qazal
70015bd89c
move permute_reduces to uop movementops [run_process_replay] (#6272)
2024-08-25 10:25:51 +03:00
chenyu
b86907c6c7
UOp.const(x.dtype, y) -> x.const(y) [run_process_replay] (#6276)
2024-08-24 21:39:50 -04:00
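A minimal sketch of why the instance-method form is convenient, using a hypothetical frozen dataclass `Node` in place of UOp. The static helper is called `const_of` here only because a Python class cannot cleanly expose a staticmethod and an instance method under the same name; tinygrad's real signatures may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a UOp: op name, dtype string, argument."""
    op: str
    dtype: str
    arg: Any = None

    @staticmethod
    def const_of(dtype: str, b: Any) -> Node:
        # old-style call site: the caller must pass the dtype explicitly
        return Node("CONST", dtype, b)

    def const(self, b: Any) -> Node:
        # new-style helper: the constant inherits the dtype of the node it is built from
        return Node.const_of(self.dtype, b)


x = Node("LOAD", "float32")
print(x.const(1.0))  # Node(op='CONST', dtype='float32', arg=1.0)
```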
chenyu
00282afa41
identity element of binary ops (#6275)
helper for the value the reduce accumulator is initialized to (0 for ADD, 1 for MUL, and -inf for MAX)
2024-08-24 18:10:19 -04:00
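A short Python sketch of why each reduce starts from its op's identity element: combining the identity with any value leaves that value unchanged, so the result is correct for non-empty inputs and well-defined for empty ones. A wrong start value, such as 0 for MUL, would zero every product. `IDENTITY`, `FN` and `reduce_op` are hypothetical names, not tinygrad's helper.

```python
import math
import operator
from functools import reduce

# Hypothetical tables: the accumulator's start value and the combining function per op.
IDENTITY = {"ADD": 0, "MUL": 1, "MAX": -math.inf}
FN = {"ADD": operator.add, "MUL": operator.mul, "MAX": max}


def reduce_op(op: str, xs):
    # ident op x == x for every x, so starting here never changes the answer
    return reduce(FN[op], xs, IDENTITY[op])


print(reduce_op("ADD", [1, 2, 3]), reduce_op("MUL", [2, 3, 4]), reduce_op("MAX", [-5, -2]))  # 6 24 -2
print(reduce_op("MUL", []))  # 1
```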
qazal
ee245b48a9
refactor reduceop swizzling (prep for UOps.SWIZZLE) [compare_schedule] (#6269)
2024-08-24 18:17:19 +03:00
gswangg
3cf507ae7f
remove extra.ops and LazyOp support from Kernel (#6267)
* remove extra.ops and BufferOps
* remove extra.ops and LazyOp support in Kernel
2024-08-24 16:44:38 +03:00
qazal
ccb05d8baa
fixup neg tests [run_process_replay] (#6268)
2024-08-24 16:35:43 +03:00
gswangg
ea76b93814
migrate test_linearizer_dumb.py to UOp AST (#6241)
* add imports and update test_unmerged_ifs to UOp AST
* test_max_simplify_and_cancel
* test_expander_new_srcs
* test_llama_embedding
* test_unaligns_idxs
* test_unrolled_float4_align
* test_upcasted_stores_out_of_order
* remove LazyOp
* remove extra/ops and replace ReduceOps.SUM with BinaryOps.ADD
2024-08-24 16:27:29 +03:00
gswangg
e44653e25a
migrate test_linearizer_failures.py to UOp AST (#6240)
* add imports and update test_failure_1 to UOp AST
* update test_failure_2 with UOp AST
* update test_failure_3
* test_failure_5
* test_failure_6
* test_failure_7
* test_failure_8
* test_failure_9
* test_failure_10
* test_failure_11
* test_failure_12
* test_failure_12_multireduce
* uncomment skip and migrate test_failure_13
* test_failure_14
* test_failure_15
* test_failure_16
* test_failure_17
* test_failure_18
* test_failure_19
* test_failure_20
* test_failure_21
* test_failure_22
* test_failure_23
* test_failure_24
* test_failure_25
* test_failure_26
* test_failure_27
* test_failure_28
* test_failure_29
* test_failure_30
* test_failure_31
* test_failure_32
* test_failure_33
* test_failure_34
* test_failure_36
* test_failure_37
* test_failure_38
* test_update_39
* test_failure_40
* test_failure_41
* test_failure_42
* test_failure_43
* test_failure_44
* test_failure_45
* test_failure_46
* test_failure_47
* test_failure_48
* test_failure_49
* test_failure_50
* remove LazyOp
* reskip test_failure_22
* remove extra/ops
* replace ReduceOps with BinaryOps
* fixup that import
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-08-24 16:26:58 +03:00
qazal
1b4ad982e5
share REDUCE_ALU in multi and schedule [run_process_replay] (#6266)
2024-08-24 16:16:38 +03:00
gswangg
1dc6040877
migrate test_search.py to UOp AST (#6245)
* add imports and update test_kernel_count with UOp AST
* test_filter_global_buffer
* remove LazyOp
* remove extra.ops and ReduceOps
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-08-24 16:13:53 +03:00
qazal
ae23540d6e
refresh process replay schedule ref in reset.py (#6265)
2024-08-24 16:12:51 +03:00
gswangg
7be5eede71
migrate test_linearizer_overflows.py to UOp AST (#6244)
* add imports, remove ConstBuffer, and update test_overflow_1 with UOp AST
* test_overflow_2
* test_overflow_3
* test_overflow_4
* test_overflow_5
* test_overflow_6
* test_overflow_7
* TestLinearizerOverflowAlt::test_overflow_1
* TestLinearizerOverflowAlt::test_overflow_2
* remove LazyOp
* remove extra.ops
* remove ReduceOps
2024-08-24 16:10:29 +03:00
chenyu
943ab97d24
fix Tensor.prod for multitensor (#6264)
2024-08-24 08:52:24 -04:00
qazal
bcb2f1caa3
init REDUCE_AXIS with BinaryOps (#6256)
* REDUCE_AXIS arg with BinaryOps
* more work in kernel.py
fixup sops.gz
* fix TestGraphRewriteEfficiency
2024-08-24 11:28:41 +03:00
chenyu
da5cf11859
fix acc init value for MUL (#6263)
2024-08-23 23:19:44 -04:00
wozeparrot
a7bf20c7cd
feat: updated tinybox docs (#6261)
* feat: updated tinybox docs
* fix: grammar
2024-08-23 18:27:46 -07:00
George Hotz
26498b322e
add BEAM to external_benchmark_schedule.py
2024-08-23 18:10:46 -07:00
George Hotz
53a73038e3
hotfix: TestGraphRewriteEfficiency.test_create_many_uops
2024-08-23 15:51:57 -07:00
George Hotz
7c3ba3fa8a
improve match stats + custom early reject [run_process_replay] (#6260)
* improve match stats [run_process_replay]
* custom_early_reject
2024-08-23 15:28:57 -07:00
George Hotz
0b0a8829fb
allowed_len early stop [run_process_replay] (#6257)
* vectorize single rule [run_process_replay]
* allowed_len gate
* i mean, i guess i like the rule
* cleaner way to write that, and faster
2024-08-23 13:31:07 -07:00
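A hedged sketch of an "allowed length" early stop, assuming a pattern records how many sources it expects (0 meaning any number) and the matcher compares that count before recursing into the sources. `Node`, `Pat` and `match` are hypothetical names, not tinygrad's UPat API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Node:
    op: str
    src: tuple = ()


@dataclass
class Pat:
    """Hypothetical pattern: optionally match an op and each source in order."""
    op: Optional[str] = None
    src: Optional[tuple] = None
    # 0 means "any number of sources"; otherwise the exact count required
    allowed_len: int = field(init=False, default=0)

    def __post_init__(self) -> None:
        self.allowed_len = 0 if self.src is None else len(self.src)


def match(p: Pat, n: Node) -> bool:
    # cheap checks first: reject on op or source count before any recursion
    if p.op is not None and p.op != n.op:
        return False
    if p.allowed_len != 0 and len(n.src) != p.allowed_len:
        return False
    if p.src is None:
        return True
    return all(match(sp, sn) for sp, sn in zip(p.src, n.src))


add2 = Pat("ADD", (Pat(), Pat()))
print(match(add2, Node("ADD", (Node("x"), Node("y")))))  # True
print(match(add2, Node("ADD", (Node("x"),))))  # False, rejected by the length check
```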
George Hotz
a18744188f
more early reject [run_process_replay] (#6254)
* simple matcher in alu [run_process_replay]
* never mind, i don't like simple matcher
* allowed_len == 0 is okay sometimes
* more generic matcher
2024-08-23 12:16:44 -07:00
qazal
0d4887e9df
use UOps.WMMA everywhere (#6255)
* add UOps.WMMA_AXIS
* delete ReduceOps.WMMA from ops
2024-08-23 15:03:26 -04:00
chenyu
66d0b14a20
simpler CMPLT UOp _min_max [run_process_replay] (#6251)
2024-08-23 10:36:16 -04:00
chenyu
590c0922b6
Tensor.prod (#6250)
* Tensor.prod
a new reduce op!
* onnx ReduceProd
2024-08-23 10:06:32 -04:00
qazal
78d6bd8b41
start graph rewrite in the scheduler (#6248)
* start graph rewrite in the scheduler
* test: enable it
* test timings
* only fails in multi reduce
* more isolated tests
2024-08-23 13:15:55 +03:00
chenyu
75700edf73
minor bitcast touchup (#6246)
`not A == B` -> `A != B`
2024-08-22 20:25:28 -04:00
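The two spellings are equivalent for ordinary values because `not` binds more loosely than `==`, so `not A == B` already parses as `not (A == B)`; `A != B` simply states the intent directly. A quick check with Python's standard `ast` module:

```python
import ast

# `not` has lower precedence than `==`: the Compare node sits inside the UnaryOp(Not)
tree = ast.parse("not A == B", mode="eval")
print(type(tree.body).__name__, type(tree.body.operand).__name__)  # UnaryOp Compare

A, B = 3, 4
print((not A == B) == (A != B))  # True
```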
chenyu
4d40de867b
remove redundant c1-(x+c2) rule [run_process_replay] (#6243)
2024-08-22 16:45:49 -04:00
George Hotz
238896ca02
looking into graph rewrite speed (#6239)
* looking into graph rewrite speed
* track, replace is slow
* if all same, no permutations [run_process_replay]
* types so compile works
* no implied comprehension
* TRACK_MATCH_STATS=2
2024-08-22 13:17:55 -07:00
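A small sketch of the "if all same, no permutations" idea, assuming a matcher for a commutative op tries every ordering of a list of sources: when all of them are equal, every permutation is the same tuple, so a single attempt suffices. `orderings` is a hypothetical helper, not tinygrad code.

```python
from itertools import permutations


def orderings(srcs: tuple) -> list:
    """Source orders a commutative matcher would try (hypothetical helper)."""
    # If every item is equal, all n! permutations are identical: try just one.
    if all(s == srcs[0] for s in srcs):
        return [srcs]
    return list(permutations(srcs))


print(len(orderings(("x", "y"))))  # 2
print(len(orderings(("x", "x", "x"))))  # 1 instead of 6
```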
chenyu
f62c4b3b5f
remove redundant -(x*c) pattern [run_process_replay] (#6242)
covered by `x*c0*c1`
2024-08-22 16:11:02 -04:00
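A minimal sketch of why the dedicated rule becomes redundant, assuming negation is expressed as multiplication by -1 (the representation left once UnaryOps.NEG is removed in #6238): `-(x*c)` is then `(x*c)*-1`, which the general constant-merging rule `(x*c0)*c1 -> x*(c0*c1)` already folds. The tuple form and the names `neg` and `fold_mul_consts` are hypothetical.

```python
# Hypothetical tuple form: ("MUL", lhs, rhs); ints are constants, strs are variables.


def neg(e):
    # assumption: negation is represented as multiplication by -1
    return ("MUL", e, -1)


def fold_mul_consts(e):
    """Apply (x*c0)*c1 -> x*(c0*c1) bottom-up."""
    if not isinstance(e, tuple):
        return e
    op, lhs, rhs = e
    lhs, rhs = fold_mul_consts(lhs), fold_mul_consts(rhs)
    if (op == "MUL" and isinstance(rhs, int) and isinstance(lhs, tuple)
            and lhs[0] == "MUL" and isinstance(lhs[2], int)):
        return ("MUL", lhs[1], lhs[2] * rhs)
    return (op, lhs, rhs)


print(fold_mul_consts(neg(("MUL", "x", 3))))  # ('MUL', 'x', -3)
```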
chenyu
e745e16441
remove UnaryOps.NEG (#6238)
* Remove UnaryOps.NEG
generated new dataset with
```
time JIT=2 PYTHONPATH=. ./extra/optimization/generate_dataset.sh
gzip /tmp/sops
mv /tmp/sops.gz extra/datasets/
```
* fix that
2024-08-22 14:21:39 -04:00
nimlgen
6c4ddd6260
hcq skip tests when no multidev (#6235)
* hcq skip tests when no multidev
* linter
* a bit higher timeout
2024-08-22 18:27:16 +03:00
chenyu
08539f08b0
fix UOp repr with Variable in arg (#6236)
2024-08-22 11:06:33 -04:00
chenyu
3fc8203475
remove NEG from handwritten ast in tests (#6234)
* remove NEG from handwritten ast in tests
* test_linearizer_failures
2024-08-22 09:06:59 -04:00
chenyu
1c5ef5b793
format test_linearizer_failure (#6231)
made it easier to remove NEG
2024-08-21 21:10:56 -04:00
George Hotz
5cdec79469
simpler expand without dont_expand_args [run_process_replay] (#6230)
* simpler expand without dont_expand_args [run_process_replay]
* Revert "simpler expand without dont_expand_args [run_process_replay]"
This reverts commit 81693024c097c31e601f1a199a631e9eda0d9638.
* exclude_args
* why does that fix it
* correct fix
* _swizzle_args should be fast
* add comment
* zip is tuples
2024-08-21 17:48:45 -07:00
nimlgen
78c94abe9c
raise time limit for ci in test_profile_multidev_transfer (#6227)
2024-08-21 22:42:03 +03:00
gswangg
c74b318458
migrate test_linearizer.py to UOp AST, pt. 2 (#6228)
2024-08-21 22:16:11 +03:00
George Hotz
c3168952f0
wip: tracking pattern matcher [run_process_replay] (#6225)
* wip: tracking pattern matcher
* better
* proper dedup
* timing
* early reject
* mergable match stats
* TrackedPattenMatcher
* fix TrackedPattenMatcher
* cleanups
* clean that too
* remove early_reject
* Revert "remove early_reject"
This reverts commit dc2aef14b8f5da58f5ec9566daf252513cac394c.
* total
* sort by time
* match_stats cleanup
2024-08-21 11:57:26 -07:00
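A hedged sketch of the general idea behind a tracking pattern matcher: wrap rule application so that each rule records how often it was tried, how often it matched, and how much time it cost, then report rules sorted by time. The rule shape `(name, predicate, rewrite)` and the class `TrackedMatcher` are hypothetical; this is not tinygrad's PatternMatcher or TrackedPattenMatcher API.

```python
import time
from collections import defaultdict
from typing import Any, Callable, Optional

# Hypothetical rule shape: (name, predicate, rewrite).
Rule = tuple[str, Callable[[Any], bool], Callable[[Any], Any]]


class TrackedMatcher:
    """Try rules in order, return the first rewrite, and record per-rule stats."""

    def __init__(self, rules: list[Rule]) -> None:
        self.rules = rules
        # rule name -> [times tried, times matched, seconds spent]
        self.stats = defaultdict(lambda: [0, 0, 0.0])

    def rewrite(self, x: Any) -> Optional[Any]:
        for name, pred, fn in self.rules:
            start = time.perf_counter()
            hit = pred(x)
            ret = fn(x) if hit else None
            s = self.stats[name]
            s[0] += 1
            if hit:
                s[1] += 1
            s[2] += time.perf_counter() - start
            if hit:
                return ret
        return None

    def report(self) -> list[tuple]:
        # (name, tried, matched, seconds), slowest rules first
        return sorted(((n, *v) for n, v in self.stats.items()), key=lambda r: -r[3])


m = TrackedMatcher([
    ("x*1 -> x", lambda e: e[0] == "MUL" and e[2] == 1, lambda e: e[1]),
    ("x+0 -> x", lambda e: e[0] == "ADD" and e[2] == 0, lambda e: e[1]),
])
print(m.rewrite(("ADD", "a", 0)))  # a
print(m.stats["x*1 -> x"][:2], m.stats["x+0 -> x"][:2])  # [1, 0] [1, 1]
```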