chenyu
e3105675fb
cond.where(True, False) is cond ( #7733 )
2024-11-16 09:44:17 -05:00
qazal
40ae0e9115
smaller big graph ( #7695 )
* start
* work
* rewrite to PRELOAD
* st is always from base
* fix aesthetics
* work
* more work
* refactor to is_forced_realize
* uh
* green?
* metaop can be image
* dont count realized
* this is the new src
* test_tiny_add passes
* work
2024-11-16 22:04:57 +08:00
qazal
f3f95ab9d9
flatten fusion upats [pr] ( #7732 )
2024-11-16 21:26:19 +08:00
qazal
ec8c5598f6
refactor to generic UPat for sourcing unrealized bufs [pr] ( #7731 )
* base check
* use is_scheduled
* fixup lazy
* update metadata
* match is too slow
2024-11-16 21:01:22 +08:00
ignaciosica
597a239e28
Remove UnaryOps, BinaryOps, TernaryOps, MetaOps [pr] ( #7725 )
* remove unaryops
* remove ternaryops
* remove metaops
* hotfix
* remove binaryops
* hotfix: test_pattern_matcher
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-16 20:56:56 +08:00
chenyu
22da31b223
clean up Tensor.dot ( #7728 )
more docs (similar to numpy) and removed many confusing `-min(n2, 2)`
2024-11-15 18:21:15 -05:00
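For readers of the old code: under the numpy.dot rule the new docs follow, x's last axis is summed against w's only axis when w is 1-D, and against w's second-to-last axis when w has 2 or more dims. The removed `-min(n2, 2)` (n2 = w's ndim) encoded exactly that case split. A plain-Python sketch of the rule; `contraction_axis` and `check_dot_shapes` are hypothetical helpers, not tinygrad API.

```python
def contraction_axis(w_ndim: int) -> int:
    """Axis of w that x's last axis is summed against, following numpy.dot's rule."""
    if w_ndim < 1:
        raise ValueError("w needs at least one dimension")
    # 1-D w -> its only axis (-1); N-D w with N >= 2 -> its second-to-last axis (-2).
    # This is the case split the old `-min(n2, 2)` expression encoded.
    return -min(w_ndim, 2)


def check_dot_shapes(x_shape: tuple, w_shape: tuple) -> int:
    """Check that x.dot(w) can contract; return the axis of w being contracted."""
    if len(x_shape) < 1:
        raise ValueError("x needs at least one dimension")
    axis = contraction_axis(len(w_shape))
    if x_shape[-1] != w_shape[axis]:
        raise ValueError(f"cannot contract x axis of size {x_shape[-1]} with w axis of size {w_shape[axis]}")
    return axis


print(check_dot_shapes((2, 3), (3,)))       # -1: vector w, contract its only axis
print(check_dot_shapes((2, 3), (5, 3, 4)))  # -2: N-D w, contract its second-to-last axis
```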
chenyu
4338c450ac
fix max_pool2d for int tensor with padding ( #7726 )
padding with inf messed up the output dtype
2024-11-15 16:22:11 -05:00
chenyu
d736ae7153
example script to show BasicTransformerBlock speed regression ( #7724 )
2024-11-15 15:48:25 -05:00
chenyu
aeb1301bab
enable a few tests that work now ( #7721 )
should mark the ones that are expected to work with expectedFailure, and delete the ones that are not expected to work
2024-11-15 14:30:52 -05:00
ignaciosica
fc1e123138
minor cleanup in lazy.py ( #7719 )
2024-11-15 13:48:24 -05:00
qazal
ef4f402946
add property to flag contig buffer uop [pr] ( #7716 )
2024-11-15 22:27:47 +08:00
qazal
313af6d23c
assert buffer VIEW is void [pr] ( #7715 )
2024-11-15 22:02:59 +08:00
ignaciosica
c37d142cf8
Refactor metal tc wmma kernel rendering ( #7416 )
* refactor metal tc wmma kernel rendering
* hotfix: bug
* hotfix: hack to avoid backslash in f-string expression
* hotfix
* hotfix: rename vars
* hotfix: moew new_line
* hotfix: cleaner wmma rendering
2024-11-15 21:23:08 +08:00
qazal
bddee26114
Ops.VALID cleanup, move recursive tests [pr] ( #7713 )
2024-11-15 20:22:46 +08:00
qazal
703a255301
use the method_cache in test_schedule [pr] ( #7712 )
* use the method_cache in test_schedule [pr]
* need half
2024-11-15 19:20:47 +08:00
qazal
88f760cc32
test_two_sum doesn't need del ( #7711 )
2024-11-15 18:50:08 +08:00
George Hotz
9f98f0c93a
use disassemble method for objdump [pr] ( #7708 )
2024-11-15 12:55:37 +08:00
George Hotz
9b1605eef9
Revert "objdump intel syntax ( #7605 )" ( #7707 )
This reverts commit 8f8e375f27.
2024-11-15 12:13:04 +08:00
ttomsa
8f8e375f27
objdump intel syntax ( #7605 )
* objdump intel syntax
* test for objdump intel syntax
* add disassemble to ClangCompiler and LLVMCompiler. Use just llvm-objdump
* linter
2024-11-15 11:32:23 +08:00
chenyu
9cfc4f68c8
clean up Tensor.cat ( #7701 )
2024-11-14 13:46:02 -05:00
chenyu
888fcb3643
Tensor.shrink arg cleanup ( #7700 )
removed duplicated logic
2024-11-14 13:01:22 -05:00
chenyu
9fb396f660
test_ops maxpool2d -> max_pool2d ( #7696 )
and avgpool2d -> avg_pool2d for easier grepping of the tests
2024-11-14 10:39:12 -05:00
ignaciosica
1419d8e58a
assert op is not store in view ( #7679 )
* assert op is not store in view
* update view spec
* hotfix: nit
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-14 22:17:18 +08:00
Ahmed Harmouche
43040c0e24
add render_cast ( #7687 )
2024-11-14 18:01:29 +08:00
geohotstan
f8056a74d6
combine pad2d with pad ( #7677 )
* I have pad2d, I have pad, uuh~, pad2dpad~
* fix some small things
* strategically placed cast hack
* fix more
* fix more more
* tests
* periods
2024-11-14 17:56:02 +08:00
qazal
3747669ab4
post 7655 schedule line savings [pr] ( #7692 )
2024-11-14 17:20:41 +08:00
qazal
64ebaa72b5
schedule independent of lazy.py ( #7655 )
* make it compile
* allow allbufs
* _recursive_group starts to work
* forced_realize works
* _get_isolated_children almost works
* 80%
* 90%
* ocd behavior
* 100% for _get_isolated_children
* FUSE_CONV_BW=1 works
* this took long
* can be from buffer's arg too
* eventually i'll share these
* test_prefer_half_buffer
* FUSE_ARANGE=1 sorta
* start assign and cleanup
fix assign
* braindump
* diff reset
* --- day 3 ---
* make _recursive_group work
* very minimal groups
* BASE
* _get_isolated_children that actually works
* working version of FUSE_CONV_BW=1 and prefer_half
* FUSE_ARANGE=1 works
* fix assign
* one less problem
2024-11-14 17:01:59 +08:00
qazal
0914c2fec9
add TestLinearizerFailures test_failure_56 and test_failure_57 ( #7682 )
* add test_failure_56 and test_failure_57
* so it's only METAL=1
2024-11-14 12:00:33 +08:00
qazal
a87813f063
hotfix: early fold image to image cast store ( #7681 )
* hotfix: early fold image to image cast store
* count out meta ops
2024-11-14 11:35:59 +08:00
chenyu
e0ad083904
use ceildiv in shard and fix a typo ( #7690 )
2024-11-13 18:25:06 -05:00
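Why ceildiv is the natural chunk size when sharding an axis: splitting 10 elements across 4 devices with floor division gives chunks of 2, which cover only 8 elements, while ceildiv gives 3 and covers all 10. A minimal sketch; `shard_sizes` is a hypothetical helper and assumes shards are consecutive full-size chunks with whatever is left at the end, which may not match exactly how tinygrad's shard lays out the remainder.

```python
def ceildiv(num: int, den: int) -> int:
    """Ceiling of num / den using only integer floor division (no float rounding)."""
    return -(num // -den)


def shard_sizes(dim: int, n_devices: int) -> list[int]:
    """Hypothetical split of an axis of length dim: full ceildiv-sized chunks first,
    then whatever is left (possibly 0) for the remaining devices."""
    chunk = ceildiv(dim, n_devices)
    return [max(0, min(chunk, dim - i * chunk)) for i in range(n_devices)]


print(shard_sizes(10, 4))  # [3, 3, 3, 1]
print(shard_sizes(10, 3))  # [4, 4, 2]
```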
chenyu
51afc3cc88
update env_vars doc on VIZ link ( #7689 )
the existing link returns a 404 because mkdocs does not allow traversing above the doc root (i think?), so for now just point it to the github link
2024-11-13 17:28:14 -05:00
chenyu
333f5f9f8b
Tensor.bitwise_not ( #7688 )
implemented with xor in tensor for now to avoid adding another op. also used it in Tensor.min to fix int dtypes on -2**31
2024-11-13 16:31:52 -05:00
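Why a bitwise not helps Tensor.min on ints: the usual trick min(x) = -max(-x) overflows for int32, because -(-2**31) wraps back to -2**31. Bitwise not is also order-reversing on two's-complement ints (~x == -x - 1) but never overflows, so min(x) = ~max(~x) holds for every int32 value. The plain-Python model below simulates int32 wraparound and builds not from xor with all ones, as the commit describes; the helper names are hypothetical and this is a sketch of the idea, presumably close to but not copied from tinygrad's Tensor.min.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def wrap_int32(v: int) -> int:
    """Wrap a Python int into the signed 32-bit range, like two's-complement overflow."""
    return (v + 2**31) % 2**32 - 2**31


def neg_int32(v: int) -> int:
    return wrap_int32(-v)


def not_int32(v: int) -> int:
    # bitwise not as xor with all ones (-1 in two's complement); ~x == -x - 1 never overflows
    return wrap_int32(v ^ -1)


def min_via_neg(xs):
    return neg_int32(max(neg_int32(x) for x in xs))


def min_via_not(xs):
    return not_int32(max(not_int32(x) for x in xs))


xs = [INT32_MIN, 0, 5]
print(min_via_neg(xs))  # 0: -(-2**31) wrapped to -2**31, so the true minimum was lost
print(min_via_not(xs))  # -2147483648
```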
chenyu
0423db8d00
simpler nll_loss ( #7686 )
2024-11-13 15:10:08 -05:00
chenyu
fb933b79a6
add test case for nll_loss with input > 2D ( #7685 )
* failed test case for nll_loss with input > 2D
* fixed
* add more
2024-11-13 14:34:07 -05:00
geohotstan
9c41c376d3
add Tensor.nll_loss ( #7683 )
* move nll_loss to new branch
* make nll_loss examples practical
* self *is*
* add to docs
* small
2024-11-13 13:12:13 -05:00
chenyu
3c6fe4b79a
fix Tensor.bitwise_and and Tensor.bitwise_or to support bool ( #7684 )
2024-11-13 13:10:39 -05:00
chenyu
3d82f8e340
simpler rand_like ( #7680 )
2024-11-13 12:28:41 -05:00
Roelof van Dijk
e75a855f51
refactor: efficient syntax [pr] ( #7673 )
2024-11-13 11:08:48 -05:00
Roelof van Dijk
433ebecee7
refactor: double if statement [pr] ( #7674 )
2024-11-13 11:06:59 -05:00
James
d4e4a084a1
fix: Tensor min function for unsigned ints ( #7675 )
* add failing tests for uint8 `min()`
* fix unsigned data type min()
* fix test data
* fix whitespace
---------
Co-authored-by: rezaarezvan <reza@rezvan.xyz>
Co-authored-by: Jamesb <experimentallearning0@gmail.com>
2024-11-13 11:04:27 -05:00
chenyu
d1dfd598a2
assert specifying device to rand_like a multi tensor ( #7678 )
* assert specifying device to rand_like a multi tensor
raise RuntimeError instead of dropping it silently
* fix that
2024-11-13 10:24:40 -05:00
chenyu
51432bfbff
add rand_like test case with device specified ( #7663 )
in single device or copied multi case, device is applied. but for sharded case the device is silently ignored now. maybe similar to rand we just don't allow tuple device in rand_like
2024-11-13 09:32:55 -05:00
Reza Rezvan
23363dee55
Add: failing tests for uint8 min() ( #7669 )
* add failing tests for uint8 `min()`
* mark as expected failure
2024-11-13 22:12:53 +08:00
qazal
29508504ea
uop style prefer small dtype + cleanups [pr] ( #7671 )
* just this
* space
* typing 2
2024-11-13 21:32:34 +08:00
qazal
e84d089ef1
delete ReduceOps, only use REDUCE_AXIS ( #7667 )
2024-11-13 19:04:27 +08:00
qazal
217c006103
buffer access on UOp [pr] ( #7665 )
...
* add .buffer access on uop
* rename to buf_uop
* start smaller
* ptr != buffer!!
2024-11-13 17:04:19 +08:00
qazal
5da149d23c
uop can have base [pr] ( #7666 )
2024-11-13 16:53:49 +08:00
qazal
ca99c67d78
refactors from the delete lazy diff [pr] ( #7664 )
* dedup parent shapetrackers [pr]
* arg -> dtype
* move to ops
* arg
2024-11-13 16:23:53 +08:00
chenyu
e6cfaaa496
metal benchmark JIT=2 -> JIT=1 ( #7661 )
2024-11-12 22:55:27 -05:00
chenyu
4c5f7ddf1f
flux set model path in args ( #7660 )
in addition to default downloading through fetch, add an arg to pass model path directly
2024-11-12 22:11:40 -05:00