chenyu
55707fd00d
fix: passing sum_acc_dtype="" to Tensor.sum should fail ( #7748 )
2024-11-17 10:58:41 -05:00
chenyu
f18296e23c
simpler Tensor._reduce ( #7747 )
2024-11-17 09:20:00 -05:00
qazal
0cc8de2f15
reverse map buf_uops [pr] ( #7743 )
2024-11-17 21:29:56 +08:00
chenyu
0292ae7508
Tensor.meshgrid cleanup ( #7741 )
2024-11-17 08:26:53 -05:00
qazal
40642cb9ea
to_uop split paths part 2 [pr] ( #7746 )
2024-11-17 21:07:28 +08:00
qazal
99024b922b
to_uop one path for all ops part 1 ( #7745 )
* flat meta ops
* one path for everything
* add tests
* view is always base
* just run
2024-11-17 20:12:44 +08:00
qazal
eeb222f98b
add UOp.new_buffer [pr] ( #7742 )
2024-11-17 16:44:52 +08:00
chenyu
a15a900415
fix Tensor.meshgrid for 1D input and check indexing ( #7740 )
2024-11-16 23:39:30 -05:00
geohotstan
72a41095bc
add Tensor.meshgrid ( #7714 )
* initial implementation and test
* some other places that can use meshgrid
* revert the onnx_ops change
* add to docs
* revert interpolate too
* update
* improve edge case test
* might as well test grad
* add to test can improve docs
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-16 23:06:47 -05:00
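A minimal sketch of what these two meshgrid commits (the add in #7714 and the 1-D fix in #7740) provide; the default "ij" indexing and the output values here are my assumptions from the numpy/torch convention, not from the diffs:

```python
from tinygrad import Tensor

x, y = Tensor([1, 2, 3]).meshgrid(Tensor([4, 5]))
print(x.shape, y.shape)  # (3, 2) (3, 2) with "ij" indexing
print(x.tolist())        # [[1, 1], [2, 2], [3, 3]]
print(y.tolist())        # [[4, 5], [4, 5], [4, 5]]
```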
mesozoic-egg
1a5e896bd4
[pr] Have PTX share code with LLVM ( #7635 )
* integrate into ops_cuda
* remove debugging stuff
* lint fix
* mypy fixes
* swap ptx.py
* edit
* simplify wmma
* wip
* space
* refactor
* sync the ops removal changes
* refactor
* rename variables
---------
Co-authored-by: judy <mesozoic.egg@proton.mail>
2024-11-17 10:53:56 +08:00
chenyu
f2f7384b67
_resolve_dim cleanup ( #7736 )
no duplicated self.ndim+outer
2024-11-16 11:05:39 -05:00
chenyu
e777211a00
Tensor.repeat cleanup ( #7735 )
use flatten instead of a double for-loop comprehension
2024-11-16 10:43:45 -05:00
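The cleanup is internal; Tensor.repeat itself is torch-style tiling (one tile count per dimension). A quick sketch of mine, not from the diff:

```python
from tinygrad import Tensor

t = Tensor([1, 2])
# repeat(3, 2) tiles a (2,) tensor into shape (3, 4)
print(t.repeat(3, 2).tolist())  # [[1, 2, 1, 2], [1, 2, 1, 2], [1, 2, 1, 2]]
```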
chenyu
f1efd84c92
fix repeat_interleave with negative dim ( #7734 )
2024-11-16 10:15:29 -05:00
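A sketch of what the negative-dim fix enables, with values assumed from the torch-style semantics:

```python
from tinygrad import Tensor

t = Tensor([[1, 2], [3, 4]])
# dim=-1 now resolves to the last axis instead of misbehaving
print(t.repeat_interleave(2, dim=-1).tolist())  # [[1, 1, 2, 2], [3, 3, 4, 4]]
```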
chenyu
e3105675fb
cond.where(True, False) is cond ( #7733 )
2024-11-16 09:44:17 -05:00
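The rewrite in #7733 is the identity that selecting True/False by a boolean mask just returns the mask; a minimal illustration:

```python
from tinygrad import Tensor

cond = Tensor([1.0, 0.0, 3.0]) < 2.0
# cond.where(True, False) picks True where cond holds and False elsewhere,
# i.e. it is cond itself, so the simplifier can fold it away
assert (cond.where(True, False) == cond).all().item()
```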
qazal
40ae0e9115
smaller big graph ( #7695 )
* start
* work
* rewrite to PRELOAD
* st is always from base
* fix aesthetics
* work
* more work
* refactor to is_forced_realize
* uh
* green?
* metaop can be image
* dont count realized
* this is the new src
* test_tiny_add passes
* work
2024-11-16 22:04:57 +08:00
qazal
f3f95ab9d9
flatten fusion upats [pr] ( #7732 )
2024-11-16 21:26:19 +08:00
qazal
ec8c5598f6
refactor to generic UPat for sourcing unrealized bufs [pr] ( #7731 )
* base check
* use is_scheduled
* fixup lazy
* update metadata
* match is too slow
2024-11-16 21:01:22 +08:00
ignaciosica
597a239e28
Remove UnaryOps, BinaryOps, TernaryOps, MetaOps [pr] ( #7725 )
* remove unaryops
* remove ternaryops
* remove metaops
* hotfix
* remove binaryops
* hotfix: test_pattern_matcher
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-16 20:56:56 +08:00
chenyu
22da31b223
clean up Tensor.dot ( #7728 )
more docs (similar to numpy), and removed many confusing `-min(n2, 2)` expressions
2024-11-15 18:21:15 -05:00
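Per the numpy-ish convention the new docs describe, dot contracts the last axis of self with the second-to-last axis of the other tensor. A quick shape check (my example):

```python
from tinygrad import Tensor

a, b = Tensor.ones(2, 3, 4), Tensor.ones(4, 5)
print(a.dot(b).shape)  # (2, 3, 5)
```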
chenyu
4338c450ac
fix max_pool2d for int tensor with padding ( #7726 )
padding with inf messed up the output dtype
2024-11-15 16:22:11 -05:00
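The bug: padding the input with inf for the max silently upcast integer tensors. A hedged sketch of the now-expected behavior (kernel_size/padding kwargs assumed from the usual pooling signature):

```python
from tinygrad import Tensor, dtypes

t = Tensor.arange(16, dtype=dtypes.int32).reshape(1, 1, 4, 4)
out = t.max_pool2d(kernel_size=2, padding=1)
print(out.dtype)  # stays dtypes.int32 after the fix; the inf padding no longer leaks into the dtype
```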
chenyu
d736ae7153
example script to show BasicTransformerBlock speed regression ( #7724 )
2024-11-15 15:48:25 -05:00
chenyu
aeb1301bab
enable a few tests that work now ( #7721 )
should mark the ones that are expected to work with expectedFailure, and delete the ones that are not expected to work
2024-11-15 14:30:52 -05:00
ignaciosica
fc1e123138
minor cleanup in lazy.py ( #7719 )
2024-11-15 13:48:24 -05:00
qazal
ef4f402946
add property to flag contig buffer uop [pr] ( #7716 )
2024-11-15 22:27:47 +08:00
qazal
313af6d23c
assert buffer VIEW is void [pr] ( #7715 )
2024-11-15 22:02:59 +08:00
ignaciosica
c37d142cf8
Refactor metal tc wmma kernel rendering ( #7416 )
* refactor metal tc wmma kernel rendering
* hotfix: bug
* hotfix: hack to avoid backslash in f-string expression
* hotfix
* hotfix: rename vars
* hotfix: more new_line
* hotfix: cleaner wmma rendering
2024-11-15 21:23:08 +08:00
qazal
bddee26114
Ops.VALID cleanup, move recursive tests [pr] ( #7713 )
2024-11-15 20:22:46 +08:00
qazal
703a255301
use the method_cache in test_schedule [pr] ( #7712 )
* use the method_cache in test_schedule [pr]
* need half
2024-11-15 19:20:47 +08:00
qazal
88f760cc32
test_two_sum doesn't need del ( #7711 )
2024-11-15 18:50:08 +08:00
George Hotz
9f98f0c93a
use disassemble method for objdump [pr] ( #7708 )
2024-11-15 12:55:37 +08:00
George Hotz
9b1605eef9
Revert "objdump intel syntax ( #7605 )" ( #7707 )
This reverts commit 8f8e375f27.
2024-11-15 12:13:04 +08:00
ttomsa
8f8e375f27
objdump intel syntax ( #7605 )
* objdump intel syntax
* test for objdump intel syntax
* add disassemble to ClangCompiler and LLVMCompiler. Use just llvm-objdump
* linter
2024-11-15 11:32:23 +08:00
chenyu
9cfc4f68c8
clean up Tensor.cat ( #7701 )
2024-11-14 13:46:02 -05:00
chenyu
888fcb3643
Tensor.shrink arg cleanup ( #7700 )
removed duplicated logic
2024-11-14 13:01:22 -05:00
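For context, shrink takes an optional (start, end) pair per dimension; a minimal sketch of mine:

```python
from tinygrad import Tensor

t = Tensor.arange(12).reshape(3, 4)
print(t.shrink(((0, 2), (1, 3))).tolist())  # rows 0-1, cols 1-2: [[1, 2], [5, 6]]
print(t.shrink((None, (1, 3))).tolist())    # None keeps a dim whole: [[1, 2], [5, 6], [9, 10]]
```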
chenyu
9fb396f660
test_ops maxpool2d -> max_pool2d ( #7696 )
and avgpool2d -> avg_pool2d, for easier grepping of the tests
2024-11-14 10:39:12 -05:00
ignaciosica
1419d8e58a
assert op is not store in view ( #7679 )
* assert op is not store in view
* update view spec
* hotfix: nit
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-14 22:17:18 +08:00
Ahmed Harmouche
43040c0e24
add render_cast ( #7687 )
2024-11-14 18:01:29 +08:00
geohotstan
f8056a74d6
combine pad2d with pad ( #7677 )
* I have pad2d, I have pad, uuh~, pad2dpad~
* fix some small things
* strategically placed cast hack
* fix more
* fix more more
* tests
* periods
2024-11-14 17:56:02 +08:00
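After this merge, Tensor.pad is the one entry point; my understanding (hedged, the flat-form interpretation is an assumption from the torch F.pad convention) is that it accepts both the per-dim tuple form and the flat form pad2d used:

```python
from tinygrad import Tensor

t = Tensor.ones(1, 1, 2, 2)
print(t.pad((1, 1, 1, 1)).shape)                      # flat torch style (left, right, top, bottom) -> (1, 1, 4, 4)
print(t.pad(((0, 0), (0, 0), (1, 1), (1, 1))).shape)  # per-dim (before, after) tuples -> (1, 1, 4, 4)
```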
qazal
3747669ab4
post 7655 schedule line savings [pr] ( #7692 )
2024-11-14 17:20:41 +08:00
qazal
64ebaa72b5
schedule independent of lazy.py ( #7655 )
* make it compile
* allow allbufs
* _recursive_group starts to work
* forced_realize works
* _get_isolated_children almost works
* 80%
* 90%
* ocd behavior
* 100% for _get_isolated_children
* FUSE_CONV_BW=1 works
* this took long
* can be from buffer's arg too
* eventually i'll share these
* test_prefer_half_buffer
* FUSE_ARANGE=1 sorta
* start assign and cleanup
fix assign
* braindump
* diff reset
* --- day 3 ---
* make _recursive_group work
* very minimal groups
* BASE
* _get_isolated_children that actually works
* working version of FUSE_CONV_BW=1 and prefer_half
* FUSE_ARANGE=1 works
* fix assign
* one less problem
2024-11-14 17:01:59 +08:00
qazal
0914c2fec9
add TestLinearizerFailures test_failure_56 and test_failure_57 ( #7682 )
* add test_failure_56 and test_failure_57
* so it's only METAL=1
2024-11-14 12:00:33 +08:00
qazal
a87813f063
hotfix: early fold image to image cast store ( #7681 )
* hotfix: early fold image to image cast store
* count out meta ops
2024-11-14 11:35:59 +08:00
chenyu
e0ad083904
use ceildiv in shard and fix a typo ( #7690 )
2024-11-13 18:25:06 -05:00
chenyu
51afc3cc88
update env_vars doc on VIZ link ( #7689 )
existing one throws a 404 because mkdocs does not allow traversing above the doc root (i think?). so for now just use the github link directly
2024-11-13 17:28:14 -05:00
chenyu
333f5f9f8b
Tensor.bitwise_not ( #7688 )
implemented with xor in tensor for now to avoid adding another op. also used it in Tensor.min to fix the int dtype result on -2**31
2024-11-13 16:31:52 -05:00
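The commit note explains the trick: rather than a new op, NOT is built from XOR. A small sketch of the claimed behavior:

```python
from tinygrad import Tensor, dtypes

t = Tensor([0, 1, -2], dtype=dtypes.int32)
# x ^ -1 flips every bit of a two's-complement int, which is exactly bitwise NOT
print(t.bitwise_not().tolist())  # [-1, -2, 1]
print((t ^ -1).tolist())         # same result
```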
chenyu
0423db8d00
simpler nll_loss ( #7686 )
2024-11-13 15:10:08 -05:00
chenyu
fb933b79a6
add test case for nll_loss with input > 2D ( #7685 )
* failed test case for nll_loss with input > 2D
* fixed
* add more
2024-11-13 14:34:07 -05:00
geohotstan
9c41c376d3
add Tensor.nll_loss ( #7683 )
* move nll_loss to new branch
* make nll_loss examples practical
* self *is*
* add to docs
* small
2024-11-13 13:12:13 -05:00
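A usage sketch for the new Tensor.nll_loss, assuming torch-like semantics (log-probabilities in, mean reduction by default); shapes here are mine:

```python
from tinygrad import Tensor

log_probs = Tensor.randn(8, 4).log_softmax(axis=1)  # batch of 8, 4 classes
target = Tensor([0, 1, 2, 3, 0, 1, 2, 3])
print(log_probs.nll_loss(target).item())  # scalar mean loss
```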
chenyu
3c6fe4b79a
fix Tensor.bitwise_and and Tensor.bitwise_or to support bool ( #7684 )
2024-11-13 13:10:39 -05:00
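And a sketch of the companion fix: the and/or variants now accept bool tensors (values are my example, not from the test):

```python
from tinygrad import Tensor

a, b = Tensor([True, True, False]), Tensor([True, False, False])
print(a.bitwise_and(b).tolist())  # [True, False, False]
print(a.bitwise_or(b).tolist())   # [True, True, False]
```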
chenyu
3d82f8e340
simpler rand_like ( #7680 )
2024-11-13 12:28:41 -05:00