Commit Graph

6837 Commits

Author SHA1 Message Date
chenyu
55707fd00d fix: passing sum_acc_dtype="" to Tensor.sum should fail (#7748) 2024-11-17 10:58:41 -05:00
chenyu
f18296e23c simpler Tensor._reduce (#7747) 2024-11-17 09:20:00 -05:00
qazal
0cc8de2f15 reverse map buf_uops [pr] (#7743) 2024-11-17 21:29:56 +08:00
chenyu
0292ae7508 Tensor.meshgrid cleanup (#7741) 2024-11-17 08:26:53 -05:00
qazal
40642cb9ea to_uop split paths part 2 [pr] (#7746) 2024-11-17 21:07:28 +08:00
qazal
99024b922b to_uop one path for all ops part 1 (#7745)
* flat meta ops

* one path for everything

* add tests

* view is always base

* just run
2024-11-17 20:12:44 +08:00
qazal
eeb222f98b add UOp.new_buffer [pr] (#7742) 2024-11-17 16:44:52 +08:00
chenyu
a15a900415 fix Tensor.meshgrid for 1D input and check indexing (#7740) 2024-11-16 23:39:30 -05:00
geohotstan
72a41095bc add Tensor.meshgrid (#7714)
* initial implementation and test

* some other places that can use meshgrid

* revert the onnx_ops change

* add to docs

* revert interpolate too

* update

* improve edge case test

* might as well test grad

* add to test can improve docs

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-16 23:06:47 -05:00
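A minimal usage sketch of the new Tensor.meshgrid. The method-call form and the indexing kwarg (mirroring torch/numpy "ij"/"xy") are assumptions based on the follow-up commit that checks indexing:

```python
from tinygrad import Tensor

x, y = Tensor([1, 2, 3]), Tensor([4, 5])
# Each output is the input broadcast over the joint (len(x), len(y)) grid.
xx, yy = x.meshgrid(y, indexing="ij")  # call shape and kwarg are assumed
print(xx.shape, yy.shape)  # (3, 2) (3, 2)
print(xx.numpy())          # rows [1 1], [2 2], [3 3]
```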
mesozoic-egg
1a5e896bd4 [pr] Have PTX share code with LLVM (#7635)
* integrate into ops_cuda

* remove debugging stuff

* lint fix

* mypy fixes

* swap ptx.py

* edit

* simplify wmma

* wip

* space

* refactor

* sync the ops removal changes

* refactor

* rename variables

---------

Co-authored-by: judy <mesozoic.egg@proton.mail>
2024-11-17 10:53:56 +08:00
chenyu
f2f7384b67 _resolve_dim cleanup (#7736)
no duplicated self.ndim+outer
2024-11-16 11:05:39 -05:00
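For context, a hypothetical standalone version of the dim-resolution logic this cleanup touches; the function name and the extra parameter are illustrative, not tinygrad's exact signature:

```python
def resolve_dim(dim: int, ndim: int, extra: int = 0) -> int:
  # Negative dims count from the end; `extra` widens the valid range for
  # ops that may address one past the last dim (e.g. unsqueeze/stack).
  total = ndim + extra
  if not -max(1, total) <= dim <= max(1, total) - 1:
    raise IndexError(f"dim {dim} out of range for ndim {ndim}")
  return dim + total if dim < 0 else dim

assert resolve_dim(-1, ndim=3) == 2 and resolve_dim(1, ndim=3) == 1
```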
chenyu
e777211a00 Tensor.repeat cleanup (#7735)
flatten instead of a double for-loop comprehension
2024-11-16 10:43:45 -05:00
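The idea of the cleanup, sketched with the flatten helper from tinygrad.helpers (the shapes are made up; the real code interleaves repeats with a ones-padded base shape):

```python
from tinygrad.helpers import flatten

base_shape, repeats = (2, 3), (4, 5)
# One flatten call replaces the nested "[x for pair in ... for x in pair]".
interleaved = flatten((r, s) for r, s in zip(repeats, base_shape))
print(interleaved)  # [4, 2, 5, 3]
```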
chenyu
f1efd84c92 fix repeat_interleave with negative dim (#7734) 2024-11-16 10:15:29 -05:00
chenyu
e3105675fb cond.where(True, False) is cond (#7733) 2024-11-16 09:44:17 -05:00
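The identity behind this rewrite: for a boolean cond, selecting True where cond holds and False elsewhere reproduces cond exactly, so the whole where can be folded away. A quick check, assuming where accepts Python bool consts:

```python
from tinygrad import Tensor

cond = Tensor([True, False, True])
print(cond.where(True, False).numpy())  # [ True False  True], same as cond
```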
qazal
40ae0e9115 smaller big graph (#7695)
* start

* work

* rewrite to PRELOAD

* st is always from base

* fix aesthetics

* work

* more work

* refactor to is_forced_realize

* uh

* green?

* metaop can be image

* dont count realized

* this is the new src

* test_tiny_add passes

* work
2024-11-16 22:04:57 +08:00
qazal
f3f95ab9d9 flatten fusion upats [pr] (#7732) 2024-11-16 21:26:19 +08:00
qazal
ec8c5598f6 refactor to generic UPat for sourcing unrealized bufs [pr] (#7731)
* base check

* use is_scheduled

* fixup lazy

* update metadata

* match is too slow
2024-11-16 21:01:22 +08:00
ignaciosica
597a239e28 Remove UnaryOps, BinaryOps, TernaryOps, MetaOps [pr] (#7725)
* remove unaryops

* remove ternaryops

* remove metaops

* hotfix

* remove binaryops

* hotfix: test_pattern_matcher

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-16 20:56:56 +08:00
chenyu
22da31b223 clean up Tensor.dot (#7728)
more docs (similar to numpy) and removed many confusing `-min(n2, 2)` expressions
2024-11-15 18:21:15 -05:00
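In brief, the numpy-like semantics the added docs describe: 2D x 2D is a matrix multiply, a 1D operand has its single dim contracted away, and 1D x 1D is an inner product:

```python
from tinygrad import Tensor

a, b, v = Tensor.ones(2, 3), Tensor.ones(3, 4), Tensor.ones(3)
print(a.dot(b).shape)   # (2, 4)
print(a.dot(v).shape)   # (2,)
print(v.dot(v).item())  # 3.0
```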
chenyu
4338c450ac fix max_pool2d for int tensor with padding (#7726)
padding with inf messed up the output dtype
2024-11-15 16:22:11 -05:00
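What the fix guards against: max pooling pads with a minimum value, and a float -inf pad on an int tensor silently promoted the output dtype. A hedged check of the fixed behavior (the padding kwarg is taken from the commit title):

```python
from tinygrad import Tensor, dtypes

t = Tensor([[[[1, 2], [3, 4]]]], dtype=dtypes.int32)
out = t.max_pool2d(kernel_size=2, padding=1)
print(out.dtype)  # expected dtypes.int32, not a float promoted by inf padding
```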
chenyu
d736ae7153 example script to show BasicTransformerBlock speed regression (#7724) 2024-11-15 15:48:25 -05:00
chenyu
aeb1301bab enable a few tests that work now (#7721)
should mark the ones that are expected to work with expectedFailure, and delete the ones that are not expected to work
2024-11-15 14:30:52 -05:00
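The convention the note suggests, via the stdlib decorator: a known-broken test that should eventually pass stays in the suite under unittest.expectedFailure instead of being skipped or deleted:

```python
import unittest

class TestExample(unittest.TestCase):
  @unittest.expectedFailure
  def test_known_gap(self):
    # Stays in the suite; only an unexpected pass turns CI red.
    self.assertEqual(1 + 1, 3)
```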
ignaciosica
fc1e123138 minor cleanup in lazy.py (#7719) 2024-11-15 13:48:24 -05:00
qazal
ef4f402946 add property to flag contig buffer uop [pr] (#7716) 2024-11-15 22:27:47 +08:00
qazal
313af6d23c assert buffer VIEW is void [pr] (#7715) 2024-11-15 22:02:59 +08:00
ignaciosica
c37d142cf8 Refactor metal tc wmma kernel rendering (#7416)
* refactor metal tc wmma kernel rendering

* hotfix: bug

* hotfix: hack to avoid backslash in f-string expression

* hotfix

* hotfix: rename vars

* hotfix: more new_line

* hotfix: cleaner wmma rendering
2024-11-15 21:23:08 +08:00
qazal
bddee26114 Ops.VALID cleanup, move recursive tests [pr] (#7713) 2024-11-15 20:22:46 +08:00
qazal
703a255301 use the method_cache in test_schedule [pr] (#7712)
* use the method_cache in test_schedule [pr]

* need half
2024-11-15 19:20:47 +08:00
qazal
88f760cc32 test_two_sum doesn't need del (#7711) 2024-11-15 18:50:08 +08:00
George Hotz
9f98f0c93a use disassemble method for objdump [pr] (#7708) 2024-11-15 12:55:37 +08:00
George Hotz
9b1605eef9 Revert "objdump intel syntax (#7605)" (#7707)
This reverts commit 8f8e375f27.
2024-11-15 12:13:04 +08:00
ttomsa
8f8e375f27 objdump intel syntax (#7605)
* objdump intel syntax

* test for objdump intel syntax

* add disassemble to ClangCompiler and LLVMCompiler. Use just llvm-objdump

* linter
2024-11-15 11:32:23 +08:00
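A hedged sketch of what such a disassemble helper boils down to; the Intel-syntax flag spelling for llvm-objdump and the object-file path are assumptions, not the PR's exact code:

```python
import subprocess

def disassemble(obj_path: str) -> str:
  # llvm-objdump disassembles the object file; --x86-asm-syntax=intel
  # (assumed spelling) switches x86 output from AT&T to Intel syntax.
  return subprocess.check_output(
    ["llvm-objdump", "-d", "--x86-asm-syntax=intel", obj_path]).decode()
```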
chenyu
9cfc4f68c8 clean up Tensor.cat (#7701) 2024-11-14 13:46:02 -05:00
chenyu
888fcb3643 Tensor.shrink arg cleanup (#7700)
removed duplicated logic
2024-11-14 13:01:22 -05:00
chenyu
9fb396f660 test_ops maxpool2d -> max_pool2d (#7696)
and avgpool2d -> avg_pool2d for better grepping of the tests
2024-11-14 10:39:12 -05:00
ignaciosica
1419d8e58a assert op is not store in view (#7679)
* assert op is not store in view

* update view spec

* hotfix: nit

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-14 22:17:18 +08:00
Ahmed Harmouche
43040c0e24 add render_cast (#7687) 2024-11-14 18:01:29 +08:00
geohotstan
f8056a74d6 combine pad2d with pad (#7677)
* I have pad2d, I have pad, uuh~, pad2dpad~

* fix some small things

* strategically placed cast hack

* fix more

* fix more more

* tests

* periods
2024-11-14 17:56:02 +08:00
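With pad2d folded in, Tensor.pad becomes the single entry point. A usage sketch assuming it now accepts both the per-dim pair form and the torch-style flat form (flat pads apply to the last dims first):

```python
from tinygrad import Tensor

t = Tensor.ones(2, 2)
print(t.pad(((1, 1), (0, 0))).shape)  # (4, 2): one (before, after) per dim
print(t.pad((1, 1)).shape)            # (2, 4): flat form pads the last dim
```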
qazal
3747669ab4 post 7655 schedule line savings [pr] (#7692) 2024-11-14 17:20:41 +08:00
qazal
64ebaa72b5 schedule independent of lazy.py (#7655)
* make it compile

* allow allbufs

* _recursive_group starts to work

* forced_realize works

* _get_isolated_children almost works

* 80%

* 90%

* ocd behavior

* 100% for _get_isolated_children

* FUSE_CONV_BW=1 works

* this took long

* can be from buffer's arg too

* eventually i'll share these

* test_prefer_half_buffer

* FUSE_ARANGE=1 sorta

* start assign and cleanup

fix assign

* braindump

* diff reset

* --- day 3 ---

* make _recursive_group work

* very minimal groups

* BASE

* _get_isolated_children that actually works

* working version of FUSE_CONV_BW=1 and prefer_half

* FUSE_ARANGE=1 works

* fix assign

* one less problem
2024-11-14 17:01:59 +08:00
qazal
0914c2fec9 add TestLinearizerFailures test_failure_56 and test_failure_57 (#7682)
* add test_failure_56 and test_failure_57

* so it's only METAL=1
2024-11-14 12:00:33 +08:00
qazal
a87813f063 hotfix: early fold image to image cast store (#7681)
* hotfix: early fold image to image cast store

* count out meta ops
2024-11-14 11:35:59 +08:00
chenyu
e0ad083904 use ceildiv in shard and fix a typo (#7690) 2024-11-13 18:25:06 -05:00
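ceildiv here is the integer ceiling-division helper (tinygrad keeps one in tinygrad.helpers); the trick is plain Python floor-division arithmetic, no floats:

```python
def ceildiv(num: int, amt: int) -> int:
  # Floor-dividing by the negated denominator, then negating, rounds up.
  return -(num // -amt)

assert ceildiv(7, 2) == 4 and ceildiv(8, 2) == 4
```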
chenyu
51afc3cc88 update env_vars doc on VIZ link (#7689)
the existing one throws a 404 because mkdocs does not allow traversal above the doc root (I think?), so for now just point to the GitHub link
2024-11-13 17:28:14 -05:00
chenyu
333f5f9f8b Tensor.bitwise_not (#7688)
implemented with xor in tensor.py for now to avoid adding another op. Also used it in Tensor.min to fix the int dtype on -2**31
2024-11-13 16:31:52 -05:00
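The xor trick the message describes: xor against an all-ones pattern flips every bit, so no new op is needed. True plays that role for bools, and -1 (all bits set in two's complement) for ints:

```python
from tinygrad import Tensor, dtypes

print(Tensor([True, False]).bitwise_not().numpy())               # [False  True]
print(Tensor([0, 5], dtype=dtypes.int32).bitwise_not().numpy())  # [-1 -6]
```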
chenyu
0423db8d00 simpler nll_loss (#7686) 2024-11-13 15:10:08 -05:00
chenyu
fb933b79a6 add test case for nll_loss with input > 2D (#7685)
* failed test case for nll_loss with input > 2D

* fixed

* add more
2024-11-13 14:34:07 -05:00
geohotstan
9c41c376d3 add Tensor.nll_loss (#7683)
* move nll_loss to new branch

* make nll_loss examples practical

* self *is*

* add to docs

* small
2024-11-13 13:12:13 -05:00
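A hedged usage sketch of the new Tensor.nll_loss, assuming the torch-style contract the docs mirror: the input holds log-probabilities and the target holds integer class indices, with mean reduction by default:

```python
from tinygrad import Tensor

logits = Tensor([[1.0, 2.0, 0.5], [0.1, 0.2, 3.0]])
targets = Tensor([1, 2])
loss = logits.log_softmax(axis=-1).nll_loss(targets)
print(loss.item())  # scalar mean NLL over the batch
```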
chenyu
3c6fe4b79a fix Tensor.bitwise_and and Tensor.bitwise_or to support bool (#7684) 2024-11-13 13:10:39 -05:00
chenyu
3d82f8e340 simpler rand_like (#7680) 2024-11-13 12:28:41 -05:00