416 Commits

Author SHA1 Message Date
George Hotz
2ed3acd767 toposort is a function [pr] (#10004) 2025-04-23 16:25:03 +01:00
qazal
f4ec57baff new schedule linearizer enqueues KERNEL UOps [pr] (#9993)
* new schedule linearizer enqueues kernels [pr]

* no defaultdict

* diff

* minor
2025-04-23 05:17:58 +08:00
qazal
6cb2d18c03 refactor schedule linearize to defaultdict [pr] (#9984)
* refactor schedule linearize to defaultdict [pr]

* skip that

* don't need .get
2025-04-23 00:00:23 +08:00
qazal
bbc324f5dc remove CAST_AFTER_EXPAND (#9980) 2025-04-22 21:06:11 +08:00
qazal
7b55846e08 prep STORE UOp creation for multi output [pr] (#9975)
* prep STORE UOp creation for multi output [pr]

* test_multioutput_ast
2025-04-22 19:34:52 +08:00
qazal
1cf4e24ca5 fix kernelize usage with pm_gradient (#9953)
* fix kernelize usage with pm_gradient

* remove that
2025-04-22 17:26:05 +08:00
qazal
36ed3c3253 fix kernelize with VIEW children (#9961) 2025-04-21 23:38:46 +08:00
qazal
e8910540f6 Kernelize can be called multiple times on a Tensor (#9949)
* Kernelize can be called multiple times on a Tensor

* add (failing) test_kernelize_bw
2025-04-21 06:28:47 +08:00
qazal
e20ef7196a Tensor.kernelize (#9845)
* add kernelize

* remove that

* kernelize returns self

* update abstractions2.py

* kernelize in test_schedule

* temp: assert BUFFER_VIEW's existence

* ASSIGN must have a buffer or subbuffer target

* assert and shrink

* fix

* padded setitem

* var

* toposort once

* extra

* base_buffer

* end with BUFFER_VIEW

* setitem for disk

* test_setitem_becomes_subbuffer

* mul slice test

* torch backend fix 1

* non-deterministic

* keep subbuffer
2025-04-20 20:53:49 +08:00
qazal
b58decac0c fix diamond assigns before mapping tensors UOps to assigns (#9855)
* keep tensor_map until diamond assign fixup

* ctx
2025-04-18 14:17:43 +03:00
qazal
f13e9cf2d9 move view_left to grouper.py + tiny reorders [pr] (#9780)
* move view_left to grouper.py [pr]

* reorder grouper

* test_schedule
2025-04-08 15:39:28 +08:00
qazal
9963bb51e0 grouper tests cleanups [pr] (#9777)
* grouper tests cleanups [pr]

* viz

* tuple

* whitespace
2025-04-08 12:33:11 +08:00
qazal
891322fd51 split into grouper.py (#9768)
* split into grouper.py

* update tests

* reorder
2025-04-07 18:40:59 +08:00
qazal
ae688e4103 simple failing test for scheduling parallel reduce [pr] (#9501)
* simple failing test for scheduling parallel reduce [pr]

* atol
2025-03-19 10:52:13 +08:00
George Hotz
117b7a16ef VALIDATE_WITH_CPU [pr] (#9488)
* VALIDATE_WITH_CPU [pr]

* fix test
2025-03-18 15:15:04 +08:00
qazal
e03c0aacf2 more explicit DONT_PUSH_VIEWS [pr] (#9479)
* more explicit DONT_PUSH_VIEWS [pr]

* update tests to not handcode ast

* lint

* test_recursive_swizzle and test_simple_store_reshape
2025-03-17 20:43:21 +08:00
qazal
3b00a778ba fix view_left for unsafe pad ops [pr] (#9478) 2025-03-17 19:02:02 +08:00
qazal
813f713edc merge_views for buffer ops + create valids last (#9472)
* merge_views for buffer ops + create valids last

* view.arg

* pass
2025-03-17 17:15:44 +08:00
qazal
bd1f71c1e2 simple failing test for extra ops in VALID [pr] (#9474)
* simple failing test for extra valids [pr]

* this has DEBUG=4
2025-03-17 17:02:40 +08:00
qazal
90ffa9bd45 swizzle without buffer ops try 2 [pr] (#9427)
* add DONT_PUSH_VIEWS to matchers

* swizzle without buffer ops try 2 [pr]

* swizzle reduceop

* simple failing test

* fix failing test

* s/on/for
2025-03-13 10:00:40 +01:00
qazal
59dfb234eb replace hardcoded ast with tensors in TestSwizzle [pr] (#9401) 2025-03-10 19:33:57 +01:00
qazal
a1f41fadf6 test_schedule cleanups + add DONT_GROUP_REDUCES [pr] (#9392)
* test_schedule cleanups + add DONT_GROUP_REDUCES [pr]

* replace with test_swizzle_reduceop

* delete duplicate tests

* test_allow_push_permutes

* one kernel tests
2025-03-09 15:01:08 +01:00
qazal
286b480f82 do not replace assign with the offset buffer [pr] (#9387) 2025-03-08 11:57:44 +01:00
qazal
0d2762c010 prep refactor for adding buffer ops last [pr] (#9383)
* prep refactor for adding buffer ops last [pr]

* freeze buffers

* add swizzle_reduceop

* shape for reduceop_view_right

* simpler elementwise_view_right

* add shapetracker to const

* only const

* from process replay
2025-03-08 08:00:14 +01:00
qazal
23084fd850 merge merge_views and remove_movement_ops [pr] (#9333)
* merge merge_views and remove_movement_ops [pr]

* fix that assert
2025-03-03 12:38:59 +01:00
qazal
cdf66cc67f test: recompute expanded CAST (#9286)
* those views should merge

* diff cleanup

* gpu

* put it behind CAST_AFTER_EXPAND
2025-02-27 19:22:17 +01:00
qazal
e162aa862d is_realized only if buffer is allocated (#9253)
* is_realized only if the buffer is allocated

* fix the image check too

* assert test_lil_model after ExecItems run
2025-02-26 08:58:08 +01:00
George Hotz
3f4eb9006a test for device mismatch [pr] (#9250)
* test for device mismatch [pr]

* fix bert
2025-02-26 13:06:33 +08:00
qazal
cbfe95d306 bring cast before view back (#9230)
* bring cast before view back

* tune it to only trigger on expands

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-25 01:50:39 +02:00
George Hotz
c9493e41a6 reorder expand (#9051)
* reorder expand

* symbolic ops needs resolve here

* s/arg/st + whitespace

* viz

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-02-24 13:55:47 +01:00
qazal
14aa2395d0 allow VIEW(BUFFER) in Tensor UOps [pr] (#9210)
* allow VIEW(BUFFER) in Tensor UOps [pr]

* still reshapes

* update becomes_map tests

* bring copy folder to the scheduler

* lint

* only sgd left

* optimizer assign

* 13 kernels

* rename to test_reorder_expand + assert VIEW
2025-02-24 13:06:15 +01:00
qazal
2eab8021fb remove inputs+outputs attributes from ScheduleItem [pr] (#9192)
* remove inputs/outputs from ScheduleItem

* fix test_linearizer

* fix test_conv_shapetracker

* fix test_schedule + lint

* test_image_dtype + multitensor + search
2025-02-21 13:48:11 +01:00
chenyu
2e7c2780a9 CLANG -> CPU (#9189) 2025-02-20 18:03:09 -05:00
George Hotz
1bf66d62cf symbolic gets its own file [pr] (#9132) 2025-02-17 18:55:21 +08:00
qazal
2b9ce1235a simple failing case for reorder expand + keep views in tensor_map [pr] (#9057) 2025-02-13 11:22:55 +01:00
Ahmed Harmouche
916d5e7f08 WebGPU f16 support (f16 bounty part 2) (#8653)
* WebGPU f16 support

* Don't enable f16 yet

* dtype tests passing after bitcast fix

* Maybe all WebGPU green?

* Require shader-f16 in examples

* Minor wgsl touchup

* 1 line shorter

* Simpler

* Add transcendental support

* log2 nan location mismatch on Vulkan

* Nan skips
2025-02-12 19:46:53 +08:00
qazal
cd77e51810 fix tensor realization bug in #8975 (#8984)
* fix tensor realization bug in #8975

* that's a reshape now

* work

* works

* give those tests better names

* test when multiple mops result in the same ShapeTracker

* test_become_existing_buf_complex is enough

* that too
2025-02-10 13:51:30 +01:00
qazal
fd9f9ec772 realized base tensors become RESHAPE(BUFFER) [pr] (#8994) 2025-02-10 10:17:54 +01:00
qazal
7eba5fb413 Tensor.empty is RESHAPE(BUFFER) (#8987)
* empty is RESHAPE(BUFFER)

* eh

* add test_empty_buf

* can we unsupport this

* linter

* Revert "can we unsupport this"

This reverts commit 0f71e1aadb.
2025-02-09 18:42:51 +01:00
qazal
55351ebb31 minimal failing test for #8975 [pr] (#8982) 2025-02-09 14:10:37 +01:00
chenyu
cfd28517df move pow folding tests to test_schedule [pr] (#8955)
not really belongs to test_const_folding
2025-02-07 12:51:43 -05:00
chenyu
488200f16c move more pow const to rewrite (#8916)
* move more pow const to rewrite

one less use of _to_const_val

* fix
2025-02-05 20:30:12 -05:00
qazal
af4f9d1aa9 use matchers to verify AST shape [pr] (#8828)
* use matchers to verify kernel AST [pr]

* work

* use swizzle_cnt

* add comment

* imports

* modified_ast comment

* brief
2025-01-31 09:17:42 +02:00
George Hotz
643c09a6c6 tensor uop spec should be in spec.py [pr] (#8827)
* tensor uop spec should be in spec.py [pr]

* err, spec.py

* print uops can stay
2025-01-31 13:54:04 +08:00
qazal
a78f0f85d3 remove support for checking tensor uops in FUSE_ARANGE [pr] (#8829) 2025-01-31 07:48:28 +02:00
qazal
1fce864a6d delete multi output support (#8822)
* delete multioutput for now

* test_schedule

* test_assign too

* linter

* 515 for sd

* update tests and ctx

* update that assign check
2025-01-30 22:45:50 -05:00
qazal
530961f7d5 realized only exists on base (#8815)
* realized only exists on base [pr]

* shorter

* update that too
2025-01-30 23:02:25 +02:00
qazal
5643429c17 give BUFFER UOp a ShapeTracker [pr] (#8811)
* give BUFFER UOp a ShapeTracker [pr]

* move that

* update contiguous

* test_advancedindex should use movement ops
2025-01-30 22:33:32 +02:00
qazal
ba17786068 do not construct unmasked VALID (#8759)
* new lines that exist in codegen/ops

* update tests

* update sops.gz (13071 -> 13070 asts)

* fix viz too

* remove that TODO

* diff pruning

* mask assert + device

* work

* diff pruning

* re: fix viz too

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-28 20:51:21 +02:00
qazal
3417bc1814 fix ShapeTracker spec for const [pr] (#8791) 2025-01-28 19:53:36 +02:00