Commit Graph

263 Commits

qazal
8e5bd0cd7a fix buffer init and skip test_swizzle_failure_permute [pr] (#8732)
* fix buffer init and skip test_swizzle_failure_permute [pr]

* replace preload with just load

* add
2025-01-23 17:21:38 +02:00
qazal
07ec99001a keep VIEW in big_sink + copy of buffer view spec [pr] (#8727)
* keep views in sink [pr]

* tests

* things from the gpt2 bug
2025-01-23 11:29:30 +02:00
qazal
e3d1464ba4 move assign preload out of schedule item [pr] (#8710)
* move assign preload out of schedule item [pr]

* fix that
2025-01-22 12:43:57 +02:00
qazal
d6bf1feaab remove the "no copy" line from copy_to_device (#8702)
* delete the no copy one

* add tests
2025-01-21 17:09:33 +02:00
qazal
f0d424ecdf Tensor UOps can become a buffer or const after scheduling (#8698)
* spec

* work

* update test_viewed_consts_do_not_realize

* remove
2025-01-21 12:33:19 +02:00
qazal
e2008c98c3 allow symbolic shape in tensor const parents [pr] (#8699) 2025-01-21 12:01:25 +02:00
qazal
66ac0087e8 more high level contiguous tests + scheduler deletions [pr] (#8695)
* delete those

* move the upat too

* rename ops_folding to just sym

* keep that
2025-01-21 01:52:58 +02:00
qazal
08eb1f1f56 simplify tensors before scheduling [pr] (#8580)
* delete forced_realize

* put that back

* work

* remove forced_realize

* expectedFailures

* contiguous(buffer)

* multi

* expectedFailures

* cleaner create_subbuffer

* more comments

* remove that

* note

* realizes

* work

* one upat and image is back

* remove

* cleaner

* fix test_complex_backward for now

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2025-01-20 23:42:42 +02:00
chenyu
679b1ad058 move softmax upcast to after subtracting max (#8684)
* move softmax upcast to after subtracting max

max can always be done in the same dtype without any numerical loss, so this is better when explicitly upcasting in softmax

* skipUnless half
2025-01-20 12:16:32 -05:00
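The reasoning in this commit — a row max only selects an existing element, so it is exact in the input dtype, and only the post-subtraction exp/sum benefit from a wider dtype — can be sketched in plain NumPy. This is an illustrative sketch of the technique, not tinygrad's actual implementation:

```python
import numpy as np

def softmax_upcast_after_max(x: np.ndarray, upcast=np.float32) -> np.ndarray:
    # max is exact in the input dtype (e.g. float16): it only picks an
    # existing element, so computing it before upcasting loses nothing
    m = x.max(axis=-1, keepdims=True)
    # upcast only after subtracting the max; exp and sum run in the wider dtype
    e = np.exp((x - m).astype(upcast))
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([[1000.0, 1001.0, 1002.0]], dtype=np.float16)
probs = softmax_upcast_after_max(logits)
```

Subtracting the max first also keeps the inputs to exp non-positive, which avoids overflow in the narrow dtype before the upcast happens.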
qazal
9e55495b4d fold double contiguous [pr] (#8687) 2025-01-20 14:38:33 +02:00
qazal
ed63ff2372 Remove contiguous on buffer (#8676)
* remove contiguous on buffer

* spec

* make things that can't be images not images
2025-01-20 13:48:33 +02:00
George Hotz
168c16646a change create_schedule_with_vars api to big_sink [pr] (#8677) 2025-01-19 13:30:26 -08:00
chenyu
beba490ba8 update mask in scaled_dot_product_attention (#8674)
built the is_causal mask with ones_like, starting from boolean, and reversed the mask/-inf order
2025-01-19 15:19:23 -05:00
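The construction this commit describes — start from a boolean ones_like-style mask, then convert to an additive 0/-inf mask — might look like the following NumPy sketch. This is an assumed illustration of the pattern, not the tinygrad code:

```python
import numpy as np

def causal_attn_mask(L: int, S: int) -> np.ndarray:
    # boolean mask first: True where attention is allowed (lower triangle)
    allowed = np.tril(np.ones((L, S), dtype=bool))
    # then convert to additive form: 0 where allowed, -inf where masked out,
    # so it can simply be added to the attention scores before softmax
    return np.where(allowed, 0.0, -np.inf).astype(np.float32)
```

Added to the raw scores, the -inf entries become zero probability after softmax while the allowed positions are untouched.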
chenyu
5842ee56c6 raise if attn_mask is set when is_causal=True in sdpa [pr] (#8675)
matches torch, also fixed incorrect usage in tests
2025-01-19 12:55:04 -05:00
qazal
2faf8774fe replace DEVICE of CONST after copy folding (#8673) 2025-01-19 11:33:39 -05:00
qazal
d957a4f108 add tests for div buffer collapsing in the scheduler [pr] (#8671)
* add tests for mul/div buffer collapsing in the scheduler [pr]

* lint

* merge with test_linearizer's version of this

* 4*3
2025-01-18 14:15:29 -05:00
qazal
2b7db9b45d delete unused cast/bitcast lines from ops.py [pr] (#8651)
* move cast and bitcast out

* more deletion of bitcast arg

* fix test_bitcast_fuses

* update tests

* work
2025-01-17 03:04:18 -05:00
qazal
81a84aa85a remove is_unrealized_unmasked_const [pr] (#8644) 2025-01-16 05:27:47 -05:00
qazal
a1f70ce7d0 only use BUFFER_VIEW in disk [pr] (#8629)
* only use BUFFER_VIEW in disk [pr]

* delete can_view

* BUFFER_VIEW op on DISK

* remove that allow_buffer_view=False

* notes

* bitcast is a low-level op too

* this passes on AMD and LLVM
2025-01-15 12:34:15 -05:00
George Hotz
504ad08e73 hotfix: add test_example_matmul_same 2025-01-14 19:03:17 -08:00
George Hotz
bfbe81df71 remove cast before view (#8613)
* remove cast before view

* greener

* indexing

* that passes too

* openpilot too

* ack

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-01-14 15:04:58 -05:00
qazal
ae2229d727 assert kernel buffer limit at compile time [pr] (#8595)
* remove the BUF_LIMIT assert

* skip the base one
2025-01-13 16:32:07 -05:00
qazal
7562cc0399 better test for reduce swizzle + don't use double dtype [pr] (#8586)
* better test_permute_rewrite

* use float32
2025-01-13 05:02:21 -05:00
qazal
cff1ee9038 add SINK folding from the tensor_map branch [pr] (#8562)
* delete is_constant from the scheduler

* add sink folding

* always give BUFFER uops Buffers [pr]

* spec for view, var (bind) and const

* add test_buffer_only_after_realize

* work

* 3 lines

* more work
2025-01-12 03:39:34 -05:00
qazal
87cbff3ac0 always give BUFFER uops Buffers [pr] (#8572)
* always give BUFFER uops Buffers [pr]

* add test_buffer_only_after_realize
2025-01-11 23:17:09 +02:00
chenyu
d09897c2aa allow double copy [pr] (#8559)
fixed the ring allreduce pattern and recovered most of the bert step time regression (10% faster), will double check all benchmarks
2025-01-10 18:21:01 -05:00
qazal
2fd068ffc0 delete empty op (#8544)
* simple delete EMPTY op

* there's no schedule for empty
2025-01-09 14:10:15 -05:00
qazal
f6eb0574f2 start tests for putting the tensor graph in a single kernel [pr] (#8542)
* start tests for putting the tensor graph in a single kernel [pr]

* parallel actually

* better view_left test

* test a softmax

* put all that in sym
2025-01-09 13:33:21 -05:00
qazal
947de23cac add VIEW(DEVICE) to tensor variable [pr] (#8529)
* add VIEW(DEVICE) to tensor variable [pr]

* bind 2

* restrict shapetracker

* move var and bind closer

* one less line
2025-01-08 01:39:42 -05:00
qazal
b22494b710 restrict tensor const ShapeTracker in spec [pr] (#8447)
* restrict tensor const ShapeTracker in spec [pr]

* pass sink srcs

* reject if any of the specs disagree

* deceive mypy

* viz

* default to float

* just check the view

* create_schedule is gone

* test_verify_arg is flaky
2025-01-07 19:05:11 -05:00
qazal
0e97f807e0 test fixup prereqs for delete_buffer_view [pr] (#8523) 2025-01-07 11:52:18 +02:00
qazal
ed618a72e7 do not use subbuffer for bitcast (#8514)
* do not use subbuffer for bitcast

* edit that test

* explicit test for ptx

* ptx
2025-01-06 18:40:46 +02:00
qazal
ed121d235c spec for CAST_BEFORE_VIEW=1 [pr] (#8512) 2025-01-06 10:43:58 +02:00
qazal
eb7df92136 dedup COPY UOp [pr] (#8506) 2025-01-06 10:37:20 +02:00
qazal
bd4d7dc4eb return becomes_map from the scheduler (#8483)
* return becomes_map from the scheduler

* fix test_schedule

* fix abstractions2

* s/becomes/becomes_map
2025-01-03 22:47:21 +08:00
qazal
0d33391038 delete unused allow_buffer_view=True arg from bitcast [pr] (#8462) 2025-01-03 22:20:46 +08:00
qazal
08c9d980dc use const_like in uop zero folding [pr] (#8470) 2025-01-03 01:05:09 +08:00
qazal
f2bee34197 tests for symbolic_simple failing tensor const spec [pr] (#8469)
* tests for symbolic_simple failing tensor const spec [pr]

* mul is correct
2025-01-02 19:13:16 +08:00
qazal
c7ec0ab674 delete unused View lt support (2) (#8451)
* delete lt on view (2)

* the scheduler uses symbolic_simple
2024-12-31 07:01:25 +08:00
qazal
866dfa1f23 create_schedule([x.lazydata]) -> x.schedule() in tests (#8449) 2024-12-31 03:15:52 +08:00
qazal
7499139239 scheduler renames from the buffer_shape branch [pr] (#8444)
* scheduler refactors and renames from the buffer_shape branch [pr]

* all unmasked sts are allowed here

* only renames
2024-12-30 16:33:38 +08:00
George Hotz
b71c51191b tests from remove uop mutability [pr] (#8442)
* tests from remove uop mutability [pr]

* more test fix

* simpler test fix

* remove that
2024-12-29 12:14:10 -05:00
qazal
34987a03af const copy folding spec + multi.py behavior [pr] (#8436)
* const copy folding spec + multi behavior [pr]

* copy from clang, move multi test
2024-12-29 23:12:13 +08:00
qazal
a44cd1e6f7 add collapse_view to the scheduler [pr] (#8440) 2024-12-29 21:30:29 +08:00
qazal
b5820a5209 deletions from an ops.py "instant rule" audit [pr] (#8424)
* UOp.st cleanup 2 [pr]

* deletions from an ops.py instant rule audit [pr]

* note
2024-12-27 00:49:04 +08:00
qazal
9defbc7d54 add symbolic_simple to the scheduler [pr] (#8419) 2024-12-26 20:05:08 +08:00
qazal
313bdfa43f Add View lt support back [pr] (#8407)
* Revert "remove unused View.t and lt [pr] (#8374)"

This reverts commit 8fdcb60461.

* green test_masked_const_elementwise
2024-12-26 01:09:59 +08:00
qazal
4cbe5919d6 tensor uops symbolic folding spec [pr] (#8406) 2024-12-26 00:26:41 +08:00
qazal
3273972f44 delete is_unrealized_const, it's just CONST [pr] (#8390) 2024-12-24 16:46:12 +08:00
qazal
3a556a7e8b fully local tensor const representation: CONST(VIEW(DEVICE)) [pr] (#8389) 2024-12-24 16:15:56 +08:00