Commit Graph

243 Commits

George Hotz
bfbe81df71 remove cast before view (#8613)
* remove cast before view

* greener

* indexing

* that passes too

* openpilot too

* ack

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-01-14 15:04:58 -05:00
qazal
ae2229d727 assert kernel buffer limit at compile time [pr] (#8595)
* remove the BUF_LIMIT assert

* skip the base one
2025-01-13 16:32:07 -05:00
qazal
7562cc0399 better test for reduce swizzle + don't use double dtype [pr] (#8586)
* better test_permute_rewrite

* use float32
2025-01-13 05:02:21 -05:00
qazal
cff1ee9038 add SINK folding from the tensor_map branch [pr] (#8562)
* delete is_constant from the scheduler

* add sink folding

* always give BUFFER uops Buffers [pr]

* spec for view, var (bind) and const

* add test_buffer_only_after_realize

* work

* 3 lines

* more work
2025-01-12 03:39:34 -05:00
qazal
87cbff3ac0 always give BUFFER uops Buffers [pr] (#8572)
* always give BUFFER uops Buffers [pr]

* add test_buffer_only_after_realize
2025-01-11 23:17:09 +02:00
chenyu
d09897c2aa allow double copy [pr] (#8559)
fixed the ring allreduce pattern and recovered most of the bert step time regression (10% faster); will double check all benchmarks
2025-01-10 18:21:01 -05:00
qazal
2fd068ffc0 delete empty op (#8544)
* simple delete EMPTY op

* there's no schedule for empty
2025-01-09 14:10:15 -05:00
qazal
f6eb0574f2 start tests for putting the tensor graph in a single kernel [pr] (#8542)
* start tests for putting the tensor graph in a single kernel [pr]

* parallel actually

* better view_left test

* test a softmax

* put all that in sym
2025-01-09 13:33:21 -05:00
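
As a rough, user-level sketch of what these single-kernel tests measure (illustrative only; it assumes a tinygrad checkout from around this point in the history and the public Tensor.schedule() API):

    from tinygrad import Tensor

    # count the kernels the scheduler emits for a softmax;
    # the single-kernel work aims to drive this number down
    x = Tensor.randn(32, 32)
    sched = x.softmax().schedule()   # Tensor.schedule() returns the list of ScheduleItems
    print(len(sched))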
qazal
947de23cac add VIEW(DEVICE) to tensor variable [pr] (#8529)
* add VIEW(DEVICE) to tensor variable [pr]

* bind 2

* restrict shapetracker

* move var and bind closer

* one less line
2025-01-08 01:39:42 -05:00
qazal
b22494b710 restrict tensor const ShapeTracker in spec [pr] (#8447)
* restrict tensor const ShapeTracker in spec [pr]

* pass sink srcs

* reject if any of the specs disagree

* deceive mypy

* viz

* default to float

* just check the view

* create_schedule is gone

* test_verify_arg is flaky
2025-01-07 19:05:11 -05:00
qazal
0e97f807e0 test fixup prereqs for delete_buffer_view [pr] (#8523) 2025-01-07 11:52:18 +02:00
qazal
ed618a72e7 do not use subbuffer for bitcast (#8514)
* do not use subbuffer for bitcast

* edit that test

* explicit test for ptx

* ptx
2025-01-06 18:40:46 +02:00
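
For context, a user-level view of bitcast (the subbuffer change itself is internal); a minimal sketch, assuming float32 and int32 share an itemsize and that this era of tinygrad exposes Tensor.bitcast and dtypes at the top level:

    from tinygrad import Tensor, dtypes

    # reinterpret the bytes of a float32 tensor as int32; per #8514 this path
    # is no longer backed by a subbuffer (sketch, not a spec of the backend behavior)
    a = Tensor([1.0, 2.0])
    b = a.bitcast(dtypes.int32)
    print(b.numpy())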
qazal
ed121d235c spec for CAST_BEFORE_VIEW=1 [pr] (#8512) 2025-01-06 10:43:58 +02:00
qazal
eb7df92136 dedup COPY UOp [pr] (#8506) 2025-01-06 10:37:20 +02:00
qazal
bd4d7dc4eb return becomes_map from the scheduler (#8483)
* return becomes_map from the scheduler

* fix test_schedule

* fix abstractions2

* s/becomes/becomes_map
2025-01-03 22:47:21 +08:00
qazal
0d33391038 delete unused allow_buffer_view=True arg from bitcast [pr] (#8462) 2025-01-03 22:20:46 +08:00
qazal
08c9d980dc use const_like in uop zero folding [pr] (#8470) 2025-01-03 01:05:09 +08:00
qazal
f2bee34197 tests for symbolic_simple failing tensor const spec [pr] (#8469)
* tests for symbolic_simple failing tensor const spec [pr]

* mul is correct
2025-01-02 19:13:16 +08:00
qazal
c7ec0ab674 delete unused View lt support (2) (#8451)
* delete lt on view (2)

* the scheduler uses symbolic_simple
2024-12-31 07:01:25 +08:00
qazal
866dfa1f23 create_schedule([x.lazydata]) -> x.schedule() in tests (#8449) 2024-12-31 03:15:52 +08:00
qazal
7499139239 scheduler renames from the buffer_shape branch [pr] (#8444)
* scheduler refactors and renames from the buffer_shape branch [pr]

* all unmasked sts are allowed here

* only renames
2024-12-30 16:33:38 +08:00
George Hotz
b71c51191b tests from remove uop mutability [pr] (#8442)
* tests from remove uop mutability [pr]

* more test fix

* simpler test fix

* remove that
2024-12-29 12:14:10 -05:00
qazal
34987a03af const copy folding spec + multi.py behavior [pr] (#8436)
* const copy folding spec + multi behavior [pr]

* copy from clang, move multi test
2024-12-29 23:12:13 +08:00
qazal
a44cd1e6f7 add collapse_view to the scheduler [pr] (#8440) 2024-12-29 21:30:29 +08:00
qazal
b5820a5209 deletions from an ops.py "instant rule" audit [pr] (#8424)
* UOp.st cleanup 2 [pr]

* deletions from an ops.py instant rule audit [pr]

* note
2024-12-27 00:49:04 +08:00
qazal
9defbc7d54 add symbolic_simple to the scheduler [pr] (#8419) 2024-12-26 20:05:08 +08:00
qazal
313bdfa43f Add View lt support back [pr] (#8407)
* Revert "remove unused View.t and lt [pr] (#8374)"

This reverts commit 8fdcb60461.

* green test_masked_const_elementwise
2024-12-26 01:09:59 +08:00
qazal
4cbe5919d6 tensor uops symbolic folding spec [pr] (#8406) 2024-12-26 00:26:41 +08:00
qazal
3273972f44 delete is_unrealized_const, it's just CONST [pr] (#8390) 2024-12-24 16:46:12 +08:00
qazal
3a556a7e8b fully local tensor const representation: CONST(VIEW(DEVICE)) [pr] (#8389) 2024-12-24 16:15:56 +08:00
qazal
514a6740e4 Revert "CONST(VIEW(DEVICE)) (#8365)" (#8372)
This reverts commit 83284985f0.
2024-12-22 04:44:34 +02:00
qazal
83284985f0 CONST(VIEW(DEVICE)) (#8365) 2024-12-22 04:18:35 +02:00
qazal
88bc51385c scheduler: don't trade complexity for speed (#8370)
* scheduler: don't trade complexity for speed

* don't need is_scheduled

* make those tests real world

* graph_rewrite dedup
2024-12-22 03:30:51 +02:00
qazal
72aa38aa3b BIND in tensor_uop_spec + cleanups [pr] (#8363)
* Ops.BIND pattern in tensor_uop_spec + cleanups [pr]

* use metaops there
2024-12-21 21:26:47 +08:00
qazal
2649e87546 delete the fake buffer from const (#8355)
* delete the fake buffer from const

* fix test_sink_childless_const_alt

* it should be CONST(VIEW(DEVICE))
2024-12-21 04:20:28 +08:00
qazal
8e266091fb tensor const spec [pr] (#8331) 2024-12-19 22:41:30 +08:00
qazal
fddaeb6344 scheduler deduping spec and asserts [pr] (#8307)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-18 09:21:41 -08:00
George Hotz
801e199196 change buffer to not be pointer [pr] (#8302) 2024-12-17 16:47:51 -08:00
George Hotz
6d83a96440 retry: use movement ops [pr] (#8225)
* Revert "Revert "use movement ops [pr] (#8222)" (#8224)"

This reverts commit da19c37f0a.

* fix cast before view
2024-12-13 15:14:26 -08:00
George Hotz
da19c37f0a Revert "use movement ops [pr] (#8222)" (#8224)
This reverts commit 0d26c970ba.
2024-12-13 14:10:47 -08:00
George Hotz
0d26c970ba use movement ops [pr] (#8222)
* use movement ops [pr]

* test indexing
2024-12-13 14:06:01 -08:00
George Hotz
dbe549e462 rename expand to unroll [pr] (#8218) 2024-12-13 11:41:52 -08:00
George Hotz
8a04a3a77a rename LazyBuffer -> UOp [pr] (#8169)
* rename LazyBuffer -> UOp [pr]

* fix docs
2024-12-11 16:15:52 -08:00
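
After the rename, the object behind Tensor.lazydata is a UOp; a quick sanity check, assuming a checkout from around this point in the history where UOp lives in tinygrad.ops:

    from tinygrad import Tensor
    from tinygrad.ops import UOp

    t = Tensor.ones(2, 2)
    assert isinstance(t.lazydata, UOp)   # previously a LazyBuffer
    print(t.lazydata.op)                 # the op of the UOp backing this tensor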
qazal
9044b0746a delete lazy [pr] (#7801)
* LazyBuffer = UOp

* try 4 at this diff

* skip optimization tests p1

* raise kernel count expectations

* BIND isn't the _only_ uop that can become a tensor

* fix test_ones_sum on symbolic

* bump openpilot, correctness first

* offset on assign is fine

* uop is immutable

* what if this was higher

* more optimization skips

* instant fold const copy

* test_multitensor shouldn't expect buffer for unrealized

* move copy folder to upats

* start BUFFER_VIEW

* kinda BUFFER_VIEW

* Revert "kinda BUFFER_VIEW"

This reverts commit 94b4fe3040.

* BUFFER_VIEW try 2

* linter and missed _device

* pylint

* keep Ops.CONTIGUOUS

* always BUFFER_VIEW disk

* test

* cpu isn't a real device

* buffer references after del

* add that back

* start bringing some of these back

* more test updates

* simpler simplify copy

* subbuffer everything

* this is fine with buffer view

* cleanup the diff in test/ 1

* copy is one thing

* diff pruning

* diff pruning 2

* oh bind unbinds way too early

* extra

* more diff pruning

* more const folding

* experiment with symbolic here

* Revert "experiment with symbolic here"

This reverts commit cb87d61f7a.

* Revert "more const folding"

This reverts commit 2a7d258a2b.

* Revert VALID early folding

This reverts commit 4074f52317.

* storing const is fine

* fix test_prefer_half_buffer

* iterate on test_real_world

* this fixes test_train_mnist memory, breaks everything else

* Revert "this fixes test_train_mnist memory, breaks everything else"

This reverts commit dccfcbe068.

* always expect buffer to exist here

* temp debug: something is mutating lazydata in compile3

* Revert "temp debug: something is mutating lazydata in compile3"

This reverts commit 71400f0d55.

* everything back to normal

* compile3

* compile3 test

* start captured jit work, that test passes

* finalized memory skip set

* linter err

* back to base here

* tiny metaop cleanup

* print tensor

* 4th time this unbind got me

* green pickle

* tensor_variable sanity

* cast sanity

* link from the reds

* COPY sanity + minor repr change

* you can exist

* enable test_winograd

* bye bye nbytes

* danger, uop is mutating

* real become

* delete those from uop init

* put it in buffer init

* buffer inits with so much stuff

* buffer pickle try 2

* toposort can't be a cached property

* fix test_schedule_gc_with_inputs

* remove all @unittest.skip(gc)

* Revert "remove all @unittest.skip(gc)"

This reverts commit 9d8d92dd85.

* reenable real world + test_schedule_gc

* test: RUN_PROCESS_REPLAY=0

* fix pickle jit

* test changes

* reenable test_lru_alloc and TestTrain

* fix imagedtype

* bring pr back

* reenable 3 gc tests

* test_schedule better diff

* disable SPLIT_REDUCEOP

* test_save_all_dtypes looks fixed

* fix metadata

* skip that one

* fix viz by not pickling buffers

* simple test for const folding

* bring split reduceop back

* add simplify_alu

* simplify_binop fixes a test

* fix cast folding

* disable that test

* that test looks fine

* changes from delete_lazy pruning p1

* cast folding and children base

* test: cast folding from pruning branch

* green test_sgd_4convs_fuse_conv_bw

* enable some indexing folding

* test_complex_backward is fixed

* prune more, 295 -> 233

* fix test_multi_const_folding_literal

* fix double copy

* early become test

* ooooops

* clean up ctx in all big_graph

* fix openpilot 208 kernels

* train_cifar is fine now

* fix CAST_BEFORE_VIEW

* ever faker const

* back to 13

* mark expectedFailure

* fine don't create them

* test_multi_const_folding_tensor

---------

Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-12 05:05:19 +08:00
qazal
047a6dabc3 prereq for scheduler contiguous_child [pr] (#8163)
* the whole context is fine here [pr]

* fix that
2024-12-12 02:02:22 +08:00
qazal
b894657aa7 assert the same things without mutating or accessing internal ops state [pr] (#8157)
* don't mutate internal state in test_lazybuffer

* fix test_schedule internals

* save time

* third si

* fine sometimes buffer_view isn't there
2024-12-11 22:01:27 +08:00
Ahmed Harmouche
a8cfdc70ed Run more webgpu tests (#8142) 2024-12-10 23:20:04 +01:00
qazal
5dd61035f7 revert VALID early folding for now (#8114)
This reverts commit 4074f52317.
2024-12-09 00:34:24 +08:00
qazal
4074f52317 VALID early folding (#8100)
* fold valid

* :)

* fix test_verify_ast

* keep symbolic working
2024-12-07 18:37:47 +08:00
qazal
df84dc6444 unrelated test fixups from delete_lazy [pr] (#8088)
* unrelated test fixups from delete_lazy [pr]

* fine if it's scheduled later
2024-12-06 17:31:02 +02:00