10473 Commits

Author SHA1 Message Date
George Hotz
1b1752b6f8 Revert "remove the on_stack stuff and any retries"
This reverts commit 49a2b328b9.
2025-10-08 21:11:36 +08:00
George Hotz
a4edf78351 don't use the p word 2025-10-08 20:00:27 +08:00
George Hotz
67b3e463b8 Merge branch 'master' into delete_slow_rangeify 2025-10-08 19:58:16 +08:00
George Hotz
49a2b328b9 remove the on_stack stuff and any retries 2025-10-08 19:57:13 +08:00
George Hotz
78b45838bf delete the old rangeify path and all the children stuff 2025-10-08 19:50:20 +08:00
qazal
b6835f4134 remove Ops.VIEW and related UOp methods (#12522)
* remove Ops.VIEW and related UOp methods

* update abstractions2.py

* no ShapeTrackers in abstractions2.py

* it's a size 1
2025-10-08 14:47:02 +03:00
George Hotz
3b0b3a2e64 fast RANGEIFY (#12504)
* rtoposort is fast, can replace rangeify with this

* fast rangeify

* work

* fast rangeify works for mnist

* should work

* progress

* pad fix

* FAST

* tests passing

* don't delete those shape ops

* put in rangeify map

* ending ranges fix

* tests

* mstack/mselect no hacks

* move to indexing.py

* touch up tests + add comments

* disable failing test

* actually make the file readable

* failing

* error
2025-10-08 19:38:06 +08:00
qazal
9448924d9e update gpt2 kernel count tests in CI=0 (#12523) 2025-10-08 14:29:11 +03:00
qazal
c5a1f9f5f9 no ShapeTrackers in multi.py (#12521)
* switch multi to all movement ops

* inline dvars
2025-10-08 14:04:05 +03:00
chenyu
ee0382ad99 remove ShapeTracker.invert (#12520) 2025-10-08 18:37:34 +08:00
chenyu
d5058427ea remove ShapeTracker.real_size (#12519) 2025-10-08 06:15:29 -04:00
qazal
6f26603f06 delete swizzler.py (#12518)
* delete swizzler

* remove merge_views tests

* don't need rewrites_for_views

* apply_rewrites
2025-10-08 13:02:34 +03:00
qazal
7e0b14243e delete grouper and kernelize (#12517)
* delete grouper and kernelize

* +sys.setrecursionlimit
2025-10-08 12:27:26 +03:00
chenyu
942022c309 smaller LLAMA_LAYER in Test llama 3 training (#12516)
very slow now
2025-10-08 05:10:51 -04:00
chenyu
e701106a64 remove FUSE_ARANGE (#12511)
it was the default already
2025-10-08 04:54:07 -04:00
qazal
291a19650b move Kernel dataclass to rangeify (#12510) 2025-10-08 11:30:06 +03:00
qazal
ad49f8148b switch process_replay to rangeify (#12509) 2025-10-08 11:26:43 +03:00
chenyu
da1f46ff3f remove RANGEIFY specific test jobs (#12507) 2025-10-08 04:12:04 -04:00
George Hotz
1e567a5cf8 make RANGEIFY=1 the default (#12161)
Co-authored-by: chenyu <chenyu@fastmail.com>
Co-authored-by: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-10-08 03:46:09 -04:00
nimlgen
9e7103647d amd: rename cmd_id to sqtt_next_cmd_id (#12503)
* amd: rename cmd_id to sqtt_next_cmd_id

* and typo
2025-10-08 15:16:19 +08:00
nimlgen
4a756a37d8 amd: support rocm7 (#12502)
* amd: support rocm7

* mock
2025-10-08 14:30:39 +08:00
qazal
60b6dca5ba update some tests instead of expect_rangeify_fails (#12500)
* update test_clone_doesnt_dedup to use base

* new_flat_buffer passes

* fix test_reorder_expand

* remove the view stuff

* remove that test, we don't want this view const behavior

* test_setitem_becomes_subbuffer is good
2025-10-08 07:42:31 +03:00
qazal
84597ed53c early assert for device mistmatched asts in rangeify (#12499)
* early assert for device mistmatched asts in rangeify

* alt also passes
2025-10-08 07:19:36 +03:00
qazal
2e19354c1c viz: reorder timeline graphs (#12498)
* viz: reorder timeline graphs

* update test_viz with the new order
2025-10-08 07:10:23 +03:00
George Hotz
d06226b575 fix SPEC and all_tensors iterator (#12496) 2025-10-07 23:18:17 -04:00
qazal
a7cb80bfab use recursive_property in UOp device (#12477)
* simple failing test with RecursionError

* switch to @recursive_property

* merge 2

* diff
2025-10-08 06:15:05 +03:00
George Hotz
a6d59a0b45 backward_slice to get srcs recursively (#12494)
* change name to backward_slice

* faster check

* clean up comments and names

* comment
2025-10-08 10:31:42 +08:00
chenyu
eb3bc277b3 remove ASSERT_MIN_STEP_TIME in external_benchmark_openpilot (#12495)
should add for compile3 and compile 3 only
2025-10-07 22:13:42 -04:00
qazal
239f9a3029 update viz to not use children [pr] (#12493) 2025-10-08 04:35:01 +03:00
Sieds Lykles
b465c17b56 Revert "UOp.factor and add chain sorting (#12413)" (#12492)
This reverts commit e74be4a140.
2025-10-08 03:20:23 +02:00
George Hotz
945cc46475 delete children tracking from uop (#12491)
* delete children tracking from uop

* uop children no longer exists

* no tracked children

* that test is flaky too
2025-10-08 09:04:14 +08:00
nimlgen
648e5bb223 hcq: do not raise when fini (#12487)
* hcq: do not raise when fini

* Revert "hcq: do not raise when fini"

This reverts commit 44af5f7d05.

* this way

* runtime is fine

* nn
2025-10-07 23:27:03 +08:00
George Hotz
a2345787b9 parents is faster than sparents (#12490) 2025-10-07 21:31:50 +08:00
George Hotz
12c4963489 add more rangeify pm tests (#12488) 2025-10-07 05:45:38 -04:00
George Hotz
403fdfcfd4 check spec in test, cleanup vectorize render (#12484) 2025-10-07 17:05:50 +08:00
qazal
22674798df assert correctness in test_permuted_assignment [pr] (#12483) 2025-10-07 11:42:22 +03:00
George Hotz
75ce11593c test_reshape_match should match (#12479) 2025-10-07 16:07:21 +08:00
chenyu
fe774a4319 more skip WINO on benchmark (#12482) 2025-10-07 03:43:51 -04:00
chenyu
8ad5f9e74f skip slow benchmarks (#12481)
* skip slow benchmarks

padded tc is already slow, rest are slow with rangeify (correct if run locally)

* relax more
2025-10-07 03:28:56 -04:00
George Hotz
ea7672931f fix test_matmul_relu_cat (#12478) 2025-10-07 02:32:23 -04:00
George Hotz
514d2a0774 merge tagless reshapes (#12474)
* merge tagless reshapes

* cleanup
2025-10-07 13:57:58 +08:00
chenyu
7b48f3cc45 failed test case repro for openpilot model (#12475)
* failed test case repro for openpilot model

* assertEqual
2025-10-07 13:46:43 +08:00
chenyu
a5484b767e remove skipping cast in simplify_valid [pr] (#12472)
* remove skipping cast in simplify_valid [pr]

unsupported statements are handled in uop_given_valid already. the test failed because (100%x) somehow got simplified

* better test
2025-10-07 00:10:04 -04:00
George Hotz
b4509fba31 thundermittens (#12471)
* thundermittens

* give device a type
2025-10-07 11:47:39 +08:00
George Hotz
0f25b4b289 move frontend dir to nn [pr] (#12470) 2025-10-07 10:42:22 +08:00
qazal
f664bcc8bd use recursive_property in UOp tracing (#12469)
* test

* simple passing
2025-10-06 21:10:52 +03:00
qazal
1af05dae77 fix rangeify in compile4.py (#12467)
* fix rangeify in compile4.py

* fix type_verify
2025-10-06 13:37:46 +03:00
qazal
76e8a3250c rangeify: late zero folding (#12464)
* rangeify: late zero folding

* early

* not kernels

* none

* multi

* linter

* mstack is sink comment

* more comment
2025-10-06 12:52:33 +03:00
George Hotz
0c015a24fe use recursive_property to prevent RecursionError (#12465)
* use recursive_property to prevent RecursionError

* not slower

* fix tests

* faster

* simpler
2025-10-06 15:59:18 +08:00
chenyu
a1881b0c17 update test_chicken (#12466)
logits are close, just numerical
2025-10-06 03:58:44 -04:00