Commit Graph

65 Commits

Author SHA1 Message Date
chenyu
14d1c5fdfd assign fusion tests on detach and contiguous_backward (#15092) 2026-03-02 15:21:51 -05:00
qazal
f7aeff6061 viz: cli.py cleanups, do not require PYTHONPATH (#15085)
* cleanup the print

* sys.exit

* equal check

* cleanup unpacker

* cli doesn't need PYTHONPATH

* no semicolons

* %s/PYTHONPATH=. //g
2026-03-02 19:24:38 +09:00
chenyu
fe0fa8333b Revert "improve Tensor.sort indices (#15070)" (#15072)
This reverts commit e3003631f2.
2026-02-28 14:40:30 -05:00
chenyu
e3003631f2 improve Tensor.sort indices (#15070)
* improve Tensor.sort indices

instead of an N^2 match at the end, start with an arange and go through the same N(logN)^2 path

* contiguous
2026-02-28 14:16:16 -05:00
chenyu
d345f7f5dc remove _pending_assigns (#15040) 2026-02-26 22:38:10 -05:00
George Hotz
e3fa9896b7 start function and add walk rewrite (#14992)
* start function and add walk rewrite

* work

* add function on feed_forward

* llm progress

* stuff

* none of that
2026-02-25 13:56:27 +08:00
George Hotz
b643fca51e clean up complete_create_schedule_with_vars (#14980)
* clean up complete_create_schedule_with_vars

* transform_to_call

* update viz tests
2026-02-24 16:12:36 +08:00
ttomsa
0366474089 Bool cast to cmpne (#14544)
* test

* rm in llvmir

* rm in ptx and nir

* hmmmm

* rm in decompositions

* skip tests

* add test

* just this

* rm comment

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2026-02-23 10:31:36 -05:00
George Hotz
b824490e3f allocate generates a call (#14958)
* allocate generates a call

* symbolic works too

* DEFINE_VAR is param

* replace param later

* apply buffers

* name

* upd

* this was a bug...
2026-02-23 15:59:20 +08:00
chenyu
4424757b9a update test_sharded_memory (#14956)
cleaned up and moved to test/null
2026-02-22 16:56:08 -05:00
qazal
c5029fa460 jit case with Tensor.empty input, realized means allocated (#14930)
* simple failing jit test case with Tensor.empty

* this used to exist in ops.py...

* Revert "removed if self.buffer.is_allocated() in realized (#14836)"

This reverts commit 72cf603805.
2026-02-21 16:33:55 +09:00
George Hotz
df7774661a remove late numbering of UOps (#14923)
* remove late numbering of UOps

* stupid fix

* dead code
2026-02-21 09:18:48 +08:00
chenyu
24286c5593 fix clone for multi (#14919)
also update empty_like to make sure it's backed by buffers
2026-02-20 17:21:09 -05:00
chenyu
a4634b253a fix empty_like for sharded tensor (#14915) 2026-02-20 16:30:04 -05:00
George Hotz
2611907afb start ripping out old scheduler -- no maps (#14909)
* start ripping out old scheduler -- no maps

* no more metadata
2026-02-20 21:05:04 +08:00
George Hotz
55d3a5def9 preallocate all realized buffers (#14823)
* preallocate all realized buffers

* contiguous

* work

* comment that out

* move to schedule

* better

* correct fix

* just buffer

* disk bufs

* fixes disk tensor stuff

* fix symbolic stuff

* fix multi

* 162 failures

* bugfixes

* don't check that anymore

* fix schedule tests

* mnist should be contiguous

* type and buffer

* fix tests

* shrink axis correction

* mypy fixes

* tests skips

* same 37 failures

* dedup

* no shrink in the graph

* 29 failures

* skips

* fix custom kernel

* fix training

* those optimizations aren't supported currently

* simpler

* more correct

* tests

* 14 failures

* works

* fix that test

* broken

* 11 failures

* only kernel counts left

* fixes

* all tests pass

* remove tensor_map

* op test

* 200 -> 230

* test fixes

* fixes

* revert test_tiny thing

* guard

* revert that

* test tiny passes

* no contigs there

* base realize back

* Revert "no contigs there"

This reverts commit c45bb9fcfd.

* revert that

* chop many assigns

* 12 failures

* fix tests

* tests

* apply after

* pre-commit

* remove old code

* delete that

* fix types

* remove extra contig

* fix dataloader

* torch fix

* disk fix

* update kernel fusion numbers

* runs on amd

* restore kernel count

* add that rule back

* that

* disable that

* wrong

* add the correct rule for that folding

* more tests

* guard c1.arg

* no newlines

* realize those

* split into a different file

* remove detach/contig back

* skip 2

* update that
2026-02-20 20:05:54 +08:00
George Hotz
6610255654 add the correct rule for gcd div/mod folding (#14905)
* add the correct rule for that folding

* more tests

* guard c1.arg
2026-02-20 18:11:54 +08:00
George Hotz
a28fc2fba7 hotfix: remove wrong symbolic rule 2026-02-20 17:09:18 +08:00
qazal
e9ae3da711 viz: click on CALL node goes to codegen (#14609)
* viz: click on CALL node goes to codegen

* colored name
2026-02-20 11:13:11 +09:00
George Hotz
fc5677c28b resnet dataloader + more test cleanups (#14899)
* resnet dataloader

* tests
2026-02-20 10:05:47 +08:00
chenyu
b9744ab62b one more test_gpudims test (#14898)
failure from the bad simplification attempt
2026-02-19 18:18:44 -05:00
chenyu
9d6cf00be2 fix gpudim bug and test_split_2d_to_3d (#14896) 2026-02-19 16:46:24 -05:00
chenyu
2b31823ef9 update test_gpudims to prove bijectivity (#14895)
* update test_gpudims to prove bijectivity

* one more
2026-02-19 16:18:59 -05:00
chenyu
19ce7a3f7f use z3 to verify gpudims output index (#14894)
found a bug with z3
2026-02-19 15:24:38 -05:00
chenyu
52f727738b move test_grouped_dims to test/null (#14893)
it's a pure helper
2026-02-19 14:50:53 -05:00
chenyu
7400362a86 remove UOp.vars [pr] (#14891) 2026-02-19 12:09:39 -05:00
George Hotz
f6c1cf343c new symbolic rule from prealloc_bufs (#14883)
* new symbolic rule from prealloc_bufs

* optim
2026-02-19 20:57:30 +08:00
George Hotz
2f0f8b5776 more test relaxations from prealloc_bufs (#14880) 2026-02-19 14:23:28 +08:00
George Hotz
ab61c16730 fixes and test relaxations from prealloc_bufs (#14875)
* fixes and test relaxations from prealloc_bufs

* fix error type and guard _mop

* revert that

* contiguous makes extra/torch_backend/test_kernel_fusion.py fail
2026-02-19 11:37:25 +08:00
chenyu
f771de6738 gc.collect() to get the correct GlobalCounters.mem_used in tests (#14868)
test can be flaky if gc happens in between
2026-02-18 15:01:23 -05:00
chenyu
5746a605ce UOp.axis raises for invalid reshape (#14863)
reshape is lazy now, so it's better to raise from the .axis call rather than have the caller handle the invalid case
2026-02-18 11:28:56 -05:00
George Hotz
ab55e8c6b9 assign should be used as output buffer (#14845)
* assign should be used as buffer

* late removed

* the fix

* better fix

* backward slice
2026-02-18 09:37:46 +08:00
chenyu
72cf603805 removed if self.buffer.is_allocated() in realized (#14836)
automatically fixes is_realized issue for empty
2026-02-17 15:35:56 -05:00
chenyu
f147791105 update test to reset and test kernel_count directly (#14832) 2026-02-17 11:48:46 -05:00
George Hotz
bc3487d607 VIZ display cleanups (#14811)
* exclude reshape/expand broadcasts from viz

* limit src lines
2026-02-17 10:03:08 +08:00
nimlgen
9f8afb518c viz: sdma gb/s in graph (#14798)
* viz: sdma gb/s in graph

* f
2026-02-16 16:45:06 +03:00
qazal
db3db476ff viz: add GB/s to SDMA (#14795)
* work

* better

* fix that

* no decimal
2026-02-16 20:09:20 +09:00
qazal
c2be31e75b move Estimates to rewrite rules [pr] (#14782)
* move Estimates to rewrite rules [pr]

* don't need this cached_property

* tuple

* return
2026-02-16 12:59:42 +09:00
George Hotz
0abcb9aac2 move more to mixins (#14780)
* move more to mixins

* revert

* move some

* do not change

* more

* fix tests

* Revert "more"

This reverts commit d942d59fa4.

* go

* work

* more

* work

* guard

* base
2026-02-16 11:35:00 +08:00
George Hotz
9759fd6193 dtype mixin (#14763)
* dtype mixin

* dtype mixin methods
2026-02-15 16:03:48 +08:00
George Hotz
32980c74d1 hotfix: skip flaky tests, looped many times on tinymac3 2026-02-15 07:46:29 +08:00
chenyu
043f5dbfa0 fix write-after-read tracking (#14754)
AFTER-AFTER was silently dropped, which breaks write-after-read
2026-02-14 17:23:05 -05:00
chenyu
0ce4a55dad clean up test_setitem_slice (#14750)
moved to test_setitem_schedule, and use contiguous zeros since the scheduler now handles empty differently
2026-02-14 14:29:16 -05:00
nimlgen
e1a18dadae fix devices for copies (#14747)
* fix devices for copies

* add test
2026-02-14 17:39:41 +03:00
George Hotz
c0fe78f73b BUG: metadata is lost with partial assign (#14732) 2026-02-13 21:35:21 +08:00
chenyu
50cb40be88 clean up test/null/test_indexing.py (#14720) 2026-02-12 22:36:53 -05:00
qazal
5b624b5e93 viz: better error message for out of range timestamps (#14722)
* test_timestamp_out_of_range

* rel_ts helper

* linter
2026-02-13 12:13:40 +09:00
chenyu
86352988d8 update test_uops_stats for setitem (#14710)
realizing both the full tensor and the slice should not add to global_mem
2026-02-12 12:26:13 -05:00
chenyu
56caf6a3a2 fix Estimate.from_uops for sliced access (#14695)
"assume all DEFINE_GLOBAL memory is accessed" is wrong for partial loads. Accumulate the accessed size from INDEX, then cap it at the full size. Now mem_est never exceeds lds_est.
2026-02-12 11:18:07 -05:00
chenyu
8551fa50d3 support bitcast in sym_infer (#14708)
fixed `DEBUG=2 DEV=WEBGPU python -m pytest test/backend/test_tensor_variable.py::TestTensorVariable::test_symbolic_pad`
2026-02-12 10:21:05 -05:00