chenyu
14d1c5fdfd
assign fusion tests on detach and contiguous_backward (#15092)
2026-03-02 15:21:51 -05:00
qazal
f7aeff6061
viz: cli.py cleanups, do not require PYTHONPATH (#15085)
* cleanup the print
* sys.exit
* equal check
* cleanup unpacker
* cli doesn't need PYTHONPATH
* no semicolons
* %s/PYTHONPATH=. //g
2026-03-02 19:24:38 +09:00
chenyu
fe0fa8333b
Revert "improve Tensor.sort indices ( #15070 )" ( #15072 )
...
This reverts commit e3003631f2.
2026-02-28 14:40:30 -05:00
chenyu
e3003631f2
improve Tensor.sort indices (#15070)
* improve Tensor.sort indices
instead of an O(N^2) match at the end, start from an arange and send the indices through the same O(N (log N)^2) sort path
* contiguous
2026-02-28 14:16:16 -05:00
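The idea in the sort-indices commit is to avoid matching sorted values back to their source positions after the fact: pair each value with an arange index up front, and the indices ride along through the sort itself. A plain-Python sketch of the concept (not tinygrad's implementation, which works on tensors):

```python
def argsort_via_pairs(xs):
    # Pair each value with its original position (the "arange") up front,
    # so the sort carries the indices along instead of an O(N^2) match later.
    pairs = sorted(zip(xs, range(len(xs))))
    values = [v for v, _ in pairs]
    indices = [i for _, i in pairs]
    return values, indices

values, indices = argsort_via_pairs([3.0, 1.0, 2.0])
# values  -> [1.0, 2.0, 3.0]
# indices -> [1, 2, 0]
```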
chenyu
d345f7f5dc
remove _pending_assigns (#15040)
2026-02-26 22:38:10 -05:00
George Hotz
e3fa9896b7
start function and add walk rewrite (#14992)
* start function and add walk rewrite
* work
* add function on feed_forward
* llm progress
* stuff
* none of that
2026-02-25 13:56:27 +08:00
George Hotz
b643fca51e
clean up complete_create_schedule_with_vars (#14980)
* clean up complete_create_schedule_with_vars
* transform_to_call
* update viz tests
2026-02-24 16:12:36 +08:00
ttomsa
0366474089
Bool cast to cmpne (#14544)
* test
* rm in llvmir
* rm in ptx and nir
* hmmmm
* rm in decompositions
* skip tests
* add test
* just this
* rm comment
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-02-23 10:31:36 -05:00
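The bool-cast commit removes the dedicated cast path from the backends (LLVM IR, PTX, NIR) by rewriting a cast-to-bool as a compare-not-equal: `cast(x, bool)` becomes `x != 0`. A minimal sketch of such a rewrite on a toy expression type (hypothetical names, not tinygrad's UOp API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    op: str            # "cast", "cmpne", "const", "var"
    args: tuple = ()
    dtype: str = "int"

def rewrite_bool_cast(u: Op) -> Op:
    # cast(x, bool) -> cmpne(x, 0): backends then only need a comparison op
    if u.op == "cast" and u.dtype == "bool":
        x = u.args[0]
        zero = Op("const", (0,), x.dtype)
        return Op("cmpne", (x, zero), "bool")
    return u

x = Op("var", ("x",), "int")
out = rewrite_bool_cast(Op("cast", (x,), "bool"))
# out is cmpne(x, const 0) with dtype bool
```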
George Hotz
b824490e3f
allocate generates a call (#14958)
* allocate generates a call
* symbolic works too
* DEFINE_VAR is param
* replace param later
* apply buffers
* name
* upd
* this was a bug...
2026-02-23 15:59:20 +08:00
chenyu
4424757b9a
update test_sharded_memory (#14956)
cleaned up and moved to test/null
2026-02-22 16:56:08 -05:00
qazal
c5029fa460
jit case with Tensor.empty input, realized means allocated (#14930)
* simple failing jit test case with Tensor.empty
* this used to exist in ops.py...
* Revert "removed if self.buffer.is_allocated() in realized (#14836 )"
This reverts commit 72cf603805 .
2026-02-21 16:33:55 +09:00
George Hotz
df7774661a
remove late numbering of UOps (#14923)
* remove late numbering of UOps
* stupid fix
* dead code
2026-02-21 09:18:48 +08:00
chenyu
24286c5593
fix clone for multi (#14919)
also update empty_like to make sure it's backed by buffers
2026-02-20 17:21:09 -05:00
chenyu
a4634b253a
fix empty_like for sharded tensor (#14915)
2026-02-20 16:30:04 -05:00
George Hotz
2611907afb
start ripping out old scheduler -- no maps (#14909)
* start ripping out old scheduler -- no maps
* no more metadata
2026-02-20 21:05:04 +08:00
George Hotz
55d3a5def9
preallocate all realized buffers (#14823)
* preallocate all realized buffers
* contiguous
* work
* comment that out
* move to schedule
* better
* correct fix
* just buffer
* disk bufs
* fixes disk tensor stuff
* fix symbolic stuff
* fix multi
* 162 failures
* bugfixes
* don't check that anymore
* fix schedule tests
* mnist should be contiguous
* type and buffer
* fix tests
* shrink axis correction
* mypy fixes
* tests skips
* same 37 failures
* dedup
* no shrink in the graph
* 29 failures
* skips
* fix custom kernel
* fix training
* those optimizations aren't supported currently
* simpler
* more correct
* tests
* 14 failures
* works
* fix that test
* broken
* 11 failures
* only kernel counts left
* fixes
* all tests pass
* remove tensor_map
* op test
* 200 -> 230
* test fixes
* fixes
* revert test_tiny thing
* guard
* revert that
* test tiny passes
* no contigs there
* base realize back
* Revert "no contigs there"
This reverts commit c45bb9fcfd.
* revert that
* chop many assigns
* 12 failures
* fix tests
* tests
* apply after
* pre-commit
* remove old code
* delete that
* fix types
* remove extra contig
* fix dataloader
* torch fix
* disk fix
* update kernel fusion numbers
* runs on amd
* restore kernel count
* add that rule back
* that
* disable that
* wrong
* add the correct rule for that folding
* more tests
* guard c1.arg
* no newlines
* realize those
* split into a different file
* remove detach/contig back
* skip 2
* update that
2026-02-20 20:05:54 +08:00
George Hotz
6610255654
add the correct rule for gcd div/mod folding (#14905)
* add the correct rule for that folding
* more tests
* guard c1.arg
2026-02-20 18:11:54 +08:00
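The gcd div folding rule for integer expressions: in `(a*x + b) // c`, all three constants can be divided by `g = gcd(a, b, c)` without changing the result, since `(g*n) // (g*m) == n // m`. A brute-force sketch that checks the identity (illustrative only; tinygrad's symbolic rewriter applies this on UOps, with extra guards like the commit's `guard c1.arg`):

```python
from math import gcd

def fold_div(a: int, b: int, c: int):
    # Divide numerator coefficients and divisor by their common gcd:
    # (a*x + b) // c == (a//g * x + b//g) // (c//g) for g = gcd(a, b, c)
    g = gcd(a, gcd(b, c))
    return a // g, b // g, c // g

a, b, c = 12, 8, 4
a2, b2, c2 = fold_div(a, b, c)          # -> (3, 2, 1)
for x in range(100):                    # brute-force check the rule holds
    assert (a*x + b) // c == (a2*x + b2) // c2
```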
George Hotz
a28fc2fba7
hotfix: remove wrong symbolic rule
2026-02-20 17:09:18 +08:00
qazal
e9ae3da711
viz: click on CALL node goes to codegen (#14609)
* viz: click on CALL node goes to codegen
* colored name
2026-02-20 11:13:11 +09:00
George Hotz
fc5677c28b
resnet dataloader + more test cleanups (#14899)
* resnet dataloader
* tests
2026-02-20 10:05:47 +08:00
chenyu
b9744ab62b
one more test_gpudims test (#14898)
failure from the bad simplification attempt
2026-02-19 18:18:44 -05:00
chenyu
9d6cf00be2
fix gpudim bug and test_split_2d_to_3d (#14896)
2026-02-19 16:46:24 -05:00
chenyu
2b31823ef9
update test_gpudims to prove bijectivity (#14895)
* update test_gpudims to prove bijectivity
* one more
2026-02-19 16:18:59 -05:00
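"Prove bijectivity" here means that rewriting a flat launch index into split dimensions must hit every element exactly once. The tests in these commits prove it with z3; the property itself can be illustrated by plain enumeration for the simplest case, splitting one dimension of size `n` by a factor `s` that divides it (hypothetical helper names, not tinygrad's gpudims code):

```python
def split_index(i: int, s: int):
    # flat index -> (outer, inner) after splitting a dim by factor s
    return i // s, i % s

def merge_index(outer: int, inner: int, s: int) -> int:
    return outer * s + inner

n, s = 12, 3
images = {split_index(i, s) for i in range(n)}
assert len(images) == n                             # injective on [0, n)
for i in range(n):
    outer, inner = split_index(i, s)
    assert merge_index(outer, inner, s) == i        # and it round-trips
```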
chenyu
19ce7a3f7f
use z3 to verify gpudims output index (#14894)
found a bug with z3
2026-02-19 15:24:38 -05:00
chenyu
52f727738b
move test_grouped_dims to test/null (#14893)
it's a pure helper
2026-02-19 14:50:53 -05:00
chenyu
7400362a86
remove UOp.vars [pr] (#14891)
2026-02-19 12:09:39 -05:00
George Hotz
f6c1cf343c
new symbolic rule from prealloc_bufs (#14883)
* new symbolic rule from prealloc_bufs
* optim
2026-02-19 20:57:30 +08:00
George Hotz
2f0f8b5776
more test relaxations from prealloc_bufs (#14880)
2026-02-19 14:23:28 +08:00
George Hotz
ab61c16730
fixes and test relaxations from prealloc_bufs (#14875)
* fixes and test relaxations from prealloc_bufs
* fix error type and guard _mop
* revert that
* contiguous makes extra/torch_backend/test_kernel_fusion.py fail
2026-02-19 11:37:25 +08:00
chenyu
f771de6738
gc.collect() to get the correct GlobalCounters.mem_used in tests (#14868)
test can be flaky if gc happens in between
2026-02-18 15:01:23 -05:00
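The flakiness comes from CPython's cyclic garbage collector: objects in reference cycles are only freed when a collection runs, so a memory counter read right after dropping tensors may or may not still include their buffers. Calling `gc.collect()` first makes the read deterministic. A self-contained illustration of the mechanism using `weakref`:

```python
import gc
import weakref

class Node:
    pass

a, b = Node(), Node()
a.other, b.other = b, a      # reference cycle: refcounting alone can't free it
ref = weakref.ref(a)
del a, b
# the cycle may still be alive here, depending on when gc last ran --
# exactly the nondeterminism that made the mem_used test flaky
gc.collect()                 # force the cyclic collector before reading counters
assert ref() is None         # now the objects are really gone
```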
chenyu
5746a605ce
UOp.axis raises for invalid reshape (#14863)
reshape is lazy now, so it's better to raise from the .axis call than to make the caller handle the invalid case
2026-02-18 11:28:56 -05:00
George Hotz
ab55e8c6b9
assign should be used as output buffer (#14845)
* assign should be used as buffer
* late removed
* the fix
* better fix
* backward slice
2026-02-18 09:37:46 +08:00
chenyu
72cf603805
removed if self.buffer.is_allocated() in realized (#14836)
automatically fixes is_realized issue for empty
2026-02-17 15:35:56 -05:00
chenyu
f147791105
update test to reset and test kernel_count directly (#14832)
2026-02-17 11:48:46 -05:00
George Hotz
bc3487d607
VIZ display cleanups (#14811)
* exclude reshape/expand broadcasts from viz
* limit src lines
2026-02-17 10:03:08 +08:00
nimlgen
9f8afb518c
viz: sdma gb/s in graph (#14798)
* viz: sdma gb/s in graph
* f
2026-02-16 16:45:06 +03:00
qazal
db3db476ff
viz: add GB/s to SDMA (#14795)
* work
* better
* fix that
* no decimal
2026-02-16 20:09:20 +09:00
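For a throughput label like this, a convenient unit fact applies: with byte counts and nanosecond timestamps, bytes per nanosecond equals gigabytes per second (both scales differ by 1e9, which cancels), so the label needs only a division plus formatting. A sketch of that calculation (hypothetical helper, not viz's actual code; the "no decimal" bullet suggests integer-style formatting):

```python
def sdma_gbps(nbytes: int, duration_ns: int) -> float:
    # bytes / nanoseconds == gigabytes / second: the 1e9 factors cancel
    return nbytes / duration_ns

# 4 GB copied in 2 s -> 2 GB/s
rate = sdma_gbps(4_000_000_000, 2_000_000_000)
label = f"{rate:.0f} GB/s"   # no decimal in the graph label
```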
qazal
c2be31e75b
move Estimates to rewrite rules [pr] (#14782)
* move Estimates to rewrite rules [pr]
* don't need this cached_property
* tuple
* return
2026-02-16 12:59:42 +09:00
George Hotz
0abcb9aac2
move more to mixins (#14780)
* move more to mixins
* revert
* move some
* do not change
* more
* fix tests
* Revert "more"
This reverts commit d942d59fa4.
* go
* work
* more
* work
* guard
* base
2026-02-16 11:35:00 +08:00
George Hotz
9759fd6193
dtype mixin (#14763)
* dtype mixin
* dtype mixin methods
2026-02-15 16:03:48 +08:00
George Hotz
32980c74d1
hotfix: skip flaky tests, looped many times on tinymac3
2026-02-15 07:46:29 +08:00
chenyu
043f5dbfa0
fix write-after-read tracking (#14754)
AFTER-AFTER was silently dropped, which breaks write-after-read
2026-02-14 17:23:05 -05:00
chenyu
0ce4a55dad
clean up test_setitem_slice (#14750)
moved to test_setitem_schedule, and use contiguous zeros as scheduler handles empty differently now
2026-02-14 14:29:16 -05:00
nimlgen
e1a18dadae
fix devices for copies (#14747)
* fix devices for copies
* add test
2026-02-14 17:39:41 +03:00
George Hotz
c0fe78f73b
BUG: metadata is lost with partial assign (#14732)
2026-02-13 21:35:21 +08:00
chenyu
50cb40be88
clean up test/null/test_indexing.py (#14720)
2026-02-12 22:36:53 -05:00
qazal
5b624b5e93
viz: better error message for out of range timestamps (#14722)
* test_timestamp_out_of_range
* rel_ts helper
* linter
2026-02-13 12:13:40 +09:00
chenyu
86352988d8
update test_uops_stats for setitem (#14710)
realizing both the full tensor and the slice should not add to global_mem
2026-02-12 12:26:13 -05:00
chenyu
56caf6a3a2
fix Estimate.from_uops for sliced access (#14695)
"assume all DEFINE_GLOBAL memory is accessed" is wrong for partial load. get accessed accumulated from INDEX, then cap at full size. now mem_est never exceeds lds_est
2026-02-12 11:18:07 -05:00
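The shape of the sliced-access fix: rather than counting every DEFINE_GLOBAL buffer in full, sum the byte ranges actually touched via INDEX and cap the total at the buffer size, which guarantees the memory estimate never exceeds the load/store estimate. A sketch of that accounting (hypothetical names, not the actual Estimates API):

```python
def mem_estimate(accessed_ranges, buf_size: int) -> int:
    """Sum bytes touched via INDEX, capped at the full buffer size.

    accessed_ranges: iterable of (start, stop) byte ranges seen at INDEX ops.
    Names here are illustrative, not tinygrad's Estimate.from_uops signature.
    """
    touched = sum(stop - start for start, stop in accessed_ranges)
    return min(touched, buf_size)

# a 1 KiB buffer where only a 256-byte slice is loaded: count 256, not 1024
small = mem_estimate([(0, 256)], 1024)
# repeated full-range accesses still cap at the buffer size
capped = mem_estimate([(0, 1024), (0, 1024)], 1024)
```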
chenyu
8551fa50d3
support bitcast in sym_infer (#14708)
fixed `DEBUG=2 DEV=WEBGPU python -m pytest test/backend/test_tensor_variable.py::TestTensorVariable::test_symbolic_pad`
2026-02-12 10:21:05 -05:00