chenyu
86352988d8
update test_uops_stats for setitem ( #14710 )
...
realize both full tensor and the slice should not add to global_mem
2026-02-12 12:26:13 -05:00
chenyu
56caf6a3a2
fix Estimate.from_uops for sliced access ( #14695 )
...
"assume all DEFINE_GLOBAL memory is accessed" is wrong for partial load. get accessed accumulated from INDEX, then cap at full size. now mem_est never exceeds lds_est
2026-02-12 11:18:07 -05:00
chenyu
8551fa50d3
support bitcast in sym_infer ( #14708 )
...
fixed `DEBUG=2 DEV=WEBGPU python -m pytest test/backend/test_tensor_variable.py::TestTensorVariable::test_symbolic_pad`
2026-02-12 10:21:05 -05:00
chenyu
557134e1c7
model/test fix that failed with WEBGPU=1 DEBUG=2 ( #14706 )
2026-02-12 09:08:16 -05:00
nimlgen
b376bd7a21
jit: fix raw in same kernel ( #14699 )
...
* jit: fix raw in same kernel
* fix
* ugh
* x
* simpler
2026-02-12 15:33:32 +03:00
George Hotz
19e68a1833
skip AMD on not AMD ( #14703 )
2026-02-12 18:56:54 +08:00
George Hotz
4680247e35
renderer/amd: move in tree ( #14702 )
...
* renderer/amd: move in tree
* fix paths in tests
* 24000 lines
* no delete for amd files
2026-02-12 18:09:16 +08:00
George Hotz
095a064ba8
test.yml explicitly says backend ( #14700 )
...
* test.yml explicitly says backend
* 1e-5
2026-02-12 16:03:44 +08:00
nimlgen
14a1991da6
viz: sort tracks in timeline ( #14591 )
...
* viz: sort devices in timeline
* fix
* rev
* upd
* skip
2026-02-12 10:51:41 +03:00
George Hotz
befc1e800c
assembly/amd: disasm is test only ( #14694 )
...
* assembly/amd: disasm is test only
* viz uses str
2026-02-12 12:33:46 +08:00
George Hotz
c331798201
move tests to test/backend ( #14691 )
...
* move tests to test/backend
* fix imports
* fix CI
* revert that one
* Fix formatting in README for test command
2026-02-12 11:09:44 +08:00
chenyu
0c63f63ee4
recursive resolve assign dependency ( #14688 )
...
remove the .realize in llm.py
2026-02-11 17:41:05 -05:00
chenyu
cbbc2fdea5
update test_assign_slice_then_read ( #14687 )
...
passes locally now
2026-02-11 15:02:44 -05:00
chenyu
7465b22ba0
handle setitem target in rangeify ( #14685 )
2026-02-11 11:38:59 -05:00
chenyu
0d215b962e
few setitem test cases diff from numpy ( #14684 )
...
have claude fuzzed frontend and found some real bugs
2026-02-11 08:41:03 -05:00
nimlgen
df8b21eeb5
add real self assign test ( #14683 )
...
* self assign fix
* no
2026-02-11 12:41:53 +03:00
George Hotz
4565958792
some lil speedups ( #14679 )
2026-02-11 10:01:58 +08:00
George Hotz
2d4ad9e739
add a waitlist for graph rewrite ( #14678 )
...
* add a waitlist for graph rewrite
* cleaner
* one context on spec check
2026-02-11 09:30:13 +08:00
Christopher Milan
389e2eeda1
Revert "transcendental works with long decomp" ( #14676 )
2026-02-10 19:46:34 -05:00
Christopher Milan
0662c8037d
transcendental works with long decomp ( #14672 )
2026-02-10 19:30:24 -05:00
chenyu
ebef63dba0
update test_self_assign_same_device_copy ( #14673 )
...
that test would have passed without the optimization because .to shortcut
2026-02-10 17:23:43 -05:00
nimlgen
aafa9dcb5b
eliminate same-device copy self-assigns ( #14671 )
...
* eliminate same-device copy self-assigns
* ugh
2026-02-10 22:54:51 +03:00
chenyu
494eec2694
test_setitem_const_fused ( #14668 )
...
did not realize #14640 also fixed #10690 , so added a test for it
2026-02-10 08:33:02 -05:00
George Hotz
8dc46dde07
everything has dtype.long now ( #14661 )
...
* everything has dtype.long now
* int64/uint64 are everywhere now
* that doesn't work
2026-02-10 15:08:50 +08:00
George Hotz
cc9bf8ccbc
move more to null/unit tests ( #14658 )
...
* move more to null tests
* move test_gc
* no test fusion op
2026-02-10 13:35:17 +08:00
chenyu
83f6d28579
two less realize in setitem ( #14655 )
2026-02-09 23:45:24 -05:00
chenyu
0dedf4063c
minor test_setitem cleanup ( #14654 )
2026-02-09 20:40:29 -05:00
Christopher Milan
e6562a5061
remove CompilerPair ( #14638 )
2026-02-09 19:51:18 -05:00
chenyu
9e3f24db9f
assign realize fix ( #14649 )
...
fix the need for explicit assign. track pending assigns for each buffer, and run those before the main realize in order
2026-02-09 17:46:46 -05:00
chenyu
e9f40f49d4
explicitly check advanced setitem ( #14644 )
...
advanced setitem DISK would failed in rangeify with bad error, now it's checked directly in setitem. eventully DISK can use regular setitem path
2026-02-09 13:36:46 -05:00
chenyu
20a132b1c4
relax atol for test_uop_scan_matmul ( #14646 )
...
flaky, also log max diff
2026-02-09 13:25:19 -05:00
chenyu
8a2c23d3dc
raise RuntimeError for setitem dtype mismatch ( #14642 )
2026-02-09 10:37:08 -05:00
qazal
80b0119cef
llama: add new asm gemm shape ( #14611 )
...
* llama: add new asm gemm shape
* work
* cleanup
* half dtype
* more comment
2026-02-10 00:34:29 +09:00
Filip Brzek
1667669c46
fix: python3 -m tinygrad.device reporting on AMD/CPU ( #14622 )
...
* test: device module expects PASS in -m tinygrad.device for CPU
* fix: use device._compiler_name instead of unwrap_class_type(compiler).__name__ in enumerate_devices_str
2026-02-08 20:22:35 +03:00
qazal
087dab4c3b
gemm/asm: split out cdna tests from CI ( #14619 )
...
* gemm/asm: split out cdna tests from CI
* reorder
* work
2026-02-08 21:33:42 +09:00
George Hotz
183d38b128
remove CUSTOM_KERNEL / directly construct it ( #14604 )
...
* remove CUSTOM_KERNEL / directly construct it
* clean that up
* simpler multi
* custom kernel spec
* remove Kernel
* fix multi
* use sharded shape
* explicit regression test
2026-02-08 18:43:33 +08:00
nimlgen
6838b35cff
mockgpu: hevc ( #14606 )
...
* mockgpu: hevc
* eng
2026-02-07 12:27:55 +03:00
chenyu
884592f6c8
pin z3-solver version ( #14605 )
...
found exact input that crashes z3 4.15.4
2026-02-06 22:49:31 -05:00
George Hotz
7a2a3b5c71
Remove Ops.KERNEL, it's all Ops.CALL now ( #14603 )
2026-02-07 10:21:54 +08:00
George Hotz
ca6604eae2
kernel is call ( #14577 )
...
* call is kernel
* closer
* fix bugs
* dedup
* pm_gate_kernel_sink
* better
* Revert "better"
This reverts commit b4c799b810 .
* Reapply "better"
This reverts commit e53f094ce7 .
* cleanups
* work
* remove junk
* subtle fix
* index
* viz cleanups
* disable assert for now
2026-02-07 10:10:14 +08:00
Christopher Milan
ad9e2f0de7
decompose bf16 ( #14601 )
2026-02-06 19:24:09 -05:00
Christopher Milan
7bb45e7df0
decompose fp8 to bigger floats [skip_process_replay] ( #14554 )
...
* decompose fp8 also
* it works
* cleanup
* no shift required
* default to float
* cleanup
* fixes
* fp8e5m2
* don't rely on behavior comparing nans
* cleanup
2026-02-06 19:05:40 -05:00
chenyu
7d193a6e26
fix wgsl bitcast ( #14600 )
...
was wrong for signed int
2026-02-06 16:57:36 -05:00
chenyu
b9fe8b7591
fix opt in process replay [pr] ( #14599 )
2026-02-06 16:49:56 -05:00
chenyu
197ebcbbbc
log seed with flush=True in fuzz_symbolic ( #14597 )
...
* log seed with flush=True in fuzz_symbolic
i think z3 can crash. added reading seed from argv to see if we repro later
* fuzz_symbolic_symbolic_div
2026-02-06 15:03:57 -05:00
qazal
a80fb4e641
viz: better ordering of device engines in profiler ( #14590 )
2026-02-06 23:08:09 +09:00
nimlgen
fbeb978170
diff devices for sdma ( #14589 )
...
* start
* x
* fix
* sdma
* c
* clean
* x
* hm
* cleaer
2026-02-06 16:39:12 +03:00
George Hotz
03af2404e2
small changes and test fixes from kernel is call ( #14586 )
2026-02-06 17:08:33 +08:00
George Hotz
3c26ce29b2
make disk tensor tests process safe ( #14584 )
2026-02-06 15:39:55 +08:00
qazal
cf73d7e2a7
hotfix: disable slower asm gemm shape from llama seqlen 8192 ( #14582 )
2026-02-06 15:05:19 +09:00