George Hotz
c331798201
move tests to test/backend ( #14691 )
...
* move tests to test/backend
* fix imports
* fix CI
* revert that one
* Fix formatting in README for test command
2026-02-12 11:09:44 +08:00
chenyu
0c63f63ee4
recursive resolve assign dependency ( #14688 )
...
remove the .realize in llm.py
2026-02-11 17:41:05 -05:00
chenyu
cbbc2fdea5
update test_assign_slice_then_read ( #14687 )
...
passes locally now
2026-02-11 15:02:44 -05:00
chenyu
7465b22ba0
handle setitem target in rangeify ( #14685 )
2026-02-11 11:38:59 -05:00
chenyu
0d215b962e
few setitem test cases diff from numpy ( #14684 )
...
had claude fuzz the frontend and it found some real bugs
2026-02-11 08:41:03 -05:00
nimlgen
df8b21eeb5
add real self assign test ( #14683 )
...
* self assign fix
* no
2026-02-11 12:41:53 +03:00
George Hotz
4565958792
some lil speedups ( #14679 )
2026-02-11 10:01:58 +08:00
George Hotz
2d4ad9e739
add a waitlist for graph rewrite ( #14678 )
...
* add a waitlist for graph rewrite
* cleaner
* one context on spec check
2026-02-11 09:30:13 +08:00
Christopher Milan
389e2eeda1
Revert "transcendental works with long decomp" ( #14676 )
2026-02-10 19:46:34 -05:00
Christopher Milan
0662c8037d
transcendental works with long decomp ( #14672 )
2026-02-10 19:30:24 -05:00
chenyu
ebef63dba0
update test_self_assign_same_device_copy ( #14673 )
...
that test would have passed without the optimization because of the .to shortcut
2026-02-10 17:23:43 -05:00
nimlgen
aafa9dcb5b
eliminate same-device copy self-assigns ( #14671 )
...
* eliminate same-device copy self-assigns
* ugh
2026-02-10 22:54:51 +03:00
chenyu
494eec2694
test_setitem_const_fused ( #14668 )
...
did not realize #14640 also fixed #10690, so added a test for it
2026-02-10 08:33:02 -05:00
George Hotz
8dc46dde07
everything has dtype.long now ( #14661 )
...
* everything has dtype.long now
* int64/uint64 are everywhere now
* that doesn't work
2026-02-10 15:08:50 +08:00
George Hotz
cc9bf8ccbc
move more to null/unit tests ( #14658 )
...
* move more to null tests
* move test_gc
* no test fusion op
2026-02-10 13:35:17 +08:00
chenyu
83f6d28579
two less realize in setitem ( #14655 )
2026-02-09 23:45:24 -05:00
chenyu
0dedf4063c
minor test_setitem cleanup ( #14654 )
2026-02-09 20:40:29 -05:00
Christopher Milan
e6562a5061
remove CompilerPair ( #14638 )
2026-02-09 19:51:18 -05:00
chenyu
9e3f24db9f
assign realize fix ( #14649 )
...
fix the need for explicit assign. track pending assigns for each buffer, and run those before the main realize in order
2026-02-09 17:46:46 -05:00
chenyu
e9f40f49d4
explicitly check advanced setitem ( #14644 )
...
advanced setitem on DISK would fail in rangeify with a bad error; now it's checked directly in setitem. eventually DISK can use the regular setitem path
2026-02-09 13:36:46 -05:00
chenyu
20a132b1c4
relax atol for test_uop_scan_matmul ( #14646 )
...
flaky, also log max diff
2026-02-09 13:25:19 -05:00
chenyu
8a2c23d3dc
raise RuntimeError for setitem dtype mismatch ( #14642 )
2026-02-09 10:37:08 -05:00
qazal
80b0119cef
llama: add new asm gemm shape ( #14611 )
...
* llama: add new asm gemm shape
* work
* cleanup
* half dtype
* more comment
2026-02-10 00:34:29 +09:00
Filip Brzek
1667669c46
fix: python3 -m tinygrad.device reporting on AMD/CPU ( #14622 )
...
* test: device module expects PASS in -m tinygrad.device for CPU
* fix: use device._compiler_name instead of unwrap_class_type(compiler).__name__ in enumerate_devices_str
2026-02-08 20:22:35 +03:00
qazal
087dab4c3b
gemm/asm: split out cdna tests from CI ( #14619 )
...
* gemm/asm: split out cdna tests from CI
* reorder
* work
2026-02-08 21:33:42 +09:00
George Hotz
183d38b128
remove CUSTOM_KERNEL / directly construct it ( #14604 )
...
* remove CUSTOM_KERNEL / directly construct it
* clean that up
* simpler multi
* custom kernel spec
* remove Kernel
* fix multi
* use sharded shape
* explicit regression test
2026-02-08 18:43:33 +08:00
nimlgen
6838b35cff
mockgpu: hevc ( #14606 )
...
* mockgpu: hevc
* eng
2026-02-07 12:27:55 +03:00
chenyu
884592f6c8
pin z3-solver version ( #14605 )
...
found exact input that crashes z3 4.15.4
2026-02-06 22:49:31 -05:00
George Hotz
7a2a3b5c71
Remove Ops.KERNEL, it's all Ops.CALL now ( #14603 )
2026-02-07 10:21:54 +08:00
George Hotz
ca6604eae2
kernel is call ( #14577 )
...
* call is kernel
* closer
* fix bugs
* dedup
* pm_gate_kernel_sink
* better
* Revert "better"
This reverts commit b4c799b810.
* Reapply "better"
This reverts commit e53f094ce7.
* cleanups
* work
* remove junk
* subtle fix
* index
* viz cleanups
* disable assert for now
2026-02-07 10:10:14 +08:00
Christopher Milan
ad9e2f0de7
decompose bf16 ( #14601 )
2026-02-06 19:24:09 -05:00
Christopher Milan
7bb45e7df0
decompose fp8 to bigger floats [skip_process_replay] ( #14554 )
...
* decompose fp8 also
* it works
* cleanup
* no shift required
* default to float
* cleanup
* fixes
* fp8e5m2
* don't rely on behavior comparing nans
* cleanup
2026-02-06 19:05:40 -05:00
chenyu
7d193a6e26
fix wgsl bitcast ( #14600 )
...
was wrong for signed int
2026-02-06 16:57:36 -05:00
chenyu
b9fe8b7591
fix opt in process replay [pr] ( #14599 )
2026-02-06 16:49:56 -05:00
chenyu
197ebcbbbc
log seed with flush=True in fuzz_symbolic ( #14597 )
...
* log seed with flush=True in fuzz_symbolic
i think z3 can crash. added reading seed from argv to see if we repro later
* fuzz_symbolic_symbolic_div
2026-02-06 15:03:57 -05:00
qazal
a80fb4e641
viz: better ordering of device engines in profiler ( #14590 )
2026-02-06 23:08:09 +09:00
nimlgen
fbeb978170
diff devices for sdma ( #14589 )
...
* start
* x
* fix
* sdma
* c
* clean
* x
* hm
* cleaner
2026-02-06 16:39:12 +03:00
George Hotz
03af2404e2
small changes and test fixes from kernel is call ( #14586 )
2026-02-06 17:08:33 +08:00
George Hotz
3c26ce29b2
make disk tensor tests process safe ( #14584 )
2026-02-06 15:39:55 +08:00
qazal
cf73d7e2a7
hotfix: disable slower asm gemm shape from llama seqlen 8192 ( #14582 )
2026-02-06 15:05:19 +09:00
chenyu
15d3344d9e
use int inputs in test_assign ( #14580 )
...
int is less flaky
2026-02-06 00:07:31 -05:00
chenyu
b09dc646f5
revert some late_buffer_view change ( #14578 )
...
revert #14478 which breaks tinyfs
2026-02-05 22:51:40 -05:00
George Hotz
6cbcf98627
KernelInfo is required on get_program ( #14571 )
...
* rangeify always adds KernelInfo
* fix tests
* skip flaky test
2026-02-06 10:49:27 +08:00
chenyu
79b7799dba
clean up linearize schedule [pr] ( #14565 )
...
* clean up linearize schedule [pr]
don't mix ScheduleItem and UOp in schedule queue
* ok
2026-02-05 15:24:09 -05:00
chenyu
41a179f542
fix test_xlm_roberta_large ( #14564 )
...
onnxruntime does not allow a symlink that points outside the model dir. update snapshot_download to use local_dir instead of cache_dir. also add an ad hoc migration step to copy the existing model
2026-02-05 14:56:06 -05:00
Christopher Milan
aa9dc50577
dtype decomps don't require bitshifts ( #14542 )
...
* dtype decomps don't require bitshifts
* simplify shr/shl
* ruff
2026-02-05 14:42:30 -05:00
chenyu
2b47a9a1b5
skip test_xlm_roberta_large ( #14563 )
...
symlink model not allowed in latest onnxruntime
2026-02-05 14:00:24 -05:00
qazal
190042358f
llama: faster bf16 matmul / rope backward ( #14558 )
2026-02-05 23:57:25 +09:00
George Hotz
43e7eda4e7
grad_b uses custom gemm ( #14550 )
...
* grad_b uses custom gemm
* fix multi backward, acc is in float32
* test_gemm_batched
* square gemm
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
Co-authored-by: qazal <qazal.software@gmail.com>
2026-02-05 15:22:27 +09:00
qazal
f9cfb64cd9
test asm_gemm in CI ( #14551 )
...
* test asm_gemm in CI
* default float16
* use a smaller shape for multi
* smaller size
* smaller for CI
* smaller for ci
* need half
2026-02-05 13:32:22 +09:00