Commit Graph

5037 Commits

Author SHA1 Message Date
George Hotz
c331798201 move tests to test/backend (#14691)
* move tests to test/backend

* fix imports

* fix CI

* revert that one

* Fix formatting in README for test command
2026-02-12 11:09:44 +08:00
chenyu
0c63f63ee4 recursive resolve assign dependency (#14688)
remove the .realize in llm.py
2026-02-11 17:41:05 -05:00
chenyu
cbbc2fdea5 update test_assign_slice_then_read (#14687)
passes locally now
2026-02-11 15:02:44 -05:00
chenyu
7465b22ba0 handle setitem target in rangeify (#14685) 2026-02-11 11:38:59 -05:00
chenyu
0d215b962e few setitem test cases diff from numpy (#14684)
had claude fuzz the frontend and found some real bugs
2026-02-11 08:41:03 -05:00
nimlgen
df8b21eeb5 add real self assign test (#14683)
* self assign fix

* no
2026-02-11 12:41:53 +03:00
George Hotz
4565958792 some lil speedups (#14679) 2026-02-11 10:01:58 +08:00
George Hotz
2d4ad9e739 add a waitlist for graph rewrite (#14678)
* add a waitlist for graph rewrite

* cleaner

* one context on spec check
2026-02-11 09:30:13 +08:00
Christopher Milan
389e2eeda1 Revert "transcendental works with long decomp" (#14676) 2026-02-10 19:46:34 -05:00
Christopher Milan
0662c8037d transcendental works with long decomp (#14672) 2026-02-10 19:30:24 -05:00
chenyu
ebef63dba0 update test_self_assign_same_device_copy (#14673)
that test would have passed without the optimization because of the .to shortcut
2026-02-10 17:23:43 -05:00
nimlgen
aafa9dcb5b eliminate same-device copy self-assigns (#14671)
* eliminate same-device copy self-assigns

* ugh
2026-02-10 22:54:51 +03:00
chenyu
494eec2694 test_setitem_const_fused (#14668)
did not realize #14640 also fixed #10690, so added a test for it
2026-02-10 08:33:02 -05:00
George Hotz
8dc46dde07 everything has dtype.long now (#14661)
* everything has dtype.long now

* int64/uint64 are everywhere now

* that doesn't work
2026-02-10 15:08:50 +08:00
George Hotz
cc9bf8ccbc move more to null/unit tests (#14658)
* move more to null tests

* move test_gc

* no test fusion op
2026-02-10 13:35:17 +08:00
chenyu
83f6d28579 two less realize in setitem (#14655) 2026-02-09 23:45:24 -05:00
chenyu
0dedf4063c minor test_setitem cleanup (#14654) 2026-02-09 20:40:29 -05:00
Christopher Milan
e6562a5061 remove CompilerPair (#14638) 2026-02-09 19:51:18 -05:00
chenyu
9e3f24db9f assign realize fix (#14649)
remove the need for an explicit assign. track pending assigns for each buffer, and run those in order before the main realize
2026-02-09 17:46:46 -05:00
chenyu
e9f40f49d4 explicitly check advanced setitem (#14644)
advanced setitem on DISK would fail in rangeify with a bad error; now it's checked directly in setitem. eventually DISK can use the regular setitem path
2026-02-09 13:36:46 -05:00
chenyu
20a132b1c4 relax atol for test_uop_scan_matmul (#14646)
flaky, also log max diff
2026-02-09 13:25:19 -05:00
chenyu
8a2c23d3dc raise RuntimeError for setitem dtype mismatch (#14642) 2026-02-09 10:37:08 -05:00
qazal
80b0119cef llama: add new asm gemm shape (#14611)
* llama: add new asm gemm shape

* work

* cleanup

* half dtype

* more comment
2026-02-10 00:34:29 +09:00
Filip Brzek
1667669c46 fix: python3 -m tinygrad.device reporting on AMD/CPU (#14622)
* test: device module expects PASS in -m tinygrad.device for CPU

* fix: use device._compiler_name instead of unwrap_class_type(compiler).__name__ in enumerate_devices_str
2026-02-08 20:22:35 +03:00
qazal
087dab4c3b gemm/asm: split out cdna tests from CI (#14619)
* gemm/asm: split out cdna tests from CI

* reorder

* work
2026-02-08 21:33:42 +09:00
George Hotz
183d38b128 remove CUSTOM_KERNEL / directly construct it (#14604)
* remove CUSTOM_KERNEL / directly construct it

* clean that up

* simpler multi

* custom kernel spec

* remove Kernel

* fix multi

* use sharded shape

* explicit regression test
2026-02-08 18:43:33 +08:00
nimlgen
6838b35cff mockgpu: hevc (#14606)
* mockgpu: hevc

* eng
2026-02-07 12:27:55 +03:00
chenyu
884592f6c8 pin z3-solver version (#14605)
found exact input that crashes z3 4.15.4
2026-02-06 22:49:31 -05:00
George Hotz
7a2a3b5c71 Remove Ops.KERNEL, it's all Ops.CALL now (#14603) 2026-02-07 10:21:54 +08:00
George Hotz
ca6604eae2 kernel is call (#14577)
* call is kernel

* closer

* fix bugs

* dedup

* pm_gate_kernel_sink

* better

* Revert "better"

This reverts commit b4c799b810.

* Reapply "better"

This reverts commit e53f094ce7.

* cleanups

* work

* remove junk

* subtle fix

* index

* viz cleanups

* disable assert for now
2026-02-07 10:10:14 +08:00
Christopher Milan
ad9e2f0de7 decompose bf16 (#14601) 2026-02-06 19:24:09 -05:00
Christopher Milan
7bb45e7df0 decompose fp8 to bigger floats [skip_process_replay] (#14554)
* decompose fp8 also

* it works

* cleanup

* no shift required

* default to float

* cleanup

* fixes

* fp8e5m2

* don't rely on behavior comparing nans

* cleanup
2026-02-06 19:05:40 -05:00
chenyu
7d193a6e26 fix wgsl bitcast (#14600)
was wrong for signed int
2026-02-06 16:57:36 -05:00
chenyu
b9fe8b7591 fix opt in process replay [pr] (#14599) 2026-02-06 16:49:56 -05:00
chenyu
197ebcbbbc log seed with flush=True in fuzz_symbolic (#14597)
* log seed with flush=True in fuzz_symbolic

i think z3 can crash. added reading the seed from argv to see if we can repro later

* fuzz_symbolic_symbolic_div
2026-02-06 15:03:57 -05:00
qazal
a80fb4e641 viz: better ordering of device engines in profiler (#14590) 2026-02-06 23:08:09 +09:00
nimlgen
fbeb978170 diff devices for sdma (#14589)
* start

* x

* fix

* sdma

* c

* clean

* x

* hm

* clearer
2026-02-06 16:39:12 +03:00
George Hotz
03af2404e2 small changes and test fixes from kernel is call (#14586) 2026-02-06 17:08:33 +08:00
George Hotz
3c26ce29b2 make disk tensor tests process safe (#14584) 2026-02-06 15:39:55 +08:00
qazal
cf73d7e2a7 hotfix: disable slower asm gemm shape from llama seqlen 8192 (#14582) 2026-02-06 15:05:19 +09:00
chenyu
15d3344d9e use int inputs in test_assign (#14580)
int is less flaky
2026-02-06 00:07:31 -05:00
chenyu
b09dc646f5 revert some late_buffer_view change (#14578)
revert #14478 which breaks tinyfs
2026-02-05 22:51:40 -05:00
George Hotz
6cbcf98627 KernelInfo is required on get_program (#14571)
* rangeify always adds KernelInfo

* fix tests

* skip flaky test
2026-02-06 10:49:27 +08:00
chenyu
79b7799dba clean up linearize schedule [pr] (#14565)
* clean up linearize schedule [pr]

don't mix ScheduleItem and UOp in schedule queue

* ok
2026-02-05 15:24:09 -05:00
chenyu
41a179f542 fix test_xlm_roberta_large (#14564)
onnxruntime does not allow symlinks that point outside the model dir. update snapshot_download to use local_dir instead of cache_dir. some ad hoc migration steps to copy the existing model too
2026-02-05 14:56:06 -05:00
Christopher Milan
aa9dc50577 dtype decomps don't require bitshifts (#14542)
* dtype decomps don't require bitshifts

* simplify shr/shl

* ruff
2026-02-05 14:42:30 -05:00
chenyu
2b47a9a1b5 skip test_xlm_roberta_large (#14563)
symlinked model not allowed in latest onnxruntime
2026-02-05 14:00:24 -05:00
qazal
190042358f llama: faster bf16 matmul / rope backward (#14558) 2026-02-05 23:57:25 +09:00
George Hotz
43e7eda4e7 grad_b uses custom gemm (#14550)
* grad_b uses custom gemm

* fix multi backward, acc is in float32

* test_gemm_batched

* square gemm

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
Co-authored-by: qazal <qazal.software@gmail.com>
2026-02-05 15:22:27 +09:00
qazal
f9cfb64cd9 test asm_gemm in CI (#14551)
* test asm_gemm in CI

* default float16

* use a smaller shape for multi

* smaller size

* smaller for CI

* smaller for ci

* need half
2026-02-05 13:32:22 +09:00