Commit Graph

12163 Commits

Author SHA1 Message Date
chenyu
86352988d8 update test_uops_stats for setitem (#14710)
realizing both the full tensor and the slice should not add to global_mem
2026-02-12 12:26:13 -05:00
chenyu
56caf6a3a2 fix Estimate.from_uops for sliced access (#14695)
"assume all DEFINE_GLOBAL memory is accessed" is wrong for partial load. get accessed accumulated from INDEX, then cap at full size. now mem_est never exceeds lds_est
2026-02-12 11:18:07 -05:00
chenyu
8551fa50d3 support bitcast in sym_infer (#14708)
fixed `DEBUG=2 DEV=WEBGPU python -m pytest test/backend/test_tensor_variable.py::TestTensorVariable::test_symbolic_pad`
2026-02-12 10:21:05 -05:00
chenyu
212789e31e fix long_decomp with None tag (#14707)
fixed `DEBUG=2 WEBGPU=1 python -m pytest test/null/test_tensor.py::TestIdxUpcast::test_int64_unsupported_overflow_sym`
2026-02-12 09:31:52 -05:00
chenyu
557134e1c7 model/test fix that failed with WEBGPU=1 DEBUG=2 (#14706) 2026-02-12 09:08:16 -05:00
nimlgen
10c94d2c2d amd: print more info about device hang (#14705) 2026-02-12 15:34:08 +03:00
nimlgen
b376bd7a21 jit: fix raw in same kernel (#14699)
* jit: fix raw in same kernel

* fix

* ugh

* x

* simpler
2026-02-12 15:33:32 +03:00
George Hotz
19e68a1833 skip AMD on not AMD (#14703) 2026-02-12 18:56:54 +08:00
George Hotz
b7dade2adf hotfix: skip test/amd in macpytest 2026-02-12 18:16:04 +08:00
George Hotz
4680247e35 renderer/amd: move in tree (#14702)
* renderer/amd: move in tree

* fix paths in tests

* 24000 lines

* no delete for amd files
2026-02-12 18:09:16 +08:00
George Hotz
d5fc3ea1ba assembly/amd: mypy+ruff passes (#14701)
* assembly/amd: mypy+ruff passes

* touchups
2026-02-12 16:59:42 +08:00
George Hotz
095a064ba8 test.yml explicitly says backend (#14700)
* test.yml explicitly says backend

* 1e-5
2026-02-12 16:03:44 +08:00
nimlgen
14a1991da6 viz: sort tracks in timeline (#14591)
* viz: sort devices in timeline

* fix

* rev

* upd

* skip
2026-02-12 10:51:41 +03:00
George Hotz
025049c521 clean up sqtt / update src formatting in viz (#14696)
* update src formatting in viz

* rename to RDNA3/RDNA4 in sqtt

* wrap

* move sqttmap

* update readme

* why did that change?

* cdna

* that's just for test
2026-02-12 14:27:14 +08:00
Christopher Milan
b1a3876492 IMAGE=1 supports FLOAT16=1 (#14693)
requires 2d indexing to be actually fast
2026-02-12 00:30:55 -05:00
George Hotz
befc1e800c assembly/amd: disasm is test only (#14694)
* assembly/amd: disasm is test only

* viz uses str
2026-02-12 12:33:46 +08:00
George Hotz
c331798201 move tests to test/backend (#14691)
* move tests to test/backend

* fix imports

* fix CI

* revert that one

* Fix formatting in README for test command
2026-02-12 11:09:44 +08:00
wozeparrot
4b5d3bda1f llama3: data seed (#14681) 2026-02-11 19:04:40 -08:00
chenyu
0c63f63ee4 recursive resolve assign dependency (#14688)
remove the .realize in llm.py
2026-02-11 17:41:05 -05:00
nimlgen
869083e373 nv: pciiface pma (#14686)
* x

* w

* z

* clean

* o

* r

* x

* c

* r

* list

* deanon

* b
2026-02-11 23:29:07 +03:00
chenyu
cbbc2fdea5 update test_assign_slice_then_read (#14687)
passes locally now
2026-02-11 15:02:44 -05:00
chenyu
7465b22ba0 handle setitem target in rangeify (#14685) 2026-02-11 11:38:59 -05:00
chenyu
0d215b962e few setitem test cases diff from numpy (#14684)
had claude fuzz the frontend and it found some real bugs
2026-02-11 08:41:03 -05:00
nimlgen
df8b21eeb5 add real self assign test (#14683)
* self assign fix

* no
2026-02-11 12:41:53 +03:00
wozeparrot
a60220bed9 llama3: move dl to numpy & jit more (#14677)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-02-10 18:16:40 -08:00
George Hotz
4565958792 some lil speedups (#14679) 2026-02-11 10:01:58 +08:00
George Hotz
2d4ad9e739 add a waitlist for graph rewrite (#14678)
* add a waitlist for graph rewrite

* cleaner

* one context on spec check
2026-02-11 09:30:13 +08:00
Christopher Milan
389e2eeda1 Revert "transcendental works with long decomp" (#14676) 2026-02-10 19:46:34 -05:00
Christopher Milan
0662c8037d transcendental works with long decomp (#14672) 2026-02-10 19:30:24 -05:00
George Hotz
3fab43c57c add cache to asm gemm (#14675) 2026-02-11 08:26:30 +08:00
chenyu
ebef63dba0 update test_self_assign_same_device_copy (#14673)
that test would have passed without the optimization because of the .to shortcut
2026-02-10 17:23:43 -05:00
nimlgen
aafa9dcb5b eliminate same-device copy self-assigns (#14671)
* eliminate same-device copy self-assigns

* ugh
2026-02-10 22:54:51 +03:00
chenyu
494eec2694 test_setitem_const_fused (#14668)
did not realize #14640 also fixed #10690, so added a test for it
2026-02-10 08:33:02 -05:00
nimlgen
42ded7c34d amd: bind aql (#14666)
* amd: bind to aql

* bind

* x

* f
2026-02-10 16:28:11 +03:00
George Hotz
82974929b7 use PARAM in schedule (#14665)
* use PARAM in schedule

* create_new_buffer
2026-02-10 19:18:40 +08:00
George Hotz
8dc46dde07 everything has dtype.long now (#14661)
* everything has dtype.long now

* int64/uint64 are everywhere now

* that doesn't work
2026-02-10 15:08:50 +08:00
Christopher Milan
cdb78954cb better cl compiler name (#14660)
cl_compiler instead of compiler because overriding Compiled.compiler seems more confusing
2026-02-10 01:03:46 -05:00
George Hotz
cc9bf8ccbc move more to null/unit tests (#14658)
* move more to null tests

* move test_gc

* no test fusion op
2026-02-10 13:35:17 +08:00
chenyu
83f6d28579 two less realize in setitem (#14655) 2026-02-09 23:45:24 -05:00
wozeparrot
69574542ab fix: use correct fa implementation in eval (#14651) 2026-02-09 18:20:44 -08:00
chenyu
0dedf4063c minor test_setitem cleanup (#14654) 2026-02-09 20:40:29 -05:00
Christopher Milan
b36b62eb59 don't push docker cache for PRs (#14652) 2026-02-09 19:55:55 -05:00
Christopher Milan
e6562a5061 remove CompilerPair (#14638) 2026-02-09 19:51:18 -05:00
Christopher Milan
396e1320fb bump cache version for z3 (#14650) 2026-02-09 19:32:07 -05:00
chenyu
9e3f24db9f assign realize fix (#14649)
fix the need for explicit assign. track pending assigns for each buffer, and run those before the main realize in order
2026-02-09 17:46:46 -05:00
chenyu
0913c068ea clean up setitem disk path (#14648) 2026-02-09 15:58:04 -05:00
chenyu
205a1212b7 delegate non Tensor src setitem to assign (#14647)
cannot do this for DISK in the unified path
2026-02-09 13:53:20 -05:00
chenyu
e9f40f49d4 explicitly check advanced setitem (#14644)
advanced setitem on DISK would fail in rangeify with a bad error, now it's checked directly in setitem. eventually DISK can use the regular setitem path
2026-02-09 13:36:46 -05:00
chenyu
20a132b1c4 relax atol for test_uop_scan_matmul (#14646)
flaky, also log max diff
2026-02-09 13:25:19 -05:00
qazal
50d3f6cea5 EVAL_BS=0 in llama profile (#14643) 2026-02-10 00:49:43 +09:00