chenyu
86352988d8
update test_uops_stats for setitem ( #14710 )
...
realizing both the full tensor and the slice should not add to global_mem
2026-02-12 12:26:13 -05:00
chenyu
56caf6a3a2
fix Estimate.from_uops for sliced access ( #14695 )
...
"assume all DEFINE_GLOBAL memory is accessed" is wrong for partial loads. accumulate the accessed amount from INDEX, then cap it at the full buffer size. now mem_est never exceeds lds_est
2026-02-12 11:18:07 -05:00
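A minimal sketch of the capping idea described in #14695 above, not the actual `Estimate.from_uops` code. The function name `estimate_mem`, the dict of buffer sizes, and the list of `(buffer, bytes)` pairs are all hypothetical stand-ins for what the real implementation derives from DEFINE_GLOBAL and INDEX uops.

```python
def estimate_mem(buffer_sizes, index_accesses):
    """Hypothetical helper: estimate bytes of global memory touched.

    buffer_sizes:   dict mapping buffer name -> full size in bytes
    index_accesses: list of (buffer name, bytes accessed) pairs,
                    one per indexed access (assumed input shape)
    """
    # accumulate the bytes accessed per buffer
    accessed = {}
    for name, nbytes in index_accesses:
        accessed[name] = accessed.get(name, 0) + nbytes
    # cap each buffer's total at its full size, then sum
    return sum(min(total, buffer_sizes[name]) for name, total in accessed.items())


sizes = {"a": 1024, "b": 256}
accesses = [("a", 128), ("a", 128), ("b", 512)]
mem_est = estimate_mem(sizes, accesses)
lds_est = sum(nbytes for _, nbytes in accesses)  # uncapped load/store total
```

With these inputs, buffer `a` is only partially read, so it contributes its accessed bytes rather than its full size, while buffer `b` is read more than once and is capped at its size. Because each per-buffer term is at most the uncapped sum for that buffer, `mem_est` cannot exceed `lds_est` in this sketch.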
chenyu
8551fa50d3
support bitcast in sym_infer ( #14708 )
...
fixed `DEBUG=2 DEV=WEBGPU python -m pytest test/backend/test_tensor_variable.py::TestTensorVariable::test_symbolic_pad`
2026-02-12 10:21:05 -05:00
chenyu
212789e31e
fix long_decomp with None tag ( #14707 )
...
fixed `DEBUG=2 WEBGPU=1 python -m pytest test/null/test_tensor.py::TestIdxUpcast::test_int64_unsupported_overflow_sym`
2026-02-12 09:31:52 -05:00
chenyu
557134e1c7
model/test fix that failed with WEBGPU=1 DEBUG=2 ( #14706 )
2026-02-12 09:08:16 -05:00
nimlgen
10c94d2c2d
amd: print more info about device hang ( #14705 )
2026-02-12 15:34:08 +03:00
nimlgen
b376bd7a21
jit: fix raw in same kernel ( #14699 )
...
* jit: fix raw in same kernel
* fix
* ugh
* x
* simpler
2026-02-12 15:33:32 +03:00
George Hotz
19e68a1833
skip AMD on not AMD ( #14703 )
2026-02-12 18:56:54 +08:00
George Hotz
b7dade2adf
hotfix: skip test/amd in macpytest
2026-02-12 18:16:04 +08:00
George Hotz
4680247e35
renderer/amd: move in tree ( #14702 )
...
* renderer/amd: move in tree
* fix paths in tests
* 24000 lines
* no delete for amd files
2026-02-12 18:09:16 +08:00
George Hotz
d5fc3ea1ba
assembly/amd: mypy+ruff passes ( #14701 )
...
* assembly/amd: mypy+ruff passes
* touchups
2026-02-12 16:59:42 +08:00
George Hotz
095a064ba8
test.yml explicitly says backend ( #14700 )
...
* test.yml explicitly says backend
* 1e-5
2026-02-12 16:03:44 +08:00
nimlgen
14a1991da6
viz: sort tracks in timeline ( #14591 )
...
* viz: sort devices in timeline
* fix
* rev
* upd
* skip
2026-02-12 10:51:41 +03:00
George Hotz
025049c521
clean up sqtt / update src formatting in viz ( #14696 )
...
* update src formatting in viz
* rename to RDNA3/RDNA4 in sqtt
* wrap
* move sqttmap
* update readme
* why did that change?
* cdna
* that's just for test
2026-02-12 14:27:14 +08:00
Christopher Milan
b1a3876492
IMAGE=1 supports FLOAT16=1 ( #14693 )
...
requires 2d indexing to be actually fast
2026-02-12 00:30:55 -05:00
George Hotz
befc1e800c
assembly/amd: disasm is test only ( #14694 )
...
* assembly/amd: disasm is test only
* viz uses str
2026-02-12 12:33:46 +08:00
George Hotz
c331798201
move tests to test/backend ( #14691 )
...
* move tests to test/backend
* fix imports
* fix CI
* revert that one
* Fix formatting in README for test command
2026-02-12 11:09:44 +08:00
wozeparrot
4b5d3bda1f
llama3: data seed ( #14681 )
2026-02-11 19:04:40 -08:00
chenyu
0c63f63ee4
recursive resolve assign dependency ( #14688 )
...
remove the .realize in llm.py
2026-02-11 17:41:05 -05:00
nimlgen
869083e373
nv: pciiface pma ( #14686 )
...
* x
* w
* z
* clean
* o
* r
* x
* c
* r
* list
* deanon
* b
2026-02-11 23:29:07 +03:00
chenyu
cbbc2fdea5
update test_assign_slice_then_read ( #14687 )
...
passes locally now
2026-02-11 15:02:44 -05:00
chenyu
7465b22ba0
handle setitem target in rangeify ( #14685 )
2026-02-11 11:38:59 -05:00
chenyu
0d215b962e
few setitem test cases diff from numpy ( #14684 )
...
had Claude fuzz the frontend and found some real bugs
2026-02-11 08:41:03 -05:00
nimlgen
df8b21eeb5
add real self assign test ( #14683 )
...
* self assign fix
* no
2026-02-11 12:41:53 +03:00
wozeparrot
a60220bed9
llama3: move dl to numpy & jit more ( #14677 )
...
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-02-10 18:16:40 -08:00
George Hotz
4565958792
some lil speedups ( #14679 )
2026-02-11 10:01:58 +08:00
George Hotz
2d4ad9e739
add a waitlist for graph rewrite ( #14678 )
...
* add a waitlist for graph rewrite
* cleaner
* one context on spec check
2026-02-11 09:30:13 +08:00
Christopher Milan
389e2eeda1
Revert "transcendental works with long decomp" ( #14676 )
2026-02-10 19:46:34 -05:00
Christopher Milan
0662c8037d
transcendental works with long decomp ( #14672 )
2026-02-10 19:30:24 -05:00
George Hotz
3fab43c57c
add cache to asm gemm ( #14675 )
2026-02-11 08:26:30 +08:00
chenyu
ebef63dba0
update test_self_assign_same_device_copy ( #14673 )
...
that test would have passed without the optimization because of the .to shortcut
2026-02-10 17:23:43 -05:00
nimlgen
aafa9dcb5b
eliminate same-device copy self-assigns ( #14671 )
...
* eliminate same-device copy self-assigns
* ugh
2026-02-10 22:54:51 +03:00
chenyu
494eec2694
test_setitem_const_fused ( #14668 )
...
did not realize #14640 also fixed #10690 , so added a test for it
2026-02-10 08:33:02 -05:00
nimlgen
42ded7c34d
amd: bind aql ( #14666 )
...
* amd: bind to aql
* bind
* x
* f
2026-02-10 16:28:11 +03:00
George Hotz
82974929b7
use PARAM in schedule ( #14665 )
...
* use PARAM in schedule
* create_new_buffer
2026-02-10 19:18:40 +08:00
George Hotz
8dc46dde07
everything has dtype.long now ( #14661 )
...
* everything has dtype.long now
* int64/uint64 are everywhere now
* that doesn't work
2026-02-10 15:08:50 +08:00
Christopher Milan
cdb78954cb
better cl compiler name ( #14660 )
...
cl_compiler instead of compiler because overriding Compiled.compiler seems more confusing
2026-02-10 01:03:46 -05:00
George Hotz
cc9bf8ccbc
move more to null/unit tests ( #14658 )
...
* move more to null tests
* move test_gc
* no test fusion op
2026-02-10 13:35:17 +08:00
chenyu
83f6d28579
two less realize in setitem ( #14655 )
2026-02-09 23:45:24 -05:00
wozeparrot
69574542ab
fix: use correct fa implementation in eval ( #14651 )
2026-02-09 18:20:44 -08:00
chenyu
0dedf4063c
minor test_setitem cleanup ( #14654 )
2026-02-09 20:40:29 -05:00
Christopher Milan
b36b62eb59
don't push docker cache for PRs ( #14652 )
2026-02-09 19:55:55 -05:00
Christopher Milan
e6562a5061
remove CompilerPair ( #14638 )
2026-02-09 19:51:18 -05:00
Christopher Milan
396e1320fb
bump cache version for z3 ( #14650 )
2026-02-09 19:32:07 -05:00
chenyu
9e3f24db9f
assign realize fix ( #14649 )
...
fix the need for explicit assign. track pending assigns for each buffer, and run those before the main realize in order
2026-02-09 17:46:46 -05:00
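A rough sketch of the bookkeeping described in #14649 above: pending assigns are tracked per buffer and run, in order, before the main realize. This is an illustration under stated assumptions, not tinygrad's implementation. The class `PendingAssigns` and its methods `record` and `realize` are hypothetical names.

```python
from collections import defaultdict


class PendingAssigns:
    """Hypothetical tracker: queue assigns per buffer, flush them before a realize."""

    def __init__(self):
        # buffer id -> list of callables, kept in the order they were recorded
        self._pending = defaultdict(list)

    def record(self, buf_id, assign_fn):
        # remember an assign that has not been run yet
        self._pending[buf_id].append(assign_fn)

    def realize(self, buf_ids, main_fn):
        # run every pending assign for the buffers involved, oldest first,
        # then run the main realize
        for buf_id in buf_ids:
            for assign_fn in self._pending.pop(buf_id, []):
                assign_fn()
        return main_fn()


log = []
tracker = PendingAssigns()
tracker.record("buf0", lambda: log.append("assign 1"))
tracker.record("buf0", lambda: log.append("assign 2"))
tracker.record("buf1", lambda: log.append("unrelated assign"))
result = tracker.realize(["buf0"], lambda: log.append("main realize") or "done")
```

Only the assigns queued for `buf0` run here, and they run before the main realize; the assign queued for `buf1` stays pending until something that depends on `buf1` is realized.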
chenyu
0913c068ea
clean up setitem disk path ( #14648 )
2026-02-09 15:58:04 -05:00
chenyu
205a1212b7
delegate non Tensor src setitem to assign ( #14647 )
...
cannot do this for DISK in the unified path
2026-02-09 13:53:20 -05:00
chenyu
e9f40f49d4
explicitly check advanced setitem ( #14644 )
...
advanced setitem on DISK would fail in rangeify with a bad error; now it's checked directly in setitem. eventually DISK can use the regular setitem path
2026-02-09 13:36:46 -05:00
chenyu
20a132b1c4
relax atol for test_uop_scan_matmul ( #14646 )
...
flaky, also log max diff
2026-02-09 13:25:19 -05:00
qazal
50d3f6cea5
EVAL_BS=0 in llama profile ( #14643 )
2026-02-10 00:49:43 +09:00