Commit Graph

11974 Commits

qazal · 616e9c1483 · 2026-01-31 22:34:14 +09:00
CDNA assembly gemm in tensor.py with flag (#14310)
* work
* work
* the assembly
* remove the old one
* remove ws bufs, assert splitk
* notes cleanup
* work
* gemm args
* gemm in mixins would be nice
* add gemm gradient
* print counters
* the realize is for DEBUG=2 aesthetics
* dedup
* rewrite to python dsl, no list copies
* leave that
* add B, M, N, K to gemm name
* it's M0 not NULL
* fp16 support
* test cleanup + more gemms
* work from viz
* more work
* gemm batch_size
* xccg path work
* tiny comments on the label naming
* s_waitcnt

chenyu · 55f806b713 · 2026-01-31 07:28:26 -05:00
tighter late_buffer_view match [pr] (#14456)
src must be len 2 at that point

qazal · d69bc5aa1a · 2026-01-31 20:45:24 +09:00
make DEV=NULL EMULATE=AMD amd_asm_matmul run (#14460)

qazal · 4976544bf9 · 2026-01-31 14:14:53 +09:00
multi ram usage tests on the NULL device (#14457)

chenyu · 99b44121bc · 2026-01-30 23:44:04 -05:00
failed test case for non-consecutive disk read (#14455)
silently fails now

George Hotz · b705c9143c · 2026-01-31 12:40:22 +08:00
assembly/amd: test more instructions (#14365)
* assembly/amd: test more instructions
* more
* passing
* revert
* no const fold
* remove junk
* cleaner

George Hotz · c9a3ddb341 · 2026-01-31 10:21:54 +08:00
benchmark llama walltime script (#14454)
* benchmark llama walltime script
* adj layers

George Hotz · f5346d6a1a · 2026-01-31 09:44:16 +08:00
fix USE_ATOMICS for non-float dtypes and make it the default (#14444)
* embedded multistep test
* complex test
* with jit
* fix dtypes and reenable USE_ATOMICS
* that test didn't catch anything

Christopher Milan · e575dd8275 · 2026-01-30 19:38:41 -05:00
prevent UB in long decomp and more emulated tests (#14447)

chenyu · 3204f94454 · 2026-01-30 17:10:07 -05:00
correct var_vals schedule filter (#14451)
complete_create_schedule_with_vars returns the var_vals that are used in the schedule

chenyu · cfcd1debb5 · 2026-01-30 15:59:00 -05:00
test schedule with multiple AFTER (#14449)

nimlgen · 486d53d646 · 2026-01-30 23:53:17 +03:00
device: call free for external_ptr (#14448)
* device: call free for external_ptr
* lin

nimlgen · e0978498dc · 2026-01-30 23:11:57 +03:00
amd: read_ptr/write_ptr/doorbells are not lists (#14445)

Christopher Milan · 1803ee939d · 2026-01-30 13:54:43 -05:00
EMULATED_DTYPES=long works with CPU_LLVM (#14446)

chenyu · 03613e83ad · 2026-01-30 12:39:01 -05:00
update TestTensorMetadata (#14443)
run with SCACHE=0; some more TODOs

George Hotz · cbb1eed57b · 2026-01-30 17:19:27 +00:00
hotfix: partial revert of 9eb449f88, caused llama NaN

chenyu · 26f5c00265 · 2026-01-30 12:14:21 -05:00
move TestTensorMetadata to unit (#14442)

chenyu · c05a0b85ae · 2026-01-30 11:44:18 -05:00
flip unique const src order [pr] (#14441)
* flip unique const src order [pr]
  matches buffer, simplifies replace_input_buffer
* combine rules

George Hotz · ee2c78709d · 2026-01-31 00:42:08 +08:00
mlperf/llama: disable USE_ATOMICS for now

chenyu · beecac4d85 · 2026-01-30 11:26:05 -05:00
expand ranges -> unroll outer ranges [pr] (#14440)

chenyu · 9eb449f882 · 2026-01-30 10:18:28 -05:00
clean up toposort sched_sink [pr] (#14439)

George Hotz · 838cd078bc · 2026-01-30 18:10:59 +08:00
use atomics for embedding backward (#14400)
* embedding is slow
* failing
* float is fine
* null
* it fails
* simplify embedding with broadcasting
* ATOMIC_ADD incoming
* min change
* simpler test
* better test
* fix test
* real test
* simpler
* cleanups
* types and names
* _zero_kernel
* grad multi
* hack
* none
* multi unshard
* more for call
* don't tag in call
* good
* call_multi
* call_multi wow claude is useless
* embedding backward multi test
* test passes
* fix as_param
* shape_to_shape_arg
* add clip
* before cast
* fix spec=2, use atomics

nimlgen · 1998e0bb28 · 2026-01-30 12:51:43 +03:00
nv: add prof props to dev (#14437)

George Hotz · 7a9dee4e50 · 2026-01-30 14:51:45 +08:00
add call/param UOps (#14433)
* add call/param UOps
* resolve call
* skip that for now
* grad on call
* fix tests

qazal · 66d6a68016 · 2026-01-30 14:00:56 +09:00
viz: sqtt work from cdna gemm (#14434)
* it's the tag
* initialize rows based on the disasm
* test_cfg with Ops.BINARY
* pyremu wants s_code_end?
* test_diamond
* diff cleanup

Christopher Milan · 88caf57ef4 · 2026-01-29 21:42:03 -05:00
ci: unify python versions (#14430)

chenyu · 86a204d22a · 2026-01-29 21:26:58 -05:00
allow Tensor setitem input to be list/tuple (#14432)
matches assign, and generally matches numpy

chenyu · 4a80319093 · 2026-01-29 18:40:07 -05:00
clean up split_store final logic [pr] (#14429)
explicitly check the structure

Christopher Milan · e47f12f671 · 2026-01-29 18:02:43 -05:00
ci: replace testing_minimal with testing_unit (#14427)

wozeparrot · c2fb8b208f · 2026-01-29 13:59:13 -08:00
fa: 32 block size (#14416)

chenyu · a979fafae5 · 2026-01-29 16:18:44 -05:00
cleanup around disk buffer [pr] (#14428)
style change, prep for refactor

nimlgen · dc977a03b0 · 2026-01-30 00:12:39 +03:00
nv_pma: bw decoder (#14424)
* nv_pma: bw decoder
* decoder fix
* better

chenyu · ddc041854b · 2026-01-29 14:54:19 -05:00
failed test case for disk setitem (#14426)
strided setitem is wrong

chenyu · 31706bf6bc · 2026-01-29 14:04:09 -05:00
add few more types [pr] (#14425)

nimlgen · 2d5c24879f · 2026-01-29 20:06:01 +03:00
nv: pma for 5090 (#14420)
* nv: pma for 5090
* hm
* 4090

nimlgen · c8dc6332d2 · 2026-01-29 20:00:00 +03:00
memory: read_fields is not universal (#14348)

chenyu · dbe8f034a7 · 2026-01-29 11:11:47 -05:00
pass z3.Context in validate ctx [pr] (#14423)
does not need to pass the whole solver

chenyu · 033ce1b885 · 2026-01-29 10:56:50 -05:00
types for validate.py (#14422)

nimlgen · 230d08ec70 · 2026-01-29 17:11:24 +03:00
test for am recovery and faults handling (#14421)
* test for am recovery and faults handling
* linter

George Hotz · 793afbd473 · 2026-01-29 17:22:13 +08:00
simplify nn.Embedding, support AFTER in CUSTOM_KERNEL (#14419)

Christopher Milan · 0c855d6149 · 2026-01-29 01:51:26 -05:00
ci: remove unused pydeps (#14418)

wozeparrot · 4845e42135 · 2026-01-28 19:12:39 -08:00
llama3 gradacc fixes (#14414)

chenyu · 37cde4a01a · 2026-01-28 20:39:32 -05:00
add one line mypy report (#14415)

chenyu · 15aed51544 · 2026-01-28 20:10:11 -05:00
return types for all math.py functions (#14413)
calling int() on sint -> int; I think it's better supported since some UOp can be safely cast to int

nimlgen · aec1ae0de1 · 2026-01-28 14:40:00 -08:00
llama: set manual_seed (#14409)

chenyu · 0870ed28b1 · 2026-01-28 16:59:38 -05:00
add Self type to MathMixin (#14411)
these don't cause errors

chenyu · 079f33c208 · 2026-01-28 15:24:02 -05:00
fix type in Tensor.mean and Tensor.var (#14410)
use Tensor.from_uop to wrap UOp from symbolic shape; kernels are the same

chenyu · 2b5e99ccc1 · 2026-01-28 14:11:50 -05:00
minor type cleanups [pr] (#14408)
mypy --warn-redundant-casts has false negatives

chenyu · 726415dbc8 · 2026-01-28 12:47:26 -05:00
import sint directly in movement.py TYPE_CHECKING (#14406)
avoids creating a string TypeAlias; fixes warning in `TYPED=1 python test/test_tiny.py`

nimlgen · acb2fc36ba · 2026-01-28 20:44:02 +03:00
nv_pma: add decoder (#14404)
* nv_pma: add decoder
* cl