Commit Graph

11963 Commits

Author SHA1 Message Date
nimlgen
486d53d646 device: call free for external_ptr (#14448)
* device: call free for external_ptr

* lin
2026-01-30 23:53:17 +03:00
nimlgen
e0978498dc amd: read_ptr/write_ptr/doorbells are not lists (#14445) 2026-01-30 23:11:57 +03:00
Christopher Milan
1803ee939d EMULATED_DTYPES=long works with CPU_LLVM (#14446) 2026-01-30 13:54:43 -05:00
chenyu
03613e83ad update TestTensorMetadata (#14443)
run with SCACHE=0; some more TODOs
2026-01-30 12:39:01 -05:00
George Hotz
cbb1eed57b hotfix: partial revert of 9eb449f88, caused llama NaN 2026-01-30 17:19:27 +00:00
chenyu
26f5c00265 move TestTensorMetadata to unit (#14442) 2026-01-30 12:14:21 -05:00
chenyu
c05a0b85ae flip unique const src order [pr] (#14441)
* flip unique const src order [pr]

matches buffer, simplifies replace_input_buffer

* combine rules
2026-01-30 11:44:18 -05:00
George Hotz
ee2c78709d mlperf/llama: disable USE_ATOMICS for now 2026-01-31 00:42:08 +08:00
chenyu
beecac4d85 expand ranges -> unroll outer ranges [pr] (#14440) 2026-01-30 11:26:05 -05:00
chenyu
9eb449f882 clean up toposort sched_sink [pr] (#14439) 2026-01-30 10:18:28 -05:00
George Hotz
838cd078bc use atomics for embedding backward (#14400)
* embedding is slow

* failing

* float is fine

* null

* it fails

* simplify embedding with broadcasting

* ATOMIC_ADD incoming

* min change

* simpler test

* better test

* fix test

* real test

* simpler

* cleanups

* types and names

* _zero_kernel

* grad multi

* hack

* none

* multi unshard

* more for call

* don't tag in call

* good

* call_multi

* call_multi wow claude is useless

* embedding backward multi test

* test passes

* fix as_param

* shape_to_shape_arg

* add clip

* before cast

* fix spec=2, use atomics
2026-01-30 18:10:59 +08:00
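The entry above moves embedding backward onto atomic adds. A pure-Python sketch of the scatter-add the backward pass computes (illustrative only, not tinygrad's kernel): positions that share a token id update the same gradient row, which is why a parallel kernel wants ATOMIC_ADD.

```python
# Sketch: embedding backward is a scatter-add into the weight gradient.
def embedding_backward(grad_out: list[list[float]], idxs: list[int],
                       vocab_size: int, dim: int) -> list[list[float]]:
  dW = [[0.0] * dim for _ in range(vocab_size)]
  for i, tok in enumerate(idxs):
    for j in range(dim):
      # repeated token ids collide on dW[tok]; this += is the contended
      # update that ATOMIC_ADD makes safe when done in parallel
      dW[tok][j] += grad_out[i][j]
  return dW
```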
nimlgen
1998e0bb28 nv: add prof props to dev (#14437) 2026-01-30 12:51:43 +03:00
George Hotz
7a9dee4e50 add call/param UOps (#14433)
* add call/param UOps

* resolve call

* skip that for now

* grad on call

* fix tests
2026-01-30 14:51:45 +08:00
qazal
66d6a68016 viz: sqtt work from cdna gemm (#14434)
* it's the tag

* initialize rows based on the disasm

* test_cfg with Ops.BINARY

* pyremu wants s_code_end?

* test_diamond

* diff cleanup
2026-01-30 14:00:56 +09:00
Christopher Milan
88caf57ef4 ci: unify python versions (#14430) 2026-01-29 21:42:03 -05:00
chenyu
86a204d22a allow Tensor setitem input to be list/tuple (#14432)
matches assign, and generally matches numpy
2026-01-29 21:26:58 -05:00
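A minimal sketch of the new behavior (assuming a contiguous, realized target, which tinygrad's setitem generally wants):

```python
from tinygrad import Tensor

t = Tensor.zeros(4).contiguous()
t[1:3] = [1.0, 2.0]     # list rhs, matching assign and numpy
t[0:1] = (5.0,)         # tuple rhs works the same way
print(t.tolist())       # [5.0, 1.0, 2.0, 0.0]
```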
chenyu
4a80319093 clean up split_store final logic [pr] (#14429)
explicitly check the structure
2026-01-29 18:40:07 -05:00
Christopher Milan
e47f12f671 ci: replace testing_minimal with testing_unit (#14427) 2026-01-29 18:02:43 -05:00
wozeparrot
c2fb8b208f fa: 32 block size (#14416) 2026-01-29 13:59:13 -08:00
chenyu
a979fafae5 cleanup around disk buffer [pr] (#14428)
style change, prep for refactor
2026-01-29 16:18:44 -05:00
nimlgen
dc977a03b0 nv_pma: bw decoder (#14424)
* nv_pma: bw decoder

* decoder fix

* better
2026-01-30 00:12:39 +03:00
chenyu
ddc041854b failed test case for disk setitem (#14426)
strided setitem is wrong
2026-01-29 14:54:19 -05:00
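For context, the failing pattern is a strided (hence non-contiguous) write into a disk-backed tensor, roughly like this sketch (the file path is illustrative):

```python
from tinygrad import Tensor

# a disk tensor lives in a file; "disk:<path>" selects the disk device
t = Tensor.empty(8, device="disk:/tmp/scratch.bin")
t[::2] = Tensor.ones(4)  # strided setitem: the case the new test marks as wrong
```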
chenyu
31706bf6bc add a few more types [pr] (#14425) 2026-01-29 14:04:09 -05:00
nimlgen
2d5c24879f nv: pma for 5090 (#14420)
* nv: pma for 5090

* hm

* 4090
2026-01-29 20:06:01 +03:00
nimlgen
c8dc6332d2 memory: read_fields is not universal (#14348) 2026-01-29 20:00:00 +03:00
chenyu
dbe8f034a7 pass z3.Context in validate ctx [pr] (#14423)
does not need to pass the whole solver
2026-01-29 11:11:47 -05:00
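The point of the change: z3 expressions only need a z3.Context to be built, so the validator can take the context instead of the whole solver. A minimal sketch of that API shape:

```python
import z3

ctx = z3.Context()        # z3 terms are tied to a Context, not a Solver
x = z3.Int("x", ctx=ctx)  # building expressions needs only the context
s = z3.Solver(ctx=ctx)    # a solver can still be constructed from it later
s.add(x > 0)
assert s.check() == z3.sat
```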
chenyu
033ce1b885 types for validate.py (#14422) 2026-01-29 10:56:50 -05:00
nimlgen
230d08ec70 test for am recovery and faults handling (#14421)
* test for am recovery and faults handling

* linter
2026-01-29 17:11:24 +03:00
George Hotz
793afbd473 simplify nn.Embedding, support AFTER in CUSTOM_KERNEL (#14419) 2026-01-29 17:22:13 +08:00
Christopher Milan
0c855d6149 ci: remove unused pydeps (#14418) 2026-01-29 01:51:26 -05:00
wozeparrot
4845e42135 llama3 gradacc fixes (#14414) 2026-01-28 19:12:39 -08:00
chenyu
37cde4a01a add one line mypy report (#14415) 2026-01-28 20:39:32 -05:00
chenyu
15aed51544 return types for all math.py functions (#14413)
calling int() on a sint returns an int; I think this is better supported since some UOps can be safely cast to int
2026-01-28 20:10:11 -05:00
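A minimal sketch of the typing idea, assuming sint is tinygrad's usual union of int and UOp (the UOp below is a stand-in):

```python
from typing import Union

class UOp:                                  # stand-in for tinygrad's UOp
  def __init__(self, v: int): self.v = v
  def __int__(self) -> int: return self.v  # assumes a safely castable (const) UOp

sint = Union[int, UOp]

def to_int(x: sint) -> int:
  return int(x)  # annotated -> int, so callers see int rather than sint
```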
nimlgen
aec1ae0de1 llama: set manual_seed (#14409) 2026-01-28 14:40:00 -08:00
chenyu
0870ed28b1 add Self type to MathMixin (#14411)
these don't cause errors
2026-01-28 16:59:38 -05:00
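A minimal sketch of what Self buys on a mixin (the method name here is made up; typing.Self is Python 3.11+, with a typing_extensions backport):

```python
from typing import Self  # Python 3.11+

class MathMixin:
  def detach_like(self) -> Self:
    # Self makes subclasses get their own type back, not MathMixin
    return self

class Tensor(MathMixin): pass

t: Tensor = Tensor().detach_like()  # type-checks: Self resolves to Tensor
```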
chenyu
079f33c208 fix type in Tensor.mean and Tensor.var (#14410)
use Tensor.from_uop to wrap a UOp from a symbolic shape; the kernels are the same
2026-01-28 15:24:02 -05:00
chenyu
2b5e99ccc1 minor type cleanups [pr] (#14408)
mypy --warn-redundant-casts has false negatives
2026-01-28 14:11:50 -05:00
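For reference, the kind of redundancy the flag is meant to catch; the false negatives are the cases it misses:

```python
from typing import cast

def f(x: int) -> int:
  return cast(int, x)  # x is already int: --warn-redundant-casts should flag this
```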
chenyu
726415dbc8 import sint directly in movement.py TYPE_CHECKING (#14406)
avoids creating a string TypeAlias; fixed a warning in `TYPED=1 python test/test_tiny.py`
2026-01-28 12:47:26 -05:00
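A minimal sketch of the pattern (the import path is an assumption): pulling sint in only under TYPE_CHECKING, with string annotations, means no string TypeAlias object exists at runtime.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
  from tinygrad.uop.ops import sint  # assumed path; never imported at runtime

def pad_to(shape: tuple["sint", ...]) -> tuple["sint", ...]:
  # "sint" resolves only while type checking; nothing is created at runtime
  return shape
```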
nimlgen
acb2fc36ba nv_pma: add decoder (#14404)
* nv_pma: add decoder

* cl
2026-01-28 20:44:02 +03:00
chenyu
7b9bc1d8cf _MockMemoryviewMeta for mockgpu (#14405)
fixed `PYTHONPATH=. TYPED=1 DEV=AMD MOCKGPU=1 python test/test_tiny.py`; basically makes `isinstance(TrackedMemoryView_instance, memoryview)` true
2026-01-28 11:59:00 -05:00
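A minimal sketch of the metaclass trick, with hypothetical stand-ins for mockgpu's types: a metaclass __instancecheck__ lets a swapped-in memoryview class accept trackers and real memoryviews alike.

```python
class TrackedMemoryView:                    # stand-in for mockgpu's wrapper
  def __init__(self, mv: memoryview): self.mv = mv

class _MockMemoryviewMeta(type):
  # isinstance(x, <this class>) passes for real memoryviews and trackers
  def __instancecheck__(cls, inst):
    return isinstance(inst, (memoryview, TrackedMemoryView))

class mockmemoryview(metaclass=_MockMemoryviewMeta): pass

# the runtime would patch the name memoryview to mockmemoryview (an assumption)
assert isinstance(memoryview(b"x"), mockmemoryview)
assert isinstance(TrackedMemoryView(memoryview(b"x")), mockmemoryview)
```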
chenyu
93793a645b use cl.cl_mem instead of internal ctypes._CData (#14403)
fixed `CHECK_OOB=0 DEV=CL TYPED=1 python test/test_tiny.py`
2026-01-28 10:56:41 -05:00
chenyu
a9b44070a8 fix webgpu runtime types (#14402)
`CHECK_OOB=0 DEV=WEBGPU TYPED=1 python test/test_tiny.py` passed; also skips tests that failed locally
2026-01-28 10:37:25 -05:00
George Hotz
0c6b3f50aa add marker to llama training (#14401) 2026-01-28 22:44:28 +08:00
Jakob Sachs
2b7c00d3d2 fix sd-example dtype for CLIP embeddings (#14397) 2026-01-28 09:07:19 -05:00
qazal
a5a9ce3fdf viz: disasm cleanups from null emulate (#14399)
* it's AMDHIPRenderer

* don't need that indent

* less assignment stuff

* that arg order did not make sense

* pmc
2026-01-28 22:03:30 +09:00
nimlgen
544928766d hcq_smi: kill mac pids (#14398) 2026-01-28 15:00:28 +03:00
George Hotz
202b74b369 assembly/amd: continue refactors (#14386)
* simpler

* merge

* flat

* no ctx

* use the correct apis

* dup code

* write clean code

* remove bad helpers

* bits junk remove

* junk remove

* smem test

* fix tests

* correct fix + tests

* Fmt matters it seems

* wmma refactor

* a lil more

* kimi cleanups

* line
2026-01-28 17:33:03 +08:00
qazal
5bffa17f82 llama train: better NULL=1 EMULATE=AMD_CDNA4 dev experience (#14395)
* beam opens devices

* switch to hip renderer

* amd: true?

* llvm true is for test_autogen
2026-01-28 17:31:22 +09:00
qazal
0294014108 fix bufferize cost function for multi, improve VIZ=-1 cli (#14394)
* improve cli

* remove_bufferize change
2026-01-28 15:53:18 +09:00
qazal
c158acea29 failing multi ram usage test from llama gemm (#14392) 2026-01-28 14:32:32 +09:00