11936 Commits

Author SHA1 Message Date
nimlgen
230d08ec70 test for am recovery and faults handling (#14421)
* test for am recovery and faults handling

* linter
2026-01-29 17:11:24 +03:00
George Hotz
793afbd473 simplify nn.Embedding, support AFTER in CUSTOM_KERNEL (#14419) 2026-01-29 17:22:13 +08:00
Christopher Milan
0c855d6149 ci: remove unused pydeps (#14418) 2026-01-29 01:51:26 -05:00
wozeparrot
4845e42135 llama3 gradacc fixes (#14414) 2026-01-28 19:12:39 -08:00
chenyu
37cde4a01a add one line mypy report (#14415) 2026-01-28 20:39:32 -05:00
chenyu
15aed51544 return types for all math.py function (#14413)
calling int() on sint -> int, i think it's better support since some UOp can be safely cast to int
2026-01-28 20:10:11 -05:00
nimlgen
aec1ae0de1 llama: set manual_seed (#14409) 2026-01-28 14:40:00 -08:00
chenyu
0870ed28b1 add Self type to MathMixin (#14411)
these don't cause error
2026-01-28 16:59:38 -05:00
chenyu
079f33c208 fix type in Tensor.mean and Tensor.var (#14410)
use Tensor.from_uop to wrap UOp from symbolic shape, kernels are the same
2026-01-28 15:24:02 -05:00
chenyu
2b5e99ccc1 minor type cleanups [pr] (#14408)
mypy --warn-redundant-casts has false negative
2026-01-28 14:11:50 -05:00
chenyu
726415dbc8 import sint directly in movement.py TYPE_CHECKING (#14406)
avoid creating string TypeAlias, fixed warning in `TYPED=1 python test/test_tiny.py`
2026-01-28 12:47:26 -05:00
nimlgen
acb2fc36ba nv_pma: add decoder (#14404)
* nv_pma: add decoder

* cl
2026-01-28 20:44:02 +03:00
chenyu
7b9bc1d8cf _MockMemoryviewMeta for mockgpu (#14405)
fixed `PYTHONPATH=. TYPED=1 DEV=AMD MOCKGPU=1 python test/test_tiny.py`. basically make `isinstance(TrackedMemoryView_instance, memoryview)` true
2026-01-28 11:59:00 -05:00
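
A note on the `isinstance(..., memoryview)` point above: `memoryview` cannot be subclassed in CPython, so a tracking wrapper has to pass the check some other way. The sketch below shows one generic technique, exposing `__class__` as a property, which `isinstance` consults when `type(obj)` is not a subclass. It is not necessarily what `_MockMemoryviewMeta` does (the name suggests a metaclass-based approach), and `TrackedBuffer` is a hypothetical class invented for illustration.

```python
class TrackedBuffer:
    """Hypothetical wrapper that records writes to an underlying memoryview."""

    def __init__(self, data: bytearray):
        self._mv = memoryview(data)
        self.writes = []  # indices written through this wrapper

    # isinstance() falls back to obj.__class__ when type(obj) is not a subclass
    # of the requested type, so reporting memoryview here makes the check pass.
    @property
    def __class__(self):
        return memoryview

    def __setitem__(self, idx, value):
        self.writes.append(idx)
        self._mv[idx] = value

    def __getitem__(self, idx):
        return self._mv[idx]


buf = TrackedBuffer(bytearray(4))
buf[0] = 7
print(isinstance(buf, memoryview), type(buf).__name__, buf.writes)
```

Note that `type(buf)` still reports the wrapper class, so code that dispatches on the exact type rather than on `isinstance` would not be fooled by this approach.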
chenyu
93793a645b use cl.cl_mem instead of internal ctypes._CData (#14403)
fixed `CHECK_OOB=0 DEV=CL TYPED=1 python test/test_tiny.py`
2026-01-28 10:56:41 -05:00
chenyu
a9b44070a8 fix webgpu runtime types (#14402)
`CHECK_OOB=0 DEV=WEBGPU TYPED=1 python test/test_tiny.py` passed, also skip tests that failed locally
2026-01-28 10:37:25 -05:00
George Hotz
0c6b3f50aa add marker to llama training (#14401) 2026-01-28 22:44:28 +08:00
Jakob Sachs
2b7c00d3d2 fix sd-example dtype for CLIP embeddings (#14397) 2026-01-28 09:07:19 -05:00
qazal
a5a9ce3fdf viz: disasm cleanups from null emulate (#14399)
* it's AMDHIPRenderer

* don't need that indent

* less assignment stuff

* that arg order did not make sense

* pmc
2026-01-28 22:03:30 +09:00
nimlgen
544928766d hcq_smi: kill mac pids (#14398) 2026-01-28 15:00:28 +03:00
George Hotz
202b74b369 assembly/amd: continue refactors (#14386)
* simpler

* merge

* flat

* no ctx

* use the correct apis

* dup code

* write clean code

* remove bad helpers

* bits junk remove

* junk remove

* smem test

* fix tests

* correct fix + tests

* Fmt matters it seems

* wmma refactor

* a lil more

* kimi cleanups

* line
2026-01-28 17:33:03 +08:00
qazal
5bffa17f82 llama train: better NULL=1 EMULATE=AMD_CDNA4 dev experience (#14395)
* beam opens devices

* switch to hip renderer

* amd: true?

* llvm true is for test_autogen
2026-01-28 17:31:22 +09:00
qazal
0294014108 fix bufferize cost function for multi, improve VIZ=-1 cli (#14394)
* improve cli

* remove_bufferize change
2026-01-28 15:53:18 +09:00
qazal
c158acea29 failing multi ram usage test from llama gemm (#14392) 2026-01-28 14:32:32 +09:00
Christopher Milan
067e27857e nested composite actions don't work (#14393) 2026-01-28 00:13:30 -05:00
Christopher Milan
9dddf3d478 don't save caches for PRs, try 2 (#14391) 2026-01-27 23:30:17 -05:00
Christopher Milan
68fe5d8b36 Revert "don't save caches for PRs (#14389)" (#14390) 2026-01-27 23:22:26 -05:00
Christopher Milan
4ab228b498 don't save caches for PRs (#14389) 2026-01-27 23:21:31 -05:00
Christopher Milan
5e36482314 decompose long to ints where unsupported, try 2 (#14383) 2026-01-27 23:20:43 -05:00
wozeparrot
e496547720 llama3 gradacc (#14291) 2026-01-27 19:48:10 -08:00
George Hotz
88bc5ee212 assembly/amd: rename to better names (#14384)
* assembly/amd: rename to better names

* might help fuzzing segfault

* emu2 -> emu
2026-01-28 10:00:54 +08:00
George Hotz
065b95cfb0 Revert "add retry to fetch (#14370)" (#14385)
This reverts commit dc4d7f2d55.
2026-01-28 09:35:37 +08:00
Eitan Turok
dc4d7f2d55 add retry to fetch (#14370) 2026-01-27 14:04:25 -08:00
chenyu
8d1f3c8885 fix copysign for inf input (#14381)
* fix copysign for inf input

* llvm olt
2026-01-27 16:45:48 -05:00
Christopher Milan
289a3e415e also skip test_nonoverlapping_shrink_assignment (#14382) 2026-01-27 16:26:26 -05:00
Christopher Milan
f34efc1ad1 DISABLE_FAST_IDIV actually works as a ContextVar (#14378) 2026-01-27 16:12:42 -05:00
chenyu
8c899e4aaf fix copysign for -0 (#14380)
test both x and 1/x < 0 work too, and found another bug with the * 0 hack
2026-01-27 15:44:58 -05:00
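
For context on why `-0.0` needs its own copysign test: under IEEE 754, negative zero compares equal to positive zero, so a plain `x < 0` check cannot see its sign. The sketch below illustrates this with Python's standard `math.copysign`. It is not tinygrad's implementation, and the helper names are made up for the example.

```python
import math


def naive_is_negative(x: float) -> bool:
    # A plain comparison misses the sign of zero: -0.0 < 0 is False.
    return x < 0


def sign_bit_is_set(x: float) -> bool:
    # math.copysign copies only the sign bit, so it distinguishes -0.0 from 0.0
    # and also handles -inf.
    return math.copysign(1.0, x) < 0


for value in (0.0, -0.0, float("inf"), float("-inf")):
    print(value, naive_is_negative(value), sign_bit_is_set(value))
```

A copysign built only on comparisons against zero would therefore return the wrong sign when the sign argument is `-0.0`.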
chenyu
62884585a7 failed test case for copysign -0.0 (#14379)
* failed test case for copysign -0.0

* skip those
2026-01-27 14:37:17 -05:00
nimlgen
ec1b28bc2c am: exit early in case of failures (#14376)
* am: exit early in case of failures

* sorry, pre-linter

* reset when error state
2026-01-27 22:10:02 +03:00
chenyu
cd22ee9ed0 add InvalidType to ConstType [pr] (#14373)
* add InvalidType to ConstType [pr]

TYPED=1 python test/test_tiny.py passes.
added PyConst = float|int|bool for some Tensor level input types

* hcq
2026-01-27 14:09:34 -05:00
Christopher Milan
5b42a1357b SCACHE=0 works with DEBUG (#14377) 2026-01-27 13:12:43 -05:00
chenyu
db010a31be IGNORE_OOB -> CHECK_OOB [pr] (#14374)
flip the meaning
2026-01-27 12:20:59 -05:00
chenyu
c22667b0c4 also skip test_overlapping_shrink_assignment_reverse (#14375)
crashing
2026-01-27 12:20:39 -05:00
nimlgen
e52d58b041 autogen: update amd (#14372) 2026-01-27 19:53:14 +03:00
nimlgen
cbf94a0a95 nv: exit early in case of failures (#14363)
* nv: exit early in case of failures

* f

* cleaner
2026-01-27 19:16:22 +03:00
nimlgen
ec691cb299 am: print sq intrs (#14366)
* am: print sq intrs

* cleaner
2026-01-27 18:28:13 +03:00
qazal
a5f3d46423 hcq: do not assume kernel names are unique (#14371)
* hcq: do not assume kernel names are unique

* colored kernel name
2026-01-27 23:03:15 +09:00
George Hotz
e5df7e640b fix branches in amd_asm_matmul (#14369) 2026-01-27 20:48:42 +08:00
George Hotz
0ced258726 HOTFIX: skip crashing assign test 2026-01-27 20:35:17 +08:00
George Hotz
131ae604de force_transcendental on sqrt (#14368) 2026-01-27 20:24:41 +08:00
imaolo
14574c68fa Add ContextVar to disable the scheduler cache (#14257)
* add scheduler cache ContextVar

* test scheduler cache context var

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-01-27 19:55:29 +08:00