nimlgen
aec1ae0de1
llama: set manual_seed ( #14409 )
2026-01-28 14:40:00 -08:00
chenyu
0870ed28b1
add Self type to MathMixin ( #14411 )
...
these don't cause error
2026-01-28 16:59:38 -05:00
chenyu
079f33c208
fix type in Tensor.mean and Tensor.var ( #14410 )
...
use Tensor.from_uop to wrap UOp from symbolic shape, kernels are the same
2026-01-28 15:24:02 -05:00
chenyu
2b5e99ccc1
minor type cleanups [pr] ( #14408 )
...
mypy --warn-redundant-casts has false negative
2026-01-28 14:11:50 -05:00
chenyu
726415dbc8
import sint directly in movement.py TYPE_CHECKING ( #14406 )
...
avoid creating string TypeAlias, fixed warning in `TYPED=1 python test/test_tiny.py`
2026-01-28 12:47:26 -05:00
nimlgen
acb2fc36ba
nv_pma: add decoder ( #14404 )
...
* nv_pma: add decoder
* cl
2026-01-28 20:44:02 +03:00
chenyu
7b9bc1d8cf
_MockMemoryviewMeta for mockgpu ( #14405 )
...
fixed `PYTHONPATH=. TYPED=1 DEV=AMD MOCKGPU=1 python test/test_tiny.py`. basically make `isinstance(TrackedMemoryView_instance, memoryview)` true
2026-01-28 11:59:00 -05:00
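
The idea in the commit above, making an `isinstance` check succeed for a tracking wrapper, can be illustrated with a metaclass that overrides `__instancecheck__`. This is only a sketch of the general Python mechanism, not the code from the PR: `TrackedView`, `_ViewMeta`, and `MemoryviewLike` are hypothetical names, and how mockgpu actually wires its metaclass in is not shown in this log.

```python
class TrackedView:
  """Hypothetical wrapper that counts writes to an underlying memoryview."""
  def __init__(self, mv: memoryview):
    self.mv, self.writes = mv, 0
  def __getitem__(self, idx): return self.mv[idx]
  def __setitem__(self, idx, val):
    self.writes += 1
    self.mv[idx] = val

class _ViewMeta(type):
  # isinstance(obj, cls) dispatches to type(cls).__instancecheck__,
  # so the metaclass decides what counts as an instance.
  def __instancecheck__(cls, obj):
    return isinstance(obj, (memoryview, TrackedView))

class MemoryviewLike(metaclass=_ViewMeta):
  """Stand-in type (an assumption) to check against instead of memoryview."""

tv = TrackedView(memoryview(bytearray(4)))
tv[0] = 7
```

With this in place, `isinstance(tv, MemoryviewLike)` is true for both the wrapper and a real `memoryview`, while unrelated objects such as `bytes` are still rejected.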
chenyu
93793a645b
use cl.cl_mem instead of internal ctypes._CData ( #14403 )
...
fixed `CHECK_OOB=0 DEV=CL TYPED=1 python test/test_tiny.py`
2026-01-28 10:56:41 -05:00
chenyu
a9b44070a8
fix webgpu runtime types ( #14402 )
...
`CHECK_OOB=0 DEV=WEBGPU TYPED=1 python test/test_tiny.py` passed, also skip tests that failed locally
2026-01-28 10:37:25 -05:00
George Hotz
0c6b3f50aa
add marker to llama training ( #14401 )
2026-01-28 22:44:28 +08:00
Jakob Sachs
2b7c00d3d2
fix sd-example dtype for CLIP embeddings ( #14397 )
2026-01-28 09:07:19 -05:00
qazal
a5a9ce3fdf
viz: disasm cleanups from null emulate ( #14399 )
...
* it's AMDHIPRenderer
* don't need that indent
* less assignment stuff
* that arg order did not make sense
* pmc
2026-01-28 22:03:30 +09:00
nimlgen
544928766d
hcq_smi: kill mac pids ( #14398 )
2026-01-28 15:00:28 +03:00
George Hotz
202b74b369
assembly/amd: continue refactors ( #14386 )
...
* simpler
* merge
* flat
* no ctx
* use the correct apis
* dup code
* write clean code
* remove bad helpers
* bits junk remove
* junk remove
* smem test
* fix tests
* correct fix + tests
* Fmt matters it seems
* wmma refactor
* a lil more
* kimi cleanups
* line
2026-01-28 17:33:03 +08:00
qazal
5bffa17f82
llama train: better NULL=1 EMULATE=AMD_CDNA4 dev experience ( #14395 )
...
* beam opens devices
* switch to hip renderer
* amd: true?
* llvm true is for test_autogen
2026-01-28 17:31:22 +09:00
qazal
0294014108
fix bufferize cost function for multi, improve VIZ=-1 cli ( #14394 )
...
* improve cli
* remove_bufferize change
2026-01-28 15:53:18 +09:00
qazal
c158acea29
failing multi ram usage test from llama gemm ( #14392 )
2026-01-28 14:32:32 +09:00
Christopher Milan
067e27857e
nested composite actions don't work ( #14393 )
2026-01-28 00:13:30 -05:00
Christopher Milan
9dddf3d478
don't save caches for PRs, try 2 ( #14391 )
2026-01-27 23:30:17 -05:00
Christopher Milan
68fe5d8b36
Revert "don't save caches for PRs ( #14389 )" ( #14390 )
2026-01-27 23:22:26 -05:00
Christopher Milan
4ab228b498
don't save caches for PRs ( #14389 )
2026-01-27 23:21:31 -05:00
Christopher Milan
5e36482314
decompose long to ints where unsupported, try 2 ( #14383 )
2026-01-27 23:20:43 -05:00
wozeparrot
e496547720
llama3 gradacc ( #14291 )
2026-01-27 19:48:10 -08:00
George Hotz
88bc5ee212
assembly/amd: rename to better names ( #14384 )
...
* assembly/amd: rename to better names
* might help fuzzing segfault
* emu2 -> emu
2026-01-28 10:00:54 +08:00
George Hotz
065b95cfb0
Revert "add retry to fetch ( #14370 )" ( #14385 )
...
This reverts commit dc4d7f2d55.
2026-01-28 09:35:37 +08:00
Eitan Turok
dc4d7f2d55
add retry to fetch ( #14370 )
2026-01-27 14:04:25 -08:00
chenyu
8d1f3c8885
fix copysign for inf input ( #14381 )
...
* fix copysign for inf input
* llvm olt
2026-01-27 16:45:48 -05:00
Christopher Milan
289a3e415e
also skip test_nonoverlapping_shrink_assignment ( #14382 )
2026-01-27 16:26:26 -05:00
Christopher Milan
f34efc1ad1
DISABLE_FAST_IDIV actually works as a ContextVar ( #14378 )
2026-01-27 16:12:42 -05:00
chenyu
8c899e4aaf
fix copysign for -0 ( #14380 )
...
test both x and 1/x < 0 work too, and found another bug with the * 0 hack
2026-01-27 15:44:58 -05:00
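
For context on why `-0.0` needs special handling: in IEEE 754, `-0.0 == 0.0`, so a plain `x < 0` test cannot see its sign. The following is a plain-Python illustration of that pitfall, not tinygrad's implementation; `is_negative` and `copysign_ref` are hypothetical helper names. Note that in Python `1 / -0.0` raises `ZeroDivisionError`, so the sketch reads the sign with `math.copysign` instead.

```python
import math

def is_negative(x: float) -> bool:
  # `x < 0` is False for -0.0 because -0.0 == 0.0;
  # math.copysign transfers the sign bit, so it also catches -0.0.
  return math.copysign(1.0, x) < 0

def copysign_ref(magnitude: float, sign: float) -> float:
  # Reference behaviour: |magnitude| carrying the sign of `sign`.
  return -abs(magnitude) if is_negative(sign) else abs(magnitude)
```

A sign test written this way also handles infinite inputs, since `abs(inf)` and the sign bit of `inf` are well defined.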
chenyu
62884585a7
failed test case for copysign -0.0 ( #14379 )
...
* failed test case for copysign -0.0
* skip those
2026-01-27 14:37:17 -05:00
nimlgen
ec1b28bc2c
am: exit early in case of failures ( #14376 )
...
* am: exit early in case of failures
* sorry, pre-linter
* reset when error state
2026-01-27 22:10:02 +03:00
chenyu
cd22ee9ed0
add InvalidType to ConstType [pr] ( #14373 )
...
* add InvalidType to ConstType [pr]
TYPED=1 python test/test_tiny.py passes.
added PyConst = float|int|bool for some Tensor level input types
* hcq
2026-01-27 14:09:34 -05:00
Christopher Milan
5b42a1357b
SCACHE=0 works with DEBUG ( #14377 )
2026-01-27 13:12:43 -05:00
chenyu
db010a31be
IGNORE_OOB -> CHECK_OOB [pr] ( #14374 )
...
flip the meaning
2026-01-27 12:20:59 -05:00
chenyu
c22667b0c4
also skip test_overlapping_shrink_assignment_reverse ( #14375 )
...
crashing
2026-01-27 12:20:39 -05:00
nimlgen
e52d58b041
autogen: update amd ( #14372 )
2026-01-27 19:53:14 +03:00
nimlgen
cbf94a0a95
nv: exit early in case of failures ( #14363 )
...
* nv: exit early in case of failures
* f
* cleaner
2026-01-27 19:16:22 +03:00
nimlgen
ec691cb299
am: print sq intrs ( #14366 )
...
* am: print sq intrs
* cleaner
2026-01-27 18:28:13 +03:00
qazal
a5f3d46423
hcq: do not assume kernel names are unique ( #14371 )
...
* hcq: do not assume kernel names are unique
* colored kernel name
2026-01-27 23:03:15 +09:00
George Hotz
e5df7e640b
fix branches in amd_asm_matmul ( #14369 )
2026-01-27 20:48:42 +08:00
George Hotz
0ced258726
HOTFIX: skip crashing assign test
2026-01-27 20:35:17 +08:00
George Hotz
131ae604de
force_transcendental on sqrt ( #14368 )
2026-01-27 20:24:41 +08:00
imaolo
14574c68fa
Add ContextVar to disable the scheduler cache ( #14257 )
...
* add scheduler cache ContextVar
* test scheduler cache context var
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-01-27 19:55:29 +08:00
George Hotz
bfc88bcfb8
assembly/amd: emu refactors + enable PYTHON_REMU by default ( #14361 )
...
* assembly/amd: start refactors
* cleanups
* those are global
* methods on ctx
* const cleanup
* range helper
* types and imports
* cleanups
* cleanups
* remove stale name
* fix emu2 types
* more typing
* more mypy
* cleanups
* fxns
* scc cleanup
* cleanups
* cleanups
* simpler parse_pcode
* laneid
* no defaults for pcode
* pcode is not optional
* cleanups
* functions cleanup
* splat
* expr_parser functions
* single tok
* invert global loops
* try_eat
* minor
* run parser on all
* no silent 0
* tests
2026-01-27 17:42:24 +08:00
Christopher Milan
2e72625652
Revert "decompose dtypes.long to ints where unsupported ( #14261 )" ( #14362 )
2026-01-27 02:04:59 -05:00
qazal
f866b2a513
mfma loop in asm dsl ( #14349 )
...
* mfma loop in asm dsl
* work
2026-01-27 11:11:37 +09:00
Christopher Milan
0793319929
decompose dtypes.long to ints where unsupported ( #14261 )
...
* add works
* use carry not overflow
* bitwise ops
* use tag instead of vec
* cleaner
* mul somewhat works
* mul actually works
* SUB and NEG work
* SHL/SHR
* ulong support
* this should work?
* oops
* fix indexing
* all ALU mostly works
* refactor
* test_dtype passing
* signed division works
* format
* clean
* some tests
* ruff
2026-01-26 18:34:13 -05:00
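
The "use carry not overflow" step in the commit above refers to building 64-bit arithmetic out of 32-bit pieces. A minimal sketch of the general technique for unsigned addition follows, assuming a split into low and high 32-bit halves; the helper names are hypothetical, and the actual PR operates on tinygrad's own IR rather than Python ints.

```python
MASK32 = 0xFFFFFFFF

def split64(x: int) -> tuple[int, int]:
  # (low 32 bits, high 32 bits) of an unsigned 64-bit value
  return x & MASK32, (x >> 32) & MASK32

def join64(lo: int, hi: int) -> int:
  return (hi << 32) | lo

def add64(a: int, b: int) -> int:
  alo, ahi = split64(a)
  blo, bhi = split64(b)
  lo = (alo + blo) & MASK32
  # Unsigned wraparound: the low sum wrapped iff it is smaller than an operand.
  carry = 1 if lo < alo else 0
  hi = (ahi + bhi + carry) & MASK32
  return join64(lo, hi)
```

The carry is derived from an unsigned comparison on the low halves, so no wider intermediate type is needed on a target that only has 32-bit integers.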
wozeparrot
a987a4abc3
feat: llama8b dev_beam.sh ( #14358 )
2026-01-26 14:51:23 -08:00
Christopher Milan
c9c533fc78
libclang path is homebrew on macos ( #14357 )
...
* libclang path is homebrew macos
* typo
* ugh
* typo
* regen
* no LIBCLANG_PATH
2026-01-26 17:32:09 -05:00