Commit Graph

1549 Commits

Author SHA1 Message Date
George Hotz
d10668283d Merge remote-tracking branch 'origin/master' into asm_ucode
# Conflicts:
#	test/test_jit.py
#	test/test_jit_footguns.py
#	tinygrad/engine/jit.py
2026-01-08 05:14:42 -08:00
George Hotz
627c440d38 minmax 2026-01-08 05:14:04 -08:00
George Hotz
e7b5d8a434 assembly/amd: more RDNA4 asm (#14062)
* rdna4 more

* asm

* fixes

* assembly/amd: handwritten wmma failing test

* passes

* wmma default hacks

* space

* 0 skips in rdna3/rdna4 disasm

* more RDNA4 tests

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2026-01-08 05:09:37 -08:00
George Hotz
0c40e52ae1 no void 2026-01-08 05:06:23 -08:00
George Hotz
894230d0a9 fix parser bugs 2026-01-08 05:02:55 -08:00
George Hotz
544a877960 cvt functions 2026-01-08 04:57:04 -08:00
nimlgen
e372c841ba hevc: beam in decode (#14067)
* hevc: beam in decode

* fine

* g
2026-01-08 15:47:16 +03:00
George Hotz
0dfdad0e76 cleanup pcode_parse 2026-01-08 04:27:26 -08:00
nimlgen
f3aceaa08b hevc: fast decoder (#14057) 2026-01-08 15:20:37 +03:00
George Hotz
4a7456caef move more 2026-01-08 04:18:46 -08:00
qazal
309197bca5 assembly/amd: test_roundtrip for cdna/rdna4 (#14066) 2026-01-08 21:03:13 +09:00
George Hotz
d84db5851f calls 2026-01-08 00:55:57 -08:00
qazal
15a056715d fix amd assembly IDE tests on macbook (#14063) 2026-01-08 17:27:52 +09:00
George Hotz
37b4751958 isNAN 2026-01-08 00:24:10 -08:00
George Hotz
5e923ccb5e simpler ucode 2026-01-08 00:10:23 -08:00
George Hotz
10836a5dba lil cleanups 2026-01-07 23:49:53 -08:00
George Hotz
56ba96f5cd uops have types 2026-01-07 21:52:55 -08:00
George Hotz
2db04d0696 assembly/amd: start adding RDNA4 support (#14060)
* assembly/amd: start adding RDNA4 support

* rdna4 asm
2026-01-07 21:19:30 -08:00
George Hotz
c8b42edec6 a bunch of todos for my boy claude 2026-01-07 20:18:04 -08:00
wozeparrot
0069cd9a0b tk: support sliced local -> reg load (#14034) 2026-01-07 20:18:04 -08:00
George Hotz
cb500466c2 assembly/amd: amd_asm_matmul (#13989)
* amd_asm_matmul

* dsl transform

* asm roundtrip

* fixed

* less

* better

* more

* simpler

* simplify

* lil

* simpler

* compact

* work

* cleanups

* simplify

* simpler

* cleanup

* name the regs

* simp

* big simp

* big simp

* simp

* acc grid

* fast

* stuff

* fast

* simpler

* owrks

* save vgprs

* save vgprs

* Compact

* less VGPRs

* after

* SQTT support

* fastest

* faster

* lil faster

* tile regs

* faster

* readable

* one more

* simpler

* lil simpler

* NO_GLOBAL skips early globals

* stock kernel

* cleanups

* cleanups

* one b reg

* safe reg changes

* acc is compact now

* remove confusing stuff

* sregs

* lds cleanups

* vopd
2026-01-07 20:11:05 -08:00
nimlgen
5bd4593eda hevc: cleaner decoder (#14056)
* hevc: cleaner decoder

* nn
2026-01-07 18:29:30 +03:00
chenyu
c714881832 don't allow jit input to be const (#14045)
* don't allow jit input to be unbuffered like const

* just const to fix multi

* fix rnnt
2026-01-06 18:15:22 -05:00
wozeparrot
2b3e01e79c tk: support sliced local -> reg load (#14034) 2026-01-06 05:33:24 -05:00
George Hotz
947747eb5e Merge branch 'master' into asm_ucode 2026-01-06 00:16:03 -08:00
George Hotz
45f7fd073d assembly/amd: pcode bug fixes (#14032)
* bring over pcode parser

* fixes

* pdf test

* delay alu
2026-01-06 00:15:48 -08:00
wozeparrot
21d0f6bb76 tk: flat global -> local load (#14033) 2026-01-05 23:35:53 -08:00
George Hotz
640dac46c2 pcode_exec 2026-01-05 21:15:29 -08:00
George Hotz
05129e58b0 pcode back 2026-01-05 20:15:30 -08:00
George Hotz
b7dc59a68d fix emu 2026-01-05 20:09:20 -08:00
George Hotz
ec7ec99cbd better 2026-01-05 20:06:52 -08:00
George Hotz
c8c6346336 tests 2026-01-05 19:57:30 -08:00
George Hotz
6de310c87f parsing 2026-01-05 19:54:56 -08:00
George Hotz
ffba806b65 pdf/qcode work 2026-01-05 19:42:47 -08:00
George Hotz
c4016d5cac fix psuedocode parsing 2026-01-05 18:58:55 -08:00
George Hotz
a5587fbda1 Merge origin/master, delete pcode.py 2026-01-05 18:53:43 -08:00
George Hotz
20653d2996 assembly/amd: make pdf.py code shine (#14029)
* assembly/amd: make pdf.py code shine

* no merge

* pdf2 is the future

* something

* regen enums

* test

* work

* remove junk

* write

* pcode extraction

* pdf2 passes all tests

* simplify

* simpler pdf

* late filter

* remove hacks

* simplify pdf2.py

* field type

* remove defaults

* don't export srcenum

* simple pdf.py

* simpler

* cleaner

* less hack in PDF
2026-01-05 18:49:40 -08:00
qazal
ea7b149ca5 viz command line tool (#14030) 2026-01-06 10:19:47 +09:00
George Hotz
74da1c6310 test qcode 2026-01-05 02:42:27 -08:00
nimlgen
70405b4f3c am_smi: mi350 (#14018) 2026-01-05 13:10:56 +03:00
George Hotz
400d59c06b simpler 2026-01-04 20:37:06 -08:00
George Hotz
57684d2777 no pcode 2026-01-04 18:35:16 -08:00
George Hotz
8147a78d24 wide dtypes 2026-01-04 17:26:12 -08:00
George Hotz
486248f775 fix pcode 2026-01-04 17:04:52 -08:00
George Hotz
87e72f1540 ftz 2026-01-04 16:32:35 -08:00
George Hotz
b52ff63896 fixes 2026-01-04 15:48:31 -08:00
George Hotz
404eed6172 assembly/amd: improve tests for asm (#14007)
* assembly/amd: improve tests for asm

* upd

* skip

* tests

* re bug

* more passing

* cleanups

* cdna fixups

* improve tests, better CDNA parsing

* fix CI

* no defs

* simpler

* all pass

* from pdf

* regen
2026-01-04 15:14:08 -08:00
George Hotz
7f7f12d5b4 99% match 2026-01-04 15:05:05 -08:00
George Hotz
b10ae6958e roundtripping 2026-01-04 14:31:40 -08:00
George Hotz
10e2c47d52 don't make dtype 2026-01-04 13:49:47 -08:00