1429 Commits

Author SHA1 Message Date
George Hotz
e9f2aaba2a simplify rdna3 asm (#13835)
* simplify rdna3 asm

* cleanups

* fix names

* fix tests

* fixes

* more test fixes

* type fixes

* tests pass + mypy passes

* 3.11 syntax
2025-12-26 11:21:03 -05:00
George Hotz
c6937fa744 more work on RDNA3 asm (#13833)
* more llvm asm tests

* roundtrip test

* work

* more handwritten

* more handwritten

* work

* tests pass

* dual mov

* all tests pass

* all tests pass fast
2025-12-25 23:28:14 -05:00
George Hotz
9d94b8c6b2 python asm dsl in extra + python REMU (#13436)
* having fun with python asm dsl

* rdna3

* meh

* all in rdna3

* work

* more work

* work

* integration

* tests

* simpler

* simpler

* asm

* better

* simpler

* progress

* emu

* simpler

* emu

* tests

* types

* vopd

* cleaups

* work

* memory ranges

* add tracing

* refactors

* run_asm exit

* more readable

* compare to remu

* test gemm

* bug + stale

* more tests

* refactor

* tests fix

* more ins

* more instructions

* refactor

* faster

* match case

* match case

* simpler

* work

* tests

* run_asm

* work

* bug fixes

* more emu

* alu/emu

* refactor

* no pipeline emu yet

* alu direct

* fix

* bugfixes + new test

* fix exceptions in emulators

* update gen.py

* pylint

* no pdf

* improve bench_emu

* speedups

* cleanups

* more tests
2025-12-25 13:04:14 -05:00
Daniel Xu
4edaaf19e5 Handle tied embeddings for llama 3.2 1B (#13796)
Previously the output.weight layer would not be loaded, and would only
contain randomly initialized values. This led to junk when doing a
forward pass.

Signed-off-by: Daniel Xu <daniel@thinkingmachines.ai>
2025-12-22 16:31:40 -05:00
chenyu
7f1d41c9f9 delete files that import ShapeTracker (#13805) 2025-12-22 15:54:18 -05:00
qazal
389f01c7f4 viz: amdgpu assembly basic block graph (#13755) 2025-12-22 23:17:16 +08:00
qazal
81d9053013 roc: cast to nullptr instead of changing header (#13801) 2025-12-22 22:34:06 +08:00
nimlgen
d299d30f2c am_smi: fix with new autogen (#13800) 2025-12-22 16:53:26 +03:00
George Hotz
45c459848d remove more stale stuff (#13765)
* remove more stale stuff

* remove disassemblers/adreno

* stale
2025-12-19 17:14:56 -04:00
George Hotz
744af193f0 remove ScheduleItem and merge it with ExecItem (#13759)
* remove ExecItem and merge it with ScheduleItem

* less diff

* fix issues

* min diff

* don't change bufs in _lower

* min diff

* update

* revert

* fixes

* diff
2025-12-19 17:04:24 -04:00
George Hotz
df6cde8a00 cleanup stale examples/extra (#13764)
* cleanup stale files

* examples

* move those back

* old

* delete more
2025-12-19 16:27:37 -04:00
chenyu
80b84f5267 ruff lint tinykitten (#13762)
deleted used import and double spaces. a few ignore to not change the real code
2025-12-19 14:31:00 -05:00
nimlgen
77191fb744 hive_reset for mi350 (#13746) 2025-12-18 12:02:28 +03:00
wozeparrot
99e667bdcd tk fa bwd (#13480) 2025-12-17 23:56:37 -08:00
nimlgen
7081014c73 am_smi: mi300 (#13737)
* am_smi: mi300

* smi

* remo
2025-12-17 17:56:01 +03:00
nimlgen
3eecb4f123 am: mi350 support (#13733) 2025-12-17 14:57:21 +03:00
wozeparrot
5151a341b3 tk: small changes from fa bwd (#13732) 2025-12-16 22:44:36 -08:00
chenyu
041e9a41c9 add contiguous in BertIntermediate (#13713)
faster step with a lot less recomputation
2025-12-15 22:37:36 -05:00
wozeparrot
5d509499b2 tk: kernel finish groups stores (#13704) 2025-12-15 09:16:17 -08:00
nimlgen
615dcab767 am: minimal mi300 boot (#13679)
* nbio7_9

* psp

* gmc

* gfx

* sdma

* ih

* linter

* linter

* minor

* finish

* add missing

* do not allow warm boot for now
2025-12-15 15:55:03 +03:00
wozeparrot
7ef7ce2856 tk reg local store (#13689) 2025-12-14 23:07:30 -08:00
Robbe Derks
cddbdaf5e1 usbgpu: patch: auto-detect controller PID/VID (#13645)
* auto-detect controller

* fix lint?

* needs ''

* just try
2025-12-14 00:54:51 -05:00
George Hotz
bcbf832399 add chrism 2025-12-14 00:45:57 -05:00
qazal
019e71f8ca lds bank count tests from pmc counters (#13667)
* lds bank count tests from pmc counters

* these tests run on the RDNA3 card too

* rename duration to cycles, other rename comment

* add SQ_LDS_IDX_ACTIVE to gfx9 defaults
2025-12-13 17:39:32 +08:00
qazal
93ad1f7732 viz: readable pmc print, share unpacker with tests (#13655)
* viz: readable pmc print, share unpacker with tests

* sections

* static analyzer

* rm that
2025-12-12 19:29:59 +08:00
wozeparrot
8f60b8dd1e fix: cast on transpose (#13653) 2025-12-11 21:03:49 -08:00
nimlgen
b07839493d proclogs with xccs (#13626) 2025-12-09 16:46:08 +03:00
wozeparrot
89c4206e22 fix: typing (#13614) 2025-12-07 20:10:30 -08:00
wozeparrot
93f1baca77 feat: tk fa in tensor (#13580) 2025-12-05 14:36:29 -08:00
wozeparrot
62e2fc5108 tk: global load/store rv (#13577) 2025-12-04 17:23:48 -08:00
qazal
d7caae5f61 viz: tabulate pmc (#13574)
* viz: tabulate pmc

* linter

* enable nesting

* pmc comes before waves
2025-12-05 03:08:39 +08:00
qazal
512a8f3dd4 viz: start global memory PMC tests (#13569) 2025-12-05 00:40:27 +08:00
nimlgen
db99a61fad qcom: support cpu mappings (#13565)
* test

* qcom: support cpu mappings

* clean

* msg
2025-12-04 14:50:46 +03:00
nimlgen
877a7fdd61 jit: support encdec (#13563)
* jit: support encdec

* fix
2025-12-04 11:58:34 +03:00
George Hotz
a909cd4581 faster HEVC decode (#13552)
* faster HEVC decode

* bind to variables

* cleanups

* more cleanups
2025-12-03 11:33:05 -08:00
nimlgen
fcdb01abe7 hip: fix ioctl (#13548) 2025-12-03 16:40:43 +03:00
nimlgen
daea1161cc nv: nvdec for blackwell (#13546) 2025-12-03 16:30:22 +03:00
George Hotz
ddf3f2d0c4 rdna3 asm + zip_extract (#13499)
* rdna3 asm + zip_extract

* include sqtt

* fix end parsing

* disassembler working

* parsing fields

* instruction

* op

* more parsing
2025-12-02 22:56:01 -08:00
qazal
7622be761f add new remu instructions from #13533 (#13539) 2025-12-03 06:29:20 +08:00
qazal
c65aa93081 refactor sqtt loader to enable PMC=1 SQTT=0 (#13526) 2025-12-02 22:50:38 +08:00
wozeparrot
1b7dbfb37f tk: named kernels + per kernel range id (#13522) 2025-12-01 22:51:04 -08:00
qazal
a5ec3b24be viz: start PMC in the counters view (#13510) 2025-12-02 00:01:57 +08:00
George Hotz
97b56e11e0 hotfix: 32 workgroups for radeon 8050s 2025-11-30 08:20:17 -08:00
George Hotz
bd4b9de7d2 use numpy in amd_uop_matmul for simpler tracing (#13503) 2025-11-30 08:04:38 -08:00
qazal
9023ca30ef show number of waves in each SE/CU (#13491)
* show number of waves in each SE/CU

* update to test_ones
2025-11-30 22:29:16 +08:00
nimlgen
455dd88236 nv: minimal hevc (#13502)
* nv: minimal hevc

* validate

* not needed

* tralin

* var

* cpu

* fxi

* desc

* move

* cleanup
2025-11-30 16:46:55 +03:00
qazal
d457ee0ba4 viz: correctly handle multiple sqtt traces of the same prg (#13460) 2025-11-29 20:52:41 +08:00
wozeparrot
ffc31a23f4 tk mi350 (#13288) 2025-11-25 15:49:44 -08:00
qazal
5520f1fb0b viz: per cu timeline (#13451)
* add cu_loc

* work

* WAVE -> W
2025-11-26 00:05:20 +08:00
wozeparrot
249553a119 tinyfs tweaks (#13444) 2025-11-24 18:07:32 -08:00