Commit Graph

1426 Commits

Author SHA1 Message Date
Daniel Xu
4edaaf19e5 Handle tied embeddings for llama 3.2 1B (#13796)
Previously the output.weight layer would not be loaded, and would only
contain randomly initialized values. This led to junk when doing a
forward pass.

Signed-off-by: Daniel Xu <daniel@thinkingmachines.ai>
2025-12-22 16:31:40 -05:00
chenyu
7f1d41c9f9 delete files that import ShapeTracker (#13805) 2025-12-22 15:54:18 -05:00
qazal
389f01c7f4 viz: amdgpu assembly basic block graph (#13755) 2025-12-22 23:17:16 +08:00
qazal
81d9053013 roc: cast to nullptr instead of changing header (#13801) 2025-12-22 22:34:06 +08:00
nimlgen
d299d30f2c am_smi: fix with new autogen (#13800) 2025-12-22 16:53:26 +03:00
George Hotz
45c459848d remove more stale stuff (#13765)
* remove more stale stuff

* remove disassemblers/adreno

* stale
2025-12-19 17:14:56 -04:00
George Hotz
744af193f0 remove ScheduleItem and merge it with ExecItem (#13759)
* remove ExecItem and merge it with ScheduleItem

* less diff

* fix issues

* min diff

* don't change bufs in _lower

* min diff

* update

* revert

* fixes

* diff
2025-12-19 17:04:24 -04:00
George Hotz
df6cde8a00 cleanup stale examples/extra (#13764)
* cleanup stale files

* examples

* move those back

* old

* delete more
2025-12-19 16:27:37 -04:00
chenyu
80b84f5267 ruff lint tinykitten (#13762)
deleted unused imports and double spaces. a few ignores added so as not to change the real code
2025-12-19 14:31:00 -05:00
nimlgen
77191fb744 hive_reset for mi350 (#13746) 2025-12-18 12:02:28 +03:00
wozeparrot
99e667bdcd tk fa bwd (#13480) 2025-12-17 23:56:37 -08:00
nimlgen
7081014c73 am_smi: mi300 (#13737)
* am_smi: mi300

* smi

* remo
2025-12-17 17:56:01 +03:00
nimlgen
3eecb4f123 am: mi350 support (#13733) 2025-12-17 14:57:21 +03:00
wozeparrot
5151a341b3 tk: small changes from fa bwd (#13732) 2025-12-16 22:44:36 -08:00
chenyu
041e9a41c9 add contiguous in BertIntermediate (#13713)
faster step with a lot less recomputation
2025-12-15 22:37:36 -05:00
wozeparrot
5d509499b2 tk: kernel finish groups stores (#13704) 2025-12-15 09:16:17 -08:00
nimlgen
615dcab767 am: minimal mi300 boot (#13679)
* nbio7_9

* psp

* gmc

* gfx

* sdma

* ih

* linter

* linter

* minor

* finish

* add missing

* do not allow warm boot for now
2025-12-15 15:55:03 +03:00
wozeparrot
7ef7ce2856 tk reg local store (#13689) 2025-12-14 23:07:30 -08:00
Robbe Derks
cddbdaf5e1 usbgpu: patch: auto-detect controller PID/VID (#13645)
* auto-detect controller

* fix lint?

* needs ''

* just try
2025-12-14 00:54:51 -05:00
George Hotz
bcbf832399 add chrism 2025-12-14 00:45:57 -05:00
qazal
019e71f8ca lds bank count tests from pmc counters (#13667)
* lds bank count tests from pmc counters

* these tests run on the RDNA3 card too

* rename duration to cycles, other rename comment

* add SQ_LDS_IDX_ACTIVE to gfx9 defaults
2025-12-13 17:39:32 +08:00
qazal
93ad1f7732 viz: readable pmc print, share unpacker with tests (#13655)
* viz: readable pmc print, share unpacker with tests

* sections

* static analyzer

* rm that
2025-12-12 19:29:59 +08:00
wozeparrot
8f60b8dd1e fix: cast on transpose (#13653) 2025-12-11 21:03:49 -08:00
nimlgen
b07839493d proclogs with xccs (#13626) 2025-12-09 16:46:08 +03:00
wozeparrot
89c4206e22 fix: typing (#13614) 2025-12-07 20:10:30 -08:00
wozeparrot
93f1baca77 feat: tk fa in tensor (#13580) 2025-12-05 14:36:29 -08:00
wozeparrot
62e2fc5108 tk: global load/store rv (#13577) 2025-12-04 17:23:48 -08:00
qazal
d7caae5f61 viz: tabulate pmc (#13574)
* viz: tabulate pmc

* linter

* enable nesting

* pmc comes before waves
2025-12-05 03:08:39 +08:00
qazal
512a8f3dd4 viz: start global memory PMC tests (#13569) 2025-12-05 00:40:27 +08:00
nimlgen
db99a61fad qcom: support cpu mappings (#13565)
* test

* qcom: support cpu mappings

* clean

* msg
2025-12-04 14:50:46 +03:00
nimlgen
877a7fdd61 jit: support encdec (#13563)
* jit: support encdec

* fix
2025-12-04 11:58:34 +03:00
George Hotz
a909cd4581 faster HEVC decode (#13552)
* faster HEVC decode

* bind to variables

* cleanups

* more cleanups
2025-12-03 11:33:05 -08:00
nimlgen
fcdb01abe7 hip: fix ioctl (#13548) 2025-12-03 16:40:43 +03:00
nimlgen
daea1161cc nv: nvdec for blackwell (#13546) 2025-12-03 16:30:22 +03:00
George Hotz
ddf3f2d0c4 rdna3 asm + zip_extract (#13499)
* rdna3 asm + zip_extract

* include sqtt

* fix end parsing

* disassembler working

* parsing fields

* instruction

* op

* more parsing
2025-12-02 22:56:01 -08:00
qazal
7622be761f add new remu instructions from #13533 (#13539) 2025-12-03 06:29:20 +08:00
qazal
c65aa93081 refactor sqtt loader to enable PMC=1 SQTT=0 (#13526) 2025-12-02 22:50:38 +08:00
wozeparrot
1b7dbfb37f tk: named kernels + per kernel range id (#13522) 2025-12-01 22:51:04 -08:00
qazal
a5ec3b24be viz: start PMC in the counters view (#13510) 2025-12-02 00:01:57 +08:00
George Hotz
97b56e11e0 hotfix: 32 workgroups for radeon 8050s 2025-11-30 08:20:17 -08:00
George Hotz
bd4b9de7d2 use numpy in amd_uop_matmul for simpler tracing (#13503) 2025-11-30 08:04:38 -08:00
qazal
9023ca30ef show number of waves in each SE/CU (#13491)
* show number of waves in each SE/CU

* update to test_ones
2025-11-30 22:29:16 +08:00
nimlgen
455dd88236 nv: minimal hevc (#13502)
* nv: minimal hevc

* validate

* not needed

* tralin

* var

* cpu

* fix

* desc

* move

* cleanup
2025-11-30 16:46:55 +03:00
qazal
d457ee0ba4 viz: correctly handle multiple sqtt traces of the same prg (#13460) 2025-11-29 20:52:41 +08:00
wozeparrot
ffc31a23f4 tk mi350 (#13288) 2025-11-25 15:49:44 -08:00
qazal
5520f1fb0b viz: per cu timeline (#13451)
* add cu_loc

* work

* WAVE -> W
2025-11-26 00:05:20 +08:00
wozeparrot
249553a119 tinyfs tweaks (#13444) 2025-11-24 18:07:32 -08:00
wozeparrot
f46bc31156 tk: start and step in range (#13442) 2025-11-24 15:43:24 -08:00
qazal
2a9bd12700 sqtt: add occupancy events to the timeline (#13430) 2025-11-24 22:28:05 +08:00
qazal
712c7a6448 sqtt loader cleanups from the occupancy branch (#13431)
* cleanup err handling

* from disasms

* s/wave_execs/wave_insts
2025-11-23 21:50:34 +08:00