1467 Commits

Author SHA1 Message Date
b1tg
24723327ac fix tc_up in search (#13438)
* tensor_core is missing from Scheduler

* test upcast max

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2026-01-01 10:25:08 -05:00
qazal
9726500de8 enable using assembly in Tensor.custom_kernel (#13895) 2026-01-02 00:12:01 +09:00
qazal
c0f52c9dcb split assembly gemm to per arch directory (#13953) 2026-01-02 00:10:22 +09:00
qazal
6a5430ab00 correct args order in mi350x gemm (#13949) 2026-01-01 23:01:46 +09:00
George Hotz
2bb07d4824 assembly/amd: move Reg out of the psuedocode (#13934)
* assembly/amd: move Reg out of the psuedocode

* remove extra

* fix pcode tests

* simpler pcode

* simpler

* simpler

* cleaner

* fix mypy
2025-12-31 15:34:51 -05:00
George Hotz
f14428090f assembly/amd: speed up emulator (#13932) 2025-12-31 13:32:25 -05:00
George Hotz
29402034a1 assembly/amd: cleanups to asm and emu (#13912)
* a bunch of cleanups

* ops are back

* bug fixes

* cleanups

* a lil simpler

* more refactors

* _disasm_vop1

* sops

* more

* continue

* more

* num_srcs

* simpler

* no _is16

* op cleanups

* isinstnace
2025-12-31 12:46:11 -05:00
George Hotz
b998a80b5d assembly/amd: split generated stuff into enum/ins (#13924) 2025-12-31 10:10:52 -05:00
qazal
b23f4517ab prep mi350x gemm for python dsl (#13918)
* start by pruning existing asm

* better branch names

* split to template and real instructions
2025-12-31 20:00:57 +09:00
qazal
3f3786ded9 mmapeak: fix compiler import (#13915) 2025-12-31 16:52:23 +09:00
George Hotz
0221b96761 assembly/amd: fix all ops tests (#13910)
* assembly/amd: fix all ops tests

* test_ops with smaller sizes

* ds store/load 2addr
2025-12-30 18:01:34 -05:00
George Hotz
efc99d0c55 assembly/amd: more refactors (#13907)
* assembly/amd: more refactors

* more refactors

* more refactors

* simpler emu

* generate.py

* regen all

* cleanups

* more

* work

* more readme

* lil
2025-12-30 16:13:24 -05:00
George Hotz
49d1bf93d6 assembly/amd: refactor asm.py to be simpler (#13900)
* assembly/amd: refactor asm.py

* assembly/amd: refactor asm.py to be simpler

* multiple fxns

* fast

* more tests pass

* regen

* stop decode
2025-12-30 13:51:40 -05:00
George Hotz
7e14cdcb06 assembly/amd: clean up clt/ctz hack (#13901)
* assembly/amd: clean up clt/ctz hack

* add breaks
2025-12-30 11:59:28 -05:00
George Hotz
69cdc8066d assembly/amd: add dtype tests to AMD IDE CI (#13899)
* add dtype tests to AMD IDE CI

* more tests

* add trig preop

* regen done

* split to amd autogen

* simpler
2025-12-30 11:09:51 -05:00
George Hotz
9c89be5235 assembly/amd: fix v_perm_b32 + PC fixes (#13897)
* assembly/amd: fix v_perm_b32

* add pc support
2025-12-30 09:25:40 -05:00
George Hotz
2b838dc1d8 assembly/amd: fix AMD_LLVM=1 support in emulator (#13881)
* fix AMD_LLVM=1 support in emulator

* more llvm with dtype

* work

* more fixes

* fix dtype
2025-12-30 09:09:57 -05:00
qazal
b557c46233 assembly gemm clean ups, instructions for cli (#13892) 2025-12-30 16:14:06 +09:00
qazal
d7e1f26e3d command line interface for sqtt viz (#13891)
* command line interface for sqtt viz

* cleanup

* api surface area

* this confuses the llms

* document
2025-12-30 12:33:21 +09:00
George Hotz
94bca91f3e assembly/amd: have asm go through the dsl (#13886)
* assembly/amd: have asm go through the dsl

* lil
2025-12-29 17:39:11 -05:00
George Hotz
7322d9ec4a assembly/amd: add new instruction support to pcode (#13885)
* assembly/amd: add new instruction support

* more

* regen all
2025-12-29 17:30:17 -05:00
George Hotz
0d326f5b9b fix missing instructions in psuedocode (#13884) 2025-12-29 16:11:22 -05:00
George Hotz
9d8397be11 add CDNA3+RDNA4 support (#13882)
* fix CI

* remove junk

* rename lib to dsl

* correct

* cleanups
2025-12-29 15:51:29 -05:00
George Hotz
81cf9ea0ab rename to extra.assembly.amd (#13879) 2025-12-29 14:10:55 -05:00
George Hotz
37f0fa11b6 rdna3 test cleanups (#13878)
* rdna3 test cleanups

* cleanups

* ugh DONT SKIP
2025-12-29 13:41:59 -05:00
George Hotz
35db73b231 add cdna4 support to parsers (#13877)
* add cdna4 support to parsers

* cdna4
2025-12-29 13:23:43 -05:00
George Hotz
ff856a74cb minor refactoring for rdna3 (#13873)
* minor refactoring for rdna3

* fix div scale stuff

* more bugfixes
2025-12-29 13:20:00 -05:00
George Hotz
f1471a3b99 speed up rdna3 unit tests + add to CI (#13871)
* speed up rdna3 unit tests

* add test to CI

* faster and simpler

* speedups

* bugfixes

* use helper

* fix CI maybe

* test fixes

* llvm-21 on 24.04

* upd

* llvm-21

* fix test

* bring that back

* merge gen into lib

* test generators
2025-12-29 10:26:48 -05:00
George Hotz
25ef866e89 write python emulator from RDNA3 psuedocode in pdf (#13841)
* write python emulator from RDNA3 psuedocode in pdf

* emu2

* more emu

* working

* more psueod

* progress

* cleanups

* delete junk

* delete stale files

* just emu

* work

* emu compare

* bemu

* cleanups and more failures

* revert bench emu

* fix emu cmp

* four tests fail

* bugfixes

* dsl

* ext

* refactor

* dsl

* div scale fix

* test_emu

* fix emu tests

* pcode

* test pcode

* top imports

* fix test_emu to use run_asm

* emu tests on real hardware

* more tests

* more emu tests

* more

* work

* work

* bug fix

* bugfixes

* fix fp16 gemm

* all ops tests pass in emulator

* fix llvm tests

* fix a few more tests

* fix mockgpu timeout
2025-12-29 07:39:53 -05:00
qazal
f541540129 variable N for asm gemm (#13869)
* variable N for asm gemm

* cleanup spacing
2025-12-29 19:35:50 +09:00
qazal
fc5278746f mi350x assembly gemm cleanups (#13867) 2025-12-29 18:47:23 +09:00
George Hotz
f07c39cfa4 hwtest fixes for rdna3 dsl (#13865) 2025-12-28 20:42:29 -05:00
George Hotz
d9603c1bee improve asm dsl syntax (#13864)
* improve asm dsl syntax

* improve asm dsl syntax
2025-12-28 20:04:59 -05:00
qazal
066d96c397 print tflops in asm gemm test (#13859)
* print tflops in asm gemm test

* change order
2025-12-29 02:26:40 +09:00
qazal
2cfbabdc34 mi350x 1tflop bf16 gemm in extra (#13702) 2025-12-28 21:45:42 +09:00
qazal
2180eee5e4 use the asm dsl in remu hwtest.py (#13856)
* remu hw test with the asm dsl

* simpler

* nthreads and exec mask

* cmp/cmpx

* assembler error in s_mov_b32

* vopd in dsl?
2025-12-28 11:32:41 +09:00
qazal
f6c660f7fa simplify sqtt decoder infra (#13849)
* more work

* simpler
2025-12-28 00:31:16 +09:00
qazal
a2da61d096 use new style amd compiler in viz (#13848)
* working version, handcode gfx1100 arch

* get target from device properties

* lib in cfg test program spec
2025-12-27 23:59:30 +09:00
George Hotz
e9f2aaba2a simplify rdna3 asm (#13835)
* simplify rdna3 asm

* cleanups

* fix names

* fix tests

* fixes

* more test fixes

* type fixes

* tests pass + mypy passes

* 3.11 syntax
2025-12-26 11:21:03 -05:00
George Hotz
c6937fa744 more work on RDNA3 asm (#13833)
* more llvm asm tests

* roundtrip test

* work

* more handwritten

* more handwritten

* work

* tests pass

* dual mov

* all tests pass

* all tests pass fast
2025-12-25 23:28:14 -05:00
George Hotz
9d94b8c6b2 python asm dsl in extra + python REMU (#13436)
* having fun with python asm dsl

* rdna3

* meh

* all in rdna3

* work

* more work

* work

* integration

* tests

* simpler

* simpler

* asm

* better

* simpler

* progress

* emu

* simpler

* emu

* tests

* types

* vopd

* cleaups

* work

* memory ranges

* add tracing

* refactors

* run_asm exit

* more readable

* compare to remu

* test gemm

* bug + stale

* more tests

* refactor

* tests fix

* more ins

* more instructions

* refactor

* faster

* match case

* match case

* simpler

* work

* tests

* run_asm

* work

* bug fixes

* more emu

* alu/emu

* refactor

* no pipeline emu yet

* alu direct

* fix

* bugfixes + new test

* fix exceptions in emulators

* update gen.py

* pylint

* no pdf

* improve bench_emu

* speedups

* cleanups

* more tests
2025-12-25 13:04:14 -05:00
Daniel Xu
4edaaf19e5 Handle tied embeddings for llama 3.2 1B (#13796)
Previously the output.weight layer would not be loaded, and would only
contain randomly initialized values. This led to junk when doing a
forward pass.

Signed-off-by: Daniel Xu <daniel@thinkingmachines.ai>
2025-12-22 16:31:40 -05:00
chenyu
7f1d41c9f9 delete files that import ShapeTracker (#13805) 2025-12-22 15:54:18 -05:00
qazal
389f01c7f4 viz: amdgpu assembly basic block graph (#13755) 2025-12-22 23:17:16 +08:00
qazal
81d9053013 roc: cast to nullptr instead of changing header (#13801) 2025-12-22 22:34:06 +08:00
nimlgen
d299d30f2c am_smi: fix with new autogen (#13800) 2025-12-22 16:53:26 +03:00
George Hotz
45c459848d remove more stale stuff (#13765)
* remove more stale stuff

* remove disassemblers/adreno

* stale
2025-12-19 17:14:56 -04:00
George Hotz
744af193f0 remove ScheduleItem and merge it with ExecItem (#13759)
* remove ExecItem and merge it with ScheduleItem

* less diff

* fix issues

* min diff

* don't change bufs in _lower

* min diff

* update

* revert

* fixes

* diff
2025-12-19 17:04:24 -04:00
George Hotz
df6cde8a00 cleanup stale examples/extra (#13764)
* cleanup stale files

* examples

* move those back

* old

* delete more
2025-12-19 16:27:37 -04:00
chenyu
80b84f5267 ruff lint tinykitten (#13762)
deleted used import and double spaces. a few ignore to not change the real code
2025-12-19 14:31:00 -05:00