qazal
3f3786ded9
mmapeak: fix compiler import ( #13915 )
2025-12-31 16:52:23 +09:00
George Hotz
0221b96761
assembly/amd: fix all ops tests ( #13910 )
...
* assembly/amd: fix all ops tests
* test_ops with smaller sizes
* ds store/load 2addr
2025-12-30 18:01:34 -05:00
George Hotz
efc99d0c55
assembly/amd: more refactors ( #13907 )
...
* assembly/amd: more refactors
* more refactors
* more refactors
* simpler emu
* generate.py
* regen all
* cleanups
* more
* work
* more readme
* lil
2025-12-30 16:13:24 -05:00
George Hotz
49d1bf93d6
assembly/amd: refactor asm.py to be simpler ( #13900 )
...
* assembly/amd: refactor asm.py
* assembly/amd: refactor asm.py to be simpler
* multiple fxns
* fast
* more tests pass
* regen
* stop decode
2025-12-30 13:51:40 -05:00
George Hotz
7e14cdcb06
assembly/amd: clean up clz/ctz hack ( #13901 )
...
* assembly/amd: clean up clz/ctz hack
* add breaks
2025-12-30 11:59:28 -05:00
George Hotz
69cdc8066d
assembly/amd: add dtype tests to AMD IDE CI ( #13899 )
...
* add dtype tests to AMD IDE CI
* more tests
* add trig preop
* regen done
* split to amd autogen
* simpler
2025-12-30 11:09:51 -05:00
George Hotz
9c89be5235
assembly/amd: fix v_perm_b32 + PC fixes ( #13897 )
...
* assembly/amd: fix v_perm_b32
* add pc support
2025-12-30 09:25:40 -05:00
George Hotz
2b838dc1d8
assembly/amd: fix AMD_LLVM=1 support in emulator ( #13881 )
...
* fix AMD_LLVM=1 support in emulator
* more llvm with dtype
* work
* more fixes
* fix dtype
2025-12-30 09:09:57 -05:00
qazal
b557c46233
assembly gemm clean ups, instructions for cli ( #13892 )
2025-12-30 16:14:06 +09:00
qazal
d7e1f26e3d
command line interface for sqtt viz ( #13891 )
...
* command line interface for sqtt viz
* cleanup
* api surface area
* this confuses the llms
* document
2025-12-30 12:33:21 +09:00
George Hotz
94bca91f3e
assembly/amd: have asm go through the dsl ( #13886 )
...
* assembly/amd: have asm go through the dsl
* lil
2025-12-29 17:39:11 -05:00
George Hotz
7322d9ec4a
assembly/amd: add new instruction support to pcode ( #13885 )
...
* assembly/amd: add new instruction support
* more
* regen all
2025-12-29 17:30:17 -05:00
George Hotz
0d326f5b9b
fix missing instructions in pseudocode ( #13884 )
2025-12-29 16:11:22 -05:00
George Hotz
9d8397be11
add CDNA3+RDNA4 support ( #13882 )
...
* fix CI
* remove junk
* rename lib to dsl
* correct
* cleanups
2025-12-29 15:51:29 -05:00
George Hotz
81cf9ea0ab
rename to extra.assembly.amd ( #13879 )
2025-12-29 14:10:55 -05:00
George Hotz
37f0fa11b6
rdna3 test cleanups ( #13878 )
...
* rdna3 test cleanups
* cleanups
* ugh DONT SKIP
2025-12-29 13:41:59 -05:00
George Hotz
35db73b231
add cdna4 support to parsers ( #13877 )
...
* add cdna4 support to parsers
* cdna4
2025-12-29 13:23:43 -05:00
George Hotz
ff856a74cb
minor refactoring for rdna3 ( #13873 )
...
* minor refactoring for rdna3
* fix div scale stuff
* more bugfixes
2025-12-29 13:20:00 -05:00
George Hotz
f1471a3b99
speed up rdna3 unit tests + add to CI ( #13871 )
...
* speed up rdna3 unit tests
* add test to CI
* faster and simpler
* speedups
* bugfixes
* use helper
* fix CI maybe
* test fixes
* llvm-21 on 24.04
* upd
* llvm-21
* fix test
* bring that back
* merge gen into lib
* test generators
2025-12-29 10:26:48 -05:00
George Hotz
25ef866e89
write python emulator from RDNA3 pseudocode in pdf ( #13841 )
...
* write python emulator from RDNA3 pseudocode in pdf
* emu2
* more emu
* working
* more pseudo
* progress
* cleanups
* delete junk
* delete stale files
* just emu
* work
* emu compare
* bemu
* cleanups and more failures
* revert bench emu
* fix emu cmp
* four tests fail
* bugfixes
* dsl
* ext
* refactor
* dsl
* div scale fix
* test_emu
* fix emu tests
* pcode
* test pcode
* top imports
* fix test_emu to use run_asm
* emu tests on real hardware
* more tests
* more emu tests
* more
* work
* work
* bug fix
* bugfixes
* fix fp16 gemm
* all ops tests pass in emulator
* fix llvm tests
* fix a few more tests
* fix mockgpu timeout
2025-12-29 07:39:53 -05:00
qazal
f541540129
variable N for asm gemm ( #13869 )
...
* variable N for asm gemm
* cleanup spacing
2025-12-29 19:35:50 +09:00
qazal
fc5278746f
mi350x assembly gemm cleanups ( #13867 )
2025-12-29 18:47:23 +09:00
George Hotz
f07c39cfa4
hwtest fixes for rdna3 dsl ( #13865 )
2025-12-28 20:42:29 -05:00
George Hotz
d9603c1bee
improve asm dsl syntax ( #13864 )
...
* improve asm dsl syntax
* improve asm dsl syntax
2025-12-28 20:04:59 -05:00
qazal
066d96c397
print tflops in asm gemm test ( #13859 )
...
* print tflops in asm gemm test
* change order
2025-12-29 02:26:40 +09:00
qazal
2cfbabdc34
mi350x 1tflop bf16 gemm in extra ( #13702 )
2025-12-28 21:45:42 +09:00
qazal
2180eee5e4
use the asm dsl in remu hwtest.py ( #13856 )
...
* remu hw test with the asm dsl
* simpler
* nthreads and exec mask
* cmp/cmpx
* assembler error in s_mov_b32
* vopd in dsl?
2025-12-28 11:32:41 +09:00
qazal
f6c660f7fa
simplify sqtt decoder infra ( #13849 )
...
* more work
* simpler
2025-12-28 00:31:16 +09:00
qazal
a2da61d096
use new style amd compiler in viz ( #13848 )
...
* working version, handcode gfx1100 arch
* get target from device properties
* lib in cfg test program spec
2025-12-27 23:59:30 +09:00
George Hotz
e9f2aaba2a
simplify rdna3 asm ( #13835 )
...
* simplify rdna3 asm
* cleanups
* fix names
* fix tests
* fixes
* more test fixes
* type fixes
* tests pass + mypy passes
* 3.11 syntax
2025-12-26 11:21:03 -05:00
George Hotz
c6937fa744
more work on RDNA3 asm ( #13833 )
...
* more llvm asm tests
* roundtrip test
* work
* more handwritten
* more handwritten
* work
* tests pass
* dual mov
* all tests pass
* all tests pass fast
2025-12-25 23:28:14 -05:00
George Hotz
9d94b8c6b2
python asm dsl in extra + python REMU ( #13436 )
...
* having fun with python asm dsl
* rdna3
* meh
* all in rdna3
* work
* more work
* work
* integration
* tests
* simpler
* simpler
* asm
* better
* simpler
* progress
* emu
* simpler
* emu
* tests
* types
* vopd
* cleanups
* work
* memory ranges
* add tracing
* refactors
* run_asm exit
* more readable
* compare to remu
* test gemm
* bug + stale
* more tests
* refactor
* tests fix
* more ins
* more instructions
* refactor
* faster
* match case
* match case
* simpler
* work
* tests
* run_asm
* work
* bug fixes
* more emu
* alu/emu
* refactor
* no pipeline emu yet
* alu direct
* fix
* bugfixes + new test
* fix exceptions in emulators
* update gen.py
* pylint
* no pdf
* improve bench_emu
* speedups
* cleanups
* more tests
2025-12-25 13:04:14 -05:00
Daniel Xu
4edaaf19e5
Handle tied embeddings for llama 3.2 1B ( #13796 )
...
Previously the output.weight layer would not be loaded, and would only
contain randomly initialized values. This led to junk when doing a
forward pass.
Signed-off-by: Daniel Xu <daniel@thinkingmachines.ai>
2025-12-22 16:31:40 -05:00
chenyu
7f1d41c9f9
delete files that import ShapeTracker ( #13805 )
2025-12-22 15:54:18 -05:00
qazal
389f01c7f4
viz: amdgpu assembly basic block graph ( #13755 )
2025-12-22 23:17:16 +08:00
qazal
81d9053013
roc: cast to nullptr instead of changing header ( #13801 )
2025-12-22 22:34:06 +08:00
nimlgen
d299d30f2c
am_smi: fix with new autogen ( #13800 )
2025-12-22 16:53:26 +03:00
George Hotz
45c459848d
remove more stale stuff ( #13765 )
...
* remove more stale stuff
* remove disassemblers/adreno
* stale
2025-12-19 17:14:56 -04:00
George Hotz
744af193f0
remove ScheduleItem and merge it with ExecItem ( #13759 )
...
* remove ExecItem and merge it with ScheduleItem
* less diff
* fix issues
* min diff
* don't change bufs in _lower
* min diff
* update
* revert
* fixes
* diff
2025-12-19 17:04:24 -04:00
George Hotz
df6cde8a00
cleanup stale examples/extra ( #13764 )
...
* cleanup stale files
* examples
* move those back
* old
* delete more
2025-12-19 16:27:37 -04:00
chenyu
80b84f5267
ruff lint tinykitten ( #13762 )
...
deleted unused import and double spaces. a few ignores to not change the real code
2025-12-19 14:31:00 -05:00
nimlgen
77191fb744
hive_reset for mi350 ( #13746 )
2025-12-18 12:02:28 +03:00
wozeparrot
99e667bdcd
tk fa bwd ( #13480 )
2025-12-17 23:56:37 -08:00
nimlgen
7081014c73
am_smi: mi300 ( #13737 )
...
* am_smi: mi300
* smi
* remo
2025-12-17 17:56:01 +03:00
nimlgen
3eecb4f123
am: mi350 support ( #13733 )
2025-12-17 14:57:21 +03:00
wozeparrot
5151a341b3
tk: small changes from fa bwd ( #13732 )
2025-12-16 22:44:36 -08:00
chenyu
041e9a41c9
add contiguous in BertIntermediate ( #13713 )
...
faster step with a lot less recomputation
2025-12-15 22:37:36 -05:00
wozeparrot
5d509499b2
tk: kernel finish groups stores ( #13704 )
2025-12-15 09:16:17 -08:00
nimlgen
615dcab767
am: minimal mi300 boot ( #13679 )
...
* nbio7_9
* psp
* gmc
* gfx
* sdma
* ih
* linter
* linter
* minor
* finish
* add missing
* do not allow warm boot for now
2025-12-15 15:55:03 +03:00
wozeparrot
7ef7ce2856
tk reg local store ( #13689 )
2025-12-14 23:07:30 -08:00