qazal
c88bb075f0
hotfix: correct way to get renderer arch ( #14743 )
2026-02-14 12:38:20 +08:00
qazal
6dc7ea58fd
make flash attention tests run on DEV=NULL EMULATE=AMD_CDNA4 ( #14742 )
...
* make flash attention tests run on DEV=NULL EMULATE=AMD_CDNA4
* no if CI, this is just the arch
2026-02-14 12:24:37 +09:00
George Hotz
4088d686b2
remove llvm requirement from amd ( #14717 )
...
* remove llvm requirement from amd
* tests pass
* test
* sink kernarg_size
* move stuff
* amd_asm_matmul to new style
* default type
* fix tests, simpler
* cu mode is faster and simpler
* darken
2026-02-13 10:50:12 +08:00
George Hotz
4680247e35
renderer/amd: move in tree ( #14702 )
...
* renderer/amd: move in tree
* fix paths in tests
* 24000 lines
* no delete for amd files
2026-02-12 18:09:16 +08:00
qazal
80b0119cef
llama: add new asm gemm shape ( #14611 )
...
* llama: add new asm gemm shape
* work
* cleanup
* half dtype
* more comment
2026-02-10 00:34:29 +09:00
qazal
087dab4c3b
gemm/asm: split out cdna tests from CI ( #14619 )
...
* gemm/asm: split out cdna tests from CI
* reorder
* work
2026-02-08 21:33:42 +09:00
George Hotz
183d38b128
remove CUSTOM_KERNEL / directly construct it ( #14604 )
...
* remove CUSTOM_KERNEL / directly construct it
* clean that up
* simpler multi
* custom kernel spec
* remove Kernel
* fix multi
* use sharded shape
* explicit regression test
2026-02-08 18:43:33 +08:00
nimlgen
6838b35cff
mockgpu: hevc ( #14606 )
...
* mockgpu: hevc
* eng
2026-02-07 12:27:55 +03:00
qazal
cf73d7e2a7
hotfix: disable slower asm gemm shape from llama seqlen 8192 ( #14582 )
2026-02-06 15:05:19 +09:00
George Hotz
6cbcf98627
KernelInfo is required on get_program ( #14571 )
...
* rangeify always adds KernelInfo
* fix tests
* skip flaky test
2026-02-06 10:49:27 +08:00
George Hotz
43e7eda4e7
grad_b uses custom gemm ( #14550 )
...
* grad_b uses custom gemm
* fix multi backward, acc is in float32
* test_gemm_batched
* square gemm
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com >
Co-authored-by: qazal <qazal.software@gmail.com >
2026-02-05 15:22:27 +09:00
qazal
f9cfb64cd9
test asm_gemm in CI ( #14551 )
...
* test asm_gemm in CI
* default float16
* use a smaller shape for multi
* smaller size
* smaller for CI
* smaller for ci
* need half
2026-02-05 13:32:22 +09:00
wozeparrot
bbcd3d67a3
fa: faster ( #14453 )
2026-02-02 21:34:17 -08:00
George Hotz
d4007f36e0
remove DEFINE_GLOBAL (it is PARAM now) ( #14488 )
2026-02-02 14:56:37 +08:00
qazal
54e78dbec8
viz: remove hardcoded strings in cfg tests ( #14462 )
2026-02-01 09:30:43 +09:00
qazal
66d6a68016
viz: sqtt work from cdna gemm ( #14434 )
...
* it's the tag
* initialize rows based on the disasm
* test_cfg with Ops.BINARY
* pyremu wants s_code_end?
* test_diamond
* diff cleanup
2026-01-30 14:00:56 +09:00
George Hotz
774a454bb5
assembly/amd: fix scratch SVE ( #14340 )
...
* assembly/amd: default python REMU
* mem_used
* no lane
* sve
* remove that
* needs s_code_end in tests
2026-01-26 21:03:51 +08:00
qazal
bf2d9d138f
viz: simplify amdgpu cfg ( #14326 )
...
* viz: replace llvm disasm with our disasm
* it starts with more code
* then it becomes less
* simpler, cdna disassembles with decimal simm16
* s_branch is upper case, add test
* simm16s and others
2026-01-25 15:21:45 +09:00
wozeparrot
d74587f16d
fa multi fix 2 ( #14314 )
2026-01-23 23:35:02 -08:00
wozeparrot
a879b54234
tk: fa jit fix ( #14170 )
2026-01-16 16:38:45 -08:00
qazal
b46da603fe
codegen/custom_kernel: do not attach KernelInfo to user program ( #14160 )
2026-01-15 14:01:48 +09:00
wozeparrot
a92778aa0c
tk: fa multi fix ( #14134 )
2026-01-13 17:22:15 -08:00
qazal
79d00521f8
viz: fix cfg err when endpgm is in the middle of stream ( #14128 )
...
* kernel from beautiful_mnist
* minimal test
* correct way to do this
* rm that
2026-01-14 02:00:34 +09:00
qazal
fd10fd245a
viz: cfg tokenizer fix and unit tests ( #14121 )
...
* output Ops.BINARY
* failing test for the cfg
* dsl renamed to offset and sz
* add better asserts
* move the note
2026-01-13 15:08:55 +09:00
wozeparrot
7c967399a4
tk: add failing test for fa multidevice ( #14116 )
2026-01-12 19:11:09 -08:00
George Hotz
330a0b686e
assembly/amd: clean up dsl and make type verification strict ( #14102 )
...
* assembly/amd: start newdsl
* work
* newdsl upd
* Reg is p nice
* cleaner
* work
* getting clean
* all fields
* more BitFields
* redo the pdfs with dsl2 syntax
* no lit
* cleanups
* more defaults
* fix get and remove crap
* aliases
* ugly but kind of works
* NULL, not rawimm
* clean up defaults
* only dsl
* asm fixes
* lit fixup
* more lit
* cleanups
* olddsl
* single pcode dict
* emu sort of works
* trash test
* global is global
* types property
* reg mods
* fix a few tests
* remove monkey patch
* fixes
* less hacks in tests
* less hacks in tests
* 4 test failures
* hw tests all pass
* fix compare emulator
* fix some tests
* 3 more
* fix and shorten sqtt
* handwritten
* fix validation
* test corrections
* all types validate
* fix dsl2 tests
* fix bugs in disasm
* skips on cdna
* work
* repr with reg[]
* fix bitfield tests
* merge pcodes in dsl
* remove override
* disasm uses inst.types
* simpler
2026-01-13 08:52:16 +09:00
chenyu
92246ea731
update tests, WEBGPU=1 pytest . passes ( #14089 )
...
* update tests, `WEBGPU=1 pytest .` passes
* minor update
2026-01-10 00:03:02 -05:00
b1tg
0fbc551622
train bert with fp8 ( #13874 )
...
* fp8 train
* clean
* lint
* test fix from #13439
* skip first/last layer
* rm __init__, restore unroll <=32 check
* tests
* clean test, remove unused
* multi-gpu test, clean quantize_to_fp8
* remove bert contiguous
* run script
* test: better check
* run script search
* add seed in bert data shuffle
* move script to mi350x folder
---------
Co-authored-by: chenyu <chenyu@fastmail.com >
2026-01-09 09:21:59 -05:00
wozeparrot
027b935269
tk: fix grouped load store ( #14035 )
2026-01-07 22:38:02 -08:00
qazal
3170365a5b
visualize SQTT with the same cfg infrastructure ( #13870 )
...
* start
* rough sketch
* post render dag
* art
* intro g key
* work
* custom color scale
* colors
* more blue
* better
* smaller
* use for loop in test
2026-01-06 14:53:20 +09:00
wozeparrot
f550f9204c
fa: failing test for bwd jit ( #14009 )
...
* tk: failing test for bwd jit
* feat: mark expectedFailure
* clean: spaces
2026-01-04 16:57:43 -05:00
qazal
2cc64d71b0
simplify mi350x gemm / viz asm tests ( #13984 )
...
* mi350x gemm cleanup
* asm tests work
* simpler asm tests
2026-01-03 11:11:07 +09:00
wozeparrot
ecbac8a338
tk: fa cleanups + causal test ( #13963 )
2026-01-01 18:05:00 -08:00
George Hotz
b998a80b5d
assembly/amd: split generated stuff into enum/ins ( #13924 )
2025-12-31 10:10:52 -05:00
George Hotz
81cf9ea0ab
rename to extra.assembly.amd ( #13879 )
2025-12-29 14:10:55 -05:00
qazal
a2da61d096
use new style amd compiler in viz ( #13848 )
...
* working version, handcode gfx1100 arch
* get target from device properties
* lib in cfg test program spec
2025-12-27 23:59:30 +09:00
qazal
f6de9095a0
switch asm tests to dsl ( #13840 )
...
* switch asm tests to dsl
* labeled basic blocks also work
* indenting for basic blocks
* allow define from star import
2025-12-27 02:15:16 +09:00
qazal
a1c1684b91
set .amdhsa_kernarg_size in asm test ( #13826 )
2025-12-25 13:08:14 +09:00
qazal
389f01c7f4
viz: amdgpu assembly basic block graph ( #13755 )
2025-12-22 23:17:16 +08:00
George Hotz
744af193f0
remove ScheduleItem and merge it with ExecItem ( #13759 )
...
* remove ExecItem and merge it with ScheduleItem
* less diff
* fix issues
* min diff
* don't change bufs in _lower
* min diff
* update
* revert
* fixes
* diff
2025-12-19 17:04:24 -04:00
wozeparrot
99e667bdcd
tk fa bwd ( #13480 )
2025-12-17 23:56:37 -08:00
wozeparrot
5d509499b2
tk: kernel finish groups stores ( #13704 )
2025-12-15 09:16:17 -08:00
wozeparrot
7ef7ce2856
tk reg local store ( #13689 )
2025-12-14 23:07:30 -08:00
wozeparrot
93f1baca77
feat: tk fa in tensor ( #13580 )
2025-12-05 14:36:29 -08:00
George Hotz
6bd355fa26
add needs_second_gpu decorator ( #13543 )
...
* add needs_second_gpu decorator
* more skips
* two more fixes
2025-12-02 19:08:23 -08:00
wozeparrot
0d55aec605
fix after end ( #13542 )
2025-12-02 18:42:58 -08:00
nimlgen
0874ba8cc8
test_hevc: do not download the whole file ( #13531 )
...
* test_hevc: do not download the whole file
* fix
2025-12-02 21:31:28 +03:00
wozeparrot
1b7dbfb37f
tk: named kernels + per kernel range id ( #13522 )
2025-12-01 22:51:04 -08:00
nimlgen
455dd88236
nv: minimal hevc ( #13502 )
...
* nv: minimal hevc
* validate
* not needed
* tralin
* var
* cpu
* fxi
* desc
* move
* cleanup
2025-11-30 16:46:55 +03:00
qazal
ae9c56134e
skip test_tk failing locally on macbook ( #13476 )
2025-11-29 01:15:37 +08:00