Commit Graph

88 Commits

Author SHA1 Message Date
George Hotz
1ae6528bb6 move schedule into schedule (#15736)
* move schedule into schedule

* callify to root

* sched docs
2026-04-15 11:03:25 +08:00
Christopher Milan
0ed8d9271d Renderers accept Target or nothing (#15590) 2026-04-03 01:09:41 -04:00
Christopher Milan
a12d3951de fix test_export_model imports (#15389) 2026-03-20 07:27:01 -04:00
qazal
00817cf65e viz: all tests can run on the NULL device (#15328)
* remove that

* move to test_viz

* get_cfg

* do not use os.environ

* hm

* it's always on NULL

* import renderer

* no import *
2026-03-18 04:14:20 +09:00
qazal
4d60312f7f viz: asm python dsl syntax highlighting (#15259) 2026-03-14 06:37:43 +09:00
qazal
6209ddfc90 viz: improve disasm of s_code_end (#15258)
* viz: improve amd disasm of s_code_end

* better tests

* order was good
2026-03-14 03:31:14 +09:00
wozeparrot
25565b2410 fa: test for mp (#14907) 2026-02-22 21:47:36 -08:00
qazal
1538960002 viz: smaller view for repeated asm instructions in cfg (#14954)
* simple test

* todo

* feature
2026-02-23 10:41:43 +09:00
qazal
52b51a0324 test fixes from rdna4 sqtt (#14902) 2026-02-20 14:42:33 +09:00
qazal
f590564bf7 gemm multiple is only for cdna4 asm (#14814)
* gemm multiple is only for cdna4 asm

* move to backend

* and arch

* path
2026-02-17 14:00:02 +09:00
George Hotz
f081f154ae parameterize the CDNA asm gemm (#14813)
* parameterize the CDNA asm gemm

* fix llama test

* fix

* add more gemmt ests

* confirm all match

* test these asm gemms
2026-02-17 11:35:18 +08:00
wozeparrot
45aebe1572 hipkittens fa backward (#14723) 2026-02-16 00:38:44 -08:00
qazal
ac62d28ddc viz: amdgpu arch cleanup (#14790)
* viz: amdgpu arch cleanup

* don't do that

* simpler sqttmap

* work

* self.arch
2026-02-16 16:48:12 +09:00
qazal
55a4dfa2e0 cdna4 asm_gemm tests in CI on the null backend (#14785)
* cdna4 asm_gemm tests in CI on the null backend

* no .numpy() in null

* better

* gemm/asm: device comes from renderer
2026-02-16 14:06:23 +09:00
qazal
33b31d9cd6 tinykittens flash attention dtype fix, add CI (#14770)
* don't hardcdoe amd device

* add failing tests, ci too

* fix: fix for dtype mixin

* bump to rocm 7.1

---------

Co-authored-by: Woze Parrot <wozeparrot@gmail.com>
2026-02-16 01:15:11 +09:00
qazal
c88bb075f0 hotfix: correct way to get renderer arch (#14743) 2026-02-14 12:38:20 +08:00
qazal
6dc7ea58fd make flash attention tests run on DEV=NULL EMULATE=AMD_CDNA4 (#14742)
* make flash attention tests run on DEV=NULL EMULATE=AMD_CDNA4

* no if CI, this is just the arch
2026-02-14 12:24:37 +09:00
George Hotz
4088d686b2 remove llvm requirement from amd (#14717)
* remove llvm requirement from amd

* tests pass

* test

* sink kernarg_size

* move stuff

* amd_asm_matmul to new style

* default type

* fix tests, simpler

* cu mode is faster and simpler

* darken
2026-02-13 10:50:12 +08:00
George Hotz
4680247e35 renderer/amd: move in tree (#14702)
* renderer/amd: move in tree

* fix paths in tests

* 24000 lines

* no delete for amd files
2026-02-12 18:09:16 +08:00
qazal
80b0119cef llama: add new asm gemm shape (#14611)
* llama: add new asm gemm shape

* work

* cleanup

* half dtype

* more comment
2026-02-10 00:34:29 +09:00
qazal
087dab4c3b gemm/asm: split out cdna tests from CI (#14619)
* gemm/asm: split out cdna tests from CI

* reorder

* work
2026-02-08 21:33:42 +09:00
George Hotz
183d38b128 remove CUSTOM_KERNEL / directly construct it (#14604)
* remove CUSTOM_KERNEL / directly construct it

* clean that up

* simpler multi

* custom kernel spec

* remove Kernel

* fix multi

* use sharded shape

* explicit regression test
2026-02-08 18:43:33 +08:00
nimlgen
6838b35cff mockgpu: hevc (#14606)
* mockgpu: hevc

* eng
2026-02-07 12:27:55 +03:00
qazal
cf73d7e2a7 hotfix: disable slower asm gemm shape from llama seqlen 8192 (#14582) 2026-02-06 15:05:19 +09:00
George Hotz
6cbcf98627 KernelInfo is required on get_program (#14571)
* rangeify always adds KernelInfo

* fix tests

* skip flaky test
2026-02-06 10:49:27 +08:00
George Hotz
43e7eda4e7 grad_b uses custom gemm (#14550)
* grad_b uses custom gemm

* fix multi backward, acc is in float32

* test_gemm_batched

* square gemm

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
Co-authored-by: qazal <qazal.software@gmail.com>
2026-02-05 15:22:27 +09:00
qazal
f9cfb64cd9 test asm_gemm in CI (#14551)
* test asm_gemm in CI

* default float16

* use a smaller shape for multi

* smaller size

* smaller for CI

* smaller for ci

* need half
2026-02-05 13:32:22 +09:00
wozeparrot
bbcd3d67a3 fa: faster (#14453) 2026-02-02 21:34:17 -08:00
George Hotz
d4007f36e0 remove DEFINE_GLOBAL (it is PARAM now) (#14488) 2026-02-02 14:56:37 +08:00
qazal
54e78dbec8 viz: remove hardcoded strings in cfg tests (#14462) 2026-02-01 09:30:43 +09:00
qazal
66d6a68016 viz: sqtt work from cdna gemm (#14434)
* it's the tag

* initialize rows based on the disasm

* test_cfg with Ops.BINARY

* pyremu wants s_code_end?

* test_diamond

* diff cleanup
2026-01-30 14:00:56 +09:00
George Hotz
774a454bb5 assembly/amd: fix scratch SVE (#14340)
* assembly/amd: default python REMU

* mem_used

* no lane

* sve

* remove that

* needs s_code_end in tests
2026-01-26 21:03:51 +08:00
qazal
bf2d9d138f viz: simplify amdgpu cfg (#14326)
* viz: replace llvm disasm with our disasm

* it starts with more code

* then it becomes less

* simpler, cdna disassembles with decimal simm16

* s_branch is upper case, add test

* simm16s and others
2026-01-25 15:21:45 +09:00
wozeparrot
d74587f16d fa multi fix 2 (#14314) 2026-01-23 23:35:02 -08:00
wozeparrot
a879b54234 tk: fa jit fix (#14170) 2026-01-16 16:38:45 -08:00
qazal
b46da603fe codegen/custom_kernel: do not attach KernelInfo to user program (#14160) 2026-01-15 14:01:48 +09:00
wozeparrot
a92778aa0c tk: fa multi fix (#14134) 2026-01-13 17:22:15 -08:00
qazal
79d00521f8 viz: fix cfg err when endpgm is in the middle of stream (#14128)
* kernel from beautiful_mnist

* minimal test

* correct way to do this

* rm that
2026-01-14 02:00:34 +09:00
qazal
fd10fd245a viz: cfg tokenizer fix and unit tests (#14121)
* output Ops.BINARY

* failing test for the cfg

* dsl renamed to offset and sz

* add better asserts

* move the note
2026-01-13 15:08:55 +09:00
wozeparrot
7c967399a4 tk: add failing test for fa multidevice (#14116) 2026-01-12 19:11:09 -08:00
George Hotz
330a0b686e assembly/amd: clean up dsl and make type verification strict (#14102)
* assembly/amd: start newdsl

* work

* newdsl upd

* Reg is p nice

* cleaner

* work

* getting clean

* all fields

* more BitFields

* redo the pdfs with dsl2 syntax

* no lit

* cleanups

* more defaults

* fix get and remove crap

* aliases

* ugly but kind of works

* NULL, not rawimm

* clean up defaults

* only dsl

* asm fixes

* lit fixup

* more lit

* cleanups

* olddsl

* single pcode dict

* emu sort of works

* trash test

* global is global

* types property

* reg mods

* fix a few tests

* remove monkey patch

* fixes

* less hacks in tests

* less hacks in tests

* 4 test failures

* hw tests all pass

* fix compare emulator

* fix some tests

* 3 more

* fix and shorten sqtt

* handwritten

* fix validation

* test corrections

* all types validate

* fix dsl2 tests

* fix bugs in disasm

* skips on cdna

* work

* repr with reg[]

* fix bitfield tests

* merge pcodes in dsl

* remove override

* disasm uses inst.types

* simpler
2026-01-13 08:52:16 +09:00
chenyu
92246ea731 update tests, WEBGPU=1 pytest . passes (#14089)
* update tests, `WEBGPU=1 pytest .` passes

* minor update
2026-01-10 00:03:02 -05:00
b1tg
0fbc551622 train bert with fp8 (#13874)
* fp8 train

* clean

* lint

* test fix from #13439

* skip first/last layer

* rm __init__, restore unroll <=32 check

* tests

* clean test, remove unused

* multi-gpu test, clean quantize_to_fp8

* remove bert contiguous

* run script

* test: better check

* run script search

* add seed in bert data shuffle

* move script to mi350x folder

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2026-01-09 09:21:59 -05:00
wozeparrot
027b935269 tk: fix grouped load store (#14035) 2026-01-07 22:38:02 -08:00
qazal
3170365a5b visualize SQTT with the same cfg infrastructure (#13870)
* start

* rough sketch

* post render dag

* art

* intro g key

* work

* custom color scale

* colors

* more blue

* better

* smaller

* use for loop in test
2026-01-06 14:53:20 +09:00
wozeparrot
f550f9204c fa: failing test for bwd jit (#14009)
* tk: failing test for bwd jit

* feat: mark expectedFailure

* clean: spaces
2026-01-04 16:57:43 -05:00
qazal
2cc64d71b0 simplify mi350x gemm / viz asm tests (#13984)
* mi350x gemm cleanup

* asm tests work

* simpler asm tests
2026-01-03 11:11:07 +09:00
wozeparrot
ecbac8a338 tk: fa cleanups + causal test (#13963) 2026-01-01 18:05:00 -08:00
George Hotz
b998a80b5d assembly/amd: split generated stuff into enum/ins (#13924) 2025-12-31 10:10:52 -05:00
George Hotz
81cf9ea0ab rename to extra.assembly.amd (#13879) 2025-12-29 14:10:55 -05:00