1679 Commits

Author SHA1 Message Date
chenyu
842c978df3 remove staticmethod dtypes.max/min (#15227)
always use x.dtype.max/min
2026-03-11 23:11:24 -04:00
qazal
d3eef70162 viz: render shader clock frequency graph (#15197) 2026-03-12 01:32:49 +09:00
nimlgen
086081e35b tbgpu: add stapler to the script (#15180) 2026-03-07 00:07:27 +03:00
qazal
83f1faa142 sqtt: update CDNA wave packet field, start unskipping tests (#15168)
* correct field names

* packet types

* packet 5 is regc

* test skips
2026-03-06 21:37:44 +09:00
Roelof van Dijk
d65923bda5 tensor.py: add normalize function (#15159)
* tensor.py: add normalize function

* p==0 should match torch
2026-03-05 18:55:53 +08:00
wozeparrot
be23772d43 llama3 fixes part2 (#15150) 2026-03-04 23:43:50 -08:00
qazal
33a1970045 sqtt: simplify inst mapping, validate JUMP processing in CI (#15139)
* jump cleanup

* assert there's a JUMP

* new example for JUMP

* regenerate examples

* rdna4 work

* new packets

* work

* less for branch handling

* less verbose

* fix err message
2026-03-05 09:53:12 +09:00
nimlgen
cdc48da9cd hevc: assert and speed (#15122)
* hevc: assert and speed

* simpler
2026-03-04 19:01:02 +03:00
wozeparrot
4e9b85ecfd fa: pull inputs out of call (#15127) 2026-03-04 03:15:49 -08:00
George Hotz
8ebd24637b fix fa forward building with clang 22 (#15124)
* fix fa forward building with clang 22

* fix: override rocm path

---------

Co-authored-by: Woze Parrot <wozeparrot@gmail.com>
2026-03-04 02:32:25 -08:00
wozeparrot
df23057984 fa: change bwd grid dim + unshuffle using mops (#15068) 2026-03-04 01:23:40 -08:00
qazal
8dd691761d sqtt: remove old files (#15108) 2026-03-03 22:43:24 +09:00
Christopher Milan
de043226ba benchmark comma usbgpu driving_vision step and load time (#15103)
Co-authored-by: Comma Device <device@comma.ai>
2026-03-03 06:08:03 -05:00
wozeparrot
c35de9bd68 asm_gemm: support more sharding (#15002) 2026-03-02 23:16:37 -08:00
qazal
62ee976c1b gemm/asm: cleanup repeated patterns to helper functions (#15094) 2026-03-03 08:14:47 +09:00
nimlgen
dfa180413d tbgpu: sign nv (#15087) 2026-03-02 22:58:30 +03:00
chenyu
71f228f80f test exact kernel count in torch_backend/test_kernel_fusion (#15091) 2026-03-02 14:26:32 -05:00
qazal
f7aeff6061 viz: cli.py cleanups, do not require PYTHONPATH (#15085)
* cleanup the print

* sys.exit

* equal check

* cleanup unpacker

* cli doesn't need PYTHONPATH

* no semicolons

* %s/PYTHONPATH=. //g
2026-03-02 19:24:38 +09:00
qazal
b8a55d5f68 sqtt: new packet types, add discovery script (#14960) 2026-02-28 04:27:27 +09:00
qazal
448e997be4 gemm/asm: cleanup custom function args (#15007) 2026-02-25 22:05:56 +09:00
wozeparrot
8d9545e09e llama3: correctly shard wqkv (#14978) 2026-02-23 23:57:10 -08:00
wozeparrot
25565b2410 fa: test for mp (#14907) 2026-02-22 21:47:36 -08:00
qazal
d6145736c7 sqtt: examples generator changes from inst_discovery (#14961)
* sqtt examples generator changes from inst_discovery

* rdna4

* rdna3

* cdna

* sad reality for mi300x
2026-02-23 14:42:48 +09:00
George Hotz
8ef5544e4a realized PYTHON copies (#14934)
* realized PYTHON copies

* comment that out

* fix that test

* append afters

* contig

* disk copies

* should be 124

* 332
2026-02-21 20:29:31 +08:00
George Hotz
55d3a5def9 preallocate all realized buffers (#14823)
* preallocate all realized buffers

* contiguous

* work

* comment that out

* move to schedule

* better

* correct fix

* just buffer

* disk bufs

* fixes disk tensor stuff

* fix symbolic stuff

* fix multi

* 162 failures

* bugfixes

* don't check that anymore

* fix schedule tests

* mnist should be contiguious

* type and buffer

* fix tests

* shrink axis correction

* mypy fixes

* tests skips

* same 37 failures

* dedup

* no shrink in the graph

* 29 failures

* skips

* fix custom kernel

* fix training

* those optimizations aren't supported currently

* simpler

* more correct

* tests

* 14 failures

* works

* fix that test

* broken

* 11 failures

* only kernel counts left

* fixes

* all tests pass

* remove tensor_map

* op test

* 200 -> 230

* test fixes

* fixes

* revert test_tiny thing

* guard

* revert that

* test tiny passes

* no contigs there

* base realize back

* Revert "no contigs there"

This reverts commit c45bb9fcfd.

* revert that

* chop many assigns

* 12 failures

* fix tests

* tests

* apply after

* pre-commit

* remove old code

* delete that

* fix types

* remove extra contig

* fix dataloader

* torch fix

* disk fix

* update kernel fusion numbres

* runs on amd

* restore kernel count

* add that rule back

* that

* disable that

* wrong

* add the correct rule for that folding

* more tests

* guard c1.arg

* no newlines

* realize those

* split into a different file

* remove detach/contig back

* skip 2

* update that
2026-02-20 20:05:54 +08:00
qazal
32f569b573 viz/sqtt: decoder fixes pre rdna4/cdna4 work (#14900)
* viz/sqtt: decoder fixes pre rdna4/cdna4 work

* fix

* branch_inst + more tests

* smaller
2026-02-20 12:10:15 +09:00
wozeparrot
9317e96881 fa: explicitly pass shapes (#14857) 2026-02-19 05:26:16 -08:00
nimlgen
3b95fa0ed4 am_smi: enable mem usage back (#14858) 2026-02-18 19:27:27 +03:00
wozeparrot
6d301ad2c4 feat: llama wqkv (#14841) 2026-02-17 23:01:33 -08:00
wozeparrot
95e97ec341 seperate llama optim (#14810) 2026-02-17 13:02:35 -08:00
qazal
f8e485ee9e nvcc/nvdisasm macos shim (#14822)
* move to backend

* and arch

* setup_nvcc_osx

* blackwell

* min test

* now getting dumb assert is_ptx

* support cubin.

* work

* remove that

* simpler
2026-02-17 20:07:05 +09:00
qazal
f590564bf7 gemm multiple is only for cdna4 asm (#14814)
* gemm multiple is only for cdna4 asm

* move to backend

* and arch

* path
2026-02-17 14:00:02 +09:00
George Hotz
5bd2862d1a late compile the cdna gemm (#14783)
* late compile the cdna gemm

* remove old things

* finalize inplace

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2026-02-17 13:04:22 +09:00
George Hotz
f081f154ae parameterize the CDNA asm gemm (#14813)
* parameterize the CDNA asm gemm

* fix llama test

* fix

* add more gemmt ests

* confirm all match

* test these asm gemms
2026-02-17 11:35:18 +08:00
nimlgen
131bbbbfd8 am: smu_v13_0_12 (#14800) 2026-02-16 22:58:10 +03:00
wozeparrot
45aebe1572 hipkittens fa backward (#14723) 2026-02-16 00:38:44 -08:00
qazal
c7a4dbf918 viz: get program binary from the UOp (#14787)
* viz: get program binary from the UOp

* remove that

* less

* rename View Program to View Source

* two words

* fix
2026-02-16 15:46:58 +09:00
George Hotz
dff9cf35c2 amd asm emulator fixes + run it in CI (#14786)
* amd asm fix, try 2

* fix tests
2026-02-16 13:24:21 +08:00
qazal
55a4dfa2e0 cdna4 asm_gemm tests in CI on the null backend (#14785)
* cdna4 asm_gemm tests in CI on the null backend

* no .numpy() in null

* better

* gemm/asm: device comes from renderer
2026-02-16 14:06:23 +09:00
George Hotz
ac079e43d7 ElementwiseMixin (#14777) 2026-02-16 08:50:47 +08:00
qazal
33b31d9cd6 tinykittens flash attention dtype fix, add CI (#14770)
* don't hardcdoe amd device

* add failing tests, ci too

* fix: fix for dtype mixin

* bump to rocm 7.1

---------

Co-authored-by: Woze Parrot <wozeparrot@gmail.com>
2026-02-16 01:15:11 +09:00
qazal
9bb6014900 keep existing profile trace in viz cli (#14757) 2026-02-15 13:16:32 +09:00
nimlgen
4ab51b55bd stream pma decoder (#14746) 2026-02-14 17:40:18 +03:00
George Hotz
c0de4f75b1 improve mmapeak, print names with sqtt (#14726) 2026-02-13 16:07:06 +08:00
wozeparrot
0613c0ac0c hipkittens fa forward (#14692) 2026-02-12 20:16:43 -08:00
George Hotz
4088d686b2 remove llvm requirement from amd (#14717)
* remove llvm requirement from amd

* tests pass

* test

* sink kernarg_size

* move stuff

* amd_asm_matmul to new style

* default type

* fix tests, simpler

* cu mode is faster and simpler

* darken
2026-02-13 10:50:12 +08:00
chenyu
557134e1c7 model/test fix that failed with WEBGPU=1 DEBUG=2 (#14706) 2026-02-12 09:08:16 -05:00
George Hotz
4680247e35 renderer/amd: move in tree (#14702)
* renderer/amd: move in tree

* fix paths in tests

* 24000 lines

* no delete for amd files
2026-02-12 18:09:16 +08:00
George Hotz
d5fc3ea1ba assembly/amd: mypy+ruff passes (#14701)
* assembly/amd: mypy+ruff passes

* touchups
2026-02-12 16:59:42 +08:00
George Hotz
025049c521 clean up sqtt / update src formatting in viz (#14696)
* update src formatting in viz

* rename to RDNA3/RDNA4 in sqtt

* wrap

* move sqttmap

* update readme

* why did that change?

* cdna

* that's just for test
2026-02-12 14:27:14 +08:00