1311 Commits

Author SHA1 Message Date
qazal
8119d9f082 sqtt: decode each instruction exec (#13093)
* sqtt: decode each instruction exec

* start tests

* run_asm

* capture sqtt per kernel

* chaining vgprs

* test things

* inst_execs in viz

* can also configure l and g

* 1l + cleanup

* test_sleep

* test_wmma

* work

* test sleep with llvm builtin
2025-11-05 17:30:27 +08:00
nimlgen
eaf7cbc178 amd: flush sqtt after each kernel (#13092)
* amd: flush sqtt after each kernel

* merge for rgp
2025-11-04 22:12:48 +08:00
nimlgen
49191ada77 roc: install sqtt decoder (#13091)
* roc: install?

* msg

* 0.1.4
2025-11-04 18:56:01 +08:00
nimlgen
2e97eaa866 roc: no nullptr when no wave instructions (#13087) 2025-11-04 17:32:14 +08:00
wozeparrot
9c00c0688a tk fa: use 16x64 tiles (#13086) 2025-11-03 18:25:38 -08:00
wozeparrot
4ed0f216b5 fix: make max_matmul run again (#13085) 2025-11-03 18:09:09 -08:00
qazal
6df34a5887 lint sqtt parser with mypy (#13079)
* llvm address table errs

* mypy likes annotated dicts

* unwrap nullable
2025-11-04 00:53:59 +08:00
nimlgen
dfde3f54d9 rocprof: use llvm disasm (#13077)
* rocprof: use llvm disasm

* rm
2025-11-03 23:58:58 +08:00
qazal
27d42fd575 sqtt decoder print behind DEBUG>=5 (#13076)
* sqtt decoder print behind DEBUG>=5

* gfx version stuff also behind 5
2025-11-03 23:20:03 +08:00
George Hotz
416b15cc59 improve uop matmul syntax (#13074)
* improve uop matmul syntax

* store takes const

* copy

* cleanups

* faster and simpler

* label them reduce

* better syntax

* touchup
2025-11-03 21:34:26 +08:00
qazal
1c0d4f1cd2 viz: counters loader (#12987)
* standalone custom loader

* first iteration on the ui

* work

* add center helper

* add edge offsets

* enumerate all edge types

* try dagre layout algorithm

* simpler spec

* bring back double edges

* more work on edge paths

* aesthetics

* custom edges also works

* dimmer inactive links

* cleanup

* cleanup

* split out the ncu layout

* this is just a k/v map now

* rm that

* more cleanup and comments

* do work

* also this work

* simpler start

* rm that

* sqtt work

* view sqtt

* sqtt

* --custom is just in profile

* wrap c call

* from tinygrad install

* eg. module not found
2025-11-03 19:42:36 +08:00
George Hotz
1e3d6e49a6 index slicing + allclose (#13071)
* continue work on slicing+allclose

* Revert "Revert "slicing + allclose""

This reverts commit 6c7a12f21c.

* fix tests + better syntax

* forgot an after

* slot is an integer
2025-11-03 13:01:48 +08:00
George Hotz
8cbef912d2 move reshape to MathTraits (#13054)
* move reshape to MathTraits

* confirm it works in amd_uop_matmul
2025-11-02 12:56:15 +08:00
George Hotz
267be7fc5e fp16 acc 2025-11-02 12:53:04 +08:00
wozeparrot
8206eab4fc fix: tk fa 4 workers (#13052) 2025-11-01 16:41:29 -07:00
George Hotz
e98506735b add CONTRACT support to UOp programs (#13043)
* add contract support

* use contract

* 342 tflops
2025-11-01 19:11:32 +08:00
George Hotz
65a0a31475 AMD mi350x matmul from stream (#13040)
* works

* working mfma

* 120 TFLOPS

* regs

* 192 TFLOPS

* try pipelining

* something

* notes

* contract

* linter to 3.11

* that was a bug
2025-11-01 17:55:19 +08:00
nimlgen
a23226e61e amd: pmc for gfx9 (#13036)
* amd: pmc for gfx9

* xcc

* vmid mask

* ugh

* tiny

* minor

* sorryg
2025-11-01 04:26:34 +08:00
nimlgen
f6786c1bfd autogen: py314 (#13038)
* autogen: py314

* bump py?
2025-11-01 04:02:19 +08:00
George Hotz
bc178d14a9 matmul example on metal showing off tensor core (#13033)
* matmul example on metal showing off tensor core

* flip the args of placeholder

* mat_idx

* imp
2025-10-31 19:40:36 +08:00
George Hotz
b46229ca51 use shrink in amd_matmul_uop (#13026)
* use shrink in amd_matmul_uop

* colors
2025-10-31 10:43:41 +08:00
wozeparrot
78f7650eec faster tk matmul (#13006) 2025-10-30 19:09:27 -07:00
George Hotz
512513c403 cleanup amd uop matmul (#13025)
* cleanup amd uop matmul

* remove mod

* move that out

* better variable names

* var names

* more

* render fallback

* colors
2025-10-31 10:04:45 +08:00
nimlgen
629b177b66 amd: sqtt works in profile mode (#13019) 2025-10-30 23:48:52 +08:00
nimlgen
4d7a7096c9 am: enable perfmon (#13013)
* am: enable perfmon

* try

* msg
2025-10-30 22:28:36 +08:00
George Hotz
4a741e8364 modernize amd uop matmul (#13011)
* modernize amd uop matmul

* progress

* comment

* more comments

* revert that

* mac cleanups

* fix estimates

* format
2025-10-30 17:02:38 +08:00
wozeparrot
92a87e37e4 fix: fetch_file (#13010) 2025-10-29 22:44:22 -07:00
nimlgen
a6f5b1482e amd: perf counters (#12975)
* amd: perf counters

* sq

* cleaner

* fix

* if enabled

* ruff

* mypy

* counters

* reset

* fix

* no cpu
2025-10-30 00:10:31 +08:00
wozeparrot
d66c997a39 feat: thunderkittens fa2 (#12955) 2025-10-28 11:27:45 -07:00
wozeparrot
24884c6768 fix: don't use KITTENS_HOPPER for 4090 (#12954) 2025-10-27 17:19:53 -07:00
George Hotz
25c2da1579 check SPEC=2 in CI (#12945)
* check SPEC=2 in CI

* split SPEC=2

* fast enough
2025-10-27 21:53:57 +08:00
nimlgen
f4da94af28 system: reset is a method of pcidevice (#12936) 2025-10-27 16:21:10 +08:00
wozeparrot
6b54378eba working kitten matmul (#12935) 2025-10-26 23:40:49 -07:00
George Hotz
db5c918215 source extra/cl_android.sh to fix opencl on android 2025-10-26 15:27:51 +08:00
qazal
2f95c10702 remu new instructions / use volatile in emulator tests (#12862)
* remu new instructions

* start moving to volatile

* test_simple works

* test_exec_mov works and lid is still here

* test_exec_cmp_vopc

* clang did s_mov_b32 exec_lo, 1

* don't hardcode v1

* support volatile in tests

* hw_test passes

* only the volatile version

* subrev saturating behavior
2025-10-23 11:13:43 +08:00
chenyu
c5cee74706 remove BLOCK_REORDER (#12854)
not used
2025-10-21 19:10:14 -04:00
b1tg
60d7e232f2 cuda fp8 (#12782)
* cuda fp8

* tensor core

* tc test

* clean

* clean pm
2025-10-21 15:05:25 -04:00
chenyu
8baa61bd67 use torch 2.9 and its Muon in test (#12773)
* use torch 2.9 and its Muon in test

* relax and disable
2025-10-21 13:35:17 -04:00
chenyu
f51f9aaa16 muon ns_params -> ns_coefficients (#12850)
match the official torch one
2025-10-21 12:35:52 -04:00
nimlgen
1ad6598963 amd: trace all instructions (#12831) 2025-10-21 20:52:24 +08:00
George Hotz
cad3ada909 tinygpu: build with SIP off works 2025-10-20 09:11:09 +08:00
nimlgen
59784a5972 amd: ensure ts is written (#12794) 2025-10-19 23:55:49 +08:00
George Hotz
89e7f2fa00 mmapeak: gfx1103 support 2025-10-19 16:57:28 +08:00
George Hotz
617614beb7 add mi350x support to mmapeak (#12784) 2025-10-19 16:11:07 +08:00
nimlgen
037f6e8fa0 qcom: ioctl for 7xx (#12777) 2025-10-18 20:33:14 +08:00
geohotstan
5d209ee7ec onnx helper intermediate node output validation (#12740)
* start

* update comments

* good

* add comments and better printing

* done
2025-10-16 11:17:47 -04:00
nimlgen
3aa2277b8f nv: usb4 (#12696)
* hackish

* prog

* match

* l

* simpler

* refactor

* not osx

* apple things

* tiny changes

* fix mask

* match fix

* nn
2025-10-16 20:11:19 +08:00
wozeparrot
cc2dfe22f5 tinyfs: fetch file utility (#12719) 2025-10-15 23:38:56 -07:00
George Hotz
4a151e7533 make xcode signing happy, waiting for entitlement (#12712) 2025-10-16 10:20:34 +08:00
Daniel
d65bd669f8 update tiny torch backend hook (#12575)
* update the backend to fix torch deprecation warning

* use param_hook to avoid full backward hook needlessly firing on inputs which do not require gradients

* fix indentation

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-10-15 14:02:33 -04:00