171 Commits

Author SHA1 Message Date
qazal
bd55507ee4 RDNA3 fp16 assembly gemm 85 TFLOPS (#13990) 2026-01-03 18:34:23 +09:00
qazal
2cc64d71b0 simplify mi350x gemm / viz asm tests (#13984)
* mi350x gemm cleanup

* asm tests work

* simpler asm tests
2026-01-03 11:11:07 +09:00
qazal
5f52266225 mi350x gemm: use Tensor.custom_kernel in asm test (#13969)
* mi350x gemm: use Tensor.custom_kernel in asm test

* A @ B for baseline
2026-01-02 18:30:50 +09:00
qazal
c0f52c9dcb split assembly gemm to per arch directory (#13953) 2026-01-02 00:10:22 +09:00
qazal
6a5430ab00 correct args order in mi350x gemm (#13949) 2026-01-01 23:01:46 +09:00
qazal
b23f4517ab prep mi350x gemm for python dsl (#13918)
* start by pruning existing asm

* better branch names

* split to template and real instructions
2025-12-31 20:00:57 +09:00
qazal
b557c46233 assembly gemm clean ups, instructions for cli (#13892) 2025-12-30 16:14:06 +09:00
qazal
f541540129 variable N for asm gemm (#13869)
* variable N for asm gemm

* cleanup spacing
2025-12-29 19:35:50 +09:00
qazal
fc5278746f mi350x assembly gemm cleanups (#13867) 2025-12-29 18:47:23 +09:00
qazal
066d96c397 print tflops in asm gemm test (#13859)
* print tflops in asm gemm test

* change order
2025-12-29 02:26:40 +09:00
qazal
2cfbabdc34 mi350x 1tflop bf16 gemm in extra (#13702) 2025-12-28 21:45:42 +09:00
George Hotz
744af193f0 remove ScheduleItem and merge it with ExecItem (#13759)
* remove ExecItem and merge it with ScheduleItem

* less diff

* fix issues

* min diff

* don't change bufs in _lower

* min diff

* update

* revert

* fixes

* diff
2025-12-19 17:04:24 -04:00
George Hotz
df6cde8a00 cleanup stale examples/extra (#13764)
* cleanup stale files

* examples

* move those back

* old

* delete more
2025-12-19 16:27:37 -04:00
George Hotz
bd4b9de7d2 use numpy in amd_uop_matmul for simpler tracing (#13503) 2025-11-30 08:04:38 -08:00
George Hotz
98e9e73286 hotfix: amd_uop_matmul getenvs 2025-11-17 13:26:01 -08:00
George Hotz
ba84d415fe work from benchmarking tinybox red v2 (#13264)
* work from benchmarking tinybox red v2

* gpuburn
2025-11-13 16:38:40 -08:00
George Hotz
faf68c03a8 more mi350x matmul work (#13138)
* more mi350x matmul work

* broken compute
2025-11-13 09:09:28 -08:00
George Hotz
2d4f01fda0 move mixins to mixin dir (#13105)
* move mixins to mixin dir

* math
2025-11-05 10:18:33 -08:00
wozeparrot
4ed0f216b5 fix: make max_matmul run again (#13085) 2025-11-03 18:09:09 -08:00
George Hotz
416b15cc59 improve uop matmul syntax (#13074)
* improve uop matmul syntax

* store takes const

* copy

* cleanups

* faster and simpler

* label them reduce

* better syntax

* touchup
2025-11-03 21:34:26 +08:00
George Hotz
1e3d6e49a6 index slicing + allclose (#13071)
* continue work on slicing+allclose

* Revert "Revert "slicing + allclose""

This reverts commit 6c7a12f21c.

* fix tests + better syntax

* forgot an after

* slot is an integer
2025-11-03 13:01:48 +08:00
George Hotz
8cbef912d2 move reshape to MathTraits (#13054)
* move reshape to MathTraits

* confirm it works in amd_uop_matmul
2025-11-02 12:56:15 +08:00
George Hotz
267be7fc5e fp16 acc 2025-11-02 12:53:04 +08:00
George Hotz
e98506735b add CONTRACT support to UOp programs (#13043)
* add contract support

* use contract

* 342 tflops
2025-11-01 19:11:32 +08:00
George Hotz
65a0a31475 AMD mi350x matmul from stream (#13040)
* works

* working mfma

* 120 TFLOPS

* regs

* 192 TFLOPS

* try pipelining

* something

* notes

* contract

* linter to 3.11

* that was a bug
2025-11-01 17:55:19 +08:00
George Hotz
bc178d14a9 matmul example on metal showing off tensor core (#13033)
* matmul example on metal showing off tensor core

* flip the args of placeholder

* mat_idx

* imp
2025-10-31 19:40:36 +08:00
George Hotz
b46229ca51 use shrink in amd_matmul_uop (#13026)
* use shrink in amd_matmul_uop

* colors
2025-10-31 10:43:41 +08:00
George Hotz
512513c403 cleanup amd uop matmul (#13025)
* cleanup amd uop matmul

* remove mod

* move that out

* better variable names

* var names

* more

* render fallback

* colors
2025-10-31 10:04:45 +08:00
George Hotz
4a741e8364 modernize amd uop matmul (#13011)
* modernize amd uop matmul

* progress

* comment

* more comments

* revert that

* mac cleanups

* fix estimates

* format
2025-10-30 17:02:38 +08:00
George Hotz
25c2da1579 check SPEC=2 in CI (#12945)
* check SPEC=2 in CI

* split SPEC=2

* fast enough
2025-10-27 21:53:57 +08:00
chenyu
c5cee74706 remove BLOCK_REORDER (#12854)
not used
2025-10-21 19:10:14 -04:00
b1tg
60d7e232f2 cuda fp8 (#12782)
* cuda fp8

* tensor core

* tc test

* clean

* clean pm
2025-10-21 15:05:25 -04:00
chenyu
ae51bdd06a remove trivial use of RANGEIFY flag (#12550)
some tests need update still
2025-10-09 02:29:38 -04:00
chenyu
0e266f376c ops_gpu -> ops_cl (#12103) 2025-09-10 15:15:48 -04:00
nimlgen
fb96394ff5 auto-select available compilers (#12094)
* device: auto select compilers

* fix

* metal+opencl

* nv/cuda

* test without ptx

* ptx

* fix tests

* fix

* fix test

* rename

* test + cleaner

* xx

* ops

* better test

* win?

* um?

* types

* debug

* win??

* sep rung

* wtf?

* debug

* skip win

* revert this

* types
2025-09-10 19:52:01 +03:00
George Hotz
38dcadf07b delete kernel.py (#12040)
* delete kernel.py

* delete that file

* rip and tear

* don't test search

* imports

* fix torch frontend

* not a part of regen
2025-09-05 15:52:07 -07:00
George Hotz
afad7d0cd1 remove dtype from range, it will be dtypes.index soon [pr] (#11914)
* remove dtype from range, it will be dtypes.index soon [pr]

* a few more
2025-08-29 09:52:07 -07:00
George Hotz
394c2d1db1 update Kernel API in tests + move optimize_local_size (#11907) 2025-08-28 15:12:47 -07:00
George Hotz
27701ef823 add locals support to rangeify (#11826) 2025-08-24 14:03:12 -07:00
qazal
793ace530e update amd_uop_matmul.py import (#11581)
Using this for testing SQTT
2025-08-08 17:07:35 +03:00
George Hotz
82be8abfd2 move opt under codegen (#11569) 2025-08-07 14:19:17 -07:00
George Hotz
4f26a9ad32 check elements_per_thread in tensorcore [pr] (#11435) 2025-07-30 11:55:48 -07:00
George Hotz
1bef2d80c1 unrolls are all in the same scope (#11429)
* unrolls are all in the same scope

* fix that import
2025-07-29 16:55:37 -07:00
George Hotz
03909f2772 permute locals for HL uop matmul (#11412)
* permute locals for HL uop matmul

* parens fix that

* permutes

* 20 TFLOPS
2025-07-29 08:19:59 -07:00
George Hotz
735ad5f10d kernel4 and 5 in uops (#11411)
* move simplify views to merge views

* add amd kernel 4

* Revert "move simplify views to merge views"

This reverts commit 1e07dff384.

* k4 in python

* kernel4 written in uops

* k5 support

* cleanups
2025-07-28 19:35:48 -07:00
George Hotz
fddc645668 HL=2 top matmul (#11406)
* HL=2 top matmul

* top colored
2025-07-28 12:32:38 -07:00
George Hotz
dfeee63d30 uop matmul work (#11388)
* uop matmul work

* works with locals
2025-07-26 21:23:55 -07:00
George Hotz
2c70eaf18c fix load / barrier (#11386)
* fix load / barrier

* cleanups

* fix CI
2025-07-26 10:27:37 -07:00
George Hotz
466ab5a3f2 store/load not pass through index (#11381)
* noop

* fix noop

* store cat is NOOP

* store dtype is void

* stores aren't passed through anymore

* meh, skip those for ptx

* correct ptx skip

* hl runs
2025-07-25 21:01:47 -07:00
George Hotz
490a93902c define reg doesn't have init anymore (#11365)
* define reg doesn't have init anymore

* remove that

* no special logic for dr

* fix amd uop matmul
2025-07-24 19:15:49 -07:00