31 Commits

Author SHA1 Message Date
qazal
bd55507ee4 RDNA3 fp16 assembly gemm 85 TFLOPS (#13990) 2026-01-03 18:34:23 +09:00
George Hotz
744af193f0 remove ScheduleItem and merge it with ExecItem (#13759)
* remove ExecItem and merge it with ScheduleItem

* less diff

* fix issues

* min diff

* don't change bufs in _lower

* min diff

* update

* revert

* fixes

* diff
2025-12-19 17:04:24 -04:00
George Hotz
bd4b9de7d2 use numpy in amd_uop_matmul for simpler tracing (#13503) 2025-11-30 08:04:38 -08:00
George Hotz
98e9e73286 hotfix: amd_uop_matmul getenvs 2025-11-17 13:26:01 -08:00
George Hotz
416b15cc59 improve uop matmul syntax (#13074)
* improve uop matmul syntax

* store takes const

* copy

* cleanups

* faster and simpler

* label them reduce

* better syntax

* touchup
2025-11-03 21:34:26 +08:00
George Hotz
1e3d6e49a6 index slicing + allclose (#13071)
* continue work on slicing+allclose

* Revert "Revert "slicing + allclose""

This reverts commit 6c7a12f21c.

* fix tests + better syntax

* forgot an after

* slot is an integer
2025-11-03 13:01:48 +08:00
George Hotz
8cbef912d2 move reshape to MathTraits (#13054)
* move reshape to MathTraits

* confirm it works in amd_uop_matmul
2025-11-02 12:56:15 +08:00
George Hotz
bc178d14a9 matmul example on metal showing off tensor core (#13033)
* matmul example on metal showing off tensor core

* flip the args of placeholder

* mat_idx

* imp
2025-10-31 19:40:36 +08:00
George Hotz
b46229ca51 use shrink in amd_matmul_uop (#13026)
* use shrink in amd_matmul_uop

* colors
2025-10-31 10:43:41 +08:00
George Hotz
512513c403 cleanup amd uop matmul (#13025)
* cleanup amd uop matmul

* remove mod

* move that out

* better variable names

* var names

* more

* render fallback

* colors
2025-10-31 10:04:45 +08:00
George Hotz
4a741e8364 modernize amd uop matmul (#13011)
* modernize amd uop matmul

* progress

* comment

* more comments

* revert that

* mac cleanups

* fix estimates

* format
2025-10-30 17:02:38 +08:00
chenyu
c5cee74706 remove BLOCK_REORDER (#12854)
not used
2025-10-21 19:10:14 -04:00
chenyu
ae51bdd06a remove trivial use of RANGEIFY flag (#12550)
some tests need update still
2025-10-09 02:29:38 -04:00
George Hotz
afad7d0cd1 remove dtype from range, it will be dtypes.index soon [pr] (#11914)
* remove dtype from range, it will be dtypes.index soon [pr]

* a few more
2025-08-29 09:52:07 -07:00
George Hotz
27701ef823 add locals support to rangeify (#11826) 2025-08-24 14:03:12 -07:00
qazal
793ace530e update amd_uop_matmul.py import (#11581)
Using this for testing SQTT
2025-08-08 17:07:35 +03:00
George Hotz
82be8abfd2 move opt under codegen (#11569) 2025-08-07 14:19:17 -07:00
George Hotz
4f26a9ad32 check elements_per_thread in tensorcore [pr] (#11435) 2025-07-30 11:55:48 -07:00
George Hotz
1bef2d80c1 unrolls are all in the same scope (#11429)
* unrolls are all in the same scope

* fix that import
2025-07-29 16:55:37 -07:00
George Hotz
03909f2772 permute locals for HL uop matmul (#11412)
* permute locals for HL uop matmul

* parens fix that

* permutes

* 20 TFLOPS
2025-07-29 08:19:59 -07:00
George Hotz
735ad5f10d kernel4 and 5 in uops (#11411)
* move simplify views to merge views

* add amd kernel 4

* Revert "move simplify views to merge views"

This reverts commit 1e07dff384.

* k4 in python

* kernel4 written in uops

* k5 support

* cleanups
2025-07-28 19:35:48 -07:00
George Hotz
fddc645668 HL=2 top matmul (#11406)
* HL=2 top matmul

* top colored
2025-07-28 12:32:38 -07:00
George Hotz
dfeee63d30 uop matmul work (#11388)
* uop matmul work

* works with locals
2025-07-26 21:23:55 -07:00
George Hotz
2c70eaf18c fix load / barrier (#11386)
* fix load / barrier

* cleanups

* fix CI
2025-07-26 10:27:37 -07:00
George Hotz
466ab5a3f2 store/load not pass through index (#11381)
* noop

* fix noop

* store cat is NOOP

* store dtype is void

* stores aren't passed through anymore

* meh, skip those for ptx

* correct ptx skip

* hl runs
2025-07-25 21:01:47 -07:00
George Hotz
490a93902c define reg doesn't have init anymore (#11365)
* define reg doesn't have init anymore

* remove that

* no special logic for dr

* fix amd uop matmul
2025-07-24 19:15:49 -07:00
George Hotz
0602b22086 kernel spec (#11359)
* kernel spec

* ops.VIEW

* work
2025-07-24 12:45:38 -07:00
George Hotz
b0dc97d1f7 write out kernel 3 in uops (#11352)
* write out kernel 3 in uops

* matmul is correct

* gemm passes spec

* bugfix to match speed

* cleanups
2025-07-23 17:32:38 -07:00
George Hotz
108aac8af4 use AddrSpace instead of local (#11314)
* use AddrSpace instead of local

* addrspace in test
2025-07-21 14:00:06 -07:00
George Hotz
842184a1ab rename kernelize to schedule, try 2 (#11305) 2025-07-21 11:18:36 -07:00
George Hotz
d67c8e7b42 local metal on metal in uop syntax (#11185)
* local metal on metal in uop syntax

* TODO: just put the axis_info in the kernelinfo

* local

* amd_matmul works @ 28 TFLOPS

* clean up matmul

* kernel8 works

* remove that

* locals

* axistype innovation

* work

* cleanup

* kernel3 regs

* cleanup kernel3

* work

* why is it broken

* no beam

* reenable

* permutes
2025-07-12 16:31:19 -07:00