10899 Commits

Author SHA1 Message Date
chenyu
ca17718b6d remove symbolic_flat (#13083)
* remove symbolic_flat

some kernels are different but sometimes it's better so not clear, will merge as long as benchmark passes

* test_location
2025-11-03 17:25:21 -05:00
chenyu
fda720e013 simpler _is_balanced [pr] (#13082)
returns False earlier
2025-11-03 16:47:14 -05:00
chenyu
ddf01fdb15 revert mlperf.yml setting (#13080) 2025-11-03 15:24:13 -05:00
qazal
6df34a5887 lint sqtt parser with mypy (#13079)
* llvm address table errs

* mypy likes annotated dicts

* unwrap nullable
2025-11-04 00:53:59 +08:00
qazal
2d2040bc92 viz: tabulate sqtt (#13078)
* viz: tabulate sqtt

* nomore asdict
2025-11-04 00:03:15 +08:00
nimlgen
dfde3f54d9 rocprof: use llvm disasm (#13077)
* rocprof: use llvm disasm

* rm
2025-11-03 23:58:58 +08:00
qazal
27d42fd575 sqtt decoder print behind DEBUG>=5 (#13076)
* sqtt decoder print behind DEBUG>=5

* gfx version stuff also behind 5
2025-11-03 23:20:03 +08:00
George Hotz
416b15cc59 improve uop matmul syntax (#13074)
* improve uop matmul syntax

* store takes const

* copy

* cleanups

* faster and simpler

* label them reduce

* better syntax

* touchup
2025-11-03 21:34:26 +08:00
nimlgen
08855c162b amd: correct sqtt_read for several xccs (#13075)
* amd: correct sqtt_read for several xccs

* default mask
2025-11-03 19:59:56 +08:00
qazal
1c0d4f1cd2 viz: counters loader (#12987)
* standalone custom loader

* first iteration on the ui

* work

* add center helper

* add edge offsets

* enumerate all edge types

* try dagre layout algorithm

* simpler spec

* bring back double edges

* more work on edge paths

* aesthetics

* custom edges also works

* dimmer inactive links

* cleanup

* cleanup

* split out the ncu layout

* this is just a k/v map now

* rm that

* more cleanup and comments

* do work

* also this work

* simpler start

* rm that

* sqtt work

* view sqtt

* sqtt

* --custom is just in profile

* wrap c call

* from tinygrad install

* eg. module not found
2025-11-03 19:42:36 +08:00
George Hotz
1e3d6e49a6 index slicing + allclose (#13071)
* continue work on slicing+allclose

* Revert "Revert "slicing + allclose""

This reverts commit 6c7a12f21c.

* fix tests + better syntax

* forgot an after

* slot is an integer
2025-11-03 13:01:48 +08:00
George Hotz
6c7a12f21c Revert "slicing + allclose"
This reverts commit c9a1e35b1e.
2025-11-03 12:05:44 +08:00
George Hotz
c9a1e35b1e slicing + allclose 2025-11-03 12:00:45 +08:00
chenyu
a317d6e625 extra/amdpci/setup_python_cap.sh (#13070) 2025-11-02 19:19:36 -05:00
chenyu
ad501ce50a mlperf cron install tqdm (#13069)
one more...
2025-11-02 18:09:27 -05:00
chenyu
2c8d619147 mlperf cron install influxdb3-python (#13068) 2025-11-02 17:55:40 -05:00
chenyu
4c22f089fc mlperf cron install tensorflow try 2 (#13067) 2025-11-02 17:11:01 -05:00
chenyu
c58cf91850 mlperf cron install tensorflow (#13066) 2025-11-02 16:48:05 -05:00
chenyu
74db65cf72 update mlperf bert LOGMLPERF (#13065) 2025-11-02 15:26:37 -05:00
chenyu
b18293de96 train bert in mlperf cron (#13064)
more relevant now
2025-11-02 15:04:02 -05:00
nimlgen
be0028d3ce amd: universal set_grbm (#13062)
* amd: universal set_grbm

* fix
2025-11-03 03:35:55 +08:00
nimlgen
37a730abce amd: fix pmc sq gfx11+ (#13058)
* amd: fix pmc sq gfx11+

* fix
2025-11-02 21:56:47 +08:00
qazal
24054bb655 viz: check overlay width after layout (#13060) 2025-11-02 21:47:58 +08:00
George Hotz
962d980919 fuse hasn't worked since rangeify, remove it (#13057) 2025-11-02 14:01:52 +08:00
George Hotz
036ee9f84c Self type + mixins (#13056)
* use Self type

* mixin

* fix later
2025-11-02 13:30:01 +08:00
George Hotz
8cbef912d2 move reshape to MathTraits (#13054)
* move reshape to MathTraits

* confirm it works in amd_uop_matmul
2025-11-02 12:56:15 +08:00
George Hotz
1ff341bae5 python 3.11 is now required (#13055) 2025-11-02 12:55:40 +08:00
George Hotz
267be7fc5e fp16 acc 2025-11-02 12:53:04 +08:00
wozeparrot
8206eab4fc fix: tk fa 4 workers (#13052) 2025-11-01 16:41:29 -07:00
Sieds Lykles
885b6dea9e multiple reduce range arange folding (#13047)
* multi reduce arange folding

* add test

* cvar to var

* add circular_pad_bw test
2025-11-01 22:11:26 +01:00
Sieds Lykles
f97fb703c8 catch group error in matvec heuristic (#13051) 2025-11-01 22:09:35 +01:00
Sieds Lykles
ecb8565f67 Revert "Better cleanup of arange bufferize (#13046)" (#13048)
This reverts commit c99b7dfd4a.
2025-11-01 18:09:37 +01:00
Sieds Lykles
c99b7dfd4a Better cleanup of arange bufferize (#13046)
* check for reduce and index instead of cast

* add test
2025-11-01 16:16:31 +01:00
nimlgen
051aab5481 open viz with sqtt flags (#13001) 2025-11-01 22:48:17 +08:00
nimlgen
2db57f3a97 amd: better msg when out of perf regs (#13042) 2025-11-01 22:47:50 +08:00
chenyu
bebec73471 write custom_sum with set and after (#13045) 2025-11-01 10:45:30 -04:00
George Hotz
e98506735b add CONTRACT support to UOp programs (#13043)
* add contract support

* use contract

* 342 tflops
2025-11-01 19:11:32 +08:00
George Hotz
65a0a31475 AMD mi350x matmul from stream (#13040)
* works

* working mfma

* 120 TFLOPS

* regs

* 192 TFLOPS

* try pipelining

* something

* notes

* contract

* linter to 3.11

* that was a bug
2025-11-01 17:55:19 +08:00
chenyu
f396df26ea test custom sum (#13039)
* test custom sum

this is higher level than set and after?

* only float
2025-10-31 19:25:56 -04:00
nimlgen
a23226e61e amd: pmc for gfx9 (#13036)
* amd: pmc for gfx9

* xcc

* vmid mask

* ugh

* tiny

* minor

* sorryg
2025-11-01 04:26:34 +08:00
nimlgen
f6786c1bfd autogen: py314 (#13038)
* autogen: py314

* bump py?
2025-11-01 04:02:19 +08:00
nimlgen
d532117df5 amd: rename set_grbm_se -> set_grbm_se_sh (#13037) 2025-11-01 01:37:57 +08:00
nimlgen
a9e5ffd3d1 amd: new pmc src (#13034) 2025-11-01 01:33:23 +08:00
Sieds Lykles
3dc593c536 add strip_params to pyrender (#13021)
* add strip_params to pyrender

* update that one too

* strip_parens fix

* cleaner

* add test

* add some more tests

* cleaner strip_parens
2025-10-31 14:15:56 +01:00
George Hotz
bc178d14a9 matmul example on metal showing off tensor core (#13033)
* matmul example on metal showing off tensor core

* flip the args of placeholder

* mat_idx

* imp
2025-10-31 19:40:36 +08:00
George Hotz
e066b3176b hotfix: types and names for custom kernel test 2025-10-31 17:34:55 +08:00
George Hotz
54f48f93c6 working backward pass in custom kernel (#13032)
* working backward pass in custom kernel

* custom_kernel tensor method

* no SPEC=2
2025-10-31 17:26:18 +08:00
George Hotz
b791d70725 support custom UOp kernels (#13028)
* support custom UOp kernels

* no number

* multioutput works

* backward kernel runs

* move kernel class

* grad later

* work

* no tags in kernel graph

* test arange

* arange + contig

* delete comment
2025-10-31 15:51:39 +08:00
qazal
9f0c25ec48 viz: use indexing toggle for schedule graph (#13031) 2025-10-31 15:32:08 +08:00
George Hotz
b2caf4c2b3 prepare for custom kernel (#13029) 2025-10-31 14:47:37 +08:00