10923 Commits

Author SHA1 Message Date
qazal
b2bb3af12a make range_color work in VIZ (#13121) 2025-11-06 14:26:48 +08:00
chenyu
f33c182393 test custom qkv kernel (#13118)
adding the online softmax hits infinite loop so starting with this
2025-11-05 23:32:13 -05:00
George Hotz
c65e6d8887 add ranges to print_uops (#13116)
* remove tuplize from linearizer

* try this

* simple priority

* add colored ranges to print_uops

* improve comments

* fix no const in src

* fix mypy

* fix define global

* fix var placement

* no prefer early load

* revert linearizer for now
2025-11-05 20:26:56 -08:00
George Hotz
9b2b535fa4 fix issue with multi flip (#13115) 2025-11-05 15:28:50 -08:00
George Hotz
4027eef264 fix test warnings (#13114)
* fix test warnings

* precommit passes

* ignore std_mean warning
2025-11-05 15:06:29 -08:00
George Hotz
bcfe42937f move permute/flip/shrink to mixins (#13113)
* move permute to mixins

* move more stuff

* two more

* fix local mypy

* fix tests

* fix shrink
2025-11-05 14:14:15 -08:00
George Hotz
2d4f01fda0 move mixins to mixin dir (#13105)
* move mixins to mixin dir

* math
2025-11-05 10:18:33 -08:00
chenyu
52f0081e77 use where instead of mul in Embedding (#13112) 2025-11-05 12:49:01 -05:00
b1tg
edc4e1aede ignore trailing nops in llvm-objdump output (#13110) 2025-11-06 01:10:51 +08:00
chenyu
03ee0cfe45 minor fast_idiv cleanup [pr] (#13109) 2025-11-05 11:44:36 -05:00
chenyu
18d4ecc1f3 lower nv test_gemm_4096 target (#13107) 2025-11-05 11:05:16 -05:00
nimlgen
eff80beeed amd: props in device not sqtt (#13106)
* amd: props in device not sqtt

* fix

* f

* fix

* fix
2025-11-05 23:43:20 +08:00
nimlgen
757ceab2a2 system: allow using vidmem for uc mem (#13104) 2025-11-05 19:12:59 +08:00
qazal
8119d9f082 sqtt: decode each instruction exec (#13093)
* sqtt: decode each instruction exec

* start tests

* run_asm

* capture sqtt per kernel

* chaining vgprs

* test things

* inst_execs in viz

* can also configure l and g

* 1l + cleanup

* test_sleep

* test_wmma

* work

* test sleep with llvm builtin
2025-11-05 17:30:27 +08:00
chenyu
54141e9cb9 DISABLE_COMPILER_CACHE=1 in speed_v_theoretical (#13096) 2025-11-04 11:28:18 -05:00
chenyu
1c9f720654 remove unused type ignore [pr] (#13095) 2025-11-04 10:08:07 -05:00
nimlgen
c857dc5af0 autogen: try/except in try_dlopen (#13094)
* autogen: try/except in try_dlopen

* ugh
2025-11-04 22:51:53 +08:00
nimlgen
eaf7cbc178 amd: flush sqtt after each kernel (#13092)
* amd: flush sqtt after each kernel

* merge for rgp
2025-11-04 22:12:48 +08:00
qazal
96417665e8 show sqtt decoder errs in viz (#13088)
* show sqtt decoder errs in viz

* don't touch roc.py

* give hljs a default language

* work from tinyr9

* work
2025-11-04 22:05:06 +08:00
nimlgen
49191ada77 roc: install sqtt decoder (#13091)
* roc: install?

* msg

* 0.1.4
2025-11-04 18:56:01 +08:00
nimlgen
16f1f644ba amd: remove sqtt=2 (#13090) 2025-11-04 18:29:24 +08:00
nimlgen
2e97eaa866 roc: no nullptr when no wave instructions (#13087) 2025-11-04 17:32:14 +08:00
wozeparrot
9c00c0688a tk fa: use 16x64 tiles (#13086) 2025-11-03 18:25:38 -08:00
wozeparrot
4ed0f216b5 fix: make max_matmul run again (#13085) 2025-11-03 18:09:09 -08:00
chenyu
ca17718b6d remove symbolic_flat (#13083)
* remove symbolic_flat

some kernels are different but sometimes it's better so not clear, will merge as long as benchmark passes

* test_location
2025-11-03 17:25:21 -05:00
chenyu
fda720e013 simpler _is_balanced [pr] (#13082)
returns False earlier
2025-11-03 16:47:14 -05:00
chenyu
ddf01fdb15 revert mlperf.yml setting (#13080) 2025-11-03 15:24:13 -05:00
qazal
6df34a5887 lint sqtt parser with mypy (#13079)
* llvm address table errs

* mypy likes annotated dicts

* unwrap nullable
2025-11-04 00:53:59 +08:00
qazal
2d2040bc92 viz: tabulate sqtt (#13078)
* viz: tabulate sqtt

* nomore asdict
2025-11-04 00:03:15 +08:00
nimlgen
dfde3f54d9 rocprof: use llvm disasm (#13077)
* rocprof: use llvm disasm

* rm
2025-11-03 23:58:58 +08:00
qazal
27d42fd575 sqtt decoder print behind DEBUG>=5 (#13076)
* sqtt decoder print behind DEBUG>=5

* gfx version stuff also behind 5
2025-11-03 23:20:03 +08:00
George Hotz
416b15cc59 improve uop matmul syntax (#13074)
* improve uop matmul syntax

* store takes const

* copy

* cleanups

* faster and simpler

* label them reduce

* better syntax

* touchup
2025-11-03 21:34:26 +08:00
nimlgen
08855c162b amd: correct sqtt_read for several xccs (#13075)
* amd: correct sqtt_read for several xccs

* default mask
2025-11-03 19:59:56 +08:00
qazal
1c0d4f1cd2 viz: counters loader (#12987)
* standalone custom loader

* first iteration on the ui

* work

* add center helper

* add edge offsets

* enumerate all edge types

* try dagre layout algorithm

* simpler spec

* bring back double edges

* more work on edge paths

* aesthetics

* custom edges also works

* dimmer inactive links

* cleanup

* cleanup

* split out the ncu layout

* this is just a k/v map now

* rm that

* more cleanup and comments

* do work

* also this work

* simpler start

* rm that

* sqtt work

* view sqtt

* sqtt

* --custom is just in profile

* wrap c call

* from tinygrad install

* eg. module not found
2025-11-03 19:42:36 +08:00
George Hotz
1e3d6e49a6 index slicing + allclose (#13071)
* continue work on slicing+allclose

* Revert "Revert "slicing + allclose""

This reverts commit 6c7a12f21c.

* fix tests + better syntax

* forgot an after

* slot is an integer
2025-11-03 13:01:48 +08:00
George Hotz
6c7a12f21c Revert "slicing + allclose"
This reverts commit c9a1e35b1e.
2025-11-03 12:05:44 +08:00
George Hotz
c9a1e35b1e slicing + allclose 2025-11-03 12:00:45 +08:00
chenyu
a317d6e625 extra/amdpci/setup_python_cap.sh (#13070) 2025-11-02 19:19:36 -05:00
chenyu
ad501ce50a mlperf cron install tqdm (#13069)
one more...
2025-11-02 18:09:27 -05:00
chenyu
2c8d619147 mlperf cron install influxdb3-python (#13068) 2025-11-02 17:55:40 -05:00
chenyu
4c22f089fc mlperf cron install tensorflow try 2 (#13067) 2025-11-02 17:11:01 -05:00
chenyu
c58cf91850 mlperf cron install tensorflow (#13066) 2025-11-02 16:48:05 -05:00
chenyu
74db65cf72 update mlperf bert LOGMLPERF (#13065) 2025-11-02 15:26:37 -05:00
chenyu
b18293de96 train bert in mlperf cron (#13064)
more relevant now
2025-11-02 15:04:02 -05:00
nimlgen
be0028d3ce amd: universal set_grbm (#13062)
* amd: universal set_grbm

* fix
2025-11-03 03:35:55 +08:00
nimlgen
37a730abce amd: fix pmc sq gfx11+ (#13058)
* amd: fix pmc sq gfx11+

* fix
2025-11-02 21:56:47 +08:00
qazal
24054bb655 viz: check overlay width after layout (#13060) 2025-11-02 21:47:58 +08:00
George Hotz
962d980919 fuse hasn't worked since rangeify, remove it (#13057) 2025-11-02 14:01:52 +08:00
George Hotz
036ee9f84c Self type + mixins (#13056)
* use Self type

* mixin

* fix later
2025-11-02 13:30:01 +08:00
George Hotz
8cbef912d2 move reshape to MathTraits (#13054)
* move reshape to MathTraits

* confirm it works in amd_uop_matmul
2025-11-02 12:56:15 +08:00