Commit Graph

10934 Commits

Author SHA1 Message Date
George Hotz
f215f84241 use end range count in priority 2025-11-06 10:17:35 -08:00
George Hotz
290441dd44 do loads early (#13131)
* do loads early

* local and reg
2025-11-06 09:57:09 -08:00
George Hotz
097264853d very simple priority (#13130)
* very simple priority

* still simple
2025-11-06 09:25:28 -08:00
George Hotz
07b415e831 fixup op order (#13128)
* fixup op order

* more order

* move a few more

* more

* DEBUG_LINEARIZE
2025-11-06 08:50:04 -08:00
nimlgen
b9b68bf437 amd: add kern to sqtt event (#13126)
* amd: add kern to sqtt event

* fix
2025-11-06 22:02:02 +08:00
qazal
88245d6579 qol improvements to sqtt decoder and timing tests (#13125) 2025-11-06 20:51:30 +08:00
nimlgen
dafdb4bfb1 test hcq open with pytest (#13124)
* test hcq open with pytest

* fi
2025-11-06 20:09:51 +08:00
nimlgen
05e2ff4d87 system: fix flock on pcidevs (#13123)
* system: fix locking of hcq devices

* rename and fullrun

* force ok

* fix

* fix
2025-11-06 19:02:13 +08:00
qazal
3126c89b84 viz: visible horizontal scrollbar in long texts (#13122) 2025-11-06 17:23:02 +08:00
George Hotz
91cc773397 add run count to toposort (#13119) 2025-11-05 22:29:34 -08:00
Adeeb Shihadeh
dca7fb0a49 qcom: make priority configurable (#13120) 2025-11-05 22:27:54 -08:00
qazal
b2bb3af12a make range_color work in VIZ (#13121) 2025-11-06 14:26:48 +08:00
chenyu
f33c182393 test custom qkv kernel (#13118)
adding the online softmax hits infinite loop so starting with this
2025-11-05 23:32:13 -05:00
George Hotz
c65e6d8887 add ranges to print_uops (#13116)
* remove tuplize from linearizer

* try this

* simple priority

* add colored ranges to print_uops

* improve comments

* fix no const in src

* fix mypy

* fix define global

* fix var placement

* no prefer early load

* revert linearizer for now
2025-11-05 20:26:56 -08:00
George Hotz
9b2b535fa4 fix issue with multi flip (#13115) 2025-11-05 15:28:50 -08:00
George Hotz
4027eef264 fix test warnings (#13114)
* fix test warnings

* precommit passes

* ignore std_mean warning
2025-11-05 15:06:29 -08:00
George Hotz
bcfe42937f move permute/flip/shrink to mixins (#13113)
* move permute to mixins

* move more stuff

* two more

* fix local mypy

* fix tests

* fix shrink
2025-11-05 14:14:15 -08:00
George Hotz
2d4f01fda0 move mixins to mixin dir (#13105)
* move mixins to mixin dir

* math
2025-11-05 10:18:33 -08:00
chenyu
52f0081e77 use where instead of mul in Embedding (#13112) 2025-11-05 12:49:01 -05:00
b1tg
edc4e1aede ignore trailing nops in llvm-objdump output (#13110) 2025-11-06 01:10:51 +08:00
chenyu
03ee0cfe45 minor fast_idiv cleanup [pr] (#13109) 2025-11-05 11:44:36 -05:00
chenyu
18d4ecc1f3 lower nv test_gemm_4096 target (#13107) 2025-11-05 11:05:16 -05:00
nimlgen
eff80beeed amd: props in device not sqtt (#13106)
* amd: props in device not sqtt

* fix

* f

* fix

* fix
2025-11-05 23:43:20 +08:00
nimlgen
757ceab2a2 system: allow using vidmem for uc mem (#13104) 2025-11-05 19:12:59 +08:00
qazal
8119d9f082 sqtt: decode each instruction exec (#13093)
* sqtt: decode each instruction exec

* start tests

* run_asm

* capture sqtt per kernel

* chaining vgprs

* test things

* inst_execs in viz

* can also configure l and g

* 1l + cleanup

* test_sleep

* test_wmma

* work

* test sleep with llvm builtin
2025-11-05 17:30:27 +08:00
chenyu
54141e9cb9 DISABLE_COMPILER_CACHE=1 in speed_v_theoretical (#13096) 2025-11-04 11:28:18 -05:00
chenyu
1c9f720654 remove unused type ignore [pr] (#13095) 2025-11-04 10:08:07 -05:00
nimlgen
c857dc5af0 autogen: try/except in try_dlopen (#13094)
* autogen: try/except in try_dlopen

* ugh
2025-11-04 22:51:53 +08:00
nimlgen
eaf7cbc178 amd: flush sqtt after each kernel (#13092)
* amd: flush sqtt after each kernel

* merge for rgp
2025-11-04 22:12:48 +08:00
qazal
96417665e8 show sqtt decoder errs in viz (#13088)
* show sqtt decoder errs in viz

* don't touch roc.py

* give hljs a default language

* work from tinyr9

* work
2025-11-04 22:05:06 +08:00
nimlgen
49191ada77 roc: install sqtt decoder (#13091)
* roc: install?

* msg

* 0.1.4
2025-11-04 18:56:01 +08:00
nimlgen
16f1f644ba amd: remove sqtt=2 (#13090) 2025-11-04 18:29:24 +08:00
nimlgen
2e97eaa866 roc: no nullptr when no wave instructions (#13087) 2025-11-04 17:32:14 +08:00
wozeparrot
9c00c0688a tk fa: use 16x64 tiles (#13086) 2025-11-03 18:25:38 -08:00
wozeparrot
4ed0f216b5 fix: make max_matmul run again (#13085) 2025-11-03 18:09:09 -08:00
chenyu
ca17718b6d remove symbolic_flat (#13083)
* remove symbolic_flat

some kernels are different but sometimes it's better so not clear, will merge as long as benchmark passes

* test_location
2025-11-03 17:25:21 -05:00
chenyu
fda720e013 simpler _is_balanced [pr] (#13082)
returns False earlier
2025-11-03 16:47:14 -05:00
chenyu
ddf01fdb15 revert mlperf.yml setting (#13080) 2025-11-03 15:24:13 -05:00
qazal
6df34a5887 lint sqtt parser with mypy (#13079)
* llvm address table errs

* mypy likes annotated dicts

* unwrap nullable
2025-11-04 00:53:59 +08:00
qazal
2d2040bc92 viz: tabulate sqtt (#13078)
* viz: tabulate sqtt

* nomore asdict
2025-11-04 00:03:15 +08:00
nimlgen
dfde3f54d9 rocprof: use llvm disasm (#13077)
* rocprof: use llvm disasm

* rm
2025-11-03 23:58:58 +08:00
qazal
27d42fd575 sqtt decoder print behind DEBUG>=5 (#13076)
* sqtt decoder print behind DEBUG>=5

* gfx version stuff also behind 5
2025-11-03 23:20:03 +08:00
George Hotz
416b15cc59 improve uop matmul syntax (#13074)
* improve uop matmul syntax

* store takes const

* copy

* cleanups

* faster and simpler

* label them reduce

* better syntax

* touchup
2025-11-03 21:34:26 +08:00
nimlgen
08855c162b amd: correct sqtt_read for several xccs (#13075)
* amd: correct sqtt_read for several xccs

* default mask
2025-11-03 19:59:56 +08:00
qazal
1c0d4f1cd2 viz: counters loader (#12987)
* standalone custom loader

* first iteration on the ui

* work

* add center helper

* add edge offsets

* enumerate all edge types

* try dagre layout algorithm

* simpler spec

* bring back double edges

* more work on edge paths

* aesthetics

* custom edges also works

* dimmer inactive links

* cleanup

* cleanup

* split out the ncu layout

* this is just a k/v map now

* rm that

* more cleanup and comments

* do work

* also this work

* simpler start

* rm that

* sqtt work

* view sqtt

* sqtt

* --custom is just in profile

* wrap c call

* from tinygrad install

* eg. module not found
2025-11-03 19:42:36 +08:00
George Hotz
1e3d6e49a6 index slicing + allclose (#13071)
* continue work on slicing+allclose

* Revert "Revert "slicing + allclose""

This reverts commit 6c7a12f21c.

* fix tests + better syntax

* forgot an after

* slot is an integer
2025-11-03 13:01:48 +08:00
George Hotz
6c7a12f21c Revert "slicing + allclose"
This reverts commit c9a1e35b1e.
2025-11-03 12:05:44 +08:00
George Hotz
c9a1e35b1e slicing + allclose 2025-11-03 12:00:45 +08:00
chenyu
a317d6e625 extra/amdpci/setup_python_cap.sh (#13070) 2025-11-02 19:19:36 -05:00
chenyu
ad501ce50a mlperf cron install tqdm (#13069)
one more...
2025-11-02 18:09:27 -05:00