George Hotz
6809ff8fe1
simplify priority
2025-11-06 07:57:59 -08:00
nimlgen
b9b68bf437
amd: add kern to sqtt event ( #13126 )
* amd: add kern to sqtt event
* fix
2025-11-06 22:02:02 +08:00
qazal
88245d6579
qol improvements to sqtt decoder and timing tests ( #13125 )
2025-11-06 20:51:30 +08:00
nimlgen
dafdb4bfb1
test hcq open with pytest ( #13124 )
* test hcq open with pytest
* fi
2025-11-06 20:09:51 +08:00
nimlgen
05e2ff4d87
system: fix flock on pcidevs ( #13123 )
* system: fix locking of hcq devices
* rename and fullrun
* force ok
* fix
* fix
2025-11-06 19:02:13 +08:00
qazal
3126c89b84
viz: visible horizontal scrollbar in long texts ( #13122 )
2025-11-06 17:23:02 +08:00
George Hotz
91cc773397
add run count to toposort ( #13119 )
2025-11-05 22:29:34 -08:00
Adeeb Shihadeh
dca7fb0a49
qcom: make priority configurable ( #13120 )
2025-11-05 22:27:54 -08:00
qazal
b2bb3af12a
make range_color work in VIZ ( #13121 )
2025-11-06 14:26:48 +08:00
chenyu
f33c182393
test custom qkv kernel ( #13118 )
adding the online softmax hits infinite loop so starting with this
2025-11-05 23:32:13 -05:00
George Hotz
c65e6d8887
add ranges to print_uops ( #13116 )
* remove tuplize from linearizer
* try this
* simple priority
* add colored ranges to print_uops
* improve comments
* fix no const in src
* fix mypy
* fix define global
* fix var placement
* no prefer early load
* revert linearizer for now
2025-11-05 20:26:56 -08:00
George Hotz
9b2b535fa4
fix issue with multi flip ( #13115 )
2025-11-05 15:28:50 -08:00
George Hotz
4027eef264
fix test warnings ( #13114 )
* fix test warnings
* precommit passes
* ignore std_mean warning
2025-11-05 15:06:29 -08:00
George Hotz
bcfe42937f
move permute/flip/shrink to mixins ( #13113 )
* move permute to mixins
* move more stuff
* two more
* fix local mypy
* fix tests
* fix shrink
2025-11-05 14:14:15 -08:00
George Hotz
2d4f01fda0
move mixins to mixin dir ( #13105 )
* move mixins to mixin dir
* math
2025-11-05 10:18:33 -08:00
chenyu
52f0081e77
use where instead of mul in Embedding ( #13112 )
2025-11-05 12:49:01 -05:00
b1tg
edc4e1aede
ignore trailing nops in llvm-objdump output ( #13110 )
2025-11-06 01:10:51 +08:00
chenyu
03ee0cfe45
minor fast_idiv cleanup [pr] ( #13109 )
2025-11-05 11:44:36 -05:00
chenyu
18d4ecc1f3
lower nv test_gemm_4096 target ( #13107 )
2025-11-05 11:05:16 -05:00
nimlgen
eff80beeed
amd: props in device not sqtt ( #13106 )
* amd: props in device not sqtt
* fix
* f
* fix
* fix
2025-11-05 23:43:20 +08:00
nimlgen
757ceab2a2
system: allow using vidmem for uc mem ( #13104 )
2025-11-05 19:12:59 +08:00
qazal
8119d9f082
sqtt: decode each instruction exec ( #13093 )
* sqtt: decode each instruction exec
* start tests
* run_asm
* capture sqtt per kernel
* chaining vgprs
* test things
* inst_execs in viz
* can also configure l and g
* 1l + cleanup
* test_sleep
* test_wmma
* work
* test sleep with llvm builtin
2025-11-05 17:30:27 +08:00
chenyu
54141e9cb9
DISABLE_COMPILER_CACHE=1 in speed_v_theoretical ( #13096 )
2025-11-04 11:28:18 -05:00
chenyu
1c9f720654
remove unused type ignore [pr] ( #13095 )
2025-11-04 10:08:07 -05:00
nimlgen
c857dc5af0
autogen: try/except in try_dlopen ( #13094 )
* autogen: try/except in try_dlopen
* ugh
2025-11-04 22:51:53 +08:00
nimlgen
eaf7cbc178
amd: flush sqtt after each kernel ( #13092 )
* amd: flush sqtt after each kernel
* merge for rgp
2025-11-04 22:12:48 +08:00
qazal
96417665e8
show sqtt decoder errs in viz ( #13088 )
* show sqtt decoder errs in viz
* don't touch roc.py
* give hljs a default language
* work from tinyr9
* work
2025-11-04 22:05:06 +08:00
nimlgen
49191ada77
roc: install sqtt decoder ( #13091 )
* roc: install?
* msg
* 0.1.4
2025-11-04 18:56:01 +08:00
nimlgen
16f1f644ba
amd: remove sqtt=2 ( #13090 )
2025-11-04 18:29:24 +08:00
nimlgen
2e97eaa866
roc: no nullptr when no wave instructions ( #13087 )
2025-11-04 17:32:14 +08:00
wozeparrot
9c00c0688a
tk fa: use 16x64 tiles ( #13086 )
2025-11-03 18:25:38 -08:00
wozeparrot
4ed0f216b5
fix: make max_matmul run again ( #13085 )
2025-11-03 18:09:09 -08:00
chenyu
ca17718b6d
remove symbolic_flat ( #13083 )
* remove symbolic_flat
some kernels are different but sometimes it's better so not clear, will merge as long as benchmark passes
* test_location
2025-11-03 17:25:21 -05:00
chenyu
fda720e013
simpler _is_balanced [pr] ( #13082 )
returns False earlier
2025-11-03 16:47:14 -05:00
chenyu
ddf01fdb15
revert mlperf.yml setting ( #13080 )
2025-11-03 15:24:13 -05:00
qazal
6df34a5887
lint sqtt parser with mypy ( #13079 )
* llvm address table errs
* mypy likes annotated dicts
* unwrap nullable
2025-11-04 00:53:59 +08:00
qazal
2d2040bc92
viz: tabulate sqtt ( #13078 )
* viz: tabulate sqtt
* nomore asdict
2025-11-04 00:03:15 +08:00
nimlgen
dfde3f54d9
rocprof: use llvm disasm ( #13077 )
* rocprof: use llvm disasm
* rm
2025-11-03 23:58:58 +08:00
qazal
27d42fd575
sqtt decoder print behind DEBUG>=5 ( #13076 )
* sqtt decoder print behind DEBUG>=5
* gfx version stuff also behind 5
2025-11-03 23:20:03 +08:00
George Hotz
416b15cc59
improve uop matmul syntax ( #13074 )
* improve uop matmul syntax
* store takes const
* copy
* cleanups
* faster and simpler
* label them reduce
* better syntax
* touchup
2025-11-03 21:34:26 +08:00
nimlgen
08855c162b
amd: correct sqtt_read for several xccs ( #13075 )
* amd: correct sqtt_read for several xccs
* default mask
2025-11-03 19:59:56 +08:00
qazal
1c0d4f1cd2
viz: counters loader ( #12987 )
* standalone custom loader
* first iteration on the ui
* work
* add center helper
* add edge offsets
* enumerate all edge types
* try dagre layout algorithm
* simpler spec
* bring back double edges
* more work on edge paths
* aesthetics
* custom edges also works
* dimmer inactive links
* cleanup
* cleanup
* split out the ncu layout
* this is just a k/v map now
* rm that
* more cleanup and comments
* do work
* also this work
* simpler start
* rm that
* sqtt work
* view sqtt
* sqtt
* --custom is just in profile
* wrap c call
* from tinygrad install
* eg. module not found
2025-11-03 19:42:36 +08:00
George Hotz
1e3d6e49a6
index slicing + allclose ( #13071 )
* continue work on slicing+allclose
* Revert "Revert "slicing + allclose""
This reverts commit 6c7a12f21c.
* fix tests + better syntax
* forgot an after
* slot is an integer
2025-11-03 13:01:48 +08:00
George Hotz
6c7a12f21c
Revert "slicing + allclose"
This reverts commit c9a1e35b1e.
2025-11-03 12:05:44 +08:00
George Hotz
c9a1e35b1e
slicing + allclose
2025-11-03 12:00:45 +08:00
chenyu
a317d6e625
extra/amdpci/setup_python_cap.sh ( #13070 )
2025-11-02 19:19:36 -05:00
chenyu
ad501ce50a
mlperf cron install tqdm ( #13069 )
one more...
2025-11-02 18:09:27 -05:00
chenyu
2c8d619147
mlperf cron install influxdb3-python ( #13068 )
2025-11-02 17:55:40 -05:00
chenyu
4c22f089fc
mlperf cron install tensorflow try 2 ( #13067 )
2025-11-02 17:11:01 -05:00
chenyu
c58cf91850
mlperf cron install tensorflow ( #13066 )
2025-11-02 16:48:05 -05:00