chenyu
ca17718b6d
remove symbolic_flat ( #13083 )
...
* remove symbolic_flat
some kernels are different but sometimes it's better so not clear, will merge as long as benchmark passes
* test_location
2025-11-03 17:25:21 -05:00
chenyu
fda720e013
simpler _is_balanced [pr] ( #13082 )
...
returns False earlier
2025-11-03 16:47:14 -05:00
chenyu
ddf01fdb15
revert mlperf.yml setting ( #13080 )
2025-11-03 15:24:13 -05:00
qazal
6df34a5887
lint sqtt parser with mypy ( #13079 )
...
* llvm address table errs
* mypy likes annotated dicts
* unwrap nullable
2025-11-04 00:53:59 +08:00
qazal
2d2040bc92
viz: tabulate sqtt ( #13078 )
...
* viz: tabulate sqtt
* nomore asdict
2025-11-04 00:03:15 +08:00
nimlgen
dfde3f54d9
rocprof: use llvm disasm ( #13077 )
...
* rocprof: use llvm disasm
* rm
2025-11-03 23:58:58 +08:00
qazal
27d42fd575
sqtt decoder print behind DEBUG>=5 ( #13076 )
...
* sqtt decoder print behind DEBUG>=5
* gfx version stuff also behind 5
2025-11-03 23:20:03 +08:00
George Hotz
416b15cc59
improve uop matmul syntax ( #13074 )
...
* improve uop matmul syntax
* store takes const
* copy
* cleanups
* faster and simpler
* label them reduce
* better syntax
* touchup
2025-11-03 21:34:26 +08:00
nimlgen
08855c162b
amd: correct sqtt_read for several xccs ( #13075 )
...
* amd: correct sqtt_read for several xccs
* default mask
2025-11-03 19:59:56 +08:00
qazal
1c0d4f1cd2
viz: counters loader ( #12987 )
...
* standalone custom loader
* first iteration on the ui
* work
* add center helper
* add edge offsets
* enumerate all edge types
* try dagre layout algorithm
* simpler spec
* bring back double edges
* more work on edge paths
* aesthetics
* custom edges also works
* dimmer inactive links
* cleanup
* cleanup
* split out the ncu layout
* this is just a k/v map now
* rm that
* more cleanup and comments
* do work
* also this work
* simpler start
* rm that
* sqtt work
* view sqtt
* sqtt
* --custom is just in profile
* wrap c call
* from tinygrad install
* eg. module not found
2025-11-03 19:42:36 +08:00
George Hotz
1e3d6e49a6
index slicing + allclose ( #13071 )
...
* continue work on slicing+allclose
* Revert "Revert "slicing + allclose""
This reverts commit 6c7a12f21c.
* fix tests + better syntax
* forgot an after
* slot is an integer
2025-11-03 13:01:48 +08:00
George Hotz
6c7a12f21c
Revert "slicing + allclose"
...
This reverts commit c9a1e35b1e.
2025-11-03 12:05:44 +08:00
George Hotz
c9a1e35b1e
slicing + allclose
2025-11-03 12:00:45 +08:00
chenyu
a317d6e625
extra/amdpci/setup_python_cap.sh ( #13070 )
2025-11-02 19:19:36 -05:00
chenyu
ad501ce50a
mlperf cron install tqdm ( #13069 )
...
one more...
2025-11-02 18:09:27 -05:00
chenyu
2c8d619147
mlperf cron install influxdb3-python ( #13068 )
2025-11-02 17:55:40 -05:00
chenyu
4c22f089fc
mlperf cron install tensorflow try 2 ( #13067 )
2025-11-02 17:11:01 -05:00
chenyu
c58cf91850
mlperf cron install tensorflow ( #13066 )
2025-11-02 16:48:05 -05:00
chenyu
74db65cf72
update mlperf bert LOGMLPERF ( #13065 )
2025-11-02 15:26:37 -05:00
chenyu
b18293de96
train bert in mlperf cron ( #13064 )
...
more relevant now
2025-11-02 15:04:02 -05:00
nimlgen
be0028d3ce
amd: universal set_grbm ( #13062 )
...
* amd: universal set_grbm
* fix
2025-11-03 03:35:55 +08:00
nimlgen
37a730abce
amd: fix pmc sq gfx11+ ( #13058 )
...
* amd: fix pmc sq gfx11+
* fix
2025-11-02 21:56:47 +08:00
qazal
24054bb655
viz: check overlay width after layout ( #13060 )
2025-11-02 21:47:58 +08:00
George Hotz
962d980919
fuse hasn't worked since rangeify, remove it ( #13057 )
2025-11-02 14:01:52 +08:00
George Hotz
036ee9f84c
Self type + mixins ( #13056 )
...
* use Self type
* mixin
* fix later
2025-11-02 13:30:01 +08:00
George Hotz
8cbef912d2
move reshape to MathTraits ( #13054 )
...
* move reshape to MathTraits
* confirm it works in amd_uop_matmul
2025-11-02 12:56:15 +08:00
George Hotz
1ff341bae5
python 3.11 is now required ( #13055 )
2025-11-02 12:55:40 +08:00
George Hotz
267be7fc5e
fp16 acc
2025-11-02 12:53:04 +08:00
wozeparrot
8206eab4fc
fix: tk fa 4 workers ( #13052 )
2025-11-01 16:41:29 -07:00
Sieds Lykles
885b6dea9e
multiple reduce range arange folding ( #13047 )
...
* multi reduce arange folding
* add test
* cvar to var
* add circular_pad_bw test
2025-11-01 22:11:26 +01:00
Sieds Lykles
f97fb703c8
catch group error in matvec heuristic ( #13051 )
2025-11-01 22:09:35 +01:00
Sieds Lykles
ecb8565f67
Revert "Better cleanup of arange bufferize ( #13046 )" ( #13048 )
...
This reverts commit c99b7dfd4a.
2025-11-01 18:09:37 +01:00
Sieds Lykles
c99b7dfd4a
Better cleanup of arange bufferize ( #13046 )
...
* check for reduce and index instead of cast
* add test
2025-11-01 16:16:31 +01:00
nimlgen
051aab5481
open viz with sqtt flags ( #13001 )
2025-11-01 22:48:17 +08:00
nimlgen
2db57f3a97
amd: better msg when out of perf regs ( #13042 )
2025-11-01 22:47:50 +08:00
chenyu
bebec73471
write custom_sum with set and after ( #13045 )
2025-11-01 10:45:30 -04:00
George Hotz
e98506735b
add CONTRACT support to UOp programs ( #13043 )
...
* add contract support
* use contract
* 342 tflops
2025-11-01 19:11:32 +08:00
George Hotz
65a0a31475
AMD mi350x matmul from stream ( #13040 )
...
* works
* working mfma
* 120 TFLOPS
* regs
* 192 TFLOPS
* try pipelining
* something
* notes
* contract
* linter to 3.11
* that was a bug
2025-11-01 17:55:19 +08:00
chenyu
f396df26ea
test custom sum ( #13039 )
...
* test custom sum
this is higher level than set and after?
* only float
2025-10-31 19:25:56 -04:00
nimlgen
a23226e61e
amd: pmc for gfx9 ( #13036 )
...
* amd: pmc for gfx9
* xcc
* vmid mask
* ugh
* tiny
* minor
* sorryg
2025-11-01 04:26:34 +08:00
nimlgen
f6786c1bfd
autogen: py314 ( #13038 )
...
* autogen: py314
* bump py?
2025-11-01 04:02:19 +08:00
nimlgen
d532117df5
amd: rename set_grbm_se -> set_grbm_se_sh ( #13037 )
2025-11-01 01:37:57 +08:00
nimlgen
a9e5ffd3d1
amd: new pmc src ( #13034 )
2025-11-01 01:33:23 +08:00
Sieds Lykles
3dc593c536
add strip_params to pyrender ( #13021 )
...
* add strip_params to pyrender
* update that one too
* strip_parens fix
* cleaner
* add test
* add some more tests
* cleaner strip_parens
2025-10-31 14:15:56 +01:00
George Hotz
bc178d14a9
matmul example on metal showing off tensor core ( #13033 )
...
* matmul example on metal showing off tensor core
* flip the args of placeholder
* mat_idx
* imp
2025-10-31 19:40:36 +08:00
George Hotz
e066b3176b
hotfix: types and names for custom kernel test
2025-10-31 17:34:55 +08:00
George Hotz
54f48f93c6
working backward pass in custom kernel ( #13032 )
...
* working backward pass in custom kernel
* custom_kernel tensor method
* no SPEC=2
2025-10-31 17:26:18 +08:00
George Hotz
b791d70725
support custom UOp kernels ( #13028 )
...
* support custom UOp kernels
* no number
* multioutput works
* backward kernel runs
* move kernel class
* grad later
* work
* no tags in kernel graph
* test arange
* arange + contig
* delete comment
2025-10-31 15:51:39 +08:00
qazal
9f0c25ec48
viz: use indexing toggle for schedule graph ( #13031 )
2025-10-31 15:32:08 +08:00
George Hotz
b2caf4c2b3
prepare for custom kernel ( #13029 )
2025-10-31 14:47:37 +08:00