George Hotz
1255eeec6d
confirm it works in amd_uop_matmul
2025-11-02 12:48:39 +08:00
George Hotz
a1f88fea37
move reshape to MathTraits
2025-11-02 12:39:56 +08:00
wozeparrot
8206eab4fc
fix: tk fa 4 workers ( #13052 )
2025-11-01 16:41:29 -07:00
Sieds Lykles
885b6dea9e
multiple reduce range arange folding ( #13047 )
...
* multi reduce arange folding
* add test
* cvar to var
* add circular_pad_bw test
2025-11-01 22:11:26 +01:00
Sieds Lykles
f97fb703c8
catch group error in matvec heuristic ( #13051 )
2025-11-01 22:09:35 +01:00
Sieds Lykles
ecb8565f67
Revert "Better cleanup of arange bufferize ( #13046 )" ( #13048 )
...
This reverts commit c99b7dfd4a.
2025-11-01 18:09:37 +01:00
Sieds Lykles
c99b7dfd4a
Better cleanup of arange bufferize ( #13046 )
...
* check for reduce and index instead of cast
* add test
2025-11-01 16:16:31 +01:00
nimlgen
051aab5481
open viz with sqtt flags ( #13001 )
2025-11-01 22:48:17 +08:00
nimlgen
2db57f3a97
amd: better msg when out of perf regs ( #13042 )
2025-11-01 22:47:50 +08:00
chenyu
bebec73471
write custom_sum with set and after ( #13045 )
2025-11-01 10:45:30 -04:00
George Hotz
e98506735b
add CONTRACT support to UOp programs ( #13043 )
...
* add contract support
* use contract
* 342 tflops
2025-11-01 19:11:32 +08:00
George Hotz
65a0a31475
AMD mi350x matmul from stream ( #13040 )
...
* works
* working mfma
* 120 TFLOPS
* regs
* 192 TFLOPS
* try pipelining
* something
* notes
* contract
* linter to 3.11
* that was a bug
2025-11-01 17:55:19 +08:00
chenyu
f396df26ea
test custom sum ( #13039 )
...
* test custom sum
this is higher level than set and after?
* only float
2025-10-31 19:25:56 -04:00
nimlgen
a23226e61e
amd: pmc for gfx9 ( #13036 )
...
* amd: pmc for gfx9
* xcc
* vmid mask
* ugh
* tiny
* minor
* sorryg
2025-11-01 04:26:34 +08:00
nimlgen
f6786c1bfd
autogen: py314 ( #13038 )
...
* autogen: py314
* bump py?
2025-11-01 04:02:19 +08:00
nimlgen
d532117df5
amd: rename set_grbm_se -> set_grbm_se_sh ( #13037 )
2025-11-01 01:37:57 +08:00
nimlgen
a9e5ffd3d1
amd: new pmc src ( #13034 )
2025-11-01 01:33:23 +08:00
Sieds Lykles
3dc593c536
add strip_params to pyrender ( #13021 )
...
* add strip_params to pyrender
* update that one too
* strip_parens fix
* cleaner
* add test
* add some more tests
* cleaner strip_parens
2025-10-31 14:15:56 +01:00
George Hotz
bc178d14a9
matmul example on metal showing off tensor core ( #13033 )
...
* matmul example on metal showing off tensor core
* flip the args of placeholder
* mat_idx
* imp
2025-10-31 19:40:36 +08:00
George Hotz
e066b3176b
hotfix: types and names for custom kernel test
2025-10-31 17:34:55 +08:00
George Hotz
54f48f93c6
working backward pass in custom kernel ( #13032 )
...
* working backward pass in custom kernel
* custom_kernel tensor method
* no SPEC=2
2025-10-31 17:26:18 +08:00
George Hotz
b791d70725
support custom UOp kernels ( #13028 )
...
* support custom UOp kernels
* no number
* multioutput works
* backward kernel runs
* move kernel class
* grad later
* work
* no tags in kernel graph
* test arange
* arange + contig
* delete comment
2025-10-31 15:51:39 +08:00
qazal
9f0c25ec48
viz: use indexing toggle for schedule graph ( #13031 )
2025-10-31 15:32:08 +08:00
George Hotz
b2caf4c2b3
prepare for custom kernel ( #13029 )
2025-10-31 14:47:37 +08:00
qazal
564e9ccc31
fix show indexing toggle default on ( #13030 )
2025-10-31 14:41:15 +08:00
qazal
6cd341354e
viz: add toggle to hide indexing UOps ( #13027 )
...
* start
* pass opts to worker
* works
* rename to showIndexing
* keep toggle through rewrites
* fix nan
* real fix for nan
* move render function
* fix firefox
* fix safari
* more work
2025-10-31 13:21:11 +08:00
George Hotz
b46229ca51
use shrink in amd_matmul_uop ( #13026 )
...
* use shrink in amd_matmul_uop
* colors
2025-10-31 10:43:41 +08:00
wozeparrot
78f7650eec
faster tk matmul ( #13006 )
2025-10-30 19:09:27 -07:00
George Hotz
512513c403
cleanup amd uop matmul ( #13025 )
...
* cleanup amd uop matmul
* remove mod
* move that out
* better variable names
* var names
* more
* render fallback
* colors
2025-10-31 10:04:45 +08:00
chenyu
f6430a0559
add script for one slow openpilot conv ( #12953 )
...
* add script for one slow openpilot conv
* fix ruff
2025-10-30 18:08:41 -04:00
chenyu
73002ebffa
print p.applied_opts with DEBUG >= 3 ( #13024 )
2025-10-30 16:51:21 -04:00
chenyu
99e76f33a0
remove unneeded TYPE_CHECKING [pr] ( #13020 )
2025-10-30 12:01:13 -04:00
nimlgen
629b177b66
amd: sqtt works in profile mode ( #13019 )
2025-10-30 23:48:52 +08:00
Sieds Lykles
4c8362128b
New symbolic renderer + strip parens ( #13017 )
...
* new uop renderer
* better tester
* strip parens
* update tests
* split method check_uop_against_string
* use ctx.update instead of add_rendered method
* strip parens based on precedence
* update test
* new symbolic renderer
* add comment
2025-10-30 16:41:32 +01:00
chenyu
c78dfcc5a1
simplify ProgramSpec __post_init__ STORE/LOAD [pr] ( #13018 )
2025-10-30 11:13:21 -04:00
b1tg
363a201cc6
fp8 amd cstyle ( #12999 )
...
* amd fp8 cstyle
* don't repeat
* space
* lint
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-10-30 10:45:52 -04:00
nimlgen
5be3a93d02
amd: enable pmc on gfx12 ( #13015 )
2025-10-30 22:43:10 +08:00
nimlgen
cf5ab93b8e
amd: pmc grbm block ( #13016 )
2025-10-30 22:42:59 +08:00
nimlgen
4d7a7096c9
am: enable perfmon ( #13013 )
...
* am: enable perfmon
* try
* msg
2025-10-30 22:28:36 +08:00
chenyu
985b6eb95f
use less typing.cast [pr] ( #13002 )
2025-10-30 09:29:52 -04:00
George Hotz
5eb87ab131
hotfix: bump cifar time to 350
2025-10-30 17:29:20 +08:00
George Hotz
4a741e8364
modernize amd uop matmul ( #13011 )
...
* modernize amd uop matmul
* progress
* comment
* more comments
* revert that
* mac cleanups
* fix estimates
* format
2025-10-30 17:02:38 +08:00
qazal
66ea3a0be4
put DEFINE_LOCAL counter in context ( #13008 )
2025-10-30 15:49:26 +08:00
George Hotz
e456f2cb1e
more uop programs ( #13007 )
...
* more uop program
* test_matmul_relu
* tests fix
2025-10-30 14:57:59 +08:00
wozeparrot
c18b283f58
feat: timeout on stuck socket ( #13009 )
2025-10-29 23:11:26 -07:00
wozeparrot
92a87e37e4
fix: fetch_file ( #13010 )
2025-10-29 22:44:22 -07:00
George Hotz
e64d4b3b44
uops programs ( #13005 )
...
* uops programs
* work
* work
* more syntax
* more syntax
* comments
2025-10-30 12:28:10 +08:00
George Hotz
5894df059c
hotfix: prevent inf loop if reduce splits
2025-10-30 11:21:40 +08:00
George Hotz
2da02f1ae1
add loads at the end ( #12988 )
...
* add loads at the end
* simpler
* late load
* tests passing
* fix matvec
* spec test passes
* fix where on load
* fix abs2
* fix more tests
2025-10-30 10:42:19 +08:00
nimlgen
4b001ec723
amd: pmc in mockgpu ( #13000 )
...
* amd: pmc in mockgpu
* fix
* do not open in ci
2025-10-30 01:52:02 +08:00