10877 Commits

Author SHA1 Message Date
George Hotz
6ffd33e1e5 fix later 2025-11-02 13:18:41 +08:00
George Hotz
4198efb8bc mixin 2025-11-02 13:05:26 +08:00
George Hotz
13e8914deb use Self type 2025-11-02 13:00:35 +08:00
George Hotz
8cbef912d2 move reshape to MathTraits (#13054)
* move reshape to MathTraits

* confirm it works in amd_uop_matmul
2025-11-02 12:56:15 +08:00
George Hotz
1ff341bae5 python 3.11 is now required (#13055) 2025-11-02 12:55:40 +08:00
George Hotz
267be7fc5e fp16 acc 2025-11-02 12:53:04 +08:00
wozeparrot
8206eab4fc fix: tk fa 4 workers (#13052) 2025-11-01 16:41:29 -07:00
Sieds Lykles
885b6dea9e multiple reduce range arange folding (#13047)
* multi reduce arange folding

* add test

* cvar to var

* add circular_pad_bw test
2025-11-01 22:11:26 +01:00
Sieds Lykles
f97fb703c8 catch group error in matvec heuristic (#13051) 2025-11-01 22:09:35 +01:00
Sieds Lykles
ecb8565f67 Revert "Better cleanup of arange bufferize (#13046)" (#13048)
This reverts commit c99b7dfd4a.
2025-11-01 18:09:37 +01:00
Sieds Lykles
c99b7dfd4a Better cleanup of arange bufferize (#13046)
* check for reduce and index instead of cast

* add test
2025-11-01 16:16:31 +01:00
nimlgen
051aab5481 open viz with sqtt flags (#13001) 2025-11-01 22:48:17 +08:00
nimlgen
2db57f3a97 amd: better msg when out of perf regs (#13042) 2025-11-01 22:47:50 +08:00
chenyu
bebec73471 write custom_sum with set and after (#13045) 2025-11-01 10:45:30 -04:00
George Hotz
e98506735b add CONTRACT support to UOp programs (#13043)
* add contract support

* use contract

* 342 tflops
2025-11-01 19:11:32 +08:00
George Hotz
65a0a31475 AMD mi350x matmul from stream (#13040)
* works

* working mfma

* 120 TFLOPS

* regs

* 192 TFLOPS

* try pipelining

* something

* notes

* contract

* linter to 3.11

* that was a bug
2025-11-01 17:55:19 +08:00
chenyu
f396df26ea test custom sum (#13039)
* test custom sum

this is higher level than set and after?

* only float
2025-10-31 19:25:56 -04:00
nimlgen
a23226e61e amd: pmc for gfx9 (#13036)
* amd: pmc for gfx9

* xcc

* vmid mask

* ugh

* tiny

* minor

* sorryg
2025-11-01 04:26:34 +08:00
nimlgen
f6786c1bfd autogen: py314 (#13038)
* autogen: py314

* bump py?
2025-11-01 04:02:19 +08:00
nimlgen
d532117df5 amd: rename set_grbm_se -> set_grbm_se_sh (#13037) 2025-11-01 01:37:57 +08:00
nimlgen
a9e5ffd3d1 amd: new pmc src (#13034) 2025-11-01 01:33:23 +08:00
Sieds Lykles
3dc593c536 add strip_params to pyrender (#13021)
* add strip_params to pyrender

* update that one too

* strip_parens fix

* cleaner

* add test

* add some more tests

* cleaner strip_parens
2025-10-31 14:15:56 +01:00
George Hotz
bc178d14a9 matmul example on metal showing off tensor core (#13033)
* matmul example on metal showing off tensor core

* flip the args of placeholder

* mat_idx

* imp
2025-10-31 19:40:36 +08:00
George Hotz
e066b3176b hotfix: types and names for custom kernel test 2025-10-31 17:34:55 +08:00
George Hotz
54f48f93c6 working backward pass in custom kernel (#13032)
* working backward pass in custom kernel

* custom_kernel tensor method

* no SPEC=2
2025-10-31 17:26:18 +08:00
George Hotz
b791d70725 support custom UOp kernels (#13028)
* support custom UOp kernels

* no number

* multioutput works

* backward kernel runs

* move kernel class

* grad later

* work

* no tags in kernel graph

* test arange

* arange + contig

* delete comment
2025-10-31 15:51:39 +08:00
qazal
9f0c25ec48 viz: use indexing toggle for schedule graph (#13031) 2025-10-31 15:32:08 +08:00
George Hotz
b2caf4c2b3 prepare for custom kernel (#13029) 2025-10-31 14:47:37 +08:00
qazal
564e9ccc31 fix show indexing toggle default on (#13030) 2025-10-31 14:41:15 +08:00
qazal
6cd341354e viz: add toggle to hide indexing UOps (#13027)
* start

* pass opts to worker

* works

* rename to showIndexing

* keep toggle through rewrites

* fix nan

* real fix for nan

* move render function

* fix firefox

* fix safari

* more work
2025-10-31 13:21:11 +08:00
George Hotz
b46229ca51 use shrink in amd_matmul_uop (#13026)
* use shrink in amd_matmul_uop

* colors
2025-10-31 10:43:41 +08:00
wozeparrot
78f7650eec faster tk matmul (#13006) 2025-10-30 19:09:27 -07:00
George Hotz
512513c403 cleanup amd uop matmul (#13025)
* cleanup amd uop matmul

* remove mod

* move that out

* better variable names

* var names

* more

* render fallback

* colors
2025-10-31 10:04:45 +08:00
chenyu
f6430a0559 add script for one slow openpilot conv (#12953)
* add script for one slow openpilot conv

* fix ruff
2025-10-30 18:08:41 -04:00
chenyu
73002ebffa print p.applied_opts with DEBUG >= 3 (#13024) 2025-10-30 16:51:21 -04:00
chenyu
99e76f33a0 remove unneeded TYPE_CHECKING [pr] (#13020) 2025-10-30 12:01:13 -04:00
nimlgen
629b177b66 amd: sqtt works in profile mode (#13019) 2025-10-30 23:48:52 +08:00
Sieds Lykles
4c8362128b New symbolic renderer + strip parens (#13017)
* new uop renderer

* better tester

* strip parens

* update tests

* split method check_uop_against_string

* use ctx.update instead of add_rendered method

* strip parens based on precedence

* update test

* new symbolic renderer

* add comment
2025-10-30 16:41:32 +01:00
chenyu
c78dfcc5a1 simplify ProgramSpec __post_init__ STORE/LOAD [pr] (#13018) 2025-10-30 11:13:21 -04:00
b1tg
363a201cc6 fp8 amd cstyle (#12999)
* amd fp8 cstyle

* don't repeat

* space

* lint

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-10-30 10:45:52 -04:00
nimlgen
5be3a93d02 amd: enable pmc on gfx12 (#13015) 2025-10-30 22:43:10 +08:00
nimlgen
cf5ab93b8e amd: pmc grbm block (#13016) 2025-10-30 22:42:59 +08:00
nimlgen
4d7a7096c9 am: enable perfmon (#13013)
* am: enable perfmon

* try

* msg
2025-10-30 22:28:36 +08:00
chenyu
985b6eb95f use less typing.cast [pr] (#13002) 2025-10-30 09:29:52 -04:00
George Hotz
5eb87ab131 hotfix: bump cifar time to 350 2025-10-30 17:29:20 +08:00
George Hotz
4a741e8364 modernize amd uop matmul (#13011)
* modernize amd uop matmul

* progress

* comment

* more comments

* revert that

* mac cleanups

* fix estimates

* format
2025-10-30 17:02:38 +08:00
qazal
66ea3a0be4 put DEFINE_LOCAL counter in context (#13008) 2025-10-30 15:49:26 +08:00
George Hotz
e456f2cb1e more uop programs (#13007)
* more uop program

* test_matmul_relu

* tests fix
2025-10-30 14:57:59 +08:00
wozeparrot
c18b283f58 feat: timeout on stuck socket (#13009) 2025-10-29 23:11:26 -07:00
wozeparrot
92a87e37e4 fix: fetch_file (#13010) 2025-10-29 22:44:22 -07:00