11106 Commits

Author SHA1 Message Date
Sieds Lykles
3dc593c536 add strip_params to pyrender (#13021)
* add strip_params to pyrender

* update that one too

* strip_parens fix

* cleaner

* add test

* add some more tests

* cleaner strip_parens
2025-10-31 14:15:56 +01:00
George Hotz
bc178d14a9 matmul example on metal showing off tensor core (#13033)
* matmul example on metal showing off tensor core

* flip the args of placeholder

* mat_idx

* imp
2025-10-31 19:40:36 +08:00
George Hotz
e066b3176b hotfix: types and names for custom kernel test 2025-10-31 17:34:55 +08:00
George Hotz
54f48f93c6 working backward pass in custom kernel (#13032)
* working backward pass in custom kernel

* custom_kernel tensor method

* no SPEC=2
2025-10-31 17:26:18 +08:00
George Hotz
b791d70725 support custom UOp kernels (#13028)
* support custom UOp kernels

* no number

* multioutput works

* backward kernel runs

* move kernel class

* grad later

* work

* no tags in kernel graph

* test arange

* arange + contig

* delete comment
2025-10-31 15:51:39 +08:00
qazal
9f0c25ec48 viz: use indexing toggle for schedule graph (#13031) 2025-10-31 15:32:08 +08:00
George Hotz
b2caf4c2b3 prepare for custom kernel (#13029) 2025-10-31 14:47:37 +08:00
qazal
564e9ccc31 fix show indexing toggle default on (#13030) 2025-10-31 14:41:15 +08:00
qazal
6cd341354e viz: add toggle to hide indexing UOps (#13027)
* start

* pass opts to worker

* works

* rename to showIndexing

* keep toggle through rewrites

* fix nan

* real fix for nan

* move render function

* fix firefox

* fix safari

* more work
2025-10-31 13:21:11 +08:00
George Hotz
b46229ca51 use shrink in amd_matmul_uop (#13026)
* use shrink in amd_matmul_uop

* colors
2025-10-31 10:43:41 +08:00
wozeparrot
78f7650eec faster tk matmul (#13006) 2025-10-30 19:09:27 -07:00
George Hotz
512513c403 cleanup amd uop matmul (#13025)
* cleanup amd uop matmul

* remove mod

* move that out

* better variable names

* var names

* more

* render fallback

* colors
2025-10-31 10:04:45 +08:00
chenyu
f6430a0559 add script for one slow openpilot conv (#12953)
* add script for one slow openpilot conv

* fix ruff
2025-10-30 18:08:41 -04:00
chenyu
73002ebffa print p.applied_opts with DEBUG >= 3 (#13024) 2025-10-30 16:51:21 -04:00
chenyu
99e76f33a0 remove unneeded TYPE_CHECKING [pr] (#13020) 2025-10-30 12:01:13 -04:00
nimlgen
629b177b66 amd: sqtt works in profile mode (#13019) 2025-10-30 23:48:52 +08:00
Sieds Lykles
4c8362128b New symbolic renderer + strip parens (#13017)
* new uop renderer

* better tester

* strip parens

* update tests

* split method check_uop_against_string

* use ctx.update instead of add_rendered method

* strip parens based on precedence

* update test

* new symbolic renderer

* add comment
2025-10-30 16:41:32 +01:00
chenyu
c78dfcc5a1 simplify ProgramSpec __post_init__ STORE/LOAD [pr] (#13018) 2025-10-30 11:13:21 -04:00
b1tg
363a201cc6 fp8 amd cstyle (#12999)
* amd fp8 cstyle

* don't repeat

* space

* lint

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-10-30 10:45:52 -04:00
nimlgen
5be3a93d02 amd: enable pmc on gfx12 (#13015) 2025-10-30 22:43:10 +08:00
nimlgen
cf5ab93b8e amd: pmc grbm block (#13016) 2025-10-30 22:42:59 +08:00
nimlgen
4d7a7096c9 am: enable perfmon (#13013)
* am: enable perfmon

* try

* msg
2025-10-30 22:28:36 +08:00
chenyu
985b6eb95f ues less typing.cast [pr] (#13002) 2025-10-30 09:29:52 -04:00
George Hotz
5eb87ab131 hotfix: bump cifar time to 350 2025-10-30 17:29:20 +08:00
George Hotz
4a741e8364 modernize amd uop matmul (#13011)
* modernize amd uop matmul

* progress

* comment

* more comments

* revert that

* mac cleanups

* fix estimates

* format
2025-10-30 17:02:38 +08:00
qazal
66ea3a0be4 put DEFINE_LOCAL counter in context (#13008) 2025-10-30 15:49:26 +08:00
George Hotz
e456f2cb1e more uop programs (#13007)
* more uop program

* test_matmul_relu

* tests fix
2025-10-30 14:57:59 +08:00
wozeparrot
c18b283f58 feat: timeout on stuck socket (#13009) 2025-10-29 23:11:26 -07:00
wozeparrot
92a87e37e4 fix: fetch_file (#13010) 2025-10-29 22:44:22 -07:00
George Hotz
e64d4b3b44 uops programs (#13005)
* uops programs

* work

* work

* more syntax

* more syntax

* comments
2025-10-30 12:28:10 +08:00
George Hotz
5894df059c hotfix: prevent inf loop if reduce splits 2025-10-30 11:21:40 +08:00
George Hotz
2da02f1ae1 add loads at the end (#12988)
* add loads at the end

* simpler

* late load

* tests passing

* fix matvec

* spec test passes

* fix where on load

* fix abs2

* fix more tests
2025-10-30 10:42:19 +08:00
nimlgen
4b001ec723 amd: pmc in mockgpu (#13000)
* amd: pmc in mockgpu

* fix

* do not open in ci
2025-10-30 01:52:02 +08:00
nimlgen
a6f5b1482e amd: perf counters (#12975)
* amd: perf counters

* sq

* cleaner

* fix

* if enabled

* ruff

* mypy

* counters

* reset

* fix

* no cpu
2025-10-30 00:10:31 +08:00
b1tg
457602b350 fix fp8 cast folding (#12997) 2025-10-29 09:27:42 -04:00
Sieds Lykles
70bce62c67 dont collapse possibly empty symbolic range (#12994)
* dont collapse a symbolic range based on min/max

* refactor z3 renderer

* include sink explicitely instead of dtypes.void

* use dtype.scalar()
2025-10-29 12:17:09 +01:00
Sieds Lykles
79903ae2be refactor z3 renderer (#12996)
* refactor z3 renderer

* include sink explicitely instead of dtypes.void

* use dtype.scalar()
2025-10-29 12:01:07 +01:00
George Hotz
819592ee67 hotfix: disable DoubleMatmul for PTX 2025-10-29 16:37:17 +08:00
George Hotz
30ca3f2af8 all double matmul (#12993)
* fix more double matmuls

* a few more

* all double matmul passes

* opts for flash attention

* fix spec

* comment
2025-10-29 16:25:27 +08:00
Sieds Lykles
9f39f6391c shared_codegen_spec and fix index spec (#12967)
* split shared_codegen_spec and fix index

* add VCONST to program_spec and move index to shared_codegen_spec

* working ignore_oob=0

* cleanup

* fix spec

* undo that

* move barrier and special earlier

* fix more spec issues

* more updates

* remove special from program_spec

* cleanup and fixes

* move more to shared

* special is not in shared_spec

* some comments

* dont do bounds check there
2025-10-29 09:14:11 +01:00
George Hotz
1c362736aa fix more double matmuls (#12991)
* fix more double matmuls

* a few more
2025-10-29 16:09:48 +08:00
George Hotz
e42b4edf8c remove if stuff (#12992) 2025-10-29 15:29:35 +08:00
George Hotz
8c47cf4323 pcontig double matmul works (#12899)
* pcontig double matmul works

* tests

* contract

* closer

* works-ish

* add that broadcast

* 2 more work

* something

* disable broken ones

* llvm

* align 16
2025-10-29 13:06:43 +08:00
George Hotz
35b6f4148d delete untested quantize (#12990) 2025-10-29 12:46:32 +08:00
Sieds Lykles
5ce8a1d2f2 Merge adjacent try all permutations for reduce (#12972) 2025-10-29 05:04:54 +01:00
George Hotz
b147e7e8e6 flatten bufferize (#12984)
* flatten bufferize

* simpler

* tests pass

* flat

* not flat
2025-10-29 11:23:43 +08:00
qazal
a7dac11aad viz: keep rewrite step in back button history (#12986) 2025-10-29 11:09:43 +08:00
qazal
37967fa17b viz: add integer query param helper and more typing (#12985)
* viz: query param helper

* json.dumps once
2025-10-29 10:44:01 +08:00
chenyu
fb53bdad5d unused propagate_invalid rules [pr] (#12983)
named is not used, so you know it never matched
2025-10-28 22:16:50 -04:00
chenyu
ef16e6c68c unwrap instead of cast [pr] (#12982) 2025-10-28 21:29:23 -04:00