Sieds Lykles
3dc593c536
add strip_params to pyrender ( #13021 )
...
* add strip_params to pyrender
* update that one too
* strip_parens fix
* cleaner
* add test
* add some more tests
* cleaner strip_parens
2025-10-31 14:15:56 +01:00
George Hotz
bc178d14a9
matmul example on metal showing off tensor core ( #13033 )
...
* matmul example on metal showing off tensor core
* flip the args of placeholder
* mat_idx
* imp
2025-10-31 19:40:36 +08:00
George Hotz
e066b3176b
hotfix: types and names for custom kernel test
2025-10-31 17:34:55 +08:00
George Hotz
54f48f93c6
working backward pass in custom kernel ( #13032 )
...
* working backward pass in custom kernel
* custom_kernel tensor method
* no SPEC=2
2025-10-31 17:26:18 +08:00
George Hotz
b791d70725
support custom UOp kernels ( #13028 )
...
* support custom UOp kernels
* no number
* multioutput works
* backward kernel runs
* move kernel class
* grad later
* work
* no tags in kernel graph
* test arange
* arange + contig
* delete comment
2025-10-31 15:51:39 +08:00
qazal
9f0c25ec48
viz: use indexing toggle for schedule graph ( #13031 )
2025-10-31 15:32:08 +08:00
George Hotz
b2caf4c2b3
prepare for custom kernel ( #13029 )
2025-10-31 14:47:37 +08:00
qazal
564e9ccc31
fix show indexing toggle default on ( #13030 )
2025-10-31 14:41:15 +08:00
qazal
6cd341354e
viz: add toggle to hide indexing UOps ( #13027 )
...
* start
* pass opts to worker
* works
* rename to showIndexing
* keep toggle through rewrites
* fix nan
* real fix for nan
* move render function
* fix firefox
* fix safari
* more work
2025-10-31 13:21:11 +08:00
George Hotz
b46229ca51
use shrink in amd_matmul_uop ( #13026 )
...
* use shrink in amd_matmul_uop
* colors
2025-10-31 10:43:41 +08:00
wozeparrot
78f7650eec
faster tk matmul ( #13006 )
2025-10-30 19:09:27 -07:00
George Hotz
512513c403
cleanup amd uop matmul ( #13025 )
...
* cleanup amd uop matmul
* remove mod
* move that out
* better variable names
* var names
* more
* render fallback
* colors
2025-10-31 10:04:45 +08:00
chenyu
f6430a0559
add script for one slow openpilot conv ( #12953 )
...
* add script for one slow openpilot conv
* fix ruff
2025-10-30 18:08:41 -04:00
chenyu
73002ebffa
print p.applied_opts with DEBUG >= 3 ( #13024 )
2025-10-30 16:51:21 -04:00
chenyu
99e76f33a0
remove unneeded TYPE_CHECKING [pr] ( #13020 )
2025-10-30 12:01:13 -04:00
nimlgen
629b177b66
amd: sqtt works in profile mode ( #13019 )
2025-10-30 23:48:52 +08:00
Sieds Lykles
4c8362128b
New symbolic renderer + strip parens ( #13017 )
...
* new uop renderer
* better tester
* strip parens
* update tests
* split method check_uop_against_string
* use ctx.update instead of add_rendered method
* strip parens based on precedence
* update test
* new symbolic renderer
* add comment
2025-10-30 16:41:32 +01:00
chenyu
c78dfcc5a1
simplify ProgramSpec __post_init__ STORE/LOAD [pr] ( #13018 )
2025-10-30 11:13:21 -04:00
b1tg
363a201cc6
fp8 amd cstyle ( #12999 )
...
* amd fp8 cstyle
* don't repeat
* space
* lint
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-10-30 10:45:52 -04:00
nimlgen
5be3a93d02
amd: enable pmc on gfx12 ( #13015 )
2025-10-30 22:43:10 +08:00
nimlgen
cf5ab93b8e
amd: pmc grbm block ( #13016 )
2025-10-30 22:42:59 +08:00
nimlgen
4d7a7096c9
am: enable perfmon ( #13013 )
...
* am: enable perfmon
* try
* msg
2025-10-30 22:28:36 +08:00
chenyu
985b6eb95f
use less typing.cast [pr] ( #13002 )
2025-10-30 09:29:52 -04:00
George Hotz
5eb87ab131
hotfix: bump cifar time to 350
2025-10-30 17:29:20 +08:00
George Hotz
4a741e8364
modernize amd uop matmul ( #13011 )
...
* modernize amd uop matmul
* progress
* comment
* more comments
* revert that
* mac cleanups
* fix estimates
* format
2025-10-30 17:02:38 +08:00
qazal
66ea3a0be4
put DEFINE_LOCAL counter in context ( #13008 )
2025-10-30 15:49:26 +08:00
George Hotz
e456f2cb1e
more uop programs ( #13007 )
...
* more uop program
* test_matmul_relu
* tests fix
2025-10-30 14:57:59 +08:00
wozeparrot
c18b283f58
feat: timeout on stuck socket ( #13009 )
2025-10-29 23:11:26 -07:00
wozeparrot
92a87e37e4
fix: fetch_file ( #13010 )
2025-10-29 22:44:22 -07:00
George Hotz
e64d4b3b44
uops programs ( #13005 )
...
* uops programs
* work
* work
* more syntax
* more syntax
* comments
2025-10-30 12:28:10 +08:00
George Hotz
5894df059c
hotfix: prevent inf loop if reduce splits
2025-10-30 11:21:40 +08:00
George Hotz
2da02f1ae1
add loads at the end ( #12988 )
...
* add loads at the end
* simpler
* late load
* tests passing
* fix matvec
* spec test passes
* fix where on load
* fix abs2
* fix more tests
2025-10-30 10:42:19 +08:00
nimlgen
4b001ec723
amd: pmc in mockgpu ( #13000 )
...
* amd: pmc in mockgpu
* fix
* do not open in ci
2025-10-30 01:52:02 +08:00
nimlgen
a6f5b1482e
amd: perf counters ( #12975 )
...
* amd: perf counters
* sq
* cleaner
* fix
* if enabled
* ruff
* mypy
* counters
* reset
* fix
* no cpu
2025-10-30 00:10:31 +08:00
b1tg
457602b350
fix fp8 cast folding ( #12997 )
2025-10-29 09:27:42 -04:00
Sieds Lykles
70bce62c67
dont collapse possibly empty symbolic range ( #12994 )
...
* dont collapse a symbolic range based on min/max
* refactor z3 renderer
* include sink explicitly instead of dtypes.void
* use dtype.scalar()
2025-10-29 12:17:09 +01:00
Sieds Lykles
79903ae2be
refactor z3 renderer ( #12996 )
...
* refactor z3 renderer
* include sink explicitly instead of dtypes.void
* use dtype.scalar()
2025-10-29 12:01:07 +01:00
George Hotz
819592ee67
hotfix: disable DoubleMatmul for PTX
2025-10-29 16:37:17 +08:00
George Hotz
30ca3f2af8
all double matmul ( #12993 )
...
* fix more double matmuls
* a few more
* all double matmul passes
* opts for flash attention
* fix spec
* comment
2025-10-29 16:25:27 +08:00
Sieds Lykles
9f39f6391c
shared_codegen_spec and fix index spec ( #12967 )
...
* split shared_codegen_spec and fix index
* add VCONST to program_spec and move index to shared_codegen_spec
* working ignore_oob=0
* cleanup
* fix spec
* undo that
* move barrier and special earlier
* fix more spec issues
* more updates
* remove special from program_spec
* cleanup and fixes
* move more to shared
* special is not in shared_spec
* some comments
* dont do bounds check there
2025-10-29 09:14:11 +01:00
George Hotz
1c362736aa
fix more double matmuls ( #12991 )
...
* fix more double matmuls
* a few more
2025-10-29 16:09:48 +08:00
George Hotz
e42b4edf8c
remove if stuff ( #12992 )
2025-10-29 15:29:35 +08:00
George Hotz
8c47cf4323
pcontig double matmul works ( #12899 )
...
* pcontig double matmul works
* tests
* contract
* closer
* works-ish
* add that broadcast
* 2 more work
* something
* disable broken ones
* llvm
* align 16
2025-10-29 13:06:43 +08:00
George Hotz
35b6f4148d
delete untested quantize ( #12990 )
2025-10-29 12:46:32 +08:00
Sieds Lykles
5ce8a1d2f2
Merge adjacent try all permutations for reduce ( #12972 )
2025-10-29 05:04:54 +01:00
George Hotz
b147e7e8e6
flatten bufferize ( #12984 )
...
* flatten bufferize
* simpler
* tests pass
* flat
* not flat
2025-10-29 11:23:43 +08:00
qazal
a7dac11aad
viz: keep rewrite step in back button history ( #12986 )
2025-10-29 11:09:43 +08:00
qazal
37967fa17b
viz: add integer query param helper and more typing ( #12985 )
...
* viz: query param helper
* json.dumps once
2025-10-29 10:44:01 +08:00
chenyu
fb53bdad5d
unused propagate_invalid rules [pr] ( #12983 )
...
named is not used, so you know it never matched
2025-10-28 22:16:50 -04:00
chenyu
ef16e6c68c
unwrap instead of cast [pr] ( #12982 )
2025-10-28 21:29:23 -04:00