10852 Commits

Author SHA1 Message Date
George Hotz
e6806015a5 colors 2025-10-31 09:57:23 +08:00
George Hotz
40af34f9ab render fallback 2025-10-31 09:50:23 +08:00
George Hotz
be3fad06f4 more 2025-10-31 09:40:43 +08:00
George Hotz
540a11a850 var names 2025-10-31 09:31:47 +08:00
George Hotz
0a4c77e85f better variable names 2025-10-31 09:28:51 +08:00
George Hotz
2543ce7585 move that out 2025-10-31 09:12:38 +08:00
George Hotz
ba1d1142be remove mod 2025-10-31 09:02:35 +08:00
George Hotz
59ad5d51f5 cleanup amd uop matmul 2025-10-31 08:44:05 +08:00
chenyu
f6430a0559 add script for one slow openpilot conv (#12953)
* add script for one slow openpilot conv

* fix ruff
2025-10-30 18:08:41 -04:00
chenyu
73002ebffa print p.applied_opts with DEBUG >= 3 (#13024) 2025-10-30 16:51:21 -04:00
chenyu
99e76f33a0 remove unneeded TYPE_CHECKING [pr] (#13020) 2025-10-30 12:01:13 -04:00
nimlgen
629b177b66 amd: sqtt works in profile mode (#13019) 2025-10-30 23:48:52 +08:00
Sieds Lykles
4c8362128b New symbolic renderer + strip parens (#13017)
* new uop renderer

* better tester

* strip parens

* update tests

* split method check_uop_against_string

* use ctx.update instead of add_rendered method

* strip parens based on precedence

* update test

* new symbolic renderer

* add comment
2025-10-30 16:41:32 +01:00
chenyu
c78dfcc5a1 simplify ProgramSpec __post_init__ STORE/LOAD [pr] (#13018) 2025-10-30 11:13:21 -04:00
b1tg
363a201cc6 fp8 amd cstyle (#12999)
* amd fp8 cstyle

* don't repeat

* space

* lint

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-10-30 10:45:52 -04:00
nimlgen
5be3a93d02 amd: enable pmc on gfx12 (#13015) 2025-10-30 22:43:10 +08:00
nimlgen
cf5ab93b8e amd: pmc grbm block (#13016) 2025-10-30 22:42:59 +08:00
nimlgen
4d7a7096c9 am: enable perfmon (#13013)
* am: enable perfmon

* try

* msg
2025-10-30 22:28:36 +08:00
chenyu
985b6eb95f ues less typing.cast [pr] (#13002) 2025-10-30 09:29:52 -04:00
George Hotz
5eb87ab131 hotfix: bump cifar time to 350 2025-10-30 17:29:20 +08:00
George Hotz
4a741e8364 modernize amd uop matmul (#13011)
* modernize amd uop matmul

* progress

* comment

* more comments

* revert that

* mac cleanups

* fix estimates

* format
2025-10-30 17:02:38 +08:00
qazal
66ea3a0be4 put DEFINE_LOCAL counter in context (#13008) 2025-10-30 15:49:26 +08:00
George Hotz
e456f2cb1e more uop programs (#13007)
* more uop program

* test_matmul_relu

* tests fix
2025-10-30 14:57:59 +08:00
wozeparrot
c18b283f58 feat: timeout on stuck socket (#13009) 2025-10-29 23:11:26 -07:00
wozeparrot
92a87e37e4 fix: fetch_file (#13010) 2025-10-29 22:44:22 -07:00
George Hotz
e64d4b3b44 uops programs (#13005)
* uops programs

* work

* work

* more syntax

* more syntax

* comments
2025-10-30 12:28:10 +08:00
George Hotz
5894df059c hotfix: prevent inf loop if reduce splits 2025-10-30 11:21:40 +08:00
George Hotz
2da02f1ae1 add loads at the end (#12988)
* add loads at the end

* simpler

* late load

* tests passing

* fix matvec

* spec test passes

* fix where on load

* fix abs2

* fix more tests
2025-10-30 10:42:19 +08:00
nimlgen
4b001ec723 amd: pmc in mockgpu (#13000)
* amd: pmc in mockgpu

* fix

* do not open in ci
2025-10-30 01:52:02 +08:00
nimlgen
a6f5b1482e amd: perf counters (#12975)
* amd: perf counters

* sq

* cleaner

* fix

* if enabled

* ruff

* mypy

* counters

* reset

* fix

* no cpu
2025-10-30 00:10:31 +08:00
b1tg
457602b350 fix fp8 cast folding (#12997) 2025-10-29 09:27:42 -04:00
Sieds Lykles
70bce62c67 dont collapse possibly empty symbolic range (#12994)
* dont collapse a symbolic range based on min/max

* refactor z3 renderer

* include sink explicitely instead of dtypes.void

* use dtype.scalar()
2025-10-29 12:17:09 +01:00
Sieds Lykles
79903ae2be refactor z3 renderer (#12996)
* refactor z3 renderer

* include sink explicitely instead of dtypes.void

* use dtype.scalar()
2025-10-29 12:01:07 +01:00
George Hotz
819592ee67 hotfix: disable DoubleMatmul for PTX 2025-10-29 16:37:17 +08:00
George Hotz
30ca3f2af8 all double matmul (#12993)
* fix more double matmuls

* a few more

* all double matmul passes

* opts for flash attention

* fix spec

* comment
2025-10-29 16:25:27 +08:00
Sieds Lykles
9f39f6391c shared_codegen_spec and fix index spec (#12967)
* split shared_codegen_spec and fix index

* add VCONST to program_spec and move index to shared_codegen_spec

* working ignore_oob=0

* cleanup

* fix spec

* undo that

* move barrier and special earlier

* fix more spec issues

* more updates

* remove special from program_spec

* cleanup and fixes

* move more to shared

* special is not in shared_spec

* some comments

* dont do bounds check there
2025-10-29 09:14:11 +01:00
George Hotz
1c362736aa fix more double matmuls (#12991)
* fix more double matmuls

* a few more
2025-10-29 16:09:48 +08:00
George Hotz
e42b4edf8c remove if stuff (#12992) 2025-10-29 15:29:35 +08:00
George Hotz
8c47cf4323 pcontig double matmul works (#12899)
* pcontig double matmul works

* tests

* contract

* closer

* works-ish

* add that broadcast

* 2 more work

* something

* disable broken ones

* llvm

* align 16
2025-10-29 13:06:43 +08:00
George Hotz
35b6f4148d delete untested quantize (#12990) 2025-10-29 12:46:32 +08:00
Sieds Lykles
5ce8a1d2f2 Merge adjacent try all permutations for reduce (#12972) 2025-10-29 05:04:54 +01:00
George Hotz
b147e7e8e6 flatten bufferize (#12984)
* flatten bufferize

* simpler

* tests pass

* flat

* not flat
2025-10-29 11:23:43 +08:00
qazal
a7dac11aad viz: keep rewrite step in back button history (#12986) 2025-10-29 11:09:43 +08:00
qazal
37967fa17b viz: add integer query param helper and more typing (#12985)
* viz: query param helper

* json.dumps once
2025-10-29 10:44:01 +08:00
chenyu
fb53bdad5d unused propagate_invalid rules [pr] (#12983)
named is not used, so you know it never matched
2025-10-28 22:16:50 -04:00
chenyu
ef16e6c68c unwrap instead of cast [pr] (#12982) 2025-10-28 21:29:23 -04:00
chenyu
f55fcfecf9 ProgramSpec uops must end with SINK [pr] (#12981) 2025-10-28 17:12:22 -04:00
chenyu
9442442cb1 update variable names in search [pr] (#12979)
no lin nor linearize
2025-10-28 15:37:52 -04:00
wozeparrot
d66c997a39 feat: thunderkittens fa2 (#12955) 2025-10-28 11:27:45 -07:00
b1tg
bb307b9e81 fix fp8 vectorization (#12977)
* fix fp8 vectorization

* add fp8 tc to benchmark
2025-10-28 13:55:30 -04:00