10840 Commits

Author SHA1 Message Date
George Hotz
f0b9416432 Merge branch 'master' into amd_uop 2025-10-30 16:54:39 +08:00
George Hotz
b1db99cefe format 2025-10-30 16:53:21 +08:00
George Hotz
de28eaa610 fix estimates 2025-10-30 16:48:46 +08:00
George Hotz
f3da3c9be5 mac cleanups 2025-10-30 16:45:01 +08:00
George Hotz
c43f22b143 revert that 2025-10-30 16:35:33 +08:00
George Hotz
34e631eb26 more comments 2025-10-30 08:24:26 +00:00
George Hotz
16983e9c95 comment 2025-10-30 08:04:57 +00:00
qazal
66ea3a0be4 put DEFINE_LOCAL counter in context (#13008) 2025-10-30 15:49:26 +08:00
George Hotz
3eb00a421f progress 2025-10-30 07:43:03 +00:00
George Hotz
b54493b003 modernize amd uop matmul 2025-10-30 07:10:00 +00:00
George Hotz
e456f2cb1e more uop programs (#13007)
* more uop program

* test_matmul_relu

* tests fix
2025-10-30 14:57:59 +08:00
wozeparrot
c18b283f58 feat: timeout on stuck socket (#13009) 2025-10-29 23:11:26 -07:00
wozeparrot
92a87e37e4 fix: fetch_file (#13010) 2025-10-29 22:44:22 -07:00
George Hotz
e64d4b3b44 uops programs (#13005)
* uops programs

* work

* work

* more syntax

* more syntax

* comments
2025-10-30 12:28:10 +08:00
George Hotz
5894df059c hotfix: prevent inf loop if reduce splits 2025-10-30 11:21:40 +08:00
George Hotz
2da02f1ae1 add loads at the end (#12988)
* add loads at the end

* simpler

* late load

* tests passing

* fix matvec

* spec test passes

* fix where on load

* fix abs2

* fix more tests
2025-10-30 10:42:19 +08:00
nimlgen
4b001ec723 amd: pmc in mockgpu (#13000)
* amd: pmc in mockgpu

* fix

* do not open in ci
2025-10-30 01:52:02 +08:00
nimlgen
a6f5b1482e amd: perf counters (#12975)
* amd: perf counters

* sq

* cleaner

* fix

* if enabled

* ruff

* mypy

* counters

* reset

* fix

* no cpu
2025-10-30 00:10:31 +08:00
b1tg
457602b350 fix fp8 cast folding (#12997) 2025-10-29 09:27:42 -04:00
Sieds Lykles
70bce62c67 dont collapse possibly empty symbolic range (#12994)
* dont collapse a symbolic range based on min/max

* refactor z3 renderer

* include sink explicitely instead of dtypes.void

* use dtype.scalar()
2025-10-29 12:17:09 +01:00
Sieds Lykles
79903ae2be refactor z3 renderer (#12996)
* refactor z3 renderer

* include sink explicitely instead of dtypes.void

* use dtype.scalar()
2025-10-29 12:01:07 +01:00
George Hotz
819592ee67 hotfix: disable DoubleMatmul for PTX 2025-10-29 16:37:17 +08:00
George Hotz
30ca3f2af8 all double matmul (#12993)
* fix more double matmuls

* a few more

* all double matmul passes

* opts for flash attention

* fix spec

* comment
2025-10-29 16:25:27 +08:00
Sieds Lykles
9f39f6391c shared_codegen_spec and fix index spec (#12967)
* split shared_codegen_spec and fix index

* add VCONST to program_spec and move index to shared_codegen_spec

* working ignore_oob=0

* cleanup

* fix spec

* undo that

* move barrier and special earlier

* fix more spec issues

* more updates

* remove special from program_spec

* cleanup and fixes

* move more to shared

* special is not in shared_spec

* some comments

* dont do bounds check there
2025-10-29 09:14:11 +01:00
George Hotz
1c362736aa fix more double matmuls (#12991)
* fix more double matmuls

* a few more
2025-10-29 16:09:48 +08:00
George Hotz
e42b4edf8c remove if stuff (#12992) 2025-10-29 15:29:35 +08:00
George Hotz
8c47cf4323 pcontig double matmul works (#12899)
* pcontig double matmul works

* tests

* contract

* closer

* works-ish

* add that broadcast

* 2 more work

* something

* disable broken ones

* llvm

* align 16
2025-10-29 13:06:43 +08:00
George Hotz
35b6f4148d delete untested quantize (#12990) 2025-10-29 12:46:32 +08:00
Sieds Lykles
5ce8a1d2f2 Merge adjacent try all permutations for reduce (#12972) 2025-10-29 05:04:54 +01:00
George Hotz
b147e7e8e6 flatten bufferize (#12984)
* flatten bufferize

* simpler

* tests pass

* flat

* not flat
2025-10-29 11:23:43 +08:00
qazal
a7dac11aad viz: keep rewrite step in back button history (#12986) 2025-10-29 11:09:43 +08:00
qazal
37967fa17b viz: add integer query param helper and more typing (#12985)
* viz: query param helper

* json.dumps once
2025-10-29 10:44:01 +08:00
chenyu
fb53bdad5d unused propagate_invalid rules [pr] (#12983)
named is not used, so you know it never matched
2025-10-28 22:16:50 -04:00
chenyu
ef16e6c68c unwrap instead of cast [pr] (#12982) 2025-10-28 21:29:23 -04:00
chenyu
f55fcfecf9 ProgramSpec uops must end with SINK [pr] (#12981) 2025-10-28 17:12:22 -04:00
chenyu
9442442cb1 update variable names in search [pr] (#12979)
no lin nor linearize
2025-10-28 15:37:52 -04:00
wozeparrot
d66c997a39 feat: thunderkittens fa2 (#12955) 2025-10-28 11:27:45 -07:00
b1tg
bb307b9e81 fix fp8 vectorization (#12977)
* fix fp8 vectorization

* add fp8 tc to benchmark
2025-10-28 13:55:30 -04:00
nimlgen
c11dd56956 amd: cleanup import urls (#12976) 2025-10-29 00:43:02 +08:00
George Hotz
5e01cc299b zero len ranges fail (#12974)
* zero len ranges fail

* fix Python backend

* fix llvm

* fix ptx

* yolo fix nir

* this works...

* always store...

* always store...

* Revert "always store..."

This reverts commit 0816cf344d.
2025-10-28 22:49:55 +08:00
George Hotz
e936aa7974 cleanups from if range branch (#12973) 2025-10-28 20:58:47 +08:00
qazal
901d27b3ba viz: optional text dims try 2 (#12971) 2025-10-28 18:54:28 +08:00
George Hotz
f5a3b33d33 add fun with nhwc convs 2025-10-28 17:12:22 +08:00
George Hotz
907499b02c clean up GROUP/SINK (#12969)
* clean up GROUP/SINK

* fix end

* range_str color
2025-10-28 16:08:10 +08:00
Sieds Lykles
e22c5e7e73 process_replay uses opts argument for KernelInfo.opts_to_apply (#12946)
* opts_to_apply is opts

* skip beamed kernels

* simpler change

* fix the tensor cores tests for process replay

* use opts
2025-10-28 09:00:28 +01:00
George Hotz
6c9560a846 more syntactic sugar for pyrender (#12968) 2025-10-28 15:24:33 +08:00
George Hotz
b0da173f2f add unique to const, fix longstanding bug (#12965)
* add unique to const, fix longstanding bug

* _force_unique=True

* fix tests

* fix more tests
2025-10-28 15:11:37 +08:00
Sieds Lykles
e110f4632a split cat (on cpu) (#12864)
* split ranges but only on cpu

* except KernelOptError for threads

* use GROUP and END

* no more flatten_range needed

* remove noop end

* always process replay for openpilot

* update test

* skip test

* fix in outs calculation

With the new linearizer the toposort is a problem, this matches the spec
now

* undo that
2025-10-28 07:55:19 +01:00
qazal
3b82dee625 viz: match DEBUG=2 for exec item metadata (#12966)
* viz: match DEBUG=2 for exec item metadata

* remove repr from kernel
2025-10-28 14:53:57 +08:00
qazal
99589dea81 move viz edge tagging to UOp graph (#12964) 2025-10-28 12:46:23 +08:00