George Hotz
f0b9416432
Merge branch 'master' into amd_uop
2025-10-30 16:54:39 +08:00
George Hotz
b1db99cefe
format
2025-10-30 16:53:21 +08:00
George Hotz
de28eaa610
fix estimates
2025-10-30 16:48:46 +08:00
George Hotz
f3da3c9be5
mac cleanups
2025-10-30 16:45:01 +08:00
George Hotz
c43f22b143
revert that
2025-10-30 16:35:33 +08:00
George Hotz
34e631eb26
more comments
2025-10-30 08:24:26 +00:00
George Hotz
16983e9c95
comment
2025-10-30 08:04:57 +00:00
qazal
66ea3a0be4
put DEFINE_LOCAL counter in context ( #13008 )
2025-10-30 15:49:26 +08:00
George Hotz
3eb00a421f
progress
2025-10-30 07:43:03 +00:00
George Hotz
b54493b003
modernize amd uop matmul
2025-10-30 07:10:00 +00:00
George Hotz
e456f2cb1e
more uop programs ( #13007 )
* more uop program
* test_matmul_relu
* tests fix
2025-10-30 14:57:59 +08:00
wozeparrot
c18b283f58
feat: timeout on stuck socket ( #13009 )
2025-10-29 23:11:26 -07:00
wozeparrot
92a87e37e4
fix: fetch_file ( #13010 )
2025-10-29 22:44:22 -07:00
George Hotz
e64d4b3b44
uops programs ( #13005 )
* uops programs
* work
* work
* more syntax
* more syntax
* comments
2025-10-30 12:28:10 +08:00
George Hotz
5894df059c
hotfix: prevent inf loop if reduce splits
2025-10-30 11:21:40 +08:00
George Hotz
2da02f1ae1
add loads at the end ( #12988 )
* add loads at the end
* simpler
* late load
* tests passing
* fix matvec
* spec test passes
* fix where on load
* fix abs2
* fix more tests
2025-10-30 10:42:19 +08:00
nimlgen
4b001ec723
amd: pmc in mockgpu ( #13000 )
* amd: pmc in mockgpu
* fix
* do not open in ci
2025-10-30 01:52:02 +08:00
nimlgen
a6f5b1482e
amd: perf counters ( #12975 )
* amd: perf counters
* sq
* cleaner
* fix
* if enabled
* ruff
* mypy
* counters
* reset
* fix
* no cpu
2025-10-30 00:10:31 +08:00
b1tg
457602b350
fix fp8 cast folding ( #12997 )
2025-10-29 09:27:42 -04:00
Sieds Lykles
70bce62c67
dont collapse possibly empty symbolic range ( #12994 )
* dont collapse a symbolic range based on min/max
* refactor z3 renderer
* include sink explicitly instead of dtypes.void
* use dtype.scalar()
2025-10-29 12:17:09 +01:00
Sieds Lykles
79903ae2be
refactor z3 renderer ( #12996 )
* refactor z3 renderer
* include sink explicitly instead of dtypes.void
* use dtype.scalar()
2025-10-29 12:01:07 +01:00
George Hotz
819592ee67
hotfix: disable DoubleMatmul for PTX
2025-10-29 16:37:17 +08:00
George Hotz
30ca3f2af8
all double matmul ( #12993 )
* fix more double matmuls
* a few more
* all double matmul passes
* opts for flash attention
* fix spec
* comment
2025-10-29 16:25:27 +08:00
Sieds Lykles
9f39f6391c
shared_codegen_spec and fix index spec ( #12967 )
* split shared_codegen_spec and fix index
* add VCONST to program_spec and move index to shared_codegen_spec
* working ignore_oob=0
* cleanup
* fix spec
* undo that
* move barrier and special earlier
* fix more spec issues
* more updates
* remove special from program_spec
* cleanup and fixes
* move more to shared
* special is not in shared_spec
* some comments
* dont do bounds check there
2025-10-29 09:14:11 +01:00
George Hotz
1c362736aa
fix more double matmuls ( #12991 )
* fix more double matmuls
* a few more
2025-10-29 16:09:48 +08:00
George Hotz
e42b4edf8c
remove if stuff ( #12992 )
2025-10-29 15:29:35 +08:00
George Hotz
8c47cf4323
pcontig double matmul works ( #12899 )
* pcontig double matmul works
* tests
* contract
* closer
* works-ish
* add that broadcast
* 2 more work
* something
* disable broken ones
* llvm
* align 16
2025-10-29 13:06:43 +08:00
George Hotz
35b6f4148d
delete untested quantize ( #12990 )
2025-10-29 12:46:32 +08:00
Sieds Lykles
5ce8a1d2f2
Merge adjacent try all permutations for reduce ( #12972 )
2025-10-29 05:04:54 +01:00
George Hotz
b147e7e8e6
flatten bufferize ( #12984 )
* flatten bufferize
* simpler
* tests pass
* flat
* not flat
2025-10-29 11:23:43 +08:00
qazal
a7dac11aad
viz: keep rewrite step in back button history ( #12986 )
2025-10-29 11:09:43 +08:00
qazal
37967fa17b
viz: add integer query param helper and more typing ( #12985 )
* viz: query param helper
* json.dumps once
2025-10-29 10:44:01 +08:00
chenyu
fb53bdad5d
unused propagate_invalid rules [pr] ( #12983 )
named is not used, so you know it never matched
2025-10-28 22:16:50 -04:00
chenyu
ef16e6c68c
unwrap instead of cast [pr] ( #12982 )
2025-10-28 21:29:23 -04:00
chenyu
f55fcfecf9
ProgramSpec uops must end with SINK [pr] ( #12981 )
2025-10-28 17:12:22 -04:00
chenyu
9442442cb1
update variable names in search [pr] ( #12979 )
no lin nor linearize
2025-10-28 15:37:52 -04:00
wozeparrot
d66c997a39
feat: thunderkittens fa2 ( #12955 )
2025-10-28 11:27:45 -07:00
b1tg
bb307b9e81
fix fp8 vectorization ( #12977 )
* fix fp8 vectorization
* add fp8 tc to benchmark
2025-10-28 13:55:30 -04:00
nimlgen
c11dd56956
amd: cleanup import urls ( #12976 )
2025-10-29 00:43:02 +08:00
George Hotz
5e01cc299b
zero len ranges fail ( #12974 )
* zero len ranges fail
* fix Python backend
* fix llvm
* fix ptx
* yolo fix nir
* this works...
* always store...
* always store...
* Revert "always store..."
This reverts commit 0816cf344d.
2025-10-28 22:49:55 +08:00
George Hotz
e936aa7974
cleanups from if range branch ( #12973 )
2025-10-28 20:58:47 +08:00
qazal
901d27b3ba
viz: optional text dims try 2 ( #12971 )
2025-10-28 18:54:28 +08:00
George Hotz
f5a3b33d33
add fun with nhwc convs
2025-10-28 17:12:22 +08:00
George Hotz
907499b02c
clean up GROUP/SINK ( #12969 )
* clean up GROUP/SINK
* fix end
* range_str color
2025-10-28 16:08:10 +08:00
Sieds Lykles
e22c5e7e73
process_replay uses opts argument for KernelInfo.opts_to_apply ( #12946 )
* opts_to_apply is opts
* skip beamed kernels
* simpler change
* fix the tensor cores tests for process replay
* use opts
2025-10-28 09:00:28 +01:00
George Hotz
6c9560a846
more syntactic sugar for pyrender ( #12968 )
2025-10-28 15:24:33 +08:00
George Hotz
b0da173f2f
add unique to const, fix longstanding bug ( #12965 )
* add unique to const, fix longstanding bug
* _force_unique=True
* fix tests
* fix more tests
2025-10-28 15:11:37 +08:00
Sieds Lykles
e110f4632a
split cat (on cpu) ( #12864 )
* split ranges but only on cpu
* except KernelOptError for threads
* use GROUP and END
* no more flatten_range needed
* remove noop end
* always process replay for openpilot
* update test
* skip test
* fix in outs calculation
With the new linearizer the toposort is a problem, this matches the spec
now
* undo that
2025-10-28 07:55:19 +01:00
qazal
3b82dee625
viz: match DEBUG=2 for exec item metadata ( #12966 )
* viz: match DEBUG=2 for exec item metadata
* remove repr from kernel
2025-10-28 14:53:57 +08:00
qazal
99589dea81
move viz edge tagging to UOp graph ( #12964 )
2025-10-28 12:46:23 +08:00