George Hotz
89d8b79196
fix more tests
2025-10-30 10:32:22 +08:00
George Hotz
54ffca78a7
spec test passes
2025-10-30 10:09:56 +08:00
George Hotz
42726bcc29
tests passing
2025-10-30 09:55:36 +08:00
nimlgen
4b001ec723
amd: pmc in mockgpu ( #13000 )
...
* amd: pmc in mockgpu
* fix
* do not open in ci
2025-10-30 01:52:02 +08:00
Sieds Lykles
70bce62c67
dont collapse possibly empty symbolic range ( #12994 )
...
* dont collapse a symbolic range based on min/max
* refactor z3 renderer
* include sink explicitly instead of dtypes.void
* use dtype.scalar()
2025-10-29 12:17:09 +01:00
George Hotz
819592ee67
hotfix: disable DoubleMatmul for PTX
2025-10-29 16:37:17 +08:00
George Hotz
30ca3f2af8
all double matmul ( #12993 )
...
* fix more double matmuls
* a few more
* all double matmul passes
* opts for flash attention
* fix spec
* comment
2025-10-29 16:25:27 +08:00
Sieds Lykles
9f39f6391c
shared_codegen_spec and fix index spec ( #12967 )
...
* split shared_codegen_spec and fix index
* add VCONST to program_spec and move index to shared_codegen_spec
* working ignore_oob=0
* cleanup
* fix spec
* undo that
* move barrier and special earlier
* fix more spec issues
* more updates
* remove special from program_spec
* cleanup and fixes
* move more to shared
* special is not in shared_spec
* some comments
* dont do bounds check there
2025-10-29 09:14:11 +01:00
George Hotz
1c362736aa
fix more double matmuls ( #12991 )
...
* fix more double matmuls
* a few more
2025-10-29 16:09:48 +08:00
George Hotz
8c47cf4323
pcontig double matmul works ( #12899 )
...
* pcontig double matmul works
* tests
* contract
* closer
* works-ish
* add that broadcast
* 2 more work
* something
* disable broken ones
* llvm
* align 16
2025-10-29 13:06:43 +08:00
George Hotz
b147e7e8e6
flatten bufferize ( #12984 )
...
* flatten bufferize
* simpler
* tests pass
* flat
* not flat
2025-10-29 11:23:43 +08:00
chenyu
ef16e6c68c
unwrap instead of cast [pr] ( #12982 )
2025-10-28 21:29:23 -04:00
George Hotz
5e01cc299b
zero len ranges fail ( #12974 )
...
* zero len ranges fail
* fix Python backend
* fix llvm
* fix ptx
* yolo fix nir
* this works...
* always store...
* always store...
* Revert "always store..."
This reverts commit 0816cf344d.
2025-10-28 22:49:55 +08:00
George Hotz
f5a3b33d33
add fun with nhwc convs
2025-10-28 17:12:22 +08:00
George Hotz
907499b02c
clean up GROUP/SINK ( #12969 )
...
* clean up GROUP/SINK
* fix end
* range_str color
2025-10-28 16:08:10 +08:00
Sieds Lykles
e22c5e7e73
process_replay uses opts argument for KernelInfo.opts_to_apply ( #12946 )
...
* opts_to_apply is opts
* skip beamed kernels
* simpler change
* fix the tensor cores tests for process replay
* use opts
2025-10-28 09:00:28 +01:00
George Hotz
b0da173f2f
add unique to const, fix longstanding bug ( #12965 )
...
* add unique to const, fix longstanding bug
* _force_unique=True
* fix tests
* fix more tests
2025-10-28 15:11:37 +08:00
Sieds Lykles
e110f4632a
split cat (on cpu) ( #12864 )
...
* split ranges but only on cpu
* except KernelOptError for threads
* use GROUP and END
* no more flatten_range needed
* remove noop end
* always process replay for openpilot
* update test
* skip test
* fix in outs calculation
With the new linearizer the toposort is a problem, this matches the spec now
* undo that
2025-10-28 07:55:19 +01:00
George Hotz
4d817a289e
simplify spec ( #12958 )
...
* simplify spec
* more
2025-10-28 09:52:32 +08:00
chenyu
a79832b01f
control_flow.py -> linearizer.py [pr] ( #12948 )
2025-10-27 12:38:13 -04:00
George Hotz
25c2da1579
check SPEC=2 in CI ( #12945 )
...
* check SPEC=2 in CI
* split SPEC=2
* fast enough
2025-10-27 21:53:57 +08:00
George Hotz
701a632907
move VECTORIZE/CONST ( #12942 )
2025-10-27 17:37:13 +08:00
George Hotz
804133cffd
rename RECIP to RECIPROCAL ( #12939 )
2025-10-27 16:53:13 +08:00
chenyu
e18922f111
limit AND const min max to ints [pr] ( #12918 )
2025-10-25 16:07:52 -04:00
George Hotz
b4f6a2c7a3
add kernel spec ( #12911 )
...
* add kernel spec
* fix kernel spec
2025-10-25 11:49:20 +08:00
George Hotz
8a941d95a4
SPEC=2 is full spec, SPEC=1 is default ( #12910 )
...
* SPEC=1 passes all tests
* just use SPEC, not __debug__
2025-10-25 11:10:43 +08:00
chenyu
4b7329001d
clean up test_avg_pool3d ( #12905 )
2025-10-24 14:31:36 -04:00
George Hotz
6b35467f53
stores don't end ranges ( #12902 )
...
* early endrange
* bugfixes
2025-10-24 23:05:03 +08:00
Sieds Lykles
e1f8c82938
Onnx Layer/Group/RMS/Batch-Norm ReduceL2 fp32 intermediates for fp16 ( #12109 )
...
* match onnx spec
* use least_upper_dtype
* promote the square
* just cast before the square
2025-10-24 12:26:11 +02:00
George Hotz
0bde87d8d7
cleanups from flash attention branch ( #12897 )
2025-10-24 14:14:56 +08:00
wozeparrot
9dac505565
variable bs keccak ( #10731 )
2025-10-23 14:10:21 -07:00
Sieds Lykles
c1db62ff7c
move reduce collapse to rangeify ( #12845 )
2025-10-23 15:44:17 +02:00
George Hotz
ff68a6263b
move locals into codegen (dedup works) ( #12885 )
...
* move locals into codegen (dedup works)
* move in optimize
2025-10-23 17:07:39 +08:00
George Hotz
ddb53d1d48
PCONTIG=3 both saves ram and flops ( #12884 )
...
* PCONTIG=3 both saves ram and flops
* group
* gate locals
* should be correct
2025-10-23 16:37:26 +08:00
George Hotz
e85cee0aad
flip Ops.END srcs ( #12882 )
...
* flip Ops.END srcs
* backward
* late end split
2025-10-23 12:47:50 +08:00
George Hotz
74b4cfe44b
Ops.GROUP + range check ( #12880 )
...
* simpler
* fix that
* Ops.GROUP + range check
* fix bugs
* fix linter
* fix test
2025-10-23 12:05:21 +08:00
George Hotz
7762b3558b
clean up the spec ( #12868 )
...
* tighten up the spec
* move validate into a different file
* that moved to validate
* after(barr)
2025-10-22 19:50:42 +08:00
George Hotz
726988fa4b
late ifs try 2 ( #12865 )
...
* late ifs try 2
* fix image
* fix that test
* panic
* ptx fixups
* preserve toposort
* those pass locally
* Revert "those pass locally"
This reverts commit 063409f828.
* no ls
* make that explicit
2025-10-22 18:49:27 +08:00
Sieds Lykles
8d0256c46b
Move gate to load for loaded index ( #12861 )
...
* change condition
* change test to better represent how the uop looks irl
2025-10-22 09:53:07 +02:00
George Hotz
92778c7a8b
rename opts to ren, add store ranges back ( #12856 )
...
* rename opts to ren
* fix docs and bring store back
2025-10-22 09:15:38 +08:00
b1tg
60d7e232f2
cuda fp8 ( #12782 )
...
* cuda fp8
* tensor core
* tc test
* clean
* clean pm
2025-10-21 15:05:25 -04:00
chenyu
8baa61bd67
use torch 2.9 and its Muon in test ( #12773 )
...
* use torch 2.9 and its Muon in test
* relax and disable
2025-10-21 13:35:17 -04:00
chenyu
f51f9aaa16
muon ns_params -> ns_coefficients ( #12850 )
...
match the official torch one
2025-10-21 12:35:52 -04:00
wozeparrot
62e7b8b870
feat: just use compile3 ( #12849 )
2025-10-21 07:56:50 -07:00
George Hotz
8960ac54f3
remove RewriteStep premature optimization ( #12840 )
...
* remove RewriteStep premature optimization
* fix ebs
* core line count
2025-10-21 21:45:20 +08:00
Sieds Lykles
7f798a9630
Cleanup const buffers ( #12829 )
...
* split pm_cleanups
* update test_schedule
* shrink when we remove bufferize
* dont do shrink if shape is empty
* update tests
* remove *1 from metadata
* deal with the noop bufferize
* only noop on cvar
* cleanup
* fix if
* rename
2025-10-21 14:53:49 +02:00
George Hotz
20a232f1c5
bugfixes from multioutput + PCONTIG=3 for fa bw memory fix ( #12837 )
...
* bugfixes from multioutput
* PCONTIG=3 fixes fa memory usage
* that's base
2025-10-21 19:21:02 +08:00
George Hotz
7d9551ce2e
move to late/control_flow.py ( #12835 )
2025-10-21 18:15:06 +08:00
George Hotz
d711a4b933
delete old linearizer ( #12834 )
...
* new linearizer with early endrange
* cleanups
* second stage removal
* not store
* do that later
* end cleanup
* fix globals
* end
* multi end
* fix ends earlier
* work
* do_merge_ends
* mini change
* range_gate
* fix cpu
* test fixups
* ranges on index
* not for ptx
* delete linearizer
* remove more junk
* delete that test
* we insert endif
* all ends
2025-10-21 17:52:18 +08:00
George Hotz
c780cd9abb
new linearizer with early endrange ( #12823 )
...
* new linearizer with early endrange
* cleanups
* second stage removal
* not store
* do that later
* end cleanup
* fix globals
* end
* multi end
* fix ends earlier
* work
* do_merge_ends
* mini change
* range_gate
* fix cpu
* test fixups
* ranges on index
* not for ptx
2025-10-21 17:37:48 +08:00