George Hotz
cdef359305
cleanups from flash attention branch
2025-10-24 13:15:31 +08:00
wozeparrot
9dac505565
variable bs keccak (#10731)
2025-10-23 14:10:21 -07:00
chenyu
6e4ee8deea
small heuristic cleanup [pr] (#12892)
2025-10-23 10:50:15 -04:00
nimlgen
f835566e27
sqtt: correct header (#12891)
* sqtt: correct header
* f
2025-10-23 22:37:17 +08:00
Sieds Lykles
c1db62ff7c
move reduce collapse to rangeify (#12845)
2025-10-23 15:44:17 +02:00
Sieds Lykles
04b3e51f1b
remove old reduce collapse rule (#12889)
* comment this out
* remove
2025-10-23 13:51:49 +02:00
qazal
cdfb8e31ae
hotfix: correct viz rewrite step counter reset (#12890)
2025-10-23 19:47:16 +08:00
George Hotz
6df19a4ac6
lil qol improvements to viz (#12887)
2025-10-23 18:41:07 +08:00
George Hotz
ff68a6263b
move locals into codegen (dedup works) (#12885)
* move locals into codegen (dedup works)
* move in optimize
2025-10-23 17:07:39 +08:00
George Hotz
ddb53d1d48
PCONTIG=3 both saves ram and flops (#12884)
* PCONTIG=3 both saves ram and flops
* group
* gate locals
* should be correct
2025-10-23 16:37:26 +08:00
qazal
bcc30e5e10
viz: add linearized UOp list view (#12883)
* viz: add linearized UOp list view
* lang
2025-10-23 12:52:14 +08:00
George Hotz
e85cee0aad
flip Ops.END srcs (#12882)
* flip Ops.END srcs
* backward
* late end split
2025-10-23 12:47:50 +08:00
George Hotz
74b4cfe44b
Ops.GROUP + range check (#12880)
* simpler
* fix that
* Ops.GROUP + range check
* fix bugs
* fix linter
* fix test
2025-10-23 12:05:21 +08:00
Sieds Lykles
914defd55d
give endrange priority (#12870)
* uncomment line
* try giving endrange priority
2025-10-23 05:19:13 +02:00
George Hotz
e718254004
simpler end (#12879)
* simpler
* fix that
2025-10-23 10:35:58 +08:00
wozeparrot
3a9aa05359
feat: extra nvcc options (#12876)
2025-10-22 13:21:11 -07:00
nimlgen
e7e535cd53
amd: sqtt for gfx9 (#12844)
* amd: start sqtt for gfx9
* writes something, but sometimes zeroes
* HEADER!
* w
* tiny
* mypy
2025-10-23 02:31:07 +08:00
b1tg
81108f91ee
amd tc: 16x16x32 (#12874)
* amd tc: 16x16x32
* test
* clean, test amd_cdna4
2025-10-22 13:48:01 -04:00
George Hotz
bf173c0a37
we don't support multi end yet (#12869)
2025-10-22 23:43:32 +08:00
nimlgen
a7bc0104c2
amd: clean up sqtt_stop (#12872)
2025-10-22 22:17:03 +08:00
nimlgen
b6eb9172ea
amd: fix ip offsets (#12867)
2025-10-22 20:50:18 +08:00
George Hotz
174811fc0f
hotfix: slightly looser load spec for AMD bfloat16
2025-10-22 19:55:59 +08:00
George Hotz
7762b3558b
clean up the spec (#12868)
* tighten up the spec
* move validate into a different file
* that moved to validate
* after(barr)
2025-10-22 19:50:42 +08:00
George Hotz
726988fa4b
late ifs try 2 (#12865)
* late ifs try 2
* fix image
* fix that test
* panic
* ptx fixups
* preserve toposort
* those pass locally
* Revert "those pass locally"
This reverts commit 063409f828.
* no ls
* make that explicit
2025-10-22 18:49:27 +08:00
George Hotz
6abe90fb7c
fix linearizer non-determinism (#12866)
2025-10-22 17:51:35 +08:00
qazal
cebc2b5721
cleanup viz profiler metadata ui (#12860)
* cleanup viz profiler metadata ui
* text
* select over .args
* space
2025-10-22 17:31:12 +08:00
Sieds Lykles
8d0256c46b
Move gate to load for loaded index (#12861)
* change condition
* change test to better represent how the uop looks irl
2025-10-22 09:53:07 +02:00
George Hotz
92778c7a8b
rename opts to ren, add store ranges back (#12856)
* rename opts to ren
* fix docs and bring store back
2025-10-22 09:15:38 +08:00
chenyu
c5cee74706
remove BLOCK_REORDER (#12854)
not used
2025-10-21 19:10:14 -04:00
chenyu
0b673eddec
simpler newton_schulz transpose (#12853)
2025-10-21 17:21:45 -04:00
b1tg
60d7e232f2
cuda fp8 (#12782)
* cuda fp8
* tensor core
* tc test
* clean
* clean pm
2025-10-21 15:05:25 -04:00
wozeparrot
c3149c618a
feat: nvcc compiler (#12852)
2025-10-21 11:31:23 -07:00
chenyu
8baa61bd67
use torch 2.9 and its Muon in test (#12773)
* use torch 2.9 and its Muon in test
* relax and disable
2025-10-21 13:35:17 -04:00
chenyu
f51f9aaa16
muon ns_params -> ns_coefficients (#12850)
match the official torch one
2025-10-21 12:35:52 -04:00
nimlgen
c7336c3e31
amd: sqtt for aql (#12846)
2025-10-21 22:35:01 +08:00
George Hotz
8960ac54f3
remove RewriteStep premature optimization (#12840)
* remove RewriteStep premature optimization
* fix ebs
* core line count
2025-10-21 21:45:20 +08:00
Sieds Lykles
7f798a9630
Cleanup const buffers (#12829)
* split pm_cleanups
* update test_schedule
* shrink when we remove bufferize
* dont do shrink if shape is empty
* update tests
* remove *1 from metadata
* deal with the noop bufferize
* only noop on cvar
* cleanup
* fix if
* rename
2025-10-21 14:53:49 +02:00
nimlgen
1ad6598963
amd: trace all instructions (#12831)
2025-10-21 20:52:24 +08:00
Christopher Milan
cdc72556a1
no more brew (#12839)
2025-10-21 08:12:46 -04:00
George Hotz
20a232f1c5
bugfixes from multioutput + PCONTIG=3 for fa bw memory fix (#12837)
* bugfixes from multioutput
* PCONTIG=3 fixes fa memory usage
* that's base
2025-10-21 19:21:02 +08:00
qazal
0435d31f1c
viz: generic back button functionality (#12838)
2025-10-21 18:52:00 +08:00
George Hotz
7d9551ce2e
move to late/control_flow.py (#12835)
2025-10-21 18:15:06 +08:00
George Hotz
d711a4b933
delete old linearizer (#12834)
* new linearizer with early endrange
* cleanups
* second stage removal
* not store
* do that later
* end cleanup
* fix globals
* end
* multi end
* fix ends earlier
* work
* do_merge_ends
* mini change
* range_gate
* fix cpu
* test fixups
* ranges on index
* not for ptx
* delete linearizer
* remove more junk
* delete that test
* we insert endif
* all ends
2025-10-21 17:52:18 +08:00
qazal
40633ab34d
list buffer args to kernel in profiler (#12826)
* list buffer args to kernel in profiler
* stable order
* back button works
* deselect also works
2025-10-21 17:51:36 +08:00
George Hotz
c780cd9abb
new linearizer with early endrange (#12823)
* new linearizer with early endrange
* cleanups
* second stage removal
* not store
* do that later
* end cleanup
* fix globals
* end
* multi end
* fix ends earlier
* work
* do_merge_ends
* mini change
* range_gate
* fix cpu
* test fixups
* ranges on index
* not for ptx
2025-10-21 17:37:48 +08:00
qazal
32af1ff84b
viz graph drawing small cleanups (#12830)
* viz graph drawing small cleanups
* str literal
2025-10-21 15:51:32 +08:00
Sieds Lykles
367fbabc30
remove Ops.SUBSTITUTE (#12827)
* remove Ops.SUBSTITUTE
* remove from viz
2025-10-21 08:19:42 +02:00
qazal
57f6b6f229
style view codegen like a link in profiler (#12825)
2025-10-21 12:15:13 +08:00
qazal
154cdfe46d
viz state cleanups (#12821)
* viz state cleanups
* more generic
2025-10-21 11:44:51 +08:00
George Hotz
a71a41f6d1
rename Ops.ENDRANGE -> Ops.END (#12824)
2025-10-21 11:32:18 +08:00