Commit Graph

4562 Commits

Author SHA1 Message Date
George Hotz
cdef359305 cleanups from flash attention branch 2025-10-24 13:15:31 +08:00
wozeparrot
9dac505565 variable bs keccak (#10731) 2025-10-23 14:10:21 -07:00
Sieds Lykles
c1db62ff7c move reduce collapse to rangeify (#12845) 2025-10-23 15:44:17 +02:00
George Hotz
ff68a6263b move locals into codegen (dedup works) (#12885)
* move locals into codegen (dedup works)

* move in optimize
2025-10-23 17:07:39 +08:00
George Hotz
ddb53d1d48 PCONTIG=3 both saves ram and flops (#12884)
* PCONTIG=3 both saves ram and flops

* group

* gate locals

* should be correct
2025-10-23 16:37:26 +08:00
George Hotz
e85cee0aad flip Ops.END srcs (#12882)
* flip Ops.END srcs

* backward

* late end split
2025-10-23 12:47:50 +08:00
George Hotz
74b4cfe44b Ops.GROUP + range check (#12880)
* simpler

* fix that

* Ops.GROUP + range check

* fix bugs

* fix linter

* fix test
2025-10-23 12:05:21 +08:00
George Hotz
7762b3558b clean up the spec (#12868)
* tighten up the spec

* move validate into a different file

* that moved to validate

* after(barr)
2025-10-22 19:50:42 +08:00
George Hotz
726988fa4b late ifs try 2 (#12865)
* late ifs try 2

* fix image

* fix that test

* panic

* ptx fixups

* preserve toposort

* those pass locally

* Revert "those pass locally"

This reverts commit 063409f828.

* no ls

* make that explicit
2025-10-22 18:49:27 +08:00
Sieds Lykles
8d0256c46b Move gate to load for loaded index (#12861)
* change condition

* change test to better represent how the uop looks irl
2025-10-22 09:53:07 +02:00
George Hotz
92778c7a8b rename opts to ren, add store ranges back (#12856)
* rename opts to ren

* fix docs and bring store back
2025-10-22 09:15:38 +08:00
b1tg
60d7e232f2 cuda fp8 (#12782)
* cuda fp8

* tensor core

* tc test

* clean

* clean pm
2025-10-21 15:05:25 -04:00
chenyu
8baa61bd67 use torch 2.9 and its Muon in test (#12773)
* use torch 2.9 and its Muon in test

* relax and disable
2025-10-21 13:35:17 -04:00
chenyu
f51f9aaa16 muon ns_params -> ns_coefficients (#12850)
match the official torch one
2025-10-21 12:35:52 -04:00
wozeparrot
62e7b8b870 feat: just use compile3 (#12849) 2025-10-21 07:56:50 -07:00
George Hotz
8960ac54f3 remove RewriteStep premature optimization (#12840)
* remove RewriteStep premature optimization

* fix ebs

* core line count
2025-10-21 21:45:20 +08:00
Sieds Lykles
7f798a9630 Cleanup const buffers (#12829)
* split pm_cleanups

* update test_schedule

* shrink when we remove bufferize

* dont do shrink if shape is empty

* update tests

* remove *1 from metadata

* deal with the noop bufferize

* only noop on cvar

* cleanup

* fix if

* rename
2025-10-21 14:53:49 +02:00
George Hotz
20a232f1c5 bugfixes from multioutput + PCONTIG=3 for fa bw memory fix (#12837)
* bugfixes from multioutput

* PCONTIG=3 fixes fa memory usage

* that's base
2025-10-21 19:21:02 +08:00
George Hotz
7d9551ce2e move to late/control_flow.py (#12835) 2025-10-21 18:15:06 +08:00
George Hotz
d711a4b933 delete old linearizer (#12834)
* new linearizer with early endrange

* cleanups

* second stage removal

* not store

* do that later

* end cleanup

* fix globals

* end

* multi end

* fix ends earlier

* work

* do_merge_ends

* mini change

* range_gate

* fix cpu

* test fixups

* ranges on index

* not for ptx

* delete linearizer

* remove more junk

* delete that test

* we insert endif

* all ends
2025-10-21 17:52:18 +08:00
George Hotz
c780cd9abb new linearizer with early endrange (#12823)
* new linearizer with early endrange

* cleanups

* second stage removal

* not store

* do that later

* end cleanup

* fix globals

* end

* multi end

* fix ends earlier

* work

* do_merge_ends

* mini change

* range_gate

* fix cpu

* test fixups

* ranges on index

* not for ptx
2025-10-21 17:37:48 +08:00
George Hotz
d59d4cdbe4 lil less is okay 2025-10-21 17:09:44 +08:00
qazal
32af1ff84b viz graph drawing small cleanups (#12830)
* viz graph drawing small cleanups

* str literal
2025-10-21 15:51:32 +08:00
George Hotz
a71a41f6d1 rename Ops.ENDRANGE -> Ops.END (#12824) 2025-10-21 11:32:18 +08:00
George Hotz
203a93363c Revert "after clean up of locals (#12813)" (#12814)
This reverts commit 5d0d3d7aac.
2025-10-20 19:33:35 +08:00
George Hotz
5d0d3d7aac after clean up of locals (#12813) 2025-10-20 19:24:24 +08:00
Sieds Lykles
a8e4614436 remove REAL_SUBSTITUTE=0 and make it fast (#12809)
* fast REAL_substitute

* remove REAL_SUBSTITUTE=0
2025-10-20 12:44:20 +02:00
George Hotz
2e9082e0bc after op (#12801)
* after op

* fix tests
2025-10-20 12:27:56 +08:00
George Hotz
ba593f7b98 don't render index (#12796)
* don't render index

* update to ignore_indexing

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-10-20 09:48:36 +08:00
chenyu
63a23dfe80 test step 0 in TestTrainingOnnxOps (#12790)
and tighter rtol
2025-10-19 09:15:49 -04:00
chenyu
e8158afd4b update test_qlinear_add_round_half_to_even (#12789)
this does not pass locally
2025-10-19 08:47:27 -04:00
Sieds Lykles
fd6ef4801c rangeify uses symbolic_flat (#12786)
* symbolic_simple -> symbolic_flat

* remove expected failures
2025-10-19 12:27:14 +02:00
qazal
c8ef4b60f6 viz: share match tracing and TINY device profiler (#12783)
* set a default name for the traces

* set profile_matches + renames

* profile_matches test

* traces 4 steps total
2025-10-19 14:30:07 +08:00
chenyu
30ff84d050 update test_conv2d_ceildiv_edge_case (#12779) 2025-10-18 16:43:32 -04:00
nimlgen
442218266d qcom: fix profiler (#12778)
* qcom: fix profiler

* this way
2025-10-19 01:27:59 +08:00
wozeparrot
82f10cfe2e feat: assert on bufferview math (#12772) 2025-10-17 14:20:08 -07:00
chenyu
fcdf4ab37e remove a contiguous in LARS (#12770) 2025-10-17 17:07:30 -04:00
George Hotz
062a6d68d7 test flash attention backward (#12762)
* test flash attention backward

* TODO: fix pcontig

* end ranges

* render colors

* very big

* multiout at every level

* reset ending ranges

* fix tests

* ugh
2025-10-17 23:15:59 +08:00
George Hotz
c9a3464f76 those decimals never mattered (#12760)
* those decimals never mattered

* this

* improve debug

* real substitute fixes pcontig

* locals are different buffers
2025-10-17 17:16:24 +08:00
qazal
0160f034d6 viz: show display name for copy runners (#12761)
* viz: show display name for copy runners

* more u32
2025-10-17 16:59:51 +08:00
qazal
253d32b065 viz: add metadata to buffer user list (#12758)
* simple failing test

* encodings

* test passing

* key is deduped
2025-10-17 16:28:54 +08:00
George Hotz
935a60db72 bring back partial contig and flash attention (#12756)
* bring back partial contig and flash attention

* why not 2

* work

* that

* fix pcontig
2025-10-17 16:19:05 +08:00
qazal
dfb8f9fc9e viz: annotate buffer mutability in the memory graph (#12750) 2025-10-17 11:53:02 +08:00
chenyu
9561803cb0 fix assert in test_schedule (#12745)
* fix assert in test_schedule

updated kernel counts and some old tests

* fix
2025-10-16 15:39:50 -04:00
chenyu
285534ce64 delete DONT_REALIZE_EXPAND and DONT_GROUP_REDUCES (#12744)
does nothing now
2025-10-16 14:11:33 -04:00
chenyu
98239f1156 few shapetracker cleanups (#12741) 2025-10-16 12:43:27 -04:00
George Hotz
8be7844b2e use apply uop for assign to fix assign metadata (#12732)
* use apply uop for assign

* fix metadata for assign

* fix backward metadata

* those aren't real tests
2025-10-16 20:34:12 +08:00
qazal
533f18b22c viz: add trace data for inflight buffers (#12728)
* viz: add trace data for inflight buffers

* add test_inflight_buf

* temp stores the keys

* update tests / use Tensor.ones
2025-10-16 19:15:03 +08:00
George Hotz
af4479c169 faster stable diffusion load (#12725)
* faster stable diffusion load

* failing tests
2025-10-16 18:31:59 +08:00
George Hotz
1d1e1d9d88 delete the ShapeTracker (#12720)
* delete the ShapeTracker

* fix tests

* fix more

* fix gc test
2025-10-16 15:36:22 +08:00