10655 Commits

Author SHA1 Message Date
George Hotz
70a1126830 ugh 2025-10-17 23:14:36 +08:00
George Hotz
b22790eb0f fix tests 2025-10-17 23:05:01 +08:00
George Hotz
dad778564c reset ending ranges 2025-10-17 22:52:43 +08:00
George Hotz
c5617ed8cf Merge branch 'master' into test_fa 2025-10-17 22:41:31 +08:00
George Hotz
33025b99f6 small changes from fa backward (#12769) 2025-10-17 22:41:18 +08:00
chenyu
e0d0d4372d fix shape of m and v in onnx Adam with FUSE_OPTIM (#12768)
value is still slightly off but that's not onnx specific
2025-10-17 10:32:41 -04:00
qazal
bd662bea67 viz: light up program runs (#12764)
* basics work

* fix the color

* light up program events

* swap a with p

* better
2025-10-17 19:33:18 +08:00
George Hotz
28efb4395c multiout at every level 2025-10-17 19:32:38 +08:00
George Hotz
bc9048ccca very big 2025-10-17 19:11:19 +08:00
George Hotz
7c80285fa8 render colors 2025-10-17 18:58:44 +08:00
George Hotz
05f69b48e9 end ranges 2025-10-17 18:25:31 +08:00
George Hotz
5fa053a5ee TODO: fix pcontig 2025-10-17 17:51:36 +08:00
George Hotz
eb5070786a test flash attention backward 2025-10-17 17:28:59 +08:00
George Hotz
c9a3464f76 those decimals never mattered (#12760)
* those decimals never mattered

* this

* improve debug

* real substitute fixes pcontig

* locals are different buffers
2025-10-17 17:16:24 +08:00
qazal
0160f034d6 viz: show display name for copy runners (#12761)
* viz: show display name for copy runners

* more u32
2025-10-17 16:59:51 +08:00
qazal
253d32b065 viz: add metadata to buffer user list (#12758)
* simple failing test

* encodings

* test passing

* key is deduped
2025-10-17 16:28:54 +08:00
George Hotz
935a60db72 bring back partial contig and flash attention (#12756)
* bring back partial contig and flash attention

* why not 2

* work

* that

* fix pcontig
2025-10-17 16:19:05 +08:00
Sieds Lykles
f6bc620169 UOp.prod and UOp.sum methods (#12755) 2025-10-17 10:02:01 +02:00
Sieds Lykles
d1bb5c0426 slightly flatter symbolic (#12757) 2025-10-17 09:58:45 +02:00
qazal
5417e4b099 viz helper cleanups (#12754) 2025-10-17 15:20:24 +08:00
qazal
3196a7aae3 viz: pre reqs for lighting up programs (#12753) 2025-10-17 15:03:21 +08:00
qazal
dfb8f9fc9e viz: annotate buffer mutability in the memory graph (#12750) 2025-10-17 11:53:02 +08:00
Sieds Lykles
79c2f1ae26 remove reduce_rangless and replace with reduce_unparented (#12749) 2025-10-17 04:46:05 +02:00
chenyu
9561803cb0 fix assert in test_schedule (#12745)
* fix assert in test_schedule

updated kernel counts and some old tests

* fix
2025-10-16 15:39:50 -04:00
chenyu
285534ce64 delete DONT_REALIZE_EXPAND and DONT_GROUP_REDUCES (#12744)
does nothing now
2025-10-16 14:11:33 -04:00
chenyu
98239f1156 few shapetracker cleanups (#12741) 2025-10-16 12:43:27 -04:00
chenyu
53478c741d relax ASSERT_MIN_STEP_TIME for space lab policy (#12742) 2025-10-16 11:40:36 -04:00
geohotstan
5d209ee7ec onnx helper intermediate node output validation (#12740)
* start

* update comments

* good

* add comments and better printing

* done
2025-10-16 11:17:47 -04:00
Christopher Milan
bce2bc0465 Revert "use RTLD_GLOBAL on macos" (#12738)
This reverts commit 89fe3e574d.
2025-10-16 10:07:21 -04:00
chenyu
f34f26bca0 fix gpt2 with benchmark (#12736)
`CPU=1 python3 examples/gpt2.py --benchmark 128` works now
2025-10-16 09:55:20 -04:00
Sieds Lykles
55db1b0e0e reduce where that is cut from two sides (#12733)
* better rule

* correct pattern

* shorten line
2025-10-16 15:25:15 +02:00
nimlgen
cf9baeea61 Revert "nv: check if jitlink is avail (#12731)" (#12735)
This reverts commit a069a45d14.
2025-10-16 20:41:49 +08:00
George Hotz
8be7844b2e use apply uop for assign to fix assign metadata (#12732)
* use apply uop for assign

* fix metadata for assign

* fix backward metadata

* those aren't real tests
2025-10-16 20:34:12 +08:00
nimlgen
3aa2277b8f nv: usb4 (#12696)
* hackish

* prog

* match

* l

* simpler

* refactor

* not osx

* apple things

* tiny changes

* fix mask

* match fix

* nn
2025-10-16 20:11:19 +08:00
nimlgen
a069a45d14 nv: check if jitlink is avail (#12731) 2025-10-16 19:58:50 +08:00
George Hotz
a498ec9c18 cleanup names of postrange + fast FUSE_OPTIM (#12730)
* cleanup names of postrange

* make FUSE_OPTIM not slow

* delete junk in def r
2025-10-16 19:38:31 +08:00
Sieds Lykles
8f740e07ff no broadcasting/vectors in reduce collapse (#12729) 2025-10-16 13:22:57 +02:00
qazal
533f18b22c viz: add trace data for inflight buffers (#12728)
* viz: add trace data for inflight buffers

* add test_inflight_buf

* temp stores the keys

* update tests / use Tensor.ones
2025-10-16 19:15:03 +08:00
George Hotz
af4479c169 faster stable diffusion load (#12725)
* faster stable diffusion load

* failing tests
2025-10-16 18:31:59 +08:00
nimlgen
e7c057d5dc system: alloc_sysmem return view (#12724)
* system: alloc_sysmem return view

* e
2025-10-16 17:55:01 +08:00
nimlgen
b86a33a312 ptx: support bw (#12722) 2025-10-16 15:38:08 +08:00
nimlgen
b8cd66c7a2 nv: support all gb20x and small bar (#12721) 2025-10-16 15:37:54 +08:00
George Hotz
1d1e1d9d88 delete the ShapeTracker (#12720)
* delete the ShapeTracker

* fix tests

* fix more

* fix gc test
2025-10-16 15:36:22 +08:00
George Hotz
592e86f6f5 remove UOp.st (#12716)
* remove UOp.st

* fix tests

* torch backend disable
2025-10-16 14:44:09 +08:00
wozeparrot
cc2dfe22f5 tinyfs: fetch file utility (#12719) 2025-10-15 23:38:56 -07:00
nimlgen
3ed543f956 system: reorder funcs + barrier on macos (#12714) 2025-10-16 14:38:01 +08:00
qazal
b77bdbbc62 viz: count unpickle in server startup time (#12715)
* viz: count unpickle in server startup time

* type checking
2025-10-16 13:07:46 +08:00
George Hotz
7c19db00f1 remove st from jit/split_reduceop (#12713)
* remove st from jit

* fix by merging reshapes

* no st usage in rangeify

* hmm, stop early works

* fix speed regressions
2025-10-16 12:50:58 +08:00
qazal
069177c1be trace buffer producer and consumers (#12639)
* trace buffer producer and consumers

* work

* generic colored util

* fix batched

* basic clicking works

* generic javascript that works for producer and consumers

* keep focused shape

* idle time

* timings for producer and consumers dedup

* from sd test

* tiny cleanups

* timeline

* work

* up to here

* assert

* list it

* work
2025-10-16 11:11:31 +08:00
George Hotz
4a151e7533 make xcode signing happy, waiting for entitlement (#12712) 2025-10-16 10:20:34 +08:00