10644 Commits

Author SHA1 Message Date
George Hotz
206b46687b locals are different buffers 2025-10-17 17:04:29 +08:00
George Hotz
78b2d76e3b real substitute fixes pcontig 2025-10-17 17:01:26 +08:00
George Hotz
4f7005f72a improve debug 2025-10-17 16:45:54 +08:00
George Hotz
c2af5c806b this 2025-10-17 16:35:25 +08:00
George Hotz
8d35780e1a those decimals never mattered 2025-10-17 16:28:36 +08:00
George Hotz
935a60db72 bring back partial contig and flash attention (#12756)
* bring back partial contig and flash attention

* why not 2

* work

* that

* fix pcontig
2025-10-17 16:19:05 +08:00
Sieds Lykles
f6bc620169 UOp.prod and UOp.sum methods (#12755) 2025-10-17 10:02:01 +02:00
Sieds Lykles
d1bb5c0426 slightly flatter symbolic (#12757) 2025-10-17 09:58:45 +02:00
qazal
5417e4b099 viz helper cleanups (#12754) 2025-10-17 15:20:24 +08:00
qazal
3196a7aae3 viz: pre reqs for lighting up programs (#12753) 2025-10-17 15:03:21 +08:00
qazal
dfb8f9fc9e viz: annotate buffer mutability in the memory graph (#12750) 2025-10-17 11:53:02 +08:00
Sieds Lykles
79c2f1ae26 remove reduce_rangless and replace with reduce_unparented (#12749) 2025-10-17 04:46:05 +02:00
chenyu
9561803cb0 fix assert in test_schedule (#12745)
* fix assert in test_schedule

updated kernel counts and some old tests

* fix
2025-10-16 15:39:50 -04:00
chenyu
285534ce64 delete DONT_REALIZE_EXPAND and DONT_GROUP_REDUCES (#12744)
does nothing now
2025-10-16 14:11:33 -04:00
chenyu
98239f1156 few shapetracker cleanups (#12741) 2025-10-16 12:43:27 -04:00
chenyu
53478c741d relax ASSERT_MIN_STEP_TIME for space lab policy (#12742) 2025-10-16 11:40:36 -04:00
geohotstan
5d209ee7ec onnx helper intermediate node output validation (#12740)
* start

* update comments

* good

* add comments and better printing

* done
2025-10-16 11:17:47 -04:00
Christopher Milan
bce2bc0465 Revert "use RTLD_GLOBAL on macos" (#12738)
This reverts commit 89fe3e574d.
2025-10-16 10:07:21 -04:00
chenyu
f34f26bca0 fix gpt2 with benchmark (#12736)
`CPU=1 python3 examples/gpt2.py --benchmark 128` works now
2025-10-16 09:55:20 -04:00
Sieds Lykles
55db1b0e0e reduce where that is cut from two sides (#12733)
* better rule

* correct pattern

* shorten line
2025-10-16 15:25:15 +02:00
nimlgen
cf9baeea61 Revert "nv: check if jitlink is avail (#12731)" (#12735)
This reverts commit a069a45d14.
2025-10-16 20:41:49 +08:00
George Hotz
8be7844b2e use apply uop for assign to fix assign metadata (#12732)
* use apply uop for assign

* fix metadata for assign

* fix backward metadata

* those aren't real tests
2025-10-16 20:34:12 +08:00
nimlgen
3aa2277b8f nv: usb4 (#12696)
* hackish

* prog

* match

* l

* simpler

* refactor

* not osx

* apple things

* tiny changes

* fix mask

* match fix

* nn
2025-10-16 20:11:19 +08:00
nimlgen
a069a45d14 nv: check if jitlink is avail (#12731) 2025-10-16 19:58:50 +08:00
George Hotz
a498ec9c18 cleanup names of postrange + fast FUSE_OPTIM (#12730)
* cleanup names of postrange

* make FUSE_OPTIM not slow

* delete junk in def r
2025-10-16 19:38:31 +08:00
Sieds Lykles
8f740e07ff no broadcasting/vectors in reduce collapse (#12729) 2025-10-16 13:22:57 +02:00
qazal
533f18b22c viz: add trace data for inflight buffers (#12728)
* viz: add trace data for inflight buffers

* add test_inflight_buf

* temp stores the keys

* update tests / use Tensor.ones
2025-10-16 19:15:03 +08:00
George Hotz
af4479c169 faster stable diffusion load (#12725)
* faster stable diffusion load

* failing tests
2025-10-16 18:31:59 +08:00
nimlgen
e7c057d5dc system: alloc_sysmem return view (#12724)
* system: alloc_sysmem return view

* e
2025-10-16 17:55:01 +08:00
nimlgen
b86a33a312 ptx: support bw (#12722) 2025-10-16 15:38:08 +08:00
nimlgen
b8cd66c7a2 nv: support all gb20x and small bar (#12721) 2025-10-16 15:37:54 +08:00
George Hotz
1d1e1d9d88 delete the ShapeTracker (#12720)
* delete the ShapeTracker

* fix tests

* fix more

* fix gc test
2025-10-16 15:36:22 +08:00
George Hotz
592e86f6f5 remove UOp.st (#12716)
* remove UOp.st

* fix tests

* torch backend disable
2025-10-16 14:44:09 +08:00
wozeparrot
cc2dfe22f5 tinyfs: fetch file utility (#12719) 2025-10-15 23:38:56 -07:00
nimlgen
3ed543f956 system: reorder funcs + barrier on macos (#12714) 2025-10-16 14:38:01 +08:00
qazal
b77bdbbc62 viz: count unpickle in server startup time (#12715)
* viz: count unpickle in server startup time

* type checking
2025-10-16 13:07:46 +08:00
George Hotz
7c19db00f1 remove st from jit/split_reduceop (#12713)
* remove st from jit

* fix by merging reshapes

* no st usage in rangeify

* hmm, stop early works

* fix speed regressions
2025-10-16 12:50:58 +08:00
qazal
069177c1be trace buffer producer and consumers (#12639)
* trace buffer producer and consumers

* work

* generic colored util

* fix batched

* basic clicking works

* generic javascript that works for producer and consumers

* keep focused shape

* idle time

* timings for producer and consumers dedup

* from sd test

* tiny cleanups

* timeline

* work

* up to here

* assert

* list it

* work
2025-10-16 11:11:31 +08:00
George Hotz
4a151e7533 make xcode signing happy, waiting for entitlement (#12712) 2025-10-16 10:20:34 +08:00
chenyu
c3278e5622 clean up old tests (#12708) 2025-10-15 17:53:17 -04:00
chenyu
b8cf35fb77 print macOS version in CI (#12705) 2025-10-15 15:05:33 -04:00
Daniel
d65bd669f8 update tiny torch backend hook (#12575)
* update the backend to fix torch deprecation warning

* use param_hook to avoid full backward hook needlessly firing on inputs which do not require gradients

* fix indentation

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-10-15 14:02:33 -04:00
nimlgen
db5ae846aa nv: do not use va_addr for cpu accesses (#12697)
* nv: do not use va_addr for cpu accesses

* mypy
2025-10-15 22:48:12 +08:00
nimlgen
3ab23af829 nv: copy prog with copyin (#12701)
* nv: copy prog with copyin

* to bytes

* fix test
2025-10-15 22:48:01 +08:00
nimlgen
fafbf3daea memory: reserve ptable (#12702) 2025-10-15 22:47:50 +08:00
George Hotz
85a907605c hotfix: only 20 steps of beautiful_mnist_torch, some CI machines are slow 2025-10-15 22:29:34 +08:00
Christopher Milan
e1996d358c use RTLD_GLOBAL on macos (#12699) 2025-10-15 22:24:50 +08:00
chenyu
312c622d35 support None in pad_to and shrink_to (#12700) 2025-10-15 09:25:31 -04:00
George Hotz
612e3d6143 replace mop arg with vectorized index (#12695)
* replace mop arg with vectorized index

* tests passing

* better viz

* no compile4
2025-10-15 20:50:06 +08:00
wozeparrot
9ec4c06d7d feat: one request per device (#12698) 2025-10-15 05:22:07 -07:00