George Hotz
206b46687b
locals are different buffers
2025-10-17 17:04:29 +08:00
George Hotz
78b2d76e3b
real substitute fixes pcontig
2025-10-17 17:01:26 +08:00
George Hotz
4f7005f72a
improve debug
2025-10-17 16:45:54 +08:00
George Hotz
c2af5c806b
this
2025-10-17 16:35:25 +08:00
George Hotz
8d35780e1a
those decimals never mattered
2025-10-17 16:28:36 +08:00
George Hotz
935a60db72
bring back partial contig and flash attention ( #12756 )
...
* bring back partial contig and flash attention
* why not 2
* work
* that
* fix pcontig
2025-10-17 16:19:05 +08:00
Sieds Lykles
f6bc620169
UOp.prod and UOp.sum methods ( #12755 )
2025-10-17 10:02:01 +02:00
Sieds Lykles
d1bb5c0426
slightly flatter symbolic ( #12757 )
2025-10-17 09:58:45 +02:00
qazal
5417e4b099
viz helper cleanups ( #12754 )
2025-10-17 15:20:24 +08:00
qazal
3196a7aae3
viz: pre reqs for lighting up programs ( #12753 )
2025-10-17 15:03:21 +08:00
qazal
dfb8f9fc9e
viz: annotate buffer mutability in the memory graph ( #12750 )
2025-10-17 11:53:02 +08:00
Sieds Lykles
79c2f1ae26
remove reduce_rangless and replace with reduce_unparented ( #12749 )
2025-10-17 04:46:05 +02:00
chenyu
9561803cb0
fix assert in test_schedule ( #12745 )
...
* fix assert in test_schedule
updated kernel counts and some old tests
* fix
2025-10-16 15:39:50 -04:00
chenyu
285534ce64
delete DONT_REALIZE_EXPAND and DONT_GROUP_REDUCES ( #12744 )
...
does nothing now
2025-10-16 14:11:33 -04:00
chenyu
98239f1156
few shapetracker cleanups ( #12741 )
2025-10-16 12:43:27 -04:00
chenyu
53478c741d
relax ASSERT_MIN_STEP_TIME for space lab policy ( #12742 )
2025-10-16 11:40:36 -04:00
geohotstan
5d209ee7ec
onnx helper intermediate node output validation ( #12740 )
...
* start
* update comments
* good
* add comments and better printing
* done
2025-10-16 11:17:47 -04:00
Christopher Milan
bce2bc0465
Revert "use RTLD_GLOBAL on macos" ( #12738 )
...
This reverts commit 89fe3e574d.
2025-10-16 10:07:21 -04:00
chenyu
f34f26bca0
fix gpt2 with benchmark ( #12736 )
...
`CPU=1 python3 examples/gpt2.py --benchmark 128` works now
2025-10-16 09:55:20 -04:00
Sieds Lykles
55db1b0e0e
reduce where that is cut from two sides ( #12733 )
...
* better rule
* correct pattern
* shorten line
2025-10-16 15:25:15 +02:00
nimlgen
cf9baeea61
Revert "nv: check if jitlink is avail ( #12731 )" ( #12735 )
...
This reverts commit a069a45d14.
2025-10-16 20:41:49 +08:00
George Hotz
8be7844b2e
use apply uop for assign to fix assign metadata ( #12732 )
...
* use apply uop for assign
* fix metadata for assign
* fix backward metadata
* those aren't real tests
2025-10-16 20:34:12 +08:00
nimlgen
3aa2277b8f
nv: usb4 ( #12696 )
...
* hackish
* prog
* match
* l
* simpler
* refactor
* not osx
* apple things
* tiny changes
* fix mask
* match fix
* nn
2025-10-16 20:11:19 +08:00
nimlgen
a069a45d14
nv: check if jitlink is avail ( #12731 )
2025-10-16 19:58:50 +08:00
George Hotz
a498ec9c18
cleanup names of postrange + fast FUSE_OPTIM ( #12730 )
...
* cleanup names of postrange
* make FUSE_OPTIM not slow
* delete junk in def r
2025-10-16 19:38:31 +08:00
Sieds Lykles
8f740e07ff
no broadcasting/vectors in reduce collapse ( #12729 )
2025-10-16 13:22:57 +02:00
qazal
533f18b22c
viz: add trace data for inflight buffers ( #12728 )
...
* viz: add trace data for inflight buffers
* add test_inflight_buf
* temp stores the keys
* update tests / use Tensor.ones
2025-10-16 19:15:03 +08:00
George Hotz
af4479c169
faster stable diffusion load ( #12725 )
...
* faster stable diffusion load
* failing tests
2025-10-16 18:31:59 +08:00
nimlgen
e7c057d5dc
system: alloc_sysmem return view ( #12724 )
...
* system: alloc_sysmem return view
* e
2025-10-16 17:55:01 +08:00
nimlgen
b86a33a312
ptx: support bw ( #12722 )
2025-10-16 15:38:08 +08:00
nimlgen
b8cd66c7a2
nv: support all gb20x and small bar ( #12721 )
2025-10-16 15:37:54 +08:00
George Hotz
1d1e1d9d88
delete the ShapeTracker ( #12720 )
...
* delete the ShapeTracker
* fix tests
* fix more
* fix gc test
2025-10-16 15:36:22 +08:00
George Hotz
592e86f6f5
remove UOp.st ( #12716 )
...
* remove UOp.st
* fix tests
* torch backend disable
2025-10-16 14:44:09 +08:00
wozeparrot
cc2dfe22f5
tinyfs: fetch file utility ( #12719 )
2025-10-15 23:38:56 -07:00
nimlgen
3ed543f956
system: reorder funcs + barrier on macos ( #12714 )
2025-10-16 14:38:01 +08:00
qazal
b77bdbbc62
viz: count unpickle in server startup time ( #12715 )
...
* viz: count unpickle in server startup time
* type checking
2025-10-16 13:07:46 +08:00
George Hotz
7c19db00f1
remove st from jit/split_reduceop ( #12713 )
...
* remove st from jit
* fix by merging reshapes
* no st usage in rangeify
* hmm, stop early works
* fix speed regressions
2025-10-16 12:50:58 +08:00
qazal
069177c1be
trace buffer producer and consumers ( #12639 )
...
* trace buffer producer and consumers
* work
* generic colored util
* fix batched
* basic clicking works
* generic javascript that works for producer and consumers
* keep focused shape
* idle time
* timings for producer and consumers dedup
* from sd test
* tiny cleanups
* timeline
* work
* up to here
* assert
* list it
* work
2025-10-16 11:11:31 +08:00
George Hotz
4a151e7533
make xcode signing happy, waiting for entitlement ( #12712 )
2025-10-16 10:20:34 +08:00
chenyu
c3278e5622
clean up old tests ( #12708 )
2025-10-15 17:53:17 -04:00
chenyu
b8cf35fb77
print macOS version in CI ( #12705 )
2025-10-15 15:05:33 -04:00
Daniel
d65bd669f8
update tiny torch backend hook ( #12575 )
...
* update the backend to fix torch deprecation warning
* use param_hook to avoid full backward hook needlessly firing on inputs which do not require gradients
* fix indentation
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-10-15 14:02:33 -04:00
nimlgen
db5ae846aa
nv: do not use va_addr for cpu accesses ( #12697 )
...
* nv: do not use va_addr for cpu accesses
* mypy
2025-10-15 22:48:12 +08:00
nimlgen
3ab23af829
nv: copy prog with copyin ( #12701 )
...
* nv: copy prog with copyin
* to bytes
* fix test
2025-10-15 22:48:01 +08:00
nimlgen
fafbf3daea
memory: reserve ptable ( #12702 )
2025-10-15 22:47:50 +08:00
George Hotz
85a907605c
hotfix: only 20 steps of beautiful_mnist_torch, some CI machines are slow
2025-10-15 22:29:34 +08:00
Christopher Milan
e1996d358c
use RTLD_GLOBAL on macos ( #12699 )
2025-10-15 22:24:50 +08:00
chenyu
312c622d35
support None in pad_to and shrink_to ( #12700 )
2025-10-15 09:25:31 -04:00
George Hotz
612e3d6143
replace mop arg with vectorized index ( #12695 )
...
* replace mop arg with vectorized index
* tests passing
* better viz
* no compile4
2025-10-15 20:50:06 +08:00
wozeparrot
9ec4c06d7d
feat: one request per device ( #12698 )
2025-10-15 05:22:07 -07:00