Christopher Milan
4a2fc7ecbb
autogen: cache downloads ( #14997 )
2026-02-25 01:34:27 -05:00
George Hotz
e3fa9896b7
start function and add walk rewrite ( #14992 )
...
* start function and add walk rewrite
* work
* add function on feed_forward
* llm progress
* stuff
* none of that
2026-02-25 13:56:27 +08:00
chenyu
fde7a40bb0
allow dtype mismatched assign on disk ( #14993 )
...
reverted #14473, that was a bad idea. also added a test that safe_save only has copy
2026-02-24 20:49:55 -05:00
chenyu
46d9a9a74f
minor indexing cleanups [pr] ( #14991 )
2026-02-24 16:49:35 -05:00
chenyu
8dae9be573
move realize_map fixup into realize_assign_src [pr] ( #14990 )
2026-02-24 15:51:40 -05:00
chenyu
9d9151a21e
remove const normalization in indexing [pr] ( #14989 )
...
rangeify can create const with device, and all is normalized in to_define_global
2026-02-24 15:09:11 -05:00
chenyu
f68a472244
end range for COPY/BUFFER_VIEW [pr] ( #14987 )
2026-02-24 13:33:35 -05:00
chenyu
e5d27a3773
remove BUFFER_VIEW from ended_ranges special case [pr] ( #14986 )
...
* remove BUFFER_VIEW from ended_ranges special case [pr]
* will fix later
2026-02-24 10:37:29 -05:00
chenyu
5fd4fc0c6d
fix tinyfs ( #14974 )
...
* fix tinyfs
* fix that
2026-02-24 08:50:53 -05:00
George Hotz
8a6dffc87e
Tensor.callify will be the JIT ( #14983 )
...
* close
* simple callify, support linear in the scheduler
* all tests pass
* everyone is happy
* dumb test
* Remove unnecessary blank line in rangeify.py
2026-02-24 18:42:24 +08:00
nimlgen
6f1cb6be86
am: tiny err handling cleanups ( #14981 )
...
* am: tiny err handling cleanups
* x
* x
2026-02-24 12:43:45 +03:00
George Hotz
b643fca51e
clean up complete_create_schedule_with_vars ( #14980 )
...
* clean up complete_create_schedule_with_vars
* transform_to_call
* update viz tests
2026-02-24 16:12:36 +08:00
wozeparrot
8d9545e09e
llama3: correctly shard wqkv ( #14978 )
2026-02-23 23:57:10 -08:00
wozeparrot
a36a26d4ed
llama3: optim does grad acc in correct order ( #14965 )
2026-02-23 22:25:13 -08:00
George Hotz
e2b1f2620d
schedule is linear ( #14975 )
...
* schedule is linear
* cleanup
* cleanups
2026-02-24 11:30:41 +08:00
Christopher Milan
57ade7608a
consider indexing math cost for IMAGE=1 ( #14973 )
2026-02-23 18:57:45 -05:00
chenyu
0bda5585c7
unit test TestTinyFS ( #14972 )
...
these passed before the allocation change
2026-02-23 16:59:39 -05:00
imaolo
405d37423e
call release() in MetalAllocator._free ( #14970 )
...
* add failing test
* call MTLBuffer.release() in MetalAllocator._free()
* Update test_metal.py
---------
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2026-02-23 23:33:31 +03:00
nimlgen
77db8e1c07
cpu: wait on dep signals ( #14862 )
...
* cpu: task_done() in case of failures
* print
* fix
* x
* f
* x
* um
* ?
* u
* f
* x
* gh
* f
* f
* virt
* x
* simpler
2026-02-23 21:09:41 +03:00
chenyu
127136421d
enable a few WEBGPU isnan tests that work now ( #14967 )
...
* enable a few WEBGPU isnan tests that work now
* still failed
2026-02-23 11:06:08 -05:00
ttomsa
0366474089
Bool cast to cmpne ( #14544 )
...
* test
* rm in llvmir
* rm in ptx and nir
* hmmmm
* rm in decompositions
* skip tests
* add test
* just this
* rm comment
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-02-23 10:31:36 -05:00
George Hotz
806581f807
rename rewrites + sink filter + bump to dagre 2.0.0 ( #14966 )
...
* bump to dagre 2.0.0
* transform to call
* cleanup names
* get kernel graph
* dagre recursion fix + better error
* add toggle to hide sink nodes
* no sink by default
* revert that
* only hide final sinks
* lol
2026-02-23 22:47:22 +08:00
nimlgen
d86f1d66b5
system: apl validate dev_id bounds ( #14964 )
2026-02-23 12:18:03 +03:00
George Hotz
b824490e3f
allocate generates a call ( #14958 )
...
* allocate generates a call
* symbolic works too
* DEFINE_VAR is param
* replace param later
* apply buffers
* name
* upd
* this was a bug...
2026-02-23 15:59:20 +08:00
wozeparrot
dd8302a6d0
fix: optim device is never none here ( #14963 )
2026-02-22 23:34:57 -08:00
wozeparrot
25565b2410
fa: test for mp ( #14907 )
2026-02-22 21:47:36 -08:00
qazal
d6145736c7
sqtt: examples generator changes from inst_discovery ( #14961 )
...
* sqtt examples generator changes from inst_discovery
* rdna4
* rdna3
* cdna
* sad reality for mi300x
2026-02-23 14:42:48 +09:00
George Hotz
3acd763684
simple call in allocate ( #14962 )
...
* allocate generates a call
* symbolic works too
* add min/max to PARAM
* revert viz
2026-02-23 13:34:20 +08:00
George Hotz
f45199269b
hotfix: regress NV cifar_10steps_half to 120 ms
2026-02-23 12:29:25 +08:00
George Hotz
677145b393
all consts have shapes ( #14959 )
...
* all consts have shapes
* vconst has shape too
* use normal schedule
* cast ptrdtype
* image
* bitcast issue + hack
2026-02-23 10:26:50 +08:00
qazal
1538960002
viz: smaller view for repeated asm instructions in cfg ( #14954 )
...
* simple test
* todo
* feature
2026-02-23 10:41:43 +09:00
George Hotz
226d4a2440
hotfix: code DEBUG=1 defensively
2026-02-23 08:44:54 +08:00
chenyu
4424757b9a
update test_sharded_memory ( #14956 )
...
cleaned up and moved to test/null
2026-02-22 16:56:08 -05:00
b1tg
f9b7493e7a
cleanup fp8 conversion helpers and fp8 edge-case tests ( #14953 )
...
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-02-22 09:16:42 -05:00
qazal
60f90dd97c
sqtt: fix jitted program deduping, failing test for graphed kernels ( #14951 )
...
* work
* hcq_profile fix, test with JIT=2 passes
* ci, -n=auto
* rm duplicate test
* less
2026-02-22 15:22:31 +09:00
chenyu
ccfd878e0f
minor fix_assign_hazard improvement [pr] ( #14949 )
...
target.base cannot be s if s.op is a movement
2026-02-21 21:21:28 -05:00
chenyu
24e8919438
raise explicitly for test_crossunder_assign ( #14948 )
2026-02-21 21:21:13 -05:00
chenyu
acf8f6b287
faster fix_assign_hazard [pr] ( #14947 )
...
one toposort. `time NULL_ALLOW_COPYOUT=1 MNISTMOCK=1 PYTHONPATH="." NULL=1 DEFAULT_FLOAT=HALF BENCHMARK=10 BS=256 GPUS=1 MODEL=resnet python3 examples/mlperf/model_train.py` 150s -> 40s
2026-02-21 19:42:13 -05:00
chenyu
9764e2561c
more assign into unrealize silent fail cases ( #14944 )
2026-02-21 18:12:57 -05:00
nimlgen
6de15dc480
mockam usb ( #14916 )
...
* mockam usb
* f
* win
* x
* x
2026-02-21 23:05:54 +03:00
chenyu
0dbcd764ad
a few assign into unrealized failed test case ( #14940 )
2026-02-21 13:18:45 -05:00
wozeparrot
3cda781876
llama optim offload ( #14901 )
2026-02-21 08:53:45 -08:00
chenyu
0255a64a27
update test_jit_init_empty ( #14938 )
...
* update test_jit_init_empty
now it fails silently
* that
2026-02-21 09:01:50 -05:00
George Hotz
8ef5544e4a
realized PYTHON copies ( #14934 )
...
* realized PYTHON copies
* comment that out
* fix that test
* append afters
* contig
* disk copies
* should be 124
* 332
2026-02-21 20:29:31 +08:00
qazal
cf23c2eee7
viz: merge readelfs, clean up toggles UI code ( #14936 )
...
* no extra readelf function
* that node can never be null, display block is wrong fix the css
2026-02-21 19:58:35 +09:00
George Hotz
639224e6e1
no call hack needed anymore ( #14935 )
2026-02-21 18:06:00 +08:00
George Hotz
d3b829a189
print schedule caller with DEBUG=1 ( #14933 )
2026-02-21 16:22:45 +08:00
qazal
8278886cf9
test_profiler cleanup, non flaky cpu_profile test ( #14932 )
...
* test_profiler cleanup, non flaky cpu_profile test
* existing device is okay
2026-02-21 16:58:10 +09:00
George Hotz
06fb35a1e5
don't graph_rewrite into calls ( #14931 )
...
* don't graph_rewrite into calls
* optional
* pm_gate_kernel_sink removed
2026-02-21 15:39:59 +08:00
qazal
c5029fa460
jit case with Tensor.empty input, realized means allocated ( #14930 )
...
* simple failing jit test case with Tensor.empty
* this used to exist in ops.py...
* Revert "removed if self.buffer.is_allocated() in realized (#14836)"
This reverts commit 72cf603805.
2026-02-21 16:33:55 +09:00