Commit Graph

12395 Commits

Author SHA1 Message Date
Christopher Milan
4a2fc7ecbb autogen: cache downloads (#14997) 2026-02-25 01:34:27 -05:00
George Hotz
e3fa9896b7 start function and add walk rewrite (#14992)
* start function and add walk rewrite

* work

* add function on feed_forward

* llm progress

* stuff

* none of that
2026-02-25 13:56:27 +08:00
chenyu
fde7a40bb0 allow dtype mismatched assign on disk (#14993)
reverted #14473, that was a bad idea. also added a test that safe_save only has copy
2026-02-24 20:49:55 -05:00
chenyu
46d9a9a74f minor indexing cleanups [pr] (#14991) 2026-02-24 16:49:35 -05:00
chenyu
8dae9be573 move realize_map fixup into realize_assign_src [pr] (#14990) 2026-02-24 15:51:40 -05:00
chenyu
9d9151a21e remove const normalization in indexing [pr] (#14989)
rangeify can create const with device, and all is normalized in to_define_global
2026-02-24 15:09:11 -05:00
chenyu
f68a472244 end range for COPY/BUFFER_VIEW [pr] (#14987) 2026-02-24 13:33:35 -05:00
chenyu
e5d27a3773 remove BUFFER_VIEW from ended_ranges special case [pr] (#14986)
* remove BUFFER_VIEW from ended_ranges special case [pr]

* will fix later
2026-02-24 10:37:29 -05:00
chenyu
5fd4fc0c6d fix tinyfs (#14974)
* fix tinyfs

* fix that
2026-02-24 08:50:53 -05:00
George Hotz
8a6dffc87e Tensor.callify will be the JIT (#14983)
* close

* simple callify, support linear in the scheduler

* all tests pass

* everyone is happy

* dumb test

* Remove unnecessary blank line in rangeify.py
2026-02-24 18:42:24 +08:00
nimlgen
6f1cb6be86 am: tiny err handling cleanups (#14981)
* am: tiny err handling cleanups

* x

* x
2026-02-24 12:43:45 +03:00
George Hotz
b643fca51e clean up complete_create_schedule_with_vars (#14980)
* clean up complete_create_schedule_with_vars

* transform_to_call

* update viz tests
2026-02-24 16:12:36 +08:00
wozeparrot
8d9545e09e llama3: correctly shard wqkv (#14978) 2026-02-23 23:57:10 -08:00
wozeparrot
a36a26d4ed llama3: optim does grad acc in correct order (#14965) 2026-02-23 22:25:13 -08:00
George Hotz
e2b1f2620d schedule is linear (#14975)
* schedule is linear

* cleanup

* cleanups
2026-02-24 11:30:41 +08:00
Christopher Milan
57ade7608a consider indexing math cost for IMAGE=1 (#14973) 2026-02-23 18:57:45 -05:00
chenyu
0bda5585c7 unit test TestTinyFS (#14972)
these passed before the allocation change
2026-02-23 16:59:39 -05:00
imaolo
405d37423e call release() in MetalAllocator._free (#14970)
* add failing test

* call MTLBuffer.release() in MetalAllocator._free()

* Update test_metal.py

---------

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2026-02-23 23:33:31 +03:00
nimlgen
77db8e1c07 cpu: wait on dep signals (#14862)
* cpu: task_done() in case of failures

* print

* fix

* x

* f

* x

* um

* ?

* u

* f

* x

* gh

* f

* f

* virt

* x

* simpler
2026-02-23 21:09:41 +03:00
chenyu
127136421d enable a few WEBGPU isnan tests that work now (#14967)
* enable a few WEBGPU isnan tests that work now

* still failed
2026-02-23 11:06:08 -05:00
ttomsa
0366474089 Bool cast to cmpne (#14544)
* test

* rm in llvmir

* rm in ptx and nir

* hmmmm

* rm in decompositions

* skip tests

* add test

* just this

* rm comment

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2026-02-23 10:31:36 -05:00
George Hotz
806581f807 rename rewrites + sink filter + bump to dagre 2.0.0 (#14966)
* bump to dagre 2.0.0

* transform to call

* cleanup names

* get kernel graph

* dagre recursion fix + better error

* add toggle to hide sink nodes

* no sink by default

* revert that

* only hide final sinks

* lol
2026-02-23 22:47:22 +08:00
nimlgen
d86f1d66b5 system: apl validate dev_id bounds (#14964) 2026-02-23 12:18:03 +03:00
George Hotz
b824490e3f allocate generates a call (#14958)
* allocate generates a call

* symbolic works too

* DEFINE_VAR is param

* replace param later

* apply buffers

* name

* upd

* this was a bug...
2026-02-23 15:59:20 +08:00
wozeparrot
dd8302a6d0 fix: optim device is never none here (#14963) 2026-02-22 23:34:57 -08:00
wozeparrot
25565b2410 fa: test for mp (#14907) 2026-02-22 21:47:36 -08:00
qazal
d6145736c7 sqtt: examples generator changes from inst_discovery (#14961)
* sqtt examples generator changes from inst_discovery

* rdna4

* rdna3

* cdna

* sad reality for mi300x
2026-02-23 14:42:48 +09:00
George Hotz
3acd763684 simple call in allocate (#14962)
* allocate generates a call

* symbolic works too

* add min/max to PARAM

* revert viz
2026-02-23 13:34:20 +08:00
George Hotz
f45199269b hotfix: regress NV cifar_10steps_half to 120 ms 2026-02-23 12:29:25 +08:00
George Hotz
677145b393 all consts have shapes (#14959)
* all consts have shapes

* vconst has shape too

* use normal schedule

* cast ptrdtype

* image

* bitcast issue + hack
2026-02-23 10:26:50 +08:00
qazal
1538960002 viz: smaller view for repeated asm instructions in cfg (#14954)
* simple test

* todo

* feature
2026-02-23 10:41:43 +09:00
George Hotz
226d4a2440 hotfix: code DEBUG=1 defensively 2026-02-23 08:44:54 +08:00
chenyu
4424757b9a update test_sharded_memory (#14956)
cleaned up and moved to test/null
2026-02-22 16:56:08 -05:00
b1tg
f9b7493e7a cleanup fp8 conversion helpers and fp8 edge-case tests (#14953)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-02-22 09:16:42 -05:00
qazal
60f90dd97c sqtt: fix jitted program deduping, failing test for graphed kernels (#14951)
* work

* hcq_profile fix, test with JIT=2 passes

* ci, -n=auto

* rm duplicate test

* less
2026-02-22 15:22:31 +09:00
chenyu
ccfd878e0f minor fix_assign_hazard improvement [pr] (#14949)
target.base cannot be s if s.op is a movement
2026-02-21 21:21:28 -05:00
chenyu
24e8919438 raise explicitly for test_crossunder_assign (#14948) 2026-02-21 21:21:13 -05:00
chenyu
acf8f6b287 faster fix_assign_hazard [pr] (#14947)
one toposort. `time NULL_ALLOW_COPYOUT=1 MNISTMOCK=1 PYTHONPATH="." NULL=1 DEFAULT_FLOAT=HALF BENCHMARK=10 BS=256 GPUS=1 MODEL=resnet python3 examples/mlperf/model_train.py` 150s -> 40s
2026-02-21 19:42:13 -05:00
chenyu
9764e2561c more assign into unrealize silent fail cases (#14944) 2026-02-21 18:12:57 -05:00
nimlgen
6de15dc480 mockam usb (#14916)
* mockam usb

* f

* win

* x

* x
2026-02-21 23:05:54 +03:00
chenyu
0dbcd764ad a few assign into unrealized failed test case (#14940) 2026-02-21 13:18:45 -05:00
wozeparrot
3cda781876 llama optim offload (#14901) 2026-02-21 08:53:45 -08:00
chenyu
0255a64a27 update test_jit_init_empty (#14938)
* update test_jit_init_empty

now it fails silently

* that
2026-02-21 09:01:50 -05:00
George Hotz
8ef5544e4a realized PYTHON copies (#14934)
* realized PYTHON copies

* comment that out

* fix that test

* append afters

* contig

* disk copies

* should be 124

* 332
2026-02-21 20:29:31 +08:00
qazal
cf23c2eee7 viz: merge readelfs, clean up toggles UI code (#14936)
* no extra readelf function

* that node can never be null, display block is wrong fix the css
2026-02-21 19:58:35 +09:00
George Hotz
639224e6e1 no call hack needed anymore (#14935) 2026-02-21 18:06:00 +08:00
George Hotz
d3b829a189 print schedule caller with DEBUG=1 (#14933) 2026-02-21 16:22:45 +08:00
qazal
8278886cf9 test_profiler cleanup, non flaky cpu_profile test (#14932)
* test_profiler cleanup, non flaky cpu_profile test

* existing device is okay
2026-02-21 16:58:10 +09:00
George Hotz
06fb35a1e5 don't graph_rewrite into calls (#14931)
* don't graph_rewrite into calls

* optional

* pm_gate_kernel_sink removed
2026-02-21 15:39:59 +08:00
qazal
c5029fa460 jit case with Tensor.empty input, realized means allocated (#14930)
* simple failing jit test case with Tensor.empty

* this used to exist in ops.py...

* Revert "removed if self.buffer.is_allocated() in realized (#14836)"

This reverts commit 72cf603805.
2026-02-21 16:33:55 +09:00