Commit Graph

5194 Commits

Author SHA1 Message Date
qazal
b8a55d5f68 sqtt: new packet types, add discovery script (#14960) 2026-02-28 04:27:27 +09:00
chenyu
1406d49eef failed test cases for advanced setitem (#15048) 2026-02-27 10:50:18 -05:00
qazal
ad99b77f6d assembly/amd: add gfx12_asm_vflat llvm tests, disasm fixes (#15046)
* add gfx12_asm_vflat.s

* work
2026-02-27 20:20:31 +09:00
George Hotz
010d2790ce fix multi minimal (#15044) 2026-02-27 14:31:58 +08:00
George Hotz
d23b79530e remove disk from GGUF GEMV test (#15041)
* remove disk from GGUF GEMV test

* keep copy
2026-02-27 12:03:00 +08:00
chenyu
d345f7f5dc remove _pending_assigns (#15040) 2026-02-26 22:38:10 -05:00
George Hotz
37e31e7da4 gguf gemv test (#15039)
* add gemv tests

* gguf big

* skip

* make realize optional
2026-02-27 10:54:43 +08:00
chenyu
0f94a4bb73 failed test case for early fixup const copy (#15038)
* failed test case for early fixup const copy

wrong with PAD

* test no copy
2026-02-26 19:09:33 -05:00
chenyu
3a4db53b43 raise RuntimeError in schedule for conflicted var_val [pr] (#15031) 2026-02-26 15:16:01 -05:00
George Hotz
fe3ee8c27e fix symbolic shapes in calls (#15021)
* fix symbolic shapes in calls

* fix after in the big graph

* real tests
2026-02-26 17:17:18 +08:00
George Hotz
2655655a0c call gradient creates a call (#15020)
* function creates a full subgraph

* tests

* fix var

* fix tests

* implicit assign/contig

* move kv init
2026-02-26 14:15:29 +08:00
chenyu
ed9d475a12 assign tests with test_function (#15015) 2026-02-25 16:15:59 -05:00
nimlgen
faa66e0a61 mi350 hive_reset am repro (#15014) 2026-02-25 21:30:18 +03:00
George Hotz
0d35b67f2c revert realize to only be buffers (#15008)
* revert realize to only be buffers

* fix that

* broken attention

* Revert "broken attention"

This reverts commit a23c3cd96c.

* and that
2026-02-25 22:43:06 +08:00
George Hotz
68831cd852 add more tests to test_function (#15003)
* add more tests to test_function

* add function to llm

* function decorator on llm

* works

* symbolic fixups

* minimum change

* implicit inputs

* don't actually update llama yet
2026-02-25 18:42:06 +08:00
George Hotz
e3fa9896b7 start function and add walk rewrite (#14992)
* start function and add walk rewrite

* work

* add function on feed_forward

* llm progress

* stuff

* none of that
2026-02-25 13:56:27 +08:00
chenyu
fde7a40bb0 allow dtype mismatched assign on disk (#14993)
reverted #14473, that was a bad idea. also added a test that safe_save only has copy
2026-02-24 20:49:55 -05:00
chenyu
5fd4fc0c6d fix tinyfs (#14974)
* fix tinyfs

* fix that
2026-02-24 08:50:53 -05:00
George Hotz
8a6dffc87e Tensor.callify will be the JIT (#14983)
* close

* simple callify, support linear in the scheduler

* all tests pass

* everyone is happy

* dumb test

* Remove unnecessary blank line in rangeify.py
2026-02-24 18:42:24 +08:00
George Hotz
b643fca51e clean up complete_create_schedule_with_vars (#14980)
* clean up complete_create_schedule_with_vars

* transform_to_call

* update viz tests
2026-02-24 16:12:36 +08:00
chenyu
0bda5585c7 unit test TestTinyFS (#14972)
these passed before the allocation change
2026-02-23 16:59:39 -05:00
imaolo
405d37423e call release() in MetalAllocator._free (#14970)
* add failing test

* call MTLBuffer.release() in MetalAllocator._free()

* Update test_metal.py

---------

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2026-02-23 23:33:31 +03:00
chenyu
127136421d enable a few WEBGPU isnan tests that work now (#14967)
* enable a few WEBGPU isnan tests that work now

* still failed
2026-02-23 11:06:08 -05:00
ttomsa
0366474089 Bool cast to cmpne (#14544)
* test

* rm in llvmir

* rm in ptx and nir

* hmmmm

* rm in decompositions

* skip tests

* add test

* just this

* rm comment

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2026-02-23 10:31:36 -05:00
George Hotz
b824490e3f allocate generates a call (#14958)
* allocate generates a call

* symbolic works too

* DEFINE_VAR is param

* replace param later

* apply buffers

* name

* upd

* this was a bug...
2026-02-23 15:59:20 +08:00
wozeparrot
25565b2410 fa: test for mp (#14907) 2026-02-22 21:47:36 -08:00
qazal
d6145736c7 sqtt: examples generator changes from inst_discovery (#14961)
* sqtt examples generator changes from inst_discovery

* rdna4

* rdna3

* cdna

* sad reality for mi300x
2026-02-23 14:42:48 +09:00
George Hotz
677145b393 all consts have shapes (#14959)
* all consts have shapes

* vconst has shape too

* use normal schedule

* cast ptrdtype

* image

* bitcast issue + hack
2026-02-23 10:26:50 +08:00
qazal
1538960002 viz: smaller view for repeated asm instructions in cfg (#14954)
* simple test

* todo

* feature
2026-02-23 10:41:43 +09:00
chenyu
4424757b9a update test_sharded_memory (#14956)
cleaned up and moved to test/null
2026-02-22 16:56:08 -05:00
b1tg
f9b7493e7a cleanup fp8 conversion helpers and fp8 edge-case tests (#14953)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-02-22 09:16:42 -05:00
qazal
60f90dd97c sqtt: fix jitted program deduping, failing test for graphed kernels (#14951)
* work

* hcq_profile fix, test with JIT=2 passes

* ci, -n=auto

* rm duplicate test

* less
2026-02-22 15:22:31 +09:00
chenyu
24e8919438 raise explicitly for test_crossunder_assign (#14948) 2026-02-21 21:21:13 -05:00
chenyu
9764e2561c more assign into unrealize silent fail cases (#14944) 2026-02-21 18:12:57 -05:00
nimlgen
6de15dc480 mockam usb (#14916)
* mockam usb

* f

* win

* x

* x
2026-02-21 23:05:54 +03:00
chenyu
0dbcd764ad a few assign into unrealized failed test case (#14940) 2026-02-21 13:18:45 -05:00
chenyu
0255a64a27 update test_jit_init_empty (#14938)
* update test_jit_init_empty

now it fails silently

* that
2026-02-21 09:01:50 -05:00
George Hotz
8ef5544e4a realized PYTHON copies (#14934)
* realized PYTHON copies

* comment that out

* fix that test

* append afters

* contig

* disk copies

* should be 124

* 332
2026-02-21 20:29:31 +08:00
qazal
8278886cf9 test_profiler cleanup, non flaky cpu_profile test (#14932)
* test_profiler cleanup, non flaky cpu_profile test

* existing device is okay
2026-02-21 16:58:10 +09:00
qazal
c5029fa460 jit case with Tensor.empty input, realized means allocated (#14930)
* simple failing jit test case with Tensor.empty

* this used to exist in ops.py...

* Revert "removed if self.buffer.is_allocated() in realized (#14836)"

This reverts commit 72cf603805.
2026-02-21 16:33:55 +09:00
qazal
5b6fcd1cda gemm/asm: smallest cdna4 asm gemm test (#14925) 2026-02-21 11:56:05 +09:00
George Hotz
df7774661a remove late numbering of UOps (#14923)
* remove late numbering of UOps

* stupid fix

* dead code
2026-02-21 09:18:48 +08:00
chenyu
24286c5593 fix clone for multi (#14919)
also update empty_like to make sure it's backed by buffers
2026-02-20 17:21:09 -05:00
chenyu
1fc1508f67 add assign to test_realize_is_realize.py (#14918) 2026-02-20 16:48:01 -05:00
chenyu
a4634b253a fix empty_like for sharded tensor (#14915) 2026-02-20 16:30:04 -05:00
Nicolas Pinto
aa905db7f7 ptx: use setp.neu for float CMPNE (#14805)
* ptx: use setp.neu for float CMPNE

* test ptx float CMPNE renders setp.neu

* check NaN behavior, not grep ptx strings...

* skip WEBGPU for test_cmpne_nan (Vulkan NaN behavior)

---------

Co-authored-by: Nicolas Pinto <41171+npinto@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-02-20 16:11:04 -05:00
George Hotz
2611907afb start ripping out old scheduler -- no maps (#14909)
* start ripping out old scheduler -- no maps

* no more metadata
2026-02-20 21:05:04 +08:00
nimlgen
1b3b94a72a fix mockam mypy (#14908) 2026-02-20 15:15:05 +03:00
George Hotz
55d3a5def9 preallocate all realized buffers (#14823)
* preallocate all realized buffers

* contiguous

* work

* comment that out

* move to schedule

* better

* correct fix

* just buffer

* disk bufs

* fixes disk tensor stuff

* fix symbolic stuff

* fix multi

* 162 failures

* bugfixes

* don't check that anymore

* fix schedule tests

* mnist should be contiguous

* type and buffer

* fix tests

* shrink axis correction

* mypy fixes

* tests skips

* same 37 failures

* dedup

* no shrink in the graph

* 29 failures

* skips

* fix custom kernel

* fix training

* those optimizations aren't supported currently

* simpler

* more correct

* tests

* 14 failures

* works

* fix that test

* broken

* 11 failures

* only kernel counts left

* fixes

* all tests pass

* remove tensor_map

* op test

* 200 -> 230

* test fixes

* fixes

* revert test_tiny thing

* guard

* revert that

* test tiny passes

* no contigs there

* base realize back

* Revert "no contigs there"

This reverts commit c45bb9fcfd.

* revert that

* chop many assigns

* 12 failures

* fix tests

* tests

* apply after

* pre-commit

* remove old code

* delete that

* fix types

* remove extra contig

* fix dataloader

* torch fix

* disk fix

* update kernel fusion numbers

* runs on amd

* restore kernel count

* add that rule back

* that

* disable that

* wrong

* add the correct rule for that folding

* more tests

* guard c1.arg

* no newlines

* realize those

* split into a different file

* remove detach/contig back

* skip 2

* update that
2026-02-20 20:05:54 +08:00
nimlgen
dbf894215a init mockam (#14889)
* mockam

* more tests

* linter

* x
2026-02-20 14:09:11 +03:00