Commit Graph

1114 Commits

Author SHA1 Message Date
b1tg
18dc77ccab add fp8 fnuz dtypes with PYTHON backend support (#14945)
* add fp8 fnuz dtypes with PYTHON backend support

* rm emu related change

* clarify fp8 fnuz zero handling

* Revert "rm emu related change"

This reverts commit efa4763c22.

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-03-11 22:30:18 -04:00
George Hotz
4f3f55328b do not patch on invalid tensor tests (#15226)
* do not patch on invalid tensor tests

* cleanup
2026-03-12 09:35:20 +08:00
Christopher Milan
2fb8a7f60f fix test_invalid_tensor when before values are nan (#15215) 2026-03-10 23:51:19 -04:00
Christopher Milan
ffaafd391a Invalid in Tensor (#15154) 2026-03-10 02:49:54 -04:00
chenyu
a53187eef7 fix TestPartialAssignToSharedBuffer (#15202)
bufferize_to_store issue with assign
2026-03-09 23:14:23 -04:00
b1tg
891a73befc llm: fix chunked prefill (#15182)
* llm: fix chunked prefill

* less lines

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2026-03-07 22:08:31 +08:00
Ananta Ranganathan
5bdad8ee41 update mxfp4 tests to use the same patterns as the others (#15177)
* update mxfp4 tests to use the same patterns as the others

* fix typo in test call not sure how it committed
2026-03-06 13:21:40 -05:00
Ananta Ranganathan
5c50035e0d avoid using arithmetic for mxfp4 (#15172)
* avoid using arithmetic for mxfp4

* update tests to use assert equal

* no longer todo
2026-03-06 11:17:56 -05:00
Roelof van Dijk
059c6326c0 metal uint32 icb offset overflow (#15156)
* metal uint32 icb offset overflow

fix: diff

supports_exec_item

GraphRunner.supports_exec_item

tests

fix: can't import on non-metal

stricter

* also test the non-metal buffer case

* imports on non-mac
2026-03-06 00:54:39 +03:00
Ananta Ranganathan
8ef656324e FIXED TEST Q5_K GGUF dequant (#15147)
* q5_k gguf support as separate pr

* fix the problematic gemv test for q5_k

* add assert to make sure the gemv test can't fail with warning instead of error
2026-03-05 16:32:36 +08:00
George Hotz
e97922a57c LLM speedup with two jits, prefill/rollout (#15153)
* START_TIME

* print cleanup

* fix tests
2026-03-05 16:21:09 +08:00
George Hotz
fb43b415f9 fix symbolic shape call + chunked prefill (#15149)
* fix precompile for symbolic shape

* chunked prefill

* cleaner

* test that
2026-03-05 14:02:26 +08:00
George Hotz
ac1847cbf7 fully symbolic llm (#15097)
* work

* llm symbolic (almost)

* work

* revert that

* llm sym

* works

* cleanups

* cache tokens with the kv cache

* cleanups

* cleanups
2026-03-05 10:22:11 +08:00
chenyu
34594bcaaf Revert "bug in metal: offset is stored as uint32, overflow (#15129)" (#15136)
This reverts commit 9c58db16fa.
2026-03-04 16:54:42 -05:00
Roelof van Dijk
9c58db16fa bug in metal: offset is stored as uint32, overflow (#15129)
* metal uint32 icb offset overflow

* fix: diff

* supports_exec_item

* GraphRunner.supports_exec_item

* tests

* fix: can't import on non-metal
2026-03-04 22:52:12 +03:00
chenyu
fae400d300 update assign tests to also test the expected behavior (#15132) 2026-03-04 11:34:43 -05:00
chenyu
1f96cc2b51 update non-contiguous buffer error message [pr] (#15131)
* update non-contiguous buffer error message [pr]

also cleaned up the tests

* order
2026-03-04 11:13:26 -05:00
George Hotz
01ddb4c267 add precompile to call (#15099)
* add precompile to call

* put get back

* something

* after structure

* alt

* keep it call

* resolve call

* resolve linear call

* precompile works with llm

* revert rangeify

* color for debugging

* getenv PRECOMPILE

* clean up deco pattern

* fully recursive sink scheduling

* revert llama

* fix SPEC=2
2026-03-03 22:32:42 +08:00
chenyu
5dcf29b1a0 use clone in test_swap_slices (#15096) 2026-03-02 22:05:12 -05:00
George Hotz
d483e4153a buffer view is like buffer (#15082)
* buffer view is like buffer

* fix

* swap_reshape_shrink

* contiguous on gguf, fix overlap

* revert that

* _device_supports_view

* this

* fix that test

* 0 buffers

* that test was wrong

* this

* check correct size

* contig BUFFER_VIEW

* this

* fix tests

* buffer view tests

* om

* fix torch

* no MOCKGPU

* skip
2026-03-03 09:52:33 +08:00
chenyu
14d1c5fdfd assign fusion tests on detach and contiguous_backward (#15092) 2026-03-02 15:21:51 -05:00
chenyu
103ea16ec0 add contiguous back to svd (#15074)
can cause infinite loop
2026-02-28 16:49:26 -05:00
George Hotz
bb84e389cf functions for llama trainer (#15045)
* functions for llama trainer

* function there

* axis match

* fix multi

* lil cleaner

* there's a bug with HK_FLASH_ATTENTION

* training functions

* for commit
2026-02-28 12:15:18 +08:00
chenyu
5fd06f4f02 differentiable setitem (#15054)
* differentiable setitem

go through the where path for bw

* no return
2026-02-27 17:25:15 -05:00
chenyu
c9f6d8751b don't remove_bufferize for Invalid (#15053)
* don't remove_bufferize for Invalid

* replaced
2026-02-27 15:16:09 -05:00
George Hotz
010d2790ce fix multi minimal (#15044) 2026-02-27 14:31:58 +08:00
George Hotz
d23b79530e remove disk from GGUF GEMV test (#15041)
* remove disk from GGUF GEMV test

* keep copy
2026-02-27 12:03:00 +08:00
chenyu
d345f7f5dc remove _pending_assigns (#15040) 2026-02-26 22:38:10 -05:00
George Hotz
37e31e7da4 gguf gemv test (#15039)
* add gemv tests

* gguf big

* skip

* make realize optional
2026-02-27 10:54:43 +08:00
George Hotz
fe3ee8c27e fix symbolic shapes in calls (#15021)
* fix symbolic shapes in calls

* fix after in the big graph

* real tests
2026-02-26 17:17:18 +08:00
George Hotz
2655655a0c call gradient creates a call (#15020)
* function creates a full subgraph

* tests

* fix var

* fix tests

* implicit assign/contig

* move kv init
2026-02-26 14:15:29 +08:00
chenyu
ed9d475a12 assign tests with test_function (#15015) 2026-02-25 16:15:59 -05:00
George Hotz
0d35b67f2c revert realize to only be buffers (#15008)
* revert realize to only be buffers

* fix that

* broken attention

* Revert "broken attention"

This reverts commit a23c3cd96c.

* and that
2026-02-25 22:43:06 +08:00
George Hotz
68831cd852 add more tests to test_function (#15003)
* add more tests to test_function

* add function to llm

* function decorator on llm

* works

* symbolic fixups

* minimum change

* implicit inputs

* don't actually update llama yet
2026-02-25 18:42:06 +08:00
George Hotz
e3fa9896b7 start function and add walk rewrite (#14992)
* start function and add walk rewrite

* work

* add function on feed_forward

* llm progress

* stuff

* none of that
2026-02-25 13:56:27 +08:00
chenyu
fde7a40bb0 allow dtype mismatched assign on disk (#14993)
reverted #14473, that was a bad idea. also added a test that safe_save only has copy
2026-02-24 20:49:55 -05:00
chenyu
5fd4fc0c6d fix tinyfs (#14974)
* fix tinyfs

* fix that
2026-02-24 08:50:53 -05:00
George Hotz
8a6dffc87e Tensor.callify will be the JIT (#14983)
* close

* simple callify, support linear in the scheduler

* all tests pass

* everyone is happy

* dumb test

* Remove unnecessary blank line in rangeify.py
2026-02-24 18:42:24 +08:00
chenyu
0bda5585c7 unit test TestTinyFS (#14972)
these passed before the allocation change
2026-02-23 16:59:39 -05:00
chenyu
24e8919438 raise explicitly for test_crossunder_assign (#14948) 2026-02-21 21:21:13 -05:00
chenyu
9764e2561c more assign into unrealize silent fail cases (#14944) 2026-02-21 18:12:57 -05:00
chenyu
0dbcd764ad a few assign into unrealized failed test case (#14940) 2026-02-21 13:18:45 -05:00
qazal
c5029fa460 jit case with Tensor.empty input, realized means allocated (#14930)
* simple failing jit test case with Tensor.empty

* this used to exist in ops.py...

* Revert "removed if self.buffer.is_allocated() in realized (#14836)"

This reverts commit 72cf603805.
2026-02-21 16:33:55 +09:00
chenyu
1fc1508f67 add assign to test_realize_is_realize.py (#14918) 2026-02-20 16:48:01 -05:00
George Hotz
55d3a5def9 preallocate all realized buffers (#14823)
* preallocate all realized buffers

* contiguous

* work

* comment that out

* move to schedule

* better

* correct fix

* just buffer

* disk bufs

* fixes disk tensor stuff

* fix symbolic stuff

* fix multi

* 162 failures

* bugfixes

* don't check that anymore

* fix schedule tests

* mnist should be contiguous

* type and buffer

* fix tests

* shrink axis correction

* mypy fixes

* tests skips

* same 37 failures

* dedup

* no shrink in the graph

* 29 failures

* skips

* fix custom kernel

* fix training

* those optimizations aren't supported currently

* simpler

* more correct

* tests

* 14 failures

* works

* fix that test

* broken

* 11 failures

* only kernel counts left

* fixes

* all tests pass

* remove tensor_map

* op test

* 200 -> 230

* test fixes

* fixes

* revert test_tiny thing

* guard

* revert that

* test tiny passes

* no contigs there

* base realize back

* Revert "no contigs there"

This reverts commit c45bb9fcfd.

* revert that

* chop many assigns

* 12 failures

* fix tests

* tests

* apply after

* pre-commit

* remove old code

* delete that

* fix types

* remove extra contig

* fix dataloader

* torch fix

* disk fix

* update kernel fusion numbers

* runs on amd

* restore kernel count

* add that rule back

* that

* disable that

* wrong

* add the correct rule for that folding

* more tests

* guard c1.arg

* no newlines

* realize those

* split into a different file

* remove detach/contig back

* skip 2

* update that
2026-02-20 20:05:54 +08:00
chenyu
06ef8a26b7 add a test case that triggers CALL passthrough_multi (#14887) 2026-02-19 10:45:40 -05:00
Kartik Vashishta
9a9c7648e9 system: fix pci_scan_bus vendor filter (#14885)
* system: fix pci_scan_bus vendor filter

* fix: formatting
2026-02-19 17:23:32 +03:00
chenyu
e8252e6e4f use official gguf in test (#14872)
also deleted bad test_load_sample_mxfp4, added some hard coded simple tests
2026-02-18 19:55:09 -05:00
Ananta Ranganathan
4005e9db6d Mxfp4 fix (#14866)
* double e2m1 values for mxfp4

* check if assert equal works in ci

* Revert "check if assert equal works in ci"

This reverts commit 8cf902ce0d.

* remove unnecessary whitespace change

* add test case that fails for old implementation but passes for new

* add note that the previous test is bad

* clarification on the methodology for the test

* fix the indent problem that happened to skip this test

* for now update mxfp4 block test to similarly use allclose (bad)

* add gist link and clearer explanation of process for computing test data
2026-02-18 18:50:59 -05:00
George Hotz
d5636fba90 assign after copy shouldn't contig (#14847)
* assign after copy shouldn't contig

* fix assign copy
2026-02-18 12:23:49 +08:00