Commit Graph

1093 Commits

chenyu
103ea16ec0 add contiguous back to svd (#15074)
can cause an infinite loop
2026-02-28 16:49:26 -05:00
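
A hedged sketch of the pattern behind this fix (not the actual svd code): in an iterative routine, cutting the lazy graph each step keeps the scheduler from chasing an ever-growing rewrite.

```python
from tinygrad import Tensor

# iterative matrix routine; the .contiguous() is the graph cut in question
a = Tensor.randn(4, 4)
x = Tensor.eye(4)
for _ in range(10):
  x = (a @ x).contiguous()  # without the cut, every step nests into one graph
x.realize()
```
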
George Hotz
bb84e389cf functions for llama trainer (#15045)
* functions for llama trainer

* function there

* axis match

* fix multi

* lil cleaner

* there's a bug with HK_FLASH_ATTENTION

* training functions

* for commit
2026-02-28 12:15:18 +08:00
chenyu
5fd06f4f02 differentiable setitem (#15054)
* differentiable setitem

go through the where path for bw

* no return
2026-02-27 17:25:15 -05:00
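
A minimal sketch of the "where path" mentioned above (not the actual autograd code): a masked setitem is `mask.where(value, base)`, which has a well-defined backward for both branches.

```python
from tinygrad import Tensor

base = Tensor.ones(4, requires_grad=True)
val = Tensor.full((4,), 3.0, requires_grad=True)
mask = Tensor([True, True, False, False])
out = mask.where(val, base)   # behaves like base[:2] = val[:2]
out.sum().backward()
print(base.grad.numpy())      # [0. 0. 1. 1.]: no gradient where overwritten
print(val.grad.numpy())       # [1. 1. 0. 0.]: gradient only where written
```
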
chenyu
c9f6d8751b don't remove_bufferize for Invalid (#15053)
* don't remove_bufferize for Invalid

* replaced
2026-02-27 15:16:09 -05:00
George Hotz
010d2790ce fix multi minimal (#15044) 2026-02-27 14:31:58 +08:00
George Hotz
d23b79530e remove disk from GGUF GEMV test (#15041)
* remove disk from GGUF GEMV test

* keep copy
2026-02-27 12:03:00 +08:00
chenyu
d345f7f5dc remove _pending_assigns (#15040) 2026-02-26 22:38:10 -05:00
George Hotz
37e31e7da4 gguf gemv test (#15039)
* add gemv tests

* gguf big

* skip

* make realize optional
2026-02-27 10:54:43 +08:00
George Hotz
fe3ee8c27e fix symbolic shapes in calls (#15021)
* fix symbolic shapes in calls

* fix after in the big graph

* real tests
2026-02-26 17:17:18 +08:00
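
A hedged sketch of the symbolic-shape pattern these fixes target, assuming shrink accepts a bound Variable (as in the symbolic tests): the dim is a Variable, so one compiled function can serve many runtime sizes.

```python
from tinygrad import Tensor, Variable

v = Variable("i", 1, 10)
t = Tensor.rand(3, 10).realize()
part = t.shrink(((0, 3), (0, v.bind(4))))  # symbolic shape (3, i), i=4 here
print(part.sum().item())
```
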
George Hotz
2655655a0c call gradient creates a call (#15020)
* function creates a full subgraph

* tests

* fix var

* fix tests

* implicit assign/contig

* move kv init
2026-02-26 14:15:29 +08:00
chenyu
ed9d475a12 assign tests with test_function (#15015) 2026-02-25 16:15:59 -05:00
George Hotz
0d35b67f2c revert realize to only be buffers (#15008)
* revert realize to only be buffers

* fix that

* broken attention

* Revert "broken attention"

This reverts commit a23c3cd96c.

* and that
2026-02-25 22:43:06 +08:00
George Hotz
68831cd852 add more tests to test_function (#15003)
* add more tests to test_function

* add function to llm

* function decorator on llm

* works

* symbolic fixups

* minimum change

* implicit inputs

* don't actually update llama yet
2026-02-25 18:42:06 +08:00
George Hotz
e3fa9896b7 start function and add walk rewrite (#14992)
* start function and add walk rewrite

* work

* add function on feed_forward

* llm progress

* stuff

* none of that
2026-02-25 13:56:27 +08:00
chenyu
fde7a40bb0 allow dtype mismatched assign on disk (#14993)
reverted #14473; that was a bad idea. also added a test that safe_save only issues copies
2026-02-24 20:49:55 -05:00
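
A hedged sketch of the round-trip the new test covers: safe_save writes a state dict to disk, and per the commit it should lower to pure copies.

```python
from tinygrad import Tensor
from tinygrad.nn.state import safe_save, safe_load

sd = {"w": Tensor.randn(4, 4).realize()}
safe_save(sd, "/tmp/model.safetensors")
assert safe_load("/tmp/model.safetensors")["w"].shape == (4, 4)
```
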
chenyu
5fd4fc0c6d fix tinyfs (#14974)
* fix tinyfs

* fix that
2026-02-24 08:50:53 -05:00
George Hotz
8a6dffc87e Tensor.callify will be the JIT (#14983)
* close

* simple callify, support linear in the scheduler

* all tests pass

* everyone is happy

* dumb test

* Remove unnecessary blank line in rangeify.py
2026-02-24 18:42:24 +08:00
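
For context, a sketch of the existing TinyJit interface that callify is slated to back: first calls capture the kernels, later calls replay them with new inputs.

```python
from tinygrad import Tensor, TinyJit

@TinyJit
def step(x: Tensor) -> Tensor:
  return (x @ x + 1).realize()

for _ in range(4):
  out = step(Tensor.randn(4, 4).realize())
print(out.numpy())
```
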
chenyu
0bda5585c7 unit test TestTinyFS (#14972)
these passed before the allocation change
2026-02-23 16:59:39 -05:00
chenyu
24e8919438 raise explicitly for test_crossunder_assign (#14948) 2026-02-21 21:21:13 -05:00
chenyu
9764e2561c more assign into unrealize silent fail cases (#14944) 2026-02-21 18:12:57 -05:00
chenyu
0dbcd764ad a few assign into unrealized failed test case (#14940) 2026-02-21 13:18:45 -05:00
qazal
c5029fa460 jit case with Tensor.empty input, realized means allocated (#14930)
* simple failing jit test case with Tensor.empty

* this used to exist in ops.py...

* Revert "removed if self.buffer.is_allocated() in realized (#14836)"

This reverts commit 72cf603805.
2026-02-21 16:33:55 +09:00
chenyu
1fc1508f67 add assign to test_realize_is_realize.py (#14918) 2026-02-20 16:48:01 -05:00
George Hotz
55d3a5def9 preallocate all realized buffers (#14823)
* preallocate all realized buffers

* contiguous

* work

* comment that out

* move to schedule

* better

* correct fix

* just buffer

* disk bufs

* fixes disk tensor stuff

* fix symbolic stuff

* fix multi

* 162 failures

* bugfixes

* don't check that anymore

* fix schedule tests

* mnist should be contiguous

* type and buffer

* fix tests

* shrink axis correction

* mypy fixes

* tests skips

* same 37 failures

* dedup

* no shrink in the graph

* 29 failures

* skips

* fix custom kernel

* fix training

* those optimizations aren't supported currently

* simpler

* more correct

* tests

* 14 failures

* works

* fix that test

* broken

* 11 failures

* only kernel counts left

* fixes

* all tests pass

* remove tensor_map

* op test

* 200 -> 230

* test fixes

* fixes

* revert test_tiny thing

* guard

* revert that

* test tiny passes

* no contigs there

* base realize back

* Revert "no contigs there"

This reverts commit c45bb9fcfd.

* revert that

* chop many assigns

* 12 failures

* fix tests

* tests

* apply after

* pre-commit

* remove old code

* delete that

* fix types

* remove extra contig

* fix dataloader

* torch fix

* disk fix

* update kernel fusion numbers

* runs on amd

* restore kernel count

* add that rule back

* that

* disable that

* wrong

* add the correct rule for that folding

* more tests

* guard c1.arg

* no newlines

* realize those

* split into a different file

* remove detach/contig back

* skip 2

* update that
2026-02-20 20:05:54 +08:00
chenyu
06ef8a26b7 add a test case that triggers CALL passthrough_multi (#14887) 2026-02-19 10:45:40 -05:00
Kartik Vashishta
9a9c7648e9 system: fix pci_scan_bus vendor filter (#14885)
* system: fix pci_scan_bus vendor filter

* fix: formatting
2026-02-19 17:23:32 +03:00
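
A generic sketch of a vendor filter on a Linux PCI scan (not tinygrad's actual pci_scan_bus; the sysfs layout is standard): keep only devices whose vendor id matches.

```python
from pathlib import Path

def scan_pci(vendor: int) -> list[str]:
  # each device dir under sysfs exposes its vendor id as hex text
  return [dev.name for dev in Path("/sys/bus/pci/devices").iterdir()
          if int((dev / "vendor").read_text(), 16) == vendor]

print(scan_pci(0x1002))  # 0x1002 is AMD
```
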
chenyu
e8252e6e4f use official gguf in test (#14872)
also deleted the bad test_load_sample_mxfp4, added some hard-coded simple tests
2026-02-18 19:55:09 -05:00
Ananta Ranganathan
4005e9db6d Mxfp4 fix (#14866)
* double e2m1 values for mxfp4

* check if assert equal works in ci

* Revert "check if assert equal works in ci"

This reverts commit 8cf902ce0d.

* remove unnecessary whitespace change

* add test case that fails for old implementation but passes for new

* add note that the previous test is bad

* clarification on the methodology for the test

* fix the indent problem that happened to skip this test

* for now update mxfp4 block test to similarly use allclose (bad)

* add gist link and clearer explanation of process for computing test data
2026-02-18 18:50:59 -05:00
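
A sketch of e2m1 decoding, the element format inside mxfp4 blocks (per the OCP MX spec: 1 sign bit, 2 exponent bits, 1 mantissa bit, giving 8 magnitudes per sign). The "double e2m1 values" fix is consistent with this table.

```python
# the 8 e2m1 magnitudes for codes 0..7
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def decode_e2m1(nibble: int) -> float:
  sign = -1.0 if nibble & 0b1000 else 1.0
  return sign * E2M1_VALUES[nibble & 0b0111]

assert decode_e2m1(0b0111) == 6.0 and decode_e2m1(0b1001) == -0.5
```
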
George Hotz
d5636fba90 assign after copy shouldn't contig (#14847)
* assign after copy shouldn't contig

* fix assign copy
2026-02-18 12:23:49 +08:00
chenyu
e3c120c8e1 exclude 100 in test_assign_add (#14846)
this can crash, not sure why. skip 100 to see if it's better
2026-02-17 19:12:47 -05:00
chenyu
72cf603805 removed if self.buffer.is_allocated() in realized (#14836)
automatically fixes the is_realized issue for empty
2026-02-17 15:35:56 -05:00
chenyu
aec8a6c85b Revert "one run_schedule for assign realize (#14835)" (#14837)
This reverts commit df7c37f611.
2026-02-17 14:34:26 -05:00
chenyu
df7c37f611 one run_schedule for assign realize (#14835)
concat schedules. separate out the execution part
2026-02-17 14:01:55 -05:00
chenyu
61867c2f35 TestRealizeIsRealized (#14834)
test that after calling .realize(), uop.is_realized is True. currently not working for empty (and thus disk tensors) and const
2026-02-17 13:30:35 -05:00
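
A sketch of the invariant under test, using the commit's own terms: after .realize(), the backing uop reports is_realized.

```python
from tinygrad import Tensor

t = (Tensor.ones(4) + 1).realize()
assert t.uop.is_realized  # the gaps at the time: empty (disk) and const
```
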
chenyu
f147791105 update test to reset and test kernel_count directly (#14832) 2026-02-17 11:48:46 -05:00
chenyu
9d4937ab5e remove assign test @unittest.skip("this test is crashing!") (#14831) 2026-02-17 10:30:58 -05:00
nimlgen
dda5ccf63b hcq: fix usb<->cpu mappings (#14827)
* hcq: fix usb<->cpu mappings

* non cpu

* um
2026-02-17 18:04:18 +03:00
chenyu
f2f039cc0f fix chained full-buffer assign (#14828)
this shows an issue where pm_remove_bufferize drops tags; will fix in bufferize next. this also fixed rand being different in jit vs no-jit
2026-02-17 09:11:04 -05:00
chenyu
58fa82eef5 stronger test_assign_add (#14826)
also test self-add 10 and 100 times
2026-02-17 08:36:09 -05:00
chenyu
5bca5be2d2 test slice assign twice retains the buffer (#14807) 2026-02-16 20:01:47 -05:00
chenyu
9b44fbe0b8 update test_assign_add_twice (#14806)
failing test case to show that `+=1` twice returns a different buffer
2026-02-16 17:52:11 -05:00
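
A hedged sketch of what the failing case checks, assuming the realized buffer is reachable as `t.uop.buffer`: repeated in-place adds should keep writing into the same underlying buffer.

```python
from tinygrad import Tensor

t = Tensor.zeros(4).contiguous().realize()
buf = t.uop.buffer
t.assign(t + 1).realize()
t.assign(t + 1).realize()
assert t.uop.buffer is buf  # the regression: the second add got a new buffer
```
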
Bautista Garcia
0f1ca8eb43 torch_load: fix shared storage slicing (#14771)
* faster zip_extract + usage in torch load

* clean zip in torch load

* working zipextract in torchload

* tar_extract in tar path

* faster tar path

* tests passing, cleanup needed

* faster tar with 1MB buffer

* comments

* unify storage_source with all paths

* use bufferedreader in zip path

* fix ruff

* clean

* removed unnecessary string conversion

* fix for tensors that share storage

* less hacky

* shared storage test

* test comment

* linter

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-02-16 14:30:13 +08:00
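
A hedged sketch of the shared-storage case (torch is used only to build the fixture): a view saved alongside its base shares one storage, and torch_load must apply the right offset when slicing it back out.

```python
import torch
from tinygrad.nn.state import torch_load

a = torch.arange(10.)
torch.save({"base": a, "tail": a[4:]}, "/tmp/shared.pt")  # "tail" shares a's storage
sd = torch_load("/tmp/shared.pt")
assert sd["tail"].numpy().tolist() == [4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
```
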
George Hotz
0e215c433d remove hack from cast (#14760)
* remove hack from cast

* skip tests

* linters to 3.12, another skip

* fix rand

* m_
2026-02-15 13:56:38 +08:00
chenyu
ca68037f26 lazy basic setitem to unrealized Tensor (#14756)
undo the view and make it a mask; this fuses the setitem with any pending compute too.

one behavior change is that for a target not backed by a buffer (const and arange), rangeify makes the output contiguous under the hood.
this is strictly better than raising and asking the user to call contiguous, as that would no longer be fuse-able.
2026-02-14 20:27:03 -05:00
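
A sketch of the lazy path this enables: setitem on an unrealized tensor is re-expressed as a mask over the full buffer, so it can fuse with the pending compute.

```python
from tinygrad import Tensor

t = Tensor.arange(8).float() * 2   # pending compute, no buffer yet
t[2:4] = 99.0                      # view undone, written as a masked select
print(t.numpy())                   # [ 0.  2. 99. 99.  8. 10. 12. 14.]
```
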
chenyu
902dc7c09c fix test_numpy_parity_and_backward_2d (#14755)
test setup issue; the test failed locally with `RUN_SLOW=1`
2026-02-14 17:59:00 -05:00
chenyu
043f5dbfa0 fix write-after-read tracking (#14754)
AFTER-AFTER was silently dropped, which breaks write-after-read
2026-02-14 17:23:05 -05:00
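
A hedged sketch of the hazard: a pending read of t must be ordered before an in-place write to the same buffer, which is what the AFTER tracking guarantees.

```python
from tinygrad import Tensor

t = Tensor.ones(4).contiguous().realize()
before = t.sum()      # read of t, not yet realized
t.assign(t + 1)       # write to the same buffer
t.realize()
print(before.item())  # must be 4.0; a dropped edge would let it read 8.0
```
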
chenyu
d79c63a0ff test_multi_step_assign_read_write_same_buffer (#14752)
pattern in LAMB that can be subtly off
2026-02-14 16:39:08 -05:00
chenyu
0ce4a55dad clean up test_setitem_slice (#14750)
moved to test_setitem_schedule, and use contiguous zeros since the scheduler handles empty differently now
2026-02-14 14:29:16 -05:00
chenyu
8f6772fd8c more setitem kernel mem tests (#14749)
* more setitem kernel mem tests

test only the slice is accessed

* update
2026-02-14 11:01:03 -05:00
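
A hedged sketch of a kernel-memory assertion like the ones these tests add: writing a small slice should move memory proportional to the slice, not the whole buffer.

```python
from tinygrad import Tensor, GlobalCounters

t = Tensor.zeros(1024, 1024).contiguous().realize()
GlobalCounters.reset()
t[:4] = 1.0
t.realize()
print(GlobalCounters.global_mem)  # expect ~slice-sized traffic, per the test
```
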
chenyu
787998fac3 fix getitem tensor indexing detection (#14712)
issue with sint
2026-02-12 16:04:37 -05:00