* metal uint32 icb offset overflow
fix: diff
supports_exec_item
GraphRunner.supports_exec_item
tests
fix: can't import on non-metal
stricter
* also test the non-metal buffer case
* imports on non-mac
* q5_k gguf support as separate pr
* fix the problematic gemv test for q5_k
* add assert to make sure the gemv test can't fail with warning instead of error
* add precompile to call
* put get back
* something
* after structure
* alt
* keep it call
* resolve call
* resolve linear call
* precompile works with llm
* revert rangeify
* color for debugging
* getenv PRECOMPILE
* clean up deco pattern
* fully recursive sink scheduling
* revert llama
* fix SPEC=2
* buffer view is like buffer
* fix
* swap_reshape_shrink
* contiguous on gguf, fix overlap
* revert that
* _device_supports_view
* this
* fix that test
* 0 buffers
* that test was wrong
* this
* check correct size
* contig BUFFER_VIEW
* this
* fix tests
* buffer view tests
* om
* fix torch
* no MOCKGPU
* skip
* functions for llama trainer
* function there
* axis match
* fix multi
* lil cleaner
* there's a bug with HK_FLASH_ATTENTION
* training functions
* for commit
* add more tests to test_function
* add function to llm
* function decorator on llm
* works
* symbolic fixups
* minimum change
* implicit inputs
* don't actually update llama yet
* close
* simple callify, support linear in the scheduler
* all tests pass
* everyone is happy
* dumb test
* Remove unnecessary blank line in rangeify.py
* simple failing jit test case with Tensor.empty
* this used to exist in ops.py...
* Revert "removed if self.buffer.is_allocated() in realized (#14836)"
This reverts commit 72cf603805.
* preallocate all realized buffers
* contiguous
* work
* comment that out
* move to schedule
* better
* correct fix
* just buffer
* disk bufs
* fixes disk tensor stuff
* fix symbolic stuff
* fix multi
* 162 failures
* bugfixes
* don't check that anymore
* fix schedule tests
* mnist should be contiguous
* type and buffer
* fix tests
* shrink axis correction
* mypy fixes
* tests skips
* same 37 failures
* dedup
* no shrink in the graph
* 29 failures
* skips
* fix custom kernel
* fix training
* those optimizations aren't supported currently
* simpler
* more correct
* tests
* 14 failures
* works
* fix that test
* broken
* 11 failures
* only kernel counts left
* fixes
* all tests pass
* remove tensor_map
* op test
* 200 -> 230
* test fixes
* fixes
* revert test_tiny thing
* guard
* revert that
* test tiny passes
* no contigs there
* base realize back
* Revert "no contigs there"
This reverts commit c45bb9fcfd.
* revert that
* chop many assigns
* 12 failures
* fix tests
* tests
* apply after
* pre-commit
* remove old code
* delete that
* fix types
* remove extra contig
* fix dataloader
* torch fix
* disk fix
* update kernel fusion numbers
* runs on amd
* restore kernel count
* add that rule back
* that
* disable that
* wrong
* add the correct rule for that folding
* more tests
* guard c1.arg
* no newlines
* realize those
* split into a different file
* remove detach/contig back
* skip 2
* update that
* double e2m1 values for mxfp4
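For context on this commit: MXFP4 elements use the E2M1 mini-float format (1 sign bit, 2 exponent bits with bias 1, 1 mantissa bit), whose positive codes decode to 0, 0.5, 1, 1.5, 2, 3, 4, 6. A minimal sketch of the decode, and of the "doubled" all-integer value table that doubling makes possible — the function name and the doubling rationale here are illustrative assumptions, not this PR's actual code:

```python
def e2m1_decode(code: int) -> float:
  # Hypothetical decoder for a 4-bit E2M1 code (MXFP4 element).
  # Bit layout: [sign | exp1 exp0 | mantissa], exponent bias 1.
  assert 0 <= code < 16
  sign = -1.0 if code & 0x8 else 1.0
  exp, man = (code >> 1) & 0x3, code & 0x1
  if exp == 0:
    mag = 0.5 * man                      # subnormal: 0 or 0.5
  else:
    mag = (1.0 + 0.5 * man) * 2.0 ** (exp - 1)
  return sign * mag

# Doubling every value yields an integer-only table, a common trick
# (halve once after applying the block scale) — assumed motivation here.
DOUBLED = [int(2 * e2m1_decode(c)) for c in range(8)]
print(DOUBLED)  # [0, 1, 2, 3, 4, 6, 8, 12]
```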
* check if assert equal works in ci
* Revert "check if assert equal works in ci"
This reverts commit 8cf902ce0d.
* remove unnecessary whitespace change
* add test case that fails for old implementation but passes for new
* add note that the previous test is bad
* clarification on the methodology for the test
* fix the indent problem that happened to skip this test
* for now update mxfp4 block test to similarly use allclose (bad)
* add gist link and clearer explanation of process for computing test data
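A small illustration of why the log flags `allclose` as "(bad)" for these block tests: a tolerance-based check passes values that an exact comparison rejects, so quantization off-by-ones can slip through. The arrays below are made-up example data, not the PR's test vectors:

```python
import numpy as np

# Reference values and a copy with tiny drift that exact equality rejects.
ref = np.array([1.0, 2.0, 3.0])
got = ref + 1e-7

# allclose passes anything within rtol/atol (defaults rtol=1e-5, atol=1e-8),
# while array_equal demands bit-exact agreement.
print(np.allclose(ref, got))     # True: within default tolerance
print(np.array_equal(ref, got))  # False: not bit-exact
```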