* very close
* remove comment
* negative strides working
* almost everything passes
* calculate offset with list comprehension
* some cleanup
* got disk load working
* review suggestions
* fix after merge
* overlap working
* did it
* clean
* fixed disk load
* lint
* mypy
* removed as_strided
* trying without simplify
* added back simplify
* make sure expanding to smaller shape
* cleanup
* removed comment
* removed env file
* trying whisper test again
* onnx test sqlite issue
* working on test
* finished test
* eliminate unnecessary shrink-then-pad
* don't shrink buffer
* added strides check
* added to ci under linters
* switch issue
* allow symbolic stride
* removed .env
* isinstance
* adjust strides for double expand
* cleanup
* needed to add type hint for mypy
* set pythonpath
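The strides work above (negative strides, offsets computed with a list comprehension, removing as_strided) rests on mapping a multi-dimensional index to a flat buffer position. Below is a minimal sketch in plain Python, not tinygrad's code. `contiguous_strides`, `flip_axis` and `view_offset` are hypothetical helpers. An element at index `idx` lives at `offset + sum(i * s)`. A view with a negative stride starts its base offset at the last element along that axis, so no data is copied.

```python
from typing import Sequence, Tuple

def contiguous_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    # Row-major strides: the last axis advances one element at a time.
    strides = []
    acc = 1
    for dim in reversed(shape):
        strides.append(acc)
        acc *= dim
    return tuple(reversed(strides))

def flip_axis(shape: Sequence[int], strides: Sequence[int], offset: int,
              axis: int) -> Tuple[Tuple[int, ...], int]:
    # Reversing an axis negates its stride and moves the base offset to the
    # last element along that axis; the underlying buffer is untouched.
    new_strides = tuple(-s if i == axis else s for i, s in enumerate(strides))
    new_offset = offset + (shape[axis] - 1) * strides[axis]
    return new_strides, new_offset

def view_offset(idx: Sequence[int], strides: Sequence[int], offset: int = 0) -> int:
    # Flat buffer position of a multi-dimensional index.
    return offset + sum(i * s for i, s in zip(idx, strides))

buf = list(range(6))  # a 2x3 tensor stored row-major
shape = (2, 3)
strides, offset = flip_axis(shape, contiguous_strides(shape), 0, axis=1)
print([[buf[view_offset((r, c), strides, offset)] for c in range(3)] for r in range(2)])
# [[2, 1, 0], [5, 4, 3]]
```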
* add Tensor.multinomial only with replacement
* add support for 2D input in Tensor.multinomial
* fix multinomial output shape
* allow passing replacement=False to Tensor.multinomial when num_samples=1
* improve tests for Tensor.multinomial
* fix edge case in Tensor.multinomial
* Tensor.multinomial no more staticmethod
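The multinomial commits above (sampling with replacement, 2D input, `replacement=False` allowed only when `num_samples=1`) can be illustrated with inverse-CDF sampling. This is a minimal plain-Python sketch, not the tinygrad implementation. The `multinomial` function here is hypothetical and assumes non-negative weights. A single draw is the same with or without replacement, which is why that one case can be accepted.

```python
import bisect
import itertools
import random

def multinomial(weights, num_samples=1, replacement=False, rng=None):
    """Draw category indices with probability proportional to `weights`.

    `weights` is a 1D list or a 2D list with one distribution per row.
    Sampling without replacement is only accepted for a single sample,
    where it is identical to sampling with replacement.
    """
    assert num_samples > 0, "num_samples must be positive"
    assert replacement or num_samples == 1, "replacement=False needs num_samples=1"
    rng = rng or random.Random()
    is_1d = not isinstance(weights[0], (list, tuple))
    rows = [weights] if is_1d else weights
    out = []
    for row in rows:
        cdf = list(itertools.accumulate(row))  # running totals of the weights
        total = cdf[-1]
        assert total > 0, "weights must contain a positive entry"

        def draw():
            # Inverse-CDF sampling: first index whose running total exceeds a
            # uniform draw in [0, total). Zero-weight entries are never picked.
            i = bisect.bisect_right(cdf, rng.random() * total)
            # Float rounding can push the draw up to `total`; fall back to
            # the last category with positive weight.
            return i if i < len(cdf) else bisect.bisect_left(cdf, total)

        out.append([draw() for _ in range(num_samples)])
    return out[0] if is_1d else out

rng = random.Random(0)
print(multinomial([0.0, 1.0, 0.0], num_samples=4, replacement=True, rng=rng))  # [1, 1, 1, 1]
```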
* zero in shape start
* no assert for that
* if output size is 0, return without exec
* tweak
* strides
* reduce over non-zero
* shrink and expand
* fix import
* test_elementwise where
* cannot reshape from size 0 to size 1
* compiled backend reduce over 0
* zeros for numpy
* reduce over 0 and keepdim resulted in 1
* reduce empty set default values
* compare with same input
* pad test case
* cat test case
* torch does not support that?
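The zero-size shape commits above (reducing over an empty axis, keepdim producing 1, and returning without exec when the output is empty) follow a few simple rules. Below is a minimal plain-Python sketch, not tinygrad's implementation. `reduce_values`, `reduced_shape` and `needs_exec` are hypothetical names. The sketch assumes max over an empty axis defaults to -inf, the identity element of max. NumPy, by contrast, raises an error for max over an empty array.

```python
import math
from functools import reduce

# Start each reduction from its identity element, so an empty axis
# reduces to that identity instead of failing.
IDENTITY = {"sum": 0.0, "max": -math.inf}
COMBINE = {"sum": lambda a, b: a + b, "max": max}

def reduce_values(op, values):
    return reduce(COMBINE[op], values, IDENTITY[op])

def reduced_shape(shape, axis, keepdim=False):
    # With keepdim the reduced axis becomes 1, even when it had size 0.
    if keepdim:
        return tuple(1 if i == axis else s for i, s in enumerate(shape))
    return tuple(s for i, s in enumerate(shape) if i != axis)

def needs_exec(shape):
    # An output with zero elements has nothing to compute.
    return math.prod(shape) > 0

print(reduce_values("sum", []), reduce_values("max", []))  # 0.0 -inf
print(reduced_shape((3, 0), axis=1, keepdim=True))          # (3, 1)
print(needs_exec(reduced_shape((0, 3), axis=1)))            # False
```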
* metal indirect command buffers
* sub 1ms gpt
* metal batch exec is good
* remove whitespace
* input_replace
* fix ci
* useResources
* very simple cacheallocator
* update_stats
* fix CI
* minor
* remove that from jit
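The "very simple cacheallocator" above can be pictured as a free list keyed by buffer size. This minimal sketch shows that idea and is not the actual tinygrad or Metal allocator. `CachingAllocator` is hypothetical, and `bytearray` stands in for a device buffer. Freed buffers are kept and handed back on the next request of the same size, instead of going back to the device.

```python
from collections import defaultdict

class CachingAllocator:
    """Hypothetical sketch: reuse freed buffers of the same size."""

    def __init__(self):
        self.cache = defaultdict(list)  # size -> list of free buffers
        self.allocated_bytes = 0        # bytes obtained from the "device"

    def _device_alloc(self, size):
        self.allocated_bytes += size
        return bytearray(size)

    def alloc(self, size):
        # Prefer a cached buffer of exactly this size.
        if self.cache[size]:
            return self.cache[size].pop()
        return self._device_alloc(size)

    def free(self, buf):
        # Keep the buffer for the next alloc instead of releasing it.
        self.cache[len(buf)].append(buf)

    def clear(self):
        # Release every cached buffer (e.g. when memory runs low).
        freed = sum(len(b) for bufs in self.cache.values() for b in bufs)
        self.cache.clear()
        self.allocated_bytes -= freed
        return freed

a = CachingAllocator()
b1 = a.alloc(16)
a.free(b1)
print(a.alloc(16) is b1, a.allocated_bytes)  # True 16
```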
* refactor/ci: delete many `# type: ignore`
* replace `axis.__class__ is int` with `isinstance(axis, int)` to make mypy happy
* add `--warn-unused-ignores` to mypy flag
refs #2240
* ci: move `--warn-unused-ignores` flag to mypy config
refs #2240
* var_vals are global
* working with global ish
* better
* fix export model
* fix tests
* better kv cache
* does it run?
* use where for kvmask
* fix excessive var_vals
* fix import
* how does multigpu use this?
* llama kinda work
* faster and simpler
* cleanup
* fix conversation mode
* test cleanups
* fix one more test
* test cleanup
---------
Co-authored-by: George Hotz <geohot@gmail.com>
* Change linearizer to parse CAST
* Oneliner renders for cstyle and triton
* LLVM cast and ALU implementation
* pylint fixes
* cast in gep
* remove printbufs
* use cast for post-load ops
* get rid of parse_cast
* partially supported vectorized dtypes for initial dev
* render phi as the dtype
* Revert "partially supported vectorized dtypes for initial dev"
This reverts commit 1bf1a818a3.
* Revert "render phi as the dtype"
This reverts commit d08cb270b4.
* reenable triton tests
* no vstore_half if dtype is already half
* upcast max
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* For cuda, get current free space from device, and retry alloc failures
* type ignore for mypy
* add init to get free mem in cuda
* Move retry logic into common lib.
Fix typo in override _get_cur_free_space
* linter error fix in test file
* Don't catch all exceptions, as that would also catch KeyboardInterrupt
* fix unintended line changes
* fix test ops
* decompose the err from test_ops
* skipTest skips the entire test; we don't want that
* handle cases with the same priority
* add int16 to torch map
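The CUDA commits above (query the device's current free space, retry failed allocations, and avoid a catch-all so KeyboardInterrupt still gets through) suggest a retry wrapper like the one below. This is a minimal sketch, not the code that landed in the common lib. `alloc_with_retry` and `FakeDevice` are hypothetical. The sketch also assumes the backend signals out-of-memory with `MemoryError`. A real driver error type may differ.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def alloc_with_retry(alloc: Callable[[int], T], size: int,
                     on_oom: Callable[[], None], retries: int = 1) -> T:
    # Catch only MemoryError. A bare `except:` would also swallow
    # KeyboardInterrupt, which derives from BaseException, not Exception.
    for _ in range(retries):
        try:
            return alloc(size)
        except MemoryError:
            on_oom()  # e.g. drop cached buffers so the next attempt can fit
    return alloc(size)  # final attempt: let MemoryError propagate

class FakeDevice:
    # Hypothetical stand-in for a device that reports its current free space.
    def __init__(self, free: int, cached: int):
        self.free, self.cached = free, cached

    def alloc(self, size: int) -> bytearray:
        if size > self.free:
            raise MemoryError(f"cannot allocate {size} bytes, {self.free} free")
        self.free -= size
        return bytearray(size)

    def release_cache(self) -> None:
        self.free, self.cached = self.free + self.cached, 0

dev = FakeDevice(free=32, cached=64)
buf = alloc_with_retry(dev.alloc, 64, dev.release_cache)
print(len(buf), dev.free)  # 64 32
```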