* fix no grad fn for < and == (see the sketch after this block)
* remove 2 line breaks
* Remove deprecated autograd variable
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
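A minimal sketch of the behavior this targets (imports assume a recent tinygrad; the fix itself is not shown):

```python
from tinygrad import Tensor

a = Tensor([1.0, 2.0, 3.0], requires_grad=True)
b = Tensor([2.0, 2.0, 2.0])
# comparisons are non-differentiable, so (a < b) and (a == b) should not
# attach a grad fn, even though `a` requires grad
print((a < b).numpy(), (a == b).numpy())
```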
* remove np from DType
* convert to dataclass
* remove dunder hash, eq, ne overrides from ImageDType (see the dataclass sketch after this block)
* is dataclass required for PtrDType?
* fix GPU tests
* reduce lines
* revert changes to np
* minor cleanup
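A minimal sketch of why the overrides can go (field names are illustrative, not the exact tinygrad DType):

```python
from dataclasses import dataclass

# frozen=True with the default eq=True auto-generates __eq__ and __hash__
# from the fields, so hand-written __hash__/__eq__/__ne__ overrides are redundant
@dataclass(frozen=True)
class DType:
  priority: int
  itemsize: int
  name: str

assert DType(0, 4, "float") == DType(0, 4, "float")
assert hash(DType(0, 4, "float")) == hash(DType(0, 4, "float"))
```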
cast implicitly resets the ShapeTracker and makes it contiguous (for disk tensors), which fails on Interpreted backends if the inputs contain a non-contiguous st.
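A hypothetical repro of that failure mode (API per a recent tinygrad; permute is used only to get a non-contiguous view):

```python
from tinygrad import Tensor, dtypes

t = Tensor.rand(4, 8).permute(1, 0)  # non-contiguous ShapeTracker
# if cast implicitly resets the st to contiguous, an Interpreted backend
# reads the underlying buffer in the wrong order; the view must be preserved
out = t.cast(dtypes.float16)
print(out.numpy().shape)  # (8, 4)
```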
* wmma: enable METAL half tensor cores and clean up cstyle
* revert simple_matmul rand changes and break line in tensor
* added metal fp16->fp32 tensor core
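A small fp16 matmul of the kind that can hit this path (shapes are illustrative; on METAL such a kernel can lower to simdgroup wmma ops that accumulate in fp32):

```python
from tinygrad import Tensor, dtypes

a = Tensor.rand(64, 64, dtype=dtypes.half)
b = Tensor.rand(64, 64, dtype=dtypes.half)
c = (a @ b).realize()  # candidate for the METAL fp16->fp32 tensor core path
```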
* fix broadcasted logic if there's 0 in shapes
a size-1 dim should always expand into 0, not the other way around. fixed matmul with 0 in input shapes (see the NumPy sketch after this block).
forwards only for now; backward is more involved and would need changes to the 0-size shortcuts
* fix tests
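NumPy shows the reference semantics being matched here (NumPy is used for illustration only):

```python
import numpy as np

a = np.ones((1, 4))
b = np.zeros((0, 4))
print((a + b).shape)  # (0, 4): the size-1 dim expands into 0, never the reverse
print((np.ones((3, 0)) @ np.zeros((0, 5))).shape)  # (3, 5), filled with zeros
```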
* onehot in Tensor.py
* one_hot tests
* works for labels of any shape, not just 1-D (see the sketch after this block)
* pylint
* not a static method
* moved around, num_classes mandatory
* pylint
* pylint
* space & moving
* formatting
* moved tests
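A sketch of the shape-generic approach (not the exact tinygrad code; broadcasting against arange along a new trailing axis does the work):

```python
from tinygrad import Tensor

def one_hot(t: Tensor, num_classes: int) -> Tensor:
  # compare each label against [0, num_classes); broadcasting makes this
  # work for labels of any shape, not just 1-D
  return (t.unsqueeze(-1) == Tensor.arange(num_classes)).where(1, 0)

print(one_hot(Tensor([[0, 2], [1, 0]]), num_classes=3).numpy())
```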
* add llama attention test for multigpu
* test fails
* kv cache trying to shrink on sharded axis
* mask=None works for scaled dot product (reference sketch after this block)
* kv cache seems to be working, but scaled dot product breaks
* scaled dot product works, but the last linear layer failed
* running into the reshape case where it could be wrong for multigpu
* making sure it was the reshape
* adding contiguous doesn't solve
* need to shard more properly
* remove reshape test
* minor adjustment to scaled dot product attention test
* weights are sharded wrong
* continue fixing the new weight sharding
* clean up
* fix attention when start_pos is 0
* remove print
* add TODOs for the best multigpu interface
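For reference, the op under test, in plain math over tinygrad Tensors (names are illustrative, not the multigpu code):

```python
import math
from tinygrad import Tensor

def scaled_dot_product_attention(q: Tensor, k: Tensor, v: Tensor, mask=None) -> Tensor:
  scores = (q @ k.transpose(-2, -1)) / math.sqrt(q.shape[-1])
  if mask is not None: scores = scores + mask  # mask=None must also work (see above)
  return scores.softmax(-1) @ v
```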
* use device from LinearizerOptions in kernel search
removed all Device.DEFAULT in search.py (illustrative sketch after this block)
* pass device string for parallel pickle
* device for interpreted backends in LinearizerOptions
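An illustrative stand-in for the plumbing change (real tinygrad types have more fields; the helper name here is hypothetical):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LinearizerOptions:
  device: str = ""

def search_device(opts: LinearizerOptions, default: str = "CPU") -> str:
  # prefer the device the kernel was built for over the global default
  return opts.device if opts.device else default

print(search_device(LinearizerOptions(device="METAL")))  # METAL
```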
* mem_estimate is always int, not symbolic
op_estimate can be symbolic, but mem_estimate is always an int, so we don't need to sym_infer it.
also fixed some long lines. update_stats is a very big function
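A sketch of the distinction (Variable import path per tinygrad at the time; sizes are made up):

```python
from tinygrad.shape.symbolic import Variable

i = Variable("i", 1, 512)               # symbolic sequence length
op_estimate = i * 4096 * 2              # FLOPs can scale with the symbolic dim
mem_estimate = (512 * 4096 + 4096) * 4  # bytes follow allocated buffer sizes: a plain int
# only op_estimate ever needs sym_infer(op_estimate, var_vals) at runtime
```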
* operator does not need underscores