need to remove SUB since it's possible to have (const - (const - const)) in test/test_ops.py::TestOps::test_cos,
in which case the parens around the children cannot be removed
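For context, a tiny standalone illustration (not the renderer code) of why the parens around a SUB child cannot be dropped, since subtraction is not associative:

```python
# minimal sketch: the nested (const - (const - const)) case from test_cos
a, b, c = 1.0, 2.0, 3.0
assert a - (b - c) == 2.0    # what the expression means with the child's parens kept
assert (a - b) - c == -4.0   # what it would evaluate to if those parens were removed
```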
* lars optimizer + tests (see the sketch after this list)
* fix skip list!
* use id to compare in skip list
* go back to using set
* Tensor(bool) * Tensor(bool) is and
* don't lint external/mlperf_resnet
* whitespace
* add external_test_optim to opencl tests
* give mlperf task a name
* mlperf under onnx
* remove track_gnorm
* contiguous instead of realize
* assert momentum and weight decay positive
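For reference, a rough numpy sketch of a single LARS layer update (the textbook formulation with illustrative names, not tinygrad's actual optimizer):

```python
import numpy as np

def lars_step(w, g, buf, lr=0.01, momentum=0.9, weight_decay=1e-4, eta=0.001):
  # one layer-wise LARS update; the assert mirrors the "momentum and weight decay positive" commit
  assert momentum > 0 and weight_decay > 0
  w_norm, g_norm = np.linalg.norm(w), np.linalg.norm(g)
  # layer-wise trust ratio: scale the global lr by ||w|| / (||g|| + wd * ||w||)
  trust = eta * w_norm / (g_norm + weight_decay * w_norm) if w_norm > 0 and g_norm > 0 else 1.0
  buf = momentum * buf + lr * trust * (g + weight_decay * w)   # momentum buffer
  return w - buf, buf

w, buf = np.ones(8), np.zeros(8)
w, buf = lars_step(w, np.full(8, 0.1), buf)
```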
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* hip bf16
* remu dev mac
* Revert "remu dev mac"
This reverts commit 465069a0dc3c7f2045f3348b312a1dcbf1587acd.
* skip disk tests in CI
* bring float8 back
* this mem fault still happening
* smaller
* that print doesn't work
* overflows test
* hip doesn't uses_ptr_arithmetic
* only with locals
* test overflow new name
* it's not ptr arith
* simpler
* simple repro
* old compiler
* simpler
* put that back
1. Tensor.to should return self if device == self.device. This was not the case when given a non-canonical name of self.device.
2. The result of Tensor.to was missing the autograd graph, even though requires_grad and grad were propagated.
Add corresponding tests.
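A rough sketch of the kind of tests described above (device strings and exact API spellings are illustrative):

```python
from tinygrad import Tensor, Device

t = Tensor([1.0, 2.0])
assert t.to(Device.DEFAULT) is t          # 1. same device, canonical name: returns self
assert t.to(Device.DEFAULT.lower()) is t  #    non-canonical spelling of the same device

x = Tensor([1.0, 2.0], requires_grad=True)
y = (x * 2).to(Device.DEFAULT)            # 2. the .to result should stay in the autograd graph
y.sum().backward()                        #    so backward still reaches x
assert x.grad is not None
```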
* explicitly create_lt_node when used in shapetracker
leave regular __lt__ and cmps for symbolic shape cmp
* hmm it fixed that?
* LtNode.substitute uses create_lt_node
* allow LB <- MLB assign, but don't reuse buffer
* update test
* update test
* assign assert axes are the same
* update tests to manually shard running stats
* unused import
* UnsyncedBatchNorm with synced trainable weights for hlb cifar (sketched after this list)
* multitensor reshape tests
* test mlb assign change axis
* E501
* argfix axis
* don't import batchnorm from hlb_cifar in test_multitensor
* pass num_devices to UnsyncedBatchNorm in test, allow UnsyncedBatchNorm to be used with LB
* add backprop test for UnsyncedBatchNorm
* break out MLB assign and reshape changes
* manually shard running mean and running var
* don't shard unless syncbn=0
* replace nn.BatchNorm2d with UnsyncedBatchNorm
* don't increment num_batches_tracked if not tracking running stats
* update tests
* oops
* Revert "oops"
This reverts commit 5e8a67a535.
* Revert "update tests"
This reverts commit 7ebf65d89a.
* Revert "don't increment num_batches_tracked if not tracking running stats"
This reverts commit 78de0ea9ee.
* Revert "replace nn.BatchNorm2d with UnsyncedBatchNorm"
This reverts commit d03da53da7.
* don't increment num_batches_tracked if not tracking running stats
* oops
* test_batchnorm_axis
* compare against torch
* types
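A rough numpy sketch of the idea behind UnsyncedBatchNorm (illustrative names, not tinygrad's class): trainable weight/bias are shared across devices, the running stats are manually sharded per device and never synced, and num_batches_tracked is only bumped when running stats are tracked:

```python
import numpy as np

class UnsyncedBatchNormSketch:
  def __init__(self, sz, num_devices, track_running_stats=True, momentum=0.1, eps=1e-5):
    self.weight, self.bias = np.ones(sz), np.zeros(sz)   # shared (synced) trainables
    self.running_mean = np.zeros((num_devices, sz))       # manually sharded running stats
    self.running_var = np.ones((num_devices, sz))
    self.num_batches_tracked = 0
    self.track_running_stats, self.momentum, self.eps = track_running_stats, momentum, eps

  def __call__(self, x, device_idx, training=True):       # x: (N, C, H, W) on one device
    if training:
      mean, var = x.mean(axis=(0, 2, 3)), x.var(axis=(0, 2, 3))
      if self.track_running_stats:
        self.running_mean[device_idx] = (1 - self.momentum) * self.running_mean[device_idx] + self.momentum * mean
        self.running_var[device_idx] = (1 - self.momentum) * self.running_var[device_idx] + self.momentum * var
        self.num_batches_tracked += 1                      # only when tracking running stats
    else:
      mean, var = self.running_mean[device_idx], self.running_var[device_idx]
    xn = (x - mean.reshape(1, -1, 1, 1)) / np.sqrt(var.reshape(1, -1, 1, 1) + self.eps)
    return self.weight.reshape(1, -1, 1, 1) * xn + self.bias.reshape(1, -1, 1, 1)
```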
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* search: add tensor core to beam search space
* kernel: refactor apply_tensor_core into apply_opt and hand_coded
* kernel: revert removal of apply_tensor_cores
also revert BEAM search parameter changes
Some devices create cache table names with non-alphanumeric characters, e.g. "compile_hip_gfx1010:xnack-_12".
This commit escapes the table name in single quotes so that sqlite works (see https://github.com/tinygrad/tinygrad/issues/3538).
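A minimal standalone illustration of the failure and the fix (plain sqlite3, not tinygrad's diskcache code; column names are made up):

```python
import sqlite3

table = "compile_hip_gfx1010:xnack-_12"   # ':' and '-' are invalid in an unquoted identifier
conn = sqlite3.connect(":memory:")
try:
  conn.execute(f"CREATE TABLE {table} (key TEXT PRIMARY KEY, val BLOB)")
except sqlite3.OperationalError as e:
  print("unquoted table name fails:", e)
# quoting the identifier makes it valid; the commit uses single quotes, which sqlite
# also accepts for identifiers (double quotes are the standard SQL form)
conn.execute(f'CREATE TABLE "{table}" (key TEXT PRIMARY KEY, val BLOB)')
```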
* init
* removed mulacc
* is uoptimize the problem?
* lol hax make work temporarily fix l8er
* revert extra/ changes
* clean up
* flaky metal tests?
* add back mulacc for metal
* revert last commit
* try skipping linearizer_failure tests
* skip flammit tests... cuz tests all work locally
* try narrow down exact linearizer failure test
* try 2
* try 4
* generated code is the exact same wtf why CI fails
* code for 15 and 17 are exact same with or without mulacc, this should pass
* try only 1 failure
* try garbage collecting lol...
* try del variables lol
* try gcing after del lol...
* is diskcache the problem???
* try disabling opts cache idk
* try remove hack
* try disable github metal cache...
* try CACHELEVEL=0 :D idk anymore
* try increase newCommandQueueWithMaxCommandBufferCount_, im almost out of ideas...
* revert
* actually not a HACK
* oops
* simplest one
* but i can trust this will be cached correctly
* wait that was wrong too
* cleanup
* test_reduce_upcast for single reduce case
* a late accumulator always outputs to gds
lint
this fixes .split where self.shape[dim] is not evenly divisible by sizes; .chunk is always the wrong choice here:
- a tensor of shape (5,) .split(4) should result in shapes ((4,), (1,)); previously it returned ((3,), (2,))
this also fixes issues in .split and .chunk where tensors with shape[dim]==0 led to empty tuples/lists when the tensor itself should have been returned instead
because tinygrad is expected to fail wherever torch fails, tinygrad will now be strict about sizes having to sum to the size of the split dimension in .split, num having to be non-null for .chunk, and only valid dims being allowed in .unsqueeze
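A quick sketch of the intended semantics (matching torch; exact tinygrad call signatures may differ slightly):

```python
from tinygrad import Tensor

a, b = Tensor.arange(5).split(4)   # .split keeps full pieces of size 4, remainder goes last
assert a.shape == (4,) and b.shape == (1,)

c, d = Tensor.arange(5).chunk(2)   # .chunk divides as evenly as possible instead
assert c.shape == (3,) and d.shape == (2,)
```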