* add FUZZ_NTH to fuzz_linearizer
also update tests in test_linearizer_failures to not just run on METAL
* update failures for HIP/HSA
* test_failure_21 LLVM PADTO
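A minimal sketch of what a FUZZ_NTH knob looks like: an env var that restricts the fuzzer to a single kernel by index (names here are illustrative, not the exact fuzz_linearizer internals):

```python
import os

# FUZZ_NTH, if set, restricts fuzzing to the nth kernel only
FUZZ_NTH = os.getenv("FUZZ_NTH")

def fuzz_kernels(kernels):
  for i, kernel in enumerate(kernels):
    if FUZZ_NTH is not None and i != int(FUZZ_NTH): continue  # skip all but the nth
    print(f"fuzzing kernel {i}: {kernel}")

fuzz_kernels(["matmul", "reduce_sum", "elementwise_add"])
```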
* working PolynomialDecayWithWarmup + tests
add lars_util.py, oops
* keep lars_util.py as intact as possible, simplify our interface
* whitespace
* clean up
* clean up
* asserts
* test polylr for full resnet training run
* add comment
* rename
* fix do_optim
* don't cast lr
* info
* calculate from train_files
* skip it
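The schedule itself is a linear warmup into a polynomial decay; a sketch of its shape, assuming power-2 decay as in the mlperf resnet reference (the numbers in the example are illustrative, not the real training config):

```python
def polynomial_decay_with_warmup(step, base_lr, end_lr, warmup_steps, total_steps, power=2.0):
  # linear warmup from 0 to base_lr, then polynomial decay down to end_lr
  # (assumes 0 < step <= total_steps)
  if step <= warmup_steps:
    return base_lr * step / warmup_steps
  frac = 1 - (step - warmup_steps) / (total_steps - warmup_steps)
  return (base_lr - end_lr) * frac ** power + end_lr

for s in [1, 500, 5000, 10000]:
  print(s, polynomial_decay_with_warmup(s, base_lr=8.0, end_lr=1e-4, warmup_steps=500, total_steps=10000))
```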
included non-reduce kernels and kernels with variables. print a green msg when everything passes
it's possible that creating rawbufs fails due to a memory error; included that in the failure cases
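A sketch of that failure handling, with a generic allocator standing in for the real rawbuf creation:

```python
def alloc_rawbufs(sizes, allocator=bytearray):
  # creating the rawbufs can itself fail (e.g. MemoryError); count that as a
  # failure case instead of letting the fuzzer crash
  try:
    return [allocator(sz) for sz in sizes]
  except MemoryError:
    return None

bufs = alloc_rawbufs([1024, 4096])
print("failed allocating rawbufs" if bufs is None else "allocated ok")
```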
* Remove HSA backend test and add bfloat16 dtype tests that run in CI
* Skip tests on HIPCPU
* skip tests causing segfault on LLVM backend
* Exclude bfloat16 tests causing segfaults in LLVM backend
* move bf16 cast tests to only test on HIP
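The skip pattern is the standard unittest one, keyed on the active backend; a sketch assuming tinygrad's top-level Device export:

```python
import unittest
from tinygrad import Device  # assumption: recent top-level export

@unittest.skipIf(Device.DEFAULT == "LLVM", "bf16 casts segfault on the LLVM backend")
class TestBFloat16Cast(unittest.TestCase):
  def test_cast(self):
    ...  # cast tensors to and from bfloat16, compare against a float32 reference
```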
need to remove SUB since it's possible to have (const - (const - const)) in test/test_ops.py::TestOps::test_cos,
in which case the parentheses around the children cannot be removed
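A one-line reminder of why those parentheses matter: subtraction is not associative.

```python
a, b, c = 1.0, 2.0, 3.0
print(a - (b - c))  # 2.0
print((a - b) - c)  # -4.0, so the parens around the SUB child can't be dropped
```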
* lars optimizer + tests
* fix skip list!
* use id to compare in skip list
* go back to using set
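One common formulation of the LARS update, with the skip set (biases, batchnorm params) falling back to plain momentum SGD; numpy stands in for tensors, and this is a sketch rather than lars_util.py's exact math:

```python
import numpy as np

def lars_step(w, g, buf, lr, wd=5e-5, trust_coeff=0.001, momentum=0.9, skip=False):
  # layer-wise trust ratio ||w|| / (||g|| + wd*||w||); skipped params
  # get neither the adaptive ratio nor weight decay
  w_norm, g_norm = np.linalg.norm(w), np.linalg.norm(g)
  ratio = 1.0
  if not skip and w_norm > 0 and g_norm > 0:
    ratio = trust_coeff * w_norm / (g_norm + wd * w_norm)
    g = g + wd * w
  buf[:] = momentum * buf + ratio * lr * g
  return w - buf

w, g, buf = np.random.randn(10), np.random.randn(10), np.zeros(10)
w = lars_step(w, g, buf, lr=0.1)
```

Comparing skip-list membership by id matches the commits above: tensors compare elementwise with ==, so identity is the only safe membership test.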
* Tensor(bool) * Tensor(bool) is logical AND
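A quick illustration in numpy (the tinygrad behavior being tested mirrors this):

```python
import numpy as np
a = np.array([True, True, False, False])
b = np.array([True, False, True, False])
print(a * b)                 # [ True False False False]
print(np.logical_and(a, b))  # identical: bool * bool acts as AND
```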
* don't lint external/mlperf_resnet
* whitespace
* add external_test_optim to opencl tests
* give mlperf task a name
* mlperf under onnx
* remove track_gnorm
* contiguous instead of realize
* assert momentum and weight decay positive
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* hip bf16
* remu dev mac
* Revert "remu dev mac"
This reverts commit 465069a0dc3c7f2045f3348b312a1dcbf1587acd.
* skip disk tests in CI
* bring float8 back
* this mem fault still happening
* smaller
* that print doesn't work
* overflows test
* hip doesn't set uses_ptr_arithmetic
* only with locals
* test overflow new name
* it's not ptr arith
* simpler
* simple repro
* old compiler
* simpler
* put that back
1. Tensor.to should return self if device == self.device. This was not the case when given a non-canonical name of self.device.
2. The Tensor.to result was missing the autograd graph, even though requires_grad and grad were propagated.
Add corresponding tests.
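A hedged test-style sketch of both points, assuming recent tinygrad top-level exports and that device names are canonicalized to upper case:

```python
from tinygrad import Tensor, Device  # assumption: top-level exports

t = Tensor([1.0, 2.0], requires_grad=True)
# a non-canonical spelling of the same device must be a no-op returning self
assert t.to(Device.DEFAULT.lower()) is t
# and a .to() result must stay on the autograd graph: backward reaches t
u = t.to(Device.DEFAULT)
u.sum().backward()
print(t.grad.numpy())  # [1. 1.]
```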
* explicitly create_lt_node when used in shapetracker
leave the regular __lt__ and comparisons for symbolic shape comparison
* hmm it fixed that?
* LtNode.substitute uses create_lt_node
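The split being drawn: create_lt_node always builds a symbolic less-than node (what the ShapeTracker needs for its index/mask expressions), while __lt__ stays an ordinary comparison so symbolic shapes can still be compared. A rough sketch with simplified node classes, not tinygrad's symbolic module:

```python
class Node:
  def __init__(self, name): self.name = name
  def __lt__(self, other): return str(self) < str(other)  # plain ordering for shape cmp
  def __repr__(self): return self.name

class LtNode(Node):
  def __init__(self, a, b):
    super().__init__(f"({a}<{b})")
    self.a, self.b = a, b

def create_lt_node(a, b):
  # explicitly build a symbolic (a < b) node, e.g. for shapetracker masks
  return LtNode(a, b)

x = Node("x")
print(create_lt_node(x, 4))  # (x<4): a symbolic expression
print(x < Node("y"))         # True: a plain comparison
```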
* allow LB <- MLB assign, but don't reuse buffer
* update test
* update test
* assign assert axes are the same
* update tests to manually shard running stats
* unused import
* UnsyncedBatchNorm with synced trainable weights for hlb cifar
* multitensor reshape tests
* test mlb assign change axis
* E501
* argfix axis
* don't import batchnorm from hlb_cifar in test_multitensor
* pass num_devices to UnsyncedBatchNorm in test, allow UnsyncedBatchNorm to be used with LB
* add backprop test for UnsyncedBatchNorm
* break out MLB assign and reshape changes
* manually shard running mean and running var
* don't shard unless syncbn=0
* replace nn.BatchNorm2d with UnsyncedBatchNorm
* don't increment num_batches_tracked if not tracking running stats
* update tests
* oops
* Revert "oops"
This reverts commit 5e8a67a535.
* Revert "update tests"
This reverts commit 7ebf65d89a.
* Revert "don't increment num_batches_tracked if not tracking running stats"
This reverts commit 78de0ea9ee.
* Revert "replace nn.BatchNorm2d with UnsyncedBatchNorm"
This reverts commit d03da53da7.
* don't increment num_batches_tracked if not tracking running stats
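The guard amounts to the following sketch (following torch's BatchNorm semantics, not tinygrad's exact code):

```python
class BatchNormCounter:
  # minimal sketch: only count batches when running stats are being tracked
  def __init__(self, track_running_stats=True):
    self.track_running_stats = track_running_stats
    self.num_batches_tracked = 0
  def __call__(self, x, training=True):
    if training and self.track_running_stats:
      self.num_batches_tracked += 1
    return x

bn = BatchNormCounter(track_running_stats=False)
bn([1, 2, 3])
print(bn.num_batches_tracked)  # 0: untracked stats leave the counter alone
```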
* oops
* test_batchnorm_axis
* compare against torch
* types
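The idea behind UnsyncedBatchNorm, sketched with numpy: gamma/beta are synced (one copy shared by all devices), while running_mean/running_var get a leading device axis and are sharded by hand. Shapes are illustrative:

```python
import numpy as np

NUM_DEVICES, C = 2, 8

gamma, beta = np.ones(C), np.zeros(C)      # trainable, synced across devices
running_mean = np.zeros((NUM_DEVICES, C))  # per-device running stats
running_var = np.ones((NUM_DEVICES, C))

def unsynced_bn(x, dev, momentum=0.1, eps=1e-5, training=True):
  # x: the (N, C) slice of the batch that lives on device `dev`
  if training:
    m, v = x.mean(axis=0), x.var(axis=0)
    running_mean[dev] = (1 - momentum) * running_mean[dev] + momentum * m
    running_var[dev] = (1 - momentum) * running_var[dev] + momentum * v
  else:
    m, v = running_mean[dev], running_var[dev]
  return gamma * (x - m) / np.sqrt(v + eps) + beta

for dev in range(NUM_DEVICES):
  print(unsynced_bn(np.random.randn(4, C), dev).shape)  # (4, 8) per device
```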
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* search: add tensor core to beam search space
* kernel: refactor apply_tensor_core into apply_opt and hand_coded
* kernel: revert removal of apply_tensor_cores
also revert BEAM search parameter changes
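Conceptually, putting tensor cores in the BEAM space just means TC becomes one more candidate action instead of something only the hand-coded heuristics apply. A hedged sketch with illustrative action names, not kernel.py's actual OptOps:

```python
def get_actions(kernel, use_tc=True):
  actions = ["UPCAST", "LOCAL", "UNROLL", "GROUP"]  # the usual search space
  if use_tc and kernel.get("is_matmul", False):     # stand-in TC-ability check
    actions.append("TC")                            # let BEAM pick tensor cores
  return actions

print(get_actions({"is_matmul": True}))   # includes 'TC'
print(get_actions({"is_matmul": False}))  # no TC action
```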