when compilation succeeds but the kernel fails at runtime due to thread limits
on METAL, this allows beam search to proceed by treating the runtime failure
the same way as a compile failure.
* examples/stable_diffusion: support model checkpoints without alphas_cumprod key
(which is most models on civitai)
* fix indent
---------
Co-authored-by: a <a@a.aa>
* fix: make Tensor.rand produce correct values for float16
Due to precision loss when casting to float16, the values produced by custom_random no longer fall in the open interval ]0, 1[ but in the half-open interval ]0, 1], which causes Tensor.randn to incorrectly generate infinite values.
The solution uses a scaling value to make sure the values stay below 1 when using half precision.
Closes #3611
* update implementation to truncate to closest f16 value to 1
* chore: fix whitespace
* test larger distribution
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
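The float16 rounding issue described in the Tensor.rand fix above can be reproduced with the standard library alone. This is a minimal sketch, not the tinygrad implementation: `to_f16` and `clamp_below_one` are hypothetical helpers, and the remark about a `log(1 - u)` term is an assumption about how a Box-Muller-style normal sampler would hit infinity.

```python
import struct

def to_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# Largest float16 strictly below 1.0: the spacing of float16 values in [0.5, 1) is 2**-11.
F16_MAX_BELOW_ONE = 1.0 - 2.0**-11

def clamp_below_one(u: float) -> float:
    """Hypothetical helper: cast to float16 but never let the result reach 1.0."""
    return min(to_f16(u), F16_MAX_BELOW_ONE)

# A sample that is strictly below 1 at float32 resolution...
u = 1.0 - 2.0**-24
# ...rounds up to exactly 1.0 once it is cast to float16.
rounded = to_f16(u)
# Assumption: a sampler that evaluates log(1 - u) would then see log(0).
# Truncating to the closest float16 value below 1 avoids that.
clamped = clamp_below_one(u)
```

Values that are already representable in float16 and below 1 pass through `clamp_below_one` unchanged, so only the top of the range is affected.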
be more specific about invalid kernel opt, used that in test_linearizer_failures.
make BEAM kernel search work even with assertions disabled.
`BEAM=2 python3 -O examples/llama.py --temperature=0 --count=10 --prompt="Hello." --timing`
* add FUZZ_NTH to fuzz_linearizer
also update tests in test_linearizer_failures to not just run on METAL
* update failures for HIP/HSA
* test_failure_21 LLVM PADTO
* working PolynomialDecayWithWarmup + tests.......
add lars_util.py, oops
* keep lars_util.py as intact as possible, simplify our interface
* whitespace
* clean up
* clean up
* asserts
* test polylr for full resnet training run
* add comment
* rename
* fix do_optim
* don't cast lr
* info
* calculate from train_files
* skip it
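For readers unfamiliar with the schedule named in the PolynomialDecayWithWarmup commit above, the following is a generic sketch of a polynomial-decay learning rate with linear warmup. It is not the code from lars_util.py: the function name, every parameter name, and the default `power` are assumptions chosen for illustration.

```python
def poly_decay_with_warmup(step, base_lr, end_lr, warmup_steps, total_steps, power=2.0):
    """Hypothetical schedule: linear warmup to base_lr, then polynomial decay to end_lr."""
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / warmup_steps
    # Fraction of the decay phase completed, capped at 1 once total_steps is reached.
    decay_steps = max(total_steps - warmup_steps, 1)
    progress = min(step - warmup_steps, decay_steps) / decay_steps
    return (base_lr - end_lr) * (1.0 - progress) ** power + end_lr

# Example: 10 warmup steps followed by 100 decay steps.
schedule = [poly_decay_with_warmup(s, 1.0, 0.0, 10, 110) for s in (0, 5, 10, 60, 110)]
```

The schedule is continuous at the end of warmup because the decay branch evaluates to `base_lr` when `progress` is 0.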
included non-reduce kernel and kernel with variables. green msg when everything passed
it's possible that creating rawbufs failed due to memory error, included that in failure cases
disk tensor load contains big offset and is not meant to be run by gpu.
repro steps
```
time ./extra/optimization/generate_dataset.sh
gzip /tmp/sops
mv /tmp/sops.gz extra/datasets/
```
* Fix bug in login functionality
* Remove HSA backend test and add bfloat16 dtype tests that run in CI
* Skip tests on HIPCPU
* skip tests causing segfault on LLVM backend
* Exclude bfloat16 tests causing segfaults in LLVM backend
* move bf16 cast tests to only test on HIP
need to remove SUB since it's possible to have (const - (const - const)) in test/test_ops.py::TestOps::test_cos,
in which case the parens of the children cannot be removed
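The reason SUB cannot be in the set of ops whose child parentheses get stripped is that subtraction is not associative. The sketch below uses a hypothetical toy renderer (`render` and `ASSOCIATIVE` are illustrative names, not tinygrad's) to show the difference.

```python
# Ops whose nested same-op children can be written without parentheses.
# "-" is deliberately absent: a-(b-c) is not the same as a-b-c.
ASSOCIATIVE = {"+", "*"}

def render(node, parent_op=None):
    """Render a nested (op, lhs, rhs) tuple as an infix expression string."""
    if not isinstance(node, tuple):
        return str(node)
    op, lhs, rhs = node
    text = f"{render(lhs, op)}{op}{render(rhs, op)}"
    # Drop the parentheses only when the child repeats an associative parent op.
    if parent_op == op and op in ASSOCIATIVE:
        return text
    return f"({text})"

sub_expr = render(("-", 5, ("-", 3, 1)))  # keeps the inner parentheses
add_expr = render(("+", 5, ("+", 3, 1)))  # inner parentheses safely dropped
```

If "-" were added to `ASSOCIATIVE`, the first expression would render as `5-3-1`, which evaluates to 1 instead of 3.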
* lars optimizer + tests
* fix skip list!
* use id to compare in skip list
* go back to using set
* Tensor(bool) * Tensor(bool) is and
* don't lint external/mlperf_resnet
* whitespace
* add external_test_optim to opencl tests
* give mlperf task a name
* mlperf under onnx
* remove track_gnorm
* contiguous instead of realize
* assert momentum and weight decay positive
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* hip bf16
* remu dev mac
* Revert "remu dev mac"
This reverts commit 465069a0dc3c7f2045f3348b312a1dcbf1587acd.
* skip disk tests in CI
* bring float8 back