David Hou
9f66dcf718
PolynomialDecayWithWarmup + tests ( #3649 )
...
* working PolynomialDecayWithWarmup + tests
add lars_util.py, oops
* keep lars_util.py as intact as possible, simplify our interface
* whitespace
* clean up
* clean up
* asserts
* test polylr for full resnet training run
* add comment
* rename
* fix do_optim
* don't cast lr
* info
* calculate from train_files
* skip it
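The schedule named in the title can be sketched as a plain function (a hypothetical sketch; the function name, arguments, and defaults are assumptions, not the PR's actual interface):

```python
def poly_decay_with_warmup(step, base_lr, warmup_steps, total_steps,
                           power=2.0, end_lr=1e-5):
    # Linear warmup from near zero up to base_lr over warmup_steps...
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    # ...then polynomial decay from base_lr down to end_lr.
    frac = (total_steps - step) / (total_steps - warmup_steps)
    return (base_lr - end_lr) * frac ** power + end_lr
```

MLPerf ResNet training typically pairs a schedule like this with the LARS optimizer, which matches the lars_util.py reference mentioned above.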
2024-03-07 18:53:36 -05:00
chenyu
57df8e8d82
update fuzz_linearizer ( #3648 )
...
included non-reduce kernels and kernels with variables; print a green msg when everything passes.
creating rawbufs can fail due to a memory error, so that is included in the failure cases
2024-03-07 18:41:22 -05:00
chenyu
b282a45e39
fix direct store float4 with same vin ( #3652 )
...
In a kernel that stores an expanded value, the vins of a float4 can come from the same source, and in that case we should only remove it once.
2024-03-07 18:11:50 -05:00
chenyu
a66ffec6d3
update kernel dataset to exclude the disktensor ones ( #3651 )
...
disk tensor loads contain big offsets and are not meant to be run on the GPU.
repro steps
```
time ./extra/optimization/generate_dataset.sh
gzip /tmp/sops
mv /tmp/sops.gz extra/datasets/
```
2024-03-07 17:35:19 -05:00
chenyu
fcf4a5ccf2
fix example that calls Tensor.__bool__ ( #3650 )
...
also removed `.cpu()` calls in mask_rcnn so `python3 examples/mlperf/model_spec.py` runs
2024-03-07 16:59:26 -05:00
George Hotz
6e50582e62
working to improve ptx ( #3647 )
...
* working to improve ptx
* fix compile fail
2024-03-07 12:39:31 -08:00
Zaffer
1853ec9a02
add tests for bfloat16 on HIP ( #3638 )
...
* Fix bug in login functionality
* Remove HSA backend test and add bfloat16 dtype tests that run in CI
* Skip tests on HIPCPU
* skip tests causing segfault on LLVM backend
* Exclude bfloat16 tests causing segfaults in LLVM backend
* move bf16 cast tests to only test on HIP
2024-03-07 10:45:36 -08:00
chenyu
0cef284aac
fix typing FlopCounter.flops can be sint ( #3646 )
2024-03-07 12:49:17 -05:00
chenyu
906cc3a69b
cleanup tests Device[Device.DEFAULT] is always Compiled ( #3645 )
2024-03-07 11:15:42 -05:00
qazal
bdd62c7fd8
make the bf16 include dynamic ( #3642 )
...
* dynamic prefix
* add common ones above
these are common dtypes
aesthetics
* regression test
fuzz it
test
* run in CI
* use .append
* faster
2024-03-07 10:31:35 -05:00
chenyu
4552248c84
fix Tensor.to preserves grad.data ( #3636 )
2024-03-06 21:44:49 -05:00
chenyu
d33311ebe0
remove parens of ALU if it has associative property ( #3635 )
...
SUB had to be excluded since it's possible to have (const - (const - const)) in test/test_ops.py::TestOps::test_cos,
in which case the parens of the children cannot be removed
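The constraint can be checked with plain arithmetic (an illustration, not tinygrad code):

```python
a, b, c = 7.0, 3.0, 2.0
# ADD is associative, so the parens of children can be dropped safely:
assert (a + b) + c == a + (b + c)
# SUB is not associative, so (const - (const - const)) must keep its parens:
assert (a - b) - c != a - (b - c)  # 2.0 vs 6.0
```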
2024-03-06 21:12:11 -05:00
chenyu
fe6b6e38c1
remove parentheses of GEP if it's from SSA ( #3634 )
...
fixed some "bracket nesting level exceeded maximum of 256" errors
2024-03-06 20:22:46 -05:00
David Hou
0afaf70d57
lars optimizer + tests ( #3631 )
...
* lars optimizer + tests
* fix skip list!
* use id to compare in skip list
* go back to using set
* Tensor(bool) * Tensor(bool) is and
* don't lint external/mlperf_resnet
* whitespace
* add external_test_optim to opencl tests
* give mlperf task a name
* mlperf under onnx
* remove track_gnorm
* contiguous instead of realize
* assert momentum and weight decay positive
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
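The trust-ratio scaling at the core of LARS can be sketched in pure Python (a minimal single-tensor step with assumed hyperparameter names; not the PR's lars_util.py interface):

```python
import math

def lars_step(w, g, buf, lr, tc=0.001, wd=1e-4, momentum=0.9):
    # Layer-wise trust ratio: scale the update by ||w|| / (||g|| + wd*||w||).
    w_norm = math.sqrt(sum(x * x for x in w))
    g_norm = math.sqrt(sum(x * x for x in g))
    ratio = tc * w_norm / (g_norm + wd * w_norm) if w_norm > 0 and g_norm > 0 else 1.0
    new_w, new_buf = [], []
    for wi, gi, bi in zip(w, g, buf):
        b = momentum * bi + ratio * (gi + wd * wi)  # momentum on the scaled update
        new_buf.append(b)
        new_w.append(wi - lr * b)
    return new_w, new_buf
```

In a real optimizer class, the "assert momentum and weight decay positive" check from the bullets above would live in the constructor.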
2024-03-06 18:11:01 -05:00
chenyu
b2e92d44fa
skip METAL sin test in test_dtype_alu ( #3633 )
...
revert this part of #3629; it's flaky
2024-03-06 17:29:19 -05:00
chenyu
8f10bfa2ff
ban __bool__ on Tensor ( #3632 )
...
* ban __bool__ on Tensor
avoid misuse
* test case
* fix tests
* fix more tests
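The ban follows the same pattern NumPy uses for multi-element arrays; a minimal stand-in class (not tinygrad's actual Tensor) shows the idea:

```python
class Tensor:
    def __init__(self, data):
        self.data = data
    def __bool__(self):
        # Truthiness of a tensor is ambiguous (any? all? nonzero?), so ban it.
        raise TypeError("__bool__ on Tensor is not defined; use an explicit reduction")

t = Tensor([1, 2, 3])
try:
    if t:  # previously easy to misuse; now raises
        pass
except TypeError:
    pass
```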
2024-03-06 17:12:35 -05:00
George Hotz
81baf3eed3
bring ptx back ( #3623 )
...
* bring ptx back
* ptx back
* fix define var
* fix a few bugs
* bugfixes
* fixes
* fix llvm bug
* fix test bug
2024-03-06 13:34:21 -08:00
chenyu
c270d54c32
update test_dtype_alu for METAL ( #3629 )
2024-03-06 14:55:19 -05:00
qazal
abc5f3a6a0
hip bf16 hotfix ( #3630 )
...
* hip bf16
* remu dev mac
* Revert "remu dev mac"
This reverts commit 465069a0dc3c7f2045f3348b312a1dcbf1587acd.
* skip disk tests in CI
* bring float8 back
2024-03-06 11:42:30 -08:00
chenyu
bc2a13a5f7
test case to show clang and python doing math in double ( #3628 )
2024-03-06 13:49:03 -05:00
George Hotz
568353fa84
hotfix: bump line count to 6500
2024-03-06 07:52:18 -08:00
Elias Wahl
a1507c7fd4
Fix Tensor.dropout() with multigpu ( #3619 )
...
* Tensor.rand with multilazybuffer
* remove recursive + test
* whitespace
* another whitespace. Sorry
* remove else
* Canonicalize multidevice tuple + Remove src
2024-03-05 18:26:21 -05:00
Jungwan Woo
e5ee6bb2bd
fix outdated url in showcase doc ( #3624 )
2024-03-05 14:44:40 -08:00
George Hotz
8500265561
this mem fault still happening ( #3620 )
...
* this mem fault still happening
* smaller
* that print doesn't work
* overflows test
* hip doesn't uses_ptr_arithmetic
* only with locals
* test overflow new name
* it's not ptr arith
* simpler
* simple repro
* old compiler
* simpler
* put that back
2024-03-05 10:39:32 -08:00
chenyu
3c3f846c45
tinybox benchmark with HSA ( #3603 )
...
* tinybox benchmark with HSA
* torch cuda init can fail
* no TORCHCUDA
* print torch version
* LD_PRELOAD="/opt/rocm/lib/libhsa-runtime64.so"
2024-03-05 11:03:52 -05:00
George Hotz
f500be1313
out of bounds access caused by launch bounds ( #3615 )
...
* lin overflow
* remove launch bounds
* remove launch bounds infra
* oops, fix bufs type
2024-03-05 06:34:00 -08:00
qazal
eb83e2d3a0
decouple buffer mutability from cstyle ( #3617 )
...
* buffer mutability as an arg
* update test_uops
2024-03-05 06:20:59 -08:00
chenyu
3275260c98
Revert "test: add failing bfloat16 test case for metal backend ( #3481 )" ( #3618 )
...
This reverts commit 1e12a2ae80.
2024-03-05 09:08:42 -05:00
Skosh
1e12a2ae80
test: add failing bfloat16 test case for metal backend ( #3481 )
...
* test: add failing bfloat16 test case for metal backend
* test: move bfloat 16 test to dtypes test
2024-03-05 08:44:54 -05:00
chenyu
957e9800f1
llama + beam to mac benchmark, full cifar to nvidia benchmark ( #3612 )
...
would merge if it's also ~1 minute. btw why is gpt2 beam not slower in the first beam run?
2024-03-04 21:35:57 -05:00
chenyu
282bbd5acb
check the input length into argfix ( #3610 )
...
* check the input length into argfix
it's possible to overlook setting a keyword for kwargs, and argfix would silently truncate the input
* add test
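The silent truncation the check guards against can be sketched in plain Python (a hypothetical argfix-style helper, not tinygrad's actual implementation):

```python
def argfix(*x):
    # Accept either f(1, 2, 3) or f((1, 2, 3)) and normalize to a tuple.
    if x and isinstance(x[0], (tuple, list)):
        # Extra positional args after a sequence usually mean a forgotten
        # keyword, e.g. f((2, 3), 4) intended as f((2, 3), dim=4): raise
        # instead of silently dropping the 4.
        if len(x) != 1:
            raise ValueError(f"bad arg {x}")
        return tuple(x[0])
    return x
```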
2024-03-04 19:50:17 -05:00
Elias Wahl
7db6dd725d
multilazybuffer fix ( #3609 )
2024-03-04 17:36:23 -05:00
chenyu
c3b8d285aa
cleanup uops ( #3605 )
...
use `is` to compare with enums, remove long lines, and make it slightly more compact
2024-03-04 11:03:14 -05:00
qazal
94679322a3
simpler float4 direct store and locals support ( #3592 )
...
* swap vins instead
* delete the upcast
* leave it to remove_childless try 1
* Revert "leave it to remove_childless try 1"
This reverts commit bf25e935f8.
* try 2, simpler
* Revert "try 2, simpler"
This reverts commit d2472af711.
* add note
2024-03-04 06:28:28 -08:00
nimlgen
3db826e195
hsa in lin opts ( #3602 )
2024-03-04 06:17:32 -08:00
Francis Lam
7c90005c65
search: hotfix to make sure TC behavior is all in applied_opts ( #3598 )
...
* search: hotfix to make sure TC behavior is all in applied_opts
* fix linter error
* fix mypy
2024-03-03 21:44:38 -05:00
chenyu
8e5d60a322
add more gpt2 variant in mac/nvidia benchmark ( #3599 )
2024-03-03 17:55:30 -05:00
chenyu
968d109453
apply more create_lt_node ( #3597 )
...
updated one in linearizer if condition, and various symbolic tests
2024-03-03 16:12:39 -05:00
Patrick Tsai
bc562c4747
Python div alu behavior differs slightly from others ( #3596 )
...
* Divide op rounding for negatives
* extra space
---------
Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
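The difference is floor vs. truncating division: Python's `//` floors toward negative infinity, while C-style backends truncate toward zero. A hedged sketch of the adjustment (hypothetical helper name):

```python
def trunc_div(a, b):
    # Emulate C-style integer division in Python: compute on magnitudes,
    # then restore the sign, so the result truncates toward zero instead
    # of flooring like Python's //.
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q
```

For example, `-7 // 2` is `-4` in Python, while C-style backends produce `-3`.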
2024-03-03 10:48:25 -08:00
Marcin Słowik
56d21d77b3
Fix two bugs concerning Tensor.to. ( #3593 )
...
1. Tensor.to should return self if device == self.device. This was not the case when provided with a non-canonical name of self.device.
2. The Tensor.to result was missing the graph, even though requires_grad and grad were propagated.
Added corresponding tests.
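Bug 1 can be illustrated with a toy class (a hypothetical canonicalizer and Tensor stand-in, not tinygrad's real code):

```python
def canonicalize(device):
    # Hypothetical canonicalizer: uppercase and default the device index,
    # so "gpu" and "GPU:0" name the same device.
    d = device.upper()
    return d if ":" in d else d + ":0"

class Tensor:
    # Toy stand-in: compare canonicalized names before returning self.
    def __init__(self, data, device="CPU"):
        self.data, self.device = data, canonicalize(device)
    def to(self, device):
        if canonicalize(device) == self.device:
            return self  # same device, even under a non-canonical name
        return Tensor(self.data, device)
```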
2024-03-03 08:48:56 -08:00
Patrick Tsai
0082300a59
Fix symbolic negative floordiv ( #3594 )
...
Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
2024-03-03 11:40:52 -05:00
chenyu
e09619ab6c
explicitly create_lt_node when used in shapetracker _expr_view ( #3561 )
...
* explicitly create_lt_node when used in shapetracker
leave regular __lt__ and cmps for symbolic shape cmp
* hmm it fixed that?
* LtNode.substitute uses create_lt_node
2024-03-03 10:08:21 -05:00
nimlgen
640dc0fc51
hsa flush hdp ( #3591 )
...
* hsa flush hdp
* use _alloc()
2024-03-03 04:55:07 -08:00
reddyn12
660df3cff1
Add test for .softmax.argmax ( #3559 )
...
* Add broken test for known issue
* skip PYTHON
* skip PYTHON
* fix commit
---------
Co-authored-by: schlimeszn <schlimeszn@gmail.com>
Co-authored-by: reddyn <nikidsniper@gmail.com>
2024-03-02 20:51:52 -08:00
chenyu
ee41fafdab
use operator instead of lambda in python_alu ( #3590 )
2024-03-02 19:33:21 -05:00
qazal
a89afd4ffa
Directly store float4 nodes ( #3564 )
...
* float4 cast collapse
* simplify cstyle
* simplify uoptimizer
* ci
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-03-02 15:58:20 -08:00
George Hotz
770707b376
hotfix: gpuocelot no rebuild
2024-03-02 15:57:38 -08:00
George Hotz
74c9acddb0
simple python ALU ( #3589 )
...
* shorter
* bugfix
2024-03-02 15:50:58 -08:00
Francis Lam
162dfb07d9
fuzz_linearizer: fix uops and add to test.yml ( #3588 )
2024-03-02 15:03:42 -08:00
Jovan Sardinha
8978488565
add sanity tests for bufs_from_lin ( #3586 )
2024-03-02 14:17:43 -08:00