Commit Graph

10633 Commits

Author SHA1 Message Date
chenyu
0cef284aac fix typing FlopCounter.flops can be sint (#3646) 2024-03-07 12:49:17 -05:00
chenyu
906cc3a69b cleanup tests Device[Device.DEFAULT] is always Compiled (#3645) 2024-03-07 11:15:42 -05:00
qazal
bdd62c7fd8 make the bf16 include dynamic (#3642)
* dynamic prefix

* add common ones above

these are common dtypes

aesthetics

* regression test

fuzz it

test

* run in CI

* use .append

* faster
2024-03-07 10:31:35 -05:00
chenyu
4552248c84 fix Tensor.to preserves grad.data (#3636) 2024-03-06 21:44:49 -05:00
chenyu
d33311ebe0 remove parens of ALU if it has associative property (#3635)
SUB has to be excluded, since test/test_ops.py::TestOps::test_cos can produce (const - (const - const)),
in which case the parens of the children cannot be removed
2024-03-06 21:12:11 -05:00
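The SUB exclusion in the commit above follows from subtraction not being associative. A minimal illustration in plain Python, not taken from the tinygrad renderer (the variable names are made up for this sketch), of how dropping the parentheses around a child changes the value for SUB but not for ADD:

```python
# Illustration only: why parentheses around a SUB child cannot be dropped.
a, b, c = 5, 3, 1

# ADD is associative, so both groupings agree.
add_grouped = a + (b + c)
add_flat = a + b + c

# SUB is not: the flat form parses as (a - b) - c.
sub_grouped = a - (b - c)
sub_flat = a - b - c

print(add_grouped == add_flat)  # True
print(sub_grouped, sub_flat)    # 3 1
```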
chenyu
fe6b6e38c1 remove parentheses of GEP if it's from SSA (#3634)
fixed some bracket nesting level exceeded maximum of 256 errors
2024-03-06 20:22:46 -05:00
David Hou
0afaf70d57 lars optimizer + tests (#3631)
* lars optimizer + tests

* fix skip list!

* use id to compare in skip list

* go back to using set

* Tensor(bool) * Tensor(bool) is and

* don't lint external/mlperf_resnet

* whitespace

* add external_test_optim to opencl tests

* give mlperf task a name

* mlperf under onnx

* remove track_gnorm

* contiguous instead of realize

* assert momentum and weight decay positive

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-03-06 18:11:01 -05:00
chenyu
b2e92d44fa skip METAL sin test in test_dtype_alu (#3633)
revert this part of #3629. this is flaky
2024-03-06 17:29:19 -05:00
chenyu
8f10bfa2ff ban __bool__ on Tensor (#3632)
* ban __bool__ on Tensor

avoid misuse

* test case

* fix tests

* fix more tests
2024-03-06 17:12:35 -05:00
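A rough sketch of what banning `__bool__` can look like, assuming the ban is implemented by raising from `__bool__`. `FakeTensor` and `truthiness_is_banned` are hypothetical names for this example and are not tinygrad's code:

```python
class FakeTensor:
    """Hypothetical stand-in for a tensor type; not tinygrad's Tensor."""

    def __init__(self, data):
        self.data = data

    def __bool__(self):
        # Refuse implicit truthiness so `if t:` cannot silently misbehave.
        raise TypeError("__bool__ on FakeTensor is not defined")


def truthiness_is_banned(t) -> bool:
    """Return True if converting `t` to bool raises TypeError."""
    try:
        bool(t)
    except TypeError:
        return True
    return False


print(truthiness_is_banned(FakeTensor([1, 2, 3])))  # True
```

With this in place, code such as `if tensor:` fails loudly instead of evaluating an object's default truthiness.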
George Hotz
81baf3eed3 bring ptx back (#3623)
* bring ptx back

* ptx back

* fix define var

* fix a few bugs

* bugfixes

* fixes

* fix llvm bug

* fix test bug
2024-03-06 13:34:21 -08:00
chenyu
c270d54c32 update test_dtype_alu for METAL (#3629) 2024-03-06 14:55:19 -05:00
qazal
abc5f3a6a0 hip bf16 hotfix (#3630)
* hip bf16

* remu dev mac

* Revert "remu dev mac"

This reverts commit 465069a0dc3c7f2045f3348b312a1dcbf1587acd.

* skip disk tests in CI

* bring float8 back
2024-03-06 11:42:30 -08:00
chenyu
bc2a13a5f7 test case to show clang and python doing math in double (#3628) 2024-03-06 13:49:03 -05:00
George Hotz
568353fa84 hotfix: bump line count to 6500 2024-03-06 07:52:18 -08:00
Elias Wahl
a1507c7fd4 Fix Tensor.dropout() with multigpu (#3619)
* Tensor.rand with multilazybuffer

* remove recursive + test

* whitespace

* another whitespace. Sorry

* remove else

* Canonicalize multidevice tuple + Remove src
2024-03-05 18:26:21 -05:00
Jungwan Woo
e5ee6bb2bd fix outdated url in showcase doc (#3624) 2024-03-05 14:44:40 -08:00
George Hotz
8500265561 this mem fault still happening (#3620)
* this mem fault still happening

* smaller

* that print doesn't work

* overflows test

* hip doesn't uses_ptr_arithmetic

* only with locals

* test overflow new name

* it's not ptr arith

* simpler

* simple repro

* old compiler

* simpler

* put that back
2024-03-05 10:39:32 -08:00
chenyu
3c3f846c45 tinybox benchmark with HSA (#3603)
* tinybox benchmark with HSA

* torch cuda init can fail

* no TORCHCUDA

* print torch version

* LD_PRELOAD="/opt/rocm/lib/libhsa-runtime64.so"
2024-03-05 11:03:52 -05:00
George Hotz
f500be1313 out of bounds access caused by launch bounds (#3615)
* lin overflow

* remove launch bounds

* remove launch bounds infra

* oops, fix bufs type
2024-03-05 06:34:00 -08:00
qazal
eb83e2d3a0 decouple buffer mutability from cstyle (#3617)
* buffer mutability as an arg

* update test_uops
2024-03-05 06:20:59 -08:00
chenyu
3275260c98 Revert "test: add failing bfloat16 test case for metal backend (#3481)" (#3618)
This reverts commit 1e12a2ae80.
2024-03-05 09:08:42 -05:00
Skosh
1e12a2ae80 test: add failing bfloat16 test case for metal backend (#3481)
* test: add failing bfloat16 test case for metal backend

* test: move bfloat 16 test to dtypes test
2024-03-05 08:44:54 -05:00
chenyu
957e9800f1 llama + beam to mac benchmark, full cifar to nvidia benchmark (#3612)
would merge if it's also ~1 minute. btw why is gpt2 beam not slower in the first beam run?
2024-03-04 21:35:57 -05:00
chenyu
282bbd5acb check the input length into argfix (#3610)
* check the input length into argfix

it's possible to overlook setting keyword for kwargs and argfix silently truncates input

* add test
2024-03-04 19:50:17 -05:00
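The failure mode described in the commit above can be sketched as follows. This is an assumption about the shape of the check, not the code from the PR; `argfix_checked` is a hypothetical name:

```python
def argfix_checked(*x):
    """Hypothetical sketch: accept either f(1, 2, 3) or f((1, 2, 3))."""
    if x and isinstance(x[0], (tuple, list)):
        # A leading tuple/list must be the only positional argument;
        # otherwise anything after it would be silently dropped.
        if len(x) != 1:
            raise ValueError(f"bad arg {x}")
        return tuple(x[0])
    return x


print(argfix_checked(2, 3))    # (2, 3)
print(argfix_checked((2, 3)))  # (2, 3)
```

A call such as `argfix_checked((2, 3), 1)`, where the trailing `1` was meant to be a keyword argument, now raises `ValueError` instead of returning `(2, 3)`.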
Elias Wahl
7db6dd725d multilazybuffer fix (#3609) 2024-03-04 17:36:23 -05:00
chenyu
c3b8d285aa cleanup uops (#3605)
using `is` to compare with enums, remove long lines and slightly more compact
2024-03-04 11:03:14 -05:00
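For reference, comparing enum members with `is` works because each member of a Python `Enum` is a singleton. A small sketch with a hypothetical `Op` enum (not tinygrad's `UOps`):

```python
from enum import Enum, auto


class Op(Enum):
    """Hypothetical enum used only for this illustration."""
    LOAD = auto()
    STORE = auto()


def is_store(op: Op) -> bool:
    # Identity comparison is safe: there is exactly one Op.STORE object.
    return op is Op.STORE


print(is_store(Op.STORE), is_store(Op.LOAD))  # True False
```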
qazal
94679322a3 simpler float4 direct store and locals support (#3592)
* swap vins instead

* delete the upcast

* leave it to remove_childless try 1

* Revert "leave it to remove_childless try 1"

This reverts commit bf25e935f8.

* try 2, simpler

* Revert "try 2, simpler"

This reverts commit d2472af711.

* add note
2024-03-04 06:28:28 -08:00
nimlgen
3db826e195 hsa in lin opts (#3602) 2024-03-04 06:17:32 -08:00
Francis Lam
7c90005c65 search: hotfix to make sure TC behavior is all in applied_opts (#3598)
* search: hotfix to make sure TC behavior is all in applied_opts

* fix linter error

* fix mypy
2024-03-03 21:44:38 -05:00
chenyu
8e5d60a322 add more gpt2 variant in mac/nvidia benchmark (#3599) 2024-03-03 17:55:30 -05:00
chenyu
968d109453 apply more create_lt_node (#3597)
updated one in linearizer if condition, and various symbolic tests
2024-03-03 16:12:39 -05:00
Patrick Tsai
bc562c4747 Python div alu behavior differs slightly from others (#3596)
* Divide op rounding for negatives

* extra space

---------

Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
2024-03-03 10:48:25 -08:00
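The commit message above does not spell out the mismatch. Assuming it is the usual floor-versus-truncate difference for negative operands, the following sketch shows the two behaviors side by side. `floor_div` and `trunc_div` are hypothetical helpers, not functions from the repository:

```python
def floor_div(a: int, b: int) -> int:
    # Python's // rounds toward negative infinity.
    return a // b


def trunc_div(a: int, b: int) -> int:
    # C-style integer division rounds toward zero.
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


print(floor_div(-7, 2), trunc_div(-7, 2))  # -4 -3
print(floor_div(7, 2), trunc_div(7, 2))    # 3 3
```

The two only disagree when the operands have opposite signs and the division is inexact.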
Marcin Słowik
56d21d77b3 Fix two bugs concerning Tensor.to. (#3593)
1. Tensor.to should return self if device == self.device. This was not the case if provided with non-canonical name of self.device.
2. Tensor.to result was missing graph, even though requires_grad and grad were propagated.

Add corresponding tests.
2024-03-03 08:48:56 -08:00
Patrick Tsai
0082300a59 Fix symbolic negative floordiv (#3594)
Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
2024-03-03 11:40:52 -05:00
chenyu
e09619ab6c explicitly create_lt_node when used in shapetracker _expr_view (#3561)
* explicitly create_lt_node when used in shapetracker

leave regular __lt__ and cmps for symbolic shape cmp

* hmm it fixed that?

* LtNode.substitute uses create_lt_node
2024-03-03 10:08:21 -05:00
nimlgen
640dc0fc51 hsa flush hdp (#3591)
* hsa flush hdp

* use _alloc()
2024-03-03 04:55:07 -08:00
reddyn12
660df3cff1 Add test for .softmax.argmax (#3559)
* Add broken test for known issue

* skip PYTHON

* skip PYTHON

* fix commit

---------

Co-authored-by: schlimeszn <schlimeszn@gmail.com>
Co-authored-by: reddyn <nikidsniper@gmail.com>
2024-03-02 20:51:52 -08:00
chenyu
ee41fafdab use operator instead of lambda in python_alu (#3590) 2024-03-02 19:33:21 -05:00
qazal
a89afd4ffa Directly store float4 nodes (#3564)
* float4 cast collapse

* simplify cstyle

* simplify uoptimizer

* ci

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-03-02 15:58:20 -08:00
George Hotz
770707b376 hotfix: gpuocelot no rebuild 2024-03-02 15:57:38 -08:00
George Hotz
74c9acddb0 simple python ALU (#3589)
* shorter

* bugfix
2024-03-02 15:50:58 -08:00
Francis Lam
162dfb07d9 fuzz_linearizer: fix uops and add to test.yml (#3588) 2024-03-02 15:03:42 -08:00
Jovan Sardinha
8978488565 add sanity tests for bufs_from_lin (#3586) 2024-03-02 14:17:43 -08:00
George Hotz
aa9b013d79 add constant folding for WHERE in uops (#3584)
* add constant folding for WHERE in uops

* prereqs for generic constant folding

* fix test

* disable slow overflow logic

* make that test faster
2024-03-02 10:37:14 -08:00
nimlgen
3b7e3fa2e4 fix sync in hsa graph (#3582) 2024-03-02 07:37:51 -08:00
Szymon Ożóg
6c36264790 Improve type hints for optimizer (#3583)
* Improve type hints for optimizer

* lint fix
2024-03-02 07:35:44 -08:00
George Hotz
83530a585f add quick external data select test 2024-03-02 05:38:32 -08:00
George Hotz
9a37273d36 consts don't have nodes in the graph (#3579)
* consts don't have nodes in the graph

* add idx
2024-03-02 04:19:11 -08:00
George Hotz
41f0a25b53 lazy.py: cache consts (#3577)
* lazy.py: cache consts

* add regression test

* always always cache const

* bump by 1
2024-03-02 03:50:05 -08:00
uuuvn
fb8acd1851 Don't touch UOps.DEFINE_GLOBAL (#3575) 2024-03-02 03:30:05 -08:00