Commit Graph

1459 Commits

Author SHA1 Message Date
chenyu
4552248c84 fix Tensor.to preserves grad.data (#3636) 2024-03-06 21:44:49 -05:00
chenyu
d33311ebe0 remove parens of ALU if it has associative property (#3635)
need to remove SUB since it's possible to have (const - (const - const)) in test/test_ops.py::TestOps::test_cos,
in which case cannot remove the parens of children
2024-03-06 21:12:11 -05:00
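The reason SUB has to be excluded can be checked with plain integers. The sketch below is only an illustration: the helper `is_associative` and the sample triples are hypothetical and are not taken from the tinygrad code base.

```python
import operator

def is_associative(op, samples):
    """Return True if (a op b) op c == a op (b op c) for every sampled triple."""
    return all(op(op(a, b), c) == op(a, op(b, c)) for a, b, c in samples)

# A few integer triples; any counterexample is enough to rule an op out.
samples = [(1, 2, 3), (5, 7, 11), (-4, 0, 9)]

add_ok = is_associative(operator.add, samples)  # parentheses can be dropped
mul_ok = is_associative(operator.mul, samples)  # parentheses can be dropped
sub_ok = is_associative(operator.sub, samples)  # (1 - 2) - 3 != 1 - (2 - 3)
```

For subtraction, dropping the parentheses in `const - (const - const)` would change the result, which matches the case described in the commit message.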
chenyu
fe6b6e38c1 remove parentheses of GEP if it's from SSA (#3634)
fixed some bracket nesting level exceeded maximum of 256 errors
2024-03-06 20:22:46 -05:00
David Hou
0afaf70d57 lars optimizer + tests (#3631)
* lars optimizer + tests

* fix skip list!

* use id to compare in skip list

* go back to using set

* Tensor(bool) * Tensor(bool) is and

* don't lint external/mlperf_resnet

* whitespace

* add external_test_optim to opencl tests

* give mlperf task a name

* mlperf under onnx

* remove track_gnorm

* contiguous instead of realize

* assert momentum and weight decay positive

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-03-06 18:11:01 -05:00
chenyu
b2e92d44fa skip METAL sin test in test_dtype_alu (#3633)
revert this part of #3629. this is flaky
2024-03-06 17:29:19 -05:00
chenyu
8f10bfa2ff ban __bool__ on Tensor (#3632)
* ban __bool__ on Tensor

avoid misuse

* test case

* fix tests

* fix more tests
2024-03-06 17:12:35 -05:00
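A minimal sketch of what banning `__bool__` can look like, assuming the ban is expressed by raising `TypeError`. `ToyTensor` and `truthiness_is_banned` are hypothetical names for illustration, not tinygrad's actual class or error message.

```python
class ToyTensor:
    """Toy stand-in for a tensor type (hypothetical, not tinygrad's Tensor)."""

    def __init__(self, data):
        self.data = data

    def __bool__(self):
        # Refuse implicit truthiness so `if tensor:` cannot be misused silently.
        raise TypeError("__bool__ on ToyTensor is not defined")

def truthiness_is_banned(t) -> bool:
    """Return True if using `t` in a boolean context raises TypeError."""
    try:
        if t:
            pass
    except TypeError:
        return True
    return False
```

With this in place, code such as `if tensor:` or `tensor_a and tensor_b` fails loudly instead of evaluating Python object truthiness.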
George Hotz
81baf3eed3 bring ptx back (#3623)
* bring ptx back

* ptx back

* fix define var

* fix a few bugs

* bugfixes

* fixes

* fix llvm bug

* fix test bug
2024-03-06 13:34:21 -08:00
chenyu
c270d54c32 update test_dtype_alu for METAL (#3629) 2024-03-06 14:55:19 -05:00
qazal
abc5f3a6a0 hip bf16 hotfix (#3630)
* hip bf16

* remu dev mac

* Revert "remu dev mac"

This reverts commit 465069a0dc3c7f2045f3348b312a1dcbf1587acd.

* skip disk tests in CI

* bring float8 back
2024-03-06 11:42:30 -08:00
chenyu
bc2a13a5f7 test case to show clang and python doing math in double (#3628) 2024-03-06 13:49:03 -05:00
Elias Wahl
a1507c7fd4 Fix Tensor.dropout() with multigpu (#3619)
* Tensor.rand with multilazybuffer

* remove recursive + test

* whitespace

* another whitespace. Sorry

* remove else

* Canonicalize multidevice tuple + Remove src
2024-03-05 18:26:21 -05:00
George Hotz
8500265561 this mem fault still happening (#3620)
* this mem fault still happening

* smaller

* that print doesn't work

* overflows test

* hip doesn't uses_ptr_arithmetic

* only with locals

* test overflow new name

* it's not ptr arith

* simpler

* simple repro

* old compiler

* simpler

* put that back
2024-03-05 10:39:32 -08:00
George Hotz
f500be1313 out of bounds access caused by launch bounds (#3615)
* lin overflow

* remove launch bounds

* remove launch bounds infra

* oops, fix bufs type
2024-03-05 06:34:00 -08:00
qazal
eb83e2d3a0 decouple buffer mutability from cstyle (#3617)
* buffer mutability as an arg

* update test_uops
2024-03-05 06:20:59 -08:00
chenyu
3275260c98 Revert "test: add failing bfloat16 test case for metal backend (#3481)" (#3618)
This reverts commit 1e12a2ae80.
2024-03-05 09:08:42 -05:00
Skosh
1e12a2ae80 test: add failing bfloat16 test case for metal backend (#3481)
* test: add failing bfloat16 test case for metal backend

* test: move bfloat 16 test to dtypes test
2024-03-05 08:44:54 -05:00
chenyu
282bbd5acb check the input length into argfix (#3610)
* check the input length into argfix

it's possible to overlook setting keyword for kwargs and argfix silently truncates input

* add test
2024-03-04 19:50:17 -05:00
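The silent truncation described above can be sketched as follows. This is a simplified, hypothetical version of an `argfix`-style helper, written only to show the length check; the real function and its error message may differ.

```python
def argfix(*x):
    """Accept either argfix(1, 2, 3) or argfix((1, 2, 3)) and return a tuple.

    Sketch only: if the first argument is a tuple or list, nothing else may
    follow it, otherwise the extra arguments would be dropped silently.
    """
    if x and isinstance(x[0], (tuple, list)):
        if len(x) != 1:
            raise ValueError(f"bad arg {x}")
        return tuple(x[0])
    return x
```

Without the length check, a call such as `argfix([1, 2], 3)` would return `(1, 2)` and discard the `3`, which is the kind of overlooked keyword argument the commit message mentions.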
qazal
94679322a3 simpler float4 direct store and locals support (#3592)
* swap vins instead

* delete the upcast

* leave it to remove_childless try 1

* Revert "leave it to remove_childless try 1"

This reverts commit bf25e935f8.

* try 2, simpler

* Revert "try 2, simpler"

This reverts commit d2472af711.

* add note
2024-03-04 06:28:28 -08:00
chenyu
968d109453 apply more create_lt_node (#3597)
updated one in linearizer if condition, and various symbolic tests
2024-03-03 16:12:39 -05:00
Patrick Tsai
bc562c4747 Python div alu behavior differs slightly from others (#3596)
* Divide op rounding for negatives

* extra space

---------

Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
2024-03-03 10:48:25 -08:00
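The two rounding conventions that can disagree for negative operands are sketched below. `trunc_div` is a hypothetical helper that rounds toward zero as C integer division does; it only illustrates the difference and is not the code changed in this commit.

```python
def trunc_div(a: int, b: int) -> int:
    """Integer division that rounds toward zero (C-style)."""
    q = abs(a) // abs(b)
    # The quotient is negative exactly when the operands have different signs.
    return q if (a >= 0) == (b >= 0) else -q

floor_result = -7 // 2           # Python's // rounds toward negative infinity
trunc_result = trunc_div(-7, 2)  # rounds toward zero
```

The two results agree whenever the exact quotient is non-negative or the division is exact, so the mismatch only shows up for negative, inexact quotients.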
Marcin Słowik
56d21d77b3 Fix two bugs concerning Tensor.to. (#3593)
1. Tensor.to should return self if device == self.device. This was not the case if provided with non-canonical name of self.device.
2. Tensor.to result was missing graph, even though requires_grad and grad were propagated.

Add corresponding tests.
2024-03-03 08:48:56 -08:00
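The first bug can be sketched with a toy class. Everything here is hypothetical: `canonicalize` is an assumed normalization (upper-case backend name, implicit index 0), and `ToyTensor` is not tinygrad's Tensor.

```python
def canonicalize(device: str) -> str:
    """Hypothetical normalization: upper-case the name and drop a ':0' suffix."""
    name, _, index = device.partition(":")
    name = name.upper()
    return name if index in ("", "0") else f"{name}:{index}"

class ToyTensor:
    """Toy stand-in that stores its device in canonical form."""

    def __init__(self, device: str):
        self.device = canonicalize(device)

    def to(self, device: str) -> "ToyTensor":
        # Compare canonical names so "clang:0" and "CLANG" count as the same device.
        if canonicalize(device) == self.device:
            return self
        return ToyTensor(device)
```

Comparing the raw string against `self.device` would miss spellings such as `"clang:0"` and create a needless copy.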
Patrick Tsai
0082300a59 Fix symbolic negative floordiv (#3594)
Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
2024-03-03 11:40:52 -05:00
chenyu
e09619ab6c explicitly create_lt_node when used in shapetracker _expr_view (#3561)
* explicitly create_lt_node when used in shapetracker

leave regular __lt__ and cmps for symbolic shape cmp

* hmm it fixed that?

* LtNode.substitute uses create_lt_node
2024-03-03 10:08:21 -05:00
reddyn12
660df3cff1 Add test for .softmax.argmax (#3559)
* Add broken test for known issue

* skip PYTHON

* skip PYTHON

* fix commit

---------

Co-authored-by: schlimeszn <schlimeszn@gmail.com>
Co-authored-by: reddyn <nikidsniper@gmail.com>
2024-03-02 20:51:52 -08:00
qazal
a89afd4ffa Directly store float4 nodes (#3564)
* float4 cast collapse

* simplify cstyle

* simplify uoptimizer

* ci

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-03-02 15:58:20 -08:00
Francis Lam
162dfb07d9 fuzz_linearizer: fix uops and add to test.yml (#3588) 2024-03-02 15:03:42 -08:00
Jovan Sardinha
8978488565 add sanity tests for bufs_from_lin (#3586) 2024-03-02 14:17:43 -08:00
George Hotz
aa9b013d79 add constant folding for WHERE in uops (#3584)
* add constant folding for WHERE in uops

* prereqs for generic constant folding

* fix test

* disable slow overflow logic

* make that test faster
2024-03-02 10:37:14 -08:00
George Hotz
83530a585f add quick external data select test 2024-03-02 05:38:32 -08:00
George Hotz
41f0a25b53 lazy.py: cache consts (#3577)
* lazy.py: cache consts

* add regression test

* always always cache const

* bump by 1
2024-03-02 03:50:05 -08:00
David Hou
ba6c041eab fix SCE ignore_index with label_smoothing (#3574)
* fix SCE ignore_index with label_smoothing

* break up the line

* only 3 cats in test

* Revert "only 3 cats in test"

This reverts commit 18be069c90.
2024-03-01 22:19:45 -05:00
David Hou
b3cdc11a58 label_smoothing in sparse_cat_crossentropy (#3568)
* label_smoothing in sparse_cat_crossentropy

* test multiple values, assert
2024-03-01 20:02:46 -05:00
George Hotz
6b29c70b3d Refactor to UOpGraph class (#3566)
* Refactor to UOpGraph class

* fix test
2024-03-01 15:14:48 -08:00
chenyu
48d22067ca clean up test_linearizer_failures (#3562)
* cleanup test_linearizer_failures

* fix test_failure_8

* fix that

* better assert message
2024-03-01 15:57:17 -05:00
David Hou
d16aa89561 don't allow MLB assigns with different axes (#3557)
* allow LB <- MLB assign, but don't reuse buffer

* update test

* update test

* assign assert axes are the same

* update tests to manually shard running stats

* unused import
2024-03-01 07:59:06 -05:00
chenyu
cfd23f398d Revert "don't allow MLB assigns with different axes (#3483)" (#3554)
This reverts commit f19d8bb7b4.
2024-02-29 23:13:07 -05:00
David Hou
f19d8bb7b4 don't allow MLB assigns with different axes (#3483)
* allow LB <- MLB assign, but don't reuse buffer

* update test

* update test

* assign assert axes are the same
2024-02-29 23:04:12 -05:00
David Hou
e5385eecfc UnsyncedBatchNorm with synced trainable weights for hlb cifar (#3472)
* UnsyncedBatchNorm with synced trainable weights for hlb cifar

* multitensor reshape tests

* test mlb assign change axis

* E501

* argfix axis

* don't import batchnorm from hlb_cifar in test_multitensor

* pass num_devices to UnsyncedBatchNorm in test, allow UnsyncedBatchNorm to be used with LB

* add backprop test for UnsyncedBatchNorm

* break out MLB assign and reshape changes

* manually shard running mean and running var

* don't shard unless syncbn=0

* replace nn.BatchNorm2d with UnsyncedBatchNorm

* don't increment num_batches_tracked if not tracking running stats

* update tests

* oops

* Revert "oops"

This reverts commit 5e8a67a535.

* Revert "update tests"

This reverts commit 7ebf65d89a.

* Revert "don't increment num_batches_tracked if not tracking running stats"

This reverts commit 78de0ea9ee.

* Revert "replace nn.BatchNorm2d with UnsyncedBatchNorm"

This reverts commit d03da53da7.

* don't increment num_batches_tracked if not tracking running stats

* oops

* test_batchnorm_axis

* compare against torch

* types

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-02-29 22:52:07 -05:00
George Hotz
bd9c2ced07 define var can be removed from vars to keep (#3549)
* define var can be removed

* sint

* oops, didn't store
2024-02-29 17:44:19 -08:00
George Hotz
83cdc85790 add index to DEFINE_GLOBAL (#3542)
* remove DEFINE_GLOBAL from uops with side effects

* add index to DEFINE_GLOBAL

* bugfix

* better var name
2024-02-29 15:22:26 -08:00
Francis Lam
5d434801fa search: add tensor core to beam search space (#3275)
* search: add tensor core to beam search space

* kernel: refactor apply_tensor_core into apply_opt and hand_coded

* kernel: revert removal of apply_tensor_cores

also revert BEAM search parameter changes
2024-02-29 13:05:10 -08:00
Marcin Słowik
f90caa4b92 Escape table name in diskcache queries. (#3543)
Some devices create cache table names with non-alphanumerical characters, e.g. "compile_hip_gfx1010:xnack-_12".
This commit escapes the table name in single quotes s.t. sqlite works (see https://github.com/tinygrad/tinygrad/issues/3538).
2024-02-29 13:04:21 -08:00
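A small sketch of the problem with Python's `sqlite3` module. Note the swap: the commit quotes the table name with single quotes, while this sketch uses standard SQL double-quote identifier quoting. The helper `quote_identifier` and the column names `k` and `v` are hypothetical.

```python
import sqlite3

def quote_identifier(name: str) -> str:
    """Quote an SQL identifier with double quotes, doubling any embedded ones."""
    return '"' + name.replace('"', '""') + '"'

# Table name taken from the commit message; it contains ':' and '-'.
table = "compile_hip_gfx1010:xnack-_12"
quoted = quote_identifier(table)

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE {quoted} (k TEXT PRIMARY KEY, v BLOB)")
conn.execute(f"INSERT INTO {quoted} (k, v) VALUES (?, ?)", ("key0", b"payload"))
row = conn.execute(f"SELECT v FROM {quoted} WHERE k = ?", ("key0",)).fetchone()
```

Values still go through `?` placeholders; only the table name, which cannot be bound as a parameter, is quoted by hand.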
geohotstan
9268a8b154 remove MULACC (#3459)
* init

* removed mulacc

* is uoptimize the problem?

* lol hax make work temporarily fix l8er

* revert extra/ changes

* clean up

* flaky metal tests?

* add back mulacc for metal

* revert last commit

* try skipping linearizer_failure tests

* skip flammit tests... cuz tests all work locally

* try narrow down exact linearizer failure test

* try 2

* try 4

* generated code is the exact same wtf why CI fails

* code for 15 and 17 are exact same with or without mulacc, this should pass

* try only 1 failure

* try garbage collecting lol...

* try del variables lol

* try gcing after del lol...

* is diskcache the problem???

* try disabling opts cache idk

* try remove hack

* try disable github metal cache...

* try CACHELEVEL=0 :D idk anymore

* try increase newCommandQueueWithMaxCommandBufferCount_, im almost out of ideas...

* revert

* actually not a HACK

* oops
2024-02-29 07:40:40 -05:00
qazal
94fc0fd546 uop the float4 acc upcast in group_for_reduce kernels (#3466)
* simplest one

* but i can trust this will be cached correctly

* wait that was wrong too

* cleanup

* test_reduce_upcast for single reduce case

* a late accumulator always outputs to gds

lint
2024-02-28 17:33:47 -08:00
George Hotz
48918fa75a fix disktensor offset issue (#3532) 2024-02-28 17:22:17 -08:00
David Friehs
275971e616 fix: align .split, .chunk and .unsqueeze with torch, add fuzz tests (#3505)
this fixes .split where self.shape[dim] is not perfectly divisible by
sizes - .chunk is always the wrong choice here:
 - tensor((5,)).split(4) should result in (tensor((4,)), tensor((1,)))
   was (tensor((3,)), tensor((2,)))

this also fixes issues in .split and .chunk where tensors with
shape[dim]==0 lead to empty tuples/lists when the tensor itself should
have been returned instead

because tinygrad is expected to fail in all cases where torch fails,
tinygrad will now be strict regarding sizes having to sum up to passed
dimension in .split, num having to be non-null for .chunk and only
allowing valid dims in .unsqueeze
2024-02-28 17:06:39 -08:00
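The difference between the two behaviours can be sketched on piece lengths alone. `split_sizes` and `chunk_sizes` are hypothetical helpers, and the chunk rule (each piece has length ceil(length / num)) is an assumption about torch-style chunking, not a quote from the patch.

```python
import math

def split_sizes(length: int, size: int) -> list[int]:
    """Piece lengths when cutting `length` into pieces of `size`.

    The last piece holds the remainder; a zero-length dimension yields one
    empty piece, so the tensor itself comes back instead of an empty tuple.
    """
    if length == 0:
        return [0]
    full, rest = divmod(length, size)
    return [size] * full + ([rest] if rest else [])

def chunk_sizes(length: int, num: int) -> list[int]:
    """Piece lengths when cutting `length` into at most `num` chunks."""
    if num <= 0:
        raise ValueError("num must be positive")
    if length == 0:
        return [0]
    return split_sizes(length, math.ceil(length / num))

correct = split_sizes(5, 4)    # sizes for tensor((5,)).split(4)
old_style = chunk_sizes(5, 2)  # what routing split(4) through chunk produces
```

Routing `.split(4)` through a chunk of ceil(5 / 4) = 2 pieces gives the uneven-but-balanced sizes the commit message reports as the old, wrong result.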
chenyu
0c6846f9fc failed test case for disk tensor assign into dtype int64 (#3527)
failed case for #3510, mark as expectedFailure for now
2024-02-28 17:52:21 -05:00
chenyu
d89e3c4e08 enable METAL tests now runner is M1 and no fast-math (#3523) 2024-02-28 14:14:23 -05:00
chenyu
1136e2a82a skipIf(not( -> skipUnless( in test_linearizer_failures (#3519)
if these behave weirdly in CI, we might need to disable them in CI
2024-02-28 13:48:47 -05:00
chenyu
2127c1c6c2 test for the split reduce kernel (#3515)
somehow this was not tested
2024-02-27 21:29:25 -05:00