3710 Commits

Author SHA1 Message Date
chenyu
282bbd5acb check the input length into argfix (#3610)
* check the input length into argfix

it's easy to forget the keyword when passing kwargs, in which case argfix silently truncates the input

* add test
2024-03-04 19:50:17 -05:00
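The truncation described in this commit message can be illustrated with a minimal sketch. This is not the code from #3610: the function body and the error message are assumptions, written only to show why a length check on the positional arguments catches a forgotten keyword.

```python
def argfix(*x):
    """Accept either f(1, 2, 3) or f((1, 2, 3)) and return a tuple of arguments."""
    if x and isinstance(x[0], (tuple, list)):
        # A leading sequence must be the only positional argument.
        # Without this check, anything after it would be silently dropped.
        if len(x) != 1:
            raise ValueError(f"bad arg {x}")
        return tuple(x[0])
    return x
```

With the check in place, a call such as `argfix((1, 2), 3)` raises instead of quietly returning `(1, 2)`.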
Elias Wahl
7db6dd725d multilazybuffer fix (#3609) 2024-03-04 17:36:23 -05:00
chenyu
c3b8d285aa cleanup uops (#3605)
using `is` to compare with enums, remove long lines and slightly more compact
2024-03-04 11:03:14 -05:00
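A minimal sketch of the `is` comparison mentioned above. The `Op` enum is a hypothetical stand-in, not tinygrad's actual uop enum. Enum members are singletons, so an identity check is safe and avoids any custom `__eq__`.

```python
from enum import Enum, auto

class Op(Enum):
    # Hypothetical members, for illustration only.
    LOAD = auto()
    STORE = auto()

def is_store(op: Op) -> bool:
    # Each enum member exists exactly once, so identity comparison is correct.
    return op is Op.STORE
```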
qazal
94679322a3 simpler float4 direct store and locals support (#3592)
* swap vins instead

* delete the upcast

* leave it to remove_childless try 1

* Revert "leave it to remove_childless try 1"

This reverts commit bf25e935f8.

* try 2, simpler

* Revert "try 2, simpler"

This reverts commit d2472af711.

* add note
2024-03-04 06:28:28 -08:00
nimlgen
3db826e195 hsa in lin opts (#3602) 2024-03-04 06:17:32 -08:00
Francis Lam
7c90005c65 search: hotfix to make sure TC behavior is all in applied_opts (#3598)
* search: hotfix to make sure TC behavior is all in applied_opts

* fix linter error

* fix mypy
2024-03-03 21:44:38 -05:00
chenyu
8e5d60a322 add more gpt2 variant in mac/nvidia benchmark (#3599) 2024-03-03 17:55:30 -05:00
chenyu
968d109453 apply more create_lt_node (#3597)
updated one in linearizer if condition, and various symbolic tests
2024-03-03 16:12:39 -05:00
Patrick Tsai
bc562c4747 Python div alu behavior differs slightly from others (#3596)
* Divide op rounding for negatives

* extra space

---------

Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
2024-03-03 10:48:25 -08:00
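The commit message does not spell out the rounding rule, so the following is a sketch under an assumption: that the mismatch is the usual one between Python's floor division and the truncating integer division of C-style backends. `trunc_div` is a hypothetical helper name.

```python
def trunc_div(a: int, b: int) -> int:
    """Integer division that rounds toward zero, as C does."""
    q = abs(a) // abs(b)
    # The quotient is negative only when the operand signs differ.
    return q if (a < 0) == (b < 0) else -q
```

For comparison, Python's `-7 // 2` is `-4`, while `trunc_div(-7, 2)` is `-3`.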
Marcin Słowik
56d21d77b3 Fix two bugs concerning Tensor.to. (#3593)
1. Tensor.to should return self if device == self.device. This was not the case when it was given a non-canonical name for self.device.
2. The Tensor.to result was missing the graph, even though requires_grad and grad were propagated.

Add corresponding tests.
2024-03-03 08:48:56 -08:00
Patrick Tsai
0082300a59 Fix symbolic negative floordiv (#3594)
Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
2024-03-03 11:40:52 -05:00
chenyu
e09619ab6c explicitly create_lt_node when used in shapetracker _expr_view (#3561)
* explicitly create_lt_node when used in shapetracker

leave regular __lt__ and cmps for symbolic shape cmp

* hmm it fixed that?

* LtNode.substitute uses create_lt_node
2024-03-03 10:08:21 -05:00
nimlgen
640dc0fc51 hsa flush hdp (#3591)
* hsa flush hdp

* use _alloc()
2024-03-03 04:55:07 -08:00
reddyn12
660df3cff1 Add test for .softmax.argmax (#3559)
* Add broken test for known issue

* skip PYTHON

* skip PYTHON

* fix commit

---------

Co-authored-by: schlimeszn <schlimeszn@gmail.com>
Co-authored-by: reddyn <nikidsniper@gmail.com>
2024-03-02 20:51:52 -08:00
chenyu
ee41fafdab use operator instead of lambda in python_alu (#3590) 2024-03-02 19:33:21 -05:00
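A small sketch of the idea in this commit title: a dispatch table built from the standard `operator` module instead of hand-written lambdas. The table keys and the `run_alu` helper are hypothetical and do not reflect the actual `python_alu` contents.

```python
import operator

# Hypothetical op table; operator.add replaces `lambda a, b: a + b`, and so on.
ALU = {
    "ADD": operator.add,
    "MUL": operator.mul,
    "CMPLT": operator.lt,
}

def run_alu(op: str, a, b):
    # Look up the callable by name and apply it to both operands.
    return ALU[op](a, b)
```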
qazal
a89afd4ffa Directly store float4 nodes (#3564)
* float4 cast collapse

* simplify cstyle

* simplify uoptimizer

* ci

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-03-02 15:58:20 -08:00
George Hotz
770707b376 hotfix: gpuocelot no rebuild 2024-03-02 15:57:38 -08:00
George Hotz
74c9acddb0 simple python ALU (#3589)
* shorter

* bugfix
2024-03-02 15:50:58 -08:00
Francis Lam
162dfb07d9 fuzz_linearizer: fix uops and add to test.yml (#3588) 2024-03-02 15:03:42 -08:00
Jovan Sardinha
8978488565 add sanity tests for bufs_from_lin (#3586) 2024-03-02 14:17:43 -08:00
George Hotz
aa9b013d79 add constant folding for WHERE in uops (#3584)
* add constant folding for WHERE in uops

* prereqs for generic constant folding

* fix test

* disable slow overflow logic

* make that test faster
2024-03-02 10:37:14 -08:00
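Constant folding for a WHERE (select) operation can be sketched as follows. This is an illustration on a made-up tuple representation, not tinygrad's uop graph: when the condition is a known constant, the node collapses to one of its branches.

```python
from typing import Any

def fold_where(cond: Any, a: Any, b: Any) -> Any:
    """Return the chosen branch if cond is a literal bool, else a WHERE node."""
    if isinstance(cond, bool):
        # The condition is known at compile time, so no select is needed.
        return a if cond else b
    # Hypothetical node encoding: (op name, condition, true branch, false branch).
    return ("WHERE", cond, a, b)
```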
nimlgen
3b7e3fa2e4 fix sync in hsa graph (#3582) 2024-03-02 07:37:51 -08:00
Szymon Ożóg
6c36264790 Improve type hints for optimizer (#3583)
* Improve type hints for optimizer

* lint fix
2024-03-02 07:35:44 -08:00
George Hotz
83530a585f add quick external data select test 2024-03-02 05:38:32 -08:00
George Hotz
9a37273d36 consts don't have nodes in the graph (#3579)
* consts don't have nodes in the graph

* add idx
2024-03-02 04:19:11 -08:00
George Hotz
41f0a25b53 lazy.py: cache consts (#3577)
* lazy.py: cache consts

* add regression test

* always always cache const

* bump by 1
2024-03-02 03:50:05 -08:00
uuuvn
fb8acd1851 Don't touch UOps.DEFINE_GLOBAL (#3575) 2024-03-02 03:30:05 -08:00
George Hotz
50e1445e60 Revert "allow overriding weight init for Linear (#3569)" (#3576)
This reverts commit 2d0973a852.
2024-03-02 03:17:13 -08:00
David Hou
2d0973a852 allow overriding weight init for Linear (#3569) 2024-03-02 03:16:04 -08:00
Francis Lam
9642a8f547 search: add BEAM UPCAST/LOCAL params and loosen TC criteria during BEAM (#3563) 2024-03-02 03:11:25 -08:00
David Hou
ba6c041eab fix SCE ignore_index with label_smoothing (#3574)
* fix SCE ignore_index with label_smoothing

* break up the line

* only 3 cats in test

* Revert "only 3 cats in test"

This reverts commit 18be069c90.
2024-03-01 22:19:45 -05:00
Francis Lam
e17f1821a7 wmma: add CUDA tensor core and fix test_speed_v_torch failure (#3544) 2024-03-01 17:51:02 -08:00
David Hou
b3cdc11a58 label_smoothing in sparse_cat_crossentropy (#3568)
* label_smoothing in sparse_cat_crossentropy

* test multiple values, assert
2024-03-01 20:02:46 -05:00
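For readers unfamiliar with the terms in these commit titles, here is a plain-Python sketch of sparse categorical cross-entropy with label smoothing and an ignored label. It uses the common formulation (smoothing mass spread uniformly over all classes, mean taken over non-ignored samples), which is an assumption and not necessarily what tinygrad implements. The function signature is hypothetical.

```python
import math
from typing import List

def sparse_cat_crossentropy(logits: List[List[float]], labels: List[int],
                            ignore_index: int = -1,
                            label_smoothing: float = 0.0) -> float:
    total, count = 0.0, 0
    for row, y in zip(logits, labels):
        if y == ignore_index:
            continue  # ignored samples contribute neither loss nor count
        m = max(row)  # subtract the max for a numerically stable log-softmax
        lse = m + math.log(sum(math.exp(v - m) for v in row))
        log_probs = [v - lse for v in row]
        nll = -log_probs[y]                     # loss against the true class
        uniform = -sum(log_probs) / len(row)    # loss against a uniform target
        total += (1 - label_smoothing) * nll + label_smoothing * uniform
        count += 1
    return total / count if count else 0.0
```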
George Hotz
6b29c70b3d Refactor to UOpGraph class (#3566)
* Refactor to UOpGraph class

* fix test
2024-03-01 15:14:48 -08:00
chenyu
b7e555f6c0 run test_linearizer_failures on PYTHON backend (#3565)
* run test_linearizer_failures on PYTHON backend

only test 1, some have hanging issues and gated store is not implemented

* --durations=20

* two less slow ones
2024-03-01 17:00:18 -05:00
chenyu
48d22067ca clean up test_linearizer_failures (#3562)
* cleanup test_linearizer_failures

* fix test_failure_8

* fix that

* better assert message
2024-03-01 15:57:17 -05:00
David Hou
d16aa89561 don't allow MLB assigns with different axes (#3557)
* allow LB <- MLB assign, but don't reuse buffer

* update test

* update test

* assign assert axes are the same

* update tests to manually shard running stats

* unused import
2024-03-01 07:59:06 -05:00
chenyu
cfd23f398d Revert "don't allow MLB assigns with different axes (#3483)" (#3554)
This reverts commit f19d8bb7b4.
2024-02-29 23:13:07 -05:00
David Hou
f19d8bb7b4 don't allow MLB assigns with different axes (#3483)
* allow LB <- MLB assign, but don't reuse buffer

* update test

* update test

* assign assert axes are the same
2024-02-29 23:04:12 -05:00
chenyu
35d998efa8 disable flaky test_conv_beam in CI (#3553)
might fail due to CL_OUT_OF_RESOURCES
2024-02-29 22:59:41 -05:00
David Hou
e5385eecfc UnsyncedBatchNorm with synced trainable weights for hlb cifar (#3472)
* UnsyncedBatchNorm with synced trainable weights for hlb cifar

* multitensor reshape tests

* test mlb assign change axis

* E501

* argfix axis

* don't import batchnorm from hlb_cifar in test_multitensor

* pass num_devices to UnsyncedBatchNorm in test, allow UnsyncedBatchNorm to be used with LB

* add backprop test for UnsyncedBatchNorm

* break out MLB assign and reshape changes

* manually shard running mean and running var

* don't shard unless syncbn=0

* replace nn.BatchNorm2d with UnsyncedBatchNorm

* don't increment num_batches_tracked if not tracking running stats

* update tests

* oops

* Revert "oops"

This reverts commit 5e8a67a535.

* Revert "update tests"

This reverts commit 7ebf65d89a.

* Revert "don't increment num_batches_tracked if not tracking running stats"

This reverts commit 78de0ea9ee.

* Revert "replace nn.BatchNorm2d with UnsyncedBatchNorm"

This reverts commit d03da53da7.

* don't increment num_batches_tracked if not tracking running stats

* oops

* test_batchnorm_axis

* compare against torch

* types

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-02-29 22:52:07 -05:00
George Hotz
5a6e151844 no barrier side effect (#3550)
* no barrier side effect

* finish barrier removal
2024-02-29 18:10:04 -08:00
George Hotz
bd9c2ced07 define var can be removed from vars to keep (#3549)
* define var can be removed

* sint

* oops, didn't store
2024-02-29 17:44:19 -08:00
George Hotz
2c19ab6561 define var (#3548)
* define var

* remove vars from there

* fix python symbolic ops

* fix llvm

* pypath
2024-02-29 16:43:27 -08:00
George Hotz
83cdc85790 add index to DEFINE_GLOBAL (#3542)
* remove DEFINE_GLOBAL from uops with side effects

* add index to DEFINE_GLOBAL

* bugfix

* better var name
2024-02-29 15:22:26 -08:00
chenyu
978a997d1f print nvidia-smi in CI benchmark (#3546) 2024-02-29 17:31:37 -05:00
Francis Lam
5d434801fa search: add tensor core to beam search space (#3275)
* search: add tensor core to beam search space

* kernel: refactor apply_tensor_core into apply_opt and hand_coded

* kernel: revert removal of apply_tensor_cores

also revert BEAM search parameter changes
2024-02-29 13:05:10 -08:00
Marcin Słowik
f90caa4b92 Escape table name in diskcache queries. (#3543)
Some devices create cache table names with non-alphanumerical characters, e.g. "compile_hip_gfx1010:xnack-_12".
This commit escapes the table name in single quotes so that sqlite works (see https://github.com/tinygrad/tinygrad/issues/3538).
2024-02-29 13:04:21 -08:00
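The quoting described above can be tried with Python's built-in `sqlite3` module. This is a sketch, not the diskcache code: the `quoted` helper and the table schema are assumptions, and only the table name comes from the commit message.

```python
import sqlite3

def quoted(name: str) -> str:
    # Hypothetical helper: wrap the name in single quotes, doubling any embedded quote.
    return "'" + name.replace("'", "''") + "'"

conn = sqlite3.connect(":memory:")
table = "compile_hip_gfx1010:xnack-_12"  # contains ':' and '-', so it must be quoted

conn.execute(f"CREATE TABLE IF NOT EXISTS {quoted(table)} (key TEXT PRIMARY KEY, val BLOB)")
conn.execute(f"INSERT INTO {quoted(table)} (key, val) VALUES (?, ?)", ("k", b"v"))
row = conn.execute(f"SELECT val FROM {quoted(table)} WHERE key = ?", ("k",)).fetchone()
```

Standard SQL quotes identifiers with double quotes. SQLite also accepts a single-quoted token where only an identifier is allowed, which is the behavior this sketch relies on.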
nimlgen
0afde98ba5 scan all gpu agents at launch (#3535) 2024-02-29 09:37:37 -08:00
Mark McLoughlin
2e82c5b7a4 README: ops_cpu and ops_torch have been removed (#3539)
Removed by pull #3399
2024-02-29 10:22:11 -05:00