chenyu
282bbd5acb
check the input length into argfix ( #3610 )
...
* check the input length into argfix
it's possible to overlook setting keyword for kwargs and argfix silently truncates input
* add test
2024-03-04 19:50:17 -05:00
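A minimal sketch of the length check described in this commit, not the actual tinygrad code. The function body and the error message are assumptions; only the name `argfix` and the "silently truncates input" behavior come from the commit message.

```python
def argfix(*x):
    """Hypothetical sketch: accept either f(1, 2, 3) or f((1, 2, 3))."""
    if x and isinstance(x[0], (tuple, list)):
        # A leading tuple/list must be the only positional argument.
        # Without this check, anything after it would be dropped silently,
        # e.g. when a caller forgets the keyword for a kwarg.
        if len(x) != 1:
            raise ValueError(f"bad arg {x}")
        return tuple(x[0])
    return x
```

With this check, a call such as `argfix((2, 3), 1)` raises instead of quietly returning `(2, 3)`.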
Elias Wahl
7db6dd725d
multilazybuffer fix ( #3609 )
2024-03-04 17:36:23 -05:00
chenyu
c3b8d285aa
cleanup uops ( #3605 )
...
using `is` to compare with enums, remove long lines and slightly more compact
2024-03-04 11:03:14 -05:00
qazal
94679322a3
simpler float4 direct store and locals support ( #3592 )
...
* swap vins instead
* delete the upcast
* leave it to remove_childless try 1
* Revert "leave it to remove_childless try 1"
This reverts commit bf25e935f8.
* try 2, simpler
* Revert "try 2, simpler"
This reverts commit d2472af711.
* add note
2024-03-04 06:28:28 -08:00
nimlgen
3db826e195
hsa in lin opts ( #3602 )
2024-03-04 06:17:32 -08:00
Francis Lam
7c90005c65
search: hotfix to make sure TC behavior is all in applied_opts ( #3598 )
...
* search: hotfix to make sure TC behavior is all in applied_opts
* fix linter error
* fix mypy
2024-03-03 21:44:38 -05:00
chenyu
8e5d60a322
add more gpt2 variant in mac/nvidia benchmark ( #3599 )
2024-03-03 17:55:30 -05:00
chenyu
968d109453
apply more create_lt_node ( #3597 )
...
updated one in linearizer if condition, and various symbolic tests
2024-03-03 16:12:39 -05:00
Patrick Tsai
bc562c4747
Python div alu behavior differs slightly from others ( #3596 )
...
* Divide op rounding for negatives
* extra space
---------
Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
2024-03-03 10:48:25 -08:00
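A short illustration of the rounding difference this commit title refers to, assuming the other backends follow C-style integer division (truncation toward zero). Python's `//` floors instead, so the two disagree when the operands have opposite signs. `trunc_div` is a hypothetical helper name, not a function from the repository.

```python
def trunc_div(a: int, b: int) -> int:
    """Integer division that rounds toward zero, as C does (hypothetical helper)."""
    q = abs(a) // abs(b)
    # The quotient is negative only when the operands have opposite signs.
    return q if (a < 0) == (b < 0) else -q

floor_result = -7 // 2           # Python floors toward negative infinity
trunc_result = trunc_div(-7, 2)  # C-style truncation toward zero
```

Here `floor_result` is -4 while `trunc_result` is -3; for operands of the same sign the two agree.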
Marcin Słowik
56d21d77b3
Fix two bugs concerning Tensor.to. ( #3593 )
...
1. Tensor.to should return self if device == self.device. This was not the case if provided with non-canonical name of self.device.
2. Tensor.to result was missing graph, even though requires_grad and grad were propagated.
Add corresponding tests.
2024-03-03 08:48:56 -08:00
Patrick Tsai
0082300a59
Fix symbolic negative floordiv ( #3594 )
...
Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
2024-03-03 11:40:52 -05:00
chenyu
e09619ab6c
explicitly create_lt_node when used in shapetracker _expr_view ( #3561 )
...
* explicitly create_lt_node when used in shapetracker
leave regular __lt__ and cmps for symbolic shape cmp
* hmm it fixed that?
* LtNode.substitute uses create_lt_node
2024-03-03 10:08:21 -05:00
nimlgen
640dc0fc51
hsa flush hdp ( #3591 )
...
* hsa flush hdp
* use _alloc()
2024-03-03 04:55:07 -08:00
reddyn12
660df3cff1
Add test for .softmax.argmax ( #3559 )
...
* Add broken test for known issue
* skip PYTHON
* skip PYTHON
* fix commit
---------
Co-authored-by: schlimeszn <schlimeszn@gmail.com>
Co-authored-by: reddyn <nikidsniper@gmail.com>
2024-03-02 20:51:52 -08:00
chenyu
ee41fafdab
use operator instead of lambda in python_alu ( #3590 )
2024-03-02 19:33:21 -05:00
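A rough sketch of the pattern named in this commit title: a dispatch table built from the standard `operator` module rather than hand-written lambdas. The string keys and the `run_alu` helper are illustrative assumptions; the real table in the repository is keyed differently and covers more ops.

```python
import operator

# Hypothetical op table: each entry is a ready-made function from `operator`
# instead of an equivalent lambda such as `lambda x, y: x + y`.
python_alu = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
    "CMPLT": operator.lt,
    "XOR": operator.xor,
}

def run_alu(op: str, *args):
    """Look up the op by name and apply it to the arguments."""
    return python_alu[op](*args)
```

For example, `run_alu("ADD", 2, 3)` evaluates `operator.add(2, 3)`.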
qazal
a89afd4ffa
Directly store float4 nodes ( #3564 )
...
* float4 cast collapse
* simplify cstyle
* simplify uoptimizer
* ci
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-03-02 15:58:20 -08:00
George Hotz
770707b376
hotfix: gpuocelot no rebuild
2024-03-02 15:57:38 -08:00
George Hotz
74c9acddb0
simple python ALU ( #3589 )
...
* shorter
* bugfix
2024-03-02 15:50:58 -08:00
Francis Lam
162dfb07d9
fuzz_linearizer: fix uops and add to test.yml ( #3588 )
2024-03-02 15:03:42 -08:00
Jovan Sardinha
8978488565
add sanity tests for bufs_from_lin ( #3586 )
2024-03-02 14:17:43 -08:00
George Hotz
aa9b013d79
add constant folding for WHERE in uops ( #3584 )
...
* add constant folding for WHERE in uops
* prereqs for generic constant folding
* fix test
* disable slow overflow logic
* make that test faster
2024-03-02 10:37:14 -08:00
nimlgen
3b7e3fa2e4
fix sync in hsa graph ( #3582 )
2024-03-02 07:37:51 -08:00
Szymon Ożóg
6c36264790
Improve type hints for optimizer ( #3583 )
...
* Improve type hints for optimizer
* lint fix
2024-03-02 07:35:44 -08:00
George Hotz
83530a585f
add quick external data select test
2024-03-02 05:38:32 -08:00
George Hotz
9a37273d36
consts don't have nodes in the graph ( #3579 )
...
* consts don't have nodes in the graph
* add idx
2024-03-02 04:19:11 -08:00
George Hotz
41f0a25b53
lazy.py: cache consts ( #3577 )
...
* lazy.py: cache consts
* add regression test
* always always cache const
* bump by 1
2024-03-02 03:50:05 -08:00
uuuvn
fb8acd1851
Don't touch UOps.DEFINE_GLOBAL ( #3575 )
2024-03-02 03:30:05 -08:00
George Hotz
50e1445e60
Revert "allow overriding weight init for Linear ( #3569 )" ( #3576 )
...
This reverts commit 2d0973a852.
2024-03-02 03:17:13 -08:00
David Hou
2d0973a852
allow overriding weight init for Linear ( #3569 )
2024-03-02 03:16:04 -08:00
Francis Lam
9642a8f547
search: add BEAM UPCAST/LOCAL params and loosen TC criteria during BEAM ( #3563 )
2024-03-02 03:11:25 -08:00
David Hou
ba6c041eab
fix SCE ignore_index with label_smoothing ( #3574 )
...
* fix SCE ignore_index with label_smoothing
* break up the line
* only 3 cats in test
* Revert "only 3 cats in test"
This reverts commit 18be069c90.
2024-03-01 22:19:45 -05:00
Francis Lam
e17f1821a7
wmma: add CUDA tensor core and fix test_speed_v_torch failure ( #3544 )
2024-03-01 17:51:02 -08:00
David Hou
b3cdc11a58
label_smoothing in sparse_cat_crossentropy ( #3568 )
...
* label_smoothing in sparse_cat_crossentropy
* test multiple values, assert
2024-03-01 20:02:46 -05:00
George Hotz
6b29c70b3d
Refactor to UOpGraph class ( #3566 )
...
* Refactor to UOpGraph class
* fix test
2024-03-01 15:14:48 -08:00
chenyu
b7e555f6c0
run test_linearizer_failures on PYTHON backend ( #3565 )
...
* run test_linearizer_failures on PYTHON backend
only test 1, some have hanging issues and gated store is not implemented
* --durations=20
* two less slow ones
2024-03-01 17:00:18 -05:00
chenyu
48d22067ca
clean up test_linearizer_failures ( #3562 )
...
* cleanup test_linearizer_failures
* fix test_failure_8
* fix that
* better assert message
2024-03-01 15:57:17 -05:00
David Hou
d16aa89561
don't allow MLB assigns with different axes ( #3557 )
...
* allow LB <- MLB assign, but don't reuse buffer
* update test
* update test
* assign assert axes are the same
* update tests to manually shard running stats
* unused import
2024-03-01 07:59:06 -05:00
chenyu
cfd23f398d
Revert "don't allow MLB assigns with different axes ( #3483 )" ( #3554 )
...
This reverts commit f19d8bb7b4.
2024-02-29 23:13:07 -05:00
David Hou
f19d8bb7b4
don't allow MLB assigns with different axes ( #3483 )
...
* allow LB <- MLB assign, but don't reuse buffer
* update test
* update test
* assign assert axes are the same
2024-02-29 23:04:12 -05:00
chenyu
35d998efa8
disable flaky test_conv_beam in CI ( #3553 )
...
might fail due to CL_OUT_OF_RESOURCES
2024-02-29 22:59:41 -05:00
David Hou
e5385eecfc
UnsyncedBatchNorm with synced trainable weights for hlb cifar ( #3472 )
...
* UnsyncedBatchNorm with synced trainable weights for hlb cifar
* multitensor reshape tests
* test mlb assign change axis
* E501
* argfix axis
* don't import batchnorm from hlb_cifar in test_multitensor
* pass num_devices to UnsyncedBatchNorm in test, allow UnsyncedBatchNorm to be used with LB
* add backprop test for UnsyncedBatchNorm
* break out MLB assign and reshape changes
* manually shard running mean and running var
* don't shard unless syncbn=0
* replace nn.BatchNorm2d with UnsyncedBatchNorm
* don't increment num_batches_tracked if not tracking running stats
* update tests
* oops
* Revert "oops"
This reverts commit 5e8a67a535.
* Revert "update tests"
This reverts commit 7ebf65d89a.
* Revert "don't increment num_batches_tracked if not tracking running stats"
This reverts commit 78de0ea9ee.
* Revert "replace nn.BatchNorm2d with UnsyncedBatchNorm"
This reverts commit d03da53da7.
* don't increment num_batches_tracked if not tracking running stats
* oops
* test_batchnorm_axis
* compare against torch
* types
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-02-29 22:52:07 -05:00
George Hotz
5a6e151844
no barrier side effect ( #3550 )
...
* no barrier side effect
* finish barrier removal
2024-02-29 18:10:04 -08:00
George Hotz
bd9c2ced07
define var can be removed from vars to keep ( #3549 )
...
* define var can be removed
* sint
* oops, didn't store
2024-02-29 17:44:19 -08:00
George Hotz
2c19ab6561
define var ( #3548 )
...
* define var
* remove vars from there
* fix python symbolic ops
* fix llvm
* pypath
2024-02-29 16:43:27 -08:00
George Hotz
83cdc85790
add index to DEFINE_GLOBAL ( #3542 )
...
* remove DEFINE_GLOBAL from uops with side effects
* add index to DEFINE_GLOBAL
* bugfix
* better var name
2024-02-29 15:22:26 -08:00
chenyu
978a997d1f
print nvidia-smi in CI benchmark ( #3546 )
2024-02-29 17:31:37 -05:00
Francis Lam
5d434801fa
search: add tensor core to beam search space ( #3275 )
...
* search: add tensor core to beam search space
* kernel: refactor apply_tensor_core into apply_opt and hand_coded
* kernel: revert removal of apply_tensor_cores
also revert BEAM search parameter changes
2024-02-29 13:05:10 -08:00
Marcin Słowik
f90caa4b92
Escape table name in diskcache queries. ( #3543 )
...
Some devices create cache table names with non-alphanumerical characters, e.g. "compile_hip_gfx1010:xnack-_12".
This commit escapes the table name in single quotes so that sqlite works (see https://github.com/tinygrad/tinygrad/issues/3538).
2024-02-29 13:04:21 -08:00
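A small sketch of the quoting described in this commit, using Python's standard `sqlite3` module against an in-memory database. The table name is the one quoted in the commit message; the `key`/`val` schema and the surrounding code are assumptions for illustration, not the repository's diskcache implementation.

```python
import sqlite3

table = "compile_hip_gfx1010:xnack-_12"  # table name from the commit message

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Wrapping the name in single quotes lets sqlite treat it as one identifier
# even though it contains ':' and '-'.
cur.execute(f"CREATE TABLE IF NOT EXISTS '{table}' (key TEXT PRIMARY KEY, val BLOB)")
cur.execute(f"REPLACE INTO '{table}' (key, val) VALUES (?, ?)", ("k", b"v"))
row = cur.execute(f"SELECT val FROM '{table}' WHERE key = ?", ("k",)).fetchone()

# The same query without quotes is rejected by sqlite.
try:
    cur.execute(f"SELECT val FROM {table} WHERE key = ?", ("k",))
    unquoted_failed = False
except sqlite3.Error:
    unquoted_failed = True

conn.close()
```

Note that this only covers names without embedded single quotes; a name that itself contains a quote would need additional escaping.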
nimlgen
0afde98ba5
scan all gpu agents at launch ( #3535 )
2024-02-29 09:37:37 -08:00
Mark McLoughlin
2e82c5b7a4
README: ops_cpu and ops_torch have been removed ( #3539 )
...
Removed by pull #3399
2024-02-29 10:22:11 -05:00