Commit Graph

3696 Commits

Author SHA1 Message Date
chenyu
ee41fafdab use operator instead of lambda in python_alu (#3590) 2024-03-02 19:33:21 -05:00
qazal
a89afd4ffa Directly store float4 nodes (#3564)
* float4 cast collapse

* simplify cstyle

* simplify uoptimizer

* ci

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-03-02 15:58:20 -08:00
George Hotz
770707b376 hotfix: gpuocelot no rebuild 2024-03-02 15:57:38 -08:00
George Hotz
74c9acddb0 simple python ALU (#3589)
* shorter

* bugfix
2024-03-02 15:50:58 -08:00
Francis Lam
162dfb07d9 fuzz_linearizer: fix uops and add to test.yml (#3588) 2024-03-02 15:03:42 -08:00
Jovan Sardinha
8978488565 add sanity tests for bufs_from_lin (#3586) 2024-03-02 14:17:43 -08:00
George Hotz
aa9b013d79 add constant folding for WHERE in uops (#3584)
* add constant folding for WHERE in uops

* prereqs for generic constant folding

* fix test

* disable slow overflow logic

* make that test faster
2024-03-02 10:37:14 -08:00
nimlgen
3b7e3fa2e4 fix sync in hsa graph (#3582) 2024-03-02 07:37:51 -08:00
Szymon Ożóg
6c36264790 Improve type hints for optimizer (#3583)
* Improve type hints for optimizer

* lint fix
2024-03-02 07:35:44 -08:00
George Hotz
83530a585f add quick external data select test 2024-03-02 05:38:32 -08:00
George Hotz
9a37273d36 consts don't have nodes in the graph (#3579)
* consts don't have nodes in the graph

* add idx
2024-03-02 04:19:11 -08:00
George Hotz
41f0a25b53 lazy.py: cache consts (#3577)
* lazy.py: cache consts

* add regression test

* always always cache const

* bump by 1
2024-03-02 03:50:05 -08:00
uuuvn
fb8acd1851 Don't touch UOps.DEFINE_GLOBAL (#3575) 2024-03-02 03:30:05 -08:00
George Hotz
50e1445e60 Revert "allow overriding weight init for Linear (#3569)" (#3576)
This reverts commit 2d0973a852.
2024-03-02 03:17:13 -08:00
David Hou
2d0973a852 allow overriding weight init for Linear (#3569) 2024-03-02 03:16:04 -08:00
Francis Lam
9642a8f547 search: add BEAM UPCAST/LOCAL params and loosen TC criteria during BEAM (#3563) 2024-03-02 03:11:25 -08:00
David Hou
ba6c041eab fix SCE ignore_index with label_smoothing (#3574)
* fix SCE ignore_index with label_smoothing

* break up the line

* only 3 cats in test

* Revert "only 3 cats in test"

This reverts commit 18be069c90.
2024-03-01 22:19:45 -05:00
Francis Lam
e17f1821a7 wmma: add CUDA tensor core and fix test_speed_v_torch failure (#3544) 2024-03-01 17:51:02 -08:00
David Hou
b3cdc11a58 label_smoothing in sparse_cat_crossentropy (#3568)
* label_smoothing in sparse_cat_crossentropy

* test multiple values, assert
2024-03-01 20:02:46 -05:00
George Hotz
6b29c70b3d Refactor to UOpGraph class (#3566)
* Refactor to UOpGraph class

* fix test
2024-03-01 15:14:48 -08:00
chenyu
b7e555f6c0 run test_linearizer_failures on PYTHON backend (#3565)
* run test_linearizer_failures on PYTHON backend

only test 1, some have hanging issues and gated store is not implemented

* --durations=20

* two less slow ones
2024-03-01 17:00:18 -05:00
chenyu
48d22067ca clean up test_linearizer_failures (#3562)
* cleanup test_linearizer_failures

* fix test_failure_8

* fix that

* better assert message
2024-03-01 15:57:17 -05:00
David Hou
d16aa89561 don't allow MLB assigns with different axes (#3557)
* allow LB <- MLB assign, but don't reuse buffer

* update test

* update test

* assign assert axes are the same

* update tests to manually shard running stats

* unused import
2024-03-01 07:59:06 -05:00
chenyu
cfd23f398d Revert "don't allow MLB assigns with different axes (#3483)" (#3554)
This reverts commit f19d8bb7b4.
2024-02-29 23:13:07 -05:00
David Hou
f19d8bb7b4 don't allow MLB assigns with different axes (#3483)
* allow LB <- MLB assign, but don't reuse buffer

* update test

* update test

* assign assert axes are the same
2024-02-29 23:04:12 -05:00
chenyu
35d998efa8 disable flaky test_conv_beam in CI (#3553)
might fail due to CL_OUT_OF_RESOURCES
2024-02-29 22:59:41 -05:00
David Hou
e5385eecfc UnsyncedBatchNorm with synced trainable weights for hlb cifar (#3472)
* UnsyncedBatchNorm with synced trainable weights for hlb cifar

* multitensor reshape tests

* test mlb assign change axis

* E501

* argfix axis

* don't import batchnorm from hlb_cifar in test_multitensor

* pass num_devices to UnsyncedBatchNorm in test, allow UnsyncedBatchNorm to be used with LB

* add backprop test for UnsyncedBatchNorm

* break out MLB assign and reshape changes

* manually shard running mean and running var

* don't shard unless syncbn=0

* replace nn.BatchNorm2d with UnsyncedBatchNorm

* don't increment num_batches_tracked if not tracking running stats

* update tests

* oops

* Revert "oops"

This reverts commit 5e8a67a535.

* Revert "update tests"

This reverts commit 7ebf65d89a.

* Revert "don't increment num_batches_tracked if not tracking running stats"

This reverts commit 78de0ea9ee.

* Revert "replace nn.BatchNorm2d with UnsyncedBatchNorm"

This reverts commit d03da53da7.

* don't increment num_batched_tracked if not tracking running stats

* oops

* test_batchnorm_axis

* compare against torch

* types

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-02-29 22:52:07 -05:00
George Hotz
5a6e151844 no barrier side effect (#3550)
* no barrier side effect

* finish barrier removal
2024-02-29 18:10:04 -08:00
George Hotz
bd9c2ced07 define var can be removed from vars to keep (#3549)
* define var can be removed

* sint

* oops, didn't store
2024-02-29 17:44:19 -08:00
George Hotz
2c19ab6561 define var (#3548)
* define var

* remove vars from there

* fix python symbolic ops

* fix llvm

* pypath
2024-02-29 16:43:27 -08:00
George Hotz
83cdc85790 add index to DEFINE_GLOBAL (#3542)
* remove DEFINE_GLOBAL from uops with side effects

* add index to DEFINE_GLOBAL

* bugfix

* better var name
2024-02-29 15:22:26 -08:00
chenyu
978a997d1f print nvidia-smi in CI benchmark (#3546) 2024-02-29 17:31:37 -05:00
Francis Lam
5d434801fa search: add tensor core to beam search space (#3275)
* search: add tensor core to beam search space

* kernel: refactor apply_tensor_core into apply_opt and hand_coded

* kernel: revert removal of apply_tensor_cores

also revert BEAM search parameter changes
2024-02-29 13:05:10 -08:00
Marcin Słowik
f90caa4b92 Escape table name in diskcache queries. (#3543)
Some devices create cache table names with non-alphanumerical characters, e.g. "compile_hip_gfx1010:xnack-_12".
This commit escapes the table name in single quotes s.t. sqlite works (see https://github.com/tinygrad/tinygrad/issues/3538).
2024-02-29 13:04:21 -08:00
nimlgen
0afde98ba5 scan all gpu agents at launch (#3535) 2024-02-29 09:37:37 -08:00
Mark McLoughlin
2e82c5b7a4 README: ops_cpu and ops_torch have been removed (#3539)
Removed by pull #3399
2024-02-29 10:22:11 -05:00
nimlgen
b05776ef3e fix addresses of dispatch packets (#3534) 2024-02-29 05:43:55 -08:00
geohotstan
9268a8b154 remove MULACC (#3459)
* init

* removed mulacc

* is uoptimize the problem?

* lol hax make work temporarily fix l8er

* revert extra/ changes

* clean up

* flaky metal tests?

* add back mulacc for metal

* revert last commit

* try skipping linearizer_failure tests

* skip flammit tests... cuz tests all work locally

* try narrow down exact linearizer failure test

* try 2

* try 4

* generated code is the exact same wtf why CI fails

* code for 15 and 17 are exact same with or without mulacc, this should pass

* try only 1 failure

* try garbage collecting lol...

* try del variables lol

* try gcing after del lol...

* is diskcache the problem???

* try disabling opts cache idk

* try remove hack

* try disable github metal cache...

* try CACHELEVEL=0 :D idk anymore

* try increase newCommandQueueWithMaxCommandBufferCount_, im almost out of ideas...

* revert

* actually not a HACK

* oops
2024-02-29 07:40:40 -05:00
qazal
94fc0fd546 uop the float4 acc upcast in group_for_reduce kernels (#3466)
* simplest one

* but i can trust this will be cached correctly

* wait that was wrong too

* cleanup

* test_reduce_upcast for single reduce case

* a late accumulator always outputs to gds

lint
2024-02-28 17:33:47 -08:00
George Hotz
48918fa75a fix disktensor offset issue (#3532) 2024-02-28 17:22:17 -08:00
Caleb Bunch
0b1fc5888a fix 'Import Error: cannot import name compile_cuda from tinygrad.runtime.ops_cuda' error in extra/gemm/cuda_matmul.py (#3531) 2024-02-28 17:15:32 -08:00
David Friehs
275971e616 fix: align .split, .chunk and .unsqueeze with torch, add fuzz tests (#3505)
this fixes .split where self.shape[dim] is not perfectly divisible by
sizes - .chunk is always the wrong choice here:
 - tensor((5,)).split(4) should result in (tensor((4,)), tensor((1,)))
   was (tensor((3,)), tensor((2,)))

this also fixes issues in .split and .chunk where tensors with
shape[dim]==0 lead to empty tuples/lists when the tensor itself should
have been returned instead

because tinygrad is expected to fail in all cases where torch fails
tinygrad will now be strict regarding sizes having to sum up to passed
dimension in .split, num having to be non-null for .chunk and only
allowing valid dims in .unsqueeze
2024-02-28 17:06:39 -08:00
George Hotz
e7cda40d52 Revert "hotfix: disable metal graph"
This reverts commit 3541602877.
2024-02-28 16:25:12 -08:00
George Hotz
42eb8de0d4 Revert "move all reduces to the end in lazy (#3475)" (#3529)
This reverts commit 2113e1eb63.

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-02-28 16:24:10 -08:00
chenyu
0c6846f9fc failed test case for disk tensor assign into dtype int64 (#3527)
failed case for #3510, mark as expectedFailure for now
2024-02-28 17:52:21 -05:00
chenyu
d89e3c4e08 enable METAL tests now runner is M1 and no fast-math (#3523) 2024-02-28 14:14:23 -05:00
chenyu
1136e2a82a skipIf(not( -> skipUnless( in test_linearizer_failures (#3519)
if these behaves weirdly in CI might need to disable them in CI
2024-02-28 13:48:47 -05:00
George Hotz
3541602877 hotfix: disable metal graph 2024-02-28 10:33:34 -08:00
George Hotz
c34d382a1e bump to macos-14 M1 (#3520)
* bump to macos-14 M1

* bump cache key

* no -n auto

* jit=2

* real tensor cores
2024-02-28 10:28:25 -08:00
George Hotz
505ac6ac96 Revert "check buffers are seeable by other gpu before transfer (#3504)" (#3522)
This reverts commit db2cf48828.
2024-02-28 10:26:27 -08:00