Commit Graph

3661 Commits

Author · SHA1 · Message · Date
Mark McLoughlin
2e82c5b7a4 README: ops_cpu and ops_torch have been removed (#3539)
Removed by pull request #3399
2024-02-29 10:22:11 -05:00
nimlgen
b05776ef3e fix addresses of dispatch packets (#3534) 2024-02-29 05:43:55 -08:00
geohotstan
9268a8b154 remove MULACC (#3459)
* init

* removed mulacc

* is uoptimize the problem?

* lol hax make work temporarily fix l8er

* revert extra/ changes

* clean up

* flaky metal tests?

* add back mulacc for metal

* revert last commit

* try skipping linearizer_failure tests

* skip flammit tests... cuz tests all work locally

* try narrow down exact linearizer failure test

* try 2

* try 4

* generated code is the exact same wtf why CI fails

* code for 15 and 17 are exact same with or without mulacc, this should pass

* try only 1 failure

* try garbage collecting lol...

* try del variables lol

* try gcing after del lol...

* is diskcache the problem???

* try disabling opts cache idk

* try remove hack

* try disable github metal cache...

* try CACHELEVEL=0 :D idk anymore

* try increase newCommandQueueWithMaxCommandBufferCount_, im almost out of ideas...

* revert

* actually not a HACK

* oops
2024-02-29 07:40:40 -05:00
qazal
94fc0fd546 uop the float4 acc upcast in group_for_reduce kernels (#3466)
* simplest one

* but i can trust this will be cached correctly

* wait that was wrong too

* cleanup

* test_reduce_upcast for single reduce case

* a late accumulator always outputs to gds

lint
2024-02-28 17:33:47 -08:00
George Hotz
48918fa75a fix disktensor offset issue (#3532) 2024-02-28 17:22:17 -08:00
Caleb Bunch
0b1fc5888a fix 'Import Error: cannot import name compile_cuda from tinygrad.runtime.ops_cuda' error in extra/gemm/cuda_matmul.py (#3531) 2024-02-28 17:15:32 -08:00
David Friehs
275971e616 fix: align .split, .chunk and .unsqueeze with torch, add fuzz tests (#3505)
this fixes .split where self.shape[dim] is not perfectly divisible by
sizes - .chunk is always the wrong choice here:
 - tensor((5,)).split(4) should result in (tensor((4,)), tensor((1,)))
   was (tensor((3,)), tensor((2,)))

this also fixes issues in .split and .chunk where tensors with
shape[dim]==0 lead to empty tuples/lists when the tensor itself should
have been returned instead

because tinygrad is expected to fail in all cases where torch fails,
tinygrad will now be strict: sizes must sum to the size of the passed
dimension in .split, num must be non-null for .chunk, and only valid
dims are allowed in .unsqueeze
2024-02-28 17:06:39 -08:00
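For reference, a minimal sketch of the torch semantics being matched here, using torch itself (tinygrad's .split/.chunk are expected to agree after this fix):

```python
import torch

t = torch.arange(5)  # shape (5,)

# .split(4): fixed piece size with a smaller trailing remainder
assert [p.numel() for p in t.split(4)] == [4, 1]

# .chunk(2): near-equal pieces, which is why it was the wrong choice for .split
assert [c.numel() for c in t.chunk(2)] == [3, 2]
```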
George Hotz
e7cda40d52 Revert "hotfix: disable metal graph"
This reverts commit 3541602877.
2024-02-28 16:25:12 -08:00
George Hotz
42eb8de0d4 Revert "move all reduces to the end in lazy (#3475)" (#3529)
This reverts commit 2113e1eb63.

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-02-28 16:24:10 -08:00
chenyu
0c6846f9fc failed test case for disk tensor assign into dtype int64 (#3527)
failing case for #3510, marked as expectedFailure for now
2024-02-28 17:52:21 -05:00
chenyu
d89e3c4e08 enable METAL tests now runner is M1 and no fast-math (#3523) 2024-02-28 14:14:23 -05:00
chenyu
1136e2a82a skipIf(not( -> skipUnless( in test_linearizer_failures (#3519)
if these behave weirdly in CI, we might need to disable them there
2024-02-28 13:48:47 -05:00
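For reference, the standard-library equivalence behind the rename (the IN_CI flag is a hypothetical stand-in for the real condition):

```python
import unittest

IN_CI = True  # hypothetical stand-in for the actual condition being tested

class Example(unittest.TestCase):
    @unittest.skipIf(not IN_CI, "CI only")     # before: double negative
    def test_old(self): pass

    @unittest.skipUnless(IN_CI, "CI only")     # after: same behavior, clearer intent
    def test_new(self): pass
```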
George Hotz
3541602877 hotfix: disable metal graph 2024-02-28 10:33:34 -08:00
George Hotz
c34d382a1e bump to macos-14 M1 (#3520)
* bump to macos-14 M1

* bump cache key

* no -n auto

* jit=2

* real tensor cores
2024-02-28 10:28:25 -08:00
George Hotz
505ac6ac96 Revert "check buffers are seeable by other gpu before transfer (#3504)" (#3522)
This reverts commit db2cf48828.
2024-02-28 10:26:27 -08:00
nimlgen
db2cf48828 check buffers are seeable by other gpu before transfer (#3504) 2024-02-28 10:24:50 -08:00
wozeparrot
da32c37346 use hash as key for beam (#3516)
* feat: use hash as key for beam

* feat: bump db version
2024-02-28 10:19:01 -08:00
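A hedged sketch of the idea (illustrative names only, not tinygrad's actual API): key the BEAM cache by a stable hash of the kernel description rather than the raw string, which is also why the db version had to be bumped (old keys no longer match):

```python
import hashlib

def beam_cache_key(kernel_desc: str) -> str:
    # fixed-length, stable key derived from an arbitrarily long kernel description
    return hashlib.sha256(kernel_desc.encode()).hexdigest()

assert len(beam_cache_key("some long kernel ast repr")) == 64  # sha256 hex digest
```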
uuuvn
1f5c24798b Raise exception if MTLCommandBuffer fails (#3465) 2024-02-28 10:14:08 -08:00
nimlgen
08ef77c721 hsa multigpu graph (#3403)
* init hsa multigraph

* better handling of accesses to buffers

* revert sdma0 only when copies from fd
2024-02-28 09:40:53 -08:00
chenyu
fa88e1d0d0 cleanup lazy reduce (#3517)
* cleanup lazy reduce

removed a now-useless assert (arg is an axis now) and cleaned up the split logic

* stride can be symbolic with int shape
2024-02-28 08:15:01 -05:00
chenyu
2127c1c6c2 test for the split reduce kernel (#3515)
somehow this was not tested
2024-02-27 21:29:25 -05:00
nimlgen
94b7ac7a29 no cuda compile helper (#3512) 2024-02-28 01:50:10 +01:00
chenyu
88939c3347 fix Node.max can be symbolic (#3514)
Also made sure that taking max twice yields an int.
2024-02-27 17:21:31 -05:00
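Illustrative only (hypothetical helper and fake nodes; the real Node lives in tinygrad.shape.symbolic): since a Node's .max may itself be symbolic, bottoming out in an int can take a second .max, which is the invariant this commit enforces:

```python
def concrete_max(node):
    # .max may return an int or another symbolic node; one more .max
    # is guaranteed to produce an int after this fix
    m = node.max
    return m if isinstance(m, int) else m.max

class FakeNode:
    def __init__(self, m): self.max = m

inner = FakeNode(10)     # .max is an int
outer = FakeNode(inner)  # .max is another (symbolic) node
assert concrete_max(inner) == 10 and concrete_max(outer) == 10
```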
chenyu
969b57f0fe enable symbolic_ops and jits test of two vars (#3513) 2024-02-27 11:17:46 -05:00
wozeparrot
ea4b8e5b1f feat: don't hardcode the arch (#3511) 2024-02-27 07:58:03 -05:00
Francis Lam
11da65bccd test/external/fuzz_linearizer: add a FUZZ_MAX_SIZE option (#3455)
* test/external/fuzz_linearizer: add a FUZZ_MAX_SIZE option

this allows us to limit the kernel size and reduce running
times by skipping kernels that take a long time

* fix spacing and re-order to put parameters together
2024-02-27 07:34:59 -05:00
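A hedged sketch of how such an env-var knob typically works (os.getenv used for self-containedness; everything beyond the FUZZ_MAX_SIZE name is illustrative):

```python
import os

FUZZ_MAX_SIZE = int(os.getenv("FUZZ_MAX_SIZE", "0"))  # 0 means no limit

def should_fuzz(kernel_size: int) -> bool:
    # skip kernels whose size exceeds the cap, keeping fuzz runs fast
    return FUZZ_MAX_SIZE == 0 or kernel_size <= FUZZ_MAX_SIZE
```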
qazal
a29cd6d464 run f64 increased precision tests on remu (#3509)
* run the test in CI

* temp: use the pre-release

* Revert "temp: use the pre-release"

This reverts commit 28e8571421.
2024-02-26 18:01:07 -05:00
chenyu
b1426f3a4c cleanup SumNode mod (#3503) 2024-02-26 11:10:55 -05:00
chenyu
61605ccc69 Remove special case of SumNode div SumNode (#3502) 2024-02-26 09:42:06 -05:00
Francis Lam
39d75f0d58 test_linearizer_failures: add more METAL examples (#3495)
these were obtained from running fuzz_linearizer on METAL
2024-02-26 10:19:05 +01:00
chenyu
b154089884 float64 function support for HIP (#3492)
* float64 function support for HIP

* not CI
2024-02-24 09:46:20 -05:00
chenyu
35aff8b0c2 properly exclude PYTHON backend and support of half (#3491)
should be able to run in CI with Python 3.12
2024-02-24 09:22:06 -05:00
David Friehs
2fe98b64bb fix Tensor.split not passing dim to Tensor.chunk (#3490) 2024-02-24 07:53:11 -05:00
Caleb Bunch
b41761488d change specific string 'CLANG' to DEVICE variable in abstractions2.py (#3488) 2024-02-24 07:51:39 -05:00
chenyu
c032df520b minor symbolic type related cleaups (#3489) 2024-02-23 16:44:43 -05:00
Carson Radtke
15df9406d6 fix exec_alu(UnaryOps.SQRT, <...>, (0,)) + add test (#3487)
* fix exec_alu(UnaryOps.SQRT, <...>, (0,)) + add test

* sqrt(0) != nan

* fix tabs
2024-02-23 18:28:00 +01:00
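The semantics the fix enforces, checked with the standard library (exec_alu/UnaryOps are tinygrad internals named in the title; this only illustrates the expected math):

```python
import math

assert math.sqrt(0.0) == 0.0  # sqrt(0) != nan: it is exactly 0.0
# only negative inputs are the error/NaN case under IEEE-754 semantics
try:
    math.sqrt(-1.0)
except ValueError:
    pass
```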
nimlgen
52567da07f jit grapher simplified (#3478) 2024-02-23 16:20:16 +01:00
George Hotz
2113e1eb63 move all reduces to the end in lazy (#3475)
* move all reduces to the end in lazy

* apply as reshape, not permute
2024-02-23 15:49:11 +01:00
David Hou
5cfcc2a8d7 support MLB reshaping on-axis for evenly sharded (#3484)
* support MLB reshaping on-axis for evenly sharded

* update test

* not -> !=
2024-02-23 07:51:36 -05:00
chenyu
358a24eae6 symbolic use mod for rmod and use floordiv for rfloordiv (#3485) 2024-02-23 01:05:13 -05:00
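For reference, the reflected-operator dispatch this commit relies on (plain Python semantics; the Sym class is illustrative, not tinygrad code):

```python
class Sym:
    def __init__(self, name): self.name = name
    def __mod__(self, other): return f"({self.name} % {other})"
    def __rmod__(self, other): return f"({other} % {self.name})"        # int % Sym lands here
    def __floordiv__(self, other): return f"({self.name} // {other})"
    def __rfloordiv__(self, other): return f"({other} // {self.name})"  # int // Sym lands here

s = Sym("x")
assert 7 % s == "(7 % x)"    # Python calls s.__rmod__(7)
assert 7 // s == "(7 // x)"  # Python calls s.__rfloordiv__(7)
```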
nimlgen
6d048a0c0b cache collector optimizations are allowed only for kernel operations (#3476) 2024-02-22 12:26:57 +01:00
George Hotz
7698781389 Revert "wmma: add CUDA tensor core (#3464)" (#3474)
This reverts commit e9cef13f0b.
2024-02-22 11:58:16 +01:00
Francis Lam
e9cef13f0b wmma: add CUDA tensor core (#3464) 2024-02-22 11:57:08 +01:00
wozeparrot
57678012e1 Upload correct benchmark artifact (#3471)
* fix: correct filename

* fix: why is this .py?
2024-02-22 01:14:16 -05:00
chenyu
ab40c0cf93 clean up long lines in symbolic (#3469) 2024-02-21 21:57:44 -05:00
chenyu
7c0fc40123 enable test IMAGE=2 PYTHON=1 python3 test/test_ops.py TestOps.test_simple_conv2d (#3468) 2024-02-21 18:30:12 -05:00
chenyu
77d2a4c12a regenerate kernel dataset after reduce arg to axis change (#3467)
```
./extra/optimization/generate_dataset.sh
gzip /tmp/sops
mv /tmp/sops.gz extra/datasets/
```
2024-02-21 18:16:13 -05:00
David Hou
f513c37e64 support same uidx in multiple shape positions (#3205)
* support same uidx in multiple shape positions

* rename var

* update comment

* add contiguous index check to global_store too

* update comment

* small change

* is this better?

* smh

* smaller change?

* get rid of more changes

* get rid of more changes

* is this even making anything better

* comment

* fix test

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-02-21 19:37:03 +01:00
chenyu
1eb24af63b fix softmax and log_softmax for 0d tensor (#3463)
matched torch to take axis ∈ [-1, 0] and used axis=None internally
2024-02-21 11:30:30 -05:00
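Sketch of the 0-d behavior being matched, using torch as the reference (tinygrad's softmax/log_softmax are expected to agree):

```python
import torch

x = torch.tensor(3.0)                  # 0-d tensor: valid axes are only -1 and 0
assert x.softmax(0).item() == 1.0      # softmax over a single element is 1
assert x.softmax(-1).item() == 1.0
assert x.log_softmax(0).item() == 0.0  # log(1) == 0
```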
George Hotz
871ba73e65 _reduce_op is axis based now (#3462)
* _reduce_op is axis based now

* axis_

* update lin failures

* disable that

* fix shape
2024-02-21 16:36:31 +01:00
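For reference, axis-based reduction semantics illustrated with numpy (a stand-in, not tinygrad's internal _reduce_op signature, which per the title is now axis-based rather than shape-based):

```python
import numpy as np

x = np.ones((2, 3, 4))
# axis-based reduce: name the axes to collapse; the result shape follows
assert x.sum(axis=1).shape == (2, 4)
assert x.sum(axis=(0, 2)).shape == (3,)
```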