George Hotz
48918fa75a
fix disktensor offset issue ( #3532 )
2024-02-28 17:22:17 -08:00
Caleb Bunch
0b1fc5888a
fix 'Import Error: cannot import name compile_cuda from tinygrad.runtime.ops_cuda' error in extra/gemm/cuda_matmul.py ( #3531 )
2024-02-28 17:15:32 -08:00
David Friehs
275971e616
fix: align .split, .chunk and .unsqueeze with torch, add fuzz tests ( #3505 )
...
this fixes .split where self.shape[dim] is not evenly divisible by
sizes - delegating to .chunk is always the wrong choice here:
- tensor((5,)).split(4) should result in (tensor((4,)), tensor((1,))),
but previously gave (tensor((3,)), tensor((2,)))
this also fixes issues in .split and .chunk where tensors with
shape[dim]==0 led to empty tuples/lists when the tensor itself should
have been returned instead
because tinygrad is expected to fail in all cases where torch fails,
tinygrad is now strict: sizes must sum to the given dimension's length
in .split, num must be non-null for .chunk, and only valid dims are
allowed in .unsqueeze
2024-02-28 17:06:39 -08:00
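A minimal sketch of the fixed .split semantics (assuming tinygrad's Tensor API; the shapes come from the commit body above):
```
from tinygrad import Tensor

t = Tensor.arange(5)             # shape (5,)
parts = t.split(4)               # pieces of size 4 along dim 0
# with this fix the shapes are (4,) and (1,), matching torch,
# instead of the old .chunk-based result of (3,) and (2,)
print([p.shape for p in parts])  # [(4,), (1,)]
```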
George Hotz
e7cda40d52
Revert "hotfix: disable metal graph"
...
This reverts commit 3541602877.
2024-02-28 16:25:12 -08:00
George Hotz
42eb8de0d4
Revert "move all reduces to the end in lazy ( #3475 )" ( #3529 )
...
This reverts commit 2113e1eb63.
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-02-28 16:24:10 -08:00
chenyu
0c6846f9fc
failed test case for disk tensor assign into dtype int64 ( #3527 )
...
failing case for #3510, marked as expectedFailure for now
2024-02-28 17:52:21 -05:00
chenyu
d89e3c4e08
enable METAL tests now runner is M1 and no fast-math ( #3523 )
2024-02-28 14:14:23 -05:00
chenyu
1136e2a82a
skipIf(not( -> skipUnless( in test_linearizer_failures (#3519 )
...
if these behave weirdly in CI, we might need to disable them there
2024-02-28 13:48:47 -05:00
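The change follows the standard unittest idiom; a self-contained illustration of the pattern (ON_LINUX and the test are hypothetical):
```
import sys
import unittest

ON_LINUX = sys.platform == "linux"

class TestExample(unittest.TestCase):
  # before: @unittest.skipIf(not ON_LINUX, "linux only")
  @unittest.skipUnless(ON_LINUX, "linux only")  # same behavior, reads directly
  def test_linux_only(self):
    self.assertTrue(ON_LINUX)
```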
George Hotz
3541602877
hotfix: disable metal graph
2024-02-28 10:33:34 -08:00
George Hotz
c34d382a1e
bump to macos-14 M1 ( #3520 )
...
* bump to macos-14 M1
* bump cache key
* no -n auto
* jit=2
* real tensor cores
2024-02-28 10:28:25 -08:00
George Hotz
505ac6ac96
Revert "check buffers are seeable by other gpu before transfer ( #3504 )" ( #3522 )
...
This reverts commit db2cf48828.
2024-02-28 10:26:27 -08:00
nimlgen
db2cf48828
check buffers are seeable by other gpu before transfer ( #3504 )
2024-02-28 10:24:50 -08:00
wozeparrot
da32c37346
use hash as key for beam ( #3516 )
...
* feat: use hash as key for beam
* feat: bump db version
2024-02-28 10:19:01 -08:00
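A hypothetical sketch of the idea (names are illustrative, not tinygrad's actual cache code): key the BEAM search cache by a fixed-length digest of the kernel description rather than the raw string.
```
import hashlib

def beam_cache_key(kernel_desc: str) -> str:
  # a stable, fixed-length key regardless of how long the kernel
  # description grows; bumping the db version invalidates old entries
  return hashlib.sha256(kernel_desc.encode()).hexdigest()
```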
uuuvn
1f5c24798b
Raise exception if MTLCommandBuffer fails ( #3465 )
2024-02-28 10:14:08 -08:00
nimlgen
08ef77c721
hsa multigpu graph ( #3403 )
...
* init hsa multigraph
* better handling of accesses to buffers
* revert sdma0 only when copies from fd
2024-02-28 09:40:53 -08:00
chenyu
fa88e1d0d0
cleanup lazy reduce ( #3517 )
...
* cleanup lazy reduce
removed a useless assert now that the arg is axis, and cleaned up the split logic
* stride can be symbolic with int shape
2024-02-28 08:15:01 -05:00
chenyu
2127c1c6c2
test for the split reduce kernel ( #3515 )
...
somehow this was not tested
2024-02-27 21:29:25 -05:00
nimlgen
94b7ac7a29
no cuda compile helper ( #3512 )
2024-02-28 01:50:10 +01:00
chenyu
88939c3347
fix Node.max can be symbolic ( #3514 )
...
Also made sure that taking max twice yields an int.
2024-02-27 17:21:31 -05:00
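A hedged helper illustrating the contract (int_max is hypothetical; per the commit, .max may itself be a Node, but taking max twice reaches a plain int):
```
from tinygrad.shape.symbolic import Node

def int_max(n: Node) -> int:
  # n.max may itself be symbolic (a Node); per this fix,
  # taking .max once more is guaranteed to yield an int
  m = n.max
  return m if isinstance(m, int) else m.max
```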
chenyu
969b57f0fe
enable symbolic_ops and jits test of two vars ( #3513 )
2024-02-27 11:17:46 -05:00
wozeparrot
ea4b8e5b1f
feat: don't hardcode the arch ( #3511 )
2024-02-27 07:58:03 -05:00
Francis Lam
11da65bccd
test/external/fuzz_linearizer: add a FUZZ_MAX_SIZE option ( #3455 )
...
* test/external/fuzz_linearizer: add a FUZZ_MAX_SIZE option
this allows us to limit the size of the kernel, reducing running
times by avoiding kernels that take a long time
* fix spacing and re-order to put parameters together
2024-02-27 07:34:59 -05:00
qazal
a29cd6d464
run f64 increased precision tests on remu ( #3509 )
...
* run the test in CI
* temp: use the pre-release
* Revert "temp: use the pre-release"
This reverts commit 28e8571421.
2024-02-26 18:01:07 -05:00
chenyu
b1426f3a4c
cleanup SumNode mod ( #3503 )
2024-02-26 11:10:55 -05:00
chenyu
61605ccc69
Remove special case of SumNode div SumNode ( #3502 )
2024-02-26 09:42:06 -05:00
Francis Lam
39d75f0d58
test_linearizer_failures: add more METAL examples ( #3495 )
...
these were obtained from running fuzz_linearizer on METAL
2024-02-26 10:19:05 +01:00
chenyu
b154089884
float64 function support for HIP ( #3492 )
...
* float64 function support for HIP
* not CI
2024-02-24 09:46:20 -05:00
chenyu
35aff8b0c2
properly exclude PYTHON backend and support of half ( #3491 )
...
should be able to run in CI with python 3.12
2024-02-24 09:22:06 -05:00
David Friehs
2fe98b64bb
fix Tensor.split not passing dim to Tensor.chunk ( #3490 )
2024-02-24 07:53:11 -05:00
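A small usage sketch of the fixed behavior (hypothetical shapes; assumes Tensor.split takes a dim keyword as in torch):
```
from tinygrad import Tensor

t = Tensor.ones(2, 6)
# with the fix, dim is forwarded to the underlying .chunk call
print([p.shape for p in t.split(2, dim=1)])  # [(2, 2), (2, 2), (2, 2)]
```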
Caleb Bunch
b41761488d
change specific string 'CLANG' to DEVICE variable in abstractions2.py ( #3488 )
2024-02-24 07:51:39 -05:00
chenyu
c032df520b
minor symbolic type related cleaups ( #3489 )
2024-02-23 16:44:43 -05:00
Carson Radtke
15df9406d6
fix exec_alu(UnaryOps.SQRT, <...>, (0,)) + add test ( #3487 )
...
* fix exec_alu(UnaryOps.SQRT, <...>, (0,)) + add test
* sqrt(0) != nan
* fix tabs
2024-02-23 18:28:00 +01:00
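A quick check of the fixed behavior (import locations are assumed from tinygrad at the time of this commit):
```
from tinygrad.ops import UnaryOps, exec_alu  # assumed import path
from tinygrad.dtype import dtypes            # assumed import path

# per the fix, the python-emulated ALU returns 0.0 for sqrt(0), not nan
print(exec_alu(UnaryOps.SQRT, dtypes.float32, (0,)))  # 0.0
```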
nimlgen
52567da07f
jit grapher simplified ( #3478 )
2024-02-23 16:20:16 +01:00
George Hotz
2113e1eb63
move all reduces to the end in lazy ( #3475 )
...
* move all reduces to the end in lazy
* apply as reshape, not permute
2024-02-23 15:49:11 +01:00
David Hou
5cfcc2a8d7
support MLB reshaping on-axis for evenly sharded ( #3484 )
...
* support MLB reshaping on-axis for evenly sharded
* update test
* not -> !=
2024-02-23 07:51:36 -05:00
chenyu
358a24eae6
symbolic use mod for rmod and use floordiv for rfloordiv ( #3485 )
2024-02-23 01:05:13 -05:00
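A small hedged example (assumes tinygrad.shape.symbolic; the bounds are chosen so the results are fully determined by them):
```
from tinygrad.shape.symbolic import Variable

a = Variable("a", 4, 10)
# an int on the left-hand side now reuses the regular mod/floordiv paths
print(3 % a)   # stays 3, since a >= 4
print(3 // a)  # simplifies to 0, since a >= 4
```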
nimlgen
6d048a0c0b
cache collector optimizations are allowed only for kernel operations ( #3476 )
2024-02-22 12:26:57 +01:00
George Hotz
7698781389
Revert "wmma: add CUDA tensor core ( #3464 )" ( #3474 )
...
This reverts commit e9cef13f0b.
2024-02-22 11:58:16 +01:00
Francis Lam
e9cef13f0b
wmma: add CUDA tensor core ( #3464 )
2024-02-22 11:57:08 +01:00
wozeparrot
57678012e1
Upload correct benchmark artifact ( #3471 )
...
* fix: correct filename
* fix: why is this .py?
2024-02-22 01:14:16 -05:00
chenyu
ab40c0cf93
clean up long lines in symbolic ( #3469 )
2024-02-21 21:57:44 -05:00
chenyu
7c0fc40123
enable test IMAGE=2 PYTHON=1 python3 test/test_ops.py TestOps.test_simple_conv2d ( #3468 )
2024-02-21 18:30:12 -05:00
chenyu
77d2a4c12a
regenerate kernel dataset after reduce arg to axis change ( #3467 )
...
```
./extra/optimization/generate_dataset.sh
gzip /tmp/sops
mv /tmp/sops.gz extra/datasets/
```
2024-02-21 18:16:13 -05:00
David Hou
f513c37e64
support same uidx in multiple shape positions ( #3205 )
...
* support same uidx in multiple shape positions
* rename var
* update comment
* add contiguous index check to global_store too
* update comment
* small change
* is this better?
* smh
* smaller change?
* get rid of more changes
* get rid of more changes
* is this even making anything better
* comment
* fix test
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-02-21 19:37:03 +01:00
chenyu
1eb24af63b
fix softmax and log_softmax for 0d tensor ( #3463 )
...
matched torch by taking axis in [-1, 0] and using axis=None internally
2024-02-21 11:30:30 -05:00
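A minimal sketch of the fixed behavior (assuming a 0-d tensor constructed from a python scalar):
```
from tinygrad import Tensor

t = Tensor(2.0)                 # 0-d tensor, shape ()
print(t.softmax().numpy())      # 1.0, matching torch
print(t.log_softmax().numpy())  # 0.0
```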
George Hotz
871ba73e65
_reduce_op is axis based now ( #3462 )
...
* _reduce_op is axis based now
* axis_
* update lin failures
* disable that
* fix shape
2024-02-21 16:36:31 +01:00
George Hotz
22a90cbb15
change frontend reduce API to use axis ( #3460 )
...
* change frontend API to axis
* switch lazy to also take axis input
2024-02-21 12:26:17 +01:00
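After this change the frontend reduces take axis directly; a minimal usage sketch (shapes are illustrative):
```
from tinygrad import Tensor

t = Tensor.ones(2, 3, 4)
print(t.sum(axis=1).shape)       # (2, 4)
print(t.max(axis=(0, 2)).shape)  # (3,)
```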
chenyu
6c1063ba39
add mypy --strict-equality to pre-commit ( #3458 )
...
matched CI mypy behavior
2024-02-21 03:41:05 -05:00
chenyu
02683a8659
gate the cast before movements in lazy ( #3452 )
...
it made gpt2 slower (2ms -> 2.5ms on 3090, 7ms -> 8ms on M1 Max with BEAM=2);
disabled it in the gpt2 benchmark until the full issue is understood
2024-02-20 09:36:22 -05:00
chenyu
0d326a48b8
fix LtNode simplification when lhs and rhs contain same variables ( #3451 )
...
* fix LtNode simplification when lhs and rhs contain same variables
`(Variable("a", 1, 5) < Variable("a", 1, 5))` should eval to `NumNode(0)`
* fix with less perf impact
2024-02-20 09:06:55 -05:00
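Per the commit body, the failing case and its fixed result (Variable import assumed from tinygrad.shape.symbolic):
```
from tinygrad.shape.symbolic import Variable

# a comparison whose lhs and rhs contain the same variables can never
# hold strictly; with the fix this evaluates to NumNode(0)
print(Variable("a", 1, 5) < Variable("a", 1, 5))  # 0
```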