chenyu
d89e3c4e08
enable METAL tests now that the runner is M1 and fast-math is disabled (#3523)
2024-02-28 14:14:23 -05:00
chenyu
1136e2a82a
skipIf(not( -> skipUnless( in test_linearizer_failures (#3519)
...
if these behave weirdly in CI, we might need to disable them there
2024-02-28 13:48:47 -05:00
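Below, a minimal sketch of the pattern this commit swaps in; the flag and test names are hypothetical:

```python
import unittest

CI = True  # hypothetical flag

class TestExample(unittest.TestCase):
  # before: double negative via skipIf(not ...)
  @unittest.skipIf(not CI, "runs only in CI")
  def test_old_style(self): pass

  # after: skipUnless states the condition directly
  @unittest.skipUnless(CI, "runs only in CI")
  def test_new_style(self): pass
```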
George Hotz
3541602877
hotfix: disable metal graph
2024-02-28 10:33:34 -08:00
George Hotz
c34d382a1e
bump to macos-14 M1 (#3520)
...
* bump to macos-14 M1
* bump cache key
* no -n auto
* jit=2
* real tensor cores
2024-02-28 10:28:25 -08:00
George Hotz
505ac6ac96
Revert "check buffers are seeable by other gpu before transfer ( #3504 )" ( #3522 )
...
This reverts commit db2cf48828 .
2024-02-28 10:26:27 -08:00
nimlgen
db2cf48828
check buffers are visible to other GPUs before transfer (#3504)
2024-02-28 10:24:50 -08:00
wozeparrot
da32c37346
use hash as key for beam (#3516)
...
* feat: use hash as key for beam
* feat: bump db version
2024-02-28 10:19:01 -08:00
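A hedged sketch of the idea: key the BEAM kernel cache on a stable hash instead of the raw string; names here are illustrative, not tinygrad's actual internals:

```python
import hashlib

def beam_cache_key(kernel_desc: str) -> str:
  # a fixed-length, stable key avoids oversized or unstable raw-string keys
  return hashlib.sha256(kernel_desc.encode()).hexdigest()

key = beam_cache_key("example rendered kernel ast")  # hypothetical input
```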
uuuvn
1f5c24798b
Raise exception if MTLCommandBuffer fails (#3465)
2024-02-28 10:14:08 -08:00
nimlgen
08ef77c721
hsa multigpu graph (#3403)
...
* init hsa multigraph
* better handling of accesses to buffers
* revert sdma0 only when copies from fd
2024-02-28 09:40:53 -08:00
chenyu
fa88e1d0d0
cleanup lazy reduce (#3517)
...
* cleanup lazy reduce
removed useless assert now that arg is axis, and cleaned up split logic
* stride can be symbolic with int shape
2024-02-28 08:15:01 -05:00
chenyu
2127c1c6c2
test for the split reduce kernel (#3515)
...
somehow this was not tested
2024-02-27 21:29:25 -05:00
nimlgen
94b7ac7a29
no cuda compile helper (#3512)
2024-02-28 01:50:10 +01:00
chenyu
88939c3347
fix Node.max can be symbolic (#3514)
...
Also made sure taking max twice yields an int.
2024-02-27 17:21:31 -05:00
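A hedged illustration of the described behavior, assuming Node.max may now return a Node rather than an int:

```python
from tinygrad.shape.symbolic import Variable

a = Variable("a", 1, 5)
b = Variable("b", 1, 5)
n = a * b
m = n.max                             # may itself be symbolic
if not isinstance(m, int): m = m.max  # per the fix, the second .max is an int
assert isinstance(m, int)
```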
chenyu
969b57f0fe
enable symbolic_ops and jits test of two vars (#3513)
2024-02-27 11:17:46 -05:00
wozeparrot
ea4b8e5b1f
feat: don't hardcode the arch (#3511)
2024-02-27 07:58:03 -05:00
Francis Lam
11da65bccd
test/external/fuzz_linearizer: add a FUZZ_MAX_SIZE option (#3455)
...
* test/external/fuzz_linearizer: add a FUZZ_MAX_SIZE option
this allows us to limit the kernel size and reduce running time by
skipping kernels that take a long time
* fix spacing and re-order to put parameters together
2024-02-27 07:34:59 -05:00
qazal
a29cd6d464
run f64 increased precision tests on remu (#3509)
...
* run the test in CI
* temp: use the pre-release
* Revert "temp: use the pre-release"
This reverts commit 28e8571421.
2024-02-26 18:01:07 -05:00
chenyu
b1426f3a4c
cleanup SumNode mod (#3503)
2024-02-26 11:10:55 -05:00
chenyu
61605ccc69
Remove special case of SumNode div SumNode (#3502)
2024-02-26 09:42:06 -05:00
Francis Lam
39d75f0d58
test_linearizer_failures: add more METAL examples (#3495)
...
these were obtained from running fuzz_linearizer on METAL
2024-02-26 10:19:05 +01:00
chenyu
b154089884
float64 function support for HIP (#3492)
...
* float64 function support for HIP
* not CI
2024-02-24 09:46:20 -05:00
chenyu
35aff8b0c2
properly exclude PYTHON backend and half support (#3491)
...
should be able to run in CI with Python 3.12
2024-02-24 09:22:06 -05:00
David Friehs
2fe98b64bb
fix Tensor.split not passing dim to Tensor.chunk (#3490)
2024-02-24 07:53:11 -05:00
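A quick usage sketch of the fixed behavior (shapes chosen for illustration):

```python
from tinygrad import Tensor

t = Tensor.arange(24).reshape(2, 3, 4)
parts = t.split(2, dim=2)         # dim is now forwarded to Tensor.chunk
print([p.shape for p in parts])   # [(2, 3, 2), (2, 3, 2)]
```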
Caleb Bunch
b41761488d
change specific string 'CLANG' to DEVICE variable in abstractions2.py (#3488)
2024-02-24 07:51:39 -05:00
chenyu
c032df520b
minor symbolic type-related cleanups (#3489)
2024-02-23 16:44:43 -05:00
Carson Radtke
15df9406d6
fix exec_alu(UnaryOps.SQRT, <...>, (0,)) + add test (#3487)
...
* fix exec_alu(UnaryOps.SQRT, <...>, (0,)) + add test
* sqrt(0) != nan
* fix tabs
2024-02-23 18:28:00 +01:00
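A sketch of the invariant the fix restores; the import paths and the dtype argument (elided as <...> in the title) are assumptions:

```python
from tinygrad.ops import UnaryOps, exec_alu  # assumed location of exec_alu
from tinygrad.dtype import dtypes            # stand-in for the elided dtype

# sqrt at zero must be 0.0, not nan
assert exec_alu(UnaryOps.SQRT, dtypes.float32, (0.0,)) == 0.0
```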
nimlgen
52567da07f
jit grapher simplified (#3478)
2024-02-23 16:20:16 +01:00
George Hotz
2113e1eb63
move all reduces to the end in lazy (#3475)
...
* move all reduces to the end in lazy
* apply as reshape, not permute
2024-02-23 15:49:11 +01:00
David Hou
5cfcc2a8d7
support MLB reshaping on-axis for evenly sharded (#3484)
...
* support MLB reshaping on-axis for evenly sharded
* update test
* not -> !=
2024-02-23 07:51:36 -05:00
chenyu
358a24eae6
symbolic use mod for rmod and use floordiv for rfloordiv (#3485)
2024-02-23 01:05:13 -05:00
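A hedged sketch of what the delegation means in use (rendered forms are illustrative):

```python
from tinygrad.shape.symbolic import Variable

a = Variable("a", 1, 5)
print(7 % a)    # int % Node falls back to __rmod__, which now routes through mod
print(7 // a)   # int // Node falls back to __rfloordiv__, routed through floordiv
```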
nimlgen
6d048a0c0b
cache collector optimizations are allowed only for kernel operations (#3476)
2024-02-22 12:26:57 +01:00
George Hotz
7698781389
Revert "wmma: add CUDA tensor core ( #3464 )" ( #3474 )
...
This reverts commit e9cef13f0b .
2024-02-22 11:58:16 +01:00
Francis Lam
e9cef13f0b
wmma: add CUDA tensor core (#3464)
2024-02-22 11:57:08 +01:00
wozeparrot
57678012e1
Upload correct benchmark artifact (#3471)
...
* fix: correct filename
* fix: why is this .py?
2024-02-22 01:14:16 -05:00
chenyu
ab40c0cf93
clean up long lines in symbolic (#3469)
2024-02-21 21:57:44 -05:00
chenyu
7c0fc40123
enable test IMAGE=2 PYTHON=1 python3 test/test_ops.py TestOps.test_simple_conv2d (#3468)
2024-02-21 18:30:12 -05:00
chenyu
77d2a4c12a
regenerate kernel dataset after reduce arg to axis change (#3467)
...
```
./extra/optimization/generate_dataset.sh
gzip /tmp/sops
mv /tmp/sops.gz extra/datasets/
```
2024-02-21 18:16:13 -05:00
David Hou
f513c37e64
support same uidx in multiple shape positions (#3205)
...
* support same uidx in multiple shape positions
* rename var
* update comment
* add contiguous index check to global_store too
* update comment
* small change
* is this better?
* smh
* smaller change?
* get rid of more changes
* get rid of more changes
* is this even making anything better
* comment
* fix test
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-02-21 19:37:03 +01:00
chenyu
1eb24af63b
fix softmax and log_softmax for 0d tensor (#3463)
...
matched torch to take axis in [-1, 0] and use axis=None internally
2024-02-21 11:30:30 -05:00
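A minimal check of the fixed behavior, as a hedged sketch:

```python
from tinygrad import Tensor

t = Tensor(2.0)                        # 0-d tensor
print(t.softmax().numpy())             # 1.0; axis -1 and 0 are both accepted
print(t.log_softmax(axis=0).numpy())   # 0.0
```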
George Hotz
871ba73e65
_reduce_op is axis based now (#3462)
...
* _reduce_op is axis based now
* axis_
* update lin failures
* disable that
* fix shape
2024-02-21 16:36:31 +01:00
George Hotz
22a90cbb15
change frontend reduce API to use axis (#3460)
...
* change frontend API to axis
* switch lazy to also take axis input
2024-02-21 12:26:17 +01:00
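A usage sketch of the axis-based frontend:

```python
from tinygrad import Tensor

t = Tensor.ones(2, 3, 4)
print(t.sum(axis=1).shape)                   # (2, 4)
print(t.sum(axis=(0, 2)).shape)              # (3,)
print(t.mean(axis=-1, keepdim=True).shape)   # (2, 3, 1)
```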
chenyu
6c1063ba39
add mypy --strict-equality to pre-commit (#3458)
...
matched CI mypy behavior
2024-02-21 03:41:05 -05:00
chenyu
02683a8659
gate the cast before movements in lazy (#3452)
...
it made gpt2 slower (2ms -> 2.5ms on 3090, 7ms -> 8ms on M1 Max with BEAM=2);
disabled it in the gpt2 benchmark until the full issue is understood
2024-02-20 09:36:22 -05:00
chenyu
0d326a48b8
fix LtNode simplification when lhs and rhs contain same variables (#3451)
...
* fix LtNode simplification when lhs and rhs contain same variables
`(Variable("a", 1, 5) < Variable("a", 1, 5))` should eval to `NumNode(0)`
* fix with less perf impact
2024-02-20 09:06:55 -05:00
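The commit body's example, expanded into a runnable check (a sketch; asserting via min/max bounds rather than Node equality):

```python
from tinygrad.shape.symbolic import Variable

a = Variable("a", 1, 5)
lt = a < a                          # same variable on both sides: always false
assert lt.min == 0 and lt.max == 0  # i.e. it simplifies to NumNode(0)
```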
George Hotz
1b6e890ef2
uops flop counter (#3373)
...
* factor out winograd functions
* test counter
* uops flop counter
* more correct
* ish
* correct
* cleanup
* tests for uops flop counter
* tests still fail
* fix symbolic uops flop cnt
* fix symbolic uops flop cnt
* hmm, it's an alu
* uops alu resolve
* relax that
2024-02-20 09:36:30 +01:00
Patrick Tsai
9dd64b1f5f
Fix python cast uint/int overflow (#3448)
...
* Fix numpy uint/int overflow
* lol
* Works
* Update
* Move overflow test to float64/float32
* One line
* Update
* One more
---------
Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
2024-02-20 09:20:43 +01:00
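A hedged sketch of the wrap-around the emulator should match, shown with plain numpy rather than tinygrad internals:

```python
import numpy as np

# out-of-range values wrap on unsigned casts; the python emulator should agree
assert np.int64(257).astype(np.uint8) == np.uint8(1)
assert np.int64(-1).astype(np.uint8) == np.uint8(255)
```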
qazal
7864fb69d1
delete MovementOps (#3434)
...
* delete MovementOps
* keep extra/to_movement_ops.py
2024-02-19 23:21:44 +01:00
nimlgen
015d414786
fix gpu page fault by ensuring code memory persistence during execution (#3435)
...
* fix pf for exec image memory
* no new noqa: E501
2024-02-19 13:40:53 +01:00
Daniel Yeh
0a4029c519
fix path to models folder (#3442)
...
Co-authored-by: Chen-Chen Yeh <ge96noj@mytum.de>
2024-02-19 13:35:57 +01:00
Patrick Tsai
ac9d94a068
Cast correctly in python emulator (dtype tests pass) (#3446)
...
* Cast correctly in python emulator
* Update test yml and fix lint
* make ruff pass
* mypy passes
---------
Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
2024-02-19 13:34:02 +01:00