chenyu
969b57f0fe
enable symbolic_ops and jits test of two vars ( #3513 )
2024-02-27 11:17:46 -05:00
wozeparrot
ea4b8e5b1f
feat: don't hardcode the arch ( #3511 )
2024-02-27 07:58:03 -05:00
Francis Lam
11da65bccd
test/external/fuzz_linearizer: add a FUZZ_MAX_SIZE option ( #3455 )
...
* test/external/fuzz_linearizer: add a FUZZ_MAX_SIZE option
this allows us to limit the size of the kernel and reduce running
times by avoiding ones that take a long time
* fix spacing and re-order to put parameters together
2024-02-27 07:34:59 -05:00
qazal
a29cd6d464
run f64 increased precision tests on remu ( #3509 )
...
* run the test in CI
* temp: use the pre-release
* Revert "temp: use the pre-release"
This reverts commit 28e8571421.
2024-02-26 18:01:07 -05:00
chenyu
b1426f3a4c
cleanup SumNode mod ( #3503 )
2024-02-26 11:10:55 -05:00
chenyu
61605ccc69
Remove special case of SumNode div SumNode ( #3502 )
2024-02-26 09:42:06 -05:00
Francis Lam
39d75f0d58
test_linearizer_failures: add more METAL examples ( #3495 )
...
these were obtained from running fuzz_linearizer on METAL
2024-02-26 10:19:05 +01:00
chenyu
b154089884
float64 function support for HIP ( #3492 )
...
* float64 function support for HIP
* not CI
2024-02-24 09:46:20 -05:00
chenyu
35aff8b0c2
properly exclude PYTHON backend and support of half ( #3491 )
...
should be able to run in CI with python 3.12
2024-02-24 09:22:06 -05:00
David Friehs
2fe98b64bb
fix Tensor.split not passing dim to Tensor.chunk ( #3490 )
2024-02-24 07:53:11 -05:00
Caleb Bunch
b41761488d
change specific string 'CLANG' to DEVICE variable in abstractions2.py ( #3488 )
2024-02-24 07:51:39 -05:00
chenyu
c032df520b
minor symbolic type related cleanups ( #3489 )
2024-02-23 16:44:43 -05:00
Carson Radtke
15df9406d6
fix exec_alu(UnaryOps.SQRT, <...>, (0,)) + add test ( #3487 )
...
* fix exec_alu(UnaryOps.SQRT, <...>, (0,)) + add test
* sqrt(0) != nan
* fix tabs
2024-02-23 18:28:00 +01:00
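The `sqrt(0) != nan` fix above can be sketched as follows. This is an illustrative stand-in for an `exec_alu`-style dispatcher, not tinygrad's actual code; the function name is hypothetical. The point is that `math.sqrt` handles zero correctly, while nan should only appear for negative inputs.

```python
import math

# Hypothetical sketch of a scalar SQRT op in a python emulator
# (names are illustrative, not tinygrad's implementation).
def exec_sqrt(x: float) -> float:
    # sqrt(0) must be 0.0, not nan; nan is only correct for x < 0
    return math.sqrt(x) if x >= 0 else float("nan")

print(exec_sqrt(0.0))   # 0.0
print(exec_sqrt(4.0))   # 2.0
```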
nimlgen
52567da07f
jit grapher simplified ( #3478 )
2024-02-23 16:20:16 +01:00
George Hotz
2113e1eb63
move all reduces to the end in lazy ( #3475 )
...
* move all reduces to the end in lazy
* apply as reshape, not permute
2024-02-23 15:49:11 +01:00
David Hou
5cfcc2a8d7
support MLB reshaping on-axis for evenly sharded ( #3484 )
...
* support MLB reshaping on-axis for evenly sharded
* update test
* not -> !=
2024-02-23 07:51:36 -05:00
chenyu
358a24eae6
symbolic use mod for rmod and use floordiv for rfloordiv ( #3485 )
2024-02-23 01:05:13 -05:00
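The rmod/rfloordiv change above follows the standard Python pattern of implementing reflected operators by delegating to the forward ones. A self-contained sketch with a toy `Num` class (illustrative, not tinygrad's symbolic nodes):

```python
# Toy illustration of delegating reflected operators to the forward ones.
class Num:
    def __init__(self, v): self.v = v
    def __mod__(self, other):
        o = other.v if isinstance(other, Num) else other
        return Num(self.v % o)
    def __rmod__(self, other):
        # int % Num falls back here; wrap the lhs and reuse __mod__
        return Num(other) % self
    def __floordiv__(self, other):
        o = other.v if isinstance(other, Num) else other
        return Num(self.v // o)
    def __rfloordiv__(self, other):
        # int // Num falls back here; wrap the lhs and reuse __floordiv__
        return Num(other) // self

print((7 % Num(3)).v)   # 1
print((7 // Num(3)).v)  # 2
```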
nimlgen
6d048a0c0b
cache collector optimizations are allowed only for kernel operations ( #3476 )
2024-02-22 12:26:57 +01:00
George Hotz
7698781389
Revert "wmma: add CUDA tensor core ( #3464 )" ( #3474 )
...
This reverts commit e9cef13f0b.
2024-02-22 11:58:16 +01:00
Francis Lam
e9cef13f0b
wmma: add CUDA tensor core ( #3464 )
2024-02-22 11:57:08 +01:00
wozeparrot
57678012e1
Upload correct benchmark artifact ( #3471 )
...
* fix: correct filename
* fix: why is this .py?
2024-02-22 01:14:16 -05:00
chenyu
ab40c0cf93
clean up long lines in symbolic ( #3469 )
2024-02-21 21:57:44 -05:00
chenyu
7c0fc40123
enable test IMAGE=2 PYTHON=1 python3 test/test_ops.py TestOps.test_simple_conv2d ( #3468 )
2024-02-21 18:30:12 -05:00
chenyu
77d2a4c12a
regenerate kernel dataset after reduce arg to axis change ( #3467 )
...
```
./extra/optimization/generate_dataset.sh
gzip /tmp/sops
mv /tmp/sops.gz extra/datasets/
```
2024-02-21 18:16:13 -05:00
David Hou
f513c37e64
support same uidx in multiple shape positions ( #3205 )
...
* support same uidx in multiple shape positions
* rename var
* update comment
* add contiguous index check to global_store too
* update comment
* small change
* is this better?
* smh
* smaller change?
* get rid of more changes
* get rid of more changes
* is this even making anything better
* comment
* fix test
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-02-21 19:37:03 +01:00
chenyu
1eb24af63b
fix softmax and log_softmax for 0d tensor ( #3463 )
...
matched torch to take axis \in [-1, 0] and used axis=None internally
2024-02-21 11:30:30 -05:00
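The 0-d behavior described above reduces to a simple identity: softmax over a scalar is always 1.0, because the single element is its own normalizer. A minimal stdlib sketch (illustrative, not tinygrad's implementation):

```python
import math

# Softmax of a 0-d value: the lone element normalizes to 1.0.
def softmax_0d(x: float) -> float:
    e = math.exp(x - x)  # subtract the max (itself) for numerical stability
    return e / e

print(softmax_0d(5.0))    # 1.0
print(softmax_0d(-100.0)) # 1.0
```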
George Hotz
871ba73e65
_reduce_op is axis based now ( #3462 )
...
* _reduce_op is axis based now
* axis_
* update lin failures
* disable that
* fix shape
2024-02-21 16:36:31 +01:00
George Hotz
22a90cbb15
change frontend reduce API to use axis ( #3460 )
...
* change frontend API to axis
* switch lazy to also take axis input
2024-02-21 12:26:17 +01:00
chenyu
6c1063ba39
add mypy --strict-equality to pre-commit ( #3458 )
...
matched ci mypy behavior
2024-02-21 03:41:05 -05:00
chenyu
02683a8659
gate the cast before movements in lazy ( #3452 )
...
it made gpt2 slower (2ms -> 2.5ms on 3090, 7ms -> 8ms on M1 Max with BEAM=2).
disabled it in gpt2 benchmark before understanding the full issue
2024-02-20 09:36:22 -05:00
chenyu
0d326a48b8
fix LtNode simplification when lhs and rhs contain same variables ( #3451 )
...
* fix LtNode simplification when lhs and rhs contain same variables
`(Variable("a", 1, 5) < Variable("a", 1, 5))` should eval to `NumNode(0)`
* fix with less perf impact
2024-02-20 09:06:55 -05:00
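The simplification rule the fix above restores can be sketched with a toy bounded-variable class (illustrative, not tinygrad's `LtNode`): a strict comparison of a variable with itself is always false, regardless of its bounds, while disjoint ranges prove the comparison either way.

```python
# Toy sketch of bounded-variable comparison folding.
class Var:
    def __init__(self, name, lo, hi):
        self.name, self.lo, self.hi = name, lo, hi
    def __lt__(self, other):
        if isinstance(other, Var):
            if other.name is self.name or other.name == self.name:
                return 0          # a < a is never true, whatever the bounds
            if self.hi < other.lo:
                return 1          # ranges are disjoint: always true
            if self.lo >= other.hi:
                return 0          # ranges are disjoint: always false
        return None               # unknown: keep the node symbolic

print(Var("a", 1, 5) < Var("a", 1, 5))  # 0
print(Var("a", 1, 2) < Var("b", 5, 9))  # 1
```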
George Hotz
1b6e890ef2
uops flop counter ( #3373 )
...
* factor out winograd functions
* test counter
* uops flop counter
* more correct
* ish
* correct
* cleanup
* tests for uops flop counter
* tests still fail
* fix symbolic uops flop cnt
* fix symbolic uops flop cnt
* hmm, it's an alu
* uops alu resolve
* relax that
2024-02-20 09:36:30 +01:00
Patrick Tsai
9dd64b1f5f
Fix python cast uint/int overflow ( #3448 )
...
* Fix numpy uint/int overflow
* lol
* Works
* Update
* Move overflow test to float64/float32
* One line
* Update
* One more
---------
Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
2024-02-20 09:20:43 +01:00
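The overflow behavior the fix above targets is ordinary fixed-width wrap-around, which Python's unbounded ints don't do by default. A self-contained sketch of the casting rule (the function is illustrative, not the emulator's code):

```python
# Wrap an unbounded Python int into a fixed-width integer type.
def cast_int(value: int, bits: int, signed: bool) -> int:
    value &= (1 << bits) - 1              # wrap into the unsigned range
    if signed and value >= 1 << (bits - 1):
        value -= 1 << bits                # reinterpret the high bit as sign
    return value

print(cast_int(256, 8, False))  # 0   (uint8 wraps around)
print(cast_int(200, 8, True))   # -56 (int8 overflow)
```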
qazal
7864fb69d1
delete MovementOps ( #3434 )
...
* delete MovementOps
* keep extra/to_movement_ops.py
2024-02-19 23:21:44 +01:00
nimlgen
015d414786
fix gpu page fault by ensuring code memory persistence during execution ( #3435 )
...
* fix pf for exec image memory
* no new noqa: E501
2024-02-19 13:40:53 +01:00
Daniel Yeh
0a4029c519
fix path to models folder ( #3442 )
...
Co-authored-by: Chen-Chen Yeh <ge96noj@mytum.de>
2024-02-19 13:35:57 +01:00
Patrick Tsai
ac9d94a068
Cast correctly in python emulator (dtype tests pass) ( #3446 )
...
* Cast correctly in python emulator
* Update test yml and fix lint
* make ruff pass
* mypy passes
---------
Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
2024-02-19 13:34:02 +01:00
chenyu
ddec76e9c4
remove unused LtNode.__floordiv__ ( #3445 )
2024-02-18 22:12:54 -05:00
chenyu
86efdf0b34
remove create_rednode ( #3444 )
...
handle Node collapsing into NumNode similar to OpNode
2024-02-18 21:08:19 -05:00
chenyu
2da734920e
use __getnewargs__ to fix unpickling Variable ( #3441 )
...
it's recommended to use __getnewargs__ to update the args of classes that use __new__ when unpickling.
It's preferred because it does not change the __new__ behavior.
2024-02-18 10:28:37 -05:00
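The `__getnewargs__` pattern described above is plain stdlib pickle behavior: on unpickling, pickle calls `cls.__new__(cls, *args)` with the returned tuple, so classes whose `__new__` does interning keep working without overriding it. A self-contained example with a hypothetical interned class (not tinygrad's `Variable`):

```python
import pickle

# Illustrative interned class: __new__ returns a cached instance per name.
class Interned:
    _cache = {}
    def __new__(cls, name):
        if name not in cls._cache:
            obj = super().__new__(cls)
            obj.name = name
            cls._cache[name] = obj
        return cls._cache[name]
    def __getnewargs__(self):
        # pickle passes these args to __new__ on load, so interning
        # still happens and __new__'s behavior is unchanged
        return (self.name,)

a = Interned("x")
b = pickle.loads(pickle.dumps(a))
print(b is a)  # True: unpickling went through the interning __new__
```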
nimlgen
5647148937
fix hip invalid ordinal ( #3440 )
2024-02-18 08:31:44 -05:00
chenyu
8c0e85fdaf
limit symbolic substitute var_vals to have NumNode or Variable ( #3438 )
...
this can greatly reduce the possible output types of substitute
2024-02-18 01:29:44 -05:00
George Hotz
6b4f734dc1
hotfix: better copy stats
2024-02-16 16:52:39 +01:00
George Hotz
c7fda10aa0
hotfix: disk doesn't sync
2024-02-16 16:46:48 +01:00
chenyu
230fc33d5b
limit sint to be Union[int, Variable, MulNode, SumNode] ( #3430 )
...
* limit sint to be Union[int, Variable, MulNode, SumNode]
these are the only allowed nodes in a Tensor shape
* stride can be sint
2024-02-16 10:05:46 -05:00
George Hotz
fe97a85014
the compiler is a driver ( #3427 )
2024-02-16 10:18:09 +01:00
zku
2d702ca073
If feasible, do not truncate float64 down to float32 in cstyle renderer ( #3420 )
...
* do not truncate float64 precision
* use l suffix to try avoid overload confusion
* long line, ruff bloats the function otherwise
* fmt
* remove long double suffix (l), it's sufficient to have the float32 (f) suffix to avoid function overload ambiguity; add test showcasing rtol=1e-12 precision increase, the test fails without the renderer changes
* use more reasonable test values, same as test_int_to_float_unary_func
* disable test for CUDACPU, does not support half and segfaults on some operations per dtypes_alu test
* disable test for HIP, renderer does not support f64 precision
* do not use noqa E501, break up condition
2024-02-16 10:08:59 +01:00
chenyu
30f26279c5
add back "CPU" in test_onnx_backend supports_device ( #3426 )
...
the onnx tests were all skipped.
2024-02-16 00:49:30 -05:00
xarkes
28a8b72024
Remove Interpreted device & remaining CPU/TORCH ref ( #3423 )
...
* Remove Interpreted device & remaining CPU/TORCH ref
* Oops
* supports_device was useful
* Fix doc wording
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-02-16 00:30:21 -05:00
chenyu
6efa68f97b
remove use of TORCH in pre-commit ( #3424 )
...
it's silently using DEFAULT after removing TORCH
2024-02-15 19:38:37 -05:00