Commit Graph

10633 Commits

Author SHA1 Message Date
chenyu
61605ccc69 Remove special case of SumNode div SumNode (#3502) 2024-02-26 09:42:06 -05:00
Francis Lam
39d75f0d58 test_linearizer_failures: add more METAL examples (#3495)
these were obtained from running fuzz_linearizer on METAL
2024-02-26 10:19:05 +01:00
chenyu
b154089884 float64 function support for HIP (#3492)
* float64 function support for HIP

* not CI
2024-02-24 09:46:20 -05:00
chenyu
35aff8b0c2 properly exclude PYTHON backend and support of half (#3491)
should be able to run in CI with python 3.12
2024-02-24 09:22:06 -05:00
David Friehs
2fe98b64bb fix Tensor.split not passing dim to Tensor.chunk (#3490) 2024-02-24 07:53:11 -05:00
Caleb Bunch
b41761488d change specific string 'CLANG' to DEVICE variable in abstractions2.py (#3488) 2024-02-24 07:51:39 -05:00
chenyu
c032df520b minor symbolic type related cleanups (#3489) 2024-02-23 16:44:43 -05:00
Carson Radtke
15df9406d6 fix exec_alu(UnaryOps.SQRT, <...>, (0,)) + add test (#3487)
* fix exec_alu(UnaryOps.SQRT, <...>, (0,)) + add test

* sqrt(0) != nan

* fix tabs
2024-02-23 18:28:00 +01:00
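The behavior this test pins down is the edge case named in the bullets: sqrt of zero is zero, not NaN. A minimal Python sanity check of that expectation (generic math, not the exec_alu code itself):

```
import math

# the behavior the new test pins down: sqrt(0) is exactly 0.0, not NaN
assert math.sqrt(0.0) == 0.0
assert not math.isnan(math.sqrt(0.0))
```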
nimlgen
52567da07f jit grapher simplified (#3478) 2024-02-23 16:20:16 +01:00
George Hotz
2113e1eb63 move all reduces to the end in lazy (#3475)
* move all reduces to the end in lazy

* apply as reshape, not permute
2024-02-23 15:49:11 +01:00
David Hou
5cfcc2a8d7 support MLB reshaping on-axis for evenly sharded (#3484)
* support MLB reshaping on-axis for evenly sharded

* update test

* not -> !=
2024-02-23 07:51:36 -05:00
chenyu
358a24eae6 symbolic use mod for rmod and use floordiv for rfloordiv (#3485) 2024-02-23 01:05:13 -05:00
nimlgen
6d048a0c0b cache collector optimizations are allowed only for kernel operations (#3476) 2024-02-22 12:26:57 +01:00
George Hotz
7698781389 Revert "wmma: add CUDA tensor core (#3464)" (#3474)
This reverts commit e9cef13f0b.
2024-02-22 11:58:16 +01:00
Francis Lam
e9cef13f0b wmma: add CUDA tensor core (#3464) 2024-02-22 11:57:08 +01:00
wozeparrot
57678012e1 Upload correct benchmark artifact (#3471)
* fix: correct filename

* fix: why is this .py?
2024-02-22 01:14:16 -05:00
chenyu
ab40c0cf93 clean up long lines in symbolic (#3469) 2024-02-21 21:57:44 -05:00
chenyu
7c0fc40123 enable test IMAGE=2 PYTHON=1 python3 test/test_ops.py TestOps.test_simple_conv2d (#3468) 2024-02-21 18:30:12 -05:00
chenyu
77d2a4c12a regenerate kernel dataset after reduce arg to axis change (#3467)
```
./extra/optimization/generate_dataset.sh
gzip /tmp/sops
mv /tmp/sops.gz extra/datasets/
```
2024-02-21 18:16:13 -05:00
David Hou
f513c37e64 support same uidx in multiple shape positions (#3205)
* support same uidx in multiple shape positions

* rename var

* update comment

* add contiguous index check to global_store too

* update comment

* small change

* is this better?

* smh

* smaller change?

* get rid of more changes

* get rid of more changes

* is this even making anything better

* comment

* fix test

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-02-21 19:37:03 +01:00
chenyu
1eb24af63b fix softmax and log_softmax for 0d tensor (#3463)
matched torch to take axis ∈ [-1, 0] and used axis=None internally
2024-02-21 11:30:30 -05:00
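A quick illustration of the fixed behavior, assuming tinygrad's top-level Tensor API: softmax of a 0-d tensor is 1.0 (the only element takes all the probability mass) and log_softmax is 0.0.

```
from tinygrad import Tensor

x = Tensor(3.0)                # 0-d tensor, shape ()
print(x.softmax().item())      # 1.0
print(x.log_softmax().item())  # 0.0
```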
George Hotz
871ba73e65 _reduce_op is axis based now (#3462)
* _reduce_op is axis based now

* axis_

* update lin failures

* disable that

* fix shape
2024-02-21 16:36:31 +01:00
George Hotz
22a90cbb15 change frontend reduce API to use axis (#3460)
* change frontend API to axis

* switch lazy to also take axis input
2024-02-21 12:26:17 +01:00
chenyu
6c1063ba39 add mypy --strict-equality to pre-commit (#3458)
matched CI mypy behavior
2024-02-21 03:41:05 -05:00
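For context, --strict-equality makes mypy reject comparisons between types that can never overlap. An illustrative snippet (not from the tinygrad codebase) of the kind of bug it flags:

```
def is_three(x: str) -> bool:
    # mypy --strict-equality: error: Non-overlapping equality check
    # (left operand type: "str", right operand type: "int")
    return x == 3
```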
chenyu
02683a8659 gate the cast before movements in lazy (#3452)
it made gpt2 slower (2ms -> 2.5ms on 3090, 7ms -> 8ms on M1 Max with BEAM=2).
disabled it in gpt2 benchmark before understanding the full issue
2024-02-20 09:36:22 -05:00
chenyu
0d326a48b8 fix LtNode simplification when lhs and rhs contain same variables (#3451)
* fix LtNode simplification when lhs and rhs contain same variables

`(Variable("a", 1, 5) < Variable("a", 1, 5))` should eval to `NumNode(0)`

* fix with less perf impact
2024-02-20 09:06:55 -05:00
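The commit's inline example, written out as a runnable check; this assumes the symbolic module still lives at tinygrad.shape.symbolic, as it did at this point in the history.

```
from tinygrad.shape.symbolic import Variable, NumNode

a = Variable("a", 1, 5)
# a variable is never strictly less than itself, so the comparison
# must collapse to the constant 0 rather than a residual LtNode
assert (a < a) == NumNode(0)
```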
George Hotz
1b6e890ef2 uops flop counter (#3373)
* factor out winograd functions

* test counter

* uops flop counter

* more correct

* ish

* correct

* cleanup

* tests for uops flop counter

* tests still fail

* fix symbolic uops flop cnt

* fix symbolic uops flop cnt

* hmm, it's an alu

* uops alu resolve

* relax that
2024-02-20 09:36:30 +01:00
Patrick Tsai
9dd64b1f5f Fix python cast uint/int overflow (#3448)
* Fix numpy uint/int overflow

* lol

* Works

* Update

* Move overflow test to float64/float32

* One line

* Update

* One more

---------

Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
2024-02-20 09:20:43 +01:00
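Since Python ints are arbitrary precision, an emulator has to wrap integer casts explicitly. A hedged sketch of the general technique (hypothetical helpers, not the PR's actual code):

```
def wrap_unsigned(x: int, bits: int) -> int:
    # unsigned overflow wraps modulo 2**bits
    return x & ((1 << bits) - 1)

def wrap_signed(x: int, bits: int) -> int:
    # signed overflow wraps into the two's-complement range
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >= (1 << (bits - 1)) else x

assert wrap_unsigned(300, 8) == 44   # uint8
assert wrap_signed(200, 8) == -56    # int8
```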
qazal
7864fb69d1 delete MovementOps (#3434)
* delete MovementOps

* keep extra/to_movement_ops.py
2024-02-19 23:21:44 +01:00
nimlgen
015d414786 fix gpu page fault by ensuring code memory persistence during execution (#3435)
* fix pf for exec image memory

* no new noqa: E501
2024-02-19 13:40:53 +01:00
Daniel Yeh
0a4029c519 fix path to models folder (#3442)
Co-authored-by: Chen-Chen Yeh <ge96noj@mytum.de>
2024-02-19 13:35:57 +01:00
Patrick Tsai
ac9d94a068 Cast correctly in python emulator (dtype tests pass) (#3446)
* Cast correctly in python emulator

* Update test yml and fix lint

* make ruff pass

* mypy passes

---------

Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
2024-02-19 13:34:02 +01:00
chenyu
ddec76e9c4 remove unused LtNode.__floordiv__ (#3445) 2024-02-18 22:12:54 -05:00
chenyu
86efdf0b34 remove create_rednode (#3444)
handle Node collapsing into NumNode similar to OpNode
2024-02-18 21:08:19 -05:00
chenyu
2da734920e use __getnewargs__ to fix unpickling Variable (#3441)
it's recommended to use __getnewargs__ to supply the arguments for classes that use __new__ when unpickling.
It's preferred because it does not change the __new__ behavior.
2024-02-18 10:28:37 -05:00
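A minimal, generic sketch of the pickle point being made (a hypothetical stand-in class, not tinygrad's actual Variable): when __new__ takes required arguments, __getnewargs__ tells pickle which arguments to pass back to __new__ on unpickling.

```
import pickle

class Sym:
    # stand-in for a class that does its real work in __new__
    def __new__(cls, name: str, lo: int, hi: int):
        obj = super().__new__(cls)
        obj.name, obj.lo, obj.hi = name, lo, hi
        return obj

    def __getnewargs__(self):
        # pickle calls __new__ with exactly these args when reconstructing
        return (self.name, self.lo, self.hi)

v = pickle.loads(pickle.dumps(Sym("a", 1, 5)))
assert (v.name, v.lo, v.hi) == ("a", 1, 5)
```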
nimlgen
5647148937 fix hip invalid ordinal (#3440) 2024-02-18 08:31:44 -05:00
chenyu
8c0e85fdaf limit symbolic substitute var_vals to have NumNode or Variable (#3438)
this can greatly reduce the possible output types of substitute
2024-02-18 01:29:44 -05:00
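A small usage sketch, assuming the Node.substitute API in tinygrad.shape.symbolic at this point: with var_vals limited to NumNode/Variable values, substituting a concrete value collapses the expression to a NumNode.

```
from tinygrad.shape.symbolic import Variable, NumNode

a = Variable("a", 1, 10)
expr = a * 2 + 3
# substituting a NumNode for the Variable collapses the whole tree to a constant
assert expr.substitute({a: NumNode(4)}) == NumNode(11)
```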
George Hotz
6b4f734dc1 hotfix: better copy stats 2024-02-16 16:52:39 +01:00
George Hotz
c7fda10aa0 hotfix: disk doesn't sync 2024-02-16 16:46:48 +01:00
chenyu
230fc33d5b limit sint to be Union[int, Variable, MulNode, SumNode] (#3430)
* limit sint to be Union[int, Variable, MulNode, SumNode]

these are the only allowed nodes in a Tensor shape

* stride can be sint
2024-02-16 10:05:46 -05:00
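As a reading aid, the alias described above amounts to the following (a restatement, not the literal source):

```
from typing import Union
from tinygrad.shape.symbolic import Variable, MulNode, SumNode

# a shape dimension ("sint") is a plain int or one of these symbolic node types
sint = Union[int, Variable, MulNode, SumNode]
```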
George Hotz
fe97a85014 the compiler is a driver (#3427) 2024-02-16 10:18:09 +01:00
zku
2d702ca073 If feasible, do not truncate float64 down to float32 in cstyle renderer (#3420)
* do not truncate float64 precision

* use l suffix to try avoid overload confusion

* long line, ruff bloats the function otherwise

* fmt

* remove long double suffix (l), it's sufficient to have the float32 (f) suffix to avoid function overload ambiguity; add test showcasing rtol=1e-12 precision increase, the test fails without the renderer changes

* use more reasonable test values, same as test_int_to_float_unary_func

* disable test for CUDACPU, does not support half and segfaults on some operations per dtypes_alu test

* disable test for HIP, renderer does not support f64 precision

* do not use noqa E501, break up condition
2024-02-16 10:08:59 +01:00
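The precision point in these bullets is visible with plain numpy, independent of the renderer: routing a float64 value through float32 loses roughly seven decimal digits, which is why a double literal (no f suffix) has to be emitted for float64 constants.

```
import numpy as np

x = np.float64(1.0) / np.float64(3.0)
truncated = np.float64(np.float32(x))  # what emitting a float32 literal would do
print(abs(x - truncated))              # ~1e-8, far above the rtol=1e-12 the new test checks
```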
chenyu
30f26279c5 add back "CPU" in test_onnx_backend supports_device (#3426)
the onnx tests were all skipped.
2024-02-16 00:49:30 -05:00
xarkes
28a8b72024 Remove Interpreted device & remaining CPU/TORCH ref (#3423)
* Remove Interpreted device & remaining CPU/TORCH ref

* Oops

* supports_device was useful

* Fix doc wording

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-02-16 00:30:21 -05:00
chenyu
6efa68f97b remove use of TORCH in pre-commit (#3424)
it's silently using DEFAULT after removing TORCH
2024-02-15 19:38:37 -05:00
geohotstan
5eb4c902f6 correct division dtype casting (#3405)
* Happy New Year

* fix: exclude floordiv onnx tests

* fix: less weird if statements in div

* Good luck in the Year of the Dragon

* fix: tempfix onnx div

* fix: use reference impl for div
2024-02-15 19:34:40 -05:00
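The title concerns how true division should cast dtypes; as a general reference point (numpy shown here, not the tinygrad change itself), true division of integers promotes to a floating dtype while floor division stays integral.

```
import numpy as np

print((np.int32(7) / np.int32(2)).dtype)  # float64: true division promotes to float
print(np.int32(7) // np.int32(2))         # 3: floor division stays integral
```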
George Hotz
5de660ca0d disk runner (prereq for interpreted removal) (#3421)
* disk runner

* simpler diskrunner
2024-02-15 18:14:05 +01:00
qazal
e1a57fe58a test the behavior, not the implementation (#3419) 2024-02-15 17:23:42 +01:00
George Hotz
b1c0d8c99d remove cpu and torch backends (#3399)
* remove cpu and torch backends

* don't copy to cpu

* use clang instead of cpu

* multitensor gathers on the first device

* clang is cpu + use default

* fixup

* bugfix
2024-02-15 16:55:39 +01:00
Obada Khalili
75f7e21a80 Make tests in test/test_ops.py pass for Python emulator (#3384)
* fix OverflowError in UnaryOps.EXP2

* avoid accessing outputs for void uops

* skip execution for UOps.IF and UOps.ENDIF

* initialize bytearray to the correct size in UOps.DEFINE_LOCAL

* validate len of input that has .sz > 1

* remove comment in code

* reinitialize loop of already iterated

* validate first value in input to be a list for inputs with .sz > 1

* add python ops tests to CI

* skip long runtime tests for PYTHON backend

* respect dtype.sz arg in UOps.CONST, and remove incorrect validation in UOps.STORE

* use math.inf instead of float('inf')

* handle 0 args to UnaryOPs.LOG2

* handle load op with default of .sz > 1

* initialize the loop correctly using UOps.LOOP arg

* remove unnecessary TODO comment

* remove newline

* select a subset of 22 ops tests to skip in CI when PYTHON=1

* handle gated UOps.LOAD referencing values that have .sz > 1

* Revert "select a subset of 22 ops tests to skip in CI when PYTHON=1"

This reverts commit 7674fee81d.

* skip tests in python backend CI command

* push fix lost in conflict resolve

* Revert "skip long runtime tests for PYTHON backend"

This reverts commit 5dd2a0376e.

* clear loop state after last iteration
2024-02-15 16:40:25 +01:00
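One of the bullets above fixes an OverflowError in UnaryOps.EXP2; a hedged sketch of the usual workaround in a pure-Python emulator (hypothetical helper, not the PR's exact code):

```
import math

def exp2(x: float) -> float:
    # 2.0 ** x raises OverflowError for large finite x instead of returning inf;
    # clamping to math.inf matches IEEE-754 float behavior
    try:
        return 2.0 ** x
    except OverflowError:
        return math.inf

assert exp2(10.0) == 1024.0
assert exp2(100000.0) == math.inf
```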