Commit Graph

10417 Commits

Author SHA1 Message Date
chenyu
4e2a92cee1 run HALF GPT2 in nvidia benchmark in addition to HALF/BEAM (#2811)
easier to tell whether a failure comes from HALF or from BEAM
2023-12-17 02:24:55 -05:00
George Hotz
bad0ff60b7 start Qualcomm GPU driver (#2804)
* hooking works

* working

* qcom work

* parsing command buffers

* proper parse
2023-12-16 23:10:50 -08:00
chenyu
71a60762ed Revert "Make Tensor creation allow multi-dim list of int and bool (#2793)" (#2810)
This reverts commit 798bf813b1.
2023-12-17 02:03:52 -05:00
geohotstan
798bf813b1 Make Tensor creation allow multi-dim list of int and bool (#2793)
* the universe is flat as a 2D tensor

* try this

* TESTS

* less lines in test

* don't change all_int since other places use it

* add tests and del noqa by making non-aesthetic spacing LOOOOOL

* some reordering

* fixed empty list and add tests

* more tests

* add list bool tensors

* clearer with least lines added

* added bool

* oops

* more tests

* improved tests

* oops
2023-12-17 01:58:10 -05:00
chenyu
85c6250a3e support Tensor.einsum with no "->" in formula (#2807)
output is the sorted letters if there's no "->"
2023-12-17 00:46:24 -05:00
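For reference, NumPy documents its implicit einsum mode (no `->`) as keeping the subscripts that appear exactly once across the inputs, in alphabetical order. The commit message only says "sorted", so whether tinygrad also drops repeated letters is an assumption here. A minimal sketch of the NumPy rule; `implicit_output` and `make_explicit` are hypothetical helpers, not tinygrad code, and `...` broadcasting is not handled:

```python
from collections import Counter


def implicit_output(formula: str) -> str:
    """Output subscripts for an einsum formula without '->'.

    Follows NumPy's implicit-mode rule (assumed, not confirmed for tinygrad):
    keep letters that appear exactly once, sorted alphabetically.
    """
    if "->" in formula:
        raise ValueError("formula already has an explicit output")
    # count every subscript letter across all comma-separated inputs
    counts = Counter(c for c in formula if c.isalpha())
    return "".join(sorted(c for c, n in counts.items() if n == 1))


def make_explicit(formula: str) -> str:
    """Rewrite an implicit formula into explicit 'inputs->output' form."""
    formula = formula.replace(" ", "")
    if "->" in formula:
        return formula
    return f"{formula}->{implicit_output(formula)}"


print(make_explicit("ij,jk"))  # ij,jk->ik  (matmul)
print(make_explicit("ji"))     # ji->ij     (implicit mode transposes)
print(make_explicit("ii"))     # ii->       (trace)
```

Repeated letters disappear from the output because they are summed over, which is why `"i,i"` becomes an inner product.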
chenyu
157c0be509 cleanup onnx, pass one more reshape test and remove some casts (#2806) 2023-12-16 20:40:43 -05:00
chenyu
baa94d6142 Tensor(False) has dtypes.bool (#2805) 2023-12-16 19:04:08 -05:00
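Because Python's `bool` is a subclass of `int`, dtype inference that tests `isinstance(x, int)` first classifies `False` as an integer. A minimal sketch of the ordering that avoids this; `infer_scalar_dtype` and the dtype strings are hypothetical stand-ins for tinygrad's `dtypes`, and `float32` as the float default is an assumption:

```python
def infer_scalar_dtype(x) -> str:
    """Pick a dtype name for a Python scalar (hypothetical helper)."""
    # bool must be tested before int: isinstance(True, int) is True in Python
    if isinstance(x, bool):
        return "bool"
    if isinstance(x, int):
        return "int32"
    if isinstance(x, float):
        return "float32"  # assumed default float dtype
    raise TypeError(f"unsupported scalar type: {type(x).__name__}")


print(isinstance(False, int))     # True
print(infer_scalar_dtype(False))  # bool
print(infer_scalar_dtype(3))      # int32
```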
chenyu
fa84998244 Revert "Tensor.randint is Tensor.uniform with dtypes.int32 (#2801)" (#2802)
This reverts commit 86c2f267d4.
2023-12-16 15:53:28 -05:00
chenyu
86c2f267d4 Tensor.randint is Tensor.uniform with dtypes.int32 (#2801) 2023-12-16 15:14:50 -05:00
chenyu
0bb5d8f956 Revert "Green Uop unary check (#2792)" (#2799)
This reverts commit d958777aed.
2023-12-16 12:49:28 -05:00
qazal
d958777aed Green Uop unary check (#2792)
* assert UnaryOps has same dtype as input in uop

* fallback to float on images

* just unary ops for now

* pass amt

* noqa on the temp line

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2023-12-16 12:43:53 -05:00
George Hotz
051402625e remove pushing contig + fix linearizer bug (#2798)
* remove that logic

* fix test, move LOADs

* fix repeat issue on LLVM

* with_phi
2023-12-16 09:36:31 -08:00
Ahmed Harmouche
a7264dcb2b inf hack that works on chrome and wgpu (#2712)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-12-16 11:49:37 -05:00
George Hotz
877c78b4ce lazy tests (#2796)
* tests

* mini sd is very mini
2023-12-16 08:24:21 -08:00
chenyu
88ff1edcf0 fix tensor creation with a list and dtype bfloat16 (#2795)
it went through numpy, and numpy does not have bfloat16.

also added broadcasted with a python bool.
2023-12-16 10:06:47 -05:00
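Since NumPy has no bfloat16 dtype, list data has to be encoded without it. bfloat16 keeps float32's sign bit, all 8 exponent bits and the top 7 mantissa bits, so the upper 16 bits of a float32 pattern are a valid bfloat16. A minimal pure-Python sketch of that encoding (the helper names are hypothetical and this is not necessarily how tinygrad does it); it truncates rather than rounding to nearest even:

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Encode a float as bfloat16 bits by truncating its float32 encoding."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16  # keep sign, exponent and top 7 mantissa bits


def bf16_bits_to_float(b: int) -> float:
    """Decode a 16-bit bfloat16 pattern back to a Python float."""
    (value,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return value


def pack_bf16(values) -> bytes:
    """Pack a flat list of floats into little-endian bfloat16 bytes."""
    return b"".join(struct.pack("<H", float_to_bf16_bits(v)) for v in values)


print(hex(float_to_bf16_bits(1.0)))                      # 0x3f80
print(bf16_bits_to_float(float_to_bf16_bits(3.140625)))  # 3.140625
```

Truncation keeps the sketch short; production converters usually round to nearest even, which can differ in the last mantissa bit.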
chenyu
bb6f7b6172 rsqrt is self.reciprocal().sqrt() (#2790)
(1/self) is incorrect for an int tensor
2023-12-16 01:58:05 -05:00
chenyu
c5fa9eb36e int / List[int] data -> dtypes.int32 (#2789) 2023-12-16 01:25:44 -05:00
chenyu
dad4ee4539 use least_upper_dtype mlops to upcast the output type in mlops (#2788)
* InterpretedFlopCounter uses least_upper_dtype for output dtype

* fix target dtype check

* fix that
2023-12-15 23:46:57 -05:00
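The idea behind a least upper dtype is to treat type promotion as a lattice and pick the smallest dtype that every input can be promoted to. A minimal sketch over a simplified, hypothetical lattice (tinygrad's real table covers more dtypes); `PROMO`, `upper_set` and `least_upper` are illustrative names:

```python
from functools import lru_cache

# hypothetical, simplified promotion edges: dtype -> dtypes it promotes to
PROMO = {
    "bool": ["int8", "uint8"],
    "int8": ["int16"],
    "uint8": ["int16", "uint16"],
    "int16": ["int32"],
    "uint16": ["int32"],
    "int32": ["int64"],
    "int64": ["float16"],
    "float16": ["float32"],
    "float32": [],
}


@lru_cache(maxsize=None)
def upper_set(dt: str) -> frozenset:
    """dt plus every dtype reachable by following promotion edges upward."""
    result = {dt}
    for parent in PROMO[dt]:
        result |= upper_set(parent)
    return frozenset(result)


def least_upper(*dts: str) -> str:
    """Smallest dtype that all inputs can be promoted to."""
    # every common upper bound of the inputs
    common = frozenset.intersection(*(upper_set(d) for d in dts))
    # the least one is the candidate that all other candidates sit above
    for c in common:
        if common <= upper_set(c):
            return c
    raise ValueError(f"no least upper bound for {dts}")


print(least_upper("int8", "uint8"))     # int16
print(least_upper("int32", "float16"))  # float16
```

With this table, mixing an integer with a half stays half instead of jumping straight to float32.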
chenyu
1bc378c3d6 _broadcasted handles the python number types (#2785)
* _broadcasted handles the python number types

* disable that test
2023-12-15 22:43:27 -05:00
chenyu
0703075357 bf16 is float (#2786)
* add bfloat16 to is_float check

* and test
2023-12-15 21:41:30 -05:00
chenyu
e4bbbc5bc3 Revert "Use the reduceop dtype to define the acc in linearizer (#2625)" (#2783)
This reverts commit f3ed96a929.
2023-12-15 16:29:10 -05:00
qazal
f3ed96a929 Use the reduceop dtype to define the acc in linearizer (#2625)
* upcast the other way

* Revert "upcast the other way"

This reverts commit 355692ba79.

* remove uop cast, this should have never been there

* add regression test

* now fuzz it

correct test

* the accumulator is always the output type

lint

* fuzz all reduce ops

* MULACC upcast_dtype could be half too

opencl supports it https://man.opencl.org/mad.html

* cast to the same dtype is a noop

* internal casting support for MULACC

* fuzz test mulacc internal casting

* get_reduce_dtype

handle vectorized acc

update get_reduce_acc calls with the correct dtype

update tests

* pending _complete_ implementation of a function that gets the dtype based on self.reduceop

+more failing tests

* get_reduce_dtype try 2

add TODO

* get_lazyop_info already does it

* cleanup

* bring back internal casting support for mulacc

* use the scalar version of the acc dtype

* conceptual diff cleanup

* one extra line to a cleaner linearizer

* correct test assumptions - these should promote?

* rm mulacc cast, the cast of vins happens with the acc dtype promotion

linearizer hacks

* Revert "rm mulacc cast, the cast of vins happens with the acc dtype promotion"

This reverts commit afdd540733.

Revert "correct test assumptions - these should promote?"

This reverts commit 49ae2206ed.

* skip tests blocked by MULACC->lazyop cleanup

* final changes to add back internal casting for MULACC and update skip test logic, upcast works but downcast does not

* only test the linearizer abstraction layer

we wanna ensure that linearizer matches whatever lazy is returning

* remove unused hypothesis module

* remove mulacc related changes, those will move to the lazy pr

* remove midcast test

* move to helpers

* Revert "remove midcast test"

This reverts commit 86e74d7960.

add TODO with skip

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2023-12-15 16:14:32 -05:00
chenyu
765f8b05e5 TernaryOps.WHERE has vin[0] as bool and BinaryOps.CMPLT always outputs bool (#2782)
* vin[0] to where is always bool

* due to better hack

* update test

* fix test_uops
2023-12-15 14:51:51 -05:00
George Hotz
96a276cc7c hotfix: add test_reduce_permute_nofuse to master 2023-12-15 09:39:47 -08:00
qazal
66f07d97e2 don't auto-cast half to float in unary functions (#2776)
* least upper float

* don't cast to the same thing

* tests for least_upper_float

* add regression tests to test_dtype_alu

* the call is pretty cheap, so a cache is probably too much overhead
2023-12-15 10:11:47 -05:00
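The rule the commit title describes can be sketched as: float inputs keep their own precision, and only non-float inputs are promoted. A minimal sketch with hypothetical names; promoting non-floats straight to a default `float32` is a simplification (a fuller version would take the least upper dtype with float32), and counting bfloat16 as a float follows #2786:

```python
FLOAT_DTYPES = ("float16", "bfloat16", "float32", "float64")


def is_float(dt: str) -> bool:
    # bfloat16 counts as a float too
    return dt in FLOAT_DTYPES


def unary_output_dtype(dt: str, default_float: str = "float32") -> str:
    """Dtype a float-valued unary op (sqrt, exp, log, ...) should produce.

    Floats keep their own precision, so half stays half; non-float inputs
    are promoted to the default float (simplified assumption).
    """
    return dt if is_float(dt) else default_float


print(unary_output_dtype("float16"))  # float16
print(unary_output_dtype("int32"))    # float32
```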
George Hotz
c6eb618013 tests from new lazy branch (#2774)
* tests from new lazy branch

* fix lin 11

* that was needed

* doesn't fail

* mark

* meant that

* llvm passes
2023-12-14 23:06:39 -08:00
chenyu
a044125c39 validate stable diffusion for seed 0 (#2773)
* validate stable diffusion for seed 0

the closest false positive I can get is the same setup with one less step: dist = 0.0036.
the same setup with fp16 has dist = 5e-6,
so a validation threshold of 1e-4 should be good

* run with --seed 0
2023-12-15 00:07:09 -05:00
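The threshold reasoning above can be sketched as a simple check: 1e-4 sits well above the fp16 distance (5e-6) and well below the closest false positive (0.0036). The commit does not say which distance metric `dist` is, so mean squared error over flattened pixel values is an assumption, as are the helper names:

```python
from typing import Sequence

# from the commit message: fp16 gives ~5e-6, one step fewer gives ~0.0036
THRESHOLD = 1e-4


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared difference between two flattened images (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def validate(generated: Sequence[float], reference: Sequence[float],
             threshold: float = THRESHOLD) -> bool:
    """True if the generated image is close enough to the seed-0 reference."""
    return distance(generated, reference) < threshold
```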
chenyu
9afa8009c1 hot fix explicitly set arange dtype to float (#2772) 2023-12-14 23:14:38 -05:00
chenyu
c0f76ed4ea transformer kvcache and mask have same dtype as input (#2771)
* transformer kvcache and mask have same dtype as input

* don't use `=0` in cstyle ternary where

* (bool)

* where float16 test
2023-12-14 22:41:51 -05:00
chenyu
2dd0dd4ae0 cleanup llvmir (#2770) 2023-12-14 18:13:22 -05:00
chenyu
66d9eb10b6 arange default dtype to int and zeros/ones default to float (#2769) 2023-12-14 17:53:00 -05:00
qazal
3cf4376ce2 test_linearizer cleanup (#2766)
* test_linearizer cleanup

* use unittest.skipIf

* update msg
2023-12-14 17:20:09 -05:00
chenyu
57017c87e9 remove duplicated dtype in DEFINE_GLOBAL args (#2768)
now that DEFINE_GLOBAL uop.arg[1] is always the same as uop.dtype, we can remove the one in arg and just use uop.dtype
2023-12-14 15:42:36 -05:00
chenyu
5235cdee3d remove _arg_int32 internal type (#2767)
in DEFINE_GLOBAL, PtrDtype(int32) is buffer and int32 is int
2023-12-14 14:17:14 -05:00
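Encoding "buffer vs. scalar" in the dtype type itself, rather than in a separate internal dtype, can be sketched with a pointer subclass. The classes below are hypothetical and much smaller than tinygrad's real `PtrDType`; `render_param` is an illustrative C-style renderer:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DType:
    name: str  # C type name, e.g. "int" or "float"


@dataclass(frozen=True)
class PtrDType(DType):
    """Marks a DEFINE_GLOBAL argument as a buffer (pointer), not a scalar."""


def render_param(dt: DType, arg_name: str) -> str:
    # buffers become pointers; plain scalars are passed by value
    if isinstance(dt, PtrDType):
        return f"{dt.name}* {arg_name}"
    return f"const {dt.name} {arg_name}"


print(render_param(PtrDType("int"), "data0"))  # int* data0
print(render_param(DType("int"), "n"))         # const int n
```

Dataclass equality also compares the class, so `PtrDType("int")` and `DType("int")` stay distinct without any extra flag.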
chenyu
8a2a2257b4 minor onnx_op cleanups to prep dtype changes (#2764)
* minor onnx_op cleanups to prep dtype changes

read through it and clean some minor stuff

* revert embedding - is it really being tested?
2023-12-14 13:01:27 -05:00
geohotstan
0398288b79 Getitem round3 .... (#2760)
* refactor round 3

* comment

* oops

* oops

* oops2

* factored out multiple condition

* add a comment for type

* wooaah roundup is cool, thanks chenyu lol

* add another walrus for symmetry and some spaces

* lol wtf useless listcompre
2023-12-14 12:22:37 -05:00
chenyu
0ae22b0f81 restore Tensor.default_type in test_hip_rdna3 (#2763)
might cause flaky tests
2023-12-14 11:35:38 -05:00
qazal
746cb5de21 Test coverage for matvec (#2762)
* add test coverage for matvec

* skip devices that don't support locals
2023-12-14 11:34:56 -05:00
chenyu
64fea9ff4a Revert "minor onnx_op cleanups to prep dtype changes (#2758)" (#2759)
This reverts commit 38da001b64.
2023-12-14 03:12:14 -05:00
chenyu
38da001b64 minor onnx_op cleanups to prep dtype changes (#2758)
read through it and clean some minor stuff
2023-12-14 03:05:59 -05:00
jaredeh
d8952fc575 updating to work with new internal apis (#2755) 2023-12-13 21:54:47 -08:00
chenyu
2c6814ba28 insert_before is None means insert at the end (#2757) 2023-12-13 21:05:10 -05:00
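The convention in this commit title can be sketched with a plain list. The helper name is hypothetical; the point of the explicit `is None` test is that index 0 is falsy, so `if insert_before:` would wrongly append when asked to insert at the front:

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def insert_item(items: List[T], item: T,
                insert_before: Optional[int] = None) -> List[T]:
    """Insert item before index insert_before, or append when it is None."""
    if insert_before is None:  # not `if not insert_before`: 0 is a valid index
        items.append(item)
    else:
        items.insert(insert_before, item)
    return items


print(insert_item(["a", "b"], "c"))     # ['a', 'b', 'c']
print(insert_item(["a", "b"], "c", 0))  # ['c', 'a', 'b']
```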
chenyu
aad005e220 set default str for CStyleLanguage.arg_int_prefix (#2756)
it's the same `const int` for clang, opencl, cuda and hip;
metal overrides it with `constant int&` and webgl has its own thing
2023-12-13 20:23:27 -05:00
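The pattern here is a shared class-level default that one backend overrides. A minimal sketch; `CStyleLanguage` and `MetalLanguage` echo tinygrad's names, but these classes are stripped down and `render_int_arg` is hypothetical:

```python
class CStyleLanguage:
    # shared default for scalar int kernel args (clang, OpenCL, CUDA, HIP)
    arg_int_prefix: str = "const int"

    def render_int_arg(self, name: str) -> str:
        return f"{self.arg_int_prefix} {name}"


class MetalLanguage(CStyleLanguage):
    # Metal declares scalar args as references in the constant address space
    arg_int_prefix = "constant int&"


print(CStyleLanguage().render_int_arg("n"))  # const int n
print(MetalLanguage().render_int_arg("n"))   # constant int& n
```

Only the backend that differs has to mention the prefix at all.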
chenyu
107dd8f3d7 fix a typo in test_dtype_alu (#2754) 2023-12-13 19:23:21 -05:00
chenyu
fc6bca7ba8 update type annotation of _broadcasted (#2753)
input can be Tensor, float, int.
also updated scaled_dot_product_attention that might add a None to a Tensor
2023-12-13 19:03:14 -05:00
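A broadcasting helper that accepts Tensors and Python numbers can treat a number as a 0-d operand with shape `()`, then apply right-aligned broadcasting. A minimal sketch of those shape rules; `operand_shape` and `broadcast_shape` are hypothetical, and dtype handling is left out:

```python
from typing import Tuple

Shape = Tuple[int, ...]


def operand_shape(x) -> Shape:
    """Python numbers (bool, int, float) act like 0-d tensors of shape ()."""
    if isinstance(x, (bool, int, float)):
        return ()
    return tuple(x.shape)


def broadcast_shape(a: Shape, b: Shape) -> Shape:
    """NumPy-style broadcasting: right-align, pad with 1s, stretch size-1 dims."""
    n = max(len(a), len(b))
    a = (1,) * (n - len(a)) + a
    b = (1,) * (n - len(b)) + b
    out = []
    for x, y in zip(a, b):
        if x != y and 1 not in (x, y):
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        out.append(y if x == 1 else x)  # also correct for size-0 dims
    return tuple(out)


print(broadcast_shape(operand_shape(2.0), (3, 4)))  # (3, 4)
print(broadcast_shape((4, 1), (3,)))                # (4, 3)
```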
Maksym Sobolyev
bf4165ccac Fix double exception in __del__() when __init__() raises exception. (#2738) 2023-12-13 15:46:11 -08:00
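When `__init__` raises, Python still deallocates the half-built object and calls its `__del__`, so a finalizer that touches attributes set late in `__init__` adds an AttributeError (reported as "Exception ignored in ...") on top of the original error. A minimal sketch of the usual guard; `Buffer` here is a hypothetical toy class, not tinygrad's:

```python
class Buffer:
    """Toy resource wrapper (hypothetical)."""

    def __init__(self, size: int):
        if size <= 0:
            # raising here means self._handle is never assigned
            raise ValueError("size must be positive")
        self._handle = bytearray(size)

    def __del__(self):
        # __del__ still runs on the half-built object, so guard the attribute
        # instead of assuming __init__ finished
        if hasattr(self, "_handle"):
            del self._handle
```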
chenyu
81a747fc63 more test cases in test_slice_fancy_indexing_with_idx (#2751) 2023-12-13 17:52:26 -05:00
chenyu
22feb7330e simplify fancy index with negative Tensor entries (#2749) 2023-12-13 14:45:50 -05:00
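One common way to handle negative entries in an index tensor is to add the axis size exactly where an entry is negative, which maps onto elementwise compare-and-multiply operations instead of branching. A pure-Python sketch of that arithmetic; the helper is hypothetical and not necessarily tinygrad's exact simplification:

```python
from typing import List


def normalize_indices(idx: List[int], dim_size: int) -> List[int]:
    """Map negative indices to their positive equivalents for one axis."""
    out = []
    for i in idx:
        if not -dim_size <= i < dim_size:
            raise IndexError(f"index {i} out of range for size {dim_size}")
        # branch-free form: adds dim_size exactly when i is negative
        out.append(i + (i < 0) * dim_size)
    return out


print(normalize_indices([0, -1, 2, -3], 3))  # [0, 2, 2, 0]
```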
chenyu
b229879613 refactor _broadcasted (#2747)
also moved the expand noop check to .expand.
2023-12-13 13:36:25 -05:00
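Putting the noop check inside `expand` means every caller, broadcasting included, gets it without repeating the test. A minimal sketch with a hypothetical shape-only stand-in for a tensor:

```python
from typing import Tuple

Shape = Tuple[int, ...]


class Array:
    """Toy stand-in for a tensor that only tracks its shape (hypothetical)."""

    def __init__(self, shape: Shape):
        self.shape = shape

    def expand(self, shape: Shape) -> "Array":
        if len(shape) != len(self.shape):
            raise ValueError("expand needs the same number of dims")
        for old, new in zip(self.shape, shape):
            if old != new and old != 1:
                raise ValueError(f"cannot expand {self.shape} to {shape}")
        # the noop check lives here, so callers need not test for it
        if shape == self.shape:
            return self
        return Array(shape)
```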
George Hotz
7e5b3e53fe changes to prep for new lazy (#2748)
* changes to prep for new lazy

* put those back
2023-12-13 10:28:22 -08:00