Commit Graph

3100 Commits

chenyu
dad4ee4539 use least_upper_dtype to upcast the output type in mlops (#2788)
* InterpretedFlopCounter uses least_upper_dtype for output dtype

* fix target dtype check

* fix that
2023-12-15 23:46:57 -05:00
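A minimal sketch of what a least-upper-bound dtype promotion for an op's output might look like; the priority lattice and names below are illustrative assumptions, not tinygrad's actual promotion table.

```python
# Illustrative least-upper-bound dtype promotion for an op's output dtype.
# The priority lattice here is an assumption for demonstration only.
from functools import reduce

PRIORITY = {"bool": 0, "int32": 1, "int64": 2, "float16": 3, "float32": 4, "float64": 5}

def least_upper_dtype(*dtypes: str) -> str:
    """Return the highest-priority dtype among the inputs (the least upper bound
    in this simple totally-ordered lattice)."""
    return reduce(lambda a, b: a if PRIORITY[a] >= PRIORITY[b] else b, dtypes)

# an op mixing int32 and float16 inputs gets a float16 output dtype here
assert least_upper_dtype("int32", "float16") == "float16"
```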
chenyu
1bc378c3d6 _broadcasted handles the python number types (#2785)
* _broadcasted handles the python number types

* disable that test
2023-12-15 22:43:27 -05:00
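A hedged sketch of the idea above: a bare python number passed to a binary op follows the tensor's dtype instead of forcing a promotion. The Tensor class below is a stand-in for illustration, not tinygrad's implementation.

```python
# Stand-in Tensor showing python-number handling in a broadcast helper.
from __future__ import annotations
from typing import Union

class Tensor:
    def __init__(self, data, dtype: str = "float32"):
        self.data, self.dtype = data, dtype

    def _broadcasted(self, y: Union["Tensor", float, int]) -> tuple["Tensor", "Tensor"]:
        if isinstance(y, (int, float)):
            # a python number adopts the tensor's dtype instead of forcing a cast
            y = Tensor(y, dtype=self.dtype)
        # ...shape broadcasting of self and y would happen here...
        return self, y

x, two = Tensor([1.0, 2.0], dtype="float16")._broadcasted(2)
assert two.dtype == "float16"
```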
chenyu
0703075357 bf16 is float (#2786)
* add bfloat16 to is_float check

* and test
2023-12-15 21:41:30 -05:00
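A tiny sketch of an is_float-style check that counts bfloat16 as a float dtype; the string dtype names are purely illustrative.

```python
# Illustrative is_float check including bfloat16.
FLOAT_DTYPES = {"float16", "bfloat16", "float32", "float64"}

def is_float(dtype: str) -> bool:
    return dtype in FLOAT_DTYPES

assert is_float("bfloat16") and not is_float("int32")
```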
chenyu
e4bbbc5bc3 Revert "Use the reduceop dtype to define the acc in linearizer (#2625)" (#2783)
This reverts commit f3ed96a929.
2023-12-15 16:29:10 -05:00
qazal
f3ed96a929 Use the reduceop dtype to define the acc in linearizer (#2625)
* upcast the other way

* Revert "upcast the other way"

This reverts commit 355692ba79.

* remove uop cast, this should have never been there

* add regression test

* now fuzz it

correct test

* the accumulator is always the output type

lint

* fuzz all reduce ops

* MULACC upcast_dtype could be half too

opencl supports it https://man.opencl.org/mad.html

* cast to the same dtype is a noop

* internal casting support for MULACC

* fuzz test mulacc internal casting

* get_reduce_dtype

handle vectorized acc

update get_reduce_acc calls with the correct dtype

update tests

* pending _complete_ implementation of a function that gets the dtype based on self.reduceop

+more failing tests

* get_reduce_dtype try 2

add TODO

* get_lazyop_info already does it

* cleanup

* bring back internal casting support for mulacc

* use the scalar version of the acc dtype

* conceptual diff cleanup

* one extra line to a cleaner linearizer

* correct test assumptions - these should promote?

* rm mulacc cast, the cast of vins happens with the acc dtype promotion

linearizer hacks

* Revert "rm mulacc cast, the cast of vins happens with the acc dtype promotion"

This reverts commit afdd540733.

Revert "correct test assumptions - these should promote?"

This reverts commit 49ae2206ed.

* skip tests blocked by MULACC->lazyop cleanup

* final changes to add back internal casting for MULACC and update skip test logic, upcast works but downcast does not

* only test the linearizer abstraction layer

we wanna ensure that linearizer matches whatever lazy is returning

* remove unused hypothesis module

* remove mulacc related changes, those will move to the lazy pr

* remove midcast test

* move to helpers

* Revert "remove midcast test"

This reverts commit 86e74d7960.

add TODO with skip

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2023-12-15 16:14:32 -05:00
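A toy sketch of the idea behind the (reverted) change above: the accumulator of a reduce is created in the reduce op's output dtype rather than a hardcoded float. The helper below is illustrative, not the linearizer code.

```python
# Toy reduce where the accumulator follows the output dtype.
def reduce_sum(values, out_dtype=float):
    acc = out_dtype(0)            # accumulator starts in the output dtype
    for v in values:
        acc = acc + out_dtype(v)
    return acc

assert reduce_sum([1, 2, 3], int) == 6                 # int accumulator stays int
assert isinstance(reduce_sum([1, 2, 3], float), float) # float accumulator
```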
chenyu
765f8b05e5 TernaryOps.WHERE has vin[0] as bool and BinaryOps.CMPLT always outputs bool (#2782)
* vin[0] to where is always bool

* due to better hack

* update test

* fix test_uops
2023-12-15 14:51:51 -05:00
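An illustrative sketch of the invariant in the commit above: a comparison always produces a bool, and the first input of a where/select must be that bool. The UOp class below is a stand-in, not tinygrad's.

```python
# Stand-in UOp showing bool outputs of CMPLT feeding vin[0] of WHERE.
from dataclasses import dataclass

@dataclass(frozen=True)
class UOp:
    op: str
    dtype: str
    vin: tuple = ()

def cmplt(a: UOp, b: UOp) -> UOp:
    return UOp("CMPLT", "bool", (a, b))      # CMPLT always outputs bool

def where(cond: UOp, x: UOp, y: UOp) -> UOp:
    assert cond.dtype == "bool"              # vin[0] of WHERE must be bool
    return UOp("WHERE", x.dtype, (cond, x, y))

x, y = UOp("CONST", "float32"), UOp("CONST", "float32")
assert where(cmplt(x, y), x, y).dtype == "float32"
```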
George Hotz
96a276cc7c hotfix: add test_reduce_permute_nofuse to master 2023-12-15 09:39:47 -08:00
qazal
66f07d97e2 don't auto-cast half to float in unary functions (#2776)
* least upper float

* dont cast to the same thing

* tests for least_upper_float

* add regression tests to test_dtype_alu

* the call is pretty cheap; a cache is probably too much overhead
2023-12-15 10:11:47 -05:00
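A sketch of what a least_upper_float-style rule might look like: float inputs keep their dtype in unary ops such as exp and log, and only non-float inputs are promoted to a default float. The names and the default below are assumptions.

```python
# Illustrative least_upper_float rule for unary-op output dtypes.
FLOATS = ("float16", "bfloat16", "float32", "float64")

def least_upper_float(dtype: str, default_float: str = "float32") -> str:
    return dtype if dtype in FLOATS else default_float

assert least_upper_float("float16") == "float16"   # half is no longer auto-cast to float32
assert least_upper_float("int32") == "float32"     # ints still become float
```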
George Hotz
c6eb618013 tests from new lazy branch (#2774)
* tests from new lazy branch

* fix lin 11

* that was needed

* doesn't fail

* mark

* meant that

* llvm passes
2023-12-14 23:06:39 -08:00
chenyu
a044125c39 validate stable diffusion for seed 0 (#2773)
* validate stable diffusion for seed 0

the closest false positive I can get is with the same setup and one less step: dist = 0.0036.
the same setup with fp16 has dist = 5e-6, so setting the validation threshold to 1e-4 should be good.

* run with --seed 0
2023-12-15 00:07:09 -05:00
chenyu
9afa8009c1 hot fix explicitly set arange dtype to float (#2772) 2023-12-14 23:14:38 -05:00
chenyu
c0f76ed4ea transformer kvcache and mask have same dtype as input (#2771)
* transformer kvcache and mask have same dtype as input

* don't use `=0` in cstyle ternary where

* (bool)

* where float16 test
2023-12-14 22:41:51 -05:00
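A hedged sketch (using numpy as a stand-in) of keeping the kv cache and attention mask in the activations' dtype, so a float16 model doesn't silently allocate float32 buffers.

```python
# Stand-in buffers that follow the model's activation dtype.
import numpy as np

def make_kv_cache(batch: int, max_seq: int, n_heads: int, head_dim: int, dtype=np.float16):
    # the cache follows the activation dtype instead of defaulting to float32
    return np.zeros((2, batch, max_seq, n_heads, head_dim), dtype=dtype)

def causal_mask(seq_len: int, dtype=np.float16):
    # the mask is built directly in the activation dtype
    return np.triu(np.full((seq_len, seq_len), -np.inf, dtype=dtype), k=1)

assert make_kv_cache(1, 8, 4, 16).dtype == np.float16
assert causal_mask(8).dtype == np.float16
```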
chenyu
2dd0dd4ae0 cleanup llvmir (#2770) 2023-12-14 18:13:22 -05:00
chenyu
66d9eb10b6 arange default dtype to int and zeros/ones default to float (#2769) 2023-12-14 17:53:00 -05:00
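A usage-level sketch of those defaults, assuming tinygrad's Tensor API behaves as the commit describes (arange defaulting to int, zeros/ones to the default float).

```python
# Checking the default dtypes described in the commit above (assumed behavior).
from tinygrad.tensor import Tensor

print(Tensor.arange(4).dtype)   # expected: an int dtype
print(Tensor.zeros(4).dtype)    # expected: the default float dtype
print(Tensor.ones(4).dtype)     # expected: the default float dtype
```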
qazal
3cf4376ce2 test_linearizer cleanup (#2766)
* test_linearizer cleanup

* use unittest.skipIf

* update msg
2023-12-14 17:20:09 -05:00
chenyu
57017c87e9 remove duplicated dtype in DEFINE_GLOBAL args (#2768)
now that DEFINE_GLOBAL's uop.arg[1] is always the same as uop.dtype, we can remove the copy in arg and just use uop.dtype
2023-12-14 15:42:36 -05:00
chenyu
5235cdee3d remove _arg_int32 internal type (#2767)
in DEFINE_GLOBAL, PtrDType(int32) means a buffer and a plain int32 means an int
2023-12-14 14:17:14 -05:00
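A toy sketch of the duplication removed by the two commits above: once a uop carries its dtype, repeating the dtype inside its arg is redundant. The UOp layout below is illustrative, not the actual linearizer structure.

```python
# Stand-in UOp illustrating the removal of the duplicated dtype in arg.
from dataclasses import dataclass
from typing import Any

@dataclass
class UOp:
    uop: str
    dtype: str
    arg: Any = None

# before: the buffer dtype stored twice, on the uop and inside its arg
before = UOp("DEFINE_GLOBAL", "ptr<int32>", arg=("data0", "ptr<int32>"))
# after: arg keeps only the name; the dtype lives on uop.dtype alone
after = UOp("DEFINE_GLOBAL", "ptr<int32>", arg="data0")
```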
chenyu
8a2a2257b4 minor onnx_op cleanups to prep dtype changes (#2764)
* minor onnx_op cleanups to prep dtype changes

read through it and clean some minor stuff

* revert embedding - is it really being tested
2023-12-14 13:01:27 -05:00
geohotstan
0398288b79 Getitem round3 .... (#2760)
* refactor round 3

* comment

* oops

* oops

* oops2

* factored out multiple condition

* add a comment for type

* wooaah roundup is cool, thanks chenyu lol

* add another walrus for symmetry and some spaces

* lol wtf useless listcompre
2023-12-14 12:22:37 -05:00
chenyu
0ae22b0f81 restore Tensor.default_type in test_hip_rdna3 (#2763)
might cause flaky tests
2023-12-14 11:35:38 -05:00
qazal
746cb5de21 Test coverage for matvec (#2762)
* add test coverage for matvec

* skip devices that don't support locals
2023-12-14 11:34:56 -05:00
chenyu
64fea9ff4a Revert "minor onnx_op cleanups to prep dtype changes (#2758)" (#2759)
This reverts commit 38da001b64.
2023-12-14 03:12:14 -05:00
chenyu
38da001b64 minor onnx_op cleanups to prep dtype changes (#2758)
read through it and clean some minor stuff
2023-12-14 03:05:59 -05:00
jaredeh
d8952fc575 updating to work with new internal apis (#2755) 2023-12-13 21:54:47 -08:00
chenyu
2c6814ba28 insert_before is None means insert at the end (#2757) 2023-12-13 21:05:10 -05:00
chenyu
aad005e220 set default str for CStyleLanguage.arg_int_prefix (#2756)
it's the same `const int` for clang, opencl, cuda and hip;
metal overrides it with `constant int&`, and webgl has its own thing
2023-12-13 20:23:27 -05:00
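A sketch of a shared default with per-backend overrides as described above; the class layout below is illustrative rather than the actual renderer code.

```python
# Illustrative default attribute with a per-backend override.
class CStyleLanguage:
    arg_int_prefix: str = "const int"    # shared default: clang, opencl, cuda, hip

class MetalLanguage(CStyleLanguage):
    arg_int_prefix = "constant int&"     # metal overrides the default
```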
chenyu
107dd8f3d7 fix a typo in test_dtype_alu (#2754) 2023-12-13 19:23:21 -05:00
chenyu
fc6bca7ba8 update type annotation of _broadcasted (#2753)
input can be Tensor, float, int.
also updated scaled_dot_product_attention that might add a None to a Tensor
2023-12-13 19:03:14 -05:00
Maksym Sobolyev
bf4165ccac Fix double exception in __del__() when __init__() raises exception. (#2738) 2023-12-13 15:46:11 -08:00
chenyu
81a747fc63 more test cases in test_slice_fancy_indexing_with_idx (#2751) 2023-12-13 17:52:26 -05:00
chenyu
22feb7330e simplify fancy index with negative Tensor entries (#2749) 2023-12-13 14:45:50 -05:00
chenyu
b229879613 refactor _broadcasted (#2747)
also moved the expand noop check to .expand.
2023-12-13 13:36:25 -05:00
George Hotz
7e5b3e53fe changes to prep for new lazy (#2748)
* changes to prep for new lazy

* put those back
2023-12-13 10:28:22 -08:00
Umut Zengin
8ad7cfeeb1 More simplification in to_image_idx and symbolic (#2679)
* less valid

* add test

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2023-12-13 12:30:44 -05:00
Ahmed Harmouche
e7248b677c Remove wgsl custom render_for (#2729)
* Generic for

* remove custom render_if

* Simplify for loop

* 150 line-length constraint

* Put custom render_if back
2023-12-13 09:04:17 -08:00
tomtom-95
6b0f07e94a add decorator to preserve info about original function (#2743) 2023-12-13 09:03:50 -08:00
chenyu
aa4a0de287 simpler Tensor.pow to integer (#2746) 2023-12-13 11:39:20 -05:00
chenyu
26f49869f4 minor tensor type annotation and cleanup (#2742) 2023-12-13 01:53:59 -05:00
chenyu
2ef33abd20 some unary functions cast int input into float (#2740)
* some unary functions cast int input into float

* precision

* image dtype
2023-12-13 00:10:29 -05:00
George Hotz
3e778fcc52 hotfix: *** 2023-12-12 19:44:31 -08:00
Shawn Hagler
51afe938f1 update onnx model links (#2737) 2023-12-12 19:11:11 -08:00
George Hotz
431fae5ed3 hotfix: update_stats cleanup, yellow is nicer than red 2023-12-12 17:50:22 -08:00
chenyu
0869e7a301 update onnx benchmark urls (#2735)
onnx is remapping the models, old ones are in archive/
2023-12-12 20:46:01 -05:00
George Hotz
6d6eb9302d ruff checks the max line length is 150 (#2734)
* ruff checks the max line length is 150

* fix tensor.py

* a lot more

* done
2023-12-12 17:34:47 -08:00
George Hotz
3635540ddb shorter line (#2733) 2023-12-12 15:34:17 -08:00
nimlgen
ede7971ada save some lines (#2731)
* remove unused mem_cached var

* one more
2023-12-12 15:26:27 -08:00
chenyu
00b611c156 simplify type promotion - remove weak types (#2730) 2023-12-12 16:12:57 -05:00
Nguyen Nguyen Phuong
07cf45e133 fix cuda matmul (#2725) 2023-12-12 07:59:31 -08:00
chenyu
ef6e942a23 dtype promotion helpers (#2724)
* dtype promotion helpers

* better tests

* space
2023-12-11 23:14:23 -05:00
Christopher Mauri Milan
0232db294d fix tolist issue (#2723) 2023-12-11 19:14:00 -08:00