Commit Graph

73 Commits

Jyotirmaya Mahanta
2ef09ca641 update test_ptr_ne (#3130) 2024-01-15 11:36:29 -05:00
George Hotz
c5a941d466 webgl backend in extra (#3041)
* WebGL WIP

* 84% of ops passing test

* tests passing 100%

* Cleanup, refactor

* Shave off some lines

* Work on dtypes

* TestOps at 100% again

* Efficient net shaders compile in browser webgl2

* Compile all efficientnet shaders in browser

* Create empty textures for tensor buffers

* Run program. Up next weight loading

* Exported WebGL model working

* Add tests, refactor

* Explicit cast alu for GLSL

* Fix CI tests

* WebGL efficientnet demo

* Compile and run yolov8 in browser

* Fix imports

* Simplify yolo compile

* Fix bool*bool and cast cmplt to float

* More tests

* Do std tests pass on CI?

* Skip std tests on CI

* Remove explicit_cast_alu hack, and solve it in code_for_op

* Move to new dtype-less alloc api

* Remove local size hack: optimize local_size only if device has local

* Remove glsl.py, and move content to cstyle

* dont_use_locals in opts

* Fix dtype tests

* type_map in CStyleLanguage

* Make core changes smaller, cleaner, refactor export_model and demo

* Skip pad_slice

* Simplify: render_const, render_conditional

* solve bool alu for other binops, cleaner ops_webgl

* Fix noopt hack

* Remove some skipIfs

* WebGL image hack

* type_names is a better name

* global_max

* Fix dtype import

* Fix type_names -> type_map

* Fix lint

* Remove webgpu, back to 5k lines (#3040)

* remove webgpu

* max 5000 lines

* revert those to master

* retain that cstyle

---------

Co-authored-by: Ahmed Harmouche <ahmedharmouche92@gmail.com>
2024-01-08 09:29:13 -08:00
George Hotz
f432ec9c33 Bitcast hip fix + fix mixtral (#3022)
* fix bitcast in hip

* wrong dtype for precast, double COPY
2024-01-05 14:51:25 -08:00
chenyu
9f39165188 correct (dtype, device) in test_dtype.is_dtype_supported (#3007)
corrected dtypes for TORCH and float64 support
2024-01-04 00:25:37 -05:00
chenyu
ff5399f053 move one last dtype test from test_helpers to test_dtype (#2975) 2024-01-02 12:37:56 -05:00
George Hotz
a280cfe169 move dtypes to dtype.py (#2964)
* move dtypes to dtype.py

* fix urllib
2024-01-01 14:58:48 -08:00
George Hotz
063f465604 simpler webgpu (#2956)
* simpler webgpu

* skip that test
2024-01-01 10:28:59 -08:00
chenyu
54629b56d2 minor cleanup in kernel and linearizer (#2937)
* minor cleanup in kernel and linearizer

fewer long lines, fix spacing, and colocate variables

* no deadline in hypothesis test
2023-12-26 12:05:32 -05:00
qazal
dca5e4fe74 tensor == tensor should be bool (#2916)
* return bool

* add tests to the type spec

* fix multinomial

* fix tril

* fix round

* fix NegativeLogLikelihoodLoss

* rm debug

* webgpu

* more webgpu

* bitwise or for adding two bools

* onnx ops don't need to cast anymore

* Revert "bitwise or for adding two bools"

This reverts commit b413babffa.

* workaround for metal neg

* just the tests in the type spec
2023-12-25 12:38:47 -05:00
chenyu
8a8aed23d2 test dtypes of return values of cumsum, argmax/min, multinomial (#2933)
* test dtypes of return values of cumsum, argmax/min, multinomial

cumsum behaves like sum, and functions that return an index return in dtypes.default_int

* because webgpu is different
2023-12-25 11:33:17 -05:00
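
A sketch of the dtype expectations tested here, under the assumption that cumsum follows sum's upcast rule and the defaults resolve to int32/float32:

```python
from tinygrad.tensor import Tensor
from tinygrad.dtype import dtypes

# cumsum behaves like sum: small int inputs accumulate in at least int32
assert Tensor([1, 2, 3], dtype=dtypes.int8).cumsum(0).dtype == dtypes.int32
# index-returning functions (argmax/argmin) report positions in dtypes.default_int
assert Tensor([3.0, 1.0, 2.0]).argmax().dtype == dtypes.default_int
```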
chenyu
b55b55d56e use at least int32 and uint32 for sum output (#2926)
* use at least int32 and uint32 for sum output

* use the correct type for acc

* fix opencl

* llvm mulacc
2023-12-24 01:14:54 -05:00
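
A hedged sketch of the rule, assuming a backend that supports the small integer dtypes:

```python
from tinygrad.tensor import Tensor
from tinygrad.dtype import dtypes

# accumulating in the input width would overflow quickly, so sum() widens to 32 bits
assert Tensor([100, 100, 100], dtype=dtypes.int8).sum().dtype == dtypes.int32
assert Tensor([1, 2, 3], dtype=dtypes.uint8).sum().dtype == dtypes.uint32
# float inputs keep their dtype
assert Tensor([1.0, 2.0], dtype=dtypes.float32).sum().dtype == dtypes.float32
```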
chenyu
50927defad s/lazydata.realized/lazydata.base.realized/g (#2914)
* s/lazydata.realized/lazydata.base.realized/g

* not that
2023-12-22 14:45:13 -05:00
chenyu
3855432265 don't use numpy to create Tensor(None) (#2909)
* don't use numpy to create Tensor(None)

empty suffices

* parentheses
2023-12-22 01:07:44 -05:00
chenyu
a543d8bea8 fuzz default dtypes for some test_dtype tests (#2906)
* fuzz default dtypes for some test_dtype tests

* ocd

* setUp and tearDown
2023-12-21 22:00:21 -05:00
chenyu
264fe9c93f clean up test_dtype.py (#2827)
make is_dtype_supported a pure function and clean up long lines
2023-12-18 16:06:09 -05:00
chenyu
0723f26c80 dtypes.default_float and dtypes.default_int (#2824) 2023-12-18 12:21:44 -05:00
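A short sketch of what the two new attributes control; the concrete values shown (int32/float32) are the usual defaults, not guaranteed:

```python
from tinygrad.tensor import Tensor
from tinygrad.dtype import dtypes

print(dtypes.default_int, dtypes.default_float)   # typically int32 / float32
# tensors created from python data pick up the defaults
assert Tensor([1, 2, 3]).dtype == dtypes.default_int
assert Tensor([1.0, 2.0]).dtype == dtypes.default_float
```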
chenyu
8aab19ce3d Tensor.full of bool has dtypes.bool (#2823) 2023-12-18 10:51:17 -05:00
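A sketch of the new creation rule, assuming the usual Tensor.full(shape, fill_value) signature:

```python
from tinygrad.tensor import Tensor
from tinygrad.dtype import dtypes

# a bool fill value now yields a bool tensor instead of the default int dtype
assert Tensor.full((2, 2), True).dtype == dtypes.bool
assert Tensor.full((2, 2), 1).dtype == dtypes.default_int
```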
Maksym Sobolyev
887f3d9933 Make torch backend more usable, fix bfloat support in the llvm backend (#2765)
* Uncripple dtype tests; TestBFloat16DType never actually runs.

* Fix conversion from/to bfloat16.

Call cast() recursively, so that it works for any type combo.

* Run this test on torch backend as well.

* Add torch.bfloat16.

* Add support for ushort and uint.

* Convert np.uint32 to np.int32 when loading.

* Fix warning.
2023-12-17 14:04:26 -05:00
chenyu
baa94d6142 Tensor(False) has dtypes.bool (#2805) 2023-12-16 19:04:08 -05:00
chenyu
88ff1edcf0 fix tensor creation with a list and dtype bfloat16 (#2795)
it went through numpy and numpy does not have bfloat16.

also added broadcasted with a python bool.
2023-12-16 10:06:47 -05:00
chenyu
bb6f7b6172 rsqrt is self.reciprocal().sqrt() (#2790)
(1/self) is incorrect for an int tensor
2023-12-16 01:58:05 -05:00
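
A brief sketch of why the formulation matters:

```python
from tinygrad.tensor import Tensor

# rsqrt(x) == 1/sqrt(x); building it as reciprocal().sqrt() keeps the math in floating
# point, whereas a literal (1/self) on an int tensor could hit integer division (1/4 -> 0)
print(Tensor([4.0, 16.0]).rsqrt().numpy())   # [0.5  0.25]
```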
chenyu
c5fa9eb36e int / List[int] data -> dtypes.int32 (#2789) 2023-12-16 01:25:44 -05:00
chenyu
dad4ee4539 use least_upper_dtype mlops to upcast the output type in mlops (#2788)
* InterpretedFlopCounter uses least_upper_dtype for output dtype

* fix target dtype check

* fix that
2023-12-15 23:46:57 -05:00
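
A hedged sketch of the resulting tensor-level behavior: a binary op's output dtype is the least upper dtype of its inputs.

```python
from tinygrad.tensor import Tensor
from tinygrad.dtype import dtypes

a = Tensor([1, 2], dtype=dtypes.int32)
b = Tensor([1.0, 2.0], dtype=dtypes.float32)
# int32 combined with float32 promotes to float32
assert (a + b).dtype == dtypes.float32
assert (a * b).dtype == dtypes.float32
```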
chenyu
1bc378c3d6 _broadcasted handles the python number types (#2785)
* _broadcasted handles the python number types

* disable that test
2023-12-15 22:43:27 -05:00
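
A sketch of what handling the python number types looks like from the tensor API, assuming float32 as the default float and a backend with half support:

```python
from tinygrad.tensor import Tensor
from tinygrad.dtype import dtypes

x = Tensor([1.0, 2.0], dtype=dtypes.float16)
# a bare python constant is broadcast in the tensor's dtype rather than forcing float32
assert (x + 1).dtype == dtypes.float16
assert (x * 2.0).dtype == dtypes.float16
# an int tensor divided by a python float lands in the default float dtype
assert (Tensor([1, 2]) / 2.0).dtype == dtypes.default_float
```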
chenyu
0703075357 bf16 is float (#2786)
* add bfloat16 to is_float check

* and test
2023-12-15 21:41:30 -05:00
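
The check this fixes, in one line:

```python
from tinygrad.dtype import dtypes

assert dtypes.is_float(dtypes.bfloat16)   # bfloat16 now counts as a float dtype
assert not dtypes.is_float(dtypes.int32)
```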
qazal
66f07d97e2 don't auto-cast half to float in unary functions (#2776)
* least upper float

* dont cast to the same thing

* tests for least_upper_float

* add regression tests to test_dtype_alu

* the call is pretty cheap; a cache is probably too much overhead
2023-12-15 10:11:47 -05:00
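
A hedged sketch of the new behavior on a backend with half support: unary float ops keep a half input in half rather than widening it.

```python
from tinygrad.tensor import Tensor
from tinygrad.dtype import dtypes

x = Tensor([1.0, 2.0], dtype=dtypes.float16)
assert x.exp().dtype == dtypes.float16   # previously these were auto-cast to float32
assert x.log().dtype == dtypes.float16
```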
chenyu
66d9eb10b6 arange default dtype to int and zeros/ones default to float (#2769) 2023-12-14 17:53:00 -05:00
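A sketch of the new creation defaults:

```python
from tinygrad.tensor import Tensor
from tinygrad.dtype import dtypes

assert Tensor.arange(4).dtype == dtypes.default_int             # arange defaults to int
assert Tensor.zeros(4).dtype == dtypes.default_float            # zeros/ones default to float
assert Tensor.ones(2, 2).dtype == dtypes.default_float
assert Tensor.arange(0, 1, 0.25).dtype == dtypes.default_float  # float args still give float
```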
chenyu
5235cdee3d remove _arg_int32 internal type (#2767)
in DEFINE_GLOBAL, PtrDType(int32) is a buffer and int32 is an int
2023-12-14 14:17:14 -05:00
chenyu
2ef33abd20 some unary functions cast int input into float (#2740)
* some unary functions cast int input into float

* precision

* image dtype
2023-12-13 00:10:29 -05:00
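
A sketch of the rule; the exact set of functions covered is per the commit, with sqrt/sin shown here as assumed examples:

```python
from tinygrad.tensor import Tensor
from tinygrad.dtype import dtypes

x = Tensor([1, 2, 3], dtype=dtypes.int32)
# float-only unary ops cast an int input to float first
assert x.sqrt().dtype == dtypes.default_float
assert x.sin().dtype == dtypes.default_float
# ordinary integer arithmetic keeps the int dtype
assert (x + x).dtype == dtypes.int32
```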
George Hotz
6d6eb9302d ruff checks the max line length is 150 (#2734)
* ruff checks the max line length is 150

* fix tensor.py

* a lot more

* done
2023-12-12 17:34:47 -08:00
chenyu
00b611c156 simplify type promotion - remove weak types (#2730) 2023-12-12 16:12:57 -05:00
chenyu
ef6e942a23 dtype promotion helpers (#2724)
* dtype promotion helpers

* better tests

* space
2023-12-11 23:14:23 -05:00
Christopher Mauri Milan
0232db294d fix tolist issue (#2723) 2023-12-11 19:14:00 -08:00
chenyu
4075208127 some dtype creation spec test cases (#2722) 2023-12-11 19:33:49 -05:00
qazal
a43bc78804 fix dtypes helpers for integers (#2716)
* scalar

* maybe do this instead

* Revert "scalar"

everything is a scalar

* add tests in test_dtype

* fuzz testing + fix unsigned ints

* fuzz everything
2023-12-11 09:28:19 -08:00
qazal
be09cc87c1 Bitcast support / fast bf16 load (#2011)
* bitcast renderers

* fast llama load

* make it one kernel

* regression testing p1: re-enable test_dtype for all backends

fix GPU

* regression testing p2: fuzz all possible cases against numpy

remove hand-coded tests since the fuzzer covers them

* define ushort

* fix indent, probably need flake8 back for CI to catch

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-12-05 16:19:28 -08:00
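
A hedged illustration of a bitcast versus a value-converting cast, assuming the Tensor.bitcast method exposed around this change:

```python
from tinygrad.tensor import Tensor
from tinygrad.dtype import dtypes

x = Tensor([1.0], dtype=dtypes.float32)
print(x.cast(dtypes.uint32).numpy())     # [1]          -- converts the value
print(x.bitcast(dtypes.uint32).numpy())  # [1065353216] -- reinterprets the bits (0x3f800000)
```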
George Hotz
8c67eb1c92 GPT bugfixes (#2624)
* simple fixes

* fix exp2

* fixed

* parallel beam for CUDA

* fix image dtypes
2023-12-05 11:42:28 -08:00
George Hotz
d87a246439 move to new cached fetch (#2493)
* move to new cached fetch

* extra.utils is over

* loads

* bump download cache

* bump timeout
2023-11-28 17:36:55 -08:00
George Hotz
9e07824542 move device to device.py (#2466)
* move device to device.py

* pylint test --disable R,C,W,E --enable E0611

* fix tests
2023-11-27 11:34:37 -08:00
George Hotz
8ff2e13550 From teeny (#2426)
* changes from teenygrad work

* support not supporting ImageDType/PtrDType

* fixups from teeny
2023-11-24 12:50:56 -08:00
qazal
b6aaf12df7 Internal cast 2 with more tests (#2257)
* Change linearizer to parse CAST

* Oneliner renders for cstyle and triton

* LLVM cast and ALU implementation

* pylint fixes

* cast in gep

* remove printbufs

* use cast for post-load ops

* get rid of parse_cast

* partially supported vectorized dtypes for initial dev

* render phi as the dtype

* Revert "partially supported vectorized dtypes for initial dev"

This reverts commit 1bf1a818a3.

* Revert "render phi as the dtype"

This reverts commit d08cb270b4.

* reenable triton tests

* no vstore_half if dtype is already half

* upcast max
2023-11-10 10:42:39 -08:00
George Hotz
330484c072 Revert "Internal casting support (#2046)" (#2256)
This reverts commit 7e1d08b2ae.
2023-11-09 21:27:13 -08:00
qazal
7e1d08b2ae Internal casting support (#2046)
* Change linearizer to parse CAST

* Oneliner renders for cstyle and triton

* LLVM cast and ALU implementation

* pylint fixes

* cast in gep

* remove printbufs

* use cast for post-load ops

* get rid of parse_cast

* partially supported vectorized dtypes for initial dev

* render phi as the dtype

* Revert "partially supported vectorized dtypes for initial dev"

This reverts commit 1bf1a818a3.

* Revert "render phi as the dtype"

This reverts commit d08cb270b4.

* reenable triton tests

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-09 21:02:32 -08:00
qazal
2465d5d267 fix ops tests in test_dtype (#2237)
* fix test ops

* decompose the err from test_ops

* skipTest skips the entire test; we don't want that

* handle cases with the same priority

* add int16 to torch map
2023-11-09 15:17:43 -08:00
qazal
be5f185ac0 Higher test coverage for dtypes (#2156)
* refactor unit tests for dtypes

* add missing dtypes in llvmir.py and lib.py

* skip torch tests

* webgpu

* cleaner skips

* fix llvm bool casting issue using compare

* llvm 100% passing

* llvm segfault

* TEMP decrease timeout mins to 11

debug

* add bf16 to setup

* skip half tests in cuda cpu

* check for CUDACPU instead

* add int16 to triton dtypes

* u16 for triton

* remove debug - diff is still hard to read

* derive from base class TestDType

* enhance test_upcast and downcast by running on every possible version

* dummy commit to rerun the flakey test

* skip the correct tests for CUDA

* bf16 should be skipped in the common TestDType cases

* re-enable bf16

* more consistent structure

* tiny changes to is_dtype_supported 1

* tiny changes 2

add reason

* fuzz

* fuzzer p2

* run fp32 twice

* remove duplicate fp32 run

* clang: use stdbool

* skip triton on bool casts

* merge and resolve conflicts
2023-10-30 22:38:42 -07:00
qazal
a7439af786 Fix llvm int->bool cast (#2164)
* add to ir

* add test case

* minimize diff

* todo

* enable fast math

* added both False and True case
2023-10-30 15:28:23 -07:00
George Hotz
1bf4aef0f5 fix image dtype cmp (#2089)
* fix image dtype cmp

* print that with debug 3
2023-10-16 17:52:38 -07:00
Ahmed Harmouche
fb4d830a2a Fix cast error in render_load in wgsl (#1956)
* Fix cast error in wgsl

* Use render_cast instead of introducing a new method

* Make it shorter

* Add back webgpu tests: efficientnet and dtypes
2023-10-04 02:29:14 -07:00
George Hotz
a6d842af7a move device to ops (#1646)
* move device to ops

* mlops types

* 2 lines
2023-08-23 08:30:17 -07:00
George Hotz
739f327d2d Shorter (#1582)
* deleting lines

* remove insert dims

* if statement is never hit

* bug fixes
2023-08-20 08:12:16 -07:00