* WebGL WIP
* 84% of ops passing tests
* tests passing 100%
* Cleanup, refactor
* Shave off some lines
* Work on dtypes
* TestOps at 100% again
* EfficientNet shaders compile in browser WebGL2
* Compile all efficientnet shaders in browser
* Create empty textures for tensor buffers
* Run program. Up next: weight loading
* Exported WebGL model working
* Add tests, refactor
* Explicit cast alu for GLSL
* Fix CI tests
* WebGL efficientnet demo
* Compile and run yolov8 in browser
* Fix imports
* Simplify yolo compile
* Fix bool*bool and cast cmplt to float
* More tests
* Do std tests pass on CI?
* Skip std tests on CI
* Remove explicit_cast_alu hack, and solve it in code_for_op
* Move to new dtype-less alloc api
* Remove local size hack: optimize local_size only if device has local
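A minimal sketch of the rule described in the commit above; `has_local` and the per-dimension cap are illustrative assumptions, not tinygrad's actual device API:

```python
# Hedged sketch: only pick a local (workgroup) size when the device supports locals.
def maybe_local_size(global_size, has_local):
    """Return a local workgroup size, or None when the device has no locals."""
    if not has_local:                          # e.g. a WebGL backend has no local workgroups
        return None                            # leave local_size unset instead of hacking one in
    return [min(16, g) for g in global_size]   # illustrative heuristic only
```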
* Remove glsl.py, and move content to cstyle
* dont_use_locals in opts
* Fix dtype tests
* type_map in CStyleLanguage
* Make core changes smaller, cleaner, refactor export_model and demo
* Skip pad_slice
* Simplify: render_const, render_conditional
* Solve bool ALU for other binops, cleaner ops_webgl
* Fix noopt hack
* Remove some skipIfs
* WebGL image hack
* type_names is a better name
* global_max
* Fix dtype import
* Fix type_names -> type_map
* Fix lint
* Remove webgpu, back to 5k lines (#3040)
* remove webgpu
* max 5000 lines
* revert those to master
* retain that cstyle
---------
Co-authored-by: Ahmed Harmouche <ahmedharmouche92@gmail.com>
* return bool
* add tests to the type spec
* fix multinomial
* fix tril
* fix round
* fix NegativeLogLikelihoodLoss
* rm debug
* webgpu
* more webgpu
* bitwise or for adding two bools
* onnx ops don't need to cast anymore
* Revert "bitwise or for adding two bools"
This reverts commit b413babffa.
* workaround for metal neg
* just the tests in the type spec
* test dtypes of return values of cumsum, argmax/min, multinomial
cumsum behaves like sum, and functions that return an index return it in dtypes.default_int
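For illustration, a numpy analogue of the dtype behavior asserted above (not the repo's test code):

```python
import numpy as np

x = np.arange(4, dtype=np.int8)
# cumsum promotes its accumulator the same way sum does
assert np.cumsum(x).dtype == np.sum(x).dtype
# index-returning functions come back in the platform's default index/int dtype
assert np.argmax(x).dtype == np.dtype(np.intp)
```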
* because webgpu is different
* Uncripple dtype tests; TestBFloat16DType never actually runs.
* Fix conversion from/to bfloat16.
Call cast() recursively, so that it works for any type combo.
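A hedged numpy sketch of the idea behind routing bfloat16 conversions through float32 (standard bfloat16 bit layout; this is not tinygrad's cast() implementation):

```python
import numpy as np

def f32_to_bf16_bits(x):
    # bfloat16 is the top 16 bits of a float32 (truncating conversion)
    return (np.asarray(x, dtype=np.float32).view(np.uint32) >> 16).astype(np.uint16)

def bf16_bits_to_f32(b):
    return (b.astype(np.uint32) << 16).view(np.float32)

x = np.array([1.0, -2.5, 3.14159], dtype=np.float32)
print(bf16_bits_to_f32(f32_to_bf16_bits(x)))  # ~x, with bfloat16 precision loss
```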
* Run this test on torch backend as well.
* Add torch.bfloat16.
* Add support for ushort and uint.
* Convert np.uint32 to np.int32 when loading.
* Fix warning.
* least upper float
* don't cast to the same thing
* tests for least_upper_float
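A hedged sketch of the "least upper float" concept; the dtype names and the default promotion target are assumptions for illustration, not tinygrad's actual promotion lattice:

```python
FLOATS = ("half", "bfloat16", "float", "double")

def least_upper_float(dt):
    # float-producing ops (div, sqrt, ...) keep float inputs as-is and promote
    # everything else up to a float; "float" as the default target is an assumption
    return dt if dt in FLOATS else "float"

assert least_upper_float("half") == "half"    # already a float: unchanged
assert least_upper_float("int32") == "float"  # non-float: promoted
```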
* add regression tests to test_dtype_alu
* the call is pretty cheap; a cache is probably too much overhead
* bitcast renderers
* fast llama load
* make it one kernel
* regression testing p1: re-enable test_dtype for all backends
fix GPU
* regression testing p2: fuzz all possible cases against numpy
remove hand-coded tests since the fuzzer covers them
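A hedged sketch of the fuzzing approach described above (not the repo's test_dtype_alu; the op list, dtype subset, and the stand-in backend call are assumptions):

```python
import itertools, random
import numpy as np

DTYPES = (np.int8, np.int32, np.float32)   # illustrative subset
OPS = (np.add, np.subtract, np.multiply)

def fuzz(trials=50):
    for op, a_dt, b_dt in itertools.product(OPS, DTYPES, DTYPES):
        for _ in range(trials):
            a, b = a_dt(random.randint(-5, 5)), b_dt(random.randint(-5, 5))
            ref = op(a, b)   # numpy result is the reference
            out = op(a, b)   # stand-in for the backend under test
            assert out.dtype == ref.dtype and np.allclose(out, ref)

fuzz()
```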
* define ushort
* fix indent; probably need flake8 back for CI to catch this
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* Change linearizer to parse CAST
* One-liner renders for cstyle and triton
* LLVM cast and ALU implementation
* pylint fixes
* cast in gep
* remove printbufs
* use cast for post-load ops
* get rid of parse_cast
* partially supported vectorized dtypes for initial dev
* render phi as the dtype
* Revert "partially supported vectorized dtypes for initial dev"
This reverts commit 1bf1a818a3.
* Revert "render phi as the dtype"
This reverts commit d08cb270b4.
* reenable triton tests
* no vstore_half if dtype is already half
* upcast max
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* fix test ops
* decompose the err from test_ops
* skipTest skips the entire test; we don't want that
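A hedged sketch of the pattern this points at, using unittest.subTest so that skipping one dtype case doesn't abort the whole test method (the dtype names are illustrative):

```python
import unittest

class TestDTypeCases(unittest.TestCase):
    def test_many_dtypes(self):
        for dt in ("float16", "float32", "int8"):
            with self.subTest(dtype=dt):
                if dt == "float16":
                    continue  # skip only this case; self.skipTest() would end the whole method
                self.assertEqual(dt, dt)

if __name__ == "__main__":
    unittest.main()
```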
* handle cases with the same priority
* add int16 to torch map
* refactor unit tests for dtypes
* add missing dtypes in llvmir.py and lib.py
* skip torch tests
* webgpu
* cleaner skips
* fix llvm bool casting issue using compare
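A hedged, plain-Python illustration of the bug class (not the LLVM IR change itself): truncating to one bit keeps only the lowest bit, while comparing against zero gives the intended bool cast:

```python
def cast_bool_truncate(x):
    return bool(x & 1)  # wrong: 2 -> False (only the low bit survives)

def cast_bool_compare(x):
    return x != 0       # right: any nonzero value -> True

assert cast_bool_truncate(2) is False and cast_bool_compare(2) is True
```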
* llvm 100% passing
* llvm segfault
* TEMP decrease timeout mins to 11
debug
* add bf16 to setup
* skip half tests in cuda cpu
* check for CUDACPU instead
* add int16 to triton dtypes
* u16 for triton
* remove debug - diff is still hard to read
* derive from base class TestDType
* enhance test_upcast and downcast by running on every possible version
* dummy commit to rerun the flaky test
* skip the correct tests for CUDA
* bf16 should be skipped in the common TestDType cases
* re-enable bf16
* more consistent structure
* tiny changes to is_dtype_supported 1
* tiny changes 2
add reason
* fuzz
* fuzzer p2
* run fp32 twice
* remove duplicate fp32 run
* clang: use stdbool
* skip triton on bool casts
* merge and resolve conflicts