* WebGL WIP
* 84% of ops passing test
* tests passing 100%
* Cleanup, refactor
* Shave off some lines
* Work on dtypes
* TestOps at 100% again
* Efficient net shaders compile in browser webgl2
* Compile all efficientnet shaders in browser
* Create empty textures for tensor buffers
* Run program. Up next weight loading
* Exported WebGL model working
* Add tests, refactor
* Explicit cast alu for GLSL
* Fix CI tests
* WebGL efficientnet demo
* Compile and run yolov8 in browser
* Fix imports
* Simplify yolo compile
* Fix bool*bool and cast cmplt to float
* More tests
* Do std tests pass on CI?
* Skip std tests on CI
* Remove explicit_cast_alu hack, and solve it in code_for_op
* Move to new dtype-less alloc api
* Remove local size hack: optimize local_size only if device has local
* Remove glsl.py, and move content to cstyle
* dont_use_locals in opts
* Fix dtype tests
* type_map in CStyleLanguage
* Make core changes smaller, cleaner, refactor export_model and demo
* Skip pad_slice
* Simplify: render_const, render_conditional
* solve bool alu for other binops, cleaner ops_webgl
* Fix noopt hack
* Remove some skipIfs
* WebGL image hack
* type_names is a better name
* global_max
* Fix dtype import
* Fix type_names -> type_map
* Fix lint
* Remove webgpu, back to 5k lines (#3040)
* remove webgpu
* max 5000 lines
* revert those to master
* retain that cstyle
---------
Co-authored-by: Ahmed Harmouche <ahmedharmouche92@gmail.com>
* add Tensor.split (#2677)
* fix mypy errors
* add list support for Tensor.split
* fix ruff comments
* match tensor.split api
* simplify split and test_split
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* space removal in formula and a single test to cover it
* space in torch einsum as well
* replacing spaces in a var formula to support truncating all the spaces
* lazy rewrite, try 2
* min fix tests
* pass contig test
* put broken pads back
* move that to realize
* no contig child fixes array packing
* so wrong
* now that's correct
* base children
* fix bind issues
* disable to_image_idx
* fix tests
* that failure shouldn't break other tests
* more fixes
* fix torch
* skip failing tests in CI
* 1e-7
* half is broken
* 1e-6 margin of error
* hopeful impl for Tensor.einsum
* satisfy mypy by having less typing. :(
* a few simple tests
* even more tests
* permute tests
* xfails for improper usage
* fix LLVM test fail
* use argfix
* more helpful error message on shape mismatch
* new getitem
* go
* add temporary simple tests
* better
* comments
* WOW that took awhile
* save 1 line lol
* work
* still need to add comprehensive tests, but i think getitem looks nice :D
* GIMME GREEN CI CHECKMARK PLS
* try..
* k idk
* added tests for errors
* fixed small hack
* added tests
* almost good
* try no contig?
* yay no more contig + comments and spacing
* finishing touches (comments)
* revert regex unittests lol
* add suggested change
* oops I fell asleep yesterday
* pad slice test cases, many failing
* fix failing test cases
check mask if we are outside the base buffer
also create a multi-view if in that case we reshape to an empty shape
* real_offset calculation more readable
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* torch and numpy don't share ops anymore
* that should be filtered out elsewhere
* still const
* graph + enet example cleanup
* hmm, we do still need it because of symbolic
* add back as_strided, move rebuilt mops to extra
* negative stride for ops_cpu
* Revert "negative stride for ops_cpu"
This reverts commit a13b6815ac.
* skip that
* style
* force rebuild of ocelot
* SzymonOzog gpuocelot
* delete that
* downgrade that
* non parallel
* force rebuild
* use llvm
* nauto
* less mem maybe
* print test
* helper_test_exception skip CUDACPU
* helper_test_exception
* shippable
* zero in shape start
* no assert for that
* if output size is 0, return without exec
* tweak
* strides
* reduce over non-zero
* shrink and expand
* fix import
* test_elementwise where
* cannot reshape from size 0 to size 1
* compiled backend reduce over 0
* zeros for numpy
* reduce over 0 and keepdim resulted in 1
* reduce empty set default values
* compare with same input
* pad test case
* cat test case
* torch does not support that?