* WebGPU on Windows
* Fix dawn-python install
* New test
* pydeps
* Minor fix
* Only install dawn-python on windows webgpu
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* remove .relu() call in several conv tests in test_ops
testing negative parts double the effectiveness. keep the relu between two convs and the tests that explicitly test relu
* relax tol
* move high level stuff to unit tests [pr]
* process replay on unit tests
* fix pr, less compute
* set omp num threads
* set 200MB buffer size limit
* delete junk
* fix tests
* faster
* move test_indexing to unit
* faster
* remove unnecessary 'argfix' because 'view' is an alias to 'reshape'. all functionality must be inside 'reshape'
* added the same set of unit tests for 'view' as for 'reshape' since 'view' is just an alias for 'reshape'
* improved tests for 'view' op
* probably how cumprod should look like
* update _cumalu to work with MUL
* shorter
* cumprod testing
* clean
* more cleanup
* add cumprod to torch backend.
* make it look like cumsum
* mypy fix
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* why does max_unpool2d feel slower than out.gradient ...
* slightly cleaner
* what happened to ruff
* need to think about this some more
* slightly faster now?
* clean up, 1 more failing edge case
* ok good
* working TINY_BACKEND
* nit doc wording
* retry CI
* wow argmax is so good
* 1 less line
* clean up and better variable names
* is this torch thing right...?
* add more tests
* slap a TODO on it
* clean ups
* prettier looking code and fix ceil mode test
* add return types and some docs
* ok that was a bad example since indices == value, just no example
* poc
* repeated values fail, sigh
* is this being timed out?
* fix up down names
* bitonic v2, does this run?
* bitonic v3, faster
* bitonic v3.1, faster
* bitonic v3.1.1, same speed unlucky
* support dim and indices
* bitonic v3.2, simpler code, TODO repeated indices
* bruv gimme green for once cmon
* cat (stack) implementation, slow but maybe one day when cat is fast meow
* revert to v3.2
* bitonic v4, who let the cats out edition
* clean up variable names
* figured out repeated indices :D
* ruff check --fix
* use sort for topk
* add Tensor.sort everywhere
* fix docs and add some types
* slightly better variable names
* am I doing torch inplace correctly?
* delegate sort to values_stable
* add a contig, faster first sort
* maybe don't test_inplace
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* terrible but somewhat working impl
* linux behaves differently than macos?
* slightly better impl
* small clean up; haven't figured this out yet
* better
* torch has different behavior on linux and macos for duplicated values
* add sum docs
* fix test
* add torch return_type test
* add an exception test
* wrap_fxn instead, and move op lower in order
* better repeated values test
* rerun ci