* why does max_unpool2d feel slower than out.gradient ...
* slightly cleaner
* what happened to ruff
* need to think about this some more
* slightly faster now?
* clean up, 1 more failing edge case
* ok good
* working TINY_BACKEND
* nit doc wording
* retry CI
* wow argmax is so good
* 1 less line
* clean up and better variable names
* is this torch thing right...?
* add more tests
* slap a TODO on it
* clean ups
* prettier looking code and fix ceil mode test
* add return types and some docs
* ok that was a bad example since indices == value, just no example
* terrible but somewhat working impl
* linux behaves differently than macos?
* slightly better impl
* small clean up; haven't figured this out yet
* better
* torch has different behavior on linux and macos for duplicated values
* add sum docs
* fix test
* add torch return_type test
* add an exception test
* wrap_fxn instead, and move op lower in order
* better repeated values test
* rerun ci
* add DynamicDequantizeLinear and corresponding tests
* wow qlinearops are round away from zero
* this passes locally...
* again
* try
* try separate test
* round to even again
* also add QLinearMul
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* boom
* fix webgpu
* use exact variable names in test so that AI can read easier
* add tag for specific test name like test a specific dtype
* fix ruff
* astype everything
* dtype in array creation
* just arange
* is 67% considered fixed?
* move test up
* small cleanups
* share function
* add qgemm as well
* add qgemm too
* make sure qgemm comes out as int
* take out qgemm for now
* fixed test
* add correct qgemm
* addressing feedback here too, early naive fix for now
* simplify bias and c to be minimalistic enough to test correctness
* refactored qlinearops
* maybe these asserts aren't the best..
* fix test
* updated tests to cover new ops
* try to add to CI
* move test_onnx_ops into testextra/
* more attention tests
* qlinear_add atol=1
* attention still not fullllllly correct
* it is what it is
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* pytorch scatter -> scatter_reduce
* WIP scatter_reduce implementation
* _pre_scatter return type hint
* split out src, mask to satisfy linter
* Add src cast back in
* dict of lambdas instead of ifs
* sum and prod reduction ops with include_self
* add reduce arg error message
* add amax and amin reduction ops
* Fix include_self for higher dims
* Simplify
* Simplify amax and amin too
* Pull include_self logic out into _inv_mask function
* reduce arg cannot be None for scatter_reduce
* Fix self-mask issue
* Add mean reduce op
* Add tests
* any() not needed here
* remove comment
* End support for Tensor src with reduce arg in tinygrad scatter
* Process index, dim inside actual functions
* Add scatter_reduce to onnx
* Add excluded onnx ScatterElements reduction tests back in
* Save 2 lines on the mask helpers
* Update docs
* Add include_self=False tests
* cleanup
* Remove unneeded helper function
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* this
* clean up
* more clean ups and improve debug msg
* more correct training toggler
* remove manual training toggling
* change some variable names
* actually just add the training toggle for LIMIT envvar too
* more refinement
* __call__ and OnnxRunner
* fix half pylint, other half is importing from onnx while this file is onnx.py, figure out later
* ahhhh found another mistake
* remove limit from __call__
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* validate variable dims and fix buffer_parse to not use numpy
* fix var_dim parsing
* gah float16
* revert buffer_parse stuff
* revert that revert
* correct some err msges
* add some more debug msgs I find helpful
* tensor init noop
* add an assert just for the sake of it.
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
it's a python style mod. possibily can be cleaner with a floor div
relaxed the vmin for MOD slightly for cstyle negatives mod, it's more correct and might fix other bugs
* move to_python_const out
* move more over
* try deleting alternative gather implementation
* Revert "try deleting alternative gather implementation"
This reverts commit d46b30b717.
* add types to onnx ops
* better debug msg
* improve some com.microsoft too
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* 1 is simpler than 2
* variable name
* change error wording
* shapes for sequence type must be homogeneous
* bug fix for model benchmark
* fix comments too
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* start
* simplify ops
* why did this not work before
* will split buffer parse to separate pr
* flip the error order
* only this much for now
* to_python_const clean up
* minimize diff
* move tensor_methods into onnx.py
* improve some type signatures
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* simple clean ups first
* more work
* kinda have adam
* ooo momentum worked nicely
* almost there
* wow.. is the onnx test wrong
* nicer optim stuff
* just skip that test
* small comment changes
* use naming convention from other parts of codebase
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
replaced all dtype.np with _to_np_dtype defined in tensor.py.
after this, the only numpy usages are (1) Tensor(np.ndarray), (2) construct .numpy() output, (3) numpy random buffer
* add onnx test_reduce_log_sum_exp
* more reuse
* more
* stuff
* good CenterCropPad
* imports
* good ArrayFeatureExtractor
* pretty good Pad
* stuff
* stuff
* onnx.py
* Atan
* pass int8 test
* dtype related
* fastmath stuff
* Resize linear
* fix CI
* move back
* updated most dtype hacks in onnx_ops
* temporarily revert dequantizelinear change
* I think this is right...
* MORE FIXES WOOOO NEW DTYPE IS AWESOME
* ok
* oops missed a print
* half -> float32 for CI
* is npdtype
* some more
* fix if ordering
* more clean ups
* final cleanups
* casting to half not allowed
* k nvm
* revert ArgMax change
* only GPU
* llvm begone
* teeny tiny change
* fix: attempt to add cast tests
* try this
* fix dequantizelinear
* revert some stuff
* tests pass pls
* less lines in onnx_tests
* oops missed string tensor tests
* clean up
* try: revert default behavior changes
* fix: disabled Cast and Castlike tests
* docs: small changes
* fix: fixed isNaN op and enabled associated tests
* fix: forgot about float16
* done
* update disabled test
* gah missed another float16
* disable rest of failing tests
* rm extra line
* try...
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* some cleanup
* move continue back
* more more more
* added to CI
* try
* try intentionally break some tests
* wtf
* del True for test
* yay tests broke, now pls no break
* try AGAIN
* gahy
* lol
* try
* move over constant
* moved over MORE
* move shrink over
* trailing lines
* try CUDA CI
* try again
* boom
* oops
* improved comments
* try: disable some flags and disable CUDA
* try breaking tests
* traceback has too much info so add --tb=no
* revert forced CI failure
* add comments and del unused imports
* oooooooo using regular debug try enable tb
* intentionally break tests
* added tb back. Maybe not too verbose
* strip whitespcae
* missed something
* Shape op int32 -> int64
* oops missed something
* add some types
* get rid of crazy 1 liners in pad op
* actually test Split this time LOL
* strip that whitespace