* benchmark kernel launch
* don't realize unneeded
* faster
* faster metal
* fix mypy
* new objc message style [pr]
* without sync
* no div 0
* lru cache that
* no sync in the profile
* fix
* update all to new style
* remove comment
* graph one kernel
* fix graph one kernel
* remove that sync
* dsp simulator
* progress
* fix
* close on test tiny
* working
* less waste
* line savings
* Device DSP compiler
* mock DSP at the bottom
* DSP tests
* docker caching
* test update
* need load
* skip that test for CI DSP
* last touch
* ugh
* rename Opt amt to arg
* ignore_beam_cache for test_tiny
* move ignore_beam_cache to test_tiny
* move to separate pr
* revert space change
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* this
* clean up
* more clean ups and improve debug msg
* more correct training toggler
* remove manual training toggling
* change some variable names
* actually just add the training toggle for LIMIT envvar too
* more refinement
* __call__ and OnnxRunner
* fix half of the pylint errors; the other half come from importing from onnx while this file is onnx.py, figure out later
* ahhhh found another mistake
* remove limit from __call__
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* start
* progress
* fixes
* something
* mini fixes
* fix2
* ugh, need this for now
* faster
* cleanups
* tiny linters
* make mypy happier
* test & free pts
* ops
* linter
* cleanup vm
* fix
* remove map_from
* tiny fixes
* add test to ci
* is 67% considered fixed?
* move test up
* share function
* add qgemm too
* make sure qgemm comes out as int
* actually that note is not right
* remove qgemm (I did it wrong) and add it later lol.
* QLinearEverything
* ok ort verify passes
* this should be int instead
* cast to int then char to do wraparound
* cleaner
* move contrib ops to microsoft ops
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* implemented in tensor
* apply onnx tests to asymmetrical pads
* better onnx op ordering
* correct ceil_mode asymmetrical
* fix onnx_ops comments
* a few more TODOs and fix some stupidity
* fix some typing
* fix test
* mypy still a little messed up
* refactor out pad struct transformation
* add simple docs for now
* add whatever tests possible
* add tests for _resolve_pool_pads
* better err msg
* whoops didn't mean to include this
* retry CI
* enable asymmetric pads onnx tests
* better docs
---------
Co-authored-by: chenyu <chenyu@fastmail.com>