* BOOM
* cache extra/huggingface/models/
* why max buffer size is not 0
* override MAX_BUFFER_SIZE
* fewer models
* remove more models and change cache dir to already cached dir
* only metal
* less is more?
* remove check ops
* why is this not setting the ENVVAR
* ughhhhh just test in models
* only cpu and gpu
* only cpu actually
* just override it idk
* final
* move extra dependencies up top
* simplification
* fix print
* make README better
* revert ops_disk fix for now
* clean up test_onnx
* remove testing fashion clip model cuz sloooowwwwww
* actually let METAL run this
* fix comment mistake
* fix download path in run_models
* does this work?
* cleanup setup and teardown
* contextvar like this?
* prove model is cached
* do I need to increment DOWNLOAD_CACHE_VERSION?
* see if cached with incremented DOWNLOAD_CACHE_VERSION
* use warnings to see if the model exists
* revert DOWNLOAD_CACHE_VERSION stuff and clean up
* add retry to download
* nit
* file path as input and have parse be in OnnxRunner.__init__
* modelproto_to_onnxrunner -> modelproto_to_runner
* whoops, fix import
* oh flakiness again, is it because it's getting gc-ed?
* small changes
* CI flaky so just move compile4 fix in
* copy typing of onnx_load
* actually can just import onnx_load instead of onnx.load
* fix external_benchmark_openpilot
* fix onnx_runner test to use onnx_helper
* rerun CI
* try run_modelproto
* spam CI a few times
* revert run_modelproto since that's flaky also
* no external onnx_load usage except onnx.py
* cursor tab complete is evil. Snuck a darn sorted in. But does order change result? Why?
* model_benchmark 193s -> 80s, add OnnxRunner.to()...
* minimize diff and clean up
* device can be None, weird but eh
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* add ability to ORT=1
* test_vs_ort
* useless f
* actually have benchmark take in modelproto for more flexibility in huggingface stuff
* ok runs
* good
* oops fix benchmark_onnx __main__
* 224 as default
* add ORT=1 option to huggingface_onnx
* use Tensor to get_input
* add ability to do single onnx model testing
* better names
* merge properly...
* copy in onnx_helpers
* better
* decent script
* need to add debug tool first
* new limit usage
* why did narrowing_error come back..
* pretty decent
* revert validate change
* more ops bug fixes
* revert unnecessary changes
* fix InstanceNorm too
* remove op from O4
* minimize diff
* address old feedback
* unsure of this, just revert
* remove that assert
* working attention
* to_python_const Attention
* can't init from np constant so just do this
* final
* fix bug in attention
* attention clean ups
* add hard TODOs and REPOPATH and TRUNCATE envvar
* fix input_ids default value
* final
* fix scatter
* cleaner _prepare_quantize
* use new attention and tempfile for huggingface script
* more stats
* update
* remove outdated code
* big refactor to something usable by CI
* booooooom
* clean up
* update to using yaml as env var input
* add dry run
* try
* valid pad
* use argparser and fix gather bug
* ignore all yaml
* tiny bit more polish
* woah ignoring all yaml was not right
* typo
* decouple huggingface_onnx_run debug run with huggingface_onnx_download
* bug fix for downloading single model
* WOOOO ok much better
* oops argparse 'required' is an invalid argument for positionals
* add assert
* fix types
---------
Co-authored-by: chenyu <chenyu@fastmail.com>