* np generates randoms
* hotfix: use generator for int dtype
* float32 as default dtype for float generator
* use np.float32 instead of string
* add dtype= to integers generator
* change import _to_np_dtype source
* add ability to set ORT=1
* test_vs_ort
* useless f
* actually have benchmark take in modelproto for more flexibility in huggingface stuff
* ok runs
* good
* oops fix benchmark_onnx __main__
* 224 as default
* add ORT=1 option to huggingface_onnx
* use Tensor to get_input
* add ability to do single onnx model testing
* better names
* merge properly...
* copy in onnx_helpers
* better
* decent script
* need to add debug tool first
* new limit usage
* why did narrowing_error come back..
* pretty decent
* revert validate change
* more ops bug fixes
* revert unnecessary changes
* fix InstanceNorm too
* remove op from O4
* minimize diff
* address old feedback
* unsure of this, just revert
* remove that assert
* working attention
* to_python_const Attention
* can't init from np constant so just do this
* final
* fix bug in attention
* attention clean ups
* add hard TODOs and REPOPATH and TRUNCATE envvar
* fix input_ids default value
* final
* fix scatter
* cleaner _prepare_quantize
* use new attention and tempfile for huggingface script
* more stats
* update
* remove outdated code
* big refactor to something usable by CI
* booooooom
* clean up
* update to using yaml as env var input
* add dry run
* try
* valid pad
* use argparser and fix gather bug
* ignore all yaml
* tiny bit more polish
* woah ignoring all yaml was not right
* typo
* decouple huggingface_onnx_run debug run with huggingface_onnx_download
* bug fix for downloading single model
* WOOOO ok much better
* oops argparse 'required' is an invalid argument for positionals
* add assert
* fix types
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* sqtt
* docs
* multi-device
* ProfileSQTTEvent
* exec update
* 256mb default
* don't let people hang their gpus
* bitfields from autogen
* asic info from mesa
* more bitfields from autogen
* SQTT_ITRACE_SE_MASK
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* fix leak, realize everything on torch optim step
* only realize a subset
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* add torch inplace tests
* first set of tests passing
* wrap all inplace funcs, add more tests
* fixes and wrap more functions
* fix all uint8 tests to avoid slow tests
* fix the one test
* another test, another fix
* and one more, works for ddp now
* something on contiguous, cleanup
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
* terrible but somewhat working impl
* linux behaves differently than macos?
* slightly better impl
* small clean up; haven't figured this out yet
* better
* torch has different behavior on linux and macos for duplicated values
* add sum docs
* fix test
* add torch return_type test
* add an exception test
* wrap_fxn instead, and move op lower in order
* better repeated values test
* rerun ci
* prep refactor for adding buffer ops last [pr]
* freeze buffers
* add swizzle_reduceop
* shape for reduceop_view_right
* simpler elementwise_view_right
* add shapetracker to const
* only const
* from process replay