* QLinearEverything
* ok ort verify passes
* this should be int instead
* cast to int then char to do wraparound
* cleaner
* move contrib ops to microsoft ops
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
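
A minimal sketch of the "cast to int then char" wraparound mentioned above, assuming the goal is two's-complement narrowing to a signed 8-bit value. `wrap_int8` is a hypothetical helper for illustration, not the code from this change:

```python
import ctypes

def wrap_int8(x: float) -> int:
    """Truncate to int, then wrap into the signed 8-bit range [-128, 127]."""
    i = int(x)  # step 1: cast to int (truncates toward zero)
    # step 2: narrow to 8 bits with two's-complement wraparound,
    # which is what a C-style cast from int to signed char does in practice
    return (i + 128) % 256 - 128

# Cross-check against ctypes, which narrows without overflow checking.
for value in (127.0, 128.0, -129.9, 300.7):
    assert wrap_int8(value) == ctypes.c_int8(int(value)).value
    print(value, "->", wrap_int8(value))
```

Values outside the range wrap rather than saturate, so 128 becomes -128 instead of clamping to 127.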
* add tests
* Refactor
* cache only amd/comgr/build (saves a lot of space)
* fix
* silence warning and add check for cache hit before installing cmake
* run only pytest
* use actions/cache
* lower timeout-minutes and add Device.DEFAULT step
* add nvidia to Device.DEFAULT check
* typo
* fix
* Check only for amd and run only 2 tests
* restrict tensor const ShapeTracker in spec [pr]
* pass sink srcs
* reject if any of the specs disagree
* deceive mypy
* viz
* default to float
* just check the view
* create_schedule is gone
* test_verify_arg is flaky
* use HWInterface in autogen
* mockgpu
* HWInterface
* more HWInterface
* fix
* fix
* old code
* fix
* implicit field definition
* add offset check to mockgpu too
* refactor
* forgot to pass flags + read rewrite
* test
* play with vfio
* nv: this should be kept
* try this
* vfio
* rm overwrite=True
* linter
* do not reinit kfd
* minor
* mypy
* mock
* init them once
---------
Co-authored-by: patrini32 <patrini23@proton.me>
* validate variable dims and fix buffer_parse to not use numpy
* fix var_dim parsing
* gah float16
* revert buffer_parse stuff
* revert that revert
* correct some err msgs
* add some more debug msgs I find helpful
* tensor init noop
* add an assert just for the sake of it.
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* implemented in tensor
* apply onnx tests to asymmetrical pads
* better onnx op ordering
* correct ceil_mode asymmetrical
* fix onnx_ops comments
* a few more TODOs and fix some stupidity
* fix some typing
* fix test
* mypy still a little messed up
* refactor out pad struct transformation
* add simple docs for now
* add whatever tests possible
* add tests for _resolve_pool_pads
* better err msg
* whoops didn't mean to include this
* retry CI
* enable asymmetric pads onnx tests
* better docs
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
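
A rough sketch of how asymmetric pooling pads and `ceil_mode` interact, assuming the ONNX pad layout (all begin pads, then all end pads) and the output-size formula from the ONNX pooling operator docs. Both helper names are hypothetical and are not the `_resolve_pool_pads` implementation:

```python
import math

def onnx_pads_to_pairs(pads):
    """Split ONNX-style pads [b1, ..., bn, e1, ..., en] into [(b1, e1), ..., (bn, en)]."""
    n = len(pads) // 2
    return list(zip(pads[:n], pads[n:]))

def pool_out_size(size, kernel, stride, pad_begin, pad_end, dilation=1, ceil_mode=False):
    """Output length along one spatial axis; begin and end pads may differ."""
    span = size + pad_begin + pad_end - dilation * (kernel - 1) - 1
    rounder = math.ceil if ceil_mode else math.floor
    return rounder(span / stride) + 1

# Asymmetric pads on a 5x5 input: axis 0 padded (0, 1), axis 1 padded (1, 0).
pairs = onnx_pads_to_pairs([0, 1, 1, 0])
for begin, end in pairs:
    print(pool_out_size(5, kernel=3, stride=2, pad_begin=begin, pad_end=end),
          pool_out_size(5, kernel=3, stride=2, pad_begin=begin, pad_end=end, ceil_mode=True))
```

Only the sum of the two pads affects the output size in this formula; the asymmetry matters for where the windows land on the input.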
The first iteration of the AMX fix used a symbol lookup + trampoline
approach, which required this. I later replaced it by marking the
amx function `static`, assuming that a relocation would still be used when
the callee wasn't inlined. This turned out not to be the case,
because the callee can't be moved around by the linker at link time and
can't be overridden by other symbols (`static` means priority + local
visibility).
* init mockcuda
* run gpu ocelot
* fix
* fixes
* disable broken tests
* linter
* these fail as well
* pylint
* mypy
* this fails on real platforms as well
* mypy please
* _padding2d -> _resolve_pool_pads
* rephrase err msg
* even better error msg
* check asymmetric first so people don't hit error twice
* test against torch