* patch to remove hack from stable_diffusion.py
* sorry linter
* realize after assign?
* float16 broken in llvmlite use float64 for now
* int32
* idiot forgot to change test array dtype
* use correct dtype in Tensor when data is an ndarray
* attempt 2
* add assert to be consistent
* Add test case for ndarray
* Add test case for list
* remove whitespace
* proof of concept for variable replace global load
* small hacks to make faster
* clean up a little?
* linter
* allow substituting with an expression
* clean up a little
* fix everything
* try to fix bug?
* type annotation
* typing
* typing
* 1
* 83 failed
* learning how git works
* lol idk
* zero shape aaaa
* space lol
* aaa
* test check
* haha
* fixed gather
* 73 failing
* 71 failing
* 68 failing
* added some debug
* fking resize
* lol
* 62 failing
* 58 failling fucking did nearest resize hell yeah
* clean up
* 56 failing
* janitor duty
* lol
* 53 failing
* hi mom
* 50 failing
* added linear interp, but coord_trans is wrong
* did lin interpolation woohoo
* 43 failing
* 40 failing
* temporary Gather fix
* 39 failing
* fixed slice onnxver<10
* 37 failing
* 35 failing
* excluded tests that use float64
* 32 failing with hacks
* added _batchnorm() for 3D 5D batchnorm, 29 failing
* changed ALLOWED_KERNEL_COUNT from 199 to 207
* added improved Gather op, reverted ALLOWED_KERNEL_COUNT commit
* support Round op
* added storage_order/indices maxpool, 27 failing
* support maxunpool, 25 failures
* support Gradient, 23 failures
* merged new where
* added Adam
* cleanups
* added Momentum and Nesterov Momentum
* added Adagrad
* support sequence_type, 20 failing
* ugh git
* I give up on cubic interp :D, 9 failing
* sexy 1 liner gather, much improved, wow
* polished gather to make it shine bright like a diamond
* clean 1 liner for gather
* improved readability of gather
* uhh
* clean up
* more clean up
* WHITEspace
* implemented SoftmaxCrossEntropyLoss op
* added comments and cleaned up if statements
* update
* thank based wozeparrot for pow and new GatherElements
* CPU and TORCH all pass | cast float64 -> float32 for all fromCPU()
* _nearest_gather() failing on yolo
* reverted ops_cpu change and added assert in Resize
* added comments for resize for multiple channels
* oops
* merge
* test
* switched np.pad to Tensor.pad for constant padding
* gah
* gah2
* sexy reflect pad with movementops -> add
* delete commented out lines
* edge mode pad sexy as well
* trying out model_benchmark
* revert gitignore change lol
* init
* Revert "init"
This reverts commit 682bf2073a.
* wrote cast workaround for CPU, CPU and TORCH all pass
* wrote cast workaround for CPU, CPU and TORCH all pass
* skipped tests w/ 0 shape for METAL and GPU
* excluded tests for CLANG, CPU, TORCH, CLANG pass
* fixed hacky ConvTranspose
* gotta figure out autopad
* UOps.STORE support cast bool -> float
* small fix for fast gather
* reverted 0 shape skipped tests
* oops missed a file
* added comment
* fixed slice op hack
* First commit to pr
* More trig ops
* More trig ops
* format
* isinf support
* More ops
* changed onnx_ops to use our new gather :D
* Det op bug fix
* rebase
* fixed some tests
* det broken and slow
* fixed compress to use new gather
* implemented argmax argmin
* support variable types in type_proto
* support Upsample and Identity sequence
* we support float64 now and tinygrad support automatic broadcasting
* added EyeLike op
* resize does support multiple channels now actually
* yolov8 onnx runs successfully
* added batch size 1
* oops
* finally fixed type_proto I think
* fixed some llvm bugs
* del whitespaces
* added ZenginU Format PR
* test
* oops
* added float64 exclude tests back
* more skipped tests
* try
* ok openpilot pass
* flake8 pass
* woooooohooo
* revert external_model_benchmark changes
* perf tested gather
* removed promote types from ops_cpu
* numerical errors from 1681 is fixed
---------
Co-authored-by: ZenginU <umutzengin00@gmail.com>