* Revert "Revert "ops rdna""
This reverts commit 0400315078.
* Revert "Revert "writing 2""
This reverts commit 325a3bf2cf.
* no dump
* 2x 2
* simple asm
* local size
* sub
* lil work
* support args != 3
* assembler work
* generate that
* ptx assembler
* begin index renderer
* max
* ptx loops
* gemms work
* valid works
* asm working a bit more
* close
* passing all ops tests
* ptx is a codegen only, not a backend
* ptx
* float16 support
* rdna goes here
* install types
* make amd disassemble
* ansilen for pretty print
* fix ptx log2/exp2
* assemblyinstruction
* new asm
* working gemm
* fix cmp
* more passing
* mod
* ptx works again
* rdna3 add works
* log exp
* sin is sin 2pi
* fix types
* progress
* loops work
* rdna xyz
* better addressing
* cleanups
* handle exception in early process
* div support
* rdna float4
* locals work
* fix neg index
* cast
* smaller diff
* yaml
* import only if selected
* fromimport
* types
* this all needs rewriting
* a few more
* initial commit
* added osx check for opencl
* added llvm f64 conversions
* typo in llvmir
* more tests and modified unsupported error
* fixed linting error
* added pragma fp64
* simplified exclusion for OSX
* fixed device check and also added it to cast func
* added ifdef check for fp16 in ops_gpu
* Revert "added ifdef check for fp16 in ops_gpu"
This reverts commit 92de754d48.
* f64 prekernel signature matches f16
* moved condition to buffer init
* added metal int64 and some simple tests
* removed bool return type def
* typo in test
* also missing in clang and gpu runtimes
* switched order for opencl
* increased atol and removed new line in kernel prefix
* feat: int8 support
* feat: uint8 support
* feat: int8 tests
* fix: fix uint8 on clang
* feat: test casting between int8/uint8/float16/float32
* clean: way cleaner dtype tests
* feat: preprocess_imagenet using the correct dtype
* feat: add test for overflow between uint8 and int8
* no zeroview start
* closer
* stride mask
* st tests pass, delete ZeroView
* byebye zv
* close to working
* not contiguous with mask
* subtract, don't add
* mask on view
* ugh, that shouldn't have been in there
* shape merge
* bugfixes
* fuzzer + 4 fuzzer failures
* fuzzer for symbolic
* more fuzzing and nothing
* that fuzzer doesn't hit either
* fixes padding...ugh
* no more offsets
* working
* rewrite load and store
* all checks
* fix idxs
* progress
* bugfix
* float4_axis
* works
* cleanups
* complex valids_okay
* linearizer outputs something
* working ish
* cstyle codegen
* clang mostly works
* fix load valid
* fix numberless loop
* fancy gen
* working
* fix enet compiler
* cleanups
* float4 upcasting
* fewer lines
* supports_float4
* constant folding
* mulacc
* internet tests flaky in CI
* 90% image support
* fix image generic
* bugs exposed with shapetracker and single view
* new llvm
* use vload, remove OLD
* that's really poorly done
* ended up being more lines