* local metal on metal in uop syntax
* TODO: just put the axis_info in the kernelinfo
* local
* amd_matmul works @ 28 TFLOPS
* clean up matmul
* kernel8 works
* remove that
* locals
* axistype innovation
* work
* cleanup
* kernel3 regs
* cleanup kernel3
* work
* why is it broken
* no beam
* reenable
* permutes