* move device tests to test/device
* test speedups
* test device
* linalg to unit
* upd
* so pytest just works
* more divide and skip
* speed
* test devectorize
* add pillow
* add kernelize to keccak for each data block
test_long works now. this prevents internal uops from growing proportionally to the data length and eventually becoming too deep
* this?
* hash stuff
* gate test
* mv