* copy tensors to and from gpu
* add on GPU
* adding works
* we stick shapes in
* works on cpu and gpu
* test changes, not passing yet
* something else
* op tests pass
* add, mean, and sum have working forward/backward
* mul ops test
* no gpu support, no problem
* tests pass, clean up later
* gpu cleanup
* cleanup test ops, don't let div fail
* revert more
* simpler dispatcher
* clean up grad
* GPU and
* grad is a Tensor now
* gate test on GPU
* cleanups
* late loading gpu
* GPU as input option
* last cleanups