* WIP: clean up update stats
* line savings now
* fix graphs
* fix tests
* tighter prints
* remove extra jit=false
* debug=2 means wait
* that won't update stats
* still wait
* init multidevice cuda graph
* cuda just works!
* clean
* linter happier
* linters happy
* update transfer inputs
* do not change free
* useless check for cuda
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* move gpuctypes in tree
* fix mypy
* regex exclude
* autogen sh
* mypy exclude
* does that fix it
* fix mypy
* add hip confirm
* verify all autogens
* build clang2py
* opencl headers
* gpu on 22.04
* initial multitensor jit support and tests
* Added graphs to multitensor jit and updated tests
* update unbind api
* fix set device, add TinyJit to resnet
* update_stats includes device
---------
Co-authored-by: ramenguy99 <ramenguy99@gmail.com>
* cuda with gpuctypes
* hip gpuctypes
* graphs
* rename + linter happy
* use cpu_time_execution
* no ji in build_kernel_node_params
* remove hip_wrapper
* hip fix
* no arc
* small changes
* no clean module in cudacpu