* mypy fun
* things are just faster
* running fast
* mypy is fast
* compile.sh
* no gpu hack
* refactor ops_cpu and ops_torch to not subclass
* make weak buffer work
* tensor works
* fix test failing
* cpu/torch cleanups
* no | (or) operator on dict in python 3.8
* that was junk
* fix warnings
* comment and touchup
* remove val expansion
* types for all shapetracker functions
* more typing
* add all the parens to the test
* more types
* fix tests
* very minor speedup
* Refactor getenv into helpers
* Remove unused os
* Fix default value
* Fix more defaults for CI
* Fix bracket
* Revert changes to openpilot/compile.py
* Use getenv from helpers when possible
* factor out opencl runtime
* don't use CL outside the runtime
* cuda runtime adds
* final_dimension
* tests pass with CUDA backend
* more cuda
* cuda simpler
* retain old functionality
* linter and typing
* move GlobalCounters out of runtimes
* oops, GlobalCounters in cuda
* MAX_OUTPUT_SHAPE=3 is fine for CUDA