as pointed out by #4877, need to add `__init__.py` to trigger pylint. fixed some errors except ops_python (will do in a separate pr, it has a lot of errors), and sub-folders in runtime
* compile raise CompileError and skip only RuntimeError in multiprocess beam
renderer error with multiprocess should not be skipped by beam
* use `==` for dtype to dtype comparison
* that needs to be is
* typo
* subbuffer support
* diskbuffer offset
* cuda subbuffer works
* use subbuffer
* more subbuffer tests
* consecutive
* cast
* consec
* offset
* view is a better name
* offset is in nbytes
* fix view + memory planner
* delete unused DiskRunner
* reverse order
* no subbuffers on unrealized consts
* only enabled for disk
* don't reverse memory
* view supported devices
* pickle buffer view
* ring jit
* support extra view inputs in jit
* fix JIT=2 issue
* test copy jit
* p2p isn't an option anymore
* fix dep tracking issue
* fix mypy
* fix pickle
* from_nv is contents now
* Fix sm89 PTX=1 compilation
The minimum PTX version that supports sm89 is 7.8 (same version also
supports sm90); without this ptxas fails when running tinygrad with
PTX=1 on RTX 4090.
* Use int(arch[3:]) for forward compat with SM10.0 if that happens
* ptx float4 implementation
* remove from cache when trimming uops
* Gate for float4
* Linting fix
* disable test reasonable time for ptx
* import getenv
* Update uops.py
* linter
* Add div test for half
* upcast if op does not support operation
* fix offset
* Run only if dtype supported
* zero out registers when accessing by pred + cleanup
* Remove trailing whitespace
* revert
* spacing fix
* move cache clearing outside loop
* did this suddenly start working?
* unused import removed
* Remove cast
* Use pattern matching
* linting
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* move gpuctypes in tree
* fix mypy
* regex exclude
* autogen sh
* mypy exclude
* does that fix it
* fix mypy
* add hip confirm
* verify all autogens
* build clang2py
* opencl headers
* gpu on 22.04
* compile cache for several devices
* ops_gpu uses hash to not care about sql
* hip rdna test with device
* linter happy
* no device passed where possible
* arch is optional to compile_{hip|cuda}