* subbuffer support
* diskbuffer offset
* cuda subbuffer works
* use subbuffer
* more subbuffer tests
* consecutive
* cast
* consec
* offset
* view is a better name
* offset is in nbytes
* fix view + memory planner
* delete unused DiskRunner
* reverse order
* no subbuffers on unrealized consts
* only enabled for disk
* don't reverse memory
* view supported devices
* pickle buffer view
* ring jit
* support extra view inputs in jit
* fix JIT=2 issue
* test copy jit
* p2p isn't an option anymore
* fix dep tracking issue
* fix mypy
* fix pickle
* from_nv is contents now
* keep old cookie around until next step has been queued (-10ms 6gpu)
* also for eval
* drop cookie before data_get?
* Revert "drop cookie before data_get?"
This reverts commit b01e6aa2b2.
* Revert "Revert "drop cookie before data_get?""
This reverts commit 23464e73d4.
* mlperf/resnet: update beam params to increase time and quality
* revert upcast 8 in search space and add rocm setup function
* refactor to independent setup.sh script
* search: skip timing the unoptimized kernel
also ensure the return the unoptimized kernel if no opts are valid
and refactor debugging to a single BEAM_DEBUG variable
* stop early on fast kernels that can't improve enough
* search: fix edge cases on screening potential ops
won't change correctness, but will save a little python time by
properly deduplicating potential actions
* check for de-duplication instead of exact valid actions
* refactor long line
it did not ask too many details. will put software versions later with tinygrad commit.
```
python3 -m mlperf_logging.system_desc_checker examples/mlperf/training_submission_v4.0/tinycorp/systems/tinybox_red.json training 4.0.0
INFO - System description checker passed for tinybox red
```
```
python3 -m mlperf_logging.system_desc_checker examples/mlperf/training_submission_v4.0/tinycorp/systems/tinybox_green.json training 4.
0.0
INFO - System description checker passed for tinybox green
```
* fix mean underflow for half tensor
divide only the reduce factor. added unit test and non-nan assertion in resnet training. also added a failed test cast for symbolic shape var
* skip for python backend
* kernel: change PADTO check to allow up to 4x padding
also optionally remove PADTO from the search action space with
BEAM_PADTO=0.
* fix test_linearizer test_tensor_cores_padded tests
* update resnet runs to use SPLIT_REDUCEOP=1
* fix up search TC axis and amt checking
* fix up the dimensions of the TC tests