* fix extract_dataset + tests
* add CI
* sops.gz itself is the same as master
* yml + gzip -c + ge
* don't commit that
* bump limit to 1000
* axis=7
* test_tiny
* refactor count_float4 to take uops as input instead of kernel
* remove some calls to linearize in test_linearizer
* remove some more calls
* remove one more call
* squash commits
* temp fix for const tensor
* actually realizing float16 can only happen in raw_data
* .float -> cast(float) to rerun CI
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
```
training bert
training on ['REMOTE:0', 'REMOTE:1', 'REMOTE:2', 'REMOTE:3', 'REMOTE:4', 'REMOTE:5']
Traceback (most recent call last):
File "/home/uuuvn/src/tinygrad/examples/mlperf/model_train.py", line 1300, in <module>
with Profiling(enabled=getenv("PYPROFILE")): globals()[nm]()
^^^^^^^^^^^^^^^
File "/home/uuuvn/src/tinygrad/examples/mlperf/model_train.py", line 975, in train_bert
for x in GPUS: Device[x]
~~~~~~^^^
File "/home/uuuvn/src/tinygrad/tinygrad/device.py", line 22, in __getitem__
def __getitem__(self, ix:str) -> Compiled: return self.__get_canonicalized_item(self.canonicalize(ix))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/uuuvn/src/tinygrad/tinygrad/device.py", line 28, in __get_canonicalized_item
ret = [cls for cname, cls in inspect.getmembers(importlib.import_module(f'{base}.runtime.ops_{x}')) \
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/uuuvn/src/tinygrad/tinygrad/runtime/ops_remote.py", line 417, in __init__
if not renderer[0].startswith("tinygrad.renderer.") or not renderer[1].endswith("Renderer"): raise RuntimeError(f"bad renderer {renderer}")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: bad renderer ('tinygrad.runtime.ops_null', 'NullRenderer', ())
```
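The check that fires is quoted verbatim in the traceback. Reproduced standalone below (a minimal repro sketch, not a fix), it shows why the renderer spec `('tinygrad.runtime.ops_null', 'NullRenderer', ())` is rejected: the module path has to start with `tinygrad.renderer.`, which the null backend's renderer does not.

```
# standalone repro of the validation in tinygrad/runtime/ops_remote.py quoted above
def check_renderer(renderer: tuple):
  if not renderer[0].startswith("tinygrad.renderer.") or not renderer[1].endswith("Renderer"):
    raise RuntimeError(f"bad renderer {renderer}")

# the spec reported in the traceback fails the module-path prefix test
check_renderer(("tinygrad.runtime.ops_null", "NullRenderer", ()))  # RuntimeError: bad renderer ...
```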
* proposal: add option to override opts in the get_program API
* update test_linearizer_rewrite
* state in uops
* update process_replay and names
* empty isn't none
* fix process replay
* Don't use numpy inside hlb_cifar10 training loop
* Lint it
* jit it
* Drop the last half-batch
* Use gather for random_crop and reuse perms (sketch after this entry)
* Wrap train_cifar in FUSE_ARANGE context
* No need to pass FUSE_ARANGE=1 to hlb_cifar10.py
* Add cutmix to jittable augmentations
* Remove .contiguous() from fetch_batches
* Fix indexing boundary
---------
Co-authored-by: Irwin1138 <irwin1139@gmail.com>
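A minimal sketch of the gather-based random_crop mentioned above, keeping the augmentation in Tensor ops instead of numpy. It assumes `Tensor.gather(dim, index)` with torch-like semantics and an input already padded by `pad` on each side; the helper name and shapes are illustrative, not the actual hlb_cifar10 code (which also reuses the shuffle perms).

```
# hedged sketch: per-image random crop built from two gathers (rows, then columns)
from tinygrad import Tensor, dtypes

def random_crop(x: Tensor, crop: int = 32, pad: int = 4) -> Tensor:
  N, C, H, W = x.shape                                   # x is already padded: H = W = crop + 2*pad
  off_y = (Tensor.rand(N, 1) * (2 * pad + 1)).floor()    # one random row offset per image
  off_x = (Tensor.rand(N, 1) * (2 * pad + 1)).floor()    # one random column offset per image
  base = Tensor.arange(crop).reshape(1, crop)
  idx_y = (base + off_y).cast(dtypes.int32)              # (N, crop) row indices
  idx_x = (base + off_x).cast(dtypes.int32)              # (N, crop) column indices
  x = x.gather(2, idx_y.reshape(N, 1, crop, 1).expand(N, C, crop, W))     # select rows
  x = x.gather(3, idx_x.reshape(N, 1, 1, crop).expand(N, C, crop, crop))  # select columns
  return x
```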
* minor cleanup on test_tensor_core_opts tests
Tests now notify when they are skipped (a sketch of the pattern follows this entry).
Before, they silently skipped if the backend didn't have half precision and accumulation.
Also cleaned up atol and rtol setup.
* refactor test_tensor_core_opts_group
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
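A minimal sketch of the skip-notification pattern described above (the exact capability check is an assumption, not the real test): report the missing backend feature through unittest's skip machinery instead of silently returning.

```
# hedged sketch: skip loudly instead of returning early when the backend lacks tensor cores
import unittest
from tinygrad import Device

class TestTensorCoreOpts(unittest.TestCase):
  def test_tensor_cores_half_acc(self):
    tcs = Device[Device.DEFAULT].renderer.tensor_cores   # empty list if the backend has none
    if not tcs:
      self.skipTest("backend has no half-precision tensor cores")  # shows up as a skip, not a pass
    # ... the actual tensor core opt assertions would go here ...
```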
* change clang -march flag to -mcpu with fp16 disassembly test (sketch at the end of this group)
* fix
* add capstone to macos dependencies
* just check no cast in test
* rm import
* woops
* lets check
* move check
* llvm init before cpu check
* try this
* bump autogen llvm version
* also update libclang?
* revert
* add comment
* skip llvm test and add comment
* linter
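One plausible reading of the "no cast" check in the fp16 disassembly commits above: disassemble the compiled kernel with capstone and assert that no floating-point convert instructions remain. A hedged sketch, assuming an arm64 target and access to the raw machine-code bytes; it is not the actual test.

```
# hedged sketch: use capstone to check that no floating-point convert (fcvt*) instructions survive
import capstone

def assert_no_fp16_casts(lib: bytes):
  md = capstone.Cs(capstone.CS_ARCH_ARM64, capstone.CS_MODE_ARM)
  for insn in md.disasm(lib, 0):
    assert not insn.mnemonic.startswith("fcvt"), f"unexpected cast: {insn.mnemonic} {insn.op_str}"
```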
* move index validation to load/stores
* add name
* add linearizer_failure
* add validate_store with implicit gates (toy sketch at the end of this list)
* linearizer_failure_58 is fixed!
* add test_uop_graph test
* rename cond to gate
* test gated load/stores
* use or_casted()
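To make the "implicit gates" on loads/stores concrete, here is a toy illustration in plain Python (not tinygrad's UOp API): instead of validating an index up front, the load/store itself carries a gate derived from the index expression and only executes, or only returns real data, when the gate holds.

```
# toy illustration of gated load/store semantics; `gate` typically encodes index validity
def gated_store(buf: list, idx: int, val, gate: bool):
  if gate: buf[idx] = val           # the store only happens when the gate is true

def gated_load(buf: list, idx: int, gate: bool, alt=0.0):
  return buf[idx] if gate else alt  # a gated load falls back to an alternative value
```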