tinygrad

There are plenty of tests you can read through Examples contains tinygrad implementations of popular models (vision and language) and neural networks. LLama, Stable diffusion, GANs and Yolo to name a few

Environment variables

Here is a list of environment variables you can use with tinygrad. Most of these are self-explanatory, and used to enable an option at runtime. Example : GPU=1 DEBUG=4 python3 -m pytest

The columns are: Variable, Value and Description They are also grouped into either general tinygrad or specific files

General tinygrad

DEBUG: [1-4], enable debugging output, with 4 you get operations, timings, speed, generated code and more GPU: [1], enable the GPU backend CPU: [1], enable CPU backend MPS: [1], emable MPS device (for Mac M1 and after) METAL: [1], enable Metal backend (for Mac M1 and after) METAL_XCODE: [1], enable Metal using MacOS Xcode sdk TORCH: [1], enable Torch backend CLANG: [1], enable Clang backend LLVM: [1], enable LLVM backend LLVMOPT: [1], enable LLVM optimization LAZY: [1], enable lazy operations OPT: [1-4], enable optimization OPTLOCAL: [1], enable local optimization JIT: [1], enable Jit GRAPH: [1], Create a graph of all operations GRAPHPATH: [/path/to], what path to generate the graph image PRUNEGRAPH, [1], prune movementops and loadops from the graph PRINT_PRG: [1], print program FLOAT16: [1], use float16 instead of float32 ENABLE_METHOD_CACHE: [1], enable method cache EARLY_STOPPING: [1], stop early DISALLOW_ASSIGN: [1], enable not assigning the realized lazydata to the lazy output buffer

tinygrad/codegen/cstyle.py

NATIVE_EXPLOG: [1], enable using native explog

accel/ane/2_compile/hwx_parse.py

PRINTALL: [1], print all ane registers

extra/onnx.py

ONNXLIMIT: [ ], set a limit for Onnx DEBUGONNX: [1], enable Onnx debugging

extra/thneed.py

DEBUGCL: [1-4], enable Debugging for OpenCL PRINT_KERNEL: [1], Print OpenCL Kernels

extra/kernel_search.py

OP: [1-3], different operations NOTEST: [1], enable not testing ast DUMP: [1], enable dumping of intervention cache REDUCE: [1], enable reduce operations SIMPLE_REDUCE: [1], enable simpler reduce operations BC: [1], enable big conv operations CONVW: [1], enable convw operations FASTCONV: [1], enable faster conv operations GEMM: [1], enable general matrix multiply operations BROKEN: [1], enable a kind of operation BROKEN3: [1], enable a kind of operation

examples/vit.py

LARGE: [1], enable larger dimension model

examples/llama.py

WEIGHTS: [1], enable using weights

examples/mlperf

MODEL: [resnet,retinanet,unet3d,rnnt,bert,maskrcnn], what models to use

examples/benchmark_train_efficientnet.py

CNT: [10], the amount of times to loop the benchmark BACKWARD: [1], enable backward call TRAINING: [1], set Tensor.training CLCACHE: [1], enable Cache for OpenCL

examples/hlb_cifar10.py

TORCHWEIGHTS: [1], use torch to initialize weights DISABLE_BACKWARD: [1], dont use backward operations

examples/benchmark_train_efficientnet.py & examples/hlb_cifar10.py

ADAM: [1], enable Adam optimization

examples/hlb_cifar10.py & xamples/hlb_cifar10_torch.py

STEPS: [0-10], number of steps FAKEDATA: [1], enable to use random data

examples/train_efficientnet.py

STEPS: [1024 dividable], number of steps TINY: [1], use a tiny convolution network IMAGENET: [1], use imagenet for training

examples/train_efficientnet.py & examples/train_resnet.py

TRANSFER: [1], enable to use pretrained data

examples & test/external/external_test_opt.py

NUM: [18, 2], what ResNet[18] / EfficientNet[2] to train

test/test_ops.py

PRINT_TENSORS: [1], print tensors FORWARD_ONLY: [1], use forward operations only

test/test_speed_v_torch.py

TORCHCUDA: [1], enable the torch cuda backend

test/external/external_test_gpu_ast.py

KOPT: [1], enable kernel optimization KCACHE: [1], enable kernel cache

test/external/external_test_opt.py

ENET_NUM: [-2,-1], what EfficientNet to use

test/test_dtype.py & test/extra/test_utils.py & extra/training.py

CI: [1], enable to avoid some tests to run in CI

examples & extra & test

BS: [8, 16, 32, 64, 128], bytesize