Welcome to the tinygrad documentation
General instructions can be found in README.md.
abstraction.py is a well-documented showcase of the abstraction stack.
There are plenty of tests you can read through.
Examples contains tinygrad implementations of popular models (vision and language) and neural networks: LLaMA, Stable Diffusion, GANs and YOLO, to name a few.
Environment variables
Here is a list of environment variables you can use with tinygrad.
Most of these are self-explanatory and are used to enable an option at runtime.
Example: GPU=1 DEBUG=4 python3 -m pytest
Each entry lists the variable, its accepted value(s) and a description. Entries are grouped into either general tinygrad or specific files.
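Because these are ordinary environment variables, the shell prefix in the example above can also be reproduced from Python. The following is a minimal sketch, not part of tinygrad: `run_with_flags` is a hypothetical helper, and it launches a trivial child process instead of pytest so that it stays self-contained.

```python
import os
import subprocess
import sys

def run_with_flags(cmd, **flags):
    """Run cmd with extra environment variables, like `GPU=1 DEBUG=4 cmd` in a shell."""
    env = dict(os.environ)  # start from the current environment
    env.update({name: str(value) for name, value in flags.items()})
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)

# Child process that only echoes the flags it sees.
child = [sys.executable, "-c", "import os; print(os.environ['GPU'], os.environ['DEBUG'])"]
result = run_with_flags(child, GPU=1, DEBUG=4)
print(result.stdout.strip())
```

Replacing `child` with a real command such as `[sys.executable, "-m", "pytest"]` would mirror the shell example, assuming pytest is installed.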
General tinygrad
DEBUG: [1-4], enable debugging output; with 4 you get operations, timings, speed, generated code and more
GPU: [1], enable the GPU backend
CPU: [1], enable the CPU backend
MPS: [1], enable the MPS device (for Mac M1 and after)
METAL: [1], enable the Metal backend (for Mac M1 and after)
METAL_XCODE: [1], enable Metal using the macOS Xcode SDK
TORCH: [1], enable the Torch backend
CLANG: [1], enable the Clang backend
LLVM: [1], enable the LLVM backend
LLVMOPT: [1], enable LLVM optimization
LAZY: [1], enable lazy operations
OPT: [1-4], enable optimization
OPTLOCAL: [1], enable local optimization
JIT: [1], enable the JIT
GRAPH: [1], create a graph of all operations
GRAPHPATH: [/path/to], path where the graph image is generated
PRUNEGRAPH: [1], prune movementops and loadops from the graph
PRINT_PRG: [1], print program
FLOAT16: [1], use float16 instead of float32
ENABLE_METHOD_CACHE: [1], enable method cache
EARLY_STOPPING: [1], stop early
DISALLOW_ASSIGN: [1], do not assign the realized lazydata to the lazy output buffer
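Most of the flags above take small integer values. As a rough illustration of how a script could read such a flag using only the standard library, here is a minimal sketch; `env_int` is a hypothetical helper written for this example, not tinygrad's own implementation, and the fallback default is an assumption chosen by the caller.

```python
import os

def env_int(name, default=0):
    """Return the integer value of an environment variable, or default if it is unset."""
    return int(os.environ.get(name, default))

os.environ["DEBUG"] = "4"      # same effect as DEBUG=4 on the command line
os.environ.pop("JIT", None)    # make sure JIT is unset for this demo

debug_level = env_int("DEBUG")
jit_enabled = env_int("JIT") != 0

if debug_level >= 4:
    print("verbose debugging requested")
print(debug_level, jit_enabled)
```

Treating the value as an integer rather than a boolean lets one variable carry several levels, as with DEBUG and OPT.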
tinygrad/codegen/cstyle.py
NATIVE_EXPLOG: [1], enable using native explog
accel/ane/2_compile/hwx_parse.py
PRINTALL: [1], print all ane registers
extra/onnx.py
ONNXLIMIT: [ ], set a limit for ONNX
DEBUGONNX: [1], enable ONNX debugging
extra/thneed.py
DEBUGCL: [1-4], enable debugging for OpenCL
PRINT_KERNEL: [1], print OpenCL kernels
extra/kernel_search.py
OP: [1-3], different operations
NOTEST: [1], enable not testing ast
DUMP: [1], enable dumping of intervention cache
REDUCE: [1], enable reduce operations
SIMPLE_REDUCE: [1], enable simpler reduce operations
BC: [1], enable big conv operations
CONVW: [1], enable convw operations
FASTCONV: [1], enable faster conv operations
GEMM: [1], enable general matrix multiply operations
BROKEN: [1], enable a kind of operation
BROKEN3: [1], enable a kind of operation
examples/vit.py
LARGE: [1], enable larger dimension model
examples/llama.py
WEIGHTS: [1], enable using weights
examples/mlperf
MODEL: [resnet,retinanet,unet3d,rnnt,bert,maskrcnn], which models to use
examples/benchmark_train_efficientnet.py
CNT: [10], the number of times to loop the benchmark
BACKWARD: [1], enable backward call
TRAINING: [1], set Tensor.training
CLCACHE: [1], enable cache for OpenCL
examples/hlb_cifar10.py
TORCHWEIGHTS: [1], use torch to initialize weights
DISABLE_BACKWARD: [1], don't use backward operations
examples/benchmark_train_efficientnet.py & examples/hlb_cifar10.py
ADAM: [1], enable Adam optimization
examples/hlb_cifar10.py & examples/hlb_cifar10_torch.py
STEPS: [0-10], number of steps
FAKEDATA: [1], use random data
examples/train_efficientnet.py
STEPS: [divisible by 1024], number of steps
TINY: [1], use a tiny convolution network
IMAGENET: [1], use imagenet for training
examples/train_efficientnet.py & examples/train_resnet.py
TRANSFER: [1], use pretrained data
examples & test/external/external_test_opt.py
NUM: [18, 2], what ResNet[18] / EfficientNet[2] to train
test/test_ops.py
PRINT_TENSORS: [1], print tensors
FORWARD_ONLY: [1], use forward operations only
test/test_speed_v_torch.py
TORCHCUDA: [1], enable the torch cuda backend
test/external/external_test_gpu_ast.py
KOPT: [1], enable kernel optimization
KCACHE: [1], enable kernel cache
test/external/external_test_opt.py
ENET_NUM: [-2,-1], what EfficientNet to use
test/test_dtype.py & test/extra/test_utils.py & extra/training.py
CI: [1], set to skip some tests when running in CI
examples & extra & test
BS: [8, 16, 32, 64, 128], batch size