diff --git a/docs/README.md b/docs/README.md
new file mode 100644
index 0000000000..052347e02e
--- /dev/null
+++ b/docs/README.md
@@ -0,0 +1,125 @@
+### Welcome to the tinygrad documentation
+
+General instructions can be found in the main [README.md](https://github.com/geohot/tinygrad/blob/master/README.md).
+
+[abstractions.py](https://github.com/geohot/tinygrad/blob/master/docs/abstractions.py) is a well-documented showcase of the abstraction stack.
+
+There are plenty of [tests](https://github.com/geohot/tinygrad/tree/master/test) you can read through.
+[Examples](https://github.com/geohot/tinygrad/tree/master/examples) contains tinygrad implementations of popular vision and language models and other neural networks: LLaMA, Stable Diffusion, GANs and YOLO, to name a few.
+
+### Environment variables
+
+Here is a list of environment variables you can use with tinygrad.
+Most of these are self-explanatory and are used to enable an option at runtime.
+Example: `GPU=1 DEBUG=4 python3 -m pytest`
+
+The columns are: Variable, Value and Description.
+The variables are grouped into general tinygrad variables and variables specific to a file.
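+
+Most of these flags are read with the small `getenv` helper in `tinygrad/helpers.py`, which casts the environment string to the type of the default value. A minimal sketch of that pattern (the defaults shown here are illustrative; the real call sites live in the files listed below):
+
+```python
+# Sketch only: how a tinygrad file typically reads its flags via tinygrad.helpers.getenv.
+from tinygrad.helpers import getenv
+
+DEBUG = getenv("DEBUG")                        # int, 0 when the variable is unset
+GRAPH = getenv("GRAPH")                        # int, 0 or 1
+GRAPHPATH = getenv("GRAPHPATH", "/tmp/graph")  # a string default gives a string result (path here is illustrative)
+
+if DEBUG >= 4:
+  print("DEBUG=4: operations, timings, speed and generated code get printed")
+```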
+
+##### General tinygrad
+
+Variable | Value | Description
+---|---|---
+DEBUG | [1-4] | enable debugging output; with 4 you get operations, timings, speed, generated code and more
+GPU | [1] | enable the GPU backend
+CPU | [1] | enable the CPU backend
+MPS | [1] | enable the MPS device (for Mac M1 and later)
+METAL | [1] | enable the Metal backend (for Mac M1 and later)
+METAL_XCODE | [1] | enable Metal using the macOS Xcode SDK
+TORCH | [1] | enable the Torch backend
+CLANG | [1] | enable the Clang backend
+LLVM | [1] | enable the LLVM backend
+LLVMOPT | [1] | enable LLVM optimization
+LAZY | [1] | enable lazy operations
+OPT | [1-4] | enable optimization
+OPTLOCAL | [1] | enable local optimization
+JIT | [1] | enable the JIT
+GRAPH | [1] | create a graph of all operations
+GRAPHPATH | [/path/to] | where to write the generated graph image
+PRUNEGRAPH | [1] | prune MovementOps and LoadOps from the graph
+PRINT_PRG | [1] | print the generated program
+FLOAT16 | [1] | use float16 instead of float32
+ENABLE_METHOD_CACHE | [1] | enable the method cache
+EARLY_STOPPING | [1] | stop early
+DISALLOW_ASSIGN | [1] | do not assign the realized lazydata to the lazy output buffer
+
+##### tinygrad/codegen/cstyle.py
+
+Variable | Value | Description
+---|---|---
+NATIVE_EXPLOG | [1] | use native exp and log operations
+
+##### accel/ane/2_compile/hwx_parse.py
+
+Variable | Value | Description
+---|---|---
+PRINTALL | [1] | print all ANE registers
+
+##### extra/onnx.py
+
+Variable | Value | Description
+---|---|---
+ONNXLIMIT | [ ] | set a limit for ONNX
+DEBUGONNX | [1] | enable ONNX debugging
+
+##### extra/thneed.py
+
+Variable | Value | Description
+---|---|---
+DEBUGCL | [1-4] | enable debugging for OpenCL
+PRINT_KERNEL | [1] | print OpenCL kernels
+
+##### extra/kernel_search.py
+
+Variable | Value | Description
+---|---|---
+OP | [1-3] | choose between different operations
+NOTEST | [1] | do not test the AST
+DUMP | [1] | dump the intervention cache
+REDUCE | [1] | enable reduce operations
+SIMPLE_REDUCE | [1] | enable simpler reduce operations
+BC | [1] | enable big conv operations
+CONVW | [1] | enable convw operations
+FASTCONV | [1] | enable faster conv operations
+GEMM | [1] | enable general matrix multiply operations
+BROKEN | [1] | enable a kind of operation
+BROKEN3 | [1] | enable a kind of operation
+
+##### examples/vit.py
+
+Variable | Value | Description
+---|---|---
+LARGE | [1] | enable the larger-dimension model
+
+##### examples/llama.py
+
+Variable | Value | Description
+---|---|---
+WEIGHTS | [1] | enable using the weights
+
+##### examples/mlperf
+
+Variable | Value | Description
+---|---|---
+MODEL | [resnet,retinanet,unet3d,rnnt,bert,maskrcnn] | which model to use
+
+##### examples/benchmark_train_efficientnet.py
+
+Variable | Value | Description
+---|---|---
+CNT | [10] | number of times to loop the benchmark
+BACKWARD | [1] | enable the backward pass
+TRAINING | [1] | set Tensor.training
+CLCACHE | [1] | enable the cache for OpenCL
+
+##### examples/hlb_cifar10.py
+
+Variable | Value | Description
+---|---|---
+TORCHWEIGHTS | [1] | use torch to initialize the weights
+DISABLE_BACKWARD | [1] | don't use backward operations
+
+##### examples/benchmark_train_efficientnet.py & examples/hlb_cifar10.py
+
+Variable | Value | Description
+---|---|---
+ADAM | [1] | use the Adam optimizer
+
+##### examples/hlb_cifar10.py & examples/hlb_cifar10_torch.py
+
+Variable | Value | Description
+---|---|---
+STEPS | [0-10] | number of steps
+FAKEDATA | [1] | use random data instead of the real dataset
+
+##### examples/train_efficientnet.py
+
+Variable | Value | Description
+---|---|---
+STEPS | [divisor of 1024] | number of steps
+TINY | [1] | use a tiny convolution network
+IMAGENET | [1] | use ImageNet for training
+
+##### examples/train_efficientnet.py & examples/train_resnet.py
+
+Variable | Value | Description
+---|---|---
+TRANSFER | [1] | use pretrained weights
+
+##### examples & test/external/external_test_opt.py
+
+Variable | Value | Description
+---|---|---
+NUM | [18, 2] | which ResNet[18] / EfficientNet[2] to train
+
+##### test/test_ops.py
+
+Variable | Value | Description
+---|---|---
+PRINT_TENSORS | [1] | print tensors
+FORWARD_ONLY | [1] | use forward operations only
+
+##### test/test_speed_v_torch.py
+
+Variable | Value | Description
+---|---|---
+TORCHCUDA | [1] | enable the torch CUDA backend
+
+##### test/external/external_test_gpu_ast.py
+
+Variable | Value | Description
+---|---|---
+KOPT | [1] | enable kernel optimization
+KCACHE | [1] | enable the kernel cache
+
+##### test/external/external_test_opt.py
+
+Variable | Value | Description
+---|---|---
+ENET_NUM | [-2,-1] | which EfficientNet to use
+
+##### test/test_dtype.py & test/extra/test_utils.py & extra/training.py
+
+Variable | Value | Description
+---|---|---
+CI | [1] | set in CI to skip some tests
+
+##### examples & extra & test
+
+Variable | Value | Description
+---|---|---
+BS | [8, 16, 32, 64, 128] | batch size
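+
+All of the above are plain environment variables, so they can also be set from Python before the tinygrad modules that read them are imported, which is handy in notebooks. A small sketch (the script body is only an example):
+
+```python
+# Roughly equivalent to `GPU=1 DEBUG=2 python3 script.py`.
+# Set the variables before importing the tinygrad modules that read them.
+import os
+os.environ["GPU"] = "1"
+os.environ["DEBUG"] = "2"
+
+from tinygrad.tensor import Tensor
+
+x = Tensor.randn(4, 4)
+print((x @ x.transpose()).numpy())
+```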