tinygrad has four pieces:

* frontend (Tensor -> LazyBuffer)
  * See tensor.py, function.py, multi.py, and lazy.py
  * The user interacts with the Tensor class
  * This outputs LazyBuffers, which form the simple compute graph (see the frontend sketch after this list)
* scheduler (LazyBuffer -> ScheduleItem)
  * See engine/schedule.py
  * When a Tensor is realized, the scheduler is run to get its LazyBuffers computed
  * It takes in LazyBuffers and groups them into kernels as appropriate (see the scheduler sketch below)
  * It returns a list of ScheduleItems, plus all the Variables used in the graph
* lowering (TODO: lots of work to clean this up still)
  * See codegen/ (ScheduleItem.ast -> UOps)
  * ScheduleItems have an ast that's compiled into actual GPU code
  * Many optimization choices can be made here; this contains a beam search (see the beam search sketch below)
  * renderer/compiler (UOps -> machine code)
    * UOps are tinygrad's IR, similar to LLVM IR
    * Here we either convert them to a high-level language or to machine code directly
  * See engine/realize.py (ScheduleItem -> ExecItem)
* runtime
  * See runtime/
  * The runtime actually interacts with the GPUs (see the runtime sketch below)
  * It manages Buffers, Programs, and Queues
  * Sadly, METAL and GPU (OpenCL) don't have compilers that can be pulled out of the device itself
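
A minimal frontend sketch (assuming a tinygrad version where Tensor exposes its backing graph node as .lazydata): the ops below only build LazyBuffers, and nothing is computed until .numpy() forces a realize.

```python
from tinygrad import Tensor

a = Tensor([1.0, 2.0, 3.0])
b = Tensor([4.0, 5.0, 6.0])
c = (a + b) * 2     # c is unrealized: a Tensor wrapping a LazyBuffer graph
print(c.lazydata)   # the LazyBuffer at the root of c's compute graph
print(c.numpy())    # realization happens here -> [10. 14. 18.]
```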
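
To see what the scheduler produces without executing anything, Tensor.schedule() stops after scheduling. A rough sketch (the exact contents of ScheduleItem.ast vary between versions):

```python
from tinygrad import Tensor

out = (Tensor.rand(4, 4) @ Tensor.rand(4, 4)).sum()
schedule = out.schedule()   # list of ScheduleItems, roughly one per kernel
for si in schedule:
    print(si.ast)           # the AST that lowering turns into UOps
```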
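
The beam search is opt-in. One way to poke at lowering, assuming the Context helper and the BEAM/DEBUG context variables behave as in current tinygrad (setting BEAM=2 DEBUG=4 as environment variables should do the same thing):

```python
from tinygrad import Tensor
from tinygrad.helpers import Context

# BEAM=2 beam-searches the optimization choices for each kernel;
# DEBUG=4 prints the generated kernel source so you can see the result.
with Context(BEAM=2, DEBUG=4):
    (Tensor.rand(256, 256) @ Tensor.rand(256, 256)).realize()
```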
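
The runtime can also be driven directly, without Tensors. A sketch of the Buffer API; treat the exact names (Buffer, allocate, copyin, as_buffer) as assumptions to check against tinygrad/device.py in your version:

```python
from tinygrad import Device, dtypes
from tinygrad.device import Buffer  # assumption: Buffer lives in tinygrad/device.py

# allocate 4 int32s on the default device, upload from host, read back
buf = Buffer(Device.DEFAULT, 4, dtypes.int32).allocate()
buf.copyin(memoryview(bytearray((1).to_bytes(4, "little") * 4)))
print(buf.as_buffer().cast("i").tolist())  # -> [1, 1, 1, 1]
```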