
# Adding a new accelerator to tinygrad

It's pretty easy to add a new accelerator to tinygrad. All you need to do is implement a total of 26 (optionally 27) low-level ops; tinygrad takes care of the rest, handling derivatives and syntactic sugar.

## llops

These are the ops that you must implement for your accelerator of choice. Compiled accelerators do not need to implement movement_ops, since those are handled by the ShapeTracker.

```
Buffer                                                       # class of memory on this device
unary_op  (NOOP, EXP2, LOG2, CAST, SIN, SQRT)                # A -> A
reduce_op (SUM, MAX)                                         # A -> B (smaller size, B has 1 in shape)
binary_op (ADD, SUB, MUL, DIV, CMPEQ, MAX)                   # A + A -> A (all the same size)
movement_op (EXPAND, RESHAPE, PERMUTE, PAD, SHRINK, STRIDE)  # A -> B (different size)
load_op   (EMPTY, RAND, CONST, FROM, CONTIGUOUS, CUSTOM)     # -> A   (initialize data on device)
fused_op [[optional]] (MULACC)                               # A * A -> B
```
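
For an interpreted backend, most llops map almost directly onto a host library. Here is a minimal sketch of the elementwise and reduce groups, assuming NumPy as the host; `MyBuffer` and the dispatch tables are hypothetical illustrations, not tinygrad's actual interface (CAST is omitted since it also needs a target dtype):

```python
import numpy as np

# Hypothetical buffer class for a NumPy-backed device (illustration only).
class MyBuffer:
  def __init__(self, data: np.ndarray): self.data = data

# unary_op: A -> A, elementwise on a single buffer
def unary_op(op, a):
  fxn = {"NOOP": lambda x: x, "EXP2": np.exp2, "LOG2": np.log2,
         "SIN": np.sin, "SQRT": np.sqrt}[op]
  return MyBuffer(fxn(a.data))

# binary_op: A + A -> A, both inputs already the same shape
def binary_op(op, a, b):
  fxn = {"ADD": np.add, "SUB": np.subtract, "MUL": np.multiply, "DIV": np.divide,
         "CMPEQ": lambda x, y: (x == y).astype(x.dtype), "MAX": np.maximum}[op]
  return MyBuffer(fxn(a.data, b.data))

# reduce_op: A -> B, reducing every axis where new_shape has a 1
def reduce_op(op, a, new_shape):
  axes = tuple(i for i, (s, n) in enumerate(zip(a.data.shape, new_shape)) if s != n)
  fxn = {"SUM": np.sum, "MAX": np.max}[op]
  return MyBuffer(fxn(a.data, axis=axes, keepdims=True))
```

load_ops follow the same dispatch pattern, while a compiled backend would instead render these ops to kernel source through its codegen.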

## mlops

These are the mid-level ops that handle the derivatives.

```
Relu, Log, Exp, Sin                            # unary ops
Sum, Max                                       # reduce ops (with axis argument)
Maximum, Add, Sub, Mul, Pow, Div, Equal        # binary ops (no broadcasting, use expand)
Expand, Reshape, Permute, Pad, Shrink, Flip    # movement ops
```

These are implemented in mlops.py.
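
The pattern each mlop follows is a forward pass plus a backward pass that computes the derivative, both expressed in terms of llops. A hedged sketch of that shape, with NumPy standing in for the llop layer (the real classes in mlops.py carry a context and dispatch to the device's llops):

```python
import numpy as np

# Sketch of the mlop forward/backward pairing; NumPy stands in for llops.
class Relu:
  def forward(self, x):
    # forward: a binary MAX llop against a zero constant
    self.ret = np.maximum(x, 0)
    return self.ret

  def backward(self, grad_output):
    # backward: the gradient passes through only where the output is positive
    return grad_output * (self.ret > 0).astype(grad_output.dtype)
```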

## hlops

These are the syntactic sugar. They are built on top of the mlops and support most of the things you would expect from a tensor library.

These are implemented in tensor.py.
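
For a feel of how much this sugar can build from so few primitives, here is a sketch of matmul written in that style, with NumPy broadcasting standing in for the explicit reshape/expand movement ops (this `matmul` is an illustration, not tinygrad's tensor.py code):

```python
import numpy as np

# matmul as hlop-style sugar: reshape, a broadcasted elementwise mul
# (reshape + expand at the mlop level), and a Sum reduce
def matmul(a, b):
  n, m = a.shape
  m2, p = b.shape
  assert m == m2
  # (n, 1, m) * (1, p, m) -> (n, p, m), then reduce the shared axis
  return (a.reshape(n, 1, m) * b.T.reshape(1, p, m)).sum(axis=2)

a, b = np.arange(6.).reshape(2, 3), np.arange(12.).reshape(3, 4)
assert np.allclose(matmul(a, b), a @ b)
```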