* use full_shape to determine if index can potentially overflow * update comment * use shapetracker to check max index value * wip * lint * handle mask * upcast to int64 by st is noop on WGSL * fix comments * Handle negative overflow, intermediaries overflow, int64 support handle negative overflow handle symbolic wip handle intermediate values wip check if typemap support int64 lint comment * add invalid_dtype lint * Fix bug on checking mask overflow wip wip * Add more tests, need to resolve partial upcast test Valid_view_dup test valid op overflow refine test cases clean up cleanup wip refine tests lint * Upcast is handled by lower_load_store upcast as graph_rewrite to backtrack update test wip cleanup wip cleanup do upcast in lower_load_store lint * cleanup * do upcast within lower_load_store and mutate ctx * do upcast in get_idx and view revert lint * cleanup * Upcast in vec, const upcast to const test case 3 upcast on vector lint * simplify idx with symbolic in case of fake overflow test case4 test case 4 update test * test case4 is only for metal * try: upcast inside graph_rewrite instead of shapetracker wip * checking overflow can just be done directly on all views, with idxs * cleanup * REMOVE hard coded uop test for idx upcast * refactor cleanup refactor * do actual casting when necessary, instead of rewriting all idx hard code uop test new upcast * check dtype for int64 in webgpu * cleanup cleanup * cleanup * update tests cleanup comment cleanup cleanup * comment * comment * update comment update comment * refactor * typo * keep the scope to only upcasting * white space * Revert "white space" This reverts commit314d7eb184. * Revert "keep the scope to only upcasting" This reverts commit1ef701dd85. * sym folding is not necessary lint1 * fold symbolic lint * use symbolic simple when folding shapetracker idx * full sym folding is required after all... * Ops.CAST should retain the src min max * put rewrite to lowerer wip * start testing on higher level wip test higher level in test_tensor * find Ops.STORE in list instead of recursively * check dtype support when upcasting * remove invalid_dtype * lint * fix int64 support checks in upcast lint * skipif skipunless * revert fold to find test case * Revert "revert fold to find test case" This reverts commit225bb6e801. * test sym folding * handle ptx * wip * wip * delete hard coded uop test * lint fixes * wip * fix checking for None * lint * handle ptx * comment * dtype for overflow() * update skipIf skipUnless * assert in wgsl renderer for int64 wip * do folded_upcast in to_indexed_op, real_size uses views_to_indexed_ops * assert in lowerer for dtype support lint * Revert "assert in lowerer for dtype support" This reverts commit8e9b1b79bf. * assert dtype in kernel.py * Revert "assert dtype in kernel.py" This reverts commite29b9a9893. * wip * assert in render * remove old assert * check dtype from rendere, assert in upcast wip * smaller arange for sym fold case * linearize directly * use expand directly * lint * lint * rename * no need to check dtype in device.py * trigger pr * remove dtype assert in upcast, make wgpu fail in render * use DType for type hint instead of dtypes * assert on KeyError in tests for webgpu backend int64 * use a tuple for src * test real kernel run wip * lint error * restore * fix real_size * update test example * resolve merge stuff --------- Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.mail>
tinygrad: For something between PyTorch and karpathy/micrograd. Maintained by tiny corp.
This may not be the best deep learning framework, but it is a deep learning framework.
Due to its extreme simplicity, it aims to be the easiest framework to add new accelerators to, with support for both inference and training. If XLA is CISC, tinygrad is RISC.
tinygrad is still alpha software, but we raised some money to make it good. Someday, we will tape out chips.
Features
LLaMA and Stable Diffusion
tinygrad can run LLaMA and Stable Diffusion!
Laziness
Try a matmul. See how, despite the style, it is fused into one kernel with the power of laziness.
DEBUG=3 python3 -c "from tinygrad import Tensor;
N = 1024; a, b = Tensor.rand(N, N), Tensor.rand(N, N);
c = (a.reshape(N, 1, N) * b.T.reshape(1, N, N)).sum(axis=2);
print((c.numpy() - (a.numpy() @ b.numpy())).mean())"
And we can change DEBUG to 4 to see the generated code.
Neural networks
As it turns out, 90% of what you need for neural networks are a decent autograd/tensor library. Throw in an optimizer, a data loader, and some compute, and you have all you need.
from tinygrad import Tensor, nn
class LinearNet:
def __init__(self):
self.l1 = Tensor.kaiming_uniform(784, 128)
self.l2 = Tensor.kaiming_uniform(128, 10)
def __call__(self, x:Tensor) -> Tensor:
return x.flatten(1).dot(self.l1).relu().dot(self.l2)
model = LinearNet()
optim = nn.optim.Adam([model.l1, model.l2], lr=0.001)
x, y = Tensor.rand(4, 1, 28, 28), Tensor([2,4,3,7]) # replace with real mnist dataloader
with Tensor.train():
for i in range(10):
optim.zero_grad()
loss = model(x).sparse_categorical_crossentropy(y).backward()
optim.step()
print(i, loss.item())
See examples/beautiful_mnist.py for the full version that gets 98% in ~5 seconds
Accelerators
tinygrad already supports numerous accelerators, including:
And it is easy to add more! Your accelerator of choice only needs to support a total of ~25 low level ops.
To check default accelerator run: python3 -c "from tinygrad import Device; print(Device.DEFAULT)"
Installation
The current recommended way to install tinygrad is from source.
From source
git clone https://github.com/tinygrad/tinygrad.git
cd tinygrad
python3 -m pip install -e .
Direct (master)
python3 -m pip install git+https://github.com/tinygrad/tinygrad.git
Documentation
Documentation along with a quick start guide can be found on the docs website built from the docs/ directory.
Quick example comparing to PyTorch
from tinygrad import Tensor
x = Tensor.eye(3, requires_grad=True)
y = Tensor([[2.0,0,-2.0]], requires_grad=True)
z = y.matmul(x).sum()
z.backward()
print(x.grad.tolist()) # dz/dx
print(y.grad.tolist()) # dz/dy
The same thing but in PyTorch:
import torch
x = torch.eye(3, requires_grad=True)
y = torch.tensor([[2.0,0,-2.0]], requires_grad=True)
z = y.matmul(x).sum()
z.backward()
print(x.grad.tolist()) # dz/dx
print(y.grad.tolist()) # dz/dy
Contributing
There has been a lot of interest in tinygrad lately. Following these guidelines will help your PR get accepted.
We'll start with what will get your PR closed with a pointer to this section:
- No code golf! While low line count is a guiding light of this project, anything that remotely looks like code golf will be closed. The true goal is reducing complexity and increasing readability, and deleting
\ns does nothing to help with that. - All docs and whitespace changes will be closed unless you are a well-known contributor. The people writing the docs should be those who know the codebase the absolute best. People who have not demonstrated that shouldn't be messing with docs. Whitespace changes are both useless and carry a risk of introducing bugs.
- Anything you claim is a "speedup" must be benchmarked. In general, the goal is simplicity, so even if your PR makes things marginally faster, you have to consider the tradeoff with maintainablity and readablity.
- In general, the code outside the core
tinygrad/folder is not well tested, so unless the current code there is broken, you shouldn't be changing it. - If your PR looks "complex", is a big diff, or adds lots of lines, it won't be reviewed or merged. Consider breaking it up into smaller PRs that are individually clear wins. A common pattern I see is prerequisite refactors before adding new functionality. If you can (cleanly) refactor to the point that the feature is a 3 line change, this is great, and something easy for us to review.
Now, what we want:
- Bug fixes (with a regression test) are great! This library isn't 1.0 yet, so if you stumble upon a bug, fix it, write a test, and submit a PR, this is valuable work.
- Solving bounties! tinygrad offers cash bounties for certain improvements to the library. All new code should be high quality and well tested.
- Features. However, if you are adding a feature, consider the line tradeoff. If it's 3 lines, there's less of a bar of usefulness it has to meet over something that's 30 or 300 lines. All features must have regression tests. In general with no other constraints, your feature's API should match torch or numpy.
- Refactors that are clear wins. In general, if your refactor isn't a clear win it will be closed. But some refactors are amazing! Think about readability in a deep core sense. A whitespace change or moving a few functions around is useless, but if you realize that two 100 line functions can actually use the same 110 line function with arguments while also improving readability, this is a big win. Refactors should pass process replay.
- Tests/fuzzers. If you can add tests that are non brittle, they are welcome. We have some fuzzers in here too, and there's a plethora of bugs that can be found with them and by improving them. Finding bugs, even writing broken tests (that should pass) with
@unittest.expectedFailureis great. This is how we make progress. - Dead code removal from core
tinygrad/folder. We don't care about the code in extra, but removing dead code from the core library is great. Less for new people to read and be confused by.
Running tests
You should install the pre-commit hooks with pre-commit install. This will run the linter, mypy, and a subset of the tests on every commit.
For more examples on how to run the full test suite please refer to the CI workflow.
Some examples of running tests locally:
python3 -m pip install -e '.[testing]' # install extra deps for testing
python3 test/test_ops.py # just the ops tests
python3 -m pytest test/ # whole test suite
Process replay tests
Process replay compares your PR's generated kernels against master. If your PR is a refactor or speedup without any expected behavior change, It should include [pr] in the pull request title.