tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-07 03:00:26 -04:00

Go to file

Patrick Tsai 971d7f5d7c O(n) arange attempt (#3530 )

* It works?

* Clamp correctly

* Refactor

* Make code better

* Undo some stuff

* First step to trying to make floats work

* Floats work in Python op but not metal because int div is different

Python integerdivision was implemented as // which rounds towards
negative infinity, but C integer division rounds towards 0 so there
is an off-by-1 division error

* arange does cumsum with ints and then multiplies by step

This is so loop optimization can remain int only

* Undo a lot of symbolic changes

* Final check

* Cleanup

* There can be multiple phis

* Fix multiple phi op removal

* const sets dtype correctly

* Fix bugs

* Fix a couple bugs and add loop vars to resolve

* missed one

* Don't trim too many ops

* Fix symbolic test

* Use ones instead of full

* Delete test

* Lint passes

* max node error

* Small updates to loop logic

* Remove unnecessary changes

* We are getting somewhere

* Simple case

* Fix

* rm, prn

* Better

* If NumNode doesn't work then continue

* clamp is needed for arange(256)

* Move everything into the optim fn

* Replace correctly

* Order optimizations better

* Delete

* mypy

* Test for simplification

* Rename

* Fix test

* update test description

* Undo more

* Cleanup

* No replaced_ops map

* Fix lint

* AssertionError

* back again

* Reinstate assertion

* Return true and make diff not as big

* Bigger range for test

* Change cumsum impl

* fix bug

* make big cumsum work

* lint

* Undo cumsum 2-stage removal

* No while helper

* optional min/max clamping

* floats work

* rm giant arange test

* fix python cast None

* Check phi parents

* one phi allowed per where

* Fix one phi per where

* Rework iteration

* Delete assertions

* convert to int

* Try mul -1 instead of neg for hip..?

* Remove one phi per where requirements

* one accum only

* Lint

* should simplify a loop at a time

* Don't get rid of loop explcitly

* Need to iterate backwards

* lint

* unary neg

* Make optim work for onnx and sum_pad_collapse

* Better message

* filter alu ops correctly

* Fix the limiter

* lint and simplify

* Add it back

* off by one error

* test wheres and phis

* test max ops and non-if stuff

* <=

* cast_scalar

* Oops

* Change test

* Pass loop uops instead of a modified map

* Cut param transfer between linearizer and uops

* Fix issues

* Fix lint

* fix efficientnet python 3.8 invalid syntax

* distinct vars in seen_vars

* accurate var names

---------

Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>

2024-03-11 16:09:20 -07:00

.github/workflows

add llama 2 70B in ci and verify output (#3682 )

2024-03-11 12:48:22 -04:00

disassemblers/adreno

start Qualcomm GPU driver (#2804 )

2023-12-16 23:10:50 -08:00

docs

linearizer ast as a tuple of lazyops (#3689 )

2024-03-11 15:39:04 -07:00

examples

linearizer ast as a tuple of lazyops (#3689 )

2024-03-11 15:39:04 -07:00

extra

linearizer ast as a tuple of lazyops (#3689 )

2024-03-11 15:39:04 -07:00

openpilot

move create schedule and delete old API (#3377 )

2024-02-12 18:10:45 +01:00

test

O(n) arange attempt (#3530 )

2024-03-11 16:09:20 -07:00

tinygrad

O(n) arange attempt (#3530 )

2024-03-11 16:09:20 -07:00

.editorconfig

Revert "update editorconfig, enforce via CI (#1343 )" (#1380 )

2023-07-31 10:35:50 -07:00

.gitignore

No extra vars call (#3054 )

2024-01-09 09:52:58 -08:00

.pre-commit-config.yaml

add mypy --strict-equality to pre-commit (#3458 )

2024-02-21 03:41:05 -05:00

.pylintrc

ruff checks the max line length is 150 (#2734 )

2023-12-12 17:34:47 -08:00

.tokeignore

Add a quick start guide (#900 )

2023-06-04 08:51:20 -07:00

autogen_stubs.sh

hsa runtime (#3382 )

2024-02-15 14:14:34 +01:00

LICENSE

Updated LICENSE year (#760 )

2023-05-01 15:35:23 -07:00

mypy.ini

move gpuctypes in tree (#3253 )

2024-01-26 12:25:03 -08:00

push_pypi.sh

push pypi

2020-10-27 08:13:15 -07:00

README.md

README: ops_cpu and ops_torch have been removed (#3539 )

2024-02-29 10:22:11 -05:00

ruff.toml

lars optimizer + tests (#3631 )

2024-03-06 18:11:01 -05:00

run_multibackend.sh

convert $@ to "$@" in run_multibackend.sh (#1379 )

2023-07-31 10:39:22 -07:00

setup.py

lars optimizer + tests (#3631 )

2024-03-06 18:11:01 -05:00

strip_whitespace.sh

strip whitespace

2023-06-27 10:11:43 -07:00

sz.py

move autogen to runtime/autogen (#3254 )

2024-01-26 12:44:19 -08:00

README.md

tinygrad: For something between PyTorch and karpathy/micrograd. Maintained by tiny corp.

Homepage | Documentation | Examples | Showcase | Discord

This may not be the best deep learning framework, but it is a deep learning framework.

Due to its extreme simplicity, it aims to be the easiest framework to add new accelerators to, with support for both inference and training. If XLA is CISC, tinygrad is RISC.

tinygrad is still alpha software, but we raised some money to make it good. Someday, we will tape out chips.

Features

LLaMA and Stable Diffusion

tinygrad can run LLaMA and Stable Diffusion!

Laziness

Try a matmul. See how, despite the style, it is fused into one kernel with the power of laziness.

DEBUG=3 python3 -c "from tinygrad import Tensor;
N = 1024; a, b = Tensor.rand(N, N), Tensor.rand(N, N);
c = (a.reshape(N, 1, N) * b.T.reshape(1, N, N)).sum(axis=2);
print((c.numpy() - (a.numpy() @ b.numpy())).mean())"

And we can change DEBUG to 4 to see the generated code.

Neural networks

As it turns out, 90% of what you need for neural networks are a decent autograd/tensor library. Throw in an optimizer, a data loader, and some compute, and you have all you need.

from tinygrad import Tensor, nn

class LinearNet:
  def __init__(self):
    self.l1 = Tensor.kaiming_uniform(784, 128)
    self.l2 = Tensor.kaiming_uniform(128, 10)
  def __call__(self, x:Tensor) -> Tensor:
    return x.flatten(1).dot(self.l1).relu().dot(self.l2)

model = LinearNet()
optim = nn.optim.Adam([model.l1, model.l2], lr=0.001)

x, y = Tensor.rand(4, 1, 28, 28), Tensor([2,4,3,7])  # replace with real mnist dataloader

for i in range(10):
  optim.zero_grad()
  loss = model(x).sparse_categorical_crossentropy(y).backward()
  optim.step()
  print(i, loss.item())

See examples/beautiful_mnist.py for the full version that gets 98% in ~5 seconds

Accelerators

tinygrad already supports numerous accelerators, including:

And it is easy to add more! Your accelerator of choice only needs to support a total of ~25 low level ops. More information can be found in the documentation for adding new accelerators.

Installation

The current recommended way to install tinygrad is from source.

From source

git clone https://github.com/tinygrad/tinygrad.git
cd tinygrad
python3 -m pip install -e .

Direct (master)

python3 -m pip install git+https://github.com/tinygrad/tinygrad.git

Documentation

Documentation along with a quick start guide can be found in the docs/ directory.

Quick example comparing to PyTorch

from tinygrad import Tensor

x = Tensor.eye(3, requires_grad=True)
y = Tensor([[2.0,0,-2.0]], requires_grad=True)
z = y.matmul(x).sum()
z.backward()

print(x.grad.numpy())  # dz/dx
print(y.grad.numpy())  # dz/dy

The same thing but in PyTorch:

import torch

x = torch.eye(3, requires_grad=True)
y = torch.tensor([[2.0,0,-2.0]], requires_grad=True)
z = y.matmul(x).sum()
z.backward()

print(x.grad.numpy())  # dz/dx
print(y.grad.numpy())  # dz/dy

Contributing

There has been a lot of interest in tinygrad lately. Following these guidelines will help your PR get accepted.

We'll start with what will get your PR closed with a pointer to this section:

No code golf! While low line count is a guiding light of this project, anything that remotely looks like code golf will be closed. The true goal is reducing complexity and increasing readability, and deleting \ns does nothing to help with that.
All docs and whitespace changes will be closed unless you are a well-known contributor. The people writing the docs should be those who know the codebase the absolute best. People who have not demonstrated that shouldn't be messing with docs. Whitespace changes are both useless and carry a risk of introducing bugs.
Anything you claim is a "speedup" must be benchmarked. In general, the goal is simplicity, so even if your PR makes things marginally faster, you have to consider the tradeoff with maintainablity and readablity.
In general, the code outside the core tinygrad/ folder is not well tested, so unless the current code there is broken, you shouldn't be changing it.
If your PR looks "complex", is a big diff, or adds lots of lines, it won't be reviewed or merged. Consider breaking it up into smaller PRs that are individually clear wins. A common pattern I see is prerequisite refactors before adding new functionality. If you can (cleanly) refactor to the point that the feature is a 3 line change, this is great, and something easy for us to review.

Now, what we want:

Bug fixes (with a regression test) are great! This library isn't 1.0 yet, so if you stumble upon a bug, fix it, write a test, and submit a PR, this is valuable work.
Solving bounties! tinygrad offers cash bounties for certain improvements to the library. All new code should be high quality and well tested.
Features. However, if you are adding a feature, consider the line tradeoff. If it's 3 lines, there's less of a bar of usefulness it has to meet over something that's 30 or 300 lines. All features must have regression tests. In general with no other constraints, your feature's API should match torch or numpy.
Refactors that are clear wins. In general, if your refactor isn't a clear win it will be closed. But some refactors are amazing! Think about readability in a deep core sense. A whitespace change or moving a few functions around is useless, but if you realize that two 100 line functions can actually use the same 110 line function with arguments while also improving readability, this is a big win.
Tests/fuzzers. If you can add tests that are non brittle, they are welcome. We have some fuzzers in here too, and there's a plethora of bugs that can be found with them and by improving them. Finding bugs, even writing broken tests (that should pass) with @unittest.expectedFailure is great. This is how we make progress.
Dead code removal from core tinygrad/ folder. We don't care about the code in extra, but removing dead code from the core library is great. Less for new people to read and be confused by.

Running tests

You should install the pre-commit hooks with pre-commit install. This will run the linter, mypy, and a subset of the tests on every commit.

For more examples on how to run the full test suite please refer to the CI workflow.

Some examples of running tests locally:

python3 -m pip install -e '.[testing]'  # install extra deps for testing
python3 test/test_ops.py                # just the ops tests
python3 -m pytest test/                 # whole test suite

Languages

Python 72.5%

C 15.7%

Cuda 6.2%

C++ 2.2%

Metal 1.8%

Other 1.5%