Commit Graph

98 Commits

Author SHA1 Message Date
George Hotz
411392dfb7 move files into uop dir (#10399)
* move files into uop dir [pr]

* tinygrad.uop is a thing

* fix uop docs, no pr

* fix viz
2025-05-18 11:38:28 -07:00
वेदांत
2453d99050 rms matching pytorch implementation (#10319)
* rms matching pytorch implementation

* pre commit fix

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-17 08:23:11 -07:00
qazal
7cfe367c07 failing test for slow embedding kernel with FUSE_ARANGE=1 [pr] (#10330) 2025-05-15 14:58:11 +03:00
chenyu
c8f47c1d07 not_support_multi_device helper (#9831)
unify the test helper to skip ci device that does not support multi
2025-04-10 05:25:29 -04:00
chenyu
ba41076e94 update embedding test to not use dtypes.long [pr] (#9556) 2025-03-23 21:33:38 -04:00
chenyu
f8976dd2eb enable more webgpu tests (#9502)
OSX has larger buffer number limit, and it supports fp16 now
2025-03-18 23:03:54 -04:00
chenyu
e02e3b94c3 remove SQRT hack in llvm (#9067)
replaced with xpow 0.5 in transcendental. fixed sqrt(0) backward
2025-02-13 15:42:34 -05:00
George Hotz
33a1151f2f Revert "match torch rmsnorm implementation (#6799)" (#9052)
This reverts commit a66b8250e0.
2025-02-13 14:42:45 +08:00
Ryan Dorrington
a66b8250e0 match torch rmsnorm implementation (#6799)
* update rmsnorm to match torch implementation

* run all tests

* formatting

* formatting

* oneline

* default to 1e-6

* restore old test

* formatting

* don't save elementwise_affine

* your message

* ignore webgpu

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-13 13:02:51 +08:00
Ahmed Harmouche
133cacadde Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) (#8646)
* Switch to dawn, all tests passing locally

* Use dawn-python

* Skip failing test

* Skip midcast and fix timestamp on metal ci

* Autogen webgpu

* Try fetch dawn lib again

* /usr/lib

* Without lib prefix

* Test autogen diff

* Delete webgpu support, move everything to ops_webgpu

* mypy fix

* Simplify, refactor

* Line savings

* No ResultContainer

* Type annotation for result

* Some more simplifications

* Why was this explicit sync used at all?

* Refactor: delete functions that are only used once

* Create shader module inline

* Clear unit tests cache, maybe that solves it

* That wasn't it

* Try deleting cache to pass failing weight compare

* weights_only=False for pytorch 2.6

* Simplify ctype array creation

* Remove nanosecond precision timestamps

* Simplify error handling

* Refactor, add back type annotations

* Deleted custom submit function, refactor

* read_buffer simplify

* Fix use after free, refactor

* Simplify supported_features

* Runtime docs

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-07 15:16:59 +08:00
chenyu
1f730ae8f8 remove retain_graph in Tensor.backward [pr] (#8835)
not used. gradient accumulation works directly
2025-01-31 13:41:26 -05:00
chenyu
393eec3201 raise RuntimeError for uneven shard [pr] (#8593)
no 7B llama on 6 GPUs

skip 70B
2025-01-14 14:51:48 -05:00
George Hotz
b71c51191b tests from remove uop mutability [pr] (#8442)
* tests from remove uop mutability [pr]

* more test fix

* simpler test fix

* remove that
2024-12-29 12:14:10 -05:00
George Hotz
bd9c015b09 tests from grad uop path [pr] (#8313) 2024-12-18 09:25:05 -08:00
chenyu
40a4c603b9 remove more test skip for webgpu [pr] (#8192) 2024-12-12 14:06:35 -05:00
Eitan Turok
56017c52a0 Raise error when model architecture does not match state dict (#7772)
* init

* style

* style

* style

* fix test
2024-11-20 00:11:54 +08:00
chenyu
26200574dc load_state_dict test cases when model and data shard differently (#7774)
current behavior is weird... when model is sharded and state_dict is not, load shards the state_dict and model shard axis does not change.
but if model and state_dict are sharded differently, model shard axis becomes the state_dict axis after load.

it should either always use model shard axis or always use state_dict shard
2024-11-18 16:08:24 -05:00
George Hotz
205befa788 move is_dtype_supported to device [pr] (#7575) 2024-11-07 20:38:03 +08:00
Ahmed Harmouche
36488a2a43 Use is_dtype_supported in more places in tests (#7529) 2024-11-04 09:21:15 -05:00
George Hotz
c8bf09b7d4 s/UOps/Ops (#7500)
* s/UOps/Ops [pr]

* fix
2024-11-03 11:26:10 +08:00
Bhavya Gada
534597e753 fix all test warnings (#7024)
* fix pytorch warning in nn.conv2d for same padding

* fix future warning in torch load

* fix overflow warning in tensor list test: https://github.com/numpy/numpy/issues/23606#issuecomment-1512752172

* fix floating point warnings in dtype tests using docs https://numpy.org/doc/stable/reference/generated/numpy.errstate.html and a neat solution https://stackoverflow.com/questions/53634965/change-np-seterr-behavior-inside-a-function-only

* put err state in one place; comment taken care of by function hover

* enter np errstate context manager on test setup

* put decorator on class
2024-10-18 08:56:40 +08:00
Bhavya Gada
23c09f4b4c add support for padding='same' in nn.conv (#6975)
* add support for padding='same' in nn.conv

* express concisely

* simplify loop

* test same padding with dilation and conv1d

* fix bad indentation

* make loop one liner
2024-10-11 11:39:07 +08:00
czhu
08bfa8632b embedding shape (#6930) 2024-10-08 14:42:20 +08:00
wozeparrot
c100f3d406 default threefry (#6116) 2024-09-25 17:45:13 +08:00
Tim Becker
dfb818788e Support reduction parameter in more loss functions (#6302) 2024-09-07 05:11:20 +08:00
madt2709
4bb98d8882 Fix track_running_stats in batchnorm (#6200)
* Fix track_running_stats in batchnorm

* Fix linter

* Update test_fold_conv_batchnorm_notrain to keep allowed at 1

* Add test_fold_conv_batchnorm_notrain_no_running_stats

* Save 1 line
2024-08-20 14:01:22 -07:00
qazal
28c75bf2a6 merge uops with ops (#6111)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-08-16 18:17:57 -04:00
qazal
c23d44c779 AST is UOp (#6030)
* most of the work from the uops2 branch

* schedule

* realize

* kernel

* lowerer

* search

* green

* merge uops with ops

* Revert "merge uops with ops"

This reverts commit 1408a59f12.

* fix benchmark

* remove extra dedup
2024-08-16 22:09:00 +03:00
George Hotz
64563abc90 add LSTMCell to nn (#6080)
* add LSTMCell to nn

* lstmcell works with no input on first

* fix no bias 0

* simpler
2024-08-14 12:08:42 -07:00
George Hotz
fa7e734b49 MetaOps.KERNEL (#5543) 2024-07-17 19:41:23 -07:00
George Hotz
a9f5a764dc make BatchNorm work for 2D and 3D (#5477)
* make BatchNorm work for 2D and 3D

* beautiful mnist shouldn't use BatchNorm2d
2024-07-14 11:39:58 -07:00
George Hotz
6707c778d0 scheduleitem is not Tuple [run_process_replay] (#5425)
* scheduleitem is not Tuple [run_process_replay]

* fix tests

* fix op + fuzzers

* fix mop test
2024-07-12 15:13:19 -07:00
chenyu
b2c3a28a5e nn.RMSNorm (#5272)
the norm itself has no significant value to add to Tensor method, but we would want Tensor.normalize
2024-07-02 21:39:01 -04:00
Roelof van Dijk
975b811ad9 names shadowing builtins (#5179)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-27 08:15:01 -04:00
chenyu
4296507021 Tensor.sum returns in acc_dtype if specified (#5012)
* Tensor.sum returns in acc_dtype if specified

* skip PYTHON for now

* revert that

* relax that
2024-06-17 16:35:52 -04:00
chenyu
97b05f567e revert the .detach() in layernorm (#4904)
* revert the .detach() in layernorm

it's only correct in LayerNorm where input is the data, and not correct in GroupNorm and InstanceNorm that reused layernorm.
Added backward tests for weights, bias and input for these norms.

* bigger atol for llvm

* relax backward more
2024-06-10 18:02:05 -04:00
nimlgen
eb9689336e nv mockgpu (#4600)
* mockgpu nv

* works

* comment that out

* fix merge

* setup gpuocelot

* install packages

* not run all of them

* passes

* fix ci

* almost

* should pass

* linter

* linter 2

* try this?

* ugn, not supported

* ci

* remove ticket from description

* better descs
2024-05-15 23:46:08 +03:00
George Hotz
afa9753d39 ruff cleanup (#4594)
* check editor config

* no editorconfig, it doesn't work

* ruff cleanups
2024-05-14 21:16:14 -07:00
wozeparrot
d7670f8141 quantized llama multilazybuffer fix (#4557) 2024-05-12 14:19:21 -07:00
Patrick Tsai
0147174ad6 Embedding in one kernel (#4036)
* Embedding is in one kernel

* embedding is one kernel

* rm extra line

* newline

* bert test counts state vars?

* add a test?

* move items around

---------

Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
2024-04-02 11:38:21 -04:00
David Hou
ba6c041eab fix SCE ignore_index with label_smoothing (#3574)
* fix SCE ignore_index with label_smoothing

* break up the line

* only 3 cats in test

* Revert "only 3 cats in test"

This reverts commit 18be069c90.
2024-03-01 22:19:45 -05:00
David Hou
b3cdc11a58 label_smoothing in sparse_cat_crossentropy (#3568)
* label_smoothing in sparse_cat_crossentropy

* test multiple values, assert
2024-03-01 20:02:46 -05:00
David Hou
e5385eecfc UnsyncedBatchNorm with synced trainable weights for hlb cifar (#3472)
* UnsyncedBatchNorm with synced trainable weights for hlb cifar

* multitensor reshape tests

* test mlb assign change axis

* E501

* argfix axis

* don't import batchnorm from hlb_cifar in test_multitensor

* pass num_devices to UnsyncedBatchNorm in test, allow UnsyncedBatchNorm to be used with LB

* add backprop test for UnsyncedBatchNorm

* break out MLB assign and reshape changes

* manually shard running mean and running var

* don't shard unless syncbn=0

* replace nn.BatchNorm2d with UnsyncedBatchNorm

* don't increment num_batches_tracked if not tracking running stats

* update tests

* oops

* Revert "oops"

This reverts commit 5e8a67a535.

* Revert "update tests"

This reverts commit 7ebf65d89a.

* Revert "don't increment num_batches_tracked if not tracking running stats"

This reverts commit 78de0ea9ee.

* Revert "replace nn.BatchNorm2d with UnsyncedBatchNorm"

This reverts commit d03da53da7.

* don't increment num_batched_tracked if not tracking running stats

* oops

* test_batchnorm_axis

* compare against torch

* types

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-02-29 22:52:07 -05:00
xarkes
28a8b72024 Remove Interpreted device & remaining CPU/TORCH ref (#3423)
* Remove Interpreted device & remaining CPU/TORCH ref

* Oops

* supports_device was useful

* Fix doc wording

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-02-16 00:30:21 -05:00
chenyu
e52a609240 make WINO a context var, and LATEWINO in hlb_cifar (#3161) 2024-01-17 20:21:26 -05:00
chenyu
ee6a73826b clean up test_nn.py (#3049)
used Tensor.train decorator, reordered to always tinygrad instances first, and removed redundant idx cast
2024-01-08 18:45:03 -05:00
chenyu
3eb3664074 fix nn.Embedding with empty length input (#3048) 2024-01-08 18:08:36 -05:00
George Hotz
1765849937 new lazy, benchmark (#2878)
* lazy rewrite, try 2

* min fix tests

* pass contig test

* put broken pads back

* move that to realize

* no contig child fixes array packing

* so wrong

* now that's correct

* base children

* fix bind issues

* disable to_image_idx

* fix tests

* that failure shouldn't break other tests

* more fixes

* fix torch

* skip failing tests in CI

* 1e-7

* half is broken

* 1e-6 margin of error
2023-12-20 14:33:21 -08:00
chenyu
73cadfbb3c Remove pytest markers (#2831)
* remove pytest marker

* fix some, skip some

* tweak

* fix

* skip slow

* skip more
2023-12-18 18:53:28 -05:00
George Hotz
6d6eb9302d ruff checks the max line length is 150 (#2734)
* ruff checks the max line length is 150

* fix tensor.py

* a lot more

* done
2023-12-12 17:34:47 -08:00