* update rmsnorm to match torch implementation
* run all tests
* formatting
* formatting
* oneline
* default to 1e-6
* restore old test
* formatting
* don't save elementwise_affine
* your message
* ignore webgpu
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
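A minimal sketch of what the torch-matching RMSNorm above looks like, assuming tinygrad's Tensor API; the 1e-6 eps default and not saving elementwise_affine come from the commits, the rest is illustrative:

```python
from tinygrad import Tensor

class RMSNorm:
  def __init__(self, dim:int, eps:float=1e-6, elementwise_affine:bool=True):
    self.eps = eps
    # elementwise_affine itself is not stored as state; only weight survives
    self.weight = Tensor.ones(dim) if elementwise_affine else None

  def __call__(self, x:Tensor) -> Tensor:
    # torch formula: y = x / sqrt(mean(x^2) + eps), then optional affine scale
    xn = x * (x.square().mean(axis=-1, keepdim=True) + self.eps).rsqrt()
    return xn if self.weight is None else xn * self.weight
```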
* Switch to dawn, all tests passing locally
* Use dawn-python
* Skip failing test
* Skip midcast and fix timestamp on metal ci
* Autogen webgpu
* Try fetch dawn lib again
* /usr/lib
* Without lib prefix
* Test autogen diff
* Delete webgpu support, move everything to ops_webgpu
* mypy fix
* Simplify, refactor
* Line savings
* No ResultContainer
* Type annotation for result
* Some more simplifications
* Why was this explicit sync used at all?
* Refactor: delete functions that are only used once
* Create shader module inline
* Clear unit tests cache, maybe that solves it
* That wasn't it
* Try deleting cache to pass failing weight compare
* weights_only=False for pytorch 2.6
* Simplify ctype array creation
* Remove nanosecond precision timestamps
* Simplify error handling
* Refactor, add back type annotations
* Deleted custom submit function, refactor
* read_buffer simplify
* Fix use after free, refactor
* Simplify supported_features
* Runtime docs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
current behavior is inconsistent: when the model is sharded and the state_dict is not, load shards the state_dict and the model shard axis does not change.
but if the model and the state_dict are sharded differently, the model shard axis becomes the state_dict axis after load.
it should either always use the model shard axis or always use the state_dict shard axis (see the sketch below).
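A hypothetical repro of that inconsistency, assuming tinygrad's Tensor.shard / load_state_dict API; the device names and axes are made up for illustration:

```python
from tinygrad import Tensor
from tinygrad.nn.state import load_state_dict

GPUS = ("CUDA:0", "CUDA:1")

class Model:
  def __init__(self): self.w = Tensor.ones(4, 4)

model = Model()
model.w.shard_(GPUS, axis=0)  # model sharded on axis 0

# unsharded state_dict: load shards it to match, model keeps axis 0
load_state_dict(model, {"w": Tensor.zeros(4, 4)})

# state_dict sharded on a different axis: model ends up on axis 1 after load
load_state_dict(model, {"w": Tensor.zeros(4, 4).shard(GPUS, axis=1)})
```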
* add support for padding='same' in nn.conv
* express concisely
* simplify loop
* test same padding with dilation and conv1d
* fix bad indentation
* make loop a one-liner
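A short usage sketch of the padding='same' support added above (output shapes assume stride 1, where 'same' preserves the spatial size):

```python
from tinygrad import Tensor, nn

x = Tensor.randn(1, 3, 32, 32)
conv = nn.Conv2d(3, 16, kernel_size=3, padding='same')
print(conv(x).shape)    # (1, 16, 32, 32): spatial dims preserved

# per the test above, 'same' also accounts for dilation
conv_d = nn.Conv2d(3, 16, kernel_size=3, dilation=2, padding='same')
print(conv_d(x).shape)  # (1, 16, 32, 32)
```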
* Fix track_running_stats in batchnorm
* Fix linter
* Update test_fold_conv_batchnorm_notrain to keep allowed at 1
* Add test_fold_conv_batchnorm_notrain_no_running_stats
* Save 1 line
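For context, a behavior sketch of track_running_stats, mirroring torch's semantics (illustrative, not the repo's test code): with track_running_stats=False the layer keeps no running_mean/running_var (and no num_batches_tracked), so batch statistics are used even in eval.

```python
from tinygrad import Tensor, nn

bn = nn.BatchNorm2d(8, track_running_stats=False)
x = Tensor.randn(2, 8, 4, 4)

Tensor.training = False
y = bn(x)  # no running stats exist, so batch statistics are used even in eval
```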
* most of the work from the uops2 branch
* schedule
* realize
* kernel
* lowerer
* search
* green
* merge uops with ops
* Revert "merge uops with ops"
This reverts commit 1408a59f12.
* fix benchmark
* remove extra dedup
* revert the .detach() in layernorm
it's only correct in LayerNorm, where the input is the data, and not in GroupNorm and InstanceNorm, which reuse layernorm.
Added backward tests for weight, bias and input for these norms (see the sketch after this entry).
* bigger atol for llvm
* relax backward more
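The revert matters because in GroupNorm and InstanceNorm gradients must flow through the normalization statistics; a sketch of the kind of backward test added here, checking a norm's input gradient against torch (tolerances illustrative, per the atol bumps above):

```python
import numpy as np, torch
from tinygrad import Tensor, nn

x_np = np.random.randn(2, 4, 8, 8).astype(np.float32)
x_tg = Tensor(x_np, requires_grad=True)
x_pt = torch.tensor(x_np, requires_grad=True)

nn.GroupNorm(2, 4)(x_tg).sum().backward()
torch.nn.GroupNorm(2, 4)(x_pt).sum().backward()

# a detached mean/var would make this input gradient silently wrong
np.testing.assert_allclose(x_tg.grad.numpy(), x_pt.grad.numpy(), atol=1e-6)
```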
* mockgpu nv
* works
* comment that out
* fix merge
* setup gpuocelot
* install packages
* don't run all of them
* passes
* fix ci
* almost
* should pass
* linter
* linter 2
* try this?
* ugh, not supported
* ci
* remove ticket from description
* better descs
* Embedding is in one kernel
* embedding is one kernel
* rm extra line
* newline
* bert test counts state vars?
* add a test?
* move items around
---------
Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
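The one-kernel embedding above is commonly expressed as a one-hot matmul that the scheduler can fuse into a single kernel; a conceptual sketch, not the repo's exact code:

```python
from tinygrad import Tensor

vocab_size, embed_dim = 100, 16
weight = Tensor.glorot_uniform(vocab_size, embed_dim)
idx = Tensor([[1, 5, 7]])  # (batch, seq)

# select rows by contracting a one-hot mask with the table: one matmul, one kernel
one_hot = (Tensor.arange(vocab_size) == idx.unsqueeze(-1)).float()  # (batch, seq, vocab)
out = one_hot @ weight  # (batch, seq, embed_dim)
```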
* UnsyncedBatchNorm with synced trainable weights for hlb cifar
* multitensor reshape tests
* test mlb assign change axis
* E501
* argfix axis
* don't import batchnorm from hlb_cifar in test_multitensor
* pass num_devices to UnsyncedBatchNorm in test, allow UnsyncedBatchNorm to be used with LB
* add backprop test for UnsyncedBatchNorm
* break out MLB assign and reshape changes
* manually shard running mean and running var
* don't shard unless syncbn=0
* replace nn.BatchNorm2d with UnsyncedBatchNorm
* don't increment num_batches_tracked if not tracking running stats
* update tests
* oops
* Revert "oops"
This reverts commit 5e8a67a535.
* Revert "update tests"
This reverts commit 7ebf65d89a.
* Revert "don't increment num_batches_tracked if not tracking running stats"
This reverts commit 78de0ea9ee.
* Revert "replace nn.BatchNorm2d with UnsyncedBatchNorm"
This reverts commit d03da53da7.
* don't increment num_batches_tracked if not tracking running stats
* oops
* test_batchnorm_axis
* compare against torch
* types
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
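A layout sketch of the UnsyncedBatchNorm idea from this PR: trainable weights replicated (synced) across devices, running stats kept per device; the shapes and shard calls are illustrative, assuming tinygrad's multi-device API:

```python
from tinygrad import Tensor

GPUS, num_devices, sz = ("CUDA:0", "CUDA:1"), 2, 64

# trainable weights stay synced: replicated on every device
weight = Tensor.ones(sz).shard(GPUS, axis=None)
bias = Tensor.zeros(sz).shard(GPUS, axis=None)

# running stats are unsynced: one row per device, sharded along that axis
running_mean = Tensor.zeros(num_devices, sz).shard(GPUS, axis=0)
running_var = Tensor.ones(num_devices, sz).shard(GPUS, axis=0)
```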
* lazy rewrite, try 2
* min fix tests
* pass contig test
* put broken pads back
* move that to realize
* no contig child fixes array packing
* so wrong
* now that's correct
* base children
* fix bind issues
* disable to_image_idx
* fix tests
* that failure shouldn't break other tests
* more fixes
* fix torch
* skip failing tests in CI
* 1e-7
* half is broken
* 1e-6 margin of error