tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-14 01:18:26 -05:00

Author	SHA1	Message	Date
George Hotz	9635592141	** rangeify, try 3 (#11683 ) * ** rangeify, try 3 * bring that over * bufferize, don't use contig tag * work * ish * fix rangeify * flash attention is back * fix rangeify tests * stuff passes * fix test_log_softmax * more stuff passes * progress children * new endrange solution * progress * progress counter * basic assign * contigs only * symbolic in schedule * unbind_kernel * late children * ops fixed * beautiful mnist is close * that seems to work * mnist works * improve names * fix bmnist * no pcontig * testing backward * work * clone movement ops * new_range helper * MBLOCK/MERGE * ops tests pass * revert mblock stuff * cleanups...but it breaks ops * remove reindex * hack for relu * disable the hacks * more hacks * upd * mostly works with cleanups disabled * ndr * ops tests pass * terrible hacks for indexing to work * context mismatch * pcontig * split pcontig v contig * z3 trunc * null * no fuse in rangeify * ops test passes * lnorm * fix assign * nd rangeify * both should work * tests for rangeify * cleanups * stores pass the pointer through * disable pcontig for now * PARTIAL_CONTIG is a flag	2025-08-20 14:22:44 -07:00
George Hotz	1d307f568c	move device tests to test/device + test cleanups (#11735 ) * move device tests to test/device * test speedups * test device * linalg to unit * upd * so pytest just works * more divide and skip * speed * test devectorize * add pillow	2025-08-19 16:02:20 -07:00
Sieds Lykles	01c770c77b	Fix z3 float cast in indexing (#11590 ) * adjust dtype of z3_renderer and add rule for cast * dtypes.bool is also cast noop * add regression test * make embedding smaller * even smaller test	2025-08-09 17:59:23 +02:00
chenyu	7ee3770961	FUSE_ARANGE=1 (#11427 ) * FUSE_ARANGE=1 * fix test --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-08-07 13:32:34 -04:00
chenyu	ace8e9a706	fix test_conv2d_winograd (#11511 )	2025-08-05 12:15:46 -04:00
chenyu	223aaa0492	clean up more conv tests (#11510 )	2025-08-05 12:15:30 -04:00
Garret Castro	76e62a1c23	extract conv layer test logic (#11488 ) * refactor: extract conv layer test logic * tuple is unnecessary * integrate _test_conv logic into all conv tests * fix linter, forgot dilation * undo winograd extraction adds too many if statements for a single case	2025-08-05 11:15:54 -04:00
chenyu	dbc7807c61	enable WEBGPU tests with buffer limit (#11489 ) TestSample still fails?	2025-08-03 13:02:44 -07:00
chenyu	a2f5a54458	move sparse_categorical_crossentropy to test_ops (#11083 ) also flattened the tests	2025-07-03 21:40:54 -04:00
chenyu	7c8ccb0267	sparse_categorical_crossentropy cleanup [pr] (#11082 )	2025-07-03 18:32:52 -04:00
George Hotz	32e9949052	rename lazydata to uop (#10698 )	2025-06-08 08:42:22 -07:00
George Hotz	411392dfb7	move files into uop dir (#10399 ) * move files into uop dir [pr] * tinygrad.uop is a thing * fix uop docs, no pr * fix viz	2025-05-18 11:38:28 -07:00
वेदांत	2453d99050	rms matching pytorch implementation (#10319 ) * rms matching pytorch implementation * pre commit fix --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-05-17 08:23:11 -07:00
qazal	7cfe367c07	failing test for slow embedding kernel with FUSE_ARANGE=1 [pr] (#10330 )	2025-05-15 14:58:11 +03:00
chenyu	c8f47c1d07	not_support_multi_device helper (#9831 ) unify the test helper to skip ci device that does not support multi	2025-04-10 05:25:29 -04:00
chenyu	ba41076e94	update embedding test to not use dtypes.long [pr] (#9556 )	2025-03-23 21:33:38 -04:00
chenyu	f8976dd2eb	enable more webgpu tests (#9502 ) OSX has larger buffer number limit, and it supports fp16 now	2025-03-18 23:03:54 -04:00
chenyu	e02e3b94c3	remove SQRT hack in llvm (#9067 ) replaced with xpow 0.5 in transcendental. fixed sqrt(0) backward	2025-02-13 15:42:34 -05:00
George Hotz	33a1151f2f	Revert "match torch rmsnorm implementation (#6799 )" (#9052 ) This reverts commit `a66b8250e0`.	2025-02-13 14:42:45 +08:00
Ryan Dorrington	a66b8250e0	match torch rmsnorm implementation (#6799 ) * update rmsnorm to match torch implementation * run all tests * formatting * formatting * oneline * default to 1e-6 * restore old test * formatting * don't save elementwise_affine * your message * ignore webgpu --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-13 13:02:51 +08:00
Ahmed Harmouche	133cacadde	Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) (#8646 ) * Switch to dawn, all tests passing locally * Use dawn-python * Skip failing test * Skip midcast and fix timestamp on metal ci * Autogen webgpu * Try fetch dawn lib again * /usr/lib * Without lib prefix * Test autogen diff * Delete webgpu support, move everything to ops_webgpu * mypy fix * Simplify, refactor * Line savings * No ResultContainer * Type annotation for result * Some more simplifications * Why was this explicit sync used at all? * Refactor: delete functions that are only used once * Create shader module inline * Clear unit tests cache, maybe that solves it * That wasn't it * Try deleting cache to pass failing weight compare * weights_only=False for pytorch 2.6 * Simplify ctype array creation * Remove nanosecond precision timestamps * Simplify error handling * Refactor, add back type annotations * Deleted custom submit function, refactor * read_buffer simplify * Fix use after free, refactor * Simplify supported_features * Runtime docs --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-07 15:16:59 +08:00
chenyu	1f730ae8f8	remove retain_graph in Tensor.backward [pr] (#8835 ) not used. gradient accumulation works directly	2025-01-31 13:41:26 -05:00
chenyu	393eec3201	raise RuntimeError for uneven shard [pr] (#8593 ) no 7B llama on 6 GPUs skip 70B	2025-01-14 14:51:48 -05:00
George Hotz	b71c51191b	tests from remove uop mutability [pr] (#8442 ) * tests from remove uop mutability [pr] * more test fix * simpler test fix * remove that	2024-12-29 12:14:10 -05:00
George Hotz	bd9c015b09	tests from grad uop path [pr] (#8313 )	2024-12-18 09:25:05 -08:00
chenyu	40a4c603b9	remove more test skip for webgpu [pr] (#8192 )	2024-12-12 14:06:35 -05:00
Eitan Turok	56017c52a0	Raise error when model architecture does not match state dict (#7772 ) * init * style * style * style * fix test	2024-11-20 00:11:54 +08:00
chenyu	26200574dc	load_state_dict test cases when model and data shard differently (#7774 ) current behavior is weird... when model is sharded and state_dict is not, load shards the state_dict and model shard axis does not change. but if model and state_dict are sharded differently, model shard axis becomes the state_dict axis after load. it should either always use model shard axis or always use state_dict shard	2024-11-18 16:08:24 -05:00
George Hotz	205befa788	move is_dtype_supported to device [pr] (#7575 )	2024-11-07 20:38:03 +08:00
Ahmed Harmouche	36488a2a43	Use is_dtype_supported in more places in tests (#7529 )	2024-11-04 09:21:15 -05:00
George Hotz	c8bf09b7d4	s/UOps/Ops (#7500 ) * s/UOps/Ops [pr] * fix	2024-11-03 11:26:10 +08:00
Bhavya Gada	534597e753	fix all test warnings (#7024 ) * fix pytorch warning in nn.conv2d for same padding * fix future warning in torch load * fix overflow warning in tensor list test: https://github.com/numpy/numpy/issues/23606#issuecomment-1512752172 * fix floating point warnings in dtype tests using docs https://numpy.org/doc/stable/reference/generated/numpy.errstate.html and a neat solution https://stackoverflow.com/questions/53634965/change-np-seterr-behavior-inside-a-function-only * put err state in one place; comment taken care of by function hover * enter np errstate context manager on test setup * put decorator on class	2024-10-18 08:56:40 +08:00
Bhavya Gada	23c09f4b4c	add support for padding='same' in nn.conv (#6975 ) * add support for padding='same' in nn.conv * express concisely * simplify loop * test same padding with dilation and conv1d * fix bad indentation * make loop one liner	2024-10-11 11:39:07 +08:00
czhu	08bfa8632b	embedding shape (#6930 )	2024-10-08 14:42:20 +08:00
wozeparrot	c100f3d406	default threefry (#6116 )	2024-09-25 17:45:13 +08:00
Tim Becker	dfb818788e	Support `reduction` parameter in more loss functions (#6302 )	2024-09-07 05:11:20 +08:00
madt2709	4bb98d8882	Fix track_running_stats in batchnorm (#6200 ) * Fix track_running_stats in batchnorm * Fix linter * Update test_fold_conv_batchnorm_notrain to keep allowed at 1 * Add test_fold_conv_batchnorm_notrain_no_running_stats * Save 1 line	2024-08-20 14:01:22 -07:00
qazal	28c75bf2a6	merge uops with ops (#6111 ) Co-authored-by: chenyu <chenyu@fastmail.com>	2024-08-16 18:17:57 -04:00
qazal	c23d44c779	AST is UOp (#6030 ) * most of the work from the uops2 branch * schedule * realize * kernel * lowerer * search * green * merge uops with ops * Revert "merge uops with ops" This reverts commit `1408a59f12`. * fix benchmark * remove extra dedup	2024-08-16 22:09:00 +03:00
George Hotz	64563abc90	add LSTMCell to nn (#6080 ) * add LSTMCell to nn * lstmcell works with no input on first * fix no bias 0 * simpler	2024-08-14 12:08:42 -07:00
George Hotz	fa7e734b49	MetaOps.KERNEL (#5543 )	2024-07-17 19:41:23 -07:00
George Hotz	a9f5a764dc	make BatchNorm work for 2D and 3D (#5477 ) * make BatchNorm work for 2D and 3D * beautiful mnist shouldn't use BatchNorm2d	2024-07-14 11:39:58 -07:00
George Hotz	6707c778d0	scheduleitem is not Tuple [run_process_replay] (#5425 ) * scheduleitem is not Tuple [run_process_replay] * fix tests * fix op + fuzzers * fix mop test	2024-07-12 15:13:19 -07:00
chenyu	b2c3a28a5e	nn.RMSNorm (#5272 ) the norm itself has no significant value to add to Tensor method, but we would want Tensor.normalize	2024-07-02 21:39:01 -04:00
Roelof van Dijk	975b811ad9	names shadowing builtins (#5179 ) Co-authored-by: chenyu <chenyu@fastmail.com>	2024-06-27 08:15:01 -04:00
chenyu	4296507021	Tensor.sum returns in acc_dtype if specified (#5012 ) * Tensor.sum returns in acc_dtype if specified * skip PYTHON for now * revert that * relax that	2024-06-17 16:35:52 -04:00
chenyu	97b05f567e	revert the .detach() in layernorm (#4904 ) * revert the .detach() in layernorm it's only correct in LayerNorm where input is the data, and not correct in GroupNorm and InstanceNorm that reused layernorm. Added backward tests for weights, bias and input for these norms. * bigger atol for llvm * relax backward more	2024-06-10 18:02:05 -04:00
nimlgen	eb9689336e	nv mockgpu (#4600 ) * mockgpu nv * works * comment that out * fix merge * setup gpuocelot * install packages * not run all of them * passes * fix ci * almost * should pass * linter * linter 2 * try this? * ugn, not supported * ci * remove ticket from description * better descs	2024-05-15 23:46:08 +03:00
George Hotz	afa9753d39	ruff cleanup (#4594 ) * check editor config * no editorconfig, it doesn't work * ruff cleanups	2024-05-14 21:16:14 -07:00
wozeparrot	d7670f8141	quantized llama multilazybuffer fix (#4557 )	2024-05-12 14:19:21 -07:00

109 Commits