need to remove SUB since it's possible to have (const - (const - const)) in test/test_ops.py::TestOps::test_cos,
in which case the parens around the children cannot be removed
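For context, a tiny standalone illustration (not the renderer code) of why the parens around a SUB child cannot be dropped, since subtraction is not associative:

```python
# minimal sketch: the nested (const - (const - const)) case from test_cos
a, b, c = 1.0, 2.0, 3.0
assert a - (b - c) == 2.0    # what the expression means with the child's parens kept
assert (a - b) - c == -4.0   # what it would evaluate to if those parens were removed
```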
* lars optimizer + tests (see the sketch after this list)
* fix skip list!
* use id to compare in skip list
* go back to using set
* Tensor(bool) * Tensor(bool) is and
* don't lint external/mlperf_resnet
* whitespace
* add external_test_optim to opencl tests
* give mlperf task a name
* mlperf under onnx
* remove track_gnorm
* contiguous instead of realize
* assert momentum and weight decay positive
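For reference, a rough numpy sketch of a single LARS layer update (the textbook formulation with illustrative names, not tinygrad's actual optimizer):

```python
import numpy as np

def lars_step(w, g, buf, lr=0.01, momentum=0.9, weight_decay=1e-4, eta=0.001):
  # one layer-wise LARS update; the assert mirrors the "momentum and weight decay positive" commit
  assert momentum > 0 and weight_decay > 0
  w_norm, g_norm = np.linalg.norm(w), np.linalg.norm(g)
  # layer-wise trust ratio: scale the global lr by ||w|| / (||g|| + wd * ||w||)
  trust = eta * w_norm / (g_norm + weight_decay * w_norm) if w_norm > 0 and g_norm > 0 else 1.0
  buf = momentum * buf + lr * trust * (g + weight_decay * w)   # momentum buffer
  return w - buf, buf

w, buf = np.ones(8), np.zeros(8)
w, buf = lars_step(w, np.full(8, 0.1), buf)
```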
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* hip bf16
* remu dev mac
* Revert "remu dev mac"
This reverts commit 465069a0dc3c7f2045f3348b312a1dcbf1587acd.
* skip disk tests in CI
* bring float8 back
* this mem fault still happening
* smaller
* that print doesn't work
* overflows test
* hip doesn't uses_ptr_arithmetic
* only with locals
* test overflow new name
* it's not ptr arith
* simpler
* simple repro
* old compiler
* simpler
* put that back
1. Tensor.to should return self if device == self.device. This was not the case when given a non-canonical name of self.device.
2. The result of Tensor.to was missing the autograd graph, even though requires_grad and grad were propagated.
Add corresponding tests.
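A rough sketch of the kind of tests described above (device strings and exact API spellings are illustrative):

```python
from tinygrad import Tensor, Device

t = Tensor([1.0, 2.0])
assert t.to(Device.DEFAULT) is t          # 1. same device, canonical name: returns self
assert t.to(Device.DEFAULT.lower()) is t  #    non-canonical spelling of the same device

x = Tensor([1.0, 2.0], requires_grad=True)
y = (x * 2).to(Device.DEFAULT)            # 2. the .to result should stay in the autograd graph
y.sum().backward()                        #    so backward still reaches x
assert x.grad is not None
```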
* explicitly create_lt_node when used in shapetracker
leave regular __lt__ and cmps for symbolic shape cmp
* hmm it fixed that?
* LtNode.substitute uses create_lt_node
* allow LB <- MLB assign, but don't reuse buffer
* update test
* update test
* assign assert axes are the same
* update tests to manually shard running stats
* unused import
* UnsyncedBatchNorm with synced trainable weights for hlb cifar (sketched after this list)
* multitensor reshape tests
* test mlb assign change axis
* E501
* argfix axis
* don't import batchnorm from hlb_cifar in test_multitensor
* pass num_devices to UnsyncedBatchNorm in test, allow UnsyncedBatchNorm to be used with LB
* add backprop test for UnsyncedBatchNorm
* break out MLB assign and reshape changes
* manually shard running mean and running var
* don't shard unless syncbn=0
* replace nn.BatchNorm2d with UnsyncedBatchNorm
* don't increment num_batches_tracked if not tracking running stats
* update tests
* oops
* Revert "oops"
This reverts commit 5e8a67a535.
* Revert "update tests"
This reverts commit 7ebf65d89a.
* Revert "don't increment num_batches_tracked if not tracking running stats"
This reverts commit 78de0ea9ee.
* Revert "replace nn.BatchNorm2d with UnsyncedBatchNorm"
This reverts commit d03da53da7.
* don't increment num_batches_tracked if not tracking running stats
* oops
* test_batchnorm_axis
* compare against torch
* types
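A rough numpy sketch of the idea behind UnsyncedBatchNorm (illustrative names, not tinygrad's class): trainable weight/bias are shared across devices, the running stats are manually sharded per device and never synced, and num_batches_tracked is only bumped when running stats are tracked:

```python
import numpy as np

class UnsyncedBatchNormSketch:
  def __init__(self, sz, num_devices, track_running_stats=True, momentum=0.1, eps=1e-5):
    self.weight, self.bias = np.ones(sz), np.zeros(sz)   # shared (synced) trainables
    self.running_mean = np.zeros((num_devices, sz))       # manually sharded running stats
    self.running_var = np.ones((num_devices, sz))
    self.num_batches_tracked = 0
    self.track_running_stats, self.momentum, self.eps = track_running_stats, momentum, eps

  def __call__(self, x, device_idx, training=True):       # x: (N, C, H, W) on one device
    if training:
      mean, var = x.mean(axis=(0, 2, 3)), x.var(axis=(0, 2, 3))
      if self.track_running_stats:
        self.running_mean[device_idx] = (1 - self.momentum) * self.running_mean[device_idx] + self.momentum * mean
        self.running_var[device_idx] = (1 - self.momentum) * self.running_var[device_idx] + self.momentum * var
        self.num_batches_tracked += 1                      # only when tracking running stats
    else:
      mean, var = self.running_mean[device_idx], self.running_var[device_idx]
    xn = (x - mean.reshape(1, -1, 1, 1)) / np.sqrt(var.reshape(1, -1, 1, 1) + self.eps)
    return self.weight.reshape(1, -1, 1, 1) * xn + self.bias.reshape(1, -1, 1, 1)
```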
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* search: add tensor core to beam search space
* kernel: refactor apply_tensor_core into apply_opt and hand_coded
* kernel: revert removal of apply_tensor_cores
also revert BEAM search parameter changes
Some devices create cache table names with non-alphanumeric characters, e.g. "compile_hip_gfx1010:xnack-_12".
This commit escapes the table name in single quotes so that sqlite works (see https://github.com/tinygrad/tinygrad/issues/3538).
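A minimal standalone illustration of the failure and the fix (plain sqlite3, not tinygrad's diskcache code; column names are made up):

```python
import sqlite3

table = "compile_hip_gfx1010:xnack-_12"   # ':' and '-' are invalid in an unquoted identifier
conn = sqlite3.connect(":memory:")
try:
  conn.execute(f"CREATE TABLE {table} (key TEXT PRIMARY KEY, val BLOB)")
except sqlite3.OperationalError as e:
  print("unquoted table name fails:", e)
# quoting the identifier makes it valid; the commit uses single quotes, which sqlite
# also accepts for identifiers (double quotes are the standard SQL form)
conn.execute(f'CREATE TABLE "{table}" (key TEXT PRIMARY KEY, val BLOB)')
```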
* init
* removed mulacc
* is uoptimize the problem?
* lol hax make work temporarily fix l8er
* revert extra/ changes
* clean up
* flaky metal tests?
* add back mulacc for metal
* revert last commit
* try skipping linearizer_failure tests
* skip flammit tests... cuz tests all work locally
* try narrow down exact linearizer failure test
* try 2
* try 4
* generated code is the exact same wtf why CI fails
* code for 15 and 17 are exact same with or without mulacc, this should pass
* try only 1 failure
* try garbage collecting lol...
* try del variables lol
* try gcing after del lol...
* is diskcache the problem???
* try disabling opts cache idk
* try remove hack
* try disable github metal cache...
* try CACHELEVEL=0 :D idk anymore
* try increase newCommandQueueWithMaxCommandBufferCount_, im almost out of ideas...
* revert
* actually not a HACK
* oops
* simplest one
* but i can trust this will be cached correctly
* wait that was wrong too
* cleanup
* test_reduce_upcast for single reduce case
* a late accumulator always outputs to gds
lint
this fixes .split where self.shape[dim] is not evenly divisible by sizes; .chunk is always the wrong choice here:
- a tensor of shape (5,) .split(4) should result in shapes ((4,), (1,)); previously it returned ((3,), (2,))
this also fixes issues in .split and .chunk where tensors with shape[dim]==0 led to empty tuples/lists when the tensor itself should have been returned instead
because tinygrad is expected to fail wherever torch fails, tinygrad will now be strict about sizes having to sum to the size of the split dimension in .split, num having to be non-null for .chunk, and only valid dims being allowed in .unsqueeze
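A quick sketch of the intended semantics (matching torch; exact tinygrad call signatures may differ slightly):

```python
from tinygrad import Tensor

a, b = Tensor.arange(5).split(4)   # .split keeps full pieces of size 4, remainder goes last
assert a.shape == (4,) and b.shape == (1,)

c, d = Tensor.arange(5).chunk(2)   # .chunk divides as evenly as possible instead
assert c.shape == (3,) and d.shape == (2,)
```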