tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-10 23:48:01 -05:00

Author	SHA1	Message	Date
George Hotz	83530a585f	add quick external data select test	2024-03-02 05:38:32 -08:00
George Hotz	9a37273d36	consts don't have nodes in the graph (#3579 ) * consts don't have nodes in the graph * add idx	2024-03-02 04:19:11 -08:00
George Hotz	41f0a25b53	lazy.py: cache consts (#3577 ) * lazy.py: cache consts * add regression test * always always cache const * bump by 1	2024-03-02 03:50:05 -08:00
uuuvn	fb8acd1851	Don't touch UOps.DEFINE_GLOBAL (#3575 )	2024-03-02 03:30:05 -08:00
George Hotz	50e1445e60	Revert "allow overriding weight init for Linear (#3569 )" (#3576 ) This reverts commit `2d0973a852`.	2024-03-02 03:17:13 -08:00
David Hou	2d0973a852	allow overriding weight init for Linear (#3569 )	2024-03-02 03:16:04 -08:00
Francis Lam	9642a8f547	search: add BEAM UPCAST/LOCAL params and loosen TC criteria during BEAM (#3563 )	2024-03-02 03:11:25 -08:00
David Hou	ba6c041eab	fix SCE ignore_index with label_smoothing (#3574 ) * fix SCE ignore_index with label_smoothing * break up the line * only 3 cats in test * Revert "only 3 cats in test" This reverts commit `18be069c90`.	2024-03-01 22:19:45 -05:00
Francis Lam	e17f1821a7	wmma: add CUDA tensor core and fix test_speed_v_torch failure (#3544 )	2024-03-01 17:51:02 -08:00
David Hou	b3cdc11a58	label_smoothing in sparse_cat_crossentropy (#3568 ) * label_smoothing in sparse_cat_crossentropy * test multiple values, assert	2024-03-01 20:02:46 -05:00
George Hotz	6b29c70b3d	Refactor to UOpGraph class (#3566 ) * Refactor to UOpGraph class * fix test	2024-03-01 15:14:48 -08:00
chenyu	b7e555f6c0	run test_linearizer_failures on PYTHON backend (#3565 ) * run test_linearizer_failures on PYTHON backend only test 1, some have hanging issues and gated store is not implemented * --durations=20 * two less slow ones	2024-03-01 17:00:18 -05:00
chenyu	48d22067ca	clean up test_linearizer_failures (#3562 ) * cleanup test_linearizer_failures * fix test_failure_8 * fix that * better assert message	2024-03-01 15:57:17 -05:00
David Hou	d16aa89561	don't allow MLB assigns with different axes (#3557 ) * allow LB <- MLB assign, but don't reuse buffer * update test * update test * assign assert axes are the same * update tests to manually shard running stats * unused import	2024-03-01 07:59:06 -05:00
chenyu	cfd23f398d	Revert "don't allow MLB assigns with different axes (#3483 )" (#3554 ) This reverts commit `f19d8bb7b4`.	2024-02-29 23:13:07 -05:00
David Hou	f19d8bb7b4	don't allow MLB assigns with different axes (#3483 ) * allow LB <- MLB assign, but don't reuse buffer * update test * update test * assign assert axes are the same	2024-02-29 23:04:12 -05:00
chenyu	35d998efa8	disable flaky test_conv_beam in CI (#3553 ) might fail due to CL_OUT_OF_RESOURCES	2024-02-29 22:59:41 -05:00
David Hou	e5385eecfc	UnsyncedBatchNorm with synced trainable weights for hlb cifar (#3472 ) * UnsyncedBatchNorm with synced trainable weights for hlb cifar * multitensor reshape tests * test mlb assign change axis * E501 * argfix axis * don't import batchnorm from hlb_cifar in test_multitensor * pass num_devices to UnsyncedBatchNorm in test, allow UnsyncedBatchNorm to be used with LB * add backprop test for UnsyncedBatchNorm * break out MLB assign and reshape changes * manually shard running mean and running var * don't shard unless syncbn=0 * replace nn.BatchNorm2d with UnsyncedBatchNorm * don't increment num_batches_tracked if not tracking running stats * update tests * oops * Revert "oops" This reverts commit `5e8a67a535`. * Revert "update tests" This reverts commit `7ebf65d89a`. * Revert "don't increment num_batches_tracked if not tracking running stats" This reverts commit `78de0ea9ee`. * Revert "replace nn.BatchNorm2d with UnsyncedBatchNorm" This reverts commit `d03da53da7`. * don't increment num_batched_tracked if not tracking running stats * oops * test_batchnorm_axis * compare against torch * types --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-02-29 22:52:07 -05:00
George Hotz	5a6e151844	no barrier side effect (#3550 ) * no barrier side effect * finish barrier removal	2024-02-29 18:10:04 -08:00
George Hotz	bd9c2ced07	define var can be removed from vars to keep (#3549 ) * define var can be removed * sint * oops, didn't store	2024-02-29 17:44:19 -08:00
George Hotz	2c19ab6561	define var (#3548 ) * define var * remove vars from there * fix python symbolic ops * fix llvm * pypath	2024-02-29 16:43:27 -08:00
George Hotz	83cdc85790	add index to DEFINE_GLOBAL (#3542 ) * remove DEFINE_GLOBAL from uops with side effects * add index to DEFINE_GLOBAL * bugfix * better var name	2024-02-29 15:22:26 -08:00
chenyu	978a997d1f	print nvidia-smi in CI benchmark (#3546 )	2024-02-29 17:31:37 -05:00
Francis Lam	5d434801fa	search: add tensor core to beam search space (#3275 ) * search: add tensor core to beam search space * kernel: refactor apply_tensor_core into apply_opt and hand_coded * kernel: revert removal of apply_tensor_cores also revert BEAM search parameter changes	2024-02-29 13:05:10 -08:00
Marcin Słowik	f90caa4b92	Escape table name in diskcache queries. (#3543 ) Some devices create cache table names with non-alphanumerical characters, e.g. "compile_hip_gfx1010:xnack-_12". This commit escapes the table name in single quotes s.t. sqlite works (see https://github.com/tinygrad/tinygrad/issues/3538).	2024-02-29 13:04:21 -08:00
nimlgen	0afde98ba5	scan all gpu agents at launch (#3535 )	2024-02-29 09:37:37 -08:00
Mark McLoughlin	2e82c5b7a4	README: ops_cpu and ops_torch have been removed (#3539 ) Removed by pull #3399	2024-02-29 10:22:11 -05:00
nimlgen	b05776ef3e	fix addresses of dispatch packets (#3534 )	2024-02-29 05:43:55 -08:00
geohotstan	9268a8b154	remove MULACC (#3459 ) * init * removed mulacc * is uoptimize the problem? * lol hax make work temporarily fix l8er * revert extra/ changes * clean up * flaky metal tests? * add back mulacc for metal * revert last commit * try skipping linearizer_failure tests * skip flammit tests... cuz tests all work locally * try narrow down exact linearizer failure test * try 2 * try 4 * generated code is the exact same wtf why CI fails * code for 15 and 17 are exact same with or without mulacc, this should pass * try only 1 failure * try garbage collecting lol... * try del variables lol * try gcing after del lol... * is diskcache the problem??? * try disabling opts cache idk * try remove hack * try disable github metal cache... * try CACHELEVEL=0 :D idk anymore * try increase newCommandQueueWithMaxCommandBufferCount_, im almost out of ideas... * revert * actually not a HACK * oops	2024-02-29 07:40:40 -05:00
qazal	94fc0fd546	uop the float4 acc upcast in group_for_reduce kernels (#3466 ) * simplest one * but i can trust this will be cached correctly * wait that was wrong too * cleanup * test_reduce_upcast for single reduce case * a late accumulator always outputs to gds lint	2024-02-28 17:33:47 -08:00
George Hotz	48918fa75a	fix disktensor offset issue (#3532 )	2024-02-28 17:22:17 -08:00
Caleb Bunch	0b1fc5888a	fix 'Import Error: cannot import name compile_cuda from tinygrad.runtime.ops_cuda' error in extra/gemm/cuda_matmul.py (#3531 )	2024-02-28 17:15:32 -08:00
David Friehs	275971e616	fix: align .split, .chunk and .unsqueeze with torch, add fuzz tests (#3505 ) this fixes .split where self.shape[dim] is not perfectly divisible by sizes - .chunk is always the wrong choice here: - tensor((5,)).split(4) should result in (tensor((4,)), tensor((1,))) was (tensor((3,)), tensor((2,))) this also fixes issues in .split and .chunk where tensors with shape[dim]==0 lead to empty tuples/lists when the tensor itself should have been returned instead because tinygrad is expected to fail in all cases where torch fails tinygrad will now be strict regarding sizes having to sum up to passed dimension in .split, num having to be non-null for .chunk and only allowing valid dims in .unsqueeze	2024-02-28 17:06:39 -08:00
George Hotz	e7cda40d52	Revert "hotfix: disable metal graph" This reverts commit `3541602877`.	2024-02-28 16:25:12 -08:00
George Hotz	42eb8de0d4	Revert "move all reduces to the end in lazy (#3475 )" (#3529 ) This reverts commit `2113e1eb63`. Co-authored-by: chenyu <chenyu@fastmail.com>	2024-02-28 16:24:10 -08:00
chenyu	0c6846f9fc	failed test case for disk tensor assign into dtype int64 (#3527 ) failed case for #3510, mark as expectedFailure for now	2024-02-28 17:52:21 -05:00
chenyu	d89e3c4e08	enable METAL tests now runner is M1 and no fast-math (#3523 )	2024-02-28 14:14:23 -05:00
chenyu	1136e2a82a	`skipIf(not(` -> `skipUnless(` in test_linearizer_failures (#3519 ) if these behaves weirdly in CI might need to disable them in CI	2024-02-28 13:48:47 -05:00
George Hotz	3541602877	hotfix: disable metal graph	2024-02-28 10:33:34 -08:00
George Hotz	c34d382a1e	bump to macos-14 M1 (#3520 ) * bump to macos-14 M1 * bump cache key * no -n auto * jit=2 * real tensor cores	2024-02-28 10:28:25 -08:00
George Hotz	505ac6ac96	Revert "check buffers are seeable by other gpu before transfer (#3504 )" (#3522 ) This reverts commit `db2cf48828`.	2024-02-28 10:26:27 -08:00
nimlgen	db2cf48828	check buffers are seeable by other gpu before transfer (#3504 )	2024-02-28 10:24:50 -08:00
wozeparrot	da32c37346	use hash as key for beam (#3516 ) * feat: use hash as key for beam * feat: bump db version	2024-02-28 10:19:01 -08:00
uuuvn	1f5c24798b	Raise exception if MTLCommandBuffer fails (#3465 )	2024-02-28 10:14:08 -08:00
nimlgen	08ef77c721	hsa multigpu graph (#3403 ) * init hsa multigraph * better handling of accesses to buffers * revert sdma0 only when copies from fd	2024-02-28 09:40:53 -08:00
chenyu	fa88e1d0d0	cleanup lazy reduce (#3517 ) * cleanup lazy reduce removed useless assert now arg is axis and cleaned split logic * stride can be symbolic with int shape	2024-02-28 08:15:01 -05:00
chenyu	2127c1c6c2	test for the split reduce kernel (#3515 ) somehow this was not tested	2024-02-27 21:29:25 -05:00
nimlgen	94b7ac7a29	no cuda compile helper (#3512 )	2024-02-28 01:50:10 +01:00
chenyu	88939c3347	fix Node.max can be symbolic (#3514 ) Also made sure taking max twice can get int.	2024-02-27 17:21:31 -05:00
chenyu	969b57f0fe	enable symbolic_ops and jits test of two vars (#3513 )	2024-02-27 11:17:46 -05:00

1 2 3 4 5 ...

3687 Commits