tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-07 03:00:26 -04:00

Author	SHA1	Message	Date
Francis Lam	7c90005c65	search: hotfix to make sure TC behavior is all in applied_opts (#3598 ) * search: hotfix to make sure TC behavior is all in applied_opts * fix linter error * fix mypy	2024-03-03 21:44:38 -05:00
chenyu	8e5d60a322	add more gpt2 variant in mac/nvidia benchmark (#3599 )	2024-03-03 17:55:30 -05:00
chenyu	968d109453	apply more create_lt_node (#3597 ) updated one in linearizer if condition, and various symbolic tests	2024-03-03 16:12:39 -05:00
Patrick Tsai	bc562c4747	Python div alu behavior differs slightly from others (#3596 ) * Divide op rounding for negatives * extra space --------- Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>	2024-03-03 10:48:25 -08:00
Marcin Słowik	56d21d77b3	Fix two bugs concerning Tensor.to. (#3593 ) 1. Tensor.to should return self if device == self.device. This was not the case if provided with non-canonical name of self.device. 2. Tensor.to result was missing graph, even though requires_grad and grad were propagated . Add corresponding tests.	2024-03-03 08:48:56 -08:00
Patrick Tsai	0082300a59	Fix symbolic negative floordiv (#3594 ) Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>	2024-03-03 11:40:52 -05:00
chenyu	e09619ab6c	explicitly create_lt_node when used in shapetracker _expr_view (#3561 ) * explicitly create_lt_node when used in shapetracker leave regular __lt__ and cmps for symbolic shape cmp * hmm it fixed that? * LtNode.substitute uses create_lt_node	2024-03-03 10:08:21 -05:00
nimlgen	640dc0fc51	hsa flush hdp (#3591 ) * hsa flush hdp * use _alloc()	2024-03-03 04:55:07 -08:00
reddyn12	660df3cff1	Add test for .softmax.argmax (#3559 ) * Add broken test for known issue * skip PYTHON * skip PYTHON * fix commit --------- Co-authored-by: schlimeszn <schlimeszn@gmail.com> Co-authored-by: reddyn <nikidsniper@gmail.com>	2024-03-02 20:51:52 -08:00
chenyu	ee41fafdab	use operator instead of lambda in python_alu (#3590 )	2024-03-02 19:33:21 -05:00
qazal	a89afd4ffa	Directly store float4 nodes (#3564 ) * float4 cast collapse * simplify cstyle * simplify uoptimizer * ci --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-03-02 15:58:20 -08:00
George Hotz	770707b376	hotfix: gpuocelot no rebuild	2024-03-02 15:57:38 -08:00
George Hotz	74c9acddb0	simple python ALU (#3589 ) * shorter * bugfix	2024-03-02 15:50:58 -08:00
Francis Lam	162dfb07d9	fuzz_linearizer: fix uops and add to test.yml (#3588 )	2024-03-02 15:03:42 -08:00
Jovan Sardinha	8978488565	add sanity tests for bufs_from_lin (#3586 )	2024-03-02 14:17:43 -08:00
George Hotz	aa9b013d79	add constant folding for WHERE in uops (#3584 ) * add constant folding for WHERE in uops * prereqs for generic constant folding * fix test * disable slow overflow logic * make that test faster	2024-03-02 10:37:14 -08:00
nimlgen	3b7e3fa2e4	fix sync in hsa graph (#3582 )	2024-03-02 07:37:51 -08:00
Szymon Ożóg	6c36264790	Improve type hints for optimizer (#3583 ) * Improve type hints for optimizer * lint fix	2024-03-02 07:35:44 -08:00
George Hotz	83530a585f	add quick external data select test	2024-03-02 05:38:32 -08:00
George Hotz	9a37273d36	consts don't have nodes in the graph (#3579 ) * consts don't have nodes in the graph * add idx	2024-03-02 04:19:11 -08:00
George Hotz	41f0a25b53	lazy.py: cache consts (#3577 ) * lazy.py: cache consts * add regression test * always always cache const * bump by 1	2024-03-02 03:50:05 -08:00
uuuvn	fb8acd1851	Don't touch UOps.DEFINE_GLOBAL (#3575 )	2024-03-02 03:30:05 -08:00
George Hotz	50e1445e60	Revert "allow overriding weight init for Linear (#3569 )" (#3576 ) This reverts commit `2d0973a852`.	2024-03-02 03:17:13 -08:00
David Hou	2d0973a852	allow overriding weight init for Linear (#3569 )	2024-03-02 03:16:04 -08:00
Francis Lam	9642a8f547	search: add BEAM UPCAST/LOCAL params and loosen TC criteria during BEAM (#3563 )	2024-03-02 03:11:25 -08:00
David Hou	ba6c041eab	fix SCE ignore_index with label_smoothing (#3574 ) * fix SCE ignore_index with label_smoothing * break up the line * only 3 cats in test * Revert "only 3 cats in test" This reverts commit `18be069c90`.	2024-03-01 22:19:45 -05:00
Francis Lam	e17f1821a7	wmma: add CUDA tensor core and fix test_speed_v_torch failure (#3544 )	2024-03-01 17:51:02 -08:00
David Hou	b3cdc11a58	label_smoothing in sparse_cat_crossentropy (#3568 ) * label_smoothing in sparse_cat_crossentropy * test multiple values, assert	2024-03-01 20:02:46 -05:00
George Hotz	6b29c70b3d	Refactor to UOpGraph class (#3566 ) * Refactor to UOpGraph class * fix test	2024-03-01 15:14:48 -08:00
chenyu	b7e555f6c0	run test_linearizer_failures on PYTHON backend (#3565 ) * run test_linearizer_failures on PYTHON backend only test 1, some have hanging issues and gated store is not implemented * --durations=20 * two less slow ones	2024-03-01 17:00:18 -05:00
chenyu	48d22067ca	clean up test_linearizer_failures (#3562 ) * cleanup test_linearizer_failures * fix test_failure_8 * fix that * better assert message	2024-03-01 15:57:17 -05:00
David Hou	d16aa89561	don't allow MLB assigns with different axes (#3557 ) * allow LB <- MLB assign, but don't reuse buffer * update test * update test * assign assert axes are the same * update tests to manually shard running stats * unused import	2024-03-01 07:59:06 -05:00
chenyu	cfd23f398d	Revert "don't allow MLB assigns with different axes (#3483 )" (#3554 ) This reverts commit `f19d8bb7b4`.	2024-02-29 23:13:07 -05:00
David Hou	f19d8bb7b4	don't allow MLB assigns with different axes (#3483 ) * allow LB <- MLB assign, but don't reuse buffer * update test * update test * assign assert axes are the same	2024-02-29 23:04:12 -05:00
chenyu	35d998efa8	disable flaky test_conv_beam in CI (#3553 ) might fail due to CL_OUT_OF_RESOURCES	2024-02-29 22:59:41 -05:00
David Hou	e5385eecfc	UnsyncedBatchNorm with synced trainable weights for hlb cifar (#3472 ) * UnsyncedBatchNorm with synced trainable weights for hlb cifar * multitensor reshape tests * test mlb assign change axis * E501 * argfix axis * don't import batchnorm from hlb_cifar in test_multitensor * pass num_devices to UnsyncedBatchNorm in test, allow UnsyncedBatchNorm to be used with LB * add backprop test for UnsyncedBatchNorm * break out MLB assign and reshape changes * manually shard running mean and running var * don't shard unless syncbn=0 * replace nn.BatchNorm2d with UnsyncedBatchNorm * don't increment num_batches_tracked if not tracking running stats * update tests * oops * Revert "oops" This reverts commit `5e8a67a535`. * Revert "update tests" This reverts commit `7ebf65d89a`. * Revert "don't increment num_batches_tracked if not tracking running stats" This reverts commit `78de0ea9ee`. * Revert "replace nn.BatchNorm2d with UnsyncedBatchNorm" This reverts commit `d03da53da7`. * don't increment num_batched_tracked if not tracking running stats * oops * test_batchnorm_axis * compare against torch * types --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-02-29 22:52:07 -05:00
George Hotz	5a6e151844	no barrier side effect (#3550 ) * no barrier side effect * finish barrier removal	2024-02-29 18:10:04 -08:00
George Hotz	bd9c2ced07	define var can be removed from vars to keep (#3549 ) * define var can be removed * sint * oops, didn't store	2024-02-29 17:44:19 -08:00
George Hotz	2c19ab6561	define var (#3548 ) * define var * remove vars from there * fix python symbolic ops * fix llvm * pypath	2024-02-29 16:43:27 -08:00
George Hotz	83cdc85790	add index to DEFINE_GLOBAL (#3542 ) * remove DEFINE_GLOBAL from uops with side effects * add index to DEFINE_GLOBAL * bugfix * better var name	2024-02-29 15:22:26 -08:00
chenyu	978a997d1f	print nvidia-smi in CI benchmark (#3546 )	2024-02-29 17:31:37 -05:00
Francis Lam	5d434801fa	search: add tensor core to beam search space (#3275 ) * search: add tensor core to beam search space * kernel: refactor apply_tensor_core into apply_opt and hand_coded * kernel: revert removal of apply_tensor_cores also revert BEAM search parameter changes	2024-02-29 13:05:10 -08:00
Marcin Słowik	f90caa4b92	Escape table name in diskcache queries. (#3543 ) Some devices create cache table names with non-alphanumerical characters, e.g. "compile_hip_gfx1010:xnack-_12". This commit escapes the table name in single quotes s.t. sqlite works (see https://github.com/tinygrad/tinygrad/issues/3538).	2024-02-29 13:04:21 -08:00
nimlgen	0afde98ba5	scan all gpu agents at launch (#3535 )	2024-02-29 09:37:37 -08:00
Mark McLoughlin	2e82c5b7a4	README: ops_cpu and ops_torch have been removed (#3539 ) Removed by pull #3399	2024-02-29 10:22:11 -05:00
nimlgen	b05776ef3e	fix addresses of dispatch packets (#3534 )	2024-02-29 05:43:55 -08:00
geohotstan	9268a8b154	remove MULACC (#3459 ) * init * removed mulacc * is uoptimize the problem? * lol hax make work temporarily fix l8er * revert extra/ changes * clean up * flaky metal tests? * add back mulacc for metal * revert last commit * try skipping linearizer_failure tests * skip flammit tests... cuz tests all work locally * try narrow down exact linearizer failure test * try 2 * try 4 * generated code is the exact same wtf why CI fails * code for 15 and 17 are exact same with or without mulacc, this should pass * try only 1 failure * try garbage collecting lol... * try del variables lol * try gcing after del lol... * is diskcache the problem??? * try disabling opts cache idk * try remove hack * try disable github metal cache... * try CACHELEVEL=0 :D idk anymore * try increase newCommandQueueWithMaxCommandBufferCount_, im almost out of ideas... * revert * actually not a HACK * oops	2024-02-29 07:40:40 -05:00
qazal	94fc0fd546	uop the float4 acc upcast in group_for_reduce kernels (#3466 ) * simplest one * but i can trust this will be cached correctly * wait that was wrong too * cleanup * test_reduce_upcast for single reduce case * a late accumulator always outputs to gds lint	2024-02-28 17:33:47 -08:00
George Hotz	48918fa75a	fix disktensor offset issue (#3532 )	2024-02-28 17:22:17 -08:00
Caleb Bunch	0b1fc5888a	fix 'Import Error: cannot import name compile_cuda from tinygrad.runtime.ops_cuda' error in extra/gemm/cuda_matmul.py (#3531 )	2024-02-28 17:15:32 -08:00

1 2 3 4 5 ...

3705 Commits