tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-02-05 20:24:57 -05:00

Author	SHA1	Message	Date
chenyu	0cef284aac	fix typing FlopCounter.flops can be sint (#3646)	2024-03-07 12:49:17 -05:00
chenyu	906cc3a69b	cleanup tests Device[Device.DEFAULT] is always Compiled (#3645)	2024-03-07 11:15:42 -05:00
qazal	bdd62c7fd8	make the bf16 include dynamic (#3642) * dynamic prefix * add common ones above these are common dtypes aesthetics * regression test fuzz it test * run in CI * use .append * faster	2024-03-07 10:31:35 -05:00
chenyu	4552248c84	fix Tensor.to preserves grad.data (#3636)	2024-03-06 21:44:49 -05:00
chenyu	d33311ebe0	remove parens of ALU if it has associative property (#3635) need to remove SUB since it's possible to have (const - (const - const)) in test/test_ops.py::TestOps::test_cos, in which case cannot remove the parens of children	2024-03-06 21:12:11 -05:00
chenyu	fe6b6e38c1	remove parentheses of GEP if it's from SSA (#3634) fixed some bracket nesting level exceeded maximum of 256 errors	2024-03-06 20:22:46 -05:00
David Hou	0afaf70d57	lars optimizer + tests (#3631) * lars optimizer + tests * fix skip list! * use id to compare in skip list * go back to using set * Tensor(bool) * Tensor(bool) is and * don't lint external/mlperf_resnet * whitespace * add external_test_optim to opencl tests * give mlperf task a name * mlperf under onnx * remove track_gnorm * contiguous instead of realize * assert momentum and weight decay positive --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-03-06 18:11:01 -05:00
chenyu	b2e92d44fa	skip METAL sin test in test_dtype_alu (#3633) revert this part of #3629. this is flaky	2024-03-06 17:29:19 -05:00
chenyu	8f10bfa2ff	ban __bool__ on Tensor (#3632) * ban __bool__ on Tensor avoid misuse * test case * fix tests * fix more tests	2024-03-06 17:12:35 -05:00
George Hotz	81baf3eed3	bring ptx back (#3623) * bring ptx back * ptx back * fix define var * fix a few bugs * bugfixes * fixes * fix llvm bug * fix test bug	2024-03-06 13:34:21 -08:00
chenyu	c270d54c32	update test_dtype_alu for METAL (#3629)	2024-03-06 14:55:19 -05:00
qazal	abc5f3a6a0	hip bf16 hotfix (#3630) * hip bf16 * remu dev mac * Revert "remu dev mac" This reverts commit 465069a0dc3c7f2045f3348b312a1dcbf1587acd. * skip disk tests in CI * bring float8 back	2024-03-06 11:42:30 -08:00
chenyu	bc2a13a5f7	test case to show clang and python doing math in double (#3628)	2024-03-06 13:49:03 -05:00
George Hotz	568353fa84	hotfix: bump line count to 6500	2024-03-06 07:52:18 -08:00
Elias Wahl	a1507c7fd4	Fix Tensor.dropout() with multigpu (#3619) * Tensor.rand with multilazybuffer * remove recursive + test * whitespace * another whitespace. Sorry * remove else * Conconicalize multidevice tuple + Remove src	2024-03-05 18:26:21 -05:00
Jungwan Woo	e5ee6bb2bd	fix outdated url in showcase doc (#3624)	2024-03-05 14:44:40 -08:00
George Hotz	8500265561	this mem fault still happening (#3620) * this mem fault still happening * smaller * that print doesn't work * overflows test * hip doesn't uses_ptr_arithmetic * only with locals * test overflow new name * it's not ptr arith * simpler * simple repro * old compiler * simpler * put that back	2024-03-05 10:39:32 -08:00
chenyu	3c3f846c45	tinybox benchmark with HSA (#3603) * tinybox benchmark with HSA * torch cuda init can fail * no TORCHCUDA * print torch version * LD_PRELOAD="/opt/rocm/lib/libhsa-runtime64.so"	2024-03-05 11:03:52 -05:00
George Hotz	f500be1313	out of bounds access caused by launch bounds (#3615) * lin overflow * remove launch bounds * remove launch bounds infra * oops, fix bufs type	2024-03-05 06:34:00 -08:00
qazal	eb83e2d3a0	decouple buffer mutability from cstyle (#3617) * buffer mutability as an arg * update test_uops	2024-03-05 06:20:59 -08:00
chenyu	3275260c98	Revert "test: add failing bfloat16 test case for metal backend (#3481)" (#3618) This reverts commit `1e12a2ae80`.	2024-03-05 09:08:42 -05:00
Skosh	1e12a2ae80	test: add failing bfloat16 test case for metal backend (#3481) * test: add failing bfloat16 test case for metal backend * test: move bfloat 16 test to dtypes test	2024-03-05 08:44:54 -05:00
chenyu	957e9800f1	llama + beam to mac benchmark, full cifar to nvidia benchmark (#3612) would merge if it's also ~1 minute. btw why is gpt2 beam not slower in the first beam run?	2024-03-04 21:35:57 -05:00
chenyu	282bbd5acb	check the input length into argfix (#3610) * check the input length into argfix it's possible to overlook setting keyword for kwargs and argfix silently truncates input * add test	2024-03-04 19:50:17 -05:00
Elias Wahl	7db6dd725d	multilazybuffer fix (#3609)	2024-03-04 17:36:23 -05:00
chenyu	c3b8d285aa	cleanup uops (#3605) using `is` to compare with enums, remove long lines and slightly more compact	2024-03-04 11:03:14 -05:00
qazal	94679322a3	simpler float4 direct store and locals support (#3592) * swap vins instead * delete the upcast * leave it to remove_childless try 1 * Revert "leave it to remove_childless try 1" This reverts commit `bf25e935f8`. * try 2, simpler * Revert "try 2, simpler" This reverts commit `d2472af711`. * add note	2024-03-04 06:28:28 -08:00
nimlgen	3db826e195	hsa in lin opts (#3602)	2024-03-04 06:17:32 -08:00
Francis Lam	7c90005c65	search: hotfix to make sure TC behavior is all in applied_opts (#3598) * search: hotfix to make sure TC behavior is all in applied_opts * fix linter error * fix mypy	2024-03-03 21:44:38 -05:00
chenyu	8e5d60a322	add more gpt2 variant in mac/nvidia benchmark (#3599)	2024-03-03 17:55:30 -05:00
chenyu	968d109453	apply more create_lt_node (#3597) updated one in linearizer if condition, and various symbolic tests	2024-03-03 16:12:39 -05:00
Patrick Tsai	bc562c4747	Python div alu behavior differs slightly from others (#3596) * Divide op rounding for negatives * extra space --------- Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>	2024-03-03 10:48:25 -08:00
Marcin Słowik	56d21d77b3	Fix two bugs concerning Tensor.to. (#3593) 1. Tensor.to should return self if device == self.device. This was not the case if provided with non-canonical name of self.device. 2. Tensor.to result was missing graph, even though requires_grad and grad were propagated. Add corresponding tests.	2024-03-03 08:48:56 -08:00
Patrick Tsai	0082300a59	Fix symbolic negative floordiv (#3594) Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>	2024-03-03 11:40:52 -05:00
chenyu	e09619ab6c	explicitly create_lt_node when used in shapetracker _expr_view (#3561) * explicitly create_lt_node when used in shapetracker leave regular __lt__ and cmps for symbolic shape cmp * hmm it fixed that? * LtNode.substitute uses create_lt_node	2024-03-03 10:08:21 -05:00
nimlgen	640dc0fc51	hsa flush hdp (#3591) * hsa flush hdp * use _alloc()	2024-03-03 04:55:07 -08:00
reddyn12	660df3cff1	Add test for .softmax.argmax (#3559) * Add broken test for known issue * skip PYTHON * skip PYTHON * fix commit --------- Co-authored-by: schlimeszn <schlimeszn@gmail.com> Co-authored-by: reddyn <nikidsniper@gmail.com>	2024-03-02 20:51:52 -08:00
chenyu	ee41fafdab	use operator instead of lambda in python_alu (#3590)	2024-03-02 19:33:21 -05:00
qazal	a89afd4ffa	Directly store float4 nodes (#3564) * float4 cast collapse * simplify cstyle * simplify uoptimizer * ci --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-03-02 15:58:20 -08:00
George Hotz	770707b376	hotfix: gpuocelot no rebuild	2024-03-02 15:57:38 -08:00
George Hotz	74c9acddb0	simple python ALU (#3589) * shorter * bugfix	2024-03-02 15:50:58 -08:00
Francis Lam	162dfb07d9	fuzz_linearizer: fix uops and add to test.yml (#3588)	2024-03-02 15:03:42 -08:00
Jovan Sardinha	8978488565	add sanity tests for bufs_from_lin (#3586)	2024-03-02 14:17:43 -08:00
George Hotz	aa9b013d79	add constant folding for WHERE in uops (#3584) * add constant folding for WHERE in uops * prereqs for generic constant folding * fix test * disable slow overflow logic * make that test faster	2024-03-02 10:37:14 -08:00
nimlgen	3b7e3fa2e4	fix sync in hsa graph (#3582)	2024-03-02 07:37:51 -08:00
Szymon Ożóg	6c36264790	Improve type hints for optimizer (#3583) * Improve type hints for optimizer * lint fix	2024-03-02 07:35:44 -08:00
George Hotz	83530a585f	add quick external data select test	2024-03-02 05:38:32 -08:00
George Hotz	9a37273d36	consts don't have nodes in the graph (#3579) * consts don't have nodes in the graph * add idx	2024-03-02 04:19:11 -08:00
George Hotz	41f0a25b53	lazy.py: cache consts (#3577) * lazy.py: cache consts * add regression test * always always cache const * bump by 1	2024-03-02 03:50:05 -08:00
uuuvn	fb8acd1851	Don't touch UOps.DEFINE_GLOBAL (#3575)	2024-03-02 03:30:05 -08:00

10633 Commits