tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-24 14:28:09 -05:00

Author	SHA1	Message	Date
chenyu	e1b2a82d89	fix st.real_size can be nagative if valid is always false (#3708 ) two followups after this. (1) if a buffer is never accessed in kernel, it can be removed from input (2) real_size can be smaller conditional on valid being true (the old validhack stuff)	2024-03-12 20:34:07 -04:00
Francis Lam	b6e2495fdd	kernel: limit shared memory usage when adding opts (#3705 ) * kernel: limit shared memory usage when adding opts * search: remove unnecessary limit on search space apply_opt will do the more correct check	2024-03-12 17:06:21 -04:00
George Hotz	2024b24f35	add some graph tests (#3702 ) * add some graph tests * PatternMatcher class * speedup * const cast test * fix tests * itertools chain	2024-03-12 09:49:47 -07:00
chenyu	f599c6e7f4	test output dtypes matche in test_ops (#3703 ) need to cast some torch output to int32 because torch default returns int64 for index related function close #2797	2024-03-12 12:44:40 -04:00
chenyu	02ca067bdf	use default_float.np to construct test data in test_ops (#3701 ) first step of #2797	2024-03-12 11:58:20 -04:00
Patrick Tsai	971d7f5d7c	O(n) arange attempt (#3530 ) * It works? * Clamp correctly * Refactor * Make code better * Undo some stuff * First step to trying to make floats work * Floats work in Python op but not metal because int div is different Python integerdivision was implemented as // which rounds towards negative infinity, but C integer division rounds towards 0 so there is an off-by-1 division error * arange does cumsum with ints and then multiplies by step This is so loop optimization can remain int only * Undo a lot of symbolic changes * Final check * Cleanup * There can be multiple phis * Fix multiple phi op removal * const sets dtype correctly * Fix bugs * Fix a couple bugs and add loop vars to resolve * missed one * Don't trim too many ops * Fix symbolic test * Use ones instead of full * Delete test * Lint passes * max node error * Small updates to loop logic * Remove unnecessary changes * We are getting somewhere * Simple case * Fix * rm, prn * Better * If NumNode doesn't work then continue * clamp is needed for arange(256) * Move everything into the optim fn * Replace correctly * Order optimizations better * Delete * mypy * Test for simplification * Rename * Fix test * update test description * Undo more * Cleanup * No replaced_ops map * Fix lint * AssertionError * back again * Reinstate assertion * Return true and make diff not as big * Bigger range for test * Change cumsum impl * fix bug * make big cumsum work * lint * Undo cumsum 2-stage removal * No while helper * optional min/max clamping * floats work * rm giant arange test * fix python cast None * Check phi parents * one phi allowed per where * Fix one phi per where * Rework iteration * Delete assertions * convert to int * Try mul -1 instead of neg for hip..? * Remove one phi per where requirements * one accum only * Lint * should simplify a loop at a time * Don't get rid of loop explcitly * Need to iterate backwards * lint * unary neg * Make optim work for onnx and sum_pad_collapse * Better message * filter alu ops correctly * Fix the limiter * lint and simplify * Add it back * off by one error * test wheres and phis * test max ops and non-if stuff * <= * cast_scalar * Oops * Change test * Pass loop uops instead of a modified map * Cut param transfer between linearizer and uops * Fix issues * Fix lint * fix efficientnet python 3.8 invalid syntax * distinct vars in seen_vars * accurate var names --------- Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-03-11 16:09:20 -07:00
qazal	aec4c4f01b	linearizer ast as a tuple of lazyops (#3689 ) * multi store op linearizer * currently we do only one output per kernel * named opts	2024-03-11 15:39:04 -07:00
Skosh	e8c350fdac	fix: make Tensor.rand produce correct values for float16 (#3654 ) * fix: make Tensor.rand produce correct values for float16 Due to precision loss when casting to float16, the data distribution created by custom_random isnt correctly in the interval ]0, 1[, but instead in the interval ]0, 1], which causes the Tensor.randn to incorrectly generate values of infinity. The solution uses a scaling value to make sure the values stay under 1, when using half precision. Closes #3611 * update implementation to truncate to closest f16 value to 1 * chore: fix whitespace * test larger distribution --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-03-10 18:48:00 -04:00
George Hotz	44a67bf783	constant folding (#3675 ) * constant fold * bool math * fix ptx	2024-03-10 14:47:24 -07:00
George Hotz	25aede6fd9	truncate for exec_alu (#3674 )	2024-03-10 14:19:04 -07:00
Francis Lata	957ae9b594	Fix Tensor's __repr__ for printing out grad (#3673 ) * update check for Tensor's __repr__ with grad * add test for repr with grad bugfix	2024-03-10 17:04:29 -04:00
George Hotz	69ca7f7bf9	changes for teenygrad (#3665 ) * changes for teenygrad * upd * simpler test	2024-03-09 15:30:34 -08:00
Maximilian Wolf	8ae85b2cf5	add inference_mode context manager with decorator support (#3621 ) * add inference_mode context manager with decorator support * change val to mode for train and inference_mode * fix wrong rename	2024-03-09 08:38:26 -08:00
Obada Khalili	b5cbf1792a	Fix `Tensor.cumsum` when axis of length 0 is selected (#3473 ) * fix Tensor.cumsum when axis of length 0 is selected * add cumsum regression test * define padding left size in a seperate line	2024-03-09 08:26:41 -08:00
chenyu	915f98791c	use custom KernelOptError in kernel opt (#3661 ) be more specific about invalid kernel opt, used that in test_linearizer_failures. make BEAM kernel search work even with assertion disabled. `BEAM=2 python3 -O examples/llama.py --temperature=0 --count=10 --prompt="Hello." --timing`	2024-03-08 15:36:16 -05:00
George Hotz	ac02e7347d	ptx timing vs cuda timing (#3659 )	2024-03-08 10:17:49 -08:00
chenyu	e25879d50e	don't get new var_val for the same ast in fuzz_linearizer (#3657 ) fixed result comparison for kernels with variables	2024-03-08 09:49:24 -05:00
chenyu	1130c73844	add FUZZ_NTH to fuzz_linearizer (#3656 ) * add FUZZ_NTH to fuzz_linearizer also update tests in test_linearizer_failures to not just run on METAL * update failures for HIP/HSA * test_failure_21 LLVM PADTO	2024-03-08 09:16:49 -05:00
David Hou	9f66dcf718	PolynomialDecayWithWarmup + tests (#3649 ) * working PolynomialDecayWithWarmup + tests....... add lars_util.py, oops * keep lars_util.py as intact as possible, simplify our interface * whitespace * clean up * clean up * asserts * test polylr for full resnet training run * add comment * rename * fix do_optim * don't cast lr * info * calculate from train_files * skip it	2024-03-07 18:53:36 -05:00
chenyu	57df8e8d82	update fuzz_linearizer (#3648 ) included non-reduce kernel and kernel with variables. green msg when everything passed it's possible that creating rawbufs failed due to memory error, included that in failure cases	2024-03-07 18:41:22 -05:00
chenyu	b282a45e39	fix direct store float4 with same vin (#3652 ) In a kernel that stores expanded value, the vin of float4 can come from same source, and we only remove once in that case.	2024-03-07 18:11:50 -05:00
Zaffer	1853ec9a02	add tests for bfloat16 on HIP (#3638 ) * Fix bug in login functionality * Remove HSA backend test and add bfloat16 dtype tests that run in CI * Skip tests on HIPCPU * skip tests causing segfault on LLVM backend * Exclude bfloat16 tests causing segfaults in LLVM backend * move bf16 cast tests to only test on HIP	2024-03-07 10:45:36 -08:00
chenyu	906cc3a69b	cleanup tests Device[Device.DEFAULT] is always Compiled (#3645 )	2024-03-07 11:15:42 -05:00
qazal	bdd62c7fd8	make the bf16 include dynamic (#3642 ) * dynamic prefix * add common ones above these are common dtypes aesthetics * regression test fuzz it test * run in CI * use .append * faster	2024-03-07 10:31:35 -05:00
chenyu	4552248c84	fix Tensor.to preserves grad.data (#3636 )	2024-03-06 21:44:49 -05:00
chenyu	d33311ebe0	remove parens of ALU if it has associative property (#3635 ) need to remove SUB since it's possible to have (const - (const - const)) in test/test_ops.py::TestOps::test_cos, in which case cannot remove the parens of children	2024-03-06 21:12:11 -05:00
chenyu	fe6b6e38c1	remove parentheses of GEP if it's from SSA (#3634 ) fixed some bracket nesting level exceeded maximum of 256 errors	2024-03-06 20:22:46 -05:00
David Hou	0afaf70d57	lars optimizer + tests (#3631 ) * lars optimizer + tests * fix skip list! * use id to compare in skip list * go back to using set * Tensor(bool) * Tensor(bool) is and * don't lint external/mlperf_resnet * whitespace * add external_test_optim to opencl tests * give mlperf task a name * mlperf under onnx * remove track_gnorm * contiguous instead of realize * assert momentum and weight decay positive --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-03-06 18:11:01 -05:00
chenyu	b2e92d44fa	skip METAL sin test in test_dtype_alu (#3633 ) revert this part of #3629. this is flaky	2024-03-06 17:29:19 -05:00
chenyu	8f10bfa2ff	ban __bool__ on Tensor (#3632 ) * ban __bool__ on Tensor avoid misuse * test case * fix tests * fix more tests	2024-03-06 17:12:35 -05:00
George Hotz	81baf3eed3	bring ptx back (#3623 ) * bring ptx back * ptx back * fix define var * fix a few bugs * bugfixes * fixes * fix llvm bug * fix test bug	2024-03-06 13:34:21 -08:00
chenyu	c270d54c32	update test_dtype_alu for METAL (#3629 )	2024-03-06 14:55:19 -05:00
qazal	abc5f3a6a0	hip bf16 hotfix (#3630 ) * hip bf16 * remu dev mac * Revert "remu dev mac" This reverts commit 465069a0dc3c7f2045f3348b312a1dcbf1587acd. * skip disk tests in CI * bring float8 back	2024-03-06 11:42:30 -08:00
chenyu	bc2a13a5f7	test case to show clang and python doing math in double (#3628 )	2024-03-06 13:49:03 -05:00
Elias Wahl	a1507c7fd4	Fix Tensor.dropout() with multigpu (#3619 ) * Tensor.rand with multilazybuffer * remove recursive + test * whitespace * another whitespace. Sorry * remove else * Conconicalize multidevice tuple + Remove src	2024-03-05 18:26:21 -05:00
George Hotz	8500265561	this mem fault still happening (#3620 ) * this mem fault still happening * smaller * that print doesn't work * overflows test * hip doesn't uses_ptr_arithmetic * only with locals * test overflow new name * it's not ptr arith * simpler * simple repro * old compiler * simpler * put that back	2024-03-05 10:39:32 -08:00
George Hotz	f500be1313	out of bounds access caused by launch bounds (#3615 ) * lin overflow * remove launch bounds * remove launch bounds infra * oops, fix bufs type	2024-03-05 06:34:00 -08:00
qazal	eb83e2d3a0	decouple buffer mutability from cstyle (#3617 ) * buffer mutability as an arg * update test_uops	2024-03-05 06:20:59 -08:00
chenyu	3275260c98	Revert "test: add failing bfloat16 test case for metal backend (#3481 )" (#3618 ) This reverts commit `1e12a2ae80`.	2024-03-05 09:08:42 -05:00
Skosh	1e12a2ae80	test: add failing bfloat16 test case for metal backend (#3481 ) * test: add failing bfloat16 test case for metal backend * test: move bfloat 16 test to dtypes test	2024-03-05 08:44:54 -05:00
chenyu	282bbd5acb	check the input length into argfix (#3610 ) * check the input length into argfix it's possible to overlook setting keyword for kwargs and argfix silently truncates input * add test	2024-03-04 19:50:17 -05:00
qazal	94679322a3	simpler float4 direct store and locals support (#3592 ) * swap vins instead * delete the upcast * leave it to remove_childless try 1 * Revert "leave it to remove_childless try 1" This reverts commit `bf25e935f8`. * try 2, simpler * Revert "try 2, simpler" This reverts commit `d2472af711`. * add note	2024-03-04 06:28:28 -08:00
chenyu	968d109453	apply more create_lt_node (#3597 ) updated one in linearizer if condition, and various symbolic tests	2024-03-03 16:12:39 -05:00
Patrick Tsai	bc562c4747	Python div alu behavior differs slightly from others (#3596 ) * Divide op rounding for negatives * extra space --------- Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>	2024-03-03 10:48:25 -08:00
Marcin Słowik	56d21d77b3	Fix two bugs concerning Tensor.to. (#3593 ) 1. Tensor.to should return self if device == self.device. This was not the case if provided with non-canonical name of self.device. 2. Tensor.to result was missing graph, even though requires_grad and grad were propagated . Add corresponding tests.	2024-03-03 08:48:56 -08:00
Patrick Tsai	0082300a59	Fix symbolic negative floordiv (#3594 ) Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>	2024-03-03 11:40:52 -05:00
chenyu	e09619ab6c	explicitly create_lt_node when used in shapetracker _expr_view (#3561 ) * explicitly create_lt_node when used in shapetracker leave regular __lt__ and cmps for symbolic shape cmp * hmm it fixed that? * LtNode.substitute uses create_lt_node	2024-03-03 10:08:21 -05:00
reddyn12	660df3cff1	Add test for .softmax.argmax (#3559 ) * Add broken test for known issue * skip PYTHON * skip PYTHON * fix commit --------- Co-authored-by: schlimeszn <schlimeszn@gmail.com> Co-authored-by: reddyn <nikidsniper@gmail.com>	2024-03-02 20:51:52 -08:00
qazal	a89afd4ffa	Directly store float4 nodes (#3564 ) * float4 cast collapse * simplify cstyle * simplify uoptimizer * ci --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-03-02 15:58:20 -08:00
Francis Lam	162dfb07d9	fuzz_linearizer: fix uops and add to test.yml (#3588 )	2024-03-02 15:03:42 -08:00

... 58 59 60 61 62 ...

4433 Commits