tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-07 03:00:26 -04:00

Author	SHA1	Message	Date
David Hou	5cfcc2a8d7	support MLB reshaping on-axis for evenly sharded (#3484 ) * support MLB reshaping on-axis for evenly sharded * update test * not -> !=	2024-02-23 07:51:36 -05:00
David Hou	f513c37e64	support same uidx in multiple shape positions (#3205 ) * support same uidx in multiple shape positions * rename var * update comment * add contiguous index check to global_store too * update comment * small change * is this better? * smh * smaller change? * get rid of more changes * get rid of more changes * is this even making anything better * comment * fix test --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-02-21 19:37:03 +01:00
chenyu	1eb24af63b	fix softmax and log_softmax for 0d tensor (#3463 ) matched torch to take axis \in [-1, 0] and used axis=None internally	2024-02-21 11:30:30 -05:00
George Hotz	871ba73e65	_reduce_op is axis based now (#3462 ) * _reduce_op is axis based now * axis_ * update lin failures * disable that * fix shape	2024-02-21 16:36:31 +01:00
chenyu	0d326a48b8	fix LtNode simplification when lhs and rhs contain same variables (#3451 ) * fix LtNode simplification when lhs and rhs contain same variables `(Variable("a", 1, 5) < Variable("a", 1, 5))` should eval to `NumNode(0)` * fix with less perf impact	2024-02-20 09:06:55 -05:00
George Hotz	1b6e890ef2	uops flop counter (#3373 ) * factor out winograd functions * test counter * uops flop counter * more correct * ish * correct * cleanup * tests for uops flop counter * tests still fail * fix symbolic uops flop cnt * fix symbolic uops flop cnt * hmm, it's an alu * uops alu resolve * relax that	2024-02-20 09:36:30 +01:00
Patrick Tsai	9dd64b1f5f	Fix python cast uint/int overflow (#3448 ) * Fix numpy uint/int overflow * lol * Works * Update * Move overflow test to float64/float32 * One line * Update * One more --------- Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>	2024-02-20 09:20:43 +01:00
chenyu	86efdf0b34	remove create_rednode (#3444 ) handle Node collapsing into NumNode similar to OpNode	2024-02-18 21:08:19 -05:00
chenyu	2da734920e	use __getnewargs__ to fix unpickling Variable (#3441 ) it's recommended to use __getnewargs__ to update the args of classes that use __new__ when unpickling. It's preferred because it does not change the __new__ behavior.	2024-02-18 10:28:37 -05:00
zku	2d702ca073	If feasible, do not truncate float64 down to float32 in cstyle renderer (#3420 ) * do not truncate float64 precision * use l suffix to try to avoid overload confusion * long line, ruff bloats the function otherwise * fmt * remove long double suffix (l), it's sufficient to have the float32 (f) suffix to avoid function overload ambiguity; add test showcasing rtol=1e-12 precision increase, the test fails without the renderer changes * use more reasonable test values, same as test_int_to_float_unary_func * disable test for CUDACPU, does not support half and segfaults on some operations per dtypes_alu test * disable test for HIP, renderer does not support f64 precision * do not use noqa E501, break up condition	2024-02-16 10:08:59 +01:00
chenyu	30f26279c5	add back "CPU" in test_onnx_backend supports_device (#3426 ) the onnx tests were all skipped.	2024-02-16 00:49:30 -05:00
xarkes	28a8b72024	Remove Interpreted device & remaining CPU/TORCH ref (#3423 ) * Remove Interpreted device & remaining CPU/TORCH ref * Oops * supports_device was useful * Fix doc wording --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-02-16 00:30:21 -05:00
geohotstan	5eb4c902f6	correct division dtype casting (#3405 ) * Happy New Year * fix: exclude floordiv onnx tests * fix: less weird if statements in div * Good luck in the Year of the Dragon * fix: tempfix onnx div * fix: use reference impl for div	2024-02-15 19:34:40 -05:00
qazal	e1a57fe58a	test the behavior, not the implementation (#3419 )	2024-02-15 17:23:42 +01:00
George Hotz	b1c0d8c99d	remove cpu and torch backends (#3399 ) * remove cpu and torch backends * don't copy to cpu * use clang instead of cpu * multitensor gathers on the first device * clang is cpu + use default * fixup * bugfix	2024-02-15 16:55:39 +01:00
Obada Khalili	18bb6a22e0	make tensors sizes smaller in maxpool2d tests (#3417 )	2024-02-15 15:53:52 +01:00
qazal	7919a1e6ec	dtypes: delete the float cast in realize.py (#3401 ) * remove float cast * cast scalars to the correct value in creation time * cast scalar in the correct place * wrong, use y_dtype * make consts have a unique cache key * add cast_scalar back * test_load_cache_const_bufs * add bool dtype * test_const_dtype * fix linters	2024-02-15 14:20:30 +01:00
George Hotz	93eceef727	remove cpu prereqs (#3410 )	2024-02-15 13:45:06 +01:00
George Hotz	a40df14fef	ops_ext to replace cpu import (#3409 ) * ops_ext to replace cpu import * don't allow zero copy with as buffer * memoryview(bytearray * reenable test * fix jit issue	2024-02-15 13:03:42 +01:00
George Hotz	ede4fd4705	hotfix: test_jit_copyin	2024-02-15 12:37:53 +01:00
George Hotz	6356474d6d	Revert "ops_ext to replace cpu import (#3406 )" (#3408 ) This reverts commit `91eb93f85a`.	2024-02-15 12:16:10 +01:00
George Hotz	91eb93f85a	ops_ext to replace cpu import (#3406 ) * ops_ext to replace cpu import * don't allow zero copy with as buffer * memoryview(bytearray * reenable test	2024-02-15 12:14:58 +01:00
qazal	27f4de2ce4	delete half_prekernel (#3388 ) * generic rendering of half and bf16 hotfix * fix uops + regression test * fix the test for metal's half4 * uop.uop fixup * mypy with --strict-equality, fix ops_gpu	2024-02-14 15:40:48 +01:00
chenyu	078a2603d5	set metal fast math default to 0 (disabled) (#3370 ) * set metal fast math default to 0 (disabled) It's a correctness fix because we use inf and nan. Let's see how slow it is * skip failed onnx tests * tmp DISABLE_COMPILER_CACHE=1 in metal benchmark * Revert "tmp DISABLE_COMPILER_CACHE=1 in metal benchmark" This reverts commit `22267df380`.	2024-02-14 11:42:33 +01:00
Francis Lam	668324d92b	wmma: protect TC locals from modification and use only LOCAL (#3379 ) also remove unnecessary upcast_dim from tensor_core and calculate it from the dimensions and thread sizes	2024-02-13 10:19:35 +01:00
Francis Lam	f1ad01fd91	test_linearizer_failures: add new linearizer compile failure on METAL (#3380 )	2024-02-12 20:28:34 -05:00
George Hotz	2e60012bcf	move create schedule and delete old API (#3377 ) * move create schedule and delete old API * fix test multitensor	2024-02-12 18:10:45 +01:00
George Hotz	41efaa848c	move graph.py and jit.py into features (#3376 ) * move graph.py into features * move jit into features * fix quickstart	2024-02-12 17:34:34 +01:00
George Hotz	0f6cde243d	import from wino_cleanup (#3374 )	2024-02-12 16:26:50 +01:00
Jyotirmaya Mahanta	b6a2600c86	fix merging condition in merge_dims (#3363 ) * fix merging condition in merge_dims * add tests * set contiguous after mask is canonicalized * minor fix	2024-02-12 11:50:26 +01:00
qazal	c8fd66a131	Run RDNA3 tensor core tests in CI (#3367 ) * add test_linearizer * skip test_padto_matmul	2024-02-11 19:54:06 -05:00
chenyu	f798b60338	add METAL_FAST_MATH env var to disable metal fast math (#3369 ) * env var METAL_FAST_MATH to disable fastmath for metal use this to test impact of fast math. might need to disable compiler cache with DISABLE_COMPILER_CACHE * failed onnx test with fast math METAL_FAST_MATH=0 DISABLE_COMPILER_CACHE=1 NOOPT=1 python -m pytest -n=auto test/external/external_test_onnx_backend.py -k test_MaxPool3d_stride_padding_cpu	2024-02-11 04:26:09 -05:00
chenyu	1156a27619	cleanup atol in test_ops (#3368 ) removed the explicit set value if it's the same as default 1e-6, or higher but can be set to default.	2024-02-10 19:44:44 -05:00
Francis Lam	ddb22a60c8	linearizer: fix up edge case bugs in UNROLL opt (#3362 ) Fully UNROLLing the first_reduce should not change the number of local_dims. Fully UNROLLing a GROUP dim should reduce the number of group_for_reduces by one. Also changed group_for_reduces to be a count as the axis number isn't used anywhere (they are always the first reduce dims).	2024-02-10 11:49:25 +01:00
andresgit	28ba1c5406	fix Tensor.randint ignoring kwargs (#3350 ) * fix Tensor.randint ignoring kwargs * randint kwargs fix	2024-02-09 17:12:16 +01:00
Francis Lam	ce21fdfb67	ops_python: add HIP tensor core mock and refactor METAL (#3354 ) * ops_python: add HIP tensor core mock and refactor METAL * Add tests to CI * add DEBUG=2 to full tests --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-02-09 12:46:06 +01:00
chenyu	c151131d1b	update onnx tests that no longer fail on CI (#3353 ) was debugging fast math and turned out it passed on CI now. more like a bug in CI	2024-02-08 21:19:00 -05:00
chenyu	7c1c6efee5	exclude half with PYTHON in test_dtype.is_dtype_supported (#3351 ) half memoryview only in 3.12+. rest of the test_dtype (bounty) seems to be legit issue in ops_python.	2024-02-08 20:10:25 -05:00
George Hotz	c32ea95d7d	Python uop emulator (#3327 ) * start uop emu * tiny_add passes * more ops * emulate the whole warp * test_gemm passes * metal gemm test pass * works on big gemm * works on big gemm * more tests pass * touch ups * fix mypy * cleanups * exp2 mypy * arch is where it belongs * actually emulate tensor cores * fix test * new style	2024-02-08 19:24:55 +01:00
Francis Lam	2266152b28	linearizer: added FUZZ_BEAM to fuzz_linearizer and additional tests (#3340 ) Fixed test_tensor_core_opts to test all the TCs. Added commented out failing tests in test_color_shapes_with_local.	2024-02-08 16:12:58 +01:00
chenyu	b110c4a7b8	explicitly set input low and high in test_ops (#3347 ) easier to set `(low, high)` than figuring out a,b for `(x+a)*b`. this pr kept the same input ranges	2024-02-08 04:11:45 -05:00
chenyu	0d2dacb549	test intermediate tensors created by function have same device as input (#3338 ) run on TORCH since it's the fastest one on CI. caught a bug in multinomial, and update the behavior of fancy index and gather to move the indices Tensor to same device as self.	2024-02-07 09:24:36 -05:00
chenyu	02636ff62d	re-enable test_reduce_0d_default int test case in test_dtype (#3336 )	2024-02-07 05:30:14 -05:00
chenyu	ca66be6a70	add failed Tensor.pow test cases (#3334 ) tried refactoring pow and found some bugs	2024-02-07 04:28:24 -05:00
chenyu	d9ef8e25b3	fix Tensor.var with 0 in reduce dim. (#3324 ) fix when correction is too big. it seems to only work when input size is 0 though. torch can output -inf in var when correction is too big, which does not make sense.	2024-02-05 20:59:13 -05:00
Obada Khalili	ee25f73283	Fix Tensor.mean to compute the mean correctly when 0-length axes are selected (#3318 ) * fix Tensor.mean to compute the mean correctly when 0-length axes are selected * add a regression test * rename sum variable to sum_t to avoid conflict with built-in function * refactor Tensor.mean to have fewer lines	2024-02-05 01:40:37 -05:00
chenyu	97275101e9	fix safetensor load uint32 and uint64 (#3315 ) the correct keys are U32 and U64.	2024-02-04 10:46:27 -05:00
Yoshinori Sano	edb74897b2	support safe load bf16 (#3310 ) * support safe load bf16 * fix lint error E501 * add test for loading safetensors * key should be BOOL * fix lint	2024-02-04 10:08:39 -05:00
chenyu	d459956966	move TestGetContraction to test_helpers (#3313 ) also cleaned long lines in test_shapetracker and enabled the line length check	2024-02-04 06:05:01 -05:00
Obada Khalili	b4ea0e18e3	Fix dot product on buffers with zero strides (#3303 ) * skip mulacc opt if all the src buffers of mul op are const buffers * add noqa directive for long test * unskip MULACC opt * ensure that a_axes at least includes summation axes in order to perform np.einsum correctly * add regression test for mulacc op * compute a_slices using a_axes * refactor helper of function to retrieve axes and slices for nonzero strides as well as summation axes * include a regression test that uses and to test the behaviour indirectly	2024-02-04 05:15:06 -05:00
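
Commit 2da734920e in the list above notes that `__getnewargs__` is the recommended way to supply constructor arguments when unpickling a class that takes them in `__new__`. The following is a minimal sketch of that general Python mechanism, not the tinygrad code itself: the class `Var` and its fields are hypothetical stand-ins, and only the standard-library `pickle` module is used.

```python
import pickle


class Var:
    """Hypothetical stand-in for a class that receives its arguments in __new__."""

    def __new__(cls, name, lo, hi):
        obj = super().__new__(cls)
        obj.name, obj.lo, obj.hi = name, lo, hi
        return obj

    def __getnewargs__(self):
        # pickle (protocol 2 and newer) passes this tuple to __new__
        # when it rebuilds the object, so __new__ itself stays unchanged.
        return (self.name, self.lo, self.hi)


original = Var("a", 1, 5)
restored = pickle.loads(pickle.dumps(original))
```

Here `restored` is a new object rebuilt through `Var.__new__` with the arguments returned by `__getnewargs__`, which is why the approach does not require making the `__new__` parameters optional.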

1399 Commits