tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-25 06:48:22 -05:00

Author	SHA1	Message	Date
zku	2d702ca073	If feasible, do not truncate float64 down to float32 in cstyle renderer (#3420 ) * do not truncate float64 precision * use l suffix to try to avoid overload confusion * long line, ruff bloats the function otherwise * fmt * remove long double suffix (l), it's sufficient to have the float32 (f) suffix to avoid function overload ambiguity; add test showcasing rtol=1e-12 precision increase, the test fails without the renderer changes * use more reasonable test values, same as test_int_to_float_unary_func * disable test for CUDACPU, does not support half and segfaults on some operations per dtypes_alu test * disable test for HIP, renderer does not support f64 precision * do not use noqa E501, break up condition	2024-02-16 10:08:59 +01:00
chenyu	30f26279c5	add back "CPU" in test_onnx_backend supports_device (#3426 ) the onnx tests were all skipped.	2024-02-16 00:49:30 -05:00
xarkes	28a8b72024	Remove Interpreted device & remaining CPU/TORCH ref (#3423 ) * Remove Interpreted device & remaining CPU/TORCH ref * Oops * supports_device was useful * Fix doc wording --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-02-16 00:30:21 -05:00
chenyu	6efa68f97b	remove use of TORCH in pre-commit (#3424 ) it's silently using DEFAULT after removing TORCH	2024-02-15 19:38:37 -05:00
geohotstan	5eb4c902f6	correct division dtype casting (#3405 ) * Happy New Year * fix: exclude floordiv onnx tests * fix: less weird if statements in div * Good luck in the Year of the Dragon * fix: tempfix onnx div * fix: use reference impl for div	2024-02-15 19:34:40 -05:00
George Hotz	5de660ca0d	disk runner (prereq for interpreted removal) (#3421 ) * disk runner * simpler diskrunner	2024-02-15 18:14:05 +01:00
qazal	e1a57fe58a	test the behavior, not the implementation (#3419 )	2024-02-15 17:23:42 +01:00
George Hotz	b1c0d8c99d	remove cpu and torch backends (#3399 ) * remove cpu and torch backends * don't copy to cpu * use clang instead of cpu * multitensor gathers on the first device * clang is cpu + use default * fixup * bugfix	2024-02-15 16:55:39 +01:00
Obada Khalili	75f7e21a80	Make tests in `test/test_ops.py` pass for Python emulator (#3384 ) * fix OverflowError in UnaryOps.EXP2 * avoid accessing outputs for void uops * skip execution for UOps.IF and UOps.ENDIF * initialize bytearray to the correct size in UOps.DEFINE_LOCAL * validate len of input that has .sz > 1 * remove comment in code * reinitialize loop of already iterated * validate first value in input to be a list for inputs with .sz > 1 * add python ops tests to CI * skip long runtime tests for PYTHON backend * respect dtype.sz arg in UOps.CONST, and remove incorrect validation in UOps.STORE * use math.inf instead of float('inf') * handle 0 args to UnaryOps.LOG2 * handle load op with default of .sz > 1 * initialize the loop correctly using UOps.LOOP arg * remove unnecessary TODO comment * remove newline * select a subset of 22 ops tests to skip in CI when PYTHON=1 * handle gated UOps.LOAD referencing values that have .sz > 1 * Revert "select a subset of 22 ops tests to skip in CI when PYTHON=1" This reverts commit `7674fee81d`. * skip tests in python backend CI command * push fix lost in conflict resolve * Revert "skip long runtime tests for PYTHON backend" This reverts commit `5dd2a0376e`. * clear loop state after last iteration	2024-02-15 16:40:25 +01:00
Obada Khalili	18bb6a22e0	make tensors sizes smaller in maxpool2d tests (#3417 )	2024-02-15 15:53:52 +01:00
Maciej Fijalkowski	736c74b010	Rename .sz to .count on DType (#3413 ) * rename .sz for .count on dtype (and ANETensor for completeness) * revert the changes to extra, as per review * try to make linter happier * remove the change to extra	2024-02-15 15:03:49 +01:00
qazal	7919a1e6ec	dtypes: delete the float cast in realize.py (#3401 ) * remove float cast * cast scalars to the correct value in creation time * cast scalar in the correct place * wrong, use y_dtype * make consts have a unique cache key * add cast_scalar back * test_load_cache_const_bufs * add bool dtype * test_const_dtype * fix linters	2024-02-15 14:20:30 +01:00
nimlgen	002bf380b0	hsa runtime (#3382 ) * hsa init * handles transfer * linter * clean up hwqueue * fix sync freezes * print errors	2024-02-15 14:14:34 +01:00
George Hotz	93eceef727	remove cpu prereqs (#3410 )	2024-02-15 13:45:06 +01:00
George Hotz	a40df14fef	ops_ext to replace cpu import (#3409 ) * ops_ext to replace cpu import * don't allow zero copy with as buffer * memoryview(bytearray * reenable test * fix jit issue	2024-02-15 13:03:42 +01:00
George Hotz	ede4fd4705	hotfix: test_jit_copyin	2024-02-15 12:37:53 +01:00
George Hotz	6356474d6d	Revert "ops_ext to replace cpu import (#3406 )" (#3408 ) This reverts commit `91eb93f85a`.	2024-02-15 12:16:10 +01:00
George Hotz	91eb93f85a	ops_ext to replace cpu import (#3406 ) * ops_ext to replace cpu import * don't allow zero copy with as buffer * memoryview(bytearray * reenable test	2024-02-15 12:14:58 +01:00
qazal	49cb1fee54	run test_indexing on remu (#3404 ) * emulated ops_hip infra * add int4 * include test_indexing in remu * Revert "Merge branch 'remu-dev-mac'" This reverts commit `6870457e57`, reversing changes made to `3c4c8c9e16`.	2024-02-15 11:52:40 +01:00
qazal	9d4d63fcfc	dynamic tc function render (#3387 ) hip can't be done right now	2024-02-15 11:19:46 +01:00
chenyu	3c4c8c9e16	bump db version to 11 (#3398 ) followup after disabled fast math on metal.	2024-02-14 10:13:18 -05:00
qazal	27f4de2ce4	delete half_prekernel (#3388 ) * generic rendering of half and bf16 hotfix * fix uops + regression test * fix the test for metal's half4 * uop.uop fixup * mypy with --strict-equality, fix ops_gpu	2024-02-14 15:40:48 +01:00
chenyu	078a2603d5	set metal fast math default to 0 (disabled) (#3370 ) * set metal fast math default to 0 (disabled) It's a correctness fix because we use inf and nan. Let's see how slow it is * skip failed onnx tests * tmp DISABLE_COMPILER_CACHE=1 in metal benchmark * Revert "tmp DISABLE_COMPILER_CACHE=1 in metal benchmark" This reverts commit `22267df380`.	2024-02-14 11:42:33 +01:00
Francis Lam	668324d92b	wmma: protect TC locals from modification and use only LOCAL (#3379 ) also remove unnecessary upcast_dim from tensor_core and calculate it from the dimensions and thread sizes	2024-02-13 10:19:35 +01:00
Francis Lam	f1ad01fd91	test_linearizer_failures: add new linearizer compile failure on METAL (#3380 )	2024-02-12 20:28:34 -05:00
George Hotz	ce1f9f5556	hotfix: new linearizer docs	2024-02-12 18:56:30 +01:00
George Hotz	2e60012bcf	move create schedule and delete old API (#3377 ) * move create schedule and delete old API * fix test multitensor	2024-02-12 18:10:45 +01:00
George Hotz	41efaa848c	move graph.py and jit.py into features (#3376 ) * move graph.py into features * move jit into features * fix quickstart	2024-02-12 17:34:34 +01:00
George Hotz	0f6cde243d	import from wino_cleanup (#3374 )	2024-02-12 16:26:50 +01:00
George Hotz	f47e297d4e	refactor: END -> ENDLOOP	2024-02-12 15:46:18 +01:00
George Hotz	29d68ae637	uops endif (#3372 ) * use is instead of == * add endif	2024-02-12 15:43:37 +01:00
George Hotz	1d45f3899d	use is instead of == (#3371 )	2024-02-12 15:35:55 +01:00
David Hou	323393b650	verbose apply_matrix (#3333 ) * verbose apply_matrix * types * not so verbose * small comment change * fix typo --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-02-12 12:06:12 +01:00
Jyotirmaya Mahanta	d55f99e881	patch merge_views (#3311 )	2024-02-12 11:53:55 +01:00
Jyotirmaya Mahanta	b6a2600c86	fix merging condition in merge_dims (#3363 ) * fix merging condition in merge_dims * add tests * set contiguous after mask is canonicalized * minor fix	2024-02-12 11:50:26 +01:00
qazal	c8fd66a131	Run RDNA3 tensor core tests in CI (#3367 ) * add test_linearizer * skip test_padto_matmul	2024-02-11 19:54:06 -05:00
chenyu	f798b60338	add METAL_FAST_MATH env var to disable metal fast math (#3369 ) * env var METAL_FAST_MATH to disable fastmath for metal use this to test impact of fast math. might need to disable compiler cache with DISABLE_COMPILER_CACHE * failed onnx test with fast math METAL_FAST_MATH=0 DISABLE_COMPILER_CACHE=1 NOOPT=1 python -m pytest -n=auto test/external/external_test_onnx_backend.py -k test_MaxPool3d_stride_padding_cpu	2024-02-11 04:26:09 -05:00
chenyu	1156a27619	cleanup atol in test_ops (#3368 ) removed the explicit set value if it's the same as default 1e-6, or higher but can be set to default.	2024-02-10 19:44:44 -05:00
Yoshinori Sano	98c732cf9d	fix metal compile error in extra/gemm (#3365 )	2024-02-10 12:54:41 +01:00
George Hotz	d1fb1e0ba4	full sync to fix HIP memory leak (#3364 )	2024-02-10 11:50:27 +01:00
Francis Lam	ddb22a60c8	linearizer: fix up edge case bugs in UNROLL opt (#3362 ) Fully UNROLLing the first_reduce should not change the number of local_dims. Fully UNROLLing a GROUP dim should reduce the number of group_for_reduces by one. Also changed group_for_reduces to be a count as the axis number isn't used anywhere (they are always the first reduce dims).	2024-02-10 11:49:25 +01:00
George Hotz	dc82ef6660	hotfix: swap HIP/CUDA bringup order to prevent delay on tinybox	2024-02-09 18:41:25 +01:00
andresgit	28ba1c5406	fix Tensor.randint ignoring kwargs (#3350 ) * fix Tensor.randint ignoring kwargs * randint kwargs fix	2024-02-09 17:12:16 +01:00
Francis Lam	ce21fdfb67	ops_python: add HIP tensor core mock and refactor METAL (#3354 ) * ops_python: add HIP tensor core mock and refactor METAL * Add tests to CI * add DEBUG=2 to full tests --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-02-09 12:46:06 +01:00
George Hotz	b385234961	oops, change to 3.12 (#3357 )	2024-02-09 12:21:06 +01:00
George Hotz	7726eef464	ops_python: add image support (#3356 ) * ops_python: add image support * uops tests in their own CI * fix ci	2024-02-09 12:02:06 +01:00
George Hotz	5f93061f67	ops_python: gated load support (#3355 ) * start uop emu * tiny_add passes * more ops * emulate the whole warp * test_gemm passes * metal gemm test pass * works on big gemm * works on big gemm * more tests pass * touch ups * fix mypy * cleanups * exp2 mypy * arch is where it belongs * actually emulate tensor cores * fix test * new style * add gated load support to PYTHON * out of bounds error message * cleaner	2024-02-09 11:16:25 +01:00
chenyu	c151131d1b	update onnx tests that no longer fail on CI (#3353 ) was debugging fast math and turned out it passed on CI now. more like a bug in CI	2024-02-08 21:19:00 -05:00
chenyu	7c1c6efee5	exclude half with PYTHON in test_dtype.is_dtype_supported (#3351 ) half memoryview only in 3.12+. rest of the test_dtype (bounty) seems to be legit issue in ops_python.	2024-02-08 20:10:25 -05:00
George Hotz	c32ea95d7d	Python uop emulator (#3327 ) * start uop emu * tiny_add passes * more ops * emulate the whole warp * test_gemm passes * metal gemm test pass * works on big gemm * works on big gemm * more tests pass * touch ups * fix mypy * cleanups * exp2 mypy * arch is where it belongs * actually emulate tensor cores * fix test * new style	2024-02-08 19:24:55 +01:00

3592 Commits