tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-02-06 12:44:58 -05:00

Author	SHA1	Message	Date
Obada Khalili	18bb6a22e0	make tensors sizes smaller in maxpool2d tests (#3417 )	2024-02-15 15:53:52 +01:00
Maciej Fijalkowski	736c74b010	Rename .sz to .count on DType (#3413 ) * rename .sz for .count on dtype (and ANETensor for completeness) * revert the changes to extra, as per review * try to make linter happier * remove the change to extra	2024-02-15 15:03:49 +01:00
qazal	7919a1e6ec	dtypes: delete the float cast in realize.py (#3401 ) * remove float cast * cast scalars to the correct value in creation time * cast scalar in the correct place * wrong, use y_dtype * make consts have a unique cache key * add cast_scalar back * test_load_cache_const_bufs * add bool dtype * test_const_dtype * fix linters	2024-02-15 14:20:30 +01:00
nimlgen	002bf380b0	hsa runtime (#3382 ) * hsa init * handles transfer * linter * clean up hwqueue * fix sync freezes * print errors	2024-02-15 14:14:34 +01:00
George Hotz	93eceef727	remove cpu prereqs (#3410 )	2024-02-15 13:45:06 +01:00
George Hotz	a40df14fef	ops_ext to replace cpu import (#3409 ) * ops_ext to replace cpu import * don't allow zero copy with as buffer * memoryview(bytearray * reenable test * fix jit issue	2024-02-15 13:03:42 +01:00
George Hotz	ede4fd4705	hotfix: test_jit_copyin	2024-02-15 12:37:53 +01:00
George Hotz	6356474d6d	Revert "ops_ext to replace cpu import (#3406 )" (#3408 ) This reverts commit `91eb93f85a`.	2024-02-15 12:16:10 +01:00
George Hotz	91eb93f85a	ops_ext to replace cpu import (#3406 ) * ops_ext to replace cpu import * don't allow zero copy with as buffer * memoryview(bytearray * reenable test	2024-02-15 12:14:58 +01:00
qazal	49cb1fee54	run test_indexing on remu (#3404 ) * emulated ops_hip infra * add int4 * include test_indexing in remu * Revert "Merge branch 'remu-dev-mac'" This reverts commit `6870457e57`, reversing changes made to `3c4c8c9e16`.	2024-02-15 11:52:40 +01:00
qazal	9d4d63fcfc	dynamic tc function render (#3387 ) hip cant be done right now	2024-02-15 11:19:46 +01:00
chenyu	3c4c8c9e16	bump db version to 11 (#3398 ) followup after disabled fast math on metal.	2024-02-14 10:13:18 -05:00
qazal	27f4de2ce4	delete half_prekernel (#3388 ) * generic rendering of half and bf16 hotfix * fix uops + regression test * fix the test for metal's half4 * uop.uop fixup * mypy with --strict-equality, fix ops_gpu	2024-02-14 15:40:48 +01:00
chenyu	078a2603d5	set metal fast math default to 0 (disabled) (#3370 ) * set metal fast math default to 0 (disabled) It's a correctness fix because we use inf and nan. Let's see how slow it is * skip failed onnx tests * tmp DISABLE_COMPILER_CACHE=1 in metal benchmark * Revert "tmp DISABLE_COMPILER_CACHE=1 in metal benchmark" This reverts commit `22267df380`.	2024-02-14 11:42:33 +01:00
Francis Lam	668324d92b	wmma: protect TC locals from modification and use only LOCAL (#3379 ) also remove unnecessary upcast_dim from tensor_core and calculate it from the dimensions and thread sizes	2024-02-13 10:19:35 +01:00
Francis Lam	f1ad01fd91	test_linearizer_failures: add new linearizer compile failure on METAL (#3380 )	2024-02-12 20:28:34 -05:00
George Hotz	ce1f9f5556	hotfix: new linearizer docs	2024-02-12 18:56:30 +01:00
George Hotz	2e60012bcf	move create schedule and delete old API (#3377 ) * move create schedule and delete old API * fix test multitensor	2024-02-12 18:10:45 +01:00
George Hotz	41efaa848c	move graph.py and jit.py into features (#3376 ) * move graph.py into features * move jit into features * fix quickstart	2024-02-12 17:34:34 +01:00
George Hotz	0f6cde243d	import from wino_cleanup (#3374 )	2024-02-12 16:26:50 +01:00
George Hotz	f47e297d4e	refactor: END -> ENDLOOP	2024-02-12 15:46:18 +01:00
George Hotz	29d68ae637	uops endif (#3372 ) * use is instead of == * add endif	2024-02-12 15:43:37 +01:00
George Hotz	1d45f3899d	use is instead of == (#3371 )	2024-02-12 15:35:55 +01:00
David Hou	323393b650	verbose apply_matrix (#3333 ) * verbose apply_matrix * types * not so verbose * small comment change * fix typo --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-02-12 12:06:12 +01:00
Jyotirmaya Mahanta	d55f99e881	patch merge_views (#3311 )	2024-02-12 11:53:55 +01:00
Jyotirmaya Mahanta	b6a2600c86	fix merging condition in merge_dims (#3363 ) * fix merging condition in merge_dims * add tests * set contiguous after mask is canonicalized * minor fix	2024-02-12 11:50:26 +01:00
qazal	c8fd66a131	Run RDNA3 tensor core tests in CI (#3367 ) * add test_linearizer * skip test_padto_matmul	2024-02-11 19:54:06 -05:00
chenyu	f798b60338	add METAL_FAST_MATH env var to disable metal fast math (#3369 ) * env var METAL_FAST_MATH to disable fastmath for metal use this to test impact of fast math. might need to disable compiler cache with DISABLE_COMPILER_CACHE * failed onnx test with fast math METAL_FAST_MATH=0 DISABLE_COMPILER_CACHE=1 NOOPT=1 python -m pytest -n=auto test/external/external_test_onnx_backend.py -k test_MaxPool3d_stride_padding_cpu	2024-02-11 04:26:09 -05:00
chenyu	1156a27619	cleanup atol in test_ops (#3368 ) removed the explicit set value if it's the same as default 1e-6, or higher but can be set to default.	2024-02-10 19:44:44 -05:00
Yoshinori Sano	98c732cf9d	fix metal compile error in extra/gemm (#3365 )	2024-02-10 12:54:41 +01:00
George Hotz	d1fb1e0ba4	full sync to fix HIP memory leak (#3364 )	2024-02-10 11:50:27 +01:00
Francis Lam	ddb22a60c8	linearizer: fix up edge case bugs in UNROLL opt (#3362 ) Fully UNROLLing the first_reduce should not change the number of local_dims. Fully UNROLLing a GROUP dim should reduce the number of group_for_reduces by one. Also changed group_for_reduces to be a count as the axis number isn't used anywhere (they are always the first reduce dims).	2024-02-10 11:49:25 +01:00
George Hotz	dc82ef6660	hotfix: swap HIP/CUDA bringup order to prevent delay on tinybox	2024-02-09 18:41:25 +01:00
andresgit	28ba1c5406	fix Tensor.randint ignoring kwargs (#3350 ) * fix Tensor.randint ignoring kwargs * randint kwargs fix	2024-02-09 17:12:16 +01:00
Francis Lam	ce21fdfb67	ops_python: add HIP tensor core mock and refactor METAL (#3354 ) * ops_python: add HIP tensor core mock and refactor METAL * Add tests to CI * add DEBUG=2 to full tests --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-02-09 12:46:06 +01:00
George Hotz	b385234961	oops, change to 3.12 (#3357 )	2024-02-09 12:21:06 +01:00
George Hotz	7726eef464	ops_python: add image support (#3356 ) * ops_python: add image support * uops tests in their own CI * fix ci	2024-02-09 12:02:06 +01:00
George Hotz	5f93061f67	ops_python: gated load support (#3355 ) * start uop emu * tiny_add passes * more ops * emulate the whole warp * test_gemm passes * metal gemm test pass * works on big gemm * works on big gemm * more tests pass * touch ups * fix mypy * cleanups * exp2 mypy * arch is where it belongs * actually emulate tensor cores * fix test * new style * add gated load support to PYTHON * out of bounds error message * cleaner	2024-02-09 11:16:25 +01:00
chenyu	c151131d1b	update onnx tests that no longer fail on CI (#3353 ) was debugging fast math and turned out it passed on CI now. more like a bug in CI	2024-02-08 21:19:00 -05:00
chenyu	7c1c6efee5	exclude half with PYTHON in test_dtype.is_dtype_supported (#3351 ) half memoryview only in 3.12+. rest of the test_dtype (bounty) seems to be legit issue in ops_python.	2024-02-08 20:10:25 -05:00
George Hotz	c32ea95d7d	Python uop emulator (#3327 ) * start uop emu * tiny_add passes * more ops * emulate the whole warp * test_gemm passes * metal gemm test pass * works on big gemm * works on big gemm * more tests pass * touch ups * fix mypy * cleanups * exp2 mypy * arch is where it belongs * actually emulate tensor cores * fix test * new style	2024-02-08 19:24:55 +01:00
Mason Mahaffey	3ebf7a3e38	reflect changes to shapetracker in doc printouts (#3349 )	2024-02-08 16:20:30 +01:00
Francis Lam	2266152b28	linearizer: added FUZZ_BEAM to fuzz_linearizer and additional tests (#3340 ) Fixed test_tensor_core_opts to test all the TCs. Added commented out failing tests in test_color_shapes_with_local.	2024-02-08 16:12:58 +01:00
chenyu	b110c4a7b8	explicitly set input low and high in test_ops (#3347 ) easier to set `(low, high)` than figuring out a,b for `(x+a)*b`. this pr kept the same input ranges	2024-02-08 04:11:45 -05:00
chenyu	d8ad9e5660	verify eval acc for hlb_cifar training (#3344 ) set to 93% to reduce flakiness for now	2024-02-07 19:19:59 -05:00
chenyu	0d2dacb549	test intermediate tensors created by function have same device as input (#3338 ) run on TORCH since it's the fastest one on CI. caught a bug in multinomial, and update the behavior of fancy index and gather to move the indices Tensor to same device as self.	2024-02-07 09:24:36 -05:00
chenyu	1732f1ba83	fix import and long lines in view (#3337 )	2024-02-07 06:50:21 -05:00
chenyu	02636ff62d	re-enable test_reduce_0d_default int test case in test_dtype (#3336 )	2024-02-07 05:30:14 -05:00
chenyu	ca66be6a70	add failed Tensor.pow test cases (#3334 ) tried refactoring pow and found some bugs	2024-02-07 04:28:24 -05:00
chenyu	ea74856d99	remove some noqa: E501 in tensor (#3332 ) left ones in conv2d and wino, no E501 elsewhere in tensor. three functions need general readability improvement: getitem and gather, conv2d and wino, and pow	2024-02-07 00:03:05 -05:00

10633 Commits