tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-29 16:58:18 -05:00

Author	SHA1	Message	Date
chenyu	d736ae7153	example script to show BasicTransformerBlock speed regression (#7724 )	2024-11-15 15:48:25 -05:00
chenyu	aeb1301bab	enable a few tests that work now (#7721 ) should mark the ones that are expected to work with expectedFailure, and delete and ones that are not expected to work	2024-11-15 14:30:52 -05:00
ignaciosica	fc1e123138	minor cleanup in lazy.py (#7719 )	2024-11-15 13:48:24 -05:00
qazal	ef4f402946	add property to flag contig buffer uop [pr] (#7716 )	2024-11-15 22:27:47 +08:00
qazal	313af6d23c	assert buffer VIEW is void [pr] (#7715 )	2024-11-15 22:02:59 +08:00
ignaciosica	c37d142cf8	Refactor metal tc wmma kernel rendering (#7416 ) * refactor metal tc wmma kernel rendering * hotfix: bug * hotfix: hack to avoid backlash in f-string expression * hotfix * hotfix: rename vars * hotfix: moew new_line * hotfix: cleaner wmma rendering	2024-11-15 21:23:08 +08:00
qazal	bddee26114	Ops.VALID cleanup, move recursive tests [pr] (#7713 )	2024-11-15 20:22:46 +08:00
qazal	703a255301	use the method_cache in test_schedule [pr] (#7712 ) * use the method_cache in test_schedule [pr] * need half	2024-11-15 19:20:47 +08:00
qazal	88f760cc32	test_two_sum doesn't need del (#7711 )	2024-11-15 18:50:08 +08:00
George Hotz	9f98f0c93a	use disassemble method for objdump [pr] (#7708 )	2024-11-15 12:55:37 +08:00
George Hotz	9b1605eef9	Revert "objdump intel syntax (#7605 )" (#7707 ) This reverts commit `8f8e375f27`.	2024-11-15 12:13:04 +08:00
ttomsa	8f8e375f27	objdump intel syntax (#7605 ) * objdump intel syntax * test for objdump intel syntax * add disassemble to ClangCompiler and LLVMCompiler. Use just llvm-objdump * linter	2024-11-15 11:32:23 +08:00
chenyu	9cfc4f68c8	clean up Tensor.cat (#7701 )	2024-11-14 13:46:02 -05:00
chenyu	888fcb3643	Tensor.shrink arg cleanup (#7700 ) removed duplicated logic	2024-11-14 13:01:22 -05:00
chenyu	9fb396f660	test_ops maxpool2d -> max_pool2d (#7696 ) and avgpool2d -> avg_pool2d for better grepping the tests	2024-11-14 10:39:12 -05:00
ignaciosica	1419d8e58a	assert op is not store in view (#7679 ) * assert op is not store in view * update view spec * hotfix: nit --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2024-11-14 22:17:18 +08:00
Ahmed Harmouche	43040c0e24	add render_cast (#7687 )	2024-11-14 18:01:29 +08:00
geohotstan	f8056a74d6	combine pad2d with pad (#7677 ) * I have pad2d, I have pad, uuh~, pad2dpad~ * fix some small things * strategically placed cast hack * fix more * fix more more * tests * periods	2024-11-14 17:56:02 +08:00
qazal	3747669ab4	post 7655 schedule line savings [pr] (#7692 )	2024-11-14 17:20:41 +08:00
qazal	64ebaa72b5	schedule independent of lazy.py (#7655 ) * make it compile * allow allbufs * _recursive_group starts to work * forced_realize works * _get_isolated_children almost works * 80% * 90% * ocd behavior * 100% for _get_isolated_children * FUSE_CONV_BW=1 works * this took long * can be from buffer's arg too * eventually i'll share these * test_prefer_half_buffer * FUSE_ARANGE=1 sorta * start assign and cleanup fix assign * braindump * diff reset * --- day 3 --- * make _recursive_group work * very minimal groups * BASE * _get_isolated_children that actually works * working version of FUSE_CONV_BW=1 and prefer_half * FUSE_ARANGE=1 works * fix assign * one less problem	2024-11-14 17:01:59 +08:00
qazal	0914c2fec9	add TestLinearizerFailures test_failure_56 and test_failure_57 (#7682 ) * add test_failure_56 and test_failure_57 * so it's only METAL=1	2024-11-14 12:00:33 +08:00
qazal	a87813f063	hotfix: early fold image to image cast store (#7681 ) * hotfix: early fold image to image cast store * count out meta ops	2024-11-14 11:35:59 +08:00
chenyu	e0ad083904	user ceildiv in shard and fix a typo (#7690 )	2024-11-13 18:25:06 -05:00
chenyu	51afc3cc88	update env_vars doc on VIZ link (#7689 ) existing one throws 404 because mkdocs does not allow traverse above doc root (i think?). so for now just stick the github link to it	2024-11-13 17:28:14 -05:00
chenyu	333f5f9f8b	Tensor.bitwise_not (#7688 ) implemented with xor in tensor for now to not add another op. also used it in Tensor.min to fix dtype int on -2**31	2024-11-13 16:31:52 -05:00
chenyu	0423db8d00	simpler nll_loss (#7686 )	2024-11-13 15:10:08 -05:00
chenyu	fb933b79a6	add test case for nll_loss with input > 2D (#7685 ) * failed test case for nll_loss with input > 2D * fixed * add more	2024-11-13 14:34:07 -05:00
geohotstan	9c41c376d3	add Tensor.nll_loss (#7683 ) * move nll_loss to new branch * make nll_loss examples practical * self is * add to docs * small	2024-11-13 13:12:13 -05:00
chenyu	3c6fe4b79a	fix Tensor.bitwise_and and Tensor.bitwise_or to support bool (#7684 )	2024-11-13 13:10:39 -05:00
chenyu	3d82f8e340	simpler rand_like (#7680 )	2024-11-13 12:28:41 -05:00
Roelof van Dijk	e75a855f51	refactor: efficient syntax [pr] (#7673 )	2024-11-13 11:08:48 -05:00
Roelof van Dijk	433ebecee7	refactor: double if statement [pr] (#7674 )	2024-11-13 11:06:59 -05:00
James	d4e4a084a1	fix: Tensor min function for unsigned ints (#7675 ) * add failing tests for uint8 `min()` * fix unsigned data type min() * fix test data * fix whitespace --------- Co-authored-by: rezaarezvan <reza@rezvan.xyz> Co-authored-by: Jamesb <experimentallearning0@gmail.com>	2024-11-13 11:04:27 -05:00
chenyu	d1dfd598a2	assert specifying device to rand_like a multi tensor (#7678 ) * assert specifying device to rand_like a multi tensor raise RuntimeError instead of dropping it silently * fix that	2024-11-13 10:24:40 -05:00
chenyu	51432bfbff	add rand_like test case with device specified (#7663 ) in single device or copied multi case, device is applied. but for sharded case the device is silently ignored now. maybe similar to rand we just don't allow tuple device in rand_like	2024-11-13 09:32:55 -05:00
Reza Rezvan	23363dee55	Add: failing tests for uint8 `min()` (#7669 ) * add failing tests for uint8 `min()` * mark as expected failure	2024-11-13 22:12:53 +08:00
qazal	29508504ea	uop style prefer small dtype + cleanups [pr] (#7671 ) * just this * space * typing 2	2024-11-13 21:32:34 +08:00
qazal	e84d089ef1	delete ReduceOps, only use REDUCE_AXIS (#7667 )	2024-11-13 19:04:27 +08:00
qazal	217c006103	buffer access on UOp [pr] (#7665 ) * add .buffer access on uop * rename to buf_uop * start smaller * ptr != buffer!!	2024-11-13 17:04:19 +08:00
qazal	5da149d23c	uop can have base [pr] (#7666 )	2024-11-13 16:53:49 +08:00
qazal	ca99c67d78	refactors from the delete lazy diff [pr] (#7664 ) * dedup parent shapetrackers [pr] * arg -> dtype * move to ops * arg	2024-11-13 16:23:53 +08:00
chenyu	e6cfaaa496	metal benchmark JIT=2 -> JIT=1 (#7661 )	2024-11-12 22:55:27 -05:00
chenyu	4c5f7ddf1f	flux set model path in args (#7660 ) in addition to default downloading through fetch, add an arg to pass model path directly	2024-11-12 22:11:40 -05:00
chenyu	08706c2ea4	more readable rand [pr] (#7659 ) no walrus inside walrus	2024-11-12 19:02:27 -05:00
chenyu	1884f021e3	add conv3x3 to speed_v_theoretical (#7658 ) * add conv3x3 to speed_v_theoretical * show test duration	2024-11-12 16:41:56 -05:00
ignaciosica	54c0abcb2b	cleaner code_for_op order [pr] (#7653 ) * cleaner code_for_op order * mantain unary-bin-tern order * might as well reorder for cuda and amd	2024-11-12 15:13:56 -05:00
chenyu	962dafb467	use randn in speed_v_theoretical instead of rand (#7656 ) * use randn in speed_v_theoretical instead of rand this made green gemv 20% faster... but why? * update threshold	2024-11-12 15:00:32 -05:00
chenyu	397a2e6eb6	no special case for int32 in truncate [pr] (#7657 ) this masked an issue that idx is not data, and should never need truncate	2024-11-12 14:52:14 -05:00
chenyu	6159790ab8	add gemv to speed_v_theoretical (#7654 ) * add gemv to speed_v_theoretical getting ~300GB/s if we just count the memory of inputs and output * better green numbers * flip	2024-11-12 11:19:35 -05:00
qazal	e07d2d0966	skip TestBeamSearch.test_large_ast (#7652 )	2024-11-12 20:52:22 +08:00

10417 Commits