chenyu
55707fd00d
fix: passing sum_acc_dtype="" to Tensor.sum should fail ( #7748 )
2024-11-17 10:58:41 -05:00
chenyu
f18296e23c
simpler Tensor._reduce ( #7747 )
2024-11-17 09:20:00 -05:00
qazal
0cc8de2f15
reverse map buf_uops [pr] ( #7743 )
2024-11-17 21:29:56 +08:00
chenyu
0292ae7508
Tensor.meshgrid cleanup ( #7741 )
2024-11-17 08:26:53 -05:00
qazal
40642cb9ea
to_uop split paths part 2 [pr] ( #7746 )
2024-11-17 21:07:28 +08:00
qazal
99024b922b
to_uop one path for all ops part 1 ( #7745 )
* flat meta ops
* one path for everything
* add tests
* view is always base
* just run
2024-11-17 20:12:44 +08:00
qazal
eeb222f98b
add UOp.new_buffer [pr] ( #7742 )
2024-11-17 16:44:52 +08:00
chenyu
a15a900415
fix Tensor.meshgrid for 1D input and check indexing ( #7740 )
2024-11-16 23:39:30 -05:00
geohotstan
72a41095bc
add Tensor.meshgrid ( #7714 )
* initial implementation and test
* some other places that can use meshgrid
* revert the onnx_ops change
* add to docs
* revert interpolate too
* update
* improve edge case test
* might as well test grad
* add to test can improve docs
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-16 23:06:47 -05:00
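A minimal sketch of what these two meshgrid commits (the add in #7714 and the 1-D fix in #7740) provide; the default "ij" indexing and the output values here are my assumptions from the numpy/torch convention, not from the diffs:

```python
from tinygrad import Tensor

x, y = Tensor([1, 2, 3]).meshgrid(Tensor([4, 5]))
print(x.shape, y.shape)  # (3, 2) (3, 2) with "ij" indexing
print(x.tolist())        # [[1, 1], [2, 2], [3, 3]]
print(y.tolist())        # [[4, 5], [4, 5], [4, 5]]
```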
mesozoic-egg
1a5e896bd4
[pr] Have PTX share code with LLVM ( #7635 )
* integrate into ops_cuda
* remove debugging stuff
* lint fix
* mypy fixes
* swap ptx.py
* edit
* simplify wmma
* wip
* space
* refactor
* sync the ops removal changes
* refactor
* rename variables
---------
Co-authored-by: judy <mesozoic.egg@proton.mail>
2024-11-17 10:53:56 +08:00
chenyu
f2f7384b67
_resolve_dim cleanup ( #7736 )
no duplicated self.ndim+outer
2024-11-16 11:05:39 -05:00
chenyu
e777211a00
Tensor.repeat cleanup ( #7735 )
use flatten instead of a double for-loop comprehension
2024-11-16 10:43:45 -05:00
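The cleanup is internal; Tensor.repeat itself is torch-style tiling (one tile count per dimension). A quick sketch of mine, not from the diff:

```python
from tinygrad import Tensor

t = Tensor([1, 2])
# repeat(3, 2) tiles a (2,) tensor into shape (3, 4)
print(t.repeat(3, 2).tolist())  # [[1, 2, 1, 2], [1, 2, 1, 2], [1, 2, 1, 2]]
```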
chenyu
f1efd84c92
fix repeat_interleave with negative dim ( #7734 )
2024-11-16 10:15:29 -05:00
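A sketch of what the negative-dim fix enables, with values assumed from the torch-style semantics:

```python
from tinygrad import Tensor

t = Tensor([[1, 2], [3, 4]])
# dim=-1 now resolves to the last axis instead of misbehaving
print(t.repeat_interleave(2, dim=-1).tolist())  # [[1, 1, 2, 2], [3, 3, 4, 4]]
```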
chenyu
e3105675fb
cond.where(True, False) is cond ( #7733 )
2024-11-16 09:44:17 -05:00
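The rewrite in #7733 is the identity that selecting True/False by a boolean mask just returns the mask; a minimal illustration:

```python
from tinygrad import Tensor

cond = Tensor([1.0, 0.0, 3.0]) < 2.0
# cond.where(True, False) picks True where cond holds and False elsewhere,
# i.e. it is cond itself, so the simplifier can fold it away
assert (cond.where(True, False) == cond).all().item()
```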
qazal
40ae0e9115
smaller big graph ( #7695 )
* start
* work
* rewrite to PRELOAD
* st is always from base
* fix aesthetics
* work
* more work
* refactor to is_forced_realize
* uh
* green?
* metaop can be image
* dont count realized
* this is the new src
* test_tiny_add passes
* work
2024-11-16 22:04:57 +08:00
qazal
f3f95ab9d9
flatten fusion upats [pr] ( #7732 )
2024-11-16 21:26:19 +08:00
qazal
ec8c5598f6
refactor to generic UPat for sourcing unrealized bufs [pr] ( #7731 )
* base check
* use is_scheduled
* fixup lazy
* update metadata
* match is too slow
2024-11-16 21:01:22 +08:00
ignaciosica
597a239e28
Remove UnaryOps, BinaryOps, TernaryOps, MetaOps [pr] ( #7725 )
* remove unaryops
* remove ternaryops
* remove metaops
* hotfix
* remove binaryops
* hotfix: test_pattern_matcher
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-16 20:56:56 +08:00
chenyu
22da31b223
clean up Tensor.dot ( #7728 )
more docs (similar to numpy), and removed many confusing `-min(n2, 2)` expressions
2024-11-15 18:21:15 -05:00
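Per the numpy-ish convention the new docs describe, dot contracts the last axis of self with the second-to-last axis of the other tensor. A quick shape check (my example):

```python
from tinygrad import Tensor

a, b = Tensor.ones(2, 3, 4), Tensor.ones(4, 5)
print(a.dot(b).shape)  # (2, 3, 5)
```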
chenyu
4338c450ac
fix max_pool2d for int tensor with padding ( #7726 )
padding with inf messed up the output dtype
2024-11-15 16:22:11 -05:00
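The bug: padding the input with inf for the max silently upcast integer tensors. A hedged sketch of the now-expected behavior (kernel_size/padding kwargs assumed from the usual pooling signature):

```python
from tinygrad import Tensor, dtypes

t = Tensor.arange(16, dtype=dtypes.int32).reshape(1, 1, 4, 4)
out = t.max_pool2d(kernel_size=2, padding=1)
print(out.dtype)  # stays dtypes.int32 after the fix; the inf padding no longer leaks into the dtype
```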
chenyu
d736ae7153
example script to show BasicTransformerBlock speed regression ( #7724 )
2024-11-15 15:48:25 -05:00
chenyu
aeb1301bab
enable a few tests that work now ( #7721 )
should mark the ones that are expected to work with expectedFailure, and delete the ones that are not expected to work
2024-11-15 14:30:52 -05:00
ignaciosica
fc1e123138
minor cleanup in lazy.py ( #7719 )
2024-11-15 13:48:24 -05:00
qazal
ef4f402946
add property to flag contig buffer uop [pr] ( #7716 )
2024-11-15 22:27:47 +08:00
qazal
313af6d23c
assert buffer VIEW is void [pr] ( #7715 )
2024-11-15 22:02:59 +08:00
ignaciosica
c37d142cf8
Refactor metal tc wmma kernel rendering ( #7416 )
* refactor metal tc wmma kernel rendering
* hotfix: bug
* hotfix: hack to avoid backslash in f-string expression
* hotfix
* hotfix: rename vars
* hotfix: more new_line
* hotfix: cleaner wmma rendering
2024-11-15 21:23:08 +08:00
qazal
bddee26114
Ops.VALID cleanup, move recursive tests [pr] ( #7713 )
2024-11-15 20:22:46 +08:00
qazal
703a255301
use the method_cache in test_schedule [pr] ( #7712 )
* use the method_cache in test_schedule [pr]
* need half
2024-11-15 19:20:47 +08:00
qazal
88f760cc32
test_two_sum doesn't need del ( #7711 )
2024-11-15 18:50:08 +08:00
George Hotz
9f98f0c93a
use disassemble method for objdump [pr] ( #7708 )
2024-11-15 12:55:37 +08:00
George Hotz
9b1605eef9
Revert "objdump intel syntax ( #7605 )" ( #7707 )
This reverts commit 8f8e375f27.
2024-11-15 12:13:04 +08:00
ttomsa
8f8e375f27
objdump intel syntax ( #7605 )
* objdump intel syntax
* test for objdump intel syntax
* add disassemble to ClangCompiler and LLVMCompiler. Use just llvm-objdump
* linter
2024-11-15 11:32:23 +08:00
chenyu
9cfc4f68c8
clean up Tensor.cat ( #7701 )
2024-11-14 13:46:02 -05:00
chenyu
888fcb3643
Tensor.shrink arg cleanup ( #7700 )
removed duplicated logic
2024-11-14 13:01:22 -05:00
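For context, shrink takes an optional (start, end) pair per dimension; a minimal sketch of mine:

```python
from tinygrad import Tensor

t = Tensor.arange(12).reshape(3, 4)
print(t.shrink(((0, 2), (1, 3))).tolist())  # rows 0-1, cols 1-2: [[1, 2], [5, 6]]
print(t.shrink((None, (1, 3))).tolist())    # None keeps a dim whole: [[1, 2], [5, 6], [9, 10]]
```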
chenyu
9fb396f660
test_ops maxpool2d -> max_pool2d ( #7696 )
and avgpool2d -> avg_pool2d, for easier grepping of the tests
2024-11-14 10:39:12 -05:00
ignaciosica
1419d8e58a
assert op is not store in view ( #7679 )
* assert op is not store in view
* update view spec
* hotfix: nit
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-14 22:17:18 +08:00
Ahmed Harmouche
43040c0e24
add render_cast ( #7687 )
2024-11-14 18:01:29 +08:00
geohotstan
f8056a74d6
combine pad2d with pad ( #7677 )
* I have pad2d, I have pad, uuh~, pad2dpad~
* fix some small things
* strategically placed cast hack
* fix more
* fix more more
* tests
* periods
2024-11-14 17:56:02 +08:00
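After this merge, Tensor.pad is the one entry point; my understanding (hedged, the flat-form interpretation is an assumption from the torch F.pad convention) is that it accepts both the per-dim tuple form and the flat form pad2d used:

```python
from tinygrad import Tensor

t = Tensor.ones(1, 1, 2, 2)
print(t.pad((1, 1, 1, 1)).shape)                      # flat torch style (left, right, top, bottom) -> (1, 1, 4, 4)
print(t.pad(((0, 0), (0, 0), (1, 1), (1, 1))).shape)  # per-dim (before, after) tuples -> (1, 1, 4, 4)
```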
qazal
3747669ab4
post 7655 schedule line savings [pr] ( #7692 )
2024-11-14 17:20:41 +08:00
qazal
64ebaa72b5
schedule independent of lazy.py ( #7655 )
* make it compile
* allow allbufs
* _recursive_group starts to work
* forced_realize works
* _get_isolated_children almost works
* 80%
* 90%
* ocd behavior
* 100% for _get_isolated_children
* FUSE_CONV_BW=1 works
* this took long
* can be from buffer's arg too
* eventually i'll share these
* test_prefer_half_buffer
* FUSE_ARANGE=1 sorta
* start assign and cleanup
fix assign
* braindump
* diff reset
* --- day 3 ---
* make _recursive_group work
* very minimal groups
* BASE
* _get_isolated_children that actually works
* working version of FUSE_CONV_BW=1 and prefer_half
* FUSE_ARANGE=1 works
* fix assign
* one less problem
2024-11-14 17:01:59 +08:00
qazal
0914c2fec9
add TestLinearizerFailures test_failure_56 and test_failure_57 ( #7682 )
* add test_failure_56 and test_failure_57
* so it's only METAL=1
2024-11-14 12:00:33 +08:00
qazal
a87813f063
hotfix: early fold image to image cast store ( #7681 )
* hotfix: early fold image to image cast store
* count out meta ops
2024-11-14 11:35:59 +08:00
chenyu
e0ad083904
use ceildiv in shard and fix a typo ( #7690 )
2024-11-13 18:25:06 -05:00
chenyu
51afc3cc88
update env_vars doc on VIZ link ( #7689 )
existing one throws a 404 because mkdocs does not allow traversing above the doc root (i think?). so for now just use the github link directly
2024-11-13 17:28:14 -05:00
chenyu
333f5f9f8b
Tensor.bitwise_not ( #7688 )
implemented with xor in tensor for now to avoid adding another op. also used it in Tensor.min to fix the int dtype result on -2**31
2024-11-13 16:31:52 -05:00
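The commit note explains the trick: rather than a new op, NOT is built from XOR. A small sketch of the claimed behavior:

```python
from tinygrad import Tensor, dtypes

t = Tensor([0, 1, -2], dtype=dtypes.int32)
# x ^ -1 flips every bit of a two's-complement int, which is exactly bitwise NOT
print(t.bitwise_not().tolist())  # [-1, -2, 1]
print((t ^ -1).tolist())         # same result
```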
chenyu
0423db8d00
simpler nll_loss ( #7686 )
2024-11-13 15:10:08 -05:00
chenyu
fb933b79a6
add test case for nll_loss with input > 2D ( #7685 )
* failed test case for nll_loss with input > 2D
* fixed
* add more
2024-11-13 14:34:07 -05:00
geohotstan
9c41c376d3
add Tensor.nll_loss ( #7683 )
* move nll_loss to new branch
* make nll_loss examples practical
* self *is*
* add to docs
* small
2024-11-13 13:12:13 -05:00
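A usage sketch for the new Tensor.nll_loss, assuming torch-like semantics (log-probabilities in, mean reduction by default); shapes here are mine:

```python
from tinygrad import Tensor

log_probs = Tensor.randn(8, 4).log_softmax(axis=1)  # batch of 8, 4 classes
target = Tensor([0, 1, 2, 3, 0, 1, 2, 3])
print(log_probs.nll_loss(target).item())  # scalar mean loss
```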
chenyu
3c6fe4b79a
fix Tensor.bitwise_and and Tensor.bitwise_or to support bool ( #7684 )
2024-11-13 13:10:39 -05:00
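And a sketch of the companion fix: the and/or variants now accept bool tensors (values are my example, not from the test):

```python
from tinygrad import Tensor

a, b = Tensor([True, True, False]), Tensor([True, False, False])
print(a.bitwise_and(b).tolist())  # [True, False, False]
print(a.bitwise_or(b).tolist())   # [True, True, False]
```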
chenyu
3d82f8e340
simpler rand_like ( #7680 )
2024-11-13 12:28:41 -05:00