Commit Graph

6319 Commits

Louis Novy
2ac5aec66b Fix exponential complexity in _is_padding_okay [pr] (#7008)
* preliminary test

* missed Optional

* don't check for cache during recursion

* match style from st_fixup... may be marginally faster?

* pathological test case: strongly connected DAG

* move to test_schedule as this isn't really a fusion

* oops this shouldn't be edited

* Revert "oops this shouldn't be edited"

This reverts commit 487cb027dc.

* Revert "move to test_schedule as this isn't really a fusion"

This reverts commit 48d8c550ce.

* move to test_schedule as this isn't really a fusion

* ok no more merge error funny business
2024-10-14 02:34:47 +03:00
chenyu
bd8ecf7fd6 remove NumNode (#7035) 2024-10-13 16:42:19 -04:00
chenyu
c4c806a210 generate new kernel dataset (#7034)
* generate new kernel dataset

prereq to remove NumNode
```
extra/optimization/generate_dataset.sh
gzip -k /tmp/sops
mv /tmp/sops.gz extra/datasets/
```

* fix var range in fuzz_linearizer
2024-10-13 16:19:41 -04:00
chenyu
1a27417262 remove arbitrary multiplication case (#7033)
adds the wrongly simplified kernel in test_linearizer_failures
#7019
2024-10-13 15:06:05 -04:00
chenyu
13575f080a remove bitcast backward in function.py (#7031)
bitcast cannot backward
2024-10-13 10:08:27 -04:00
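A minimal pure-Python sketch (using the stdlib `struct` module, not tinygrad code) of why "bitcast cannot backward": a bitcast reinterprets raw bits rather than computing a function of the value, so its output jumps discontinuously and has no usable derivative.

```python
import struct

def bitcast_f32_to_i32(x: float) -> int:
    # Reinterpret the 4 bytes of a float32 as an int32 -- no numeric conversion.
    return struct.unpack("<i", struct.pack("<f", x))[0]

# A tiny nudge in the float flips low-order bits in the integer view:
# the output is not a smooth function of the input, so there is no gradient.
print(bitcast_f32_to_i32(1.0))        # 1065353216 (0x3F800000)
print(bitcast_f32_to_i32(1.0000001))  # 1065353217 (0x3F800001)
```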
Harsh Natuskar
ace834ef7b docs update (#7027) 2024-10-13 19:39:06 +08:00
qazal
13846930cd hotfix: extract_dataset.py (#7029) 2024-10-13 11:18:23 +03:00
nimlgen
942a17109a qcom use QCOMBuffer for all allocated buffers (#7023)
* qcom use QCOMBuffer for all allocated buffers

* checks
2024-10-12 23:44:36 +03:00
chenyu
04d9b46d51 derivative of softmax is independent of max (#7009)
* derivative of softmax is independent of max

* update test
2024-10-12 15:59:23 -04:00
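A quick numeric check (plain Python, not the tinygrad kernel) of the identity behind this commit: `softmax(x - c) == softmax(x)` for any constant `c`, so subtracting the max for numerical stability changes neither the output nor its derivative, and the backward pass need not differentiate through the max.

```python
import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

xs = [1.0, 2.0, 3.0]
shifted = [x - max(xs) for x in xs]  # the usual stability trick
# The constant shift cancels in the ratio exp(x_i - c) / sum_j exp(x_j - c).
assert all(abs(a - b) < 1e-12 for a, b in zip(softmax(xs), softmax(shifted)))
```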
chenyu
cae1c41755 test case of softmax backward kernel count (#7022) 2024-10-12 15:46:32 -04:00
George Hotz
5ce224ceb3 handle arbitrary multiplication case (#7019)
* handle arbitrary multiplication case

* remove count restriction
2024-10-12 23:16:27 +08:00
chenyu
23faeacb23 remove outdated comments (#7018) 2024-10-12 10:51:07 -04:00
George Hotz
85a45164fb remove pyint [pr] (#7016)
* remove pyint

* bump time on tp [pr]

* dont truncate in const fold

* remove dead code

* Revert "dont truncate in const fold"

This reverts commit 29c81db0f7.

* remove define_var
2024-10-12 22:36:24 +08:00
George Hotz
38d45dfba5 hotfix: no rng in test/external/external_benchmark_schedule.py 2024-10-12 22:03:04 +08:00
chenyu
ed1ed9e4ff bert use BS=72 (#7015)
memory 131 -> 138
green tflops 201 -> 209
red tflops 160 -> 169
2024-10-12 09:41:56 -04:00
George Hotz
cba4b9a058 clean up ops file [pr] (#7013) 2024-10-12 19:53:52 +08:00
qazal
746a1f8c86 prep uoping diff for big graph [pr] (#7014) 2024-10-12 14:09:32 +03:00
ignaciosica
334f499e6a consistent render of recip in cuda with CStyleLanguage (#6980) 2024-10-12 18:56:47 +08:00
George Hotz
a71bb09ec3 remove symbolic file [pr] (#7012) 2024-10-12 18:44:44 +08:00
George Hotz
16271189ea hotfix: don't spend lines on a (broken) favicon 2024-10-12 18:21:10 +08:00
George Hotz
b737ee5bac move to_indexed_uops to uops (#7011)
* move to_indexed_uops to uops

* UOp.range
2024-10-12 18:20:57 +08:00
George Hotz
5ae2de9845 UOp.variable (#7010)
* UOp.variable [pr]

* fix tests

* clean

* improve name rendering

* last bug
2024-10-12 18:20:44 +08:00
Bhavya Gada
f79e05cac0 add types in all nn/init.py classes (#7002)
* add types in batchnorm class

* fix lint error in batchnorm types

* add types to conv1d function

* add types to convtranspose1d func and conv2d, convtranspose2d classes

* add types to all remaining classes

* change conv1d padding type to also accept str

* less is more; only keep non-obvious types

* mkdocs need types
2024-10-12 14:42:14 +08:00
ignaciosica
2bb6b95e9f refactor _make_hip_code_for_op into pm rules (#7001) 2024-10-12 12:46:22 +08:00
George Hotz
5c9f76e274 hotfix: openpilot compile3 compare to i==1 2024-10-12 09:44:24 +08:00
chenyu
36056e0760 update mlperf systems and copy 4.1 to 5.0 (#7004) 2024-10-11 16:20:34 -04:00
Markiian Novosad
8831c691e2 Add slice parameter type checking to disallow Tensor usage for slices (#6967)
* add support for single el tensors for slices

* rm trailing spaces

* cleanup long lines

* remove tensor in slice support, add comprehensive err msg

* cleanup getitem, add slice type check

* Edit err message
2024-10-11 16:20:21 -04:00
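A hypothetical sketch of the kind of check this commit adds (names and structure are illustrative, not tinygrad's actual `__getitem__`): reject Tensor-valued slice bounds up front with a readable error instead of failing deep inside the indexing logic.

```python
class Tensor:  # illustrative stand-in, not the real class
    def __getitem__(self, indices):
        if not isinstance(indices, tuple):
            indices = (indices,)
        for i in indices:
            if isinstance(i, slice):
                # Disallow Tensors as slice bounds with a comprehensive error.
                for bound in (i.start, i.stop, i.step):
                    if isinstance(bound, Tensor):
                        raise TypeError("slice bounds must be plain ints, not Tensor")
        return "indexed"  # real indexing logic elided

t = Tensor()
t[0:2]  # plain int bounds: fine
try:
    t[Tensor():2]  # Tensor bound: rejected early with a clear message
except TypeError as e:
    print(e)
```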
Francis Lam
b0dd407cdd ops_cuda: add optional dynamic smem parameter (#6956)
* ops_cuda: add optional dynamic smem parameter

This is required to enable larger-than-48KB shared memory usage on
a per-kernel basis.

* move setting max dynamic smem size to init
2024-10-11 21:51:06 +03:00
chenyu
0e42662f2a log seed at the right place for bert (#7000) 2024-10-11 10:39:40 -04:00
nimlgen
5496a36536 update red mlperf bert readme (#6969) 2024-10-11 13:08:06 +03:00
nimlgen
feb0bcb58b qcom bench bind to perf cluster (#6996) 2024-10-11 12:21:52 +03:00
qazal
7451812bbf delete AST_REWRITE ctx var (#6995) 2024-10-11 11:33:16 +03:00
qazal
7988547df2 start changes from big graph (#6993)
* start changes from big graph [pr]

* space

* still capture ctx
2024-10-11 11:13:46 +03:00
George Hotz
e7a0ffe46a break out linearization [pr] (#6994) 2024-10-11 15:27:33 +08:00
George Hotz
f319530191 don't track simplify [pr] (#6992) 2024-10-11 15:03:03 +08:00
George Hotz
e441794c4b remove custom op support, we waste time maintaining this (#6991)
* remove custom op support, we waste time maintaining this

* customop is over
2024-10-11 14:31:09 +08:00
George Hotz
c08521e823 minor cleanups from toonygrad (#6990) 2024-10-11 14:19:10 +08:00
George Hotz
f50d0e0ee0 cloud device [pr] (#6964)
* first try at cloud device [pr]

* real separation

* we're free

* clang works

* unhappy with timeout

* better timeouts and free

* unrelated

* use http verbs + add test

* lines + better test

* fix DELETE

* shorter cloud

* split key

* fix sending renderer

* PTXRenderer serialization

* add sessions

* http.client

* minor timeout bump

* fix keep-alive

* inc server timeout

* real fix timeout

* that one too
2024-10-11 12:24:06 +08:00
Bhavya Gada
23c09f4b4c add support for padding='same' in nn.conv (#6975)
* add support for padding='same' in nn.conv

* express concisely

* simplify loop

* test same padding with dilation and conv1d

* fix bad indentation

* make loop one liner
2024-10-11 11:39:07 +08:00
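The arithmetic behind `padding='same'` can be sketched in a few lines of plain Python (illustrative, not tinygrad's implementation): for stride 1, each spatial dim needs `dilation * (kernel - 1)` total padding so the output size equals the input size, split as evenly as possible between the two sides.

```python
def same_padding(kernel_sizes, dilations):
    # For a stride-1 conv, out = in + total_pad - dilation*(k-1),
    # so total_pad = dilation*(k-1) keeps the output the same size.
    pads = []
    for k, d in zip(kernel_sizes, dilations):
        total = d * (k - 1)
        pads.append((total // 2, total - total // 2))  # asymmetric when total is odd
    return pads

# A 3x3 kernel with dilation 2 needs 2 pixels of padding on each side.
print(same_padding([3, 3], [2, 2]))  # [(2, 2), (2, 2)]
```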
qazal
54dcea235d viz auto recenter on out of view graph [pr] (#6986) 2024-10-11 02:40:06 +03:00
nimlgen
159ee04489 include qcom in view_supported_devices (#6985)
* include qcom in view_supported_devices

* ignore images
2024-10-11 01:10:51 +03:00
nimlgen
f9d454aed5 correct kernargs alignment (#6984) 2024-10-11 00:06:28 +03:00
qazal
2b17279d4e viz don't default open the browser [pr] (#6983)
* viz don't default open the browser [pr]

* move st

* scale down
2024-10-10 22:12:18 +03:00
qazal
4f60252210 reduce scheduler process replay overhead [pr] (#6981) 2024-10-10 20:03:38 +03:00
Friedrich Carl Eichenroth
859d6d0407 Fix mypy examples/beautiful_*.py (#6978)
* fix mypy examples/beautiful_*.py

* backwards

* add test

* Revert "add test"

This reverts commit 4d88845ba3.

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-10-10 11:34:29 -04:00
qazal
4ef5310039 track viz context even if rewrite errors [pr] (#6976) 2024-10-10 18:33:15 +03:00
chenyu
592e5f1df2 skip test_viz test_no_dedup_different_opts (#6979) 2024-10-10 11:10:24 -04:00
chenyu
e3dc10f8f6 improve fold_unrolled_divs (#6977)
addressed #6935
The first few terms in fold_unrolled_divs might have been folded already, so the check should first try to add those terms back. There is also a case where every term but one is folded, which is no longer an add chain; for now that is just added as a failing test case.
2024-10-10 10:52:05 -04:00
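The identity this simplification targets can be checked numerically (a pure-Python check, not the tinygrad rewrite rule itself): the add chain an n-way unroll produces, `(x+0)//n + (x+1)//n + ... + (x+n-1)//n`, folds back to `x`.

```python
def unrolled_div_chain(x: int, n: int) -> int:
    # The add chain left behind by an n-way unrolled division.
    return sum((x + i) // n for i in range(n))

# By Hermite's identity, the chain sums to x for non-negative integers,
# which is what fold_unrolled_divs exploits to collapse it.
assert all(unrolled_div_chain(x, n) == x for x in range(100) for n in range(1, 8))
```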
qazal
3481468702 bring viz to core (#6970)
* move viz to core

* pathfix

* move test_viz to core

* cleanup test_viz diff

* use contextvars
2024-10-10 16:56:26 +03:00
nimlgen
fad575ec76 qcom tiny cleanups (#6973) 2024-10-10 12:26:41 +03:00