tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-26 23:38:58 -05:00

Author	SHA1	Message	Date
chenyu	ed1ed9e4ff	bert use BS=72 (#7015) memory 131 -> 138 green tflops 201 -> 209 red tflops 160 -> 169	2024-10-12 09:41:56 -04:00
George Hotz	cba4b9a058	clean up ops file [pr] (#7013)	2024-10-12 19:53:52 +08:00
qazal	746a1f8c86	prep uoping diff for big graph [pr] (#7014)	2024-10-12 14:09:32 +03:00
ignaciosica	334f499e6a	consistent render of recip in cuda with CStyleLanguage (#6980)	2024-10-12 18:56:47 +08:00
George Hotz	a71bb09ec3	remove symbolic file [pr] (#7012)	2024-10-12 18:44:44 +08:00
George Hotz	16271189ea	hotfix: don't spend lines on a (broken) favicon	2024-10-12 18:21:10 +08:00
George Hotz	b737ee5bac	move to_indexed_uops to uops (#7011) * move to_indexed_uops to uops * UOp.range	2024-10-12 18:20:57 +08:00
George Hotz	5ae2de9845	UOp.variable (#7010) * UOp.variable [pr] * fix tests * clean * improve name rendering * last bug	2024-10-12 18:20:44 +08:00
Bhavya Gada	f79e05cac0	add types in all nn/init.py classes (#7002) * add types in batchnorm class * fix lint error in batchnorm types * add types to conv1d function * add types to convtranspose1d func and conv2d, convtranspose2d classes * add types to all remaining classes * change conv1d padding type to also accept str * less is more; only keep non-obvious types * mkdocs need types	2024-10-12 14:42:14 +08:00
ignaciosica	2bb6b95e9f	refactor _make_hip_code_for_op into pm rules (#7001)	2024-10-12 12:46:22 +08:00
George Hotz	5c9f76e274	hotfix: openpilot compile3 compare to i==1	2024-10-12 09:44:24 +08:00
chenyu	36056e0760	update mlperf systems and copy 4.1 to 5.0 (#7004)	2024-10-11 16:20:34 -04:00
Markiian Novosad	8831c691e2	Add slice parameter type checking to disallow Tensor usage for slices (#6967) * add support for single el tensors for slices * rm trailing spaces * cleanup long lines * remove tensor in slice support, add comprehensive err msg * cleanup getitem, add slice type check * Edit err message	2024-10-11 16:20:21 -04:00
Francis Lam	b0dd407cdd	ops_cuda: add optional dynamic smem parameter (#6956) * ops_cuda: add optional dynamic smem parameter This is required to enable larger than 48kb shared memory usage on a per-kernel basis. * move setting max dynamic smem size to init	2024-10-11 21:51:06 +03:00
chenyu	0e42662f2a	log seed at the right place for bert (#7000)	2024-10-11 10:39:40 -04:00
nimlgen	5496a36536	update red mlperf bert readme (#6969)	2024-10-11 13:08:06 +03:00
nimlgen	feb0bcb58b	qcom bench bind to perf cluster (#6996)	2024-10-11 12:21:52 +03:00
qazal	7451812bbf	delete AST_REWRITE ctx var (#6995)	2024-10-11 11:33:16 +03:00
qazal	7988547df2	start changes from big graph (#6993) * start changes from big graph [pr] * space * still capture ctx	2024-10-11 11:13:46 +03:00
George Hotz	e7a0ffe46a	break out linearization [pr] (#6994)	2024-10-11 15:27:33 +08:00
George Hotz	f319530191	don't track simplify [pr] (#6992)	2024-10-11 15:03:03 +08:00
George Hotz	e441794c4b	remove custom op support, we waste time maintaining this (#6991) * remove custom op support, we waste time maintaining this * customop is over	2024-10-11 14:31:09 +08:00
George Hotz	c08521e823	minor cleanups from toonygrad (#6990)	2024-10-11 14:19:10 +08:00
George Hotz	f50d0e0ee0	cloud device [pr] (#6964) * first try at cloud device [pr] * real separation * we're free * clang works * unhappy with timeout * better timeouts and free * unrelated * use http verbs + add test * lines + better test * fix DELETE * shorter cloud * split key * fix sending renderer * PTXRenderer serialization * add sessions * http.client * minor timeout bump * fix keep-alive * inc server timeout * real fix timeout * that one too	2024-10-11 12:24:06 +08:00
Bhavya Gada	23c09f4b4c	add support for padding='same' in nn.conv (#6975) * add support for padding='same' in nn.conv * express concisely * simplify loop * test same padding with dilation and conv1d * fix bad indentation * make loop one liner	2024-10-11 11:39:07 +08:00
qazal	54dcea235d	viz auto recenter on out of view graph [pr] (#6986)	2024-10-11 02:40:06 +03:00
nimlgen	159ee04489	include qcom in view_supported_devices (#6985) * include qcom in view_supported_devices * ignore images	2024-10-11 01:10:51 +03:00
nimlgen	f9d454aed5	correct kernargs alignment (#6984)	2024-10-11 00:06:28 +03:00
qazal	2b17279d4e	viz don't default open the browser [pr] (#6983) * viz don't default open the browser [pr] * move st * scale down	2024-10-10 22:12:18 +03:00
qazal	4f60252210	reduce scheduler process replay overhead [pr] (#6981)	2024-10-10 20:03:38 +03:00
Friedrich Carl Eichenroth	859d6d0407	Fix mypy examples/beautiful_.py (#6978) fix mypy examples/beautiful_.py backwards * add test * Revert "add test" This reverts commit `4d88845ba3`. --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-10-10 11:34:29 -04:00
qazal	4ef5310039	track viz context even if rewrite errors [pr] (#6976)	2024-10-10 18:33:15 +03:00
chenyu	592e5f1df2	skip test_viz test_no_dedup_different_opts (#6979)	2024-10-10 11:10:24 -04:00
chenyu	e3dc10f8f6	improve fold_unrolled_divs (#6977) addressed #6935 the first few terms in fold_unrolled_divs might have been folded already, so the check should first try to add those terms back. there is a case that every but one term is folded which is not an add chain anymore, so just added as a failed test case for now	2024-10-10 10:52:05 -04:00
qazal	3481468702	bring viz to core (#6970) * move viz to core * pathfix * move test_viz to core * cleanup test_viz diff * use contextvars	2024-10-10 16:56:26 +03:00
nimlgen	fad575ec76	qcom tiny cleanups (#6973)	2024-10-10 12:26:41 +03:00
qazal	3724a66716	move test_viz to test/, prereq for tinygrad/viz [pr] (#6972)	2024-10-10 11:40:46 +03:00
Kinvert	960c495755	added beautiful fashion mnist and example (#6961) * added beautiful fashion mnist and example * fixing whitespace * refactor Fashion MNIST to fewer lines * fix newline to reduce diff * Update beautiful_mnist.py * Update beautiful_mnist.py --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-10-10 12:01:07 +08:00
chenyu	b5546912e2	10% more TRAIN_STEPS for bert (#6971) got two very close run, adding more steps for buffer	2024-10-09 19:21:43 -04:00
nimlgen	f90d8493cc	add HCQDEV_WAIT_TIMEOUT_MS (#6968)	2024-10-09 19:50:00 +03:00
chenyu	35cf48659b	limit beam param for bert on green (#6966) seems to mitigate the crash	2024-10-09 11:48:18 -04:00
mesozoic-egg	0e8bcda07e	get readable error from wait_check (#6965) Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me>	2024-10-09 17:28:58 +03:00
qazal	20d3c2d113	unify UOps.SHAPETRACKER and UOps.SWIZZLE with UOps.VIEW (#6955) * add UOps.VIEW * update hardcoded asts * update sops.gz	2024-10-09 02:00:17 +08:00
nimlgen	137ad5519f	amd fix cwsr for gfx11 (#6950) * amd cwsr * ()	2024-10-08 17:44:29 +03:00
nimlgen	0d526e251e	nv sync on gpu before local update (#6954)	2024-10-08 17:43:58 +03:00
qazal	2800520dd5	even smaller process_replay.py [pr] (#6941) * even smaller process_replay.py [pr] * delete those tests * dedup asts	2024-10-08 20:43:22 +08:00
qazal	851f39653a	rename to BUFFER_VIEW + MetaOps cleanup (#6953)	2024-10-08 20:09:22 +08:00
chenyu	1ff2c98f8a	fix logfile name for bert red (#6952)	2024-10-08 05:37:52 -04:00
czhu	08bfa8632b	embedding shape (#6930)	2024-10-08 14:42:20 +08:00
vladov	20a9683403	Make self.fd Optional. (#6855) * Make self.fd Optional. * Fix io_uring when missing fd. * Compress io_uring fast path code.	2024-10-08 13:25:34 +08:00

6305 Commits