Commit Graph

4136 Commits

Author SHA1 Message Date
George Hotz
cda0010020 hotfix: docs-legacy 2024-04-16 11:06:56 +04:00
George Hotz
8f749ae0eb New docs are in mkdocs (#4178)
* start mkdocs

* simple docs for tensor

* more docs

* move those back

* more docs

* copy markdown extensions

* docs legacy

* docs building workflow

* fix showcase links

* only that?

* install tinygrad

* add docs to setup.py

* Delete examples/llm.c/data
2024-04-16 10:59:51 +04:00
chenyu
aa093efa43 fix handcode_resnet50_opt flops count (#4184) 2024-04-15 22:13:45 -04:00
chenyu
d5b67c1ca3 log resnet TRAIN_BEAM / EVAL_BEAM (#4181)
also run eval in benchmark mode if either one is positive
2024-04-15 19:29:08 -04:00
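The flag behavior described above (run eval in benchmark mode if either beam flag is positive) can be sketched with a hypothetical env-var helper; `getenv_int` here is an illustrative stand-in, not tinygrad's actual `getenv`:

```python
import os

def getenv_int(name: str, default: int = 0) -> int:
  # hypothetical stand-in for an integer environment-flag helper
  return int(os.environ.get(name, default))

TRAIN_BEAM = getenv_int("TRAIN_BEAM")
EVAL_BEAM = getenv_int("EVAL_BEAM")
# per the commit message: eval runs in benchmark mode if either flag is positive
BENCHMARK_EVAL = TRAIN_BEAM > 0 or EVAL_BEAM > 0
```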
Francis Lam
9d2273235c search: BEAM_UOPS_MAX to prune candidates with too many uops (#4088)
* search: add better default settings for fast search

not the highest possible performance, but adequate for most usage

* search: revert BEAM_MIN_PROGRESS and BEAM_UPCAST_MAX default changes

also sneak in a link to .gitignore for the unet3d dataset

* revert BEAM_MAX_TASKS_PER_CHILD change and fix uops max condition
2024-04-15 18:56:22 -04:00
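The pruning idea in the commit above — drop beam-search candidates whose uop count exceeds a cap before timing them — can be sketched as follows; `Candidate`, `uop_count`, and `prune` are illustrative names, not tinygrad's search API:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
  # hypothetical kernel candidate produced during beam search
  name: str
  uop_count: int

BEAM_UOPS_MAX = 3000  # cap on uops per candidate (value is illustrative)

def prune(cands, limit=BEAM_UOPS_MAX):
  # keep only candidates small enough to be worth compiling and timing
  return [c for c in cands if c.uop_count <= limit]

survivors = prune([Candidate("a", 120), Candidate("b", 5000), Candidate("c", 2999)])
```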
qazal
286ea697f3 keep order in realizes (#4180) 2024-04-16 01:25:50 +04:00
George Hotz
e14a9bca0c hotfix: bump line count to 7500 for NV backend 2024-04-15 23:18:46 +04:00
chenyu
6a2168e698 TRAIN_BEAM and EVAL_BEAM for resnet (#4177)
working on measuring compile time
2024-04-15 14:57:21 -04:00
Timmy
4592fc8fe7 Multireduce Kernels - prereq refactor (#4173)
* refactor rendering a reduceop into its own function (will help for kernels with multiple reduceops)

* linters

* addressing concerns
2024-04-14 20:16:54 -04:00
David Hou
593c90d7d6 Resnet fp16 training with fp32 master weight copy (#4144)
* add casts to layers

* FLOAT flag

* detach

* no_grad for eval

* whitespace

* explicit fp32 initialization

* oops

* whitespace

* put back config['DEFAULT_FLOAT']

* bad

* live dangerously (don't hide bugs)

* don't bundle changes

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-14 11:25:08 -04:00
chenyu
e20d6f9221 correct resnet estimate time (#4169)
7.99 hours was rendered as 7h0m.
2024-04-14 02:21:46 -04:00
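The rendering bug above (7.99 hours shown as 7h0m) is the classic mistake of truncating hours and minutes independently; a safe formatter converts to whole minutes first and splits with `divmod`. This sketch is illustrative, not the repo's actual code:

```python
def fmt_eta(hours: float) -> str:
  # convert to whole minutes first, then split, so 7.99h -> 7h59m, not 7h0m
  total_min = round(hours * 60)
  h, m = divmod(total_min, 60)
  return f"{h}h{m}m"

print(fmt_eta(7.99))  # 7h59m
```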
George Hotz
ea18d28253 some overview docs 2024-04-13 17:01:09 -07:00
George Hotz
50e780a588 multitensor shouldn't recompile (#4164)
* multitensor shouldn't recompile

* type annotations

* fix tests

* outcount in reduce
2024-04-13 00:03:48 -07:00
George Hotz
599eb266b1 optionally use a copy kernel instead of SDMA (#4116)
* optionally use a copy kernel

* lazyops in copied kernels

* add sync

* no sdma at all

* work

* copy_ast
2024-04-12 23:10:41 -07:00
George Hotz
ba7314c26b cleanup lbs (#4163) 2024-04-12 22:32:16 -07:00
chenyu
a7c6864260 remove CAST_BEFORE_VIEW (#4152)
* remove CAST_BEFORE_VIEW

testing perf; also, this might have an issue with assign?

* remove all
2024-04-13 01:05:08 -04:00
George Hotz
ebc94c9d6c rewrite the jit in the context of new schedule (#4162)
* rewrite the jit in the context of new schedule

* mypy better

* fix placeholder

* tests

* all functionality should work

* fix tests

* no CacheCollector
2024-04-12 21:54:36 -07:00
George Hotz
b67f759780 abstractions3 is currently wishful thinking (#4124)
* abstractions3 is currently wishful thinking

* a3

* work

* minor

* progress on a3

* more

* update abstractions3

* cleaner
2024-04-12 16:46:01 -07:00
MaximilianEmel
27a98aaecc Rewritten SVG Logos (#4150)
* rewrote the svg logos to use polygons and render better

* changed self-closing tags' style to better conform to the original
2024-04-12 14:09:57 -07:00
chenyu
63eb0a68af fix return dtype of gather (#4159) 2024-04-12 16:25:12 -04:00
chenyu
d9c5a2b1bb fix return dtype of getitem Tensor indexing (#4158)
the use of sum can auto-upcast the result; fixed by using the data dtype as the acc_dtype
2024-04-12 15:55:02 -04:00
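The upcast problem described above can be illustrated with NumPy as a stand-in (this is not tinygrad's indexing code): a sum-based gather silently promotes small integer dtypes unless the accumulator dtype is pinned to the data's dtype.

```python
import numpy as np

data = np.array([10, 20, 30], dtype=np.uint8)
onehot = np.array([[1, 0, 0], [0, 0, 1]], dtype=np.uint8)

# default sum upcasts small integer dtypes to the platform integer
upcast = (onehot * data).sum(axis=1)
# pinning the accumulator dtype to the data dtype keeps the result uint8
kept = (onehot * data).sum(axis=1, dtype=np.uint8)
```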
chenyu
f6c8032e5d assert if expr_idxs return might be outside of int32 (#4157) 2024-04-12 14:18:35 -04:00
nimlgen
24a27a01a9 hotfix: CUDA_P2P works (#4155) 2024-04-12 18:20:12 +03:00
nimlgen
5a57b48134 cuda p2p enable when available (#4153) 2024-04-12 16:21:54 +03:00
chenyu
380f27d629 move sum acc_dtype into lazy so it applies to backward (#4149)
* move sum acc_dtype into lazy so it applies to backward

* unit test
2024-04-11 14:43:56 -04:00
George Hotz
bbda20c0db CompiledASTRunner -> CompiledRunner (#4148) 2024-04-11 08:49:52 -07:00
George Hotz
0f16709c00 hotfix: remove test speed vs torch 2024-04-11 08:37:57 -07:00
qazal
c0796374e4 refactor membufs (#4147) 2024-04-11 08:30:44 -07:00
George Hotz
b7e281cf10 JitItem -> ExecItem (#4146)
* JitItem -> ExecItem

* execitem in realize

* cleaner

* JITRunner -> Runner
2024-04-11 08:24:57 -07:00
George Hotz
e79a11b99c hotfix: revert llama change 2024-04-10 20:13:15 -07:00
George Hotz
2e6c39b0b2 Do less realizes (#4141)
* less realize

* corealize jit inputs

* prints

* print before we run
2024-04-10 19:50:50 -07:00
chenyu
06bcae13b4 PADTO SUM if parents of sum are all zero-preserving (#4140)
* PADTO SUM if parents of sum are all zero-preserving

* test case unsafe ops after sum is fine

* reuse UNSAFE_PAD_OPS

* update db version
2024-04-10 22:16:12 -04:00
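The zero-preserving condition above can be shown with a NumPy stand-in (not tinygrad's scheduler code): padding with zeros before a sum is safe when every op between the input and the sum maps 0 to 0, and unsafe otherwise.

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])
padded = np.pad(x, (0, 5))  # pad with zeros, e.g. up to a friendlier size

# relu is zero-preserving (relu(0) == 0), so padding before the sum is safe
assert np.maximum(padded, 0).sum() == np.maximum(x, 0).sum()

# exp is not (exp(0) == 1), so the same padding changes the result
assert np.exp(padded).sum() != np.exp(x).sum()
```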
George Hotz
081dd1573f hotfix: keep CUDA D2D copy behind the CUDA_P2P flag 2024-04-10 21:36:48 +00:00
George Hotz
af5984df43 cudagraph memcpy through host (#4137) 2024-04-10 13:17:17 -07:00
terafo
5e6d2155e4 Add driving monitoring model to benchmarks (#4134)
* add driving monitoring model to benchmarks

* handle crash
2024-04-10 14:27:03 -04:00
chenyu
bf3583f9b2 use Buffer.ensure_allocated in search _ensure_buffer_alloc (#4132) 2024-04-10 13:11:50 -04:00
George Hotz
a35375df85 run_schedule is so simple now (#4130) 2024-04-10 09:49:30 -07:00
George Hotz
86bd2eb500 hotfix: update copy_from_fd for new DiskBuffer 2024-04-10 15:41:06 +00:00
George Hotz
ee457a4b20 no more underlying diskbuffer, that's just the device (#4129) 2024-04-10 08:32:25 -07:00
geohotstan
fe88591890 update onnx to 1.16.0 (#4127)
* update

* pass tests and skip tests
2024-04-10 11:19:13 -04:00
chenyu
6bbbeb93ac skip a few clang test that took > 30 seconds in CI (#4126)
* skip slow CLANG test test_train_cifar

* skip those too

* and that

* only CI

* one more
2024-04-10 02:00:34 -04:00
George Hotz
08ddeb5685 create schedule has global vars (#4125)
* abstractions3 is currently wishful thinking

* create_schedule_with_vars
2024-04-09 21:42:16 -07:00
George Hotz
216eb235e5 hotfix: cast mnist to float 2024-04-09 19:30:03 -07:00
George Hotz
fea774f669 spend 5 lines to bring mnist into the repo (#4122) 2024-04-09 19:24:57 -07:00
qazal
42edae8935 pickle schedules (#4114)
* pickle schedules

* Update test_pickle.py

* Update test_pickle.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-09 13:47:25 -07:00
Felix Kuehling
38ae4194a6 Fixes for ops_kfd (#4105)
* kfd_ops: Fix GPU node discovery on NUMA systems

Ignore potentially multiple CPU NUMA nodes and any GPU nodes that are
not accessible because of device cgroups.

Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>

* kfd_ops: Format the GFX arch target name correctly

The target version in sysfs properties is a decimal representation with
two digits per component.

The format for LLVM GFX target names is a bit quirky for historical
reasons. It uses one digit for the minor version and stepping. When it
ran out of decimal digits for the stepping on gfx90X it started using
hexadecimal there. But the major version is still decimal and went
double digit in GFX10.

Make sure to parse and format it accordingly for all supported GPUs.

Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>

---------

Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
2024-04-09 13:21:21 -07:00
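The parsing described in the commit message above — decimal sysfs encoding with two digits per component, formatted into an LLVM GFX name with hex minor/stepping digits — can be sketched like this (an illustrative reconstruction from the message, not the patch itself):

```python
def gfx_target_name(gfx_version: int) -> str:
  # sysfs encodes the target version as decimal, two digits per component
  major = gfx_version // 10000
  minor = (gfx_version // 100) % 100
  step = gfx_version % 100
  # LLVM names keep the major version decimal (double digit from GFX10 on)
  # but use a single hex digit each for the minor version and stepping
  return f"gfx{major}{minor:x}{step:x}"

print(gfx_target_name(90012))   # gfx90c
print(gfx_target_name(100300))  # gfx1030
```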
George Hotz
10dbf90b2c hotfix: test speed 2024-04-09 13:20:39 -07:00
George Hotz
ae849d12d7 numpy device + pickle it (#4120) 2024-04-09 13:19:30 -07:00
chenyu
1ef9c50fd7 Update ssa input order and annotate types in cstyle and assembly (#4117)
variable prefix is never optional (removed the default "t") and UOp can be optional (added the default None).
2024-04-09 13:10:29 -04:00
geohotstan
15f2f39658 conceptually simpler fancy index (#3335)
* init

* add failed case

* fix: temp comment out MULACC cast

* is this right?

* add test case

* oops, forgot to get rid of temp test

* WOOOOOO TOOK OUT 2 TRANSPOSES IN GATHER YAY

* cleaner

* comment cleanup

* update docs

* resolve conflict

* oops

* SUPA FAST

* comment out a test

* del some print statements

* use new broadcast stuff

* more clean up

* move try except

* skip fancy indexing for python backend test_ops
2024-04-09 11:18:04 -04:00