qazal
2094b3b327
graph ScheduleItems (#4224)
* graph schedules
* add logging
* inplace
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-19 16:17:11 +04:00
George Hotz
cd88afc98b
datasets isn't a feature + filter docstrings (#4228)
* datasets isn't a feature
* filter docstrings in sz
2024-04-19 16:16:10 +04:00
George Hotz
b9570d6100
clean up update stats (#4226)
* WIP: clean up update stats
* line savings now
* fix graphs
* fix tests
* tighter prints
* remove extra jit=false
* debug=2 means wait
* that won't update stats
* still wait
2024-04-19 15:41:30 +04:00
qazal
1c87e5dbf6
fuzz schedule context vars (#4223)
* fuzz schedule context vars
* fuzz unique toposorts
* merge ground truth with the rest
* Revert "merge ground truth with the rest"
This reverts commit 1f3463bb57.
* readability>
* can override
2024-04-19 13:16:25 +03:00
George Hotz
d99b512084
llm.c timing (#4219)
* add timing info
* fix malloc
* 8s with beam
2024-04-19 12:43:21 +04:00
qazal
43841a32b7
Merge pull request #4222 from Qazalin/fuzz-multi0
Tunable multi output fusion
2024-04-19 08:07:45 +03:00
qazal
b2fe3884fc
Merge branch 'master' into fuzz-multi0
2024-04-19 07:56:26 +03:00
qazal
abb10c83cd
tunable multi output fusion
2024-04-19 07:44:31 +03:00
chenyu
a1133beb80
KFD GEMM (#4221)
added to benchmark CI and fixed duplicated filenames between cuda and ptx
2024-04-19 00:43:18 -04:00
chenyu
3f3af0fb85
test_linearizer_failures 29 passes now (#4215)
TC + PADTO fixed
2024-04-18 19:49:23 -04:00
Elias Wahl
2ecd61e3e2
monkey patching (#4214)
2024-04-18 19:20:52 -04:00
Francis Lam
126826afc8
linearizer: refactor to define accs with potentially TC-modified idxs (#4211)
2024-04-18 15:31:06 -04:00
George Hotz
39b60a25f0
more llm c work (#4207)
* more llm c work
* print nicely
* fake load pretrained
* select warmups
* output c code
2024-04-18 22:20:44 +04:00
chenyu
f7416916df
update resnet hparams based on BS=1632 RCP (#4210)
https://github.com/mlcommons/logging/blob/master/mlperf_logging/rcp_checker/training_4.0.0/rcps_resnet.json
2024-04-18 12:01:46 -04:00
George Hotz
fa57c3e7ce
continue llm.c (#4190)
* continue llm.c
* export more
* progress on llm.c
* simpler optim, names work
2024-04-18 10:57:54 +04:00
geohotstan
269a58d5fa
tolist to return multidimensional list (#4192)
* lol does this work
* some more changes
* a tiny note
* rename a variable
* add test for data const and add TODO comment
* make type correct
2024-04-18 07:43:10 +04:00
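For context on the tolist change above, a minimal usage sketch (behavior assumed from the PR title; Tensor is tinygrad's public API):

```python
# Tensor.tolist now returns a nested Python list matching the tensor's
# shape (numpy-style), rather than a flat list.
from tinygrad import Tensor

t = Tensor([[1, 2], [3, 4]])
assert t.tolist() == [[1, 2], [3, 4]]
```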
Francis Lata
3644077a42
[MLPerf][UNet3D] Add DICE loss + metrics (#4204)
* add DICE loss and metrics
* update dice to include reference implementation's link
* remove unused imports
* remove unnecessary test file and update pred + label for metrics and losses test
* add tests to CI + add exclusion of mlperf_unet3d
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-17 20:09:33 -04:00
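On the DICE loss added above: the Dice score measures overlap as 2|P∩L| / (|P|+|L|). A minimal soft-Dice sketch follows; the implementation actually merged in #4204 (smoothing, reductions, per-channel handling) may differ:

```python
# Generic soft Dice loss; illustrative only, not the merged code.
from tinygrad import Tensor

def dice_loss(pred: Tensor, label: Tensor, smooth: float = 1e-6) -> Tensor:
  p, l = pred.flatten(), label.flatten()
  intersection = (p * l).sum()
  return 1 - (2 * intersection + smooth) / (p.sum() + l.sum() + smooth)
```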
chenyu
cd801a15f3
scipy.signal.gaussian -> scipy.signal.windows.gaussian (#4205)
fixed unet3d model_eval, will add to CI after merging new dice loss
2024-04-17 19:15:37 -04:00
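The fix above is a one-line import migration: SciPy moved its window functions into scipy.signal.windows and later removed the old aliases.

```python
# old (removed in newer SciPy):
# from scipy.signal import gaussian
# new:
from scipy.signal.windows import gaussian

window = gaussian(M=7, std=2.0)  # same signature and output as before
```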
Elias Wahl
6eef8ee22a
Wikipedia download script for MLPerf BERT training (#4202)
* wikipedia download script
* add link
* checksum ValueError
* ops
2024-04-17 16:34:57 -04:00
qazal
f75020a903
minimal diff for multioutput reduce pairs (#4030)
* simple fusion
* compiler cache patch
* Revert "compiler cache patch"
This reverts commit fa18049597.
* Revert "Revert "compiler cache patch""
This reverts commit 57f8d41f98.
* delete that
* early sort
* teeny renames
* spec
* .empty is great
* delete sort
* Update test_schedule.py
* this is one kernel now
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-17 10:55:44 -04:00
George Hotz
8564e28a1b
new memory scheduler with explicit refcounts (#4198)
* new memory scheduler with explicit refcounts
* move central memory planner
* typo + use central memory planner in openpilot
* cleanups
* include lb_refcount in pickle
* replace PlaceHolder with memory planner
* cleaner
2024-04-17 08:46:47 +04:00
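A rough sketch of the refcounting idea behind the memory scheduler above (an illustrative toy planner, not the PR's code; ignores buffer sizes):

```python
from collections import defaultdict

def plan_memory(schedule):
  # schedule: list of (outputs, inputs) per kernel; buffers are hashable ids.
  refcount = defaultdict(int)
  for _, inputs in schedule:
    for buf in inputs: refcount[buf] += 1
  free_pool, assignment = [], {}
  for outputs, inputs in schedule:
    # assign outputs first so a kernel never reuses one of its own inputs
    for buf in outputs:
      assignment[buf] = free_pool.pop() if free_pool else f"slot_{len(assignment)}"
    # drop one reference per use; recycle the slot on the last use
    for buf in inputs:
      refcount[buf] -= 1
      if refcount[buf] == 0 and buf in assignment: free_pool.append(assignment[buf])
  return assignment
```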
Francis Lam
c91b7b1739
test: add fuzz_matmul and better debugging for simple_matmul (#4199)
also show unoptimized shape in verify_kernel
2024-04-16 23:40:31 -04:00
qazal
ba8602612b
Fuzz all permutations of schedule (#4136)
* simple toposort
* fuzzer
* init in_degree
* move to tests
* same seed
* configure paths
* internal graph
* compare LazyBuffers
* simpler
* simple graph
* assign works
* simpler
* fix JIT
* upstream ci
* move ci
* fix the path
* DEBUG=1
* limit max paths
* launch a cmp kernel
* Revert "launch a cmp kernel"
This reverts commit 791c608992.
* exec ground truth
* better perf
* copy ground truth once
* gpu allclose ast try1
* Revert "gpu allclose ast try1"
This reverts commit 1f82103af3.
* prerealized bufs freezing
* teeny cleanups
* reuse Buffers
* Revert "reuse Buffers"
This reverts commit a71de94b03.
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-17 05:03:21 +04:00
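The core trick in the schedule fuzzer above is enumerating every topological ordering of the kernel DAG and comparing each against a ground truth. A minimal sketch of such an enumerator (backtracking over Kahn's algorithm; not the fuzzer's actual code):

```python
from collections import defaultdict

def all_toposorts(nodes, edges):
  # edges: (u, v) pairs meaning u must run before v
  graph, in_degree = defaultdict(list), {n: 0 for n in nodes}
  for u, v in edges:
    graph[u].append(v); in_degree[v] += 1
  def backtrack(order):
    if len(order) == len(nodes):
      yield list(order); return
    for n in [x for x in nodes if in_degree[x] == 0 and x not in order]:
      for v in graph[n]: in_degree[v] -= 1
      order.append(n)
      yield from backtrack(order)
      order.pop()
      for v in graph[n]: in_degree[v] += 1
  yield from backtrack([])

assert list(all_toposorts("abc", [("a", "b"), ("a", "c")])) == [["a", "b", "c"], ["a", "c", "b"]]
```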
nimlgen
4ed6b42a8a
fix kernargs check in kfd (#4194)
2024-04-17 00:44:50 +03:00
David Hou
97d846dd67
in forced_realize, unchase last op if it is upcast (#4185)
* in forced_realize, unchase last op if it is upcast
* start on test
* flesh out test
* more test
* comment
* comment out parallel reduce test
* reorder
* unused
2024-04-16 17:15:17 -04:00
Francis Lam
e9c1616b27
logging: change LOGKERN to LOGKERNS to match LOGOPS (#4193)
also add printing of ast and applied_opts during verify_kernel
to more easily debug errors if they come up
2024-04-16 16:08:32 -04:00
David Hou
7fb220a567
touchup resnet_layer_bench (#4191)
2024-04-16 14:43:00 -04:00
David Hou
1dbf3b2b19
Benchmarks for individual resnet layers (#4182)
* resnet individual layer benchmarks!
* small
* 1 and 2
* mem_used
* no ci
* better conv print
* defaults
* prints
* adjust
* adjust
* adjust
* benchmark only one layer example
* tensor.training, zero_grad, sum instead of mean, last mem, last kernel count
* default jitcnt=1
* scale flops/kernels with jitcnt
* add note about jitcnt memory
* touchup
2024-04-16 13:53:18 -04:00
George Hotz
d49d4324a3
update docs (#4189)
2024-04-16 16:07:02 +04:00
George Hotz
55ae73e951
Replicate llm.c in tinygrad (#4179)
* write llm.c and add a few new methods to tensor
* training works
* add jit
* tests for new functions
* test tolist
* simple fix for onnx test failures (#4186)
* write llm.c and add a few new methods to tensor
* training works
* add jit
* tests for new functions
* bump line count to 7500
* simplest fix
* safenumpy tolist for now
---------
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
---------
Co-authored-by: geohotstan <135171913+geohotstan@users.noreply.github.com>
2024-04-16 15:40:48 +04:00
George Hotz
b6e7243bfa
hotfix: skip slow pre-commit test
2024-04-16 11:48:43 +04:00
George Hotz
cda0010020
hotfix: docs-legacy
2024-04-16 11:06:56 +04:00
George Hotz
8f749ae0eb
New docs are in mkdocs (#4178)
* start mkdocs
* simple docs for tensor
* more docs
* move those back
* more docs
* copy markdown extensions
* docs legacy
* docs building workflow
* fix showcase links
* only that?
* install tinygrad
* add docs to setup.py
* Delete examples/llm.c/data
2024-04-16 10:59:51 +04:00
chenyu
aa093efa43
fix handcode_resnet50_opt flops count (#4184)
2024-04-15 22:13:45 -04:00
chenyu
d5b67c1ca3
log resnet TRAIN_BEAM / EVAL_BEAM (#4181)
also run eval in benchmark mode if either one is positive
2024-04-15 19:29:08 -04:00
Francis Lam
9d2273235c
search: BEAM_UOPS_MAX to prune candidates with too many uops (#4088)
* search: add better default settings for fast search
not the highest possible performance, but adequate for most usage
* search: revert BEAM_MIN_PROGRESS and BEAM_UPCAST_MAX default changes
also sneak in a link to .gitignore for the unet3d dataset
* revert BEAM_MAX_TASKS_PER_CHILD change and fix uops max condition
2024-04-15 18:56:22 -04:00
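A hedged sketch of the pruning knob above (assumed shape only; the real integration lives in tinygrad's BEAM search code):

```python
from tinygrad.helpers import getenv

BEAM_UOPS_MAX = getenv("BEAM_UOPS_MAX", 0)

def prune_candidates(candidates):
  # candidates: (kernel, uop_count) pairs; 0 is assumed to disable the filter
  if BEAM_UOPS_MAX <= 0: return candidates
  return [c for c in candidates if c[1] <= BEAM_UOPS_MAX]
```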
qazal
286ea697f3
keep order in realizes (#4180)
2024-04-16 01:25:50 +04:00
George Hotz
e14a9bca0c
hotfix: bump line count to 7500 for NV backend
2024-04-15 23:18:46 +04:00
chenyu
6a2168e698
TRAIN_BEAM and EVAL_BEAM for resnet (#4177)
working on measuring compile time
2024-04-15 14:57:21 -04:00
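Per the title above, beam width is split into separate train/eval knobs. A sketch of how such tunables are typically read in tinygrad (the fallback to plain BEAM is an assumption, not confirmed from the diff):

```python
from tinygrad.helpers import getenv

# assumption: each knob falls back to the global BEAM setting when unset
TRAIN_BEAM = getenv("TRAIN_BEAM", getenv("BEAM", 0))
EVAL_BEAM = getenv("EVAL_BEAM", getenv("BEAM", 0))
```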
Timmy
4592fc8fe7
Multireduce Kernels - prereq refactor (#4173)
* refactor rendering a reduceop into its own function (will help for kernels with multiple reduceops)
* linters
* addressing concerns
2024-04-14 20:16:54 -04:00
David Hou
593c90d7d6
Resnet fp16 training with fp32 master weight copy (#4144)
* add casts to layers
* FLOAT flag
* detach
* no_grad for eval
* whitespace
* explicit fp32 initialization
* oops
* whitespace
* put back config['DEFAULT_FLOAT']
* bad
* live dangerously (don't hide bugs)
* don't bundle changes
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-14 11:25:08 -04:00
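The fp32-master-weight approach referenced above is the standard mixed-precision training pattern; a generic sketch (not the PR's code):

```python
from tinygrad import Tensor, dtypes

master = Tensor.randn(64, 64, dtype=dtypes.float32)  # fp32 master copy
w = master.cast(dtypes.float16)                      # fp16 weight used in compute
# ... forward/backward in fp16 produce `grad` ...
grad = Tensor.randn(64, 64, dtype=dtypes.float16)
master = master - 0.01 * grad.cast(dtypes.float32)   # optimizer step in fp32
w = master.cast(dtypes.float16)                      # refresh fp16 working copy
```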
chenyu
e20d6f9221
correct resnet estimate time (#4169)
7.99 hours was rendered as 7h0m.
2024-04-14 02:21:46 -04:00
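The bug above (7.99 hours shown as 7h0m) is the classic one of deriving minutes from the wrong quantity. A correct formatting sketch (not necessarily the repo's exact code):

```python
def fmt_hours(hours: float) -> str:
  # minutes come from the fractional part, not the truncated hour count
  h, m = int(hours), round((hours - int(hours)) * 60)
  if m == 60: h, m = h + 1, 0  # avoid e.g. "7h60m" at the boundary
  return f"{h}h{m}m"

assert fmt_hours(7.99) == "7h59m"
```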
George Hotz
ea18d28253
some overview docs
2024-04-13 17:01:09 -07:00
George Hotz
50e780a588
multitensor shouldn't recompile (#4164)
* multitensor shouldn't recompile
* type annotations
* fix tests
* outcount in reduce
2024-04-13 00:03:48 -07:00
George Hotz
599eb266b1
optionally use a copy kernel instead of SDMA (#4116)
* optionally use a copy kernel
* lazyops in copied kernels
* add sync
* no sdma at all
* work
* copy_ast
2024-04-12 23:10:41 -07:00
George Hotz
ba7314c26b
cleanup lbs (#4163)
2024-04-12 22:32:16 -07:00
chenyu
a7c6864260
remove CAST_BEFORE_VIEW (#4152)
* remove CAST_BEFORE_VIEW
testing perf, also this might have issue with assign?
* remove all
2024-04-13 01:05:08 -04:00
George Hotz
ebc94c9d6c
rewrite the jit in the context of new schedule (#4162)
* rewrite the jit in the context of new schedule
* mypy better
* fix placeholder
* tests
* all functionality should work
* fix tests
* no CacheCollector
2024-04-12 21:54:36 -07:00
George Hotz
b67f759780
abstractions3 is currently wishful thinking (#4124)
* abstractions3 is currently wishful thinking
* a3
* work
* minor
* progress on a3
* more
* update abstractions3
* cleaner
2024-04-12 16:46:01 -07:00
MaximilianEmel
27a98aaecc
Rewritten SVG Logos (#4150)
* rewrote the svg logos to use polygons and render better
* changed self-closing tags' style to better conform to the original
2024-04-12 14:09:57 -07:00