George Hotz
acb32e1766
hotfix: PM4 supports timing
2024-04-24 08:38:59 +00:00
George Hotz
ad28fdecb1
si.inputs+outputs -> bufs ( #4279 )
2024-04-24 15:12:34 +08:00
Elias Wahl
69341144ba
Wikipedia preprocessing script ( #4229 )
* Preprocessing script
* short seq prob
* comments + env vars
* Add preprocessing reference. Add test
* lint fix + add eval test support
* whitespaces
* point to commit
* comment
* rename
* better comments
2024-04-23 10:28:01 -04:00
George Hotz
967638f0d5
update docs, remove corealize ( #4264 )
* update docs, remove corealize
* handle 0 line count
* tensor schedule
2024-04-23 12:05:29 +04:00
George Hotz
9a95781d51
renamed ( #4260 )
2024-04-23 09:00:28 +04:00
George Hotz
2ae4f45272
WIP PM4 Support ( #4110 )
* pm4 kernel launch works
* disable USE_THREAD_DIMENSIONS
* add kernel code
* work on real pm4
* pm4 signal
* same
* gate pm4
* hcq tests pass
* ops passes
* pm4 is closer
* pm4 debug (#4165)
* start debug tests passing
* prg
* smth
* hdp flush
* cleaner 1
* do not need this
* logs not need
* small things
* linter
* remove AQL
* test hcq
* fix tests
* it's subtracting, it shouldn't be -1
* pm4 changes (#4251)
* not need this anymore
* sdma signal with non atomic
---------
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2024-04-23 08:31:27 +04:00
chenyu
f1d9d0a151
cleanup external_test_opt ( #4234 )
no more OPT=2 or OPT=3; check the strict number of kernels; enabled tests where fusion now works
2024-04-20 04:00:08 -04:00
David Hou
dc4b1af09c
more realistic edge behavior for resnet benchmark ( #4231 )
* more realistic edge behavior for resnet benchmark
* schedule_step
* realize all parameters ahead of time
* don't save setup and misc schedules
2024-04-19 20:07:46 -04:00
George Hotz
b9570d6100
clean up update stats ( #4226 )
* WIP: clean up update stats
* line savings now
* fix graphs
* fix tests
* tighter prints
* remove extra jit=false
* debug=2 means wait
* that won't update stats
* still wait
2024-04-19 15:41:30 +04:00
qazal
1c87e5dbf6
fuzz schedule context vars ( #4223 )
* fuzz schedule context vars
* fuzz unique toposorts
* merge ground truth with the rest
* Revert "merge ground truth with the rest"
This reverts commit 1f3463bb57.
* readability
* can override
2024-04-19 13:16:25 +03:00
Francis Lata
3644077a42
[MLPerf][UNet3D] Add DICE loss + metrics ( #4204 )
* add DICE loss and metrics
* update dice to include reference implementation's link
* remove unused imports
* remove unnecessary test file and update pred + label for metrics and losses test
* add tests to CI + add exclusion of mlperf_unet3d
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-17 20:09:33 -04:00
Francis Lam
c91b7b1739
test: add fuzz_matmul and better debugging for simple_matmul ( #4199 )
also show unoptimized shape in verify_kernel
2024-04-16 23:40:31 -04:00
qazal
ba8602612b
Fuzz all permutations of schedule ( #4136 )
* simple toposort
* fuzzer
* init in_degree
* move to tests
* same seed
* configure paths
* internal graph
* compare LazyBuffers
* simpler
* simple graph
* assign works
* simpler
* fix JIT
* upstream ci
* move ci
* fix the path
* DEBUG=1
* limit max paths
* launch a cmp kernel
* Revert "launch a cmp kernel"
This reverts commit 791c608992.
* exec ground truth
* better perf
* copy ground truth once
* gpu allclose ast try1
* Revert "gpu allclose ast try1"
This reverts commit 1f82103af3.
* prerealized bufs freezing
* teeny cleanups
* reuse Buffers
* Revert "reuse Buffers"
This reverts commit a71de94b03.
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-17 05:03:21 +04:00
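The all-permutations idea in the bullets above ("simple toposort", "limit max paths") can be sketched as a backtracking enumeration of a DAG's topological orders with a path cap; this is an illustrative toy, not the PR's implementation:

```python
def all_toposorts(graph, max_paths=100):
    """Enumerate topological orders of a DAG given as {node: [successors]}."""
    in_degree = {n: 0 for n in graph}
    for succs in graph.values():
        for m in succs:
            in_degree[m] += 1
    paths, order = [], []
    def visit():
        if len(paths) >= max_paths:  # cap the search, like "limit max paths"
            return
        if len(order) == len(graph):
            paths.append(tuple(order))
            return
        for n in [n for n in graph if in_degree[n] == 0 and n not in order]:
            order.append(n)
            for m in graph[n]: in_degree[m] -= 1
            visit()
            for m in graph[n]: in_degree[m] += 1  # backtrack
            order.pop()
    visit()
    return paths

# diamond DAG: a fans out to b and c, which both feed d
g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(all_toposorts(g))  # [('a', 'b', 'c', 'd'), ('a', 'c', 'b', 'd')]
```

Running every valid order against a ground-truth schedule is what lets the fuzzer catch ordering-dependent bugs.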
Francis Lam
e9c1616b27
logging: change LOGKERN to LOGKERNS to match LOGOPS ( #4193 )
also add printing of ast and applied_opts during verify_kernel
to more easily debug errors if they come up
2024-04-16 16:08:32 -04:00
David Hou
7fb220a567
touchup resnet_layer_bench ( #4191 )
2024-04-16 14:43:00 -04:00
David Hou
1dbf3b2b19
Benchmarks for individual resnet layers ( #4182 )
* resnet individual layer benchmarks!
* small
* 1 and 2
* mem_used
* no ci
* better conv print
* defaults
* prints
* adjust
* adjust
* adjust
* benchmark only one layer example
* tensor.training, zero_grad, sum instead of mean, last mem, last kernel count
* default jitcnt=1
* scale flops/kernels with jitcnt
* add note about jitcnt memory
* touchup
2024-04-16 13:53:18 -04:00
George Hotz
50e780a588
multitensor shouldn't recompile ( #4164 )
* multitensor shouldn't recompile
* type annotations
* fix tests
* outcount in reduce
2024-04-13 00:03:48 -07:00
chenyu
a7c6864260
remove CAST_BEFORE_VIEW ( #4152 )
* remove CAST_BEFORE_VIEW
testing perf; also, this might have an issue with assign?
* remove all
2024-04-13 01:05:08 -04:00
George Hotz
ebc94c9d6c
rewrite the jit in the context of new schedule ( #4162 )
* rewrite the jit in the context of new schedule
* mypy better
* fix placeholder
* tests
* all functionality should work
* fix tests
* no CacheCollector
2024-04-12 21:54:36 -07:00
George Hotz
bbda20c0db
CompiledASTRunner -> CompiledRunner ( #4148 )
2024-04-11 08:49:52 -07:00
George Hotz
b7e281cf10
JitItem -> ExecItem ( #4146 )
* JitItem -> ExecItem
* execitem in realize
* cleaner
* JITRunner -> Runner
2024-04-11 08:24:57 -07:00
terafo
5e6d2155e4
Add driving monitoring model to benchmarks ( #4134 )
* add driving monitoring model to benchmarks
* handle crash
2024-04-10 14:27:03 -04:00
geohotstan
fe88591890
update onnx to 1.16.0 ( #4127 )
* update
* pass tests and skip tests
2024-04-10 11:19:13 -04:00
George Hotz
ae849d12d7
numpy device + pickle it ( #4120 )
2024-04-09 13:19:30 -07:00
George Hotz
164329a8ea
address kfd feedback ( #4087 )
* address kfd feedback
* signals cleanup
* signals cleanup
* handle 2 doorbell pages correctly
* signal reset cleanup
* signals cleanup
* more GTT
* cleanups
* minor cleanups
2024-04-05 15:24:41 -07:00
George Hotz
a337922c44
more work on kfd ( #4079 )
* more work on kfd
* fix multitensor test on kfd
* stuff
2024-04-05 08:36:36 -07:00
George Hotz
3de855ea50
don't use SVM memory in KFD ( #4072 )
* don't use SVM memory in KFD
* copy from fd
* cleanups
* transfer
* hacks
* ops_hsa
* tighter API
2024-04-04 17:33:21 -07:00
George Hotz
7181ffd630
HWCopyQueue in KFD ( #4042 )
* HWCopyQueue in KFD
* hw compute queue
* test
* move test
* more tests
* fix wait
* fix multimap
* mes crash
* tests pass but slow
* stuff is working
* one more test
2024-04-03 20:14:24 -07:00
chenyu
e3c0ac9fbf
remove old envvar "OPT" ( #4060 )
2024-04-03 14:55:21 -04:00
chenyu
406cb5fd90
const fold ReduceOps ( #4059 )
2024-04-03 14:39:28 -04:00
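The folding idea can be illustrated with a toy helper (op names and signature here are hypothetical, not tinygrad's API): a reduce over a buffer known to hold a single constant can be evaluated at schedule time instead of launching a kernel.

```python
import math

def fold_reduce(op: str, const: float, shape: tuple) -> float:
    # illustrative constant folding for reduces over a constant-filled buffer
    if op == "SUM":
        return const * math.prod(shape)  # summing n copies of c gives c * n
    if op == "MAX":
        return const                     # max of identical values is the value
    raise NotImplementedError(op)

print(fold_reduce("SUM", 2.0, (4, 8)))  # 64.0
print(fold_reduce("MAX", 2.0, (4, 8)))  # 2.0
```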
chenyu
c71627fee6
move GlobalCounter to helpers ( #4002 )
break circular import between ops and buffer
2024-03-30 00:30:30 -04:00
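The cycle-breaking pattern here can be sketched in one file (module layout collapsed into comments; the attribute names are illustrative): when ops imports buffer and buffer imports ops, move the shared state into a leaf module and point both at it.

```python
# --- helpers.py: a leaf module that imports nothing from ops/buffer ---
class GlobalCounters:
    global_ops: int = 0
    global_mem: int = 0

# --- ops.py and buffer.py each do `from helpers import GlobalCounters` ---
# --- instead of importing each other, which breaks the import cycle  ---
GlobalCounters.global_ops += 10     # e.g. a kernel reporting its op count
GlobalCounters.global_mem += 4096   # e.g. a buffer reporting an allocation
print(GlobalCounters.global_ops, GlobalCounters.global_mem)  # 10 4096
```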
George Hotz
f916aadaea
external that test
2024-03-29 19:35:50 -07:00
chenyu
d9ff636cf5
use is to compare with enum ( #3993 )
* use is to compare with enum
currently it's a mix of `==` and `is`; moved all to `is`
* more
2024-03-29 13:02:56 -04:00
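The `is` preference works because enum members are singletons; a minimal sketch with a hypothetical `Color` enum:

```python
from enum import Enum, auto

class Color(Enum):  # hypothetical enum, just for illustration
    RED = auto()
    GREEN = auto()

c = Color.RED
# Enum members are singletons, so identity comparison is exact and cannot
# be fooled by another type's custom __eq__.
assert c is Color.RED
assert c == Color.RED        # also true, but dispatches through __eq__
assert c is not Color.GREEN
print("ok")
```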
chenyu
b47f6cebb2
LinearizerOptions -> CompilerOptions ( #3978 )
2024-03-28 17:50:23 -04:00
George Hotz
42b9d999ea
Buffer isn't always allocated ( #3974 )
* buffer alloc
* allocate
* missing allocates
* last one
2024-03-28 13:33:47 -07:00
geohotstan
bd3a7d068c
correct device for validation test in model benchmark CI ( #3960 )
* fix tests
* add clang back for only metal
* change the name to reflect CLANG being ran
* add back cuda
2024-03-27 13:40:06 -04:00
George Hotz
68ca4d4276
split to schedule.py ( #3949 )
* split to schedule.py
* split
2024-03-26 21:02:46 -07:00
George Hotz
150ea2eb76
create engine folder and move code ( #3948 )
* retry
* older tf
* that
2024-03-26 20:38:03 -07:00
Francis Lam
5530b0cbed
fuzz_linearizer: reduce debug verbosity and make easier for CI usage ( #3942 )
* fuzz_linearizer: reduce debug verbosity and make easier for CI usage
* rename FUZZ_BEAM to FUZZ_ALL_ACTIONS (not choosing a subset)
* skip simple ASTs (easier to use with LOGOPS output)
* don't fuzz a previously seen AST
* add options to allow non-zero --expected-failures
* clean up naming and use set
2024-03-26 16:25:24 -04:00
nimlgen
e2d6f76723
_alloc and _free with options ( #3934 )
* _alloc has options
* linter
* fix hsa
2024-03-26 09:11:41 -07:00
wozeparrot
9a9cac58f9
add lars to nn ( #3750 )
* feat: add lars
* feat: don't remove this comment
* clean: smaller diff
* clean: shorter line
* feat: remove mlperf lars, switch resnet
* fix: fully remove mlperf lars
* clean: comment
* feat: contiguous
* feat: no weight decay on skip params
* feat: optimizergroup
* feat: classic momentum
* fix: pylint
* clean: move comment
* fix: correct algo
* feat: lrschedulergroup
* feat: skip list tests
* feat: :| forgot that params are a thing
* feat: remove skip_list params from main params
* feat: set moment
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-03-24 11:43:12 -04:00
chenyu
a2b2597fc2
replace dtype.name str with render_dtype ( #3903 )
fixed a bf16 cast issue, since bf16 does not have `.name`.
also more robust if there are language-specific type overrides
2024-03-23 19:25:48 -04:00
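A minimal sketch of the render-through-a-table pattern (the table and names here are hypothetical, not tinygrad's actual `render_dtype`): looking dtypes up in a per-backend map avoids depending on a `.name` attribute that some dtypes lack, and gives a natural place for language-specific overrides.

```python
# hypothetical per-backend override table
TYPE_MAP = {"float32": "float", "int32": "int", "bfloat16": "__bf16"}

def render_dtype(dtype: str) -> str:
    # fall back to the generic spelling when the backend has no override
    return TYPE_MAP.get(dtype, dtype)

print(render_dtype("bfloat16"))  # __bf16
print(render_dtype("uint64"))    # uint64 (no override)
```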
Francis Lam
8db7a6bbcc
debug: add optional detailed BEAM_LOG logging ( #3883 )
* debug: add optional detailed BEAM_LOG logging
show uop count, compile and run times for each candidate in search.
also add --timing to verify_kernel.py to make it easier to explore
hand-crafted applied opts
* fix linter
2024-03-22 19:23:31 -04:00
Francis Lam
5587594a00
fuzz_linearizer: add --ast and --file params to read kernels ( #3877 )
also fix up ast_str_to_str to support the new tuple of LazyOps
2024-03-22 14:27:40 -04:00
uuuvn
6729f20aab
Ring allreduce try 2 ( #3852 )
...
* Ring allreduce v3
* Configurable size, number of gpus and jit in benchmark
* ScheduleBarrier v0
* GB/s that make sense
* ScheduleBarrier v0.1
* Fallback on 2 GPUs
* ScheduleBarrier v0.2
* ScheduleBarrier v0.3
* ScheduleBarrier v0.3.1
* ScheduleBarrier v0.3.2
* Replace ScheduleBarrier with automatic optimization
* unused import
* fix comment
* typing
* better fallback
* python 3.8
* RING=2 and use ContextVar
* DEBUG >= 2 and change name
* linter
* type
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2024-03-21 19:17:51 -04:00
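Ring allreduce itself can be sketched as a single-process simulation (reduce-scatter around the ring, then allgather); this toy stands in for the PR's real multi-GPU transfers:

```python
def ring_allreduce(data):
    """Toy ring allreduce: data[r] is rank r's buffer, pre-split into
    len(data) chunks (one chunk per rank). Mutates and returns data."""
    n = len(data)
    # reduce-scatter: after n-1 steps, rank r holds the full sum of chunk (r+1) % n
    # (the sequential loop is safe: a step never overwrites a chunk before it is sent)
    for step in range(n - 1):
        for r in range(n):
            c = (r - step) % n
            data[(r + 1) % n][c] += data[r][c]
    # allgather: circulate each fully-reduced chunk around the ring
    for step in range(n - 1):
        for r in range(n):
            c = (r + 1 - step) % n
            data[(r + 1) % n][c] = data[r][c]
    return data

bufs = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
print(ring_allreduce(bufs))  # every rank ends with [111, 222, 333]
```

Each rank only ever talks to its ring neighbor, so per-rank traffic stays constant as GPUs are added, which is why the benchmark reports "GB/s that make sense" only above two GPUs (hence the 2-GPU fallback).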
Francis Lam
3c0478bfab
fuzz_linearizer: add additional DEBUG info for comparison errors ( #3866 )
2024-03-21 18:58:10 -04:00
chenyu
e50b7abe4f
diversed buf inputs based on dtype in fuzz_linearizer ( #3863 )
2024-03-21 16:23:11 -04:00
chenyu
30fa03243e
reuse fuzz_linearizer.compare_linearizer in test_linearizer_failures ( #3861 )
2024-03-21 14:12:27 -04:00
chenyu
6bf0b82267
alloc new output in fuzz_linearizer between baseline and real one ( #3859 )
if the kernel is an assign (`a += 1`), rawbufs[0] is updated twice, giving a false compare_error
2024-03-21 11:36:05 -04:00
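The false positive described here can be reproduced with a toy in-place kernel:

```python
# Toy model of the bug: an "assign" kernel mutates its output buffer in
# place, so running baseline and candidate against the SAME buffer applies
# the increment twice and the outputs can never match.
def assign_kernel(buf):          # stands in for a compiled `a += 1` kernel
    for i in range(len(buf)):
        buf[i] += 1

shared = [0, 0]
assign_kernel(shared)            # baseline run
baseline = list(shared)
assign_kernel(shared)            # second run reuses the same rawbufs[0]
print(shared == baseline)        # False -> spurious compare_error

fresh = [0, 0]                   # the fix: a newly allocated output buffer
assign_kernel(fresh)
print(fresh == baseline)         # True
```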
nimlgen
85691c8e20
fix hsa sync issue ( #3847 )
* fix hsa sync issue
* linter
2024-03-21 04:00:30 +03:00