Commit Graph

1694 Commits

Francis Lam
a9a1fa6bbf wmma: add reduce axis choice to TC action space (#4328)
* wmma: add reduce axis choice to TC action space

* add test for TC multi-reduce axis choice
2024-04-29 19:15:39 -04:00
chenyu
93abcd3113 fix function.py sum backward without downcast_half (#4353)
Without downcast_half, sum's output dtype can differ from its input dtype; cast back to the input dtype in function.py.
2024-04-29 17:53:02 -04:00
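
A minimal sketch of the rule behind #4353, in NumPy for illustration (names and shapes are ours, not tinygrad's): when the forward sum accumulates in a wider dtype, the backward pass casts the gradient back to the input dtype.

```python
import numpy as np

x = np.ones(8, dtype=np.float16)
y = x.sum(dtype=np.float32)                        # forward: accumulate in float32
g = np.float32(1.0)                                # upstream gradient matches y's dtype
gx = np.broadcast_to(g, x.shape).astype(x.dtype)   # backward: cast back to the input dtype
assert gx.dtype == np.float16
```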
Francis Lam
18c61ce077 test/fuzz_linearizer: add --atol/rtol and change half distribution (#4352) 2024-04-29 15:53:59 -04:00
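
Presumably the new --atol/--rtol flags feed the usual mixed-tolerance comparison, |actual - expected| <= atol + rtol * |expected|; a NumPy sketch with values of our choosing:

```python
import numpy as np

actual, expected = np.float16(1.001), np.float32(1.0)
# passes: |actual - expected| is ~9.8e-4, within 1e-3 + 1e-2 * 1.0
np.testing.assert_allclose(actual, expected, rtol=1e-2, atol=1e-3)
```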
Elias Wahl
27613dd881 MLPerf BERT: Main training loop (#4288)
* BERT language modeling head + trunc normal initializers

* add train loop + helpers

* shuffle in dataloaders + slight changes in main loop

* beam change

* Minor changes

* random.shuffle

* HParam update

* Use deque for dataloader

* wandb bert project name

* half fixes

* BENCHMARK + remove epoch

* cast + print()

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-29 14:35:27 -04:00
qazal
cc1797673e all fusion opportunities (#4348) 2024-04-29 19:32:23 +03:00
chenyu
f363f39e83 fix dtype of const folded sum (#4349)
A const-folded sum should return the same dtype as a regular sum, which can differ from the input dtype.
2024-04-29 11:40:45 -04:00
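
A NumPy illustration of why #4349 matters (the bool case is our example; tinygrad's promotion rules may differ): a sum's dtype can legitimately differ from its input dtype, and a constant fold has to reproduce it.

```python
import numpy as np

b = np.ones(5, dtype=bool)
regular = b.sum()                                 # int64 on most platforms, not bool
folded = np.asarray(b.size, dtype=regular.dtype)  # folding sum(ones) must keep that dtype
assert folded.dtype == regular.dtype
```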
qazal
774a9b0bca override assign_target in fuzz_schedule (#4342)
* store assign_targets

* cleanup

* override target
2024-04-29 11:04:04 +03:00
Francis Lata
bb849a57d1 [MLPerf] UNet3D dataloader (#4343)
* add support for train/val datasets for kits19

* split dataset into train and val sets

* add tests for kits19 dataloader

* add MLPerf dataset tests to CI

* update unet3d model_eval script

* fix linting

* add nibabel

* fix how mock dataset gets created

* update ref implementation with permalink and no edits

* clean up test and update rand_flip implementation

* cleanups
2024-04-28 22:34:18 -04:00
chenyu
c1d8d425eb fix mean of half tensor if sum is greater than half.max (#4327)
Sum of half already accumulates in float32; add an arg to skip the downcast back to half, and use it in mean.
2024-04-28 18:04:54 -04:00
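
A NumPy demonstration of the failure mode #4327 fixes (the numbers are ours): partial sums of a large half tensor overflow past half.max (65504) unless the accumulation happens in float32.

```python
import numpy as np

x = np.full(100_000, 1.0, dtype=np.float16)
print(x.sum())                                      # inf: float16 accumulation overflows
print(x.sum(dtype=np.float32))                      # 100000.0: accumulate in float32
print(x.mean(dtype=np.float32).astype(np.float16))  # 1.0: the mean itself fits in half
```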
qazal
23445db2b9 no skipped tests in RHIP (#4337)
* delete skip

* delete split skip

* remu dev

* compiler fails here

* Revert "remu dev"

This reverts commit 28b933d4eb.
2024-04-28 12:23:05 -04:00
Obada Khalili
e4befa41d7 Fix in _reshape_mask (#4332)
* handle reshape with remainder in _reshape_mask

* remove trailing whitespace

* use helper_test_op to generate tensors from shapes

* test in shapetracker too

* remove whitespace

* revert property name in other class tests
2024-04-28 11:57:39 -04:00
Timmy
664b563c91 Add insert_before to Linearizer Functions (#4320)
* adding insert_before to linearizer functions

* uop insert_before test case

* formatting

* more formatting

* more formatting

* syntax

* removing self.cast

* addressing err

* removing noqa comments
2024-04-28 11:38:36 -04:00
qazal
3372bea322 reduce children fusion tests (#4321)
* base tests

* real-world tests
2024-04-28 11:14:02 -04:00
chenyu
24a6342950 add mem/s to external_benchmark_resnet (#4309) 2024-04-26 20:07:17 -04:00
Szymon Ożóg
de832d26c6 disable bfloat16 from ptx tests (#4305) 2024-04-26 01:20:10 -04:00
Szymon Ożóg
f1ebcffb87 Ptx beam fix (#4296)
* Fix beam search for PTX

* fix ptr arm test
2024-04-25 15:39:39 -04:00
qazal
9a47ed0705 test crossing diamond assigns (#4298) 2024-04-25 21:52:05 +03:00
chenyu
5ae252ae83 use at least float32 for optim.lr (#4297)
* use at least float32 for optim.lr

When doing mixed-precision training (float32 weights, default_float=half), still store lr in float32.
It would have been upcast later in the actual weight update anyway, but precision would already have been lost.
This improved resnet convergence significantly.

* undo type annotation
2024-04-25 14:42:28 -04:00
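
A quick illustration of the precision argument in #4297 (the lr value is ours): float16 carries only about three significant decimal digits, so a small learning rate is already rounded before any weight update can upcast it.

```python
import numpy as np

lr = 2.442e-4
print(np.float32(lr))              # 0.0002442
print(np.float32(np.float16(lr)))  # ~0.00024414: rounded before any upcast can help
```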
David Hou
6f792b727b More improvements for resnet layer bench (#4272)
* fix first layer size, new schedule stuff

* estimates

* get different conv layers

* \r for estimated times

* E501

* space after comma
2024-04-25 12:40:49 -04:00
qazal
74a1be88f5 test reduce graph permutations (#4291) 2024-04-25 11:34:44 +03:00
George Hotz
acb32e1766 hotfix: PM4 supports timing 2024-04-24 08:38:59 +00:00
George Hotz
ad28fdecb1 si.inputs+outputs -> bufs (#4279) 2024-04-24 15:12:34 +08:00
George Hotz
38f97aa0fe rename rawbufs to bufs in ExecItem (#4274) 2024-04-24 11:27:27 +08:00
geohotstan
17328ded7d setitem no return value (#4266)
* no ret value and just force contiguous

* ok revert contiguous stuff

* actually do force it contiguous

* revert again lol

* add simple regression test

* add assert for MLB

* guess we're making everything contiguous from now on

* lol ugly af empty return...

* don't change order cuz i don't get disk
2024-04-23 16:28:14 -04:00
Elias Wahl
69341144ba Wikipedia preprocessing script (#4229)
* Preprocessing script

* short seq prob

* comments + env vars

* Add preprocessing reference. Add test

* lint fix + add eval test support

* whitespaces

* point to commit

* comment

* rename

* better comments
2024-04-23 10:28:01 -04:00
George Hotz
967638f0d5 update docs, remove corealize (#4264)
* update docs, remove corealize

* handle 0 line count

* tensor schedule
2024-04-23 12:05:29 +04:00
George Hotz
acf4ba5c9f method cache respects beam option (#4261)
* method cache respects beam option

* cleanup get_runner
2024-04-23 09:00:41 +04:00
George Hotz
9a95781d51 renamed (#4260) 2024-04-23 09:00:28 +04:00
George Hotz
2ae4f45272 WIP PM4 Support (#4110)
* pm4 kernel launch works

* disable USE_THREAD_DIMENSIONS

* add kernel code

* work on real pm4

* pm4 signal

* same

* gate pm4

* hcq tests pass

* ops passes

* pm4 is closer

* pm4 debug (#4165)

* start debug tests passing

* prg

* smth

* hdp flush

* cleaner 1

* do not need this

* logs not need

* small things

* linter

* remove AQL

* test hcq

* fix tests

* it's subtracting, it shouldn't be -1

* pm4 changes (#4251)

* not need this anymore

* sdma signal with non atomic

---------

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2024-04-23 08:31:27 +04:00
Francis Lam
3f6c7ca8bf test: fix test_tensor_core_padded on CUDA and add to benchmarks (#4258)
* test: fix test_tensor_core_padded on CUDA and add to benchmarks

* fix linter

* run both tests in one call
2024-04-22 23:22:11 -04:00
Francis Lam
bbb0ad4800 wmma: widen TC usage in search by using PADTO on TC axes when possible (#4216)
* wmma: widen TC usage in search by using PADTO on TC axes when possible

* test: start tests for the new padding TC behavior

* search: upgrade padded TC search to TC_OPT >= 2

* test: add behavior and correctness test for padded TC

added optional argument to apply_tensor_core to set TC_OPT level

* linearizer: add tests for the PADTO behavior and docs
2024-04-22 16:50:31 -04:00
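
A hedged usage sketch of what this commit enables (exact search behavior may differ): TC_OPT >= 2 lets the BEAM search pad tensor-core axes, so shapes that aren't tile multiples still qualify. The env vars are read at search time, so set them before running.

```python
import os
os.environ["TC_OPT"], os.environ["BEAM"] = "2", "4"  # opt the search into padded TC kernels

from tinygrad import Tensor
# 100 is not a multiple of the tensor-core tile sizes; with PADTO it can still hit TCs
(Tensor.rand(100, 100) @ Tensor.rand(100, 100)).realize()
```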
qazal
77a3780005 assert reduce recompute (#4250) 2024-04-22 16:12:39 +03:00
qazal
a9bc7c1c49 unify assign tests (#4247) 2024-04-22 11:01:15 +03:00
chenyu
31c9d9a228 fix test_linearizer tc opt tests for bf16 (#4237)
bf16 tc has larger rtol
2024-04-20 11:51:50 -04:00
chenyu
f1d9d0a151 cleanup external_test_opt (#4234)
no more OPT=2 or OPT=3; check the exact number of kernels; enabled tests now that fusion works
2024-04-20 04:00:08 -04:00
David Hou
dc4b1af09c more realistic edge behavior for resnet benchmark (#4231)
* more realistic edge behavior for resnet benchmark

* schedule_step

* realize all parameters ahead of time

* don't save setup and misc schedules
2024-04-19 20:07:46 -04:00
George Hotz
b9570d6100 clean up update stats (#4226)
* WIP: clean up update stats

* line savings now

* fix graphs

* fix tests

* tighter prints

* remove extra jit=false

* debug=2 means wait

* that won't update stats

* still wait
2024-04-19 15:41:30 +04:00
qazal
1c87e5dbf6 fuzz schedule context vars (#4223)
* fuzz schedule context vars

* fuzz unique toposorts

* merge ground truth with the rest

* Revert "merge ground truth with the rest"

This reverts commit 1f3463bb57.

* readability

* can override
2024-04-19 13:16:25 +03:00
chenyu
3f3af0fb85 test_linearizer_failures 29 passes now (#4215)
TC + PADTO fixed
2024-04-18 19:49:23 -04:00
geohotstan
269a58d5fa tolist to return multidimensional list (#4192)
* lol does this work

* some more changes

* a tiny note

* rename a variable

* add test for data const and add TODO comment

* make type correct
2024-04-18 07:43:10 +04:00
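
A usage sketch of the new behavior (assuming the post-#4192 API): tolist now mirrors the tensor's shape as nested Python lists.

```python
from tinygrad import Tensor

t = Tensor([[1, 2], [3, 4]])
print(t.tolist())  # [[1, 2], [3, 4]]: one list level per tensor dimension
```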
Francis Lata
3644077a42 [MLPerf][UNet3D] Add DICE loss + metrics (#4204)
* add DICE loss and metrics

* update dice to include reference implementation's link

* remove unused imports

* remove unnecessary test file and update pred + label for metrics and losses test

* add tests to CI + add exclusion of mlperf_unet3d

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-17 20:09:33 -04:00
qazal
f75020a903 minimal diff for multioutput reduce pairs (#4030)
* simple fusion

* compiler cache patch

* Revert "compiler cache patch"

This reverts commit fa18049597.

* Revert "Revert "compiler cache patch""

This reverts commit 57f8d41f98.

* delete that

* early sort

* teeny renames

* spec

* .empty is great

* delete sort

* Update test_schedule.py

* this is one kernel now

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-17 10:55:44 -04:00
Francis Lam
c91b7b1739 test: add fuzz_matmul and better debugging for simple_matmul (#4199)
also show unoptimized shape in verify_kernel
2024-04-16 23:40:31 -04:00
qazal
ba8602612b Fuzz all permutations of schedule (#4136)
* simple toposort

* fuzzer

* init in_degree

* move to tests

* same seed

* configure paths

* internal graph

* compare LazyBuffers

* simpler

* simple graph

* assign works

* simpler

* fix JIT

* upstream ci

* move ci

* fix the path

* DEBUG=1

* limit max paths

* launch a cmp kernel

* Revert "launch a cmp kernel"

This reverts commit 791c608992.

* exec ground truth

* better perf

* copy ground truth once

* gpu allclose ast try1

* Revert "gpu allclose ast try1"

This reverts commit 1f82103af3.

* prerealized bufs freezing

* teeny cleanups

* reuse Buffers

* Revert "reuse Buffers"

This reverts commit a71de94b03.

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-17 05:03:21 +04:00
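
A minimal sketch of the idea behind #4136, assuming Kahn's algorithm with backtracking (names and the int-keyed graph are illustrative, not tinygrad's internals): enumerate every valid topological order of a schedule DAG so each can be executed and compared against a ground truth.

```python
from collections import defaultdict

def all_toposorts(graph: dict[int, list[int]]) -> list[list[int]]:
  in_degree: defaultdict[int, int] = defaultdict(int)
  for u in graph:
    for v in graph[u]: in_degree[v] += 1
  order, out = [], []
  def walk():
    expanded = False
    for u in graph:
      if in_degree[u] == 0 and u not in order:
        for v in graph[u]: in_degree[v] -= 1   # take u, release its children
        order.append(u)
        walk()
        order.pop()                            # backtrack and restore state
        for v in graph[u]: in_degree[v] += 1
        expanded = True
    if not expanded: out.append(order[:])
  walk()
  return out

# a diamond (0 -> 1, 0 -> 2, both -> 3) has exactly two valid schedules
assert len(all_toposorts({0: [1, 2], 1: [3], 2: [3], 3: []})) == 2
```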
David Hou
97d846dd67 in forced_realize, unchase last op if it is upcast (#4185)
* in forced_realize, unchase last op if it is upcast

* start on test

* flesh out test

* more test

* comment

* comment out parallel reduce test

* reorder

* unused
2024-04-16 17:15:17 -04:00
Francis Lam
e9c1616b27 logging: change LOGKERN to LOGKERNS to match LOGOPS (#4193)
also add printing of ast and applied_opts during verify_kernel
to more easily debug errors if they come up
2024-04-16 16:08:32 -04:00
David Hou
7fb220a567 touchup resnet_layer_bench (#4191) 2024-04-16 14:43:00 -04:00
David Hou
1dbf3b2b19 Benchmarks for individual resnet layers (#4182)
* resnet individual layer benchmarks!

* small

* 1 and 2

* mem_used

* no ci

* better conv print

* defaults

* prints

* adjust

* adjust

* adjust

* benchmark only one layer example

* tensor.training, zero_grad, sum instead of mean, last mem, last kernel count

* default jitcnt=1

* scale flops/kernels with jitcnt

* add note about jitcnt memory

* touchup
2024-04-16 13:53:18 -04:00
George Hotz
55ae73e951 Replicate llm.c in tinygrad (#4179)
* write llm.c and add a few new methods to tensor

* training works

* add jit

* tests for new functions

* test tolist

* simple fix for onnx test failures (#4186)

* write llm.c and add a few new methods to tensor

* training works

* add jit

* tests for new functions

* bump line count to 7500

* simplest fix

* safenumpy tolist for now

---------

Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>

---------

Co-authored-by: geohotstan <135171913+geohotstan@users.noreply.github.com>
2024-04-16 15:40:48 +04:00
George Hotz
b6e7243bfa hotfix: skip slow pre-commit test 2024-04-16 11:48:43 +04:00