tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-07 03:00:26 -04:00

Author	SHA1	Message	Date
qazal	0b47818e0f	simpler reduceop children chasing (#4350 ) * simplest case * midreduce case * all tests * pending things * unify tests	2024-05-02 15:15:30 +03:00
George Hotz	f635c4d273	fix define global (#4383 ) * fix define global * remove name from DEFINE_GLOBAL * fix fuzzing * fix ptx * fix python	2024-05-01 22:32:56 -04:00
chenyu	826cccd54d	fix mean underflow for half tensor (#4377 ) * fix mean underflow for half tensor divide only the reduce factor. added unit test and non-nan assertion in resnet training. also added a failed test case for symbolic shape var * skip for python backend	2024-05-01 13:38:57 -04:00
chenyu	077ea6926c	remove downcast_half in sum (#4376 ) breaks boolean mean and other stuff	2024-05-01 11:46:44 -04:00
George Hotz	bd49d2854a	hotfix: skip fetch tests always	2024-05-01 08:43:26 -07:00
qazal	ea06f657df	fusion tests from test_opt (#4357 ) * opt tests * more sgd * batchnorm * models stay in external	2024-05-01 16:44:12 +03:00
George Hotz	27ee49bf30	tensor variable (#4362 ) * tensor variable support * consttype without variable? * __setitem__ * symbolic mean works * arange test * more tests * a few more tests	2024-04-30 14:08:57 -07:00
Francis Lam	0d33c54d99	kernel: change PADTO check to allow up to 4x padding (#4354 ) * kernel: change PADTO check to allow up to 4x padding also optionally remove PADTO from the search action space with BEAM_PADTO=0. * fix test_linearizer test_tensor_cores_padded tests * update resnet runs to use SPLIT_REDUCEOP=1 * fix up search TC axis and amt checking * fix up the dimensions of the TC tests	2024-04-30 15:29:34 -04:00
Elias Wahl	babe87a8ae	BERT: Checkpoint loading tests (#4359 ) * Move checkpoint init to helpers. Add test * linters * Move the steps outside of the main train loop * Move data_get * data_get belongs to helpers	2024-04-30 14:43:41 -04:00
Francis Lam	c12bcabb07	search: fix actions space checks to ignore TC axis and amt (#4360 ) * search: fix actions space checks to ignore TC axis and amt * add test for number of actions in get_linearizer_actions	2024-04-30 14:02:22 -04:00
George Hotz	d325be2540	update docs (#4356 ) * update docs * nn.md * mnist cleanups * rhip test is very slow	2024-04-30 16:51:42 +09:00
Francis Lam	a9a1fa6bbf	wmma: add reduce axis choice to TC action space (#4328 ) * wmma: add reduce axis choice to TC action space * add test for TC multi-reduce axis choice	2024-04-29 19:15:39 -04:00
chenyu	93abcd3113	fix function.py sum backward without downcast_half (#4353 ) without downcast_half, sum output dtype can be different from input dtype. cast back to input dtype in function.py	2024-04-29 17:53:02 -04:00
Francis Lam	18c61ce077	test/fuzz_linearizer: add --atol/rtol and change half distribution (#4352 )	2024-04-29 15:53:59 -04:00
Elias Wahl	27613dd881	MLPerf BERT: Main training loop (#4288 ) * BERT language modeling head + trunc normal initializers * add train loop + helpers * shuffle in dataloaders + slight changes in main loop * beam change * Minor changes * random.shuffle * HParam update * Use deque for dataloader * wandb bert project name * half fixes * BENCHMARK + remove epoch * cast + print() --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-04-29 14:35:27 -04:00
qazal	cc1797673e	all fusion opportunities (#4348 )	2024-04-29 19:32:23 +03:00
chenyu	f363f39e83	fix dtype of const folded sum (#4349 ) const folding sum should return in the same dtype the same as regular sum, which can be different from input dtype	2024-04-29 11:40:45 -04:00
qazal	774a9b0bca	override assign_target in fuzz_schedule (#4342 ) * store assign_targets * cleanup * override target	2024-04-29 11:04:04 +03:00
Francis Lata	bb849a57d1	[MLPerf] UNet3D dataloader (#4343 ) * add support for train/val datasets for kits19 * split dataset into train and val sets * add tests for kits19 dataloader * add MLPerf dataset tests to CI * update unet3d model_eval script * fix linting * add nibabel * fix how mock dataset gets created * update ref implementation with permalink and no edits * clean up test and update rand_flip implementation * cleanups	2024-04-28 22:34:18 -04:00
chenyu	c1d8d425eb	fix mean of half tensor if sum is greater than half.max (#4327 ) sum of half does acc in float32 already, add an arg to not downcast to half and use that in mean	2024-04-28 18:04:54 -04:00
qazal	23445db2b9	no skipped tests in RHIP (#4337 ) * delete skip * delete split skip * remu dev * compiler fails here * Revert "remu dev" This reverts commit `28b933d4eb`.	2024-04-28 12:23:05 -04:00
Obada Khalili	e4befa41d7	Fix in `_reshape_mask` (#4332 ) * handle reshape with remainder in _reshape_mask * remove trailing whitespace * use helper_test_op to generate tensors from shapes * test in shapetracker too * remove whitespace * revert property name in other class tests	2024-04-28 11:57:39 -04:00
Timmy	664b563c91	Add `insert_before` to Linearizer Functions (#4320 ) * adding insert_before to linearizer functions * uop insert_before test case * formatting * more formatting * more formatting * syntax * removing self.cast * addressing err * removing noqa s	2024-04-28 11:38:36 -04:00
qazal	3372bea322	reduce children fusion tests (#4321 ) * base tests * real-world tests	2024-04-28 11:14:02 -04:00
chenyu	24a6342950	add mem/s to external_benchmark_resnet (#4309 )	2024-04-26 20:07:17 -04:00
Szymon Ożóg	de832d26c6	disable bfloat16 from ptx tests (#4305 )	2024-04-26 01:20:10 -04:00
Szymon Ożóg	f1ebcffb87	Ptx beam fix (#4296 ) * Fix beam search for PTX * fix ptr arm test	2024-04-25 15:39:39 -04:00
qazal	9a47ed0705	test crossing diamond assigns (#4298 )	2024-04-25 21:52:05 +03:00
chenyu	5ae252ae83	use at least float32 for optim.lr (#4297 ) * use at least float32 for optim.lr when doing mixed precision training (float32 weight, default_float=half), still use float32 to store lr. it would have been upcasted later in actual weight update, but would have lost precision. this improved resnet convergence significantly * undo type annotation	2024-04-25 14:42:28 -04:00
David Hou	6f792b727b	More improvements for resnet layer bench (#4272 ) * fix first layer size, new schedule stuff * estimates * get different conv layers * \r for estimated times * E501 * space after comma	2024-04-25 12:40:49 -04:00
qazal	74a1be88f5	test reduce graph permutations (#4291 )	2024-04-25 11:34:44 +03:00
George Hotz	acb32e1766	hotfix: PM4 supports timing	2024-04-24 08:38:59 +00:00
George Hotz	ad28fdecb1	si.inputs+outputs -> bufs (#4279 )	2024-04-24 15:12:34 +08:00
George Hotz	38f97aa0fe	rename rawbufs to bufs in ExecItem (#4274 )	2024-04-24 11:27:27 +08:00
geohotstan	17328ded7d	setitem no return value (#4266 ) * no ret value and just force contiguous * ok revert contiguous stuff * actually do force it contiguous * revert again lol * add simple regression test * add assert for MLB * guess we're contiguous everything from now on * lol ugly af empty return... * don't change order cuz i don't get disk	2024-04-23 16:28:14 -04:00
Elias Wahl	69341144ba	Wikipedia preprocessing script (#4229 ) * Preprocessing script * short seq prob * comments + env vars * Add preprocessing reference. Add test * lint fix + add eval test support * whitespaces * point to commit * comment * rename * better comments	2024-04-23 10:28:01 -04:00
George Hotz	967638f0d5	update docs, remove corealize (#4264 ) * update docs, remove corealize * handle 0 line count * tensor schedule	2024-04-23 12:05:29 +04:00
George Hotz	acf4ba5c9f	method cache respects beam option (#4261 ) * method cache respects beam option * cleanup get_runner	2024-04-23 09:00:41 +04:00
George Hotz	9a95781d51	renamed (#4260 )	2024-04-23 09:00:28 +04:00
George Hotz	2ae4f45272	WIP PM4 Support (#4110 ) * pm4 kernel launch works * disable USE_THREAD_DIMENSIONS * add kernel code * work on real pm4 * pm4 signal * same * gate pm4 * hcq tests pass * ops passes * pm4 is closer * pm4 debug (#4165) * start debug tests passing * prg * smth * hdp flush * cleaner 1 * do not need this * logs not need * small things * linter * remove AQL * test hcq * fix tests * it's subtracting, it shouldn't be -1 * pm4 changes (#4251) * not need this anymore * sdma signal with non atomic --------- Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>	2024-04-23 08:31:27 +04:00
Francis Lam	3f6c7ca8bf	test: fix test_tensor_core_padded on CUDA and add to benchmarks (#4258 ) * test: fix test_tensor_core_padded on CUDA and add to benchmarks * fix linter * run both tests in one call	2024-04-22 23:22:11 -04:00
Francis Lam	bbb0ad4800	wmma: widen TC usage in search by using PADTO on TC axes when possible (#4216 ) * wmma: widen TC usage in search by using PADTO on TC axes when possible * test: start tests for the new padding TC behavior * search: upgrade padded TC search to TC_OPT >= 2 * test: add behavior and correctness test for padded TC added optional argument to apply_tensor_core to set TC_OPT level * linearizer: add tests for the PADTO behavior and docs	2024-04-22 16:50:31 -04:00
qazal	77a3780005	assert reduce recompute (#4250 )	2024-04-22 16:12:39 +03:00
qazal	a9bc7c1c49	unify assign tests (#4247 )	2024-04-22 11:01:15 +03:00
chenyu	31c9d9a228	fix test_linearizer tc opt tests for bf16 (#4237 ) bf16 tc has larger rtol	2024-04-20 11:51:50 -04:00
chenyu	f1d9d0a151	cleanup external_test_opt (#4234 ) no more OPT=2 or OPT=3, check strict number of kernels, enabled tests that fusion works now	2024-04-20 04:00:08 -04:00
David Hou	dc4b1af09c	more realistic edge behavior for resnet benchmark (#4231 ) * more realistic edge behavior for resnet benchmark * schedule_step * realize all parameters ahead of time * don't save setup and misc schedules	2024-04-19 20:07:46 -04:00
George Hotz	b9570d6100	clean up update stats (#4226 ) * WIP: clean up update stats * line savings now * fix graphs * fix tests * tighter prints * remove extra jit=false * debug=2 means wait * that won't update stats * still wait	2024-04-19 15:41:30 +04:00
qazal	1c87e5dbf6	fuzz schedule context vars (#4223 ) * fuzz schedule context vars * fuzz unique toposorts * merge ground truth with the rest * Revert "merge ground truth with the rest" This reverts commit `1f3463bb57`. * readability> * can override	2024-04-19 13:16:25 +03:00
chenyu	3f3af0fb85	test_linearizer_failures 29 passes now (#4215 ) TC + PADTO fixed	2024-04-18 19:49:23 -04:00

2555 Commits