* fix mean underflow for half tensor
divide only by the reduce factor. added a unit test and a non-NaN assertion in resnet training. also added a failing test case for symbolic shape vars
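A minimal numpy stand-in for the failure mode (illustrative only, not the actual tinygrad diff): multiplying the sum by a reciprocal stored in half flushes to zero once 1/n falls below the smallest fp16 subnormal, while accumulating the reduce in a wider dtype and dividing by the reduce factor keeps the true mean.

```python
import numpy as np

n = 1 << 25                                  # 1/n == 2**-25 is below the smallest fp16 subnormal
x = np.full(n, 2**-10, dtype=np.float16)     # true mean is 2**-10 ~= 0.00098

s = x.astype(np.float32).sum()               # accumulate the reduce in a wider dtype (~32768.0)

bad  = np.float16(s) * np.float16(1.0 / n)   # the half-precision reciprocal rounds to 0 -> mean is 0.0
good = np.float16(s / np.float32(n))         # divide by the reduce factor instead, then cast back

print(bad, good)                             # prints 0.0 and ~0.00098
```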
* skip for python backend
* kernel: change PADTO check to allow up to 4x padding
also optionally remove PADTO from the search action space with BEAM_PADTO=0.
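A hedged sketch of the two behaviors named above; `padto_ok` and its arguments are placeholders rather than the actual kernel code, and only the BEAM_PADTO variable comes from the commit itself.

```python
import os

def padto_ok(axis_size: int, pad_to: int) -> bool:
  padded = ((axis_size + pad_to - 1) // pad_to) * pad_to  # round the axis up to a multiple of pad_to
  return padded <= 4 * axis_size                          # accept at most 4x padding

SEARCH_PADTO = int(os.getenv("BEAM_PADTO", "1")) != 0     # BEAM_PADTO=0 drops PADTO from the action space

assert padto_ok(9, 32) and not padto_ok(7, 32)            # 9->32 (~3.6x) is allowed, 7->32 (~4.6x) is not
```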
* fix test_linearizer test_tensor_cores_padded tests
* update resnet runs to use SPLIT_REDUCEOP=1
* fix up search TC axis and amt checking
* fix up the dimensions of the TC tests
* add support for train/val datasets for kits19
* split dataset into train and val sets
* add tests for kits19 dataloader
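A hedged sketch of a deterministic case-level train/val split for kits19; the directory layout and the 0.2 ratio are placeholders, not necessarily what the PR uses.

```python
from pathlib import Path

def get_train_val_cases(base_dir="extra/datasets/kits19/data", val_split=0.2):
  # split the sorted case directories into disjoint train and val sets
  cases = sorted(p.name for p in Path(base_dir).iterdir() if p.name.startswith("case_"))
  n_val = int(len(cases) * val_split)
  return cases[n_val:], cases[:n_val]  # (train_cases, val_cases)
```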
* add MLPerf dataset tests to CI
* update unet3d model_eval script
* fix linting
* add nibabel
* fix how mock dataset gets created
* update ref implementation with permalink and no edits
* clean up test and update rand_flip implementation
* cleanups
we want different BEAM values for resnet train and eval, and the global JITBEAM cannot do this. added a flag to change beam behavior at cnt=0 (so it behaves the same by default with or without TinyJit); for cnt=1 it uses the existing BEAM.value.
Also moved the BEAM context var in resnet outside of TinyJit, which saves about 3 minutes of compile time.
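Roughly what this enables in the resnet scripts, sketched with placeholder beam widths and step functions; only the "BEAM context outside TinyJit" part is shown, not the new flag itself.

```python
from tinygrad.helpers import Context

TRAIN_BEAM, EVAL_BEAM = 4, 2  # illustrative values, not the ones from the run scripts

def run(train_step, eval_step, batches):
  for x, y in batches:
    with Context(BEAM=TRAIN_BEAM):  # entered outside the TinyJit-wrapped step, so capture (cnt=0) sees it
      train_step(x, y)
  with Context(BEAM=EVAL_BEAM):     # eval can search with a different beam width
    eval_step()
```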
* add DICE loss and metrics
* update dice to include reference implementation's link
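A minimal soft Dice loss sketch on tinygrad Tensors; the smoothing term, axis layout, and reduction are common defaults, not necessarily the exact signature added for unet3d.

```python
from tinygrad import Tensor

def dice_loss(pred: Tensor, label: Tensor, smooth: float = 1e-6) -> Tensor:
  # pred/label: (batch, classes, ...spatial); pred already softmaxed, label one-hot
  spatial = tuple(range(2, len(pred.shape)))
  intersection = (pred * label).sum(axis=spatial)
  union = pred.sum(axis=spatial) + label.sum(axis=spatial)
  dice = (2 * intersection + smooth) / (union + smooth)
  return 1 - dice.mean()  # average over batch and classes
```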
* remove unused imports
* remove unnecessary test file and update pred + label for metrics and losses test
* add tests to CI + add exclusion of mlperf_unet3d
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* resnet individual layer benchmarks!
* small
* 1 and 2
* mem_used
* no ci
* better conv print
* defaults
* prints
* adjust
* adjust
* adjust
* benchmark only one layer example
* tensor.training, zero_grad, sum instead of mean, last mem, last kernel count
* default jitcnt=1
* scale flops/kernels with jitcnt
* add note about jitcnt memory
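A simplified stand-in for how the per-layer numbers can be normalized when the step is replayed JITCNT times inside one timed region; GlobalCounters is real tinygrad, the bench wrapper itself is illustrative.

```python
import time
from tinygrad.helpers import GlobalCounters

def bench(step, jitcnt=1):
  GlobalCounters.reset()
  st = time.perf_counter()
  for _ in range(jitcnt): step()
  et = time.perf_counter() - st
  return {"ms_per_step": et * 1e3 / jitcnt,
          "gflops": GlobalCounters.global_ops / et / 1e9,           # throughput, independent of jitcnt
          "kernels_per_step": GlobalCounters.kernel_count / jitcnt,  # scale kernel count by jitcnt
          "mem_used_GB": GlobalCounters.mem_used / 1e9}              # memory is not divided by jitcnt
```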
* touchup
* write llm.c and add a few new methods to tensor
* training works
* add jit
* tests for new functions
* test tolist
* simple fix for onnx test failures (#4186)
* write llm.c and add a few new methods to tensor
* training works
* add jit
* tests for new functions
* bump line count to 7500
* simplest fix
* safenumpy tolist for now
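A tiny check of the behavior the tolist tests cover; whether it goes through numpy (per the "safenumpy tolist for now" bullet) is an implementation detail this does not show.

```python
from tinygrad import Tensor

# nested Python lists come back out, matching the original data
assert Tensor([[1, 2], [3, 4]]).tolist() == [[1, 2], [3, 4]]
assert Tensor([1.5, 2.5]).tolist() == [1.5, 2.5]
```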
---------
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
---------
Co-authored-by: geohotstan <135171913+geohotstan@users.noreply.github.com>
* rewrite the jit in the context of new schedule
* mypy better
* fix placeholder
* tests
* all functionality should work
* fix tests
* no CacheCollector
make it read nicer, clean up some movement methods, and simplify some math.
the 790M, 1.4B, and 2.8B models do not really run.
sampling is not implemented.
jit is incorrect.
some dead code / wrong code paths and stuff copied from torch remain.
* first commit
* state back to orig
* mamba comparisons
* rm file
* rename file
* use Tensor.einsum and make default model 370M
* Cleaned code and made a comparison test
* Simplify pull request. Only has 1 mamba implementation now.
* Update prompt
* rm whitespaces
* last space
* remove Einops dependency
* rm unused code
* add tests
* rm print statement
* rm imports
* skip CLANG
* Update skipIf description
* skip model test in CI and add CLANG fix
* rm Device import
* don't be stupid
* Fix conv assign
When the prompt is too short, the conv_state assign logic breaks. This is fixed by padding the tokenized array to a minimum length of 4. I padded with the empty string token, but I'm not sure whether proper practice is to use the PAD token.
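A hedged sketch of the workaround described above; the pad direction and the token id standing in for the "empty string" token (0 here) are assumptions, only the minimum length of 4 comes from the commit.

```python
MIN_PROMPT_LEN, EMPTY_TOKEN = 4, 0  # EMPTY_TOKEN is a stand-in id, not necessarily the tokenizer's

def pad_prompt(tokens: list[int]) -> list[int]:
  # left-pad short prompts so the conv_state assign always sees a full window
  if len(tokens) < MIN_PROMPT_LEN:
    tokens = [EMPTY_TOKEN] * (MIN_PROMPT_LEN - len(tokens)) + tokens
  return tokens

assert pad_prompt([42]) == [0, 0, 0, 42]
```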
* fix p1
* temp
* fix jit import
---------
Co-authored-by: schlimeszn <schlimeszn@gmail.com>
Co-authored-by: reddyn <nikidsniper@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>