tinygrad

Mirror of https://github.com/tinygrad/tinygrad.git, synced 2026-01-22 05:18:01 -05:00

Author	SHA1	Message	Date
chenyu	1bcb58479d	resnet setup power cap red box gpu to 350W (#4484) 1%-2% faster	2024-05-08 23:32:41 -04:00
chenyu	0ed755bcf5	resnet use EVAL_BS=192 (#4482) * resnet use EVAL_BS=192 also lower green run BEAM_MIN_PROGRESS from 10 to 5 * BEAM_MIN_PROGRESS 5 is too close to setup limit	2024-05-08 22:29:27 -04:00
chenyu	1f6bf9d2f7	real diskcache_clear in model_train resnet (#4445) clear cache if INITMLPERF is set, or running run_and_time. dev_beam and dev_run do not clear cache	2024-05-08 19:06:09 -04:00
chenyu	1b4645bea6	hotfix resnet move init_start to start of the script (#4481)	2024-05-08 19:03:52 -04:00
wozeparrot	a347ae94d6	feat: remove wandb (#4480)	2024-05-08 15:31:16 -07:00
chenyu	db7e15c46f	hotfix resnet only log epoch start with RUNMLPERF (#4477)	2024-05-08 15:14:41 -04:00
chenyu	062c6dd65d	mlperf logging, truncate dir in logs and log seed (#4475)	2024-05-08 12:54:02 -04:00
chenyu	b62a65b617	redo faster sparse_categorical_crossentropy (#4461) update LR and DECAY in resnet default that help convergence too	2024-05-08 11:21:43 -04:00
George Hotz	17faae091b	optimizer shouldn't be run without training (#4460) * optimizer shouldn't be run without training * set training in relevant tests * fix multitensor * that too	2024-05-06 15:34:12 -07:00
George Hotz	f4e49a7c1a	resnet 50 opt: correct loop + LARS (#4449) * correct loop + LARS * ops	2024-05-06 08:01:26 -07:00
George Hotz	fc995d4446	add backward to handcode_resnet50_opt	2024-05-06 06:42:26 -07:00
wozeparrot	603d3a351b	feat: allow keeping multiple cookies (#4440)	2024-05-05 19:26:48 -07:00
Francis Lam	709410071c	mlperf/resnet: updated BEAM params to increase performance (#4443)	2024-05-05 21:49:46 -04:00
chenyu	3b30756cbb	update mlperf submission system (#4435) more required fields.	2024-05-05 13:19:07 -04:00
David Hou	c0a048c044	batchnorm d(var)/d(mean) = 0 (#4430) * d(var)/d(mean) = 0 * drop the number in test_schedule!	2024-05-05 00:25:45 -04:00
qazal	fa17dcaf07	Fix llm.c/export.py (#4423) * fix headers * add CI * add stdio * merge clang tests * revert llm.c * revert ci * Revert "revert llm.c" This reverts commit `5fd17e3c8b`.	2024-05-04 19:37:10 +03:00
George Hotz	cb7289f9c9	remove clang program header (#4422) * remove clang program header * proper max * bools are numbers * fix compile enet	2024-05-04 08:38:01 -07:00
chenyu	473ecb978a	remove SPLIT_REDUCEOP=1 from resnet scripts (#4404) SPLIT_REDUCEOP=1 is default	2024-05-03 12:36:23 -04:00
David Hou	b767d59684	resnet trainer: keep old cookie around until next step has been queued (#4401) * keep old cookie around until next step has been queued (-10ms 6gpu) * also for eval * drop cookie before data_get? * Revert "drop cookie before data_get?" This reverts commit `b01e6aa2b2`. * Revert "Revert "drop cookie before data_get?"" This reverts commit `23464e73d4`.	2024-05-03 12:15:21 -04:00
chenyu	2c3b7f8e70	pad resnet training data with training data mean (#4369) update model_train resnet to pad training	2024-05-02 20:26:15 -04:00
Francis Lam	3cf8291f2f	mlperf/resnet: update beam params to increase time and quality (#4396) * mlperf/resnet: update beam params to increase time and quality * revert upcast 8 in search space and add rocm setup function * refactor to independent setup.sh script	2024-05-02 20:14:46 -04:00
chenyu	ab01a9433d	resnet eval 4n+3 if epoch < 33 (#4391) the rule is as thoroughly as 4n+k and we can stop the clock as soon as eval hits target. this can save 24 evals or 12 minutes	2024-05-02 16:52:07 -04:00
chenyu	7492e5d3e7	resnet correct log name for red (#4390)	2024-05-02 10:58:55 -04:00
chenyu	bf31837e6d	resnet correct steps_in_val_epoch in logging (#4389) also added random seed from system in scripts	2024-05-02 10:51:36 -04:00
ym555	3113785604	Llama 3 Models (#4339) * Full Impl * fix test * Fix inference loop --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-05-02 06:06:07 -07:00
chenyu	22376e53b7	resnet mlperf logging (#4361) * resnet mlperf logging * cropping too much?	2024-05-02 00:00:04 -04:00
chenyu	ad116dc5c6	fill in mlperf system description (#4381) it did not ask too many details. will put software versions later with tinygrad commit. ``` python3 -m mlperf_logging.system_desc_checker examples/mlperf/training_submission_v4.0/tinycorp/systems/tinybox_red.json training 4.0.0 INFO - System description checker passed for tinybox red ``` ``` python3 -m mlperf_logging.system_desc_checker examples/mlperf/training_submission_v4.0/tinycorp/systems/tinybox_green.json training 4.0.0 INFO - System description checker passed for tinybox green ```	2024-05-01 16:47:45 -04:00
chenyu	9358b62073	rename resnet script to dev_beam.sh and dev_run.sh (#4379) final run_and_time needs to be one script for both. rename the old scripts	2024-05-01 14:41:35 -04:00
chenyu	6628e13a5f	pad resnet eval data in model_train (#4374) asserted if eval sample count is different from total eval file count.	2024-05-01 14:33:42 -04:00
chenyu	826cccd54d	fix mean underflow for half tensor (#4377) * fix mean underflow for half tensor divide only the reduce factor. added unit test and non-nan assertion in resnet training. also added a failed test cast for symbolic shape var * skip for python backend	2024-05-01 13:38:57 -04:00
George Hotz	b683d0f496	hotfix: 100% accuracy is wrong	2024-05-01 08:07:18 -07:00
chenyu	683b7c605a	pad first batch of imagenet dataloader and update eval (#4368) * pad first batch of imagenet dataloader and update eval * pad zero instead of empty for training	2024-05-01 00:21:52 -04:00
Francis Lam	16838eae08	mlperf/resnet: update tinybox_red parameters to new best values (#4364) about 27 minutes to setup and 345ms/110TF steps	2024-04-30 18:08:12 -04:00
Francis Lam	0d33c54d99	kernel: change PADTO check to allow up to 4x padding (#4354) * kernel: change PADTO check to allow up to 4x padding also optionally remove PADTO from the search action space with BEAM_PADTO=0. * fix test_linearizer test_tensor_cores_padded tests * update resnet runs to use SPLIT_REDUCEOP=1 * fix up search TC axis and amt checking * fix up the dimensions of the TC tests	2024-04-30 15:29:34 -04:00
Elias Wahl	babe87a8ae	BERT: Checkpoint loading tests (#4359) * Move checkpoint init to helpers. Add test * linters * Move the steps outside of the main train loop * Move data_get * data_get belongs to helpers	2024-04-30 14:43:41 -04:00
Elias Wahl	71ff68b445	dropout after eval step (#4351)	2024-04-29 15:47:21 -04:00
Elias Wahl	27613dd881	MLPerf BERT: Main training loop (#4288) * BERT language modeling head + trunc normal initializers * add train loop + helpers * shuffle in dataloaders + slight changes in main loop * beam change * Minor changes * random.shuffle * HParam update * Use deque for dataloader * wandb bert project name * half fixes * BENCHMARK + remove epoch * cast + print() --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-04-29 14:35:27 -04:00
Francis Lata	bb849a57d1	[MLPerf] UNet3D dataloader (#4343) * add support for train/val datasets for kits19 * split dataset into train and val sets * add tests for kits19 dataloader * add MLPerf dataset tests to CI * update unet3d model_eval script * fix linting * add nibabel * fix how mock dataset gets created * update ref implementation with permalink and no edits * clean up test and update rand_flip implementation * cleanups	2024-04-28 22:34:18 -04:00
Arnav Mehta	f3de17912f	added the download if not present missing function (#4318)	2024-04-28 16:31:08 +08:00
chenyu	ec65aea32f	resnet stop the script once hit target (#4303) * resnet stop the script once hit target * comment	2024-04-25 23:54:56 -04:00
George Hotz	1e37c4a7a1	minor llm.c improvements	2024-04-26 11:15:31 +08:00
chenyu	f9a7badace	use LR=7 for resnet with BS=1536 (#4299) had 3 runs after lr float32, seems quite stable and converges at epoch 34 and 35	2024-04-25 15:23:10 -04:00
chenyu	c11bad766d	prepare mlperf submission (#4270) * prepare mlperf submission * 28min compile and 3h53m * red 30 minute compile and 56 TFLOPS	2024-04-24 13:19:31 -04:00
chenyu	c1fbacb182	resnet benchmarks use DEFAULT_FLOAT=HALF (#4285) also update LR default to scaled based on 1536 (the BS we are submitting)	2024-04-24 12:10:57 -04:00
George Hotz	ad28fdecb1	si.inputs+outputs -> bufs (#4279)	2024-04-24 15:12:34 +08:00
chenyu	8401de9922	resnet benchmark return early in eval (#4278) only do few eval steps to compile, and skip second epoch when doing beam + benchmark. save 2 minutes	2024-04-24 00:55:01 -04:00
chenyu	6637ecc5fe	use IGNORE_JIT_FIRST_BEAM to not BEAM in jit cnt=0 (#4269) we want to have different BEAM values for resnet train and eval. global JITBEAM cannot do this. added the flag to change beam behavior at cnt=0 (so it default behaves the same with or without TinyJit), and for cnt=1 it uses existing BEAM.value. Also updated the context var BEAM in resnet to be outside of TinyJit. saves about 3 minutes compile time	2024-04-23 18:59:43 -04:00
Elias Wahl	3a48773f1a	BERT dataloader (#4252) * add dataloader * comment	2024-04-23 13:44:49 -04:00
chenyu	37f8be6450	resnet print epoch ops and mem in benchmark (#4244) * resnet print epoch ops and mem in benchmark also added a flag to optionally disable reset jitted steps * real per epoch stats	2024-04-21 18:32:31 -04:00
chenyu	30fc1ad415	remove TODO: remove explicit dtypes after broadcast fix in stable_diffusion (#4241) this is done	2024-04-21 00:31:24 -04:00

1179 Commits