George Hotz
3f4eb9006a
test for device mismatch [pr] ( #9250 )
...
* test for device mismatch [pr]
* fix bert
2025-02-26 13:06:33 +08:00
chenyu
979e84f30e
RESET_STEP in bert setup and beam ( #9248 )
...
running dev_beam might OOM without it, but it runs fine in a real run.
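A rough sketch of the kind of gating RESET_STEP adds (names and defaults here are illustrative, not the actual script; assumes the jitted step exposes a TinyJit-style reset hook):
```python
from tinygrad import Tensor, TinyJit
from tinygrad.helpers import getenv

@TinyJit
def train_step(x: Tensor) -> Tensor:
    return (x * 2).sum().realize()   # stand-in for the real jitted training step

# after the setup / dev_beam pass, optionally throw away the captured state so the
# real run doesn't keep both the BEAM-time buffers and the training buffers alive
if getenv("RESET_STEP", 1):
    train_step.reset()   # assumed reset hook; clears the jit's captured state
```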
2025-02-25 19:15:10 -05:00
chenyu
6610ad58ab
hotfix bert no shard with only one device ( #9243 )
...
`LLVM=1 BERT_SIZE="tiny" DEFAULT_FLOAT=HALF BENCHMARK=5 MODEL="bert" python3 examples/mlperf/model_train.py` runs for me with this. it should not fail with a single-device shard though
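The workaround amounts to only sharding when there is more than one device; a minimal sketch (the helper name is hypothetical, not the actual diff):
```python
from tinygrad import Tensor

def maybe_shard(t: Tensor, devices: tuple[str, ...]) -> Tensor:
    # with a single device there is nothing to split across, so just move the tensor there
    if len(devices) == 1: return t.to(devices[0])
    return t.shard(devices, axis=0)   # split the batch dimension across the devices
```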
2025-02-25 09:05:11 -05:00
chenyu
8c7be428e5
update bert BS to 78 ( #9236 )
...
fits 78 now. about 215 tflops on green
2025-02-24 22:47:35 -05:00
chenyu
2e7c2780a9
CLANG -> CPU ( #9189 )
2025-02-20 18:03:09 -05:00
chenyu
3b37cc898b
add bert tiny config ( #9177 )
...
set with BERT_SIZE=tiny. easier to study embedding and fusion
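The selection is an env-var lookup into a config table; a sketch with illustrative sizes (BERT-large really is 1024 hidden / 24 layers / 16 heads, the "tiny" values here are the usual BERT-tiny numbers and are an assumption for this script):
```python
from tinygrad.helpers import getenv

# hypothetical config table; "tiny" keeps the model small enough to study single kernels
BERT_CONFIGS = {
    "large": {"hidden_size": 1024, "layers": 24, "heads": 16},
    "tiny":  {"hidden_size": 128,  "layers": 2,  "heads": 2},
}
cfg = BERT_CONFIGS[getenv("BERT_SIZE", "large")]   # BERT_SIZE=tiny selects the small config
```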
2025-02-19 14:57:03 -05:00
chenyu
975c318dbc
bert use int32 for input ids ( #9173 )
...
original data was int32 for these. float might have caused precision issues
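The reasoning: token ids go up to the vocab size (~30k for BERT) while float16 only represents integers exactly up to 2048, so keeping ids integral avoids silent rounding. A tiny illustration (the ids are made up):
```python
import numpy as np
from tinygrad import Tensor, dtypes

ids_np = np.array([[101, 7592, 2088, 102]], dtype=np.int32)   # hypothetical [CLS] ... [SEP] ids
input_ids = Tensor(ids_np, dtype=dtypes.int32)                # keep int32 instead of casting to float
```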
2025-02-19 08:17:27 -05:00
chenyu
ff05bff221
put bert data shard inside jit ( #9160 )
...
python time 45ms -> 9ms; it was spending that time scheduling the shard
also init bert data on CLANG since it's from numpy, so we don't create the tensor on the default device and then shard it onto GPUS
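Roughly, the change moves the shard into the jitted step so its scheduling is captured once and replayed, and builds the numpy batch on the CPU device first. A sketch under those assumptions (toy step, two devices, not the actual training code):
```python
import numpy as np
from tinygrad import Tensor, TinyJit, Device

GPUS = tuple(f"{Device.DEFAULT}:{i}" for i in range(2))   # hypothetical device list

@TinyJit
def train_step(x: Tensor) -> Tensor:
    x = x.shard(GPUS, axis=0)        # shard happens inside the jit, so it is captured, not re-scheduled
    return (x * 2).sum().realize()   # stand-in for the real forward/backward

batch = Tensor(np.zeros((4, 8), dtype=np.float32), device="CPU").realize()  # "CLANG" before the rename
loss = train_step(batch)
```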
2025-02-18 10:36:54 -05:00
chenyu
5dc1257ce0
clean up bert fake data iterator [pr] ( #9145 )
...
reuse the same get_data_bert path in setup and real run
2025-02-17 20:03:38 -05:00
chenyu
81597ddd96
increase lr for bert ( #9098 )
...
had one run that converged better https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/u66tv2hh/overview
2025-02-14 19:10:35 -05:00
chenyu
b58e7b1898
zero out the weight in bert init run ( #9076 )
...
`DEFAULT_FLOAT=HALF BENCHMARK=10 BS=66 EVAL_BS=6 GPUS=6 MODEL=bert python3 examples/mlperf/model_train.py` no longer OOMs. I think the buffer of randomly initialized weights caused the OOM.
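A minimal sketch of the idea (helper name hypothetical): during the BENCHMARK/setup pass the weight values don't matter, so overwriting them with zeros avoids keeping the large randomly-initialized buffers alive.
```python
from tinygrad import Tensor, nn
from tinygrad.nn.state import get_parameters

def zero_weights(model):
    for p in get_parameters(model):
        # overwrite in place; the actual values are irrelevant for a timing/benchmark pass
        p.assign(Tensor.zeros(*p.shape, dtype=p.dtype, device=p.device)).realize()

zero_weights(nn.Linear(8, 8))   # stand-in for the BERT model
```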
2025-02-14 08:40:41 -05:00
chenyu
9e91898941
bert eval at the end of training ( #9070 )
...
always eval at the last epoch
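The change boils down to making the eval condition also fire on the final step; a trivial sketch with made-up numbers:
```python
TRAIN_STEPS, EVAL_FREQ = 10, 4   # hypothetical values

for step in range(TRAIN_STEPS):
    # ... training step ...
    if (step + 1) % EVAL_FREQ == 0 or step == TRAIN_STEPS - 1:
        print(f"eval at step {step}")   # the last step always evaluates
```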
2025-02-13 16:29:44 -05:00
chenyu
7b5ac2c15e
free_intermediates in bert ( #9040 )
...
also re-enable dropout and update EVAL_BS
2025-02-12 10:00:39 -05:00
chenyu
a092b6395d
Tuple -> tuple, List -> list [pr] ( #8936 )
2025-02-06 14:21:19 -05:00
chenyu
c7ca7959e6
set DISABLE_DROPOUT=1 in bert script for now ( #8799 )
2025-01-29 10:51:29 -05:00
chenyu
c99ae81f63
update default resnet LOSS_SCALER to 256 [pr] ( #8774 )
2025-01-27 16:59:05 -05:00
chenyu
af65331b76
update beam params for bert green [pr] ( #8726 )
...
increased BEAM_UPCAST_MAX and BEAM_LOCAL_MAX to the defaults, matching red. 3% faster per step
2025-01-22 22:00:05 -05:00
chenyu
9a9079118e
envvar BERT_LAYERS [pr] ( #8709 )
...
default is 24 for large
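Reading the knob is a one-liner; a sketch of how such an env var is typically wired in with tinygrad's getenv helper:
```python
from tinygrad.helpers import getenv

# BERT-large has 24 encoder layers; override for quick experiments, e.g. BERT_LAYERS=2
num_layers = getenv("BERT_LAYERS", 24)
```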
2025-01-21 22:49:19 -05:00
chenyu
9f6d545a16
bert log global_norm in training step [pr] ( #8708 )
...
* bert log global_norm in training step [pr]
and minor cleanups
* .item()
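A sketch of the quantity being logged (not the script's exact code): the global norm is the L2 norm over all gradients taken together, and .item() pulls it to the host as a plain float for logging.
```python
from tinygrad import Tensor

def global_norm(grads: list[Tensor]) -> Tensor:
    total = Tensor(0.0)
    for g in grads:
        total = total + g.float().square().sum()   # accumulate sum of squares over every gradient
    return total.sqrt()

# e.g. wandb.log({"global_norm": global_norm(grads).item()})
```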
2025-01-21 20:36:27 -05:00
chenyu
1e283c33d3
remove realize in bert model init [pr] ( #8707 )
2025-01-21 14:11:03 -05:00
chenyu
930728c069
bert BS 72->66 [pr] ( #8621 )
...
72 does not fit now
2025-01-14 18:41:41 -05:00
chenyu
994944920b
simpler batch_load_train_bert [pr] ( #8582 )
...
don't think that buffer is really beneficial. 5% faster data_time and 1ms faster per step.
https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/69c9lx8y/overview
2025-01-12 20:25:05 -05:00
chenyu
def90b22f6
EVAL_BS=36 for bert [pr] ( #8576 )
...
3X faster eval compared to BS=6.
green https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/ka5p5sm9/overview
red https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/a7maxsxd/overview
2025-01-12 09:43:56 -05:00
chenyu
64a917b7eb
remove LAZYCACHE ContextVar [pr] ( #8175 )
...
also removed it from the latest resnet script
2024-12-11 22:02:52 -05:00
chenyu
3e2430f822
use tqdm tqdm in mlperf training ( #7929 )
...
issue in benchmark dashboard logging; reverting to tqdm's tqdm for now
2024-11-27 21:57:05 -05:00
qazal
9828277c03
view doesn't have buffer, fix the tests [pr] ( #7841 )
...
* view doesn't have buffer, fix the tests [pr]
* need assigns
2024-11-22 20:41:55 +08:00
Francis Lata
90eff347e2
tinytqdm write support ( #6359 )
...
* add write support
* add test
* update test case to compare write outputs
* assert final write output
* flush when using write
* update write logic
* Revert "update write logic"
This reverts commit 5e0e611b46.
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
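The write support above follows the usual progress-bar pattern: clear the in-progress line, print the message on its own line with a flush, then redraw the bar. A generic toy illustration, not tinygrad's actual tinytqdm code:
```python
import sys

class ToyBar:
    def __init__(self, total: int):
        self.total, self.n = total, 0
    def _render(self):
        sys.stderr.write(f"\r{self.n}/{self.total}")
        sys.stderr.flush()
    def update(self, k: int = 1):
        self.n += k
        self._render()
    def write(self, msg: str):
        sys.stderr.write("\r\033[K")   # wipe the current bar line
        print(msg, flush=True)         # message gets its own line, flushed immediately
        self._render()                 # then redraw the bar
```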
2024-10-16 14:51:41 -04:00
chenyu
ed1ed9e4ff
bert use BS=72 ( #7015 )
...
memory 131 -> 138
green tflops 201 -> 209
red tflops 160 -> 169
2024-10-12 09:41:56 -04:00
chenyu
36056e0760
update mlperf systems and copy 4.1 to 5.0 ( #7004 )
2024-10-11 16:20:34 -04:00
chenyu
0e42662f2a
log seed at the right place for bert ( #7000 )
2024-10-11 10:39:40 -04:00
nimlgen
5496a36536
update red mlperf bert readme ( #6969 )
2024-10-11 13:08:06 +03:00
chenyu
b5546912e2
10% more TRAIN_STEPS for bert ( #6971 )
...
got two runs that came very close; adding more steps as a buffer
2024-10-09 19:21:43 -04:00
chenyu
35cf48659b
limit beam param for bert on green ( #6966 )
...
seems to mitigate the crash
2024-10-09 11:48:18 -04:00
chenyu
1ff2c98f8a
fix logfile name for bert red ( #6952 )
2024-10-08 05:37:52 -04:00
chenyu
a78c96273a
update bert epoch logging ( #6940 )
...
* update bert epoch logging
epoch for bert is simply the number of examples seen (which is used for the RCP check)
* update total steps too
* more changes
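In other words, the logged "epoch" counter above is just cumulative samples; a back-of-the-envelope sketch with made-up numbers:
```python
GLOBAL_BS, step = 66, 1000          # hypothetical global batch size and step count
epoch_num = step * GLOBAL_BS        # bert reports examples seen as the epoch number
print({"epoch_num": epoch_num})     # this is the value the RCP check consumes
```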
2024-10-08 00:34:06 -04:00
chenyu
102dfe5510
back to 2**10 for bert loss scaler ( #6934 )
...
got 2 NaNs with this; reverting back to 2**10
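For context, the loss scaler is the standard mixed-precision trick: multiply the loss before backward so fp16 gradients don't underflow, then divide the gradients back down before the optimizer step. A self-contained sketch (not the actual training loop):
```python
from tinygrad import Tensor
from tinygrad.helpers import getenv

LOSS_SCALER = getenv("LOSS_SCALER", 2**10)

w = Tensor.ones(4, requires_grad=True)
loss = (w * 3).sum()
(loss * LOSS_SCALER).backward()   # scaled backward keeps small fp16 grads representable
w.grad = w.grad / LOSS_SCALER     # undo the scale before the optimizer uses the gradient
print(w.grad.numpy())             # back to the true gradient (all 3s)
```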
2024-10-07 10:17:21 -04:00
chenyu
0cf815a93a
bert use BS=66 and update hparams ( #6932 )
...
with the dropout memory improvement, we can fit BS=66 now. reverting back to the hparams in #5891 too
2024-10-07 05:08:27 -04:00
chenyu
718b959349
log epoch start and stop for bert ( #6912 )
2024-10-06 06:39:46 -04:00
chenyu
16c1fa4208
use BEAM=3 for red box bert runs ( #6904 )
...
BEAM=4 slightly exceeded the 30-minute setup window
2024-10-05 09:21:12 -04:00
chenyu
0e706227a2
add seed to bert result log filename ( #6903 )
...
* add seed to bert result log filename
* different name for different benchmark
2024-10-05 09:15:24 -04:00
chenyu
7391376528
update bert hparams ( #6876 )
...
4h32m with this: https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/q99frv1l/overview
loss scaler 2**13 -> 2**10, matching the closest submission; no NaN for ~10 runs.
increased lr and total steps a bit.
`PARALLEL=0` after setup, same as resnet.
2024-10-04 00:39:06 -04:00
chenyu
5f77217772
bert default CKPT to 0 ( #6840 )
...
not required
2024-10-01 21:55:56 -04:00
chenyu
f59517754e
add RESET_STEP in bert to control reset ( #6818 )
...
same as resnet
2024-09-30 09:39:04 -04:00
chenyu
494b20e886
bert BS back to 54 ( #6791 )
...
60 does not run end to end
2024-09-27 22:16:05 -04:00
chenyu
572d77d1d9
bert script delete eval data after eval ( #6790 )
...
fits BS=60, which is 2% faster than 54. also fixed wandb logging params
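The memory win comes from dropping the references so the eval buffers can be freed before training resumes; a minimal sketch (shapes are made up):
```python
from tinygrad import Tensor

eval_X = Tensor.zeros(36, 512).contiguous().realize()   # stand-in for preloaded eval batches
# ... run eval ...
del eval_X   # last reference gone, device buffer can be freed, leaving room for a bigger train BS
```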
2024-09-27 20:54:00 -04:00
chenyu
f9c8e144ff
chmod +x mlperf bert script for red ( #6789 )
...
also disabled raising the power cap in setup. wozeparrot mentioned that's unstable and might cause bert training issues on red
2024-09-27 11:27:32 -04:00
Francis Lata
d3a387be63
[MLPerf] Prepare openimages dataset script ( #6747 )
...
* prepare openimages for MLPerf
* cleanup
* fix issue when clearing jit_cache on retinanet eval
* revert pandas specific changes
2024-09-27 11:13:56 -04:00
chenyu
bea7ed5986
add RUNMLPERF=1 to bert dev_run.sh ( #6775 )
...
already set in run_and_time.sh; RUNMLPERF=1 is needed for it to load real data
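A sketch of the gating this flag controls (the data-loading calls are stand-ins, not the script's functions):
```python
from tinygrad.helpers import getenv

if getenv("RUNMLPERF"):
    print("loading the real training data")     # stand-in for the real data loader
else:
    print("using synthetic benchmark data")      # default path without the flag
```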
2024-09-26 11:00:49 -04:00
chenyu
12de203a43
add IGNORE_JIT_FIRST_BEAM to bert scripts ( #6769 )
...
* update bert BEAM params
copied from resnet to start with
* just IGNORE_JIT_FIRST_BEAM
2024-09-26 05:38:24 -04:00
chenyu
5a5fbfa1eb
smaller bert script change ( #6768 )
...
only reorders WANDB and RUNMLPERF. BENCHMARK and BEAM will be handled differently
2024-09-26 04:54:28 -04:00