Commit Graph

1111 Commits

Author SHA1 Message Date
Francis Lata
27ec792c19 check for CKPT when target metric is reached before saving 2025-03-02 00:41:08 -08:00
Francis Lata
3ac4ae5870 hotfix: log metric and move target metric check outside of CKPT 2025-03-01 04:31:00 -08:00
Francis Lata
974309862d update dataloader seed 2025-02-28 21:41:30 +00:00
Francis Lata
6a62ece474 minor cleanups 2025-02-28 15:43:11 +00:00
Francis Lata
074e9f742b more typing fixes 2025-02-28 15:42:11 +00:00
Francis Lata
e9d1af26b2 undo more changes 2025-02-28 15:11:17 +00:00
Francis Lata
47edcdb834 undo changes 2025-02-28 15:08:55 +00:00
Francis Lata
bdf442717c update seeding on dataloader and the start of training script 2025-02-28 14:58:28 +00:00
Francis Lata
87bfa77f4a some typing cleanups 2025-02-28 14:47:29 +00:00
Francis Lata
dc394e8214 Merge branch 'master' into retinanet_mlperf 2025-02-27 15:33:20 -05:00
George Hotz
67ba073c55 hotfix: test accuracy in beautiful_mnist_torch 2025-02-27 11:18:59 +08:00
Francis Lata
4fa62ba304 Merge branch 'master' into retinanet_mlperf 2025-02-26 13:27:35 -05:00
Francis Lata
86b737a120 leakyrelu to leaky_relu (#9270) 2025-02-26 13:22:08 -05:00
Francis Lata
7cb226d757 Revert "Revert "add nan check during training""
This reverts commit b7b2943197.
2025-02-26 15:43:20 +00:00
Francis Lata
e006ae24ea Merge branch 'master' into retinanet_mlperf 2025-02-26 07:31:32 +00:00
George Hotz
3f4eb9006a test for device mismatch [pr] (#9250)
* test for device mismatch [pr]

* fix bert
2025-02-26 13:06:33 +08:00
chenyu
979e84f30e RESET_STEP in bert setup and beam (#9248)
running dev_beam might OOM without it, but it runs fine in the real run.
2025-02-25 19:15:10 -05:00
Francis Lata
b7b2943197 Revert "add nan check during training"
This reverts commit ddf1f0d5dd.
2025-02-25 21:43:28 +00:00
chenyu
6610ad58ab hotfix bert no shard with only one device (#9243)
`LLVM=1 BERT_SIZE="tiny" DEFAULT_FLOAT=HALF BENCHMARK=5 MODEL="bert" python3 examples/mlperf/model_train.py` runs for me with this. it should not have failed with a single-device shard, though
2025-02-25 09:05:11 -05:00
Francis Lata
ddf1f0d5dd add nan check during training 2025-02-25 10:53:31 +00:00
Francis Lata
8737020d75 add JIT reset support 2025-02-25 10:52:26 +00:00
Francis Lata
30d5daa121 Merge branch 'master' into retinanet_mlperf 2025-02-25 10:32:34 +00:00
nimlgen
b4c3780df0 hotfix: interop example (#9237)
* hotfix: interop example

* rm this

* fix

* fix ci mps

* atol rtol

* no uaf
2025-02-25 10:32:00 +03:00
chenyu
8c7be428e5 update bert BS to 78 (#9236)
fits 78 now. about 215 tflops on green
2025-02-24 22:47:35 -05:00
nimlgen
56288243e6 metal PyTorch interop (#9229)
* add from_blob support to mps cuda

* objc_id

* metal pytorch interop

* fix comments

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2025-02-24 22:36:08 +03:00
nimlgen
1d06d61b16 from_blob for cuda (#9223)
* from_blob for cuda

* maybe docs?

* minor docs

* example

* waiting 9224

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-24 14:02:06 +03:00
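For context, a minimal sketch of the zero-copy direction this PR enables: wrapping an existing CUDA allocation (here a PyTorch tensor's device pointer) as a tinygrad Tensor via Tensor.from_blob. The keyword arguments and the torch-side setup here are assumptions based on the PR title, not the merged example.

```python
# Hedged sketch, not the example added in this PR: wrap a torch CUDA buffer
# as a tinygrad Tensor without copying, by handing its device pointer to
# Tensor.from_blob. The dtype/device kwargs shown here are assumptions.
import torch
from tinygrad import Tensor, dtypes

torch_buf = torch.arange(16, dtype=torch.float32, device="cuda")
tg_view = Tensor.from_blob(torch_buf.data_ptr(), (16,), dtype=dtypes.float32, device="CUDA")
print(tg_view.numpy())  # same values as torch_buf, read through the shared buffer
```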
George Hotz
24615db5f5 hotfix: torch cuda interop example 2025-02-24 09:02:48 +00:00
Francis Lata
2c3417dfce Merge branch 'master' into retinanet_mlperf 2025-02-23 21:23:28 +00:00
Francis Lata
60c13c2932 update loss calculation for regression head and some cleanups 2025-02-23 21:22:33 +00:00
ShikChen
05e3202fba remove unused memsize_to_str and minor cleanups [pr] (#9211)
* fix edge cases in memsize_to_str()

Inputs <= 1 now return "0.00 B" for 0 and "1.00 B" for 1, avoiding an
IndexError. Also, memsize_to_str(1000) now returns "1.00 KB" instead of
"1000.00 B".

Replaced the list comprehension with a next(...) generator for conciseness
and efficiency.

* simplify code using idiomatic python

- Remove the unused `memsize_to_str()` function in helpers.
- Use a tuple for checking multiple string prefixes/suffixes.
- Avoid unnecessary list construction by using iterables directly.
- Check None in @diskcache to ensure proper caching of falsy values.

* revert generators back to list comprehension

Sometimes building the list first can be faster. Keep it as is.
2025-02-23 09:58:37 -05:00
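For reference, a minimal sketch of the next(...)-style helper the first bullet describes; the unit list and formatting here are assumptions, not the code from the PR (which ultimately removed the unused helper).

```python
def memsize_to_str(b: int) -> str:
  # Pick the largest 1000-based unit that fits. max(b, 1) keeps 0 and 1 in
  # the "B" bucket instead of raising IndexError, and ">=" maps 1000 to
  # "1.00 KB" rather than "1000.00 B".
  units = [(1e12, "TB"), (1e9, "GB"), (1e6, "MB"), (1e3, "KB"), (1, "B")]
  d, suffix = next((d, s) for d, s in units if max(b, 1) >= d)
  return f"{b/d:.2f} {suffix}"

assert memsize_to_str(0) == "0.00 B" and memsize_to_str(1000) == "1.00 KB"
```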
George Hotz
4e6665bda5 different way to write torch backend (#9197)
* different way to write torch backend

* both backends

* more work

* simpler code

* more work

* test both

* imply unwrap/wrap

* FORWARD_ONLY=1 TINY_BACKEND=1 python3 test/test_ops.py TestOps.test_add works

* ready to start making test_ops work in torch backend

* backward pass, TINY_BACKEND=1 python3 test/test_ops.py TestOps.test_add works

* FORWARD_ONLY=1 TINY_BACKEND=1 python3 test/test_ops.py TestOps.test_simple_conv2d works

* matmul backward is broken with as_strided
2025-02-22 14:42:26 +08:00
George Hotz
e87be0131e torch backend start (#9191)
* start torch backend

* progress

* ugh, you need cpp crap

* 1+1 works

* 1+1 works

* becoming a real backend

* ready to merge?
2025-02-21 16:57:28 +08:00
chenyu
2e7c2780a9 CLANG -> CPU (#9189) 2025-02-20 18:03:09 -05:00
Francis Lata
7dba815c47 fix train script 2025-02-19 20:43:02 +00:00
Francis Lata
fc36f09b1e no need to return loaded keys for resnet 2025-02-19 20:35:03 +00:00
chenyu
3b37cc898b add bert tiny config (#9177)
set with BERT_SIZE=tiny. easier to study embedding and fusion
2025-02-19 14:57:03 -05:00
Francis Lata
41378e74a6 model init, hyperparam, and data preprocessing updates 2025-02-19 18:47:06 +00:00
chenyu
975c318dbc bert use int32 for input ids (#9173)
original data was int32 for these. float might have caused precision issues
2025-02-19 08:17:27 -05:00
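The precision concern here is the general one: float32 only represents integers exactly up to 2**24, so storing integer ids as floats can silently round large values, while int32 keeps them exact. A quick numpy illustration (not code from the repo):

```python
import numpy as np

ids = np.array([7, 30521, 2**24 + 1], dtype=np.int64)
print(ids.astype(np.float32).astype(np.int64))  # [7, 30521, 16777216] -- last id rounded by float32
print(ids.astype(np.int32))                     # [7, 30521, 16777217] -- int32 stays exact
```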
chenyu
ff05bff221 put bert data shard inside jit (#9160)
python time 45ms -> 9ms; it was spending the time scheduling the shard

also init bert data on CLANG since it's from numpy, so we don't create the tensor on the default device and then shard it onto the GPUS
2025-02-18 10:36:54 -05:00
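For context, a minimal sketch of the pattern this commit describes, assuming tinygrad's TinyJit and Tensor.shard: the host-side batch starts on CLANG (it comes from numpy) and the shard onto the GPUs happens inside the jitted step, so its scheduling cost is captured by the JIT rather than paid in Python each iteration. Names and shapes here are illustrative, not the model_train.py code.

```python
import numpy as np
from tinygrad import Tensor, TinyJit, Device

GPUS = tuple(f"{Device.DEFAULT}:{i}" for i in range(2))  # assumed device list

@TinyJit
def train_step(input_ids: Tensor) -> Tensor:
  x = input_ids.shard(GPUS, axis=0)   # shard now happens inside the JIT
  return (x * 2).sum().realize()      # stand-in for the real bert step

batch = np.zeros((8, 512), dtype=np.int32)
loss = train_step(Tensor(batch, device="CLANG"))  # numpy-backed data starts on CLANG
```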
chenyu
5dc1257ce0 clean up bert fake data iterator [pr] (#9145)
reuse the same get_data_bert path in setup and real run
2025-02-17 20:03:38 -05:00
George Hotz
7eea9b639d hotfix: add replay_pkl debugging env 2025-02-17 17:34:58 +08:00
George Hotz
4672d9af73 actual tests for the dsp backend [pr] (#9102)
* actual tests for the dsp backend [pr]

* fix name
2025-02-15 15:17:56 +08:00
chenyu
81597ddd96 increase lr for bert (#9098)
had one run that converged better https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/u66tv2hh/overview
2025-02-14 19:10:35 -05:00
Francis Lata
cfa1c2d50e hyperparameter adjustments and cleanups 2025-02-14 17:53:06 +00:00
chenyu
b58e7b1898 zero out the weight in bert init run (#9076)
`DEFAULT_FLOAT=HALF BENCHMARK=10 BS=66 EVAL_BS=6 GPUS=6 MODEL=bert python3 examples/mlperf/model_train.py` no longer OOMs. I think the buffers of randomly initialized weights caused the OOM.
2025-02-14 08:40:41 -05:00
Francis Lata
caf9b2baa2 Merge branch 'master' into retinanet_mlperf 2025-02-14 06:28:37 +00:00
chenyu
9e91898941 bert eval at the end of training (#9070)
always eval at the last epoch
2025-02-13 16:29:44 -05:00
Francis Lata
3a2f126e7b Merge branch 'master' into retinanet_mlperf 2025-02-13 15:40:10 +00:00
Francis Lata
5f26692068 remove frozen layers from optimizer's params 2025-02-13 06:36:13 +00:00
chenyu
f4f56d7c15 move time_linearizer to extra.optimization.helpers [pr] (#9048)
no longer used in tinygrad
2025-02-12 15:49:58 -05:00