tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-29 03:00:14 -04:00

Author	SHA1	Message	Date
Francis Lata	7cb226d757	Revert "Revert "add nan check during training"" This reverts commit `b7b2943197`.	2025-02-26 15:43:20 +00:00
Francis Lata	e006ae24ea	Merge branch 'master' into retinanet_mlperf	2025-02-26 07:31:32 +00:00
George Hotz	3f4eb9006a	test for device mismatch [pr] (#9250 ) * test for device mismatch [pr] * fix bert	2025-02-26 13:06:33 +08:00
chenyu	979e84f30e	RESET_STEP in bert setup and beam (#9248 ) running dev_beam might OOM without it but runs fine in real run.	2025-02-25 19:15:10 -05:00
Francis Lata	b7b2943197	Revert "add nan check during training" This reverts commit `ddf1f0d5dd`.	2025-02-25 21:43:28 +00:00
chenyu	6610ad58ab	hotfix bert no shard with only one device (#9243 ) `LLVM=1 BERT_SIZE="tiny" DEFAULT_FLOAT=HALF BENCHMARK=5 MODEL="bert" python3 examples/mlperf/model_train.py` runs for me with this. it should not fail with single device shard though	2025-02-25 09:05:11 -05:00
Francis Lata	ddf1f0d5dd	add nan check during training	2025-02-25 10:53:31 +00:00
Francis Lata	8737020d75	add JIT reset support	2025-02-25 10:52:26 +00:00
Francis Lata	30d5daa121	Merge branch 'master' into retinanet_mlperf	2025-02-25 10:32:34 +00:00
nimlgen	b4c3780df0	hotfix: interop example (#9237 ) * hotfix: interop example * rm this * fix * fix ci mps * atol rtol * no uaf	2025-02-25 10:32:00 +03:00
chenyu	8c7be428e5	update bert BS to 78 (#9236 ) fits 78 now. about 215 tflops on green	2025-02-24 22:47:35 -05:00
nimlgen	56288243e6	metal PyTorch interop (#9229 ) * add from_blob support to mps cuda * objc_id * metal pytorch interop * fix comments --------- Co-authored-by: George Hotz <geohot@gmail.com>	2025-02-24 22:36:08 +03:00
nimlgen	1d06d61b16	from_blob for cuda (#9223 ) * from_blob for cuda * maybe docs? * minor docs * example * waiting 9224 --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-24 14:02:06 +03:00
George Hotz	24615db5f5	hotfix: torch cuda interop example	2025-02-24 09:02:48 +00:00
Francis Lata	2c3417dfce	Merge branch 'master' into retinanet_mlperf	2025-02-23 21:23:28 +00:00
Francis Lata	60c13c2932	update loss calculation for regressionhead and some cleanups	2025-02-23 21:22:33 +00:00
ShikChen	05e3202fba	remove unused memsize_to_str and minor cleanups [pr] (#9211 ) * fix edge cases in memsize_to_str() Inputs <= 1 now return "0.00 B" for 0 and "1.00 B" for 1, avoiding an IndexError. Also, memsize_to_str(1000) now returns "1.00 KB" instead of "1000.00 B". Replaced the list comprehension with a next(...) generator for conciseness and efficiency. * simplify code using idiomatic python - Remove the unused `memsize_to_str()` function in helpers. - Use a tuple for checking multiple string prefixes/suffixes. - Avoid unnecessary list construction by using iterables directly. - Check None in @diskcache to ensure proper caching of falsy values. * revert generators back to list comprehension Sometimes building list first could be faster. Keep it as is.	2025-02-23 09:58:37 -05:00
George Hotz	4e6665bda5	different way to write torch backend (#9197 ) * different way to write torch backend * both backends * more work * simpler code * more work * test both * imply unwrap/wrap * FORWARD_ONLY=1 TINY_BACKEND=1 python3 test/test_ops.py TestOps.test_add works * ready to start making test_ops work in torch backend * backward pass, TINY_BACKEND=1 python3 test/test_ops.py TestOps.test_add works * FORWARD_ONLY=1 TINY_BACKEND=1 python3 test/test_ops.py TestOps.test_simple_conv2d works * matmul backward is broken with as_strided	2025-02-22 14:42:26 +08:00
George Hotz	e87be0131e	torch backend start (#9191 ) * start torch backend * progress * ugh, you need cpp crap * 1+1 works * 1+1 works * becoming a real backend * ready to merge?	2025-02-21 16:57:28 +08:00
chenyu	2e7c2780a9	CLANG -> CPU (#9189 )	2025-02-20 18:03:09 -05:00
Francis Lata	7dba815c47	fix train script	2025-02-19 20:43:02 +00:00
Francis Lata	fc36f09b1e	no need to return loaded keys for resnet	2025-02-19 20:35:03 +00:00
chenyu	3b37cc898b	add bert tiny config (#9177 ) set with BERT_SIZE=tiny. easier to study embedding and fusion	2025-02-19 14:57:03 -05:00
Francis Lata	41378e74a6	model init, hyperparam, and data preprocessing updates	2025-02-19 18:47:06 +00:00
chenyu	975c318dbc	bert use int32 for input ids (#9173 ) original data was int32 for these. float might have caused precision issues	2025-02-19 08:17:27 -05:00
chenyu	ff05bff221	put bert data shard inside jit (#9160 ) python time 45ms -> 9ms, it was spending time to schedule the shard also init bert data on CLANG since it's from numpy, so we don't create the tensor on default device then shard into GPUS	2025-02-18 10:36:54 -05:00
chenyu	5dc1257ce0	clean up bert fake data iterator [pr] (#9145 ) reuse the same get_data_bert path in setup and real run	2025-02-17 20:03:38 -05:00
George Hotz	7eea9b639d	hotfix: add replay_pkl debugging env	2025-02-17 17:34:58 +08:00
George Hotz	4672d9af73	actual tests for the dsp backend [pr] (#9102 ) * actual tests for the dsp backend [pr] * fix name	2025-02-15 15:17:56 +08:00
chenyu	81597ddd96	increase lr for bert (#9098 ) had one run that converged better https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/u66tv2hh/overview	2025-02-14 19:10:35 -05:00
Francis Lata	cfa1c2d50e	hyperparameter adjustments and cleanups	2025-02-14 17:53:06 +00:00
chenyu	b58e7b1898	zero out the weight in bert init run (#9076 ) `DEFAULT_FLOAT=HALF BENCHMARK=10 BS=66 EVAL_BS=6 GPUS=6 MODEL=bert python3 examples/mlperf/model_train.py` no longer oom. I think the buffer of random init weights caused the oom.	2025-02-14 08:40:41 -05:00
Francis Lata	caf9b2baa2	Merge branch 'master' into retinanet_mlperf	2025-02-14 06:28:37 +00:00
chenyu	9e91898941	bert eval at the end of training (#9070 ) always eval at the last epoch	2025-02-13 16:29:44 -05:00
Francis Lata	3a2f126e7b	Merge branch 'master' into retinanet_mlperf	2025-02-13 15:40:10 +00:00
Francis Lata	5f26692068	remove frozen layers from optimizer's params	2025-02-13 06:36:13 +00:00
chenyu	f4f56d7c15	move time_linearizer to extra.optimization.helpers [pr] (#9048 ) no longer used in tinygrad	2025-02-12 15:49:58 -05:00
Francis Lata	ff301f0be9	minor cleanups	2025-02-12 16:03:38 +00:00
Francis Lata	f61b10450e	Merge branch 'master' into retinanet_mlperf	2025-02-12 15:47:05 +00:00
chenyu	7b5ac2c15e	free_intermediates in bert (#9040 ) also re-enable dropout and update EVAL_BS	2025-02-12 10:00:39 -05:00
Ahmed Harmouche	916d5e7f08	WebGPU f16 support (f16 bounty part 2) (#8653 ) * WebGPU f16 support * Don't enable f16 yet * dtype tests passing after bitcast fix * Maybe all WebGPU green? * Require shader-f16 in examples * Minor wgsl touchup * 1 line shorter * Simpler * Add transcendental support * log2 nan location mismatch on Vulkan * Nan skips	2025-02-12 19:46:53 +08:00
George Hotz	0568720a68	delete revectorize (#9000 ) * delete revectorize * test vectorized LLVM/CLANG * idk about that * was that the segfault?	2025-02-10 18:32:35 +08:00
Francis Lata	37aab697b8	adjust LR to be the ratio of the batch size	2025-02-07 19:46:54 +00:00
Francis Lata	041481f910	Merge branch 'master' into retinanet_mlperf	2025-02-07 15:28:29 +00:00
George Hotz	4de084a835	cleanup ci, split docs/autogen, testing_minimal, LLVM Speed [pr] (#8952 ) * cleanup ci [pr] * testing_minimal * add hypothesis to minimal * fail tiktoken import okay * add LLVM speed test * llvm speed w/o beam	2025-02-07 19:01:59 +08:00
chenyu	a092b6395d	Tuple -> tuple, List -> list [pr] (#8936 )	2025-02-06 14:21:19 -05:00
George Hotz	8b16c65bca	add compile3 benchmark [pr] (#8929 )	2025-02-06 22:49:31 +08:00
geohotstan	6fb0e5751b	hotfix test_onnx_imagenet (#8897 ) * start * log severity * only change this * change abstraction so it's more usable for huggingface * WHOOPS * actually this is more correct	2025-02-05 14:39:55 +08:00
geohotstan	057c70b05f	add onnx_helpers to extra and add ort validate to benchmark_onnx (#8890 ) * start * log severity * only change this * change abstraction so it's more usable for huggingface --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-02-04 16:36:01 -05:00
Francis Lata	a483c0d231	Merge branch 'master' into retinanet_mlperf	2025-02-03 19:54:55 +00:00

1098 Commits