Commit Graph

277 Commits

Author SHA1 Message Date
wozeparrot
b979162c5d llama3 eval train (#11706) 2025-08-20 19:56:35 -04:00
chenyu
dbd3b67657 clamp GRAD_CLIP_NORM in llama (#11761) 2025-08-20 19:55:50 -04:00
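The commit above clamps the gradient norm used for clipping. A minimal sketch of global-norm gradient clipping in tinygrad style, not necessarily what the PR changed; the function name and the `max_norm` default are assumptions:

```python
from tinygrad import Tensor

def clip_grad_norm_(grads: list[Tensor], max_norm: float = 1.0) -> Tensor:
  # global L2 norm over all gradients
  global_norm = Tensor.stack(*[g.square().sum() for g in grads]).sum().sqrt()
  # clamping the denominator keeps the scale factor <= 1, so small grads are never upscaled
  scale = max_norm / global_norm.maximum(max_norm)
  for g in grads: g.assign(g * scale)
  return global_norm
```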
chenyu
e9d0027591 llama MP realize weight after shard (#11672)
* llama MP realize weight after shard

prevents a memory spike on device 0

* empty weight for FAKEDATA
2025-08-14 16:17:46 -04:00
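The entry above defers realization until after the shard, so the full weight never materializes on a single device. A minimal sketch, assuming tinygrad's `shard_` API; shapes and device count are illustrative:

```python
from tinygrad import Tensor, Device

GPUS = tuple(f"{Device.DEFAULT}:{i}" for i in range(2))
w = Tensor.empty(1024, 1024)  # still lazy: no allocation yet
w.shard_(GPUS, axis=0)        # split along dim 0 across devices
w.realize()                   # each device allocates only its own slice
```

Realizing before `shard_` would first allocate the whole tensor on the default device, which is the device-0 memory spike the commit avoids.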
chenyu
ef17af85c6 remove .float call in llama logit (#11598)
* remove .float call in llama logit

* bfloat item
2025-08-10 00:02:18 -04:00
chenyu
45baec1aab model parallel llama (#11588)
MP=8 GRADIENT_ACC_STEPS=3 BS=1 DEFAULT_FLOAT=bfloat16 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=70B SEQLEN=512 PYTHONPATH=. MODEL=llama3 python3 examples/mlperf/model_train.py
2025-08-09 16:54:27 -04:00
chenyu
702e38dc19 remove FUSE_ARANGE_UINT (#11567)
also add IGNORE_OOB=1 to bert runs. lowered BS on tinybox to 90 since BS=96 OOMs during eval without reset
2025-08-07 16:49:06 -04:00
wozeparrot
7ae4335127 feat: generate blend index (#11566) 2025-08-07 14:20:28 -04:00
wozeparrot
2d5bdc939d faster llama3 dataloader (#11540) 2025-08-06 18:25:57 -04:00
chenyu
f7965f85aa Revert "feat: faster index building (#11462)" (#11478)
This reverts commit 3a4deb08d2.
2025-08-02 12:50:48 -04:00
wozeparrot
3a4deb08d2 feat: faster index building (#11462)
* feat: faster index building

* feat: correct training samples
2025-08-02 11:50:18 -04:00
chenyu
9e8e6b45ab grad acc train llama (#11467)
* grad acc train llama

* log step time
2025-08-01 15:54:50 -04:00
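GRADIENT_ACC_STEPS in the runs above accumulates gradients over several micro-batches before each optimizer step. A minimal sketch with hypothetical `model`, `opt`, and data names; the effective batch is grad_acc_steps * batch_size:

```python
GRADIENT_ACC_STEPS = 3

def train_step(model, opt, micro_batches):
  # micro_batches: GRADIENT_ACC_STEPS pairs of (x, y)
  opt.zero_grad()
  for x, y in micro_batches:
    # divide so the accumulated gradient matches one large-batch step
    loss = model(x).sparse_categorical_crossentropy(y) / GRADIENT_ACC_STEPS
    loss.backward()  # grads add up across micro-batches
  opt.step()
```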
chenyu
7ad7329257 data parallel train llama (#11466) 2025-08-01 12:13:51 -04:00
George Hotz
8ff03806e8 add llama layers (#11460)
* add llama layers

* add contig bw for speed
2025-07-31 16:28:04 -07:00
wozeparrot
6252f7770e feat: fake data (#11447) 2025-07-30 17:18:20 -07:00
chenyu
e300451f3a update llama3 (#11446)
`LR=1e-4 TRAIN_ON_VAL=1 DEFAULT_FLOAT=bfloat16 FUSE_ARANGE=1 JITBEAM=2 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=1B WARMUP_STEPS=36 DECAY_STEPS=360 SEQLEN=512 PYTHONPATH=. AMD=1 AMD_LLVM=0 MODEL=llama3 python3 examples/mlperf/model_train.py` trained to 7
2025-07-30 19:34:21 -04:00
wozeparrot
5fb975351a feat: flag for training on val (#11441) 2025-07-30 14:29:45 -07:00
wozeparrot
825b6a2505 feat: llama3 dataloader (#11340) 2025-07-30 13:27:55 -07:00
chenyu
c14c9a8eff llama3 grad clip (#11003) 2025-06-27 19:14:12 -04:00
chenyu
f2548afeb5 bert grad clipping start with const 0 (#11008)
saved the init kernels
2025-06-27 18:02:23 -04:00
chenyu
6ab5a5cb6c llama3 mlperf train (#10983)
work in progress. now it can overfit small examples and VRAM roughly matches
2025-06-26 20:24:27 -04:00
chenyu
8751d47985 CosineAnnealingLRWithWarmup (#10981) 2025-06-25 17:45:21 -04:00
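CosineAnnealingLRWithWarmup presumably combines a linear warmup with cosine decay, matching the WARMUP_STEPS/DECAY_STEPS flags in the runs above. A minimal sketch of such a schedule; the repo's boundary handling may differ:

```python
import math

def lr_at(step: int, base_lr: float, warmup_steps: int, decay_steps: int, end_lr: float = 0.0) -> float:
  if step < warmup_steps:
    return base_lr * (step + 1) / warmup_steps  # linear warmup
  t = min((step - warmup_steps) / max(decay_steps - warmup_steps, 1), 1.0)
  return end_lr + (base_lr - end_lr) * 0.5 * (1 + math.cos(math.pi * t))
```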
chenyu
efad567ebd ruff check whole examples/mlperf/ (#10979) 2025-06-25 12:57:48 -04:00
chenyu
0480139def log_perplexity metrics (#10912) 2025-06-21 10:44:47 -04:00
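Log perplexity is the mean per-token negative log-likelihood, i.e. the mean cross-entropy; perplexity itself is its exp. A minimal sketch with illustrative names, not necessarily the repo's helper:

```python
from tinygrad import Tensor

def log_perplexity(logits: Tensor, targets: Tensor) -> Tensor:
  # logits: (batch, seqlen, vocab); targets: (batch, seqlen) integer token ids
  return logits.reshape(-1, logits.shape[-1]).sparse_categorical_crossentropy(targets.flatten())
```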
chenyu
62a540066e remove DEBUG=2 in mi300x bert setup (#10886)
seems fine now, not sure what the issue was
2025-06-19 13:28:53 -04:00
chenyu
f377cc19cd use AM for bert (#10882)
have trained 3 runs and all seem fine
2025-06-19 09:48:54 -04:00
chenyu
b70c7d3631 bert grad accumulation (#10863)
* bert grad accumulation

* realize grad
2025-06-18 12:17:07 -04:00
chenyu
075a74cf25 add global_batch_size to mlperf bert (#10852)
global_batch_size = grad_acc_steps * batch_size. no-op change to prep grad acc for bert
2025-06-17 17:54:15 -04:00
chenyu
81e296d7b8 remove Tensor.test() in retinanet (#10770)
test was removed
2025-06-10 22:14:57 -04:00
George Hotz
32e9949052 rename lazydata to uop (#10698) 2025-06-08 08:42:22 -07:00
chenyu
4ab3391e6f set -o pipefail for mlperf run_and_time (#10577)
also run the 5.1 script in the CI cron job
2025-05-30 16:36:44 -04:00
chenyu
baf482d314 copy mlperf stuff to 5.1 (#10576)
5.0 is finalized, new changes go to 5.1
2025-05-30 16:12:39 -04:00
George Hotz
b3b43a82c4 remove Tensor.no_grad, it's meaningless now [pr] (#10556) 2025-05-28 22:20:02 -07:00
chenyu
74cf5dbd9e mlperf system updates (#10550)
standardized processor and accelerator names
2025-05-28 16:15:46 -04:00
chenyu
51dc7eedb0 correct use AM for resnet run_and_time (#10524) 2025-05-26 15:33:11 -04:00
chenyu
c1919ad55f use AM for resnet run_and_time (#10523) 2025-05-26 14:50:49 -04:00
chenyu
2d50efb92b set -e on mlperf run_and_time scripts (#10519) 2025-05-26 09:22:30 -04:00
chenyu
dc6309242d WallTimeEvent for mlperf ci (#10506) 2025-05-24 10:56:03 -04:00
chenyu
67d1364106 update LOGMLPERF in red resnet run_and_time (#10416) 2025-05-19 13:23:33 -04:00
chenyu
485e80da69 run_and_time for resnet ci (#10405) 2025-05-18 23:39:57 -04:00
wozeparrot
1ed04f993b move benchmark stat tracking to influxdb (#10185) 2025-05-15 16:14:56 -07:00
George Hotz
568d6d96e7 small changes from new multi [pr] (#10318) 2025-05-14 20:50:59 -07:00
George Hotz
bfc30fa6ea hotfix: typo in shm_name 2025-05-14 19:34:52 -07:00
George Hotz
2bc54b3e22 manually handle OSX 2025-05-14 19:17:51 -07:00
George Hotz
ab460486d7 Revert "resnet dataloader osx (#10316)"
This reverts commit aef336930a.
2025-05-14 19:15:07 -07:00
George Hotz
aef336930a resnet dataloader osx (#10316)
* mlperf dataloader on mac

* resnet dataloader [pr]

* simple should work
2025-05-14 18:31:26 -07:00
chenyu
610ee79b22 cherry pick mlperf5.0 branch to master (#10089) 2025-04-28 15:36:56 -04:00
chenyu
74c6cf8be3 lint mlperf model_train (#10038) 2025-04-24 16:19:44 -04:00
chenyu
a25abf55e3 retinanet only call postprocess_detections with RUNMLPERF (#10017)
during setup, only `_eval_step().numpy()` needs to be compiled
2025-04-23 20:45:38 -04:00
chenyu
65faa1d94b explicit device in mlperf scripts (#10015) 2025-04-23 17:11:52 -04:00
chenyu
a3f938dbee remove retinanet INITMLPERF from beam script (#10011)
it only controls logging; whether real data is loaded is solely controlled by RUNMLPERF
2025-04-23 14:32:54 -04:00