199 Commits

Author SHA1 Message Date
wozeparrot
ef09071073 llama: speed 2 (#15960) 2026-04-28 20:44:37 -07:00
wozeparrot
5e861cd2c4 llama: move llama kernels to llama_kernels (#15952) 2026-04-27 22:48:53 -07:00
wozeparrot
4b908b6e2c llama: fused ce loss (#15920) 2026-04-24 20:01:24 -07:00
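"Fused CE loss" is named only in the commit title above. As background, the usual fusion computes cross-entropy as log-sum-exp minus the target logit in one pass, so the softmax probabilities are never materialized as a separate tensor. A minimal numpy sketch of that math, not tinygrad's actual kernel:

```
import numpy as np

def fused_ce_loss(logits: np.ndarray, targets: np.ndarray) -> float:
    # CE = logsumexp(logits) - logits[target], row-wise, without ever
    # building the softmax matrix as its own tensor.
    m = logits.max(axis=-1)                                    # row max, for stability
    lse = m + np.log(np.exp(logits - m[:, None]).sum(axis=-1)) # log-sum-exp per row
    picked = logits[np.arange(len(targets)), targets]          # target-class logits
    return float((lse - picked).mean())
```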
wozeparrot
9d134a2848 llama: fix fakedata timing (#15905) 2026-04-23 21:37:03 -07:00
wozeparrot
06343092c8 llama: combined w13 (#15803) 2026-04-17 22:27:31 -07:00
wozeparrot
9e60e4a7e7 llama: native fp8 (#15733) 2026-04-16 22:16:05 -07:00
chenyu
839d37b7bc update median_step_time in model_train.py (#15649)
BENCHMARK=5 used to pick the 4th largest, not the middle one
2026-04-08 09:53:59 -04:00
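The off-by-one described above is easy to picture (indexing illustrative, not the exact model_train.py code):

```
step_times = [93.44, 101.78, 7.34, 4.32, 4.36]   # e.g. BENCHMARK=5 step times
buggy = sorted(step_times, reverse=True)[3]       # 4.36, the 4th largest
median_step_time = sorted(step_times)[len(step_times) // 2]  # 7.34, the middle one
```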
wozeparrot
70dbd35023 llama: move custom_kernel into flat_llama (#15643) 2026-04-08 00:19:14 -07:00
wozeparrot
7e54992bf6 fp8 llama (#15588)
Co-authored-by: qazal <qazal.software@gmail.com>
2026-04-04 18:24:57 -07:00
qazal
09f60d80fd llama: fix FP8=1 FAKEDATA=1 (#15564) 2026-04-01 20:53:03 +09:00
wozeparrot
0c3e438229 llama: mllog (#15502) 2026-03-28 11:18:25 -07:00
wozeparrot
a65e958be9 llama: new apply_grad (#15503) 2026-03-26 19:39:25 -07:00
Christopher Milan
bc180a963c deprecate <dev>=1 in favor of DEV=<dev> (#15467)
* start work on target

* add test

* update actions to use DEV

* update docs

* update readmes

* tests need that too

* update example

* update tests (comments)

* fix that test

* ruff

* mypy

* oops

* remove getenvs

* don't add Target yet

* and the test

* lint

* and docs

* more stuff

* assert

* few more fixes

* test assert
2026-03-26 03:48:03 -04:00
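The invocation change this PR makes, shown with an illustrative backend name:

```
AMD=1 python3 examples/mlperf/model_train.py    # old <dev>=1 form, now deprecated
DEV=AMD python3 examples/mlperf/model_train.py  # new DEV=<dev> form
```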
wozeparrot
da2031266a llama: correct 8b init (#15397) 2026-03-24 13:41:41 -07:00
wozeparrot
87c4ec1724 llama: use flat llama (#15353) 2026-03-19 22:12:38 -07:00
wozeparrot
a191ac0566 llama: use mlperf model (#15257) 2026-03-13 08:08:32 -07:00
wozeparrot
4fab320abe llama: clean (#15224) 2026-03-11 13:33:59 -07:00
wozeparrot
05d6d9120a llama offload null (#15222) 2026-03-11 10:04:31 -07:00
wozeparrot
525a178966 llama: jit more (#15199) 2026-03-10 11:04:59 +08:00
wozeparrot
4544da1c54 llama3 fixes part3 (#15152) 2026-03-05 01:17:54 -08:00
wozeparrot
92c16810ac feat: per device mem_used (#15100) 2026-03-03 01:31:28 -08:00
wozeparrot
824ba4386a llama3 dp fix (#15098) 2026-03-02 22:43:07 -08:00
wozeparrot
a4f6365929 llama3: fstep takes grads (#15069) 2026-03-01 20:05:07 -08:00
wozeparrot
cfc5cf65ad llama3: vocab padding fix + jit copies on fakedata (#15067) 2026-02-28 08:44:55 -08:00
wozeparrot
d941dd5aeb llama3: pad vocab when mp sharding (#14998) 2026-02-25 00:04:06 -08:00
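Vocab padding for model-parallel sharding is only named in the title above; the idea is to round the vocab up to a multiple of the shard count so the embedding and output projection split evenly across devices. A hypothetical helper (name and numbers illustrative):

```
def padded_vocab(vocab_size: int, num_devices: int) -> int:
    # round up to the next multiple of num_devices
    return -(-vocab_size // num_devices) * num_devices

padded_vocab(128000, 6)  # -> 128004, i.e. 21334 embedding rows per shard
```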
wozeparrot
e1c9985715 llama3: better time keeping (#14999) 2026-02-24 22:42:05 -08:00
wozeparrot
8d9545e09e llama3: correctly shard wqkv (#14978) 2026-02-23 23:57:10 -08:00
wozeparrot
3cda781876 llama optim offload (#14901) 2026-02-21 08:53:45 -08:00
wozeparrot
95e97ec341 separate llama optim (#14810) 2026-02-17 13:02:35 -08:00

wozeparrot
4b5d3bda1f llama3: data seed (#14681) 2026-02-11 19:04:40 -08:00
wozeparrot
a60220bed9 llama3: move dl to numpy & jit more (#14677)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-02-10 18:16:40 -08:00
wozeparrot
4845e42135 llama3 gradacc fixes (#14414) 2026-01-28 19:12:39 -08:00
nimlgen
aec1ae0de1 llama: set manual_seed (#14409) 2026-01-28 14:40:00 -08:00
George Hotz
0c6b3f50aa add marker to llama training (#14401) 2026-01-28 22:44:28 +08:00
wozeparrot
e496547720 llama3 gradacc (#14291) 2026-01-27 19:48:10 -08:00
wozeparrot
963c59ebdb fix: pull fixes from gradacc branch (#14296) 2026-01-22 23:07:54 -08:00
wozeparrot
c1d14ea832 llama8b train fixes (#14264) 2026-01-20 20:34:47 -08:00
b1tg
0fbc551622 train bert with fp8 (#13874)
* fp8 train

* clean

* lint

* test fix from #13439

* skip first/last layer

* rm __init__, restore unroll <=32 check

* tests

* clean test, remove unused

* multi-gpu test, clean quantize_to_fp8

* remove bert contiguous

* run script

* test: better check

* run script search

* add seed in bert data shuffle

* move script to mi350x folder

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2026-01-09 09:21:59 -05:00
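quantize_to_fp8 and the skipped first/last layer are only named in the commit above; keeping the outermost layers in higher precision is the common fp8-training recipe. A rough per-tensor scaling sketch of what such a helper typically does, not the actual implementation:

```
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in e4m3

def quantize_to_fp8(x: np.ndarray) -> tuple[np.ndarray, float]:
    # Per-tensor scaling: fit the max magnitude into the e4m3 range and keep
    # the scale so downstream ops can dequantize. The cast to a real float8
    # dtype is elided here since numpy has none.
    scale = FP8_E4M3_MAX / max(float(np.abs(x).max()), 1e-12)
    return np.clip(x * scale, -FP8_E4M3_MAX, FP8_E4M3_MAX), scale
```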
b1tg
241f0402b4 add seed in bert data shuffle (#14054) 2026-01-07 10:02:05 -05:00
chenyu
da1cb6a9ec update llama dataloader (#13825)
separate creating the dataset from iterating over it, so eval data is not recreated for each eval
2025-12-24 17:42:08 -05:00
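The split described above, sketched with hypothetical names: build the eval dataset once, then only iterate it on each eval.

```
def create_eval_dataset(basedir: str) -> list[str]:
    # expensive part: index the eval split (done once)
    return [f"{basedir}/eval_{i}.bin" for i in range(4)]

def iterate(files: list[str]):
    # cheap part: walk the already-built index
    yield from files

eval_files = create_eval_dataset("/raid/datasets/c4-8b")  # once, up front
for step in range(1, 11):
    if step % 5 == 0:                  # each eval reuses eval_files
        for batch in iterate(eval_files):
            pass                       # eval_step(batch) in the real loop
```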
chenyu
903753c60c llama wandb logging (#13822) 2025-12-24 10:24:59 -05:00
chenyu
27d899ce97 TRAIN=0 to only eval llama (#13804) 2025-12-22 11:55:46 -05:00
chenyu
39d962106f update llama logging (#13803)
```
REWRITE_STACK_LIMIT=1000000 SMALL=1 BASEDIR=/raid/datasets/c4-8b SAMPLES=1000 BS=8 DP=8 DEFAULT_FLOAT=bfloat16 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=8B SEQLEN=1024 PYTHONPATH=. MODEL=llama3 python3 examples/mlperf/model_train.py

    1 93.44 s run, 11.8750 loss, 0.000000000001 LR, 642.43 GB used,  19644.30 GFLOPS
    2 101.78 s run, 11.8750 loss, 0.000000000001 LR, 1454.57 GB used,  17039.35 GFLOPS
    3 7.34 s run, 11.8750 loss, 0.000000000002 LR, 1454.57 GB used, 236258.78 GFLOPS
    4 4.32 s run, 11.8750 loss, 0.000000000002 LR, 1454.57 GB used, 401488.40 GFLOPS
    5 4.36 s run, 11.9375 loss, 0.000000000003 LR, 1454.57 GB used, 398116.13 GFLOPS
    6 4.32 s run, 11.8750 loss, 0.000000000003 LR, 1454.57 GB used, 401878.60 GFLOPS
    7 4.34 s run, 11.8750 loss, 0.000000000004 LR, 1454.57 GB used, 399822.57 GFLOPS
    8 4.35 s run, 11.8750 loss, 0.000000000004 LR, 1454.57 GB used, 398512.24 GFLOPS
    9 4.36 s run, 11.8750 loss, 0.000000000005 LR, 1454.57 GB used, 397832.61 GFLOPS
   10 4.40 s run, 11.8750 loss, 0.000000000005 LR, 1454.57 GB used, 394520.83 GFLOPS
```
2025-12-22 11:28:29 -05:00
chenyu
e428fbfab6 verify dtype of llama model params (#13719) 2025-12-16 12:32:02 -05:00
chenyu
6cad622f59 don't FREE_INTERMEDIATE in bert (#13684)
hangs green hcq consistently after an hour of training
2025-12-14 14:27:42 -05:00
chenyu
01e9ad0d52 clean up bert next_data (#13650)
train iter was designed to never stop for both real and fake data
2025-12-11 22:56:28 -05:00
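A never-stopping train iterator, as the commit describes, is a one-liner in spirit (a sketch, not the bert code):

```
import itertools

def next_data_iter(dataset):
    # never raises StopIteration, for real and fake data alike: the training
    # loop, not the iterator, decides when to stop
    return itertools.cycle(dataset)

it = next_data_iter(["batch0", "batch1"])
batches = [next(it) for _ in range(5)]  # cycles: batch0, batch1, batch0, ...
```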
chenyu
5034c6fb37 reenable FREE_INTERMEDIATE for bert (#13639)
* reenable FREE_INTERMEDIATE for bert

* comment
2025-12-10 12:08:09 -05:00
chenyu
2471b49e45 minor bert / llama change from grad acc branch (#13622)
* minor bert / llama change from grad acc branch

* revert those
2025-12-08 16:04:14 -05:00
chenyu
b981b6f89e remove old llama grad_acc (#13611)
* remove old llama grad_acc

* GRADIENT_ACC_STEPS=1
2025-12-07 13:03:47 -05:00
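For context, gradient accumulation in miniature (plain-Python stand-ins, not the tinygrad code); GRADIENT_ACC_STEPS=1 collapses it back to a normal step:

```
GRADIENT_ACC_STEPS = 4

def backward(microbatch):                      # stand-in for the real backward pass
    return [float(x) for x in microbatch]

grads = [0.0, 0.0]
for mb in [[1, 2], [3, 4], [5, 6], [7, 8]]:    # accumulate instead of stepping
    grads = [g + mg for g, mg in zip(grads, backward(mb))]
grads = [g / GRADIENT_ACC_STEPS for g in grads]  # average, then one optimizer step
```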
chenyu
4562f217e1 more bert updates (#13597)
prep split jit
also lower BS to 72
2025-12-06 08:32:43 -05:00