wozeparrot
ef09071073
llama: speed 2 ( #15960 )
2026-04-28 20:44:37 -07:00
wozeparrot
5e861cd2c4
llama: move llama kernels to llama_kernels ( #15952 )
2026-04-27 22:48:53 -07:00
wozeparrot
4b908b6e2c
llama: fused ce loss ( #15920 )
2026-04-24 20:01:24 -07:00
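A hedged numpy sketch of what a fused cross-entropy loss computes, not the actual tinygrad kernel this commit adds: the log-softmax and the target gather collapse into one expression, so the full softmax is never materialized.
```
import numpy as np

def fused_ce(logits, targets):
  # numerically stable logsumexp over the vocab axis
  m = logits.max(axis=-1, keepdims=True)
  lse = m[..., 0] + np.log(np.exp(logits - m).sum(axis=-1))
  # gather the logit of each target token
  picked = np.take_along_axis(logits, targets[:, None], axis=-1)[:, 0]
  # cross entropy = logsumexp(logits) - logit[target]
  return (lse - picked).mean()

logits = np.random.randn(8, 128256).astype(np.float32)
targets = np.random.randint(0, 128256, size=8)
print(fused_ce(logits, targets))  # a scalar loss near ln(vocab) for random logits
```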
wozeparrot
9d134a2848
llama: fix fakedata timing ( #15905 )
2026-04-23 21:37:03 -07:00
wozeparrot
06343092c8
llama: combined w13 ( #15803 )
2026-04-17 22:27:31 -07:00
wozeparrot
9e60e4a7e7
llama: native fp8 ( #15733 )
2026-04-16 22:16:05 -07:00
chenyu
839d37b7bc
update median_step_time in model_train.py ( #15649 )
...
BENCHMARK=5 used to pick the 4th largest step time, not the middle one
2026-04-08 09:53:59 -04:00
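A minimal sketch of the off-by-one described above, with made-up step times; the actual indexing in model_train.py is not reproduced here.
```
step_times = [4.32, 4.36, 4.34, 101.78, 4.40]  # hypothetical BENCHMARK=5 timings

srt = sorted(step_times)       # [4.32, 4.34, 4.36, 4.40, 101.78]
fourth_largest = srt[-4]       # 4.34: what the old indexing effectively picked
median = srt[len(srt) // 2]    # 4.36: the middle of the five
print(fourth_largest, median)  # 4.34 4.36
```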
wozeparrot
70dbd35023
llama: move custom_kernel into flat_llama ( #15643 )
2026-04-08 00:19:14 -07:00
wozeparrot
7e54992bf6
fp8 llama ( #15588 )
...
Co-authored-by: qazal <qazal.software@gmail.com>
2026-04-04 18:24:57 -07:00
qazal
09f60d80fd
llama: fix FP8=1 FAKEDATA=1 ( #15564 )
2026-04-01 20:53:03 +09:00
wozeparrot
0c3e438229
llama: mllog ( #15502 )
2026-03-28 11:18:25 -07:00
wozeparrot
a65e958be9
llama: new apply_grad ( #15503 )
2026-03-26 19:39:25 -07:00
Christopher Milan
bc180a963c
deprecate <dev>=1 in favor of DEV=<dev> ( #15467 )
...
* start work on target
* add test
* update actions to use DEV
* update docs
* update readmes
* tests need that too
* update example
* update tests (comments)
* fix that test
* ruff
* mypy
* oops
* remove getenvs
* don't add Target yet
* and the test
* lint
* and docs
* more stuff
* assert
* few more fixes
* test assert
2026-03-26 03:48:03 -04:00
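A hedged sketch of the convention change, assuming nothing beyond the title: a single DEV variable names the backend instead of one boolean flag per backend. getenv is spelled out with os.environ here rather than imported from tinygrad's helpers.
```
import os

def getenv(key, default=""):
  return os.environ.get(key, default)

# old style, now deprecated: one flag per backend
use_cuda = getenv("CUDA") == "1"   # CUDA=1 python3 train.py

# new style: a single variable naming the device
dev = getenv("DEV", "CPU")         # DEV=CUDA python3 train.py
print(dev)
```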
wozeparrot
da2031266a
llama: correct 8b init ( #15397 )
2026-03-24 13:41:41 -07:00
wozeparrot
87c4ec1724
llama: use flat llama ( #15353 )
2026-03-19 22:12:38 -07:00
wozeparrot
a191ac0566
llama: use mlperf model ( #15257 )
2026-03-13 08:08:32 -07:00
wozeparrot
4fab320abe
llama: clean ( #15224 )
2026-03-11 13:33:59 -07:00
wozeparrot
05d6d9120a
llama offload null ( #15222 )
2026-03-11 10:04:31 -07:00
wozeparrot
525a178966
llama: jit more ( #15199 )
2026-03-10 11:04:59 +08:00
wozeparrot
4544da1c54
llama3 fixes part3 ( #15152 )
2026-03-05 01:17:54 -08:00
wozeparrot
92c16810ac
feat: per device mem_used ( #15100 )
2026-03-03 01:31:28 -08:00
wozeparrot
824ba4386a
llama3 dp fix ( #15098 )
2026-03-02 22:43:07 -08:00
wozeparrot
a4f6365929
llama3: fstep takes grads ( #15069 )
2026-03-01 20:05:07 -08:00
wozeparrot
cfc5cf65ad
llama3: vocab padding fix + jit copies on fakedata ( #15067 )
2026-02-28 08:44:55 -08:00
wozeparrot
d941dd5aeb
llama3: pad vocab when mp sharding ( #14998 )
2026-02-25 00:04:06 -08:00
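A hedged sketch of why vocab padding helps model-parallel sharding: round the vocab dimension up to a multiple of the shard count so the embedding splits evenly across devices. pad_vocab and the sizes below are illustrative, not the actual llama3 code.
```
def pad_vocab(vocab_size, num_shards):
  # round up to the next multiple of num_shards
  return ((vocab_size + num_shards - 1) // num_shards) * num_shards

print(pad_vocab(128256, 8))  # 128256: already divisible, no padding needed
print(pad_vocab(128000, 6))  # 128004: each of 6 shards gets 21334 rows
```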
wozeparrot
e1c9985715
llama3: better time keeping ( #14999 )
2026-02-24 22:42:05 -08:00
wozeparrot
8d9545e09e
llama3: correctly shard wqkv ( #14978 )
2026-02-23 23:57:10 -08:00
wozeparrot
3cda781876
llama optim offload ( #14901 )
2026-02-21 08:53:45 -08:00
wozeparrot
95e97ec341
separate llama optim ( #14810 )
2026-02-17 13:02:35 -08:00
wozeparrot
4b5d3bda1f
llama3: data seed ( #14681 )
2026-02-11 19:04:40 -08:00
wozeparrot
a60220bed9
llama3: move dl to numpy & jit more ( #14677 )
...
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-02-10 18:16:40 -08:00
wozeparrot
4845e42135
llama3 gradacc fixes ( #14414 )
2026-01-28 19:12:39 -08:00
nimlgen
aec1ae0de1
llama: set manual_seed ( #14409 )
2026-01-28 14:40:00 -08:00
George Hotz
0c6b3f50aa
add marker to llama training ( #14401 )
2026-01-28 22:44:28 +08:00
wozeparrot
e496547720
llama3 gradacc ( #14291 )
2026-01-27 19:48:10 -08:00
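A toy sketch of gradient accumulation as a concept, not tinygrad's implementation: average the gradients of GRADIENT_ACC_STEPS micro-batches (the knob referenced further down this log), then take one optimizer step.
```
GRADIENT_ACC_STEPS = 4
w, lr, acc = 1.0, 0.1, 0.0

for step, x in enumerate([2.0, 4.0, 1.0, 3.0, 2.0, 4.0, 1.0, 3.0], 1):
  grad = 2 * (w * x - 1.0) * x        # gradient of a toy loss (w*x - 1)^2
  acc += grad / GRADIENT_ACC_STEPS    # accumulate the averaged micro-batch grad
  if step % GRADIENT_ACC_STEPS == 0:
    w -= lr * acc                     # one optimizer step per 4 micro-batches
    acc = 0.0
print(w)
```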
wozeparrot
963c59ebdb
fix: pull fixes from gradacc branch ( #14296 )
2026-01-22 23:07:54 -08:00
wozeparrot
c1d14ea832
llama8b train fixes ( #14264 )
2026-01-20 20:34:47 -08:00
b1tg
0fbc551622
train bert with fp8 ( #13874 )
...
* fp8 train
* clean
* lint
* test fix from #13439
* skip first/last layer
* rm __init__, restore unroll <=32 check
* tests
* clean test, remove unused
* multi-gpu test, clean quantize_to_fp8
* remove bert contiguous
* run script
* test: better check
* run script search
* add seed in bert data shuffle
* move script to mi350x folder
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-01-09 09:21:59 -05:00
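A hedged numpy sketch of per-tensor fp8 scaling in the spirit of the quantize_to_fp8 mentioned above; it only shows the scale/clip step, not the e4m3 bit rounding the real kernel performs.
```
import numpy as np

E4M3_MAX = 448.0  # largest finite value representable in fp8 e4m3

def quantize_to_fp8(x):
  scale = np.abs(x).max() / E4M3_MAX            # per-tensor scale factor
  q = np.clip(x / scale, -E4M3_MAX, E4M3_MAX)   # values now fit the fp8 range
  return q, scale                               # dequantize as q * scale

x = np.random.randn(4, 4).astype(np.float32) * 1000.0
q, s = quantize_to_fp8(x)
print(np.abs(q).max() <= E4M3_MAX, np.allclose(q * s, x))
```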
b1tg
241f0402b4
add seed in bert data shuffle ( #14054 )
2026-01-07 10:02:05 -05:00
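A minimal sketch of what a seeded shuffle buys: the same permutation on every run. shuffled_indices is illustrative, not the bert dataloader.
```
import random

def shuffled_indices(n, seed):
  rng = random.Random(seed)   # private RNG, independent of global state
  idx = list(range(n))
  rng.shuffle(idx)
  return idx

# identical order across runs (and across data-parallel ranks)
print(shuffled_indices(8, seed=42) == shuffled_indices(8, seed=42))  # True
```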
chenyu
da1cb6a9ec
update llama dataloader ( #13825 )
...
separate creating the dataset from iterating over it, so eval data is not re-created for each eval
2025-12-24 17:42:08 -05:00
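A hedged sketch of the split described above, with hypothetical names (make_eval_dataset and iterate are illustrative): the expensive dataset construction is hoisted out, so each eval only iterates.
```
def make_eval_dataset(n=8):
  # expensive one-time setup (file indexing, tokenization) lives here
  return list(range(n))

def iterate(dataset):
  # cheap per-eval pass over the already-built dataset
  yield from dataset

dataset = make_eval_dataset()      # before this change: rebuilt every eval
for eval_round in range(3):
  for sample in iterate(dataset):  # after: only the iteration repeats
    pass
```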
chenyu
903753c60c
llama wandb logging ( #13822 )
2025-12-24 10:24:59 -05:00
chenyu
27d899ce97
TRAIN=0 to only eval llama ( #13804 )
2025-12-22 11:55:46 -05:00
chenyu
39d962106f
update llama logging ( #13803 )
...
```
REWRITE_STACK_LIMIT=1000000 SMALL=1 BASEDIR=/raid/datasets/c4-8b SAMPLES=1000 BS=8 DP=8 DEFAULT_FLOAT=bfloat16 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=8B SEQLEN=1024 PYTHONPATH=. MODEL=llama3 python3 examples/mlperf/model_train.py
1 93.44 s run, 11.8750 loss, 0.000000000001 LR, 642.43 GB used, 19644.30 GFLOPS
2 101.78 s run, 11.8750 loss, 0.000000000001 LR, 1454.57 GB used, 17039.35 GFLOPS
3 7.34 s run, 11.8750 loss, 0.000000000002 LR, 1454.57 GB used, 236258.78 GFLOPS
4 4.32 s run, 11.8750 loss, 0.000000000002 LR, 1454.57 GB used, 401488.40 GFLOPS
5 4.36 s run, 11.9375 loss, 0.000000000003 LR, 1454.57 GB used, 398116.13 GFLOPS
6 4.32 s run, 11.8750 loss, 0.000000000003 LR, 1454.57 GB used, 401878.60 GFLOPS
7 4.34 s run, 11.8750 loss, 0.000000000004 LR, 1454.57 GB used, 399822.57 GFLOPS
8 4.35 s run, 11.8750 loss, 0.000000000004 LR, 1454.57 GB used, 398512.24 GFLOPS
9 4.36 s run, 11.8750 loss, 0.000000000005 LR, 1454.57 GB used, 397832.61 GFLOPS
10 4.40 s run, 11.8750 loss, 0.000000000005 LR, 1454.57 GB used, 394520.83 GFLOPS
```
2025-12-22 11:28:29 -05:00
chenyu
e428fbfab6
verify dtype of llama model params ( #13719 )
2025-12-16 12:32:02 -05:00
chenyu
6cad622f59
don't FREE_INTERMEDIATE in bert ( #13684 )
...
FREE_INTERMEDIATE hangs green HCQ consistently after an hour of training
2025-12-14 14:27:42 -05:00
chenyu
01e9ad0d52
clean up bert next_data ( #13650 )
...
the train iterator was designed to never stop, for both real and fake data
2025-12-11 22:56:28 -05:00
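A hedged sketch of the never-stopping train iterator described above; next_data here is a toy, not bert's actual function.
```
import itertools

def next_data(batches):
  # cycle forever so the train loop never hits StopIteration,
  # whether the batches are real or fake data
  yield from itertools.cycle(batches)

it = next_data([{"step": i} for i in range(4)])
for _ in range(10):
  batch = next(it)   # wraps around after the 4th batch
print(batch)
```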
chenyu
5034c6fb37
reenable FREE_INTERMEDIATE for bert ( #13639 )
...
* reenable FREE_INTERMEDIATE for bert
* comment
2025-12-10 12:08:09 -05:00
chenyu
2471b49e45
minor bert / llama change from grad acc branch ( #13622 )
...
* minor bert / llama change from grad acc branch
* revert those
2025-12-08 16:04:14 -05:00
chenyu
b981b6f89e
remove old llama grad_acc ( #13611 )
...
* remove old llama grad_acc
* GRADIENT_ACC_STEPS=1
2025-12-07 13:03:47 -05:00
chenyu
4562f217e1
more bert updates ( #13597 )
...
prep for split jit
also lower BS to 72
2025-12-06 08:32:43 -05:00