b1tg | 0fbc551622 | 2026-01-09 09:21:59 -05:00
train bert with fp8 (#13874)
* fp8 train
* clean
* lint
* test fix from #13439
* skip first/last layer
* rm __init__, restore unroll <=32 check
* tests
* clean test, remove unused
* multi-gpu test, clean quantize_to_fp8
* remove bert contiguous
* run script
* test: better check
* run script search
* add seed in bert data shuffle
* move script to mi350x folder
Co-authored-by: chenyu <chenyu@fastmail.com>

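The fp8 commit above mentions a `quantize_to_fp8` helper. As a rough illustration of what per-tensor scaled e4m3 quantization involves, here is a hypothetical pure-Python sketch — the function name echoes the commit message, but the signature, the per-tensor scaling scheme, and the normal-range-only rounding are all assumptions, not tinygrad's actual implementation:

```python
import math

E4M3_MAX = 448.0  # largest finite value in the OCP fp8 e4m3 format

def round_to_e4m3(x: float) -> float:
    """Round a float to the nearest e4m3-representable value (normal range only; sketch)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    x = min(abs(x), E4M3_MAX)          # saturate instead of overflowing to NaN
    m, e = math.frexp(x)               # x = m * 2**e with m in [0.5, 1)
    q = round(m * 16) / 16             # keep 4 bits of mantissa (1 implicit + 3 stored)
    out = sign * math.ldexp(q, e)
    return min(max(out, -E4M3_MAX), E4M3_MAX)

def quantize_to_fp8(t):
    """Per-tensor scaled quantization: map t into e4m3 range, return (values, scale)."""
    amax = max(abs(v) for v in t) or 1.0
    scale = amax / E4M3_MAX
    return [round_to_e4m3(v / scale) for v in t], scale
```

Real fp8 training additionally handles subnormals and typically keeps numerically sensitive layers in higher precision, consistent with the "skip first/last layer" item in the commit.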
chenyu | 7cd7593c5d | 2025-12-17 16:54:04 -05:00
add script to train bert on mi350x (#13743)
adapted from mi300 config

chenyu | 4562f217e1 | 2025-12-06 08:32:43 -05:00
more bert updates (#13597)
prep split jit
also lower BS to 72

chenyu | 74db65cf72 | 2025-11-02 15:26:37 -05:00
update mlperf bert LOGMLPERF (#13065)

chenyu | 70dd297a05 | 2025-10-14 09:07:43 -04:00
BS=96 for bert (#12675)
96 trains fine now

chenyu | 77b5e6774e | 2025-10-13 15:03:47 -04:00
fix bert training config (#12647)
FREE_INTERMEDIATE=0 REWRITE_STACK_LIMIT=500000

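The config fix above pins two environment-variable flags. Flags of this kind are usually read through a small integer-parsing getenv helper; a minimal sketch of that pattern is below (the helper, the default values, and the module-level reads are illustrative assumptions, not tinygrad's actual code):

```python
import functools
import os

@functools.lru_cache(maxsize=None)
def getenv(key: str, default=0):
    """Read a config flag from the environment, coerced to the default's type."""
    return type(default)(os.getenv(key, default))

# the two flags pinned by the commit above (defaults here are placeholders)
FREE_INTERMEDIATE = getenv("FREE_INTERMEDIATE", 1)
REWRITE_STACK_LIMIT = getenv("REWRITE_STACK_LIMIT", 10000)
```

With this pattern, the commit's fix amounts to launching training as `FREE_INTERMEDIATE=0 REWRITE_STACK_LIMIT=500000 python train.py` rather than changing code.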
chenyu | 0f776c6e46 | 2025-10-13 09:58:25 -04:00
examples/mlperf/training_submission_v6.0 (#12644)
copied from v5.1