wozeparrot
|
b979162c5d
|
llama3 eval train (#11706)
|
2025-08-20 19:56:35 -04:00 |
|
chenyu
|
dbd3b67657
|
clamp GRAD_CLIP_NORM in llama (#11761)
|
2025-08-20 19:55:50 -04:00 |
|
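A minimal sketch of what clamping a global gradient norm looks like in tinygrad-style Python; `clip_grad_norm`, `params`, and the mapping of GRAD_CLIP_NORM to `clip_norm` are illustrative assumptions, not the repo's actual API:

```python
from tinygrad import Tensor

def clip_grad_norm(params: list[Tensor], clip_norm: float = 1.0) -> Tensor:
  # global L2 norm across every parameter's gradient
  global_norm = Tensor.stack(*[p.grad.square().sum() for p in params]).sum().sqrt()
  # cap the scale factor at 1 so small gradients pass through unscaled
  scale = (clip_norm / (global_norm + 1e-6)).minimum(1.0)
  for p in params: p.grad = p.grad * scale
  return global_norm
```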
chenyu
|
e9d0027591
|
llama MP realize weight after shard (#11672)
* llama MP realize weight after shard
prevents a memory spike on device 0
* empty weight for FAKEDATA
|
2025-08-14 16:17:46 -04:00 |
|
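A minimal sketch of the realize-after-shard ordering this commit adopts, assuming tinygrad's `Tensor.shard` API; the device names and weight shape are illustrative:

```python
from tinygrad import Tensor

GPUS = tuple(f"AMD:{i}" for i in range(8))  # illustrative device names

w = Tensor.empty(4096, 4096)  # weight is still unrealized here
# shard first, realize second: each device materializes only its slice,
# instead of the full tensor landing on device 0 before being split
w = w.shard(GPUS, axis=0).realize()
```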
chenyu
|
ef17af85c6
|
remove .float call in llama logit (#11598)
* remove .float call in llama logit
* bfloat item
|
2025-08-10 00:02:18 -04:00 |
|
chenyu
|
45baec1aab
|
model parallel llama (#11588)
`MP=8 GRADIENT_ACC_STEPS=3 BS=1 DEFAULT_FLOAT=bfloat16 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=70B SEQLEN=512 PYTHONPATH=. MODEL=llama3 python3 examples/mlperf/model_train.py`
|
2025-08-09 16:54:27 -04:00 |
|
chenyu
|
702e38dc19
|
remove FUSE_ARANGE_UINT (#11567)
also add IGNORE_OOB=1 to bert runs. lowered BS on tinybox to 90 since 96 OOMs during eval without a reset
|
2025-08-07 16:49:06 -04:00 |
|
wozeparrot
|
7ae4335127
|
feat: generate blend index (#11566)
|
2025-08-07 14:20:28 -04:00 |
|
wozeparrot
|
2d5bdc939d
|
faster llama3 dataloader (#11540)
|
2025-08-06 18:25:57 -04:00 |
|
chenyu
|
f7965f85aa
|
Revert "feat: faster index building (#11462)" (#11478)
This reverts commit 3a4deb08d2.
|
2025-08-02 12:50:48 -04:00 |
|
wozeparrot
|
3a4deb08d2
|
feat: faster index building (#11462)
* feat: faster index building
* feat: correct training samples
|
2025-08-02 11:50:18 -04:00 |
|
chenyu
|
9e8e6b45ab
|
grad acc train llama (#11467)
* grad acc train llama
* log step time
|
2025-08-01 15:54:50 -04:00 |
|
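A minimal sketch of the gradient-accumulation loop this adds; `model`, `opt`, and `get_batch` are placeholder names rather than the training script's real interface:

```python
GRADIENT_ACC_STEPS = 3

opt.zero_grad()
for _ in range(GRADIENT_ACC_STEPS):
  x, y = get_batch()  # one micro-batch
  # scale each loss so the summed gradients match one big-batch step
  loss = model(x).sparse_categorical_crossentropy(y) / GRADIENT_ACC_STEPS
  loss.backward()     # gradients accumulate into each parameter's .grad
opt.step()
```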
chenyu
|
7ad7329257
|
data parallel train llama (#11466)
|
2025-08-01 12:13:51 -04:00 |
|
George Hotz
|
8ff03806e8
|
add llama layers (#11460)
* add llama layers
* add contiguous backward for speed
|
2025-07-31 16:28:04 -07:00 |
|
wozeparrot
|
6252f7770e
|
feat: fake data (#11447)
|
2025-07-30 17:18:20 -07:00 |
|
chenyu
|
e300451f3a
|
update llama3 (#11446)
`LR=1e-4 TRAIN_ON_VAL=1 DEFAULT_FLOAT=bfloat16 FUSE_ARANGE=1 JITBEAM=2 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=1B WARMUP_STEPS=36 DECAY_STEPS=360 SEQLEN=512 PYTHONPATH=. AMD=1 AMD_LLVM=0 MODEL=llama3 python3 examples/mlperf/model_train.py` trained to 7
|
2025-07-30 19:34:21 -04:00 |
|
wozeparrot
|
5fb975351a
|
feat: flag for training on val (#11441)
|
2025-07-30 14:29:45 -07:00 |
|
wozeparrot
|
825b6a2505
|
feat: llama3 dataloader (#11340)
|
2025-07-30 13:27:55 -07:00 |
|
chenyu
|
c14c9a8eff
|
llama3 grad clip (#11003)
|
2025-06-27 19:14:12 -04:00 |
|
chenyu
|
f2548afeb5
|
bert grad clipping start with const 0 (#11008)
saved the init kernels
|
2025-06-27 18:02:23 -04:00 |
|
chenyu
|
6ab5a5cb6c
|
llama3 mlperf train (#10983)
work in progress. it can now overfit small examples and VRAM usage roughly matches
|
2025-06-26 20:24:27 -04:00 |
|
chenyu
|
8751d47985
|
CosineAnnealingLRWithWarmup (#10981)
|
2025-06-25 17:45:21 -04:00 |
|
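The schedule named here pairs a linear warmup with cosine decay; a minimal standalone sketch, where the function signature is an assumption rather than the class's actual interface:

```python
import math

def lr_at(step: int, base_lr: float, warmup_steps: int, decay_steps: int, end_lr: float = 0.0) -> float:
  if step < warmup_steps:
    # linear warmup from 0 up to base_lr
    return base_lr * (step + 1) / warmup_steps
  # cosine anneal from base_lr down to end_lr over the remaining steps
  progress = min(step - warmup_steps, decay_steps - warmup_steps) / max(decay_steps - warmup_steps, 1)
  return end_lr + 0.5 * (base_lr - end_lr) * (1 + math.cos(math.pi * progress))
```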
chenyu
|
efad567ebd
|
ruff check whole examples/mlperf/ (#10979)
|
2025-06-25 12:57:48 -04:00 |
|
chenyu
|
0480139def
|
log_perplexity metrics (#10912)
|
2025-06-21 10:44:47 -04:00 |
|
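Log perplexity is the mean per-token cross-entropy, so perplexity = exp(log_perplexity); a minimal sketch, with the tensor shapes as assumptions:

```python
from tinygrad import Tensor

def log_perplexity(logits: Tensor, targets: Tensor) -> Tensor:
  # logits: (batch, seqlen, vocab); targets: (batch, seqlen) of token ids
  # flatten to (batch*seqlen, vocab) and take the mean negative log-likelihood
  return logits.reshape(-1, logits.shape[-1]).sparse_categorical_crossentropy(targets.reshape(-1))
```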
chenyu
|
62a540066e
|
remove DEBUG=2 in mi300x bert setup (#10886)
seems fine now, not sure what the issue was
|
2025-06-19 13:28:53 -04:00 |
|
chenyu
|
f377cc19cd
|
use AM for bert (#10882)
have trained 3 runs and all seem fine
|
2025-06-19 09:48:54 -04:00 |
|
chenyu
|
b70c7d3631
|
bert grad accumulation (#10863)
* bert grad accumulation
* realize grad
|
2025-06-18 12:17:07 -04:00 |
|
chenyu
|
075a74cf25
|
add global_batch_size to mlperf bert (#10852)
global_batch_size = grad_acc_steps * batch_size. no-op change to prep grad acc for bert
|
2025-06-17 17:54:15 -04:00 |
|
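A worked instance of the relationship stated above, with illustrative numbers:

```python
grad_acc_steps, batch_size = 3, 32               # illustrative values
global_batch_size = grad_acc_steps * batch_size  # 96 samples per optimizer step
```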
chenyu
|
81e296d7b8
|
remove Tensor.test() in retinanet (#10770)
test was removed
|
2025-06-10 22:14:57 -04:00 |
|
George Hotz
|
32e9949052
|
rename lazydata to uop (#10698)
|
2025-06-08 08:42:22 -07:00 |
|
chenyu
|
4ab3391e6f
|
set -o pipefail for mlperf run_and_time (#10577)
also runs the 5.1 script in the CI cron job
|
2025-05-30 16:36:44 -04:00 |
|
chenyu
|
baf482d314
|
copy mlperf stuff to 5.1 (#10576)
5.0 is finalized, new changes go to 5.1
|
2025-05-30 16:12:39 -04:00 |
|
George Hotz
|
b3b43a82c4
|
remove Tensor.no_grad, it's meaningless now [pr] (#10556)
|
2025-05-28 22:20:02 -07:00 |
|
chenyu
|
74cf5dbd9e
|
mlperf system updates (#10550)
standardized processor and accelerator names
|
2025-05-28 16:15:46 -04:00 |
|
chenyu
|
51dc7eedb0
|
correct use AM for resnet run_and_time (#10524)
|
2025-05-26 15:33:11 -04:00 |
|
chenyu
|
c1919ad55f
|
use AM for resnet run_and_time (#10523)
|
2025-05-26 14:50:49 -04:00 |
|
chenyu
|
2d50efb92b
|
set -e on mlperf run_and_time scripts (#10519)
|
2025-05-26 09:22:30 -04:00 |
|
chenyu
|
dc6309242d
|
WallTimeEvent for mlperf ci (#10506)
|
2025-05-24 10:56:03 -04:00 |
|
chenyu
|
67d1364106
|
update LOGMLPERF in red resnet run_and_time (#10416)
|
2025-05-19 13:23:33 -04:00 |
|
chenyu
|
485e80da69
|
run_and_time for resnet ci (#10405)
|
2025-05-18 23:39:57 -04:00 |
|
wozeparrot
|
1ed04f993b
|
move benchmark stat tracking to influxdb (#10185)
|
2025-05-15 16:14:56 -07:00 |
|
George Hotz
|
568d6d96e7
|
small changes from new multi [pr] (#10318)
|
2025-05-14 20:50:59 -07:00 |
|
George Hotz
|
bfc30fa6ea
|
hotfix: typo in shm_name
|
2025-05-14 19:34:52 -07:00 |
|
George Hotz
|
2bc54b3e22
|
manually handle OSX
|
2025-05-14 19:17:51 -07:00 |
|
George Hotz
|
ab460486d7
|
Revert "resnet dataloader osx (#10316)"
This reverts commit aef336930a.
|
2025-05-14 19:15:07 -07:00 |
|
George Hotz
|
aef336930a
|
resnet dataloader osx (#10316)
* mlperf dataloader on mac
* resnet dataloader [pr]
* simple should work
|
2025-05-14 18:31:26 -07:00 |
|
chenyu
|
610ee79b22
|
cherry pick mlperf5.0 branch to master (#10089)
|
2025-04-28 15:36:56 -04:00 |
|
chenyu
|
74c6cf8be3
|
lint mlperf model_train (#10038)
|
2025-04-24 16:19:44 -04:00 |
|
chenyu
|
a25abf55e3
|
retinanet only call postprocess_detections with RUNMLPERF (#10017)
during setup we only need to compile `_eval_step().numpy()`
|
2025-04-23 20:45:38 -04:00 |
|
chenyu
|
65faa1d94b
|
explicit device in mlperf scripts (#10015)
|
2025-04-23 17:11:52 -04:00 |
|
chenyu
|
a3f938dbee
|
remove retinanet INITMLPERF from beam script (#10011)
it only controls logging; whether real data is loaded is controlled solely by RUNMLPERF
|
2025-04-23 14:32:54 -04:00 |
|