Elias Wahl
d2e3c391e8
Residual in MLM loss + Change default steps (#4935)
* Residual in mlm loss
* Reduce default steps to 160K * 24
* oops
* comment
2024-06-12 16:09:18 -04:00

Elias Wahl
04e237328b
Refactor to class style (#4804)
2024-06-04 14:08:31 -07:00

chenyu
31358cbea5
change Tensor.stack to method (#4719)
2024-05-24 17:04:19 -04:00

wozeparrot
d2c347fc74
faster gather for bert (#4526)
2024-05-10 22:28:48 -07:00

Elias Wahl
27613dd881
MLPerf BERT: Main training loop (#4288)
* BERT language modeling head + trunc normal initializers
* add train loop + helpers
* shuffle in dataloaders + slight changes in main loop
* beam change
* Minor changes
* random.shuffle
* HParam update
* Use deque for dataloader
* wandb bert project name
* half fixes
* BENCHMARK + remove epoch
* cast + print()
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-29 14:35:27 -04:00

Elias Wahl
2ecd61e3e2
monkey patching (#4214)
2024-04-18 19:20:52 -04:00

Elias Wahl
7db6dd725d
multilazybuffer fix (#3609)
2024-03-04 17:36:23 -05:00

George Hotz
d87a246439
move to new cached fetch (#2493)
* move to new cached fetch
* extra.utils is over
* loads
* bump download cache
* bump timeout
2023-11-28 17:36:55 -08:00

George Hotz
0cbf6c1811
move things, clean up extra (#2292)
* move things
* idk why pylint needs that now
* delete unused
2023-11-13 20:18:40 -08:00