lvmin
2023-02-20 17:28:50 -08:00
parent 6a49452611
commit d0613d91bb


@@ -273,4 +273,4 @@ Because that "sudden converge" always happens, let's say "sudden converge" will h
In my experiments, (2) is usually better than (1). However, in real cases, you may need to balance the steps before and after the "sudden converge" yourself to find a good trade-off. The training after the "sudden converge" is also important.
But usually, if your logical batch size is already larger than 256, extending the batch size further is not very meaningful. In that case, it is probably better to train for more steps.
But usually, if your logical batch size is already larger than 256, extending the batch size further is not very meaningful. In that case, it is probably better to train for more steps. I tried some "common" logical batch sizes of 64, 96, and 128; it seems that many complicated conditions can already be handled very well.
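A minimal sketch of what "logical batch size" usually means here, assuming it is the per-step batch size multiplied by the number of gradient-accumulation steps (the function names and numbers below are illustrative, not from the original text):

```python
# Assumption: "logical batch size" = real (per-pass) batch size x number of
# gradient-accumulation passes before each optimizer step.

def accumulation_steps(logical_batch, real_batch):
    # How many forward/backward passes to accumulate before one update.
    assert logical_batch % real_batch == 0
    return logical_batch // real_batch

def sgd_step_accumulated(w, lr, micro_batch_grads):
    # Average the gradients from the accumulated micro-batches, then apply
    # a single SGD update -- numerically equivalent to one step on the
    # full logical batch (for losses that average over samples).
    g = sum(micro_batch_grads) / len(micro_batch_grads)
    return w - lr * g

steps = accumulation_steps(256, 16)              # 16 passes per optimizer step
w_new = sgd_step_accumulated(1.0, 0.1, [0.5, 1.5])  # averaged grad = 1.0
```

So with a real batch of 16 that fits in memory, 16 accumulation passes give a logical batch of 256; dropping to logical batches of 64/96/128 simply means fewer accumulation passes per update.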