lvmin
2023-02-20 17:28:50 -08:00
parent 6a49452611
commit d0613d91bb


@@ -273,4 +273,4 @@ Because that "sudden converge" always happens, let's say "sudden converge" will h
In my experiments, (2) is usually better than (1). However, in real cases, you may need to balance the steps before and after the "sudden converge" yourself to find a good trade-off. The training after the "sudden converge" is also important.
But usually, if your logical batch size is already larger than 256, extending the batch size further is not very meaningful. In that case, it is probably better to train for more steps.
But usually, if your logical batch size is already larger than 256, extending the batch size further is not very meaningful. In that case, it is probably better to train for more steps. I tried some "common" logical batch sizes of 64, 96, and 128; it seems that many complicated conditions can already be handled very well.
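A minimal sketch of what "logical batch size" usually means here, assuming it is the per-step batch size multiplied by the number of gradient-accumulation steps (the function names and numbers below are illustrative, not from the original text):

```python
# Assumption: "logical batch size" = real (per-pass) batch size x number of
# gradient-accumulation passes before each optimizer step.

def accumulation_steps(logical_batch, real_batch):
    # How many forward/backward passes to accumulate before one update.
    assert logical_batch % real_batch == 0
    return logical_batch // real_batch

def sgd_step_accumulated(w, lr, micro_batch_grads):
    # Average the gradients from the accumulated micro-batches, then apply
    # a single SGD update -- numerically equivalent to one step on the
    # full logical batch (for losses that average over samples).
    g = sum(micro_batch_grads) / len(micro_batch_grads)
    return w - lr * g

steps = accumulation_steps(256, 16)              # 16 passes per optimizer step
w_new = sgd_step_accumulated(1.0, 0.1, [0.5, 1.5])  # averaged grad = 1.0
```

So with a real batch of 16 that fits in memory, 16 accumulation passes give a logical batch of 256; dropping to logical batches of 64/96/128 simply means fewer accumulation passes per update.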