set up for rev 5.1
@@ -43,18 +43,21 @@ rev 4.2 - yeah nah it's the pad token
- batch size 2

rev 5 - new dataset
- 4 epochs
- 3 epochs (4th epoch was overfit)
- train ctx 512
- batch size 2
- learning rate cosine 1e-5
- actually stops generating text. not at the right... place but still!
- messing with temperature makes it generate some interesting output.
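The temperature observation above can be reproduced with a quick sampling loop. A minimal sketch, assuming a Hugging Face transformers checkpoint; the model path and prompt are placeholders, not from this repo:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# placeholders, not from this repo
model_path = "./models/home-llm-rev5"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path).to("cuda")

prompt = "turn on the kitchen light"  # hypothetical request
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# lower temperature -> more deterministic output; higher -> more varied
for temperature in (0.1, 0.7, 1.2):
    output = model.generate(
        **inputs,
        do_sample=True,
        temperature=temperature,
        max_new_tokens=64,
    )
    print(temperature, tokenizer.decode(output[0], skip_special_tokens=True))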
TODO:
rev 5.1 - gradient accumulation test
- 4 epochs
- 3 epochs
- train ctx 512
- batch size 8
- learning rate cosine 1e-5

- learning rate cosine 5e-6
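For reference, a sketch of what the cosine schedule above looks like when set up manually with transformers' get_cosine_schedule_with_warmup; the step count and zero warmup are assumptions, and the repo may instead rely on the Trainer's lr_scheduler_type. The two planned runs would only differ in LEARNING_RATE_START:

import torch
from transformers import get_cosine_schedule_with_warmup

LEARNING_RATE_START = 1e-5   # or 5e-6 for the second rev 5.1 run
TOTAL_STEPS = 1000           # hypothetical number of optimizer steps
WARMUP_STEPS = 0             # assumption: no warmup

params = [torch.nn.Parameter(torch.zeros(1))]  # stand-in for model.parameters()
optimizer = torch.optim.AdamW(params, lr=LEARNING_RATE_START)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=WARMUP_STEPS, num_training_steps=TOTAL_STEPS
)

for step in range(TOTAL_STEPS):
    # forward/backward would go here
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()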
Ideas:
- get rid of services block. will it just learn it on its own?
- figure out how to penalize the wrong device name more? (one possible approach sketched below)
- need to make the device name/description and device ID match less in the examples.
- it is learning to take the name of the device in the service call block from the description, not the states block
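One possible way to attack the "penalize the wrong device name more" idea (purely a sketch, not code from this repo): upweight the loss on the tokens that form the device ID inside the service call block. The mask construction is left out and the weight value is a guess:

import torch
import torch.nn.functional as F

def weighted_lm_loss(logits, labels, device_token_mask, device_weight=3.0):
    # logits: (batch, seq, vocab); labels: (batch, seq) with -100 on ignored tokens
    # device_token_mask: (batch, seq) bool, True where the label is part of a device ID
    # shift so tokens < n predict token n, as in standard causal LM training
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    shift_mask = device_token_mask[:, 1:].contiguous()

    loss = F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        reduction="none",
        ignore_index=-100,
    )
    weights = 1.0 + (device_weight - 1.0) * shift_mask.view(-1).float()
    valid = (shift_labels.view(-1) != -100).float()
    return (loss * weights * valid).sum() / (weights * valid).sum()

If training goes through the HF Trainer, this would be dropped into a compute_loss override.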
train.py
@@ -9,7 +9,7 @@ torch.set_default_device("cuda")
torch.set_default_tensor_type('torch.cuda.FloatTensor')
TRAIN_CTX_SIZE = 512 # The number of tokens to pad + truncate the input examples to
-BATCH_SIZE = 2 # The simulated "batch size" that we will train on. will tweak gradient accumulation steps
+BATCH_SIZE = 8 # The simulated "batch size" that we will train on. will tweak gradient accumulation steps
MICRO_BATCH_SIZE = 2 # The actual batch size that will fit into VRAM on this machine
TRAINING_EPOCHS = 4 # The number of times to train the model on each example
LEARNING_RATE_START = 1e-5 # The starting learning rate (speed at which the model trains)
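The diff above bumps the simulated batch size from 2 to 8 while the micro batch stays at 2, i.e. 8 // 2 = 4 gradient accumulation steps per optimizer step. A sketch of how those constants could be wired into Hugging Face TrainingArguments (assuming the HF Trainer is used; output_dir is a placeholder):

from transformers import TrainingArguments

TRAIN_CTX_SIZE = 512
BATCH_SIZE = 8        # simulated batch size per optimizer step
MICRO_BATCH_SIZE = 2  # what actually fits in VRAM per forward pass
TRAINING_EPOCHS = 4
LEARNING_RATE_START = 1e-5

# accumulate 4 micro-batches of 2 to simulate a batch of 8
gradient_accumulation_steps = BATCH_SIZE // MICRO_BATCH_SIZE

training_args = TrainingArguments(
    output_dir="./output",  # placeholder
    per_device_train_batch_size=MICRO_BATCH_SIZE,
    gradient_accumulation_steps=gradient_accumulation_steps,
    num_train_epochs=TRAINING_EPOCHS,
    learning_rate=LEARNING_RATE_START,
    lr_scheduler_type="cosine",
)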