Mirror of https://github.com/acon96/home-llm.git (synced 2026-01-09 21:58:00 -05:00)

Commit: set up for rev 5.1
@@ -43,18 +43,21 @@ rev 4.2 - yeah nah it's the pad token
   - batch size 2
 
 rev 5 - new dataset
-  - 4 epochs
+  - 3 epochs (4th epoch was overfit)
   - train cx 512
   - batch size 2
   - learning rate cosine 1e-5
+  - actually stops generating text. not at the right... place but still!
+  - messing with temperature makes it generate some interesting output.
 
-
+TODO:
 rev 5.1 - gradient accumulation test
-  - 4 epochs
+  - 3 epochs
   - train cx 512
   - batch size 8
-  - learning rate cosine 1e-5
+  - learning rate cosine 5e-6
 
 Ideas:
-  - get rid of services block. will it just learn it on its own?
   - figure out how to penalize the wrong device name more?
+  - need to make the device name/description and device ID match less in the examples.
+  - it is learning to take the name of the device in the service call block from the description, not the states block
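Rev 4.2's note ("yeah nah it's the pad token") and rev 5's "actually stops generating text" are two sides of the same issue: if the pad token used for batching doubles as, or masks out, the EOS token, the model never learns where to stop. A minimal sketch of the usual fix, assuming a Hugging Face tokenizer; the model name and strings below are illustrative, not taken from this repo:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # hypothetical base model

# Register a dedicated pad token so padding can be masked out of the loss
# while EOS still receives gradient. The model's embedding table must then
# be resized to match: model.resize_token_embeddings(len(tokenizer)).
tokenizer.add_special_tokens({"pad_token": "<pad>"})

# Append EOS explicitly so every example teaches the model where to stop.
encoded = tokenizer(
    "turn on the kitchen light" + tokenizer.eos_token,
    padding="max_length", max_length=512, truncation=True,
)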
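The temperature observation is standard sampling behavior: logits are divided by the temperature before the softmax, so values below 1 sharpen the next-token distribution and values above 1 flatten it, which is why nudging it produces noticeably different output. A toy illustration in plain PyTorch:

import torch

logits = torch.tensor([2.0, 1.0, 0.5])        # example next-token logits
for temperature in (0.7, 1.0, 1.5):
    probs = torch.softmax(logits / temperature, dim=-1)
    print(temperature, probs)                  # lower T -> peakier, higher T -> flatter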
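"learning rate cosine 1e-5" (dropped to 5e-6 for rev 5.1) describes a cosine decay schedule starting from LEARNING_RATE_START. A minimal sketch in plain PyTorch; the model, optimizer, and step count are placeholders, since train.py's training loop is not part of this diff:

import torch

model = torch.nn.Linear(8, 8)  # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # LEARNING_RATE_START

total_steps = 1000  # placeholder: really steps_per_epoch * TRAINING_EPOCHS
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps)

for step in range(total_steps):
    optimizer.step()    # loss.backward() omitted from this sketch
    scheduler.step()    # LR follows a cosine curve from 1e-5 toward 0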
train.py (2 lines changed)
@@ -9,7 +9,7 @@ torch.set_default_device("cuda")
 torch.set_default_tensor_type('torch.cuda.FloatTensor')
 
 TRAIN_CTX_SIZE = 512    # The number of tokens to pad + truncate the input examples to
-BATCH_SIZE = 2          # The simulated "batch size" that we will train on; will tweak gradient accumulation steps
+BATCH_SIZE = 8          # The simulated "batch size" that we will train on; will tweak gradient accumulation steps
 MICRO_BATCH_SIZE = 2    # The actual batch size that will fit into VRAM on this machine
 TRAINING_EPOCHS = 4     # The number of times to train the model on each example
 LEARNING_RATE_START = 1e-5  # The starting learning rate (speed at which the model trains)
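The BATCH_SIZE / MICRO_BATCH_SIZE split above is exactly what the "gradient accumulation test" exercises: with BATCH_SIZE = 8 and MICRO_BATCH_SIZE = 2, gradients from 4 micro-batches are summed before each optimizer step, simulating a batch of 8 at the VRAM cost of 2. A self-contained sketch of the pattern; the model and data are stand-ins, since train.py's actual loop is not shown in this diff:

import torch

model = torch.nn.Linear(16, 1)  # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
micro_batches = [(torch.randn(2, 16), torch.randn(2, 1)) for _ in range(8)]

BATCH_SIZE = 8        # simulated batch size (the value this commit changes 2 -> 8)
MICRO_BATCH_SIZE = 2  # what actually fits in VRAM
accum_steps = BATCH_SIZE // MICRO_BATCH_SIZE  # 4 backward passes per update

optimizer.zero_grad()
for i, (x, y) in enumerate(micro_batches):
    # Scale each micro-loss so the 4 accumulated gradients sum to the
    # gradient one batch of 8 would have produced.
    loss = torch.nn.functional.mse_loss(model(x), y) / accum_steps
    loss.backward()   # gradients accumulate in .grad across calls
    if (i + 1) % accum_steps == 0:
        optimizer.step()        # one update per simulated batch of 8
        optimizer.zero_grad()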