set up for rev 5.1

Alex O'Connell
2023-10-16 00:01:38 -04:00
parent 01cef3dcd9
commit 82b7930624
2 changed files with 10 additions and 7 deletions


@@ -43,18 +43,21 @@ rev 4.2 - yeah nah it's the pad token
- batch size 2
rev 5 - new dataset
- 4 epochs
- 3 epochs (4th epoch was overfit)
- train ctx 512
- batch size 2
- learning rate cosine 1e-5
- actually stops generating text. not at the right... place but still!
- messing with temperature makes it generate some interesting output.
TODO:
rev 5.1 - gradient accumulation test
- 4 epochs
- 3 epochs
- train ctx 512
- batch size 8
- learning rate cosine 1e-5
- learning rate cosine 5e-6 (see the sketch below)
Ideas:
- get rid of services block. will it just learn it on its own?
- figure out how to penalize the wrong device name more?
- need to make the device name/description and device ID match less in the examples.
- it is learning to take the name of the device in the service call block from the description, not the states block

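The rev 5.1 notes above call for a cosine learning-rate schedule at 5e-6. A minimal sketch of one way to set that up with torch.optim.lr_scheduler.CosineAnnealingLR; the model, optimizer, step count, and the choice of treating 5e-6 as the peak rate (the notes don't say whether it is the peak or the floor) are assumptions, not taken from this repo:

import torch

# Assumed values: the notes give "cosine 5e-6"; this sketch treats it as the
# starting (peak) rate and decays it toward 0 over the run.
LEARNING_RATE_START = 5e-6
TOTAL_STEPS = 1000  # placeholder; in practice roughly len(dataloader) * epochs

model = torch.nn.Linear(8, 8)  # stand-in for the fine-tuned model
optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE_START)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=TOTAL_STEPS)

for step in range(TOTAL_STEPS):
    loss = model(torch.randn(2, 8)).pow(2).mean()  # dummy loss for the sketch
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # lr follows a cosine curve from 5e-6 down toward 0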

@@ -9,7 +9,7 @@ torch.set_default_device("cuda")
torch.set_default_tensor_type('torch.cuda.FloatTensor')
TRAIN_CTX_SIZE = 512 # The number of tokens to pad + truncate the input examples to
BATCH_SIZE = 2 # The simulated "batch size" that we will train on. will tweak gradient accumulation steps
BATCH_SIZE = 8 # The simulated "batch size" that we will train on. will tweak gradient accumulation steps
MICRO_BATCH_SIZE = 2 # The actual batch size that will fit into VRAM on this machine
TRAINING_EPOCHS = 4 # The number of times to train the model on each example
LEARNING_RATE_START = 1e-5 # The starting learning rate (speed at which the model trains)
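For the gradient accumulation test itself, BATCH_SIZE is the effective batch the config simulates and MICRO_BATCH_SIZE is what actually fits in VRAM, so the optimizer steps once every BATCH_SIZE / MICRO_BATCH_SIZE = 4 micro-batches. A minimal sketch of that loop; the model, optimizer, and fake data below are placeholders, not this repo's training code:

import torch

BATCH_SIZE = 8        # effective (simulated) batch size from the config above
MICRO_BATCH_SIZE = 2  # micro-batch that actually fits in VRAM
ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE  # 4 micro-batches per optimizer step

model = torch.nn.Linear(16, 2)  # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
loss_fn = torch.nn.CrossEntropyLoss()

# Fake loader yielding micro-batches; a real run would iterate the tokenized dataset.
dataloader = [
    (torch.randn(MICRO_BATCH_SIZE, 16), torch.randint(0, 2, (MICRO_BATCH_SIZE,)))
    for _ in range(2 * ACCUMULATION_STEPS)
]

optimizer.zero_grad()
for i, (inputs, labels) in enumerate(dataloader):
    loss = loss_fn(model(inputs), labels)
    # Scale so the accumulated gradients match one full BATCH_SIZE batch.
    (loss / ACCUMULATION_STEPS).backward()
    if (i + 1) % ACCUMULATION_STEPS == 0:
        optimizer.step()       # apply one "simulated" batch worth of gradients
        optimizer.zero_grad()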