# StableLM 2 Zephyr 1.6B

## rev1

- 1 epoch
- 2048 train ctx
- batch size 8
- learning rate 1e-5
- weight decay 0.1
- gradient clipping 1.0
- dataset size: small

+ honestly it works decently, and I was even somewhat able to get it to respond in German
+ evaluation results: 0.7108953613807982
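
For reference, a minimal sketch of how the rev1 hyperparameters above could map onto a HuggingFace `TrainingArguments` setup. This is illustrative, not the repo's actual training script; the output path is a placeholder.

```python
# Hypothetical mapping of the rev1 hyperparameters onto HuggingFace
# TrainingArguments; output_dir is a placeholder, not a path from this repo.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="stablelm-2-zephyr-1_6b-rev1",  # placeholder
    num_train_epochs=1,                # 1 epoch
    per_device_train_batch_size=8,     # batch size 8
    learning_rate=1e-5,                # learning rate 1e-5
    weight_decay=0.1,                  # weight decay 0.1
    max_grad_norm=1.0,                 # gradient clipping 1.0
)
# The 2048 train ctx would be enforced at tokenization time, e.g.
# tokenizer(text, truncation=True, max_length=2048).
```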
## rev2

- dataset size: large (also slightly rewrote how it works)

+ evaluation results:
  - 600: 0.7826321467098166
  - 800: 0.8090614886731392
  - 1000: 0.7669902912621359
  - 1200: 0.7944983818770227
  - 1400: 0.8176914778856527
  - 1600: 0.8268608414239482
  - 1800: 0.8263214670981661
  - Final: 0.8274002157497303
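
The step-indexed numbers come from scoring intermediate checkpoints. A hedged sketch of scheduling that with the HF Trainer every 200 steps (matching the spacing above); the repo's actual eval harness is custom, so this only shows the scheduling side.

```python
# Hypothetical: evaluate/save every 200 steps so each saved checkpoint
# gets a score like the 600/800/.../1800 entries above.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",        # placeholder
    eval_strategy="steps",   # named "evaluation_strategy" on older transformers
    eval_steps=200,          # matches the 200-step spacing above
    save_strategy="steps",
    save_steps=200,
)
```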
# StableLM Zephyr 3B

## rev1

- 1 epoch
- 2048 train ctx
- batch size 8
- learning rate 1e-5
- weight decay 0.1
- gradient clipping 1.0
- lora rank: 32, alpha: 64
- accidentally forgot to turn off fine-tuning of the embeddings (see the LoRA sketch below)
- dataset size: large

+ evaluation results:
  - 400: 0.8344
  - 800: 0.9228694714131608
  - 1200: 0.9401294498381877
  - 1600: 0.95361380798274
  - Final (1929): 0.9492988133764833
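
A sketch of what the rev1 adapter config might look like with `peft`; the notes only record rank and alpha, so the target modules are my assumption. The embeddings accident corresponds to leaving the embedding table trainable alongside the adapter.

```python
# Hypothetical peft config for rev1; target_modules are assumed, not
# recorded in these notes.
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,           # lora rank: 32
    lora_alpha=64,  # alpha: 64
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
    # The rev1 accident: embeddings stayed trainable. Since no tokens were
    # added, rev2 drops something like modules_to_save=["embed_tokens"],
    # keeping the embedding table frozen.
)
```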
## rev2

- not fine-tuning the embeddings (no added tokens)
- dataset: new version with varied system prompts/responses (small)

+ evaluation results:
  - 400: 0.6748893105629349
  - 800: 0.7280202403542062
  - 1200: 0.7685009487666035
  - 1600: 0.7798861480075902
  - Final (1967): 0.7849462365591398
+ definitely needs more data
## rev3

- lora rank: 64, alpha: 128
- dataset size: large

+ evaluation results:
  - 400: 0.8785578747628083
  - 800: 0.9247311827956989
  - 1200: 0.9348513598987982
  - 1600: 0.9222011385199241
  - 2000: 0.9354838709677419
  - 2400: 0.9740670461733081
  - 2800: 0.9595192915876027
  - 3200: 0.948134092346616
  - 3600: 0.963314358001265
  - 4000: 0.9614168247944339
  - Final (~4200): 0.9538266919671095
## rev4

- lora rank: 64, alpha: 128
- dataset size: large (with new device types)

+ evaluation results:
  - 400: 0.867914979757085
  - 800: 0.9316801619433198
  - 1200: 0.9215587044534413
  - 1600: 0.9686234817813765
  - 2000: 0.9772267206477733
  - 2400: 0.9752024291497976
  - 2800: 0.9802631578947368
  - 3200: 0.9777327935222672
  - 3600: 0.9812753036437247
  - 4000: 0.979251012145749
  - 4400: 0.978744939271255
  - 4800: 0.9777327935222672
  - Final (5234): 0.9782388663967612
+ overfit: eval accuracy peaks around step 3600 and drifts down by the end; see the checkpoint-picking sketch below
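
Since the run overfits, the final checkpoint is not the best one. A small helper for picking the peak from a step-to-accuracy log like the ones in these notes (the values below are the rev4 numbers, rounded):

```python
# Pick the best-scoring checkpoint from a {step: eval accuracy} log.
def best_checkpoint(evals: dict[int, float]) -> tuple[int, float]:
    step = max(evals, key=evals.__getitem__)
    return step, evals[step]

rev4 = {400: 0.8679, 800: 0.9317, 1200: 0.9216, 1600: 0.9686,
        2000: 0.9772, 2400: 0.9752, 2800: 0.9803, 3200: 0.9777,
        3600: 0.9813, 4000: 0.9793, 4400: 0.9787, 4800: 0.9777}
print(best_checkpoint(rev4))  # (3600, 0.9813), above the 0.9782 final
```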
## rev5

- lora rank: 64, alpha: 128
- dataset size: medium (with new device types)

+ evaluation results:
  - 400: 0.8709514170040485
  - 800: 0.9316801619433198
  - 1200: 0.9544534412955465
  - 1600: 0.9559716599190283
  - 2000: 0.9671052631578947
  - 2400: 0.9671052631578947
  - 2800: 0.9701417004048583
  - 3200: 0.9696356275303644
  - 3600: 0.9736842105263158
  - 4000: 0.9706477732793523
  - Final: 0.9711538461538461
## rev6

- lora rank: 64, alpha: 128
- batch size: 32
- dataset size: medium (with new device types)

+ evaluation results:
  - 100: 0.7545546558704453
  - 200: 0.8567813765182186
  - 300: 0.8977732793522267
  - 400: 0.9068825910931174
  - 500: 0.9261133603238867
  - 600: 0.9342105263157895
  - 700: 0.9407894736842105
  - 800: 0.9478744939271255
  - 900: 0.937246963562753
  - 1000: 0.9438259109311741
  - Final: 0.9453441295546559
## rev7

- lora rank: 64, alpha: 128
- epochs: 2
- batch size: 128 (see the accumulation sketch below)
- dataset size: large (with fixed service names)

+ evaluation results:
  - 50: 0.6022267206477733
  - 100: 0.8254048582995951
  - 150: 0.8689271255060729
  - 200: 0.9013157894736842
  - 250: 0.9073886639676113
  - 300: 0.9210526315789473
  - 350: 0.937753036437247
  - 400: 0.9362348178137652
  - 450: 0.9478744939271255
  - 500: 0.9463562753036437
  - 550:
  - 600: 0.9473684210526315
  - 650: 0.9387651821862348
  - Final: 0.9463562753036437
+ per-language results:
  - german: 0.5758754863813229
  - french: 0.6490034030140982
  - spanish: 0.6481391976800387
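
A batch size of 128 would typically be reached through gradient accumulation rather than a single device batch. A sketch under that assumption; the actual per-device/accumulation split is not recorded in these notes.

```python
# Hypothetical: effective batch size 128 = per-device 8 x 16 accumulation
# steps; the real split used for rev7 is not recorded here.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="stablelm-zephyr-3b-rev7",  # placeholder
    num_train_epochs=2,                    # epochs: 2
    per_device_train_batch_size=8,         # assumed split
    gradient_accumulation_steps=16,        # 8 * 16 = 128 effective
)
```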
## rev9

- full fine-tune
- epochs: 1
- batch size: 64
- dataset size: medium w/ 4 languages

+ eval results:
  - english: 0.9961183891314895
  - german: 0.9571984435797666
  - french: 0.9484686436558094
  - spanish: 0.9685838569357177
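
The per-language numbers in rev7 and rev9 suggest the same eval is simply run once per language. A minimal sketch of that loop; the scoring callback and names are hypothetical, not the repo's API.

```python
# Hypothetical per-language eval loop yielding one accuracy per language,
# matching the shape of the rev7/rev9 result lists.
from typing import Callable

def eval_by_language(score: Callable[[list], float],
                     datasets: dict[str, list]) -> dict[str, float]:
    # score(examples) -> accuracy in [0, 1] for one language's eval split
    return {lang: score(examples) for lang, examples in datasets.items()}
```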
# stablelm-2-1_6b-zephyr
## rev3

- full fine-tune
- epochs: 1
- 2048 train ctx
- batch size 32
- learning rate 1e-5
- weight decay 0.1
- gradient clipping 1.0
- dataset size: medium

+ evaluation results:
  - 100: 0.35779352226720645
  - 200: 0.5247975708502024
  - 300: 0.5339068825910931
  - 400: 0.6280364372469636
  - 500: 0.6923076923076923
  - 600: 0.7064777327935222
  - 700: 0.7135627530364372
  - 800: 0.7044534412955465
  - 900: 0.707995951417004
  - 1000: 0.718117408906882
  - Final: 0.7145748987854251
## rev4

- dataset size: large