Merge branch 'main' into develop

Alex O'Connell
2024-01-28 19:21:24 -05:00
13 changed files with 143 additions and 100 deletions


@@ -8,12 +8,17 @@ The latest models can be found on HuggingFace:
3B v2 (Based on Phi-2): https://huggingface.co/acon96/Home-3B-v2-GGUF
1B v2 (Based on Phi-1.5): https://huggingface.co/acon96/Home-1B-v2-GGUF
<details>
<summary>Old Models</summary>
3B v1 (Based on Phi-2): https://huggingface.co/acon96/Home-3B-v1-GGUF
1B v1 (Based on Phi-1.5): https://huggingface.co/acon96/Home-1B-v1-GGUF
</details>
Make sure you have `llama-cpp-python>=0.2.29` in order to run these models.
The main difference between the two models (besides parameter count) is the training data: the 1B model is trained only on the synthetic dataset provided in this project, while the 3B model is trained on a mixture of this synthetic dataset and the cleaned Stanford Alpaca dataset.
The model is quantized using Llama.cpp so that it can run in the very low-resource environments that are common for Home Assistant installations, such as Raspberry Pis.
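If you want to sanity-check one of the quantized files outside of Home Assistant, a minimal llama-cpp-python snippet like the following should work (the file name is only an example; substitute whichever quantization you downloaded):
```
# minimal sketch: load a quantized Home-LLM GGUF file with llama-cpp-python (>=0.2.29)
from llama_cpp import Llama

llm = Llama(model_path="./Home-3B-v2.q4_k_m.gguf", n_ctx=2048)

# for real use, the prompt should follow the model's prompt format described below
result = llm("turn off the kitchen light", max_tokens=128)
print(result["choices"][0]["text"])
```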
@@ -32,7 +37,7 @@ light.kitchen 'Kitchen Light' = on;80%;red
light.bedroom 'Bedroom Light' = off<|im_end|>
```
For more about how the model is prompted see [Model Prompting](/docs/Model%20Prompting.md)
Output from the model will consist of a response that should be relayed back to the user, along with an optional code block that will invoke different Home Assistant "services". The output format from the model for function calling is as follows:
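For illustration only (the exact JSON keys and code-block wrapper are defined by the project's prompt format), a reply to a request like "turn off the kitchen light" might look roughly like this, with the service call emitted inside its own code block after the natural-language response:
```
turning off the kitchen light for you now
{"service": "light.turn_off", "target_device": "light.kitchen"}
```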
@@ -63,6 +68,9 @@ The supported entity types are: light, fan, cover, lock, media_player, climate,
### Training
The 3B model was trained as a LoRA on an RTX 3090 (24GB) using the following settings for the custom training script. The embedding weights were "saved" and trained normally along with the rank matrices in order to train the newly added tokens into the embeddings. The full model is merged together at the end. Training took approximately 10 hours.
<details>
<summary>Training Arguments</summary>
```
python3 train.py \
--run_name home-3b \
@@ -79,7 +87,12 @@ python3 train.py \
--use_lora --lora_rank 32 --lora_alpha 64 --lora_modules fc1,fc2,q_proj,v_proj,dense --lora_modules_to_save embed_tokens,lm_head --lora_merge
```
</details>
The 1B model was trained as a full fine-tuning on an RTX 3090 (24GB). Training took approximately 2.5 hours.
<details>
<summary>Training Arguments</summary>
```
python3 train.py \
@@ -94,6 +107,9 @@ python3 train.py \
--ctx_size 2048
```
</details>
<br/>
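For readers more familiar with the `peft` library than with the custom `train.py` flags, the LoRA settings shown above for the 3B model map roughly onto a configuration like the sketch below (illustrative only, not the project's actual training code):
```
# rough peft equivalent of the --use_lora flags used for the 3B model
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

lora_config = LoraConfig(
    r=32,                                                        # --lora_rank 32
    lora_alpha=64,                                               # --lora_alpha 64
    target_modules=["fc1", "fc2", "q_proj", "v_proj", "dense"],  # --lora_modules
    modules_to_save=["embed_tokens", "lm_head"],                 # --lora_modules_to_save
)

model = get_peft_model(model, lora_config)
# ... training loop ...

# merge the adapter weights back into the base model, as with --lora_merge
model = model.merge_and_unload()
```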
## Home Assistant Component
In order to integrate with Home Assistant, we provide a `custom_component` that exposes the locally running LLM as a "conversation agent". The agent can be interacted with through a chat interface, and it also integrates with Speech-to-Text and Text-to-Speech addons so that you can interact with the model by speaking.
@@ -142,7 +158,7 @@ You need the following settings in order to configure the "remote" backend:
With the remote text-generation-webui backend, the component will validate that the selected model is available for use and will ensure it is loaded remotely. The Generic OpenAI compatible version does NOT do any validation or model loading.
**Setting up with LocalAI**:
If you are an existing LocalAI user or would like to use LocalAI as your backend, please refer to [this](https://io.midori-ai.xyz/howtos/setup-with-ha/) website, which has instructions on how to set up LocalAI to work with Home-LLM, including automatic installation of the latest version of the Home-LLM model. The auto-installer (LocalAI Manager) will automatically download and set up LocalAI and/or the model of your choice and create the necessary template files for the model to work with this integration.
### Configuring the component as a Conversation Agent
@@ -187,11 +203,12 @@ The RPI4 4GB that I have was sitting right at 1.5 tokens/sec for prompt eval and
It is highly recommended to set up text-generation-webui on a separate machine that can take advantage of a GPU.
## Version History
| Version | Description |
| ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| v0.2.5 | Fix Ollama max tokens parameter, fix GGUF download from Hugging Face, update included llama-cpp-python to 0.2.32, add parameters to function calling for the dataset + component, and update the model |
| v0.2.4 | Fix API key auth on model load for text-generation-webui, and add support for Ollama API backend |
| v0.2.3 | Fix API key auth, Support chat completion endpoint, and refactor to make it easier to add more remote backends |
| v0.2.2 | Fix options window after upgrade, fix training script for new Phi model format, and release new models |
| v0.2.1 | Properly expose generation parameters for each backend, handle config entry updates without reloading, support remote backends with an API key |
| v0.2 | Bug fixes, support more backends, support for climate + switch devices, JSON style function calling with parameters, GBNF grammars |
| v0.1 | Initial Release |

TODO.md

@@ -10,6 +10,20 @@
- [x] Function calling as JSON
- [ ] multi-turn prompts; better instruct dataset like dolphin/wizardlm?
- [x] Fine tune Phi-1.5 version
- [x] make llama-cpp-python wheels for "llama-cpp-python>=0.2.24"
- [ ] prime kv cache with current "state" so that requests are faster
- [x] make a proper evaluation framework to run. not just loss. should test accuracy on the function calling
- [x] add more remote backends
- LocalAI (openai compatible)
- Ollama
- support chat completions API (might fix Ollama + adds support for text-gen-ui characters)
- [x] more config options for prompt template (allow other than chatml)
- [ ] publish snapshot of dataset on HF
- [ ] figure out DPO for refusals + fixing incorrect entity id
- [ ] mixtral + prompting (no fine tuning)
- [ ] use varied system prompts to add behaviors
## more complicated ideas
- [ ] "context requests"
- basically just let the model decide what RAG/extra context it wants
- the model predicts special tokens as the first few tokens of its output
@@ -18,15 +32,4 @@
- [ ] RAG for getting info for setting up new devices
- set up vectordb
- ingest home assistant docs
- "context request" from above to initiate a RAG search


@@ -23,10 +23,10 @@ RUN \
python3-venv \
python3-pip \
\
&& git clone https://github.com/oobabooga/text-generation-webui.git ${APP_DIR} --branch snapshot-2024-01-28 \
&& python3 -m pip install torch torchvision torchaudio py-cpuinfo==9.0.0 \
&& python3 -m pip install -r ${APP_DIR}/requirements_cpu_only_noavx2.txt -r ${APP_DIR}/extensions/openai/requirements.txt llama-cpp-python \
&& python3 -m pip install llama-cpp-python==0.2.32 \
&& apt-get purge -y --auto-remove \
git \
build-essential \


@@ -1,6 +1,6 @@
---
name: oobabooga-text-generation-webui
version: 2024.01.28
slug: text-generation-webui
description: "A tool for running Large Language Models"
url: "https://github.com/oobabooga/text-generation-webui"


@@ -11,7 +11,7 @@ from typing import Any
from abc import ABC, abstractmethod
from importlib.metadata import version
from huggingface_hub import hf_hub_download, HfFileSystem
import voluptuous as vol
@@ -170,15 +170,19 @@ def download_model_from_hf(
model_name: str, quantization_type: str, storage_folder: str
):
try:
    # find the .gguf file in the repo that matches the requested quantization
    fs = HfFileSystem()
    potential_files = [f for f in fs.glob(f"{model_name}/*.gguf")]
    wanted_file = [
        f for f in potential_files
        if f".{quantization_type.lower()}." in f or f".{quantization_type.upper()}." in f
    ]
    if len(wanted_file) != 1:
        raise Exception(f"The quantization '{quantization_type}' does not exist in the HF repo for {model_name}")

    os.makedirs(storage_folder, exist_ok=True)

    # download only the selected quantization into the configured storage folder
    return hf_hub_download(
        repo_id=model_name,
        repo_type="model",
        filename=wanted_file[0].removeprefix(model_name + "/"),
        resume_download=True,
        cache_dir=storage_folder,
    )
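For context, a hypothetical call to this helper might look like the following (the storage folder is an example path; the quantization string is matched case-insensitively against the `.gguf` filenames in the repo):
```
model_path = download_model_from_hf(
    model_name="acon96/Home-3B-v2-GGUF",
    quantization_type="Q4_K_M",
    storage_folder="/config/models",   # example path
)
```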
@@ -446,6 +450,8 @@ class ConfigFlow(BaseLlamaConversationConfigFlow, config_entries.ConfigFlow, dom
)
download_result = user_input["result"]
self.download_task = None
if isinstance(download_result, Exception):
    _LOGGER.info("Failed to download model: %s", repr(download_result))
    self.download_error = download_result


@@ -1,7 +1,7 @@
{
"domain": "llama_conversation",
"name": "LLaMA Conversation",
"version": "0.2.4",
"version": "0.2.5",
"codeowners": ["@acon96"],
"config_flow": true,
"dependencies": ["conversation"],


@@ -4,8 +4,26 @@ The dataset is generated from the different CSV "piles". The "piles" contain dif
## Generating the custom dataset
`python3 generate_home_assistant_data.py --train --test`
`python3 generate_home_assistant_data.py --train --test --large`
## Merging with other datasets for training
`python3 generate_home_assistant_data.py --merge <dataset>`
Supported datasets right now are:
- `alpaca`
- `wizardlm70k`
## Potential Other Datasets to Use
### SFT
Alpaca: https://huggingface.co/datasets/yahma/alpaca-cleaned
Alpaca (Translated): https://huggingface.co/datasets/saillab/taco-datasets
WizardLM 200k: https://huggingface.co/datasets/WizardLM/WizardLM_evol_instruct_V2_196k
WizardLM 70k: https://huggingface.co/datasets/WizardLM/WizardLM_evol_instruct_70k
Huggingface Ultrachat 200k: https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k
OpenOrca Slim Deduped (363k): https://huggingface.co/datasets/Open-Orca/SlimOrca-Dedup
### DPO
Intel Orca DPO Pairs: https://huggingface.co/datasets/Intel/orca_dpo_pairs
Huggingface Ultrachat: https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized


dist/run_docker.sh

@@ -3,4 +3,4 @@
docker run -it --rm \
--entrypoint bash \
-v $(pwd):/tmp/dist \
homeassistant/home-assistant /tmp/dist/make_wheel.sh v0.2.32


@@ -1,5 +1,5 @@
# early home-llm experiments (phi1.5)
### rev1 - original test
- 1 epoch
- train ctx 1900
- I think the learning rate was way too high (2e-4)
@@ -7,7 +7,7 @@ rev1 - original test
- it doesn't get the device name right like ever
- eval dataset was disabled
### rev2 - it kinda works
- eval dataset at 10%
- 2 epochs
- train ctx 1200
@@ -18,7 +18,7 @@ rev2 - it kinda works
+ still repeatedly spits out code blocks but at least it closes them correctly
+ names are MUCH more accurate. will still hallucinate names that don't exist
### rev3 - ok it definitely works
- 4 epochs
- batch size 2
- learning rate cosine 5e-5
@@ -26,24 +26,24 @@ rev3 - ok it definitely works
+ still hallucinates device names. (need to figure this one out)
+ need more examples for: garage_door, media_player,
### rev4 - got to way lower loss. it tries really hard to stop generating text
- 4 epochs
- train ctx 512
- batch size 2
- learning rate cosine 1e-4
- added system prompt and moved services block before states block
### rev 4.1 - really doesn't work as well. loss dropped REALLY fast and then never got as low as rev4
- 4 epochs
- train ctx 512
- batch size 3
- learning rate cosine 1e-4
- proper pad token
### rev 4.2 - yeah nah it's the pad token
- batch size 2
### rev 5 - new dataset
- 3 epochs (4th epoch was overfit)
- train ctx 512
- batch size 2
@@ -51,14 +51,14 @@ rev 5 - new dataset
+ actually stops generating text. not at the right... place but still!
+ messing with temperature makes it generate some interesting output.
### rev 5.1 - gradient accumulation test
- 3 epochs
- train ctx 512
- batch size 8
- learning rate cosine 1e-5
+ very meh
### rev 5.2 - learning rate test
- 3 epochs
- train ctx 512
- batch size 8
@@ -68,14 +68,14 @@ rev 5.2 - learning rate test
+ still need more examples for multi-device actions (really need room/group support in dataset)
+ need to have more variance in request format. need more informal + more formal versions
### rev 5.3 - learning rate test 2
- 4 epochs
- train ctx 512
- batch size 8
- learning rate cosine 6e-5
+ lower learning rate seemed to not be as effective even though it ran for longer
### rev 6 - dataset revamp again
- 3 epochs
- train ctx 512
- batch size 8
@@ -85,58 +85,58 @@ rev 6 - dataset revamp again
+ definitely a bit overfit
+ maybe not so overfit. able to 0 shot asking to do stuff to a christmas tree
### rev 6.1 - lower train rate
- 3 epochs
- train ctx 512
- batch size 8
- learning rate cosine 6e-5
+ also definitely a bit overfit. can't generate names it hasn't seen before
### rev 6.2 - fewer epochs
- 2 epochs
- train ctx 512
- batch size 8
- learning rate cosine 6e-5
### rev 6.3 - higher batch
- 2 epochs
- train ctx 512
- batch size 12
- learning rate cosine 1e-4
### rev 7 - tweak dataset again
- 2 epochs
- train ctx 512
- batch size 8
- learning rate 1e-4
+ when generating results, don't end with a space. it works WAY better
### rev 7.1 - failed padding attempt
### rev 7.2 - try to overfit less + no newline at end
- 1 epoch
- train ctx 512
- batch size 8
- learning rate 1e-4
+ it definitely works with only one epoch
### rev 7.3 - try adding fake end of sentence token
- 1 epoch
- train ctx 512
- batch size 8
- learning rate 1e-4
### rev 8 - dataset tweaks. add status requests
+ service requests still mostly work but status requests are pretty broken
### rev 8.1 - tweak example counts + ratios
- 1 epoch
- train ctx 512
- batch size 8
- learning rate 1e-4
+ seems to have worked better with lower example counts
### rev 8.2 - try to fit learning rate so loss doesn't bottom out till the end of training
- 1 epoch
- train ctx 512
- batch size 8
@@ -147,7 +147,7 @@ rev 8.2 - try to fit learning rate so loss doesn't bottom out till the end of tr
+ oh yuuhhhhh it's overcranked. nails both request types (plus even ending generation)
+ needs ambiguous device name examples because I totally just asked it an ambiguous question and it answered the one I wasn't expecting
### rev 8.3 - further reduced training rate
- 1 epoch
- train ctx 512
- batch size 8
@@ -156,7 +156,7 @@ rev 8.3 - further reduced training rate
+ has some creativity with how it responds
+ will often get the device name wrong on the first try
### rev 8.4 - re-ordered prompt
- 1 epoch
- train ctx 512
- batch size 8
@@ -165,7 +165,7 @@ rev 8.4 - re-ordered prompt
+ it *works* but is incredibly open ended
+ basically never stops generating text
### rev 8.5 - tweaked prompt format again
- 1 epoch
- train ctx 512
- batch size 8
@@ -173,7 +173,7 @@ rev 8.5 - tweaked prompt format again
- re-ordered response before actions again but made actions less like a "block" so it might stop generation
+ that worked rather badly
### rev 8.6 - make prompt look more like other examples it has seen before
- 1 epoch
- train ctx 512
- batch size 8
@@ -183,7 +183,7 @@ rev 8.6 - make prompt look more like other examples it has seen before
+ only get the correct response about 50% of the time
+ it totally stops correctly when it DOES work
### rev 8.7 - try to fit a bit more. the last iteration jumps around on which format it chooses
- 1 epoch
- train ctx 512
- batch size 8
@@ -192,7 +192,7 @@ rev 8.7 - try to fit a bit more. the last iteration jumps around on which format
+ altering the format (with newlines) makes it pick our format more often
+ comparing to 8.6 with modified format shows this one is better at getting device names right
### rev 8.8 - train with newlines instead of spaces in requests/response
- 1 epoch
- train ctx 512
- batch size 8
@@ -200,14 +200,14 @@ rev 8.8 - train with newlines instead of spaces in requets/response
+ definitely worse than the previous one
+ for some reason both 8.7 and 8.8 are horrible when using their actual template but if you deviate slightly it works a lot better on inference
### rev 8.9 - actually fix pad token
- 1 epoch
- train ctx 512
- batch size 8
- learning rate 1e-5
+ properly generates a response (+ terminates) when using the actual template
### rev 9 - reduced dataset size
- 1 epoch
- train ctx 512
- batch size 8
@@ -233,20 +233,19 @@ rev 9 - reduced dataset size
+ it works OK with low temperatures
+ seems to handle the alpaca dataset not so well
### Home-1b-v1-GGUF
- eval results: 0.767816091954023
## home-1b-rev5/6 parameters
- 1 epoch
- 2048 train ctx
- batch size 8
- learning rate 1e-5
- weight decay 0.1
- gradient clipping 1.0
- save model every 200 or 400 steps
### home-1b-rev5
- dataset size: medium
- evaluation results:
- 200: 0.553448275862069
@@ -257,7 +256,7 @@ home-1b-rev5
- 1200: 0.8488505747126437 (+.009)
- Final (1467): 0.8494252873563218 (+.00005)
### home-1b-rev5_1
- dataset size: small
- evaluation results:
- 200: 0.6057471264367816
@@ -266,7 +265,7 @@ home-1b-rev5_1
- 800: 0.7729885057471264 (+.0046)
- Final (869): bad
### home-1b-rev5_2
- dataset size: large
- evaluation results:
- 200: --
@@ -279,15 +278,11 @@ home-1b-rev5_2
- 1600: 0.8844827586206897
- Final (1848): 0.8833333333333333
### home-1b-rev6
- dataset size: large (fixed templates + function calling arguments; brightness is broken)
- evaluation results: 0.8254149971379507
### home-1b-rev6_1
- dataset size: xl (fixed templates + function calling arguments; 0-255 brightness is broken)
- evaluation results:
- 400: 0.7240984544934173
@@ -297,7 +292,7 @@ home-1b-rev6_1
- 2000: 0.8551803091013166
- Final (2322): 0.8586147681740126
### home-1b-rev6_2 = Home-1B-v2-GGUF
- dataset size: large (change brightness back to percentages; increase color references by ~2x)
- evaluation results:
- 400: 0.7856064418721691
@@ -307,24 +302,28 @@ home-1b-rev6_2 = Home-1B-v2-GGUF
- 2000: 0.8852541519879215
- Final (2048):
# Home 3B
- 1 epoch
- 2048 train ctx
- batch size 8
- learning rate 1e-5
- weight decay 0.1
- gradient clipping 1.0
- save model every 200 or 400 steps
Missing a lot of earlier 3B training results (not sure where they are)
### Home-3b-v2-GGUF (broken training run)
- evaluation result: 0.6908045977011494
### home-3b-v3-rev1
- dataset size: large
- evaluation results: 0.9091954022988505
### home-3b-v3-rev2 = Home-3B-v2-GGUF (republished)
- dataset size: xl + alpaca
- evaluation results: 0.8731756416708606
### Home-3B-v2-GGUF:ha_only
- dataset size: large
- evaluation results: FAILED (again.....)