Merge branch 'main' into develop

Alex O'Connell
2024-01-28 19:21:24 -05:00
13 changed files with 143 additions and 100 deletions


@@ -8,12 +8,17 @@ The latest models can be found on HuggingFace:
3B v2 (Based on Phi-2): https://huggingface.co/acon96/Home-3B-v2-GGUF
1B v2 (Based on Phi-1.5): https://huggingface.co/acon96/Home-1B-v2-GGUF
<details>
<summary>Old Models</summary>
3B v1 (Based on Phi-2): https://huggingface.co/acon96/Home-3B-v1-GGUF
1B v1 (Based on Phi-1.5): https://huggingface.co/acon96/Home-1B-v1-GGUF
</details>
Make sure you have `llama-cpp-python>=0.2.29` in order to run these models.
The main difference between the two models (besides parameter count) is the training data: the 1B model is trained only on the synthetic dataset provided in this project, while the 3B model is trained on a mixture of this synthetic dataset and the cleaned Stanford Alpaca dataset.
The model is quantized using Llama.cpp so that it can run in the very low-resource environments that are common for Home Assistant installations, such as Raspberry Pis.
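If you want to sanity-check one of the quantized files outside of Home Assistant, a minimal llama-cpp-python snippet like the following should work (the file name is only an example; substitute whichever quantization you downloaded):
```
# minimal sketch: load a quantized Home-LLM GGUF file with llama-cpp-python (>=0.2.29)
from llama_cpp import Llama

llm = Llama(model_path="./Home-3B-v2.q4_k_m.gguf", n_ctx=2048)

# for real use, the prompt should follow the model's prompt format described below
result = llm("turn off the kitchen light", max_tokens=128)
print(result["choices"][0]["text"])
```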
@@ -32,7 +37,7 @@ light.kitchen 'Kitchen Light' = on;80%;red
light.bedroom 'Bedroom Light' = off<|im_end|>
```
For more about how the model is prompted see [Model Prompting](/docs/Model%20Prompting.md)
Output from the model will consist of a response that should be relayed back to the user, along with an optional code block that will invoke different Home Assistant "services". The output format from the model for function calling is as follows:
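For illustration only (the exact JSON keys and code-block wrapper are defined by the project's prompt format), a reply to a request like "turn off the kitchen light" might look roughly like this, with the service call emitted inside its own code block after the natural-language response:
```
turning off the kitchen light for you now
{"service": "light.turn_off", "target_device": "light.kitchen"}
```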
@@ -63,6 +68,9 @@ The supported entity types are: light, fan, cover, lock, media_player, climate,
### Training
The 3B model was trained as a LoRA on an RTX 3090 (24GB) using the following settings for the custom training script. The embedding weights were "saved" and trained normally along with the rank matrices in order to train the newly added tokens into the embeddings. The full model is merged together at the end. Training took approximately 10 hours.
<details>
<summary>Training Arguments</summary>
```
python3 train.py \
--run_name home-3b \
@@ -79,7 +87,12 @@ python3 train.py \
--use_lora --lora_rank 32 --lora_alpha 64 --lora_modules fc1,fc2,q_proj,v_proj,dense --lora_modules_to_save embed_tokens,lm_head --lora_merge
```
</details>
The 1B model was trained as a full fine-tuning on an RTX 3090 (24GB). Training took approximately 2.5 hours.
<details>
<summary>Training Arguments</summary>
```
python3 train.py \
@@ -94,6 +107,9 @@ python3 train.py \
--ctx_size 2048
```
</details>
<br/>
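For readers more familiar with the `peft` library than with the custom `train.py` flags, the LoRA settings shown above for the 3B model map roughly onto a configuration like the sketch below (illustrative only, not the project's actual training code):
```
# rough peft equivalent of the --use_lora flags used for the 3B model
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

lora_config = LoraConfig(
    r=32,                                                        # --lora_rank 32
    lora_alpha=64,                                               # --lora_alpha 64
    target_modules=["fc1", "fc2", "q_proj", "v_proj", "dense"],  # --lora_modules
    modules_to_save=["embed_tokens", "lm_head"],                 # --lora_modules_to_save
)

model = get_peft_model(model, lora_config)
# ... training loop ...

# merge the adapter weights back into the base model, as with --lora_merge
model = model.merge_and_unload()
```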
## Home Assistant Component
In order to integrate with Home Assistant, we provide a `custom_component` that exposes the locally running LLM as a "conversation agent". The agent can be interacted with through a chat interface, and it also integrates with Speech-to-Text and Text-to-Speech addons so that you can interact with the model by speaking.
@@ -142,7 +158,7 @@ You need the following settings in order to configure the "remote" backend:
With the remote text-generation-webui backend, the component will validate that the selected model is available for use and will ensure it is loaded remotely. The Generic OpenAI compatible version does NOT do any validation or model loading.
**Setting up with LocalAI**:
If you are an existing LocalAI user or would like to use LocalAI as your backend, please refer to [this](https://io.midori-ai.xyz/howtos/setup-with-ha/) website, which has instructions on how to set up LocalAI to work with Home-LLM, including automatic installation of the latest version of the Home-LLM model. The auto-installer (LocalAI Manager) will automatically download and set up LocalAI and/or the model of your choice and create the necessary template files for the model to work with this integration.
### Configuring the component as a Conversation Agent
@@ -187,11 +203,12 @@ The RPI4 4GB that I have was sitting right at 1.5 tokens/sec for prompt eval and
It is highly recommended to set up text-generation-webui on a separate machine that can take advantage of a GPU.
## Version History
| Version | Description |
| ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| v0.2.5 | Fix Ollama max tokens parameter, fix GGUF download from Hugging Face, update included llama-cpp-python to 0.2.32, add parameters to function calling for the dataset + component, and update the model |
| v0.2.4 | Fix API key auth on model load for text-generation-webui, and add support for Ollama API backend |
| v0.2.3 | Fix API key auth, Support chat completion endpoint, and refactor to make it easier to add more remote backends |
| v0.2.2 | Fix options window after upgrade, fix training script for new Phi model format, and release new models |
| v0.2.1 | Properly expose generation parameters for each backend, handle config entry updates without reloading, support remote backends with an API key |
| v0.2 | Bug fixes, support more backends, support for climate + switch devices, JSON style function calling with parameters, GBNF grammars |
| v0.1 | Initial Release |

TODO.md

@@ -10,6 +10,20 @@
- [x] Function calling as JSON
- [ ] multi-turn prompts; better instruct dataset like dolphin/wizardlm?
- [x] Fine tune Phi-1.5 version
- [x] make llama-cpp-python wheels for "llama-cpp-python>=0.2.24"
- [ ] prime kv cache with current "state" so that requests are faster
- [x] make a proper evaluation framework to run. not just loss. should test accuracy on the function calling
- [x] add more remote backends
- LocalAI (openai compatible)
- Ollama
- support chat completions API (might fix Ollama + adds support for text-gen-ui characters)
- [x] more config options for prompt template (allow other than chatml)
- [ ] publish snapshot of dataset on HF
- [ ] figure out DPO for refusals + fixing incorrect entity id
- [ ] mixtral + prompting (no fine tuning)
- [ ] use varied system prompts to add behaviors
## more complicated ideas
- [ ] "context requests"
- basically just let the model decide what RAG/extra context it wants
- the model predicts special tokens as the first few tokens of its output
@@ -18,15 +32,4 @@
- [ ] RAG for getting info for setting up new devices
- set up vectordb
- ingest home assistant docs
- "context request" from above to initiate a RAG search


@@ -23,10 +23,10 @@ RUN \
python3-venv \
python3-pip \
\
&& git clone https://github.com/oobabooga/text-generation-webui.git ${APP_DIR} --branch snapshot-2024-01-28 \
&& python3 -m pip install torch torchvision torchaudio py-cpuinfo==9.0.0 \
&& python3 -m pip install -r ${APP_DIR}/requirements_cpu_only_noavx2.txt -r ${APP_DIR}/extensions/openai/requirements.txt llama-cpp-python \
&& python3 -m pip install llama-cpp-python==0.2.32 \
&& apt-get purge -y --auto-remove \
git \
build-essential \


@@ -1,6 +1,6 @@
---
name: oobabooga-text-generation-webui
version: 2024.01.28
slug: text-generation-webui
description: "A tool for running Large Language Models"
url: "https://github.com/oobabooga/text-generation-webui"


@@ -11,7 +11,7 @@ from typing import Any
from abc import ABC, abstractmethod
from importlib.metadata import version
from huggingface_hub import hf_hub_download, HfFileSystem
import voluptuous as vol
@@ -170,15 +170,19 @@ def download_model_from_hf(
model_name: str, quantization_type: str, storage_folder: str
):
try:
    # find the .gguf file in the repo that matches the requested quantization
    fs = HfFileSystem()
    potential_files = [f for f in fs.glob(f"{model_name}/*.gguf")]
    wanted_file = [
        f for f in potential_files
        if f".{quantization_type.lower()}." in f or f".{quantization_type.upper()}." in f
    ]
    if len(wanted_file) != 1:
        raise Exception(f"The quantization '{quantization_type}' does not exist in the HF repo for {model_name}")

    os.makedirs(storage_folder, exist_ok=True)

    # download only the selected quantization into the configured storage folder
    return hf_hub_download(
        repo_id=model_name,
        repo_type="model",
        filename=wanted_file[0].removeprefix(model_name + "/"),
        resume_download=True,
        cache_dir=storage_folder,
    )
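For context, a hypothetical call to this helper might look like the following (the storage folder is an example path; the quantization string is matched case-insensitively against the `.gguf` filenames in the repo):
```
model_path = download_model_from_hf(
    model_name="acon96/Home-3B-v2-GGUF",
    quantization_type="Q4_K_M",
    storage_folder="/config/models",   # example path
)
```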
@@ -446,6 +450,8 @@ class ConfigFlow(BaseLlamaConversationConfigFlow, config_entries.ConfigFlow, dom
)
download_result = user_input["result"]
self.download_task = None
if isinstance(download_result, Exception):
    _LOGGER.info("Failed to download model: %s", repr(download_result))
    self.download_error = download_result


@@ -1,7 +1,7 @@
{
"domain": "llama_conversation",
"name": "LLaMA Conversation",
"version": "0.2.4",
"version": "0.2.5",
"codeowners": ["@acon96"],
"config_flow": true,
"dependencies": ["conversation"],


@@ -4,8 +4,26 @@ The dataset is generated from the different CSV "piles". The "piles" contain dif
## Generating the custom dataset
`python3 generate_home_assistant_data.py --train --test`
`python3 generate_home_assistant_data.py --train --test --large`
## Merging with other datasets for training
`python3 generate_home_assistant_data.py --merge <dataset>`
Supported datasets right now are:
- `alpaca`
- `wizardlm70k`
## Potential Other Datasets to Use
### SFT
Alpaca: https://huggingface.co/datasets/yahma/alpaca-cleaned
Alpaca (Translated): https://huggingface.co/datasets/saillab/taco-datasets
WizardLM 200k: https://huggingface.co/datasets/WizardLM/WizardLM_evol_instruct_V2_196k
WizardLM 70k: https://huggingface.co/datasets/WizardLM/WizardLM_evol_instruct_70k
Huggingface Ultrachat 200k: https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k
OpenOrca Slim Deduped (363k): https://huggingface.co/datasets/Open-Orca/SlimOrca-Dedup
### DPO
Intel Orca DPO Pairs: https://huggingface.co/datasets/Intel/orca_dpo_pairs
Huggingface Ultrachat: https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized


dist/run_docker.sh

@@ -3,4 +3,4 @@
docker run -it --rm \
--entrypoint bash \
-v $(pwd):/tmp/dist \
homeassistant/home-assistant /tmp/dist/make_wheel.sh v0.2.32


@@ -1,5 +1,5 @@
# early home-llm experiments (phi1.5)
### rev1 - original test
- 1 epoch
- train ctx 1900
- I think the learning rate was way too high (2e-4)
@@ -7,7 +7,7 @@ rev1 - original test
- it doesn't get the device name right like ever
- eval dataset was disabled
### rev2 - it kinda works
- eval dataset at 10%
- 2 epochs
- train ctx 1200
@@ -18,7 +18,7 @@ rev2 - it kinda works
+ still repeatedly spits out code blocks but at least it closes them correctly
+ names are MUCH more accurate. will still hallucinate names that don't exist
### rev3 - ok it definitely works
- 4 epochs
- batch size 2
- learning rate cosine 5e-5
@@ -26,24 +26,24 @@ rev3 - ok it definitely works
+ still hallucinates device names. (need to figure this one out)
+ need more examples for: garage_door, media_player,
### rev4 - got to way lower loss. it tries really hard to stop generating text
- 4 epochs
- train ctx 512
- batch size 2
- learning rate cosine 1e-4
- added system prompt and moved services block before states block
### rev 4.1 - really doesn't work as well. loss dropped REALLY fast and then never got as low as rev4
- 4 epochs
- train ctx 512
- batch size 3
- learning rate cosine 1e-4
- proper pad token
### rev 4.2 - yeah nah it's the pad token
- batch size 2
### rev 5 - new dataset
- 3 epochs (4th epoch was overfit)
- train ctx 512
- batch size 2
@@ -51,14 +51,14 @@ rev 5 - new dataset
+ actually stops generating text. not at the right... place but still!
+ messing with temperature makes it generate some interesting output.
### rev 5.1 - gradient accumulation test
- 3 epochs
- train ctx 512
- batch size 8
- learning rate cosine 1e-5
+ very meh
### rev 5.2 - learning rate test
- 3 epochs
- train ctx 512
- batch size 8
@@ -68,14 +68,14 @@ rev 5.2 - learning rate test
+ still need more examples for multi-device actions (really need room/group support in dataset)
+ need to have more variance in request format. need more informal + more formal versions
### rev 5.3 - learning rate test 2
- 4 epochs
- train ctx 512
- batch size 8
- learning rate cosine 6e-5
+ lower learning rate seemed to not be as effective even though it ran for longer
### rev 6 - dataset revamp again
- 3 epochs
- train ctx 512
- batch size 8
@@ -85,58 +85,58 @@ rev 6 - dataset revamp again
+ definitely a bit overfit
+ maybe not so overfit. able to 0 shot asking to do stuff to a christmas tree
### rev 6.1 - lower train rate
- 3 epochs
- train ctx 512
- batch size 8
- learning rate cosine 6e-5
+ also definitely a bit overfit. can't generate names it hasn't seen before
### rev 6.2 - fewer epochs
- 2 epochs
- train ctx 512
- batch size 8
- learning rate cosine 6e-5
### rev 6.3 - higher batch
- 2 epochs
- train ctx 512
- batch size 12
- learning rate cosine 1e-4
### rev 7 - tweak dataset again
- 2 epochs
- train ctx 512
- batch size 8
- learning rate 1e-4
+ when generating results, don't end with a space. it works WAY better
### rev 7.1 - failed padding attempt
### rev 7.2 - try to overfit less + no newline at end
- 1 epoch
- train ctx 512
- batch size 8
- learning rate 1e-4
+ it definitely works with only one epoch
### rev 7.3 - try adding fake end of sentence token
- 1 epoch
- train ctx 512
- batch size 8
- learning rate 1e-4
### rev 8 - dataset tweaks. add status requests
+ service requests still mostly work but status requests are pretty broken
### rev 8.1 - tweak example counts + ratios
- 1 epoch
- train ctx 512
- batch size 8
- learning rate 1e-4
+ seems to have worked better with lower example counts
### rev 8.2 - try to fit learning rate so loss doesn't bottom out till the end of training
- 1 epoch
- train ctx 512
- batch size 8
@@ -147,7 +147,7 @@ rev 8.2 - try to fit learning rate so loss doesn't bottom out till the end of tr
+ oh yuuhhhhh it's overcranked. nails both request types (plus even ending generation)
+ needs ambiguous device name examples because I totally just asked it an ambiguous question and it answered the one I wasn't expecting
### rev 8.3 - further reduced training rate
- 1 epoch
- train ctx 512
- batch size 8
@@ -156,7 +156,7 @@ rev 8.3 - further reduced training rate
+ has some creativity with how it responds
+ will often get the device name wrong on the first try
### rev 8.4 - re-ordered prompt
- 1 epoch
- train ctx 512
- batch size 8
@@ -165,7 +165,7 @@ rev 8.4 - re-ordered prompt
+ it *works* but is incredibly open ended
+ basically never stops generating text
### rev 8.5 - tweaked prompt format again
- 1 epoch
- train ctx 512
- batch size 8
@@ -173,7 +173,7 @@ rev 8.5 - tweaked prompt format again
- re-ordered response before actions again but made actions less like a "block" so it might stop generation
+ that worked rather badly
### rev 8.6 - make prompt look more like other examples it has seen before
- 1 epoch
- train ctx 512
- batch size 8
@@ -183,7 +183,7 @@ rev 8.6 - make prompt look more like other examples it has seen before
+ only get the correct response about 50% of the time
+ it totally stops correctly when it DOES work
### rev 8.7 - try to fit a bit more. the last iteration jumps around on which format it chooses
- 1 epoch
- train ctx 512
- batch size 8
@@ -192,7 +192,7 @@ rev 8.7 - try to fit a bit more. the last iteration jumps around on which format
+ altering the format (with newlines) makes it pick our format more often
+ comparing to 8.6 with modified format shows this one is better at getting device names right
### rev 8.8 - train with newlines instead of spaces in requests/response
- 1 epoch
- train ctx 512
- batch size 8
@@ -200,14 +200,14 @@ rev 8.8 - train with newlines instead of spaces in requets/response
+ definitely worse than the previous one
+ for some reason both 8.7 and 8.8 are horrible when using their actual template but if you deviate slightly it works a lot better on inference
### rev 8.9 - actually fix pad token
- 1 epoch
- train ctx 512
- batch size 8
- learning rate 1e-5
+ properly generates a response (+ terminates) when using the actual template
### rev 9 - reduced dataset size
- 1 epoch
- train ctx 512
- batch size 8
@@ -233,20 +233,19 @@ rev 9 - reduced dataset size
+ it works OK with low temperatures
+ seems to handle the alpaca dataset not so well
### Home-1b-v1-GGUF
- eval results: 0.767816091954023
## home-1b-rev5/6 parameters
- 1 epoch
- 2048 train ctx
- batch size 8
- learning rate 1e-5
- weight decay 0.1
- gradient clipping 1.0
- save model every 200 or 400 steps
### home-1b-rev5
- dataset size: medium
- evaluation results:
- 200: 0.553448275862069
@@ -257,7 +256,7 @@ home-1b-rev5
- 1200: 0.8488505747126437 (+.009)
- Final (1467): 0.8494252873563218 (+.00005)
### home-1b-rev5_1
- dataset size: small
- evaluation results:
- 200: 0.6057471264367816
@@ -266,7 +265,7 @@ home-1b-rev5_1
- 800: 0.7729885057471264 (+.0046)
- Final (869): bad
### home-1b-rev5_2
- dataset size: large
- evaluation results:
- 200: --
@@ -279,15 +278,11 @@ home-1b-rev5_2
- 1600: 0.8844827586206897
- Final (1848): 0.8833333333333333
### home-1b-rev6
- dataset size: large (fixed templates + function calling arguments; brightness is broken)
- evaluation results: 0.8254149971379507
### home-1b-rev6_1
- dataset size: xl (fixed templates + function calling arguments; 0-255 brightness is broken)
- evaluation results:
- 400: 0.7240984544934173
@@ -297,7 +292,7 @@ home-1b-rev6_1
- 2000: 0.8551803091013166
- Final (2322): 0.8586147681740126
### home-1b-rev6_2 = Home-1B-v2-GGUF
- dataset size: large (change brightness back to percentages; increase color references by ~2x)
- evaluation results:
- 400: 0.7856064418721691
@@ -307,24 +302,28 @@ home-1b-rev6_2 = Home-1B-v2-GGUF
- 2000: 0.8852541519879215
- Final (2048):
# Home 3B
- 1 epoch
- 2048 train ctx
- batch size 8
- learning rate 1e-5
- weight decay 0.1
- gradient clipping 1.0
- save model every 200 or 400 steps
Missing a lot of earlier 3B training results (not sure where they are)
### Home-3b-v2-GGUF (broken training run)
- evaluation result: 0.6908045977011494
### home-3b-v3-rev1
- dataset size: large
- evaluation results: 0.9091954022988505
### home-3b-v3-rev2 = Home-3B-v2-GGUF (republished)
- dataset size: xl + alpaca
- evaluation results: 0.8731756416708606
### Home-3B-v2-GGUF:ha_only
- dataset size: large
- evaluation results: FAILED (again.....)