Merge branch 'develop'

Alex O'Connell
2024-01-25 20:28:21 -05:00
5 changed files with 149 additions and 15 deletions

View File

@@ -10,7 +10,7 @@ The latest models can be found on HuggingFace:
Make sure you have `llama-cpp-python>=0.2.29` in order to run these models.
Old Models:
3B v1 (Based on Phi-2): https://huggingface.co/acon96/Home-3B-v1-GGUF
The main difference between the 2 models (besides parameter count) is the training data. The 1B model is ONLY trained on the synthetic dataset provided in this project, while the 3B model is trained on a mixture of this synthetic dataset and the cleaned Stanford Alpaca dataset.
@@ -63,7 +63,7 @@ The 3B model was trained as a LoRA on an RTX 3090 (24GB) using the following set
```
python3 train.py \
--run_name home-llm-rev11_1 \
--run_name home-3b \
--base_model microsoft/phi-2 \
--add_pad_token \
--add_chatml_tokens \
@@ -74,11 +74,25 @@ python3 train.py \
--save_steps 1000 \
--micro_batch_size 2 --gradient_checkpointing \
--ctx_size 2048 \
--use_lora --lora_rank 32 --lora_alpha 64 --lora_modules fc1,fc2,Wqkv,out_proj --lora_modules_to_save wte,lm_head.linear --lora_merge
--use_lora --lora_rank 32 --lora_alpha 64 --lora_modules fc1,fc2,q_proj,v_proj,dense --lora_modules_to_save embed_tokens,lm_head --lora_merge
```
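If you need to adapt `--lora_modules` for a different base model revision (the target names above changed when Phi-2 moved to the Hugging Face-native implementation), you can list the model's linear layers yourself and pick from those. This is a minimal sketch, assuming a `transformers` version with native Phi support and `torch` installed; it only inspects module names and does no training:
```python
import torch
from transformers import AutoModelForCausalLM

# Load the base model just to inspect its structure.
# On older transformers versions you may need trust_remote_code=True here.
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype=torch.float16)

# Collect the unique linear-layer names; these are the candidates for --lora_modules.
linear_names = set()
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        linear_names.add(name.split(".")[-1])  # e.g. q_proj, v_proj, dense, fc1, fc2

print(sorted(linear_names))
```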
The 1B model was trained as a full fine-tuning on an RTX 3090 (24GB). Training took approximately 1.5 hours.
```
python3 train.py \
--run_name home-1b \
--base_model microsoft/phi-1_5 \
--add_pad_token \
--add_chatml_tokens \
--bf16 \
--train_dataset data/home_assistant_train.json \
--test_dataset data/home_assistant_test.json \
--learning_rate 1e-5 \
--micro_batch_size 4 --gradient_checkpointing \
--ctx_size 2048
```
## Home Assistant Component
In order to integrate with Home Assistant, we provide a `custom_component` that exposes the locally running LLM as a "conversation agent". You can interact with it through a chat interface, and it also integrates with the Speech-to-Text and Text-to-Speech add-ons so you can talk to the model.
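After you have configured the component as a conversation agent (described below), you can also send it commands programmatically. The following is a minimal sketch (not part of this repository) that posts a command to Home Assistant's `/api/conversation/process` REST endpoint using Python `requests`; the host, the long-lived access token, and any `agent_id` you pass are placeholders you would replace with your own values:
```python
import requests

HA_URL = "http://homeassistant.local:8123"  # placeholder: your Home Assistant address
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"      # created under your Home Assistant user profile

response = requests.post(
    f"{HA_URL}/api/conversation/process",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "text": "turn on the kitchen lights",
        "language": "en",
        # Optionally pass "agent_id" here to target this integration's agent
        # instead of the default one; the exact id depends on your setup.
    },
    timeout=30,
)
response.raise_for_status()
print(response.json())  # includes the agent's reply in the response payload
```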
@@ -105,7 +119,7 @@ When setting up the component, there are 4 different "backend" options to choose
3. A remote instance of text-generation-webui
4. A generic OpenAI API compatible interface; *should* be compatible with LocalAI, LM Studio, and all other OpenAI compatible backends
See (docs/Backend Configuration.md)[/docs/Backend%20Configuration.md] for more info.
See [docs/Backend Configuration.md](/docs/Backend%20Configuration.md) for more info.
**Installing llama-cpp-python for local model usage**:
In order to run a model directly as part of your Home Assistant installation, you will need to install one of the pre-built wheels because there are no existing musllinux wheels for the package. Compatible wheels for x86_64 and arm64 are provided in the [dist](./dist) folder. Copy the `*.whl` files to the `custom_components/llama_conversation/` folder. They will be installed while setting up the component.
@@ -127,6 +141,9 @@ You need the following settings in order to configure the "remote" backend:
With the remote text-generation-webui backend, the component will validate that the selected model is available for use and will ensure it is loaded remotely. The Generic OpenAI compatible version does NOT do any validation or model loading.
**Setting up with LocalAI**:
If you are an existing LocalAI user or would like to use LocalAI as your backend, please refer to [this](https://io.midori-ai.xyz/howtos/setup-with-ha/) website, which has instructions on how to set up LocalAI to work with Home-LLM, including automatic installation of the latest version of the Home-LLM model. The auto-installer (LocalAI Manager) will download and set up LocalAI and/or the model of your choice and create the necessary template files for the model to work with this integration.
### Configuring the component as a Conversation Agent
**NOTE: ANY DEVICES THAT YOU SELECT TO BE EXPOSED TO THE MODEL WILL BE ADDED AS CONTEXT AND POTENTIALLY HAVE THEIR STATE CHANGED BY THE MODEL. ONLY EXPOSE DEVICES THAT YOU ARE OK WITH THE MODEL MODIFYING THE STATE OF, EVEN IF IT IS NOT WHAT YOU REQUESTED. THE MODEL MAY OCCASIONALLY HALLUCINATE AND ISSUE COMMANDS TO THE WRONG DEVICE! USE AT YOUR OWN RISK.**
@@ -171,8 +188,9 @@ It is highly recommend to set up text-generation-webui on a separate machine tha
## Version History
| Version | Description |
| ------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| v0.2.4 | Fix API key auth on model load for text-generation-webui, and add support for Ollama API backend |
| v0.2.3 | Fix API key auth, Support chat completion endpoint, and refactor to make it easier to add more remote backends |
| v0.2.2 | Fix options window after upgrade, fix training script for new Phi model format, and release new models |
| v0.2.1 | Properly expose generation parameters for each backend, handle config entry updates without reloading, support remote backends with an API key |
| v0.2 | Bug fixes, support more backends, support for climate + switch devices, JSON style function calling with parameters, GBNF grammars |
| v0.1 | Initial Release |

View File

@@ -115,6 +115,8 @@ async def async_setup_entry(hass: HomeAssistant, entry: ConfigEntry) -> bool:
agent_cls = TextGenerationWebuiAgent
elif backend_type == BACKEND_TYPE_LLAMA_CPP_PYTHON_SERVER:
agent_cls = LlamaCppPythonAPIAgent
elif backend_type == BACKEND_TYPE_OLLAMA:
agent_cls = OllamaAPIAgent
return agent_cls(hass, entry)
@@ -588,7 +590,14 @@ class TextGenerationWebuiAgent(GenericOpenAIAPIAgent):
self.admin_key = entry.data.get(CONF_TEXT_GEN_WEBUI_ADMIN_KEY, self.api_key)
try:
currently_loaded_result = requests.get(f"{self.api_host}/v1/internal/model/info")
headers = {}
if self.admin_key:
headers["Authorization"] = f"Bearer {self.admin_key}"
currently_loaded_result = requests.get(
f"{self.api_host}/v1/internal/model/info",
headers=headers,
)
currently_loaded_result.raise_for_status()
loaded_model = currently_loaded_result.json()["model_name"]
@@ -598,9 +607,7 @@ class TextGenerationWebuiAgent(GenericOpenAIAPIAgent):
else:
_LOGGER.info(f"Model is not {self.model_name} loaded on the remote backend. Loading it now...")
headers = {}
if self.admin_key:
headers["Authorization"] = f"Bearer {self.admin_key}"
load_result = requests.post(
f"{self.api_host}/v1/internal/model/load",
@@ -608,7 +615,8 @@ class TextGenerationWebuiAgent(GenericOpenAIAPIAgent):
"model_name": self.model_name,
# TODO: expose arguments to the user in home assistant UI
# "args": {},
}
},
headers=headers
)
load_result.raise_for_status()
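For reference, the snippet below (not part of the integration) mirrors these two calls, so you can confirm from outside Home Assistant that your admin key is accepted by text-generation-webui before configuring the component. The host, port, key, and model name are placeholders for your own setup:
```python
import requests

API_HOST = "http://127.0.0.1:5000"  # placeholder: your text-generation-webui API address
ADMIN_KEY = "YOUR_ADMIN_KEY"        # the admin/API key you configured, if any

headers = {"Authorization": f"Bearer {ADMIN_KEY}"} if ADMIN_KEY else {}

# Ask which model is currently loaded (same endpoint the agent queries).
info = requests.get(f"{API_HOST}/v1/internal/model/info", headers=headers, timeout=10)
info.raise_for_status()
print("Currently loaded:", info.json()["model_name"])

# Ask the backend to load a specific model (same endpoint the agent posts to).
load = requests.post(
    f"{API_HOST}/v1/internal/model/load",
    json={"model_name": "your-model-name"},  # placeholder model name
    headers=headers,
    timeout=600,
)
load.raise_for_status()
```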
@@ -683,4 +691,89 @@ class LlamaCppPythonAPIAgent(GenericOpenAIAPIAgent):
if self.entry.options.get(CONF_USE_GBNF_GRAMMAR, DEFAULT_USE_GBNF_GRAMMAR):
request_params["grammar"] = self.grammar
return endpoint, request_params
class OllamaAPIAgent(LLaMAAgent):
    """Agent that talks to a remote Ollama server over its HTTP API."""
    api_host: str
    api_key: str
    model_name: str

    def _load_model(self, entry: ConfigEntry) -> None:
        # TODO: https
        self.api_host = f"http://{entry.data[CONF_HOST]}:{entry.data[CONF_PORT]}"
        self.api_key = entry.data.get(CONF_OPENAI_API_KEY)
        self.model_name = entry.data.get(CONF_CHAT_MODEL)

    def _chat_completion_params(self, conversation: dict) -> (str, dict):
        request_params = {}
        endpoint = "/api/chat"
        request_params["messages"] = [ { "role": x["role"], "content": x["message"] } for x in conversation ]

        return endpoint, request_params

    def _completion_params(self, conversation: dict) -> (str, dict):
        request_params = {}
        endpoint = "/api/generate"
        request_params["prompt"] = self._format_prompt(conversation)

        return endpoint, request_params

    def _extract_response(self, response_json: dict) -> str:
        # Ollama returns `done` as a JSON boolean; warn if generation was cut off
        if not response_json["done"]:
            _LOGGER.warning("Model response did not end on a stop token (unfinished sentence)")

        # /api/generate returns `response`; /api/chat returns `message.content`
        if "response" in response_json:
            return response_json["response"]
        else:
            return response_json["message"]["content"]

    def _generate(self, conversation: dict) -> str:
        max_tokens = self.entry.options.get(CONF_MAX_TOKENS, DEFAULT_MAX_TOKENS)
        temperature = self.entry.options.get(CONF_TEMPERATURE, DEFAULT_TEMPERATURE)
        top_p = self.entry.options.get(CONF_TOP_P, DEFAULT_TOP_P)
        timeout = self.entry.options.get(CONF_REQUEST_TIMEOUT, DEFAULT_REQUEST_TIMEOUT)
        use_chat_api = self.entry.options.get(CONF_REMOTE_USE_CHAT_ENDPOINT, DEFAULT_REMOTE_USE_CHAT_ENDPOINT)

        request_params = {
            "model": self.model_name,
            "stream": False,
            "options": {
                "top_p": top_p,
                "temperature": temperature,
                "num_ctx": max_tokens,
            }
        }

        if use_chat_api:
            endpoint, additional_params = self._chat_completion_params(conversation)
        else:
            endpoint, additional_params = self._completion_params(conversation)

        request_params.update(additional_params)

        headers = {}
        if self.api_key:
            headers["Authorization"] = f"Bearer {self.api_key}"

        result = requests.post(
            f"{self.api_host}{endpoint}",
            json=request_params,
            timeout=timeout,
            headers=headers,
        )

        try:
            result.raise_for_status()
        except requests.RequestException as err:
            _LOGGER.debug(f"Err was: {err}")
            _LOGGER.debug(f"Request was: {request_params}")
            _LOGGER.debug(f"Result was: {result.text}")
            return f"Failed to communicate with the API! {err}"

        _LOGGER.debug(result.json())

        return self._extract_response(result.json())
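To sanity-check the Ollama backend outside of Home Assistant, the sketch below issues the same two kinds of requests the agent builds, assuming an Ollama server on its default port and a model you have already pulled (the model name is a placeholder):
```python
import requests

OLLAMA_URL = "http://127.0.0.1:11434"  # Ollama's default host/port
MODEL = "your-model-name"              # placeholder: any model you have pulled with `ollama pull`

# Completion-style request, matching the agent's /api/generate path.
gen = requests.post(
    f"{OLLAMA_URL}/api/generate",
    json={
        "model": MODEL,
        "prompt": "The capital of France is",
        "stream": False,
        "options": {"temperature": 0.1, "top_p": 1.0, "num_ctx": 2048},
    },
    timeout=90,
)
gen.raise_for_status()
print(gen.json()["response"], gen.json()["done"])

# Chat-style request, matching the agent's /api/chat path.
chat = requests.post(
    f"{OLLAMA_URL}/api/chat",
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "Say hello."}],
        "stream": False,
    },
    timeout=90,
)
chat.raise_for_status()
print(chat.json()["message"]["content"])
```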

View File

@@ -112,7 +112,7 @@ def STEP_INIT_DATA_SCHEMA(backend_type=None):
BACKEND_TYPE_TEXT_GEN_WEBUI,
BACKEND_TYPE_GENERIC_OPENAI,
BACKEND_TYPE_LLAMA_CPP_PYTHON_SERVER,
# BACKEND_TYPE_OLLAMA
BACKEND_TYPE_OLLAMA
],
translation_key=CONF_BACKEND_TYPE,
multiple=False,
@@ -728,5 +728,28 @@ def local_llama_config_option_schema(options: MappingProxyType[str, Any], backen
default=DEFAULT_USE_GBNF_GRAMMAR,
): bool
})
elif backend_type == BACKEND_TYPE_OLLAMA:
result = insert_after_key(result, CONF_MAX_TOKENS, {
vol.Required(
CONF_REQUEST_TIMEOUT,
description={"suggested_value": options.get(CONF_REQUEST_TIMEOUT)},
default=DEFAULT_REQUEST_TIMEOUT,
): int,
vol.Required(
CONF_REMOTE_USE_CHAT_ENDPOINT,
description={"suggested_value": options.get(CONF_REMOTE_USE_CHAT_ENDPOINT)},
default=DEFAULT_REMOTE_USE_CHAT_ENDPOINT,
): bool,
vol.Required(
CONF_TEMPERATURE,
description={"suggested_value": options.get(CONF_TEMPERATURE)},
default=DEFAULT_TEMPERATURE,
): NumberSelector(NumberSelectorConfig(min=0, max=1, step=0.05)),
vol.Required(
CONF_TOP_P,
description={"suggested_value": options.get(CONF_TOP_P)},
default=DEFAULT_TOP_P,
): NumberSelector(NumberSelectorConfig(min=0, max=1, step=0.05)),
})
return result

View File

@@ -1,7 +1,7 @@
{
"domain": "llama_conversation",
"name": "LLaMA Conversation",
"version": "0.2.2",
"version": "0.2.4",
"codeowners": ["@acon96"],
"config_flow": true,
"dependencies": ["conversation"],

View File

@@ -43,7 +43,7 @@
"download_model_from_hf": "Download model from HuggingFace",
"use_local_backend": "Use Llama.cpp"
},
"description": "Select the backend for running the model. The options are:\n1. Llama.cpp with a model from HuggingFace\n2. Llama.cpp with a model stored on the disk\n3. [text-generation-webui API](https://github.com/oobabooga/text-generation-webui)\n4. Generic OpenAI API Compatible API\n5. [llama-cpp-python Server](https://llama-cpp-python.readthedocs.io/en/latest/server/)\n\nIf using Llama.cpp locally, make sure you copied the correct wheel file to the same directory as the integration.",
"description": "Select the backend for running the model. The options are:\n1. Llama.cpp with a model from HuggingFace\n2. Llama.cpp with a model stored on the disk\n3. [text-generation-webui API](https://github.com/oobabooga/text-generation-webui)\n4. Generic OpenAI API Compatible API\n5. [llama-cpp-python Server](https://llama-cpp-python.readthedocs.io/en/latest/server/)\n6. [Ollama API](https://github.com/jmorganca/ollama/blob/main/docs/api.md)\n\nIf using Llama.cpp locally, make sure you copied the correct wheel file to the same directory as the integration.",
"title": "Select Backend"
}
}
@@ -89,7 +89,7 @@
"text-generation-webui_api": "text-generation-webui API",
"generic_openai": "Generic OpenAI Compatible API",
"llama_cpp_python_server": "llama-cpp-python Server",
"ollama": "Ollama"
"ollama": "Ollama API"
}
},