Merge branch 'master' into develop
.gitignore (vendored) | 2

@@ -6,4 +6,4 @@ config/
data/*.json
*.pyc
main.log
-.venv
+.venv
README.md | 71
@@ -1,10 +1,12 @@
# Home LLM

-This project provides the required "glue" components to control your Home Assistant installation with a completely local Large Langage Model acting as a personal assistant. The goal is to provide a drop in solution to be used as a "conversation agent" component type by the Home Assistant project.
+This project provides the required "glue" components to control your Home Assistant installation with a completely local Large Language Model acting as a personal assistant. The goal is to provide a drop in solution to be used as a "conversation agent" component by Home Assistant.

## Model

-The "Home" model is a fine tuning of the Phi model series from Microsoft. The model is able to control devices in the user's house as well as perform basic question and answering. The fine tuning dataset is a combination of the [Cleaned Stanford Alpaca Dataset](https://huggingface.co/datasets/yahma/alpaca-cleaned) as well as a [custom synthetic dataset](./data) designed to teach the model function calling based on the device information in the context.
+The "Home" model is a fine tuning of the Phi-2 model from Microsoft. The model is able to control devices in the user's house as well as perform basic question and answering. The fine tuning dataset is a combination of the [Cleaned Stanford Alpaca Dataset](https://huggingface.co/datasets/yahma/alpaca-cleaned) as well as a [custom synthetic dataset](./data) designed to teach the model function calling based on the device information in the context.

-The model is quantized using Llama.cpp in order to enable running the model in super low resource environments that are common with Home Assistant installations such as Rapsberry Pis.
-The model can be found on HuggingFace: https://huggingface.co/acon96/Home-3B-v1-GGUF
+The model is quantized using Llama.cpp in order to enable running the model in super low resource environments that are common with Home Assistant installations such as Raspberry Pis.
+
+The model can be used as an "instruct" type model using the ChatML prompt format. The system prompt is used to provide information about the state of the Home Assistant installation including available devices and callable services.
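For a concrete picture of the ChatML framing mentioned in the added paragraph above, a minimal sketch of how a prompt with a device-state system message might be assembled is shown below. The system prompt wording and device listing format here are assumptions for illustration, not the component's actual template.

```python
# Illustrative only: the real system prompt template is configurable in the integration.
# The device list format below is an assumption made for this example.
devices = [
    ("light.kitchen", "Kitchen Light", "off"),
    ("fan.bedroom", "Bedroom Fan", "on"),
]

system_prompt = "You are a helpful assistant that controls the devices in this house.\n"
system_prompt += "Devices:\n" + "\n".join(f"{eid} '{name}' = {state}" for eid, name, state in devices)

user_request = "turn on the kitchen light"

# ChatML wraps each turn in <|im_start|>role ... <|im_end|> markers.
prompt = (
    f"<|im_start|>system {system_prompt}<|im_end|>"
    f"<|im_start|>user {user_request}<|im_end|>"
    f"<|im_start|>assistant "
)
print(prompt)
```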
@@ -39,6 +41,10 @@ Due to the mix of data used during fine tuning, the model is also capable of bas
<|im_start|>assistant If Mary is 7 years old, then you are 10 years old (7+3=10).<|im_end|><|endoftext|>
```

+### Synthetic Dataset
+The synthetic dataset is aimed at covering basic day to day operations in home assistant such as turning devices on and off.
+The supported entity types are: light, fan, cover, lock, media_player

### Training
The model was trained as a LoRA on an RTX 3090 (24GB) using the following settings for the custom training script. The embedding weights were "saved" and trained normally along with the rank matricies in order to train the newly added tokens to the embeddings. The full model is merged together at the end.
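The LoRA-plus-trained-embeddings setup described in the Training paragraph could be expressed with the PEFT library roughly as in the sketch below. The target module names, rank, and other hyperparameters are assumptions for illustration and are not taken from the repository's custom training script.

```python
# Rough sketch only; hyperparameters and module names are assumed, not the repo's actual settings.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", trust_remote_code=True)

lora_config = LoraConfig(
    r=32,                                    # LoRA rank (assumed value)
    lora_alpha=64,                           # scaling factor (assumed value)
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],  # attention projections; names vary by Phi revision
    modules_to_save=["embed_tokens", "lm_head"],  # train these weights fully so newly added tokens get real embeddings
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()
# After fine tuning, merge_and_unload() produces the merged full model described in the README.
```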
@@ -63,16 +69,67 @@ The provided `custom_modeling_phi.py` has Gradient Checkpointing implemented for
## Home Assistant Component
In order to integrate with Home Assistant, we provide a `custom_component` that exposes the locally running LLM as a "conversation agent" that can be interacted with using a chat interface as well as integrate with Speech-to-Text and Text-to-Speech addons to enable interacting with the model by speaking.

-The component can either run the model directly as part of the Home Assistant software using llama-cpp-python, or you can run the [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) project to provide access to the LLM via an API interface. When doing this, you can host the model yourself and point the addon at machine where the model is hosted, or you can run the model using text-generation-webui using the provided [custom Home Assistant addon](./addon/README.md).
+The component can either run the model directly as part of the Home Assistant software using llama-cpp-python, or you can run the [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) project to provide access to the LLM via an API interface. When doing this, you can host the model yourself and point the add-on at machine where the model is hosted, or you can run the model using text-generation-webui using the provided [custom Home Assistant add-on](./addon).
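For the llama-cpp-python path described in that paragraph, loading a quantized GGUF file and generating a completion looks roughly like the snippet below; the model path and sampling values are placeholders, not defaults shipped with the component.

```python
# Minimal llama-cpp-python usage sketch; the file path and settings are placeholders.
from llama_cpp import Llama

llm = Llama(model_path="/config/models/home-3b-v1.q4_k_m.gguf", n_ctx=2048)

output = llm.create_completion(
    prompt="<|im_start|>user turn on the kitchen light<|im_end|><|im_start|>assistant ",
    max_tokens=128,
    temperature=0.1,
    stop=["<|im_end|>"],
)
print(output["choices"][0]["text"])
```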
### Installing
1. Ensure you have either the Samba, SSH, FTP, or another add-on installed that gives you access to the `config` folder
2. If there is not already a `custom_components` folder, create one now.
3. Copy the `custom_components/llama_conversation` folder from this repo to `config/custom_components/llama_conversation` on your Home Assistant machine.
4. Restart Home Assistant using the "Developer Tools" tab -> Services -> Run `homeassistant.restart`
5. The "LLaMA Conversation" integration should show up in the "Devices" section now.

### Setting up
When setting up the component, there are 3 different "backend" options to choose from:
1. Llama.cpp with a model from HuggingFace
2. Llama.cpp with a locally provided model
-3. A remote instance of text-generateion-webui
+3. A remote instance of text-generation-webui
**Setting up the Llama.cpp backend with a model from HuggingFace**:
TODO: need to build wheels for llama.cpp first

**Setting up the Llama.cpp backend with a locally downloaded model**:
TODO: need to build wheels for llama.cpp first

**Setting up the "remote" backend**:

You need the following settings in order to configure the "remote" backend
1. Hostname: the host of the machine where text-generation-webui API is hosted. If you are using the provided add-on then the hostname is `local-text-generation-webui`
2. Port: the port for accessing the text-generation-webui API. NOTE: this is not the same as the UI port. (Usually 5000)
3. Name of the Model: This name must EXACTLY match the name as it appears in `text-generation-webui`

On creation, the component will validate that the model is available for use.
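Because the remote backend is just an HTTP API, one quick way to sanity-check the hostname and port settings listed above is to send a completion request directly; the snippet below mirrors the `/v1/completions` call the component itself makes. The host, port, and prompt are placeholders.

```python
# Quick connectivity check against a text-generation-webui API instance.
# Hostname and port are placeholders; substitute the values you configured.
import requests

api_host = "http://local-text-generation-webui:5000"

payload = {
    "prompt": "Hello",
    "max_tokens": 32,
    "temperature": 0.1,
}

result = requests.post(f"{api_host}/v1/completions", json=payload, timeout=90)
result.raise_for_status()
print(result.json()["choices"][0]["text"])
```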
### Configuring the component as a Conversation Agent
**NOTE: ANY DEVICES THAT YOU SELECT TO BE EXPOSED TO THE MODEL WILL BE ADDED AS CONTEXT AND POTENTIALLY HAVE THEIR STATE CHANGED BY THE MODEL. ONLY EXPOSE DEVICES THAT YOU ARE OK WITH THE MODEL MODIFYING THE STATE OF, EVEN IF IT IS NOT WHAT YOU REQUESTED. THE MODEL MAY OCCASIONALLY HALLUCINATE AND ISSUE COMMANDS TO THE WRONG DEVICE! USE AT YOUR OWN RISK.**

In order to utilize the conversation agent in HomeAssistant:
1. Navigate to "Settings" -> "Voice Assistants"
2. Select "+ Add Assistant"
3. Name the assistant whatever you want.
4. Select the "Conversation Agent" that we created previously
5. If using STT or TTS configure these now
6. Return to the "Overview" dashboard and select chat icon in the top left.

From here you can submit queries to the AI agent.

In order for any entities be available to the agent, you must "expose" them first.
1. Navigate to "Settings" -> "Voice Assistants" -> "Expose" Tab
2. Select "+ Expose Entities" in the bottom right
3. Check any entities you would like to be exposed to the conversation agent.
### Running the text-generation-webui add-on
In order to facilitate running the project entirely on the system where Home Assistant is installed, there is an experimental Home Assistant Add-on that runs the oobabooga/text-generation-webui to connect to using the "remote" backend option.

1. Ensure you have either the Samba, SSH, FTP, or another add-on installed that gives you access to the `addons` folder
2. Copy the `addon` folder from this repo to `addons/text-generation-webui` on your Home Assistant machine.
3. Go to the "Add-ons" section in settings and then pick the "Add-on Store" from the bottom right corner.
4. Select the 3 dots in the top right and click "Check for Updates" and Refresh the webpage.
5. There should now be a "Local Add-ons" section at the top of the "Add-on Store"
6. Install the `oobabooga-text-generation-webui` add-on. It will take ~15-20 minutes to build the image on a Raspberry Pi.
7. Copy any models you want to use to the `addon_configs/local_text-generation-webui/models` folder or download them using the UI.
8. Load up a model to use. NOTE: The timeout for ingress pages is only 60 seconds so if the model takes longer than 60 seconds to load (very likely) then the UI will appear to time out and you will need to navigate to the add-on's logs to see when the model is fully loaded.

### Performance of running the model on a Raspberry Pi
The RPI4 4GB that I have was sitting right at 1.5 tokens/sec for prompt eval and 1.6 tokens/sec for token generation when running the `Q4_K_M` quant. I was reliably getting responses in 30-60 seconds after the initial prompt processing which took almost 5 minutes. It depends significantly on the number of devices that have been exposed as well as how many states have changed since the last invocation because llama.cpp caches KV values for identical prompt prefixes.

It is highly recommend to set up text-generation-webui on a separate machine that can take advantage of a GPU.
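A rough back-of-envelope reading of those Raspberry Pi numbers (purely illustrative; the prompt size is inferred from the stated timings rather than measured):

```python
# Back-of-envelope check of the Raspberry Pi 4 timings quoted above.
prompt_eval_rate = 1.5   # tokens/sec for prompt eval (from the README)
gen_rate = 1.6           # tokens/sec for token generation (from the README)

initial_prompt_seconds = 5 * 60                      # "almost 5 minutes" of prompt processing
implied_prompt_tokens = prompt_eval_rate * initial_prompt_seconds
print(f"implied initial prompt size: ~{implied_prompt_tokens:.0f} tokens")   # ~450 tokens

# A 30-60 second reply at 1.6 tokens/sec corresponds to roughly this many generated tokens:
print(f"~{30 * gen_rate:.0f} to {60 * gen_rate:.0f} tokens per response")    # ~48 to 96 tokens
```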
TODO.md | 4

@@ -18,4 +18,6 @@
[ ] RAG for getting info for setting up new devices
- set up vectordb
- ingest home assistant docs
- "context request" from above to initiate a RAG search
+[ ] make llama-cpp-python wheels for "llama-cpp-python>=0.2.24"
+[ ] prime kv cache with current "state" so that requests are faster
addon/Dockerfile | 101

@@ -1,111 +1,51 @@
ARG BUILD_FROM=atinoda/text-generation-webui:default

FROM alpine:latest as overlay-downloader
RUN apk add git && \
    git clone https://github.com/hassio-addons/addon-ubuntu-base /tmp/addon-ubuntu-base
ARG BUILD_FROM=ghcr.io/hassio-addons/ubuntu-base:9.0.2

# hadolint ignore=DL3006
FROM ${BUILD_FROM}

# Environment variables
ENV \
    CARGO_NET_GIT_FETCH_WITH_CLI=true \
    DEBIAN_FRONTEND="noninteractive" \
    HOME="/root" \
    LANG="C.UTF-8" \
    PIP_DISABLE_PIP_VERSION_CHECK=1 \
    PIP_NO_CACHE_DIR=1 \
    PIP_PREFER_BINARY=1 \
    PS1="$(whoami)@$(hostname):$(pwd)$ " \
    PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    S6_BEHAVIOUR_IF_STAGE2_FAILS=2 \
    S6_CMD_WAIT_FOR_SERVICES_MAXTIME=0 \
    S6_CMD_WAIT_FOR_SERVICES=1 \
    YARN_HTTP_TIMEOUT=1000000 \
    TERM="xterm-256color"

# Set shell
SHELL ["/bin/bash", "-o", "pipefail", "-c"]

# Install base system
# Install text-generation-webui
ARG BUILD_ARCH=amd64
ARG BASHIO_VERSION="v0.16.0"
ARG S6_OVERLAY_VERSION="3.1.5.0"
ARG TEMPIO_VERSION="2021.09.0"
ARG APP_DIR=/app
RUN \
    apt-get update \
    \
    && apt-get install -y --no-install-recommends \
        ca-certificates=20230311ubuntu0.22.04.1 \
        curl=7.81.0-1ubuntu1.14 \
        jq=1.6-2.1ubuntu3 \
        tzdata=2023c-0ubuntu0.22.04.2 \
        xz-utils=5.2.5-2ubuntu1 \
    \
    && S6_ARCH="${BUILD_ARCH}" \
    && if [ "${BUILD_ARCH}" = "i386" ]; then S6_ARCH="i686"; \
    elif [ "${BUILD_ARCH}" = "amd64" ]; then S6_ARCH="x86_64"; \
    elif [ "${BUILD_ARCH}" = "armv7" ]; then S6_ARCH="arm"; fi \
    \
    && curl -L -s "https://github.com/just-containers/s6-overlay/releases/download/v${S6_OVERLAY_VERSION}/s6-overlay-noarch.tar.xz" \
        | tar -C / -Jxpf - \
    \
    && curl -L -s "https://github.com/just-containers/s6-overlay/releases/download/v${S6_OVERLAY_VERSION}/s6-overlay-${S6_ARCH}.tar.xz" \
        | tar -C / -Jxpf - \
    \
    && curl -L -s "https://github.com/just-containers/s6-overlay/releases/download/v${S6_OVERLAY_VERSION}/s6-overlay-symlinks-noarch.tar.xz" \
        | tar -C / -Jxpf - \
    \
    && curl -L -s "https://github.com/just-containers/s6-overlay/releases/download/v${S6_OVERLAY_VERSION}/s6-overlay-symlinks-arch.tar.xz" \
        | tar -C / -Jxpf - \
    \
    && mkdir -p /etc/fix-attrs.d \
    && mkdir -p /etc/services.d \
    \
    && curl -J -L -o /tmp/bashio.tar.gz \
        "https://github.com/hassio-addons/bashio/archive/${BASHIO_VERSION}.tar.gz" \
    && mkdir /tmp/bashio \
    && tar zxvf \
        /tmp/bashio.tar.gz \
        --strip 1 -C /tmp/bashio \
    \
    && mv /tmp/bashio/lib /usr/lib/bashio \
    && ln -s /usr/lib/bashio/bashio /usr/bin/bashio \
    \
    && curl -L -s -o /usr/bin/tempio \
        "https://github.com/home-assistant/tempio/releases/download/${TEMPIO_VERSION}/tempio_${BUILD_ARCH}" \
    && chmod a+x /usr/bin/tempio \
        ca-certificates \
        curl \
        git \
        build-essential \
        cmake \
        python3.10 \
        python3-dev \
        python3-venv \
        python3-pip \
    \
    && git clone https://github.com/oobabooga/text-generation-webui.git ${APP_DIR}\
    && python3 -m pip install torch torchvision torchaudio py-cpuinfo==9.0.0 \
    && python3 -m pip install -r ${APP_DIR}/requirements_cpu_only_noavx2.txt -r ${APP_DIR}/extensions/openai/requirements.txt llama-cpp-python \
    && apt-get purge -y --auto-remove \
        xz-utils \
        git \
        build-essential \
        cmake \
        python3-dev \
    && apt-get clean \
    && rm -fr \
        /tmp/* \
        /var/{cache,log}/* \
        /var/lib/apt/lists/*

# Copy s6-overlay adjustments from cloned git repo
COPY --from=overlay-downloader /tmp/addon-ubuntu-base/base/s6-overlay /package/admin/s6-overlay-${S6_OVERLAY_VERSION}/

# Copy root filesystem for the base image
COPY --from=overlay-downloader /tmp/addon-ubuntu-base/base/rootfs /

# Copy root filesystem for our image
COPY rootfs /

# Entrypoint & CMD
ENTRYPOINT [ "/init" ]

# Build arugments
ARG BUILD_DATE
ARG BUILD_REF
ARG BUILD_VERSION
ARG BUILD_REPOSITORY

# TODO: figure out what is broken with file permissions
USER root

# Labels
LABEL \
    io.hass.name="oobabooga text-generation-webui for ${BUILD_ARCH}" \
@@ -113,9 +53,6 @@ LABEL \
    io.hass.arch="${BUILD_ARCH}" \
    io.hass.type="addon" \
    io.hass.version=${BUILD_VERSION} \
    io.hass.base.version=${BUILD_VERSION} \
    io.hass.base.name="ubuntu" \
    io.hass.base.image="hassioaddons/ubuntu-base" \
    maintainer="github.com/acon96" \
    org.opencontainers.image.title="oobabooga text-generation-webui for ${BUILD_ARCH}" \
    org.opencontainers.image.description="Home Assistant Community Add-on: ${BUILD_ARCH} oobabooga text-generation-webui" \
@@ -1,4 +1,4 @@
# text-generation-webui - Home Assistant Addon
-NOTE: This is super experimental and probably won't actually run on a Raspberry PI
+NOTE: This is super experimental and may or may not work on a Raspberry Pi

-This basically takes an existing Docker image and attempts to overlay the required files for Home Assistant to launch and recognize it as an addon.
+Installs text-generation-webui into a docker container using CPU only mode (llama.cpp)
@@ -1,3 +1,4 @@
---
build_from:
-  amd64: atinoda/text-generation-webui:default-snapshot-2023-10-29
+  aarch64: ghcr.io/hassio-addons/ubuntu-base:9.0.2
+  amd64: ghcr.io/hassio-addons/ubuntu-base:9.0.2
@@ -1,16 +1,26 @@
---
-name: oobabooga text-generation-webui
+name: oobabooga-text-generation-webui
version: dev
slug: text-generation-webui
-description: ""
-url: ""
+description: "A tool for running Large Language Models"
+url: "https://github.com/oobabooga/text-generation-webui"
init: false
arch:
  - amd64
+  - aarch64
ports:
  7860/tcp: 7860 # ingress
  5000/tcp: 5000 # api
ports_description:
  7860/tcp: Web interface (Not required for Ingress)
  5000/tcp: OpenAI compatible API Server
ingress: true
ingress_port: 7860
-# options: {}
-# schema: {}
-# TODO: figure out volume mounts so models persist between restarts
+options: {}
+schema:
+  log_level: list(trace|debug|info|notice|warning|error|fatal)?
+  models_directory: str?
+map:
+  - media:rw
+  - share:rw
+  - addon_config:rw
addon/rootfs/etc/s6-overlay/s6-rc.d/text-generation-webui/run | 28 (Normal file → Executable file)

@@ -5,5 +5,29 @@
# ==============================================================================
bashio::log.info "Starting Text Generation Webui..."

-cd /app
-exec python3 /app/server.py --listen --verbose --api
+APP_DIR="/app"
+DEFAULT_MODELS_DIR="/config/models"
+
+if bashio::config.has_value "models_directory" && ! bashio::config.is_empty "models_directory"; then
+    MODELS_DIR=$(bashio::config 'models_directory')
+    if ! bashio::fs.directory_exists "$MODELS_DIR"; then
+        MODELS_DIR=$DEFAULT_MODELS_DIR
+        mkdir -p $MODELS_DIR
+        bashio::log.warning "The provided models directory '$MODELS_DIR' does not exist! Defaulting to '$DEFAULT_MODELS_DIR'"
+    else
+        bashio::log.info "Using chosen storage for models: '$MODELS_DIR'"
+    fi
+else
+    MODELS_DIR=$DEFAULT_MODELS_DIR
+    mkdir -p $MODELS_DIR
+    bashio::log.info "Using default local storage for models."
+fi
+
+# ensure we can access the folder
+chmod 0777 $MODELS_DIR
+
+export GRADIO_ROOT_PATH=$(bashio::addon.ingress_entry)
+bashio::log.info "Serving app from $GRADIO_ROOT_PATH"
+
+cd $APP_DIR
+exec python3 server.py --listen --verbose --api --model-dir $MODELS_DIR
@@ -8,7 +8,7 @@ from typing import Callable
import numpy.typing as npt
import numpy as np

-from llama_cpp import Llama
+# from llama_cpp import Llama
import requests
import re
import os
@@ -42,6 +42,7 @@ from .const import (
    CONF_TEMPERATURE,
    CONF_TOP_K,
    CONF_TOP_P,
+    CONF_REQUEST_TIMEOUT,
    CONF_BACKEND_TYPE,
    CONF_DOWNLOADED_MODEL_FILE,
    DEFAULT_MAX_TOKENS,
@@ -50,6 +51,7 @@ from .const import (
    DEFAULT_TOP_K,
    DEFAULT_TOP_P,
    DEFAULT_BACKEND_TYPE,
+    DEFAULT_REQUEST_TIMEOUT,
    BACKEND_TYPE_REMOTE,
    DOMAIN,
)
@@ -112,6 +114,8 @@ class LLaMAAgent(conversation.AbstractConversationAgent):
        if self.use_local_backend:
            if not model_path:
                raise Exception(f"Model was not found at '{model_path}'!")

+            raise NotImplementedError()
+
            self.llm = Llama(
                model_path=model_path,
@@ -200,6 +204,8 @@ class LLaMAAgent(conversation.AbstractConversationAgent):
        prompt.append({"role": "assistant", "message": response})
        self.history[conversation_id] = prompt

+        exposed_entities = list(self._async_get_exposed_entities()[0].keys())
+
        to_say = response.strip().split("\n")[0]
        pattern = re.compile(r"```homeassistant\n([\S\n]*)```")
        for block in pattern.findall(response.strip()):
@@ -213,16 +219,21 @@ class LLaMAAgent(conversation.AbstractConversationAgent):
                service = line.split("(")[0]
                entity = line.split("(")[1][:-1]
                domain, service = tuple(service.split("."))
-                try:
-                    await self.hass.services.async_call(
-                        domain,
-                        service,
-                        service_data={ATTR_ENTITY_ID: entity},
-                        blocking=True,
-                    )
-                except Exception as err:
-                    to_say += f"\nFailed to run: {line}"
-                    _LOGGER.debug(f"err: {err}; {repr(err)}")
+
+                # only acknowledge requests to exposed entities
+                if entity not in exposed_entities:
+                    to_say += f"Can't find device '{entity}'"
+                else:
+                    try:
+                        await self.hass.services.async_call(
+                            domain,
+                            service,
+                            service_data={ATTR_ENTITY_ID: entity},
+                            blocking=True,
+                        )
+                    except Exception as err:
+                        to_say += f"\nFailed to run: {line}"
+                        _LOGGER.debug(f"err: {err}; {repr(err)}")

        intent_response = intent.IntentResponse(language=user_input.language)
        intent_response.async_set_speech(to_say)
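To make the parsing in the hunk above concrete, here is a small standalone sketch of the kind of model output the regex expects and how the string splitting recovers the service call; the response text is invented for illustration.

```python
# Standalone illustration of the ```homeassistant``` block parsing shown in the hunk above.
# The response text is made up for the example.
import re

response = (
    "Turning on the kitchen light for you.\n"
    "```homeassistant\n"
    "light.turn_on(light.kitchen)\n"
    "```"
)

pattern = re.compile(r"```homeassistant\n([\S\n]*)```")  # same pattern as the component
for block in pattern.findall(response):
    for line in block.strip().split("\n"):
        service = line.split("(")[0]         # "light.turn_on"
        entity = line.split("(")[1][:-1]     # "light.kitchen"
        domain, service = tuple(service.split("."))
        print(domain, service, entity)       # -> light turn_on light.kitchen
```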
@@ -235,8 +246,10 @@ class LLaMAAgent(conversation.AbstractConversationAgent):
            generate_params["model"] = self.model_name
            del generate_params["top_k"]

+            timeout = self.entry.options.get(CONF_REQUEST_TIMEOUT, DEFAULT_REQUEST_TIMEOUT)
+
            result = requests.post(
-                f"{self.api_host}/v1/completions", json=generate_params, timeout=30
+                f"{self.api_host}/v1/completions", json=generate_params, timeout=timeout
            )
            result.raise_for_status()
        except requests.RequestException as err:
@@ -35,6 +35,7 @@ from .const import (
    CONF_TEMPERATURE,
    CONF_TOP_K,
    CONF_TOP_P,
+    CONF_REQUEST_TIMEOUT,
    CONF_BACKEND_TYPE,
    CONF_BACKEND_TYPE_OPTIONS,
    CONF_DOWNLOADED_MODEL_FILE,
@@ -48,6 +49,7 @@ from .const import (
    DEFAULT_TEMPERATURE,
    DEFAULT_TOP_K,
    DEFAULT_TOP_P,
+    DEFAULT_REQUEST_TIMEOUT,
    DEFAULT_BACKEND_TYPE,
    BACKEND_TYPE_LLAMA_HF,
    BACKEND_TYPE_LLAMA_EXISTING,
@@ -96,6 +98,7 @@ DEFAULT_OPTIONS = types.MappingProxyType(
        CONF_TOP_K: DEFAULT_TOP_K,
        CONF_TOP_P: DEFAULT_TOP_P,
        CONF_TEMPERATURE: DEFAULT_TEMPERATURE,
+        CONF_REQUEST_TIMEOUT: DEFAULT_REQUEST_TIMEOUT,
    }
)
@@ -374,18 +377,19 @@
        """Manage the options."""
        if user_input is not None:
            return self.async_create_entry(title="LLaMA Conversation", data=user_input)
-        schema = local_llama_config_option_schema(self.config_entry.options)
+        is_local_backend = self.config_entry.data[CONF_BACKEND_TYPE] != BACKEND_TYPE_REMOTE
+        schema = local_llama_config_option_schema(self.config_entry.options, is_local_backend)
        return self.async_show_form(
            step_id="init",
            data_schema=vol.Schema(schema),
        )


-def local_llama_config_option_schema(options: MappingProxyType[str, Any]) -> dict:
+def local_llama_config_option_schema(options: MappingProxyType[str, Any], is_local_backend: bool) -> dict:
    """Return a schema for Local LLaMA completion options."""
    if not options:
        options = DEFAULT_OPTIONS
-    return {
+    result = {
        vol.Optional(
            CONF_PROMPT,
            description={"suggested_value": options[CONF_PROMPT]},
@@ -412,3 +416,12 @@ def local_llama_config_option_schema(options: MappingProxyType[str, Any]) -> dict:
            default=DEFAULT_TEMPERATURE,
        ): NumberSelector(NumberSelectorConfig(min=0, max=1, step=0.05)),
    }
+
+    if not is_local_backend:
+        result[vol.Optional(
+            CONF_REQUEST_TIMEOUT,
+            description={"suggested_value": options[CONF_REQUEST_TIMEOUT]},
+            default=DEFAULT_REQUEST_TIMEOUT,
+        )] = int
+
+    return result
@@ -16,12 +16,15 @@ CONF_TOP_P = "top_p"
DEFAULT_TOP_P = 1
CONF_TEMPERATURE = "temperature"
DEFAULT_TEMPERATURE = 0.1
+CONF_REQUEST_TIMEOUT = "request_timeout"
+DEFAULT_REQUEST_TIMEOUT = 90
CONF_BACKEND_TYPE = "model_backend"
BACKEND_TYPE_LLAMA_HF = "Llama.cpp (HuggingFace)"
BACKEND_TYPE_LLAMA_EXISTING = "Llama.cpp (existing model)"
BACKEND_TYPE_REMOTE = "text-generation-webui API"
DEFAULT_BACKEND_TYPE = BACKEND_TYPE_LLAMA_HF
-CONF_BACKEND_TYPE_OPTIONS = [ BACKEND_TYPE_LLAMA_HF, BACKEND_TYPE_LLAMA_EXISTING, BACKEND_TYPE_REMOTE]
+# CONF_BACKEND_TYPE_OPTIONS = [ BACKEND_TYPE_LLAMA_HF, BACKEND_TYPE_LLAMA_EXISTING, BACKEND_TYPE_REMOTE ]
+CONF_BACKEND_TYPE_OPTIONS = [ BACKEND_TYPE_REMOTE ]
CONF_DOWNLOADED_MODEL_QUANTIZATION = "downloaded_model_quantization"
CONF_DOWNLOADED_MODEL_QUANTIZATION_OPTIONS = ["Q8_0", "Q5_K_M", "Q4_K_M", "Q3_K_M"]
DEFAULT_DOWNLOADED_MODEL_QUANTIZATION = "Q5_K_M"
@@ -10,7 +10,6 @@
    "iot_class": "local_polling",
    "requirements": [
        "requests",
-        "huggingface-hub",
-        "llama-cpp-python>=0.2.24"
+        "huggingface-hub"
    ]
}
custom_components/llama_conversation/translations/en.json | 57 (new file)

@@ -0,0 +1,57 @@
{
  "config": {
    "error": {
      "download_failed": "The download failed to complete!",
      "failed_to_connect": "Failed to connect to the remote API. See the logs for more details.",
      "missing_model_api": "The selected model is not provided by this API.",
      "missing_model_file": "The provided file does not exist.",
      "other_existing_local": "Another model is already loaded locally. Please unload it or configure a remote model.",
      "unknown": "Unexpected error"
    },
    "progress": {
      "download": "Please wait while the model is being downloaded from HuggingFace. This can take a few minutes."
    },
    "step": {
      "local_model": {
        "data": {
          "downloaded_model_file": "Local file name",
          "downloaded_model_quantization": "Downloaded model quantization",
          "huggingface_model": "HuggingFace Model"
        },
        "description": "Please configure llama.cpp for the model",
        "title": "Configure llama.cpp"
      },
      "remote_model": {
        "data": {
          "host": "API Hostname",
          "huggingface_model": "Model Name",
          "port": "API Port"
        },
        "description": "Provide the connection details for an instance of text-generation-webui that is hosting the model.",
        "title": "Configure connection to remote API"
      },
      "user": {
        "data": {
          "download_model_from_hf": "Download model from HuggingFace",
          "use_local_backend": "Use Llama.cpp"
        },
        "description": "Select the backend for running the model. Either Llama.cpp (locally) or text-generation-webui (remote).",
        "title": "Select Backend"
      }
    }
  },
  "options": {
    "step": {
      "init": {
        "data": {
          "max_new_tokens": "Maximum tokens to return in response",
          "prompt": "Prompt Template",
          "temperature": "Temperature",
          "top_k": "Top K",
          "top_p": "Top P",
          "request_timeout": "Remote Request Timeout (seconds)"
        }
      }
    }
  }
}