make the addon actually work

Alex O'Connell
2023-12-28 13:46:21 -05:00
parent 2b880136b9
commit 6e348ac472
10 changed files with 172 additions and 104 deletions

View File

@@ -63,19 +63,38 @@ The provided `custom_modeling_phi.py` has Gradient Checkpointing implemented for
## Home Assistant Component
In order to integrate with Home Assistant, we provide a `custom_component` that exposes the locally running LLM as a "conversation agent" that can be interacted with using a chat interface as well as integrate with Speech-to-Text and Text-to-Speech addons to enable interacting with the model by speaking.
The component can either run the model directly as part of the Home Assistant software using llama-cpp-python, or it can connect to the [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) project, which provides access to the LLM via an API interface. When doing this, you can host the model yourself and point the component at the machine where the model is hosted, or you can run the model with text-generation-webui using the provided [custom Home Assistant add-on](./addon/README.md).
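For reference, the remote backend talks to text-generation-webui's OpenAI-compatible completion endpoint. The sketch below shows roughly the kind of request it sends; the hostname, port, and payload values are illustrative, not defaults taken from the integration:

```python
import requests

# Illustrative address of a text-generation-webui instance; port 5000 is the
# add-on's API port, but substitute whatever host/port you actually use.
API_HOST = "http://homeassistant.local:5000"

payload = {
    "prompt": "The kitchen lights are currently off. User: turn on the kitchen lights.",
    "max_tokens": 128,
    "temperature": 0.1,
    "top_p": 1,
}

# The integration issues a similar POST and raises on HTTP errors.
response = requests.post(f"{API_HOST}/v1/completions", json=payload, timeout=90)
response.raise_for_status()
print(response.json()["choices"][0]["text"])
```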
### Installing
1. Ensure you have the Samba, SSH, FTP, or another add-on installed that gives you access to the `config` folder.
2. If there is not already a `custom_components` folder, create one now.
3. Copy the `custom_components/llama_conversation` folder from this repo to `config/custom_components/llama_conversation` on your Home Assistant machine.
4. Restart Home Assistant using the "Developer Tools" tab -> Services -> Run `homeassistant.restart`
5. The "LLaMA Conversation" integration should show up in the "Devices" section now.
### Setting up
When setting up the component, there are 3 different "backend" options to choose from:
1. Llama.cpp with a model from HuggingFace
2. Llama.cpp with a locally provided model
3. A remote instance of text-generation-webui
**Setting up the Llama.cpp backend with a model from HuggingFace**:
TODO: need to build wheels for llama.cpp first
**Setting up the Llama.cpp backend with a locally downloaded model**:
TODO: need to build wheels for llama.cpp first
**Setting up the "remote" backend**:
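Until the llama.cpp wheels are built, the remote backend is currently the only enabled option. Before adding the integration, it can help to verify that the text-generation-webui API is reachable from Home Assistant. A quick sketch; the host and port are placeholders for wherever the API is running, and the endpoint follows the OpenAI-compatible convention used by the openai extension:

```python
import requests

API_HOST = "http://192.168.1.50:5000"  # placeholder: machine running text-generation-webui

# Ask the OpenAI-compatible API which models it reports; a successful response
# confirms the host/port you will enter in the config flow are correct.
models = requests.get(f"{API_HOST}/v1/models", timeout=10).json()
print([model["id"] for model in models.get("data", [])])
```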
### Configuring the component as a Conversation Agent
**NOTE: ANY DEVICES THAT YOU SELECT TO BE EXPOSED TO THE MODEL WILL BE ADDED AS CONTEXT AND POTENTIALLY HAVE THEIR STATE CHANGED BY THE MODEL. ONLY EXPOSE DEVICES THAT YOU ARE OK WITH THE MODEL MODIFYING THE STATE OF, EVEN IF IT IS NOT WHAT YOU REQUESTED. THE MODEL MAY OCCASIONALLY HALLUCINATE AND ISSUE COMMANDS TO THE WRONG DEVICE! USE AT YOUR OWN RISK.**
### Running the text-generation-webui add-on
To facilitate running the project entirely on the system where Home Assistant is installed, there is an experimental Home Assistant add-on that runs oobabooga/text-generation-webui, which the component can then connect to using the "remote" backend option.
1. Ensure you have the Samba, SSH, FTP, or another add-on installed that gives you access to the `addons` folder.
2. Copy the `addon` folder from this repo to `addons/text-generation-webui` on your Home Assistant machine.
3. Go to the "Add-ons" section in settings and then pick the "Add-on Store" from the bottom right corner.
4. Select the three dots in the top right, click "Check for Updates", and refresh the page.
5. There should now be a "Local Add-ons" section at the top of the "Add-on Store"
6. Install the `oobabooga-text-generation-webui` add-on. It will take ~15-20 minutes to build the image on a Raspberry Pi.
7. Copy any models you want to use to the `addon_configs/local_text-generation-webui/models` folder.
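
GGUF files placed in that models folder should show up in the web UI's model list after a refresh. As an example of fetching one from HuggingFace, here is a sketch using `huggingface_hub`; the repository and file names are examples only, and the destination path assumes the add-on's default config storage from step 7:

```python
from huggingface_hub import hf_hub_download

# Example model only; substitute the repo/quantization you actually want to run.
hf_hub_download(
    repo_id="TheBloke/phi-2-GGUF",
    filename="phi-2.Q5_K_M.gguf",
    local_dir="/addon_configs/local_text-generation-webui/models",
)
```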

View File

@@ -1,111 +1,51 @@
ARG BUILD_FROM=atinoda/text-generation-webui:default
FROM alpine:latest as overlay-downloader
RUN apk add git && \
git clone https://github.com/hassio-addons/addon-ubuntu-base /tmp/addon-ubuntu-base
ARG BUILD_FROM=ghcr.io/hassio-addons/ubuntu-base:9.0.2
# hadolint ignore=DL3006
FROM ${BUILD_FROM}
# Environment variables
ENV \
CARGO_NET_GIT_FETCH_WITH_CLI=true \
DEBIAN_FRONTEND="noninteractive" \
HOME="/root" \
LANG="C.UTF-8" \
PIP_DISABLE_PIP_VERSION_CHECK=1 \
PIP_NO_CACHE_DIR=1 \
PIP_PREFER_BINARY=1 \
PS1="$(whoami)@$(hostname):$(pwd)$ " \
PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
S6_BEHAVIOUR_IF_STAGE2_FAILS=2 \
S6_CMD_WAIT_FOR_SERVICES_MAXTIME=0 \
S6_CMD_WAIT_FOR_SERVICES=1 \
YARN_HTTP_TIMEOUT=1000000 \
TERM="xterm-256color"
# Set shell
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
# Install base system
# Install text-generation-webui
ARG BUILD_ARCH=amd64
ARG BASHIO_VERSION="v0.16.0"
ARG S6_OVERLAY_VERSION="3.1.5.0"
ARG TEMPIO_VERSION="2021.09.0"
ARG APP_DIR=/app
RUN \
apt-get update \
\
&& apt-get install -y --no-install-recommends \
ca-certificates=20230311ubuntu0.22.04.1 \
curl=7.81.0-1ubuntu1.14 \
jq=1.6-2.1ubuntu3 \
tzdata=2023c-0ubuntu0.22.04.2 \
xz-utils=5.2.5-2ubuntu1 \
\
&& S6_ARCH="${BUILD_ARCH}" \
&& if [ "${BUILD_ARCH}" = "i386" ]; then S6_ARCH="i686"; \
elif [ "${BUILD_ARCH}" = "amd64" ]; then S6_ARCH="x86_64"; \
elif [ "${BUILD_ARCH}" = "armv7" ]; then S6_ARCH="arm"; fi \
\
&& curl -L -s "https://github.com/just-containers/s6-overlay/releases/download/v${S6_OVERLAY_VERSION}/s6-overlay-noarch.tar.xz" \
| tar -C / -Jxpf - \
\
&& curl -L -s "https://github.com/just-containers/s6-overlay/releases/download/v${S6_OVERLAY_VERSION}/s6-overlay-${S6_ARCH}.tar.xz" \
| tar -C / -Jxpf - \
\
&& curl -L -s "https://github.com/just-containers/s6-overlay/releases/download/v${S6_OVERLAY_VERSION}/s6-overlay-symlinks-noarch.tar.xz" \
| tar -C / -Jxpf - \
\
&& curl -L -s "https://github.com/just-containers/s6-overlay/releases/download/v${S6_OVERLAY_VERSION}/s6-overlay-symlinks-arch.tar.xz" \
| tar -C / -Jxpf - \
\
&& mkdir -p /etc/fix-attrs.d \
&& mkdir -p /etc/services.d \
\
&& curl -J -L -o /tmp/bashio.tar.gz \
"https://github.com/hassio-addons/bashio/archive/${BASHIO_VERSION}.tar.gz" \
&& mkdir /tmp/bashio \
&& tar zxvf \
/tmp/bashio.tar.gz \
--strip 1 -C /tmp/bashio \
\
&& mv /tmp/bashio/lib /usr/lib/bashio \
&& ln -s /usr/lib/bashio/bashio /usr/bin/bashio \
\
&& curl -L -s -o /usr/bin/tempio \
"https://github.com/home-assistant/tempio/releases/download/${TEMPIO_VERSION}/tempio_${BUILD_ARCH}" \
&& chmod a+x /usr/bin/tempio \
ca-certificates \
curl \
git \
build-essential \
cmake \
python3.10 \
python3-dev \
python3-venv \
python3-pip \
\
&& git clone https://github.com/oobabooga/text-generation-webui.git ${APP_DIR} \
&& python3 -m pip install torch torchvision torchaudio py-cpuinfo==9.0.0 \
&& python3 -m pip install -r ${APP_DIR}/requirements_cpu_only_noavx2.txt -r ${APP_DIR}/extensions/openai/requirements.txt llama-cpp-python \
&& apt-get purge -y --auto-remove \
xz-utils \
git \
build-essential \
cmake \
python3-dev \
&& apt-get clean \
&& rm -fr \
/tmp/* \
/var/{cache,log}/* \
/var/lib/apt/lists/*
# Copy s6-overlay adjustments from cloned git repo
COPY --from=overlay-downloader /tmp/addon-ubuntu-base/base/s6-overlay /package/admin/s6-overlay-${S6_OVERLAY_VERSION}/
# Copy root filesystem for the base image
COPY --from=overlay-downloader /tmp/addon-ubuntu-base/base/rootfs /
# Copy root filesystem for our image
COPY rootfs /
# Entrypoint & CMD
ENTRYPOINT [ "/init" ]
# Build arguments
ARG BUILD_DATE
ARG BUILD_REF
ARG BUILD_VERSION
ARG BUILD_REPOSITORY
# TODO: figure out what is broken with file permissions
USER root
# Labels
LABEL \
io.hass.name="oobabooga text-generation-webui for ${BUILD_ARCH}" \
@@ -113,9 +53,6 @@ LABEL \
io.hass.arch="${BUILD_ARCH}" \
io.hass.type="addon" \
io.hass.version=${BUILD_VERSION} \
io.hass.base.version=${BUILD_VERSION} \
io.hass.base.name="ubuntu" \
io.hass.base.image="hassioaddons/ubuntu-base" \
maintainer="github.com/acon96" \
org.opencontainers.image.title="oobabooga text-generation-webui for ${BUILD_ARCH}" \
org.opencontainers.image.description="Home Assistant Community Add-on: ${BUILD_ARCH} oobabooga text-generation-webui" \

View File

@@ -1,4 +1,4 @@
# text-generation-webui - Home Assistant Addon
NOTE: This is super experimental and may or may not work on a Raspberry Pi.
This basically takes an existing Docker image and overlays the files required for Home Assistant to launch and recognize it as an add-on.
It installs text-generation-webui into a Docker container in CPU-only mode (llama.cpp).

View File

@@ -1,3 +1,4 @@
---
build_from:
amd64: atinoda/text-generation-webui:default-snapshot-2023-10-29
aarch64: ghcr.io/hassio-addons/ubuntu-base:9.0.2
amd64: ghcr.io/hassio-addons/ubuntu-base:9.0.2

View File

@@ -1,16 +1,26 @@
---
name: oobabooga text-generation-webui
name: oobabooga-text-generation-webui
version: dev
slug: text-generation-webui
description: ""
url: ""
description: "A tool for running Large Language Models"
url: "https://github.com/oobabooga/text-generation-webui"
init: false
arch:
- amd64
- amd64
- aarch64
ports:
7860/tcp: 7860 # ingress
5000/tcp: 5000 # api
ports_description:
7860/tcp: Web interface (Not required for Ingress)
5000/tcp: OpenAI compatible API Server
ingress: true
ingress_port: 7860
# options: {}
# schema: {}
# TODO: figure out volume mounts so models persist between restarts
options: {}
schema:
log_level: list(trace|debug|info|notice|warning|error|fatal)?
models_directory: str?
map:
- media:rw
- share:rw
- addon_config:rw

View File

@@ -5,5 +5,29 @@
# ==============================================================================
bashio::log.info "Starting Text Generation Webui..."
cd /app
exec python3 /app/server.py --listen --verbose --api
APP_DIR="/app"
DEFAULT_MODELS_DIR="/config/models"
if bashio::config.has_value "models_directory" && ! bashio::config.is_empty "models_directory"; then
MODELS_DIR=$(bashio::config 'models_directory')
if ! bashio::fs.directory_exists "$MODELS_DIR"; then
bashio::log.warning "The provided models directory '$MODELS_DIR' does not exist! Defaulting to '$DEFAULT_MODELS_DIR'"
MODELS_DIR=$DEFAULT_MODELS_DIR
mkdir -p "$MODELS_DIR"
else
bashio::log.info "Using chosen storage for models: '$MODELS_DIR'"
fi
else
MODELS_DIR=$DEFAULT_MODELS_DIR
mkdir -p $MODELS_DIR
bashio::log.info "Using default local storage for models."
fi
# ensure we can access the folder
chmod 0777 $MODELS_DIR
export GRADIO_ROOT_PATH=$(bashio::addon.ingress_entry)
bashio::log.info "Serving app from $GRADIO_ROOT_PATH"
cd $APP_DIR
exec python3 server.py --listen --verbose --api --model-dir $MODELS_DIR

View File

@@ -8,7 +8,7 @@ from typing import Callable
import numpy.typing as npt
import numpy as np
from llama_cpp import Llama
# from llama_cpp import Llama
import requests
import re
import os
@@ -42,6 +42,7 @@ from .const import (
CONF_TEMPERATURE,
CONF_TOP_K,
CONF_TOP_P,
CONF_REQUEST_TIMEOUT,
CONF_BACKEND_TYPE,
CONF_DOWNLOADED_MODEL_FILE,
DEFAULT_MAX_TOKENS,
@@ -50,6 +51,7 @@ from .const import (
DEFAULT_TOP_K,
DEFAULT_TOP_P,
DEFAULT_BACKEND_TYPE,
DEFAULT_REQUEST_TIMEOUT,
BACKEND_TYPE_REMOTE,
DOMAIN,
)
@@ -112,6 +114,8 @@ class LLaMAAgent(conversation.AbstractConversationAgent):
if self.use_local_backend:
if not model_path:
raise Exception(f"Model was not found at '{model_path}'!")
raise NotImplementedError()
self.llm = Llama(
model_path=model_path,
@@ -242,8 +246,10 @@ class LLaMAAgent(conversation.AbstractConversationAgent):
generate_params["model"] = self.model_name
del generate_params["top_k"]
timeout = self.entry.options.get(CONF_REQUEST_TIMEOUT, DEFAULT_REQUEST_TIMEOUT)
result = requests.post(
f"{self.api_host}/v1/completions", json=generate_params, timeout=30
f"{self.api_host}/v1/completions", json=generate_params, timeout=timeout
)
result.raise_for_status()
except requests.RequestException as err:

View File

@@ -35,6 +35,7 @@ from .const import (
CONF_TEMPERATURE,
CONF_TOP_K,
CONF_TOP_P,
CONF_REQUEST_TIMEOUT,
CONF_BACKEND_TYPE,
CONF_BACKEND_TYPE_OPTIONS,
CONF_DOWNLOADED_MODEL_FILE,
@@ -48,6 +49,7 @@ from .const import (
DEFAULT_TEMPERATURE,
DEFAULT_TOP_K,
DEFAULT_TOP_P,
DEFAULT_REQUEST_TIMEOUT,
DEFAULT_BACKEND_TYPE,
BACKEND_TYPE_LLAMA_HF,
BACKEND_TYPE_LLAMA_EXISTING,
@@ -374,18 +376,19 @@ class OptionsFlow(config_entries.OptionsFlow):
"""Manage the options."""
if user_input is not None:
return self.async_create_entry(title="LLaMA Conversation", data=user_input)
schema = local_llama_config_option_schema(self.config_entry.options)
is_local_backend = self.config_entry.data[CONF_BACKEND_TYPE] != BACKEND_TYPE_REMOTE
schema = local_llama_config_option_schema(self.config_entry.options, is_local_backend)
return self.async_show_form(
step_id="init",
data_schema=vol.Schema(schema),
)
def local_llama_config_option_schema(options: MappingProxyType[str, Any]) -> dict:
def local_llama_config_option_schema(options: MappingProxyType[str, Any], is_local_backend: bool) -> dict:
"""Return a schema for Local LLaMA completion options."""
if not options:
options = DEFAULT_OPTIONS
return {
result = {
vol.Optional(
CONF_PROMPT,
description={"suggested_value": options[CONF_PROMPT]},
@@ -412,3 +415,12 @@ def local_llama_config_option_schema(options: MappingProxyType[str, Any]) -> dic
default=DEFAULT_TEMPERATURE,
): NumberSelector(NumberSelectorConfig(min=0, max=1, step=0.05)),
}
if not is_local_backend:
result[vol.Optional(
CONF_REQUEST_TIMEOUT,
description={"suggested_value": options[CONF_REQUEST_TIMEOUT]},
default=DEFAULT_REQUEST_TIMEOUT,
)] = int
return result
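
The options-flow change above builds the schema as a plain dictionary first so that remote-only fields such as the request timeout can be added conditionally before it is wrapped in `vol.Schema`. A stripped-down sketch of that pattern (field names and defaults are illustrative):

```python
import voluptuous as vol

def build_options_schema(is_local_backend: bool) -> vol.Schema:
    options = {
        vol.Optional("max_new_tokens", default=128): int,
        vol.Optional("temperature", default=0.1): float,
    }
    # Only the remote text-generation-webui backend needs an HTTP request timeout.
    if not is_local_backend:
        options[vol.Optional("request_timeout", default=90)] = int
    return vol.Schema(options)

# Validating remote-backend options fills in the defaults for anything omitted.
print(build_options_schema(is_local_backend=False)({"request_timeout": 120}))
```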

View File

@@ -16,12 +16,15 @@ CONF_TOP_P = "top_p"
DEFAULT_TOP_P = 1
CONF_TEMPERATURE = "temperature"
DEFAULT_TEMPERATURE = 0.1
CONF_REQUEST_TIMEOUT = "request_timeout"
DEFAULT_REQUEST_TIMEOUT = 90
CONF_BACKEND_TYPE = "model_backend"
BACKEND_TYPE_LLAMA_HF = "Llama.cpp (HuggingFace)"
BACKEND_TYPE_LLAMA_EXISTING = "Llama.cpp (existing model)"
BACKEND_TYPE_REMOTE = "text-generation-webui API"
DEFAULT_BACKEND_TYPE = BACKEND_TYPE_LLAMA_HF
CONF_BACKEND_TYPE_OPTIONS = [ BACKEND_TYPE_LLAMA_HF, BACKEND_TYPE_LLAMA_EXISTING, BACKEND_TYPE_REMOTE]
# CONF_BACKEND_TYPE_OPTIONS = [ BACKEND_TYPE_LLAMA_HF, BACKEND_TYPE_LLAMA_EXISTING, BACKEND_TYPE_REMOTE ]
CONF_BACKEND_TYPE_OPTIONS = [ BACKEND_TYPE_REMOTE ]
CONF_DOWNLOADED_MODEL_QUANTIZATION = "downloaded_model_quantization"
CONF_DOWNLOADED_MODEL_QUANTIZATION_OPTIONS = ["Q8_0", "Q5_K_M", "Q4_K_M", "Q3_K_M"]
DEFAULT_DOWNLOADED_MODEL_QUANTIZATION = "Q5_K_M"

View File

@@ -0,0 +1,56 @@
{
"config": {
"error": {
"download_failed": "The download failed to complete!",
"failed_to_connect": "Failed to connect to the remote API. See the logs for more details.",
"missing_model_api": "The selected model is not provided by this API.",
"missing_model_file": "The provided file does not exist.",
"other_existing_local": "Another model is already loaded locally. Please unload it or configure a remote model.",
"unknown": "Unexpected error"
},
"progress": {
"download": "Please wait while the model is being downloaded from HuggingFace. This can take a few minutes."
},
"step": {
"local_model": {
"data": {
"downloaded_model_file": "Local file name",
"downloaded_model_quantization": "Downloaded model quantization",
"huggingface_model": "HuggingFace Model"
},
"description": "Please configure llama.cpp for the model",
"title": "Configure llama.cpp"
},
"remote_model": {
"data": {
"host": "API Hostname",
"huggingface_model": "Model Name",
"port": "API Port"
},
"description": "Provide the connection details for an instance of text-generation-webui that is hosting the model.",
"title": "Configure connection to remote API"
},
"user": {
"data": {
"download_model_from_hf": "Download model from HuggingFace",
"use_local_backend": "Use Llama.cpp"
},
"description": "Select the backend for running the model. Either Llama.cpp (locally) or text-generation-webui (remote).",
"title": "Select Backend"
}
}
},
"options": {
"step": {
"init": {
"data": {
"max_new_tokens": "Maximum tokens to return in response",
"prompt": "Prompt Template",
"temperature": "Temperature",
"top_k": "Top K",
"top_p": "Top P"
}
}
}
}
}