make the addon actually work

Alex O'Connell
2023-12-28 13:46:21 -05:00
parent 2b880136b9
commit 6e348ac472
10 changed files with 172 additions and 104 deletions

View File

@@ -63,19 +63,38 @@ The provided `custom_modeling_phi.py` has Gradient Checkpointing implemented for
## Home Assistant Component
In order to integrate with Home Assistant, we provide a `custom_component` that exposes the locally running LLM as a "conversation agent" that can be interacted with using a chat interface as well as integrate with Speech-to-Text and Text-to-Speech addons to enable interacting with the model by speaking.
The component can either run the model directly as part of the Home Assistant software using llama-cpp-python, or it can connect to the [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) project, which provides access to the LLM via an API interface. When doing this, you can host the model yourself and point the component at the machine where the model is hosted, or you can run the model with text-generation-webui using the provided [custom Home Assistant add-on](./addon/README.md).
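For reference, the remote backend talks to text-generation-webui's OpenAI-compatible completion endpoint. The sketch below shows roughly the kind of request it sends; the hostname, port, and payload values are illustrative, not defaults taken from the integration:

```python
import requests

# Illustrative address of a text-generation-webui instance; port 5000 is the
# add-on's API port, but substitute whatever host/port you actually use.
API_HOST = "http://homeassistant.local:5000"

payload = {
    "prompt": "The kitchen lights are currently off. User: turn on the kitchen lights.",
    "max_tokens": 128,
    "temperature": 0.1,
    "top_p": 1,
}

# The integration issues a similar POST and raises on HTTP errors.
response = requests.post(f"{API_HOST}/v1/completions", json=payload, timeout=90)
response.raise_for_status()
print(response.json()["choices"][0]["text"])
```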
### Installing
1. Ensure you have the Samba, SSH, FTP, or another add-on installed that gives you access to the `config` folder.
2. If there is not already a `custom_components` folder, create one now.
3. Copy the `custom_components/llama_conversation` folder from this repo to `config/custom_components/llama_conversation` on your Home Assistant machine.
4. Restart Home Assistant using the "Developer Tools" tab -> Services -> Run `homeassistant.restart`
5. The "LLaMA Conversation" integration should show up in the "Devices" section now.
### Setting up
When setting up the component, there are 3 different "backend" options to choose from:
1. Llama.cpp with a model from HuggingFace
2. Llama.cpp with a locally provided model
3. A remote instance of text-generation-webui
**Setting up the Llama.cpp backend with a model from HuggingFace**:
TODO: need to build wheels for llama.cpp first
**Setting up the Llama.cpp backend with a locally downloaded model**:
TODO: need to build wheels for llama.cpp first
**Setting up the "remote" backend**:
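Until the llama.cpp wheels are built, the remote backend is currently the only enabled option. Before adding the integration, it can help to verify that the text-generation-webui API is reachable from Home Assistant. A quick sketch; the host and port are placeholders for wherever the API is running, and the endpoint follows the OpenAI-compatible convention used by the openai extension:

```python
import requests

API_HOST = "http://192.168.1.50:5000"  # placeholder: machine running text-generation-webui

# Ask the OpenAI-compatible API which models it reports; a successful response
# confirms the host/port you will enter in the config flow are correct.
models = requests.get(f"{API_HOST}/v1/models", timeout=10).json()
print([model["id"] for model in models.get("data", [])])
```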
### Configuring the component as a Conversation Agent
**NOTE: ANY DEVICES THAT YOU SELECT TO BE EXPOSED TO THE MODEL WILL BE ADDED AS CONTEXT AND POTENTIALLY HAVE THEIR STATE CHANGED BY THE MODEL. ONLY EXPOSE DEVICES THAT YOU ARE OK WITH THE MODEL MODIFYING THE STATE OF, EVEN IF IT IS NOT WHAT YOU REQUESTED. THE MODEL MAY OCCASIONALLY HALLUCINATE AND ISSUE COMMANDS TO THE WRONG DEVICE! USE AT YOUR OWN RISK.**
### Running the text-generation-webui add-on
To facilitate running the project entirely on the system where Home Assistant is installed, there is an experimental Home Assistant add-on that runs oobabooga/text-generation-webui, which the component can then connect to using the "remote" backend option.
1. Ensure you have the Samba, SSH, FTP, or another add-on installed that gives you access to the `addons` folder.
2. Copy the `addon` folder from this repo to `addons/text-generation-webui` on your Home Assistant machine.
3. Go to the "Add-ons" section in settings and then pick the "Add-on Store" from the bottom right corner.
4. Select the three dots in the top right, click "Check for Updates", and refresh the page.
5. There should now be a "Local Add-ons" section at the top of the "Add-on Store"
6. Install the `oobabooga-text-generation-webui` add-on. It will take ~15-20 minutes to build the image on a Raspberry Pi.
7. Copy any models you want to use to the `addon_configs/local_text-generation-webui/models` folder.
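
GGUF files placed in that models folder should show up in the web UI's model list after a refresh. As an example of fetching one from HuggingFace, here is a sketch using `huggingface_hub`; the repository and file names are examples only, and the destination path assumes the add-on's default config storage from step 7:

```python
from huggingface_hub import hf_hub_download

# Example model only; substitute the repo/quantization you actually want to run.
hf_hub_download(
    repo_id="TheBloke/phi-2-GGUF",
    filename="phi-2.Q5_K_M.gguf",
    local_dir="/addon_configs/local_text-generation-webui/models",
)
```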

View File

@@ -1,111 +1,51 @@
ARG BUILD_FROM=atinoda/text-generation-webui:default
FROM alpine:latest as overlay-downloader
RUN apk add git && \
git clone https://github.com/hassio-addons/addon-ubuntu-base /tmp/addon-ubuntu-base
ARG BUILD_FROM=ghcr.io/hassio-addons/ubuntu-base:9.0.2
# hadolint ignore=DL3006
FROM ${BUILD_FROM}
# Environment variables
ENV \
CARGO_NET_GIT_FETCH_WITH_CLI=true \
DEBIAN_FRONTEND="noninteractive" \
HOME="/root" \
LANG="C.UTF-8" \
PIP_DISABLE_PIP_VERSION_CHECK=1 \
PIP_NO_CACHE_DIR=1 \
PIP_PREFER_BINARY=1 \
PS1="$(whoami)@$(hostname):$(pwd)$ " \
PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
S6_BEHAVIOUR_IF_STAGE2_FAILS=2 \
S6_CMD_WAIT_FOR_SERVICES_MAXTIME=0 \
S6_CMD_WAIT_FOR_SERVICES=1 \
YARN_HTTP_TIMEOUT=1000000 \
TERM="xterm-256color"
# Set shell
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
# Install base system
# Install text-generation-webui
ARG BUILD_ARCH=amd64
ARG BASHIO_VERSION="v0.16.0"
ARG S6_OVERLAY_VERSION="3.1.5.0"
ARG TEMPIO_VERSION="2021.09.0"
ARG APP_DIR=/app
RUN \
apt-get update \
\
&& apt-get install -y --no-install-recommends \
ca-certificates=20230311ubuntu0.22.04.1 \
curl=7.81.0-1ubuntu1.14 \
jq=1.6-2.1ubuntu3 \
tzdata=2023c-0ubuntu0.22.04.2 \
xz-utils=5.2.5-2ubuntu1 \
\
&& S6_ARCH="${BUILD_ARCH}" \
&& if [ "${BUILD_ARCH}" = "i386" ]; then S6_ARCH="i686"; \
elif [ "${BUILD_ARCH}" = "amd64" ]; then S6_ARCH="x86_64"; \
elif [ "${BUILD_ARCH}" = "armv7" ]; then S6_ARCH="arm"; fi \
\
&& curl -L -s "https://github.com/just-containers/s6-overlay/releases/download/v${S6_OVERLAY_VERSION}/s6-overlay-noarch.tar.xz" \
| tar -C / -Jxpf - \
\
&& curl -L -s "https://github.com/just-containers/s6-overlay/releases/download/v${S6_OVERLAY_VERSION}/s6-overlay-${S6_ARCH}.tar.xz" \
| tar -C / -Jxpf - \
\
&& curl -L -s "https://github.com/just-containers/s6-overlay/releases/download/v${S6_OVERLAY_VERSION}/s6-overlay-symlinks-noarch.tar.xz" \
| tar -C / -Jxpf - \
\
&& curl -L -s "https://github.com/just-containers/s6-overlay/releases/download/v${S6_OVERLAY_VERSION}/s6-overlay-symlinks-arch.tar.xz" \
| tar -C / -Jxpf - \
\
&& mkdir -p /etc/fix-attrs.d \
&& mkdir -p /etc/services.d \
\
&& curl -J -L -o /tmp/bashio.tar.gz \
"https://github.com/hassio-addons/bashio/archive/${BASHIO_VERSION}.tar.gz" \
&& mkdir /tmp/bashio \
&& tar zxvf \
/tmp/bashio.tar.gz \
--strip 1 -C /tmp/bashio \
\
&& mv /tmp/bashio/lib /usr/lib/bashio \
&& ln -s /usr/lib/bashio/bashio /usr/bin/bashio \
\
&& curl -L -s -o /usr/bin/tempio \
"https://github.com/home-assistant/tempio/releases/download/${TEMPIO_VERSION}/tempio_${BUILD_ARCH}" \
&& chmod a+x /usr/bin/tempio \
ca-certificates \
curl \
git \
build-essential \
cmake \
python3.10 \
python3-dev \
python3-venv \
python3-pip \
\
&& git clone https://github.com/oobabooga/text-generation-webui.git ${APP_DIR} \
&& python3 -m pip install torch torchvision torchaudio py-cpuinfo==9.0.0 \
&& python3 -m pip install -r ${APP_DIR}/requirements_cpu_only_noavx2.txt -r ${APP_DIR}/extensions/openai/requirements.txt llama-cpp-python \
&& apt-get purge -y --auto-remove \
xz-utils \
git \
build-essential \
cmake \
python3-dev \
&& apt-get clean \
&& rm -fr \
/tmp/* \
/var/{cache,log}/* \
/var/lib/apt/lists/*
# Copy s6-overlay adjustments from cloned git repo
COPY --from=overlay-downloader /tmp/addon-ubuntu-base/base/s6-overlay /package/admin/s6-overlay-${S6_OVERLAY_VERSION}/
# Copy root filesystem for the base image
COPY --from=overlay-downloader /tmp/addon-ubuntu-base/base/rootfs /
# Copy root filesystem for our image
COPY rootfs /
# Entrypoint & CMD
ENTRYPOINT [ "/init" ]
# Build arguments
ARG BUILD_DATE
ARG BUILD_REF
ARG BUILD_VERSION
ARG BUILD_REPOSITORY
# TODO: figure out what is broken with file permissions
USER root
# Labels
LABEL \
io.hass.name="oobabooga text-generation-webui for ${BUILD_ARCH}" \
@@ -113,9 +53,6 @@ LABEL \
io.hass.arch="${BUILD_ARCH}" \
io.hass.type="addon" \
io.hass.version=${BUILD_VERSION} \
io.hass.base.version=${BUILD_VERSION} \
io.hass.base.name="ubuntu" \
io.hass.base.image="hassioaddons/ubuntu-base" \
maintainer="github.com/acon96" \
org.opencontainers.image.title="oobabooga text-generation-webui for ${BUILD_ARCH}" \
org.opencontainers.image.description="Home Assistant Community Add-on: ${BUILD_ARCH} oobabooga text-generation-webui" \

View File

@@ -1,4 +1,4 @@
# text-generation-webui - Home Assistant Addon
NOTE: This is super experimental and may or may not work on a Raspberry Pi.
This basically takes an existing Docker image and overlays the files required for Home Assistant to launch and recognize it as an add-on.
It installs text-generation-webui into a Docker container in CPU-only mode (llama.cpp).

View File

@@ -1,3 +1,4 @@
---
build_from:
amd64: atinoda/text-generation-webui:default-snapshot-2023-10-29
aarch64: ghcr.io/hassio-addons/ubuntu-base:9.0.2
amd64: ghcr.io/hassio-addons/ubuntu-base:9.0.2

View File

@@ -1,16 +1,26 @@
---
name: oobabooga text-generation-webui
name: oobabooga-text-generation-webui
version: dev
slug: text-generation-webui
description: ""
url: ""
description: "A tool for running Large Language Models"
url: "https://github.com/oobabooga/text-generation-webui"
init: false
arch:
- amd64
- amd64
- aarch64
ports:
7860/tcp: 7860 # ingress
5000/tcp: 5000 # api
ports_description:
7860/tcp: Web interface (Not required for Ingress)
5000/tcp: OpenAI compatible API Server
ingress: true
ingress_port: 7860
# options: {}
# schema: {}
# TODO: figure out volume mounts so models persist between restarts
options: {}
schema:
log_level: list(trace|debug|info|notice|warning|error|fatal)?
models_directory: str?
map:
- media:rw
- share:rw
- addon_config:rw

View File

@@ -5,5 +5,29 @@
# ==============================================================================
bashio::log.info "Starting Text Generation Webui..."
cd /app
exec python3 /app/server.py --listen --verbose --api
APP_DIR="/app"
DEFAULT_MODELS_DIR="/config/models"
if bashio::config.has_value "models_directory" && ! bashio::config.is_empty "models_directory"; then
MODELS_DIR=$(bashio::config 'models_directory')
if ! bashio::fs.directory_exists "$MODELS_DIR"; then
bashio::log.warning "The provided models directory '$MODELS_DIR' does not exist! Defaulting to '$DEFAULT_MODELS_DIR'"
MODELS_DIR=$DEFAULT_MODELS_DIR
mkdir -p "$MODELS_DIR"
else
bashio::log.info "Using chosen storage for models: '$MODELS_DIR'"
fi
else
MODELS_DIR=$DEFAULT_MODELS_DIR
mkdir -p $MODELS_DIR
bashio::log.info "Using default local storage for models."
fi
# ensure we can access the folder
chmod 0777 $MODELS_DIR
export GRADIO_ROOT_PATH=$(bashio::addon.ingress_entry)
bashio::log.info "Serving app from $GRADIO_ROOT_PATH"
cd $APP_DIR
exec python3 server.py --listen --verbose --api --model-dir $MODELS_DIR

View File

@@ -8,7 +8,7 @@ from typing import Callable
import numpy.typing as npt
import numpy as np
from llama_cpp import Llama
# from llama_cpp import Llama
import requests
import re
import os
@@ -42,6 +42,7 @@ from .const import (
CONF_TEMPERATURE,
CONF_TOP_K,
CONF_TOP_P,
CONF_REQUEST_TIMEOUT,
CONF_BACKEND_TYPE,
CONF_DOWNLOADED_MODEL_FILE,
DEFAULT_MAX_TOKENS,
@@ -50,6 +51,7 @@ from .const import (
DEFAULT_TOP_K,
DEFAULT_TOP_P,
DEFAULT_BACKEND_TYPE,
DEFAULT_REQUEST_TIMEOUT,
BACKEND_TYPE_REMOTE,
DOMAIN,
)
@@ -112,6 +114,8 @@ class LLaMAAgent(conversation.AbstractConversationAgent):
if self.use_local_backend:
if not model_path:
raise Exception(f"Model was not found at '{model_path}'!")
raise NotImplementedError()
self.llm = Llama(
model_path=model_path,
@@ -242,8 +246,10 @@ class LLaMAAgent(conversation.AbstractConversationAgent):
generate_params["model"] = self.model_name
del generate_params["top_k"]
timeout = self.entry.options.get(CONF_REQUEST_TIMEOUT, DEFAULT_REQUEST_TIMEOUT)
result = requests.post(
f"{self.api_host}/v1/completions", json=generate_params, timeout=30
f"{self.api_host}/v1/completions", json=generate_params, timeout=timeout
)
result.raise_for_status()
except requests.RequestException as err:

View File

@@ -35,6 +35,7 @@ from .const import (
CONF_TEMPERATURE,
CONF_TOP_K,
CONF_TOP_P,
CONF_REQUEST_TIMEOUT,
CONF_BACKEND_TYPE,
CONF_BACKEND_TYPE_OPTIONS,
CONF_DOWNLOADED_MODEL_FILE,
@@ -48,6 +49,7 @@ from .const import (
DEFAULT_TEMPERATURE,
DEFAULT_TOP_K,
DEFAULT_TOP_P,
DEFAULT_REQUEST_TIMEOUT,
DEFAULT_BACKEND_TYPE,
BACKEND_TYPE_LLAMA_HF,
BACKEND_TYPE_LLAMA_EXISTING,
@@ -374,18 +376,19 @@ class OptionsFlow(config_entries.OptionsFlow):
"""Manage the options."""
if user_input is not None:
return self.async_create_entry(title="LLaMA Conversation", data=user_input)
schema = local_llama_config_option_schema(self.config_entry.options)
is_local_backend = self.config_entry.data[CONF_BACKEND_TYPE] != BACKEND_TYPE_REMOTE
schema = local_llama_config_option_schema(self.config_entry.options, is_local_backend)
return self.async_show_form(
step_id="init",
data_schema=vol.Schema(schema),
)
def local_llama_config_option_schema(options: MappingProxyType[str, Any]) -> dict:
def local_llama_config_option_schema(options: MappingProxyType[str, Any], is_local_backend: bool) -> dict:
"""Return a schema for Local LLaMA completion options."""
if not options:
options = DEFAULT_OPTIONS
return {
result = {
vol.Optional(
CONF_PROMPT,
description={"suggested_value": options[CONF_PROMPT]},
@@ -412,3 +415,12 @@ def local_llama_config_option_schema(options: MappingProxyType[str, Any]) -> dic
default=DEFAULT_TEMPERATURE,
): NumberSelector(NumberSelectorConfig(min=0, max=1, step=0.05)),
}
if not is_local_backend:
result[vol.Optional(
CONF_REQUEST_TIMEOUT,
description={"suggested_value": options[CONF_REQUEST_TIMEOUT]},
default=DEFAULT_REQUEST_TIMEOUT,
)] = int
return result
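
The options-flow change above builds the schema as a plain dictionary first so that remote-only fields such as the request timeout can be added conditionally before it is wrapped in `vol.Schema`. A stripped-down sketch of that pattern (field names and defaults are illustrative):

```python
import voluptuous as vol

def build_options_schema(is_local_backend: bool) -> vol.Schema:
    options = {
        vol.Optional("max_new_tokens", default=128): int,
        vol.Optional("temperature", default=0.1): float,
    }
    # Only the remote text-generation-webui backend needs an HTTP request timeout.
    if not is_local_backend:
        options[vol.Optional("request_timeout", default=90)] = int
    return vol.Schema(options)

# Validating remote-backend options fills in the defaults for anything omitted.
print(build_options_schema(is_local_backend=False)({"request_timeout": 120}))
```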

View File

@@ -16,12 +16,15 @@ CONF_TOP_P = "top_p"
DEFAULT_TOP_P = 1
CONF_TEMPERATURE = "temperature"
DEFAULT_TEMPERATURE = 0.1
CONF_REQUEST_TIMEOUT = "request_timeout"
DEFAULT_REQUEST_TIMEOUT = 90
CONF_BACKEND_TYPE = "model_backend"
BACKEND_TYPE_LLAMA_HF = "Llama.cpp (HuggingFace)"
BACKEND_TYPE_LLAMA_EXISTING = "Llama.cpp (existing model)"
BACKEND_TYPE_REMOTE = "text-generation-webui API"
DEFAULT_BACKEND_TYPE = BACKEND_TYPE_LLAMA_HF
CONF_BACKEND_TYPE_OPTIONS = [ BACKEND_TYPE_LLAMA_HF, BACKEND_TYPE_LLAMA_EXISTING, BACKEND_TYPE_REMOTE]
# CONF_BACKEND_TYPE_OPTIONS = [ BACKEND_TYPE_LLAMA_HF, BACKEND_TYPE_LLAMA_EXISTING, BACKEND_TYPE_REMOTE ]
CONF_BACKEND_TYPE_OPTIONS = [ BACKEND_TYPE_REMOTE ]
CONF_DOWNLOADED_MODEL_QUANTIZATION = "downloaded_model_quantization"
CONF_DOWNLOADED_MODEL_QUANTIZATION_OPTIONS = ["Q8_0", "Q5_K_M", "Q4_K_M", "Q3_K_M"]
DEFAULT_DOWNLOADED_MODEL_QUANTIZATION = "Q5_K_M"

View File

@@ -0,0 +1,56 @@
{
"config": {
"error": {
"download_failed": "The download failed to complete!",
"failed_to_connect": "Failed to connect to the remote API. See the logs for more details.",
"missing_model_api": "The selected model is not provided by this API.",
"missing_model_file": "The provided file does not exist.",
"other_existing_local": "Another model is already loaded locally. Please unload it or configure a remote model.",
"unknown": "Unexpected error"
},
"progress": {
"download": "Please wait while the model is being downloaded from HuggingFace. This can take a few minutes."
},
"step": {
"local_model": {
"data": {
"downloaded_model_file": "Local file name",
"downloaded_model_quantization": "Downloaded model quantization",
"huggingface_model": "HuggingFace Model"
},
"description": "Please configure llama.cpp for the model",
"title": "Configure llama.cpp"
},
"remote_model": {
"data": {
"host": "API Hostname",
"huggingface_model": "Model Name",
"port": "API Port"
},
"description": "Provide the connection details for an instance of text-generation-webui that is hosting the model.",
"title": "Configure connection to remote API"
},
"user": {
"data": {
"download_model_from_hf": "Download model from HuggingFace",
"use_local_backend": "Use Llama.cpp"
},
"description": "Select the backend for running the model. Either Llama.cpp (locally) or text-generation-webui (remote).",
"title": "Select Backend"
}
}
},
"options": {
"step": {
"init": {
"data": {
"max_new_tokens": "Maximum tokens to return in response",
"prompt": "Prompt Template",
"temperature": "Temperature",
"top_k": "Top K",
"top_p": "Top P"
}
}
}
}
}