Merge pull request #155 from acon96/release/v0.3

Release v0.3
Alex O'Connell
2024-06-07 00:05:26 -04:00
committed by GitHub
20 changed files with 712 additions and 772 deletions


@@ -1,19 +1,23 @@
# Home LLM
![Banner Logo](/docs/banner.svg)
This project provides the required "glue" components to control your Home Assistant installation with a completely local Large Language Model acting as a personal assistant. The goal is to provide a drop in solution to be used as a "conversation agent" component by Home Assistant. The 2 main pieces of this solution are Home LLM and Llama Conversation.
This project provides the required "glue" components to control your Home Assistant installation with a **completely local** Large Language Model acting as a personal assistant. The goal is to provide a drop-in solution to be used as a "conversation agent" component by Home Assistant. The 2 main pieces of this solution are the Home LLM model and the Local LLM Conversation integration.
## Quick Start
Please see the [Setup Guide](./docs/Setup.md) for more information on installation.
## LLama Conversation Integration
## Local LLM Conversation Integration
In order to integrate with Home Assistant, we provide a custom component that exposes the locally running LLM as a "conversation agent".
This component can be interacted with in a few ways:
- using a chat interface so you can type requests to it.
- integrating with Speech-to-Text and Text-to-Speech addons so you can just speak to it.
The component can either run the model directly as part of the Home Assistant software using llama-cpp-python, or you can run [Ollama](https://ollama.com/) (simple) or the [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) project (advanced) to provide access to the LLM via an API interface.
The integration can run the model in one of 2 ways (see the example request after this list):
1. Directly as part of the Home Assistant software using llama-cpp-python
2. On a separate machine using one of the following backends:
- [Ollama](https://ollama.com/) (easier)
- [LocalAI](https://localai.io/) via the Generic OpenAI backend (easier)
- [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) project (advanced)
- [llama.cpp example server](https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md) (advanced)
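For the second option, the integration only needs an HTTP endpoint it can reach. A quick way to check that a remote backend is actually serving a model before configuring the integration is a request like the one below. This is only a sketch: it assumes an OpenAI-compatible backend (for example the llama.cpp example server) is listening on `localhost:8080`, and the model file name is illustrative.

```python
import requests

# Sketch only: the URL, port, and model file name are assumptions for illustration.
response = requests.post(
    "http://localhost:8080/v1/chat/completions",  # assumed llama.cpp server address
    json={
        "model": "Home-3B-v3.q4_k_m.gguf",  # hypothetical model file name
        "messages": [
            {"role": "system", "content": "You are 'Al', a helpful AI Assistant that controls the devices in a house."},
            {"role": "user", "content": "turn on the kitchen light"},
        ],
        "max_tokens": 128,
    },
    timeout=90,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```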
## Home LLM Model
The "Home" models are a fine tuning of various Large Languages Models that are under 5B parameters. The models are able to control devices in the user's house as well as perform basic question and answering. The fine tuning dataset is a [custom synthetic dataset](./data) designed to teach the model function calling based on the device information in the context.
@@ -27,10 +31,12 @@ The latest models can be found on HuggingFace:
<summary>Old Models</summary>
3B v2 (Based on Phi-2): https://huggingface.co/acon96/Home-3B-v2-GGUF (ChatML prompt format)
3B v1 (Based on Phi-2): https://huggingface.co/acon96/Home-3B-v1-GGUF (ChatML prompt format)
1B v2 (Based on Phi-1.5): https://huggingface.co/acon96/Home-1B-v2-GGUF (ChatML prompt format)
1B v1 (Based on Phi-1.5): https://huggingface.co/acon96/Home-1B-v1-GGUF (ChatML prompt format)
NOTE: The models below are only compatible with version 0.2.17 and older!
3B v1 (Based on Phi-2): https://huggingface.co/acon96/Home-3B-v1-GGUF (ChatML prompt format)
</details>
The model is quantized using Llama.cpp so it can run in the low-resource environments that are common for Home Assistant installations, such as Raspberry Pis.
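As a rough sketch of what the in-process backend does with such a quantized file, loading a GGUF model with llama-cpp-python looks like the following. The model path, settings, and ChatML-style prompt are illustrative assumptions, not the integration's actual defaults.

```python
from llama_cpp import Llama

# Sketch only: the path and settings are placeholders, not the integration's defaults.
llm = Llama(
    model_path="/config/models/Home-3B-v3.q4_k_m.gguf",  # hypothetical local path
    n_ctx=2048,   # context length
    n_threads=4,  # keep this low on a Raspberry Pi
)

output = llm.create_completion(
    "<|im_start|>user\nturn on the kitchen light<|im_end|>\n<|im_start|>assistant\n",
    max_tokens=128,
    stop=["<|im_end|>"],
)
print(output["choices"][0]["text"])
```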
@@ -126,6 +132,7 @@ In order to facilitate running the project entirely on the system where Home Ass
## Version History
| Version | Description |
|---------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| v0.3 | Adds support for Home Assistant LLM APIs, improved model prompting and tool formatting options, and automatic detection of GGUF quantization levels on HuggingFace |
| v0.2.17 | Disable native llama.cpp wheel optimizations, add Command R prompt format |
| v0.2.16 | Fix for missing huggingface_hub package preventing startup |
| v0.2.15 | Fix startup error when using llama.cpp backend and add flash attention to llama.cpp backend |

TODO.md

@@ -1,10 +1,16 @@
# TODO
- [x] detection/mitigation of too many entities being exposed & blowing out the context length
- [ ] support new LLM APIs
- rewrite how services are called
- handle no API selected
- rewrite prompts + service block formats
- implement new LLM API that has `HassCallService` so old models can still work
- update dataset so new models will work with the API
- [ ] make ICL examples into conversation turns
- [ ] translate ICL examples + make better ones
- [ ] areas/room support
- [ ] figure out DPO to improve response quality
- [ ] train the model to respond to house events
- present the model with an event + a "prompt" from the user of what you want it to do (i.e. turn on the lights when I get home = the model turns on lights when your entity presence triggers as being home)
- basically lets you write automations in plain english
- [ ] convert requests to aiohttp
- [x] detection/mitigation of too many entities being exposed & blowing out the context length
- [ ] figure out DPO to improve response quality
- [x] setup github actions to build wheels that are optimized for RPIs
- [x] mixtral + prompting (no fine tuning)
- add in context learning variables to sys prompt template
@@ -45,3 +51,6 @@
- set up vectordb
- ingest home assistant docs
- "context request" from above to initiate a RAG search
- [ ] train the model to respond to house events
- present the model with an event + a "prompt" from the user of what you want it to do (i.e. turn on the lights when I get home = the model turns on lights when your entity presence triggers as being home)
- basically lets you write automations in plain english


@@ -1,16 +1,21 @@
"""The Local LLaMA Conversation integration."""
"""The Local LLM Conversation integration."""
from __future__ import annotations
import logging
import homeassistant.components.conversation as ha_conversation
from homeassistant.config_entries import ConfigEntry
from homeassistant.const import ATTR_ENTITY_ID
from homeassistant.core import HomeAssistant
from homeassistant.helpers import config_validation as cv
from homeassistant.exceptions import HomeAssistantError
from homeassistant.helpers import config_validation as cv, llm
from homeassistant.util.json import JsonObjectType
import voluptuous as vol
from .agent import (
LLaMAAgent,
LocalLLaMAAgent,
LocalLLMAgent,
LlamaCppAgent,
GenericOpenAIAPIAgent,
TextGenerationWebuiAgent,
LlamaCppPythonAPIAgent,
@@ -26,7 +31,10 @@ from .const import (
BACKEND_TYPE_GENERIC_OPENAI,
BACKEND_TYPE_LLAMA_CPP_PYTHON_SERVER,
BACKEND_TYPE_OLLAMA,
ALLOWED_LEGACY_SERVICE_CALL_ARGUMENTS,
DOMAIN,
HOME_LLM_API_ID,
SERVICE_TOOL_NAME,
)
_LOGGER = logging.getLogger(__name__)
@@ -38,19 +46,19 @@ async def update_listener(hass: HomeAssistant, entry: ConfigEntry):
hass.data[DOMAIN][entry.entry_id] = entry
# call update handler
agent: LLaMAAgent = await ha_conversation._get_agent_manager(hass).async_get_agent(entry.entry_id)
agent._update_options()
agent: LocalLLMAgent = ha_conversation.get_agent_manager(hass).async_get_agent(entry.entry_id)
await hass.async_add_executor_job(agent._update_options)
return True
async def async_setup_entry(hass: HomeAssistant, entry: ConfigEntry) -> bool:
"""Set up Local LLaMA Conversation from a config entry."""
"""Set up Local LLM Conversation from a config entry."""
def create_agent(backend_type):
agent_cls = None
if backend_type in [ BACKEND_TYPE_LLAMA_HF, BACKEND_TYPE_LLAMA_EXISTING ]:
agent_cls = LocalLLaMAAgent
agent_cls = LlamaCppAgent
elif backend_type == BACKEND_TYPE_GENERIC_OPENAI:
agent_cls = GenericOpenAIAPIAgent
elif backend_type == BACKEND_TYPE_TEXT_GEN_WEBUI:
@@ -78,7 +86,7 @@ async def async_setup_entry(hass: HomeAssistant, entry: ConfigEntry) -> bool:
async def async_unload_entry(hass: HomeAssistant, entry: ConfigEntry) -> bool:
"""Unload Local LLaMA."""
"""Unload Local LLM."""
hass.data[DOMAIN].pop(entry.entry_id)
ha_conversation.async_unset_agent(hass, entry)
return True
@@ -87,18 +95,79 @@ async def async_migrate_entry(hass, config_entry: ConfigEntry):
"""Migrate old entry."""
_LOGGER.debug("Migrating from version %s", config_entry.version)
if config_entry.version > 1:
# This means the user has downgraded from a future version
return False
# if config_entry.version < 2:
# # just ensure that the defaults are set
# new_options = dict(DEFAULT_OPTIONS)
# new_options.update(config_entry.options)
# config_entry.version = 2
# hass.config_entries.async_update_entry(config_entry, options=new_options)
# 1 -> 2: This was a breaking change so force users to re-create entries
if config_entry.version == 1:
_LOGGER.error("Cannot upgrade models that were created prior to v0.3. Please delete and re-create them.")
return False
_LOGGER.debug("Migration to version %s successful", config_entry.version)
return True
class HassServiceTool(llm.Tool):
"""Tool to get the current time."""
name = SERVICE_TOOL_NAME
description = "Executes a Home Assistant service"
# Optional. A voluptuous schema of the input parameters.
parameters = vol.Schema({
vol.Required('service'): str,
vol.Required('target_device'): str,
vol.Optional('rgb_color'): str,
vol.Optional('brightness'): float,
vol.Optional('temperature'): float,
vol.Optional('humidity'): float,
vol.Optional('fan_mode'): str,
vol.Optional('hvac_mode'): str,
vol.Optional('preset_mode'): str,
vol.Optional('duration'): str,
vol.Optional('item'): str,
})
async def async_call(
self, hass: HomeAssistant, tool_input: llm.ToolInput, llm_context: llm.LLMContext
) -> JsonObjectType:
"""Call the tool."""
domain, service = tuple(tool_input.tool_args["service"].split("."))
target_device = tool_input.tool_args["target_device"]
service_data = {ATTR_ENTITY_ID: target_device}
for attr in ALLOWED_LEGACY_SERVICE_CALL_ARGUMENTS:
if attr in tool_input.tool_args.keys():
service_data[attr] = tool_input.tool_args[attr]
try:
await hass.services.async_call(
domain,
service,
service_data=service_data,
blocking=True,
)
except Exception:
_LOGGER.exception("Failed to execute service for model")
return { "result": "failed" }
return { "result": "success" }
class HomeLLMAPI(llm.API):
"""
An API that allows calling Home Assistant services to maintain compatibility
with the older (v3 and older) Home LLM models
"""
def __init__(self, hass: HomeAssistant) -> None:
"""Init the class."""
super().__init__(
hass=hass,
id=HOME_LLM_API_ID,
name="Home-LLM (v1-v3)",
)
async def async_get_api_instance(self, llm_context: llm.LLMContext) -> llm.APIInstance:
"""Return the instance of the API."""
return llm.APIInstance(
api=self,
api_prompt="Call services in Home Assistant by passing the service name and the device to control.",
llm_context=llm_context,
tools=[HassServiceTool()],
)
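To make the argument handling above concrete, here is a small self-contained sketch (plain Python, no Home Assistant imports, fabricated argument values) of how a `HassCallService` tool call is mapped to a service-call payload by `HassServiceTool.async_call`:

```python
# Sketch of the HassCallService argument mapping performed by HassServiceTool.async_call.
ALLOWED_LEGACY_SERVICE_CALL_ARGUMENTS = [
    "rgb_color", "brightness", "temperature", "humidity",
    "fan_mode", "hvac_mode", "preset_mode", "item", "duration",
]

def build_service_call(tool_args: dict) -> tuple[str, str, dict]:
    """Split 'service' into domain/service and copy only the allowed extra arguments."""
    domain, service = tool_args["service"].split(".", 1)
    service_data = {"entity_id": tool_args["target_device"]}
    for attr in ALLOWED_LEGACY_SERVICE_CALL_ARGUMENTS:
        if attr in tool_args:
            service_data[attr] = tool_args[attr]
    return domain, service, service_data

# Example tool call in the format the older Home LLM models emit (values are illustrative):
print(build_service_call({
    "service": "light.turn_on",
    "target_device": "light.kitchen",
    "brightness": 128,
}))
# ('light', 'turn_on', {'entity_id': 'light.kitchen', 'brightness': 128})
```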


@@ -5,6 +5,7 @@ import logging
import threading
import importlib
from typing import Literal, Any, Callable
import voluptuous as vol
import requests
import re
@@ -15,17 +16,21 @@ import random
import time
from homeassistant.components.conversation import ConversationInput, ConversationResult, AbstractConversationAgent
import homeassistant.components.conversation as ha_conversation
from homeassistant.components.conversation.const import DOMAIN as CONVERSATION_DOMAIN
from homeassistant.components.homeassistant.exposed_entities import async_should_expose
from homeassistant.config_entries import ConfigEntry
from homeassistant.const import ATTR_ENTITY_ID, CONF_HOST, CONF_PORT, CONF_SSL, MATCH_ALL
from homeassistant.const import ATTR_ENTITY_ID, CONF_HOST, CONF_PORT, CONF_SSL, MATCH_ALL, CONF_LLM_HASS_API
from homeassistant.core import HomeAssistant, callback
from homeassistant.exceptions import ConfigEntryNotReady, ConfigEntryError, TemplateError
from homeassistant.helpers import config_validation as cv, intent, template, entity_registry as er
from homeassistant.exceptions import ConfigEntryNotReady, ConfigEntryError, TemplateError, HomeAssistantError
from homeassistant.helpers import config_validation as cv, intent, template, entity_registry as er, llm
from homeassistant.helpers.event import async_track_state_change, async_call_later
from homeassistant.util import ulid
from homeassistant.util import ulid, color
from .utils import closest_color, flatten_vol_schema, install_llama_cpp_python, validate_llama_cpp_python_installation
import voluptuous_serialize
from .utils import closest_color, flatten_vol_schema, custom_custom_serializer, install_llama_cpp_python, \
validate_llama_cpp_python_installation
from .const import (
CONF_CHAT_MODEL,
CONF_MAX_TOKENS,
@@ -39,8 +44,9 @@ from .const import (
CONF_BACKEND_TYPE,
CONF_DOWNLOADED_MODEL_FILE,
CONF_EXTRA_ATTRIBUTES_TO_EXPOSE,
CONF_ALLOWED_SERVICE_CALL_ARGUMENTS,
CONF_PROMPT_TEMPLATE,
CONF_TOOL_FORMAT,
CONF_TOOL_MULTI_TURN_CHAT,
CONF_ENABLE_FLASH_ATTENTION,
CONF_USE_GBNF_GRAMMAR,
CONF_GBNF_GRAMMAR_FILE,
@@ -74,8 +80,9 @@ from .const import (
DEFAULT_BACKEND_TYPE,
DEFAULT_REQUEST_TIMEOUT,
DEFAULT_EXTRA_ATTRIBUTES_TO_EXPOSE,
DEFAULT_ALLOWED_SERVICE_CALL_ARGUMENTS,
DEFAULT_PROMPT_TEMPLATE,
DEFAULT_TOOL_FORMAT,
DEFAULT_TOOL_MULTI_TURN_CHAT,
DEFAULT_ENABLE_FLASH_ATTENTION,
DEFAULT_USE_GBNF_GRAMMAR,
DEFAULT_GBNF_GRAMMAR_FILE,
@@ -99,8 +106,14 @@ from .const import (
TEXT_GEN_WEBUI_CHAT_MODE_CHAT,
TEXT_GEN_WEBUI_CHAT_MODE_INSTRUCT,
TEXT_GEN_WEBUI_CHAT_MODE_CHAT_INSTRUCT,
ALLOWED_LEGACY_SERVICE_CALL_ARGUMENTS,
DOMAIN,
HOME_LLM_API_ID,
SERVICE_TOOL_NAME,
PROMPT_TEMPLATE_DESCRIPTIONS,
TOOL_FORMAT_FULL,
TOOL_FORMAT_REDUCED,
TOOL_FORMAT_MINIMAL,
)
# make type checking work for llama-cpp-python without importing it directly at runtime
@@ -114,8 +127,8 @@ _LOGGER = logging.getLogger(__name__)
CONFIG_SCHEMA = cv.config_entry_only_config_schema(DOMAIN)
class LLaMAAgent(AbstractConversationAgent):
"""Base LLaMA conversation agent."""
class LocalLLMAgent(AbstractConversationAgent):
"""Base Local LLM conversation agent."""
hass: HomeAssistant
entry_id: str
@@ -146,7 +159,7 @@ class LLaMAAgent(AbstractConversationAgent):
with open(icl_filename, encoding="utf-8-sig") as f:
self.in_context_examples = list(csv.DictReader(f))
if set(self.in_context_examples[0].keys()) != set(["service", "response" ]):
if set(self.in_context_examples[0].keys()) != set(["type", "request", "tool", "response" ]):
raise Exception("ICL csv file did not have 2 columns: service & response")
if len(self.in_context_examples) == 0:
@@ -209,8 +222,6 @@ class LLaMAAgent(AbstractConversationAgent):
remember_conversation = self.entry.options.get(CONF_REMEMBER_CONVERSATION, DEFAULT_REMEMBER_CONVERSATION)
remember_num_interactions = self.entry.options.get(CONF_REMEMBER_NUM_INTERACTIONS, DEFAULT_REMEMBER_NUM_INTERACTIONS)
service_call_regex = self.entry.options.get(CONF_SERVICE_CALL_REGEX, DEFAULT_SERVICE_CALL_REGEX)
allowed_service_call_arguments = self.entry.options \
.get(CONF_ALLOWED_SERVICE_CALL_ARGUMENTS, DEFAULT_ALLOWED_SERVICE_CALL_ARGUMENTS)
try:
service_call_pattern = re.compile(service_call_regex)
@@ -225,6 +236,31 @@ class LLaMAAgent(AbstractConversationAgent):
return ConversationResult(
response=intent_response, conversation_id=conversation_id
)
llm_api: llm.APIInstance | None = None
if self.entry.options.get(CONF_LLM_HASS_API):
try:
llm_api = await llm.async_get_api(
self.hass,
self.entry.options[CONF_LLM_HASS_API],
llm_context=llm.LLMContext(
platform=DOMAIN,
context=user_input.context,
user_prompt=user_input.text,
language=user_input.language,
assistant=ha_conversation.DOMAIN,
device_id=user_input.device_id,
)
)
except HomeAssistantError as err:
_LOGGER.error("Error getting LLM API: %s", err)
intent_response.async_set_error(
intent.IntentResponseErrorCode.UNKNOWN,
f"Error preparing LLM API: {err}",
)
return ConversationResult(
response=intent_response, conversation_id=user_input.conversation_id
)
if user_input.conversation_id in self.history:
conversation_id = user_input.conversation_id
@@ -235,7 +271,7 @@ class LLaMAAgent(AbstractConversationAgent):
if len(conversation) == 0 or refresh_system_prompt:
try:
message = self._generate_system_prompt(raw_prompt)
message = self._generate_system_prompt(raw_prompt, llm_api)
except TemplateError as err:
_LOGGER.error("Error rendering prompt: %s", err)
intent_response = intent.IntentResponse(language=user_input.language)
@@ -269,7 +305,7 @@ class LLaMAAgent(AbstractConversationAgent):
intent_response = intent.IntentResponse(language=user_input.language)
intent_response.async_set_error(
intent.IntentResponseErrorCode.UNKNOWN,
intent.IntentResponseErrorCode.FAILED_TO_HANDLE,
f"Sorry, there was a problem talking to the backend: {repr(err)}",
)
return ConversationResult(
@@ -283,71 +319,119 @@ class LLaMAAgent(AbstractConversationAgent):
conversation.pop(1)
self.history[conversation_id] = conversation
if llm_api is None:
# return the output without messing with it if there is no API exposed to the model
intent_response = intent.IntentResponse(language=user_input.language)
intent_response.async_set_speech(response.strip())
return ConversationResult(
response=intent_response, conversation_id=conversation_id
)
# parse response
exposed_entities = list(self._async_get_exposed_entities()[0].keys())
to_say = service_call_pattern.sub("", response).strip()
to_say = service_call_pattern.sub("", response.strip())
for block in service_call_pattern.findall(response.strip()):
services = block.split("\n")
_LOGGER.info(f"running services: {' '.join(services)}")
parsed_tool_call: dict = json.loads(block)
for line in services:
if len(line) == 0:
break
if llm_api.api.id == HOME_LLM_API_ID:
schema_to_validate = vol.Schema({
vol.Required('service'): str,
vol.Required('target_device'): str,
vol.Optional('rgb_color'): str,
vol.Optional('brightness'): float,
vol.Optional('temperature'): float,
vol.Optional('humidity'): float,
vol.Optional('fan_mode'): str,
vol.Optional('hvac_mode'): str,
vol.Optional('preset_mode'): str,
vol.Optional('duration'): str,
vol.Optional('item'): str,
})
else:
schema_to_validate = vol.Schema({
vol.Required("name"): str,
vol.Required("arguments"): dict,
})
try:
schema_to_validate(parsed_tool_call)
except vol.Error as ex:
_LOGGER.info(f"LLM produced an improperly formatted response: {repr(ex)}")
# parse old format or JSON format
try:
json_output = json.loads(line)
service = json_output["service"]
entity = json_output["target_device"]
domain, service = tuple(service.split("."))
if "to_say" in json_output:
to_say = to_say + json_output.pop("to_say")
intent_response = intent.IntentResponse(language=user_input.language)
intent_response.async_set_error(
intent.IntentResponseErrorCode.NO_INTENT_MATCH,
f"I'm sorry, I didn't produce a correctly formatted tool call! Please see the logs for more info.",
)
return ConversationResult(
response=intent_response, conversation_id=conversation_id
)
extra_arguments = { k: v for k, v in json_output.items() if k not in [ "service", "target_device" ] }
except Exception:
try:
service = line.split("(")[0]
entity = line.split("(")[1][:-1]
domain, service = tuple(service.split("."))
extra_arguments = {}
except Exception:
to_say += f" Failed to parse call from '{line}'!"
continue
_LOGGER.info(f"calling tool: {block}")
# fix certain arguments
# make sure brightness is 0-255 and not a percentage
if "brightness" in extra_arguments and 0.0 < extra_arguments["brightness"] <= 1.0:
extra_arguments["brightness"] = int(extra_arguments["brightness"] * 255)
# try to fix certain arguments
args_dict = parsed_tool_call if llm_api.api.id == HOME_LLM_API_ID else parsed_tool_call["arguments"]
# convert string "tuple" to a list for RGB colors
if "rgb_color" in extra_arguments and isinstance(extra_arguments["rgb_color"], str):
extra_arguments["rgb_color"] = [ int(x) for x in extra_arguments["rgb_color"][1:-1].split(",") ]
# make sure brightness is 0-255 and not a percentage
if "brightness" in args_dict and 0.0 < args_dict["brightness"] <= 1.0:
args_dict["brightness"] = int(args_dict["brightness"] * 255)
# only acknowledge requests to exposed entities
if entity not in exposed_entities:
to_say += f" Can't find device '{entity}'!"
else:
# copy arguments to service call
service_data = {ATTR_ENTITY_ID: entity}
for attr in allowed_service_call_arguments:
if attr in extra_arguments.keys():
service_data[attr] = extra_arguments[attr]
try:
_LOGGER.debug(f"service data: {service_data}")
await self.hass.services.async_call(
domain,
service,
service_data=service_data,
blocking=True,
)
except Exception as err:
to_say += f"\nFailed to run: {line}"
_LOGGER.exception(f"Failed to run: {line}")
# convert string "tuple" to a list for RGB colors
if "rgb_color" in args_dict and isinstance(args_dict["rgb_color"], str):
args_dict["rgb_color"] = [ int(x) for x in args_dict["rgb_color"][1:-1].split(",") ]
if llm_api.api.id == HOME_LLM_API_ID:
to_say = to_say + parsed_tool_call.pop("to_say", "")
tool_input = llm.ToolInput(
tool_name=SERVICE_TOOL_NAME,
tool_args=parsed_tool_call,
)
else:
tool_input = llm.ToolInput(
tool_name=parsed_tool_call["name"],
tool_args=parsed_tool_call["arguments"],
)
if template_desc["assistant"]["suffix"]:
to_say = to_say.replace(template_desc["assistant"]["suffix"], "") # remove the eos token if it is returned (some backends + the old model does this)
try:
tool_response = await llm_api.async_call_tool(tool_input)
except (HomeAssistantError, vol.Invalid) as e:
tool_response = {"error": type(e).__name__}
if str(e):
tool_response["error_text"] = str(e)
intent_response = intent.IntentResponse(language=user_input.language)
intent_response.async_set_error(
intent.IntentResponseErrorCode.NO_INTENT_MATCH,
f"I'm sorry! I encountered an error calling the tool. See the logs for more info.",
)
return ConversationResult(
response=intent_response, conversation_id=conversation_id
)
_LOGGER.debug("Tool response: %s", tool_response)
# handle models that generate a function call and wait for the result before providing a response
if self.entry.options.get(CONF_TOOL_MULTI_TURN_CHAT, DEFAULT_TOOL_MULTI_TURN_CHAT):
conversation.append({"role": "tool", "message": json.dumps(tool_response)})
# generate a response based on the tool result
try:
_LOGGER.debug(conversation)
to_say = await self._async_generate(conversation)
_LOGGER.debug(to_say)
except Exception as err:
_LOGGER.exception("There was a problem talking to the backend")
intent_response = intent.IntentResponse(language=user_input.language)
intent_response.async_set_error(
intent.IntentResponseErrorCode.FAILED_TO_HANDLE,
f"Sorry, there was a problem talking to the backend: {repr(err)}",
)
return ConversationResult(
response=intent_response, conversation_id=conversation_id
)
conversation.append({"role": "assistant", "message": response})
# generate intent response to Home Assistant
intent_response = intent.IntentResponse(language=user_input.language)
@@ -396,7 +480,8 @@ class LLaMAAgent(AbstractConversationAgent):
for message in prompt:
role = message["role"]
message = message["message"]
role_desc = template_desc[role]
# fall back to the "user" role for unknown roles
role_desc = template_desc.get(role, template_desc["user"])
formatted_prompt = (
formatted_prompt + f"{role_desc['prefix']}{message}{role_desc['suffix']}\n"
)
@@ -406,47 +491,133 @@ class LLaMAAgent(AbstractConversationAgent):
_LOGGER.debug(formatted_prompt)
return formatted_prompt
def _format_tool(self, name: str, parameters: vol.Schema, description: str):
style = self.entry.options.get(CONF_TOOL_FORMAT, DEFAULT_TOOL_FORMAT)
def _generate_system_prompt(self, prompt_template: str) -> str:
if style == TOOL_FORMAT_MINIMAL:
result = f"{name}({','.join(flatten_vol_schema(parameters))})"
if description:
result = result + f" - {description}"
return result
raw_parameters: list = voluptuous_serialize.convert(
parameters, custom_serializer=custom_custom_serializer)
# handle vol.Any in the key side of things
processed_parameters = []
for param in raw_parameters:
if isinstance(param["name"], vol.Any):
for possible_name in param["name"].validators:
actual_param = param.copy()
actual_param["name"] = possible_name
actual_param["required"] = False
processed_parameters.append(actual_param)
else:
processed_parameters.append(param)
if style == TOOL_FORMAT_REDUCED:
return {
"name": name,
"description": description,
"parameters": {
"properties": {
x["name"]: x.get("type", "string") for x in processed_parameters
},
"required": [
x["name"] for x in processed_parameters if x.get("required")
]
}
}
elif style == TOOL_FORMAT_FULL:
return {
"type": "function",
"function": {
"name": name,
"description": description,
"parameters": {
"type": "object",
"properties": {
x["name"]: {
"type": x.get("type", "string"),
"description": x.get("description", ""),
} for x in processed_parameters
},
"required": [
x["name"] for x in processed_parameters if x.get("required")
]
}
}
}
raise Exception(f"Unknown tool format {style}")
def _generate_icl_examples(self, num_examples, entity_names):
entity_names = entity_names[:]
entity_domains = set([x.split(".")[0] for x in entity_names])
in_context_examples = [
x for x in self.in_context_examples
if x["type"] in entity_domains
]
random.shuffle(in_context_examples)
random.shuffle(entity_names)
num_examples_to_generate = min(num_examples, len(in_context_examples))
if num_examples_to_generate < num_examples:
_LOGGER.warning(f"Attempted to generate {num_examples} ICL examples for conversation, but only {len(in_context_examples)} are available!")
examples = []
for _ in range(num_examples_to_generate):
chosen_example = in_context_examples.pop()
request = chosen_example["request"]
response = chosen_example["response"]
random_device = [ x for x in entity_names if x.split(".")[0] == chosen_example["type"] ][0]
random_area = "bedroom" # todo, pick a random area
random_brightness = round(random.random(), 2)
random_color = random.choice(list(color.COLORS.keys()))
tool_arguments = {}
if "<area>" in request:
request = request.replace("<area>", random_area)
response = response.replace("<area>", random_area)
tool_arguments["area"] = random_area
if "<name>" in request:
request = request.replace("<name>", random_device)
response = response.replace("<name>", random_device)
tool_arguments["name"] = random_device
if "<brightness>" in request:
request = request.replace("<brightness>", str(random_brightness))
response = response.replace("<brightness>", str(random_brightness))
tool_arguments["brightness"] = random_brightness
if "<color>" in request:
request = request.replace("<color>", random_color)
response = response.replace("<color>", random_color)
tool_arguments["color"] = random_color
examples.append({
"request": request,
"response": response,
"tool": {
"name": chosen_example["tool"],
"arguments": tool_arguments
}
})
return examples
def _generate_system_prompt(self, prompt_template: str, llm_api: llm.APIInstance) -> str:
"""Generate the system prompt with current entity states"""
entities_to_expose, domains = self._async_get_exposed_entities()
extra_attributes_to_expose = self.entry.options \
.get(CONF_EXTRA_ATTRIBUTES_TO_EXPOSE, DEFAULT_EXTRA_ATTRIBUTES_TO_EXPOSE)
allowed_service_call_arguments = self.entry.options \
.get(CONF_ALLOWED_SERVICE_CALL_ARGUMENTS, DEFAULT_ALLOWED_SERVICE_CALL_ARGUMENTS)
def icl_example_generator(num_examples, entity_names, service_names):
entity_domains = set([x.split(".")[0] for x in entity_names])
entity_names = entity_names[:]
# filter out examples for disabled services
selected_in_context_examples = []
for x in self.in_context_examples:
if x["service"] in service_names and x["service"].split(".")[0] in entity_domains:
selected_in_context_examples.append(x)
# if we filtered everything then just sample randomly
if len(selected_in_context_examples) == 0:
selected_in_context_examples = self.in_context_examples[:]
random.shuffle(selected_in_context_examples)
random.shuffle(entity_names)
num_examples_to_generate = min(num_examples, len(selected_in_context_examples))
if num_examples_to_generate < num_examples:
_LOGGER.warning(f"Attempted to generate {num_examples} ICL examples for conversation, but only {len(selected_in_context_examples)} are available!")
for x in range(num_examples_to_generate):
chosen_example = selected_in_context_examples.pop()
chosen_service = chosen_example["service"]
device = [ x for x in entity_names if x.split(".")[0] == chosen_service.split(".")[0] ][0]
example = {
"to_say": chosen_example["response"],
"service": chosen_service,
"target_device": device,
}
yield json.dumps(example) + "\n"
def expose_attributes(attributes):
result = attributes["state"]
@@ -487,37 +658,54 @@ class LLaMAAgent(AbstractConversationAgent):
formatted_states = "\n".join(device_states) + "\n"
service_dict = self.hass.services.async_services()
all_services = []
all_service_names = []
for domain in domains:
# scripts show up as individual services
if domain == "script":
all_services.extend(["script.reload()", "script.turn_on()", "script.turn_off()", "script.toggle()"])
continue
for name, service in service_dict.get(domain, {}).items():
args = flatten_vol_schema(service.schema)
args_to_expose = set(args).intersection(allowed_service_call_arguments)
all_services.append(f"{domain}.{name}({','.join(args_to_expose)})")
all_service_names.append(f"{domain}.{name}")
formatted_services = ", ".join(all_services)
if llm_api:
if llm_api.api.id == HOME_LLM_API_ID:
service_dict = self.hass.services.async_services()
all_services = []
for domain in domains:
# scripts show up as individual services
if domain == "script":
all_services.extend(["script.reload()", "script.turn_on()", "script.turn_off()", "script.toggle()"])
continue
for name, service in service_dict.get(domain, {}).items():
args = flatten_vol_schema(service.schema)
args_to_expose = set(args).intersection(ALLOWED_LEGACY_SERVICE_CALL_ARGUMENTS)
service_schema = vol.Schema({
vol.Optional(arg): str for arg in args_to_expose
})
all_services.append((f"{domain}.{name}", service_schema, ""))
tools = [
self._format_tool(*tool)
for tool in all_services
]
else:
tools = [
self._format_tool(tool.name, tool.parameters, tool.description)
for tool in llm_api.tools
]
else:
tools = "No tools were provided. If the user requests you interact with a device, tell them you are unable to do so."
render_variables = {
"devices": formatted_states,
"services": formatted_services,
"tools": tools,
"response_examples": []
}
if self.in_context_examples:
# only pass examples if there are loaded examples + an API was exposed
if self.in_context_examples and llm_api:
num_examples = int(self.entry.options.get(CONF_NUM_IN_CONTEXT_EXAMPLES, DEFAULT_NUM_IN_CONTEXT_EXAMPLES))
render_variables["response_examples"] = "\n".join(icl_example_generator(num_examples, list(entities_to_expose.keys()), all_service_names))
render_variables["response_examples"] = self._generate_icl_examples(num_examples, list(entities_to_expose.keys()))
return template.Template(prompt_template, self.hass).async_render(
render_variables,
parse_result=False,
)
class LocalLLaMAAgent(LLaMAAgent):
class LlamaCppAgent(LocalLLMAgent):
model_path: str
llm: LlamaType
grammar: Any
@@ -612,7 +800,7 @@ class LocalLLaMAAgent(LLaMAAgent):
self.grammar = None
def _update_options(self):
LLaMAAgent._update_options(self)
LocalLLMAgent._update_options(self)
model_reloaded = False
if self.loaded_model_settings[CONF_CONTEXT_LENGTH] != self.entry.options.get(CONF_CONTEXT_LENGTH, DEFAULT_CONTEXT_LENGTH) or \
@@ -662,7 +850,7 @@ class LocalLLaMAAgent(LLaMAAgent):
def _async_get_exposed_entities(self) -> tuple[dict[str, str], list[str]]:
"""Takes the super class function results and sorts the entities with the recently updated at the end"""
entities, domains = LLaMAAgent._async_get_exposed_entities(self)
entities, domains = LocalLLMAgent._async_get_exposed_entities(self)
# ignore sorting if prompt caching is disabled
if not self.entry.options.get(CONF_PROMPT_CACHING_ENABLED, DEFAULT_PROMPT_CACHING_ENABLED):
@@ -715,13 +903,23 @@ class LocalLLaMAAgent(LLaMAAgent):
if entity:
self.last_updated_entities[entity] = refresh_start
llm_api: llm.APIInstance | None = None
if self.entry.options.get(CONF_LLM_HASS_API):
try:
llm_api = await llm.async_get_api(
self.hass, self.entry.options[CONF_LLM_HASS_API]
)
except HomeAssistantError:
_LOGGER.exception("Failed to get LLM API when caching prompt!")
return
_LOGGER.debug(f"refreshing cached prompt because {entity} changed...")
await self.hass.async_add_executor_job(self._cache_prompt)
await self.hass.async_add_executor_job(self._cache_prompt, llm_api)
refresh_end = time.time()
_LOGGER.debug(f"cache refresh took {(refresh_end - refresh_start):.2f} sec")
def _cache_prompt(self) -> None:
def _cache_prompt(self, llm_api: llm.API) -> None:
# if a refresh is already scheduled then exit
if self.cache_refresh_after_cooldown:
return
@@ -742,11 +940,10 @@ class LocalLLaMAAgent(LLaMAAgent):
try:
raw_prompt = self.entry.options.get(CONF_PROMPT, DEFAULT_PROMPT)
prompt = self._format_prompt([
{ "role": "system", "message": self._generate_system_prompt(raw_prompt)},
{ "role": "system", "message": self._generate_system_prompt(raw_prompt, llm_api)},
{ "role": "user", "message": "" }
], include_generation_prompt=False)
input_tokens = self.llm.tokenize(
prompt.encode(), add_bos=False
)
@@ -839,7 +1036,7 @@ class LocalLLaMAAgent(LLaMAAgent):
return result
class GenericOpenAIAPIAgent(LLaMAAgent):
class GenericOpenAIAPIAgent(LocalLLMAgent):
api_host: str
api_key: str
model_name: str
@@ -1046,7 +1243,7 @@ class LlamaCppPythonAPIAgent(GenericOpenAIAPIAgent):
return endpoint, request_params
class OllamaAPIAgent(LLaMAAgent):
class OllamaAPIAgent(LocalLLMAgent):
api_host: str
api_key: str
model_name: str
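The response-parsing path earlier in this file is easier to follow in isolation. The sketch below is a stand-alone approximation (the model output string is fabricated): it applies the new default `<functioncall>` regex from `const.py`, validates the tool call with voluptuous, and performs the same brightness and `rgb_color` fix-ups as the agent.

```python
import json
import re
import voluptuous as vol

# default service call regex for v0.3 models (from const.py)
SERVICE_CALL_REGEX = r"<functioncall> ({[\S \t]*})"

TOOL_CALL_SCHEMA = vol.Schema({
    vol.Required("name"): str,
    vol.Required("arguments"): dict,
})

# fabricated model output for illustration only
response = (
    "Setting the brightness now.\n"
    '<functioncall> {"name": "HassLightSet", "arguments": {"name": "light.kitchen", "brightness": 0.5}}'
)

pattern = re.compile(SERVICE_CALL_REGEX)
to_say = pattern.sub("", response.strip())

for block in pattern.findall(response.strip()):
    tool_call = TOOL_CALL_SCHEMA(json.loads(block))
    args = tool_call["arguments"]

    # brightness may come back as a fraction; scale it to 0-255 like the agent does
    if "brightness" in args and 0.0 < args["brightness"] <= 1.0:
        args["brightness"] = int(args["brightness"] * 255)

    # rgb_color may come back as a string "(r, g, b)"; convert it to a list of ints
    if "rgb_color" in args and isinstance(args["rgb_color"], str):
        args["rgb_color"] = [int(x) for x in args["rgb_color"][1:-1].split(",")]

    print("tool:", tool_call["name"], "args:", args)

print("speech:", to_say.strip())
```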


@@ -1,4 +1,4 @@
"""Config flow for Local LLaMA Conversation integration."""
"""Config flow for Local LLM Conversation integration."""
from __future__ import annotations
import os
@@ -13,18 +13,20 @@ import voluptuous as vol
from homeassistant import config_entries
from homeassistant.core import HomeAssistant
from homeassistant.const import CONF_HOST, CONF_PORT, CONF_SSL, UnitOfTime
from homeassistant.const import CONF_HOST, CONF_PORT, CONF_SSL, CONF_LLM_HASS_API, UnitOfTime
from homeassistant.data_entry_flow import (
AbortFlow,
FlowHandler,
FlowManager,
FlowResult,
)
from homeassistant.helpers import llm
from homeassistant.helpers.selector import (
NumberSelector,
NumberSelectorConfig,
NumberSelectorMode,
TemplateSelector,
SelectOptionDict,
SelectSelector,
SelectSelectorConfig,
SelectSelectorMode,
@@ -36,7 +38,7 @@ from homeassistant.helpers.selector import (
from homeassistant.util.package import is_installed
from importlib.metadata import version
from .utils import download_model_from_hf, install_llama_cpp_python
from .utils import download_model_from_hf, install_llama_cpp_python, MissingQuantizationException
from .const import (
CONF_CHAT_MODEL,
CONF_MAX_TOKENS,
@@ -54,11 +56,12 @@ from .const import (
CONF_DOWNLOADED_MODEL_QUANTIZATION,
CONF_DOWNLOADED_MODEL_QUANTIZATION_OPTIONS,
CONF_PROMPT_TEMPLATE,
CONF_TOOL_FORMAT,
CONF_TOOL_MULTI_TURN_CHAT,
CONF_ENABLE_FLASH_ATTENTION,
CONF_USE_GBNF_GRAMMAR,
CONF_GBNF_GRAMMAR_FILE,
CONF_EXTRA_ATTRIBUTES_TO_EXPOSE,
CONF_ALLOWED_SERVICE_CALL_ARGUMENTS,
CONF_TEXT_GEN_WEBUI_PRESET,
CONF_REFRESH_SYSTEM_PROMPT,
CONF_REMEMBER_CONVERSATION,
@@ -94,11 +97,12 @@ from .const import (
DEFAULT_BACKEND_TYPE,
DEFAULT_DOWNLOADED_MODEL_QUANTIZATION,
DEFAULT_PROMPT_TEMPLATE,
DEFAULT_TOOL_FORMAT,
DEFAULT_TOOL_MULTI_TURN_CHAT,
DEFAULT_ENABLE_FLASH_ATTENTION,
DEFAULT_USE_GBNF_GRAMMAR,
DEFAULT_GBNF_GRAMMAR_FILE,
DEFAULT_EXTRA_ATTRIBUTES_TO_EXPOSE,
DEFAULT_ALLOWED_SERVICE_CALL_ARGUMENTS,
DEFAULT_REFRESH_SYSTEM_PROMPT,
DEFAULT_REMEMBER_CONVERSATION,
DEFAULT_REMEMBER_NUM_INTERACTIONS,
@@ -123,16 +127,22 @@ from .const import (
BACKEND_TYPE_LLAMA_CPP_PYTHON_SERVER,
BACKEND_TYPE_OLLAMA,
PROMPT_TEMPLATE_DESCRIPTIONS,
TOOL_FORMAT_FULL,
TOOL_FORMAT_REDUCED,
TOOL_FORMAT_MINIMAL,
TEXT_GEN_WEBUI_CHAT_MODE_CHAT,
TEXT_GEN_WEBUI_CHAT_MODE_INSTRUCT,
TEXT_GEN_WEBUI_CHAT_MODE_CHAT_INSTRUCT,
DOMAIN,
HOME_LLM_API_ID,
DEFAULT_OPTIONS,
OPTIONS_OVERRIDES,
RECOMMENDED_CHAT_MODELS,
EMBEDDED_LLAMA_CPP_PYTHON_VERSION
)
from . import HomeLLMAPI
_LOGGER = logging.getLogger(__name__)
def is_local_backend(backend):
@@ -172,7 +182,7 @@ def STEP_LOCAL_SETUP_EXISTING_DATA_SCHEMA(model_file=None, selected_language=Non
}
)
def STEP_LOCAL_SETUP_DOWNLOAD_DATA_SCHEMA(*, chat_model=None, downloaded_model_quantization=None, selected_language=None):
def STEP_LOCAL_SETUP_DOWNLOAD_DATA_SCHEMA(*, chat_model=None, downloaded_model_quantization=None, selected_language=None, available_quantizations=None):
return vol.Schema(
{
vol.Required(CONF_CHAT_MODEL, default=chat_model if chat_model else DEFAULT_CHAT_MODEL): SelectSelector(SelectSelectorConfig(
@@ -181,7 +191,7 @@ def STEP_LOCAL_SETUP_DOWNLOAD_DATA_SCHEMA(*, chat_model=None, downloaded_model_q
multiple=False,
mode=SelectSelectorMode.DROPDOWN,
)),
vol.Required(CONF_DOWNLOADED_MODEL_QUANTIZATION, default=downloaded_model_quantization if downloaded_model_quantization else DEFAULT_DOWNLOADED_MODEL_QUANTIZATION): vol.In(CONF_DOWNLOADED_MODEL_QUANTIZATION_OPTIONS),
vol.Required(CONF_DOWNLOADED_MODEL_QUANTIZATION, default=downloaded_model_quantization if downloaded_model_quantization else DEFAULT_DOWNLOADED_MODEL_QUANTIZATION): vol.In(available_quantizations if available_quantizations else CONF_DOWNLOADED_MODEL_QUANTIZATION_OPTIONS),
vol.Required(CONF_SELECTED_LANGUAGE, default=selected_language if selected_language else "en"): SelectSelector(SelectSelectorConfig(
options=CONF_SELECTED_LANGUAGE_OPTIONS,
translation_key=CONF_SELECTED_LANGUAGE,
@@ -279,9 +289,9 @@ class BaseLlamaConversationConfigFlow(FlowHandler, ABC):
""" Finish configuration """
class ConfigFlow(BaseLlamaConversationConfigFlow, config_entries.ConfigFlow, domain=DOMAIN):
"""Handle a config flow for Local LLaMA Conversation."""
"""Handle a config flow for Local LLM Conversation."""
VERSION = 1
VERSION = 2
install_wheel_task = None
install_wheel_error = None
download_task = None
@@ -305,6 +315,11 @@ class ConfigFlow(BaseLlamaConversationConfigFlow, config_entries.ConfigFlow, dom
"""Handle the initial step."""
self.model_config = {}
self.options = {}
# make sure the API is registered
if not any([x.id == HOME_LLM_API_ID for x in llm.async_get_apis(self.hass)]):
llm.async_register_api(self.hass, HomeLLMAPI(self.hass))
return await self.async_step_pick_backend()
async def async_step_pick_backend(
@@ -387,13 +402,35 @@ class ConfigFlow(BaseLlamaConversationConfigFlow, config_entries.ConfigFlow, dom
raise ValueError()
if self.download_error:
errors["base"] = "download_failed"
description_placeholders["exception"] = str(self.download_error)
schema = STEP_LOCAL_SETUP_DOWNLOAD_DATA_SCHEMA(
chat_model=self.model_config[CONF_CHAT_MODEL],
downloaded_model_quantization=self.model_config[CONF_DOWNLOADED_MODEL_QUANTIZATION],
selected_language=self.selected_language
)
if isinstance(self.download_error, MissingQuantizationException):
available_quants = list(set(self.download_error.available_quants).intersection(set(CONF_DOWNLOADED_MODEL_QUANTIZATION_OPTIONS)))
if len(available_quants) == 0:
errors["base"] = "no_supported_ggufs"
schema = STEP_LOCAL_SETUP_DOWNLOAD_DATA_SCHEMA(
chat_model=self.model_config[CONF_CHAT_MODEL],
downloaded_model_quantization=self.model_config[CONF_DOWNLOADED_MODEL_QUANTIZATION],
selected_language=self.selected_language
)
else:
errors["base"] = "missing_quantization"
description_placeholders["missing"] = self.download_error.missing_quant
description_placeholders["available"] = ", ".join(self.download_error.available_quants)
schema = STEP_LOCAL_SETUP_DOWNLOAD_DATA_SCHEMA(
chat_model=self.model_config[CONF_CHAT_MODEL],
downloaded_model_quantization=self.download_error.available_quants[0],
selected_language=self.selected_language,
available_quantizations=available_quants,
)
else:
errors["base"] = "download_failed"
description_placeholders["exception"] = str(self.download_error)
schema = STEP_LOCAL_SETUP_DOWNLOAD_DATA_SCHEMA(
chat_model=self.model_config[CONF_CHAT_MODEL],
downloaded_model_quantization=self.model_config[CONF_DOWNLOADED_MODEL_QUANTIZATION],
selected_language=self.selected_language
)
if user_input and "result" not in user_input:
self.selected_language = user_input.pop(CONF_SELECTED_LANGUAGE, self.hass.config.language)
@@ -584,13 +621,17 @@ class ConfigFlow(BaseLlamaConversationConfigFlow, config_entries.ConfigFlow, dom
persona = PERSONA_PROMPTS.get(self.selected_language, PERSONA_PROMPTS.get("en"))
selected_default_options[CONF_PROMPT] = selected_default_options[CONF_PROMPT].replace("<persona>", persona)
schema = vol.Schema(local_llama_config_option_schema(selected_default_options, backend_type))
schema = vol.Schema(local_llama_config_option_schema(self.hass, selected_default_options, backend_type))
if user_input:
self.options = user_input
if user_input[CONF_LLM_HASS_API] == "none":
user_input.pop(CONF_LLM_HASS_API)
try:
# validate input
schema(user_input)
self.options = user_input
return await self.async_step_finish()
except Exception as ex:
_LOGGER.exception("An unknown error has occurred!")
@@ -626,7 +667,7 @@ class ConfigFlow(BaseLlamaConversationConfigFlow, config_entries.ConfigFlow, dom
class OptionsFlow(config_entries.OptionsFlow):
"""Local LLaMA config flow options handler."""
"""Local LLM config flow options handler."""
def __init__(self, config_entry: config_entries.ConfigEntry) -> None:
"""Initialize options flow."""
@@ -655,10 +696,14 @@ class OptionsFlow(config_entries.OptionsFlow):
errors["base"] = "missing_icl_file"
description_placeholders["filename"] = filename
if user_input[CONF_LLM_HASS_API] == "none":
user_input.pop(CONF_LLM_HASS_API)
if len(errors) == 0:
return self.async_create_entry(title="LLaMA Conversation", data=user_input)
return self.async_create_entry(title="Local LLM Conversation", data=user_input)
schema = local_llama_config_option_schema(
self.hass,
self.config_entry.options,
self.config_entry.data[CONF_BACKEND_TYPE],
)
@@ -682,12 +727,31 @@ def insert_after_key(input_dict: dict, key_name: str, other_dict: dict):
return result
def local_llama_config_option_schema(options: MappingProxyType[str, Any], backend_type: str) -> dict:
"""Return a schema for Local LLaMA completion options."""
def local_llama_config_option_schema(hass: HomeAssistant, options: MappingProxyType[str, Any], backend_type: str) -> dict:
"""Return a schema for Local LLM completion options."""
if not options:
options = DEFAULT_OPTIONS
apis: list[SelectOptionDict] = [
SelectOptionDict(
label="No control",
value="none",
)
]
apis.extend(
SelectOptionDict(
label=api.name,
value=api.id,
)
for api in llm.async_get_apis(hass)
)
result = {
vol.Optional(
CONF_LLM_HASS_API,
description={"suggested_value": options.get(CONF_LLM_HASS_API)},
default="none",
): SelectSelector(SelectSelectorConfig(options=apis)),
vol.Required(
CONF_PROMPT,
description={"suggested_value": options.get(CONF_PROMPT)},
@@ -703,6 +767,21 @@ def local_llama_config_option_schema(options: MappingProxyType[str, Any], backen
multiple=False,
mode=SelectSelectorMode.DROPDOWN,
)),
vol.Required(
CONF_TOOL_FORMAT,
description={"suggested_value": options.get(CONF_TOOL_FORMAT)},
default=DEFAULT_TOOL_FORMAT,
): SelectSelector(SelectSelectorConfig(
options=[TOOL_FORMAT_FULL, TOOL_FORMAT_REDUCED, TOOL_FORMAT_MINIMAL],
translation_key=CONF_TOOL_FORMAT,
multiple=False,
mode=SelectSelectorMode.DROPDOWN,
)),
vol.Required(
CONF_TOOL_MULTI_TURN_CHAT,
description={"suggested_value": options.get(CONF_TOOL_MULTI_TURN_CHAT)},
default=DEFAULT_TOOL_MULTI_TURN_CHAT,
): BooleanSelector(BooleanSelectorConfig()),
vol.Required(
CONF_USE_IN_CONTEXT_LEARNING_EXAMPLES,
description={"suggested_value": options.get(CONF_USE_IN_CONTEXT_LEARNING_EXAMPLES)},
@@ -728,11 +807,6 @@ def local_llama_config_option_schema(options: MappingProxyType[str, Any], backen
description={"suggested_value": options.get(CONF_EXTRA_ATTRIBUTES_TO_EXPOSE)},
default=DEFAULT_EXTRA_ATTRIBUTES_TO_EXPOSE,
): TextSelector(TextSelectorConfig(multiple=True)),
vol.Required(
CONF_ALLOWED_SERVICE_CALL_ARGUMENTS,
description={"suggested_value": options.get(CONF_ALLOWED_SERVICE_CALL_ARGUMENTS)},
default=DEFAULT_ALLOWED_SERVICE_CALL_ARGUMENTS,
): TextSelector(TextSelectorConfig(multiple=True)),
vol.Required(
CONF_SERVICE_CALL_REGEX,
description={"suggested_value": options.get(CONF_SERVICE_CALL_REGEX)},


@@ -1,7 +1,9 @@
"""Constants for the LLaMa Conversation integration."""
"""Constants for the Local LLM Conversation integration."""
import types, os
DOMAIN = "llama_conversation"
HOME_LLM_API_ID = "home-llm-service-api"
SERVICE_TOOL_NAME = "HassCallService"
CONF_PROMPT = "prompt"
PERSONA_PROMPTS = {
"en": "You are 'Al', a helpful AI Assistant that controls the devices in a house. Complete the following task as instructed with the information provided only.",
@@ -11,16 +13,26 @@ PERSONA_PROMPTS = {
}
DEFAULT_PROMPT_BASE = """<persona>
The current time and date is {{ (as_timestamp(now()) | timestamp_custom("%I:%M %p on %A %B %d, %Y", "")) }}
Services: {{ services }}
Tools: {{ tools | to_json }}
Devices:
{{ devices }}"""
DEFAULT_PROMPT_BASE_LEGACY = """<persona>
The current time and date is {{ (as_timestamp(now()) | timestamp_custom("%I:%M %p on %A %B %d, %Y", "")) }}
Services: {{ tools | join(", ") }}
Devices:
{{ devices }}"""
ICL_EXTRAS = """
Respond to the following user instruction by responding in the same format as the following examples:
{{ response_examples }}"""
{% for item in response_examples %}
{{ item.request }}
{{ item.response }}
<functioncall> {{ item.tool | to_json }}
{% endfor %}"""
ICL_NO_SYSTEM_PROMPT_EXTRAS = """
Respond to the following user instruction by responding in the same format as the following examples:
{{ response_examples }}
{% for item in response_examples %}
{{ item.request }}
{{ item.response }}
<functioncall> {{ item.tool | to_json }}
{% endfor %}
User instruction:"""
DEFAULT_PROMPT = DEFAULT_PROMPT_BASE + ICL_EXTRAS
CONF_CHAT_MODEL = "huggingface_model"
@@ -51,7 +63,12 @@ DEFAULT_BACKEND_TYPE = BACKEND_TYPE_LLAMA_HF
CONF_SELECTED_LANGUAGE = "selected_language"
CONF_SELECTED_LANGUAGE_OPTIONS = [ "en", "de", "fr", "es" ]
CONF_DOWNLOADED_MODEL_QUANTIZATION = "downloaded_model_quantization"
CONF_DOWNLOADED_MODEL_QUANTIZATION_OPTIONS = ["F16", "Q8_0", "Q5_K_M", "Q4_K_M", "Q3_K_M"]
CONF_DOWNLOADED_MODEL_QUANTIZATION_OPTIONS = [
"Q4_0", "Q4_1", "Q5_0", "Q5_1", "IQ2_XXS", "IQ2_XS", "IQ2_S", "IQ2_M", "IQ1_S", "IQ1_M",
"Q2_K", "Q2_K_S", "IQ3_XXS", "IQ3_S", "IQ3_M", "Q3_K", "IQ3_XS", "Q3_K_S", "Q3_K_M", "Q3_K_L",
"IQ4_NL", "IQ4_XS", "Q4_K", "Q4_K_S", "Q4_K_M", "Q5_K", "Q5_K_S", "Q5_K_M", "Q6_K", "Q8_0",
"F16", "BF16", "F32"
]
DEFAULT_DOWNLOADED_MODEL_QUANTIZATION = "Q4_K_M"
CONF_DOWNLOADED_MODEL_FILE = "downloaded_model_file"
DEFAULT_DOWNLOADED_MODEL_FILE = ""
@@ -59,8 +76,7 @@ DEFAULT_PORT = "5000"
DEFAULT_SSL = False
CONF_EXTRA_ATTRIBUTES_TO_EXPOSE = "extra_attributes_to_expose"
DEFAULT_EXTRA_ATTRIBUTES_TO_EXPOSE = ["rgb_color", "brightness", "temperature", "humidity", "fan_mode", "media_title", "volume_level", "item", "wind_speed"]
CONF_ALLOWED_SERVICE_CALL_ARGUMENTS = "allowed_service_call_arguments"
DEFAULT_ALLOWED_SERVICE_CALL_ARGUMENTS = ["rgb_color", "brightness", "temperature", "humidity", "fan_mode", "hvac_mode", "preset_mode", "item", "duration"]
ALLOWED_LEGACY_SERVICE_CALL_ARGUMENTS = ["rgb_color", "brightness", "temperature", "humidity", "fan_mode", "hvac_mode", "preset_mode", "item", "duration"]
CONF_PROMPT_TEMPLATE = "prompt_template"
PROMPT_TEMPLATE_CHATML = "chatml"
PROMPT_TEMPLATE_COMMAND_R = "command-r"
@@ -78,6 +94,7 @@ PROMPT_TEMPLATE_DESCRIPTIONS = {
"system": { "prefix": "<|im_start|>system\n", "suffix": "<|im_end|>" },
"user": { "prefix": "<|im_start|>user\n", "suffix": "<|im_end|>" },
"assistant": { "prefix": "<|im_start|>assistant\n", "suffix": "<|im_end|>" },
"tool": { "prefix": "<|im_start|>tool", "suffix": "<|im_end|>" },
"generation_prompt": "<|im_start|>assistant"
},
PROMPT_TEMPLATE_COMMAND_R: {
@@ -134,6 +151,13 @@ PROMPT_TEMPLATE_DESCRIPTIONS = {
"generation_prompt": "<|start_header_id|>assistant<|end_header_id|>\n\n"
}
}
CONF_TOOL_FORMAT = "tool_format"
TOOL_FORMAT_FULL = "full_tool_format"
TOOL_FORMAT_REDUCED = "reduced_tool_format"
TOOL_FORMAT_MINIMAL = "min_tool_format"
DEFAULT_TOOL_FORMAT = TOOL_FORMAT_FULL
CONF_TOOL_MULTI_TURN_CHAT = "tool_multi_turn_chat"
DEFAULT_TOOL_MULTI_TURN_CHAT = False
CONF_ENABLE_FLASH_ATTENTION = "enable_flash_attention"
DEFAULT_ENABLE_FLASH_ATTENTION = False
CONF_USE_GBNF_GRAMMAR = "gbnf_grammar"
@@ -149,7 +173,7 @@ DEFAULT_NUM_IN_CONTEXT_EXAMPLES = 4
CONF_TEXT_GEN_WEBUI_PRESET = "text_generation_webui_preset"
CONF_OPENAI_API_KEY = "openai_api_key"
CONF_TEXT_GEN_WEBUI_ADMIN_KEY = "text_generation_webui_admin_key"
CONF_REFRESH_SYSTEM_PROMPT = "refresh_prompt_per_tern"
CONF_REFRESH_SYSTEM_PROMPT = "refresh_prompt_per_turn"
DEFAULT_REFRESH_SYSTEM_PROMPT = True
CONF_REMEMBER_CONVERSATION = "remember_conversation"
DEFAULT_REMEMBER_CONVERSATION = True
@@ -160,7 +184,7 @@ DEFAULT_PROMPT_CACHING_ENABLED = False
CONF_PROMPT_CACHING_INTERVAL = "prompt_caching_interval"
DEFAULT_PROMPT_CACHING_INTERVAL = 30
CONF_SERVICE_CALL_REGEX = "service_call_regex"
DEFAULT_SERVICE_CALL_REGEX = r"({[\S \t]*?})"
DEFAULT_SERVICE_CALL_REGEX = r"<functioncall> ({[\S \t]*})"
FINE_TUNED_SERVICE_CALL_REGEX = r"```homeassistant\n([\S \t\n]*?)```"
CONF_REMOTE_USE_CHAT_ENDPOINT = "remote_use_chat_endpoint"
DEFAULT_REMOTE_USE_CHAT_ENDPOINT = False
@@ -197,7 +221,6 @@ DEFAULT_OPTIONS = types.MappingProxyType(
CONF_ENABLE_FLASH_ATTENTION: DEFAULT_ENABLE_FLASH_ATTENTION,
CONF_USE_GBNF_GRAMMAR: DEFAULT_USE_GBNF_GRAMMAR,
CONF_EXTRA_ATTRIBUTES_TO_EXPOSE: DEFAULT_EXTRA_ATTRIBUTES_TO_EXPOSE,
CONF_ALLOWED_SERVICE_CALL_ARGUMENTS: DEFAULT_ALLOWED_SERVICE_CALL_ARGUMENTS,
CONF_REFRESH_SYSTEM_PROMPT: DEFAULT_REFRESH_SYSTEM_PROMPT,
CONF_REMEMBER_CONVERSATION: DEFAULT_REMEMBER_CONVERSATION,
CONF_REMEMBER_NUM_INTERACTIONS: DEFAULT_REMEMBER_NUM_INTERACTIONS,
@@ -219,48 +242,44 @@ DEFAULT_OPTIONS = types.MappingProxyType(
)
OPTIONS_OVERRIDES = {
"home-3b-v4": {
CONF_PROMPT: DEFAULT_PROMPT_BASE,
CONF_PROMPT_TEMPLATE: PROMPT_TEMPLATE_ZEPHYR,
CONF_USE_IN_CONTEXT_LEARNING_EXAMPLES: False,
CONF_SERVICE_CALL_REGEX: FINE_TUNED_SERVICE_CALL_REGEX,
CONF_USE_GBNF_GRAMMAR: True,
},
"home-3b-v3": {
CONF_PROMPT: DEFAULT_PROMPT_BASE,
CONF_PROMPT: DEFAULT_PROMPT_BASE_LEGACY,
CONF_PROMPT_TEMPLATE: PROMPT_TEMPLATE_ZEPHYR,
CONF_USE_IN_CONTEXT_LEARNING_EXAMPLES: False,
CONF_SERVICE_CALL_REGEX: FINE_TUNED_SERVICE_CALL_REGEX,
CONF_USE_GBNF_GRAMMAR: True,
CONF_TOOL_FORMAT: TOOL_FORMAT_MINIMAL,
},
"home-3b-v2": {
CONF_PROMPT: DEFAULT_PROMPT_BASE,
CONF_PROMPT: DEFAULT_PROMPT_BASE_LEGACY,
CONF_USE_IN_CONTEXT_LEARNING_EXAMPLES: False,
CONF_SERVICE_CALL_REGEX: FINE_TUNED_SERVICE_CALL_REGEX,
CONF_USE_GBNF_GRAMMAR: True,
CONF_TOOL_FORMAT: TOOL_FORMAT_MINIMAL,
},
"home-3b-v1": {
CONF_PROMPT: DEFAULT_PROMPT_BASE,
CONF_PROMPT: DEFAULT_PROMPT_BASE_LEGACY,
CONF_PROMPT_TEMPLATE: PROMPT_TEMPLATE_ZEPHYR,
CONF_USE_IN_CONTEXT_LEARNING_EXAMPLES: False,
CONF_SERVICE_CALL_REGEX: FINE_TUNED_SERVICE_CALL_REGEX,
CONF_TOOL_FORMAT: TOOL_FORMAT_MINIMAL,
},
"home-1b-v3": {
CONF_PROMPT: DEFAULT_PROMPT_BASE,
CONF_PROMPT: DEFAULT_PROMPT_BASE_LEGACY,
CONF_PROMPT_TEMPLATE: PROMPT_TEMPLATE_ZEPHYR2,
CONF_USE_IN_CONTEXT_LEARNING_EXAMPLES: False,
CONF_SERVICE_CALL_REGEX: FINE_TUNED_SERVICE_CALL_REGEX,
CONF_USE_GBNF_GRAMMAR: True,
CONF_TOOL_FORMAT: TOOL_FORMAT_MINIMAL,
},
"home-1b-v2": {
CONF_PROMPT: DEFAULT_PROMPT_BASE,
CONF_PROMPT: DEFAULT_PROMPT_BASE_LEGACY,
CONF_USE_IN_CONTEXT_LEARNING_EXAMPLES: False,
CONF_SERVICE_CALL_REGEX: FINE_TUNED_SERVICE_CALL_REGEX,
CONF_TOOL_FORMAT: TOOL_FORMAT_MINIMAL,
},
"home-1b-v1": {
CONF_PROMPT: DEFAULT_PROMPT_BASE,
CONF_PROMPT: DEFAULT_PROMPT_BASE_LEGACY,
CONF_USE_IN_CONTEXT_LEARNING_EXAMPLES: False,
CONF_SERVICE_CALL_REGEX: FINE_TUNED_SERVICE_CALL_REGEX,
CONF_TOOL_FORMAT: TOOL_FORMAT_MINIMAL,
},
"mistral": {
CONF_PROMPT: DEFAULT_PROMPT_BASE + ICL_NO_SYSTEM_PROMPT_EXTRAS,
@@ -296,5 +315,5 @@ OPTIONS_OVERRIDES = {
}
}
INTEGRATION_VERSION = "0.2.17"
EMBEDDED_LLAMA_CPP_PYTHON_VERSION = "0.2.70"
INTEGRATION_VERSION = "0.3"
EMBEDDED_LLAMA_CPP_PYTHON_VERSION = "0.2.77"
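Outside of Home Assistant, the `ICL_EXTRAS` template defined earlier in this file can be approximated with plain Jinja2 by supplying a `to_json` filter. This is only a sketch: Home Assistant's template engine provides `to_json` (and `now()`, used in `DEFAULT_PROMPT_BASE`) natively, and the example data is fabricated.

```python
import json
import jinja2

ICL_EXTRAS = """
Respond to the following user instruction by responding in the same format as the following examples:
{% for item in response_examples %}
{{ item.request }}
{{ item.response }}
<functioncall> {{ item.tool | to_json }}
{% endfor %}"""

env = jinja2.Environment()
env.filters["to_json"] = json.dumps  # stand-in for Home Assistant's to_json filter

rendered = env.from_string(ICL_EXTRAS).render(
    response_examples=[
        {
            "request": "Turn on the kitchen light",
            "response": "Turning on the light for you.",
            "tool": {"name": "HassTurnOn", "arguments": {"name": "light.kitchen"}},
        }
    ]
)
print(rendered)
```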


@@ -1,37 +1,22 @@
service,response
fan.turn_on,Turning on the fan for you.
fan.turn_off,Switching off the fan as requested.
fan.toggle,I'll toggle the fan's state for you.
fan.increase_speed,Increasing the fan speed for you.
fan.decrease_speed,Reducing the fan speed as you requested.
cover.open_cover,Opening the garage door for you.
cover.close_cover,Closing the garage door as requested.
cover.stop_cover,Stopping the garage door now.
cover.toggle,Toggling the garage door state for you.
light.turn_on,Turning on the light for you.
light.turn_off,Turning off the light as requested.
light.toggle,Toggling the light for you.
lock.lock,Locking the door for you.
lock.unlock,Unlocking the door as you requested.
media_player.turn_on,Turning on the media player for you.
media_player.turn_off,Turning off the media player as requested.
media_player.toggle,Toggling the media player for you.
media_player.volume_up,Increasing the volume for you.
media_player.volume_down,Reducing the volume as you requested.
media_player.volume_mute,Muting the volume for you.
media_player.media_play_pause,Toggling play/pause on the media player.
media_player.media_play,Starting media playback.
media_player.media_pause,Pausing the media playback.
media_player.media_stop,Stopping the media playback.
media_player.media_next_track,Skipping to the next track.
media_player.media_previous_track,Going back to the previous track.
switch.turn_on,Turning on the switch for you.
switch.turn_off,Turning off the switch as requested.
switch.toggle,Toggling the switch for you.
vacuum.start,Starting the vacuum now.
vacuum.stop,Stopping the vacuum.
vacuum.pause,Pausing the vacuum for now.
vacuum.return_to_base,Sending the vacuum back to its base.
todo.add_item,the todo has been added to your todo list.
timer.start,Starting timer now.
timer.cancel,Timer has been canceled.
type,request,tool,response
fan,Turn on the <name>,HassTurnOn,Turning on the fan for you.
fan,Turn on the fans in the <area>,HassTurnOn,Turning on the fans in <area>
fan,Turn off the <name>,HassTurnOff,Switching off the fan as requested.
fan,Toggle the fan <name> for me,HassToggle,I'll toggle the fan's state for you.
cover,Can you open the <name>?,HassOpenCover,Opening the garage door for you.
cover,Close the <name> now,HassCloseCover,Closing the garage door as requested.
light,Turn the <name> light on,HassTurnOn,Turning on the light for you.
light,Lights off in <area>,HassTurnOff,Turning off the light as requested.
light,Hit toggle for the lights in <area>,HassToggle,Toggling the lights in the <area> for you.
light,Set the brightness for <name> to <brightness>,HassLightSet,Setting the brightness now.
light,Make the lights in <area> <color>,HassLightSet,The color should be changed now.
media_player,Can you turn on <name>,HassTurnOn,Turning on the media player for you.
media_player,<name> should be turned off,HassTurnOff,Turning off the media player as requested.
media_player,Can you press play on <name>,HassMediaUnpause,Starting media playback.
media_player,Pause the <name>,HassMediaPause,Pausing the media playback.
media_player,Play the next thing on <name>,HassMediaNext,Skipping to the next track.
switch,Turn on the <name>,HassTurnOn,Turning on the switch for you.
switch,Turn off the switches in <area>,HassTurnOff,Turning off the devices as requested.
switch,Toggle the switch <name>,HassToggle,Toggling the switch for you.
vacuum,Start the vacuum called <name>,HassVacuumStart,Starting the vacuum now.
vacuum,Stop the <name> vacuum,HassVacuumReturnToBase,Sending the vacuum back to its base.
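The integration loads this file with `csv.DictReader` and fills in the `<name>`, `<area>`, `<brightness>`, and `<color>` placeholders when building the prompt (see `_generate_icl_examples` above). A minimal stand-alone sketch of that loading step, with an illustrative file path:

```python
import csv

# Illustrative path; the integration ships language-specific CSVs alongside the component.
with open("in_context_examples.csv", encoding="utf-8-sig") as f:
    examples = list(csv.DictReader(f))

# the v0.3 format has four columns instead of the old service/response pair
assert set(examples[0].keys()) == {"type", "request", "tool", "response"}

example = examples[0]
request = example["request"].replace("<name>", "fan.ceiling_fan")  # hypothetical entity
print(example["type"], example["tool"], request, example["response"])
```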

View File

@@ -1,6 +1,6 @@
{
"domain": "llama_conversation",
"name": "LLaMA Conversation",
"name": "Local LLM Conversation",
"version": "0.2.17",
"codeowners": ["@acon96"],
"config_flow": true,

View File

@@ -1,29 +0,0 @@
root ::= (tosay "\n")+ functioncalls?
tosay ::= [0-9a-zA-Z #%.?!]*
functioncalls ::=
"```homeassistant\n" (object ws)* "```"
value ::= object | array | string | number | ("true" | "false" | "null") ws
object ::=
"{" ws (
string ":" ws value
("," ws string ":" ws value)*
)? "}" ws
array ::=
"[" ws (
value
("," ws value)*
)? "]" ws
string ::=
"\"" (
[^"\\] |
"\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F]) # escapes
)* "\"" ws
number ::= ("-"? ([0-9] | [1-9] [0-9]*)) ("." [0-9]+)? ([eE] [-+]? [0-9]+)? ws
# Optional space: by convention, applied in this grammar after literal chars when allowed
ws ::= ([ \t\n] ws)?
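The grammar above (removed by this PR) constrained generation to one or more lines of plain text, optionally followed by a fenced `homeassistant` block of JSON objects. As a hedged sketch of how such a GBNF file can be applied with llama-cpp-python; the file name, model file, and prompt below are illustrative assumptions, not taken from this changeset:

```python
# Sketch only: applying a GBNF grammar with llama-cpp-python.
# The grammar file name, GGUF file, and prompt are placeholders.
from llama_cpp import Llama, LlamaGrammar

with open("output.gbnf") as f:  # assumed local copy of the grammar shown above
    grammar = LlamaGrammar.from_string(f.read())

llm = Llama(model_path="Home-3B-v3.q4_k_m.gguf")  # hypothetical GGUF file
result = llm.create_completion(
    "<|im_start|>user\nTurn on the kitchen light.<|im_end|>\n<|im_start|>assistant\n",
    grammar=grammar,
    max_tokens=128,
)

# Output accepted by the grammar looks roughly like:
#   Turning on the kitchen light.
#   ```homeassistant
#   {"service": "light.turn_on", "target_device": "light.kitchen"}
#   ```
print(result["choices"][0]["text"])
```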

View File

@@ -2,6 +2,8 @@
"config": {
"error": {
"download_failed": "The download failed to complete: {exception}",
"missing_quantization": "The GGUF quantization level {missing} does not exist in the provided HuggingFace repo. The following quantization levels were found: {available}",
"no_supported_ggufs": "The provided HuggingFace repo does not contain any compatible GGUF files!",
"failed_to_connect": "Failed to connect to the remote API: {exception}",
"missing_model_api": "The selected model is not provided by this API.",
"missing_model_file": "The provided file does not exist.",
@@ -51,8 +53,11 @@
"model_parameters": {
"data": {
"max_new_tokens": "Maximum tokens to return in response",
"llm_hass_api": "Selected LLM API",
"prompt": "System Prompt",
"prompt_template": "Prompt Format",
"tool_format": "Tool Format",
"tool_multi_turn_chat": "Multi-Turn Tool Use",
"temperature": "Temperature",
"top_k": "Top K",
"top_p": "Top P",
@@ -62,14 +67,13 @@
"ollama_keep_alive": "Keep Alive/Inactivity Timeout (minutes)",
"ollama_json_mode": "JSON Output Mode",
"extra_attributes_to_expose": "Additional attribute to expose in the context",
"allowed_service_call_arguments": "Arguments allowed to be pass to service calls",
"enable_flash_attention": "Enable Flash Attention",
"gbnf_grammar": "Enable GBNF Grammar",
"gbnf_grammar_file": "GBNF Grammar Filename",
"openai_api_key": "API Key",
"text_generation_webui_admin_key": "Admin Key",
"service_call_regex": "Service Call Regex",
"refresh_prompt_per_tern": "Refresh System Prompt Every Turn",
"refresh_prompt_per_turn": "Refresh System Prompt Every Turn",
"remember_conversation": "Remember conversation",
"remember_num_interactions": "Number of past interactions to remember",
"in_context_examples": "Enable in context learning (ICL) examples",
@@ -86,11 +90,11 @@
"n_batch_threads": "Batch Thread Count"
},
"data_description": {
"llm_hass_api": "Select 'Assist' if you want the model to be able to control devices. If you are using the Home-LLM v1, v2, or v3 model then select 'Home-LLM (v1-3)'",
"prompt": "See [here](https://github.com/acon96/home-llm/blob/develop/docs/Model%20Prompting.md) for more information on model prompting.",
"in_context_examples": "If you are using a model that is not specifically fine-tuned for use with this integration: enable this",
"remote_use_chat_endpoint": "If this is enabled, then the integration will use the chat completion HTTP endpoint instead of the text completion one.",
"extra_attributes_to_expose": "This is the list of Home Assistant 'attributes' that are exposed to the model. This limits how much information the model is able to see and answer questions on.",
"allowed_service_call_arguments": "This is the list of parameters that are allowed to be passed to Home Assistant service calls.",
"gbnf_grammar": "Forces the model to output properly formatted responses. Ensure the file specified below exists in the integration directory.",
"prompt_caching": "Prompt caching attempts to pre-process the prompt (house state) and cache the processing that needs to be done to understand the prompt. Enabling this will cause the model to re-process the prompt any time an entity state changes in the house, restricted by the interval below."
},
@@ -103,9 +107,12 @@
"step": {
"init": {
"data": {
"llm_hass_api": "Selected LLM API",
"max_new_tokens": "Maximum tokens to return in response",
"prompt": "System Prompt",
"prompt_template": "Prompt Format",
"tool_format": "Tool Format",
"tool_multi_turn_chat": "Multi-Turn Tool Use",
"temperature": "Temperature",
"top_k": "Top K",
"top_p": "Top P",
@@ -115,14 +122,13 @@
"ollama_keep_alive": "Keep Alive/Inactivity Timeout (minutes)",
"ollama_json_mode": "JSON Output Mode",
"extra_attributes_to_expose": "Additional attribute to expose in the context",
"allowed_service_call_arguments": "Arguments allowed to be pass to service calls",
"enable_flash_attention": "Enable Flash Attention",
"gbnf_grammar": "Enable GBNF Grammar",
"gbnf_grammar_file": "GBNF Grammar Filename",
"openai_api_key": "API Key",
"text_generation_webui_admin_key": "Admin Key",
"service_call_regex": "Service Call Regex",
"refresh_prompt_per_tern": "Refresh System Prompt Every Turn",
"refresh_prompt_per_turn": "Refresh System Prompt Every Turn",
"remember_conversation": "Remember conversation",
"remember_num_interactions": "Number of past interactions to remember",
"in_context_examples": "Enable in context learning (ICL) examples",
@@ -139,11 +145,11 @@
"n_batch_threads": "Batch Thread Count"
},
"data_description": {
"llm_hass_api": "Select 'Assist' if you want the model to be able to control devices. If you are using the Home-LLM v1, v2, or v3 model then select 'Home-LLM (v1-3)'",
"prompt": "See [here](https://github.com/acon96/home-llm/blob/develop/docs/Model%20Prompting.md) for more information on model prompting.",
"in_context_examples": "If you are using a model that is not specifically fine-tuned for use with this integration: enable this",
"remote_use_chat_endpoint": "If this is enabled, then the integration will use the chat completion HTTP endpoint instead of the text completion one.",
"extra_attributes_to_expose": "This is the list of Home Assistant 'attributes' that are exposed to the model. This limits how much information the model is able to see and answer questions on.",
"allowed_service_call_arguments": "This is the list of parameters that are allowed to be passed to Home Assistant service calls.",
"gbnf_grammar": "Forces the model to output properly formatted responses. Ensure the file specified below exists in the integration directory.",
"prompt_caching": "Prompt caching attempts to pre-process the prompt (house state) and cache the processing that needs to be done to understand the prompt. Enabling this will cause the model to re-process the prompt any time an entity state changes in the house, restricted by the interval below."
}
@@ -170,6 +176,13 @@
"no_prompt_template": "None"
}
},
"tool_format": {
"options": {
"full_tool_format": "Full JSON Tool Format",
"reduced_tool_format": "Reduced JSON Tool Format",
"min_tool_format": "Minimal Function Style Tool Format"
}
},
"model_backend": {
"options": {
"llama_cpp_hf": "Llama.cpp (HuggingFace)",

View File

@@ -8,7 +8,10 @@ import voluptuous as vol
import webcolors
from importlib.metadata import version
from homeassistant.helpers import config_validation as cv
from homeassistant.helpers import intent
from homeassistant.requirements import pip_kwargs
from homeassistant.util import color
from homeassistant.util.package import install_package, is_installed
from .const import (
@@ -18,6 +21,11 @@ from .const import (
_LOGGER = logging.getLogger(__name__)
class MissingQuantizationException(Exception):
def __init__(self, missing_quant: str, available_quants: list[str]):
self.missing_quant = missing_quant
self.available_quants = available_quants
def closest_color(requested_color):
min_colors = {}
for key, name in webcolors.CSS3_HEX_TO_NAMES.items():
@@ -46,6 +54,34 @@ def flatten_vol_schema(schema):
_flatten(schema)
return flattened
def custom_custom_serializer(value):
"""a vol schema is really not straightforward to convert back into a dictionary"""
if value is cv.ensure_list:
return { "type": "list" }
if value is color.color_name_to_rgb:
return { "type": "string" }
if value is intent.non_empty_string:
return { "type": "string" }
# media_player registers an intent using a lambda as a validator, which cannot be
# detected properly, so probe the validator instead
try:
if value(100) == 1:
return { "type": "integer" }
except Exception:
pass
if isinstance(value, list):
result = {}
for x in value:
result.update(custom_custom_serializer(x))
return result
return cv.custom_serializer(value)
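For context, `custom_custom_serializer` looks intended as the `custom_serializer` hook for `voluptuous_serialize.convert` when turning intent slot schemas into tool parameter descriptions. The sketch below shows roughly what it produces; the call site and module path are assumptions, and the exact output keys may differ:

```python
# Hedged sketch, not from this PR: serializing a typical intent slot schema.
import voluptuous as vol
import voluptuous_serialize
from homeassistant.helpers import config_validation as cv
from custom_components.llama_conversation.utils import custom_custom_serializer  # assumed module path

slot_schema = vol.Schema({
    vol.Required("name"): cv.string,
    vol.Optional("brightness"): vol.All(vol.Coerce(int), vol.Range(min=0, max=100)),
})

# Roughly: [{'name': 'name', 'required': True, 'type': 'string'},
#           {'name': 'brightness', 'optional': True, 'type': 'integer',
#            'valueMin': 0, 'valueMax': 100}]
print(voluptuous_serialize.convert(slot_schema, custom_serializer=custom_custom_serializer))
```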
def download_model_from_hf(model_name: str, quantization_type: str, storage_folder: str):
try:
from huggingface_hub import hf_hub_download, HfFileSystem
@@ -57,8 +93,8 @@ def download_model_from_hf(model_name: str, quantization_type: str, storage_fold
wanted_file = [f for f in potential_files if (f".{quantization_type.lower()}." in f or f".{quantization_type.upper()}." in f)]
if len(wanted_file) != 1:
raise Exception(f"The quantization '{quantization_type}' does not exist in the HF repo for {model_name}")
available_quants = [file.split(".")[-2].upper() for file in potential_files]
raise MissingQuantizationException(quantization_type, available_quants)
try:
os.makedirs(storage_folder, exist_ok=True)
except Exception as ex:
@@ -68,7 +104,6 @@ def download_model_from_hf(model_name: str, quantization_type: str, storage_fold
repo_id=model_name,
repo_type="model",
filename=wanted_file[0].removeprefix(model_name + "/"),
resume_download=True,
cache_dir=storage_folder,
)
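The new `MissingQuantizationException` pairs with the `missing_quantization` error string added to the translations above. A hedged sketch of how a config flow might surface it; the actual call site is not shown in this diff:

```python
# Illustrative only; module paths and the error-handling shape are assumptions
# based on the hunks above.
from custom_components.llama_conversation.utils import (
    MissingQuantizationException,
    download_model_from_hf,
)

errors: dict[str, str] = {}
description_placeholders: dict[str, str] = {}
try:
    model_path = download_model_from_hf("acon96/Home-3B-v3-GGUF", "Q4_K_M", "/config/models")
except MissingQuantizationException as ex:
    errors["base"] = "missing_quantization"
    description_placeholders = {
        "missing": ex.missing_quant,
        "available": ", ".join(ex.available_quants),
    }
```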

View File

@@ -5,8 +5,11 @@ There are multiple backends to choose for running the model that the Home Assist
# Common Options
| Option Name | Description | Suggested Value |
|-----------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|
| LLM API | This is the set of tools that are provided to the LLM. Use Assist for the built-in API. If you are using Home-LLM v1, v2, or v3, then select the dedicated API | |
| System Prompt | [see here](./Model%20Prompting.md) | |
| Prompt Format | The format for the context of the model | |
| Tool Format | The format of the tools that are provided to the model. Full, Reduced, or Minimal | |
| Multi-Turn Tool Use | Enable this if the model you are using expects to receive the result from the tool call before responding to the user (see the sketch after this table) | |
| Maximum tokens to return in response | Limits the number of tokens that can be produced by each model response | 512 |
| Additional attribute to expose in the context | Extra attributes that will be exposed to the model via the `{{ devices }}` template variable | |
| Arguments allowed to be passed to service calls | Any arguments not listed here will be filtered out of service calls. Used to restrict the model from modifying certain parts of your home. | |
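To make the Multi-Turn Tool Use option concrete, here is a purely conceptual sketch using generic OpenAI-style chat messages; the integration's actual message and tool-call structure is not shown in this document:

```python
# Conceptual illustration only; message shapes are generic, not the integration's wire format.
single_turn = [
    {"role": "user", "content": "Turn on the kitchen light"},
    # The model words its reply and emits the tool call in the same turn:
    {"role": "assistant", "content": "Turning on the light for you.",
     "tool_calls": [{"name": "HassTurnOn", "arguments": {"name": "kitchen light"}}]},
]

multi_turn = [
    {"role": "user", "content": "Turn on the kitchen light"},
    # The model first emits only the tool call...
    {"role": "assistant",
     "tool_calls": [{"name": "HassTurnOn", "arguments": {"name": "kitchen light"}}]},
    # ...then receives the tool result before wording its reply:
    {"role": "tool", "content": '{"success": true}'},
    {"role": "assistant", "content": "Turning on the light for you."},
]
```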

View File

@@ -35,7 +35,7 @@ The following link will open your Home Assistant installation and download the i
[![Open your Home Assistant instance and open a repository inside the Home Assistant Community Store.](https://my.home-assistant.io/badges/hacs_repository.svg)](https://my.home-assistant.io/redirect/hacs_repository/?category=Integration&repository=home-llm&owner=acon96)
After installation, A "LLaMA Conversation" device should show up in the `Settings > Devices and Services > [Devices]` tab now.
After installation, A "Local LLM Conversation" device should show up in the `Settings > Devices and Services > [Devices]` tab now.
## Path 1: Using the Home Model with the Llama.cpp Backend
### Overview
@@ -44,7 +44,7 @@ This setup path involves downloading a fine-tuned model from HuggingFace and int
### Step 1: Wheel Installation for llama-cpp-python
1. In Home Assistant: navigate to `Settings > Devices and Services`
2. Select the `+ Add Integration` button in the bottom right corner
3. Search for, and select `LLaMA Conversation`
3. Search for, and select `Local LLM Conversation`
4. With the `Llama.cpp (HuggingFace)` backend selected, click `Submit`
This should download and install `llama-cpp-python` from GitHub. If the installation fails for any reason, follow the manual installation instructions [here](./Backend%20Configuration.md#wheels).
@@ -52,9 +52,9 @@ This should download and install `llama-cpp-python` from GitHub. If the installa
Once `llama-cpp-python` is installed, continue to the model selection.
### Step 2: Model Selection
The next step is to specify which model will be used by the integration. You may select any repository on HuggingFace that has a model in GGUF format in it. We will use `acon96/Home-3B-v3-GGUF` for this example. If you have less than 4GB of RAM then use `acon96/Home-1B-v2-GGUF`.
The next step is to specify which model will be used by the integration. You may select any repository on HuggingFace that has a model in GGUF format in it. We will use `acon96/Home-3B-v3-GGUF` for this example. If you have less than 4GB of RAM then use `acon96/Home-1B-v3-GGUF`.
**Model Name**: Use either `acon96/Home-3B-v3-GGUF` or `acon96/Home-1B-v2-GGUF`
**Model Name**: Use either `acon96/Home-3B-v3-GGUF` or `acon96/Home-1B-v3-GGUF`
**Quantization Level**: The model will be downloaded in the selected quantization level from the HuggingFace repository. If unsure which level to choose, select `Q4_K_M`.
Pressing `Submit` will download the model from HuggingFace.
@@ -64,7 +64,9 @@ Pressing `Submit` will download the model from HuggingFace.
### Step 3: Model Configuration
This step allows you to configure how the model is "prompted". See [here](./Model%20Prompting.md) for more information on how that works.
For now, defaults for the model should have been populated and you can just scroll to the bottom and click `Submit`.
For now, defaults for the model should have been populated. If you would like the model to be able to control devices, you must select the `Home-LLM (v1-v3)` API. This API is included to ensure compatibility with the Home-LLM models that were trained before the introduction of the built-in Home Assistant LLM API.
Once the desired API has been selected, scroll to the bottom and click `Submit`.
The model will be loaded into memory and should now be available to select as a conversation agent!
@@ -82,7 +84,7 @@ In order to access the model from another machine, we need to run the Ollama API
1. In Home Assistant: navigate to `Settings > Devices and Services`
2. Select the `+ Add Integration` button in the bottom right corner
3. Search for, and select `LLaMA Conversation`
3. Search for, and select `Local LLM Conversation`
4. Select `Ollama API` from the dropdown and click `Submit`
5. Set up the connection to the API:
- **IP Address**: Fill out IP Address for the machine hosting Ollama
@@ -95,7 +97,9 @@ In order to access the model from another machine, we need to run the Ollama API
### Step 3: Model Configuration
This step allows you to configure how the model is "prompted". See [here](./Model%20Prompting.md) for more information on how that works.
For now, defaults for the model should have been populated and you can just scroll to the bottom and click `Submit`.
For now, defaults for the model should have been populated. If you would like the model to be able to control devices, you must select the `Assist` API.
Once the desired API has been selected, scroll to the bottom and click `Submit`.
> NOTE: The key settings in this case are that our prompt references the `{{ response_examples }}` variable and the `Enable in context learning (ICL) examples` option is turned on.
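For reference, the in-context examples shipped with the integration now use the CSV columns `type,request,tool,response` (see the data file earlier in this diff). Below is a hedged sketch of how such rows could be turned into the text injected through `{{ response_examples }}`; the real template used by the integration is not shown here, so the rendering is only illustrative:

```python
# Illustrative only: the file path and the rendered layout are assumptions.
import csv

with open("custom_components/llama_conversation/in_context_examples.csv") as f:
    examples = [row for row in csv.DictReader(f) if row["type"] == "light"]

response_examples = "\n\n".join(
    f'Request: {ex["request"]}\nTool: {ex["tool"]}\nResponse: {ex["response"]}'
    for ex in examples[:3]
)
print(response_examples)
```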

File diff suppressed because one or more lines are too long (image: Before, 78 KiB)

File diff suppressed because one or more lines are too long (image: Before, 78 KiB)

View File

@@ -31,7 +31,7 @@ def generate(model, tokenizer, prompt):
def format_example(example):
sys_prompt = SYSTEM_PROMPT
services_block = "Services: " + ", ".join(sorted(example["available_services"]))
services_block = "Services: " + ", ".join(sorted(example["available_tools"]))
states_block = "Devices:\n" + "\n".join(example["states"])
question = "Request:\n" + example["question"]
response_start = "Response:\n"
@@ -52,7 +52,7 @@ def main():
"fan.family_room = off",
"lock.front_door = locked"
],
"available_services": ["turn_on", "turn_off", "toggle", "lock", "unlock" ],
"available_tools": ["turn_on", "turn_off", "toggle", "lock", "unlock" ],
"question": request,
}
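Plugging the literals from the hunk above into `format_example` yields blocks like the following; how the blocks are joined into the final prompt is not visible in this hunk, so treat the layout as approximate (the request string is a made-up placeholder):

```python
# Worked example of the prompt blocks built by format_example in the eval script.
example = {
    "states": [
        "light.kitchen_light = off",
        "fan.family_room = off",
        "lock.front_door = locked",
    ],
    "available_tools": ["turn_on", "turn_off", "toggle", "lock", "unlock"],
    "question": "Turn on the kitchen light please",  # placeholder request
}

services_block = "Services: " + ", ".join(sorted(example["available_tools"]))
states_block = "Devices:\n" + "\n".join(example["states"])
question = "Request:\n" + example["question"]
# services_block -> "Services: lock, toggle, turn_off, turn_on, unlock"
# states_block   -> "Devices:\nlight.kitchen_light = off\nfan.family_room = off\nlock.front_door = locked"
# question       -> "Request:\nTurn on the kitchen light please"
```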

View File

@@ -1,6 +1,6 @@
{
"name": "LLaMA Conversation",
"homeassistant": "2023.10.0",
"name": "Local LLM Conversation",
"homeassistant": "2024.6.0",
"content_in_root": false,
"render_readme": true
}

View File

@@ -1,3 +1,4 @@
# training + dataset requirements
transformers
tensorboard
datasets
@@ -11,9 +12,17 @@ sentencepiece
deep-translator
langcodes
# integration requirements
requests==2.31.0
huggingface-hub==0.23.0
webcolors==1.13
# types from Home Assistant
homeassistant
hassil
home-assistant-intents
# testing requirements
pytest
pytest-asyncio
pytest-homeassistant-custom-component

View File

@@ -4,7 +4,7 @@ import pytest
import jinja2
from unittest.mock import patch, MagicMock, PropertyMock, AsyncMock, ANY
from custom_components.llama_conversation.agent import LocalLLaMAAgent, OllamaAPIAgent, TextGenerationWebuiAgent, GenericOpenAIAPIAgent
from custom_components.llama_conversation.agent import LlamaCppAgent, OllamaAPIAgent, TextGenerationWebuiAgent, GenericOpenAIAPIAgent
from custom_components.llama_conversation.const import (
CONF_CHAT_MODEL,
CONF_MAX_TOKENS,
@@ -18,7 +18,6 @@ from custom_components.llama_conversation.const import (
CONF_BACKEND_TYPE,
CONF_DOWNLOADED_MODEL_FILE,
CONF_EXTRA_ATTRIBUTES_TO_EXPOSE,
CONF_ALLOWED_SERVICE_CALL_ARGUMENTS,
CONF_PROMPT_TEMPLATE,
CONF_ENABLE_FLASH_ATTENTION,
CONF_USE_GBNF_GRAMMAR,
@@ -54,7 +53,6 @@ from custom_components.llama_conversation.const import (
DEFAULT_BACKEND_TYPE,
DEFAULT_REQUEST_TIMEOUT,
DEFAULT_EXTRA_ATTRIBUTES_TO_EXPOSE,
DEFAULT_ALLOWED_SERVICE_CALL_ARGUMENTS,
DEFAULT_PROMPT_TEMPLATE,
DEFAULT_ENABLE_FLASH_ATTENTION,
DEFAULT_USE_GBNF_GRAMMAR,
@@ -140,10 +138,10 @@ def home_assistant_mock():
@pytest.fixture
def local_llama_agent_fixture(config_entry, home_assistant_mock):
with patch.object(LocalLLaMAAgent, '_load_icl_examples') as load_icl_examples_mock, \
patch.object(LocalLLaMAAgent, '_load_grammar') as load_grammar_mock, \
patch.object(LocalLLaMAAgent, 'entry', new_callable=PropertyMock) as entry_mock, \
patch.object(LocalLLaMAAgent, '_async_get_exposed_entities') as get_exposed_entities_mock, \
with patch.object(LlamaCppAgent, '_load_icl_examples') as load_icl_examples_mock, \
patch.object(LlamaCppAgent, '_load_grammar') as load_grammar_mock, \
patch.object(LlamaCppAgent, 'entry', new_callable=PropertyMock) as entry_mock, \
patch.object(LlamaCppAgent, '_async_get_exposed_entities') as get_exposed_entities_mock, \
patch('homeassistant.helpers.template.Template') as template_mock, \
patch('custom_components.llama_conversation.agent.importlib.import_module') as import_module_mock, \
patch('custom_components.llama_conversation.agent.install_llama_cpp_python') as install_llama_cpp_python_mock:
@@ -174,7 +172,7 @@ def local_llama_agent_fixture(config_entry, home_assistant_mock):
"target_device": "light.kitchen_light",
}).encode()
agent_obj = LocalLLaMAAgent(
agent_obj = LlamaCppAgent(
home_assistant_mock,
config_entry
)
@@ -191,7 +189,7 @@ def local_llama_agent_fixture(config_entry, home_assistant_mock):
async def test_local_llama_agent(local_llama_agent_fixture):
local_llama_agent: LocalLLaMAAgent
local_llama_agent: LlamaCppAgent
all_mocks: dict[str, MagicMock]
local_llama_agent, all_mocks = local_llama_agent_fixture

View File

@@ -24,7 +24,6 @@ from custom_components.llama_conversation.const import (
CONF_BACKEND_TYPE,
CONF_DOWNLOADED_MODEL_FILE,
CONF_EXTRA_ATTRIBUTES_TO_EXPOSE,
CONF_ALLOWED_SERVICE_CALL_ARGUMENTS,
CONF_PROMPT_TEMPLATE,
CONF_ENABLE_FLASH_ATTENTION,
CONF_USE_GBNF_GRAMMAR,
@@ -66,7 +65,6 @@ from custom_components.llama_conversation.const import (
DEFAULT_BACKEND_TYPE,
DEFAULT_REQUEST_TIMEOUT,
DEFAULT_EXTRA_ATTRIBUTES_TO_EXPOSE,
DEFAULT_ALLOWED_SERVICE_CALL_ARGUMENTS,
DEFAULT_PROMPT_TEMPLATE,
DEFAULT_ENABLE_FLASH_ATTENTION,
DEFAULT_USE_GBNF_GRAMMAR,
@@ -171,7 +169,6 @@ async def test_validate_config_flow_generic_openai(mock_setup_entry, hass: HomeA
CONF_REQUEST_TIMEOUT: DEFAULT_REQUEST_TIMEOUT,
CONF_PROMPT_TEMPLATE: DEFAULT_PROMPT_TEMPLATE,
CONF_EXTRA_ATTRIBUTES_TO_EXPOSE: DEFAULT_EXTRA_ATTRIBUTES_TO_EXPOSE,
CONF_ALLOWED_SERVICE_CALL_ARGUMENTS: DEFAULT_ALLOWED_SERVICE_CALL_ARGUMENTS,
CONF_REFRESH_SYSTEM_PROMPT: DEFAULT_REFRESH_SYSTEM_PROMPT,
CONF_REMEMBER_CONVERSATION: DEFAULT_REMEMBER_CONVERSATION,
CONF_REMEMBER_NUM_INTERACTIONS: DEFAULT_REMEMBER_NUM_INTERACTIONS,
@@ -261,7 +258,6 @@ async def test_validate_config_flow_ollama(mock_setup_entry, hass: HomeAssistant
CONF_REQUEST_TIMEOUT: DEFAULT_REQUEST_TIMEOUT,
CONF_PROMPT_TEMPLATE: DEFAULT_PROMPT_TEMPLATE,
CONF_EXTRA_ATTRIBUTES_TO_EXPOSE: DEFAULT_EXTRA_ATTRIBUTES_TO_EXPOSE,
CONF_ALLOWED_SERVICE_CALL_ARGUMENTS: DEFAULT_ALLOWED_SERVICE_CALL_ARGUMENTS,
CONF_REFRESH_SYSTEM_PROMPT: DEFAULT_REFRESH_SYSTEM_PROMPT,
CONF_REMEMBER_CONVERSATION: DEFAULT_REMEMBER_CONVERSATION,
CONF_REMEMBER_NUM_INTERACTIONS: DEFAULT_REMEMBER_NUM_INTERACTIONS,
@@ -299,7 +295,7 @@ def test_validate_options_schema():
universal_options = [
CONF_PROMPT, CONF_PROMPT_TEMPLATE,
CONF_USE_IN_CONTEXT_LEARNING_EXAMPLES, CONF_IN_CONTEXT_EXAMPLES_FILE, CONF_NUM_IN_CONTEXT_EXAMPLES,
CONF_MAX_TOKENS, CONF_EXTRA_ATTRIBUTES_TO_EXPOSE, CONF_ALLOWED_SERVICE_CALL_ARGUMENTS,
CONF_MAX_TOKENS, CONF_EXTRA_ATTRIBUTES_TO_EXPOSE,
CONF_SERVICE_CALL_REGEX, CONF_REFRESH_SYSTEM_PROMPT, CONF_REMEMBER_CONVERSATION, CONF_REMEMBER_NUM_INTERACTIONS,
]