Mirror of https://github.com/acon96/home-llm.git (synced 2026-01-09 13:48:05 -05:00)
@@ -158,6 +158,7 @@ python3 train.py \
## Version History

| Version | Description |
|---------|-------------|
| v0.4.3  | Fix an issue with the integration not creating model configs properly during setup |
| v0.4.2  | Fix the following issues: not correctly setting default model settings during initial setup, non-integers being allowed in numeric config fields, being too strict with finish_reason requirements, and not letting the user clear the active LLM API |
| v0.4.1  | Fix an issue with using Llama.cpp models downloaded from HuggingFace |
| v0.4    | Rewrite integration to support tool calling models/agentic tool use loop, voice streaming, multiple config sub-entries per backend, and dynamic llama.cpp processor selection |
TODO.md
@@ -1,25 +1,27 @@
# TODO
- [x] proper tool calling support
- [ ] fix old GGUFs to support tool calling
- [x] home assistant component text streaming support
- [x] move llama-cpp build to forked repo + add support for multi backend builds (no more -noavx)
- [ ] new model based on qwen3 0.6b
- [ ] new model based on qwen3 0.6b, 1.7b and 4b
- [ ] new model based on gemma3 270m
- [ ] support AI task API
- [ ] vision support for remote backends
- [ ] vision support for local backend (llama.cpp + llava)
- [ ] move llamacpp to a separate process because of all the crashing
- [ ] optional sampling parameters in options panel (don't pass to backend if not set)
- [ ] update dataset so new models will work with the Assist API
- [ ] make ICL examples into conversation turns
- [ ] translate ICL examples + make better ones
- [ ] figure out DPO to improve response quality
- [x] proper tool calling support
- [x] fix old GGUFs to support tool calling
- [x] home assistant component text streaming support
- [x] move llama-cpp build to forked repo + add support for multi backend builds (no more -noavx)
- [x] support new LLM APIs
  - rewrite how services are called
  - handle no API selected
  - rewrite prompts + service block formats
  - implement new LLM API that has `HassCallService` so old models can still work
- [ ] update dataset so new models will work with the API
- [ ] make ICL examples into conversation turns
- [ ] translate ICL examples + make better ones
- [x] areas/room support
- [x] convert requests to aiohttp
- [x] detection/mitigation of too many entities being exposed & blowing out the context length
- [ ] figure out DPO to improve response quality
- [x] setup github actions to build wheels that are optimized for RPIs
- [x] mixtral + prompting (no fine tuning)
  - add in context learning variables to sys prompt template
@@ -58,24 +60,6 @@
- [x] ollama backend
- [x] tailored_openai backend
- [x] generic openai responses backend
- [ ] fix and re-upload all compatible old models (+ upload all original safetensors)
- [x] fix and re-upload all compatible old models (+ upload all original safetensors)
- [x] config entry migration function
- [x] re-write setup guide

## more complicated ideas
- [ ] "context requests"
  - basically just let the model decide what RAG/extra context it wants
  - the model predicts special tokens as the first few tokens of its output
  - the requested content is added to the context after the request tokens and then generation continues
  - needs more complicated training b/c multi-turn + there will be some weird masking going on for training the responses properly
- [ ] integrate with llava for checking camera feeds in home assistant
  - can check still frames to describe what is there
  - for remote backends that support images, could also support this
  - depends on context requests because we don't want to feed camera feeds into the context every time
- [ ] RAG for getting info for setting up new devices
  - set up vectordb
  - ingest home assistant docs
  - "context request" from above to initiate a RAG search
- [ ] train the model to respond to house events (HA is calling these AI tasks)
  - present the model with an event + a "prompt" from the user of what you want it to do (e.g. turn on the lights when I get home = the model turns on lights when your entity presence triggers as being home)
  - basically lets you write automations in plain English
@@ -191,7 +191,9 @@ class GenericOpenAIAPIClient(LocalLLMClient):
         return endpoint, request_params
 
     def _extract_response(self, response_json: dict, llm_api: llm.APIInstance | None, user_input: conversation.ConversationInput) -> Tuple[Optional[str], Optional[List[llm.ToolInput]]]:
-        if len(response_json["choices"]) == 0: # finished
+        if "choices" not in response_json or len(response_json["choices"]) == 0: # finished
+            _LOGGER.warning("Response missing or empty 'choices'. Keys present: %s. Full response: %s",
+                            list(response_json.keys()), response_json)
             return None, None
 
         choice = response_json["choices"][0]
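For context, a sketch of how the new guard behaves. The payloads below are hypothetical illustrations (not taken from the integration or its tests): an OpenAI-compatible server normally returns a `choices` list, but an error body may omit the key entirely, which the old check turned into a `KeyError`.

```python
# Hypothetical payloads illustrating the new "choices" guard.
ok_response = {
    "choices": [{"message": {"role": "assistant", "content": "Turned on the light."}, "finish_reason": "stop"}]
}
error_response = {"error": {"message": "model not loaded", "type": "server_error"}}

def has_usable_choice(response_json: dict) -> bool:
    # Mirrors the updated condition: a missing key OR an empty list means "no response".
    return "choices" in response_json and len(response_json["choices"]) > 0

assert has_usable_choice(ok_response) is True
assert has_usable_choice(error_response) is False  # the old check would raise KeyError here
```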
@@ -1139,8 +1139,8 @@ class LocalLLMSubentryFlowHandler(ConfigSubentryFlow):
             selected_default_options[CONF_PROMPT] = build_prompt_template(
                 selected_language, str(selected_default_options.get(CONF_PROMPT, DEFAULT_PROMPT))
             )
 
-            self.model_config = selected_default_options
+            self.model_config = {**selected_default_options, **self.model_config}
 
             schema = vol.Schema(
                 local_llama_config_option_schema(
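A minimal sketch of why the merge order matters in the new assignment (the dictionaries and keys below are illustrative, not the integration's actual config keys): with `{**defaults, **existing}`, keys from the second dict win, so values already chosen by the user survive and defaults only fill the gaps, whereas the old assignment replaced the existing config wholesale.

```python
# Illustrative only: later dicts win on key conflicts in a merge.
defaults = {"prompt": "default prompt", "max_new_tokens": 128}
existing = {"max_new_tokens": 512}  # value the user already configured

merged = {**defaults, **existing}
assert merged == {"prompt": "default prompt", "max_new_tokens": 512}

# The previous assignment discarded the existing config entirely:
overwritten = defaults
assert overwritten["max_new_tokens"] == 128
```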
@@ -337,5 +337,5 @@ def option_overrides(backend_type: str) -> dict[str, Any]:
     },
 }
 
-INTEGRATION_VERSION = "0.4.2"
+INTEGRATION_VERSION = "0.4.3"
 EMBEDDED_LLAMA_CPP_PYTHON_VERSION = "0.3.16+b6153"
@@ -1,7 +1,7 @@
 {
     "domain": "llama_conversation",
     "name": "Local LLMs",
-    "version": "0.4.2",
+    "version": "0.4.3",
     "codeowners": ["@acon96"],
     "config_flow": true,
     "dependencies": ["conversation"],
@@ -3,42 +3,73 @@
There are multiple backends to choose from for running the model that the Home Assistant integration uses. Below is a description of the options for each backend.

# Common Options
| Option Name | Description | Suggested Value |
|-------------|-------------|-----------------|
| LLM API | This is the set of tools that are provided to the LLM. Use Assist for the built-in API. If you are using Home-LLM v1, v2, or v3, then select the dedicated API | |
| System Prompt | [see here](./Model%20Prompting.md) | |
| Maximum tokens to return in response | Limits the number of tokens that can be produced by each model response | 512 |
| Additional attributes to expose in the context | Extra attributes that will be exposed to the model via the `{{ devices }}` template variable | |
| Arguments allowed to be passed to service calls | Any arguments not listed here will be filtered out of service calls. Used to restrict the model from modifying certain parts of your home. | |
| Service Call Regex | The regular expression used to extract service calls from the model response; should contain 1 repeated capture group | |
| Refresh System Prompt Every Turn | Flag to update the system prompt with updated device states on every chat turn. Disabling can significantly improve agent response times when using a backend that supports prefix caching (Llama.cpp) | Enabled |
| Remember conversation | Flag to remember the conversation history (excluding system prompt) in the model context. | Enabled |
| Number of past interactions to remember | If `Remember conversation` is enabled, the number of user-assistant interaction pairs to keep in history. | |
| Enable in context learning (ICL) examples | If enabled, will load examples from the specified file and expose them as the `{{ response_examples }}` variable in the system prompt template | |
| In context learning examples CSV filename | The file to load in context learning examples from. Must be located in the same directory as the custom component | |
| Number of ICL examples to generate | The number of examples to select when expanding the `{{ in_context_examples }}` template in the prompt | |

These options are available for all backends and control model inference behavior, conversation memory, and integration-specific settings.

| Option Name | Description | Suggested Value |
|-------------|-------------|-----------------|
| Selected Language | The language to use for prompts and responses. Affects system prompt templates and examples. | en |
| LLM API | The API to use for tool execution. Select "Assist" for the built-in Home Assistant API, or "No control" to disable tool execution. Other options are specialized APIs like Home-LLM v1/v2/v3. | Assist |
| System Prompt | [see here](./Model%20Prompting.md) | |
| Additional attributes to expose in the context | Extra attributes that will be exposed to the model via the `{{ devices }}` template variable (e.g., rgb_color, brightness, temperature, humidity, fan_mode, volume_level) | See suggestions |
| Refresh System Prompt Every Turn | Flag to update the system prompt with updated device states on every chat turn. Disabling can significantly improve agent response times when using a backend that supports prefix caching (Llama.cpp) | Enabled |
| Remember conversation | Flag to remember the conversation history (excluding system prompt) in the model context. | Enabled |
| Number of past interactions to remember | If `Remember conversation` is enabled, the number of user-assistant interaction pairs to keep in history. Not used by the Generic OpenAI Responses backend. | |
| Enable in context learning (ICL) examples | If enabled, will load examples from the specified file and expose them as the `{{ response_examples }}` variable in the system prompt template | Enabled |
| In context learning examples CSV filename | The file to load in context learning examples from. Must be located in the same directory as the custom component | in_context_examples.csv |
| Number of ICL examples to generate | The number of examples to select when expanding the `{{ in_context_examples }}` template in the prompt | 4 |
| Thinking prefix | String prefix to mark the start of internal model reasoning (used when the model supports explicit thinking) | `<think>` |
| Thinking suffix | String suffix to mark the end of internal model reasoning | `</think>` |
| Tool call prefix | String prefix to mark the start of a function call in the model response (see the parsing sketch after this table) | `<tool_call>` |
| Tool call suffix | String suffix to mark the end of a function call in the model response | `</tool_call>` |
| Enable legacy tool calling | If enabled, uses the legacy `\`\`\`homeassistant` tool calling format instead of the newer prefix/suffix format. Required for some older Home-LLM models. | Disabled |
| Max tool call iterations | Maximum number of times the model can make tool calls in sequence before the conversation is terminated | 3 |
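
The thinking and tool-call prefix/suffix options delimit spans in the raw model output. Below is a minimal parsing sketch using the default delimiters from the table; the function, regexes, and example payload are illustrative assumptions, not the integration's actual parser.

```python
import json
import re

def parse_model_output(text: str,
                       think=("<think>", "</think>"),
                       tool=("<tool_call>", "</tool_call>")):
    """Split raw model output into visible text and parsed tool calls."""
    # Drop internal reasoning entirely.
    text = re.sub(re.escape(think[0]) + r".*?" + re.escape(think[1]), "", text, flags=re.DOTALL)

    # Collect every <tool_call>...</tool_call> span as a JSON object.
    tool_calls = [
        json.loads(match)
        for match in re.findall(re.escape(tool[0]) + r"(.*?)" + re.escape(tool[1]), text, flags=re.DOTALL)
    ]
    visible = re.sub(re.escape(tool[0]) + r".*?" + re.escape(tool[1]), "", text, flags=re.DOTALL).strip()
    return visible, tool_calls

raw = ('<think>the user wants the light on</think>'
       'Turning on the kitchen light. '
       '<tool_call>{"name": "HassTurnOn", "arguments": {"name": "kitchen light"}}</tool_call>')
print(parse_model_output(raw))
# ('Turning on the kitchen light.', [{'name': 'HassTurnOn', 'arguments': {'name': 'kitchen light'}}])
```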

# Llama.cpp
For details about the sampling parameters, see here: https://github.com/oobabooga/text-generation-webui/wiki/03-%E2%80%90-Parameters-Tab#parameters-description

## Connection & Model Selection
| Option Name | Description | Suggested Value |
|-------------|-------------|-----------------|
| Chat Model | The Hugging Face model repository or local model filename to use for inference | acon96/Home-3B-v3-GGUF |
| Model Quantization | The quantization level to download for the selected model from Hugging Face | Q4_K_M |
| Model File Path | The full path to a local GGUF model file. If not specified, the model will be downloaded from Hugging Face | |

## Sampling & Output
| Option Name | Description | Suggested Value |
|-------------|-------------|-----------------|
| Temperature | Sampling parameter; see above link | 0.1 |
| Top K | Sampling parameter; see above link | 40 |
| Top P | Sampling parameter; see above link | 1.0 |
| Min P | Sampling parameter; see above link | 0.0 |
| Typical P | Sampling parameter; see above link | 1.0 |
| Maximum tokens to return in response | Limits the number of tokens that can be produced by each model response | 512 |
| Context Length | Maximum number of tokens the model can consider in its context window | 2048 |

## Performance Optimization
| Option Name | Description | Suggested Value |
|-------------|-------------|-----------------|
| Batch Size | Number of tokens to process in each batch. Higher values increase speed but consume more memory | 512 |
| Thread Count | Number of CPU threads to use for inference | (number of physical CPU cores) |
| Batch Thread Count | Number of threads to use for batch processing | (number of physical CPU cores) |
| Enable Flash Attention | Use Flash Attention optimization if supported by the model. Can significantly improve performance on compatible GPUs | Disabled |

## Advanced Features
| Option Name | Description | Suggested Value |
|-------------|-------------|-----------------|
| Top K | Sampling parameter; see above link | 40 |
| Top P | Sampling parameter; see above link | 1.0 |
| Temperature | Sampling parameter; see above link | 0.1 |
| Min P | Sampling parameter; see above link | 0.1 |
| Typical P | Sampling parameter; see above link | 0.95 |
| Enable GBNF Grammar | Restricts the output of the model to follow a pre-defined syntax; eliminates function calling syntax errors on quantized models | Enabled |
| GBNF Grammar Filename | The file to load as the GBNF grammar. Must be located in the same directory as the custom component. | `output.gbnf` for Home LLM and `json.gbnf` for any model using ICL |
| Enable Prompt Caching | Cache the system prompt to avoid recomputing it on every turn (requires refresh_system_prompt to be disabled) | Disabled |
| Prompt Caching Interval | Number of seconds between prompt cache refreshes (if caching is enabled) | 30 |
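
As a rough orientation, the sketch below shows how these options map onto `llama-cpp-python` keyword arguments. It assumes current `llama-cpp-python` parameter names and a placeholder model path; the integration's real wiring may differ.

```python
from llama_cpp import Llama, LlamaGrammar

# Model loading: roughly corresponds to the connection/performance options above.
llm = Llama(
    model_path="/path/to/Home-3B-v3.q4_k_m.gguf",  # Model File Path (placeholder)
    n_ctx=2048,         # Context Length
    n_batch=512,        # Batch Size
    n_threads=8,        # Thread Count
    n_threads_batch=8,  # Batch Thread Count
    flash_attn=False,   # Enable Flash Attention
)

# Generation: roughly corresponds to the sampling/advanced options above.
grammar = LlamaGrammar.from_file("output.gbnf")  # GBNF Grammar Filename
result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "turn on the kitchen light"}],
    max_tokens=512,    # Maximum tokens to return in response
    temperature=0.1,
    top_k=40,
    top_p=1.0,
    min_p=0.0,
    typical_p=1.0,
    grammar=grammar,   # Enable GBNF Grammar
)
print(result["choices"][0]["message"]["content"])
```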

## Wheels
The wheels for `llama-cpp-python` can be built or downloaded manually for installation.
The wheels for `llama-cpp-python` can be built or downloaded manually for installation/re-installation.

Take the appropriate wheel and copy it to the `custom_components/llama_conversation/` directory.

After the wheel file has been copied to the correct folder, attempt the wheel installation step of the integration setup. The local wheel file should be detected and installed.
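
For reference, the wheel installation step boils down to a local pip install of the copied file; the integration performs the equivalent automatically during setup. A sketch is shown below; the wheel filename is a placeholder and must match your platform and Python version.

```python
import subprocess
import sys

# Placeholder filename: substitute the wheel that matches your platform and Python version.
wheel = "custom_components/llama_conversation/llama_cpp_python-0.3.16-cp312-cp312-linux_aarch64.whl"

# Equivalent to running: python -m pip install <wheel>
subprocess.run([sys.executable, "-m", "pip", "install", wheel], check=True)
```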

## Pre-built
Pre-built wheel files (`*.whl`) are provided as part of the [GitHub release](https://github.com/acon96/home-llm/releases/latest) for the integration.
Pre-built wheel files (`*.whl`) are built as part of a fork of llama-cpp-python and are available on the [GitHub releases](https://github.com/acon96/llama-cpp-python/releases/latest) page for the fork.

To ensure compatibility with your Home Assistant and Python versions, select the correct `.whl` file for your hardware's architecture:
- For Home Assistant `2024.2.0` and newer, use the Python 3.12 wheels (`cp312`)
@@ -57,35 +88,123 @@ To ensure compatibility with your Home Assistant and Python versions, select the
3. The compatible wheel files will be placed in the folder you executed the script from

# Llama.cpp Server
The Llama.cpp Server backend is used when running inference via a separate `llama-cpp-python` HTTP server.

## Connection
| Option Name | Description | Suggested Value |
|-------------|-------------|-----------------|
| Host | The hostname or IP address of the llama-cpp-python server | |
| Port | The port number the server is listening on | 8000 |
| SSL | Whether to use HTTPS for the connection | false |

## Sampling & Output
| Option Name | Description | Suggested Value |
|-------------|-------------|-----------------|
| Top K | Sampling parameter; see [text-generation-webui wiki](https://github.com/oobabooga/text-generation-webui/wiki/03-%E2%80%90-Parameters-Tab#parameters-description) | 40 |
| Top P | Sampling parameter; see above link | 1.0 |
| Maximum tokens to return in response | Limits the number of tokens that can be produced by each model response | 512 |
| Request Timeout | The maximum time in seconds that the integration will wait for a response from the remote server | 90 (higher if running on low resource hardware) |

## Advanced Features
| Option Name | Description | Suggested Value |
|-------------|-------------|-----------------|
| Enable GBNF Grammar | Restricts the output of the model to follow a pre-defined syntax; eliminates function calling syntax errors | Enabled |
| GBNF Grammar Filename | The file to load as the GBNF grammar. Must be located in the same directory as the custom component. | `output.gbnf` |

# text-generation-webui
For details about the sampling parameters, see here: https://github.com/oobabooga/text-generation-webui/wiki/03-%E2%80%90-Parameters-Tab#parameters-description

## Connection
| Option Name | Description | Suggested Value |
|-------------|-------------|-----------------|
| Host | The hostname or IP address of the text-generation-webui server | |
| Port | The port number the server is listening on | 5000 |
| SSL | Whether to use HTTPS for the connection | false |
| Admin Key | The admin key for the text-generation-webui server (if configured for authentication) | |

## Sampling & Output
| Option Name | Description | Suggested Value |
|-------------|-------------|-----------------|
| Temperature | Sampling parameter; see above link | 0.1 |
| Top K | Sampling parameter; see above link | 40 |
| Top P | Sampling parameter; see above link | 1.0 |
| Min P | Sampling parameter; see above link | 0.0 |
| Typical P | Sampling parameter; see above link | 1.0 |
| Context Length | Maximum number of tokens the model can consider in its context window | 2048 |
| Request Timeout | The maximum time in seconds that the integration will wait for a response from the remote server | 90 (higher if running on low resource hardware) |

## UI Configuration
| Option Name | Description | Suggested Value |
|-------------|-------------|-----------------|
| Request Timeout | The maximum time in seconds that the integration will wait for a response from the remote server | 90 (higher if running on low resource hardware) |
| Generation Preset/Character Name | The preset or character name to pass to the backend. If none is provided then the settings that are currently selected in the UI will be applied | |
| Chat Mode | [see here](https://github.com/oobabooga/text-generation-webui/wiki/01-%E2%80%90-Chat-Tab#mode) | Instruct |
| Top K | Sampling parameter; see above link | 40 |
| Top P | Sampling parameter; see above link | 1.0 |
| Temperature | Sampling parameter; see above link | 0.1 |
| Min P | Sampling parameter; see above link | 0.1 |
| Typical P | Sampling parameter; see above link | 0.95 |
| Chat Mode | [see here](https://github.com/oobabooga/text-generation-webui/wiki/01-%E2%80%90-Chat-Tab#mode) | Instruct |

# Ollama
For details about the sampling parameters, see here: https://github.com/oobabooga/text-generation-webui/wiki/03-%E2%80%90-Parameters-Tab#parameters-description

## Connection
| Option Name | Description | Suggested Value |
|-------------|-------------|-----------------|
| Host | The hostname or IP address of the Ollama server | |
| Port | The port number the server is listening on | 11434 |
| SSL | Whether to use HTTPS for the connection | false |

## Sampling & Output
| Option Name | Description | Suggested Value |
|-------------|-------------|-----------------|
| Request Timeout | The maximum time in seconds that the integration will wait for a response from the remote server | 90 (higher if running on low resource hardware) |
| Keep Alive/Inactivity Timeout | The duration in minutes to keep the model loaded after each request. Set to a negative value to keep the model loaded forever | 30m |
| JSON Mode | Restricts the model to only output valid JSON objects. Enable this if you are using ICL and are getting invalid JSON responses. | True |
| Top K | Sampling parameter; see above link | 40 |
| Top P | Sampling parameter; see above link | 1.0 |
| Temperature | Sampling parameter; see above link | 0.1 |
| Typical P | Sampling parameter; see above link | 0.95 |
| Typical P | Sampling parameter; see above link | 1.0 |
| Maximum tokens to return in response | Limits the number of tokens that can be produced by each model response | 512 |
| Context Length | Maximum number of tokens the model can consider in its context window | 2048 |
| Request Timeout | The maximum time in seconds that the integration will wait for a response from the remote server | 90 (higher if running on low resource hardware) |
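
For orientation, a sketch of the Ollama `/api/chat` request these options roughly translate into. The field names follow Ollama's public API; the host and model name are placeholders.

```python
import json
import urllib.request

payload = {
    "model": "llama3.2:3b",  # placeholder model name
    "messages": [{"role": "user", "content": "turn off the bedroom fan"}],
    "format": "json",        # JSON Mode
    "keep_alive": "30m",     # Keep Alive/Inactivity Timeout
    "stream": False,
    "options": {
        "top_k": 40,
        "top_p": 1.0,
        "temperature": 0.1,
        "typical_p": 1.0,
        "num_predict": 512,  # Maximum tokens to return in response
        "num_ctx": 2048,     # Context Length
    },
}

req = urllib.request.Request(
    "http://ollama.local:11434/api/chat",  # Host/Port/SSL (placeholder host)
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=90) as resp:  # Request Timeout
    print(json.load(resp)["message"]["content"])
```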

# Generic OpenAI API Compatible
## Advanced Features
| Option Name | Description | Suggested Value |
|-------------|-------------|-----------------|
| JSON Mode | Restricts the model to only output valid JSON objects. Enable this if you are using ICL and are getting invalid JSON responses. | True |
| Keep Alive/Inactivity Timeout | The duration in minutes to keep the model loaded after each request. Set to a negative value to keep the model loaded forever | 30 (minutes) |

# Generic OpenAI API (Chat Completions)
For details about the sampling parameters, see here: https://github.com/oobabooga/text-generation-webui/wiki/03-%E2%80%90-Parameters-Tab#parameters-description
| Option Name | Description | Suggested Value |
|-------------|-------------|-----------------|
| Request Timeout | The maximum time in seconds that the integration will wait for a response from the remote server | 90 (higher if running on low resource hardware) |
| Top P | Sampling parameter; see above link | 1.0 |
| Temperature | Sampling parameter; see above link | 0.1 |

## Connection
| Option Name | Description | Suggested Value |
|-------------|-------------|-----------------|
| Host | The hostname or IP address of the OpenAI-compatible API server | |
| Port | The port number the server is listening on (leave empty for default) | |
| SSL | Whether to use HTTPS for the connection | false |
| API Key | The API key for authentication (if required by your server) | |
| API Path | The path prefix for API requests (e.g., `/v1` for OpenAI-compatible servers) | v1 |

## Sampling & Output
| Option Name | Description | Suggested Value |
|-------------|-------------|-----------------|
| Top P | Sampling parameter; see above link | 1.0 |
| Request Timeout | The maximum time in seconds that the integration will wait for a response from the remote server | 90 (higher if running on low resource hardware) |
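
A sketch of the Chat Completions request built from these settings, assuming the standard OpenAI-compatible `/chat/completions` contract; the host, port, API path, key, and model name are all placeholders.

```python
import json
import urllib.request

host, port, ssl, api_path = "my-llm-server.local", 8080, False, "v1"  # placeholder connection settings
url = f"{'https' if ssl else 'http'}://{host}:{port}/{api_path}/chat/completions"

payload = {
    "model": "my-model",  # placeholder model name
    "messages": [
        {"role": "system", "content": "You are a Home Assistant voice assistant."},
        {"role": "user", "content": "is the garage door closed?"},
    ],
    "temperature": 0.1,
    "top_p": 1.0,
    "max_tokens": 512,
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json", "Authorization": "Bearer YOUR_API_KEY"},  # API Key
)
with urllib.request.urlopen(req, timeout=90) as resp:  # Request Timeout
    body = json.load(resp)
print(body["choices"][0]["message"]["content"])
```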

# Generic OpenAI Responses
The Generic OpenAI Responses backend uses time-based conversation memory instead of interaction counts and is compatible with specialized response APIs.

## Connection
| Option Name | Description | Suggested Value |
|-------------|-------------|-----------------|
| Host | The hostname or IP address of the OpenAI-compatible API server | |
| Port | The port number the server is listening on (leave empty for default) | |
| SSL | Whether to use HTTPS for the connection | false |
| API Key | The API key for authentication (if required by your server) | |
| API Path | The path prefix for API requests | v1 |

## Sampling & Output
| Option Name | Description | Suggested Value |
|-------------|-------------|-----------------|
| Temperature | Sampling parameter; see above link | 0.1 |
| Top P | Sampling parameter; see above link | 1.0 |
| Request Timeout | The maximum time in seconds that the integration will wait for a response from the remote server | 90 (higher if running on low resource hardware) |

## Memory & Conversation
| Option Name | Description | Suggested Value |
|-------------|-------------|-----------------|
| Remember conversation time (minutes) | Number of minutes to remember conversation history. Uses time-based memory instead of interaction count. | 2 (minutes) |
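
A minimal sketch of how a Responses-style API can carry conversation memory. It assumes the OpenAI Responses endpoint shape, where follow-up requests pass `previous_response_id` instead of resending message history; the URL and model name are placeholders, and whether the configured time window keeps reusing that id is an assumption about the backend, not a statement of the integration's internals.

```python
import json
import urllib.request

URL = "https://my-llm-server.local/v1/responses"  # placeholder Host/API Path

def ask(text: str, previous_response_id: str | None = None) -> dict:
    payload = {"model": "my-model", "input": text, "temperature": 0.1, "top_p": 1.0}
    if previous_response_id:
        # The server threads conversation state for us; no message history is resent.
        payload["previous_response_id"] = previous_response_id
    req = urllib.request.Request(URL, data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=90) as resp:  # Request Timeout
        return json.load(resp)

first = ask("turn on the porch light")
# Within the configured memory window (e.g. 2 minutes), a follow-up can reuse the response id.
follow_up = ask("now turn it off", previous_response_id=first["id"])
```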