use qwen3 instead of mistral as ollama example

Alex O'Connell
2025-10-25 22:57:19 -04:00
parent 050a539f72
commit 0206673303
2 changed files with 26 additions and 5 deletions


@@ -291,15 +291,25 @@ def option_overrides(backend_type: str) -> dict[str, Any]:
# no prompt formats with tool calling support, so just use legacy tool calling
CONF_ENABLE_LEGACY_TOOL_CALLING: True
},
"qwen3": {
CONF_PROMPT: DEFAULT_PROMPT_BASE,
CONF_TEMPERATURE: 0.6,
CONF_TOP_K: 20,
CONF_TOP_P: 0.95
},
"mistral": {
CONF_PROMPT: DEFAULT_PROMPT_BASE + ICL_NO_SYSTEM_PROMPT_EXTRAS,
CONF_MIN_P: 0.1,
CONF_TYPICAL_P: 0.9,
# no prompt formats with tool calling support, so just use legacy tool calling
CONF_ENABLE_LEGACY_TOOL_CALLING: True,
},
"mixtral": {
CONF_PROMPT: DEFAULT_PROMPT_BASE + ICL_NO_SYSTEM_PROMPT_EXTRAS,
CONF_MIN_P: 0.1,
CONF_TYPICAL_P: 0.9,
# no prompt formats with tool calling support, so just use legacy tool calling
CONF_ENABLE_LEGACY_TOOL_CALLING: True,
},
"llama-3": {
CONF_PROMPT: DEFAULT_PROMPT_BASE + ICL_EXTRAS,
@@ -309,6 +319,7 @@ def option_overrides(backend_type: str) -> dict[str, Any]:
},
"zephyr": {
CONF_PROMPT: DEFAULT_PROMPT_BASE + ICL_EXTRAS,
},
"phi-3": {
CONF_PROMPT: DEFAULT_PROMPT_BASE + ICL_EXTRAS,
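
The `"qwen3"` entry added above carries Qwen3's recommended sampling settings (temperature 0.6, top_k 20, top_p 0.95). As a rough illustration of how a per-model override table like this behaves, the sketch below matches a key against the model name and merges the matching entry over shared defaults; it is a simplified stand-in, not the integration's actual implementation, and the option names are shortened placeholders.

```python
# Simplified stand-in for a per-model override table (not the integration's
# real code): each key is matched against the model name and the matching
# overrides are merged over shared defaults.
DEFAULTS = {"prompt": "BASE_PROMPT", "temperature": 0.8}

OVERRIDES = {
    "qwen3": {"temperature": 0.6, "top_k": 20, "top_p": 0.95},
    "mistral": {"min_p": 0.1, "typical_p": 0.9},
}

def options_for(model_name: str) -> dict:
    """Return shared defaults with any matching per-model overrides applied."""
    merged = dict(DEFAULTS)
    for key, overrides in OVERRIDES.items():
        if key in model_name.lower():
            merged.update(overrides)
    return merged

print(options_for("qwen3:8b"))
# {'prompt': 'BASE_PROMPT', 'temperature': 0.6, 'top_k': 20, 'top_p': 0.95}
```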


@@ -9,7 +9,7 @@
* [Step 1: Wheel Installation for llama-cpp-python](#step-1-wheel-installation-for-llama-cpp-python)
* [Step 2: Model Selection](#step-2-model-selection)
* [Step 3: Model Configuration](#step-3-model-configuration)
* [Path 2: Using Mistral-Instruct-7B with Ollama Backend](#path-2-using-mistral-instruct-7b-with-ollama-backend)
* [Path 2: Using Qwen3 with Ollama Backend](#path-2-using-qwen3-with-ollama-backend)
* [Overview](#overview-1)
* [Step 1: Downloading and serving the Model](#step-1-downloading-and-serving-the-model)
* [Step 2: Connect to the Ollama API](#step-2-connect-to-the-ollama-api)
@@ -76,12 +76,22 @@ Once the desired API has been selected, scroll to the bottom and click `Submit`.
The model will be loaded into memory and should now be available to select as a conversation agent!
## Path 2: Using Mistral-Instruct-7B with Ollama Backend
## Path 2: Using Qwen3 with Ollama Backend
### Overview
For those who have access to a GPU, you can also use the Mistral-Instruct-7B model to power your conversation agent. This path requires a separate machine that has a GPU and has [Ollama](https://ollama.com/) already installed on it. This path utilizes in-context learning examples, to prompt the model to produce the output that we expect.
For those who have access to a GPU, you can also use the Qwen3 model to power your conversation agent. This path requires a separate machine that has a GPU and already has [Ollama](https://ollama.com/) installed. This path utilizes in-context learning examples to prompt the model to produce the output that we expect.
### Step 1: Downloading and serving the Model
Mistral can be easily set up and downloaded on the serving machine using the `ollama pull mistral` command.
There are multiple size options for the Qwen3 series of models. Replace `8b` in the command below with the tag for the size you want.
| Parameter Count | Estimated VRAM | Ollama Tag |
| -----------------| ---------------|------------|
| 4 Billion | 8-10 GB | `4b` |
| 8 Billion | 9-12 GB | `8b` |
| 14 Billion | 14-16 GB | `14b` |
| 32 Billion | 22+ GB | `32b` |
| 30B (3B Active) | 20+ GB | `30b` |
Qwen3 can be easily set up and downloaded on the serving machine using the `ollama pull qwen3:8b` command.
In order to access the model from another machine, we need to run the Ollama API server so that it is open to the local network. This can be achieved using the `OLLAMA_HOST=0.0.0.0:11434 ollama serve` command. **DO NOT RUN THIS COMMAND ON ANY PUBLICLY ACCESSIBLE SERVERS AS IT LISTENS ON ALL NETWORK INTERFACES**
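
Before moving on, it can be worth confirming that the server is actually reachable from the Home Assistant machine. The sketch below is one way to do that from any machine on the network: it calls Ollama's `/api/tags` endpoint, which lists the models the server has pulled. The address `192.168.1.50` is a placeholder for your own Ollama machine's IP.

```python
# Optional sanity check: confirm the Ollama server is reachable on the LAN
# and that a qwen3 model has been pulled. Replace the placeholder address
# with the IP of your own Ollama machine.
import json
import urllib.request

OLLAMA_URL = "http://192.168.1.50:11434"  # placeholder address

with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags", timeout=5) as resp:
    models = [m["name"] for m in json.load(resp).get("models", [])]

print("Models available on the server:", models)
if not any(name.startswith("qwen3") for name in models):
    print("No qwen3 model found - run `ollama pull qwen3:8b` on the server.")
```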
@@ -103,7 +113,7 @@ In order to access the model from another machine, we need to run the Ollama API
### Step 3: Model Selection & Configuration
1. You must create the conversation agent based on the model you wish to use.
Under the `Ollama at '<url>'` service that you just created, select `+ Add conversation agent`.
- **Model Name**: Select the Mistral 7B model. This should automatically populated based on the model you already downloaded
- **Model Name**: Select `qwen3:8b` from the list.
2. You can configure how the model is "prompted". See [here](./Model%20Prompting.md) for more information on how that works.
For now, defaults for the model should have been populated. If you would like the model to be able to control devices, then you must select the `Assist` API.
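
For reference, the Qwen3 defaults added in the first file of this commit (temperature 0.6, top_k 20, top_p 0.95) map onto the `options` field of Ollama's standard `/api/chat` endpoint. The sketch below only illustrates that mapping with a single request; it is not code from the integration, and the host address is a placeholder.

```python
# Illustration only: one chat request to Ollama using the same sampling
# settings the integration applies for Qwen3 by default.
import json
import urllib.request

OLLAMA_URL = "http://192.168.1.50:11434"  # placeholder address

payload = {
    "model": "qwen3:8b",
    "messages": [{"role": "user", "content": "Turn on the kitchen light."}],
    "options": {"temperature": 0.6, "top_k": 20, "top_p": 0.95},
    "stream": False,
}

request = urllib.request.Request(
    f"{OLLAMA_URL}/api/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request, timeout=120) as resp:
    print(json.load(resp)["message"]["content"])
```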