add performance notes

Alex O'Connell
2023-12-28 16:29:52 -05:00
parent e1847bd0f8
commit 4a3855df64
2 changed files with 6 additions and 2 deletions

@@ -97,4 +97,7 @@ In order to facilitate running the project entirely on the system where Home Ass
4. Select the 3 dots in the top right, click "Check for Updates", and refresh the webpage.
5. There should now be a "Local Add-ons" section at the top of the "Add-on Store".
6. Install the `oobabooga-text-generation-webui` add-on. It will take ~15-20 minutes to build the image on a Raspberry Pi.
7. Copy any models you want to use to the `addon_configs/local_text-generation-webui/models` folder.
### Performance of running the model on a Raspberry Pi
The 4GB Raspberry Pi 4 I have sat right at 1.5 tokens/sec for prompt eval and 1.6 tokens/sec for token generation when running the `Q4_K_M` quant. I was reliably getting responses in 30-40 seconds after the initial prompt processing, which took almost 5 minutes.
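
Back-of-the-envelope, those rates explain the numbers above: roughly 450 prompt tokens at 1.5 tokens/sec is ~5 minutes of prompt eval, and a ~60-token reply at 1.6 tokens/sec lands in the 30-40 second range. A minimal sketch (the token counts are illustrative assumptions, not measurements):

```python
# Rough latency estimate from the RPi 4 rates measured above.
PROMPT_EVAL_TPS = 1.5  # tokens/sec, prompt evaluation
GENERATION_TPS = 1.6   # tokens/sec, token generation

def estimate_latency(prompt_tokens: int, response_tokens: int) -> float:
    """Seconds until the full response, ignoring model load time."""
    return prompt_tokens / PROMPT_EVAL_TPS + response_tokens / GENERATION_TPS

# Illustrative: a ~450-token prompt plus a ~60-token reply.
print(f"{estimate_latency(450, 60):.0f} s")  # -> 338 s (~5 min eval + ~38 s gen)
```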

@@ -19,4 +19,5 @@
- set up vectordb
- ingest home assistant docs
- "context request" from above to initiate a RAG search
[ ] make llama-cpp-python wheels for "llama-cpp-python>=0.2.24"
[ ] prime kv cache with current "state" so that requests are faster (sketch below)
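
The KV-cache item could look something like the following with llama-cpp-python's `save_state`/`load_state`; the model path and prompt text are placeholder assumptions, not the project's actual code:

```python
from llama_cpp import Llama

llm = Llama(model_path="./models/mistral-7b.Q4_K_M.gguf")  # hypothetical model

# Evaluate the static prefix (system prompt + current house "state") once up front.
prefix = "You are a Home Assistant voice assistant.\nCurrent state: ...\n"
llm.eval(llm.tokenize(prefix.encode("utf-8")))
state = llm.save_state()  # snapshot the primed KV cache

# Per request: restore the snapshot so only the new user text gets evaluated.
llm.load_state(state)
out = llm(prefix + "User: turn off the kitchen lights\nAssistant:", max_tokens=32)
print(out["choices"][0]["text"])
```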
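
Likewise, a minimal sketch of the vectordb/RAG items, using chromadb as one possible store (the collection name and documents are placeholders):

```python
import chromadb

client = chromadb.Client()  # in-memory; use a persistent client for real use
docs = client.create_collection("ha_docs")  # hypothetical collection name

# Ingest: Home Assistant doc snippets keyed by id (placeholder content).
docs.add(
    ids=["light-1", "climate-1"],
    documents=[
        "light.turn_on accepts brightness_pct and color_name.",
        "climate.set_temperature sets the target temperature.",
    ],
)

# A "context request" triggers a similarity search; the top hits get
# appended to the LLM prompt as retrieved context.
hits = docs.query(query_texts=["how do I dim a light?"], n_results=1)
print(hits["documents"][0][0])
```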