add performance notes

Alex O'Connell
2023-12-28 16:29:52 -05:00
parent e1847bd0f8
commit 4a3855df64
2 changed files with 6 additions and 2 deletions

@@ -97,4 +97,7 @@ In order to facilitate running the project entirely on the system where Home Ass
4. Select the 3 dots in the top right, click "Check for Updates", and refresh the webpage.
5. There should now be a "Local Add-ons" section at the top of the "Add-on Store".
6. Install the `oobabooga-text-generation-webui` add-on. It will take ~15-20 minutes to build the image on a Raspberry Pi.
7. Copy any models you want to use to the `addon_configs/local_text-generation-webui/models` folder.
### Performance of running the model on a Raspberry Pi
The 4GB Raspberry Pi 4 I have sat right at 1.5 tokens/sec for prompt eval and 1.6 tokens/sec for token generation when running the `Q4_K_M` quant. I was reliably getting responses in 30-40 seconds after the initial prompt processing, which took almost 5 minutes.
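
Back-of-the-envelope, those rates explain the numbers above: roughly 450 prompt tokens at 1.5 tokens/sec is ~5 minutes of prompt eval, and a ~60-token reply at 1.6 tokens/sec lands in the 30-40 second range. A minimal sketch (the token counts are illustrative assumptions, not measurements):

```python
# Rough latency estimate from the RPi 4 rates measured above.
PROMPT_EVAL_TPS = 1.5  # tokens/sec, prompt evaluation
GENERATION_TPS = 1.6   # tokens/sec, token generation

def estimate_latency(prompt_tokens: int, response_tokens: int) -> float:
    """Seconds until the full response, ignoring model load time."""
    return prompt_tokens / PROMPT_EVAL_TPS + response_tokens / GENERATION_TPS

# Illustrative: a ~450-token prompt plus a ~60-token reply.
print(f"{estimate_latency(450, 60):.0f} s")  # -> 338 s (~5 min eval + ~38 s gen)
```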

@@ -19,4 +19,5 @@
- set up vectordb
- ingest home assistant docs
- "context request" from above to initiate a RAG search
[ ] make llama-cpp-python wheels for "llama-cpp-python>=0.2.24"
[ ] prime kv cache with current "state" so that requests are faster (sketch below)
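
The KV-cache item could look something like the following with llama-cpp-python's `save_state`/`load_state`; the model path and prompt text are placeholder assumptions, not the project's actual code:

```python
from llama_cpp import Llama

llm = Llama(model_path="./models/mistral-7b.Q4_K_M.gguf")  # hypothetical model

# Evaluate the static prefix (system prompt + current house "state") once up front.
prefix = "You are a Home Assistant voice assistant.\nCurrent state: ...\n"
llm.eval(llm.tokenize(prefix.encode("utf-8")))
state = llm.save_state()  # snapshot the primed KV cache

# Per request: restore the snapshot so only the new user text gets evaluated.
llm.load_state(state)
out = llm(prefix + "User: turn off the kitchen lights\nAssistant:", max_tokens=32)
print(out["choices"][0]["text"])
```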
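
Likewise, a minimal sketch of the vectordb/RAG items, using chromadb as one possible store (the collection name and documents are placeholders):

```python
import chromadb

client = chromadb.Client()  # in-memory; use a persistent client for real use
docs = client.create_collection("ha_docs")  # hypothetical collection name

# Ingest: Home Assistant doc snippets keyed by id (placeholder content).
docs.add(
    ids=["light-1", "climate-1"],
    documents=[
        "light.turn_on accepts brightness_pct and color_name.",
        "climate.set_temperature sets the target temperature.",
    ],
)

# A "context request" triggers a similarity search; the top hits get
# appended to the LLM prompt as retrieved context.
hits = docs.query(query_texts=["how do I dim a light?"], n_results=1)
print(hits["documents"][0][0])
```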