add performance notes
@@ -97,4 +97,7 @@ In order to facilitate running the project entirely on the system where Home Ass
4. Select the 3 dots in the top right, click "Check for Updates", and refresh the webpage.
5. There should now be a "Local Add-ons" section at the top of the "Add-on Store".
6. Install the `oobabooga-text-generation-webui` add-on. It will take ~15-20 minutes to build the image on a Raspberry Pi.
7. Copy any models you want to use to the `addon_configs/local_text-generation-webui/models` folder (see the example below).
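If you manage the machine over SSH, `scp` is one way to get a model into that folder. A minimal sketch, assuming an SSH add-on is enabled and a GGUF quant has already been downloaded; the hostname and filename are illustrative, not requirements:

```shell
# Copy a downloaded GGUF quant into the add-on's models folder.
# "homeassistant.local" and the model filename are examples only.
scp ./Home-3B-v2.q4_k_m.gguf \
    root@homeassistant.local:/addon_configs/local_text-generation-webui/models/
```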
### Performance of running the model on a Raspberry Pi
The 4GB RPi 4 that I have sits right at 1.5 tokens/sec for prompt eval and 1.6 tokens/sec for token generation when running the `Q4_K_M` quant. After the initial prompt processing, which took almost 5 minutes, I was reliably getting responses in 30-40 seconds.
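For a sense of scale: at 1.5 tokens/sec, a ~5 minute initial pass implies a prompt on the order of 450 tokens, and 30-40 second responses at 1.6 tokens/sec work out to roughly 50-65 generated tokens. If you want to reproduce numbers like these outside the add-on, llama.cpp's `llama-bench` is one option. A sketch, assuming a local llama.cpp build; the model path, token counts, and thread count are illustrative:

```shell
# Measure prompt eval (-p) and generation (-n) speed for a given quant,
# with token counts chosen to roughly match the sizes above.
./llama-bench -m models/Home-3B-v2.q4_k_m.gguf -p 512 -n 64 -t 4
```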
TODO.md
@@ -19,4 +19,5 @@
- set up vectordb
- ingest home assistant docs
- "context request" from above to initiate a RAG search
[ ] make llama-cpp-python wheels for "llama-cpp-python>=0.2.24"
[ ] prime kv cache with current "state" so that requests are faster (sketched below)
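One way the kv-cache idea could work is llama.cpp's prompt cache, which saves the evaluated KV state for a prompt prefix and reloads it on later runs. This is a rough sketch of the concept, not the project's planned implementation; the model path and prompt files are illustrative:

```shell
# First run: evaluate the static "state" prefix and save its KV cache.
./main -m models/Home-3B-v2.q4_k_m.gguf -f state_prefix.txt \
    --prompt-cache state.bin -n 0

# Later runs: reload the cached prefix so only the new suffix is evaluated.
./main -m models/Home-3B-v2.q4_k_m.gguf -f full_prompt.txt \
    --prompt-cache state.bin -n 64
```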