* Streaming LLM
* Update precision and add GPU support
* (studio2) Separate weights generation for quantization support
* Adapt prompt changes to studio flow
* Remove outdated flag from LLM compile flags.
* (studio2) Use turbine vmfbRunner
* Tweaks to prompts
* Update CPU path and LLM API test.
* Change device in test to CPU.
* Fixes to runner, device names, and vmfb management
* Use small test without external weights.