mirror of
https://github.com/nod-ai/AMD-SHARK-Studio.git
synced 2026-04-03 03:00:17 -04:00
The past key values are only used within the models themselves and can be kept on device. For vulkan int4, this gives 44 tok/s (for the first prompt) and settles at around 26 tok/s on 7900xtx.