Mirror of https://github.com/tinygrad/tinygrad.git
Synced 2026-01-10 07:28:15 -05:00
fix llama shard convo mode (#3716)
@@ -392,7 +392,7 @@ After you are done speaking, output [EOS]. You are not Chad.
     print(f"Preparing KV cache for chatbot with personality {args.personality}...")
     with Timing():
-      llama.model(Tensor([toks]), 0, args.temperature).realize() # NOTE: outputs are not used
+      llama.model(Tensor([toks], device=device), 0, args.temperature).realize() # NOTE: outputs are not used
     start_pos = len(toks)
   else:
     # non chat bot mode