mirror of
https://github.com/nod-ai/AMD-SHARK-Studio.git
synced 2026-04-03 03:00:17 -04:00
- Move statistics out of the main loop
- Add 'end-to-end' numbers
- Switch the main display unit from s to ms
- Start measuring time at 0

The new print format looks like this:

```
Number of iterations: 5
Num tokens: 1 (prompt), 512 (generated), 513 (total)
Prefill: avg. 0.01 ms (stdev 0.00), avg. 97.99 tokens/s
Decode: avg. 4840.44 ms (stdev 28.80), avg. 97.99 tokens/s
Decode end-2-end: avg. 85.78 tokens/s (w/o prompt), avg. 95.98 (w/ prompt)
```
CodeGen Setup using SHARK-server
Setup Server
- Clone SHARK and set up the venv.
- Host the server using `python apps/stable_diffusion/web/index.py --api --server_port=<PORT>`.
- The default server address is http://0.0.0.0:8080.
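The server-side steps above can be sketched as shell commands. Note that the venv setup script name (`setup_venv.sh`) and the venv activation path are assumptions based on common SHARK layouts, not confirmed by this page; follow the SHARK repository's own README for the exact commands.

```shell
# Clone SHARK and enter the repository
git clone https://github.com/nod-ai/SHARK.git
cd SHARK

# Set up and activate a Python virtual environment
# (script and venv names are assumptions; check the repo README)
./setup_venv.sh
source shark.venv/bin/activate

# Host the server; pick a free port, e.g. 8080
python apps/stable_diffusion/web/index.py --api --server_port=8080

# By default the server then listens on http://0.0.0.0:8080
```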
Setup Client
- fauxpilot-vscode (VSCode extension):
  - Code for the extension can be found here
  - Prerequisites: VSCode (you will also need `nodejs` and `npm` to compile and run the extension)
  - Compile and run the extension in VSCode (press F5); this opens a new VSCode window with the extension running
  - Open the VSCode settings, search for fauxpilot, and modify `server : http://<IP>:<PORT>`, `Model : codegen`, `Max Lines : 30`
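The three settings above correspond to entries like the following in VSCode's `settings.json` (VSCode accepts comments in this file). The exact key names below are assumptions modeled on typical extension setting names; verify them against the fauxpilot extension's settings page.

```json
{
  // Hypothetical key names; check the extension's contributed settings
  "fauxpilot.server": "http://<IP>:<PORT>",
  "fauxpilot.model": "codegen",
  "fauxpilot.maxLines": 30
}
```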
- Other clients (REST API via curl, OpenAI Python bindings) can be used as shown here
- Using the GitHub Copilot VSCode extension with SHARK-server needs more work to be functional.
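As a sketch of the REST option above: fauxpilot-style servers typically expose an OpenAI-compatible completions endpoint, so a minimal Python client might look like the following. The `/v1/engines/codegen/completions` path and the payload field names are assumptions modeled on the fauxpilot/OpenAI completions API; they are not confirmed for SHARK-server, so check the server's actual routes.

```python
import json
import urllib.request


def build_completion_request(host, port, prompt, max_tokens=64):
    """Build the URL and JSON payload for an OpenAI-style completion call.

    The endpoint path and field names are assumptions modeled on the
    fauxpilot/OpenAI completions API; verify against the running server.
    """
    url = f"http://{host}:{port}/v1/engines/codegen/completions"
    payload = {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.1,
    }
    return url, payload


def complete(host, port, prompt):
    """POST the prompt to the server and return the first completion text."""
    url, payload = build_completion_request(host, port, prompt)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]


if __name__ == "__main__":
    # Assumes a SHARK server running locally on the default port 8080
    print(complete("0.0.0.0", 8080, "def fib(n):"))
```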