CodeGen Setup using SHARK-server

Setup Server

  • Clone SHARK and set up the venv.
  • Host the server: python apps/stable_diffusion/web/index.py --api --server_port=<PORT>
  • The default server address is http://0.0.0.0:8080 (see the smoke test after this list).
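Before wiring up a client, it can help to confirm the server is reachable. This is a minimal sketch, assuming the server is running locally on the default port; the host, port, and expected status code are illustrative, not confirmed by this README:

```python
import requests

# The server binds to 0.0.0.0, so reach it from the same machine via
# localhost (or the machine's IP) on the port passed to --server_port.
resp = requests.get("http://127.0.0.1:8080", timeout=10)
print(resp.status_code)  # expect 200 once the server is ready
```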

Setup Client

  1. fauxpilot-vscode (VSCode extension):
  • Code for the extension can be found here.
  • Prerequisites: Node.js and npm (needed to compile and run the extension).
  • Compile and run the extension in VSCode (press F5); this opens a new VSCode window with the extension running.
  • Open VSCode settings, search for fauxpilot, and set Server: http://<IP>:<PORT>, Model: codegen, Max Lines: 30.
  2. Others (REST API via curl, OpenAI Python bindings), as shown here (see the Python sketch after this list).
  • Using the GitHub Copilot VSCode extension with SHARK-server needs more work to be functional.
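For option 2, here is a minimal sketch of the OpenAI Python bindings route, using the pre-1.0 openai package interface and assuming SHARK-server exposes a FauxPilot-style, OpenAI-compatible API; the base URL, engine name, prompt, and parameters are illustrative, not confirmed by this README:

```python
import openai  # requires openai<1.0 for this interface

# Assumption: FauxPilot-style OpenAI-compatible API. A local server
# typically does not validate the key, but the client library requires one.
openai.api_key = "dummy"
openai.api_base = "http://127.0.0.1:8080/v1"  # replace with http://<IP>:<PORT>/v1

response = openai.Completion.create(
    engine="codegen",        # matches the Model setting used by the extension
    prompt="def fib(n):",
    max_tokens=30,
    temperature=0.1,
)
print(response.choices[0].text)
```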