# SHARK Triton Backend
The Triton backend for SHARK.
# Build
Install SHARK:
```
git clone https://github.com/nod-ai/SHARK.git
# skip the clone if SHARK is already installed
cd SHARK/inference
```
Install dependencies:
```
apt-get install patchelf rapidjson-dev python3-dev
git submodule update --init
```
Update the submodules of IREE:
```
cd thirdparty/shark-runtime
git submodule update --init
```
Next, build and install the backend:
```
cd ../..
mkdir build && cd build
cmake -DTRITON_ENABLE_GPU=ON \
      -DIREE_HAL_DRIVER_CUDA=ON \
      -DIREE_TARGET_BACKEND_CUDA=ON \
      -DMLIR_ENABLE_CUDA_RUNNER=ON \
      -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install \
      -DTRITON_BACKEND_REPO_TAG=r22.02 \
      -DTRITON_CORE_REPO_TAG=r22.02 \
      -DTRITON_COMMON_REPO_TAG=r22.02 ..
make install
```
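
If the build succeeds, the backend shared library should land under the install prefix set above. As a quick sanity check (run from the `inference` directory, assuming the default layout):

```
# the Triton backend plugin produced by `make install`
ls build/install/backends/dshark/libtriton_dshark.so
```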
# Incorporating into Triton
There are much more in-depth explanations of the following steps in Triton's documentation:
https://github.com/triton-inference-server/server/blob/main/docs/compose.md#triton-with-unsupported-and-custom-backends
After `make install`, there should be a file at `build/install/backends/dshark/libtriton_dshark.so`. You will need to copy it into your Triton server image. More documentation is in the link above, but to create the Docker image you need to run the `compose.py` script in the Triton server repo.
To build your image, first clone the Triton server repo:
```
git clone https://github.com/triton-inference-server/server.git
```
Then run `compose.py` with `--dry-run` to generate a `Dockerfile.compose`:

```
cd server
python3 compose.py --repoagent checksum --dry-run
```
Because dshark is a third-party backend, you will need to manually modify the generated `Dockerfile.compose` to include it. Add the following `COPY` line, pointing at the dshark backend built earlier under `build/install/backends`:
```
COPY /path/to/build/install/backends/dshark /opt/tritonserver/backends/dshark
```
Next, build the image and run the server:

```
docker build -t tritonserver_custom -f Dockerfile.compose .
docker run -it --gpus=1 --net=host -v/path/to/model_repos:/models tritonserver_custom:latest tritonserver --model-repository=/models
```
where `/path/to/model_repos` is the directory where you store the models you want to serve.
If you're not using GPUs, omit `--gpus=1`:
```
docker run -it --net=host -v/path/to/model_repos:/models tritonserver_custom:latest tritonserver --model-repository=/models
```
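
Once the container is up, you can check that the server is ready. This assumes Triton's default HTTP port 8000, which `--net=host` leaves reachable on the host:

```
# Triton reports readiness on the KServe v2 health endpoint
curl -v localhost:8000/v2/health/ready
```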
# Setting up a model
To include a model in your backend, add a directory with your model name to your model repository directory. Examples of model repositories can be seen here: https://github.com/triton-inference-server/backend/tree/main/examples/model_repos/minimal_models
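
As a sketch, a minimal layout for a hypothetical model named `my_model` (the model name and the version directory `1` are placeholders):

```
mkdir -p /path/to/model_repos/my_model/1
# /path/to/model_repos/
# └── my_model/
#     ├── config.pbtxt    <- Triton model configuration (see below)
#     └── 1/
#         └── model.vmfb  <- compiled module loaded by the dshark backend
```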
Make sure the inputs are configured correctly in the `config.pbtxt` file, and save a compiled `.vmfb` file under `1/model.vmfb`.
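
For illustration only, a minimal `config.pbtxt` might look like the sketch below; the model name, the `backend: "dshark"` field, and the tensor names, types, and shapes are assumptions and must match what your compiled `model.vmfb` expects:

```
name: "my_model"
backend: "dshark"      # assumed to match libtriton_dshark.so
max_batch_size: 0
input [
  {
    name: "input0"     # placeholder tensor name
    data_type: TYPE_FP32
    dims: [ 4 ]
  }
]
output [
  {
    name: "output0"    # placeholder tensor name
    data_type: TYPE_FP32
    dims: [ 4 ]
  }
]
```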
# CUDA
If you're having issues with CUDA, make sure the correct drivers are installed, that `nvidia-smi` works, and that the `nvcc` compiler is on your `PATH`.
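
A few quick checks, assuming a standard CUDA toolkit installed under `/usr/local/cuda`:

```
nvidia-smi                              # driver is loaded and the GPU is visible
nvcc --version                          # CUDA compiler is installed
export PATH=/usr/local/cuda/bin:$PATH   # add nvcc to PATH if it is missing
```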