# SHARK Triton Backend
The Triton backend for SHARK.
# Build
Install SHARK:
```
git clone https://github.com/nod-ai/SHARK.git
# skip the clone if SHARK is already installed
cd SHARK/inference
```
Install dependencies:
```
apt-get install patchelf rapidjson-dev python3-dev
git submodule update --init
```
Update the submodules of IREE:
```
cd thirdparty/shark-runtime
git submodule update --init
```
Next, build and install the backend:
```
cd ../..
mkdir build && cd build
cmake -DTRITON_ENABLE_GPU=ON \
      -DIREE_HAL_DRIVER_CUDA=ON \
      -DIREE_TARGET_BACKEND_CUDA=ON \
      -DMLIR_ENABLE_CUDA_RUNNER=ON \
      -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install \
      -DTRITON_BACKEND_REPO_TAG=r22.02 \
      -DTRITON_CORE_REPO_TAG=r22.02 \
      -DTRITON_COMMON_REPO_TAG=r22.02 ..
make install
```
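
If the build succeeds, the backend shared library should land under the install prefix set above. As a quick sanity check (run from the `inference` directory, assuming the default layout):

```
# the Triton backend plugin produced by `make install`
ls build/install/backends/dshark/libtriton_dshark.so
```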
# Incorporating into Triton
There are much more in-depth explanations of the following steps in Triton's documentation:
https://github.com/triton-inference-server/server/blob/main/docs/compose.md#triton-with-unsupported-and-custom-backends
After `make install`, there should be a file at `build/install/backends/dshark/libtriton_dshark.so`. You will need to copy it into your Triton server image. More documentation is in the link above, but to create the Docker image you need to run the `compose.py` script in the Triton server repo.
To build your image, first clone the Triton server repo:
```
git clone https://github.com/triton-inference-server/server.git
```
Then run `compose.py` with `--dry-run` to generate a `Dockerfile.compose`:

```
cd server
python3 compose.py --repoagent checksum --dry-run
```
Because dshark is a third-party backend, you will need to manually modify the generated `Dockerfile.compose` to include it. Add the following `COPY` line, pointing at the dshark backend built earlier under `build/install/backends`:
```
COPY /path/to/build/install/backends/dshark /opt/tritonserver/backends/dshark
```
Next, build the image and run the server:

```
docker build -t tritonserver_custom -f Dockerfile.compose .
docker run -it --gpus=1 --net=host -v/path/to/model_repos:/models tritonserver_custom:latest tritonserver --model-repository=/models
```
where `/path/to/model_repos` is the directory where you store the models you want to serve.
If you're not using GPUs, omit `--gpus=1`:
```
docker run -it --net=host -v/path/to/model_repos:/models tritonserver_custom:latest tritonserver --model-repository=/models
```
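
Once the container is up, you can check that the server is ready. This assumes Triton's default HTTP port 8000, which `--net=host` leaves reachable on the host:

```
# Triton reports readiness on the KServe v2 health endpoint
curl -v localhost:8000/v2/health/ready
```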
# Setting up a model
To include a model in your backend, add a directory with your model name to your model repository directory. Examples of model repositories can be seen here: https://github.com/triton-inference-server/backend/tree/main/examples/model_repos/minimal_models
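
As a sketch, a minimal layout for a hypothetical model named `my_model` (the model name and the version directory `1` are placeholders):

```
mkdir -p /path/to/model_repos/my_model/1
# /path/to/model_repos/
# └── my_model/
#     ├── config.pbtxt    <- Triton model configuration (see below)
#     └── 1/
#         └── model.vmfb  <- compiled module loaded by the dshark backend
```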
Make sure the inputs are configured correctly in the `config.pbtxt` file, and save a compiled `.vmfb` file under `1/model.vmfb`.
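
For illustration only, a minimal `config.pbtxt` might look like the sketch below; the model name, the `backend: "dshark"` field, and the tensor names, types, and shapes are assumptions and must match what your compiled `model.vmfb` expects:

```
name: "my_model"
backend: "dshark"      # assumed to match libtriton_dshark.so
max_batch_size: 0
input [
  {
    name: "input0"     # placeholder tensor name
    data_type: TYPE_FP32
    dims: [ 4 ]
  }
]
output [
  {
    name: "output0"    # placeholder tensor name
    data_type: TYPE_FP32
    dims: [ 4 ]
  }
]
```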
# CUDA
If you're having issues with CUDA, make sure the correct drivers are installed, that `nvidia-smi` works, and that the `nvcc` compiler is on your `PATH`.
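
A few quick checks, assuming a standard CUDA toolkit installed under `/usr/local/cuda`:

```
nvidia-smi                              # driver is loaded and the GPU is visible
nvcc --version                          # CUDA compiler is installed
export PATH=/usr/local/cuda/bin:$PATH   # add nvcc to PATH if it is missing
```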