[FRONTEND] improve the process of finding libcuda.so and the error message (#1981)

`triton` uses the `whereis` command to find `libcuda.so`, but `whereis`
is intended to locate binary, source, and manual page files. When
`libcuda.so` is not properly set up, `whereis` ends up returning
`/usr/share/man/man7/libcuda.7`, which is not a directory to link from.

This PR uses `ldconfig -p` to reliably find `libcuda.so`.

In my case, I have a `libcuda.so.1` file, but there is no `libcuda.so`
symlink pointing to it, so `ld` cannot find the library to link against.
After creating the symlink, I was able to run `triton` successfully.
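
As an illustration of the fix described above, the following is a minimal
sketch that creates an unversioned `libcuda.so` symlink next to a versioned
file. The helper name `link_unversioned` is hypothetical and not part of
`triton`, and the demo runs in a scratch directory; on a real system the
directory would be wherever `libcuda.so.1` lives, and writing there usually
needs root privileges.

```python
import os
import tempfile


def link_unversioned(versioned_path, link_name="libcuda.so"):
    """Create `link_name` beside `versioned_path` as a relative symlink.

    Hypothetical helper for illustration only; it does nothing if the
    link (or a file with that name) already exists.
    """
    directory = os.path.dirname(versioned_path)
    link_path = os.path.join(directory, link_name)
    if not os.path.lexists(link_path):
        os.symlink(os.path.basename(versioned_path), link_path)
    return link_path


# Demonstrate in a temporary directory instead of a system library path.
with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "libcuda.so.1")
    open(real, "wb").close()  # stand-in for the real driver library
    link = link_unversioned(real)
    demo_target = os.readlink(link)      # what the symlink points to
    demo_resolves = os.path.exists(link)  # True if the link target exists
```

A relative link target keeps the symlink valid if the directory is moved or
mounted elsewhere; an absolute target would work as well.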

Therefore, I improve the code by first invoking `ldconfig -p` and
collecting every entry whose line contains the string `libcuda.so`.
These are candidate libraries to link against. If no literal
`libcuda.so` file exists in any of their directories, I raise an error
that tells the user a possible fix is to create a symlink.
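
The lookup described above can be sketched independently of the patch. This
is a hedged illustration, not the code in `triton`: the function names are
hypothetical, and `SAMPLE` is hand-written text in the `ldconfig -p` line
format quoted in the diff (only the `libcuda.so.1` line comes from the
commit; the other two lines are made up for contrast).

```python
import os

# Hand-written sample in the format of `ldconfig -p` output (assumption).
SAMPLE = """\
\tlibcuda.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcuda.so.1
\tlibcudart.so.11.0 (libc6,x86-64) => /usr/local/cuda/lib64/libcudart.so.11.0
\tlibc.so.6 (libc6,x86-64) => /lib/x86_64-linux-gnu/libc.so.6
"""


def candidate_paths(ldconfig_output, name="libcuda.so"):
    """Return the path (last field) of every line that mentions `name`."""
    return [line.split()[-1]
            for line in ldconfig_output.splitlines()
            if name in line]


def has_unversioned(dirs, name="libcuda.so", exists=os.path.exists):
    """True if a file literally called `name` exists in any directory.

    `exists` is injectable so the check can be exercised without
    touching the real filesystem.
    """
    return any(exists(os.path.join(d, name)) for d in dirs)


candidates = candidate_paths(SAMPLE)
dirs = [os.path.dirname(p) for p in candidates]
# Simulate a machine where only libcuda.so.1 is present.
has_link = has_unversioned(dirs, exists=lambda path: False)
```

With this sample, `candidates` holds only the `libcuda.so.1` path, and
`has_link` being false corresponds to the case where the patch raises its
error and suggests creating a symlink.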
youkaichao
2023-07-24 01:31:07 +08:00
committed by GitHub
parent 66eda76e45
commit c9ab44888e


@@ -18,8 +18,17 @@ def is_hip():
 @functools.lru_cache()
 def libcuda_dirs():
-    locs = subprocess.check_output(["whereis", "libcuda.so"]).decode().strip().split()[1:]
-    return [os.path.dirname(loc) for loc in locs]
+    libs = subprocess.check_output(["ldconfig", "-p"]).decode()
+    # each line looks like the following:
+    # libcuda.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcuda.so.1
+    locs = [line.split()[-1] for line in libs.splitlines() if "libcuda.so" in line]
+    dirs = [os.path.dirname(loc) for loc in locs]
+    msg = 'libcuda.so cannot found!\n'
+    if locs:
+        msg += 'Possible files are located at %s.' % str(locs)
+        msg += 'Please create a symlink of libcuda.so to any of the file.'
+    assert any(os.path.exists(os.path.join(path, 'libcuda.so')) for path in dirs), msg
+    return dirs
 @functools.lru_cache()