* This PR adds a third-party backend for Triton that works on AMD GPUs.
* This exposes much of the work that has been done in our
[fork](https://github.com/ROCmSoftwarePlatform/triton).
* Most unit tests in `test_core.py` pass.
* Some unit tests are skipped for various reasons.
* We plan to follow up with more PRs improving functionality and
performance.
---------
Co-authored-by: Philippe Tillet <phil@openai.com>
Initial code merge of Nvidia Hopper feature support. Please be
aware that the merge is not finished yet and troubleshooting
is still ongoing. The new hardware features (GMMA, TMA, STMATRIX, etc.)
and automatic warp specialization are experimental for now and turned
off by default. We recommend trying them once version 3.0 is
released.
The work is contributed by:
ben-zhang-609, bealwang, donproc, qliu93, jsh20, allatit23, LyricZhao,
ivanyinwz, goostavz & yangjunpro
from Nvidia, in cooperation with:
ptillet, Jokeren, ThomasRaoux & zahimoud
from OpenAI.
Co-authored-by: Goostav Zhu <gzhu@nvidia.com>
Run most of the pytest suites in parallel. This speeds up CI from
36 min to 10 min on A100 and from 22 min to 6 min on H100. Some tests,
such as the runtime tests, still need to run serially.
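As a rough illustration of the split described above, the sketch below builds one parallel pytest invocation plus separate serial ones. It assumes the `pytest-xdist` plugin (whose `-n` option sets the worker count); the helper name and the directory names are hypothetical and not taken from the actual CI configuration.

```python
import os


def build_pytest_commands(parallel_dirs, serial_dirs, workers=None):
    """Return argv lists: one parallel run, then one run per serial suite.

    Hypothetical helper; the real CI workflow may be organized differently.
    """
    workers = workers or os.cpu_count() or 1
    commands = []
    if parallel_dirs:
        # `-n` comes from pytest-xdist and sets the number of worker processes.
        commands.append(
            ["python", "-m", "pytest", "-n", str(workers), *parallel_dirs]
        )
    for test_dir in serial_dirs:
        # Suites that cannot share the machine run one at a time, without `-n`.
        commands.append(["python", "-m", "pytest", test_dir])
    return commands


if __name__ == "__main__":
    # Directory names here are placeholders.
    for cmd in build_pytest_commands(["language", "operators"], ["runtime"], workers=8):
        print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run` as-is; keeping the serial suites in separate invocations means they never compete with the parallel workers.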
We need to split the CI into two jobs: nvidia (PR blocking) and
third-party (PR non-blocking). This way we can guarantee that artifacts
are uploaded for any PR that gets merged into `main`, and that the
`compare artifacts` job can simply wait on the artifacts-uploading job.
Previously, the nightly run failed to upload if there had been no
commits since the previous night. This also moves the scheduled time
ahead by 20 minutes to avoid the hourly spike that delays workflow launches.
This patch adds a GitHub workflow YAML file and a Dockerfile to build
LLVM for the commit hash specified in the llvm-hash.txt file in the
llvm-head branch. If the tests run successfully, the built artifacts
are uploaded to Azure blob storage and their URL is available in the CI
logs. These artifacts can then be used in the python/setup.py script
for fetching the necessary LLVM binary objects while building Triton.
The Azure CLI wasn't available on the builders, so this resolves that
issue and updates the docs to point to the new packages. Also stops
publishing Python 3.6 wheels, as that version is out of support.
Temporarily disables musllinux builds, as they are broken.
Ideally you would also build source distributions so that it is in
principle possible to build `triton` on other platforms, but building
`musllinux` wheels would at least help with openai/whisper#1328.
I suspect you will also get people showing up at some point asking for
`aarch64` wheels as well. It might be worth taking a look at the
[`cibuildwheel` output
matrix](https://cibuildwheel.readthedocs.io/en/stable/#what-does-it-do)
to see what you are comfortable with shipping (particularly if you
aren't shipping source distributions).
Fixes #1545
`build_temp` is a temporary directory which `distutils` used to keep in
the `./build` directory, but when `pyproject.toml` is present `pip` now
puts it in `/tmp` and removes it at the end of the build.
Instead, this creates a new permanent directory like
`python/build/cmake.linux_x86_64-cpython-3.8` (the old name but with
cmake instead of temp).
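A minimal sketch of how such a permanent directory could be derived is shown below. The function name and the `base_dir` parameter are assumptions for illustration, and the exact directory name produced by `sysconfig` may differ slightly from the example above depending on platform and Python version.

```python
import sys
import sysconfig
from pathlib import Path


def get_cmake_dir(base_dir):
    """Create and return a persistent CMake build directory under base_dir.

    Hypothetical helper: mirrors the old `temp.<platform>-<impl>-<version>`
    naming, but with a `cmake.` prefix and a location pip does not delete.
    """
    plat_name = sysconfig.get_platform()
    python_version = sysconfig.get_python_version()
    dir_name = f"cmake.{plat_name}-{sys.implementation.name}-{python_version}"
    cmake_dir = Path(base_dir) / "python" / "build" / dir_name
    # exist_ok=True lets repeated builds reuse the directory (and CMake's cache).
    cmake_dir.mkdir(parents=True, exist_ok=True)
    return cmake_dir
```

Because the directory survives between `pip install` runs, CMake can reuse its cache and object files instead of rebuilding from scratch.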
While I was looking at the verbose pip output, I also noticed a bunch of
warnings like
```
Python recognizes 'triton/runtime.backends' as an importable package,
but it is not listed in the `packages` configuration of setuptools.
'triton/runtime.backends' has been automatically added to the distribution only
because it may contain data files, but this behavior is likely to change
in future versions of setuptools (and therefore is considered deprecated).
```
So I've also added these to the packages list.
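One way to catch this kind of omission early is to compare the declared `packages` list against what is actually on disk. The helpers below are a hypothetical, stdlib-only sketch (not part of `setup.py`) and only consider directories that contain an `__init__.py`.

```python
import os


def discover_packages(root):
    """Return dotted names of every directory under root with an __init__.py.

    `root` is the top-level package directory, e.g. a path ending in `triton`.
    """
    parent = os.path.dirname(os.path.abspath(root))
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "__init__.py" in filenames:
            rel = os.path.relpath(os.path.abspath(dirpath), parent)
            found.append(rel.replace(os.sep, "."))
    return sorted(found)


def missing_packages(root, declared):
    """Return packages found on disk that are absent from the declared list."""
    declared = set(declared)
    return [pkg for pkg in discover_packages(root) if pkg not in declared]
```

Any name returned by `missing_packages` is a candidate for the `packages` argument passed to `setup()`.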
---------
Co-authored-by: Keren Zhou <kerenzhou@openai.com>
While not currently a vulnerability, using this option introduces a
risk of supply chain attacks: an attacker can publish a malicious
package to the default public repository under the same name as one
that you are installing. The best practice is to use `--index-url`
when a package is not available in the default repo, so I've switched
to that option.
For full disclosure, this is currently causing Microsoft's internal
security checks to fail when building triton (which is why I care about
this theoretical issue/best practice).
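The change can be sketched as a small argument rewrite. This assumes the option being replaced is pip's `--extra-index-url` (the description above does not name it); the helper name and the example URL are hypothetical.

```python
def use_index_url(pip_args):
    """Return a copy of pip_args with --extra-index-url replaced by --index-url.

    Hypothetical helper for illustration; handles both the `--opt URL`
    and the `--opt=URL` spellings.
    """
    rewritten = []
    for arg in pip_args:
        if arg == "--extra-index-url":
            rewritten.append("--index-url")
        elif arg.startswith("--extra-index-url="):
            rewritten.append("--index-url=" + arg.split("=", 1)[1])
        else:
            rewritten.append(arg)
    return rewritten


if __name__ == "__main__":
    # The index URL below is a placeholder, not a real package index.
    args = ["install", "--extra-index-url", "https://example.invalid/simple", "somepkg"]
    print(use_index_url(args))
```

Note that with `--index-url` pip consults only the given index, so every dependency of that install step has to be available there.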
This PR is the first in a series of PRs to import the changes that we have
made to enable ROCm on [our
fork](https://github.com/ROCmSoftwarePlatform/triton) of Triton.
The PR contains the major changes to the Python frontend and enough
changes to the C++ backend to allow compiling and running an empty
kernel. We use the ROCm CI added a few weeks ago to verify the changes.
---------
Co-authored-by: Ronan Keryell <ronan@keryell.fr>