Commit Graph

92 Commits

Author SHA1 Message Date
jon-chuang
36859aebff [DOCS] Add MLIR Autogenerated Docs to Sphinx Docs (#2234)
Partially fixes: https://github.com/openai/triton/issues/2226

Here are some example renderings:
![Screenshot from 2023-09-04 18-39-20](https://github.com/openai/triton/assets/9093549/e9c4af04-aeae-4021-a8db-6a4a82b59ae7)
![Screenshot from 2023-09-04 18-39-30](https://github.com/openai/triton/assets/9093549/410391b8-e07e-4bed-909c-8ce5484072d1)
![Screenshot from 2023-09-04 18-39-41](https://github.com/openai/triton/assets/9093549/f1eaef95-66c1-4506-a153-c6069e2b5072)
2023-09-06 08:17:12 +00:00
Michael Melesse
c6d33dcebf [ROCM] Core Functionality for AMD (#1983)
* this PR adds a third-party backend for triton that works on AMD
* this exposes much of the work that has been done in our
[fork](https://github.com/ROCmSoftwarePlatform/triton)
* most unit tests in `test_core.py` pass
* it skips some unit tests for various reasons
* we plan to follow up with more PRs improving functionality and
performance in the future

---------

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-08-31 14:02:00 -07:00
Zahi Moudallal
bffb76e847 [DOCS] Fixing docs (#2221) 2023-08-31 11:47:57 -07:00
Zahi Moudallal
f922ecbc29 [DOCS] Fixing dependency (#2219) 2023-08-31 17:51:04 +00:00
Zahi Moudallal
cf31f4ddb2 [DOCS] Fixing docs by using sphinx-build instead of multiversion (#2217) 2023-08-31 16:51:14 +00:00
Zahi Moudallal
5282ed890d [CI] Add back pre-commit to nvidia CI job (#2159) 2023-08-23 01:11:03 +00:00
Zahi Moudallal
3c8f959f91 [CI] Adding workflow_run (#2120) 2023-08-18 23:58:41 -07:00
Zahi Moudallal
1faf93e6fb [CI] Fix PR comment (#2131) 2023-08-18 09:16:18 -07:00
Zahi Moudallal
b33f97a682 [CI] Fix bug in Compare Artifacts workflow (#2128)
Forgot to remove this line
2023-08-17 18:06:36 -07:00
Zahi Moudallal
6f654cfbbf [CI] Testing PR comment from another workflow (#2127) 2023-08-17 17:34:59 -07:00
Zahi Moudallal
3fa6d51bc9 [CI] Adding new github workflow for testing (#2121) 2023-08-17 15:32:38 -07:00
Zahi Moudallal
557b2d4b34 [CI] upload only test/unit/operators cache to artifacts and rely on kernel names in cache to compare artifacts (#2111) 2023-08-16 20:34:40 -07:00
Zahi Moudallal
0312ed3473 [CI] Update kernels names (#2093)
Co-authored-by: Philippe Tillet <phil@openai.com>
2023-08-14 19:41:41 -07:00
Philippe Tillet
facc1dcbac [TESTS] better matmul unit testing (#2098) 2023-08-13 17:54:32 -07:00
Zahi Moudallal
c309f7e57a [CI] Skip PR if we paginate 30 times without finding a run_id (#2092) 2023-08-11 15:31:46 -07:00
Philippe Tillet
3ec05fb023 [CI] H100 tests always use ENABLE_TMA=1 ENABLE_MMA_V3=1 (#2051) 2023-08-07 19:32:55 -07:00
Philippe Tillet
54f1ac950e [CI] disable AMD CI (#2045) 2023-08-07 12:03:26 -07:00
Philippe Tillet
223c2d32a2 [CI] disable XPU tests (not compiling) (#2044)
cc @EikanWang. I'm disabling this for now since it broke with the H100
merge, but please feel free to fix the compilation errors and submit a
PR.
2023-08-07 11:56:16 -07:00
goostavz
f1512bded1 Initial code merge of Hopper support (#2036)
This is the initial code merge of support for Nvidia Hopper features.
Please be aware that the merge is not finished yet and troubleshooting
is still ongoing. The new hardware features (GMMA, TMA, STMATRIX, etc.)
and automatic warp specialization are experimental for now and turned
off by default. We recommend trying them once version 3.0 is
released.

The work is contributed by:
ben-zhang-609, bealwang, donproc, qliu93, jsh20, allatit23, LyricZhao,
ivanyinwz, goostavz & yangjunpro
from Nvidia, in cooperation with:
ptillet, Jokeren, ThomasRaoux & zahimoud
from OpenAI.

Co-authored-by: Goostav Zhu <gzhu@nvidia.com>
2023-08-07 09:53:04 +08:00
Philippe Tillet
1db3bdc52e [BACKEND] avoid code duplication for fully warp-synchronous reductions (#1978) 2023-07-21 16:06:00 -07:00
Phil Tillet
c7757fae71 [GITHUB] tweak CODEOWNERS 2023-07-17 00:41:11 -07:00
Keren Zhou
571c92f2a8 [CI] Fix CI kernel compare (#1931)
With this PR, we find the latest merged PR that successfully passed
"Integration Tests".
2023-07-12 10:06:34 -07:00
Philippe Tillet
7e3ebbc4c8 [TESTING] now using cuda graphs for perf regression tests (#1925) 2023-07-10 22:49:25 -07:00
Zahi Moudallal
2a2cbc352b [CI] Fix failure due to not finding workflow run on first page (#1907)
Co-authored-by: Philippe Tillet <phil@openai.com>
2023-07-06 12:48:36 -07:00
Thomas
787cdff0cd [TESTS] Enable parallel pytest in CI for CUDA (#1905)
Run most of the pytest tests in parallel. This speeds up CI from
36 min to 10 min on A100 and from 22 min to 6 min on H100. Some tests,
such as the runtime tests, still need to run serially.
2023-07-06 11:40:33 -07:00
Zahi Moudallal
2c51dc00ab [CI] Use PAT for PR comment in CI (#1839)
Co-authored-by: Philippe Tillet <phil@openai.com>
2023-07-05 12:34:58 -04:00
Zahi Moudallal
62aef58c17 [CI] Split integration tests into two jobs (#1855)
We need to split the CI into two jobs: nvidia (PR blocking) and
third-party (PR non-blocking). This way we can guarantee that artifacts
are uploaded for any PR that gets merged into `main`, and that the `compare
artifacts` job can just wait on the artifacts-uploading job.
2023-07-01 11:28:55 -07:00
Zahi Moudallal
6ad8cd52e7 [CI] Added IR reference-check github workflow (#1755) 2023-06-22 18:00:40 -07:00
Zahi Moudallal
cfef4389f5 [CI] Add gpu arch to artifacts name (#1827) 2023-06-22 16:45:50 -07:00
Nathaniel McVicar
d416d63152 [PUBLISHING] Skip nightly run if no commits (#1807)
Previously the nightly run was failing to upload if there had been no
commits since the previous night. Also moves the scheduled time ahead by
20 minutes to avoid the delays in launching workflows caused by hourly spikes.
2023-06-21 18:12:32 -07:00
Ashay Rane
2d774ab129 [CI] add workflow for building, testing, and uploading LLVM artifacts (#1777)
This patch adds a GitHub workflow yaml file and a Docker file to build
LLVM for the commit hash specified in the llvm-hash.txt file in the
llvm-head branch.  If the tests run successfully, the built artifacts
are uploaded to Azure blob storage and their URL is available in the CI
logs.  These artifacts can then be used in the python/setup.py script
for fetching the necessary LLVM binary objects while building Triton.
2023-06-20 09:26:19 -05:00
Philippe Tillet
9a2580de13 [CI] Added H100 node (#1779) 2023-06-15 14:21:47 -07:00
Wang Weihan
b27a91a113 [FRONTEND] Enable triton to support register thirdparty backend at runtime (#1643)
This PR intends to provide a mechanism to support a third-party backend
at runtime to generate the backend-specific code.

The mechanism provides a common class to abstract the third-party
backend logic and two essential functions to register and get the
third-party backend at runtime.

- `BaseBackend`: A common class to abstract the third-party backend
logic
- `register_backend`: Register a third-party backend with a given device
type
- `get_backend`: Get the third-party backend with a given device type

Generally, a third-party backend must inherit from `BaseBackend` and
implement all the member functions according to the backend
characteristics. Once the backend implementation is ready, the
third-party backend can invoke `register_backend` to register itself
under a given device type. During kernel compilation and execution, the
mechanism gets the registered backend to generate the kernel and
launcher code for a given device.
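
The registration pattern described above can be sketched in a few lines. This is a minimal, self-contained illustration, not Triton's actual implementation: only the names `BaseBackend`, `register_backend`, and `get_backend` come from the PR description, while the `device_type` attribute, the `compile_kernel` method, the `DummyCPUBackend` class, and the module-level `_backends` dictionary are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class BaseBackend(ABC):
    """Hypothetical stand-in for the common backend base class."""

    def __init__(self, device_type: str) -> None:
        self.device_type = device_type

    @abstractmethod
    def compile_kernel(self, source: str) -> str:
        """Turn kernel source into backend-specific code (illustrative only)."""


# Hypothetical registry mapping a device type to its backend instance.
_backends: Dict[str, BaseBackend] = {}


def register_backend(device_type: str, backend_cls: type) -> None:
    """Instantiate backend_cls and register it under device_type."""
    if not issubclass(backend_cls, BaseBackend):
        raise TypeError("backend must inherit from BaseBackend")
    _backends[device_type] = backend_cls(device_type)


def get_backend(device_type: str) -> Optional[BaseBackend]:
    """Return the backend registered for device_type, or None."""
    return _backends.get(device_type)


class DummyCPUBackend(BaseBackend):
    """Toy backend that only tags the source with its device type."""

    def compile_kernel(self, source: str) -> str:
        return f"[{self.device_type}] {source}"


# Register the toy backend, then look it up the way a compiler driver might.
register_backend("cpu", DummyCPUBackend)
backend = get_backend("cpu")
compiled = backend.compile_kernel("kernel_body")
```

Keeping the registry keyed by device type means the caller only needs to know which device it targets; an unregistered device type simply yields `None` in this sketch.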

This PR adds a dummy backend to simulate a third-party backend and
demonstrate the usage.

-
[test_device_backend.py](https://github.com/openai/triton/pull/1643/files#diff-bbe4d50624f2d11bf17c878a1ed4d422918c124c182cf9357b993240c385bea1):
Defines a third-party backend and registers it
-
[ExtensionBackend](https://github.com/openai/triton/pull/1643/files#diff-bbe4d50624f2d11bf17c878a1ed4d422918c124c182cf9357b993240c385bea1R123):
Inherits from `BaseBackend` and implements some specific logic, such as
[filtering out some compile
stages](https://github.com/openai/triton/pull/1643/files#diff-bbe4d50624f2d11bf17c878a1ed4d422918c124c182cf9357b993240c385bea1R129-R135)
- [Register the `ExtensionBackend` for
`CPU`](https://github.com/openai/triton/pull/1643/files#diff-bbe4d50624f2d11bf17c878a1ed4d422918c124c182cf9357b993240c385bea1R279)
  
-
[extension_backend.c](https://github.com/openai/triton/pull/1643/files#diff-169c1d08b3a0a7b343cfa3258fbc32b47e0f6c46305a112652fa1bdaaec89d29):
Provides the utility function to load the kernel binary and get the
backend properties.
2023-06-09 09:09:59 -07:00
Nathaniel McVicar
9101736da1 [PUBLISHING] Install Azure CLI for nightly builds (#1748)
The Azure CLI wasn't available on the builders, so this resolves that
issue and updates the docs to point to the new packages. Also stops
publishing Python 3.6 wheels, as that version is out of support.

Temporarily disables musllinux builds, as they are broken.
2023-06-07 19:39:53 -07:00
Zahi Moudallal
e3ab942c44 [CI] Update path to artifacts (#1749) 2023-06-06 17:05:44 -07:00
Nathaniel McVicar
035381aa28 [PUBLISHING] Enable nightlies on Azure DevOps (#1729)
This reenables nightly builds and uploads them to a public AzDO, where
you can use pip to install a specific version or the latest as normal.
For example: `python -m pip install -i
https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/Triton-Nightly/pypi/simple/
triton-nightly`
2023-06-02 16:36:22 -07:00
Zahi Moudallal
0cd8f05e01 [CI] Upload CUDA test artifacts (#1645) 2023-05-09 23:04:23 -07:00
Paul Ganssle
319af1fb65 [CI] Build wheels for musllinux (#1638)
Ideally you would also build source distributions so that it is in
principle possible to build `triton` on other platforms, but building
`musllinux` wheels would at least help with openai/whisper#1328.

I suspect you will also get people showing up at some point asking for
`aarch64` wheels as well. It might be worth taking a look at the
[`cibuildwheel` output
matrix](https://cibuildwheel.readthedocs.io/en/stable/#what-does-it-do)
to see what you are comfortable with shipping (particularly if you
aren't shipping source distributions).
2023-05-09 01:04:13 -07:00
Philippe Tillet
aaeba98f13 [CI] no longer runs CI job on macos-10.15 (#1624) 2023-05-05 12:13:33 -07:00
peterbell10
c71bf73f24 [BUILD] Use a persistent directory for cmake (#1548)
Fixes #1545

`build_temp` is a temporary directory which `distutils` used to keep in
the `./build` directory, but when `pyproject.toml` is present `pip` now
puts it in `/tmp` and removes it at the end of the build.

Instead, this creates a new permanent directory like
`python/build/cmake.linux_x86_64-cpython-3.8` (the old name but with
cmake instead of temp).
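
The persistent-directory idea can be sketched as follows. This is a sketch under assumptions, not the PR's actual code: the function names `cmake_dir_name` and `get_cmake_dir` and the `base_dir` parameter are hypothetical, and the name produced by `sysconfig.get_platform()` may differ in punctuation from the example path in this commit message.

```python
import sys
import sysconfig
from pathlib import Path


def cmake_dir_name() -> str:
    """Build a per-interpreter directory name starting with 'cmake.'."""
    platform_tag = sysconfig.get_platform()          # e.g. 'linux-x86_64'
    implementation = sys.implementation.name         # e.g. 'cpython'
    python_version = sysconfig.get_python_version()  # e.g. '3.8'
    return f"cmake.{platform_tag}-{implementation}-{python_version}"


def get_cmake_dir(base_dir: Path) -> Path:
    """Create (if needed) and return a build directory under base_dir.

    Unlike a temporary build directory, this location is not cleaned up
    at the end of the build, so cmake's cache can be reused.
    """
    cmake_dir = base_dir / "python" / "build" / cmake_dir_name()
    cmake_dir.mkdir(parents=True, exist_ok=True)
    return cmake_dir
```

Calling `get_cmake_dir` twice with the same `base_dir` returns the same path, which is what lets later builds reuse the earlier cmake configuration.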

While I was looking at the verbose pip output, I also noticed a bunch of
warnings like
```
Python recognizes 'triton/runtime.backends' as an importable package,
but it is not listed in the `packages` configuration of setuptools.

'triton/runtime.backends' has been automatically added to the distribution only
because it may contain data files, but this behavior is likely to change
in future versions of setuptools (and therefore is considered deprecated).
```

So I've also added these to the packages list.

---------

Co-authored-by: Keren Zhou <kerenzhou@openai.com>
2023-04-20 16:38:44 -07:00
Phil Tillet
92d07d1b8e [DOCS] Fixed up workflow 2023-04-13 16:05:29 -07:00
Philippe Tillet
2aad2336e9 [DOCS] Documentation job now uses A100 GPUs (#1522) 2023-04-13 15:35:16 -07:00
Keren Zhou
fdf1c1f2a1 [DOCS] Fix documentation workflow (#1520)
Co-authored-by: Phil Tillet <phil@openai.com>
2023-04-13 13:49:36 -07:00
Philippe Tillet
5b9119117b [CI] No longer install triton in editable mode to run tests (#1476) 2023-04-12 17:55:44 -07:00
Keren Zhou
272f23457a [DOCS] Restore the documentation workflow (#1503)
Not sure if it works at this moment, but at least we can restore the
workflow first.
2023-04-11 13:36:15 -07:00
Michael Holman
e3a763872b [CI] Avoid using EXTRA_INDEX_URL in pip install (#1424)
While not currently a vulnerability, using this option can introduce
risks of supply chain attacks, where an attacker adds a malicious
package into the default public repository with the same name as one
that you are installing. The best practice is to instead use --index-url
when a package is not available in the default repo, so I've updated to
use this option instead.
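
As a rough illustration of the difference, the sketch below builds a pip command line that uses `--index-url`. The helper name `pip_install_command`, the package name, and the index URL are all hypothetical placeholders, and the command is only constructed here, not executed.

```python
import sys
from typing import List


def pip_install_command(package: str, index_url: str) -> List[str]:
    """Build a pip command that resolves `package` from a single index.

    --index-url replaces the default index, so pip consults only `index_url`;
    --extra-index-url would add it alongside the default index instead.
    """
    return [sys.executable, "-m", "pip", "install", "--index-url", index_url, package]


# Hypothetical package name and index URL, for illustration only.
command = pip_install_command("example-package", "https://example.invalid/simple/")
```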

For full disclosure, this is currently causing Microsoft's internal
security checks to fail when building triton (which is why I care about
this theoretical issue/best practice).
2023-03-27 20:34:45 +00:00
Xuehai Pan
c52219b5c3 [SETUP] avoid using deprecated distutils (#1400)
Module [`distutils`](https://docs.python.org/3/library/distutils.html)
is deprecated and will be removed in Python 3.12.

Ref:

- `distutils` documentation:

> ## [distutils](https://docs.python.org/3/library/distutils.html#module-distutils) — Building and installing Python modules
>
> [distutils](https://docs.python.org/3/library/distutils.html#module-distutils) is deprecated with removal planned for Python 3.12.

- PEP 632 – Deprecate distutils module:

> [PEP 632 – Deprecate distutils
module](https://peps.python.org/pep-0632)

------

This PR removes references to `distutils` and replaces them with
[`packaging`](https://pypi.org/project/packaging) and
[`sysconfig`](https://docs.python.org/3/library/sysconfig.html).
This alleviates potential breakage in the modern Python packaging system.

Changes:

- Removes references to `distutils` and replaces them with
[`packaging`](https://pypi.org/project/packaging) and
[`sysconfig`](https://docs.python.org/3/library/sysconfig.html)
- Adds `cmake` and `package` to `build-system.requires` to install
necessary build dependencies prior to calling `setup.py`.
- Minor changes: `multiprocessing.cpu_count() -> os.cpu_count()` and adds
PyPI classifiers.
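
A sketch of what such replacements can look like, using only the standard library. The commit message does not list which `distutils` calls were replaced, so the mapping below (for example from `distutils.sysconfig.get_python_inc()` and `distutils.util.get_platform()`) is an assumption for illustration, and the helper names are hypothetical. `packaging` is left out to keep the example free of third-party dependencies.

```python
import os
import sysconfig


def python_include_dir() -> str:
    """Directory with Python's C headers, via sysconfig rather than distutils."""
    return sysconfig.get_path("include")


def build_platform() -> str:
    """Platform tag such as 'linux-x86_64', via sysconfig rather than distutils."""
    return sysconfig.get_platform()


def max_build_jobs() -> int:
    """Number of parallel build jobs; os.cpu_count() may return None."""
    return os.cpu_count() or 1
```

The `or 1` fallback matters because `os.cpu_count()` is documented to return `None` when the count cannot be determined.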

---------

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-03-27 10:37:47 -07:00
Xuehai Pan
5b36cb48ad [CI][TEST] update pre-commit hooks and use pre-commit for style tests in CI (#1409)
Ref issue:

- #1408

Changes:

- Add `.editorconfig`
- Add `pre-commit-hooks`:

    ```yaml
    - repo: https://github.com/pre-commit/pre-commit-hooks
      rev: v4.4.0
      hooks:
        - id: check-symlinks
        - id: destroyed-symlinks
        - id: trailing-whitespace
        - id: end-of-file-fixer
        - id: check-yaml
        - id: check-toml
        - id: check-ast
        - id: check-added-large-files
        - id: check-merge-conflict
        - id: check-executables-have-shebangs
        - id: check-shebang-scripts-are-executable
        - id: detect-private-key
        - id: debug-statements
    ```
- Add `flake8` to `pre-commit` config and add `.flake8` file
- Use `pre-commit` for style tests in CI
- Run `pre-commit` and fix existing violations:
    - fix trailing spaces
    - fix end-of-files
    - fix mod file mode with `chmod -x`
    - run `autopep8` on existing code
    - fix `flake8` violations
2023-03-25 14:52:16 -07:00
Michael Melesse
a9c87245b4 [ROCM] Enable ROCM Backend #1: Empty Kernel (#1312)
This PR is the first in a series of PRs to import the changes that we have
made to enable ROCM on [our
fork](https://github.com/ROCmSoftwarePlatform/triton) of triton.

The PR contains the major changes to the python frontend and enough
changes to the c++ backend to allow compilation and running of the empty
kernel. We use the ROCM CI added a few weeks ago to verify things.

---------

Co-authored-by: Ronan Keryell <ronan@keryell.fr>
2023-03-24 17:18:27 -07:00
Philippe Tillet
7c7b769e37 [SETUP] Fixed dependencies (#1389) 2023-03-22 16:15:35 -07:00