Commit Graph

131 Commits

Author SHA1 Message Date
Jason Furmanek
33151a860f Merge commit 'ac9fa68d18c777e421bd3f6fb1ddcfd60b6fda33' into ifu-rebase-again
Conflicts:
	.gitignore
	.gitmodules
	README.md
	bin/triton-translate.cpp
	include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.td
	include/triton/Target/AMDGCN/AMDGCNTranslation.h
	include/triton/Target/HSACO/HSACOTranslation.h
	lib/Analysis/Allocation.cpp
	lib/Analysis/Utility.cpp
	lib/Conversion/TritonGPUToLLVM/CMakeLists.txt
	lib/Conversion/TritonGPUToLLVM/ConvertLayoutOpToLLVM.cpp
	lib/Conversion/TritonGPUToLLVM/ReduceOpToLLVM.cpp
	lib/Conversion/TritonGPUToLLVM/ScanOpToLLVM.cpp
	lib/Conversion/TritonGPUToLLVM/Utility.cpp
	lib/Conversion/TritonGPUToLLVM/Utility.h
	lib/Dialect/TritonGPU/IR/Dialect.cpp
	lib/Dialect/TritonGPU/Transforms/RemoveLayoutConversions.cpp
	lib/Target/HSACO/CMakeLists.txt
	lib/Target/HSACO/HSACOTranslation.cpp
	lib/Target/LLVMIR/LLVMIRTranslation.cpp
	python/src/triton.cc
	python/test/unit/language/test_core.py
	python/test/unit/operators/test_flash_attention.py
	python/triton/compiler/compiler.py
	python/triton/compiler/make_launcher.py
	python/triton/language/semantic.py
	python/triton/runtime/jit.py
	python/tutorials/06-fused-attention.py
	python/tutorials/11-grouped-gemm.py
	test/Conversion/tritongpu_to_llvm.mlir
2023-11-06 23:10:10 +00:00
Jack Taylor
47563240f8 PyTorch triton branch synchronisation (#354)
* Restructure ROCM Library Search
Currently a handful of ROCM-dependent files are required for triton to
run: the linker (ld.lld), the include files, and multiple hip/hsa
shared objects.

This change provides three search locations for these files, always
checked in the same order.

1. third_party/rocm.  This location is within the python/triton directory
   and is carried over when triton is built.  IF all necessary files
   are in this location there will be no need to have ROCM installed at
   all on the system.

2. $ROCM_PATH environment variable.  If this exists it will override
   all other locations to find ROCM necessary files.

3. /opt/rocm.  The default location for ROCm installations.  Finding one
   here will notify triton that ROCM is installed in this environment

To ease with step 1, a new script scripts/amd/setup_rocm_libs.sh
has been added to the repo.  Executing this script downloads all necessary
ROCM files from their respective packages on repo.radeon.com and installs
them in third_party/rocm, allowing triton to run without installing
the full ROCM stack.  setup_rocm_libs.sh takes an env var ROCM_VERSION if a user
wishes to install a ROCM version other than the default (currently 5.4.2).

When triton whls are built to support PyTorch, method 1 will be used to stay in
sync with PyTorch's approach of bringing along any libraries needed and not
requiring ROCM to be installed.
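
The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the helper names `rocm_search_paths` and `find_rocm_root` and the marker file `include/hip/hip_runtime.h` are assumptions, and the candidates are simply tried in the order the list gives them.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional, Sequence

# Hypothetical marker file used to decide that a candidate root is usable.
REQUIRED_FILES = ("include/hip/hip_runtime.h",)


def rocm_search_paths(package_dir: Path,
                      env: Optional[Mapping[str, str]] = None) -> List[Path]:
    """Return candidate ROCm roots in the order listed in the commit message."""
    env = os.environ if env is None else env
    candidates = [package_dir / "third_party" / "rocm"]  # 1. bundled copy
    if env.get("ROCM_PATH"):                             # 2. environment variable
        candidates.append(Path(env["ROCM_PATH"]))
    candidates.append(Path("/opt/rocm"))                 # 3. default install
    return candidates


def find_rocm_root(package_dir: Path,
                   env: Optional[Mapping[str, str]] = None,
                   required: Sequence[str] = REQUIRED_FILES) -> Optional[Path]:
    """Return the first candidate that contains every required file, else None."""
    for root in rocm_search_paths(package_dir, env):
        if all((root / rel).exists() for rel in required):
            return root
    return None
```

Passing `env` explicitly keeps the sketch easy to exercise without touching the real environment.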

(cherry picked from commit e6aea90fb3e8218cb562e5d990719112d8282702)

* Fix default rocm path

Running into `fatal error: hip/hip_runtime.h: No such file or directory` with latest wheel due to incorrect directory for ROCm libs

(cherry picked from commit 292bae625b113eb65c66cfe4442da7a6456c988a)

* setup_rocm_libs.sh manylinux refactor

(cherry picked from commit f995f314ada4606cb78dc6233cd9c8effc356191)

* Set setup_rocm_libs.sh to be executable

(cherry picked from commit 05d67b9418cacda0d356c2102d7c1a887948b013)

* Revert to using numbered so files to fix upstream

(cherry picked from commit 34f8189eae57a23cc15b4b4f032fe25757e0db8e)

* Remove drm script

---------

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2023-10-11 15:30:39 +01:00
Jason Furmanek
74fd8e9754 Merge commit '36fc54b6f28168d3644808bfe299f1ba06a36272' into ifu230908-2
Conflicts:
	.gitignore
	bin/triton-translate.cpp
	include/triton/Conversion/TritonGPUToLLVM/TritonGPUToLLVMPass.h
	include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.td
	include/triton/Dialect/TritonGPU/IR/TritonGPUDialect.td
	lib/Analysis/Utility.cpp
	lib/Conversion/TritonGPUToLLVM/ConvertLayoutOpToLLVM/SharedToDotOperandMMAv2.cpp
	lib/Conversion/TritonGPUToLLVM/DotOpToLLVM.cpp
	lib/Conversion/TritonGPUToLLVM/ElementwiseOpToLLVM.cpp
	lib/Conversion/TritonGPUToLLVM/ReduceOpToLLVM.cpp
	lib/Conversion/TritonGPUToLLVM/TritonGPUToLLVM.cpp
	lib/Conversion/TritonGPUToLLVM/TritonGPUToLLVMBase.h
	lib/Conversion/TritonGPUToLLVM/TritonGPUToLLVMPass.cpp
	lib/Conversion/TritonGPUToLLVM/Utility.h
	lib/Dialect/Triton/Transforms/RewriteTensorPointer.cpp
	lib/Dialect/TritonGPU/IR/Dialect.cpp
	lib/Dialect/TritonGPU/Transforms/AccelerateMatmul.cpp
	lib/Dialect/TritonGPU/Transforms/RemoveLayoutConversions.cpp
	lib/Target/LLVMIR/LLVMIRTranslation.cpp
	python/src/triton.cc
	python/test/unit/runtime/test_subproc.py
	python/triton/compiler/compiler.py
	python/triton/compiler/make_launcher.py
	python/triton/language/semantic.py
	python/triton/runtime/jit.py
	python/tutorials/06-fused-attention.py
	test/Conversion/triton_to_tritongpu.mlir
	test/Conversion/tritongpu_to_llvm.mlir
	test/TritonGPU/coalesce.mlir
	unittest/Conversion/TritonGPUToLLVM/CMakeLists.txt
2023-10-02 18:01:04 +00:00
Justin Lebar
9bf9c20f30 [DOCS] update build instructions, and add testing instrs. (#2400)
- Note `wheel` as a build-time dependency.
- Add tips for getting a faster build.
- Add instructions for running tests.
- Add flag to build with ccache.

(Thanks to @ThomasRaoux for most of these instructions!)
2023-09-27 22:13:03 -07:00
Justin Lebar
2a3746bac5 [BUILD] use ninja (#2318) 2023-09-18 15:30:06 -05:00
Philippe Tillet
e686b4d6d4 [FRONTEND] interpreter rewrite (#2321)
This is a new interpreter mode that shares semantic analysis with the
JIT'ed codepath and that the Triton core team is committed to maintaining.
2023-09-17 14:58:50 -07:00
Justin Lebar
073aa16379 [BUILD] use ninja (#2318) 2023-09-17 02:08:04 -07:00
Zahi Moudallal
36087a108f [FRONTEND] Added SASS to asm dict (#2280) 2023-09-13 21:21:01 +00:00
jon-chuang
36859aebff [DOCS] Add MLIR Autogenerated Docs to Sphinx Docs (#2234)
Partially fixes: https://github.com/openai/triton/issues/2226

Here are some example renderings:
![Screenshot from 2023-09-04
18-39-20](https://github.com/openai/triton/assets/9093549/e9c4af04-aeae-4021-a8db-6a4a82b59ae7)
![Screenshot from 2023-09-04
18-39-30](https://github.com/openai/triton/assets/9093549/410391b8-e07e-4bed-909c-8ce5484072d1)
![Screenshot from 2023-09-04
18-39-41](https://github.com/openai/triton/assets/9093549/f1eaef95-66c1-4506-a153-c6069e2b5072)
2023-09-06 08:17:12 +00:00
Wang Weihan
e721911705 [FRONTEND] clean build directly when executing python setup.py clean (#2238)
The current setup.py could not clean the build directory because the default
build directory has been changed in `CMakeBuild`. This PR cleans the
build directory accordingly.
2023-09-04 21:31:38 -07:00
darkbuck
a3df6068b4 [BACKEND] Minor fixes found when building triton with LLVM 17/main branches (#2089)
- These minor fixes are not specific to interface changes from LLVM main
or the official llvm-17 branch and can be applied on the triton main branch.
- https://github.com/darkbuck/triton/tree/darkbuck/main/llvm-main-branch
has extra changes to build against the LLVM main branch, which lets me
work on other backends on the main branch only. That's a hobby effort
and just FYR.
2023-08-16 01:18:06 +00:00
Alex Collins
4ed8381fdb Linux arm64 support (#2003)
We are interested in having python wheels for triton built for Linux
arm64 platforms, such as NVIDIA's Grace CPU.

This change is fairly simple, however:
- It requires a linux arm64 build of LLVM to be available (see MR here:
https://github.com/ptillet/triton-llvm-releases/pull/15)
- For now my changes use the LLVM build hosted here:
https://github.com/acollins3/triton-llvm-releases/releases/tag/llvm-17.0.0-c5dede880d17
- The Triton release process will need to be updated to include arm64
wheels. Is this something you have time to work on @ptillet? It would be
difficult for me to update this part without more access permissions.

With these changes, I managed to build a set of python wheels and have
hosted them here for us to use in the meantime:
https://github.com/acollins3/triton/releases/tag/triton-2.1.0-arm64
2023-08-08 12:39:41 +08:00
goostavz
f1512bded1 Initial code merge of Hopper support (#2036)
The initial code merge of Nvidia Hopper features support. Please be
aware that the code merge is not finished yet and the trouble-shooting
is still ongoing. The new hardware features (GMMA, TMA, STMATRIX etc.)
and automatic warp-specialization are experimental for now and turned
off by default. It is recommended for a trial when version 3.0 is
released.

The work is contributed by:
ben-zhang-609, bealwang, donproc, qliu93, jsh20, allatit23, LyricZhao,
ivanyinwz, goostavz & yangjunpro
from Nvidia, in cooperation with:
ptillet, Jokeren, ThomasRaoux & zahimoud
from OpenAI.

Co-authored-by: Goostav Zhu <gzhu@nvidia.com>
2023-08-07 09:53:04 +08:00
Philippe Tillet
3452615d79 [BUILD] Reverted ptxas change and fixed bug in cache key computation (#1971) 2023-07-19 20:58:24 -07:00
Philippe Tillet
8207eabd7b [FRONTEND][OPTIMIZER] small perf improvements (#1945) 2023-07-14 15:11:36 -07:00
Daniyal khan
b70d07aafe [BUILD][DOCS] updated setup.py and documentation (#1930) 2023-07-11 11:46:28 -07:00
Goran Flegar
938a6754b4 [BUILD] Export compile commands (#1854)
This can be used by IDEs to figure out how to correctly compile
individual sources and offer semantic code completion.
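
As a rough illustration of how a tool might consume the exported database: the sketch below assumes the standard `compile_commands.json` layout (a JSON array of entries with `directory`, `file`, and either `command` or `arguments`). The helper name `compile_command_for` is hypothetical.

```python
import json
import shlex
from pathlib import Path
from typing import List, Optional


def compile_command_for(database: Path, source: str) -> Optional[List[str]]:
    """Look up the compiler invocation recorded for one source file."""
    entries = json.loads(database.read_text())
    for entry in entries:
        # "file" may be relative to the entry's "directory".
        entry_path = Path(entry["directory"]) / entry["file"]
        if entry_path == Path(source):
            if "arguments" in entry:
                return list(entry["arguments"])
            return shlex.split(entry["command"])
    return None
```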
2023-06-28 14:11:59 +00:00
Wang Weihan
4d3a92f1b8 [BUILD] Make sure always build_ext first (#1819)
The third-party backend might install its python package to the
`triton/third_party` python package during the build process, which is
triggered by `build_ext`. But `build_py` could be executed before
`build_ext`, in which case `build_py` would only copy the `packages`
defined in `setup.py`, without the third-party packages, because the
third-party backend has not been built yet. Therefore, this PR refines the
build order to ensure `build_ext` always happens before `build_py`.
2023-06-22 13:32:03 -07:00
Philippe Tillet
b24dc19741 [FRONTEND] cleaned up symbol names (#1782) 2023-06-14 18:55:32 -07:00
Wang Weihan
b27a91a113 [FRONTEND] Enable triton to support register thirdparty backend at runtime (#1643)
This PR intends to provide a mechanism to support a third-party backend
at runtime to generate the backend-specific code.

The mechanism provides a common class to abstract the third-party
backend logic and two essential functions to register and get the
third-party backend at runtime.

- `BaseBackend`: A common class to abstract the third-party backend
logic
- `register_backend`: Register a third-party backend with a given device
type
- `get_backend`: Get the third-party backend with a given device type

Generally, a third-party backend must inherit from `BaseBackend` and
implement all the member functions according to the backend
characteristics. As long as the backend implementation is ready, the
third-party backend can invoke `register_backend` to register it under a
given device. During the kernel compilation and execution, the mechanism
will get the registered backend to generate the kernel and launcher code
for a given device.

This PR added a dummy backend to simulate a third-party backend and
demonstrate the usage.

- [test_device_backend.py](https://github.com/openai/triton/pull/1643/files#diff-bbe4d50624f2d11bf17c878a1ed4d422918c124c182cf9357b993240c385bea1):
To define a third-party backend and register the backend
- [ExtensionBackend](https://github.com/openai/triton/pull/1643/files#diff-bbe4d50624f2d11bf17c878a1ed4d422918c124c182cf9357b993240c385bea1R123):
Inherit from the `BaseBackend` and implement some specific logic like
[filter out some compile stages](https://github.com/openai/triton/pull/1643/files#diff-bbe4d50624f2d11bf17c878a1ed4d422918c124c182cf9357b993240c385bea1R129-R135)
- [Register the `ExtensionBackend` for `CPU`](https://github.com/openai/triton/pull/1643/files#diff-bbe4d50624f2d11bf17c878a1ed4d422918c124c182cf9357b993240c385bea1R279)
- [extension_backend.c](https://github.com/openai/triton/pull/1643/files#diff-169c1d08b3a0a7b343cfa3258fbc32b47e0f6c46305a112652fa1bdaaec89d29):
To provide the utility function to load kernel binary and get the
backend properties.
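
The registration flow can be pictured with a standalone mock. It does not import Triton. The names `BaseBackend`, `register_backend`, `get_backend`, and `ExtensionBackend` come from the message above, while the signatures, the `compile_kernel` method, and the module-level registry are assumptions made for illustration only.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Type


class BaseBackend(ABC):
    """Common interface every third-party backend implements (mock)."""

    def __init__(self, device_type: str) -> None:
        self.device_type = device_type

    @abstractmethod
    def compile_kernel(self, source: str) -> str:
        """Hypothetical hook: turn kernel source into backend-specific code."""


# Registry mapping a device type to its backend instance.
_backends: Dict[str, BaseBackend] = {}


def register_backend(device_type: str, backend_cls: Type[BaseBackend]) -> None:
    """Register a backend class under the given device type."""
    _backends[device_type] = backend_cls(device_type)


def get_backend(device_type: str) -> Optional[BaseBackend]:
    """Return the backend registered for the device type, or None."""
    return _backends.get(device_type)


class ExtensionBackend(BaseBackend):
    """Dummy backend standing in for a real third-party implementation."""

    def compile_kernel(self, source: str) -> str:
        return f"[{self.device_type}] {source}"


register_backend("cpu", ExtensionBackend)
```

A caller would then fetch the backend with `get_backend("cpu")` during compilation and fall back to the built-in path when it returns `None`.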
2023-06-09 09:09:59 -07:00
Ingo Müller
0c4de8ab72 [DEPENDENCIES] Update LLVM to 17.0.0 (c5dede880d17) and port changes. (#1668)
This depends on a [pending LLVM
release](https://github.com/ptillet/triton-llvm-releases/pull/10).

* Implement setCalleeFromCallable in CallOp.
* Cast type to ShapedType for various getters.
* Improve TritonDialect::materializeConstant due to breaking change in
constructor of arith::ConstantOp.
* Add OpaqueProperties argument in inferReturnTypes.

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-05-15 21:51:14 -07:00
Michaël Benesty
858a2f0a5e [FRONTEND] Added interpreter mode (#1573)
Simple mechanism to run Triton kernels on PyTorch for debugging purposes
(upstream from Kernl).

Todo:
- random grid iteration
- support of atomic ops
- more unit tests
- cover new APIs?
2023-05-08 14:28:20 -07:00
Philippe Tillet
d338521b65 [SETUP] Removing torch as a test dependency (#1632)
circular dependency is causing troubles now that our interpreter depends
on torch 2.0 ...
2023-05-07 12:29:19 -07:00
Philippe Tillet
ec242430d1 [THIRD_PARTY] bumped ptxas version to 12.1.105 (#1574) 2023-04-24 16:49:31 -07:00
peterbell10
c71bf73f24 [BUILD] Use a persistent directory for cmake (#1548)
Fixes #1545

`build_temp` is a temporary directory which `distutils` used to keep in
the `./build` directory, but when `pyproject.toml` is present `pip` now
puts it in `/tmp` and removes it at the end of the build.

Instead, this creates a new permanent directory like
`python/build/cmake.linux_x86_64-cpython-3.8` (the old name but with
cmake instead of temp).
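
A minimal sketch of how such a directory name could be derived, assuming it is built from the platform tag, the interpreter name, and the Python version; the helper name `persistent_cmake_dir` is hypothetical, and the exact separators depend on what `sysconfig` reports on the host.

```python
import sys
import sysconfig
from pathlib import Path


def persistent_cmake_dir(base_dir: Path) -> Path:
    """Create (if needed) and return a build directory that survives pip builds."""
    plat_name = sysconfig.get_platform()
    python_version = sysconfig.get_python_version()
    dir_name = f"cmake.{plat_name}-{sys.implementation.name}-{python_version}"
    cmake_dir = base_dir / "python" / "build" / dir_name
    cmake_dir.mkdir(parents=True, exist_ok=True)
    return cmake_dir
```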

While I was looking at the verbose pip output, I also noticed a bunch of
warnings like
```
Python recognizes 'triton/runtime.backends' as an importable package,
but it is not listed in the `packages` configuration of setuptools.

'triton/runtime.backends' has been automatically added to the distribution only
because it may contain data files, but this behavior is likely to change
in future versions of setuptools (and therefore is considered deprecated).
```

So I've also added these to the packages list.

---------

Co-authored-by: Keren Zhou <kerenzhou@openai.com>
2023-04-20 16:38:44 -07:00
Daniil Fukalov
a90a2d864f [BUILD] Add ability to build with clang+lld. (#1544)
This reduces build time with an assertions-enabled LLVM and
dramatically speeds up triton's build with a "debug" LLVM.

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-04-18 21:20:12 +00:00
Philippe Tillet
e5c7d2a83c [FRONTEND] cleaned up language; added frontend function for globaltimer special register (#1525) 2023-04-14 15:29:27 -07:00
Philippe Tillet
c0d86d3b04 [RUNTIME] refactor driver (#1515)
Improved separation between different backends
2023-04-12 23:50:44 -07:00
Phil Tillet
d7d62ddae9 Revert "[BUILD] Fixed typo in setup.py"
This reverts commit 2931bb8195.
2023-04-11 20:12:22 -07:00
Phil Tillet
2931bb8195 [BUILD] Fixed typo in setup.py 2023-04-11 20:09:09 -07:00
Philippe Tillet
e0d6f5f4f5 [BUILD] updated LLVM binaries (#1504)
Co-authored-by: Christian Sigg <csigg@google.com>
2023-04-11 00:14:00 -07:00
Eta
577cafff0a [BUILD] Add missing subpackages to build (#1475)
The `triton/compiler`, `triton/runtime/driver`, and `triton/third_party`
subpackages were missing from the distribution built with the old
`setup.py` after #1464, causing an immediate error upon importing Triton
with a non-editable installation. This change adds the missing Python
subpackages and moves `triton/third_party` inclusion to `MANIFEST.in`,
where it will automatically be included in wheels due to the existing
`include_package_data` setup flag.
2023-04-04 22:41:08 -07:00
Philippe Tillet
053af4e9f8 [FRONTEND] Refactor file hierarchy (#1464)
The purpose of this PR is to remove some circular dependencies and
separate concerns better in the frontend. It's still not perfect --
`triton.compile` still includes a few runtime architecture-specific
components, but it is at least much better than before.

This PR still assumes that AMD only supports empty kernels right now.
Other PRs will follow to make the frontend support multiple devices in
a more modular way.
2023-04-02 12:07:08 -07:00
Francisco Massa
c1b057eee9 [FRONTEND] Add option to specify number of compilation threads during Triton compilation (#1450)
On some machines, the amount of available RAM might not be enough to
compile Triton with `2 * num_cpus` parallelism. For example, CircleCI's
`large` instance can't handle Triton compilation as is due to
insufficient memory.

Instead, I propose to take PyTorch's approach where we can define a
[`MAX_JOBS` env
var](0e4ddc2b40/tools/setup_helpers/cmake.py (L366-L368))
that gives the user the possibility to reduce (or increase) the
parallelism during compilation.
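
A minimal sketch of the idea, assuming the build falls back to `2 * num_cpus` when `MAX_JOBS` is unset; the helper name `compile_jobs` is hypothetical.

```python
import os
from typing import Mapping, Optional


def compile_jobs(env: Optional[Mapping[str, str]] = None) -> int:
    """Pick the number of parallel compile jobs, honoring MAX_JOBS if set."""
    env = os.environ if env is None else env
    max_jobs = env.get("MAX_JOBS")
    if max_jobs:
        # Never go below one job, even if the variable is set to 0.
        return max(1, int(max_jobs))
    # Default described in the message: twice the number of CPUs.
    return 2 * (os.cpu_count() or 1)
```

A memory-constrained CI machine could then export `MAX_JOBS=4` before building.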

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-03-31 11:34:18 -07:00
Philippe Tillet
fe76b12354 [BUILD] Back to cmake >= 3.18 (#1428) 2023-03-27 16:47:34 -07:00
Xuehai Pan
c52219b5c3 [SETUP] avoid using deprecated distutils (#1400)
Module [`distutils`](https://docs.python.org/3/library/distutils.html)
is deprecated and will be removed in Python 3.12.

Ref:

- `distutils` documentation:

> ## [distutils](https://docs.python.org/3/library/distutils.html#module-distutils) — Building and installing Python modules
>
> [distutils](https://docs.python.org/3/library/distutils.html#module-distutils)
> is deprecated with removal planned for Python 3.12.

- PEP 632 – Deprecate distutils module:

> [PEP 632 – Deprecate distutils
module](https://peps.python.org/pep-0632)

------

This PR removes references to `distutils` and replaces them with
[`packaging`](https://pypi.org/project/packaging) and
[`sysconfig`](https://docs.python.org/3/library/sysconfig.html).
This alleviates potential breakage in the modern Python packaging system.

Changes:

- Removes references to `distutils` and replaces them with
[`packaging`](https://pypi.org/project/packaging) and
[`sysconfig`](https://docs.python.org/3/library/sysconfig.html)
- Add `cmake` and `package` in `build-system.requires` to install
necessary build dependencies prior to calling `setup.py`.
- Minor changes: `multiprocessing.cpu_count() -> os.cpu_count()` and add
PyPI classifiers.

---------

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-03-27 10:37:47 -07:00
Philippe Tillet
7c7b769e37 [SETUP] Fixed dependencies (#1389) 2023-03-22 16:15:35 -07:00
Philippe Tillet
e4b2d1bc3d [FRONTEND][BACKEND] no longer using indices for loops (#1370) 2023-03-19 14:57:50 -07:00
Stonepia
109b5e2729 [BUILD] Fix the build bug when user use system package of llvm by setting LLVM_SYSPATH (#1336)
When the user sets `LLVM_SYSPATH` to use a custom-built llvm, the build
throws an error because there is no version.txt under the custom build.

This PR skips the version check if `LLVM_SYSPATH` is set.
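
The skip can be sketched as below. This is an illustration under assumptions: the helper name `needs_download` is hypothetical, and it assumes the downloaded package records its version in a `version.txt` that is compared against the expected one.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def needs_download(package_dir: Path, expected_version: str,
                   env: Optional[Mapping[str, str]] = None) -> bool:
    """Decide whether the pre-built LLVM package must be (re-)downloaded."""
    env = os.environ if env is None else env
    if env.get("LLVM_SYSPATH"):
        # User-provided LLVM: no version.txt is expected, so skip the check.
        return False
    version_file = package_dir / "version.txt"
    if not version_file.exists():
        return True
    return version_file.read_text().strip() != expected_version
```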

---------

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-03-15 13:28:19 -07:00
Phil Tillet
773c29cfaa [BUILD] Fix comment typo 2023-03-07 16:47:30 -08:00
Phil Tillet
305f99e614 [BUILD] Fixed typo in setup.py 2023-03-07 15:45:36 -08:00
Philippe Tillet
c34b32866b [BUILD] re-download package if version has changed (#1294) 2023-03-07 10:15:35 -08:00
Philippe Tillet
ff94e34430 [TESTS][BUILD] now using llvm @ 8e5a41e8271f (#1282)
Now we also use the FileTest utility packaged with llvm pre-built binaries
2023-03-05 17:23:00 -08:00
Keren Zhou
77c145cec8 [BUILD] Bump cmake requirement to >= 3.20 and format CMakeLists.txt (#1276)
cc @malfet
2023-03-03 11:43:09 -08:00
Phil Tillet
c7581c9a91 [PACKAGING] bump dev version to 2.1.0 2023-03-02 21:52:30 -08:00
BillSchumacher
6b44d31ae4 [BUILD] windows and cmake compatibility. (#1214)
Make cmake happier: it doesn't like multiple target_link_libraries
definitions for the same name.

Use find_package instead on Windows for dlfcn-win32. 
Set LLVM_SYS_PATH on Windows for python setup.

Debug build almost working, AlwaysCreate error thrown still.
2023-02-19 09:51:50 +00:00
Christian Sigg
9ef4b5d773 Rebase to LLVM-head. (#1200)
Rebase to
37b7a60cd7
2023-02-17 13:16:11 -08:00
Philippe Tillet
969331aedd [BUILD] fixed setup.py on older glibc (#1206) 2023-02-16 19:43:18 -08:00
Christian Sigg
fc7a8e3581 Rebase Triton to LLVM-15. (#1070)
This PR rebases Triton from LLVM-14 to LLVM-15. Most changes are
mechanical, except for the analysis framework changes.
2023-02-16 06:40:53 -08:00
Nikita Shulga
ebbd9a5df3 [BUILD] remove unused global var (#1161)
`package_data` is no longer referenced from anywhere.

Use `third_party/**/*` wildcard to package contents of subfolders
2023-02-08 05:23:05 +00:00