Commit Graph

34 Commits

Author SHA1 Message Date
Whitney Tsang
129e7dfc6f [TritonGPUToLLVM] Correct the usage of option passing (#2104)
For example, when given `--convert-triton-gpu-to-llvm="is-rocm=true"`,
`ConvertTritonGPUToLLVMPass` should generate ROCM-compatible LLVM.
Before this PR, transformation options passed on the command line were
not respected.
2023-08-16 00:56:01 +00:00
Zahi Moudallal
4d373aa103 [BACKEND] Remove HopperHelpers.c and replace with inline ptx and LLVM codegen (#2047) 2023-08-10 15:52:37 -07:00
Peter Hawkins
3be74fa92d Include only necessary MLIR conversion passes, rather than all of them. (#2068)
No functional changes intended, and it might slightly speed up the
build.

This allows a downstream Bazel build of Triton to avoid building a
number of dialects and passes that Triton doesn't need.
2023-08-09 08:30:42 -07:00
goostavz
f1512bded1 Initial code merge of Hopper support (#2036)
The initial code merge of Nvidia Hopper feature support. Please be
aware that the code merge is not finished yet and troubleshooting
is still ongoing. The new hardware features (GMMA, TMA, STMATRIX etc.)
and automatic warp-specialization are experimental for now and turned
off by default. We recommend trying them once version 3.0 is
released.

The work is contributed by:
ben-zhang-609, bealwang, donproc, qliu93, jsh20, allatit23, LyricZhao,
ivanyinwz, goostavz & yangjunpro
from Nvidia, in cooperation with:
ptillet, Jokeren, ThomasRaoux & zahimoud
from OpenAI.

Co-authored-by: Goostav Zhu <gzhu@nvidia.com>
2023-08-07 09:53:04 +08:00
Thomas
e6216047b8 [BACKEND] Upgrade the max PTX version allowed to 8.2 (#1982) 2023-07-23 19:56:01 -07:00
Keren Zhou
cc5a7ed52f [FRONTEND][BACKEND] Materialize line info for triton kernels (#1902)
`export TRITON_DISABLE_LINE_INFO=1` to disable the feature.
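
A minimal sketch of setting the same variable from Python when launching a child process. The helper name `env_without_line_info` is hypothetical, and when exactly Triton reads the variable is not stated in the commit message, so the sketch only prepares the environment:

```python
import os


def env_without_line_info(base=None):
    """Return a copy of the environment with TRITON_DISABLE_LINE_INFO=1 set.

    Hypothetical helper for illustration; only the variable name comes
    from the commit message.
    """
    env = dict(os.environ if base is None else base)
    env["TRITON_DISABLE_LINE_INFO"] = "1"
    return env
```

The returned mapping can be passed as `env=` to `subprocess.run` so that the setting applies only to that child process.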
2023-07-07 16:03:44 -04:00
Ingo Müller
a5fb71eed8 [CMAKE] Add link dependency to dl for TritonLLVMIR. (#1857)
That library makes use of the dladdr function, so it eventually needs to
be linked with -ldl, which may not be done automatically. This commit
adds a link dependency to `${CMAKE_DL_LIBS}`, which is CMake's way of
specifying that library in a portable way.
2023-06-29 10:31:45 -04:00
Mehdi Amini
83245259a6 [OPTIMIZER][BACKEND] switch the TritonGPU dialect to use MLIR Properties (NFC) (#1696)
Also try to switch API accesses to the new upstream APIs that explicitly
separate access to "discardable" and "inherent" attributes (the
latter being stored in properties now).

Generic accessors like `getAttr()`, `setAttr()`, and `setAttrs()` are much
more expensive and should be avoided.
2023-05-20 01:36:48 +00:00
cloudhan
323843cde8 [BUILD] stop depending on dlfcn-win32 by implementing dladdr natively with WIN32 API (#1674)
Co-authored-by: Philippe Tillet <phil@openai.com>
2023-05-16 07:19:36 +00:00
Ingo Müller
b2a757d000 [BUILD] Add missing CMake link-time dependencies. (#1654) 2023-05-11 19:17:44 -07:00
Philippe Tillet
e5c7d2a83c [FRONTEND] cleaned up language; added frontend function for globaltimer special register (#1525) 2023-04-14 15:29:27 -07:00
Philippe Tillet
e0d6f5f4f5 [BUILD] updated LLVM binaries (#1504)
Co-authored-by: Christian Sigg <csigg@google.com>
2023-04-11 00:14:00 -07:00
Philippe Tillet
053af4e9f8 [FRONTEND] Refactor file hierarchy (#1464)
The purpose of this PR is to remove some circular dependencies and
separate concerns better in the frontend. It's still not perfect:
`triton.compile` still includes a few runtime architecture-specific
components, but it is at least much better than before.

This PR still assumes that AMD only supports empty kernels right now.
Other PRs will follow to make the frontend support multiple devices in
a more modular way.
2023-04-02 12:07:08 -07:00
Michael Melesse
a9c87245b4 [ROCM] Enable ROCM Backend #1: Empty Kernel (#1312)
This PR is the first in a series of PRs to import the changes that we have
made to enable ROCM on [our
fork](https://github.com/ROCmSoftwarePlatform/triton) of triton.

The PR contains the major changes to the python frontend and enough
changes to the c++ backend to allow compilation and running of the empty
kernel. We use the ROCM ci added a few weeks ago to verify things.

---------

Co-authored-by: Ronan Keryell <ronan@keryell.fr>
2023-03-24 17:18:27 -07:00
Philippe Tillet
b4decbe155 [BACKEND] Now using call_once to initialize LLVM target (#1373) 2023-03-19 21:23:39 -07:00
Philippe Tillet
e4b2d1bc3d [FRONTEND][BACKEND] no longer using indices for loops (#1370) 2023-03-19 14:57:50 -07:00
Philippe Tillet
39139258c8 [FRONTEND][BACKEND] tl.mathlib -> tl.math; internally reverted to mathlib -> libdevice (#1368) 2023-03-19 02:14:57 -07:00
rsanthanam-amd
c575911a01 [FRONTEND] Change libdevice to mathlib and fix abs (#1361)
Co-authored-by: Phil Tillet <phil@openai.com>
2023-03-19 01:34:16 -07:00
Keren Zhou
da0b0bfde6 [BACKEND] Still run llvm-opt but set optLevel to 0 to avoid the abs(float) bug (#1339)
https://github.com/openai/triton/issues/1337
2023-03-14 12:38:57 -07:00
Philippe Tillet
6a8634e2a7 [BACKEND] No longer running LLVM-IR optimizations after codegen. (#1338)
This triggered some outrageous bugs. See #1337.
2023-03-13 22:50:15 -07:00
Ilia Sergachev
22a61e6f59 [BACKEND] Fix stack-use-after-scope in translateLLVMIRToPTX. (#1304)
stack-use-after-scope is reported by LLVM AddressSanitizer at stream
flush happening in PassManager's destructor.
2023-03-08 18:25:38 -08:00
Philippe Tillet
3db55c5f94 [OPTIMIZER][BACKEND] Some backend and optimization passes clean-up (#1284)
* Cleaned up pipeline pass. Now works when there are element-wise ops
between the load and the dot
* Made `splat` compatible with variables that have DotOperandLayout
* Moved rematerialization utils to separate Transforms/Utility.cpp file.
2023-03-06 17:17:59 -08:00
Philippe Tillet
fa0fbc937f [FRONTEND][BACKEND][OPTIMIZER] Loops now use 64-bit indices when necessary (#1261)
* Frontend:
  - `int` kernel arguments are always signed
  - Loop induction variable is now determined by integer promotion on
    lb/ub/step
* Optimizer:
  - Added new ExtractSliceOp that enforces 32-bit offsets
* Backend:
  - Use 64-bit indices when lowering functions and control flow
  - Removed `idx_val` macro and replaced it with `i32_val`
  - Cleaned up comments
  - Added new ArithToIndex pass to make sure operations on indices are
    done with the `index` dialect, which gets converted to LLVM separately
    using a 64-bit target
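
The induction-variable rule in the frontend item above can be pictured with a small sketch. It assumes that promotion simply picks the widest of the three operand widths; Triton's actual promotion rules (including signedness handling) are not reproduced here, and the function name `induction_var_width` is hypothetical:

```python
def induction_var_width(lb_bits: int, ub_bits: int, step_bits: int) -> int:
    """Bit width of a loop induction variable under a widest-operand rule.

    Assumption for illustration: integer promotion on lb/ub/step is
    modeled as taking the largest operand width.
    """
    return max(lb_bits, ub_bits, step_bits)
```

Under this model a loop whose bounds and step are all 32-bit keeps a 32-bit index, while a single 64-bit bound widens the index to 64 bits.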
2023-03-01 23:09:48 -08:00
Keren Zhou
6a9316e69a [BACKEND] Clean up SCF -> CF conversion (#1234) 2023-02-22 23:49:47 +00:00
Yu Guo
19228d88bc [FRONTEND][BACKEND] add env variable TRITON_LIBDEVICE_PATH (#1166)
We may compile kernels on remote machines that do not have a local
libdevice.10.bc.
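
A minimal sketch of how such an override could be resolved. The helper `resolve_libdevice_path` and its fallback argument are hypothetical, and the commit message does not say whether the variable holds a file or a directory, so the value is returned unchanged:

```python
import os


def resolve_libdevice_path(default_path: str) -> str:
    """Prefer TRITON_LIBDEVICE_PATH when set, else fall back to a default.

    Hypothetical helper; only the variable name comes from the commit
    message. The lookup order is an assumption.
    """
    return os.environ.get("TRITON_LIBDEVICE_PATH", default_path)
```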

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-02-21 20:15:12 +00:00
Christian Sigg
9ef4b5d773 Rebase to LLVM-head. (#1200)
Rebase to
37b7a60cd7
2023-02-17 13:16:11 -08:00
Christian Sigg
fc7a8e3581 Rebase Triton to LLVM-15. (#1070)
This PR rebases Triton from LLVM-14 to LLVM-15. Most changes are
mechanical, except for the analysis framework changes.
2023-02-16 06:40:53 -08:00
Nikita Shulga
2d4370bc9f [LINKER] search for libdevice relative to shared library (#1176) 2023-02-11 02:24:33 +00:00
Philippe Tillet
43798ab27e [BUILD] Restored wheels workflow (#1146)
- Dependent CUDA files (ptxas, cuda.h, libdevice.10.bc) are now packaged in
`triton/third_party/cuda`. `ptxas` is downloaded from the conda repo at
install time.
- Can now be built with an old glibc (such as that used by manylinux2014)
2023-02-03 16:22:10 -08:00
goostavz
3e8d83b7cc Minor fix to support sm_90 (#1125)
This fix enables support for sm_90 (otherwise it will crash).

Logs like
> 'sm_90' is not a recognized processor for this target (ignoring
processor)

can be ignored and should be eliminated once the LLVM NVPTX
backend is updated.
2023-01-31 14:08:02 +08:00
Goran Flegar
afd02626ea [BUILD] Fix build issues of triton-translate tool (#1068) 2023-01-17 09:03:29 -08:00
Connor Baker
c20215dad1 [FRONTEND] Update PTX/SM support for LLVM14 (PR #1038 redux) (#1039)
2023-01-09 10:31:55 -08:00
Keren Zhou
733301ff31 [Backend] Rewrite code for linking external library to expose more inlining opportunities (#1037)
- Also make it cleaner.
- Also mark the code that needs to be fixed in `semantic.py`.
2023-01-08 13:44:29 -08:00
Philippe Tillet
20100a7254 Merge triton-mlir branch - Complete rewrite of the backend from scratch (#1004)
This PR merges the `triton-mlir` branch, in which we have been quietly
rewriting the Triton backend from scratch to increase maintainability,
stability and ultimately performance. Changes to the runtime are
minimal, and this new version aims to remain backward-compatible with
the previous commit. The legacy backend is now officially deprecated,
but can still be accessed via the `legacy-backend` tag.

Co-authored-by: Keren Zhou <kerenzhou@openai.com>
Co-authored-by: Yan Chunwei <yanchunwei@outlook.com>
Co-authored-by: goostavz <109190422+goostavz@users.noreply.github.com>
Co-authored-by: Shintaro Iwasaki <siwasaki@fb.com>
Co-authored-by: Yan Da <dyanab@connect.ust.hk>
Co-authored-by: Jun Yang <yangjunpro@gmail.com>
Co-authored-by: Ian Bearman <ianb@microsoft.com>
Co-authored-by: Jason Ansel <jansel@jansel.net>
Co-authored-by: Qingyi Liu <qingyil@nvidia.com>
Co-authored-by: ben-zhang-609 <110140741+ben-zhang-609@users.noreply.github.com>
Co-authored-by: Chenggang Zhao <lyricz@yeah.net>
Co-authored-by: ben-zhang-609 <benzh609@gmail.com>
Co-authored-by: dongdongl <dongdongl@nvidia.com>
2022-12-21 01:30:50 -08:00