566 Commits

Author SHA1 Message Date
Philippe Tillet
9b7c65a3a9 [BACKEND][OPTIMIZER] Refactor MMAv1 codegen (#1322)
- Significant simplification of the optimizer pipeline. The right mma
version is now set directly after the coalescing pass. DotOperand layouts
no longer hold state for the `isRow` argument, and instead query it from
their parent.
- Moved a bunch of things from TritonGPUToLLVM/DotOpHelpers to
TritonGPUAttrDefs. All MMAv1 state is now queried from attributes.
- Logic for getElemsPerThread is no longer duplicated in TypeConverter.
2023-03-12 19:54:38 -07:00
Yu Guo
ef55ccfed0 [TESTING] fix get_max_simd_tflops (#1318)
`_triton.runtime.num_sm`, `_triton.runtime.clock_rate`, and
`_triton.runtime.cc` no longer seem to exist.

Use the corresponding methods from `get_max_tensorcore_tflops` in the
same file instead.
2023-03-11 10:07:25 -08:00
Philippe Tillet
5a786cf778 [FRONTEND] Fixed contains_return_op behavior (#1317) 2023-03-10 23:58:28 -08:00
Philippe Tillet
3fe3adbcde [FRONTEND][BACKEND] Add support for float8e5m2 type (#1314) 2023-03-10 19:14:47 -08:00
Luo Yihang
9626c8e944 [DOC] Fix typos in comments (#1311)
Fixed several typos in `python/triton/runtime/autotuner.py`
2023-03-10 09:33:24 -08:00
Keren Zhou
8b25c30d39 [BACKEND] Fix bfloat16 flash attention (#1306)
See https://github.com/openai/triton/issues/1245 for more detailed
information

---------

Co-authored-by: giorgio-arena <arena.cpp@gmail.com>
2023-03-09 21:14:52 -08:00
Sophia Wisdom
a4a824a3c9 [FRONTEND] Correct error message (#1308) 2023-03-09 21:14:11 -08:00
Da Yan
902c61affb [BACKEND] Add arith::SelectOp => LLVM::SelectOp conversion (#1307) 2023-03-09 09:35:30 -08:00
Keren Zhou
78b311f6e2 [FRONTEND] Fix cast when both src_ty and dst_ty are of block_type (#1301)
Commonly used in atomic_rmw ops
2023-03-08 09:25:00 -08:00
shunting314
f5c9f9b4b5 [FRONTEND] Expose the register usage and spill information thru CompiledKernel (#1296) 2023-03-08 01:30:31 +00:00
Phil Tillet
773c29cfaa [BUILD] Fix comment typo 2023-03-07 16:47:30 -08:00
Phil Tillet
305f99e614 [BUILD] Fixed typo in setup.py 2023-03-07 15:45:36 -08:00
Philippe Tillet
c34b32866b [BUILD] re-download package if version has changed (#1294) 2023-03-07 10:15:35 -08:00
JiCheng
849a40baad [FRONTEND] Add check for the axis of reduction op (#1268) 2023-03-06 22:11:43 -08:00
Philippe Tillet
3db55c5f94 [OPTIMIZER][BACKEND] Some backend and optimization passes clean-up (#1284)
* Cleaned up pipeline pass. Now works when there are element-wise ops
between the load and the dot
* Made `splat` compatible with variables that have DotOperandLayout
* Moved rematerialization utils to separate Transforms/Utility.cpp file.
2023-03-06 17:17:59 -08:00
Keren Zhou
4731f300d3 [BACKEND] Mask out wrapped threads in store ops (#1283) 2023-03-06 14:50:20 -08:00
Alexander Zinoviev
5e92a66267 [DOC] Fix a typo in where's description (#1286)
Co-authored-by: Alexander Zinoviev <zinoviev@google.com>
2023-03-06 14:38:03 -08:00
Philippe Tillet
ff94e34430 [TESTS][BUILD] now using llvm @ 8e5a41e8271f (#1282)
Now we also use the FileTest utility packaged with llvm pre-built binaries
2023-03-05 17:23:00 -08:00
Keren Zhou
d376020f90 [FRONTEND][BACKEND] Implement tl.device_assert and rename tl.printf to tl.device_print (#1143)
Note that `tl.device_print` and `print` accept different arguments than
the normal `print`. The first argument must be a string, followed by
variables.

Device side:

- `tl.device_print`
- `tl.device_assert`
- `print`
- `assert`

Compilation time:

- `tl.static_assert`
- `tl.static_print`

Usage example:

1.
```Python
tl.device_assert(x == 0, "x != 0")
```

Output:

```Python
...
python/test/unit/language/assert_helper.py:18: kernel: block: [0,0,0], thread: [33,0,0] Assertion `x != 0` failed.
...
```

2.
```Python
tl.device_print("hello ", x)
```

Output:

```Python
...
hello 1
...
```

The environment variable `TRITON_DEBUG` sets the default debugging flag; if it's true, `tl.device_assert` or `assert` will be skipped.
2023-03-04 08:08:29 -08:00
Keren Zhou
77c145cec8 [BUILD] Bump cmake requirement to >= 3.20 and format CMakeLists.txt (#1276)
cc @malfet
2023-03-03 11:43:09 -08:00
Phil Tillet
c7581c9a91 [PACKAGING] bump dev version to 2.1.0 2023-03-02 21:52:30 -08:00
Keren Zhou
65e5a3bc24 [FRONTEND] Improve tl.full to accept both static and dynamic values (#1269) 2023-03-02 12:19:54 -08:00
Phil Tillet
2660c814c9 [FRONTEND] for loop negative step hotfix 2023-03-01 23:45:03 -08:00
Philippe Tillet
fa0fbc937f [FRONTEND][BACKEND][OPTIMIZER] Loops now use 64-bit indices when necessary (#1261)
* Frontend:
  - `int` kernel arguments are always signed
  - Loop induction variable is now determined by integer promotion on
    lb/ub/step
* Optimizer:
  - Added new ExtractSliceOp that enforces 32-bit offsets
* Backend:
  - Use 64-bit indices when lowering functions and control flow
  - Removed `idx_val` macro and replaced it with `i32_val`
  - Cleaned up comments
  - Added new ArithToIndex pass to make sure operations on indices are
    done with the `index` dialect, which gets converted to LLVM separately
    using a 64-bit target
2023-03-01 23:09:48 -08:00
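
As a rough illustration of the frontend rule in #1261 (the induction variable's type comes from integer promotion over lb/ub/step), the sketch below picks the widest bit width among the three operands. It deliberately ignores signedness and is not Triton's actual promotion logic; `induction_var_bits` is a hypothetical helper introduced only for this example.

```python
def induction_var_bits(lb_bits: int, ub_bits: int, step_bits: int) -> int:
    """Simplified promotion: the loop index is as wide as its widest operand."""
    return max(lb_bits, ub_bits, step_bits)


# A loop whose upper bound is a 64-bit value gets a 64-bit index,
# while an all-32-bit loop keeps a 32-bit index.
assert induction_var_bits(32, 64, 32) == 64
assert induction_var_bits(32, 32, 32) == 32
```
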
Keren Zhou
90fcb38c7b [BACKEND] Overwrite NVPTX converters for fp16<->fp32 and int16<->int32 to avoid ptxas problems (#1267) 2023-03-01 18:26:06 -08:00
Da Yan
cb7b315a17 [OPTIMIZER] Copying named attributes when converting from Triton to TritonGPU (#1265) 2023-03-01 12:31:46 -08:00
Keren Zhou
be6217cce7 [BACKEND] Improve ptxas error message (#1263) 2023-03-01 00:59:36 +00:00
Keren Zhou
5376fe9443 [FRONTEND] Improve triton hooks (#1256)
Callback interfaces are not changed; this just records more attributes
(i.e., `constants`) and simplifies invocations
2023-02-26 17:16:05 -08:00
Da Yan
0eead250c1 [FRONTEND] add missing tensor/constexpr ops (#1249) 2023-02-24 18:45:22 +00:00
Yan Chunwei
7eecc4d4ad [Frontend] Fix jit cache bug (#1242) 2023-02-23 09:21:30 -08:00
Michaël Benesty
66ddd17e72 [EXAMPLES] remove unnecessary argument (#1243)
Small cleaning of an example calling an old API to display generated IR
2023-02-23 09:15:32 -08:00
Douglas Lehr
729211a404 Ensure __triton_launcher calls right _launch. (#1229)
Per issue https://github.com/openai/triton/issues/1228. I believe we are
potentially exposed when a Triton executor (PyTorch, for example) links
in two or more `triton_.so` shared objects and each has a stub for
`_launch`.

This fix ensures the `_launch` function is tied locally to the calling
`__triton_launcher` and can't be misused by another library.
2023-02-23 00:16:36 +00:00
Philippe Tillet
0ec277efc5 [OPTIMIZER] cleaned, renamed and simplified some optimization passes (#1232)
This shouldn't actually change the behavior of Triton -- only clean things up.
2023-02-22 13:54:55 -08:00
Philippe Tillet
ba0198326e [TESTS] make performance regression testing less strict (#1231) 2023-02-21 22:22:02 -08:00
Mihir Patel
6bef0c2bd6 [FRONTEND] Update path for headers to support Python 3.10 (#1123)
Python 3.10 changes where packages are installed by default, which causes
problems on Ubuntu because installs now go into `/local`. See
[this](https://lists.debian.org/debian-python/2022/03/msg00039.html) and
[this](https://bugs.launchpad.net/ubuntu/+source/python3.10/+bug/1967920).
Triton seems to break under 3.10 because it looks for the headers in
`/local`, but they are not there, e.g. they are at
`/usr/include/python3.X` and not `/usr/local/include/python3.X`.

Not 100% sure what's going on here since it's deep in python / pip, but
I think this should fix it. Otherwise, you have to hack around it in
dockerfiles, e.g. `ENV DEB_PYTHON_INSTALL_LAYOUT=deb`, which breaks
things with the release of pip that went.

---------

Co-authored-by: Keren Zhou <kerenzhou@openai.com>
2023-02-21 21:19:08 -08:00
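
One way to avoid the `/local` header path described in #1123 is to ask `sysconfig` for the include directory under an explicit install scheme. The sketch below assumes the problem is Debian/Ubuntu's `posix_local` scheme and falls back to `posix_prefix`; it is an illustration of the idea, not necessarily the exact change made in the PR, and `python_include_dir` is a hypothetical helper name.

```python
import sysconfig


def python_include_dir() -> str:
    """Return the directory expected to contain Python.h."""
    # get_default_scheme() is public from Python 3.10; older versions only
    # expose the private _get_default_scheme().
    if hasattr(sysconfig, "get_default_scheme"):
        scheme = sysconfig.get_default_scheme()
    else:
        scheme = sysconfig._get_default_scheme()
    # Debian/Ubuntu add a "posix_local" scheme rooted under /usr/local.
    # Fall back to "posix_prefix" so the path is e.g. /usr/include/python3.X.
    if scheme == "posix_local":
        scheme = "posix_prefix"
    return sysconfig.get_paths(scheme=scheme)["include"]


print(python_include_dir())
```
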
Philippe Tillet
174f121c1c [TESTS] Added attention regression tests (#1227) 2023-02-21 20:22:36 -08:00
Eric Wang
320ae18093 [FRONTEND] Add error messages for arange (#1218)
Fix issue https://github.com/openai/triton/issues/244

Check that `end` is greater than `start`.
Check that the range fits in `int32`.
Check that the number of elements is less than or equal to
`TRITON_MAX_TENSOR_NUMEL = 131072`.

---------

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-02-22 00:37:28 +00:00
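
The three `arange` checks listed in #1218 can be sketched in plain Python. This is a rough illustration rather than Triton's implementation: the function name `check_arange`, the int32 bounds test, and the error wording are assumptions; only the constant `TRITON_MAX_TENSOR_NUMEL = 131072` comes from the commit message.

```python
TRITON_MAX_TENSOR_NUMEL = 131072  # value quoted in the commit message
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def check_arange(start: int, end: int) -> int:
    """Hypothetical validator for the three checks; returns the element count."""
    if end <= start:
        raise ValueError("arange's end must be greater than start")
    # Assumption: both endpoints must be representable as int32.
    if start < INT32_MIN or end > INT32_MAX:
        raise ValueError("arange's range must fit in int32")
    numel = end - start
    if numel > TRITON_MAX_TENSOR_NUMEL:
        raise ValueError(f"numel ({numel}) exceeds TRITON_MAX_TENSOR_NUMEL")
    return numel
```

For example, `check_arange(0, 128)` returns `128`, while `check_arange(4, 4)` and `check_arange(0, 2**20)` both raise `ValueError`.
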
Philippe Tillet
307dde9cb5 [CI] revived regression tests (#1225) 2023-02-21 16:33:03 -08:00
Yu Guo
19228d88bc [FRONTEND][BACKEND] add env variable TRITON_LIBDEVICE_PATH (#1166)
We may compile kernels on remote machines that do not have a local
libdevice.10.bc.

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-02-21 20:15:12 +00:00
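
A minimal sketch of how an override such as `TRITON_LIBDEVICE_PATH` (#1166) is typically consumed: read the environment variable and fall back to a bundled default. The default location below is hypothetical, and the commit does not say whether the variable names the file or its directory; this sketch treats it as a file path.

```python
import os

# Hypothetical bundled location; the real default is not stated in the commit.
DEFAULT_LIBDEVICE = os.path.join("third_party", "cuda", "lib", "libdevice.10.bc")


def libdevice_path() -> str:
    """Prefer TRITON_LIBDEVICE_PATH when set, else fall back to the default."""
    return os.environ.get("TRITON_LIBDEVICE_PATH", DEFAULT_LIBDEVICE)
```
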
Philippe Tillet
cdd59eae68 [CI] Added A100 runner; tentative merge queues support (#1224) 2023-02-21 01:37:56 -08:00
Michaël Benesty
940f394a35 [Frontend] fix crash on cast when dest is constexpr (#1222)
This pull request addresses a crash that occurs when casting to a
tl.constexpr type in the frontend.

More info and repro code available in:
https://github.com/openai/triton/issues/1221
2023-02-20 10:50:33 -08:00
Christian Sigg
17795a34ac [NFC] Remove null character (#1220) 2023-02-20 08:50:28 +00:00
BillSchumacher
6b44d31ae4 [BUILD] windows and cmake compatibility. (#1214)
Make cmake happier: it doesn't like multiple `target_link_libraries`
definitions for the same name.

Use find_package instead on Windows for dlfcn-win32.
Set LLVM_SYS_PATH on Windows for python setup.

Debug build almost working; an AlwaysCreate error is still thrown.
2023-02-19 09:51:50 +00:00
Arun A. Kumar
35d1c062b8 [FRONTEND] fix AutoTuner error when OutOfResources (#1208)
Minor bug: AutoTuner currently throws the following error when certain
configs go OutOfResources (e.g. the matmul example when testing on GPUs
with less shared memory).
2023-02-18 07:29:33 +00:00
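
The situation in #1208 (some configs running out of resources during autotuning) is commonly handled by treating such configs as infinitely slow so they are never selected. The sketch below shows that pattern in plain Python; the `OutOfResources` class here is a local stand-in, and `pick_best` and `fake_bench` are hypothetical names, not Triton's autotuner code.

```python
class OutOfResources(Exception):
    """Stand-in for the error raised when a config needs more than the GPU has."""


def pick_best(configs, bench):
    """Time each config; configs that run out of resources get an infinite cost."""
    timings = {}
    for cfg in configs:
        try:
            timings[cfg] = bench(cfg)
        except OutOfResources:
            timings[cfg] = float("inf")
    return min(timings, key=timings.get)


def fake_bench(block_size):
    # Hypothetical benchmark: pretend large blocks exceed shared memory.
    if block_size > 128:
        raise OutOfResources(f"block size {block_size} does not fit")
    return 1.0 / block_size  # bigger blocks are faster in this toy model


print(pick_best([32, 64, 128, 256], fake_bench))  # prints 128
```
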
Philippe Tillet
4d067f5120 [FRONTEND] Now emit an error for tl.reshape, instead of silently calling tl.view (#1212) 2023-02-17 20:21:20 -08:00
Christian Sigg
9ef4b5d773 Rebase to LLVM-head. (#1200)
Rebase to
37b7a60cd7
2023-02-17 13:16:11 -08:00
Philippe Tillet
969331aedd [BUILD] fixed setup.py on older glibc (#1206) 2023-02-16 19:43:18 -08:00
Philippe Tillet
8a4117a0f4 [FRONTEND] launcher module is now renamed from launcher to __triton_launcher (#1201)
Dynamically creating a module named `launcher` may conflict with other
modules of the same name in the user's environment.
2023-02-16 17:28:51 -08:00
Christian Sigg
fc7a8e3581 Rebase Triton to LLVM-15. (#1070)
This PR rebases Triton from LLVM-14 to LLVM-15. Most changes are
mechanical, except for the analysis framework changes.
2023-02-16 06:40:53 -08:00
Horace He
f21e76affe [TUTORIALS] changed for loop to iterate by 1 in matmuls (#1198)
For the new MLIR backend, this appears to increase matmul perf
significantly in many cases.
2023-02-16 03:44:42 +00:00
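
The commit message for #1198 does not spell out the new loop. Assuming it replaces a loop that strides by the block size with a unit-step loop over block indices, the plain-Python sketch below checks that both forms visit the same K offsets. `cdiv`, `K`, and `BLOCK_K` are illustrative names and values.

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling division, used here to count K-blocks."""
    return (a + b - 1) // b


K, BLOCK_K = 1000, 64

# Strided form: the loop variable is the element offset itself.
strided = [k for k in range(0, K, BLOCK_K)]

# Unit-step form: the loop variable counts blocks; the offset is derived.
unit_step = [k * BLOCK_K for k in range(0, cdiv(K, BLOCK_K))]

assert strided == unit_step
```
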