Commit Graph

528 Commits

Author SHA1 Message Date
Berke Kocaoğlu
ba91f39dbf [DOC] Fix syntax errors, typos, formatting; increase consistency (#1357)
This PR:
- Fixes syntax errors, e.g. `.type values: dict[str,
Callable[[list[Any]], Any]]` becomes `:type values: dict[str,
Callable[[list[Any]], Any]]`,
- Fixes typos,
- Fixes formatting, e.g. `k ++` becomes `k++`,
- Increases consistency (e.g. by transforming the minority `cd dir/` to
the majority `cd dir`).
2023-03-16 15:32:02 -07:00
mcskatkat
53e8e04d6e [FRONTEND] fix constexpr by annotation (#1352)
Fixed unjustified `TypeError` raised when arg is (strangely) annotated
with a non-type
2023-03-16 11:10:19 -07:00
mcskatkat
f5d22d5995 [FRONTEND] support f-strings in compiler with constexpr conversion (#1349)
This addition allows explanatory messages upon assertion failures:

```python
@triton.jit
def my_single_block_kernel(
    matrix_extent: tl.constexpr,
    block_size: tl.constexpr,      # must be >= extent (single block)
    matrix: Tensor,
    ...
):
    tl.static_assert(matrix_extent <= block_size, 
                     f"`matrix_extent` should not be more than the block size ({block_size}), but is {matrix_extent}")
```

Yielding, when called incorrectly:
```
AssertionError: `matrix_extent` should not be more than the block size (32), but is 57
```
2023-03-16 08:02:10 +00:00
Shintaro Iwasaki
4b774ee4d0 [OPS/BLOCKSPARSE] remove unnecessary mask (#1351)
This PR applies a minor patch that removes unnecessary masks in
`_dsd_kernel()`.

### Details

`offs_bn` is defined as follows and not updated after that.
```py
offs_bn = pid_m * TILE_N + tl.arange(0, TILE_N)
offs_bn = tl.max_contiguous(tl.multiple_of(offs_bn % DS0, TILE_N), TILE_N)
```

Because `offs_bn` is already reduced modulo `DS0`, the mask in the following load is always `True`.
```py
b = tl.load(pb, mask=offs_bn[None, :] < DS0)
```
This PR removes this mask (as well as explicit `mask=True`).
2023-03-15 19:06:38 -07:00
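The argument for dropping the mask can be checked outside Triton. The following is a minimal plain-Python sketch, assuming a positive `DS0` and non-negative offsets; the concrete values of `TILE_N`, `DS0` and `pid_m` are invented for illustration and the compiler hints (`tl.max_contiguous`, `tl.multiple_of`) are left out because they do not change the values.

```python
# Illustrative values only; they are not taken from the kernel.
TILE_N = 32
DS0 = 100
pid_m = 5

# Same offset arithmetic as in the kernel, without the compiler hints.
offs_bn = [(pid_m * TILE_N + i) % DS0 for i in range(TILE_N)]

# After the modulo every offset lies in [0, DS0), so each mask entry is True.
mask = [off < DS0 for off in offs_bn]
print(all(mask))
```

Since every entry is `True`, passing the mask to the load changes nothing, which is presumably why it could be removed safely.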
mcskatkat
c175473bbf [FRONTEND] In JITFunction: infer constexpr arg only if annotated as such (#1345)
Fixed `JITFunction.__init__` to mark args as constexpr only when the
annotation is actually `tl.constexpr`, rather than treating any
annotated arg as constexpr.
2023-03-15 16:39:45 -07:00
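The behaviour described in this commit can be sketched with the standard `inspect` module. This is not the actual `JITFunction.__init__` code: the `constexpr` class below is a hypothetical stand-in for `tl.constexpr`, and `constexpr_indices` is an illustrative helper name.

```python
import inspect


class constexpr:
    """Hypothetical stand-in for `tl.constexpr`."""


def constexpr_indices(fn):
    """Return the positions of parameters annotated exactly as `constexpr`."""
    params = inspect.signature(fn).parameters.values()
    # Only an annotation that *is* `constexpr` counts; `int`, `str`, or a
    # missing annotation must not mark the argument as constexpr.
    return [i for i, p in enumerate(params) if p.annotation is constexpr]


def kernel(x_ptr, n: int, BLOCK: constexpr):
    pass


print(constexpr_indices(kernel))
```

Here `n` is annotated but not as `constexpr`, so only `BLOCK` is selected.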
Stonepia
109b5e2729 [BUILD] Fix the build bug when user use system package of llvm by setting LLVM_SYSPATH (#1336)
When the user sets `LLVM_SYSPATH` to use a custom-built LLVM, the build
throws an error because there is no `version.txt` under the custom build.

This PR skips the version check if `LLVM_SYSPATH` is set.

---------

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-03-15 13:28:19 -07:00
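The decision described above comes down to one environment lookup. A minimal sketch, assuming that "set" means the variable is present in the environment; `needs_version_check` is a hypothetical helper name and not a function from Triton's `setup.py`.

```python
import os


def needs_version_check(env=None):
    """Return True when the downloaded-package version check should run.

    Hypothetical helper: a user-provided LLVM (LLVM_SYSPATH set) has no
    version.txt, so the check is skipped in that case.
    """
    env = os.environ if env is None else env
    return "LLVM_SYSPATH" not in env


print(needs_version_check({}))
print(needs_version_check({"LLVM_SYSPATH": "/opt/llvm"}))
```

Passing the environment as an argument keeps the helper easy to exercise without modifying the real process environment.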
Philippe Tillet
56b23f433d [TEST] Temporarily disable test_dot mode that fails because of ptxas/nvptx (#1344) 2023-03-15 01:17:48 -07:00
peterbell10
01b177afe7 [FRONTEND] Mangle signed and unsigned integer types differently (#1340)
This is cherry-picked from #1305

If you call a `JITFunction` twice in the same kernel, first with `int32`
then with `uint32`, the second call will treat the unsigned value as
signed. This passes through MLIR without error because MLIR uses the
same types for both, but different operation calls will be generated so
you may silently get the wrong result.
2023-03-14 22:29:18 -07:00
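To illustrate why signedness has to be part of the mangled name, here is a small sketch. The naming scheme (`i32` / `u32` suffixes joined with underscores) and both helper functions are hypothetical; Triton's real mangling may look different.

```python
def mangle_type(bitwidth, signed):
    """Encode an integer type; the prefix distinguishes signed from unsigned."""
    prefix = "i" if signed else "u"
    return f"{prefix}{bitwidth}"


def mangle_fn(name, arg_types):
    """Build a specialization key from a function name and its argument types."""
    return name + "__" + "_".join(mangle_type(b, s) for b, s in arg_types)


signed_key = mangle_fn("add", [(32, True), (32, True)])
unsigned_key = mangle_fn("add", [(32, False), (32, False)])

# Different keys mean the two calls get separate specializations instead of
# the second call silently reusing the signed version.
print(signed_key, unsigned_key, signed_key != unsigned_key)
```

If the prefix were the same for both, the two calls would share one key, which is the failure mode the commit message describes.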
Philippe Tillet
ad81447ad0 [FRONTEND] Marking int1 (bool) type as unsigned (#1343) 2023-03-14 22:05:13 -07:00
Philippe Tillet
6a8634e2a7 [BACKEND] No longer running LLVM-IR optimizations after codegen. (#1338)
This triggered some outrageous bugs. See #1337.
2023-03-13 22:50:15 -07:00
Philippe Tillet
dde34904d0 [TESTING] triton.testing.allclose now uses torch.allclose (#1333) 2023-03-13 17:48:32 -07:00
Nikita Shulga
663074460d [VERSION] Update triton/__init__.py (#1327)
Followup after
c7581c9a91
2023-03-13 10:38:38 -07:00
Philippe Tillet
9b7c65a3a9 [BACKEND][OPTIMIZER] Refactor MMAv1 codegen (#1322)
- Significant simplification of the optimizer pipeline. The right MMA
version is now set directly after the coalescing pass. DotOperand layouts
no longer hold state for the `isRow` argument, and instead query it from
their parent.
- Moved a bunch of things from TritonGPUToLLVM/DotOpHelpers to
TritonGPUAttrDefs. All MMAv1 state is now queried from attributes.
- Logic for getElemsPerThread is no longer duplicated in TypeConverter.
2023-03-12 19:54:38 -07:00
Yu Guo
ef55ccfed0 [TESTING] fix get_max_simd_tflops (#1318)
`_triton.runtime.num_sm`, `_triton.runtime.clock_rate`, and
`_triton.runtime.cc` no longer seem to exist.

Use the corresponding methods from `get_max_tensorcore_tflops` in the
same file instead.
2023-03-11 10:07:25 -08:00
Philippe Tillet
5a786cf778 [FRONTEND] Fixed contains_return_op behavior (#1317) 2023-03-10 23:58:28 -08:00
Philippe Tillet
3fe3adbcde [FRONTEND][BACKEND] Add support for float8e5m2 type (#1314) 2023-03-10 19:14:47 -08:00
Luo Yihang
9626c8e944 [DOC] Fix typos in comments (#1311)
Fixed several typos in `python/triton/runtime/autotuner.py`
2023-03-10 09:33:24 -08:00
Keren Zhou
8b25c30d39 [BACKEND] Fix bfloat16 flash attention (#1306)
See https://github.com/openai/triton/issues/1245 for more detailed
information

---------

Co-authored-by: giorgio-arena <arena.cpp@gmail.com>
2023-03-09 21:14:52 -08:00
Sophia Wisdom
a4a824a3c9 [FRONTEND] Correct error message (#1308) 2023-03-09 21:14:11 -08:00
Da Yan
902c61affb [BACKEND] Add arith::SelectOp => LLVM::SelectOp conversion (#1307) 2023-03-09 09:35:30 -08:00
Keren Zhou
78b311f6e2 [FRONTEND] Fix cast when both src_ty and dst_ty are of block_type (#1301)
Commonly used in atomic_rmw ops
2023-03-08 09:25:00 -08:00
shunting314
f5c9f9b4b5 [FRONTEND] Expose the register usage and spill information thru CompiledKernel (#1296) 2023-03-08 01:30:31 +00:00
Phil Tillet
773c29cfaa [BUILD] Fix comment typo 2023-03-07 16:47:30 -08:00
Phil Tillet
305f99e614 [BUILD] Fixed typo in setup.py 2023-03-07 15:45:36 -08:00
Philippe Tillet
c34b32866b [BUILD] re-download package if version has changed (#1294) 2023-03-07 10:15:35 -08:00
JiCheng
849a40baad [FRONTEND] Add check for the axis of reduction op (#1268) 2023-03-06 22:11:43 -08:00
Philippe Tillet
3db55c5f94 [OPTIMIZER][BACKEND] Some backend and optimization passes clean-up (#1284)
* Cleaned up pipeline pass. Now works when there are element-wise ops
between the load and the dot
* Made `splat` compatible with variables that have DotOperandLayout
* Moved rematerialization utils to a separate Transforms/Utility.cpp file.
2023-03-06 17:17:59 -08:00
Keren Zhou
4731f300d3 [BACKEND] Mask out wrapped threads in store ops (#1283) 2023-03-06 14:50:20 -08:00
Alexander Zinoviev
5e92a66267 [DOC] Fix a typo in where's description (#1286)
Co-authored-by: Alexander Zinoviev <zinoviev@google.com>
2023-03-06 14:38:03 -08:00
Philippe Tillet
ff94e34430 [TESTS][BUILD] now using llvm @ 8e5a41e8271f (#1282)
Now we also use the FileTest utility packaged with llvm pre-built binaries
2023-03-05 17:23:00 -08:00
Keren Zhou
d376020f90 [FRONTEND][BACKEND] Implement tl.device_assert and rename tl.printf to tl.device_print (#1143)
Note that `tl.device_print` and `print` accept different arguments than
the normal `print`. The first argument must be a string, followed by
variables.

Device side:

- `tl.device_print`
- `tl.device_assert`
- `print`
- `assert`

Compilation time:

- `tl.static_assert`
- `tl.static_print`

Usage example:

1.
```Python
tl.device_assert(x == 0, "x != 0")
```

Output:

```
...
python/test/unit/language/assert_helper.py:18: kernel: block: [0,0,0], thread: [33,0,0] Assertion `x != 0` failed.
...
```

2.
```Python
tl.device_print("hello ", x)
```

Output:

```
...
hello 1
...
```

The environment variable `TRITON_DEBUG` sets the default debugging flag; if it is not true, `tl.device_assert` and `assert` are skipped.
2023-03-04 08:08:29 -08:00
Keren Zhou
77c145cec8 [BUILD] Bump cmake requirement to >= 3.20 and format CMakeLists.txt (#1276)
cc @malfet
2023-03-03 11:43:09 -08:00
Phil Tillet
c7581c9a91 [PACKAGING] bump dev version to 2.1.0 2023-03-02 21:52:30 -08:00
Keren Zhou
65e5a3bc24 [FRONTEND] Improve tl.full to accept both static and dynamic values (#1269) 2023-03-02 12:19:54 -08:00
Phil Tillet
2660c814c9 [FRONTEND] for loop negative step hotfix 2023-03-01 23:45:03 -08:00
Philippe Tillet
fa0fbc937f [FRONTEND][BACKEND][OPTIMIZER] Loops now use 64-bit indices when necessary (#1261)
* Frontend:
  - `int` kernel arguments are always signed
  - Loop induction variable is now determined by integer promotion on
    lb/ub/step
* Optimizer:
  - Added new ExtractSliceOp that enforces 32-bit offsets
* Backend:
  - Use 64-bit indices when lowering functions and control flow
  - Removed `idx_val` macro and replaced it with `i32_val`
  - Cleaned up comments
  - Added new ArithToIndex pass to make sure operations on indices are
    done with the `index` dialect, which gets converted to LLVM separately
    using a 64-bit target
2023-03-01 23:09:48 -08:00
Keren Zhou
90fcb38c7b [BACKEND] Overwrite NVPTX converters for fp16<->fp32 and int16<->int32 to avoid ptxas problems (#1267) 2023-03-01 18:26:06 -08:00
Da Yan
cb7b315a17 [OPTIMIZER] Copying named attributes when converting from Triton to TritonGPU (#1265) 2023-03-01 12:31:46 -08:00
Keren Zhou
be6217cce7 [BACKEND] Improve ptxas error message (#1263) 2023-03-01 00:59:36 +00:00
Keren Zhou
5376fe9443 [FRONTEND] Improve triton hooks (#1256)
Callback interfaces are unchanged; the hooks now record more attributes
(i.e., `constants`) and invocations are simplified
2023-02-26 17:16:05 -08:00
Da Yan
0eead250c1 [FRONTEND] add missing tensor/constexpr ops (#1249) 2023-02-24 18:45:22 +00:00
Yan Chunwei
7eecc4d4ad [Frontend] Fix jit cache bug (#1242) 2023-02-23 09:21:30 -08:00
Michaël Benesty
66ddd17e72 [EXAMPLES] remove unnecessary argument (#1243)
Small cleaning of an example calling an old API to display generated IR
2023-02-23 09:15:32 -08:00
Douglas Lehr
729211a404 Ensure __triton_launcher calls right _launch. (#1229)
Per issue https://github.com/openai/triton/issues/1228. I believe we are
potentially exposed when a Triton executor (PyTorch, for example) links
in two or more `triton_.so` shared objects and each has a stub for
`_launch`.

This fix ensures the `_launch` function is tied locally to the calling
`__triton_launcher` and can't be misused by another library.
2023-02-23 00:16:36 +00:00
Philippe Tillet
0ec277efc5 [OPTIMIZER] cleaned, renamed and simplified some optimization passes (#1232)
This shouldn't actually change the behavior of Triton -- only clean things up.
2023-02-22 13:54:55 -08:00
Philippe Tillet
ba0198326e [TESTS] make performance regression testing less strict (#1231) 2023-02-21 22:22:02 -08:00
Mihir Patel
6bef0c2bd6 [FRONTEND] Update path for headers to support Python 3.10 (#1123)
Python 3.10 changes where packages are installed by default, which causes
problems on Ubuntu because the default install prefix moves into `/local`. See
[this](https://lists.debian.org/debian-python/2022/03/msg00039.html) and
[this](https://bugs.launchpad.net/ubuntu/+source/python3.10/+bug/1967920).
Triton seems to break when using 3.10 because it looks for the headers under
`/local`, but they are not there: they are at
`/usr/include/python3.X` and not `/usr/local/include/python3.X`.


Not 100% sure what's going on here since it's deep in python / pip, but
I think this should fix it. Otherwise, you have to hack around it in
dockerfiles, e.g. `ENV DEB_PYTHON_INSTALL_LAYOUT=deb`, which breaks
things with the release of pip that went.

---------

Co-authored-by: Keren Zhou <kerenzhou@openai.com>
2023-02-21 21:19:08 -08:00
Philippe Tillet
174f121c1c [TESTS] Added attention regression tests (#1227) 2023-02-21 20:22:36 -08:00
Eric Wang
320ae18093 [FRONTEND] Add error messages for arange (#1218)
Fix issue https://github.com/openai/triton/issues/244

- Check that `end` is greater than `start`.
- Check that the range fits in `int32`.
- Check that the number of elements is less than or equal to
`TRITON_MAX_TENSOR_NUMEL = 131072`.

---------

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-02-22 00:37:28 +00:00
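The three checks listed in this commit could look roughly like the sketch below. It is an illustration, not Triton's implementation: `check_arange` is a hypothetical helper, the error messages are invented, and "fits in `int32`" is assumed to mean that every value in `[start, end)` is representable as a signed 32-bit integer.

```python
TRITON_MAX_TENSOR_NUMEL = 131072  # limit quoted in the commit message
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def check_arange(start, end):
    """Validate arange bounds and return the number of elements."""
    if end <= start:
        raise ValueError("arange's end must be greater than its start")
    # Assumption: all values start..end-1 must be valid signed 32-bit ints.
    if start < INT32_MIN or end - 1 > INT32_MAX:
        raise ValueError("arange's range must fit in int32")
    numel = end - start
    if numel > TRITON_MAX_TENSOR_NUMEL:
        raise ValueError("arange produces too many elements")
    return numel


print(check_arange(0, 128))
```

Calls such as `check_arange(5, 5)` or `check_arange(0, 131073)` raise `ValueError` in this sketch, one for each of the first and third checks.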
Philippe Tillet
307dde9cb5 [CI] revived regression tests (#1225) 2023-02-21 16:33:03 -08:00