Commit Graph

732 Commits

Author SHA1 Message Date
Phil Tillet
305f99e614 [BUILD] Fixed typo in setup.py 2023-03-07 15:45:36 -08:00
Philippe Tillet
c34b32866b [BUILD] re-download package if version has changed (#1294) 2023-03-07 10:15:35 -08:00
Philippe Tillet
53c216f506 [TESTS] Now using downloaded FileCheck directly without copying it (#1293) 2023-03-07 09:11:08 -08:00
JiCheng
849a40baad [FRONTEND] Add check for the axis of reduction op (#1268) 2023-03-06 22:11:43 -08:00
Da Yan
6249f5d923 [OPTIMIZER] checking loads' masks' alignment info in the pipeline pass (#1289)
Co-authored-by: Keren Zhou <kerenzhou@openai.com>
2023-03-06 18:37:26 -08:00
Philippe Tillet
3db55c5f94 [OPTIMIZER][BACKEND] Some backend and optimization passes clean-up (#1284)
* Cleaned up pipeline pass. Now works when there are element-wise ops
between the load and the dot
* Made `splat` compatible with variables that have DotOperandLayout
* Moved rematerialization utils to separate Transforms/Utility.cpp file.
2023-03-06 17:17:59 -08:00
Keren Zhou
73d55eb59c [OPTIMIZER] Check if the definition op is NULL in GPU combine passes (#1288) 2023-03-06 16:19:13 -08:00
Keren Zhou
4731f300d3 [BACKEND] Mask out wrapped threads in store ops (#1283) 2023-03-06 14:50:20 -08:00
Alexander Zinoviev
5e92a66267 [DOC] Fix a typo in where's description (#1286)
Co-authored-by: Alexander Zinoviev <zinoviev@google.com>
2023-03-06 14:38:03 -08:00
Philippe Tillet
ff94e34430 [TESTS][BUILD] now using llvm @ 8e5a41e8271f (#1282)
Now we also use the FileCheck utility packaged with llvm pre-built binaries
2023-03-05 17:23:00 -08:00
Philippe Tillet
0f82fac60e [BACKEND] move struct packing/unpacking to type converter and give a more explicit name (#1281)
This is the first of a series of PRs meant to clean up how the backend
handles the codegen for dot operand layouts
2023-03-05 16:04:29 -08:00
Keren Zhou
d376020f90 [FRONTEND][BACKEND] Implement tl.device_assert and rename tl.printf to tl.device_print (#1143)
Note that `tl.device_print` and `print` accept different arguments than
the normal `print`. The first argument must be a string, followed by
variables.

Device side:

- `tl.device_print`
- `tl.device_assert`
- `print`
- `assert`

Compilation time:

- `tl.static_assert`
- `tl.static_print`

Usage example:

1.
```Python
tl.device_assert(x == 0, "x != 0")
```

Output:

```
...
python/test/unit/language/assert_helper.py:18: kernel: block: [0,0,0], thread: [33,0,0] Assertion `x != 0` failed.
...
```

2.
```Python
tl.device_print("hello ", x)
```

Output:

```
...
hello 1
...
```

The environment variable `TRITON_DEBUG` sets the default debugging flag; if it's false, `tl.device_assert` and `assert` are skipped.
2023-03-04 08:08:29 -08:00
Keren Zhou
77c145cec8 [BUILD] Bump cmake requirement to >= 3.20 and format CMakeLists.txt (#1276)
cc @malfet
2023-03-03 11:43:09 -08:00
Li Dong
e6edcdfd28 [README] fix typo (#1274) 2023-03-03 08:01:18 -08:00
Phil Tillet
c7581c9a91 [PACKAGING] bump dev version to 2.1.0 2023-03-02 21:52:30 -08:00
Phil Tillet
2cb7da7259 [DOCS] Light edits to README 2023-03-02 21:52:03 -08:00
Keren Zhou
65e5a3bc24 [FRONTEND] Improve tl.full to accept both static and dynamic values (#1269) 2023-03-02 12:19:54 -08:00
Keren Zhou
d54745538b [BACKEND][CI] Disable most backward rematerialization through load/store (#1260) 2023-03-02 09:45:50 -08:00
Phil Tillet
2660c814c9 [FRONTEND] for loop negative step hotfix 2023-03-01 23:45:03 -08:00
Philippe Tillet
fa0fbc937f [FRONTEND][BACKEND][OPTIMIZER] Loops now use 64-bit indices when necessary (#1261)
* Frontend:
  - `int` kernel arguments are always signed
  - Loop induction variable is now determined by integer promotion on
    lb/ub/step
* Optimizer:
  - Added new ExtractSliceOp that enforces 32-bit offsets
* Backend:
  - Use 64-bit indices when lowering functions and control flow
  - Removed `idx_val` macro and replaced it with `i32_val`
  - Cleaned up comments
  - Added new ArithToIndex pass to make sure operations on indices are
    done with the `index` dialect, which gets converted to LLVM separately
    using a 64-bit target
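
As a rough illustration of the frontend rule above (not code from this PR), the induction variable's width can be thought of as the result of promoting the types of lb, ub and step. The helper name `induction_var_bitwidth` is hypothetical, and the sketch assumes all three operands are signed integers, so promotion reduces to taking the widest one:

```python
def induction_var_bitwidth(lb_bits: int, ub_bits: int, step_bits: int) -> int:
    """Hypothetical helper: width of the loop induction variable in bits."""
    # All operands are treated as signed here, so promotion reduces to
    # picking the widest of the three.
    return max(lb_bits, ub_bits, step_bits)


# A 64-bit upper bound forces a 64-bit induction variable.
print(induction_var_bitwidth(32, 64, 32))  # 64
print(induction_var_bitwidth(32, 32, 32))  # 32
```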
2023-03-01 23:09:48 -08:00
Keren Zhou
90fcb38c7b [BACKEND] Overwrite NVPTX converters for fp16<->fp32 and int16<->int32 to avoid ptxas problems (#1267) 2023-03-01 18:26:06 -08:00
Da Yan
cb7b315a17 [OPTIMIZER] Copying named attributes when converting from Triton to TritonGPU (#1265) 2023-03-01 12:31:46 -08:00
Keren Zhou
be6217cce7 [BACKEND] Improve ptxas error message (#1263) 2023-03-01 00:59:36 +00:00
Philippe Tillet
f8c92c3d17 [ANALYSIS] initializing operand axisinfo state (if necessary) before visiting operation (#1259)
Feels very wrong, and probably not the right way to do this. But
otherwise `scf.if` doesn't get initialized since the merge to llvm-head.
Suggestions are welcome 😅
2023-02-27 19:32:19 -08:00
Keren Zhou
5376fe9443 [FRONTEND] Improve triton hooks (#1256)
Callback interfaces are unchanged; the hooks just record more attributes
(e.g., `constants`) and invocations are simplified
2023-02-26 17:16:05 -08:00
Keren Zhou
eab16d7aa7 [CI] Various fixes for the Torchinductor CI (#1251) 2023-02-24 11:49:14 -08:00
Da Yan
0eead250c1 [FRONTEND] add missing tensor/constexpr ops (#1249) 2023-02-24 18:45:22 +00:00
Phil Tillet
823072146f [CI] .github/workflows/torchinductor/run-tests.yml -> .github/workflows/torch-inductor-tests.yml 2023-02-23 15:55:07 -08:00
Keren Zhou
5b0b89d45b [CI] Init torchinductor correctness and performance tests (#1241) 2023-02-23 13:53:41 -08:00
Yan Chunwei
7eecc4d4ad [Frontend] Fix jit cache bug (#1242) 2023-02-23 09:21:30 -08:00
Michaël Benesty
66ddd17e72 [EXAMPLES] remove unnecessary argument (#1243)
Small cleaning of an example calling an old API to display generated IR
2023-02-23 09:15:32 -08:00
Stonepia
a38d2defb8 [Build] Strip the static libraries symbol from the triton shared library (#1240)
This is to solve https://github.com/openai/triton/issues/1236

This commit hides the symbols of the shared libraries for
`libtriton.so`, so that when other objects link against `libtriton.so`,
they won't conflict.
2023-02-22 23:39:31 -08:00
Chenggang Zhao
b5efa91e2a [Backend] Fix a bug in swizzling store (#1235)
The function calculates the swizzled address to **store** (not load), so
we should use `outOrder` instead of `inOrder`. Current tests do not
cover this case, but at NVIDIA, we have a case related to `sm_90` that
could trigger. Already discussed in the Slack channel with @Jokeren.
2023-02-22 19:13:21 -08:00
Douglas Lehr
729211a404 Ensure __triton_launcher calls right _launch. (#1229)
Per issue https://github.com/openai/triton/issues/1228. I believe we are
potentially exposed when a Triton executor (PyTorch for example) links
in two or more `triton_.so` shared objects and each has a stub for
`_launch`.

This fix ensures the `_launch` function is tied locally to the calling
`__triton_launcher` and can't be misused by another library.
2023-02-23 00:16:36 +00:00
Keren Zhou
6a9316e69a [BACKEND] Clean up SCF -> CF conversion (#1234) 2023-02-22 23:49:47 +00:00
Philippe Tillet
0ec277efc5 [OPTIMIZER] cleaned, renamed and simplified some optimization passes (#1232)
This shouldn't actually change the behavior of Triton -- only clean things up.
2023-02-22 13:54:55 -08:00
Philippe Tillet
ba0198326e [TESTS] make performance regression testing less strict (#1231) 2023-02-21 22:22:02 -08:00
Mihir Patel
6bef0c2bd6 [FRONTEND] Update path for headers to support Python 3.10 (#1123)
Python 3.10 changes where packages are installed by default, which causes
problems on Ubuntu, where the default location moves under `/local`. See
[this](https://lists.debian.org/debian-python/2022/03/msg00039.html) and
[this](https://bugs.launchpad.net/ubuntu/+source/python3.10/+bug/1967920).
Triton seems to break when using 3.10 because it looks for the headers under
`/local`, but the headers are not there, e.g. they are at
`/usr/include/python3.X` and not `/usr/local/include/python3.X`.


Not 100% sure what's going on here since it's deep in python / pip, but
I think this should fix it. Otherwise, you have to hack around it in
dockerfiles, e.g. `ENV DEB_PYTHON_INSTALL_LAYOUT=deb`, which breaks
things with the release of pip that went.
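
For reference, the header directory that Python reports can be inspected with the standard `sysconfig` module. The snippet below is a minimal sketch, not the code from this PR: the helper name `python_include_dir` is hypothetical, and falling back from the Debian/Ubuntu `posix_local` scheme to `posix_prefix` is an assumption about one way to avoid the `/local` prefix.

```python
import sysconfig


def python_include_dir() -> str:
    """Hypothetical helper: directory expected to contain Python.h."""
    # get_default_scheme() is only public from Python 3.10 onwards.
    get_scheme = getattr(sysconfig, "get_default_scheme", None)
    scheme = get_scheme() if get_scheme else None
    # Assumption: treat the Debian/Ubuntu "posix_local" scheme like
    # "posix_prefix" so the path does not point under /usr/local.
    if scheme == "posix_local":
        scheme = "posix_prefix"
    if scheme is None:
        return sysconfig.get_paths()["include"]
    return sysconfig.get_paths(scheme=scheme)["include"]


print(python_include_dir())
```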

---------

Co-authored-by: Keren Zhou <kerenzhou@openai.com>
2023-02-21 21:19:08 -08:00
Philippe Tillet
174f121c1c [TESTS] Added attention regression tests (#1227) 2023-02-21 20:22:36 -08:00
Phil Tillet
0192ab2178 Revert "[CI] Now only running CI when checks are requested in merge groups"
This reverts commit d023e1cb06.
2023-02-21 16:39:47 -08:00
Eric Wang
320ae18093 [FRONTEND] Add error messages for arange (#1218)
Fix issue https://github.com/openai/triton/issues/244

Check that `end` is greater than `start`.
Check that the range fits in `int32`.
Check that the number of elements is less than or equal to
`TRITON_MAX_TENSOR_NUMEL = 131072`.
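
The three checks can be sketched in plain Python as follows. This is an illustration only, not the code from this PR: `validate_arange` is a hypothetical helper, and the error messages and the way the `int32` bound is tested are assumptions.

```python
TRITON_MAX_TENSOR_NUMEL = 131072  # limit quoted in the commit message
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def validate_arange(start: int, end: int) -> int:
    """Hypothetical helper: return the element count or raise ValueError."""
    if end <= start:
        raise ValueError("arange: end must be greater than start")
    # Assumption: "fits in int32" means both endpoints are int32 values.
    if not (INT32_MIN <= start <= INT32_MAX and INT32_MIN <= end <= INT32_MAX):
        raise ValueError("arange: range must fit in int32")
    numel = end - start
    if numel > TRITON_MAX_TENSOR_NUMEL:
        raise ValueError(
            f"arange: number of elements must not exceed {TRITON_MAX_TENSOR_NUMEL}"
        )
    return numel
```

For example, `validate_arange(0, 128)` returns `128`, while `validate_arange(8, 8)` raises `ValueError`.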

---------

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-02-22 00:37:28 +00:00
Phil Tillet
d023e1cb06 [CI] Now only running CI when checks are requested in merge groups 2023-02-21 16:34:25 -08:00
Philippe Tillet
307dde9cb5 [CI] revived regression tests (#1225) 2023-02-21 16:33:03 -08:00
Yu Guo
19228d88bc [FRONTEND][BACKEND] add env variable TRITON_LIBDEVICE_PATH (#1166)
We may compile kernels on remote machines that do not have a local
libdevice.10.bc.
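
A minimal sketch of how such an override might be consumed, assuming the variable simply replaces a built-in default. The helper name `libdevice_path` and the default path below are hypothetical and are not taken from this PR:

```python
import os

# Hypothetical default; the real bundled location is not stated in this commit.
DEFAULT_LIBDEVICE_PATH = "/path/to/bundled/libdevice.10.bc"


def libdevice_path(default: str = DEFAULT_LIBDEVICE_PATH) -> str:
    """Hypothetical helper: resolve the libdevice bitcode file to use."""
    # An explicit TRITON_LIBDEVICE_PATH wins; otherwise use the default.
    return os.environ.get("TRITON_LIBDEVICE_PATH", default)
```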

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-02-21 20:15:12 +00:00
Philippe Tillet
cdd59eae68 [CI] Added A100 runner; tentative merge queues support (#1224) 2023-02-21 01:37:56 -08:00
Michaël Benesty
940f394a35 [Frontend] fix crash on cast when dest is constexpr (#1222)
This pull request addresses a crash that occurs when casting to a
tl.constexpr type in the frontend.

More info and repro code available in:
https://github.com/openai/triton/issues/1221
2023-02-20 10:50:33 -08:00
Christian Sigg
17795a34ac [NFC] Remove null character (#1220) 2023-02-20 08:50:28 +00:00
Keren Zhou
123c687ed9 [BACKEND] Rewrite Membar to fit the CF dialect (#1213) 2023-02-19 14:54:33 -08:00
BillSchumacher
6b44d31ae4 [BUILD] windows and cmake compatibility. (#1214)
Make cmake happier: it doesn't like multiple target_link_libraries
definitions for the same name.

Use find_package instead on Windows for dlfcn-win32.
Set LLVM_SYS_PATH on Windows for python setup.

Debug build is almost working; an AlwaysCreate error is still thrown.
2023-02-19 09:51:50 +00:00
Philippe Tillet
c1194bd237 [OPTIMIZER] Refined side-effect traits (#1216) 2023-02-19 01:21:19 -08:00