Commit Graph

167 Commits

Author SHA1 Message Date
Rohit Santhanam
cd9ae1cd36 Merge remote-tracking branch 'upstream/main' into triton-mlir-IFU-02232023 2023-02-23 21:41:54 +00:00
rsanthanam-amd
e7f84448bf Merge pull request #127 from dfukalov/dfukalov/work-3
[ROCM] Enable float16 and int8 types for FMA based `dot` implementation.
2023-02-22 16:39:04 -06:00
Daniil Fukalov
2d678efb89 [ROCM] Enable float16 and int8 types for FMA based dot implementation.
By default, Triton generates MLIR with an f32 result for the tt.dot operation
on f16-typed operands, so we have "tt.dot(f16,f16,f32)->f32" types in the
.ttgir. But the LLVM FMA instruction requires the same type for all three
operands, so the first two operands are implicitly cast f16->f32 as
"unrealized_conversion_cast struct{f16,f16,...}->struct{f32,f32}".
This change fixes incorrect generation of that implicit cast.
For int8-typed operands, the result operand is also cast after the dot is performed.

As a next step to improve the FMA-based dot operation, target-specific FMA
intrinsics for f16 and int8 (e.g. fma(f16,f16,f16)->f16) could be used,
perhaps as an option.
2023-02-22 22:36:20 +01:00
Eric Wang
320ae18093 [FRONTEND] Add error messages for arange (#1218)
Fix issue https://github.com/openai/triton/issues/244

Check that `end` is greater than `start`.
Check that the range fits in `int32`.
Check that the number of elements is less than or equal to
`TRITON_MAX_TENSOR_NUMEL = 131072`.
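A minimal sketch of the three checks described above (the function name `validate_arange` is illustrative, not Triton's actual frontend code):

```python
TRITON_MAX_TENSOR_NUMEL = 131072

def validate_arange(start: int, end: int) -> None:
    # end must be strictly greater than start
    if end <= start:
        raise ValueError("arange: end must be greater than start")
    # both bounds must fit in int32
    if not (-2**31 <= start < 2**31 and -2**31 <= end < 2**31):
        raise ValueError("arange: start and end must fit in int32")
    # the element count may not exceed the maximum tensor size
    if end - start > TRITON_MAX_TENSOR_NUMEL:
        raise ValueError("arange: too many elements")
```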

---------

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-02-22 00:37:28 +00:00
Yu Guo
19228d88bc [FRONTEND][BACKEND] add env variable TRITON_LIBDEVICE_PATH (#1166)
We may compile kernels on remote machines that do not have a local
libdevice.10.bc.
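A sketch of the lookup this variable enables; the fallback path here is illustrative, not Triton's actual default:

```python
import os

def libdevice_path(default="/usr/local/cuda/nvvm/libdevice/libdevice.10.bc"):
    # prefer an explicit override so remote build machines without a
    # local CUDA install can point at a copied libdevice.10.bc
    return os.environ.get("TRITON_LIBDEVICE_PATH", default)
```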

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-02-21 20:15:12 +00:00
Michaël Benesty
940f394a35 [Frontend] fix crash on cast when dest is constexpr (#1222)
This pull request addresses a crash that occurs when casting to a
tl.constexpr type in the frontend.

More info and repro code available in:
https://github.com/openai/triton/issues/1221
2023-02-20 10:50:33 -08:00
Rohit Santhanam
841784d1e3 Merge remote-tracking branch 'upstream/main' into upgrade_triton_mlir_rocm_to_llvm_head 2023-02-18 09:25:20 +00:00
Philippe Tillet
4d067f5120 [FRONTEND] Now emit an error for tl.reshape, instead of silently calling tl.view (#1212) 2023-02-17 20:21:20 -08:00
Eric Wang
30db959dae [FRONTEND] Add error messages for load/store (#1179)
Fix issue https://github.com/openai/triton/issues/633
2023-02-13 10:52:50 -05:00
Rohit Santhanam
a2416e0901 Merge remote-tracking branch 'upstream/main' into triton-mlir-IFU-02112023 2023-02-11 14:48:19 +00:00
Philippe Tillet
2aba985daa [OPTIMIZER] Improved layout simplifications heuristics (#1168) 2023-02-09 20:17:25 -08:00
fdrocha
972b761390 [FRONTEND] For __rshift__ operator, use arithmetic right shift if dtype is a signed int. (#1153) 2023-02-06 10:26:17 +00:00
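The distinction this commit draws, sketched on 32-bit patterns (helper names are illustrative):

```python
def arithmetic_shr32(x: int, n: int) -> int:
    # sign-extending shift, as for signed ints; Python's >> on
    # ints is already arithmetic
    return x >> n

def logical_shr32(x: int, n: int) -> int:
    # zero-filling shift on the 32-bit two's-complement pattern,
    # as for unsigned ints
    return (x & 0xFFFFFFFF) >> n
```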
Keren Zhou
ce47f94e59 [FRONTEND] Check if the data types of *A* and *B* in the dot op have the same data type (#1155) 2023-02-06 01:58:07 -08:00
Philippe Tillet
43798ab27e [BUILD] Restored wheels workflow (#1146)
- Dependent CUDA files (ptxas, cuda.h, libdevice.bc.10) are now packaged in
`triton/third_party/cuda`. `ptxas` is downloaded from conda repo at
install time.
- Can now be built with an old glibc (such as the one used by manylinux2014)
2023-02-03 16:22:10 -08:00
Rohit Santhanam
8cb6ab5b1a Merge remote-tracking branch 'upstream/main' into triton_mlir_IFU_02022023 2023-02-02 22:54:53 +00:00
George Karpenkov
a9d1935e79 [FRONTEND] Fix error message when atomics are not supported for a given dtype (#1134)
Otherwise, the construction of the exception crashes during string
concatenation.
2023-02-02 02:49:34 +00:00
Philippe Tillet
8fea1fb478 [FRONTEND] Adding static range (#1130)
Included: Revert "[BACKEND] Replace `mlir::topologicalSort` with a
custom implementation (#1113)"
2023-01-31 18:04:19 -08:00
Philippe Tillet
c4b9d699d2 [FRONTEND][BACKEND] Fixed many bugs (#1122)
- **Temporarily commented out an assertion in `MemBar.cpp`. We need to fix
this! But for now the following patches will unblock a number of
users.**
- Fixed frontend codegen issue for If / For / While. Emit an error when
replaced values' types mismatch.
- Added "top level" codepath for if statements, which allows users to
write patterns to exit early from kernels (e.g., `if cond1: if cond2:
return else: ...`). Added associated codegen in TritonToTritonGPUPass
- Added basic control flow tests
- Pipeline pass is no longer activated when memory accesses can't be
vectorized
- Added missing magic methods to `constexpr`
- Fixed issue in random.py: bitcast some values to uint when they need
to be.
- Added support for `Not`
- Fixed nondeterministic compilation issue
2023-01-30 23:22:36 -08:00
Yan Chunwei
94b419c327 [FRONTEND] some tiny fix (#1120) 2023-01-30 19:39:38 -08:00
Nishant Sikarwar
653c8dc124 [FRONTEND] Replaced range with enumerate calls (#1110)
Using range(len(...)) is not Pythonic. Python does not have index-based
loops; instead, it uses collection iterators. The built-in enumerate adds a
counter to an iterable, so you can access the counter and the value at the
same time. It is therefore recommended to replace range(len(...)) with
enumerate(...).
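The pattern, as a minimal sketch:

```python
items = ["a", "b", "c"]

# index-based loop (discouraged)
pairs_range = [(i, items[i]) for i in range(len(items))]

# idiomatic equivalent using the built-in enumerate
pairs_enum = list(enumerate(items))
```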

For example:

5bcf60a5c0/python/triton/language/extern.py (L68)

f62d556fff/python/triton/language/extern.py (L68)
Signed-off-by: GitHub <noreply@github.com>
Co-authored-by: Keren Zhou <kerenzhou@openai.com>
2023-01-30 15:22:11 -08:00
Michael Melesse
a9f955f862 Merge remote-tracking branch 'upstream/main' into triton-mlir-IFU-2023-30-1 2023-01-30 14:04:01 -06:00
Nishant Sikarwar
e5dbe35cc1 [FRONTEND] removed unnecessary comprehension (#1085) 2023-01-30 19:42:14 +00:00
Rohit Santhanam
2d0ee0fa0f Merge remote-tracking branch 'upstream/main' into triton-mlir-IFU-01232023 2023-01-24 03:59:17 +00:00
Daniil Fukalov
e6983feb91 [ROCM] Implement next part of atomics.
- Fixed the scalar atomic_rmw implementation of fmin/fmax for f32.
- Fixed tensor atomic_rmw.
- Added an atomic_cas implementation.

TODO: fix atomic_rmw for f16, and implement fmin/fmax for f32 with
native instructions (inline asm in the case of LLVM 14) instead of the
workaround used for NV.
2023-01-23 14:01:08 +01:00
Keren Zhou
c59fb4acca [FRONTEND] Fix libdevice elementwise compute for constexpr (#1082) 2023-01-22 07:11:44 +00:00
Nishant Sikarwar
7687f85ca4 [FRONTEND] decorating static methods with @staticmethod (#1069) 2023-01-17 14:35:06 -08:00
Keren Zhou
3f47e9aa0e [BACKEND] Fix unrealized conversion for fp32 dot (#1051) 2023-01-17 21:55:44 +00:00
Nishant Sikarwar
4a74d6eae9 [FRONTEND] replaced chains comparison operator with in (#1059) 2023-01-15 20:14:35 +00:00
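One plausible reading of this change, as a sketch (function and values are hypothetical, not the actual code touched by the commit):

```python
def is_arg_reduce(op: str) -> bool:
    # before: op == "argmin" or op == "argmax"
    # after: membership test with `in`
    return op in ("argmin", "argmax")
```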
Rohit Santhanam
ce8adb92bd Merge remote-tracking branch 'upstream/master' into triton-mlir-IFU-01142023 2023-01-14 19:19:58 +00:00
Philippe Tillet
dc7ecf4535 [FRONTEND] Fix output datatype of reduce (#1045) 2023-01-10 15:04:54 -08:00
Keren Zhou
733301ff31 [Backend] Rewrite code for linking external library to expose more inlining opportunities (#1037)
- Also makes the code cleaner.
- Marks the code that needs to be fixed in `semantic.py`.
2023-01-08 13:44:29 -08:00
Keren Zhou
4023149ee3 [Frontend] Convert constexpr to value for store and load ops (#1030)
Fixing problem 2 in https://github.com/openai/triton/issues/1017

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-01-05 14:40:16 -05:00
Sophia Wisdom
411bacb2a8 [FRONTEND] Add logical operations on constexprs (#1033) 2023-01-04 18:06:32 -08:00
Sharad Vikram
4a3eb0fb9f [FRONTEND] Fix argmin/max output type (#1012)
Currently Triton returns tensors with the input types rather than i32
when doing reduce argmax/argmin.
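The intended contract, sketched in plain Python (helper name illustrative): the reduction returns an integer index (an i32 in Triton), not a value of the input dtype.

```python
def argmin_index(values):
    # returns the position of the smallest element as an int,
    # regardless of the element dtype
    return min(range(len(values)), key=values.__getitem__)
```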
2023-01-04 15:13:47 +00:00
Keren Zhou
18c47161ae [Frontend] Fix import for libdevice (#1028)
This is a hotfix for issue 1 in
https://github.com/openai/triton/issues/1017
2023-01-04 15:13:30 +00:00
Sharad Vikram
d32c538066 [FRONTEND] Export broadcast and broadcast_to in triton.language (#1007) 2023-01-04 15:10:30 +00:00
Keren Zhou
c9e7385255 [FRONTEND] Fix 3d indexing (#1006) 2023-01-04 14:59:31 +00:00
Sharad Vikram
bc73bbb12c [FRONTEND] Fix argmin/max output type (#1012)
Currently Triton returns tensors with the input types rather than i32
when doing reduce argmax/argmin.
2023-01-03 23:12:16 -08:00
Keren Zhou
8460ea3df1 [Frontend] Fix import for libdevice (#1028)
This is a hotfix for issue 1 in
https://github.com/openai/triton/issues/1017
2023-01-03 15:48:05 -08:00
Sharad Vikram
925d3d7f98 [FRONTEND] Export broadcast and broadcast_to in triton.language (#1007) 2022-12-22 01:57:33 +00:00
Keren Zhou
b5aafb0dab [FRONTEND] Fix 3d indexing (#1006) 2022-12-21 12:52:32 -08:00
Michael Melesse
41578a63d2 Merge remote-tracking branch 'upstream/triton-mlir' into triton-mlir-IFU 2022-12-21 12:53:03 -06:00
Philippe Tillet
20100a7254 Merge triton-mlir branch - Complete rewrite of the backend from scratch (#1004)
This PR merges the `triton-mlir` branch, in which we have been quietly
rewriting the Triton backend from scratch to increase maintainability,
stability and ultimately performance. Changes to the runtime are
minimal, and this new version aims to remain backward-compatible with
the previous commit. The legacy backend is now officially deprecated,
but can still be accessed via the `legacy-backend` tag.

Co-authored-by: Keren Zhou <kerenzhou@openai.com>
Co-authored-by: Yan Chunwei <yanchunwei@outlook.com>
Co-authored-by: goostavz <109190422+goostavz@users.noreply.github.com>
Co-authored-by: Shintaro Iwasaki <siwasaki@fb.com>
Co-authored-by: Yan Da <dyanab@connect.ust.hk>
Co-authored-by: Jun Yang <yangjunpro@gmail.com>
Co-authored-by: Ian Bearman <ianb@microsoft.com>
Co-authored-by: Jason Ansel <jansel@jansel.net>
Co-authored-by: Qingyi Liu <qingyil@nvidia.com>
Co-authored-by: ben-zhang-609 <110140741+ben-zhang-609@users.noreply.github.com>
Co-authored-by: Chenggang Zhao <lyricz@yeah.net>
Co-authored-by: ben-zhang-609 <benzh609@gmail.com>
Co-authored-by: dongdongl <dongdongl@nvidia.com>
2022-12-21 01:30:50 -08:00
Keren Zhou
50a5128448 [Triton-MLIR][BACKEND] Support bfloat16 and clean up some test code (#998) 2022-12-20 22:26:51 -08:00
Philippe Tillet
899bb0a0e7 [FORMAT] Run clang-format, autopep8 and isort (#1000) 2022-12-20 17:47:34 -08:00
Philippe Tillet
e759d8ef61 [FRONTEND] % now has same semantics as torch on floats (#999) 2022-12-20 15:37:19 -08:00
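For reference, the two candidate semantics for `%` on floats: Python-style remainder (which torch's float `%` follows, with the sign of the divisor) versus C-style fmod (sign of the dividend):

```python
import math

# Python/torch-style remainder: result takes the sign of the divisor
py_mod = -7.0 % 3.0            # 2.0

# C-style fmod: result takes the sign of the dividend
c_fmod = math.fmod(-7.0, 3.0)  # -1.0
```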
dfukalov
dfa19a6fe4 [Triton-MLIR][ROCM] Addressed comments.
Added example script to build cuda2gcn.bc.
2022-12-20 00:19:45 +01:00
Daniil Fukalov
c2a6028f26 [Triton-MLIR][ROCM] Added cuda2gcn wrapper for devicelib.
There is a cuda2gcn wrapper library in ROCm-Device-Libs' sources:
https://github.com/RadeonOpenCompute/ROCm-Device-Libs/tree/amd-stg-open/cuda2gcn

I've slightly fixed it (I'm going to create appropriate PRs in its repo),
and then built it from source (the default ROCm-Device-Libs build
doesn't create it).

It contains mappings and/or implementations of the __nv* library functions.
This step allowed removing the `if`s in libdevice.py, since they are no
longer needed there.

Also added some tests (that started to pass) to test_core_amd.py.
2022-12-20 00:18:50 +01:00
Philippe Tillet
9f27468377 [TESTS][FRONTEND][BACKEND] Merge master and triton-mlir tests (#979)
Also fix a bunch of bugs in float32 / tf32

Co-authored-by: Jokeren <kerenzhou@openai.com>
2022-12-15 19:28:50 -08:00
Philippe Tillet
e5cfa0f633 [FRONTEND] Added a few assertions in semantic.dot (#977) 2022-12-12 00:07:14 -08:00