Commit Graph

82 Commits

Author SHA1 Message Date
Rohit Santhanam
cd9ae1cd36 Merge remote-tracking branch 'upstream/main' into triton-mlir-IFU-02232023 2023-02-23 21:41:54 +00:00
rsanthanam-amd
e7f84448bf Merge pull request #127 from dfukalov/dfukalov/work-3
[ROCM] Enable float16 and int8 types for FMA based `dot` implementation.
2023-02-22 16:39:04 -06:00
Daniil Fukalov
2d678efb89 [ROCM] Enable float16 and int8 types for FMA based dot implementation.
By default, Triton generates MLIR with an f32 result for the tt.dot operation
on f16-typed operands, so .ttgir contains "tt.dot(f16,f16,f32)->f32" types. But
the LLVM FMA instruction requires the same type for all three operands, so the
first two operands are implicitly cast f16->f32 as
"unrealized_conversion_cast struct{f16,f16,...}->struct{f32,f32}".
This change fixes incorrect implicit cast generation.
For int8-typed operands, the result operand is also cast after performing the dot.

As a next step to improve the FMA-based dot operation, target-specific FMA
intrinsics on f16 and int8 (e.g. fma(f16,f16,f16)->f16) could be used, perhaps
as an option.
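
The upcast described in this commit message can be illustrated with a small pure-Python emulation. This is only a sketch, not the backend code: the helper names `to_f16`, `to_f32` and `fma_dot_f16` are hypothetical, and the multiply-add is emulated in double precision and then rounded to f32, so it is not a true hardware FMA.

```python
import struct


def to_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_f32(x: float) -> float:
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def fma_dot_f16(a, b):
    """Dot product of f16 operands with an f32 accumulator (emulated).

    Both operands are widened f16 -> f32 first, so the multiply-add sees
    the same type for all three of its inputs.
    """
    acc = to_f32(0.0)
    for x, y in zip(a, b):
        x32 = to_f32(to_f16(x))  # f16 operand, widened to f32
        y32 = to_f32(to_f16(y))  # f16 operand, widened to f32
        acc = to_f32(x32 * y32 + acc)  # emulated multiply-add, result kept in f32
    return acc


result = fma_dot_f16([1.0, 2.0], [3.0, 4.0])
```

Widening an f16 value to f32 is lossless, so only the accumulation step rounds.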
2023-02-22 22:36:20 +01:00
Eric Wang
320ae18093 [FRONTEND] Add error messages for arange (#1218)
Fix issue https://github.com/openai/triton/issues/244

Check that `end` is greater than `start`.
Check that the range fits in `int32`.
Check that the number of elements is less than or equal to
`TRITON_MAX_TENSOR_NUMEL = 131072`.
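
A minimal sketch of the three checks listed above, in plain Python. The function name `check_arange` and the error messages are hypothetical and are not taken from the Triton source; "fits in `int32`" is assumed here to mean that both bounds are representable as signed 32-bit integers.

```python
TRITON_MAX_TENSOR_NUMEL = 131072  # limit quoted in the commit message
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def check_arange(start: int, end: int) -> int:
    """Validate arange bounds and return the number of elements (hypothetical helper)."""
    if end <= start:
        raise ValueError("arange's end argument must be greater than the start argument")
    # Assumption: both bounds must be representable as signed 32-bit integers.
    if not (INT32_MIN <= start <= INT32_MAX and INT32_MIN <= end <= INT32_MAX):
        raise ValueError("arange's arguments must fit in int32")
    numel = end - start
    if numel > TRITON_MAX_TENSOR_NUMEL:
        raise ValueError(
            f"arange's range exceeds the limit of {TRITON_MAX_TENSOR_NUMEL} elements"
        )
    return numel


numel = check_arange(0, 128)
```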

---------

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-02-22 00:37:28 +00:00
Michaël Benesty
940f394a35 [Frontend] fix crash on cast when dest is constexpr (#1222)
This pull request addresses a crash that occurs when casting to a
tl.constexpr type in the frontend.

More info and repro code available in:
https://github.com/openai/triton/issues/1221
2023-02-20 10:50:33 -08:00
Rohit Santhanam
841784d1e3 Merge remote-tracking branch 'upstream/main' into upgrade_triton_mlir_rocm_to_llvm_head 2023-02-18 09:25:20 +00:00
Philippe Tillet
4d067f5120 [FRONTEND] Now emit an error for tl.reshape, instead of silently calling tl.view (#1212) 2023-02-17 20:21:20 -08:00
Eric Wang
30db959dae [FRONTEND] Add error messages for load/store (#1179)
Fix issue https://github.com/openai/triton/issues/633
2023-02-13 10:52:50 -05:00
Rohit Santhanam
a2416e0901 Merge remote-tracking branch 'upstream/main' into triton-mlir-IFU-02112023 2023-02-11 14:48:19 +00:00
Philippe Tillet
2aba985daa [OPTIMIZER] Improved layout simplifications heuristics (#1168) 2023-02-09 20:17:25 -08:00
fdrocha
972b761390 [FRONTEND] For __rshift__ operator, use arithmetic right shift if dtype is a signed int. (#1153) 2023-02-06 10:26:17 +00:00
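
The difference between the two kinds of right shift can be sketched on 32-bit patterns in plain Python. The helper names below are hypothetical and this is not the frontend code; it only shows why a signed dtype needs the arithmetic (sign-propagating) variant.

```python
def to_signed(pattern: int, bits: int = 32) -> int:
    """Reinterpret an unsigned bit pattern as a two's-complement signed integer."""
    sign_bit = 1 << (bits - 1)
    return (pattern ^ sign_bit) - sign_bit


def logical_rshift(pattern: int, shift: int, bits: int = 32) -> int:
    """Shift right, filling the vacated high bits with zeros."""
    mask = (1 << bits) - 1
    return (pattern & mask) >> shift


def arithmetic_rshift(pattern: int, shift: int, bits: int = 32) -> int:
    """Shift right, filling the vacated high bits with copies of the sign bit."""
    mask = (1 << bits) - 1
    # Python's >> on a negative int is already arithmetic.
    return (to_signed(pattern & mask, bits) >> shift) & mask


minus_eight = 0xFFFFFFF8  # bit pattern of -8 as a signed 32-bit integer
logical = logical_rshift(minus_eight, 1)        # 0x7FFFFFFC: sign is lost
arithmetic = arithmetic_rshift(minus_eight, 1)  # 0xFFFFFFFC: still negative (-4)
```

For non-negative values the two shifts agree, which is why the distinction only matters for signed dtypes.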
Keren Zhou
ce47f94e59 [FRONTEND] Check if the data types of *A* and *B* in the dot op have the same data type (#1155) 2023-02-06 01:58:07 -08:00
Rohit Santhanam
8cb6ab5b1a Merge remote-tracking branch 'upstream/main' into triton_mlir_IFU_02022023 2023-02-02 22:54:53 +00:00
George Karpenkov
a9d1935e79 [FRONTEND] Fix error message when atomics are not supported for a given dtype (#1134)
Otherwise, the construction of the exception crashes during string
concatenation.
2023-02-02 02:49:34 +00:00
Philippe Tillet
c4b9d699d2 [FRONTEND][BACKEND] Fixed many bugs (#1122)
- **Temporarily commented out an assertion in `MemBar.cpp`. We need to fix
this! But for now the following patches will unblock a number of
users.**
- Fixed frontend codegen issue for If / For / While. Emit an error when
the replaced values' types mismatch.
- Added "top level" codepath for if statements, which allows users to
write patterns to exit early from kernels (e.g., `if cond1: if cond2:
return else: ...`). Added associated codegen in TritonToTritonGPUPass
- Added basic control flow tests
- Pipeline pass is no longer activated when memory accesses can't be
vectorized
- Added missing magic methods to `constexpr`
- Fixed issue in random.py: bitcast some values to uint when they need
to be.
- Added support for `Not`
- Fixed nondeterministic compilation issue
2023-01-30 23:22:36 -08:00
Yan Chunwei
94b419c327 [FRONTEND] some tiny fix (#1120) 2023-01-30 19:39:38 -08:00
Nishant Sikarwar
653c8dc124 [FRONTEND] Replaced range with enumerate calls (#1110)
Using range(len(...)) is not pythonic.
Python loops are not index-based. Instead, they use collection
iterators. Python has a built-in function, enumerate, which adds a counter
to an iterable. Using this, you can access the counter and the value
from the iterable at the same time. It is therefore recommended to
replace range(len(...)) with enumerate(...).

For example:

5bcf60a5c0/python/triton/language/extern.py (L68)

f62d556fff/python/triton/language/extern.py (L68)
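
A generic before/after sketch of the replacement this commit describes. The list `arg_types` and both function names are made up for illustration and are not the code from `extern.py`.

```python
arg_types = ["int32", "float32", "float16"]  # illustrative data only


def describe_with_range(items):
    """Index-based loop: look each value up by position."""
    out = []
    for i in range(len(items)):
        out.append(f"{i}: {items[i]}")
    return out


def describe_with_enumerate(items):
    """Iterator-based loop: enumerate yields the counter and the value together."""
    out = []
    for i, item in enumerate(items):
        out.append(f"{i}: {item}")
    return out


before = describe_with_range(arg_types)
after = describe_with_enumerate(arg_types)
```

Both functions produce the same list; the second avoids the extra indexing step.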

Signed-off-by: GitHub <noreply@github.com>
Co-authored-by: Keren Zhou <kerenzhou@openai.com>
2023-01-30 15:22:11 -08:00
Michael Melesse
a9f955f862 Merge remote-tracking branch 'upstream/main' into triton-mlir-IFU-2023-30-1 2023-01-30 14:04:01 -06:00
Nishant Sikarwar
e5dbe35cc1 [FRONTEND] removed unnecessary comprehension (#1085) 2023-01-30 19:42:14 +00:00
Rohit Santhanam
2d0ee0fa0f Merge remote-tracking branch 'upstream/main' into triton-mlir-IFU-01232023 2023-01-24 03:59:17 +00:00
Daniil Fukalov
e6983feb91 [ROCM] Implement next part of atomics.
- fixed scalar atomic_rmw implementation for fmin/fmax for f32
- fixed tensor atomic_rmw
- added atomic_cas implementation.

TODO: fix atomic_rmw for f16, implement fmin/fmax for f32 with
native instructions (inline asm in the case of LLVM 14) instead of
the workaround used for NV.
2023-01-23 14:01:08 +01:00
Keren Zhou
3f47e9aa0e [BACKEND] Fix unrealized conversion for fp32 dot (#1051) 2023-01-17 21:55:44 +00:00
Rohit Santhanam
ce8adb92bd Merge remote-tracking branch 'upstream/master' into triton-mlir-IFU-01142023 2023-01-14 19:19:58 +00:00
Philippe Tillet
dc7ecf4535 [FRONTEND] Fix output datatype of reduce (#1045) 2023-01-10 15:04:54 -08:00
Keren Zhou
733301ff31 [Backend] Rewrite code for linking external library to expose more inlining opportunities (#1037)
- Also make it cleaner.
- Mark the code that needs to be fixed in `semantic.py`.
2023-01-08 13:44:29 -08:00
Sharad Vikram
4a3eb0fb9f [FRONTEND] Fix argmin/max output type (#1012)
Currently Triton returns tensors with the input types rather than i32
when doing reduce argmax/argmin.
2023-01-04 15:13:47 +00:00
Sharad Vikram
bc73bbb12c [FRONTEND] Fix argmin/max output type (#1012)
Currently Triton returns tensors with the input types rather than i32
when doing reduce argmax/argmin.
2023-01-03 23:12:16 -08:00
Michael Melesse
41578a63d2 Merge remote-tracking branch 'upstream/triton-mlir' into triton-mlir-IFU 2022-12-21 12:53:03 -06:00
Philippe Tillet
20100a7254 Merge triton-mlir branch - Complete rewrite of the backend from scratch (#1004)
This PR merges the `triton-mlir` branch, in which we have been quietly
rewriting the Triton backend from scratch to increase maintainability,
stability and ultimately performance. Changes to the runtime are
minimal, and this new version aims to remain backward-compatible with
the previous commit. The legacy backend is now officially deprecated,
but can still be accessed via the `legacy-backend` tag.

Co-authored-by: Keren Zhou <kerenzhou@openai.com>
Co-authored-by: Yan Chunwei <yanchunwei@outlook.com>
Co-authored-by: goostavz <109190422+goostavz@users.noreply.github.com>
Co-authored-by: Shintaro Iwasaki <siwasaki@fb.com>
Co-authored-by: Yan Da <dyanab@connect.ust.hk>
Co-authored-by: Jun Yang <yangjunpro@gmail.com>
Co-authored-by: Ian Bearman <ianb@microsoft.com>
Co-authored-by: Jason Ansel <jansel@jansel.net>
Co-authored-by: Qingyi Liu <qingyil@nvidia.com>
Co-authored-by: ben-zhang-609 <110140741+ben-zhang-609@users.noreply.github.com>
Co-authored-by: Chenggang Zhao <lyricz@yeah.net>
Co-authored-by: ben-zhang-609 <benzh609@gmail.com>
Co-authored-by: dongdongl <dongdongl@nvidia.com>
2022-12-21 01:30:50 -08:00
Keren Zhou
50a5128448 [Triton-MLIR][BACKEND] Support bfloat16 and clean up some test code (#998) 2022-12-20 22:26:51 -08:00
Philippe Tillet
899bb0a0e7 [FORMAT] Run clang-format, autopep8 and isort (#1000) 2022-12-20 17:47:34 -08:00
Philippe Tillet
e759d8ef61 [FRONTEND] % now has same semantics as torch on floats (#999) 2022-12-20 15:37:19 -08:00
Philippe Tillet
9f27468377 [TESTS][FRONTEND][BACKEND] Merge master and triton-mlir tests (#979)
Also fix a bunch of bugs in float32 / tf32

Co-authored-by: Jokeren <kerenzhou@openai.com>
2022-12-15 19:28:50 -08:00
Philippe Tillet
e5cfa0f633 [FRONTEND] Added a few assertions in semantic.dot (#977) 2022-12-12 00:07:14 -08:00
Philippe Tillet
e552219104 [FRONTEND] Add possibility for user to force a GPU threadsync barrier (#976)
The compiler still has pitfalls, even in the master branch.
2022-12-11 23:03:52 -08:00
Keren Zhou
be2f70699c [BACKEND][FRONTEND] Fix problems with test_matmul (#973)
1. Handle induction variable when step is negative
2. Restore async_wait that was accidentally deleted
3. Add missing induction variable in prefetch
4. Add device property functions

Co-authored-by: Philippe Tillet <Phil.Tillet@gmail.com>
2022-12-10 20:34:58 -08:00
Keren Zhou
83f3b9165b [FRONTEND][BACKEND] Fix bool and int8 load when the other operand is given (#968) 2022-12-08 11:52:18 -08:00
Rohit Santhanam
dbe1b2aafb AMDGCN fixes for libdevice.py. 2022-12-08 19:08:26 +00:00
Philippe Tillet
b2b793dfb5 [FRONTEND][BACKEND] Fixes for cat / reshape / addptr (#959)
Most notably, this PR:
- changes the traits (and assembly format) of addptr so it can handle offsets that have arbitrary integer width.
- adds support for `cat`
2022-12-06 23:29:50 -08:00
Philippe Tillet
532e10cf87 [FRONTEND][BACKEND] Clean-up transpositions (#953) 2022-12-06 09:32:13 -08:00
Philippe Tillet
8edfe813a5 [FRONTEND][BACKEND] Added trans instruction; made flash attention bwd pass work (#943) 2022-12-03 09:58:24 -08:00
Philippe Tillet
9bb54402b3 [FRONTEND][BACKEND] Small fixes to multiple_of, num_programs, axisinfo; enable block-sparse tests (#927) 2022-11-29 20:00:34 +01:00
Qingyi Liu
9d31998a9d [Triton-MLIR][BACKEND] Add argmin / argmax implementation for ReduceOp (#918) 2022-11-27 22:59:27 -08:00
donproc
8925c2cd11 [TRITON-MLIR][BACKEND]AtomicRMWOp supports scalar (#903)
AtomicRMWOp supports scalar

Co-authored-by: dongdongl <dongdongl@nvidia.com>
2022-11-23 07:59:09 +00:00
Chenggang Zhao
516a241234 [Triton-MLIR] Fix some typos (#874)
Fix some typos
2022-11-13 18:15:53 -08:00
Chenggang Zhao
57fd1864a7 [Triton-MLIR] Support FP8 (#864)
Co-authored-by: Superjomn <yanchunwei@outlook.com>
2022-11-10 15:53:06 +08:00
ben-zhang-609
5feb6e24f9 [Triton-MLIR]Add ptx vprintf support (#825)
Not sure how to write a unit test for this feature.

Co-authored-by: Yan Chunwei <yanchunwei@outlook.com>
2022-11-02 16:39:09 +08:00
Philippe Tillet
7dfab26a39 [FRONTEND][BACKEND] Fixed various bugs (#819)
- Fixed bugs in layout conversions for int1 data (we should use int8
internally for int1 data to prevent LLVM from using vec<i1>, which has
different semantics)
- Fixed semantics of some casts to bool in the frontend
2022-10-29 06:34:14 +00:00
ben-zhang-609
3685194456 [Triton-MLIR][BACKEND] Add elementwise ops and tests (#804)
Co-authored-by: Keren Zhou <kerenzhou@openai.com>
2022-10-28 05:26:29 +00:00
Philippe Tillet
3e6cc6d66c [FRONTEND] Made more tests pass (#805) 2022-10-26 17:47:33 -07:00