Philippe Tillet
fc7c0b0e43
[FRONTEND] Removed torch dependency and cleaned up testing ( #1394 )
...
`assert triton.testing.allclose` -> `torch.testing.assert_allclose`
`triton.testing.assert_almost_equal` -> `torch.testing.assert_allclose`
2023-03-23 22:37:21 -07:00
xndcn
ff1d0377e0
[BACKEND] Fix wrong conversion from float8e5m2 <> bfloat16 ( #1391 )
...
The exponent compensation should be 0x3800 (112) instead of 0x3000 (96).
Also add a mantissa bit so the float16 conversion rounds to the nearest
float8e5m2.
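For reference, a minimal Python sketch of the corrected conversions (bit-level, normal values only; subnormals, inf and nan are left out, and the rounding helper ignores ties-to-even and overflow). This mirrors the constants above; it is not the backend's actual LLVM lowering:
```Python
import numpy as np

def fp8e5m2_to_bf16_bits(x):
    """Widen an fp8e5m2 bit pattern (uint8) to bfloat16 bits (uint16)."""
    x = int(x)
    sign = (x & 0x80) << 8        # sign bit moves from bit 7 to bit 15
    body = (x & 0x7F) << 5        # exponent+mantissa aligned to bf16 fields
    # rebias by 127 (bf16) - 15 (e5m2) = 112, i.e. 0x3800 once aligned to bit 7
    return np.uint16(sign | (body + 0x3800))

def fp16_to_fp8e5m2_bits(x):
    """Truncate fp16 bits (uint16) to fp8e5m2, rounding to nearest."""
    return np.uint8((int(x) + 0x80) >> 8)  # add half of the dropped mantissa

assert fp8e5m2_to_bf16_bits(0x3C) == 0x3F80   # 1.0 -> 1.0
assert fp16_to_fp8e5m2_bits(0x3C00) == 0x3C   # 1.0 -> 1.0
```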
Co-authored-by: Philippe Tillet <phil@openai.com >
2023-03-24 04:42:08 +00:00
Natalia Gimelshein
3239c93a93
[TEST] add a test for inductor normalization pattern ( #1390 )
2023-03-23 00:29:28 +00:00
xndcn
65d8d802d5
[BACKEND] Fix wrong conversion from float8e4m3 <> bfloat16 ( #1384 )
...
The exponent compensation should be 0x3c00 (120) instead of 0x3800 (112).
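A short sketch of where the corrected constant comes from (normal values only; the names are illustrative):
```Python
# The compensation is the bias difference between bfloat16 (bias 127)
# and float8e4m3 (bias 7), aligned to bf16's exponent field at bit 7.
BF16_BIAS, FP8E4M3_BIAS = 127, 7
compensate = (BF16_BIAS - FP8E4M3_BIAS) << 7
assert compensate == 0x3C00 and compensate >> 7 == 120
```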
2023-03-21 18:58:13 -07:00
Keren Zhou
c1dd6df9ce
[FRONTEND] Fix negative induction variable ( #1382 )
2023-03-21 08:38:16 -07:00
xndcn
84ffefc368
[BACKEND] Fix wrong conversion from float8e4m3 <> float16 ( #1375 )
...
After offset shifting, the exponent compensation should not be forgotten.
Also add back some comments from `legacy_backend`.
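For the float16 case, the analogous constant would be the following (an assumption based on the standard biases, not quoted from the patch):
```Python
# Bias difference between float16 (bias 15) and float8e4m3 (bias 7),
# aligned to fp16's exponent field, which starts at bit 10.
FP16_BIAS, FP8E4M3_BIAS = 15, 7
compensate = (FP16_BIAS - FP8E4M3_BIAS) << 10
assert compensate == 0x2000
```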
2023-03-20 21:45:25 -07:00
Phil Tillet
e650d3708b
[FRONTEND] dot now uses tl.float32 by default for out_dtype.
2023-03-19 21:58:46 -07:00
Philippe Tillet
b4decbe155
[BACKEND] Now using call_once to initialize LLVM target ( #1373 )
2023-03-19 21:23:39 -07:00
Fei Hu
6366c5a254
[FRONTEND][BACKEND] Add support for FP16 output for tl.dot ( #1258 )
...
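A hedged usage sketch of the new capability (kernel and pointer names are hypothetical; `out_dtype` is the parameter this PR extends, and BLOCK is assumed to be a multiple of 16 as `tl.dot` requires):
```Python
import triton
import triton.language as tl

@triton.jit
def matmul_fp16_kernel(a_ptr, b_ptr, c_ptr, BLOCK: tl.constexpr):
    offs = tl.arange(0, BLOCK)
    idx = offs[:, None] * BLOCK + offs[None, :]
    a = tl.load(a_ptr + idx)                 # fp16 operands
    b = tl.load(b_ptr + idx)
    c = tl.dot(a, b, out_dtype=tl.float16)   # accumulate/output in fp16
    tl.store(c_ptr + idx, c)
```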
---------
Co-authored-by: Fei Hu <fhu@microsoft.com >
2023-03-19 19:52:14 -07:00
Rohit Santhanam
a84b4883e6
Merge remote-tracking branch 'upstream/main' into triton-mlir-IFU-03192023
2023-03-19 13:46:50 +00:00
Philippe Tillet
39139258c8
[FRONTEND][BACKEND] tl.mathlib -> tl.math; internally reverted to mathlib -> libdevice ( #1368 )
2023-03-19 02:14:57 -07:00
rsanthanam-amd
c575911a01
[FRONTEND] Change libdevice to mathlib and fix abs ( #1361 )
...
Co-authored-by: Phil Tillet <phil@openai.com >
2023-03-19 01:34:16 -07:00
Horace He
1d2871d0d1
[RUNTIME] Fix memory leak in ( #1358 )
...
Fixes a bug that causes Triton to leak 32 bytes on every kernel
invocation.
Also solves https://github.com/pytorch/pytorch/issues/96937
2023-03-16 17:52:06 -07:00
Rohit Santhanam
def693c53f
Update test_core_amd.py with new unit test from test_core.py.
2023-03-16 23:53:43 +00:00
Berke Kocaoğlu
ba91f39dbf
[DOC] Fix syntax errors, typos, formatting; increase consistency ( #1357 )
...
This PR:
- Fixes syntax errors like `.type values: dict[str, Callable[[list[Any]], Any]]` to `:type values: dict[str, Callable[[list[Any]], Any]]`,
- Fixes typos,
- Fixes formatting like `k ++` to `k++`,
- Increases consistency (e.g., by replacing the minority `cd dir/` with the majority `cd dir`).
2023-03-16 15:32:02 -07:00
Rohit Santhanam
77dcb667b5
Merge remote-tracking branch 'upstream/main' into triton-mlir-IFU-03162023
2023-03-16 13:21:15 +00:00
Philippe Tillet
56b23f433d
[TEST] Temporarily disable test_dot mode that fails because of ptxas/nvptx ( #1344 )
2023-03-15 01:17:48 -07:00
peterbell10
01b177afe7
[FRONTEND] Mangle signed and unsigned integer types differently ( #1340 )
...
This is cherry-picked from #1305
If you call a `JITFunction` twice in the same kernel, first with `int32`
then with `uint32`, the second call will treat the unsigned value as
signed. This passes through MLIR without error because MLIR uses the
same types for both, but different operation calls will be generated so
you may silently get the wrong result.
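A hedged sketch of the failure mode (names are hypothetical; signedness matters for operations like integer division, which lowers to different MLIR ops for signed and unsigned types):
```Python
import triton
import triton.language as tl

@triton.jit
def halve(x):
    return x // 2                        # divsi for int32, divui for uint32

@triton.jit
def mangling_kernel(in_ptr, out_ptr):
    x = tl.load(in_ptr)                  # int32
    u = x.to(tl.uint32, bitcast=True)    # same bits, unsigned
    a = halve(x)                         # specializes halve for int32
    b = halve(u)                         # before the fix: reused the int32
                                         # mangling, treating u as signed
    tl.store(out_ptr, a)
    tl.store(out_ptr + 1, b.to(tl.int32, bitcast=True))
```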
2023-03-14 22:29:18 -07:00
Rohit Santhanam
40351272f4
Fix test_convert2d from test_core_amd.py.
2023-03-14 15:11:56 +00:00
Philippe Tillet
6a8634e2a7
[BACKEND] No longer running LLVM-IR optimizations after codegen. ( #1338 )
...
This triggered some outrageous bugs. See #1337 .
2023-03-13 22:50:15 -07:00
Rohit Santhanam
6ff54b495c
Merge remote-tracking branch 'upstream/main' into triton-mlir-IFU-03122023
2023-03-13 18:09:12 +00:00
Philippe Tillet
3fe3adbcde
[FRONTEND][BACKEND] Add support for float8e5m2 type ( #1314 )
2023-03-10 19:14:47 -08:00
Keren Zhou
8b25c30d39
[BACKEND] Fix bfloat16 flash attention ( #1306 )
...
See https://github.com/openai/triton/issues/1245 for more detailed
information
---------
Co-authored-by: giorgio-arena <arena.cpp@gmail.com >
2023-03-09 21:14:52 -08:00
Da Yan
902c61affb
[BACKEND] Add arith::SelectOp => LLVM::SelectOp conversion ( #1307 )
2023-03-09 09:35:30 -08:00
Keren Zhou
78b311f6e2
[FRONTEND] Fix cast when both src_ty and dst_ty are of block_type ( #1301 )
...
Commonly used in atomic_rmw ops
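A hedged sketch of the pattern (hypothetical kernel): the value block is cast to another block type right before the atomic op, so both src_ty and dst_ty are block types.
```Python
import triton
import triton.language as tl

@triton.jit
def atomic_cast_kernel(ptr, val_ptr):
    offs = tl.arange(0, 64)
    v = tl.load(val_ptr + offs)          # block of int32
    # block<i32> -> block<f32> cast feeding an atomic_rmw
    tl.atomic_add(ptr + offs, v.to(tl.float32))
```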
2023-03-08 09:25:00 -08:00
B1tway
b5dc18d7c9
Added missing #ifdef and fixed code style
2023-03-07 11:32:52 +00:00
Keren Zhou
4731f300d3
[BACKEND] Mask out wrapped threads in store ops ( #1283 )
2023-03-06 14:50:20 -08:00
B1tway
625a99aa78
Fixing the layouts in the tests for Wave64
2023-03-06 18:35:18 +00:00
Keren Zhou
d376020f90
[FRONTEND][BACKEND] Implement tl.device_assert and rename tl.printf to tl.device_print ( #1143 )
...
Note that `tl.device_print` and `print` accept different arguments than
the normal Python `print`: the first argument must be a string, followed by
variables.
Device side:
- `tl.device_print`
- `tl.device_assert`
- `print`
- `assert`
Compilation time:
- `tl.static_assert`
- `tl.static_print`
Usage example:
1.
```Python
tl.device_assert(x == 0, "x != 0")
```
Output:
```
...
python/test/unit/language/assert_helper.py:18: kernel: block: [0,0,0], thread: [33,0,0] Assertion `x != 0` failed.
...
```
2.
```Python
tl.device_print("hello ", x)
```
Output:
```
...
hello 1
...
```
The environment variable `TRITON_DEBUG` sets the default debugging flag; when it is off, `tl.device_assert` and `assert` are skipped.
2023-03-04 08:08:29 -08:00
Keren Zhou
65e5a3bc24
[FRONTEND] Improve tl.full to accept both static and dynamic values ( #1269 )
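A minimal usage sketch (hypothetical kernel) of what the improvement enables: the fill value may be a compile-time constant or a runtime scalar.
```Python
import triton
import triton.language as tl

@triton.jit
def fill_kernel(out_ptr, x):
    a = tl.full((64,), 1.0, tl.float32)  # static (constexpr) value
    b = tl.full((64,), x, tl.float32)    # dynamic (runtime) value
    tl.store(out_ptr + tl.arange(0, 64), a + b)
```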
2023-03-02 12:19:54 -08:00
Philippe Tillet
fa0fbc937f
[FRONTEND][BACKEND][OPTIMIZER] Loops now use 64-bit indices when necessary ( #1261 )
...
* Frontend:
- `int` kernel arguments are always signed
- The loop induction variable is now determined by integer promotion on
lb/ub/step (see the sketch below)
* Optimizer:
- Added a new ExtractSliceOp that enforces 32-bit offsets
* Backend:
- Use 64-bit indices when lowering functions and control flow
- Removed the `idx_val` macro and replaced it with `i32_val`
- Cleaned up comments
- Added a new ArithToIndex pass to ensure that operations on indices use
the `index` dialect, which is converted to LLVM separately using a
64-bit target
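A hedged sketch of the frontend rule (hypothetical kernel): the width of the induction variable follows integer promotion on the bounds and step.
```Python
import triton
import triton.language as tl

@triton.jit
def count_kernel(out_ptr, lb, ub):
    acc = 0
    # If lb/ub arrive as 64-bit integers, promotion makes the induction
    # variable i 64-bit as well; with 32-bit bounds it stays 32-bit.
    for i in range(lb, ub):
        acc += 1
    tl.store(out_ptr, acc)
```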
2023-03-01 23:09:48 -08:00
Keren Zhou
90fcb38c7b
[BACKEND] Overwrite NVPTX converters for fp16<->fp32 and int16<->int32 to avoid ptxas problems ( #1267 )
2023-03-01 18:26:06 -08:00
Da Yan
cb7b315a17
[OPTIMIZER] Copying named attributes when converting from Triton to TritonGPU ( #1265 )
2023-03-01 12:31:46 -08:00
Rohit Santhanam
3e48537264
Merge remote-tracking branch 'upstream/main' into triton-mlir
2023-02-28 17:21:18 +00:00
Da Yan
0eead250c1
[FRONTEND] add missing tensor/constexpr ops ( #1249 )
2023-02-24 18:45:22 +00:00
rsanthanam-amd
f6548e37c1
Merge branch 'triton-mlir' into dfukalov/work-3
2023-02-23 20:08:16 -06:00
Daniil Fukalov
7133d23512
[Tests] Update test_core_amd.py from test_core.py.
...
1. Added some tests/configs that are passing at the moment.
2. Changed how parts are switched off: from commenting them out to guarding with `if torch.version.hip`.
2023-02-23 23:09:58 +01:00
Rohit Santhanam
cd9ae1cd36
Merge remote-tracking branch 'upstream/main' into triton-mlir-IFU-02232023
2023-02-23 21:41:54 +00:00
rsanthanam-amd
e7f84448bf
Merge pull request #127 from dfukalov/dfukalov/work-3
...
[ROCM] Enable float16 and int8 types for FMA based `dot` implementation.
2023-02-22 16:39:04 -06:00
Daniil Fukalov
2d678efb89
[ROCM] Enable float16 and int8 types for FMA based dot implementation.
...
By default, Triton generates MLIR with an f32 result for the tt.dot operation
on f16-typed operands, so the .ttgir contains types like
"tt.dot(f16,f16,f32)->f32". But the LLVM FMA instruction requires the same
type for all three operands, so the first two operands are implicitly cast
f16->f32 as "unrealized_conversion_cast struct{f16,f16,...}->struct{f32,f32}".
This change fixes incorrect generation of that implicit cast. For int8-typed
operands, the result operand is also cast after performing the dot.
As a next step to improve the FMA-based dot implementation, target-specific
FMA intrinsics for f16 and int8 (e.g. fma(f16,f16,f16)->f16) could be used,
perhaps as an option.
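A numpy sketch of the intended semantics (not the backend lowering itself): f16 operands are widened to f32 and accumulated with multiply-adds, which is what the corrected cast expresses.
```Python
import numpy as np

def fma_dot_f16(a, b):
    """Emulate tt.dot(f16, f16, f32) -> f32: widen each operand to f32,
    then accumulate rank-1 updates, mirroring the per-element FMAs."""
    m, k = a.shape
    _, n = b.shape
    acc = np.zeros((m, n), dtype=np.float32)
    for kk in range(k):
        acc += a[:, kk:kk + 1].astype(np.float32) * b[kk:kk + 1, :].astype(np.float32)
    return acc
```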
2023-02-22 22:36:20 +01:00
Philippe Tillet
ba0198326e
[TESTS] make performance regression testing less strict ( #1231 )
2023-02-21 22:22:02 -08:00
Philippe Tillet
174f121c1c
[TESTS] Added attention regression tests ( #1227 )
2023-02-21 20:22:36 -08:00
Philippe Tillet
307dde9cb5
[CI] revived regression tests ( #1225 )
2023-02-21 16:33:03 -08:00
Rohit Santhanam
841784d1e3
Merge remote-tracking branch 'upstream/main' into upgrade_triton_mlir_rocm_to_llvm_head
2023-02-18 09:25:20 +00:00
Christian Sigg
fc7a8e3581
Rebase Triton to LLVM-15. ( #1070 )
...
This PR rebases Triton from LLVM-14 to LLVM-15. Most changes are
mechanical, except for the analysis framework changes.
2023-02-16 06:40:53 -08:00
Philippe Tillet
9c330a411c
[FRONTEND] fixed pinned memory exception behavior ( #1197 )
...
No longer raises an exception when the pointer is on "cpu" but is also
accessible from within kernels (e.g., pinned memory).
2023-02-15 17:40:45 -08:00
Philippe Tillet
e3941f9d09
[OPTIMIZER][BACKEND] Cleaned up Volta codegen ( #1185 )
2023-02-14 22:39:35 -08:00
Keren Zhou
6413c7b9de
[BACKEND] Calculate correct warp ids for small matrices ( #1180 )
...
Fixes https://github.com/openai/triton/issues/1162
Adds 16x16x16 tests
2023-02-14 05:28:03 +00:00
rsanthanam-amd
44f69bea81
Merge pull request #113 from ROCmSoftwarePlatform/triton-mlir-IFU-02112023
...
Triton mlir ifu 02112023
2023-02-13 09:26:10 -06:00
rsanthanam-amd
ec387d5bf4
Merge pull request #109 from dfukalov/dfukalov/work-3
...
[ROCM] Enable part of tl.dot operations.
2023-02-12 13:50:20 -06:00