github/ROCm - ROCm - AtHeartEngineering

mirror of https://github.com/ROCm/ROCm.git synced 2026-04-05 03:01:17 -04:00

Author	SHA1	Message	Date
Rohit Santhanam	cd9ae1cd36	Merge remote-tracking branch 'upstream/main' into triton-mlir-IFU-02232023	2023-02-23 21:41:54 +00:00
rsanthanam-amd	e7f84448bf	Merge pull request #127 from dfukalov/dfukalov/work-3 [ROCM] Enable float16 and int8 types for FMA based `dot` implementation.	2023-02-22 16:39:04 -06:00
Daniil Fukalov	2d678efb89	[ROCM] Enable float16 and int8 types for FMA based `dot` implementation. By default Triton generates MLIR with an f32 result of the tt.dot operation on f16 typed operands. So we have "tt.dot(f16,f16,f32)->f32" types in .ttgir. But the LLVM FMA instruction requires the same type for all three operands. So the first two operands are implicitly cast f16->f32 as "unrealized_conversion_cast struct{f16,f16,...}->struct{f32,f32}". The change fixes incorrect implicit cast generation. For int8 typed operands, the result operand is also cast after performing the dot. As a next step to improve the FMA based dot operation, target specific FMA intrinsics on f16 and int8 (e.g. fma(f16,f16,f16)->f16) could be used, perhaps as an option.	2023-02-22 22:36:20 +01:00
Eric Wang	320ae18093	[FRONTEND] Add error messages for arange (#1218) Fix issue https://github.com/openai/triton/issues/244 Check that `end` is greater than `start`. Check that the range can fit in `int32`. Check that the number of elements is less than or equal to `TRITON_MAX_TENSOR_NUMEL = 131072`. --------- Co-authored-by: Philippe Tillet <phil@openai.com>	2023-02-22 00:37:28 +00:00
Michaël Benesty	940f394a35	[Frontend] fix crash on cast when dest is constexpr (#1222 ) This pull request addresses a crash that occurs when casting to a tl.constexpr type in the frontend. More info and repro code available in: https://github.com/openai/triton/issues/1221	2023-02-20 10:50:33 -08:00
Rohit Santhanam	841784d1e3	Merge remote-tracking branch 'upstream/main' into upgrade_triton_mlir_rocm_to_llvm_head	2023-02-18 09:25:20 +00:00
Philippe Tillet	4d067f5120	[FRONTEND] Now emit an error for `tl.reshape`, instead of silently calling `tl.view` (#1212 )	2023-02-17 20:21:20 -08:00
Eric Wang	30db959dae	[FRONTEND] Add error messages for load/store (#1179 ) Fix issue https://github.com/openai/triton/issues/633	2023-02-13 10:52:50 -05:00
Rohit Santhanam	a2416e0901	Merge remote-tracking branch 'upstream/main' into triton-mlir-IFU-02112023	2023-02-11 14:48:19 +00:00
Philippe Tillet	2aba985daa	[OPTIMIZER] Improved layout simplifications heuristics (#1168 )	2023-02-09 20:17:25 -08:00
fdrocha	972b761390	[FRONTEND] For __rshift__ operator, use arithmetic right shift if dtype is a signed int. (#1153 )	2023-02-06 10:26:17 +00:00
Keren Zhou	ce47f94e59	[FRONTEND] Check if the data types of A and B in the dot op have the same data type (#1155 )	2023-02-06 01:58:07 -08:00
Rohit Santhanam	8cb6ab5b1a	Merge remote-tracking branch 'upstream/main' into triton_mlir_IFU_02022023	2023-02-02 22:54:53 +00:00
George Karpenkov	a9d1935e79	[FRONTEND] Fix error message when atomics are not supported for a given dtype (#1134 ) Otherwise, the construction of the exception crashes during string concatenation.	2023-02-02 02:49:34 +00:00
Philippe Tillet	c4b9d699d2	[FRONTEND][BACKEND] Fixed many bugs (#1122) - Temporarily commented out an assertion in `MemBar.cpp`. We need to fix this! But for now the following patches will unblock a number of users. - Fixed frontend codegen issue for If / For / While. Emit an error when the replaced values' types mismatch. - Added "top level" codepath for if statements, which allows users to write patterns to exit early from kernels (e.g., `if cond1: if cond2: return else: ...`). Added associated codegen in TritonToTritonGPUPass - Added basic control flow tests - Pipeline pass is no longer activated when memory accesses can't be vectorized - Added missing magic methods to `constexpr` - Fixed issue in random.py: bitcast some values to uint when they need to be. - Added support for `Not` - Fixed nondeterministic compilation issue	2023-01-30 23:22:36 -08:00
Yan Chunwei	94b419c327	[FRONTEND] some tiny fix (#1120 )	2023-01-30 19:39:38 -08:00
Nishant Sikarwar	653c8dc124	[FRONTEND] Replaced range with enumerate calls (#1110) Using range(len(...)) is not pythonic. Python does not have index-based loops. Instead, it uses collection iterators. Python has a built-in function, enumerate, which adds a counter to an iterable. Using this, you can access the counter and the value from the iterable at the same time. It is therefore recommended to replace range(len(...)) with enumerate(...). For example: `5bcf60a5c0/python/triton/language/extern.py (L68)` `f62d556fff/python/triton/language/extern.py (L68)` Signed-off-by: GitHub <noreply@github.com> Co-authored-by: Keren Zhou <kerenzhou@openai.com>	2023-01-30 15:22:11 -08:00
Michael Melesse	a9f955f862	Merge remote-tracking branch 'upstream/main' into triton-mlir-IFU-2023-30-1	2023-01-30 14:04:01 -06:00
Nishant Sikarwar	e5dbe35cc1	[FRONTEND] removed unnecessary comprehension (#1085 )	2023-01-30 19:42:14 +00:00
Rohit Santhanam	2d0ee0fa0f	Merge remote-tracking branch 'upstream/main' into triton-mlir-IFU-01232023	2023-01-24 03:59:17 +00:00
Daniil Fukalov	e6983feb91	[ROCM] Implement next part of atomics. - fixed scalar atomic_rmw implementation for fmin/fmax for f32 - fixed tensor atomic_rmw - added atomic_cas implementation. TODO: fix atomic_rmw for f16, implement fmin/fmax for f32 with native instructions (asm inline in case of LLVM 14) instead of tweak used as for NV.	2023-01-23 14:01:08 +01:00
Keren Zhou	3f47e9aa0e	[BACKEND] Fix unrealized conversion for fp32 dot (#1051 )	2023-01-17 21:55:44 +00:00
Rohit Santhanam	ce8adb92bd	Merge remote-tracking branch 'upstream/master' into triton-mlir-IFU-01142023	2023-01-14 19:19:58 +00:00
Philippe Tillet	dc7ecf4535	[FRONTEND] Fix output datatype of reduce (#1045 )	2023-01-10 15:04:54 -08:00
Keren Zhou	733301ff31	[Backend] Rewrite code for linking external library to expose more inlining opportunities (#1037 ) - Also make it cleaner. - And mark out the code needs to be fixed in `semantic.py`.	2023-01-08 13:44:29 -08:00
Sharad Vikram	4a3eb0fb9f	[FRONTEND] Fix argmin/max output type (#1012 ) Currently Triton returns tensors with the input types rather than i32 when doing reduce argmax/argmin.	2023-01-04 15:13:47 +00:00
Sharad Vikram	bc73bbb12c	[FRONTEND] Fix argmin/max output type (#1012 ) Currently Triton returns tensors with the input types rather than i32 when doing reduce argmax/argmin.	2023-01-03 23:12:16 -08:00
Michael Melesse	41578a63d2	Merge remote-tracking branch 'upstream/triton-mlir' into triton-mlir-IFU	2022-12-21 12:53:03 -06:00
Philippe Tillet	20100a7254	Merge `triton-mlir` branch - Complete rewrite of the backend from scratch (#1004 ) This PR merges the `triton-mlir` branch, in which we have been quietly rewriting the Triton backend from scratch to increase maintainability, stability and ultimately performance. Changes to the runtime are minimal, and this new version aims to remain backward-compatible with the previous commit. The legacy backend is now officially deprecated, but can still be accessed via the `legacy-backend` tag. Co-authored-by: Keren Zhou <kerenzhou@openai.com> Co-authored-by: Yan Chunwei <yanchunwei@outlook.com> Co-authored-by: goostavz <109190422+goostavz@users.noreply.github.com> Co-authored-by: Shintaro Iwasaki <siwasaki@fb.com> Co-authored-by: Yan Da <dyanab@connect.ust.hk> Co-authored-by: Jun Yang <yangjunpro@gmail.com> Co-authored-by: Ian Bearman <ianb@microsoft.com> Co-authored-by: Jason Ansel <jansel@jansel.net> Co-authored-by: Qingyi Liu <qingyil@nvidia.com> Co-authored-by: ben-zhang-609 <110140741+ben-zhang-609@users.noreply.github.com> Co-authored-by: Chenggang Zhao <lyricz@yeah.net> Co-authored-by: ben-zhang-609 <benzh609@gmail.com> Co-authored-by: dongdongl <dongdongl@nvidia.com>	2022-12-21 01:30:50 -08:00
Keren Zhou	50a5128448	[Triton-MLIR][BACKEND] Support bfloat16 and clean up some test code (#998 )	2022-12-20 22:26:51 -08:00
Philippe Tillet	899bb0a0e7	[FORMAT] Run `clang-format`, `autopep8` and `isort` (#1000 )	2022-12-20 17:47:34 -08:00
Philippe Tillet	e759d8ef61	[FRONTEND] `%` now has same semantics as torch on floats (#999 )	2022-12-20 15:37:19 -08:00
Philippe Tillet	9f27468377	[TESTS][FRONTEND][BACKEND] Merge `master` and `triton-mlir` tests (#979 ) Also fix a bunch of bugs in float32 / tf32 Co-authored-by: Jokeren <kerenzhou@openai.com>	2022-12-15 19:28:50 -08:00
Philippe Tillet	e5cfa0f633	[FRONTEND] Added a few assertions in `semantic.dot` (#977 )	2022-12-12 00:07:14 -08:00
Philippe Tillet	e552219104	[FRONTEND] Add possibility for user to force a GPU threadsync barrier (#976 ) compiler still has pitfalls even in master branch	2022-12-11 23:03:52 -08:00
Keren Zhou	be2f70699c	[BACKEND][FRONTEND] Fix problems with test_matmul (#973) 1. Handle induction variable when step is negative 2. Restore async_wait that was accidentally deleted 3. Add missing induction variable in prefetch 4. Add device property functions Co-authored-by: Philippe Tillet <Phil.Tillet@gmail.com>	2022-12-10 20:34:58 -08:00
Keren Zhou	83f3b9165b	[FRONTEND][BACKEND] Fix bool and int8 load when the other operand is given (#968 )	2022-12-08 11:52:18 -08:00
Rohit Santhanam	dbe1b2aafb	AMDGCN fixes for libdevice.py.	2022-12-08 19:08:26 +00:00
Philippe Tillet	b2b793dfb5	[FRONTEND][BACKEND] Fixes for cat / reshape / addptr (#959 ) Most notably, this PR: - changes the traits (and assembly format) of addptr so it can handle offsets that have arbitrary integer width. - adds support for `cat`	2022-12-06 23:29:50 -08:00
Philippe Tillet	532e10cf87	[FRONTEND][BACKEND] Clean-up transpositions (#953 )	2022-12-06 09:32:13 -08:00
Philippe Tillet	8edfe813a5	[FRONTEND][BACKEND] Added `trans` instruction; made flash attention bwd pass work (#943 )	2022-12-03 09:58:24 -08:00
Philippe Tillet	9bb54402b3	[FRONTEND][BACKEND] Small fixes to multiple_of, num_programs, axisinfo; enable block-sparse tests (#927 )	2022-11-29 20:00:34 +01:00
Qingyi Liu	9d31998a9d	[Triton-MLIR][BACKEND] Add argmin / argmax implementation for ReduceOp (#918 )	2022-11-27 22:59:27 -08:00
donproc	8925c2cd11	[TRITON-MLIR][BACKEND]AtomicRMWOp supports scalar (#903 ) AtomicRMWOp supports scalar Co-authored-by: dongdongl <dongdongl@nvidia.com>	2022-11-23 07:59:09 +00:00
Chenggang Zhao	516a241234	[Triton-MLIR] Fix some typos (#874 ) Fix some typos	2022-11-13 18:15:53 -08:00
Chenggang Zhao	57fd1864a7	[Triton-MLIR] Support FP8 (#864 ) Co-authored-by: Superjomn <yanchunwei@outlook.com>	2022-11-10 15:53:06 +08:00
ben-zhang-609	5feb6e24f9	[Triton-MLIR] Add ptx vprintf support (#825) Not sure how to write a unit test for this feature. Co-authored-by: Yan Chunwei <yanchunwei@outlook.com>	2022-11-02 16:39:09 +08:00
Philippe Tillet	7dfab26a39	[FRONTEND][BACKEND] Fixed various bugs (#819 ) - Fixed bugs on layout conversions for int1 data (we should use int8 internally for int1 data to prevent llvm from using vec<i1> which has different semantics) - Fixed semantics of some casts to bool in the frontend	2022-10-29 06:34:14 +00:00
ben-zhang-609	3685194456	[Triton-MLIR][BACKEND] Add elementwise ops and tests (#804 ) Co-authored-by: Keren Zhou <kerenzhou@openai.com>	2022-10-28 05:26:29 +00:00
Philippe Tillet	3e6cc6d66c	[FRONTEND] Made more tests pass (#805 )	2022-10-26 17:47:33 -07:00
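
The `range(len(...))` to `enumerate(...)` replacement described in commit 653c8dc124 can be sketched in plain Python. This is only an illustration of the idiom: the list `arg_names` and both helper functions are hypothetical and are not taken from `extern.py`.

```python
# Hypothetical data; these names are illustrative and do not come from Triton.
arg_names = ["x_ptr", "y_ptr", "n_elements"]


def pairs_with_range(items):
    """Index-based style that the commit message discourages."""
    pairs = []
    for i in range(len(items)):
        pairs.append((i, items[i]))
    return pairs


def pairs_with_enumerate(items):
    """Iterator-based style: enumerate yields (counter, value) pairs."""
    return list(enumerate(items))


# Both styles produce the same (index, value) pairs.
same = pairs_with_range(arg_names) == pairs_with_enumerate(arg_names)
```

Both helpers return identical results; the `enumerate` version avoids the separate indexing step `items[i]`.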

82 Commits
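
The three `arange` checks listed in commit 320ae18093 can be sketched as a standalone validator. This is a minimal sketch under stated assumptions, not Triton's actual frontend code: the function name `check_arange_bounds`, the use of `ValueError`, the error wording, and the reading of "fits in `int32`" as "`start` and the last value `end - 1` are both representable as signed 32-bit integers" are all assumptions. Only the constant `TRITON_MAX_TENSOR_NUMEL = 131072` is quoted from the commit message.

```python
TRITON_MAX_TENSOR_NUMEL = 131072  # value quoted in the commit message
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def check_arange_bounds(start: int, end: int) -> int:
    """Hypothetical validator; returns the element count if all checks pass."""
    # Check 1: `end` must be greater than `start`.
    if end <= start:
        raise ValueError("arange: end must be greater than start")
    # Check 2 (assumed interpretation): every value in [start, end) fits in int32.
    if start < INT32_MIN or end - 1 > INT32_MAX:
        raise ValueError("arange: range does not fit in int32")
    # Check 3: the number of elements must not exceed the stated maximum.
    numel = end - start
    if numel > TRITON_MAX_TENSOR_NUMEL:
        raise ValueError("arange: too many elements")
    return numel
```

For example, `check_arange_bounds(0, 128)` passes all three checks and returns the element count, while `check_arange_bounds(4, 4)` fails the first check.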