Philippe Tillet
fc7c0b0e43
[FRONTEND] Removed torch dependency and cleaned up testing ( #1394 )
...
`assert triton.testing.allclose` -> `torch.testing.assert_allclose`
`triton.testing.assert_almost_equal` -> `torch.testing.assert_allclose`
2023-03-23 22:37:21 -07:00
xndcn
ff1d0377e0
[BACKEND] Fix wrong conversion from float8e5m2 <> bfloat16 ( #1391 )
...
The exponent compensation should be 0x3800 (112) instead of 0x3000 (96).
Also add a mantissa bit so the float16 conversion rounds to the nearest
float8e5m2.
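For reference, a minimal Python sketch of the corrected conversions (bit-level, normal values only; subnormals, inf and nan are left out, and the rounding helper ignores ties-to-even and overflow). This mirrors the constants above; it is not the backend's actual LLVM lowering:
```Python
import numpy as np

def fp8e5m2_to_bf16_bits(x):
    """Widen an fp8e5m2 bit pattern (uint8) to bfloat16 bits (uint16)."""
    x = int(x)
    sign = (x & 0x80) << 8        # sign bit moves from bit 7 to bit 15
    body = (x & 0x7F) << 5        # exponent+mantissa aligned to bf16 fields
    # rebias by 127 (bf16) - 15 (e5m2) = 112, i.e. 0x3800 once aligned to bit 7
    return np.uint16(sign | (body + 0x3800))

def fp16_to_fp8e5m2_bits(x):
    """Truncate fp16 bits (uint16) to fp8e5m2, rounding to nearest."""
    return np.uint8((int(x) + 0x80) >> 8)  # add half of the dropped mantissa

assert fp8e5m2_to_bf16_bits(0x3C) == 0x3F80   # 1.0 -> 1.0
assert fp16_to_fp8e5m2_bits(0x3C00) == 0x3C   # 1.0 -> 1.0
```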
Co-authored-by: Philippe Tillet <phil@openai.com >
2023-03-24 04:42:08 +00:00
Natalia Gimelshein
3239c93a93
[TEST] add a test for inductor normalization pattern ( #1390 )
2023-03-23 00:29:28 +00:00
xndcn
65d8d802d5
[BACKEND] Fix wrong conversion from float8e4m3 <> bfloat16 ( #1384 )
...
The exponent compensation should be 0x3c00 (120) instead of 0x3800 (112).
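A short sketch of where the corrected constant comes from (normal values only; the names are illustrative):
```Python
# The compensation is the bias difference between bfloat16 (bias 127)
# and float8e4m3 (bias 7), aligned to bf16's exponent field at bit 7.
BF16_BIAS, FP8E4M3_BIAS = 127, 7
compensate = (BF16_BIAS - FP8E4M3_BIAS) << 7
assert compensate == 0x3C00 and compensate >> 7 == 120
```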
2023-03-21 18:58:13 -07:00
Keren Zhou
c1dd6df9ce
[FRONTEND] Fix negative induction variable ( #1382 )
2023-03-21 08:38:16 -07:00
xndcn
84ffefc368
[BACKEND] Fix wrong conversion from float8e4m3 <> float16 ( #1375 )
...
After offset shifting, the exponent compensation should not be forgotten.
Also add back some comments from `legacy_backend`.
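For the float16 case, the analogous constant would be the following (an assumption based on the standard biases, not quoted from the patch):
```Python
# Bias difference between float16 (bias 15) and float8e4m3 (bias 7),
# aligned to fp16's exponent field, which starts at bit 10.
FP16_BIAS, FP8E4M3_BIAS = 15, 7
compensate = (FP16_BIAS - FP8E4M3_BIAS) << 10
assert compensate == 0x2000
```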
2023-03-20 21:45:25 -07:00
Phil Tillet
e650d3708b
[FRONTEND] dot now uses tl.float32 by default for out_dtype.
2023-03-19 21:58:46 -07:00
Philippe Tillet
b4decbe155
[BACKEND] Now using call_once to initialize LLVM target ( #1373 )
2023-03-19 21:23:39 -07:00
Fei Hu
6366c5a254
[FRONTEND][BACKEND] Add support for FP16 output for tl.dot ( #1258 )
...
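A hedged usage sketch of the new capability (kernel and pointer names are hypothetical; `out_dtype` is the parameter this PR extends, and BLOCK is assumed to be a multiple of 16 as `tl.dot` requires):
```Python
import triton
import triton.language as tl

@triton.jit
def matmul_fp16_kernel(a_ptr, b_ptr, c_ptr, BLOCK: tl.constexpr):
    offs = tl.arange(0, BLOCK)
    idx = offs[:, None] * BLOCK + offs[None, :]
    a = tl.load(a_ptr + idx)                 # fp16 operands
    b = tl.load(b_ptr + idx)
    c = tl.dot(a, b, out_dtype=tl.float16)   # accumulate/output in fp16
    tl.store(c_ptr + idx, c)
```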
---------
Co-authored-by: Fei Hu <fhu@microsoft.com >
2023-03-19 19:52:14 -07:00
Rohit Santhanam
a84b4883e6
Merge remote-tracking branch 'upstream/main' into triton-mlir-IFU-03192023
2023-03-19 13:46:50 +00:00
Philippe Tillet
39139258c8
[FRONTEND][BACKEND] tl.mathlib -> tl.math; internally reverted to mathlib -> libdevice ( #1368 )
2023-03-19 02:14:57 -07:00
rsanthanam-amd
c575911a01
[FRONTEND] Change libdevice to mathlib and fix abs ( #1361 )
...
Co-authored-by: Phil Tillet <phil@openai.com >
2023-03-19 01:34:16 -07:00
Horace He
1d2871d0d1
[RUNTIME] Fix memory leak in ( #1358 )
...
Fixes a bug that causes Triton to leak 32 bytes on every kernel
invocation.
Also solves https://github.com/pytorch/pytorch/issues/96937
2023-03-16 17:52:06 -07:00
Rohit Santhanam
def693c53f
Update test_core_amd.py with new unit test from test_core.py.
2023-03-16 23:53:43 +00:00
Berke Kocaoğlu
ba91f39dbf
[DOC] Fix syntax errors, typos, formatting; increase consistency ( #1357 )
...
This PR:
- Fixes syntax errors like `.type values: dict[str, Callable[[list[Any]], Any]]` to `:type values: dict[str, Callable[[list[Any]], Any]]`,
- Fixes typos,
- Fixes formatting like `k ++` to `k++`,
- Increases consistency (e.g., by replacing the minority `cd dir/` with the majority `cd dir`).
2023-03-16 15:32:02 -07:00
Rohit Santhanam
77dcb667b5
Merge remote-tracking branch 'upstream/main' into triton-mlir-IFU-03162023
2023-03-16 13:21:15 +00:00
Philippe Tillet
56b23f433d
[TEST] Temporarily disable test_dot mode that fails because of ptxas/nvptx ( #1344 )
2023-03-15 01:17:48 -07:00
peterbell10
01b177afe7
[FRONTEND] Mangle signed and unsigned integer types differently ( #1340 )
...
This is cherry-picked from #1305
If you call a `JITFunction` twice in the same kernel, first with `int32`
then with `uint32`, the second call will treat the unsigned value as
signed. This passes through MLIR without error because MLIR uses the
same types for both, but different operation calls will be generated so
you may silently get the wrong result.
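A hedged sketch of the failure mode (names are hypothetical; signedness matters for operations like integer division, which lowers to different MLIR ops for signed and unsigned types):
```Python
import triton
import triton.language as tl

@triton.jit
def halve(x):
    return x // 2                        # divsi for int32, divui for uint32

@triton.jit
def mangling_kernel(in_ptr, out_ptr):
    x = tl.load(in_ptr)                  # int32
    u = x.to(tl.uint32, bitcast=True)    # same bits, unsigned
    a = halve(x)                         # specializes halve for int32
    b = halve(u)                         # before the fix: reused the int32
                                         # mangling, treating u as signed
    tl.store(out_ptr, a)
    tl.store(out_ptr + 1, b.to(tl.int32, bitcast=True))
```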
2023-03-14 22:29:18 -07:00
Rohit Santhanam
40351272f4
Fix test_convert2d from test_core_amd.py.
2023-03-14 15:11:56 +00:00
Philippe Tillet
6a8634e2a7
[BACKEND] No longer running LLVM-IR optimizations after codegen. ( #1338 )
...
This triggered some outrageous bugs. See #1337 .
2023-03-13 22:50:15 -07:00
Rohit Santhanam
6ff54b495c
Merge remote-tracking branch 'upstream/main' into triton-mlir-IFU-03122023
2023-03-13 18:09:12 +00:00
Philippe Tillet
3fe3adbcde
[FRONTEND][BACKEND] Add support for float8e5m2 type ( #1314 )
2023-03-10 19:14:47 -08:00
Keren Zhou
8b25c30d39
[BACKEND] Fix bfloat16 flash attention ( #1306 )
...
See https://github.com/openai/triton/issues/1245 for more detailed
information
---------
Co-authored-by: giorgio-arena <arena.cpp@gmail.com >
2023-03-09 21:14:52 -08:00
Da Yan
902c61affb
[BACKEND] Add arith::SelectOp => LLVM::SelectOp conversion ( #1307 )
2023-03-09 09:35:30 -08:00
Keren Zhou
78b311f6e2
[FRONTEND] Fix cast when both src_ty and dst_ty are of block_type ( #1301 )
...
Commonly used in atomic_rmw ops
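A hedged sketch of the pattern (hypothetical kernel): the value block is cast to another block type right before the atomic op, so both src_ty and dst_ty are block types.
```Python
import triton
import triton.language as tl

@triton.jit
def atomic_cast_kernel(ptr, val_ptr):
    offs = tl.arange(0, 64)
    v = tl.load(val_ptr + offs)          # block of int32
    # block<i32> -> block<f32> cast feeding an atomic_rmw
    tl.atomic_add(ptr + offs, v.to(tl.float32))
```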
2023-03-08 09:25:00 -08:00
B1tway
b5dc18d7c9
Added missing #ifdef and fixed code style
2023-03-07 11:32:52 +00:00
Keren Zhou
4731f300d3
[BACKEND] Mask out wrapped threads in store ops ( #1283 )
2023-03-06 14:50:20 -08:00
B1tway
625a99aa78
Fixing the layouts in the tests for Wave64
2023-03-06 18:35:18 +00:00
Keren Zhou
d376020f90
[FRONTEND][BACKEND] Implement tl.device_assert and rename tl.printf to tl.device_print ( #1143 )
...
Note that `tl.device_print` and `print` accept different arguments than
the normal Python `print`: the first argument must be a string, followed by
variables.
Device side:
- `tl.device_print`
- `tl.device_assert`
- `print`
- `assert`
Compilation time:
- `tl.static_assert`
- `tl.static_print`
Usage example:
1.
```Python
tl.device_assert(x == 0, "x != 0")
```
Output:
```
...
python/test/unit/language/assert_helper.py:18: kernel: block: [0,0,0], thread: [33,0,0] Assertion `x != 0` failed.
...
```
2.
```Python
tl.device_print("hello ", x)
```
Output:
```
...
hello 1
...
```
The environment variable `TRITON_DEBUG` sets the default debugging flag; when it is off, `tl.device_assert` and `assert` are skipped.
2023-03-04 08:08:29 -08:00
Keren Zhou
65e5a3bc24
[FRONTEND] Improve tl.full to accept both static and dynamic values ( #1269 )
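A minimal usage sketch (hypothetical kernel) of what the improvement enables: the fill value may be a compile-time constant or a runtime scalar.
```Python
import triton
import triton.language as tl

@triton.jit
def fill_kernel(out_ptr, x):
    a = tl.full((64,), 1.0, tl.float32)  # static (constexpr) value
    b = tl.full((64,), x, tl.float32)    # dynamic (runtime) value
    tl.store(out_ptr + tl.arange(0, 64), a + b)
```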
2023-03-02 12:19:54 -08:00
Philippe Tillet
fa0fbc937f
[FRONTEND][BACKEND][OPTIMIZER] Loops now use 64-bit indices when necessary ( #1261 )
...
* Frontend:
- `int` kernel arguments are always signed
- The loop induction variable is now determined by integer promotion on
lb/ub/step (see the sketch below)
* Optimizer:
- Added a new ExtractSliceOp that enforces 32-bit offsets
* Backend:
- Use 64-bit indices when lowering functions and control flow
- Removed the `idx_val` macro and replaced it with `i32_val`
- Cleaned up comments
- Added a new ArithToIndex pass to ensure that operations on indices use
the `index` dialect, which is converted to LLVM separately using a
64-bit target
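A hedged sketch of the frontend rule (hypothetical kernel): the width of the induction variable follows integer promotion on the bounds and step.
```Python
import triton
import triton.language as tl

@triton.jit
def count_kernel(out_ptr, lb, ub):
    acc = 0
    # If lb/ub arrive as 64-bit integers, promotion makes the induction
    # variable i 64-bit as well; with 32-bit bounds it stays 32-bit.
    for i in range(lb, ub):
        acc += 1
    tl.store(out_ptr, acc)
```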
2023-03-01 23:09:48 -08:00
Keren Zhou
90fcb38c7b
[BACKEND] Overwrite NVPTX converters for fp16<->fp32 and int16<->int32 to avoid ptxas problems ( #1267 )
2023-03-01 18:26:06 -08:00
Da Yan
cb7b315a17
[OPTIMIZER] Copying named attributes when converting from Triton to TritonGPU ( #1265 )
2023-03-01 12:31:46 -08:00
Rohit Santhanam
3e48537264
Merge remote-tracking branch 'upstream/main' into triton-mlir
2023-02-28 17:21:18 +00:00
Da Yan
0eead250c1
[FRONTEND] add missing tensor/constexpr ops ( #1249 )
2023-02-24 18:45:22 +00:00
rsanthanam-amd
f6548e37c1
Merge branch 'triton-mlir' into dfukalov/work-3
2023-02-23 20:08:16 -06:00
Daniil Fukalov
7133d23512
[Tests] Update test_core_amd.py from test_core.py.
...
1. Added some tests/configs that are passing at the moment.
2. Changed how parts are switched off: from commenting them out to guarding with `if torch.version.hip`.
2023-02-23 23:09:58 +01:00
Rohit Santhanam
cd9ae1cd36
Merge remote-tracking branch 'upstream/main' into triton-mlir-IFU-02232023
2023-02-23 21:41:54 +00:00
rsanthanam-amd
e7f84448bf
Merge pull request #127 from dfukalov/dfukalov/work-3
...
[ROCM] Enable float16 and int8 types for FMA based `dot` implementation.
2023-02-22 16:39:04 -06:00
Daniil Fukalov
2d678efb89
[ROCM] Enable float16 and int8 types for FMA based dot implementation.
...
By default, Triton generates MLIR with an f32 result for the tt.dot operation
on f16-typed operands, so the .ttgir contains types like
"tt.dot(f16,f16,f32)->f32". But the LLVM FMA instruction requires the same
type for all three operands, so the first two operands are implicitly cast
f16->f32 as "unrealized_conversion_cast struct{f16,f16,...}->struct{f32,f32}".
This change fixes incorrect generation of that implicit cast. For int8-typed
operands, the result operand is also cast after performing the dot.
As a next step to improve the FMA-based dot implementation, target-specific
FMA intrinsics for f16 and int8 (e.g. fma(f16,f16,f16)->f16) could be used,
perhaps as an option.
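A numpy sketch of the intended semantics (not the backend lowering itself): f16 operands are widened to f32 and accumulated with multiply-adds, which is what the corrected cast expresses.
```Python
import numpy as np

def fma_dot_f16(a, b):
    """Emulate tt.dot(f16, f16, f32) -> f32: widen each operand to f32,
    then accumulate rank-1 updates, mirroring the per-element FMAs."""
    m, k = a.shape
    _, n = b.shape
    acc = np.zeros((m, n), dtype=np.float32)
    for kk in range(k):
        acc += a[:, kk:kk + 1].astype(np.float32) * b[kk:kk + 1, :].astype(np.float32)
    return acc
```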
2023-02-22 22:36:20 +01:00
Philippe Tillet
ba0198326e
[TESTS] make performance regression testing less strict ( #1231 )
2023-02-21 22:22:02 -08:00
Philippe Tillet
174f121c1c
[TESTS] Added attention regression tests ( #1227 )
2023-02-21 20:22:36 -08:00
Philippe Tillet
307dde9cb5
[CI] revived regression tests ( #1225 )
2023-02-21 16:33:03 -08:00
Rohit Santhanam
841784d1e3
Merge remote-tracking branch 'upstream/main' into upgrade_triton_mlir_rocm_to_llvm_head
2023-02-18 09:25:20 +00:00
Christian Sigg
fc7a8e3581
Rebase Triton to LLVM-15. ( #1070 )
...
This PR rebases Triton from LLVM-14 to LLVM-15. Most changes are
mechanical, except for the analysis framework changes.
2023-02-16 06:40:53 -08:00
Philippe Tillet
9c330a411c
[FRONTEND] fixed pinned memory exception behavior ( #1197 )
...
No longer raises an exception when the pointer is on "cpu" but is also
accessible from within kernels (e.g., pinned memory).
2023-02-15 17:40:45 -08:00
Philippe Tillet
e3941f9d09
[OPTIMIZER][BACKEND] Cleaned up Volta codegen ( #1185 )
2023-02-14 22:39:35 -08:00
Keren Zhou
6413c7b9de
[BACKEND] Calculate correct warp ids for small matrices ( #1180 )
...
Fixes https://github.com/openai/triton/issues/1162
Adds 16x16x16 tests
2023-02-14 05:28:03 +00:00
rsanthanam-amd
44f69bea81
Merge pull request #113 from ROCmSoftwarePlatform/triton-mlir-IFU-02112023
...
Triton mlir ifu 02112023
2023-02-13 09:26:10 -06:00
rsanthanam-amd
ec387d5bf4
Merge pull request #109 from dfukalov/dfukalov/work-3
...
[ROCM] Enable part of tl.dot operations.
2023-02-12 13:50:20 -06:00