This PR:
- Fixes syntax errors like `.type values: dict[str,
Callable[[list[Any]], Any]]` to `:type values: dict[str,
Callable[[list[Any]], Any]]`,
- Fixes typos,
- Fixes formatting like `k ++` to `k++`,
- Increases consistency (e.g. by transforming the minority `cd dir/` to
the majority `cd dir`).
This addition allows explanatory messages upon assertion failures:
```python
@triton.jit
def my_single_block_kernel(
matrix_extent: tl.constexpr,
block_size: tl.constexpr, # must be >= extent (single block)
matrix: Tensor,
...
):
tl.static_assert(matrix_extent <= block_size,
f"`matrix_extent` should not be more than the block size ({block_size}), but is {matrix_extent}")
```
Yielding, when called incorrectly:
```
AssertionError: `matrix_extent` should not be more than the block size (32), but is 57
```
This PR applies a minor patch that removes unnecessary masks in
`_dsd_kernel()`.
### Details
`offs_bn` is defined as follows and is not updated afterwards.
```py
offs_bn = pid_m * TILE_N + tl.arange(0, TILE_N)
offs_bn = tl.max_contiguous(tl.multiple_of(offs_bn % DS0, TILE_N), TILE_N)
```
Because `offs_bn` is already wrapped by `offs_bn % DS0`, the following mask is always `True`.
```py
b = tl.load(pb, mask=offs_bn[None, :] < DS0)
```
This PR removes this mask (as well as explicit `mask=True`).
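The redundancy can be illustrated outside of Triton (a NumPy sketch, not the kernel itself): every element of `x % DS0` lies in `[0, DS0)`, so the mask condition `offs_bn < DS0` can never be `False`.

```python
import numpy as np

# Sketch (NumPy, not Triton): after wrapping with `% DS0`, every offset
# lies in [0, DS0), so a `< DS0` mask is always True.
DS0 = 7
offs = np.arange(16) * 3      # arbitrary offsets, some >= DS0
wrapped = offs % DS0          # mirrors `offs_bn % DS0` in the kernel
mask = wrapped < DS0          # the mask the PR removes
assert mask.all()
```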
Fixed `JITFunction.__init__` to mark args as constexpr only when the
annotation is actually `tl.constexpr`, rather than treating any
annotated arg as constexpr.
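A minimal sketch of the corrected rule, using a stand-in `constexpr` class rather than Triton's actual implementation: a parameter is flagged only when its annotation is literally the `constexpr` type, so other annotations (e.g. `int`) no longer cause it to be treated as constexpr.

```python
import inspect

class constexpr:
    """Stand-in for `tl.constexpr`; hypothetical, for illustration only."""

def constexpr_flags(fn):
    # Flag a parameter as constexpr only when its annotation IS the
    # constexpr type, not merely because it has *some* annotation.
    return [p.annotation is constexpr
            for p in inspect.signature(fn).parameters.values()]

def kernel(BLOCK: constexpr, n: int, x):
    pass

print(constexpr_flags(kernel))  # [True, False, False]
```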
When the user sets `LLVM_SYSPATH` to use a custom LLVM build, an error is thrown because there is no `version.txt` under the custom build.
This PR skips the version check if `LLVM_SYSPATH` is set.
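The guard can be sketched as follows (the function name is hypothetical, not the actual build-script code):

```python
import os

def should_check_llvm_version(env=os.environ):
    # Skip the version.txt check when the user points the build at a
    # custom LLVM via LLVM_SYSPATH (a custom build has no version.txt).
    return "LLVM_SYSPATH" not in env

print(should_check_llvm_version({}))                             # True
print(should_check_llvm_version({"LLVM_SYSPATH": "/opt/llvm"}))  # False
```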
---------
Co-authored-by: Philippe Tillet <phil@openai.com>
This is cherry-picked from #1305
If you call a `JITFunction` twice in the same kernel, first with `int32` and then with `uint32`, the second call will treat the unsigned value as signed. This passes through MLIR without error because MLIR uses the same types for both, but different operation calls will be generated, so you may silently get the wrong result.
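The hazard comes from `int32` and `uint32` sharing the same bit patterns; a NumPy sketch (not Triton code) of how reinterpretation silently changes results:

```python
import numpy as np

# 0xFFFFFFFF is 4294967295 as uint32, but -1 when the same bits are
# read as int32 -- the silent misinterpretation described above.
x = np.array([0xFFFFFFFF], dtype=np.uint32)
s = x.view(np.int32)
print(int(x[0]))  # 4294967295
print(int(s[0]))  # -1
# e.g. `x > 0` holds for the unsigned view but not for the signed one.
assert (x > 0).all() and not (s > 0).any()
```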
- Significant simplification of the optimizer pipeline. The right MMA
version is now set directly after the coalescing pass. `DotOperand` layouts
no longer hold `isRow` state, and instead query it from their parent.
- Moved a bunch of things from `TritonGPUToLLVM/DotOpHelpers` to
`TritonGPUAttrDefs`. All MMAv1 state is now queried from attributes.
- Logic for `getElemsPerThread` is no longer duplicated in the `TypeConverter`.
`_triton.runtime.num_sm`, `_triton.runtime.clock_rate`, and
`_triton.runtime.cc` seem to no longer exist.
Use the corresponding methods from `get_max_tensorcore_tflops` in the
same file.
* Cleaned up the pipeline pass. It now works when there are element-wise ops
between the load and the dot.
* Made `splat` compatible with variables that have a `DotOperandLayout`.
* Moves rematerialization utils to separate Transforms/Utility.cpp file.
* Frontend:
- `int` kernel arguments are always signed
- The loop induction variable is now determined by integer promotion on
lb/ub/step
* Optimizer:
- Added new ExtractSliceOp that enforces 32-bit offsets
* Backend:
- Use 64-bit indices when lowering functions and control flow
- Removed `idx_val` macro and replaced it with `i32_val`
- Cleaned up comments
- Added a new ArithToIndex pass to make sure operations on indices are
done in the `index` dialect, which gets converted to LLVM separately
using a 64-bit target
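The frontend change to the induction variable can be approximated with NumPy's promotion rules (a sketch, not the compiler's actual code): the induction variable takes the type obtained by promoting the types of lower bound, upper bound, and step.

```python
import numpy as np

def induction_dtype(lb, ub, step):
    # Sketch: the induction variable's type is the common promoted type
    # of lower bound, upper bound, and step.
    return np.result_type(lb, ub, step)

print(induction_dtype(np.int32(0), np.int64(100), np.int32(1)))  # int64
```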
Per issue https://github.com/openai/triton/issues/1228, I believe we are
potentially exposed when a Triton executor (PyTorch, for example) links
in two or more `triton_.so` shared objects and each has a stub for
`_launch`.
This fix ensures the `_launch` function is tied locally to the calling
`__triton_launcher` and can't be misused by another library.
Python 3.10 changes where packages are installed by default, causing
problems with Ubuntu installing into `/usr/local`. See
[this](https://lists.debian.org/debian-python/2022/03/msg00039.html) and
[this](https://bugs.launchpad.net/ubuntu/+source/python3.10/+bug/1967920).
Triton seems to break when using 3.10 because it looks for the headers,
but the headers are not in `/usr/local`; e.g. they are at
`/usr/include/python3.X` and not `/usr/local/include/python3.X`.
Not 100% sure what's going on here since it's deep in Python/pip, but
I think this should fix it. Otherwise, you have to hack around it in
Dockerfiles, e.g. `ENV DEB_PYTHON_INSTALL_LAYOUT=deb`, which breaks
things with recent pip releases.
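One way to see which header directory a build will actually probe on a given system (a diagnostic sketch, unrelated to Triton's setup code):

```python
import sysconfig

# Where this interpreter expects its C headers (Python.h) to live; on the
# Debian/Ubuntu layout this is /usr/include/python3.X rather than
# /usr/local/include/python3.X.
include_dir = sysconfig.get_paths()["include"]
print(include_dir)
```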
---------
Co-authored-by: Keren Zhou <kerenzhou@openai.com>
Fix issue https://github.com/openai/triton/issues/244:
- Check that `end` is greater than `start`.
- Check that the range fits in `int32`.
- Check that the number of elements is less than or equal to
`TRITON_MAX_TENSOR_NUMEL = 131072`.
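The three checks can be sketched as follows (the helper name is hypothetical; `TRITON_MAX_TENSOR_NUMEL` is the constant cited above):

```python
TRITON_MAX_TENSOR_NUMEL = 131072
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def validate_arange(start: int, end: int) -> None:
    # 1. `end` must be strictly greater than `start`.
    if end <= start:
        raise ValueError("arange: `end` must be greater than `start`")
    # 2. Both bounds must fit in int32.
    if start < INT32_MIN or end > INT32_MAX:
        raise ValueError("arange: range must fit in int32")
    # 3. The element count must not exceed the tensor-size limit.
    if end - start > TRITON_MAX_TENSOR_NUMEL:
        raise ValueError("arange: too many elements")

validate_arange(0, 128)  # OK: raises nothing
```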
---------
Co-authored-by: Philippe Tillet <phil@openai.com>