github/ROCm - ROCm - AtHeartEngineering

mirror of https://github.com/ROCm/ROCm.git synced 2026-02-21 03:00:39 -05:00

Author	SHA1	Message	Date
Chenggang Zhao	e7fdfd76fb	[FRONTEND] Add value restoration for autotuner (#2549 ) For in-place kernels, neither `reset_to_zero` nor `Config.prehook` provided in the autotuner can restore the values changed during the tuning process, so I propose a recovery mechanism here. --------- Co-authored-by: Chenggang Zhao <chenggangz@deepseek.com> Co-authored-by: Keren Zhou <kerenzhou@openai.com>	2023-10-31 21:37:44 -04:00
Justin Lebar	258399c114	Enable ruff linter instead of flake8 (#2574 ) [FRONTEND] Enable ruff linter instead of flake8. This fixes a few issues automatically, and also flagged two issues to fix manually in test_core.py: We had two duplicate function names! One of these function bodies was a duplicate, so I deleted it. The other function body was not a duplicate, so I gave it a new name. AIUI all of these errors should have been picked up by flake8. I'm confused why it wasn't working. Anyway this is working, and it's faster than flake8, so it seems like an improvement in all dimensions.	2023-10-31 21:28:24 +00:00
Zahi Moudallal	943330790a	[FRONTEND] add do_not_specialize property back to JITFunction (#2573 )	2023-10-31 12:02:45 -07:00
Justin Lebar	12f906287f	[FRONTEND] Refactor jit.py. (#2556 ) [FRONTEND] Refactor jit.py. The goal is to simplify the code and make it more flexible before we change the kernel launch syntax to `kernel[grid, compiler_flags(...)](...)`. The main changes here are: - Get rid of the eval'ed code in make_launcher. We can do everything using bind(). - Add KernelParam and KernelArg classes, letting us get rid of the parallel arrays/dicts indexed by parameter index. - Get rid of duplicated kernel launch code in the cache-hit/cache-miss branches.	2023-10-30 13:14:51 -07:00
Justin Lebar	f88b01f558	Apply `ruff` pre-commit to python/triton/runtime. (#2558 ) We're in the process of incrementally converting from autopep8 + flake8 + isort to ruff, on a directory-by-directory basis. The motivation to switch away from autopep8 is that I can't get it to wrap long lines, even with -aaa. This seems to be a known problem, https://github.com/hhatto/autopep8/issues/497. See more details about alternatives tried in https://github.com/openai/triton/pull/2557.	2023-10-30 11:06:44 -07:00
Adnan Akhundov	50add54334	[FRONTEND] Add input dtypes to autotuning key (#2534 )	2023-10-24 03:29:30 +00:00
Justin Lebar	30186f401e	Fix segfault in assertion test. (#2520 ) <git-pr-chain> #### Commits in this PR 1. Fix segfault in assertion test. The issue here is that we were not checking the return values of the CUDA API calls we were making. We call one function and then use the data it returns as input to another call. Obviously this doesn't work if the first call returns an error and doesn't actually return meaningful data. I don't know why this was passing in CI, but it failed consistently for me. #### [PR chain](https://github.com/jlebar/git-pr-chain) 1. 👉 #2520 👈 YOU ARE HERE </git-pr-chain>	2023-10-19 13:42:38 -07:00
Horace He	a4f373938c	[RUNTIME] Filter out paths that don't exist in json group cache (#2511 ) There's no guarantee that `/tmp/triton//.json` existing means that the corresponding `/tmp/triton//.cubin` file also exists because the tmp directory doesn't guarantee file stability.	2023-10-18 16:44:34 -04:00
ian Bearman	768fc1fcd9	[FRONTEND] change hash to not require ptxas (#2476 ) I noticed that Triton is using the `ptxas` version as part of the version hash even for non-CUDA targets. This is an attempt at fixing this. Moving the version calculation to the back-end makes sense to me from an architectural standpoint, so that's my approach here. I'm not as confident in the implementation, so please if folks have any feedback let me know.	2023-10-17 10:28:51 -07:00
Stewart Hall	29828fe491	[FRONTEND] add option to disable fp mul/add fusion (#2495 ) By default, ptxas will enable fusion of mul/add to fma instructions. The backend was also being configured unconditionally to enable this on conversion from LLVM IR to PTX. This commit adds an option which can be used to disable the FP fusion behavior in both locations.	2023-10-14 12:23:30 -07:00
edimetia3d	cb83b42ed6	[FRONTEND] using closure to create jit launcher (#2289 ) Hi, I'm adding some features to `triton.runtime.jit.JITFunction_make_launcher` and found it hard to debug: 1. The inlined Python code is hard to inspect in my editor. 2. My debugger fails to step into this inlined code. In response, I've introduced some code to solve these issues. My modifications include: ~~1. Refactoring the launcher's inline Python code, ensuring it only relies on the "self" object.~~ ~~2. Add a utility method that generates a temporary file to create a launcher when debugging kernel in main module~~ Using a closure to hold the launcher's body. Because this feature might be useful to others, I have initiated this Pull Request. ~~Tests are yet to be added; if this submission might be accepted, I will add it later.~~ Since this change is a refactor, no new test was added.	2023-09-22 17:01:54 -07:00
Philippe Tillet	894fa9e943	[RUNTIME][INTERPRETER] now also override __str__ method for tensors (#2325 )	2023-09-17 16:49:30 -07:00
Philippe Tillet	e686b4d6d4	[FRONTEND] interpreter rewrite (#2321 ) This is a new interpreter mode that shares semantic analysis with the JIT'ed codepath and that the Triton core team is committed to maintain	2023-09-17 14:58:50 -07:00
Thomas Raoux	b63e8f87fc	[FRONTEND] Override prototype (#2214 ) Low-tech but very useful way to override kernels on the fly. This can be used for debugging functionality or performance problems: it lets the user dump, modify, and feed IR back into the JIT compiler.	2023-09-13 10:05:47 -07:00
Ying Zhang	37f12497b0	[FRONTEND] Add PyTorch fp8 dtypes to Triton (#2279 ) Add PyTorch fp8 dtypes (`8025b193a9/torchgen/api/types/types.py (L50-L51)`) to Triton.	2023-09-12 08:57:01 -07:00
Shintaro Iwasaki	8da27c1c95	[Build] Fix very minor compilation problems (#2277 ) This PR fixes a few very minor compilation issues found in internal deployment at Meta. It looks like nit-picking, but it'd be really appreciated if it could be addressed in OSS Triton (to reduce differences from OSS), and we believe these changes are not bad in general. Neither performance nor functionality is affected by this PR. 1. Type cast in `python/triton/runtime/backends/cuda.c`. Implicit `void ` -> `cuuint{32,64}_t ` cast is not allowed by many compilers (with certain flags). It'd be nice to add an explicit cast (like `backends/hip.c`). 2. Inconsistent include path specification in `lib/Conversion/TritonGPUToLLVM/DotOpToLLVM/WGMMA.cpp`. Unlike other `DotOpToLLVM/*.cpp`, include paths used in `WGMMA.cpp` are not relative. This is problematic in some compilation settings since a compiler somehow needs to find headers in a parent directory. It'd be great to use a relative path, like other source files in Triton. cc: @yuguo68	2023-09-11 19:28:31 -07:00
Izzy Putterman	7d01c1852a	Revert unintentional change (#2257 ) This change seems to have been unintentionally reverted in the hopper PR: `38d767ea93` Adding it back.	2023-09-07 10:48:12 -07:00
Keren Zhou	9e9fbe01f0	[FRONTEND] Fix specialization on triton integer types (#2236 ) https://github.com/openai/triton/issues/2231	2023-09-03 23:57:08 -07:00
Shantanu	a4df60e20a	[FRONTEND] Fix GIL handling in error conditions (#2225 ) The use of the opaque GIL state APIs should mean that the PyErr_SetString is now safe, regardless of whether the caller has the GIL or not.	2023-09-01 13:30:42 -07:00
Michael Melesse	c6d33dcebf	[ROCM] Core Functionality for AMD (#1983 ) * this pr adds a third party backend for triton that works on AMD * this expose a lot of the work that has been done in our [fork](https://github.com/ROCmSoftwarePlatform/triton) * most unit tests on `test_core.py` pass * it skips some unit tests for various reasons * we plan to follow up with more prs improving Functionality and Performance in the future --------- Co-authored-by: Philippe Tillet <phil@openai.com>	2023-08-31 14:02:00 -07:00
jon-chuang	9af76e7d5a	[RUNTIME] Fix cache dir (#2196 ) --------- Co-authored-by: Keren Zhou <kerenzhou@openai.com>	2023-08-29 21:07:16 -04:00
Greg Brockman	ab3e8b0dad	[FRONTEND] fix handling of do_not_specialize with interior constantexprs (#2188 )	2023-08-26 09:19:34 -07:00
Mohammed Anany	ebfe0ffb29	[FRONTEND] fix for undefined dtypes in jit during loading defaults (#2114 ) Co-authored-by: Keren Zhou <kerenzhou@openai.com>	2023-08-25 10:28:23 -07:00
Shantanu	7083dae4f2	[FRONTEND] drop the GIL around more CUDA ops (#2173 )	2023-08-24 20:31:38 -07:00
chengjunlu	6cb67185f8	[FRONTEND] Use proper default num_warps and num_stages based on the device backend in JITFunction (#2130 ) The default values used by JITFunction for num_warps and num_stages are coupled with the Nvidia GPU architecture. We should use the proper default values based on the device backend that the kernel is compiled for. 1. Add two functions to return the default num_warps and num_stages for the specific device backend. 2. JITFunction uses the proper default num_warps and num_stages based on the specific device backend. Co-authored-by: Wang Weihan <eikan.wang@intel.com>	2023-08-24 21:58:18 +08:00
Zahi Moudallal	23dd11d471	[BACKEND] Solidify f8e4m3 (#2105 ) Co-authored-by: Philippe Tillet <phil@openai.com>	2023-08-18 19:12:09 -07:00
Izzy Putterman	fc667d1f8f	[FRONTEND] fix new absolute imports (#2072 ) Co-authored-by: Philippe Tillet <phil@openai.com>	2023-08-13 14:23:36 +00:00
Thomas	98372f46d3	[FRONTEND] Remove extra calls to _get_config causing runtime overhead (#2094 )	2023-08-13 06:51:26 -07:00
Zahi Moudallal	a01c116f76	[FRONTEND/BACKEND] Revived Float8E4B15x4 (#2090 )	2023-08-11 17:49:52 -07:00
Keren Zhou	382e8fb1fa	[RUNTIME] Make apis compatible with cuda 11 drivers (#2081 ) https://github.com/openai/triton/issues/2042	2023-08-11 17:46:56 -07:00
Shantanu	776b3784c2	[FRONTEND] further improve version_key speed (#2073 ) Realised I could do this right after my first PR got merged. This saves another 100ms	2023-08-09 22:29:36 +00:00
Shantanu	0e11257b8d	[FRONTEND] improve speed of computing version_key (#2071 ) libtriton.so is pretty large these days and hashing it is slow. Switching the hash from md5 to sha1 shaves close to 300ms off the time for me (as well as being a better hash, for whatever that's worth). As far as I could tell, sha1 is the fastest stable hash in the Python standard library, including things like zlib.crc32	2023-08-09 21:44:10 +00:00
Keren Zhou	30a331e628	[FRONTEND] Support jit functions without arguments (#2043 ) Issue https://github.com/openai/triton/issues/1973 Co-authored-by: Philippe Tillet <phil@openai.com>	2023-08-07 19:05:56 -07:00
goostavz	f1512bded1	Initial code merge of Hopper support (#2036 ) The initial code merge of Nvidia Hopper features support. Please be aware that the code merge is not finished yet and the trouble-shooting is still ongoing. The new hardware features (GMMA, TMA, STMATRIX etc.) and automatic warp-specialization are experimental for now and turned off by default. It is recommended for a trial when version 3.0 is released. The work is contributed by: ben-zhang-609, bealwang, donproc, qliu93, jsh20, allatit23, LyricZhao, ivanyinwz, goostavz & yangjunpro from Nvidia, in cooperation with: ptillet, Jokeren, ThomasRaoux & zahimoud from OpenAI. Co-authored-by: Goostav Zhu <gzhu@nvidia.com>	2023-08-07 09:53:04 +08:00
Shantanu	4f1b2ea8d7	[FRONTEND] fix error with -> None return annotation (#1987 ) None is not a type, so you get: ``` self.constexprs = [self.arg_names.index(name) for name, ty in self.__annotations__.items() if 'constexpr' in ty] E TypeError: argument of type 'NoneType' is not iterable ``` Co-authored-by: Philippe Tillet <phil@openai.com>	2023-07-25 18:49:45 -07:00
Philippe Tillet	3452615d79	[BUILD] Reverted ptxas change and fixed bug in cache key computation (#1971 )	2023-07-19 20:58:24 -07:00
Alex Collins	80163a9c1e	[FRONTEND] Add support for default args in kernel wrappers (#1943 ) Fixes the case where setting default values for arguments in a kernel function signature results in a generated kernel wrapper function without these default values. For example: ``` @triton.jit def kernel(x, y, z=3): ... ... kernel[grid](x,y) ``` Co-authored-by: Philippe Tillet <phil@openai.com>	2023-07-14 21:32:47 +00:00
Philippe Tillet	5a722b5f74	[OPS][TESTS] Added float8 support in triton.ops.matmul (#1918 ) this also adds rather extensive testing for mixed precision mode, including `float8e4b15 x float8e5` and `float8e5 x float16`	2023-07-10 09:31:12 -07:00
Natalia Gimelshein	778ed64a66	[BACKEND] make sure we always bind to primary context in loadBinary (#1912 )	2023-07-07 14:28:03 -07:00
Bert Maher	38d767ea93	[FRONTEND] fix memory leak caused by retaining args to autotuned kernel (#1911 )	2023-07-07 20:58:29 +00:00
Philippe Tillet	6d1285e1ae	[FRONTEND][BACKEND] improved fp8 specs (#1906 ) This un-reverts commit `d4c941177e`.	2023-07-06 13:03:53 -07:00
Philippe Tillet	f77015967d	Revert "[FRONTEND][BACKEND] improved fp8 specs (#1841 )" (#1865 ) This reverts commit `d4c941177e`.	2023-06-29 21:07:01 -04:00
Izzy Putterman	9961b5c7aa	[TESTING] allow user to adjust warmup and repetition time for autotuning (#1850 ) Adds an option to adjust warmup and repetition time for autotuning. It should default to old values and have no effect on current kernels. This is useful for bigger kernels where runtime might be a sizable fraction of 100ms and lead to less warmup and more variance during benchmarking.	2023-06-28 11:04:43 -07:00
Philippe Tillet	d4c941177e	[FRONTEND][BACKEND] improved fp8 specs (#1841 ) clearly differentiate between standard fp8e4 (which we'll stop supporting on SM <= 89 because conversions are too expensive if we want to handle the single NaN and clipping properly) and a software-optimized fp8e4b15 format.	2023-06-26 16:19:03 -07:00
Izzy Putterman	3c400e7818	[FRONTEND] switch absolute imports to relative v2 (#1833 )	2023-06-26 04:13:12 +00:00
Goran Flegar	8d566e4196	[FRONTEND] Fix missing attribute access in DependenciesFinder (#1820 ) It seems that patch #1773 introduced a bug, since the `lhs` object doesn't necessarily have a `__name__` attribute. I'm hitting this if I modify the matmul tutorial (gflegar/triton@442b00f4d): ``` File "/home/gflegar/triton/python/triton/runtime/jit.py", line 74, in visit_Attribute if lhs is None or lhs.__name__ == "triton": AttributeError: 'Tensor' object has no attribute '__name__' ``` I think the idea of that patch was to remove the need to import triton by replacing `lhs is triton` with `lhs.__name__ == "triton"`. This patch should have the same behavior as the original code, but without failing if `lhs` doesn't have a `__name__` attribute.	2023-06-22 13:30:25 -07:00
Izzy Putterman	5686c51cdb	[FRONTEND] allow pre-hook in autotuner configs to access config kwargs (#1814 ) This is a very quick change that allows the configs' pre-hooks to see the values in the config itself. This is useful if we'd like to allocate intermediate tensor and the shape depends on tile size.	2023-06-22 05:40:48 -07:00
Philippe Tillet	9a2580de13	[CI] Added H100 node (#1779 )	2023-06-15 14:21:47 -07:00
Philippe Tillet	b24dc19741	[FRONTEND] cleaned up symbol names (#1782 )	2023-06-14 18:55:32 -07:00
Izzy Putterman	71e21f5797	[FRONTEND] switch absolute imports to relative imports in Triton (#1773 )	2023-06-14 23:59:24 +00:00
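
As an illustration of the fix described in commit 8d566e4196 (#1820) above, the following is a minimal sketch of a name check that tolerates objects without a `__name__` attribute. The helper `is_triton_module` and the `FakeTensor` stand-in are hypothetical names chosen for this example; they are not taken from `jit.py`, and the actual patch may be written differently.

```python
import types


def is_triton_module(lhs):
    """Hypothetical helper: True only if `lhs` has `__name__ == "triton"`."""
    # getattr with a default returns None instead of raising AttributeError
    # when the object (for example, a tensor instance) has no `__name__`.
    return getattr(lhs, "__name__", None) == "triton"


class FakeTensor:
    """Hypothetical stand-in for an object whose instances lack `__name__`."""


# A throwaway module object named "triton", used only for this illustration.
fake_triton = types.ModuleType("triton")

print(is_triton_module(fake_triton))   # module named "triton"
print(is_triton_module(FakeTensor()))  # instance without __name__
```

Comparing against the name instead of the module object keeps the check free of an `import triton`, which is the motivation the commit message attributes to #1773.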


95 Commits