github/ROCm - ROCm - AtHeartEngineering

mirror of https://github.com/ROCm/ROCm.git synced 2026-04-05 03:01:17 -04:00

Author	SHA1	Message	Date
Philippe Tillet	3db55c5f94	[OPTIMIZER][BACKEND] Some backend and optimization passes clean-up (#1284) * Cleaned up pipeline pass. Now works when there are element-wise ops between the load and the dot * Made `splat` compatible with variables that have DotOperandLayout * Moved rematerialization utils to separate Transforms/Utility.cpp file.	2023-03-06 17:17:59 -08:00
Philippe Tillet	fa0fbc937f	[FRONTEND][BACKEND][OPTIMIZER] Loops now use 64-bit indices when necessary (#1261) * Frontend: - `int` kernel arguments are always signed - Loop induction variable is now determined by integer promotion on lb/ub/step * Optimizer: - Added new ExtractSliceOp that enforces 32-bit offsets * Backend: - Use 64-bit indices when lowering functions and control flow - Removed `idx_val` macro and replaced it with `i32_val` - Cleaned up comments - Added new ArithToIndex pass to make sure operations on indices are done with the `index` dialect, that gets converted to LLVM separately using a 64-bit target	2023-03-01 23:09:48 -08:00
Philippe Tillet	0ec277efc5	[OPTIMIZER] cleaned, renamed and simplified some optimization passes (#1232) This shouldn't actually change the behavior of Triton -- only clean things up.	2023-02-22 13:54:55 -08:00
Horace He	f21e76affe	[TUTORIALS] changed for loop to iterate by 1 in matmuls (#1198) For the new MLIR backend, this appears to increase matmul perf significantly in many cases.	2023-02-16 03:44:42 +00:00
Philippe Tillet	8bca84ce3d	[OPTIMIZER] Bugfix in Combine.cpp ; Added `trans` support in Pipeline.cpp (#1174)	2023-02-14 13:36:44 -08:00
Yen-Chen Lin	1ea08be168	[TUTORIALS] Add description for 05-layer-norm.py (#1178) - Add text description and equations for the tutorial. - Improve the code readability by changing variable names to align them with the equation. The actual code logic is not changed. This is a follow-up of #510. Let me know if a preview HTML is helpful for the review, I can add a link to that too.	2023-02-13 08:47:35 +00:00
Philippe Tillet	408d1d7e87	[OPTIMIZER] Improved flash attention forward pass performance (#1075) - Fixed typo in instruction reordering pass - Minor additional optimizations for shared memory allocator - Optimized flash attention tutorial forward pass kernel	2023-01-19 06:46:01 +00:00
Philippe Tillet	259f4c5f7d	[OPTIMIZER] Added new optimization passes (#1055) This PR adds a couple of optimization passes that should substantially improve the performance of Triton on fused attention kernels: - DecomposeConversionsPass: This decomposes some instructions of the form `convert_layout` into - ReorderInstructions: this reorders instructions in a way that is more amenable to good code generation from `ptxas`.	2023-01-13 13:15:53 -08:00
Philippe Tillet	20100a7254	Merge `triton-mlir` branch - Complete rewrite of the backend from scratch (#1004) This PR merges the `triton-mlir` branch, in which we have been quietly rewriting the Triton backend from scratch to increase maintainability, stability and ultimately performance. Changes to the runtime are minimal, and this new version aims to remain backward-compatible with the previous commit. The legacy backend is now officially deprecated, but can still be accessed via the `legacy-backend` tag. Co-authored-by: Keren Zhou <kerenzhou@openai.com> Co-authored-by: Yan Chunwei <yanchunwei@outlook.com> Co-authored-by: goostavz <109190422+goostavz@users.noreply.github.com> Co-authored-by: Shintaro Iwasaki <siwasaki@fb.com> Co-authored-by: Yan Da <dyanab@connect.ust.hk> Co-authored-by: Jun Yang <yangjunpro@gmail.com> Co-authored-by: Ian Bearman <ianb@microsoft.com> Co-authored-by: Jason Ansel <jansel@jansel.net> Co-authored-by: Qingyi Liu <qingyil@nvidia.com> Co-authored-by: ben-zhang-609 <110140741+ben-zhang-609@users.noreply.github.com> Co-authored-by: Chenggang Zhao <lyricz@yeah.net> Co-authored-by: ben-zhang-609 <benzh609@gmail.com> Co-authored-by: dongdongl <dongdongl@nvidia.com>	2022-12-21 01:30:50 -08:00
Chenggang Zhao	f16138d447	[Frontend] Interface fixes for libdevice (#830) - Unifying several interfaces with different types to a single one, e.g. `fsub_ru` and `dsub_ru` -> `sub_ru`; - Minor bug fix: `fast_pow` is incorrectly classified into the `pow` interface, of which arguments are the same as `powf`; - Explicit interfaces for casting functions, e.g. decoupling `ll2float_ru` to `ll2float_ru` and `ull2float_ru`; - Removing interfaces that are not in NVIDIA's official documents, e.g. `fmaf_ieee_rn`, which is confusing together with `fmaf_rn`. Note that this PR for the master branch is different from #829, which is for the MLIR branch.	2022-11-01 10:51:58 -07:00
Chris	9a11a567ce	[DOCS] Fixed typos in 01-vector-add.py (#751)	2022-10-09 18:12:46 -07:00
Phil Tillet	b244db06da	[TUTORIALS] Attention tutorial fixup	2022-09-30 19:31:43 -07:00
Shintaro Iwasaki	c668d6596e	[DOCS] Fix spelling (#664) This PR applies minor spelling fix in comments and string literals to `master`. It shouldn't hurt anything.	2022-09-16 12:26:40 -07:00
Yunxing Dai	59a8e25f43	[DOCS] Fix typo (#650)	2022-09-14 12:17:05 -07:00
Phil Wang	7394d732ad	[DOCS] support for variable head dimensions in flash attention triton tutorial (#623)	2022-08-15 19:16:49 -07:00
Keren Zhou	af85f5fa46	[FRONTEND] Refresh cache when the source code of outlined functions is changed (#590)	2022-07-20 17:34:07 -07:00
Philippe Tillet	86cab58d89	[CI] Changed dev wheel date to UTC time to match CRON schedule (#587)	2022-07-18 14:54:13 -07:00
Phil Tillet	5b04331dd2	[TUTORIALS] Added more credits in fused attention tutorial	2022-07-13 23:48:58 -07:00
Keren Zhou	4912916c11	[FRONTEND] Added support for element-wise function defined in external LLVM bitcode (e.g., libdevice) (#562)	2022-07-13 15:52:21 -07:00
Phil Tillet	971f5782b4	[tutorials] Added flash attention credits in tutorial	2022-07-11 18:56:48 -07:00
Philippe Tillet	d5eb9bc230	[tutorial] Added bwd in fused attention example (#579) Doesn't work on V100	2022-07-11 15:43:46 -07:00
Natalia Gimelshein	1bbb2430d9	[TUTORIALS] adjust heuristics for dwdb kernel (#565)	2022-06-29 17:00:22 -07:00
Philippe Tillet	5b4c8f221e	[BACKEND] Compiler improvements (#557) This PR adds several optimization capabilities in the compiler backend: - Now using inline PTX for `tl.store`, making it possible to use things like evict_last - For A100, mma layout can be directly converted to shared memory - For A100, an additional "transpose" argument in `dot` allows tensors to be loaded once and used both row- and col- major. - Fixed liveness analysis; this was broken. - Now can load/store directly mma layout without converting. Useful for when tl.dot accumulator is initialized with DRAM data inside of an inner loop. - `tl.dot` can now take LHS inputs in registers when it comes from a previous `tl.dot` instruction. Useful for e.g. fused attention.	2022-06-27 11:49:19 -07:00
Philippe Tillet	751e325d2e	[TUTORIALS] Fixed typo	2022-06-05 13:33:21 -07:00
Philippe Tillet	801c8a4c92	[TUTORIALS] Fixed typo	2022-06-05 12:32:07 -07:00
Philippe Tillet	8876e53206	[BACKEND] Restored reduction bugfixes	2022-06-03 11:38:52 -07:00
Philippe Tillet	a60374a597	Revert "[BACKEND] Various bug fixes; making reductions faster (#533)". This is a more stable commit that produces bitwise-identical code to earlier versions. Using commits after this one may lead to slightly different numerics.	2022-06-03 11:36:06 -07:00
Philippe Tillet	3e7500dfe6	[BACKEND] Various bug fixes; making reductions faster (#533)	2022-05-31 17:14:44 -07:00
Philippe Tillet	0835a4fb05	[TUTORIALS] Removed #noformat in layer norm tutorial	2022-05-12 12:41:25 -07:00
Philippe Tillet	c736ba7c3e	[TUTORIALS] Fixed formatting	2022-05-12 12:31:23 -07:00
Philippe Tillet	cd30a99aa2	[TUTORIALS] fixed formatting	2022-05-12 12:28:22 -07:00
Philippe Tillet	d87435e536	[TUTORIALS] Layer norm tutorial now uses residency control (#510)	2022-05-05 19:53:54 -07:00
Philippe Tillet	5c7122004c	[TUTORIALS] Tutorial shouldn't expose `clock`. Just removed it.	2022-04-14 17:33:44 -07:00
Philippe Tillet	bace26143d	[TUTORIALS] Removed leftover print	2022-03-28 16:53:23 -07:00
Philippe Tillet	e0cc488055	[FRONTEND] Added `tl.clock` and `tl.globaltimer` (#485)	2022-03-28 16:15:43 -07:00
Philippe Tillet	7b48340ffd	[CI] Some fixes for the build (#451)	2022-02-06 19:11:33 -08:00
Philippe Tillet	2922dc141c	Merge branch 'master' into v2.0	2022-01-30 20:25:01 -08:00
Madeleine Thompson	efdabe6073	[STYLE] check python with flake8 (#424) I've been using this locally to find errors without running tests, and now that we're using autopep8, it passes with minimal suppressions. This is also what turned up the issues with the tutorials, which were fixed in #422.	2022-01-07 15:28:36 -08:00
Madeleine Thompson	a70acfec77	[STYLE] add isort and autopep8 config files and check on CI (#423) Also fix a few more style issues from the "aggressive" mode of autopep8.	2022-01-07 13:11:34 -08:00
Madeleine Thompson	9801aa7b56	[DOCS] fix tutorials for v2.0 (#422) - Fix meta-parameter usage on tutorials. - Install tutorial dependencies on CI. - Switch from `requirements-test.txt` to `extras_require` for test dependencies, and also use it for tutorial dependencies. - Make some performance tests deterministic.	2022-01-07 12:34:38 -08:00
Madeleine Thompson	8bf551ae7a	[STYLE] run autopep8 and isort (#421) Run: ``` isort ./python autopep8 -i --ignore E501,E701,E731 $(find ./python/ -name '*.py') ``` with an `.isort.cfg` and then clean up a few warts. This PR should be a no-op; the idea is that this is all boring whitespace changes, and any config file changes will be in a different change to make it easier to review.	2022-01-06 14:34:17 -08:00
Noah Ziems	3edc2633e9	[TUTORIALS] Fix 01-vector-add.py typo (#406)	2021-12-29 15:09:34 -08:00
Philippe Tillet	2acaa4d0dd	[LANG] Added support for constexpr (#361)	2021-10-30 00:32:58 -07:00
Philippe Tillet	90ded16c32	[DOCS] Added placeholder docstring for layernorm tutorial	2021-10-15 19:04:01 -07:00
Philippe Tillet	d4baad426d	[DOCS] Added layer norm example (#326)	2021-10-08 11:02:10 -07:00
Philippe Tillet	4163d32c49	[DOCS] Fixed leftover exit() in 01-vector-add tutorial	2021-09-10 15:52:26 -07:00
Philippe Tillet	ac10551d55	[PYTHON] Now providing triton.next_power_of_2 (#273)	2021-09-10 11:05:44 -07:00
Philippe Tillet	94c83d30ce	[GENERAL] Removed deprecated driver files and added basic compatibility with rocm (#268) - Removed driver module -- accelerator runtime is handled by pytorch - Added basic support for ROCM based on @micmelesse's PR -- now can execute empty kernel on AMD devices without any compile-time changes - Now only using PREFER_SHARED for kernels when the size of shared memory is greater than 49k. Otherwise there can be poor L1 performance for broadcast tensors	2021-09-09 00:04:28 -07:00
Szymon Sidor	8bedcce9be	[LANG] Added seeded random number generation - philox (#261)	2021-09-02 22:02:40 -07:00
Sasank Chilamkurthy	6aa5720d75	[DOCS] use numel for num_elements in elementwise tutorial (#228)	2021-08-19 19:35:12 -07:00

76 Commits