github/ROCm - ROCm - AtHeartEngineering

mirror of https://github.com/ROCm/ROCm.git synced 2026-04-05 03:01:17 -04:00

Author	SHA1	Message	Date
oplavsic	760ac8441a	Dot slicing pass (#440 ) * First commit * Implement DotSlicing pass. * small fixes * Support chained dot in DotSlicingPass (second GEMM in FA) * Add lit test for FA dot slicing --------- Co-authored-by: Ognjen Plavsic <ognjen.plavsic@luxoft.com> Co-authored-by: Ognjen <oplavsic@luxoft.com>	2024-01-16 14:25:10 -06:00
Lixun Zhang	a819e48435	Refine test_correctness (#463 ) - Check correctness of what is benchmarked - Add capability to check col_a and col_b - But only check col_a=False, col_b=True for now - Only benchmark col_a=False, col_b=True for now - Remove in='int8', out='int8' due to too large error	2024-01-16 11:15:54 -06:00
Shucai Xiao	1223f6077a	support type conversion between fp8 formats and bf16/fp32 with HW instructions on MI300 (#414 ) * add type conversion between fp8 and bf16/fp32..	2024-01-15 17:14:49 -06:00
Lixun Zhang	e231c41467	[TUTORIAL] Enable all types in gemm tutorial (#456 ) * Enable all types in gemm tutorial Co-authored-by: Shucai Xiao <shucai.xiao@amd.com>	2024-01-15 14:38:31 -06:00
Vinayak Gokhale	1fec965c06	Add autotuning for FA (#459 )	2024-01-12 17:15:12 -06:00
Lixun Zhang	2e217c5a5c	[Backend] Refactor sharedToDotOperandMFMA lowering (#439 ) * Remove unnecessary xor computations for k-major swizzled tensors * Support mfma16 and mfma4 in the fast path * Choose warpsPerCTA according to nonKDim * Set maxPhase=4 for mfma4 * Fix tests For now, we do not disable swizzling for k-major tensors * Remove fastPathComputeOffsetsTy1 * Enable k-major + disabled swizzling in the normal path	2024-01-12 12:50:18 -06:00
Vinayak Gokhale	c2766bbd5f	Merge changes from upstream FA bwd kernel (#444 ) * Add optimized FA bwd from upstream * Add autotuning * Change loads and stores to use block ptrs * Cleanup	2024-01-05 15:12:05 -06:00
oplavsic	bcea3051af	Add support for MFMA layout to view_slice op (#442 ) Co-authored-by: Ognjen <oplavsic@luxoft.com>	2024-01-03 12:13:36 -06:00
oplavsic	6a520566a3	Add view_slice ttgir instruction (#427 ) * Add view_slice op in ttgir --------- Co-authored-by: Ognjen Plavsic <ognjen.plavsic@luxoft.com> Co-authored-by: Ognjen <oplavsic@luxoft.com> Co-authored-by: Lixun Zhang <lixun.zhang@amd.com>	2024-01-02 15:40:11 -06:00
Alexander Efimov	98589ac013	[MFMA] Remove CTA related code from layout (#429 ) This PR removes CTALayout attribute from MFMA layout, because it is NV specific.	2023-12-27 18:01:28 +01:00
Jack Taylor	1e2fd0dd1a	Update hip_backend to use libhsa-runtime for arch info, (#411 ) brings in path changes for pytorch triton wheels Co-authored-by: jayfurmanek <Jason.Furmanek@amd.com>	2023-12-21 15:40:57 +00:00
Vinayak Gokhale	0248bdb29d	Minor edits to HBM bandwidth measurement kernel (#434 ) * Change units to GiB/s from GB/s * Run both with and w/o bounds check	2023-12-21 06:14:31 -06:00
jayfurmanek	16281f02f4	[ROCM] drop GIL for launch, and set value=false upon pointer error (#426 )	2023-12-19 08:34:51 -06:00
Vinayak Gokhale	422d7096ce	Add kernel to check HBM BW (#431 ) Add kernel to check HBM BW performance	2023-12-18 21:25:21 -06:00
joviliast	af15da2f84	Support WMMA layout in TritonAMDGPUAccelerateMatmulPass -Introduce WmmaEncodingAttr for WMMA output -Introduce BlockedToWMMA rewrite pattern in TritonAMDGPUAccelerateMatmulPass -Provide a flag to check if wmma instructions are supported by target Signed-off-by: joviliast <iveselov.nn@gmail.com>	2023-12-18 09:11:20 -06:00
Vinayak Gokhale	b7a412d82a	Add support for ALiBi-style attention bias (#417 ) Add support for matrix and vector bias to FA.	2023-12-15 16:28:37 -06:00
jayfurmanek	29847e9bb1	Merge pull request #410 from ROCmSoftwarePlatform/ifu-231117 Ifu 231117	2023-12-15 09:09:40 -06:00
Shucai Xiao	521f425fbf	add bitcode for gfx941 and gfx942 (#403 ) Co-authored-by: Aleksandr Efimov <130555951+alefimov-amd@users.noreply.github.com>	2023-12-14 08:19:23 -06:00
Michael Melesse	26c3f99073	ROCM IFU: disable test_reduce_layouts	2023-12-13 17:12:39 -06:00
Michael Melesse	7a1f54645e	ROCM IFU: remove old tests	2023-12-13 15:30:55 +00:00
Michael Melesse	c7b62d5ec5	ROCM IFU: remove test_ext_elemwise	2023-12-12 22:58:56 -06:00
Michael Melesse	6efc013e46	ROCM IFU: fix AtomicCASOpConversion segfault	2023-12-12 17:40:31 -06:00
jayfurmanek	a42ac260aa	Merge branch 'triton-mlir' into ifu-231117	2023-12-12 14:24:11 -06:00
Jason Furmanek	160dfe838e	ROCM IFU: Fix print and assert	2023-12-12 19:30:01 +00:00
Alexander Efimov	605a90c58e	[MFMA] Support tile size 4x4 version 1 (#413 ) This PR enables 4x4 tile size in MFMA based dot operations. Supported tiled dot is (4x64) x (64x4) -> (4x4) in MFMA layout. However, the actual dot operation should have at least 64 output elements; this is a limitation of other layouts appearing during result processing (i.e. blocked layout cannot handle tensors smaller than wavesize). For example, the following dots are supported: (4x64) x (64x16) -> (4x16), (16x64) x (64x4) -> (16x4) or (8x64) x (64x8) -> (8x8). The following dots are not supported: (4x128) x (128x4) -> (4x4), (4x64) x (64x8) -> (4x8). This is a first version of dot using mfma 4x4 instructions, with redundancy and reductions.	2023-12-12 18:23:55 +01:00
Michael Melesse	64a0924381	ROCM IFU: remove ref to test_elementwise	2023-12-07 13:31:59 -06:00
Vinayak Gokhale	1d6b919897	Bugfix: Wrong boundary condition on qk GEMM	2023-12-04 10:11:41 -06:00
Vinayak Gokhale	f6969f4bb3	Correct that loop lo is multiple BLOCK_N	2023-12-04 10:11:41 -06:00
Vinayak Gokhale	0ef865508c	Update description	2023-12-04 10:11:41 -06:00
Vinayak Gokhale	dc62569e57	Remove slicing for out in save for bwd	2023-12-04 10:11:41 -06:00
Vinayak Gokhale	e0a4d97569	Mask vs pad for non power of 2 sequence lengths Padding results in memory allocation which is slower. Masking results in better performance.	2023-12-04 10:11:41 -06:00
Vinayak Gokhale	d5028079b7	Add FA support for non pow2 seqlen	2023-12-04 10:11:41 -06:00
Jason Furmanek	64f559771f	ROCM IFU: Fix test_core_amd.py::test_reduce_layouts	2023-11-28 04:02:48 +00:00
Jason Furmanek	f5f6b3c0a3	ROCM IFU: Add get_version_key for ROCM backend	2023-11-28 00:11:44 +00:00
Jason Furmanek	71547e4fdb	ROCM IFU: Fixes for kwargs	2023-11-28 00:09:46 +00:00
jayfurmanek	99aa1f4f75	Merge branch 'triton-mlir' into ifu-231117	2023-11-27 07:44:04 -06:00
Jason Furmanek	968a35fbf0	Fix merge conflict error	2023-11-27 13:33:23 +00:00
Shucai Xiao	d9219e0eba	use hw for fp8 type conversion (#386 ) * use hardware instruction for type conversion between fp8 and fp32 * move gpu_matrix_core_version from semantics.py to hip_backend.py --------- Co-authored-by: Aleksandr Efimov <efimov.alexander@gmail.com>	2023-11-24 10:26:40 -06:00
Jason Furmanek	a08dafe7fe	Initial commit to resolve merge conflicts	2023-11-20 22:41:03 +00:00
Jason Furmanek	5c87f363e4	Merge commit 'cb3d79a185e40c9d8a579bea07747a8a8d157d52' into ifu-231117 Conflicts: lib/Conversion/TritonGPUToLLVM/ElementwiseOpToLLVM.cpp lib/Conversion/TritonGPUToLLVM/TritonGPUToLLVM.cpp lib/Dialect/TritonGPU/IR/Dialect.cpp python/setup.py python/test/unit/language/assert_helper.py python/test/unit/operators/test_flash_attention.py python/test/unit/runtime/test_subproc.py python/triton/compiler/compiler.py python/triton/language/semantic.py python/triton/runtime/autotuner.py python/triton/runtime/jit.py python/tutorials/03-matrix-multiplication.py python/tutorials/05-layer-norm.py python/tutorials/06-fused-attention.py python/tutorials/11-grouped-gemm.py test/Conversion/tritongpu_to_llvm.mlir	2023-11-17 20:42:12 +00:00
Alexander Efimov	dfb76540b4	[Tutorial] Fix post IFU issues with FA (#398 ) * [Tutorial] Fix post IFU issues with FA * Remove redundant kernels in 06-fused-attention.py * Added README for scripts in perf-kernels dir * Fix bwd kernel --------- Co-authored-by: Lixun Zhang <lixun.zhang@amd.com>	2023-11-17 01:28:49 +00:00
Alexander Efimov	096def0c9b	[Test] Disable mma layout for amd hardware (#384 ) Disable mma layout testing by looking at is_hip instead of wave size. This fixes tests on Navi GPUs with wave size == 32.	2023-11-17 01:28:49 +00:00
Lixun Zhang	181bdbd410	Benchmark FA on 2 GCDs (#393 )	2023-11-17 01:28:49 +00:00
Jason Furmanek	44b155f41b	ROCM IFU: Resolve merge conflicts in tutorial 06 Resolve merge conflicts in tutorial 06 - 2	2023-11-17 01:28:40 +00:00
Jason Furmanek	484852876e	Resolve merge conflicts; AMD adjustments for new LLVM version	2023-11-09 19:00:49 +00:00
Jason Furmanek	977d5aa267	Merge commit '721897fcc4f942aa97d2e9ba3787a5e213758177' into ifu-231108 Conflicts: bin/triton-translate.cpp lib/Conversion/TritonGPUToLLVM/ElementwiseOpToLLVM.cpp lib/Dialect/TritonGPU/Transforms/RemoveLayoutConversions.cpp python/triton/compiler/compiler.py python/triton/runtime/jit.py python/tutorials/06-fused-attention.py test/Conversion/tritongpu_to_llvm.mlir	2023-11-08 18:51:23 +00:00
Lixun Zhang	1af893d8a2	[FRONTEND] Add input dtypes to autotuning key (#2534 ) (#374 ) * [FRONTEND] Add input dtypes to autotuning key (#2534) * Fix conflict in 06-fused-attention * Fix get_best_config in FA-transV.py * Fix leftover get_best_config() --------- Co-authored-by: Adnan Akhundov <adnan.akhundov@gmail.com>	2023-11-07 19:36:57 -06:00
Jason Furmanek	85216ea5c5	ROCM IFU: Resolve conflicts in FA tutorial	2023-11-07 04:29:45 +00:00
Alexander Efimov	aefc94bd25	ROCM IFU: fix test_dot_mfma_vector_load test fix for previous commit	2023-11-07 04:29:45 +00:00
Jason Furmanek	39e8901d7a	ROCM IFU: Resolve merge conflicts in RemoveLayoutConversions.cpp fix merge error fix dot fix make_range additional fix	2023-11-07 04:29:38 +00:00
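
Note on commit 605a90c58e ([MFMA] Support tile size 4x4 version 1): the shape rule in that message can be read as "the dot result must hold at least one wavefront's worth of elements". The snippet below is only a sketch of that reading, not code from the repository. The function name `mfma4_dot_supported` and the constant `WAVE_SIZE = 64` are hypothetical names introduced here for illustration, and the check covers only the output-size condition stated in the message, not any other constraint the real pass may apply.

```python
# Illustrative sketch only; not taken from the Triton/ROCm sources.
# Assumption: the wavefront size is 64, matching the "at least 64 output
# elements" wording in the commit message.
WAVE_SIZE = 64


def mfma4_dot_supported(m: int, k: int, n: int) -> bool:
    """Return True if an (m x k) x (k x n) dot yields >= WAVE_SIZE outputs.

    k is accepted for readability of the call sites but does not affect
    the output-size rule checked here.
    """
    return m * n >= WAVE_SIZE


# Shapes quoted in the commit message, as (m, k, n).
examples = [
    (4, 64, 16),   # listed as supported
    (16, 64, 4),   # listed as supported
    (8, 64, 8),    # listed as supported
    (4, 128, 4),   # listed as not supported (16 outputs)
    (4, 64, 8),    # listed as not supported (32 outputs)
]

for m, k, n in examples:
    print(f"({m}x{k}) x ({k}x{n}) -> ({m}x{n}):", mfma4_dot_supported(m, k, n))
```

Under this reading, the three shapes the message lists as supported each produce exactly 64 outputs, while the two unsupported ones produce 16 and 32.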


1440 Commits