Lixun Zhang
a819e48435
Refine test_correctness (#463)
- Check correctness of what is benchmarked
- Add capability to check col_a and col_b
- But only check col_a=False, col_b=True for now
- Only benchmark col_a=False, col_b=True for now
- Remove in='int8', out='int8' because its error was too large
2024-01-16 11:15:54 -06:00
Lixun Zhang
e231c41467
[TUTORIAL] Enable all types in gemm tutorial (#456)
* Enable all types in gemm tutorial
Co-authored-by: Shucai Xiao <shucai.xiao@amd.com>
2024-01-15 14:38:31 -06:00
Vinayak Gokhale
1fec965c06
Add autotuning for FA (#459)
2024-01-12 17:15:12 -06:00
Vinayak Gokhale
0248bdb29d
Minor edits to HBM bandwidth measurement kernel (#434)
* Change units to GiB/s from GB/s
* Run both with and without the bounds check
2023-12-21 06:14:31 -06:00
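The units change above swaps decimal gigabytes for binary gibibytes. A minimal sketch of the two conversions (the function names are illustrative, not taken from the kernel):

```python
def gb_per_s(nbytes, seconds):
    # Decimal gigabytes: 1 GB = 10**9 bytes.
    return nbytes / seconds / 1e9

def gib_per_s(nbytes, seconds):
    # Binary gibibytes: 1 GiB = 2**30 bytes, about 7.4% larger than 1 GB,
    # so the same measurement reads lower in GiB/s than in GB/s.
    return nbytes / seconds / 2**30
```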
Vinayak Gokhale
422d7096ce
Add kernel to check HBM BW (#431)
Add kernel to check HBM BW performance
2023-12-18 21:25:21 -06:00
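The idea behind a bandwidth-check kernel is simple: move a known number of bytes, time it, divide. A host-memory NumPy stand-in for that measurement (not the GPU kernel itself, which streams from HBM):

```python
import time
import numpy as np

def measure_copy_bandwidth(nbytes=1 << 26, repeats=5):
    """Time a buffer copy and report achieved bandwidth in GiB/s."""
    src = np.ones(nbytes, dtype=np.uint8)
    dst = np.empty_like(src)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        np.copyto(dst, src)
        best = min(best, time.perf_counter() - t0)
    # A copy reads and writes each byte once: 2 * nbytes moved in total.
    return 2 * nbytes / best / 2**30
```

Taking the best of several repeats filters out one-off timer and cache noise.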
Vinayak Gokhale
b7a412d82a
Add support for ALiBi-style attention bias (#417)
Add support for matrix and vector bias to FA.
2023-12-15 16:28:37 -06:00
Vinayak Gokhale
1d6b919897
Bugfix: wrong boundary condition in the qk GEMM
2023-12-04 10:11:41 -06:00
Vinayak Gokhale
f6969f4bb3
Correct loop lo to be a multiple of BLOCK_N
2023-12-04 10:11:41 -06:00
Vinayak Gokhale
0ef865508c
Update description
2023-12-04 10:11:41 -06:00
Vinayak Gokhale
dc62569e57
Remove slicing of out when saving for bwd
2023-12-04 10:11:41 -06:00
Vinayak Gokhale
e0a4d97569
Mask vs. pad for non-power-of-2 sequence lengths
Padding requires an extra memory allocation, which is slower;
masking yields better performance.
2023-12-04 10:11:41 -06:00
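The trade-off in that commit can be sketched on the host: padding allocates and copies a rounded-up buffer, while masking just clamps the last block to the true length. A NumPy illustration (Triton's actual masked tl.load works per-element on the GPU; the function names here are illustrative):

```python
import numpy as np

def scores_padded(q, k, block_n=16):
    # Pad the key sequence up to a multiple of BLOCK_N: costs an extra
    # allocation plus a copy before the matmul.
    n = k.shape[0]
    n_pad = -(-n // block_n) * block_n  # ceil(n / block_n) * block_n
    k_pad = np.zeros((n_pad, k.shape[1]), dtype=k.dtype)
    k_pad[:n] = k
    return (q @ k_pad.T)[:, :n]

def scores_masked(q, k, block_n=16):
    # Walk the keys in BLOCK_N chunks and clamp the final block to the true
    # length -- the analogue of a masked load, with no extra allocation.
    n = k.shape[0]
    out = np.empty((q.shape[0], n), dtype=np.result_type(q, k))
    for lo in range(0, n, block_n):
        hi = min(lo + block_n, n)
        out[:, lo:hi] = q @ k[lo:hi].T
    return out
```

Both paths produce identical scores; only the masked one avoids the padded allocation.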
Vinayak Gokhale
d5028079b7
Add FA support for non-pow2 seqlen
2023-12-04 10:11:41 -06:00
Alexander Efimov
5b06b168aa
[Tutorial] Fix post-IFU issues with FA (#398)
* [Tutorial] Fix post-IFU issues with FA
* Remove redundant kernels in 06-fused-attention.py
* Added README for scripts in perf-kernels dir
* Fix bwd kernel
---------
Co-authored-by: Lixun Zhang <lixun.zhang@amd.com>
2023-11-14 10:46:45 -06:00
Lixun Zhang
d4eda83b33
Benchmark FA on 2 GCDs (#393)
2023-11-08 12:42:54 -06:00
Lixun Zhang
1af893d8a2
[FRONTEND] Add input dtypes to autotuning key (#2534) (#374)
* [FRONTEND] Add input dtypes to autotuning key (#2534)
* Fix conflict in 06-fused-attention
* Fix get_best_config in FA-transV.py
* Fix leftover get_best_config()
---------
Co-authored-by: Adnan Akhundov <adnan.akhundov@gmail.com>
2023-11-07 19:36:57 -06:00
oplavsic
c65f1e6211
Add OptimizeEpilogue pass (#346)
* optimize_epilogue
* Add config
* Remove licenses
* Comment out Hopper-specific parameters when printing out configs
* Add benchmark parameters from flash-attention repo
* Add Z and H in the key of autotuner
---------
Co-authored-by: Lixun Zhang <lixun.zhang@amd.com>
2023-11-03 16:46:24 -05:00
Lixun Zhang
a66270165a
Move fa-transV to the new perf-kernels dir (#387)
2023-11-03 00:09:48 -05:00