mirror of
https://github.com/ROCm/ROCm.git
synced 2026-04-05 03:01:17 -04:00
[FA fwd D=128] Reduce LDS usage in epilogue (#340)
* rebase onto improve_fwd_fa
* Fixed a leftover from rebase
* rebase onto improve_fa_fwd
* Reduce tuning space
* Disable bwd with D=128
* Add test for d=128
* Fix an issue with get_best_config when there is only one config
* Added better configs for d=128
* Fix typos

---------

Co-authored-by: Lixun Zhang <lixun.zhang@amd.com>
@@ -100,7 +100,7 @@ class Autotuner(KernelInterface):
 key_values.append(kwargs[name])
 key = tuple(key_values)

-return self.cache[key] if key in self.cache else Config({})
+return self.best_config


 def run(self, *args, **kwargs):
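
For illustration only, the behavior this hunk appears to address can be sketched in a few lines. This is a minimal, self-contained sketch and not the Triton autotuner: `TinyAutotuner`, its `tune` method, the stand-in `Config` class, the `BLOCK_M` parameter, and the `_old`/`_new` method names are all hypothetical. It assumes the changed line belongs to `get_best_config` (as the commit message suggests) and that with a single config no tuning runs, so the cache stays empty and a cache lookup falls through to an empty `Config({})`.

```python
class Config:
    """Hypothetical stand-in for a kernel launch config."""

    def __init__(self, kwargs):
        self.kwargs = dict(kwargs)

    def __repr__(self):
        return f"Config({self.kwargs})"


class TinyAutotuner:
    """Hypothetical, simplified autotuner; not the real Triton class."""

    def __init__(self, configs, key_names):
        self.configs = list(configs)
        self.key_names = list(key_names)
        self.cache = {}
        # Assumption: with exactly one config there is nothing to tune,
        # so that config is recorded as the best one up front.
        self.best_config = self.configs[0] if len(self.configs) == 1 else None

    def _key(self, **kwargs):
        # Build the cache key from the tuning-key arguments that were passed.
        return tuple(kwargs[name] for name in self.key_names if name in kwargs)

    def tune(self, **kwargs):
        # The cache is only populated when there is more than one config.
        if len(self.configs) > 1:
            # Placeholder for real benchmarking: pick the first config.
            self.best_config = self.configs[0]
            self.cache[self._key(**kwargs)] = self.best_config

    def get_best_config_old(self, **kwargs):
        # Mirrors the removed line: look up the cache, else an empty Config.
        key = self._key(**kwargs)
        return self.cache[key] if key in self.cache else Config({})

    def get_best_config_new(self, **kwargs):
        # Mirrors the added line: return the recorded best config.
        return self.best_config


single = TinyAutotuner([Config({"BLOCK_M": 128})], key_names=["D"])
single.tune(D=128)  # no-op here: only one config, so the cache stays empty
old = single.get_best_config_old(D=128)
new = single.get_best_config_new(D=128)
```

Under these assumptions, the cache-based lookup hands back an empty config for a single-config tuner, while returning `best_config` yields the one config that was supplied.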