[FA fwd D=128] Reduce LDS usage in epilogue (#340)

* rebase onto improve_fwd_fa

* Fixed a leftover from rebase

* rebase onto improve_fa_fwd

* Reduce tuning space

* Disable bwd with D=128

* Add test for d=128

* Fix an issue with get_best_config when there is only one config

* Added better configs for d=128

* Fix typos

---------

Co-authored-by: Lixun Zhang <lixun.zhang@amd.com>
oplavsic
2023-10-25 19:10:34 +02:00
committed by GitHub
parent e74bdb1581
commit 715a589ce3
3 changed files with 182 additions and 42 deletions

@@ -100,7 +100,7 @@ class Autotuner(KernelInterface):
                 key_values.append(kwargs[name])
         key = tuple(key_values)
-        return self.cache[key] if key in self.cache else Config({})
+        return self.best_config
     def run(self, *args, **kwargs):
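The hunk above corresponds to the "Fix an issue with get_best_config when there is only one config" item in the commit message. The following is a minimal sketch of the likely failure mode, not the actual Triton code. `ToyAutotuner`, `get_best_config_old`, `get_best_config_new`, and the `BLOCK_M` parameter are hypothetical names introduced for illustration, and the sketch assumes that `run` skips benchmarking (and therefore never fills `cache`) when only one config is supplied.

```python
class Config:
    """Hypothetical stand-in for a tuning config: just a dict of kwargs."""

    def __init__(self, kwargs):
        self.kwargs = kwargs


class ToyAutotuner:
    """Toy model of an autotuner; an assumption, not the real Autotuner class."""

    def __init__(self, configs, key):
        self.configs = configs
        self.key = key
        self.cache = {}
        self.best_config = None

    def _make_key(self, kwargs):
        # Build the cache key from the arguments named in self.key.
        return tuple(kwargs[name] for name in self.key if name in kwargs)

    def run(self, **kwargs):
        if len(self.configs) > 1:
            key = self._make_key(kwargs)
            if key not in self.cache:
                # Stand-in for benchmarking: simply pick the first config.
                self.cache[key] = self.configs[0]
            self.best_config = self.cache[key]
        else:
            # Assumed behavior: with a single config there is nothing to tune,
            # so the cache is left untouched.
            self.best_config = self.configs[0]
        return self.best_config

    def get_best_config_old(self, **kwargs):
        # Pre-fix lookup: falls back to an empty Config on a cache miss.
        key = self._make_key(kwargs)
        return self.cache[key] if key in self.cache else Config({})

    def get_best_config_new(self, **kwargs):
        # Post-fix lookup: returns whatever run() last selected.
        return self.best_config


tuner = ToyAutotuner([Config({"BLOCK_M": 128})], key=["N"])
tuner.run(N=1024)
old = tuner.get_best_config_old(N=1024)
new = tuner.get_best_config_new(N=1024)
```

Under these assumptions, `old.kwargs` is empty because the cache was never populated, while `new.kwargs` carries the single supplied config.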