Files
ROCm/python/triton/runtime
oplavsic 715a589ce3 [FA fwd D=128] Reduce LDS usage in epilogue (#340)
* rebase onto improve_fwd_fa

* Fixed a leftover from rebase

* rebase onto improve_fa_fwd

* Reduce tuning space

* Disable bwd with D=128

* Add test for d=128

* Fix an issue with get_best_config when there is only one config

* Added better configs for d=128

* Fix typos

---------

Co-authored-by: Lixun Zhang <lixun.zhang@amd.com>
2023-10-25 12:10:34 -05:00
..
2023-04-12 23:50:44 -07:00
2023-08-29 21:07:16 -04:00
2023-10-18 16:57:34 +02:00