Compare commits

...

2 Commits

Author SHA1 Message Date
Chi_Liu
dedb995af3 Add decompose of aten._scaled_dot_product_flash_attention_for_cpu (#2064)
New decomposition from: https://github.com/pytorch/pytorch/pull/117390
Required by the chatglm model: https://github.com/llvm/torch-mlir/issues/2730
2024-01-15 20:03:17 -08:00
AmosLewis
c199ac78eb Add decompose of aten._scaled_dot_product_flash_attention.default
The new decomposition was recently added in PyTorch.
Here is the PyTorch PR: https://github.com/pytorch/pytorch/pull/117390
This decomposition is required for lowering the chatglm model in torch-mlir.
Here is the issue: https://github.com/llvm/torch-mlir/issues/2730
2024-01-16 03:03:14 +00:00


@@ -686,6 +686,7 @@ def import_with_fx(
         torch.ops.aten._scaled_dot_product_flash_attention.default,
         torch.ops.aten.index_add,
         torch.ops.aten.index_add_,
+        torch.ops.aten._scaled_dot_product_flash_attention_for_cpu,
     ]
     if precision in ["int4", "int8"] and not is_gptq:
         from brevitas_examples.llm.llm_quant.export import (
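
For context, below is a minimal sketch of how a list like the one above is typically consumed, assuming a PyTorch build that includes the decomposition from https://github.com/pytorch/pytorch/pull/117390. The exact wiring inside import_with_fx may differ; the attention helper, the ops_to_decompose name, and the tensor shapes are illustrative assumptions, not code from this repository. The idea: torch._decomp.get_decompositions builds a decomposition table, and make_fx applies it while tracing, so the fused CPU flash-attention op is rewritten into primitive aten ops that torch-mlir can lower.

import torch
from torch._decomp import get_decompositions
from torch.fx.experimental.proxy_tensor import make_fx

# Ops to decompose while tracing; the CPU flash-attention variant is the
# entry added by this commit.
ops_to_decompose = [
    torch.ops.aten._scaled_dot_product_flash_attention.default,
    torch.ops.aten._scaled_dot_product_flash_attention_for_cpu,
]

def attention(q, k, v):
    # Call the fused CPU op directly; it returns (output, logsumexp).
    return torch.ops.aten._scaled_dot_product_flash_attention_for_cpu(q, k, v)[0]

# Hypothetical shapes: (batch, heads, seq_len, head_dim).
q = k = v = torch.randn(1, 2, 16, 8)
traced = make_fx(
    attention,
    decomposition_table=get_decompositions(ops_to_decompose),
)(q, k, v)
print(traced.graph)  # fused op replaced by matmul/softmax-style aten ops

Printing traced.graph should show the fused kernel replaced by simpler aten ops, which is what allows the chatglm model to be lowered through torch-mlir.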