This generates identical PTX for floating-point types, but for integer types the resulting PTX is much better. For example, `tl.abs` for int16 currently generates

```mlir
cvt.s32.s16 %r1, %rs2;
neg.s16 %rs4, %rs2;
setp.lt.s32 %p4, %r1, 0;
selp.b16 %rs3, %rs4, %rs2, %p4;
```

After this change, it becomes a single `abs.s16` instruction.

This also improves LLVM's ability to optimize floats. For example, `abs(t) * abs(t)` is now optimized to `t * t`, which didn't happen before.

---------

Co-authored-by: Keren Zhou <kerenzhou@openai.com>
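
For reviewers who want to sanity-check the two claims above without a GPU, here is a rough pure-Python sketch. It only models the arithmetic; it does not run Triton or emit PTX. The helper names (`to_int16`, `abs_via_select`, `abs_direct`) are hypothetical and exist only for this illustration. The most negative int16 value is deliberately left out, since its behavior is not discussed in this change.

```python
def to_int16(value: int) -> int:
    """Wrap an integer into the signed 16-bit range (two's complement)."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def abs_via_select(x: int) -> int:
    """Model of the old four-instruction sequence (hypothetical helper)."""
    widened = x                  # cvt.s32.s16: sign-extend to 32 bits
    negated = to_int16(-x)       # neg.s16
    is_negative = widened < 0    # setp.lt.s32
    return negated if is_negative else x  # selp.b16


def abs_direct(x: int) -> int:
    """Stand-in for the single abs.s16 instruction (hypothetical helper)."""
    return abs(x)


# Both forms agree on every int16 value except the excluded minimum.
assert all(abs_via_select(x) == abs_direct(x) for x in range(-32767, 32768))

# The float identity behind the abs(t) * abs(t) -> t * t rewrite,
# checked on a handful of sample values chosen for this sketch.
samples = [-3.5, -0.0, 0.0, 1.25, float("inf"), float("-inf")]
assert all(abs(t) * abs(t) == t * t for t in samples)
```

The select-based model and the direct form compute the same result on this range, which is why collapsing the sequence into one instruction is a pure code-quality win rather than a behavior change.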