This reproduces on zen4 after #121544.

After `InlineHLFIRAssign` we now have this in the `chozdt_` routine:

The do-loop is the result of inlining of:

After LLVM inlining and other optimizations, we have the following minloc loop:

The select operations form a cyclic dependency, and they are later transformed into cmov/minss instructions. Before my change, the SimplifyCFG pass could not produce the selects, presumably because the result of the integer select was stored into the temporary `<1 x i32>` array.

The cyclic dependency introduced by the integer cmov seems to limit the performance of the loop. The loop would be better off with a compare and jump.

Performance is restored with `-disable-select-optimize=false`; unfortunately, the select optimization is disabled by default on X86.

Possible solutions:

- Try to enable the select optimization for X86, but I guess it is disabled for a reason.
- We can try to trick SimplifyCFG into not converting the compare-jump into selects by setting the following probability on the jump instruction:

This is just a trick though, and the right solution should be to allow the select optimization to use its heuristics.
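For readers without the listings at hand, the two loop shapes discussed above can be sketched in C++. This is only an illustration under stated assumptions, not the Fortran source or the LLVM IR from this issue: the names `minloc_select`, `minloc_branch`, and `RARELY_TAKEN` are hypothetical, the indices are 0-based (unlike Fortran `MINLOC`), and the 0.99 probability is an arbitrary value rather than the one proposed above. `__builtin_expect_with_probability` is a GCC/Clang builtin; whether it actually keeps the compare-and-jump form depends on the compiler version and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: attach a "rarely taken" probability to a condition
// when the builtin is available, otherwise fall back to the plain condition.
#if defined(__has_builtin)
#  if __has_builtin(__builtin_expect_with_probability)
#    define RARELY_TAKEN(x) __builtin_expect_with_probability((x), 0, 0.99)
#  endif
#endif
#ifndef RARELY_TAKEN
#  define RARELY_TAKEN(x) (x)
#endif

// Select-shaped minloc: the running minimum and its index are both updated
// through conditional expressions, so every iteration depends on the values
// produced by the previous one. A compiler may lower these to cmov/minss.
std::size_t minloc_select(const std::vector<float>& a) {
    assert(!a.empty());
    float best = a[0];
    std::size_t idx = 0;
    for (std::size_t i = 1; i < a.size(); ++i) {
        const bool less = a[i] < best;
        idx = less ? i : idx;
        best = less ? a[i] : best;
    }
    return idx;  // 0-based index of the first minimum
}

// Branch-shaped minloc: the update sits behind a conditional branch that is
// annotated as rarely taken, which may discourage conversion into selects.
std::size_t minloc_branch(const std::vector<float>& a) {
    assert(!a.empty());
    float best = a[0];
    std::size_t idx = 0;
    for (std::size_t i = 1; i < a.size(); ++i) {
        if (RARELY_TAKEN(a[i] < best)) {
            best = a[i];
            idx = i;
        }
    }
    return idx;  // 0-based index of the first minimum
}
```

Both functions return the same index for any non-empty input; only the control-flow shape differs, which is what makes the pair useful for comparing the generated assembly.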