Possible optimization on Windows with AVX-512 vs AVX2 #91478

MadProbe · 2023-09-01T19:46:55Z

Based on the register information in https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#callercallee-saved-registers, I had an idea: it might be faster if the JIT used the volatile XMM0-5 and XMM16-31 registers first, instead of allocating registers sequentially (from 0 to 31). With sequential allocation, once the first 6 registers are used up the JIT moves on to the non-volatile XMM6-15, which then have to be saved on the stack. Some additional profitability checks may also be needed so that this is done only when appropriate (for example, when there are no calls to other functions and only the XMM parts of XMM6-15 are used, without the upper YMM and ZMM parts).

tannergooding (Collaborator) · 2023-09-01T20:03:22Z

The register allocator already takes this into account. We independently track the callee-saved and callee-trash register sets and, within each set, order the registers by encoding cost. Thus xmm0-xmm5 are used first, and xmm16-xmm31 are taken once those are used up.
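As a rough illustration of that ordering (not the JIT's actual code), here is a minimal Python sketch. The register sets follow the Windows x64 convention cited in the question; the `encoding_cost` tiers and the `allocation_order` helper are hypothetical names and values chosen only for this example, and the real order is defined in `targetamd64.h`.

```python
# Illustrative sketch only: these are NOT the JIT's real data structures.
# Windows x64 ABI: xmm0-5 and xmm16-31 are volatile (callee-trash),
# xmm6-15 are non-volatile (callee-saved).
CALLEE_TRASH = [*range(0, 6), *range(16, 32)]
CALLEE_SAVED = list(range(6, 16))


def encoding_cost(reg):
    """Hypothetical cost tiers (an assumption made for this example)."""
    if reg < 8:
        return 0  # encodable without extra prefix bits
    if reg < 16:
        return 1  # needs the extended register bits
    return 2      # xmm16-31 are only reachable with EVEX encoding


def allocation_order():
    """Callee-trash registers first, then callee-saved; each set is
    ordered by encoding cost (sorted() is stable, so ties keep
    ascending register order)."""
    trash = sorted(CALLEE_TRASH, key=encoding_cost)
    saved = sorted(CALLEE_SAVED, key=encoding_cost)
    return [f"xmm{r}" for r in trash + saved]


order = allocation_order()
```

Under these assumptions the sketch hands out all volatile registers before touching any register that would need to be saved in the prologue, even though the EVEX-only registers cost more to encode.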

Register preferencing, ABI conventions for passing/returning values, and other factors can also influence the ultimate register selection.

The baseline list/order can be seen here: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/targetamd64.h#L277-L287 (Unix is in the ifdef just above that).

Note that this order was adjusted as part of adding AVX-512 support in .NET 8.
