New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Winch: implement fpu to int conversions for aarch64 #9889

Open

MarinPostma wants to merge 7 commits into bytecodealliance:main from MarinPostma:winc-fpu-truncate

Contributor

MarinPostma commented Dec 21, 2024

This PR implements FPU to int conversions for winch, with the unsigned_truncate and signed_truncate methods.

MarinPostma requested review from a team as code owners

December 21, 2024 16:35

MarinPostma requested review from cfallin and alexcrichton and removed request for a team

December 21, 2024 16:35

MarinPostma force-pushed the winc-fpu-truncate branch from 2d48a96 to 9392766 Compare

December 21, 2024 16:37

Contributor Author

MarinPostma commented Dec 21, 2024

one more in the #8321 series

github-actions bot added the winch label

github-actions bot commented Dec 21, 2024

Subscribe to Label Action

cc @saulecabrera

This issue or pull request has been labeled: "winch"

Thus the following users have been cc'd because of the following labels:

saulecabrera: winch

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

MarinPostma changed the title ~~Winch: implement fpu to int conversions~~ Winch: implement fpu to int conversions for aarch64

Member

saulecabrera commented Jan 1, 2025

I can help reviewing this one.

saulecabrera requested review from saulecabrera and removed request for cfallin and alexcrichton

January 1, 2025 17:47

saulecabrera reviewed

View reviewed changes

Member

saulecabrera left a comment

Looking good. Left some comments inline.

As a general comment, the emit_* naming convention is not standard across the assembler implementations (as noted and fixed in #9918), so I´d suggest dropping the emit_ prefix in the assembler methods introduced in this PR.

winch/codegen/src/isa/aarch64/asm.rs Outdated

+                  }
+                  /// Emit the `fcmp` instruction with `rn` and `rm` as source registers.
+                  fn emit_fpu_cmp(&mut self, rn: Reg, rm: Reg, size: OperandSize) {

Member

saulecabrera Jan 2, 2025

Here we don´t need to duplicate this method, fcmp already exists.

winch/codegen/src/isa/aarch64/asm.rs Outdated

Comment on lines 1140 to 1141

		/// Load the max value for an integer of size out_size, as a floating-point
		/// of size `in-size`, into register `rd`.

Member

saulecabrera Jan 2, 2025

Suggested change

      
                /// Load the max value for an integer of size out_size, as a floating-point
          
                /// of size `in-size`, into register `rd`.
          
                /// Load the max value for an integer of size out_size, as a floating-point
          
                /// of size `in_size`, into register `rd`.

winch/codegen/src/isa/aarch64/asm.rs Outdated

+                      use OperandSize::*;
+                      match in_size {
+                          OperandSize::S32 => {

Member

saulecabrera Jan 2, 2025

Nit: I believe that here and in the next match arm below you can omit the OperandSize:: prefix given that you're already importing it above (e.g., use OperandSize::*)

winch/codegen/src/isa/aarch64/asm.rs Outdated

+                      use OperandSize::*;
+                      match in_size {
+                          OperandSize::S32 => {

Member

saulecabrera Jan 2, 2025

Similar comment here, I think it should be possible to omit the OperandSize prefix in both match arms.

winch/codegen/src/isa/aarch64/asm.rs Outdated

+                          // - check bounds
+                          self.emit_check_nan(src, src_size);
+                          self.emit_min_fp_value(signed, src_size, dst_size, Writable::from_reg(tmp_reg));

Member

saulecabrera Jan 2, 2025

If we're writing to tmp_reg, I think this should be defined in the signature, similar to dst: tmp_reg: Writable<Reg>

winch/codegen/src/isa/aarch64/masm.rs Outdated

+                          src_size,
+                          dst_size,
+                          kind,
+                          // TODO: strangely, this register is provided in unsigned_truncate

Member

saulecabrera Jan 2, 2025

I´d suggest avoiding passing a placeholder register here. The temp register that is passed to unsigned_truncate is an allocatable register, therefore there's very little chance of it getting unintentionally clobbered. Instead having a single implementation for both cases, I´d suggest breaking down the implementation in the assembler into two methods, one for each case (signed/unsigned), then in each implementation you can decide to use the scratch register, immediately before emission, without the risk of unintentionally clobbering it. This is similar to how the x64 assembler implementation works.

Contributor Author

MarinPostma Jan 3, 2025

To make sure I understand, similarly to how x86 implementation, you would like to have two methods, fpu_to_sint and fpu_to_uint in asm, and also, those methods should be passed a temporary register in masm, rather than using the provided one in the case of unsigned_truncate?

I have a few questions/remarks:

what is the purpose of the temp reg in the unsigned variant? Why must this registry be allocated rather than used as a temp register? Is the caller expecting to find something in there? It is not explicitly passed as a writable register, but the x64 implementation implicitly converts it to one.
While splitting the convert_ methods for x64 makes sense to me since they correspond to a single instruction, we must emit a bunch of checks that are aware of the sign in the case of aarch64. Do you suggest duplicating all the checks in signed and unsigned variants? Or do we have two methods in ASM that call the current implementation with the signed argument set accordingly?
I'm sorry if that sounds very ignorant, but why do we have to worry about clobbering a tmp register anyway? Can't it, by definition, be used for any purpose, and one should not have too strong an assumption about what it may contain, especially when handing it over to a callee?
I'm surely missing something, but it seems to me that splitting the convert method into a signed and unsigned variant, and the use of tmp regs are orthogonal things, so I want to make sure I understand what you want correctly :)

Member

saulecabrera Jan 4, 2025 •

edited

Loading

The purpose of the allocatable temporary register is to provide an extra
register to aid in the emission of the instruction sequence needed for the
unsigned case; in x64, 4 registers are required for the instruction sequence:
1 for the destination and 3 temporary registers. Since in the case of x64 the
instruction sequence is completely handled by Cranelift, there's no risk of
unintentionally clobbering any of the registers passed to it, the unintentional
clobbering risk is particularly important in the case of scratch registers in Winch: even
though they can be used for any purpose, one important detail about them is that
they are not tracked by Winch's regalloc therefore they must be used sparingly
and with extreme caution: a misuse could result in unintentional clobbering and
therefore subtle bugs. Ideally, the live range of the scratch registers should be
as short as possible to avoid potential bugs.

In this particular case the reason why I am suggesting improving the usage of
the scratch register is to minimize the risk of unintentional clobbering of such
registers: once a scratch register is passed as a parameter, its live range is
extended and the risk of misuse is greater. A concrete example in this case of
a potential risk of unintentional clobbering is passing the scratch register to
fpu_to_int and then using the float scratch register in any of the methods
needed for the instruction sequence needed to fulfill the truncate instructions
(this is not the case today, but by passing the float scratch register around as
if it were any other register, it becomes harder to reason and audit in the
future).

There are some alternatives that I've thought about to improve the
handling/usage of the scratch registers to address some of the challenges
exposed in the previous paragraph. We could discuss those in a separate
issue/PR (e.g., relying on the type system to identify/audit scratch register usage and/or
providing exclusive access to the scratch registers, similar to what we have for
the allocatable registers). Care needs to be taken here to ensure that none of
the solutions explored affect compilation performance.

Taking a deeper look at your changes for the unsigned case, only 3 registers are
needed here compared to the x64 implementation which demands 4. This means that
you can effectively ignore the temporary register defined in the Masm
unsigned_truncate signature and only rely on the float scratch register.

Taking this into account, my concrete suggestion to move forward here is to:

Redefine the unsigned_truncate signature to be:

fn unsigned_truncate(
    &mut self,
    context: &mut CodeGenContext<Emission>,
    src_size: OperandSize,
    dst_size: OperandSize,
    kind: TruncKind
) -> Result<()>;

This allows maximum flexibility to encode the ISA differences and more
importantly reduces register pressure in cases where only 3 registers are
needed, reducing unnecessary register pressure where possible is important to
minimize memory traffic.

This is similar to how we handle the div instruction, given that the
requirements vary between ISAs.

Redefine fpu_to_int to be:

fn fpu_to_int(
    &mut self,
    dst: Writable<Reg>,
    src: Reg,
    src_size: OperandSize,
    dst_size: OperandSize,
    kind: TruncKind,
    signed: bool
) { ... };

Avoiding passing the scratch registers as params and shortening their live
range as much as possible by using them explicitly in each of the helper
functions that you have to fulfill the instruction sequence.

I believe that with this approach, my concerns will be addressed and there won't
be a need to split the signed/unsigned implementations (initially I was confused
with the TODO comment in the code and had assumed that the implementation was
making use of a scratch register that wasn't actually needed, although that's
not the case).

Contributor Author

MarinPostma Jan 4, 2025

Thanks a lot for the detailed writeup! This makes things a lot clearer. I'll make the changes 👍

MarinPostma added 4 commits

January 3, 2025 15:38


          implement fpu to int conversions

e527a11


          add tests

766fec5

fmt

1f599eb


          review edits

51efc79

Contributor Author

MarinPostma commented Jan 3, 2025

@saulecabrera I addressed most of the review comments but left a few questions


          fix rebase mishaps

9311b6e

MarinPostma force-pushed the winc-fpu-truncate branch from 9392766 to 9311b6e Compare

January 3, 2025 14:45

MarinPostma added 2 commits

January 3, 2025 15:46

fmt

1417d02


          defer tmp reg allocation in unsigned_truncate

cd42a15

as agreed in review

Contributor Author

MarinPostma commented Jan 4, 2025

@saulecabrera I have made the requested changes

saulecabrera reviewed

View reviewed changes

Member

saulecabrera left a comment

Looks great! Thanks for all your patience iterating on this. Left two minor comments and then we should be able to land this one.

winch/codegen/src/isa/x64/masm.rs

+                      let dst_ty = match dst_size {
+                          OperandSize::S32 => WasmValType::I32,
+                          OperandSize::S64 => WasmValType::I64,
+                          _ => bail!("invalid conversion size"),

Member

saulecabrera Jan 5, 2025

Here you can use CodeGenError::unexpected_operand_size() instead. https://github.com/bytecodealliance/wasmtime/blob/main/winch/codegen/src/codegen/error.rs#L139

winch/codegen/src/isa/aarch64/masm.rs

+                      let dst_ty = match dst_size {
+                          OperandSize::S32 => WasmValType::I32,
+                          OperandSize::S64 => WasmValType::I64,
+                          _ => bail!("invalid conversion size"),

Member

saulecabrera Jan 5, 2025

Similar here, you can use the appropriate error constructor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

winch