Dynamic languages on MIR (with a focus on closures) #306

netbsduser · 2022-11-11T18:42:46Z


First of all I want to thank you for your work on MIR - I was referred to this project by the authors of tinydotnet, a lightweight implementation of the .NET runtime.

I am working on an implementation of a dialect of the Smalltalk programming language, and will be targeting MIR to provide JIT compilation. I am not especially familiar with Ruby, but from what I have gathered it bears a very strong resemblance to Smalltalk, so it was interesting for me to read about your plans for MIR as the basis of a Ruby JIT. Much of what you discussed in "Code Specialisation for MIR" is directly applicable. In Smalltalk, for example, we have SmallIntegers, a class of integer implemented by tagging; an efficient Smalltalk compiler must give these special treatment so that method invocation can be avoided for cases like x + y where x and y are both SmallIntegers. This is similar to your treatment of Ruby FIXNUMs, so the properties and basic-block versioning work should be very beneficial for me there.
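To make the SmallInteger case concrete, here is a minimal C sketch of the kind of fast path a compiler could emit for x + y. It is only an illustration under stated assumptions: the low-bit tag encoding, the `oop` type, and every function name (`is_small_int`, `tag_small_int`, `untag_small_int`, `add_oops`, and the stub `send_plus`) are hypothetical and not taken from MIR, CRuby, or any particular Smalltalk VM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical object reference: either a tagged integer or a pointer. */
typedef intptr_t oop;

#define SMALLINT_MIN (INTPTR_MIN / 2)
#define SMALLINT_MAX (INTPTR_MAX / 2)
#define NIL_OOP ((oop)0)   /* placeholder result of the stub slow path */

/* Assumed encoding: low bit 1 marks a SmallInteger, i.e. oop = 2n + 1. */
static bool is_small_int(oop v) { return ((uintptr_t)v & 1u) == 1u; }

static oop tag_small_int(intptr_t n) {
    assert(n >= SMALLINT_MIN && n <= SMALLINT_MAX);
    return n * 2 + 1;
}

static intptr_t untag_small_int(oop v) { return (v - 1) / 2; }

/* Stub standing in for a full dynamic send of #+ ; it only counts calls. */
static int slow_path_calls = 0;
static oop send_plus(oop x, oop y) {
    (void)x; (void)y;
    slow_path_calls++;
    return NIL_OOP;
}

/* Code a compiler could emit for `x + y`. */
static oop add_oops(oop x, oop y) {
    if (is_small_int(x) && is_small_int(y)) {
        /* Each operand is within half the machine range, so this cannot overflow. */
        intptr_t sum = untag_small_int(x) + untag_small_int(y);
        if (sum >= SMALLINT_MIN && sum <= SMALLINT_MAX)
            return tag_small_int(sum);
    }
    return send_plus(x, y);   /* non-integers, or result too large to tag */
}
```

With this sketch, `add_oops(tag_small_int(2), tag_small_int(3))` stays on the fast path and never reaches `send_plus`; only operands that are not tagged integers, or a sum outside the SmallInteger range, fall back to the send.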

There are some other areas which are harder to deal with, and there is one in particular for which I wondered whether you have any ideas or plans for tackling it in Ruby. Smalltalk has a very simple language model which makes extensive use of block closures - these are the lambda expressions well known from the functional languages. These introduce a problem for optimisation. Consider this typical pattern you might find in Smalltalk:

someArray collect: [ :x | x increment ]

someArray's collect: method is invoked (dynamically; someArray might be an instance of some subclass of Array created by the user which redefined collect:). This is equivalent to the map of functional languages. It is passed a block closure as an argument; this block closure returns the result of invoking increment on its sole argument. The result is a new array with all the values of someArray incremented.

The Array>>#collect: method will iterate over its members calling the block argument for each one. This is of course inefficient and a candidate for inlining, to eliminate this repeated calling, but the dynamic nature of what's going on means that MIR cannot help (neither can LLVM or libgccjit.) For my Smalltalk I plan to eventually handle this in the higher-level stages of compilation, before emitting MIR - add support for a 'good candidate for inlining' annotation to methods in Smalltalk code, and then rely on either type inference at compile time or the contents of inline caches to identify which possible receiver class' methods to add an inline case for (plus a general case for when the assumption is not met.) How about your plans for Ruby? Does a similar problem exist there, and would you be handling it in a similar way to this?
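The guarded inlining described above could look roughly like the following C sketch of one compiled call site. Everything in it is an assumption made for illustration: the `Object`, `Class` and `Block` layouts, the use of plain `long` elements instead of tagged objects, and the names `array_collect`, `reversed_collect` and `collect_increment_site` are hypothetical, and `block_calls` exists only so the number of closure calls can be observed.

```c
#include <assert.h>
#include <stddef.h>

struct Object;
typedef long (*BlockFn)(void *env, long x);
typedef struct { BlockFn fn; void *env; } Block;          /* a block closure */
typedef void (*CollectFn)(const struct Object *self, Block blk, long *out);

typedef struct Class { CollectFn collect; } Class;        /* one-entry method table */
typedef struct Object { const Class *cls; size_t len; const long *items; } Object;

static int block_calls = 0;   /* counts real closure invocations */

/* The block [ :x | x increment ], compiled as an ordinary function. */
static long increment_block(void *env, long x) {
    (void)env;
    block_calls++;
    return x + 1;
}

/* Array>>#collect: calls the block once per element. */
static void array_collect(const Object *self, Block blk, long *out) {
    for (size_t i = 0; i < self->len; i++)
        out[i] = blk.fn(blk.env, self->items[i]);
}

/* A user subclass that redefines collect: (here it fills the result backwards). */
static void reversed_collect(const Object *self, Block blk, long *out) {
    for (size_t i = 0; i < self->len; i++)
        out[self->len - 1 - i] = blk.fn(blk.env, self->items[i]);
}

static const Class ArrayClass = { array_collect };
static const Class ReversedArrayClass = { reversed_collect };

/* Call site for `someArray collect: [ :x | x increment ]`, specialised on the
   assumption (e.g. taken from an inline cache) that the receiver is an Array. */
static void collect_increment_site(const Object *recv, long *out) {
    if (recv->cls == &ArrayClass) {
        /* Inline case: collect: and the block body are both inlined. */
        for (size_t i = 0; i < recv->len; i++)
            out[i] = recv->items[i] + 1;
    } else {
        /* General case: the assumption failed, do the dynamic send. */
        Block blk = { increment_block, NULL };
        recv->cls->collect(recv, blk, out);
    }
}
```

For an `Object` whose class is `ArrayClass` the loop runs without a single closure call; for `ReversedArrayClass` the guard fails and the redefined method is still honoured through the general case.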

vnmakarov · 2022-11-16T15:14:24Z

Maintainer

> First of all I want to thank you for your work on MIR - I was referred to this project by the authors of tinydotnet, a lightweight implementation of the .NET runtime.

Thank you for your interest in the MIR project.

> I am working on an implementation of a dialect of the Smalltalk programming language, and will be targeting MIR to provide JIT compilation. I am not especially familiar with Ruby, but from what I have gathered it bears a very strong resemblance to Smalltalk, so it was interesting for me to read about your plans for MIR as the basis of a Ruby JIT. Much of what you discussed in "Code Specialisation for MIR" is directly applicable. In Smalltalk, for example, we have SmallIntegers, a class of integer implemented by tagging; an efficient Smalltalk compiler must give these special treatment so that method invocation can be avoided for cases like x + y where x and y are both SmallIntegers. This is similar to your treatment of Ruby FIXNUMs, so the properties and basic-block versioning work should be very beneficial for me there.

Smalltalk is very close to Ruby. Moreover, it is well known that Ruby was significantly influenced by Smalltalk.

What I wrote about code specialization is just what I would eventually like to see. I don't know how many years it will take me to implement it, but I hope less than 3 years.

For now, I have decided to implement a Ruby JIT on top of the current state of the MIR project and to do code specialization in a Ruby-specific way, using a new dynamically specialized IR in CRuby. This should help me get to a good MIR-based Ruby JIT faster. You can find more information about this approach in my RubyKaigi 2022 presentation: https://www.youtube.com/watch?v=TGc8rccEXno

A more detailed description of this approach will appear in a Red Hat Developer blog post, which will probably be published in December.

> There are some other areas which are harder to deal with, and there is one in particular for which I wondered whether you have any ideas or plans for tackling it in Ruby. Smalltalk has a very simple language model which makes extensive use of block closures - these are the lambda expressions well known from the functional languages. These introduce a problem for optimisation. Consider this typical pattern you might find in Smalltalk:
>
> someArray collect: [ :x | x increment ]
>
> someArray's collect: method is invoked (dynamically; someArray might be an instance of some subclass of Array created by the user which redefined collect:). This is equivalent to the map of functional languages. It is passed a block closure as an argument; this block closure returns the result of invoking increment on its sole argument. The result is a new array with all the values of someArray incremented.

> The Array>>#collect: method will iterate over its members calling the block argument for each one. This is of course inefficient and a candidate for inlining, to eliminate this repeated calling, but the dynamic nature of what's going on means that MIR cannot help (neither can LLVM or libgccjit.) For my Smalltalk I plan to eventually handle this in the higher-level stages of compilation, before emitting MIR - add support for a 'good candidate for inlining' annotation to methods in Smalltalk code, and then rely on either type inference at compile time or the contents of inline caches to identify which possible receiver class' methods to add an inline case for (plus a general case for when the assumption is not met.) How about your plans for Ruby? Does a similar problem exist there, and would you be handling it in a similar way to this?

Yes, Ruby also has blocks, and the same problem exists for Ruby. Moreover, standard methods that work with blocks (e.g. the array method each) are written in C. This C code calls the interpreter for every block invocation, which is a very expensive procedure that calls setjmp (for possible exception handling). In my presentation I mentioned that I solve this problem by iterator specialization.
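As a rough illustration of the cost described here, the following C sketch contrasts an iterator that sets up an exception handler with setjmp for every block invocation with one that sets it up once around the whole loop. This is not CRuby code and not necessarily how iterator specialization is implemented; the handler stack (`current_handler`), the counter `setjmp_count`, and the functions `block_body`, `each_generic` and `each_specialized` are all hypothetical names invented for the example.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf *current_handler;   /* hypothetical innermost exception handler */
static int setjmp_count = 0;       /* how many handlers were set up */

/* Stand-in for a block body: "raises" on negative input. */
static long block_body(long x) {
    if (x < 0)
        longjmp(*current_handler, 1);
    return x + 1;
}

/* Generic iterator: every block invocation gets its own handler. */
static int each_generic(const long *items, size_t n, long *out) {
    for (size_t i = 0; i < n; i++) {
        jmp_buf handler;
        jmp_buf *saved = current_handler;
        setjmp_count++;
        if (setjmp(handler) != 0) {        /* reached via longjmp */
            current_handler = saved;
            return -1;
        }
        current_handler = &handler;
        out[i] = block_body(items[i]);
        current_handler = saved;
    }
    return 0;
}

/* Specialised iterator: one handler for the whole loop. */
static int each_specialized(const long *items, size_t n, long *out) {
    jmp_buf handler;
    jmp_buf *saved = current_handler;
    setjmp_count++;
    if (setjmp(handler) != 0) {            /* reached via longjmp */
        current_handler = saved;
        return -1;
    }
    current_handler = &handler;
    for (size_t i = 0; i < n; i++)
        out[i] = block_body(items[i]);
    current_handler = saved;
    return 0;
}
```

Both functions compute the same results and both report a raised exception by returning -1; the difference is that the generic version executes setjmp once per element while the specialised one executes it once per loop.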

Originally I started the MIR project partly to solve the inlining problem too, which is complicated for CRuby because we have to inline block code generated from Ruby code into C code. I planned to translate all Ruby standard methods that work with blocks into MIR, also translate the block code into MIR, and then do inlining at the MIR level or trace the combined code at the MIR level.

Unfortunately, for CRuby this is complicated. CRuby uses a lot of C extensions which are not implemented by the C2MIR compiler, so the first step, translating the standard CRuby methods written in C, basically cannot be done right now. But I am going to move on to implementing the C extensions that are used, or providing analogous extensions.

So I have a lot of plans, but the biggest obstacle is time and effort. I have too much technical debt, I have been working on a lot of projects, and I cannot find enough time to move faster with the MIR project.
