Compiling Ruby. Part 5: exceptions
This article is part of the series "Compiling Ruby," in which I'm documenting my journey of building an ahead-of-time (AOT) compiler for DragonRuby, which is based on mruby and heavily utilizes MLIR and LLVM infrastructure.
This series is mostly a brain dump, though sometimes I'm trying to make things easy to understand. Please, let me know if some specific part is unclear and you'd want me to elaborate on it.
Here is what you can expect from the series:
- Motivation: some background reading on what and why
- Compilers vs Interpreters: a high level overview of the chosen approach
- RiteVM: a high-level overview of the mruby Virtual Machine
- MLIR and compilation: covers what is MLIR and how it fits into the whole picture
- Progress update: short progress update with what's done and what's next
- Exceptions: an overview of how exceptions work in Ruby
- Garbage Collection (TBD): an overview of how mruby manages memory
- Fibers (TBD): what are fibers in Ruby, and how mruby makes them work
Call Stack, Stack Frames, and Program Counter
During the program execution, a machine maintains a pointer to the instruction being executed. It’s called Program Counter (or
When you call a method (or send a message if we are speaking of Ruby), the program counter is set to the first instruction on the called function (
The program somehow needs to know how to get back to the call site once the “child” method has completed its execution.
This information is typically maintained using the concept of a Call Stack.
Consider the following program and its call stack on the right.
The call stack consists of Stack Frames. Whenever a function is called, a new stack frame is created and
pushed onto the stack. When the called function returns - the stack frame is
At every point, the call stack represents the actual Stack Trace.
The very top of the call stack represents the scope of the whole file, followed by the stack frame of the
first function, followed by the
second function, and so forth.
In Ruby, the top function/file scope is referred to as simply
Now, imagine that we want to pass some information from the
second function to the
top. Some error or something exceptional happened, and this specific program state needs some special handling.
There are several limited ways to handle such case: either return some special value up (thus, each function on the call stack should be aware of this), or we can use some global variable to communicate with the callers (e.g.,
errno in C) which is again “pollutes” the business logic through the call stack.
One way to handle this problem more elegantly is to use particular language constructs - exceptions.
Instead of polluting the whole call stack, we can
raise an exception and then add special handling at the
top, like in this picture:
Now, the question is: How do we implement this feature? To answer it, let’s understand what needs to happen!
The program was in some specific state before it called the
first function at the top.
Now, the program is in another specific state around the
raise "error" line in the
We need to restore the state somehow as it was right before the
first call and continue execution right after the
top (by changing the program counter accordingly).
Conceptually, we can save the machine state before calling the
first method and restoring it later. The problem is that storing the state of the whole machine is too expensive and adds overhead by saving more than needed.
Instead, we can put the responsibility for maintaining the program on the actual program developers.
Most languages provide useful features for dealing with this:
- Ruby has explicit
- Java has explicit
- C++ has RAII and implicit destructors
- (C has
longjmp, but we are only talking about useful features)
Here is how it works in the case of Ruby.
Whenever the exception is thrown, the program climbs up through the call stack and executes code from those finalizersuntil it reaches the exception handler.
This process is called
I’m not a native speaker, but I’d say it should be called “Stack Winding”, but oh well
Here is an updated example with explicit state restoration during the stack unwinding.
Without executing code from the
ensure block, the hypothetical lock would never be released, thus breaking the program in terrible ways.
Exceptions in Ruby
Now, I can talk about different kinds of exceptions in Ruby. From my perspective, there are three different kinds:
return statements have special meaning when used in the context of
Let me elaborate on all the three with the examples.
Actual exceptions climb up the stack, calling finalizers until an exception handler is found.
These are the normal exceptions you are all familiar with.
returns from a block
return statements behave differently depending on the lexical scope they are part of.
Here is a little puzzle for you.
What will be printed on the screen:
return is called from within a block. You may expect the
x * 4 to be returned from the block, but it’s returned from the enclosing function (lexical scope).
As you can see,
return x * 4 would return from
f instead of from the block.
The code prints
breaks allow returning from the enclosing function, but in a slightly different way.
This is the most complex example here. Let me write down the steps explicitly. You may want to open the picture in a separate tab to read it.
loopfunction and passes the block to it. The block is just another function under the hood; it’s presented separately here as the
- Runtime creates a new stack frame for
loopand puts it on the call stack.
loopcalls the passed block (
- Runtime creates new stack frame for
__anonymous_blockand puts it on the stack.
i, checks for equality, and returns to
loop, nothing special.
- Runtime removes the
__anonymous_blockstack frame from the call stack.
loops stack frame is kept on the call stack, and the next iteration of
while truecalls the
- Runtime creates new stack frame for
__anonymous_blockand puts it on the stack.
i, checks for equality, and invokes
breakinitiates stack unwinding and returns from the enclosing function (
loop). See the dashed line.
loopreturns, thus bypassing the endless loop
break construct is effectively equivalent to the following code:
All the language constructs described above (exceptions,
breaks within a block) behave similarly: they unwind the stack (calling the finalizers on the way up) and stop at some well-defined point.
They are implemented slightly differently in the original mruby runtime. Still, I implemented them all as exceptions, with
breaks being special exceptions: they need to carry a value and store information on where to stop the unwinding process.
The implementation from the LLVM perspective is covered in my recent talk at LLVM Social Berlin: Stack unwinding, landing pads, and other catches.
Here, I’ll mainly focus on the details from the Mruby runtime perspective.
Consider the following example:
The blocks following
ensure are called Landing Pads.
This example has two kinds of landing pads: catch (
rescue) and cleanup (
Catches are “conditional” landing pads: they will be executed only if the exception type matches their type. Note the last
rescue: it doesn’t have any type attached, so it will just catch any exception.
Conversely, cleanups are “unconditional” - they will always run, but they will also forward the exception up to the next function on the call stack.
Another important detail in this example is the second
rescue: it uses function argument as its type. That is, the landing pad type is only known at run time, and it could be anything.
In C++, for example, all the
catch types must be known upfront, and the compiler emits special Runtime Type Information (RTTI). Again, IMO, it should be Compile Time Type Information, but it’s C++…
For this reason, Ruby VM always enters each landing pad. For catches, it first checks (at run time!) if the exception type matches the landing pad’s type, and if so, the exception is marked as caught, and the landing pad’s execution continues.
If the exception type doesn’t match - the exception is immediately re-thrown so the next landing pad can try to catch it.
I’d love to describe how I modeled exceptions at the MLIR level, but it will take more time to do it for several reasons:
- my original approach to constructing SSA right away didn’t work due to the way exceptions work (namely, some registers must spill on the stack), so the dialects have changed a bit, and I need to clean them up a bit
- the way I model them currently is more of a hack and only works because I have certain conventions, so it’s not a solid model yet
- I added JIT support (for
Kernel.eval) and need to do some tweaking there to make exceptions work during just-in-time evaluation
I’ll write down all the low-level details at some point, but I don’t have an ETA, so I’ll stop here.
Thank you so much for reaching this far!
The following articles will focus on JIT compilation and debug information.