“Anything that can go wrong will go wrong.” - Murphy’s Law
Many modern languages, including Ruby, use Exceptions as the primary method of error handling.
In this blog post, I will walk you through the history of error handling.
I will briefly describe popular options but will mostly focus on the pros and cons of Exceptions before demonstrating how monads might provide a better approach in some applications.
What is an error?
There are generally two categories of errors that might happen during the execution of a program: expected and unexpected.
Expected errors often come from the domain or the business logic of an application. For example, a user can’t be found in the database. Or the sum of debits doesn’t correspond to the sum of credits on an account, and so on.
Expected errors also include those caused by an application’s external dependencies.
For example, a network request made without a connection, an attempt to access a missing file, a request to a server that is down, etc.
From the perspective of a program, these are not bugs. But if the program relies on having that user data, the contents of that file, or the server response, such errors can, and almost certainly will, become bugs if left unhandled.
Unexpected errors, on the other hand, are hard or impossible to predict and, most importantly, impossible to deal with: stack overflow, memory limits, hardware failures, etc.
Most unexpected errors are too hard or impossible to handle, while expected errors have a long history with many different ways of handling.
But what does it mean to “handle” an error? Let’s look into the history of dealing with errors to find the answer.
History of errors
In the early years of programming languages, GOTO/JUMP instructions were used for the control flow, including dealing with errors - as one of the possible branches of program execution.
Unfortunately, GOTO is too powerful, and writing or reading any reasonably sized program would often become unreasonably hard.
This frustration led to the famous “GOTO considered harmful” paper by Edsger W. Dijkstra. He argued that the freedom to jump from anywhere to anywhere in a program, provided by the GOTO statement, leads to unreadable, unmaintainable code and that it should be restricted to a few standard, structured control flow patterns instead.
The paper sparked a debate in which an earlier paper by Böhm and Jacopini (https://en.wikipedia.org/wiki/Structured_program_theorem) was suggested as a possible solution.
It showed that any program could be implemented without GOTO statements if the language it is written in provides three basic things: sequences of procedures, if/else statements, and while loops.
In structured programming, a procedure call behind the scenes is still a GOTO - however, predictably constrained.
It works by entering a subprogram, pushing all values onto the stack, executing, returning to the point from which the subprogram was entered, and popping all values back off the stack.
Roughly speaking, it’s how procedures (and functions/methods) work even today in many programming languages.
Despite being a beautiful, high-level concept, procedures needed a way of dealing with errors that would be an alternative to GOTO statements.
The first obvious solution to that is error values.
With procedures (and also functions, or methods) as a concept that logically separates different computations, it’s convenient to think of each procedure as a black box.
If procedure A calls procedure B, which in turn calls procedure C, each of them expects the other to do its job. If, for some reason, C cannot do its job, it puts B at risk of not doing its job because B depends on the result of C. That, in turn, puts A at risk for similar reasons. In the end, A not doing its job means halting the program execution and exiting the process.
But if C could somehow continue working in spite of errors and return to B, it would enable A to continue working as well.
B, as well as C, is an excellent place to handle such scenarios!
When C encounters an error, it can return a value to B, indicating the presence of the error, and leave it to B to decide what to do.
This is the approach many languages took as structured programming was being adopted.
Unfortunately, there are some problems with this approach:
- Semipredicate problem: a procedure returns 0 to indicate an error, but the calling code cannot tell whether that 0 is an error or the result of a calculation (such as (3 - 3), or (-1 + 1)), because 0 belongs to the range of valid return values of the function. More subtle scenarios are possible.
- Loss of context: as the error value flows back through the sequence of calls, the knowledge about what caused the error gets lost. As an example, think of what happens if the C procedure returns an error value to B, but B does not handle it for some reason and passes it back to A. Now, A has no idea about the context of the error. What if the call stack is a dozen levels deep and the error goes from the very bottom to the very top of it?
- Easy to silently ignore or forget to handle an error value. This problem is common when a procedure C can return not one, but multiple different error values indicating different errors. It is especially bad if the procedure C lives in an external library, and its errors are not well documented.
- Implicitness and verboseness. Because possible return values are not restricted by anything (but the imagination of the code author), the user of a procedure must read the procedure code or documentation to learn about all possibilities.
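The semipredicate problem is easy to reproduce in Ruby itself. The sketch below uses a hypothetical `parse_quantity` helper (an assumption made for illustration, not part of any library) built on `String#to_i`, which returns 0 for text it cannot parse:

```ruby
# Hypothetical helper: returns the parsed number, or 0 to signal an error.
def parse_quantity(text)
  text.to_i # String#to_i yields 0 when the text does not start with a number
end

valid   = parse_quantity("0")    # a legitimate quantity of zero
invalid = parse_quantity("oops") # not a number at all

# The caller receives the same value in both cases and cannot tell them apart.
puts valid == invalid
```

Because 0 is both a valid result and the error marker, the caller has no way to react to the bad input.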
Various languages developed several solutions to these problems over the years, with the most notable ones being:
- Returning multiple values (most commonly - two): one containing a result, and another containing an error indicator (boolean or something more specific). The calling procedure can then check if there is an error before accessing and using the value.
- Returning a unique value like nil or null. Very primitive as it doesn’t provide any information about the error and can only be used for one error per procedure.
- Storing the error code in a global variable
- Throwing an exception - a mechanism built into a language that allows a procedure to use a special syntax to indicate an error, and the calling procedures up the call stack to handle it.
Let’s now review each of these solutions.
One of the most notable examples of returning multiple values today is the Go language. It has a built-in error type, which it uses to indicate an abnormal state.
For example, os.Open, a procedure for opening files, returns two values: a file and an error.
The calling code checks whether the error is nil before it uses the file.
The error can hold additional information, and it is the responsibility of the error implementation to summarize the context.
While an acceptable solution, it does feel a bit too verbose. For example, if the value is not empty, the error could be omitted. And vice versa - if there is an error, we don’t need to assign the result, which in this scenario is garbage anyway.
Yet, especially for applications heavy on I/O, this pattern is prevalent and leads to “noisy” code. Techniques to avoid noise are mostly based on hiding or ignoring errors, which is bad practice in itself.
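To get a feel for the two-value convention without leaving Ruby, here is a rough imitation of it. The `open_file` helper is hypothetical and exists only for this illustration; it is neither Go’s os.Open nor an idiomatic Ruby API:

```ruby
# Hypothetical helper imitating the (value, error) convention in Ruby.
# Returns [file, nil] on success and [nil, error] on failure.
def open_file(path)
  [File.open(path), nil]
rescue SystemCallError => e # e.g. Errno::ENOENT for a missing file
  [nil, e]
end

file, err = open_file("/no/such/dir/file.txt")
if err
  puts "could not open file: #{err.message}"
else
  puts file.read
  file.close
end
```

Every call site needs the same check, which is where the “noise” comes from.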
nil (also “null”, “null pointer”) represents nothingness or absence in most programming languages.
Called a “Billion Dollar Mistake” by its creator, Tony Hoare, it keeps haunting programmers in many languages even today.
The severity of problems caused by nil differs between implementations. In Ruby nil is an object, mostly causing exceptions when undefined methods are mistakenly invoked on it. It’s not as bad as causing segmentation faults by dereferencing null-pointers in C, for example.
Yet, it is too vague and unspecific to be a good idea to return from methods, especially to indicate errors.
Some modern languages either have no null references in ordinary code (Rust, Elm) or let the type checker track where null may appear (TypeScript with strict null checks).
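A small sketch of why nil makes a poor error indicator in Ruby. The `find_user` method and its in-memory data are assumptions made for this illustration:

```ruby
# Hypothetical lookup that returns nil when nothing is found.
def find_user(id)
  users = { 1 => { name: "Alice" } } # stand-in for a real data store
  users[id]                          # Hash#[] returns nil for a missing key
end

user = find_user(2) # nil: missing user? wrong id? storage problem? nil cannot say

begin
  user[:name] # the failure surfaces here, away from where the nil originated
rescue NoMethodError => e
  puts "#{e.class}: nil carries no information about what went wrong"
end
```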
Storing errors in a global variable to be able to access them from everywhere is essentially the same as storing any other, globally shared data. And it comes with the same set of problems:
- No access control: any procedure in the program can read and set global variables without any restrictions or constraints, leading to inconsistencies.
- Poor locality: reading and reasoning about (error-handling) code that is not contained together in one place becomes increasingly harder as the codebase grows.
- Implicit coupling: multiple independent procedures relying on the same global variable or even worse - on the same error in a global variable, lose their independence, and become coupled.
Concurrency issues, namespace pollution, and more difficult testing are also among the problems that make this approach unreliable.
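The lack of access control is simple to demonstrate. In the sketch below, `$last_error`, `read_config`, and `log` are all hypothetical names chosen for the illustration:

```ruby
# Hypothetical errno-style error reporting through a global variable.
$last_error = nil

def read_config(path)
  $last_error = nil
  File.read(path)
rescue SystemCallError => e
  $last_error = e.message
  nil
end

def log(message)
  $last_error = nil # any other procedure may overwrite the shared variable
  message
end

read_config("/no/such/dir/config.yml") # fails and records the error
log("config loading finished")         # an unrelated call silently clears it

puts $last_error.inspect # the original error is gone
```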
Exceptions were invented and introduced into early versions of Lisp with the intention of “freeing” the code from the explicit error handling with values, which was perceived to create noise by mixing error handling and the working code. Expressing error-handling with exceptions instead should lead to more readable code for the “happy path.”
Another intended benefit was to allow better integration between libraries and applications by letting errors propagate from the library to the application. Libraries tend to be unopinionated about many errors that happen within them. They just return errors to the main code instead of dealing with them on the spot. With error values as the primary mechanism, it means that libraries must have a lot of error-checking code in every procedure, even if the call stack is dozens of levels deep. It’s very tiresome, and exceptions promised to solve this problem by letting a library throw at one level and the application code catch at another, even when the two are very far from each other.
Over time, though, many problems with exceptions themselves were discovered - some quite dangerous - making reliable error handling with exceptions not as simple as it was hoped to be.
Here are some of these problems.
Shifting semantics and notion
Depending on the language, exceptions tend to have different meanings. In some languages, exceptions represent abnormal, unpredictable situations, while in others, they serve solely as a kind of control flow. There are also languages where exceptions are something in between.
But languages don’t live in isolation. They live in ecosystems of communities, libraries, books, online discussions, code examples, and documentation - each influencing the style and standards in which a language is used, including, of course, how exceptions are used, too.
As if this wasn’t enough to confuse things, some languages have mechanisms both for exceptions and for exception-like control flow.
Ruby is a notable example: its begin/raise/rescue is what exceptions are in other languages, while catch/throw is a control flow mechanism in the style of GOTO.
The Internet is full of blog and StackOverflow posts by confused people trying to learn the difference.
Also, this naming is rather hostile to programmers from other languages where catch/throw nearly always indicates a traditional exception handling mechanism.
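A short side-by-side sketch of the two Ruby mechanisms may help. The method names `parse_age` and `first_negative` are made up for the illustration:

```ruby
# begin/raise/rescue: Ruby's exception mechanism, meant for errors.
def parse_age(text)
  Integer(text) # Kernel#Integer raises ArgumentError for non-numeric input
rescue ArgumentError
  nil
end

# catch/throw: a non-local jump used for ordinary control flow, not errors.
def first_negative(rows)
  catch(:found) do
    rows.each do |row|
      row.each { |n| throw :found, n if n.negative? } # leaves both loops at once
    end
    nil # reached only when nothing was thrown
  end
end

puts parse_age("forty").inspect
puts first_negative([[1, 2], [3, -4], [-5]]).inspect
```

Here `throw` carries no error at all: it simply jumps out of the nested iteration with a value.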
Hidden control flow
“Exception handling introduces a hidden, “out-of-band” control-flow possibility at essentially every line of code. Such a hidden control transfer possibility is all too easy for programmers to overlook – even experts. When such an oversight occurs, and an exception is then thrown, program state can quickly become corrupt, inconsistent and/or difficult to predict.” - Jason Robert Carey Patterson, Nov 2005
Exception handling flow lives in its own realm, parallel to and hidden from the actual code. It is implicit and undeclared. Such nature of exceptions encourages programmers to simply forget about possible errors or downright ignore them, leading to unreliable and possibly dangerous code.
According to this paper, 35% of all Catastrophic failures in distributed systems are caused by mistakes like over-catching, ignoring or forgetting to catch exceptions.
But even those who think about possible errors are left without any help. There is no way to tell a procedure that throws an exception from one that doesn’t. One would have to dive down a rabbit hole to inspect the code of each underlying procedure, and even then, there would be no guarantee that some library hasn’t forgotten to document an exception. (One such story is detailed in the book “Release It!” by Michael T. Nygard, in the chapter called “The Exception That Grounded An Airline”.)
Each line in any procedure potentially creates a possible exit point, which is impossible to tell from either the procedure’s signature or its name.
It seems that the only way to write reliable code in the presence of exceptions is to religiously wrap each line of code in try/catch blocks, but unfortunately, even that can’t save you from corrupting the state.
Context loss and Corrupt state
One thing I’ve only briefly mentioned so far is that when an exception happens and propagates up the stack, it “unwinds” the stack and removes all the objects from the scope on its way up until it reaches a corresponding catch block. Such behavior intends to “undo” or “rollback” everything leading to the error to give the catching code a chance to try again.
One problem with it is that without the context of what the lower-level code was trying to do when the error occurred, the higher-level code that caught the error can’t possibly know what caused an erroneous situation.
Another problem is that the high-level code knows nothing about the state that was changed by the lower-level code before the error happened. This state can be both internal modifications to the global state of the application as well as external: database or service calls, file manipulation, etc. The unwinding cannot “undo” such changes.
And that’s not even the worst part! Imagine the code was modifying some big data structure when the exception occurred. After the exception is caught, the data structure stays in a corrupt state!
Such scenarios could potentially lead to catastrophic failures.
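A tiny sketch of how state ends up corrupt. The `transfer` method and the in-memory accounts are hypothetical, chosen only to make the effect visible:

```ruby
# Hypothetical transfer between two in-memory accounts.
def transfer(accounts, from, to, amount)
  accounts[from] -= amount # state is modified first...
  raise ArgumentError, "unknown account #{to}" unless accounts.key?(to)

  accounts[to] += amount   # ...and this line is never reached on error
end

accounts = { "alice" => 100, "bob" => 50 }

begin
  transfer(accounts, "alice", "carol", 30)
rescue ArgumentError => e
  puts "transfer failed: #{e.message}"
end

# Unwinding the stack did not undo the debit.
puts accounts.inspect
```

The rescuing code sees only the exception; nothing tells it that one half of the operation has already happened.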
Concurrent and Parallel Programming
The moment one enters the world of concurrent and parallel programming, proper exception handling becomes extremely hard or even impossible.
For simple multi-threaded applications, there is no best way to react to an exception in one of the threads. It’s either unwinding up to the point of forking and deciding to keep other “healthy” threads running or killing all the threads. Each scenario can potentially lead to corrupt data.
When multiple threads don’t simply fork/join but form more complex thread pools and communicate via a shared “work queue”, each doing its little part, then there is nothing to unwind in case of an exception because the original caller is detached from such threads and is nowhere to be found to deal with the situation.
The same applies to the Actor model, where actors can be not only local but connected over the network. The fact that the exception handling mechanism “lives in a parallel realm” and is not part of the program’s control and data flow makes it impossible to propagate exceptions between actors and across the network.
With all its problems, in simple scenarios, where a discovered error can be turned into an error message straightforwardly, exceptions can be used to simplify the application code by only having exception handlers at a higher level of the application. In web applications, it would be middleware close to the webserver or a high-level code that returns an HTTP response.
It can also be used rather safely and easily in situations where expected errors are very few, and exceptions are caught immediately at the level where they are raised.
In other situations, where error discovery requires several non-trivial steps, often with the context of the error and high degree of flexibility to handle the error, exception handling becomes significantly complicated and hard. It requires high discipline and attention and still doesn’t guarantee reliable software in the end.
Monadic return values
Let’s take a moment and review the idea of returning a tuple from every procedure, as in the Go programming language described above.
At its core, it’s a sound idea: every function explicitly returns either a value or an error, leaving no room for ambiguity.
Unfortunately, the need for an if..else check on the presence of the value or the error, to learn if there was a success or a failure, makes this technique cumbersome.
The same goes for the fact that returned values and errors can be of different, often not composable types, meaning that it’s impossible to chain multiple such procedures.
Monads can solve both these problems in a simple, elegant way.
Monads are data types that wrap values of any other types and provide a structured way of executing a sequence of functions on those wrapped values, without any additional boilerplate.
There are many kinds of Monads for various purposes, but in the context of this discussion, we are interested in one that is called Either (also known as Result or Error - depending on the language or the library).
The Either Monad represents values with two possible types: either Left or Right. Because most often, the Either Monad is used to represent a value which is either correct or an error, by convention, the Left type is used to hold an error value, and the Right is used to hold a correct value.
Representing correctness is not the only use case for the Either Monad - hence such abstract naming. But in many languages and libraries where representing the correctness of a computation is indeed the only use case, it’s named more specifically as Result, with types Failure instead of Left and Success instead of Right, accordingly. This is the case for the Ruby gem used for code examples here - dry-monads.
Result is a so-called “sum type.” It means that a Result object can only be an instance of either Success or Failure. Success and Failure objects can be composed together by calling the method #bind on them.
This solves both problems of the approach with tuples: when each return value is wrapped in either Success or Failure, there is no need for if..else statements, and #bind helps to compose multiple such procedures together.
(Moreover, pattern matching in Ruby 2.7+ makes working with sum types a pleasure.)
Equipped with this knowledge and the dry-monads library, we can make every method in our code return either Success or Failure objects, which we can chain together.
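To make the mechanics of #bind concrete, here is a minimal sketch. It deliberately uses a homegrown `Success`/`Failure` pair instead of dry-monads so that it runs with plain Ruby; the names mirror the library, but the implementation is an assumption made for illustration, as are the `parse` and `reciprocal` steps:

```ruby
# A homegrown stand-in for a Result type; NOT the dry-monads implementation.
Success = Struct.new(:value) do
  def success?; true; end
  def bind; yield(value); end # pass the wrapped value to the next step
end

Failure = Struct.new(:error) do
  def success?; false; end
  def bind; self; end # skip the next step and carry the failure forward
end

# Hypothetical steps: each takes a plain value and returns a Result.
def parse(text)
  number = Integer(text, exception: false) # nil when text is not an integer
  number ? Success.new(number) : Failure.new(:not_a_number)
end

def reciprocal(number)
  number.zero? ? Failure.new(:division_by_zero) : Success.new(1.0 / number)
end

p parse("4").bind { |n| reciprocal(n) }    # a Success holding 0.25
p parse("0").bind { |n| reciprocal(n) }    # a Failure holding :division_by_zero
p parse("oops").bind { |n| reciprocal(n) } # a Failure holding :not_a_number
```

In the last call `reciprocal` never runs: the first Failure travels through the rest of the chain untouched, with no if..else at the call site.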
Let’s look at an example of a simple web service that uses the Result Monad.
It’s intentionally simplified wherever possible and has no tests to reduce its size and to keep the main point clear.
Let’s say there is an ActiveRecord model class User that represents the users table with one entry:
The web service has an endpoint for updating a user. It accepts id, name, and email as parameters.
Upon receiving the parameters, the code on this endpoint validates the presence of the parameters and the email format, finds the user in the database, updates the user with the possible new name and email, and sends a notification email.
For the purpose of the example, all this functionality is contained in only one class:
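A minimal sketch of such a class might look like the following. To stay runnable with plain Ruby it makes several assumptions: `Success` and `Failure` are homegrown stand-ins for the dry-monads types, `User` and `USERS` replace the ActiveRecord model and its table, `SENT_EMAILS` replaces a real mailer, and the method names `validate_params` and `update_user` are invented for the illustration:

```ruby
# Homegrown stand-ins; real code would use dry-monads and ActiveRecord instead.
Success = Struct.new(:value) do
  def bind; yield(value); end
end
Failure = Struct.new(:error) do
  def bind; self; end
end

User = Struct.new(:id, :name, :email)                       # stand-in for the model
USERS = { 1 => User.new(1, "Alice", "alice@example.com") }  # stand-in for the table
SENT_EMAILS = []                                            # stand-in for a mailer

class UserUpdater
  EMAIL_FORMAT = /\A[^@\s]+@[^@\s]+\z/

  # Each step runs only if the previous one returned a Success.
  def call(params)
    validate_params(params).bind do |present|
      validate_email(present).bind do |valid|
        find_user(valid[:id]).bind do |user|
          update_user(user, valid).bind do |updated|
            send_email(updated)
          end
        end
      end
    end
  end

  private

  def validate_params(params)
    missing = %i[id name email].reject { |key| params[key] }
    missing.empty? ? Success.new(params) : Failure.new([:missing_params, missing])
  end

  def validate_email(params)
    params[:email].match?(EMAIL_FORMAT) ? Success.new(params) : Failure.new(:invalid_email)
  end

  def find_user(id)
    user = USERS[id]
    user ? Success.new(user) : Failure.new(:user_not_found)
  end

  def update_user(user, params)
    user.name = params[:name]
    user.email = params[:email]
    Success.new(user)
  end

  def send_email(user)
    SENT_EMAILS << "Profile updated for #{user.email}"
    Success.new(user)
  end
end

p UserUpdater.new.call(id: 1, name: "Alice B.", email: "alice.b@example.com")
p UserUpdater.new.call(id: 2, name: "Bob", email: "bob@example.com") # Failure: :user_not_found
p UserUpdater.new.call(id: 1, name: "Alice", email: "not-an-email")  # Failure: :invalid_email
```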
The example above demonstrates a few essential basics about this approach:
- the data “flows” through a chain of method calls
- each method must return a Result - in the form of Success or Failure objects with data or errors inside, accordingly
- if a Failure is returned at any point - the chain gets interrupted, and the Failure is returned to the caller immediately
It’s good for basics, but the code above still has some significant shortcomings. Before addressing them, let’s make it a bit cleaner.
The dry-monads library provides a special syntax to fight the “Pyramid of Doom” like the one in the #call method: it’s called “do notation.”
After refactoring the #call method to “do notation,” it looks much simpler and easier on the eyes while staying mainly the same (consult the dry-monads documentation for more details):
Now that it’s tidier, let’s see what is still wrong with the code.
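In dry-monads itself, do notation is written with `yield` inside the method; the sketch below does not reproduce that syntax. It only imitates the underlying idea in plain Ruby: unwrap a Success and continue, or stop at the first Failure. The `run_steps` helper, the `unwrap` callable, and the simplified steps are all assumptions made for this illustration:

```ruby
Success = Struct.new(:value)
Failure = Struct.new(:error)

# Hypothetical helper: runs a block and hands it an `unwrap` callable.
# `unwrap` returns the value inside a Success, or aborts the whole block
# and makes run_steps return the Failure it was given.
def run_steps
  tag = Object.new
  catch(tag) do
    unwrap = ->(result) { result.is_a?(Success) ? result.value : throw(tag, result) }
    yield(unwrap)
  end
end

def validate_email(email)
  email.include?("@") ? Success.new(email) : Failure.new(:invalid_email)
end

def find_user(id)
  id == 1 ? Success.new({ id: 1, name: "Alice" }) : Failure.new(:user_not_found)
end

# The steps read top to bottom, with no nesting.
def update_email(id, email)
  run_steps do |unwrap|
    valid_email = unwrap.(validate_email(email))
    user        = unwrap.(find_user(id))
    Success.new(user.merge(email: valid_email))
  end
end

p update_email(1, "alice@example.com") # a Success holding the updated user
p update_email(2, "bob@example.com")   # a Failure holding :user_not_found
p update_email(1, "not-an-email")      # a Failure holding :invalid_email
```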
The validation methods, such as #validate_email, look good - we fully control what happens inside them and can expect them to return either Success or Failure.
But if you look at #find_user, #send_email, and the method that updates the user - they all have something in common. And that something is external method calls!
Technically, there is nothing wrong with making calls to other methods outside our class or even our codebase. But we are not in control of that code, and we cannot expect its methods to return Result objects! Moreover, it might (and will) raise exceptions!
If we were writing the code in a more “functional” way or in a different, “FP-first” language, we would keep such “impure” calls (also labeled “side-effects”) at the outer layer of our application, far from the core business logic. But it’s outside the scope of this example. Our goal now is to replace exception-handling with a more robust approach.
dry-monads proposes another Either-like monad, similar to Result. It’s called Try, and it’s capable of running expressions and wrapping any possible exceptions raised by those expressions into an Error object while keeping the successful result in Value.
The power of monad composition allows for a seamless binding of Try and Result in a chain of calls. It’s also possible to transform Try to Result using the #to_result method. In that case, the Failure object simply contains the exception that was raised. Shortly, we will see how this might be useful.
But for now, let’s do a small refactoring of the three methods:
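The idea can be sketched in plain Ruby as follows. The `attempt` helper is a hypothetical stand-in for the role Try plays in dry-monads (it is not the library’s API), and `fetch_user` stands in for an external call that may raise:

```ruby
Success = Struct.new(:value)
Failure = Struct.new(:error)

# Hypothetical helper: run the block and turn the listed exceptions into a
# Failure instead of letting them propagate. Defaults to StandardError.
def attempt(*exceptions)
  exceptions = [StandardError] if exceptions.empty?
  Success.new(yield)
rescue *exceptions => e
  Failure.new(e)
end

# Stand-in for an external call that may raise, such as a database lookup.
def fetch_user(id)
  { 1 => "Alice" }.fetch(id) # Hash#fetch raises KeyError for a missing key
end

def find_user(id)
  attempt(KeyError) { fetch_user(id) }
end

p find_user(1)          # a Success holding "Alice"
result = find_user(2)
puts result.error.class # the exception arrives as ordinary data
```

Listing the expected exception classes keeps unrelated exceptions, which likely indicate bugs, from being swallowed.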
All external method calls are safe now. If any of them raises an exception - it’s going to be wrapped in an Error object instead of blowing up.
With all that in place, we can now safely call our UserUpdater in the request-handling code and deal with the result to our liking:
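One possible shape of such request-handling code is sketched below. Everything in it is an assumption made for illustration: the homegrown `Success`/`Failure` types, the drastically simplified `UserUpdater`, the `handle_update` method, and the status codes chosen for each error:

```ruby
Success = Struct.new(:value)
Failure = Struct.new(:error)

# Hypothetical stand-in for the service object; it only checks the id here.
class UserUpdater
  def call(params)
    params[:id] == 1 ? Success.new(params) : Failure.new(:user_not_found)
  end
end

# Hypothetical request handler that maps a Result to a [status, body] pair.
def handle_update(params)
  result = UserUpdater.new.call(params)

  case result
  when Success
    [200, { user: result.value }]
  when Failure
    case result.error
    when :user_not_found then [404, { error: "User not found" }]
    when :invalid_email  then [422, { error: "Email is invalid" }]
    else [500, { error: "Something went wrong" }]
    end
  end
end

p handle_update(id: 1, name: "Alice")
p handle_update(id: 2, name: "Bob")
```

Every failure the service can report is visible at this one place as a plain `case` over data.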
Why all the hassle?
If you still don’t see the value and find this code unusual or weird - you are not alone! But hear me out.
Unlike exceptions, our Result and Try monads are just objects. Data.
And data is the first-class citizen of the language.
Data comes in and comes out. Data can be manipulated and transformed. It can be combined with other data. Data can be used to make decisions further in the code.
After all - the whole language is designed around handling data!
So now, instead of handling errors with the limited begin ... rescue mechanism, you can use the full power of Ruby to do it!
Take your Failure objects, for example. You don’t have to store only symbols in them.
What about putting an object with all the information you need to undo certain operations or database calls you made?
What about capturing the scope into a Proc at the error location and later using it for all kinds of things: undo logic, logging, retrying?
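As a rough sketch of that idea, the Failure below carries a reason, some context, and a Proc that knows how to undo the change. The `charge` method, the ledger, and the payload keys are all hypothetical:

```ruby
Success = Struct.new(:value)
Failure = Struct.new(:error)

# Hypothetical operation that reports a rich error payload on failure.
def charge(ledger, account, amount)
  ledger[account] -= amount
  return Success.new(ledger[account]) unless ledger[account].negative?

  Failure.new({
    reason: :insufficient_funds,
    account: account,
    undo: -> { ledger[account] += amount } # closes over the local scope
  })
end

ledger = { "alice" => 20 }
result = charge(ledger, "alice", 50)

if result.is_a?(Failure)
  puts "charge failed: #{result.error[:reason]}"
  result.error[:undo].call # roll back using the captured context
end

puts ledger.inspect
```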
With errors as data, your possibilities are only limited by your imagination (and the language)!
It’s simple to reason about
OK, it might not be the easiest thing to grasp the first time - but so were exceptions when you learned about them.
Yet, unlike exceptions, because error Monads are data - it’s right there in front of your eyes.
It’s not hiding from the reader somewhere a dozen method calls away.
If there is a possibility of errors in the code you read - you see it in the form of Result objects.
Exceptions propagate up to the point where a thread was spawned and blow up there; the parent thread does not see them unless it explicitly asks (in Ruby, by joining the thread).
Error monads are returned to the parent thread the same way as any other data, no additional work required.
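A minimal sketch of results travelling back from worker threads as plain data. The `fetch_page` worker and its URL check are made up for the illustration, and `Success`/`Failure` are homegrown stand-ins:

```ruby
Success = Struct.new(:value)
Failure = Struct.new(:error)

# Hypothetical worker: reports problems as data instead of raising.
def fetch_page(url)
  if url.start_with?("https://")
    Success.new("contents of #{url}")
  else
    Failure.new([:insecure_url, url])
  end
end

urls = ["https://example.com", "http://example.org"]

# Thread#value waits for the thread and returns the result of its block.
results = urls.map { |url| Thread.new { fetch_page(url) } }.map(&:value)

failures = results.grep(Failure)
puts "#{results.size - failures.size} succeeded, #{failures.size} failed"
failures.each { |failure| puts "failed: #{failure.error.inspect}" }
```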
It discourages forgetfulness and laziness
Let me admit it straight out: because Ruby is a dynamic language, there is no way to make you return Result objects from methods.
But in my experience, when you are adding or modifying a method in a class where all other methods return Result - you notice it and remember about it.
Also, if you call #bind on a method that doesn’t return Result - it will fail, and thus remind you of the omission.
But mechanics aside, the most important thing is that it encourages thinking about possible error scenarios.
Writing code with error-handling monads pulls you from the cozy, illusionary world of happy paths into reality.
Price of using error Monads
Like every technology, it comes with its price.
It’s not easy to grasp
I mentioned above that using Either monads for error handling is simple.
It is simple, but it is not easy. It requires a paradigm shift, especially if you’ve been using exceptions or other styles for most of your career.
Every language community has its own set of conventions and practices, and so does the Ruby community.
Although monads are just data and there is technically nothing preventing you from writing your code with monads - because it’s uncommon, you have to explain it and teach it to new colleagues every time.
Might be overkill in simple applications
The flexibility Either monads provide for handling errors in your code might, in fact, not be what you need.
For simple cases where turning an error into an error message for the user is straightforward - exceptions are still a valid solution.
When it comes to error-handling, the path from error-discovery to error-reporting is what determines what degree of flexibility and control you may need.
Either Monad is an elegant and simple abstraction that doesn’t only provide powerful error-handling but increases the modularity and reduces the overall complexity of code.
In this blog post, I intentionally tried to keep code examples at the bare minimum. I haven’t demonstrated how to create bigger, real-world-like programs with Monads, or how to test, debug, and run them together with other popular libraries and frameworks.
If you want to learn more and to figure out if Monads are for you - give them a chance in your next project!
Interested to learn more?
- dry-monads documentation - there is a lot more in the library than I was able to demonstrate in this post (for example, the Maybe Monad). Go and learn what it’s capable of!
- Functional Design Patterns by Scott Wlaschin. Examples are in F# but it’s a very approachable talk, and it explains a lot of functional concepts, including error handling with Monads.
- Railway Oriented Programming, again by Scott Wlaschin, as a follow-up to the previous talk.
- Refactoring Ruby with Monads, a talk by Tom Stuart. A great introduction to “homegrown” Ruby monads. Gives you an idea about how Monads themselves are implemented.
- Categories for the Working Hacker - an amazing talk by Philip Wadler about category theory, tailored for programmers.