Back in college when I was first exposed to functional languages, I fell in love with algebraic data types and structural pattern matching. Now that I have joined the work-force I no longer have the freedom to work with those languages. So I decided to give a shot at emulating structural pattern matching in Java.
If this concept is new to you here’s a short intro. Let’s say we want to represent binary trees. We know that a tree has three kinds of structures:
- A node that can have a left branch and right branch
- A leaf that has a value
- An empty tree with no value or branches
In Haskell, we can group these three structures under a single data type:
Now, whenever we do any kind of operation on trees, we can use pattern matching to find out which of these three structures we are dealing with. For example, here is function that calculates the depth of a tree.
This recursive function, tests the structure of the tree and returns different values depending on the pattern that matched.
Now, all that is good. But how do we implement the same in an object oriented language that does not support algebraic data types or pattern matching. Well, the closest thing to algebraic data types in an OOP language like Java is a class. And to simulate pattern matching we will use a feature called polymorphism.
Here’s our first implementation:
Nothing fancy is going on in this code snippet. I declared an interface called Tree which defines a method signature called “depth”. There are three classes that implement this interface. And each class provides it’s own implementation of depth. With this setup, the master depth function is trivial:
With the help of polymorphism, at each step of the recursion, the right implementation of depth is going to be called based on the class.
So? What’s the problem? Well, it was trivial to write, but it’s going to be a nightmare to maintain this setup. Let’s say, tomorrow we want to add a function to flatten a tree, or a function to do an in-order traversal. We would have to add the methods to the tree interface. And then we would need to add an implementation of these methods to each of the three classes. Doesn’t sound like a lot of work. But imagine if we were distributing these classes as a library and there were hundreds of other people using it. Each time we add a couple of utility methods to the Tree interface we would need to recompile all the classes and deploy a new version of the library, and ask hundreds of users to update to the latest version.
So, we will need to think of a different approach. One that does not require us to change the original classes. We need an approach that empowers the library user to write his own implementation of “depth” without any fuss.
The quick and dirty way would be to use the instanceof operator to perform pattern matching manually.
This version works, but it has its problems. By using the instanceof operator we are giving up all compile-time guarantees on whether we are doing our cast correctly. That means, if you make any mistake, you’ll only find that out only at runtime. Moreover, our previous approach used interfaces. So we were guaranteed to have an implementation of depth for each possible structure. In our new approach, there are no such guarantees. If the library user, forgot to check for a particular variant of the tree structure he wouldn’t get any compile time warning, just an exception at runtime.
We need a new approach. One that takes advantage of compile time checks and also forces us to exhaustively check for all possible structures. We can do that using Exceptions and Try-Catch blocks. Here’s our new implementation of Tree:
Here are the important differences:
- Tree is not an interface anymore, it’s a concrete class
- Tree extends Exception, so it can be thrown
- Node, Leaf, and Empty are also exceptions, so they can also be thrown
- The match method can throw an exception corresponding to each possible structure
With our new approach, here is our new depth function:
Since the match method of the Tree class exhaustively lists all possible structures as throwable Exceptions, the depth function will even not compile unless there is a catch block for each structure. As a result, you are getting both exhaustiveness and compile time checks at the same time. To be honest, this does not guarantee 100% exhaustiveness. The developer of the library has to make sure he throws an Exception for each possible sub-class of his algebraic type. But at least, in this approach, the burden of responsibility lies on the API side, not on the user. Therefore, if any future release of the library introduces a new structure, users won’t have to read the documentation, or discover it at runtime. They can just rely on the compiler to do the check for them and identify the affected methods.
There’s just one thing that makes this approach less than perfect. Unfortunately, our tree class cannot be generic. Because of the way generics are implemented in Java, sub-classes of Exception are not allowed to be generic.
If this is a deal breaker for you, check out this excellent blog post that describes another approach using the Functional Java library.
Tags: algebraic types, exception, java, pattern matching, structural
Greetings humans and spiders... I'm Shahriar and this is my stream of consciousness... Enjoy the random musings and my sporadic excitement for the silly things we always overlook...
Or, you know, Scala.
To make construction cheaper, suppress the stack trace:
https://github.com/scala/scala/blob/master/src/library/scala/util/control/NoStackTrace.scala#L11
To clarify, Java’s “catch” is not pattern matching. It is type casing. To be pattern matching it would need…um…patterns.
So checked exceptions in Java are useful after-all !
Why not just use multiple dispatch by way of the visitor pattern?
In your first snippet of Java code, in Node class, you mean
return 1+Math.max(left.depth(), right.depth())
I hope.
Thanks anyway!
The performance hit of using exception handling for control flow unfortunately makes this approach little more than a novelty.
For Java, I really like Runar’s solution which you link to, although as James pointed out, most of these solutions really don’t give us the benefits of real pattern matching and destructured binds.
But at least Runar’s solution gives us exhaustive type safety for the entire algebra of an ADT and reasonably enough performance.