Haskell: the language of pure thought
Joel on Software says somewhere that there are two things every programmer must understand to call themselves a computer scientist. The first: pointers, which can only be understood —in all their subtle horror— by learning C (or assembly). The second is recursion, which can only really be learned from pure functional languages (or mathematical topology). Many imperative programmers think they understand recursion, but they don't. Lists are pretty. Trees are cute. Binary is pretty cute. But you don't understand recursion until you've been throwing bananas at bushes and convinced people it makes sense.
Today I present a problem from the day job which a coworker thought was impossible. It's a run-of-the-mill problem, says the Haskeller, but it highlights the extent to which Java and imperative thinking cloud the simple mathematical problem and its equally simple functional solution. Consider for the moment that you're writing a program to do parsing-based machine translation. You have a bunch of CFG-like rules. Somewhere in your program you have a function that takes, for each nonterminal position, a list of all parses which can produce the given nonterminal, and you want to try all possible ways of completing each rule. For grammars restricted to at most two nonterminals per rule, your function looks something like this:
public void completeCell(a bunch of arguments,
                         ArrayList<Rule> rules,
                         ArrayList<ArrayList<Parse>> allParses) {
    for (Rule r : rules) {
        if (r.arity == 1) {
            // unary rule: try every parse for the lone nonterminal
            for (Parse p : allParses.get(0)) {
                ArrayList<Parse> antecedents = new ArrayList<Parse>();
                antecedents.add(p);
                doCrazyStuff(a bunch of arguments, antecedents);
            }
        } else if (r.arity == 2) {
            // binary rule: try every pair of parses for the two nonterminals
            for (Parse p0 : allParses.get(0)) {
                for (Parse p1 : allParses.get(1)) {
                    ArrayList<Parse> antecedents = new ArrayList<Parse>();
                    antecedents.add(p0);
                    antecedents.add(p1);
                    doCrazyStuff(a bunch of arguments, antecedents);
                }
            }
        } else {
            System.crash("ohnoes, we can only do two!");
        }
    }
}
Did you read all that? Neither did I. But now it's your job to generalize this function so that it will work for rules with an arbitrary number of nonterminals. Obviously, adding extra conditionals with additional loops can only ever get so far. I bring up that the problem domain involves formal language theory, not just because it does, but because it gives us a much cleaner way to think about the problem without code bloat obscuring our vision.

Consider the following class of formal languages: someone hands you a sequence of sets of letters, and you must enumerate all sequences of letters consistent with it (that is, the letter at position X is an element of the set at position X, for all X). In particular, your algorithm for enumerating these sequences must work no matter how long the (finite) sequence of (finite) sets is. These are exactly the same problem. All that fluff about what you do with the sequences after you've created them is just that. Fluff. Ditto for what the letters look like inside and why you'd want to do this in the first place. But given this simple problem, enumerate :: Seq (Set a) -> Set (Seq a), do you see what the answer is?
Both sets and sequences are simple recursive data types, so we have two cases of recursion to decompose them set-by-set and letter-by-letter. In fact, our algorithm will work on infinite length sequences or infinite size sets, modulo the fact that it will never finish enumerating. (Given infinite sets, there are some tricks if you want to return the strings in a fair order. And given infinite sequences our algorithm requires an infinite call stack or laziness tricks, but there are some other tricks to make it feasible for different algorithms.) The idea is simply to pick the first remaining letter of the first remaining set (marking them off as you do so), and repeat. When you run out of sets, give your result to whoever cares, and step back one set. When your set runs out of letters, fill it up anew and step back one set.
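For what it's worth, here is a minimal sketch of that algorithm in Haskell, approximating Seq and Set with plain lists (the name enumerate and the list representation are my simplification, not production code):

-- A minimal sketch, approximating Seq and Set with lists: pick a letter
-- from the first set, recurse on the remaining sets, and emit the
-- completed sequence once the sets run out.
enumerate :: [[a]] -> [[a]]
enumerate []         = [[]]
enumerate (set:sets) = [ letter : rest | letter <- set, rest <- enumerate sets ]

-- e.g. enumerate ["ab", "cd"] == ["ac","ad","bc","bd"]
-- (this is the list monad's sequence in disguise)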
This sort of embedded recursion generates a call-tree, which will be familiar to anyone who's seen Haskell's mystical incantation filterM (const [True,False]) for generating power-lists. In fact, the solution is so similar to that one that it's almost not worth mentioning. Unfortunately, call-trees are not the sort of pattern that imperative programmers often think about. Ours is just one example of a search problem. Imperativists often write programs to solve search problems, but the fact that they're exploring a search tree is obfuscated behind optimization tricks. In Haskell it's feasible to make the search tree explicit, even when it's potentially infinite. Once we can create the tree, it's trivial to walk along the tree and just spit out the answers. For this problem we already know the size, shape, and makeup of the tree, so we don't even need to construct it; we can simply walk it as if it existed. Hence, call-tree. Which is exactly what the algorithm I just described does: pick each letter, call yourself recursively for the next set.
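For reference, that incantation reads as: at each element, nondeterministically keep it or drop it, which walks exactly this sort of call-tree (the example values are mine):

import Control.Monad (filterM)

-- every sublist of the input, by branching on True/False at each element
powerList :: [a] -> [[a]]
powerList = filterM (const [True, False])

-- e.g. powerList "ab" == ["ab","a","b",""]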
Unfortunately, this day job doesn't pay me to write Haskell. While the solution is trivial there, we need to implement it in Java, which lacks cheap allocation, cheap function calls, first-class functions, and referential transparency (all things that the Haskell solution depends on). So we need to do it by the same sorts of obfuscatory hackery as any other imperativist. A direct translation of the Haskell into Java would be far too expensive for a function that is called from the innermost loop of our program. But by this point we know that there is indeed an answer, and a simple one at that. Just looking at the Java code from the beginning would never have made things this clear, because Java's syntax focuses on all the fiddly bits and none of the important parts.
The second recursion (for each letter in this set) is a tail-call recursion, so it can be converted into a for-loop. The first recursion (for each set in the sequence) cannot be transformed this way.[1] If it could, then we would have a polynomial algorithm for enumerating the entire search tree. Given that there are exponentially many strings to generate, that can't be possible. Another way of looking at it is that the two recursions are at orthogonal angles from one another, whereas embedded loops are parallel to one another. We can see this in the shape of the call-tree, compared to the algorithm to, say, enumerate the elements of a two-dimensional array. For the two-dimensional array there's no information sharing between iterating the first column and the second, whereas for our call-tree we must hold onto the elements we've chosen in previous iterations and must backtrack in order to pick new ones.
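To put a number on the blow-up (this check reuses the enumerate sketch from above and is purely illustrative; the name blowUp is mine):

-- n sets of k letters each yield k^n sequences, so any enumeration
-- must do exponential work no matter how the loops are arranged.
blowUp :: Int -> Int -> Bool
blowUp n k = length (enumerate (replicate n [1 .. k])) == k ^ n

-- e.g. blowUp 5 3 == True   (3^5 = 243 sequences)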
One obfuscation has to do with the lack of referential transparency in impure languages. Once we hit the leaves of the recursion we have to call some function with the sequence we've generated. Unfortunately, that function may make destructive updates to the sequence, or may hold a pointer to it and look at it in the future. That means when we backtrack, popping off the last letter and replacing it with a new last letter, the function may have already messed up the sequence or may see the changes we make. Thus we need to clone the sequence and hand the functions a copy of it, instead of the original. If we're smart we can get away with only cloning at the leaves instead of at every node of the call-tree. The smartness mainly has to do with taking advantage of the fact that when our recursive self returns, we know the other-us didn't ruin the sequence and hasn't squirreled away pointers to it. Given those assumptions about our own behavior, we can safely maintain a single sequence (i.e., an array of pointers) by destructive changes, rather than needing to use a persistent data structure or cloning in the nodes, both of which involve expensive allocations. Essentially, this problem of where to clone and where not to amounts to the fact that we don't want to unfurl our tree of recursion. There's work shared at the top and we don't want to duplicate that work later on in the tree.
A second obfuscation has to do with the fact that Java lacks first-class functions. Once we hit the leaves of recursion and need to call those other functions, we need to pass them a whole bunch of extra arguments too. It'd be very expensive to pass all of them down on the call stack, because that means pushing the same data onto the stack every time we recurse. With first-class functions we could curry the arguments into the function and just pass the new function down, or we could use callback tricks, which amount to constructing the same closure. The Java-ish way of implementing this trick is to stash all the arguments away in an object, and have the recursive function be a method on that object. This pattern is an example of defunctionalization. Very heavy-handed, but under the hood it amounts to the same assembly code in the end. Of course, looking at the source code it's not clear that this extraneous object is serving the role of a continuation. And we have to be careful that our future selves don't corrupt the object's state (hence, cloning at the leaves again).
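To make the closure-passing version concrete, here is roughly what "curry the arguments in and pass the new function down" looks like back in Haskell; enumerateWith and its type are my sketch, not the actual Java design:

-- The leaf action k closes over the "bunch of arguments", so the recursion
-- only threads k and the remaining sets. Java emulates this closure by
-- stashing the arguments in an object (defunctionalization).
enumerateWith :: ([a] -> r) -> [[a]] -> [r]
enumerateWith k []         = [k []]
enumerateWith k (set:sets) =
    [ result | letter <- set, result <- enumerateWith (k . (letter:)) sets ]

-- e.g. enumerateWith sum [[1,2],[10,20]] == [11,21,12,22]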
The generalized version is not as efficient as the hand-unrolled version I posted above. It has extra function calls and there's extra work constructing the continuation and maintaining the sequence-in-progress. But the trade-off is that we can now deal with grammars that have unboundedly many nonterminals in each rule. Given that there's no theoretical reason to restrict ourselves to binary rules, I count that as a success. Sure, the algorithmic complexity goes through the roof, but that's no reason to stop people from trying it if they're willing to accept the cost. Programs are filled with too many arbitrary numbers simply because programmers don't understand recursion or generalization.
[1] Well yes, technically you can transform it into a loop if you want. But it's a while-loop and you'll need to be pushing and popping the sequence of sets (or, equivalently, maintaining counters/pointers). Formally, you've just re-invented the call-stack. You might as well just use the call-stack itself and save yourself the work of debugging and the cost of doing all that manipulation. The main difference between this explicit manipulation and using the call-stack is that this manipulation happens on the heap instead of the stack. The only reason to bother with it is if your call-stack size is severely bounded, but you have plenty of memory, and you know the recursion is going to be deep. But even if you do it, that doesn't change the formal power or the runtime complexity.
no subject
Really, Wren? Do you really believe that one needs to understand those papers to understand recursion? I don't understand those papers, but I didn't have any trouble with Joel's challenge.
Honestly, I think that if we want more people to use and understand recursion, it might help if we stopped propagating the myth that you have to be a rocket surgeon to understand it.
no subject
Joel's challenge is a trivial example of recursion. It's just using foldr (http://haskell.org/ghc/docs/latest/html/libraries/base/Prelude.html#v%3Afoldr) to fold a list, mapping (^2) over it first. Thus:

foldr (+) 0 . map (^2)

which can be reduced to

foldr ((+) . (^2)) 0

But the real questions are: why is it valid to combine the map in that way? and, critically, why does foldr have the type that it does? Both of these are very real questions about recursion. Solving Joel's problem isn't hard; knowing why the solution looks the way it does is vastly more important.

Lists are the simplest variety of recursion worth discussing (well, Peano integers are simpler), and so they often elide important issues that you encounter with non-linear recursion. Consider this challenge. You have binary trees:
data Tree a = Leaf a | Branch (Tree a) (Tree a)
Write the version of foldr for this data type so someone else could use it to define sumOfSquares using only addition, function composition, and squaring. Since addition is a commutative monoid it would be easy to get this wrong without noticing, so make sure your fold could also be used to define toList :: Tree a -> [a] using only cons and nil, retaining the order of the leaves. [You can use list concatenation too, but that can be easily defined in terms of the list foldr; see how? Using list concatenation results in O(n^2) complexity, but you can do a continuation-passing transformation where you'd want to use it, to get it back to O(n).] If you can solve this one without breaking a sweat, then writing a foldr (aka catamorphism (http://knol.google.com/k/edward-kmett/catamorphisms/3qi7x2qrdushx/2#), aka banana brackets) for bushes,

data Bush a = Leaf a | Branch (Bush (Bush a)) (Bush (Bush a))

shouldn't be too hard, despite the non-uniform recursion.
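For concreteness, and as a spoiler (skip this block if you want to try the challenge yourself first), here is one shape the Tree version could take; the names foldTree and toList' are mine, for illustration:

-- the Tree type from the challenge above
data Tree a = Leaf a | Branch (Tree a) (Tree a)

-- the fold: replace each Leaf with `leaf` and each Branch with `branch`
foldTree :: (a -> r) -> (r -> r -> r) -> Tree a -> r
foldTree leaf _      (Leaf x)     = leaf x
foldTree leaf branch (Branch l r) = branch (foldTree leaf branch l) (foldTree leaf branch r)

sumOfSquares :: Num a => Tree a -> a
sumOfSquares = foldTree (^2) (+)

-- order-preserving, but the concatenations make it O(n^2) in the worst case
toList :: Tree a -> [a]
toList = foldTree (\x -> [x]) (++)

-- the continuation-passing (difference-list) version is O(n)
toList' :: Tree a -> [a]
toList' t = foldTree (\x k -> x : k) (\l r k -> l (r k)) t []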
Once you've seen all of these and the catamorphisms for other recursive data types, it should become clear how we can generalize them all into a single function cata (http://hackage.haskell.org/packages/archive/category-extras/0.53.5/doc/html/Control-Morphism-Cata.html) which takes an F-algebra (phi :: f a -> a) and returns a function to fold the least fixed point (using the isorecursive (http://en.wikipedia.org/wiki/Recursive_type#Isorecursive_types) definition) of a functor (some type constructor f which has a version of map :: forall a b. (a -> b) -> (f a -> f b) defined). The generalization to cata isn't necessary to have a decent grasp of recursion, but it is really helpful. Here (http://www.cs.nott.ac.uk/~wss/Publications/DataTypesALaCarte.pdf) is a very readable paper applying that category-theoretic treatment to the problem of defining extensible unions in a System-Fw typed language. Eventually I'll write up a version of my master's thesis as a journal article. The encoding we use for Dyna's Prolog-like terms is a hybridization of Swierstra's paper and this one (http://web.cecs.pdx.edu/~sheard/papers/generic.ps) by Tim Sheard. Tim is big on metaprogramming, so his paper is higher-level than what I'd call a necessary understanding of recursion, but the two-level types idea, like the coproduct types of Swierstra, is a tool that every programmer should have in their repertoire.

While the category theory jargon may be rocket surgery, I really don't think any of the rest of this stuff is.
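And, to close the loop, a minimal sketch of that generalization; Fix and ListF are the conventional names, but this is a toy rendition rather than the category-extras code:

{-# LANGUAGE DeriveFunctor #-}

-- tie the recursive knot: Fix f is the least fixed point of the functor f
newtype Fix f = Fix { unFix :: f (Fix f) }

-- the generic fold: given an F-algebra phi :: f a -> a, fold any Fix f
cata :: Functor f => (f a -> a) -> Fix f -> a
cata phi = phi . fmap (cata phi) . unFix

-- lists as a fixed point, for illustration
data ListF a r = NilF | ConsF a r deriving Functor

sumAlg :: Num a => ListF a a -> a
sumAlg NilF        = 0
sumAlg (ConsF x s) = x + s

-- cata sumAlg folds a Fix (ListF a) the same way foldr (+) 0 folds a list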