18 Dec 2008

winterkoninkje: shadowcrane (clean) (Default)

Joel on Software says somewhere that there are two things every programmer must understand to call themselves a computer scientist. The first: pointers, which can only be understood —in all their subtle horror— by learning C (or assembly). The second is recursion which can only really be learned from pure functional languages (or mathematical topology). Many imperative programmers think they understand recursion, but they don't. Lists are pretty. Trees are cute. Binary is pretty cute. But you don't understand recursion until you've been throwing bananas at bushes and convinced people it makes sense.

Today I present a job problem a coworker thought was impossible. It's a run of the mill problem, says the Haskeller, but it highlights the extent to which Java and imperative thinking cloud the simple mathematical problem and the equally simple functional solution. Consider for the moment that you're writing a program to do parsing-based machine translation. You have a bunch of CFG-like rules. Somewhere in your program you have a function that takes, for each non-terminal position, a list of all parses which can produce the given non-terminal, and you want to try all possible ways of completing each rule. For grammars restricted to only having at most two nonterminals per rule, your function looks something like this:

public void completeCell(a bunch of arguments,
                         ArrayList<Rule> rules,
                         ArrayList<ArrayList<Parse>> allParses) {
    for (Rule r : rules) {
        if (r.arity == 1) {
            for(Parse p : allParses.get(0)) {
                ArrayList<Parse> antecedents = new ArrayList<Parse>();
                antecedents.add(p);
                doCrazyStuff(a bunch of arguments, antecedents);
            }
        } else if (r.arity == 2) {
            for(Parse p0 : allParses.get(0)) {
            for(Parse p1 : allParses.get(1)) {
                ArrayList<Parse> antecedents = new ArrayList<Parse>();
                antecedents.add(p0);
                antecedents.add(p1);
                doCrazyStuff(a bunch of arguments, antecedents);
            }
            }
        } else {
            System.crash("ohnoes, we can only do two!");
        }
}

Did you read all that? Neither did I. But now it's your job to generalize this function so that it will work for rules with an arbitrary number of nonterminals. Obviously, adding extra conditionals with additional loops can only ever get so far. I bring up that the problem domain involves formal language theory, not just because it does, but because it gives us a much cleaner way to think about the problem without code bloat obscuring our vision. Consider the following class of formal languages: someone hands you a sequence of sets of letters, and you must enumerate all sequences of letters consistent with it (that is, the letter at position X is an element of the set at position X, for all X). In particular, your algorithm for enumerating these sequences must work no matter how long the (finite) sequence of (finite) sets is. These are exactly the same problem. All that fluff about what you do with the sequences after you've created them is just that. Fluff. Ditto for what the letters look like inside and why you'd want to do this in the first place. But given this simple problem enumerate :: Seq (Set a) -> Set (Seq a), do you see what the answer is?

See if you can work it out. )

April 2019

S M T W T F S
 123456
78910111213
14151617181920
212223242526 27
282930    

Tags

Page generated 1 Jun 2025 05:40 pm
Powered by Dreamwidth Studios