winterkoninkje: shadowcrane (clean) (Default)

When someone comes to you and says,

“hey, I don’t understand what you mean. Because sometimes you say A, and sometimes you say B. (Where A and B are mutually exclusive.) So what’s going on?”

It is not useful to say, “I’ve never changed my mind on this matter”. Even if that’s true, there’s a communication problem and whatever it is you think is not being conveyed.

It is not useful to say, “whenever I say something that doesn’t make sense you should ask”. That person is —at this very second— asking!

It is not useful to say, “I’m sorry you misunderstood”. Because no, you are not sorry in the slightest, you are a fucking jackass. When someone is asking you for clarification, blaming them for your failure to communicate does nothing to clarify matters. Moreover, words like these are an attack. That person was trying to bring up the existence of a communication failure, and because you are unwilling to accept responsibility for the fact that communication is a two-party process, you instead lash out and blame the person. The fact that you are incapable of accepting responsibility is one thing. But attacking the person for bringing the matter up for discussion says that such matters must never be brought up for discussion. This is how you construct a toxic environment.

winterkoninkje: shadowcrane (clean) (Default)

Last time I talked a bit about ABTs; in particular, I introduced the notion of strongly-typed ABTs (or "GABTs" if you prefer) and showed how we can extend the basic idea of ABTs to guarantee well-typedness in addition to well-aritiedness. However, I also made a note that ensuring this sort of well-typedness runs counter to what Neel and other CMUers often do. One of my colleagues here at IU noticed the reason, so I thought I'd write a bit more about it.

The issue at stake here is how general we can make our ABT library, to minimize the amount of boilerplate needed whenever inventing a new language. By encoding object-language type systems into the kinding of the ABT, we restrict the the possible object languages we can use the ABT implementation for (namely those object languages with type systems that can be embedded into whatever kinding the ABT has). To put a finer point on it, using the kinds presented in the previous post you cannot have binders in your type system. (Edit 2016.02.29: actually the details are more complicated.) This means no System F, and no dependent types. This is unfortunate as the whole point of ABTs is to capture binding structure once and for all!

However, I'd like to reiterate that, for our purposes in Hakaru this limitation is no restriction. Hakaru is simply-typed, so there are no type-level binders in sight. Moreover, we do a lot of program transformations in Hakaru. By using GABTs we can have GHC verify that our program transformations will never produce Hakaru code which is ill-typed, and that our program transformations will always produce Hakaru code of an appropriate type (e.g., the same type as the input term, for things like partial evaluation; but we have a number of type-changing transformations too). Thus, even though our GABT library could not be reused for implementing languages with type-level binders, it still provides a substantial benefit for those languages without type-level binders.

Although our GABTs cannot handle type-level binders, that does not mean we're restricted to only working with simply typed languages. For example, intersection types are not usually thought of as "simple types"; but they do not require binders and so they're fine. More generally, Larry Moss is engaged in a research project where he asks, "given infinite time, how far could Aristotle have gotten in logic?" By which he means, given the Aristotelian restriction to syllogistic logics (i.e., ones without the quantifiers introduced by Frege), what are the limits in what we can cover? It turns out that we can cover quite a lot. Some syllogistic logics go beyond the power of the "Peano–Frege" boundary: they can handle comparing cardinality of sets! A good pictorial summary of this line of research is on slide 2 of this talk; and a bit more about the complexity results is given in this talk (the requisite picture is on slide 39).

Edit 2016.02.29: In actuality, there's nothing inherent in type theory that prohibits having type-level binders for our object language; it's a limitation in GHC. In particular, GHC doesn't allow lifting GADTs into data kinds. If we could lift GADTs, then we could simply use ABTs to define the syntax of object-language type expressions, and lift those to serve as the type indices for using ABTs to define the syntax of object-language term expressions. This stratified approach is sufficient to handle System F and any other non-dependent quantifiers. To go further and handle dependent quantifiers as well, we'd also need to be able to define the object-language's terms and types in a mutually inductive way.

winterkoninkje: shadowcrane (clean) (Default)

Edit 2015.10.29: Be sure to also read the followup post on the benefits and limitations of this approach compared to the usual untyped ABTs.

Earlier this year Neel Krishnaswami talked about abstract binding trees (ABTs) [part 1] [part 2]. IMO, the best way to think about ABTs is as a generalization of abstract syntax trees (ASTs), though this is not a perspective sanctioned by the CMUers I’ve talked to. CMUers oppose this way of phrasing things, in part, because the ABT libraries they’re familiar with make crucial use of the design pattern of two-level types; but I think the essential insights of ABTs and two-level types are quite different, and we ought to keep the benefits of these two techniques distinct.

Over the past year I’ve been working on the inferential language1 Hakaru, and in the new version of the compiler we’re using ABTs for our syntax trees. However, contrary to Neel’s stance against using strongly-typed internal representations for syntax, we extend the ABT approach to make use of GADTs to guarantee local well-typedness— since this in turn can be used to guarantee that program transformations are also well-typed. (If you don’t want those guarantees, then take a look at Jon Sterling’s abt library on Hackage2.) In this post I’m going to present a simplified version of our architecture, and then talk about some of the extra stuff bringing it closer to our production architecture.

First things first

Since we want everything to be well-typed, we first must introduce some universe, U, of all the types in our language. (In Haskell we can implement such a universe by using the -XDataKinds extension, so I’ll equivocate between calling U a “universe” vs a “kind”.) For the rest of this post it doesn’t actually matter what lives in that universe3, just so long as things match up when they need to. Since the choice of universe is irrelevant, we could abstract over U by turning on the -XPolyKinds extension; but I avoid doing so below, just to help keep things more concrete.

Implementing ASTs

The simplest way of thinking about well-typed ASTs is that they capture the set of terms generated by a (typed) signature; that is, the fixed point of some Σ [U] U . Unpacking the type for Σ, we have that every syntactic constructor sΣ is associated with some arity (the length of the list), each argument to s has some type in U (the elements of the list), and applying s to the right number of ASTs of the right types will generate a new AST with some type in U (the second argument to Σ).

To implement this fixed point we define an AST type which is parameterized by its signature. To ensure well-aritiedness (and well-typedness) of our ASTs with respect to that signature, we’ll need to introduce a helper type SArgs4. And to ensure that we obtain the least fixed-point of the signature, we’ll make everything strict.

infix  4 :$
infixr 5 :*

data SArgs  (U  )  [U]   where
    End   SArgs ast []

    (:*)  !(ast u)
          !(SArgs ast us)
          SArgs ast (u : us)

data AST  ([U]  U  )  U   where
    (:$)  !(σ us u)
          !(SArgs (AST σ) us)
          AST σ u

Implementing ABTs

The problem with ASTs is that they have no notion of variables, and thus have no notion of variable binding. Naively we could implement binders like lambda-abstraction by having something like λ Σ [u, v] (u :→ v) but then we’d need to do a post-hoc check to ensure that the first argument to λ is in fact a variable. To build that check into the datatype itself we’d have to move λ into the definition of AST (since the first argument is of type Variable u rather than AST Σ u). If lambda-abstraction were the only binder we had, that might not be so bad; but any real-world language has a plethora of binders, and this approach doesn’t scale.

The essential idea behind ABTs is to abstract over the notion of binding itself. Given a single uniform definition of what it means to be a binding form, we don’t have to worry about adding a bunch of ad-hoc constructors to our AST datatype. Moreover, we can then provide single uniform definitions for things which mess with variables and are homomorphic over the signature. Things like capture-avoiding substitution and providing a HOAS API for our first-order representation.

The crucial step is to adjust our notion of what a signature contains. The basic signatures used above only contained applicative forms; i.e., things we can apply to locally-closed terms; i.e., what are called “functors” in the logic programming community. For ABTs we’ll want to allow our signatures to include any generalized quantifier. That is, our signatures will now be of type Σ [[U] × U] U . Previously, the arguments were indexed by U; now, they’re indexed by [U] × U. The length of the list gives the number of variables being bound, the types in the list give the types of those variables, and the second component of the pair gives the type of the whole locally-open expression.

To implement this we need to extend our syntax tree to include variable bindings and variable uses:

data SArgs  ([U]  U  )  [[U] × U]   where
    End   SArgs abt []

    (:*)  !(abt vs u)
          !(SArgs abt vus)
          SArgs abt ((vs,u) : vus)

data ABT  ([[U] × U]  U  )  [U]  U   where
    (:$)  !(σ vus u)
          !(SArgs (ABT σ) vus)
          ABT σ [] u

    Var   !(Variable v)
          ABT σ [] v

    Bind  !(Variable v)
          !(ABT σ vs u)
          ABT σ (v : vs) u

Time for an example of how this all fits together. To add lambda-abstraction to our language we’d have λ Σ [([u],v)] (u :→ v): that is, the λ constructor takes a single argument which is a locally-open term, binding a single variable of type u, and whose body has type v. So given some x Variable u and e ABT Σ [] v we’d have the AST (λ :$ Bind x e :* End) ABT Σ [] (u :→ v).

“Local” vs “global” well-typedness

With the ABT definition above, every term of type ABT Σ vs u must be locally well-typed according to the signature Σ. I keep saying “locally” well-typed because we only actually keep track of local binding information. This is an intentional design decision. But only tracking local well-typedness does have some downsides.

So what are the downsides? Where could things go wrong? Given a locally-closed term (i.e., either Var x or f :$ e) any free variables that occur inside will not have their U-types tracked by Haskell’s type system. This introduces some room for the compiler writer to break the connection between the types of a variable’s binder and its use. That is, under the hood, every variable is represented by some unique identifier like an integer or a string. Integers and strings aren’t U-indexed Haskell types, thus it’s possible to construct a Variable u and a Variable v with the same unique identifier, even though u and v differ. We could then Bind the Variable u but Var the Variable v. In order to ensure global well-typedness we need to ensure this can’t happen.

One way is to keep track of global binding information, as we do in the paper presentation of languages. Unfortunately, to do this we’d need to teach Haskell’s typechecker about the structural rules of our language. Without a type-level implementation of sets/maps which respects all the axioms that sets/maps should, we’d be forced to do things like traverse our ASTs and rebuild them identically, but at different type indices. This is simply too hairy to stomach. Implementing the axioms ourselves is doubly so.

Or we could fake it, using unsafeCoerce to avoid the extraneous traversals or the complicated pattern matching on axioms. But doing this we’d erase all guarantees that adding global binding information has to offer.

A third approach, and the one we take in Hakaru, is compartmentalize the places where variables can be constructed. The variable generation code must be part of our trusted code base, but unlike the unsafeCoerce approach we can keep all the TCB code together in one spot rather than spread out across the whole compiler.

Stratifying our data types

The above definition of ABTs is a simplified version of what we actually use in Hakaru. For example, Hakaru has user-defined algebraic data types, so we also need case analysis on those data types. Alas, generic case analysis is not a generalized quantifier, thus we cannot implement it with (:$). We could consider just adding case analysis to the ABT definition, but then we’d start running into extensibility issues again. Instead, we can break the ABT type apart into two types: one for capturing variable uses and bindings, and the other for whatever syntax we can come up with. Thus,

data Syntax  ([[U] × U]  U  )  ([U]  U  )  U   where
    (:$)  !(σ vus u)
          !(SArgs abt vus)
          Syntax σ abt u

data ABT  ([U]  U  )  [U]  U   where
    Syn   !(Syntax σ (ABT σ) u)
          ABT σ [] u

    Var   !(Variable v)
          ABT σ [] v

    Bind  !(Variable v)
          !(ABT σ vs u)
          ABT σ (v : vs) u

Of course, since we’re going to be extending Syntax with all our language-specific details, there’s not a whole lot of benefit to parameterizing over σ. Thus, we can simplify the types considerably by just picking some concrete Σ to plug in for σ.

By breaking Syntax apart from ABT we can now extend our notion of syntax without worrying about the details of variable binding (which can be defined once and for all on ABT). But we could still run into extensibility issues. In particular, often we want to separate the fixed-point portion of recursive types from their generating functor so that we can do things like add annotations at every node in the recursive data type. A prime example of such annotations is keeping track of free variables, as in Neel’s original post. To allow this form of extensibility we need to break up the ABT type into two parts: the recursion, and the Syn/Var/Bind view of the ABT.

data ABT  ([U]  U  )  [U]  U   where
    Unview  !(View σ (ABT σ) vs u)  ABT σ vs u

view  ABT σ vs u  View σ (ABT σ) vs u
view (Unview e) = e

data View  ([U]  U  )  [U]  U   where
    Syn   !(Syntax σ abt u)
          View σ abt [] u

    Var   !(Variable v)
          View σ abt [] v

    Bind  !(Variable v)
          !(View σ abt vs u)
          View σ abt (v : vs) u

Now, to allow arbitrary annotations we’ll replace the data type ABT with an equivalent type class. Each instance of the ABT class defines some sort of annotations, and we can use the view and unview methods to move between the instance and the concrete View type.

There’s one last form of extensibility we may want to add. Using fixed point combinators gives us a way of describing complete trees. A different way of introducing recursion is with free monads. The free-monad combinator is just like the fixed-point combinator, except that we have an additional type parameter for metavariables and we have a data constructor for using those metavariables instead of requiring the recursion to ground out with a complete syntax tree. The reasons why this might be nice to do are beyond the scope of this post, but the point is we might want to do that so we need to split the ABT class into two parts: one for the recursion itself, and another for the annotations.

In the end, we have a four-level type: the Syntax, the View, the annotations, and the recursion.

[1] In the accepted/current parlance, Hakaru is a “probabilistic programming language”; but I and a number of other folks working on such languages have become disaffected with that term of late, since it’s not entirely clear what should and should not count as a “probabilistic” PL. Over the course of a number of discussions on the topic, I’ve settled on “inferential” PL as describing what is (or, at least what I find) interesting about “probabilistic” PL. I’ve been meaning to write a post about the subject, and hopefully this footnote will remind me to do so.

[2] N.B., the indexing used in that package is what we get if we erase/compactify the universe U. That is: the erasure of U is a singleton set; the erasure of [U] is isomorphic to the Peano numbers; the erasure of [[U] × U] is isomorphic to a list of Peano numbers; etc.

[3] Though at one point I assume we have functions, (:→), just for the sake of an example.

[4] Ideally we’d be able to flatten this type to avoid all the overhead of the linked list implementation. In fact, the entire AST node of (:$) together with its SArgs should be flattened. These nodes have the same general layout as the heap objects in the STG machine: a record with a pointer to the data constructor (i.e., element of the signature) followed by an appropriate number of arguments; and so in principle we ought to be able to implement them directly with a single STG heap object.

winterkoninkje: shadowcrane (clean) (Default)

For all you local folks, I'll be giving a talk about my dissertation on November 5th at 4:00–5:00 in Ballantine Hall 011. For those who've heard me give talks about it before, not much has changed since NLCS 2013. But the majority of current CL/NLP, PL, and logic folks haven't seen the talk, so do feel free to stop by.

Abstract: Many natural languages allow scrambling of constituents, or so-called "free word order". However, most syntactic formalisms are designed for English first and foremost. They assume that word order is rigidly fixed, and consequently these formalisms cannot handle languages like Latin, German, Russian, or Japanese. In this talk I introduce a new calculus —the chiastic lambda-calculus— which allows us to capture both the freedoms and the restrictions of constituent scrambling in Japanese. In addition to capturing these syntactic facts about free word order, the chiastic lambda-calculus also captures semantic issues that arise in Japanese verbal morphology. Moreover, chiastic lambda-calculus can be used to capture numerous non-linguistic phenomena, such as: justifying notational shorthands in category theory, providing a strong type theory for programming languages with keyword-arguments, and exploring metatheoretical issues around the duality between procedures and values.

Edit 2014.11.05: The slides from the talk are now up.

winterkoninkje: shadowcrane (clean) (Default)

Gershom Bazerman gave some excellent advice for activism and teaching. His focus was on teaching Haskell and advocating for Haskell, but the advice is much more widely applicable and I recommend it to anyone interested in activism, social justice, or education. The piece has garnered a good deal of support on reddit— but, some people have expressed their impression that Gershom's advice is targeting a theoretical or future problem, rather than a very concrete and very contemporary one. I gave a reply there about how this is indeed a very real issue, not a wispy one out there in the distance. However, I know that a lot of people like me —i.e., the people who bear the brunt of these problems— tend to avoid reddit because it is an unsafe place for us, and I think my point is deserving of a wider audience. So I've decided to repeat it here:

This is a very real and current problem. (Regardless of whether things are less bad in Haskell communities than in other programming communities.) I used to devote a lot of energy towards teaching folks online about the ideas behind Haskell. However, over time, I've become disinclined to do so as these issues have become more prevalent. I used to commend Haskell communities for offering a safe and welcoming space, until I stopped feeling quite so safe and welcomed myself.

I do not say this to shame anyone here. I say it as an observation about why I have found myself pulling away from the Haskell community over time. It is not a deliberate act, but it is fact all the same. The thing is, if someone like me —who supports the ideology which gave rise to Haskell, who is well-educated on the issues at hand, who uses Haskell professionally, who teaches Haskell professionally, and most importantly: who takes joy in fostering understanding and in building communities— if someone like me starts instinctively pulling away, that's a problem.

There are few specific instances where I was made to feel unsafe directly, but for years there has been a growing ambiance which lets me know that I am not welcome, that I am not seen as being part of the audience. The ambiance (or should I say miasma?) is one that pervades most computer science and programming/tech communities, and things like dogmatic activism, dragon slaying, smarter-than-thou "teaching", anti-intellectualism, hyper-intellectualism, and talking over the people asking questions, are all just examples of the overarching problem of elitism and exclusion. The problem is not that I personally do not feel as welcomed as I once did, the problem is that many people do not feel welcome. The problem is not that my experience and expertise are too valuable to lose, it's that everyone's experience and expertise is too valuable to lose. The problem is not that I can't teach people anymore, it's that people need teachers and mentors and guides. And when the tenor of conversation causes mentors and guides to pull away, causes the silencing of experience and expertise, causes the exclusion and expulsion of large swaths of people, that always has an extremely detrimental impact on the community.

winterkoninkje: shadowcrane (clean) (Default)

This summer I've been working on optimizing compilation for a linear algebra DSL. This is an extension of Jeremy Siek's work on Built-to-Order BLAS functions. Often times it's more efficient to have a specialized function which fuses two or more BLAS functions. The idea behind BTO is that we'd like to specify these functions at a high level (i.e., with liner algebra expressions) and then automatically perform the optimizing transformations which have made BLAS such a central component of linear algebra computations.

The current/prior version of BTO already handles loop fusion, memory bandwidth constraints, and more. However, it is not currently aware any high-level algebraic laws such as the fact that matrix multiplication is associative, addition is associative and commutative, transposition reverse-distributes over multiplication, etc. My goal is to make it aware of these sort of things.

Along the way, one thing to do is solve the chain multiplication problem: given an expression like ∏[x1,x2...xN] figure out the most efficient associativity for implementing it via binary multiplication. The standard solution is to use a CKY-like dynamic programming algorithm to construct a tree covering the sequence [x1,x2...xN]. This is easy to implement, but it takes O(n^3) time and O(n^2) space.

I found a delicious alternative algorithm which solves the problem in O(n*log n) time and O(n) space! The key to this algorithm is to view the problem as determining a triangulation of convex polygons. That is, we can view [x0,x1,x2...xN] as the edges of a convex polygon, where x0 is the result of computing ∏[x1,x2...xN]. This amazing algorithm is described in the tech report by Hu and Shing (1981a), which includes a reference implementation in Pascal. Unfortunately the TR contains a number of typos and typesetting issues, but it's still pretty legible. A cleaner version of Part I is available here. And pay-walled presumably-cleaner versions of Part I and Part II are available from SIAM.

Hu and Shing (1981b) also have an algorithm which is simpler to implement and returns a heuristic answer in O(n) time, with the error ratio bounded by 15%. So if compile times are more important than running times, then you can use this version as well. A pay-walled version of the article is available from Elsevier.

winterkoninkje: shadowcrane (clean) (Default)

Recently gelisam posted an article on a combinatorial approach to provably commutative binary operators, which has a decent discussion on reddit. Here, I'm going to generalize that idea.

Edit (2013-08-03T05:06:58Z): Okay, I should be done tweaking things now.

Edit (2013-08-03T06:02:40Z): See also the followup post.

Preliminary Definitions

Let A and B be arbitrary types, and let κ be any non-zero cardinal number (we only really care about finite cardinals, or maybe at most aleph-naught). Let the notation A^{κ} denote a multiset of elements drawn from A with cardinality exactly κ. And let the notation A^{<κ} denote a non-empty multiset of elements drawn from A with cardinality not greater than κ.

Commutative operators

A κ-commutative operator from A to B can be thought of as a function of type A^{κ} → B. For example:

  • The 1-commutative functions are all the functions of type A → B.
  • The 2-commutative functions are commutative binary functions, thus a subtype of A → A → B.
  • The 3-commutative functions are ternary functions which return the same result for all permutation orders of their arguments, thus a subtype of A → A → A → B.

A (<κ)-commutative operator from A to B can be thought of as a function of type A^{<κ} → B, i.e., a polymorphic function of type ∀λ. 0<λ<κ ⇒ A^{λ} → B. Thus, the (<κ)-commutative functions can be thought of as a family of λ-commutative functions, for all non-zero λ not greater than κ, satisfying suitable coherence conditions. However, for finite κ, a simpler representation is to think of them in terms of folds over non-empty multisets; that is, as 4-tuples:

(C, h:A→C, f:A→C→C, g:C→B)
    such that
    ∀a,a'∈A, c∈C. f a (f a' c) == f a' (f a c)
    ∀a,a'∈A.      f a (h a')   == f a' (h a)

Alternatively, if you are willing to include the case where λ=0, then use 4-tuples:

(C, c0:C, f:A→C→C, g:C→B)
    such that
    ∀a,a'∈A, c∈C. f a (f a' c) == f a' (f a c)

Quasi-unordering pairs

Gelisam's solution to the higher-order problem is to generalize unordered pairs to quasi-unordered pairs: which are, either an unordered pair or else some summary information about a pair which can't be unordered. The way we do this is to have an observation function we apply to each argument. If that observation function can distinguish the arguments, then we can pass the arguments to the function in a canonical order such that we do not betray the original order of the inputs. However, if we cannot distinguish the two arguments, then the best we can do is to return the observation we made on both of them— since, if we returned the actual arguments themselves then we'd betray their input order. Rather than hard-coding the choice of Bool as gelisam did, we can generalize to any totally ordered set of observations:

distinguish :: Ord r ⇒ (a → r) → Commutative a (Either r (a,a))
distinguish f = Commutative $ \ x y ↦
    let fx = f x
        fy = f y
    case compare fx fy of
    LT ↦ Right (x,y)
    EQ ↦ Left fx
    GT ↦ Right (y,x)

Quasi-unordering multisets

Now, we want to generalize this to multisets with any non-zero cardinality. Note, because the observations r are totally ordered, the observation function induces an almost-linear partial order on the original domain a. Assume we have some suitable abstract implementation of multisets: Confusion a, which implements a^{<κ} for some particular κ. Because the observation function induces an order on our original domain, there exists a function:

observe_0 :: Ord r ⇒ (a → r) → Confusion a → [Confusion a]
    such that
    ∀ r f xs0. isOrdered (observe_0 f xs0)
        isOrdered (xs:xss) =
            (∀x,x'∈xs. f x == f x')
            ⋀ case xss of
              []   ↦ True
              ys:_ ↦
                  (∀ x∈xs, y∈ys. f x < f y)
                  ⋀ isOrdered xss
That is, every element of the result is a multiset of elements which are equivalent under the observation function, and the elements of previous multisets precede the elements of subsequent multisets. That is, we can take the input multiset which is inherently unordered and return a nearly-linear DAG.

As written observe_0 doesn't get us that far, since the implementation of multisets is abstract, and all we've done is break one multiset up into a bunch of multisets. Now, assume our multisets support the function isSingleton :: Confusion a → Maybe a which tells us whether a multiset is a singleton or not, and gives us the element if it is. Using this, we can strengthen our observing function to:

observe_1 :: Ord r ⇒ (a → r) → Confusion a → [Either (Confusion a) a]
observe_1 f
    = map (\xs ↦ maybe (Left xs) Right (isSingleton xs))
    . observe_0 f

That gets us pretty much what we want. However, in order to gain the full functionality we had in the binary case, we'd like to have some way to eliminate the remaining multisets. Probably the most general way is to define the following function which also returns the shared observation for classes of inputs which are still confused:

observe_2 :: Ord r ⇒ (a → r) → Confusion a → [Either (r, Confusion a) a]
-- or, even better
observe_2' :: Ord r ⇒ (a → r) → Confusion a → [(r, Either (Confusion a) a)]
We could implement this in terms of observe_0, however, that means we'd be calling the observation function redundantly since observe_0 throws away the results. Thus, it's more efficient to just implement this version directly, in lieu of implementing observe_0 at all. The overall complexity will be the same and this version is strictly more powerful.

If we're returning the result of observing those confusion sets, then why bother returning the confusion set too? The reason is that this way we can apply our observing function again in order to further distinguish those confusion sets. Therefore, this version is complete for composition of observation functions whereas the version that discards the confusion set is not.

And, as a final note, we'd also want to ensure that our multisets support the operation size :: Confusion a → Cardinal; otherwise, we need to return a triple of the multiset, its size, and its observation. The binary case could avoid this detail since there all confusion sets must be of cardinality 2.

winterkoninkje: shadowcrane (clean) (Default)

For those who haven't heard, John Reynolds (of separation logic fame) just passed on. I don't have anything particular to add to what's mentioned on LtU and Neel's blog, but I recommend reading them both.

winterkoninkje: shadowcrane (clean) (Default)

So, a friend of mine wrote a scathing review of the ACM's recent refusal to open access. As he mentions, the ACM claims to be a non-profit organization with the purported mission of fostering the open interchange of information, and yet it refuses to open access because that would cut into the bottom line be "too hard". This is absurd when USENIX, ACL, NIPS, JMLR, etc are all open; to say nothing of the arxiv. If the ACM publicly admitted to being a for-profit organization that would be one thing. I'd still be upset with them, but at least they'd be honest. Until the ACM updates its out-of-date practices, I will not support them because they are not a professional organization that represents the ethical standards of the computer science community I am a part of. If you're also part of this community, then you can help tear down this paywall.

winterkoninkje: shadowcrane (clean) (Default)

Quoth Clifford Beshers:

As a test, ask your students to write down the definition of 'type'. I'll bet that their answers will be longer than 'set' and include some mention of bytes, words and big-endian. We were all trained to be mechanics instead of drivers, because all we had were go-carts.

Math envy

31 Dec 2012 02:22 am
winterkoninkje: shadowcrane (clean) (Default)

I have often derided those who are susceptible to math envy. Y'know, the idea that math=intelligence. This utter foolishness leads to the simultaneous fear and awe of anyone who throws math around, as if the presence of mere symbols and equations demonstrates the clear superiority of the author's throbbing, bulging,... intellect. This utter foolishness leads, therefore, to authors who feel the need to add superfluous "mathematics" to their writings in order to demonstrate that their... intelligence measures up that of their colleagues.

Well, turns out, someone finally got around to doing a study on math envy: Kimmo Ericksson (2012) "The nonsense math effect", Judgment and Decision Making 7(6). As expected, those with less training in mathematics tend to rate utterly irrelevant "mathematical content" more highly than its absence. Lest anyone start feeling smugly superior, however, I'll note that I've seen this effect most strongly in those who should know better, i.e., those with just a little mathematical training. This includes, for example, computer scientists who are not formal theoreticians. Not to name names, but I've read more than one NLP paper that throws in some worthless equation just to try to look more worthwhile. (These papers are often fine, in and of themselves, but would have been better had they not succumbed to math envy.)

As Language Log points out in their coverage, this isn't limited just to math. Some people also have brain-scan envy and similar afflictions. That's definitely worth watching out for, but IME people seem more aware of their pernicious effects while being blind to math envy.

winterkoninkje: shadowcrane (clean) (Default)

Apparently I messed up my link to the map of ring theory last time. It's fixed now, but the fact that noone commented on it means noone's reading this blog anymore (or those who do don't care). Ah well. Regardless, it's time to finish up this little series for now.

Anyone who's hung around me long enough knows that I'm a great fan of semirings. Partly this is because of their ubiquity in natural language processing, but mostly it's because they're the intersection of two of the most powerful and ubiquitous abstractions in mathematics. Semirings simultaneously generalize rings (by not necessarily having negation) and bounded lattices (by not necessarily being commutative nor having both operators distribute). Therefore, they generalize both arithmetic and ordering, two of the core components of mathematics. Also the fact that the natural numbers do not form a ring is a strong case for taking semirings seriously. I don't have a map this time, instead I have an extensive list of common examples. The examples fall into about half a dozen common patterns.

  • Natural number-like semirings
  • Integer-like rings
  • Tropical and Viterbi semirings
  • Bounded distributive lattices
  • Boolean rings
  • Regular languages
  • Square matrices over a semiring

I'm sure you're already familiar with the natural numbers, integers, etc., so I won't spend much time talking about them. The one thing worth mentioning here, though, is what the heck that last column is for. It's for *-semirings (i.e., closed semirings), not to be confused with *-(semi)rings (i.e., (semi)rings with involution). Jacques Carette introduced me and Russell O'Connor to them, and Russell has written a bit about them. Since he does a good job of explaining them, I won't talk about them just now. But, because they're a bit obscure, I made sure to indicate what the asteration operation is when it exists. The one especially notable point here is that the Alexandroff compactification of the reals is no longer a ring! (Another reason to take semirings seriously.) We give up the ring-ness of the reals to obtain closed-ness and compact-ness.

The tropical and Viterbi semirings are a sort of cross between arithmetic and ordering, and you've probably never heard of them before. They are, however, rather common in computer science. They tend to arise whenever we use dynamic programming in order to find the "best" of some set of things: e.g., the most likely tag sequence, the shortest path, the longest path, etc. This is what part of speech tagging, matrix chain multiplication, and Dijkstra's algorithm are all about. And recognizing that these are semirings is what leads to generalizations like forward–backward and inside–outside, to say nothing of the Gauss–Jordan–Floyd–Warshall–McNaughton–Yamada algorithm that comes from focusing on *-semirings.

The last major class I'll talk about are the bounded distributive lattices. The reason to discuss these is just to point out that this includes all Boolean algebras (hence classical logic and subset inclusion hierarchies), all Heyting algebras (hence intuitionistic logic), and all bounded totally ordered sets.

This chart is released under Creative Commons Attribution-ShareAlike 3.0. Any questions? Anything I missed? Anything that needs further explanation?

winterkoninkje: shadowcrane (clean) (Default)

I just came across this fabulous paper by Broder and Stolfi. It should be more widely publicized. EOM.

winterkoninkje: shadowcrane (clean) (Default)

For those who haven't heard, John McCarthy (1927–2011) (of Lisp and Artificial Intelligence fame) passed on last night.

Edit: And since I hadn't mentioned it, Dennis Ritchie (1941–2011) (of C and Unix fame) passed on a short while back.

winterkoninkje: shadowcrane (clean) (Default)

I've mentioned these a few times in different places, most recently on Reddit. So I figured I should repost them here for better googleability for folks just starting to learn about dependent types and type theory.

PSA #1: Beware: "type theory" and "Type Theory" are not the same thing, note the caps. The former describes the entire field of enquiry which explores possible theories about types; the latter is the name of one particular theory/system, namely the one Per Martin-Löf popularized (and others have extended and reinvented since then).

Yes, this is an extremely obnoxious detail, but it's one that often confuses newcomers; e.g., learning MLTT and thinking that applies to all type systems, or learning other type systems and then being confused when people make flagrantly false statements (when interpreted as statements about type theory as a whole, though they're true of MLTT). This is why I prefer to refer to Type Theory as "TT" or "MLTT", to avoid confusion, and because I am interested in the entire field of enquiry and in comparing different systems rather than focusing on just one.

ObTangent: There are similar reasons for why ML is called "ML" instead of "metalanguage".

PSA #2: There are conflicting meanings for the terms "sum" and "product" in dependent type land. Those coming from category theory and functional programming tend to say "sum" to mean tagged unions, "product" to mean pairs (e.g, cartesian products or similar), and "exponentials" to mean functions as objects— all of these exactly as in non-dependent languages. However, those coming more from the set-theory side of things use "product" to mean functions (whence the capital Pi), "sum" to mean pairs (whence the capital Sigma), and have no common term for unions.


12 Mar 2011 11:55 pm
winterkoninkje: shadowcrane (clean) (Default)

I find it terribly unfortunate how susceptible academics are to Not Invented Here syndrome. Especially in disciplines like computer science where one of the primary acts of research is the creation of artifacts, a great amount of time and money are wasted replicating free publicly available programs. Worse than the effort wasted constructing the initial artifact is the continuous supply of effort it takes to maintain and debug these copies of the original. It's no wonder that so much of academic software is unreliable, unmaintained, and usable only by the developing team.

It's reasons like this why I support the free/open-source development model, demonstrated in academic projects like Joshua and GHC. The strong infusion of real-world software engineering methodologies that come from designing reliable software in F/OSS and industry seems to be the only way to save academia from itself.

winterkoninkje: shadowcrane (clean) (Default)

It's a well known fact that many algorithms in NLP are the same algorithm, just parameterized by a different semiring. For instance, the Viterbi algorithm for getting the probability of the best path in an HMM uses the (max,+) semiring. The forward part of the forward-backward algorithm is the same algorithm, just using the (+,*) semiring. (And sometimes you want the backward part of Viterbi too.)

Today's moment of duh: The usual implementation of Viterbi, with a sparse map, is the just the Max semiring coalesced to the MaxPriority semiring, i.e. where we map Max minBound to Nothing in order to remove the extra bottom in the lattice. And the Viterbi algorithm with backpointers so that you can extract the best path is just the Arg Max semiring (or Args Max if you want to keep ties).

winterkoninkje: shadowcrane (clean) (Default)
All you hackers should read this. True to form, I came to hacking late, much despite a strong interest in mathematics as a child. (For those who may not be aware, mathematics does not have the same gender inequity problems CS does.) These sorts of privilege contests have always pissed me off, not just because of the machismo involved but also —though I did not have the words at the time— exactly because of their brandishing of white male privilege as virtuous and ideal.
winterkoninkje: shadowcrane (clean) (Default)

In formal treatments of lazy functional languages like Haskell, appeal is often made to domain theory. For those who aren't familiar, a domain is nothing more than a set equipped with an ordering relation that obeys certain laws. The ordering relation captures a notion of "informativeness" whereby "greater" elements are more defined or more known-about than "lesser" elements. One of the particular laws is that the domain must have an element, ⊥, which is less than all others (and therefore contains no information other than what domain you're in).

Another way to think about domain theory is from the context of logic programming. The domains we often encounter in functional programming are defined structurally. That is, given an algebraic data type, we can construct a domain for the type where the elements of the domain are generated by its initial algebra on a set of variables representing unknown substructure. Indeed, this is usually the domain we are most interested in. In this case, ⊥ is simply a free variable (and thus, is actually ⊤). If we treat the variables as logic variables, then the domain ordering is defined by the subsumption relation induced by unification (aka pattern matching). Thus, a domain is simply the set of all patterns we could write for some type, and the subsumption relation between those patterns.

In Haskell we often conflate "⊥" as a domain element we know nothing about, and "⊥" as a non-terminating computation. There are natural reasons for doing so, but it's unfortunate because it loses a critical distinction between what is already-known and what is knowable. This raises questions about how much of our domain-theoretic training we can carry over to total lazy languages, a topic I've become quite interested in lately.

The idea that non-termination and free variables are the same comes from the idea that if we try to case match on an unbound variable then we will deadlock, waiting for that variable to become bound to something in order to proceed. In a fully interpreted language this holds up fairly well. However, in languages which aren't fully interpreted (e.g. dependently typed languages, languages with non-ground terms, parallel data-flow languages) it doesn't hold up because variables serve multiple roles. Depending on the context, we will want to think of a variable as either being bound to no element of the domain, or as being bound nondeterministically to every element of the domain. The former matches our functional programming intuitions about non-termination, whereas the latter matches our logic programming intuitions about non-ground terms presenting the type of all the terms they subsume.

In particular, the idea of variables bound to everything allows us to consider the idea of a hypothetical input: an input about which we know nothing, but which we are permitted to inspect for more information. Case analysis on hypothetical variables is the same as reasoning by cases in proof theory. These are different from abstract variables, which are values about which we know nothing and which we are not permitted to inspect. Loosely speaking, hypothetical variables are those introduced by universal quantification, whereas abstract variables are those introduced by existential quantification. (However, existentially bound variables may still by hypothetically analyzable rather than being held abstract; it depends on the situation.)

In a total language we are not permitted to write computations that we cannot prove will terminate. Thus, we can never obtain a nonterminating 'value' inhabiting the bottom element of a domain. However, we can still use bottom to represent things we hold abstract or which we don't force the evaluation of. Thus, in a total lazy language, domain theory should still have a role in strictness analysis and certain other formal reasoning. Given a domain we can interpret it hypothetically, as we do when performing strictness analysis or reasoning by cases, or we can interpret it abstractly, as we do when performing termination analysis. This conflict feels very similar to the conflict between parametricity and dependency. I wonder if any work has been done on this domain theoretic duality?

winterkoninkje: shadowcrane (clean) (Default)

Joel on Software says somewhere that there are two things every programmer must understand to call themselves a computer scientist. The first: pointers, which can only be understood —in all their subtle horror— by learning C (or assembly). The second is recursion which can only really be learned from pure functional languages (or mathematical topology). Many imperative programmers think they understand recursion, but they don't. Lists are pretty. Trees are cute. Binary is pretty cute. But you don't understand recursion until you've been throwing bananas at bushes and convinced people it makes sense.

Today I present a job problem a coworker thought was impossible. It's a run of the mill problem, says the Haskeller, but it highlights the extent to which Java and imperative thinking cloud the simple mathematical problem and the equally simple functional solution. Consider for the moment that you're writing a program to do parsing-based machine translation. You have a bunch of CFG-like rules. Somewhere in your program you have a function that takes, for each non-terminal position, a list of all parses which can produce the given non-terminal, and you want to try all possible ways of completing each rule. For grammars restricted to only having at most two nonterminals per rule, your function looks something like this:

public void completeCell(a bunch of arguments,
                         ArrayList<Rule> rules,
                         ArrayList<ArrayList<Parse>> allParses) {
    for (Rule r : rules) {
        if (r.arity == 1) {
            for(Parse p : allParses.get(0)) {
                ArrayList<Parse> antecedents = new ArrayList<Parse>();
                doCrazyStuff(a bunch of arguments, antecedents);
        } else if (r.arity == 2) {
            for(Parse p0 : allParses.get(0)) {
            for(Parse p1 : allParses.get(1)) {
                ArrayList<Parse> antecedents = new ArrayList<Parse>();
                doCrazyStuff(a bunch of arguments, antecedents);
        } else {
            System.crash("ohnoes, we can only do two!");

Did you read all that? Neither did I. But now it's your job to generalize this function so that it will work for rules with an arbitrary number of nonterminals. Obviously, adding extra conditionals with additional loops can only ever get so far. I bring up that the problem domain involves formal language theory, not just because it does, but because it gives us a much cleaner way to think about the problem without code bloat obscuring our vision. Consider the following class of formal languages: someone hands you a sequence of sets of letters, and you must enumerate all sequences of letters consistent with it (that is, the letter at position X is an element of the set at position X, for all X). In particular, your algorithm for enumerating these sequences must work no matter how long the (finite) sequence of (finite) sets is. These are exactly the same problem. All that fluff about what you do with the sequences after you've created them is just that. Fluff. Ditto for what the letters look like inside and why you'd want to do this in the first place. But given this simple problem enumerate :: Seq (Set a) -> Set (Seq a), do you see what the answer is?

See if you can work it out. )
RSS Atom

June 2017

18192021 222324


Page generated 21 Oct 2017 05:20 pm
Powered by Dreamwidth Studios