winterkoninkje: shadowcrane (clean) (Default)

Last time I talked a bit about ABTs; in particular, I introduced the notion of strongly-typed ABTs (or "GABTs" if you prefer) and showed how we can extend the basic idea of ABTs to guarantee well-typedness in addition to well-aritiedness. However, I also made a note that ensuring this sort of well-typedness runs counter to what Neel and other CMUers often do. One of my colleagues here at IU noticed the reason, so I thought I'd write a bit more about it.

The issue at stake here is how general we can make our ABT library, to minimize the amount of boilerplate needed whenever inventing a new language. By encoding object-language type systems into the kinding of the ABT, we restrict the the possible object languages we can use the ABT implementation for (namely those object languages with type systems that can be embedded into whatever kinding the ABT has). To put a finer point on it, using the kinds presented in the previous post you cannot have binders in your type system. (Edit 2016.02.29: actually the details are more complicated.) This means no System F, and no dependent types. This is unfortunate as the whole point of ABTs is to capture binding structure once and for all!

However, I'd like to reiterate that, for our purposes in Hakaru this limitation is no restriction. Hakaru is simply-typed, so there are no type-level binders in sight. Moreover, we do a lot of program transformations in Hakaru. By using GABTs we can have GHC verify that our program transformations will never produce Hakaru code which is ill-typed, and that our program transformations will always produce Hakaru code of an appropriate type (e.g., the same type as the input term, for things like partial evaluation; but we have a number of type-changing transformations too). Thus, even though our GABT library could not be reused for implementing languages with type-level binders, it still provides a substantial benefit for those languages without type-level binders.

Although our GABTs cannot handle type-level binders, that does not mean we're restricted to only working with simply typed languages. For example, intersection types are not usually thought of as "simple types"; but they do not require binders and so they're fine. More generally, Larry Moss is engaged in a research project where he asks, "given infinite time, how far could Aristotle have gotten in logic?" By which he means, given the Aristotelian restriction to syllogistic logics (i.e., ones without the quantifiers introduced by Frege), what are the limits in what we can cover? It turns out that we can cover quite a lot. Some syllogistic logics go beyond the power of the "Peano–Frege" boundary: they can handle comparing cardinality of sets! A good pictorial summary of this line of research is on slide 2 of this talk; and a bit more about the complexity results is given in this talk (the requisite picture is on slide 39).


Edit 2016.02.29: In actuality, there's nothing inherent in type theory that prohibits having type-level binders for our object language; it's a limitation in GHC. In particular, GHC doesn't allow lifting GADTs into data kinds. If we could lift GADTs, then we could simply use ABTs to define the syntax of object-language type expressions, and lift those to serve as the type indices for using ABTs to define the syntax of object-language term expressions. This stratified approach is sufficient to handle System F and any other non-dependent quantifiers. To go further and handle dependent quantifiers as well, we'd also need to be able to define the object-language's terms and types in a mutually inductive way.

winterkoninkje: shadowcrane (clean) (Default)

Edit 2015.10.29: Be sure to also read the followup post on the benefits and limitations of this approach compared to the usual untyped ABTs.

Earlier this year Neel Krishnaswami talked about abstract binding trees (ABTs) [part 1] [part 2]. IMO, the best way to think about ABTs is as a generalization of abstract syntax trees (ASTs), though this is not a perspective sanctioned by the CMUers I’ve talked to. CMUers oppose this way of phrasing things, in part, because the ABT libraries they’re familiar with make crucial use of the design pattern of two-level types; but I think the essential insights of ABTs and two-level types are quite different, and we ought to keep the benefits of these two techniques distinct.

Over the past year I’ve been working on the inferential language1 Hakaru, and in the new version of the compiler we’re using ABTs for our syntax trees. However, contrary to Neel’s stance against using strongly-typed internal representations for syntax, we extend the ABT approach to make use of GADTs to guarantee local well-typedness— since this in turn can be used to guarantee that program transformations are also well-typed. (If you don’t want those guarantees, then take a look at Jon Sterling’s abt library on Hackage2.) In this post I’m going to present a simplified version of our architecture, and then talk about some of the extra stuff bringing it closer to our production architecture.

First things first

Since we want everything to be well-typed, we first must introduce some universe, U, of all the types in our language. (In Haskell we can implement such a universe by using the -XDataKinds extension, so I’ll equivocate between calling U a “universe” vs a “kind”.) For the rest of this post it doesn’t actually matter what lives in that universe3, just so long as things match up when they need to. Since the choice of universe is irrelevant, we could abstract over U by turning on the -XPolyKinds extension; but I avoid doing so below, just to help keep things more concrete.

Implementing ASTs

The simplest way of thinking about well-typed ASTs is that they capture the set of terms generated by a (typed) signature; that is, the fixed point of some Σ [U] U . Unpacking the type for Σ, we have that every syntactic constructor sΣ is associated with some arity (the length of the list), each argument to s has some type in U (the elements of the list), and applying s to the right number of ASTs of the right types will generate a new AST with some type in U (the second argument to Σ).

To implement this fixed point we define an AST type which is parameterized by its signature. To ensure well-aritiedness (and well-typedness) of our ASTs with respect to that signature, we’ll need to introduce a helper type SArgs4. And to ensure that we obtain the least fixed-point of the signature, we’ll make everything strict.

infix  4 :$
infixr 5 :*

data SArgs  (U  )  [U]   where
    End   SArgs ast []

    (:*)  !(ast u)
          !(SArgs ast us)
          SArgs ast (u : us)

data AST  ([U]  U  )  U   where
    (:$)  !(σ us u)
          !(SArgs (AST σ) us)
          AST σ u

Implementing ABTs

The problem with ASTs is that they have no notion of variables, and thus have no notion of variable binding. Naively we could implement binders like lambda-abstraction by having something like λ Σ [u, v] (u :→ v) but then we’d need to do a post-hoc check to ensure that the first argument to λ is in fact a variable. To build that check into the datatype itself we’d have to move λ into the definition of AST (since the first argument is of type Variable u rather than AST Σ u). If lambda-abstraction were the only binder we had, that might not be so bad; but any real-world language has a plethora of binders, and this approach doesn’t scale.

The essential idea behind ABTs is to abstract over the notion of binding itself. Given a single uniform definition of what it means to be a binding form, we don’t have to worry about adding a bunch of ad-hoc constructors to our AST datatype. Moreover, we can then provide single uniform definitions for things which mess with variables and are homomorphic over the signature. Things like capture-avoiding substitution and providing a HOAS API for our first-order representation.

The crucial step is to adjust our notion of what a signature contains. The basic signatures used above only contained applicative forms; i.e., things we can apply to locally-closed terms; i.e., what are called “functors” in the logic programming community. For ABTs we’ll want to allow our signatures to include any generalized quantifier. That is, our signatures will now be of type Σ [[U] × U] U . Previously, the arguments were indexed by U; now, they’re indexed by [U] × U. The length of the list gives the number of variables being bound, the types in the list give the types of those variables, and the second component of the pair gives the type of the whole locally-open expression.

To implement this we need to extend our syntax tree to include variable bindings and variable uses:

data SArgs  ([U]  U  )  [[U] × U]   where
    End   SArgs abt []

    (:*)  !(abt vs u)
          !(SArgs abt vus)
          SArgs abt ((vs,u) : vus)

data ABT  ([[U] × U]  U  )  [U]  U   where
    (:$)  !(σ vus u)
          !(SArgs (ABT σ) vus)
          ABT σ [] u

    Var   !(Variable v)
          ABT σ [] v

    Bind  !(Variable v)
          !(ABT σ vs u)
          ABT σ (v : vs) u

Time for an example of how this all fits together. To add lambda-abstraction to our language we’d have λ Σ [([u],v)] (u :→ v): that is, the λ constructor takes a single argument which is a locally-open term, binding a single variable of type u, and whose body has type v. So given some x Variable u and e ABT Σ [] v we’d have the AST (λ :$ Bind x e :* End) ABT Σ [] (u :→ v).

“Local” vs “global” well-typedness

With the ABT definition above, every term of type ABT Σ vs u must be locally well-typed according to the signature Σ. I keep saying “locally” well-typed because we only actually keep track of local binding information. This is an intentional design decision. But only tracking local well-typedness does have some downsides.

So what are the downsides? Where could things go wrong? Given a locally-closed term (i.e., either Var x or f :$ e) any free variables that occur inside will not have their U-types tracked by Haskell’s type system. This introduces some room for the compiler writer to break the connection between the types of a variable’s binder and its use. That is, under the hood, every variable is represented by some unique identifier like an integer or a string. Integers and strings aren’t U-indexed Haskell types, thus it’s possible to construct a Variable u and a Variable v with the same unique identifier, even though u and v differ. We could then Bind the Variable u but Var the Variable v. In order to ensure global well-typedness we need to ensure this can’t happen.

One way is to keep track of global binding information, as we do in the paper presentation of languages. Unfortunately, to do this we’d need to teach Haskell’s typechecker about the structural rules of our language. Without a type-level implementation of sets/maps which respects all the axioms that sets/maps should, we’d be forced to do things like traverse our ASTs and rebuild them identically, but at different type indices. This is simply too hairy to stomach. Implementing the axioms ourselves is doubly so.

Or we could fake it, using unsafeCoerce to avoid the extraneous traversals or the complicated pattern matching on axioms. But doing this we’d erase all guarantees that adding global binding information has to offer.

A third approach, and the one we take in Hakaru, is compartmentalize the places where variables can be constructed. The variable generation code must be part of our trusted code base, but unlike the unsafeCoerce approach we can keep all the TCB code together in one spot rather than spread out across the whole compiler.

Stratifying our data types

The above definition of ABTs is a simplified version of what we actually use in Hakaru. For example, Hakaru has user-defined algebraic data types, so we also need case analysis on those data types. Alas, generic case analysis is not a generalized quantifier, thus we cannot implement it with (:$). We could consider just adding case analysis to the ABT definition, but then we’d start running into extensibility issues again. Instead, we can break the ABT type apart into two types: one for capturing variable uses and bindings, and the other for whatever syntax we can come up with. Thus,

data Syntax  ([[U] × U]  U  )  ([U]  U  )  U   where
    (:$)  !(σ vus u)
          !(SArgs abt vus)
          Syntax σ abt u

data ABT  ([U]  U  )  [U]  U   where
    Syn   !(Syntax σ (ABT σ) u)
          ABT σ [] u

    Var   !(Variable v)
          ABT σ [] v

    Bind  !(Variable v)
          !(ABT σ vs u)
          ABT σ (v : vs) u

Of course, since we’re going to be extending Syntax with all our language-specific details, there’s not a whole lot of benefit to parameterizing over σ. Thus, we can simplify the types considerably by just picking some concrete Σ to plug in for σ.

By breaking Syntax apart from ABT we can now extend our notion of syntax without worrying about the details of variable binding (which can be defined once and for all on ABT). But we could still run into extensibility issues. In particular, often we want to separate the fixed-point portion of recursive types from their generating functor so that we can do things like add annotations at every node in the recursive data type. A prime example of such annotations is keeping track of free variables, as in Neel’s original post. To allow this form of extensibility we need to break up the ABT type into two parts: the recursion, and the Syn/Var/Bind view of the ABT.

data ABT  ([U]  U  )  [U]  U   where
    Unview  !(View σ (ABT σ) vs u)  ABT σ vs u

view  ABT σ vs u  View σ (ABT σ) vs u
view (Unview e) = e

data View  ([U]  U  )  [U]  U   where
    Syn   !(Syntax σ abt u)
          View σ abt [] u

    Var   !(Variable v)
          View σ abt [] v

    Bind  !(Variable v)
          !(View σ abt vs u)
          View σ abt (v : vs) u

Now, to allow arbitrary annotations we’ll replace the data type ABT with an equivalent type class. Each instance of the ABT class defines some sort of annotations, and we can use the view and unview methods to move between the instance and the concrete View type.

There’s one last form of extensibility we may want to add. Using fixed point combinators gives us a way of describing complete trees. A different way of introducing recursion is with free monads. The free-monad combinator is just like the fixed-point combinator, except that we have an additional type parameter for metavariables and we have a data constructor for using those metavariables instead of requiring the recursion to ground out with a complete syntax tree. The reasons why this might be nice to do are beyond the scope of this post, but the point is we might want to do that so we need to split the ABT class into two parts: one for the recursion itself, and another for the annotations.

In the end, we have a four-level type: the Syntax, the View, the annotations, and the recursion.


[1] In the accepted/current parlance, Hakaru is a “probabilistic programming language”; but I and a number of other folks working on such languages have become disaffected with that term of late, since it’s not entirely clear what should and should not count as a “probabilistic” PL. Over the course of a number of discussions on the topic, I’ve settled on “inferential” PL as describing what is (or, at least what I find) interesting about “probabilistic” PL. I’ve been meaning to write a post about the subject, and hopefully this footnote will remind me to do so.

[2] N.B., the indexing used in that package is what we get if we erase/compactify the universe U. That is: the erasure of U is a singleton set; the erasure of [U] is isomorphic to the Peano numbers; the erasure of [[U] × U] is isomorphic to a list of Peano numbers; etc.

[3] Though at one point I assume we have functions, (:→), just for the sake of an example.

[4] Ideally we’d be able to flatten this type to avoid all the overhead of the linked list implementation. In fact, the entire AST node of (:$) together with its SArgs should be flattened. These nodes have the same general layout as the heap objects in the STG machine: a record with a pointer to the data constructor (i.e., element of the signature) followed by an appropriate number of arguments; and so in principle we ought to be able to implement them directly with a single STG heap object.

winterkoninkje: shadowcrane (clean) (Default)

Quoth Paul Taylor:

This result is folklore, which is a technical term for a method of publication in category theory. It means that someone sketched it on the back of an envelope, mimeographed it (whatever that means) and showed it to three people in a seminar in Chicago in 1973, except that the only evidence that we have of these events is a comment that was overheard in another seminar at Columbia in 1976. Nevertheless, if some younger person is so presumptuous as to write out a proper proof and attempt to publish it, they will get shot down in flames.

winterkoninkje: shadowcrane (clean) (Default)

I've mentioned these a few times in different places, most recently on Reddit. So I figured I should repost them here for better googleability for folks just starting to learn about dependent types and type theory.

PSA #1: Beware: "type theory" and "Type Theory" are not the same thing, note the caps. The former describes the entire field of enquiry which explores possible theories about types; the latter is the name of one particular theory/system, namely the one Per Martin-Löf popularized (and others have extended and reinvented since then).

Yes, this is an extremely obnoxious detail, but it's one that often confuses newcomers; e.g., learning MLTT and thinking that applies to all type systems, or learning other type systems and then being confused when people make flagrantly false statements (when interpreted as statements about type theory as a whole, though they're true of MLTT). This is why I prefer to refer to Type Theory as "TT" or "MLTT", to avoid confusion, and because I am interested in the entire field of enquiry and in comparing different systems rather than focusing on just one.

ObTangent: There are similar reasons for why ML is called "ML" instead of "metalanguage".

PSA #2: There are conflicting meanings for the terms "sum" and "product" in dependent type land. Those coming from category theory and functional programming tend to say "sum" to mean tagged unions, "product" to mean pairs (e.g, cartesian products or similar), and "exponentials" to mean functions as objects— all of these exactly as in non-dependent languages. However, those coming more from the set-theory side of things use "product" to mean functions (whence the capital Pi), "sum" to mean pairs (whence the capital Sigma), and have no common term for unions.

RSS Atom

April 2017

S M T W T F S
      1
2 345678
9101112131415
161718192021 22
23242526272829
30      

Tags

Page generated 25 Apr 2017 08:34 am
Powered by Dreamwidth Studios