### Abstract Binding Trees in Hakaru

20 Sep 2015 08:01 pm*Edit 2015.10.29: Be sure to also read the followup post on the benefits and limitations of this approach compared to the usual untyped ABTs.*

Earlier this year Neel Krishnaswami talked about abstract binding trees (ABTs) [part 1] [part 2]. IMO, the best way to think about ABTs is as a generalization of abstract syntax trees (ASTs), though this is not a perspective sanctioned by the CMUers I’ve talked to. CMUers oppose this way of phrasing things, in part, because the ABT libraries they’re familiar with make crucial use of the design pattern of two-level types; but I think the essential insights of ABTs and two-level types are quite different, and we ought to keep the benefits of these two techniques distinct.

Over the past year I’ve been working on the inferential language^{1} Hakaru, and in the new version of the compiler we’re using ABTs for our syntax trees. However, contrary to Neel’s stance against using strongly-typed internal representations for syntax, we extend the ABT approach to make use of GADTs to guarantee local well-typedness— since this in turn can be used to guarantee that program transformations are also well-typed. (If you don’t want those guarantees, then take a look at Jon Sterling’s abt library on Hackage^{2}.) In this post I’m going to present a simplified version of our architecture, and then talk about some of the extra stuff bringing it closer to our production architecture.

### First things first

Since we want everything to be well-typed, we first must introduce some universe, `U`

, of all the types in our language. (In Haskell we can implement such a universe by using the `-XDataKinds`

extension, so I’ll equivocate between calling `U`

a “universe” vs a “kind”.) For the rest of this post it doesn’t actually matter what lives in that universe^{3}, just so long as things match up when they need to. Since the choice of universe is irrelevant, we could abstract over `U`

by turning on the `-XPolyKinds`

extension; but I avoid doing so below, just to help keep things more concrete.

### Implementing ASTs

The simplest way of thinking about well-typed ASTs is that they capture the set of terms generated by a (typed) signature; that is, the fixed point of some `Σ ∷ [U] → U → ⭑`

. Unpacking the type for `Σ`

, we have that every syntactic constructor `s ∈ Σ`

is associated with some arity (the length of the list), each argument to `s`

has some type in `U`

(the elements of the list), and applying `s`

to the right number of ASTs of the right types will generate a new AST with some type in `U`

(the second argument to `Σ`

).

To implement this fixed point we define an `AST`

type which is parameterized by its signature. To ensure well-aritiedness (and well-typedness) of our ASTs with respect to that signature, we’ll need to introduce a helper type `SArgs`

^{4}. And to ensure that we obtain the *least* fixed-point of the signature, we’ll make everything strict.

```
infix 4 :$
infixr 5 :*
data SArgs ∷ (U → ⭑) → [U] → ⭑ where
End ∷ SArgs ast []
(:*) ∷ !(ast u)
→ !(SArgs ast us)
→ SArgs ast (u : us)
data AST ∷ ([U] → U → ⭑) → U → ⭑ where
(:$) ∷ !(σ us u)
→ !(SArgs (AST σ) us)
→ AST σ u
```

### Implementing ABTs

The problem with ASTs is that they have no notion of variables, and thus have no notion of variable binding. Naively we could implement binders like lambda-abstraction by having something like `λ ∷ Σ [u, v] (u :→ v)`

but then we’d need to do a post-hoc check to ensure that the first argument to `λ`

is in fact a variable. To build that check into the datatype itself we’d have to move `λ`

into the definition of `AST`

(since the first argument is of type `Variable u`

rather than `AST Σ u`

). If lambda-abstraction were the only binder we had, that might not be so bad; but any real-world language has a plethora of binders, and this approach doesn’t scale.

The essential idea behind ABTs is to abstract over the notion of binding itself. Given a single uniform definition of what it means to be a binding form, we don’t have to worry about adding a bunch of ad-hoc constructors to our AST datatype. Moreover, we can then provide single uniform definitions for things which mess with variables and are homomorphic over the signature. Things like capture-avoiding substitution and providing a HOAS API for our first-order representation.

The crucial step is to adjust our notion of what a signature contains. The basic signatures used above only contained applicative forms; i.e., things we can apply to locally-closed terms; i.e., what are called “functors” in the logic programming community. For ABTs we’ll want to allow our signatures to include any generalized quantifier. That is, our signatures will now be of type `Σ ∷ [[U] × U] → U → ⭑`

. Previously, the arguments were indexed by `U`

; now, they’re indexed by `[U] × U`

. The length of the list gives the number of variables being bound, the types in the list give the types of those variables, and the second component of the pair gives the type of the whole locally-open expression.

To implement this we need to extend our syntax tree to include variable bindings and variable uses:

```
data SArgs ∷ ([U] → U → ⭑) → [[U] × U] → ⭑ where
End ∷ SArgs abt []
(:*) ∷ !(abt vs u)
→ !(SArgs abt vus)
→ SArgs abt ((vs,u) : vus)
data ABT ∷ ([[U] × U] → U → ⭑) → [U] → U → ⭑ where
(:$) ∷ !(σ vus u)
→ !(SArgs (ABT σ) vus)
→ ABT σ [] u
Var ∷ !(Variable v)
→ ABT σ [] v
Bind ∷ !(Variable v)
→ !(ABT σ vs u)
→ ABT σ (v : vs) u
```

Time for an example of how this all fits together. To add lambda-abstraction to our language we’d have `λ ∷ Σ [([u],v)] (u :→ v)`

: that is, the `λ`

constructor takes a single argument which is a locally-open term, binding a single variable of type `u`

, and whose body has type `v`

. So given some `x ∷ Variable u`

and `e ∷ ABT Σ [] v`

we’d have the AST `(λ :$ Bind x e :* End) ∷ ABT Σ [] (u :→ v)`

.

### “Local” vs “global” well-typedness

With the `ABT`

definition above, every term of type `ABT Σ vs u`

must be locally well-typed according to the signature `Σ`

. I keep saying “locally” well-typed because we only actually keep track of local binding information. This is an intentional design decision. But only tracking local well-typedness does have some downsides.

So what are the downsides? Where could things go wrong? Given a locally-closed term (i.e., either `Var x`

or `f :$ e`

) any free variables that occur inside will not have their `U`

-types tracked by Haskell’s type system. This introduces some room for the compiler writer to break the connection between the types of a variable’s binder and its use. That is, under the hood, every variable is represented by some unique identifier like an integer or a string. Integers and strings aren’t `U`

-indexed Haskell types, thus it’s possible to construct a `Variable u`

and a `Variable v`

with the same unique identifier, even though `u`

and `v`

differ. We could then `Bind`

the `Variable u`

but `Var`

the `Variable v`

. In order to ensure global well-typedness we need to ensure this can’t happen.

One way is to keep track of global binding information, as we do in the paper presentation of languages. Unfortunately, to do this we’d need to teach Haskell’s typechecker about the structural rules of our language. Without a type-level implementation of sets/maps which respects all the axioms that sets/maps should, we’d be forced to do things like traverse our ASTs and rebuild them identically, but at different type indices. This is simply too hairy to stomach. Implementing the axioms ourselves is doubly so.

Or we could fake it, using `unsafeCoerce`

to avoid the extraneous traversals or the complicated pattern matching on axioms. But doing this we’d erase all guarantees that adding global binding information has to offer.

A third approach, and the one we take in Hakaru, is compartmentalize the places where variables can be constructed. The variable generation code must be part of our trusted code base, but unlike the `unsafeCoerce`

approach we can keep all the TCB code together in one spot rather than spread out across the whole compiler.

### Stratifying our data types

The above definition of ABTs is a simplified version of what we actually use in Hakaru. For example, Hakaru has user-defined algebraic data types, so we also need case analysis on those data types. Alas, generic case analysis is not a generalized quantifier, thus we cannot implement it with `(:$)`

. We could consider just adding case analysis to the `ABT`

definition, but then we’d start running into extensibility issues again. Instead, we can break the `ABT`

type apart into two types: one for capturing variable uses and bindings, and the other for whatever syntax we can come up with. Thus,

```
data Syntax ∷ ([[U] × U] → U → ⭑) → ([U] → U → ⭑) → U → ⭑ where
(:$) ∷ !(σ vus u)
→ !(SArgs abt vus)
→ Syntax σ abt u
data ABT ∷ ([U] → U → ⭑) → [U] → U → ⭑ where
Syn ∷ !(Syntax σ (ABT σ) u)
→ ABT σ [] u
Var ∷ !(Variable v)
→ ABT σ [] v
Bind ∷ !(Variable v)
→ !(ABT σ vs u)
→ ABT σ (v : vs) u
```

Of course, since we’re going to be extending `Syntax`

with all our language-specific details, there’s not a whole lot of benefit to parameterizing over `σ`

. Thus, we can simplify the types considerably by just picking some concrete `Σ`

to plug in for `σ`

.

By breaking `Syntax`

apart from `ABT`

we can now extend our notion of syntax without worrying about the details of variable binding (which can be defined once and for all on `ABT`

). But we could still run into extensibility issues. In particular, often we want to separate the fixed-point portion of recursive types from their generating functor so that we can do things like add annotations at every node in the recursive data type. A prime example of such annotations is keeping track of free variables, as in Neel’s original post. To allow this form of extensibility we need to break up the `ABT`

type into two parts: the recursion, and the `Syn`

/`Var`

/`Bind`

view of the ABT.

```
data ABT ∷ ([U] → U → ⭑) → [U] → U → ⭑ where
Unview ∷ !(View σ (ABT σ) vs u) → ABT σ vs u
view ∷ ABT σ vs u → View σ (ABT σ) vs u
view (Unview e) = e
data View ∷ ([U] → U → ⭑) → [U] → U → ⭑ where
Syn ∷ !(Syntax σ abt u)
→ View σ abt [] u
Var ∷ !(Variable v)
→ View σ abt [] v
Bind ∷ !(Variable v)
→ !(View σ abt vs u)
→ View σ abt (v : vs) u
```

Now, to allow arbitrary annotations we’ll replace the data type `ABT`

with an equivalent type class. Each instance of the `ABT`

class defines some sort of annotations, and we can use the `view`

and `unview`

methods to move between the instance and the concrete `View`

type.

There’s one last form of extensibility we may want to add. Using fixed point combinators gives us a way of describing complete trees. A different way of introducing recursion is with free monads. The free-monad combinator is just like the fixed-point combinator, except that we have an additional type parameter for metavariables and we have a data constructor for using those metavariables instead of requiring the recursion to ground out with a complete syntax tree. The reasons why this might be nice to do are beyond the scope of this post, but the point is we might want to do that so we need to split the `ABT`

class into two parts: one for the recursion itself, and another for the annotations.

In the end, we have a four-level type: the `Syntax`

, the `View`

, the annotations, and the recursion.

[1] In the accepted/current parlance, Hakaru is a “probabilistic programming language”; but I and a number of other folks working on such languages have become disaffected with that term of late, since it’s not entirely clear what should and should not count as a “probabilistic” PL. Over the course of a number of discussions on the topic, I’ve settled on “inferential” PL as describing what is (or, at least what *I* find) interesting about “probabilistic” PL. I’ve been meaning to write a post about the subject, and hopefully this footnote will remind me to do so.↩

[2] N.B., the indexing used in that package is what we get if we erase/compactify the universe `U`

. That is: the erasure of `U`

is a singleton set; the erasure of `[U]`

is isomorphic to the Peano numbers; the erasure of `[[U] × U]`

is isomorphic to a list of Peano numbers; etc.↩

[3] Though at one point I assume we have functions, `(:→)`

, just for the sake of an example.↩

[4] Ideally we’d be able to flatten this type to avoid all the overhead of the linked list implementation. In fact, the entire AST node of `(:$)`

together with its `SArgs`

should be flattened. These nodes have the same general layout as the heap objects in the STG machine: a record with a pointer to the data constructor (i.e., element of the signature) followed by an appropriate number of arguments; and so in principle we ought to be able to implement them directly with a single STG heap object.↩