winterkoninkje: shadowcrane (clean) (Default)

I have a new academic/professional website: http://cl.indiana.edu/~wren/!

There are still a few unfinished areas (e.g., my publications page), but hopefully I'll be finished with them shortly. The new site is built with Hakyll instead of my old Google-Summer-of-Code static website generator. I'm still learning how to implement my old workflow in Hakyll, and if I get the chance I'll write a few posts on how I've set things up. Reading through other folks' posts on how they use Hakyll has been very helpful, and I'd like to give back to the community. I've already issued a pull request for adding two new combinators for defining template fields.

In the meantime, if you notice any broken links from my old blog posts, please let me know.

winterkoninkje: shadowcrane (clean) (Default)

The past couple weeks I've been teaching about recursive types and their encodings in B522. Here's a short annotated bibliography for followup reading:

  • For a basic intro to recursive types, and for the set-theoretic metatheory: see section IV, chapters 20 and 21.
    • Benjamin C. Pierce (2002) Types and Programming Languages. MIT Press.
  • The proof of logical inconsistency and non-termination is "well-known". For every type τ we can define a fixed-point combinator and use that to exhibit an inhabitant of the type (a Haskell rendering of this construction is sketched after this list):
    • fixτ = λf:(τ→τ). let e = λx:(μα.α→τ). f ((unroll x) x) in e (roll(μα.α→τ) e)
    • an inhabitant of τ: fixτ (λx:τ. x)
  • A category-theoretic proof that having fixed-points causes inconsistency
  • The proof of Turing-completeness is "well-known". Here's a translation from the untyped λ-calculus to STLC with fixed-points:
    • (x)* = x
    • (λx. e)* = roll(μα.α→α) (λx:(μα.α→α). e*)
    • (f e)* = unroll (f*) (e*)
  • Knaster–Tarski (1955): For any monotone function, f, (a) the least fixed-point of f is the intersection of all f-closed sets, and (b) the greatest fixed-point of f is the union of all f-consistent sets.
  • For a quick introduction to category theory, a good place to start is:
    • Benjamin C. Pierce (1991) Basic Category Theory for Computer Scientists. MIT Press.
  • For a more thorough introduction to category theory, consider:
  • The Boehm–Berarducci encoding
  • Under βη-equivalence, Church/Boehm–Berarducci encodings are only weakly initial (hence, can define functions by recursion but can't prove properties by induction)
  • However, using contextual equivalence, Church/Boehm–Berarducci encodings are (strongly) initial
  • Surjective pairing cannot be encoded in STLC (i.e., the implicational fragment of intuitionistic propositional logic): see p.155
    • Morten H. Sørensen and Paweł Urzyczyn (2006) Lectures on the Curry–Howard isomorphism. Studies in Logic and the Foundations of Mathematics, v.149.
  • However, adding it is a conservative extension
  • The Boehm–Berarducci encoding of pairs is not surjective pairing: the η-rule for Boehm–Berarducci-encoded pairs cannot be derived in System F. (The instances for closed terms can be, just not the general rule.)
  • Compiling data types with Scott encodings
  • For more on the difference between Scott and Mogensen–Scott encodings:
  • Parigot encodings
    • M. Parigot (1988) Programming with proofs: A second-order type theory. ESOP, LNCS 300, pp.145–159. Springer.
  • Parigot encoding of natural numbers is not canonical (i.e., there exist terms of the correct type which do not represent numbers); though both Church/Boehm–Berarducci and Scott encoded natural numbers are.
  • For more on catamorphisms, anamorphisms, paramorphisms, and apomorphisms
  • build/foldr list fusion
    • Andy Gill, John Launchbury, and Simon Peyton Jones (1993) A short cut to deforestation. In Proc. Functional Programming Languages and Computer Architecture, pp.223–232.
    • Many more links at the bottom of this page
  • For another general introduction along the lines of what we covered in class
  • "Iterators" vs "recursors" in Heyting arithmetic and Gödel's System T: see ch.10:
    • Morten H. Sørensen and Paweł Urzyczyn (2006) Lectures on the Curry–Howard isomorphism. Studies in Logic and the Foundations of Mathematics, v.149.
  • There are a great many more papers by Tarmo Uustalu, Varmo Vene, Ralf Hinze, and Jeremy Gibbons on all this stuff; just google for it.
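
Since a couple of the items above lean on the μα.α→τ construction, here is a minimal Haskell rendering of it. This is my own sketch, not code from any of the cited sources; Mu plays the role of μα.α→τ, with Roll/unroll standing in for roll/unroll, and the names fix', inhabitant, and D are made up for illustration.

-- The recursive type μα.α→τ, iso-recursively.
newtype Mu tau = Roll { unroll :: Mu tau -> tau }

-- fixτ = λf:(τ→τ). let e = λx:(μα.α→τ). f ((unroll x) x) in e (roll e)
fix' :: (tau -> tau) -> tau
fix' f = e (Roll e)
  where e x = f (unroll x x)

-- An inhabitant of every type τ; it typechecks, but diverges when forced.
inhabitant :: tau
inhabitant = fix' id

-- Similarly, the untyped λ-calculus embeds at μα.α→α:
newtype D = In { out :: D -> D }
-- so (λx. e)* = In (\x -> e*) and (f e)* = out f* e*
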
winterkoninkje: shadowcrane (clean) (Default)

I don't think I ever mentioned it here, but last semester I took a much-needed sabbatical. The main thing was to take a break from all the pressures of work and grad school and get back into a healthy headspace. Along the way I ended up pretty much dropping off the internet entirely. So if you've been missing me from various mailing lists and online communities, that's why. I'm back now. If you've tried getting in touch by email, irc, etc, and don't hear from me in the next few weeks, feel free to ping me again.

This semester I'm teaching foundations of programming language theory with Jeremy Siek, and work on the dissertation continues apace. Over the break I had a few breakthrough moments thinking about the type system for chiastic lambda-calculi— which should help to clean up the normal forms for terms, as well as making room for extending the theory to include eta-conversion. Once the dust settles a bit I'll write some posts about it, as well as continuing the unification-fd tutorial I started last month.

winterkoninkje: shadowcrane (clean) (Default)

A while back I released the unification-fd library, which gives a generic implementation of first-order unification of non-cyclic terms. I've given a few talks on how the library is implemented and what optimizations it performs, but that's not the topic for today. Today, I'm going to talk about how to use it.

Unification is a widely useful operation and, consequently, comes in many different flavors. The version currently supported by the library is the sort used by logic programming languages like Prolog, Curry, Dyna, and MiniKanren; which is the same sort that's used for unification-based type inference algorithms like Hindley–Damas–Milner. Of these two examples, the logic programming example is the simpler one to discuss— at least for folks who've used a language like Prolog before. So let's start from there.

Caveat Emptor: This post is something of a stream of consciousness. I've gotten a few requests for tutorials on how to use the library, but the requests weren't terribly specific about what problems people've had or what's been difficult to figure out. So I'm shooting in the dark as far as what folks need and how much background they have. I'm going to assume you're familiar with Prolog and the basics of what unification is and does.

Preemptive apology: I started writing this post months and months (and months) ago, but unintentionally dropped it after running into a certain issue and then getting distracted and moving onto other things. Actually, this happened at least twice. I'm terribly sorry about that. So, apologies for not tackling the disjunction issue in this post. I'll come back to it later, but figured this post really needs to get out the door already.

Logic Terms

A term, in Prolog, is just a fancy name for a value of some algebraic data type. In most variants of Prolog there's no explicit definition of the ADT, no restriction on what the constructors are, and no type checking to ensure that subterms have a particular shape. That is, Prolog is what's called a single-sorted logic; in other words, Prolog is an untyped/unityped language. With unification-fd we can implement multi-sorted (aka typed) logics, but for this tutorial we're going to stick with Prolog's single-sorted approach.

Opening up Control.Unification we'll see a handful of types and type classes, followed by a bunch of operators. The UTerm data type captures the recursive structure of logic terms. (UTerm is the free monad, if you're familiar with that terminology.) That is, given some functor t which describes the constructors of our logic terms, and some type v which describes our logic variables, the type UTerm t v is the type of logic terms: trees with multiple layers of t structure and leaves of type v. For our single-sorted logic, here's an implementation of t:

data T a = T String [a]

The String gives the name of the term constructor, and the list gives the ordered sequence of subterms. Thus, the Prolog term foo(bar,baz(X)) would be implemented as UTerm$T "foo" [UTerm$T "bar" [], UTerm$T "baz" [UVar x]]. If we're going to be building these terms directly, then we probably want to define some smart constructors to reduce the syntactic noise:

foo x y = UTerm$T "foo" [x,y]
bar     = UTerm$T "bar" []
baz x   = UTerm$T "baz" [x]

Now, we can implement the Prolog term as foo bar (baz x). If you prefer a more Prolog-like syntax, you can use uncurried definitions for smart constructors that take more than one argument.
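
For example, a hypothetical uncurried variant (my own naming, not part of the library) would look like this:

foo' (x,y) = UTerm$T "foo" [x,y]
-- so that the term above can be written foo' (bar, baz x),
-- with the tuple syntax mirroring Prolog's foo(bar,baz(X)).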

Unifiable

In order to use our T data type with the rest of the API, we'll need to give a Unifiable instance for it. Before we do that we'll have to give Functor, Foldable, and Traversable instances. These are straightforward and can be automatically derived with the appropriate language pragmas.
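
For concreteness, here's a sketch of the derived version; the exact pragma set is an assumption about your GHC version, but these three work on reasonably recent GHCs:

{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}

data T a = T String [a]
    deriving (Functor, Foldable, Traversable)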

The Unifiable class gives one step of the unification process. Just as we only need to specify one level of the ADT (i.e., T) and then we can use the library's UTerm to generate the recursive ADT, so too we only need to specify one level of the unification (i.e., zipMatch) and then we can use the library's operators to perform the recursive unification, subsumption, etc.

The zipMatch function takes two arguments of type t a. The abstract t will be our concrete T type. The abstract a is polymorphic, which ensures that we can't mess around with more than one level of the term at once. If we were to abandon that guarantee, then you could think of a as being UTerm T v. Thus, t a means T (UTerm T v); and T (UTerm T v) is essentially the type UTerm T v with the added guarantee that the values aren't in fact variables. Thus, the arguments to zipMatch are non-variable terms.

The zipMatch method has the rather complicated return type: Maybe (t (Either a (a,a))). Let's unpack this a bit by thinking about how unification works. When we try to unify two terms, first we look at their head constructors. If the constructors are different, then the terms aren't unifiable, so we return Nothing to indicate that unification has failed. Otherwise, the constructors match, so we have to recursively unify their subterms. Since the T structures of the two terms match, we can return Just t0 where t0 has the same T structure as both input terms. Where we still have to recursively unify subterms, we fill t0 with Right(l,r) values where l is a subterm of the left argument to zipMatch and r is the corresponding subterm of the right argument. Thus, zipMatch is a generalized zipping function for combining the shared structure and pairing up substructures. And now, the implementation:

instance Unifiable T where
    zipMatch (T m ls) (T n rs)
        | m /= n    = Nothing
        | otherwise =
            T n <$> pairWith (\l r -> Right(l,r)) ls rs

Here, pairWith (i.e., list-extras:Data.List.Extras.Pair.pairWith) is a version of zip which returns Nothing if the lists have different lengths. So, if the names m and n match, and if the two arguments have the same number of subterms, then we pair those subterms off in order; otherwise, either the names or the lengths don't match, so we return Nothing.
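
If you'd rather avoid the extra dependency, a minimal equivalent of pairWith is easy to write yourself (my own sketch, not the list-extras implementation):

-- Zip two lists with f, failing if the lists have different lengths.
pairWith' :: (a -> b -> c) -> [a] -> [b] -> Maybe [c]
pairWith' f (x:xs) (y:ys) = (f x y :) <$> pairWith' f xs ys
pairWith' _ []     []     = Just []
pairWith' _ _      _      = Nothing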

Feature Structures

For the T example, we don't need to worry about the Left option. The reason it's there is to support feature structures and other sparse representations of terms. That is, consider the following type:

newtype FS k a = FS (Map k a)

Using this type, our logic terms are sets of key–subterm pairs. When unifying maps like these, what do we do if one argument has a binding for a particular key but the other argument does not? In the T example we assumed that subterms which couldn't be paired together (because the lists were different lengths) meant the unification must fail. But for FS it makes more sense to assume that terms which can't be paired up automatically succeed! That is, we'd like to assume that all the keys which are not explicitly present in the Map k a are implicitly present and each one is bound to a unique logic variable. Since the unique logic variables are implicit, there's no need to actually keep track of them; we'll just implicitly unify them with the subterm that can't be paired off.

This may make more sense if you see the Unifiable instance:

instance (Ord k) => Unifiable (FS k) where
    zipMatch (FS ls) (FS rs) =
        Just . FS $
            unionWith (\(Left l) (Left r) -> Right(l,r))
                (fmap Left ls)
                (fmap Left rs)

We start off by mapping Left over both the ls and the rs. We then call unionWith to pair things up. For any given key, if both ls and rs specify a subterm, then these subterms will be paired up as Right(l,r). If we have extra subterms from either ls or rs, however, then we keep them around as Left l or Left r. Thus, the Unifiable instance for FS performs a union of the FS structure, whereas the instance for T performs an intersection of T structure.

The Left option can be used in any situation where you can immediately resolve the unification of subterms, whereas the Right option says you still have work to do.[1]

Logic Variables

The library ships with two implementations of logic variables. The IntVar implementation uses Int as the names of variables, and uses an IntMap to keep track of the environment. The STVar implementation uses STRefs, so we can use actual mutation for binding logic variables, rather than keeping an explicit environment around. Of course, mutation complicates things, so the two implementations have different pros and cons.

Performing unification has the side effect of binding logic variables to terms. Thus, we'll want to use a monad in order to keep track of these effects. The BindingMonad type class provides the definition of what we need from our ambient monad. In particular, we need to be able to generate fresh logic variables, to bind logic variables, and to look up what our logic variables are bound to. The library provides the necessary instances for both IntVar and STVar.

You can, of course, provide your own implementations of Variable and BindingMonad. However, doing so is beyond the scope of the current tutorial. For simplicity, we'll use the IntVar implementation below.

Example Programs

When embedding Prolog programs into Haskell, the main operators we want to consider are those in the section titled "Operations on two terms". These are structural equality (i.e., equality modulo substitution), structural equivalence (i.e., structural equality modulo alpha-variance), unification, and subsumption.

Consider the following Horn clause in Prolog:

example1(X,Y,Z) :- X = Y, Y = Z.

To implement this in Haskell we want a function which takes in three arguments, unifies the first two, and then unifies the second two.[2] Thus:

example1 x y z = do
    x =:= y
    y =:= z

To run this program we'd use one of the functions runIntBindingT, evalIntBindingT, or execIntBindingT, depending on whether we care about both the resulting logic term and the binding state, just the term, or just the binding state. Of course, since the unifications may fail, we also need to run the underlying error monad, using something like runErrorT.[3][4] And since these are both monad transformers, we'll need to use runIdentity or the like in order to run the base monad. Thus, the functions to execute the entire monad stack will look like:

-- Aliases to simplify our type signatures. N.B., the
-- signatures are not actually required to get things
-- to typecheck.
type PrologTerm           = UTerm T IntVar 
type PrologFailure        = UnificationFailure T IntVar
type PrologBindingState   = IntBindingState T

-- N.B., the @FallibleBindingMonad@ isn't yet a monad
-- for Prolog because it doesn't support backtracking.
type FallibleBindingMonad =
    ErrorT PrologFailure (IntBindingT T Identity)

-- N.B., this definition runs into certain issues.
type PrologMonad =
    ErrorT PrologFailure (IntBindingT T Logic)

runFBM
    :: FallibleBindingMonad a
    -> (Either PrologFailure a, PrologBindingState)
runFBM = runIdentity . runIntBindingT . runErrorT

Here are some more examples:

-- A helper function to reduce boilerplate. First we get
-- a free variable, then we embed it into @PrologTerm@,
-- and then we embed it into some error monad (for
-- capturing unification failures).
getFreeVar = lift (UVar <$> freeVar)

-- example2(X,Z) :- X = Y, Y = Z.
example2 x z = do
    y <- getFreeVar
    x =:= y
    y =:= z

-- example3(X,Z) :- example1(X,Y,Z).
example3 x z = do
    y <- getFreeVar
    example1 x y z

-- example4(X) :- X = bar; X = backtrack.
example4 x = (x =:= bar) <|> (x =:= atom "backtrack")
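
For concreteness, here is a hedged sketch (my own test driver, not from the original code) of actually running one of these programs; the result pairs either a failure or the result of the last unification together with the final binding state:

-- Run example2 on two fresh variables.
test2 :: (Either PrologFailure PrologTerm, PrologBindingState)
test2 = runFBM $ do
    x <- getFreeVar
    z <- getFreeVar
    example2 x z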

The complete code for this post can be found here online, or at ./test/tutorial/tutorial1.hs in the Darcs repo. Notably, there are some complications about the semantics of example4; it doesn't mean what you think it should mean. We'll tackle that problem and fix it later on in the tutorial series (in part 4 or thereabouts).

Term Factoring and Clause Resolution Automata (CRAs)

Note that for the above examples, the Haskell functions only execute the right-hand side of the Horn clause. In Prolog itself, there's also a process of searching through all the Horn clauses in a program and deciding which one to execute next. A naive way to implement that search process would be to have a list of all the Horn clauses and walk through it, trying to unify the goal with the left-hand side of each clause and executing the right-hand side if it matches. A more efficient way would be to compile all the left-hand sides into a single automaton, allowing us to match the goal against all of them at once. (The idea here is similar to compiling a bunch of strings together into a trie or regex.)

Constructing optimal CRAs is NP-complete in general, though it's feasible if we have an arbitrary ordering of clauses (e.g., Prolog's top–down order for trying each clause). The unification-fd library does not implement any support for CRAs at present, though it's something I'd like to add in the future. For more information on this topic, see Dawson et al. (1995) Optimizing Clause Resolution: Beyond Unification Factoring and Dawson et al. (1996) Principles and Practice of Unification Factoring.

Other operators

In addition to unification itself, it's often helpful to have various other operators on hand.

One such operator is the subsumption operator. Whereas unification looks for a most-general substitution which when applied to both arguments yields terms which are structurally equal (i.e., l =:= r computes the most general s such that s l === s r), subsumption applies the substitution to only one side. That is, l subsumes r just in case r is a substitution instance of l (i.e., there exists a substitution s such that s l === r). The symbolic name (<:=) comes from the fact that when l subsumes r we also say that l is less defined[5] than r. Subsumption shows up in cases where we have to hold r fixed for some reason, such as when implementing polymorphism or subtyping.
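
As a quick hedged sketch (not from the original post), using the smart constructors from earlier: foo(X,bar) subsumes foo(bar,bar), since the substitution {X ↦ bar} turns the former into the latter. The name subsumptionTest is made up for illustration.

-- foo(X,bar) <:= foo(bar,bar) should succeed; the reverse does not hold.
subsumptionTest = runFBM $ do
    x <- getFreeVar
    foo x bar <:= foo bar bar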

Other operators work on just one term, such as determining the free variables of a term, explicitly applying the ambient substitution to obtain a pure term, or cloning a term to make a copy where all the free variables have been renamed to fresh variables. These sorts of operators aren't used very often in logic programming itself, but are crucial for implementing logic programming languages.
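
For instance, applying the ambient substitution is done with applyBindings, if memory serves; here is a hedged usage sketch (again my own, not from the original post):

-- After this unification, x is bound to bar in the monad's state;
-- applyBindings pushes that binding back into the term itself,
-- so the result is foo(bar,bar).
resolved = runFBM $ do
    x <- getFreeVar
    _ <- foo x bar =:= foo bar bar
    applyBindings (foo x x)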

Conclusion

Hopefully that gives a quick idea of how the library's API is set up. Next time I'll walk through an implementation of Hindley–Damas–Milner type inference, and then higher-ranked polymorphism à la Peyton Jones et al. (2011) Practical type inference for arbitrary-rank types. After that, I'll discuss the complications about backtracking choice I noticed when writing this post, and walk through how to fix them. If there's still interest after that, I can get into some of the guts of the library's implementation— like ranked path compression, maximal structure sharing, and so on.

If you have any particular questions you'd like me to address, drop me a line.


[1] Older versions of the library used the type zipMatch :: forall a b. t a -> t b -> Maybe (t (a,b)) in order to ensure that we did in fact properly pair up subterms from the two arguments. Unfortunately I had to relax that guarantee in order to add support for feature structures.

[2] N.B., a more efficient implementation is:

example1' x y z = do
    y' <- x =:= y
    y' =:= z

The unification operator returns a new term which guarantees maximal structure sharing with both of its arguments. The implementation of unification makes use of observable structure sharing, so by capturing y' and using it in lieu of y, the subsequent unifications can avoid redundant work.

[3] The ErrorT transformer was deprecated by transformers-0.4.1.0, though it still works for this tutorial. Unfortunately, the preferred ExceptT does not work since UnificationFailure doesn't have a Monoid instance as of unification-fd-0.9.0. The definition of UnificationFailure already contains a hack to get it to work with ErrorT, but future versions of the library will remove that hack and will require users to specify their own monoid for combining errors. The First monoid captures the current behavior, though one may prefer to use other monoids such as a monoid that gives a trace of the full execution path, or witnesses for all the backtracks, etc.

[4] To be honest, I don't entirely recall why I had the error monad explicitly separated out as a monad transformer over the binding monad, rather than allowing these two layers to be combined. Since it's so awkward, I'm sure there was some reason behind it, I just failed to make note of why. If there turns out to be no decent reason for it, future versions of the library may remove this fine-grain distinction.

[5] The symbolic name for subsumption is chosen to reflect the meaning of more/less defined (rather than more/less grounded) so that the subsumption ordering coincides with the domain ordering (think of logic variables as being bottom). This is the standard direction for looking at subsumption; though, of course, we could always consider the dual ordering instead.

winterkoninkje: shadowcrane (clean) (Default)

For all you local folks, I'll be giving a talk about my dissertation on November 5th at 4:00–5:00 in Ballantine Hall 011. For those who've heard me give talks about it before, not much has changed since NLCS 2013. But the majority of current CL/NLP, PL, and logic folks haven't seen the talk, so do feel free to stop by.

Abstract: Many natural languages allow scrambling of constituents, or so-called "free word order". However, most syntactic formalisms are designed for English first and foremost. They assume that word order is rigidly fixed, and consequently these formalisms cannot handle languages like Latin, German, Russian, or Japanese. In this talk I introduce a new calculus —the chiastic lambda-calculus— which allows us to capture both the freedoms and the restrictions of constituent scrambling in Japanese. In addition to capturing these syntactic facts about free word order, the chiastic lambda-calculus also captures semantic issues that arise in Japanese verbal morphology. Moreover, chiastic lambda-calculus can be used to capture numerous non-linguistic phenomena, such as: justifying notational shorthands in category theory, providing a strong type theory for programming languages with keyword-arguments, and exploring metatheoretical issues around the duality between procedures and values.

Edit 2014.11.05: The slides from the talk are now up.

winterkoninkje: shadowcrane (clean) (Default)

Citation is a necessary practice for any sort of intellectual engagement, whether formal or colloquial, and whether academic or activistic. It is crucial to give credit to the originators of ideas— for ethical honesty: to acknowledge those who've enlightened you; for professional honesty: to make clear where your contributions begin; and for intellectual honesty: to allow others to read the sources for themselves and to follow up on other extensions and criticisms of that work.

When encountering a new idea or text, I often engage in a practice I call "encitation". In order to more thoroughly understand and ingrain a text's intellectual content, I try (temporarily) to view all other ideas and arguments through its lens. This is why when I was reading Whipping Girl I was citing it left and right, just as when I was reading Killing Rage I quoted it incessantly. To understand structuralism, I embraced the structuralist theory and viewed all things in structuralist terms; to understand functionalism, or Marxism, or Freudianism, or performativity, I did the same. Of course, every framework is incomplete and emphasizes certain things to the exclusion of observing others; so viewing the world entirely from within any single framework distorts your perception of reality. The point of the exercise is not to embrace the framework per se, it's to roleplay the embracing of it. The point of this roleplay is to come to understand the emphases and limitations of the framework— not abstractly but specifically. This is especially important for trying to understand frameworks you disagree with. When we disagree with things, the instinct is to discount everything they say. But it's intellectually dishonest to refuse to understand why you disagree. And it's counterproductive, since you cannot debunk the theory nor convince people to change their minds without knowing and addressing where they're coming from.

I engage in encitation not only for anthropological or philosophical ideas, I also do it for mathematical ideas. By trying to view all of mathematics through a particular idea or framework, you come to understand both what it's good at and what it cannot handle. That's one of the things I really love about the way Jason Eisner teaches NLP and declarative methods. While it's brutal to give people a framework (like PCFGs or SAT solving) and then ask them to solve a problem just barely outside of what that framework can handle, it gives you a deep understanding of exactly where and why the framework fails. This is the sort of knowledge you usually have to go out into industry and beat your head against for a while before you see it. But certain fields, like anthropology and writing, do try to teach encitation as a practice for improving oneself. I wonder how much of Jason's technique comes from his background in psychology. Regardless, this practice is one which should, imo, be used (and taught explicitly) more often in mathematics and computer science. A lot of the arguing over OO vs FP would go away if people did this. Instead, we only teach people hybridized approaches, and they fail to internalize the core philosophical goals of notions like objects, functions, types, and so on. These philosophical goals can be at odds, and even irreconcilable, but that does not make one or the other "wrong". The problem with teaching only hybridized approaches is that this irreconcilability means necessarily compromising on the full philosophical commitment to these goals. Without understanding the full philosophical goals of these different approaches, we cannot accurately discuss why sometimes one philosophy is more expedient or practical than another, and yet why that philosophy is not universally superior to others.

The thing to watch out for, whether engaging in the roleplay of encitation or giving citations for actual work, is when you start reciting quotes and texts like catechisms. Once things become a reflexive response, that's a sign that you are no longer thinking. Mantras may be good for meditation, but they are not good critical praxis. This is, no doubt, what Aoife is referring to when she castigates playing Serano says. This is also why it's so dangerous to engage with standardized narratives. The more people engage in recitations of The Narrative, the more it becomes conventionalized and stripped of whatever humanity it may once have had. Moreover, reiterating The Narrative to everyone you meet is the surest way to drive off anyone who doesn't believe in that narrative, or who believes the content but disagrees with the message. Even if I was "born this way", saying so doesn't make it any more true or any more acceptable to those who would like Jesus to save me from myself. More to the point, saying so places undue emphasis on one very tiny aspect of the whole. I'd much rather convince people of the violent nature of gender enculturation, and get them to recognize the psychological damage that abuse causes, than get them to believe that transgender has a natal origin.

As time goes on, we ask different questions. Consequently, we end up discarding old theories and embracing new ones when the old theory cannot handle our new questions. In our tireless pursuit of the "truth", educators are often reticent to teach defunct theories because we "know" they are "wrong". The new theory is "superior" in being able to address our new questions, but we often lose track of the crucial insights of the old theory along the way. For this reason, it's often important to revive old theories in order to re-highlight those insights and to refocus on old questions which may have become relevant once more. In a way, this revitalization is similar to encitation: the goal is not to say that the old theory is "right", the goal is to understand what the theory is saying and why it's important to say those things.

But again, one must be careful. When new theories arise, practitioners of the immediately-old theory often try to derail the asking of new questions by overemphasizing the questions which gave rise to the preceding theory. This attempt to keep moribund theories on life support often fuels generational divides: the new theoreticians cannot admit to any positives of the old theory lest they undermine their own work, while the old theoreticians feel like they must defend their work against the unrelenting tide lest it be lost forever. I think this is part of why radfems have been spewing such vitriol lately. The theoretical framework of radical feminism has always excluded and marginalized trans women, sex workers, and countless others; but the framework does not justify doxxing, stalking, and harassing those women who dare refute the tenets of The Doctrine. This reactionary violence bears a striking resemblance to the violence of religious fundamentalists1. And as with the religious fundamentalists, I think the reactionary violence of radfems stems from living in a world they can no longer relate to or make sense of.

Major changes in mathematics often result in similar conflicts, though they are seldom so violent. The embracing/rejection of constructivism as a successor to classical mathematics. The embracing/rejection of category theory as an alternative to ZFC set theory. Both of these are radical changes to the philosophical foundations of mathematical thought, and both of these are highly politicized, with advocates on both sides who refuse to hear what the other side is saying. Bob Harper's ranting and railing against Haskell and lazy evaluation is much the same. Yes, having simple cost models and allowing benign side effects is important; but so is having simple semantic models and referential transparency. From where we stand now, those philosophical goals seem to be at odds. But before we can make any progress on reconciling them, we must be willing to embrace both positions long enough to understand their crucial insights and to objectively recognize where and how both fail.


[1] To be clear: I do not draw this analogy as a way of insulting radfems; only to try and make sense of their behavior. There are many religious people (even among those who follow literalist interpretations of their religious texts) who are not terrorists; so too, there are women who believe in the radfem ideology and don't support the behavior of TERFs, SWERFs, etc. It is important to recognize both halves of each community in order to make sense of either side's reactions; and it's important to try to understand the mechanism that leads to these sorts of splits. But exploring this analogy any further is off-topic for this post. Perhaps another time.

winterkoninkje: shadowcrane (clean) (Default)

Meanwhile, back in math land... A couple-few months ago I was doing some work on apartness relations. In particular, I was looking into foundational issues, and into what an apartness-based (rather than equality-based) dependently-typed programming language would look like. Unfortunately, too many folks think "constructive mathematics" only means BHK-style intuitionistic logic— whereas constructive mathematics includes all sorts of other concepts, and they really should be better known!

So I started writing a preamble post, introducing the basic definitions and ideas behind apartnesses, and... well, I kinda got carried away. Instead of a blog post I kinda ended up with a short chapter. And then, well, panic struck. In the interests of Publish Ever, Publish Often, I thought I might as well share it: a brief introduction to apartness relations. As with my blog posts, I'm releasing it under Creative Commons Attribution-NonCommercial-NoDerivs 4.0; so feel free to share it and use it for classes. But, unlike the other columbicubiculomania files, it is not ShareAlike— since I may actually turn it into a published chapter someday. So do respect that. And if you have a book that needs some chapters on apartness relations, get in touch!

The intro goes a little something like this:


We often talk about values being "the same as" or "different from" one another. But how can we formalize these notions? In particular, how should we do so in a constructive setting?

Constructively, we lack a general axiom for double-negation elimination; therefore, every primitive notion gives rise to both strong (strictly positive) and weak (doubly-negated) propositions. Thus, from the denial of (weak) difference we can only conclude weak sameness. Consequently, in the constructive setting it is often desirable to take difference to be a primitive— so that, from the denial of strong difference we can in fact conclude strong sameness.

This ability to "un-negate" sameness is the principal reason for taking difference to be one of our primitive notions. While nice in and of itself, it also causes the strong and weak notions of sameness to become logically equivalent (thm 1.4), enabling us to drop the qualifiers when discussing sameness.

But if not being different is enough to be considered the same, then do we still need sameness to be primitive? To simplify our reasoning, we may wish to have sameness be defined as the lack of difference. However, this is not without complications. Sameness has been considered a natural primitive for so long that it has accrued many additional non-propositional properties (e.g., the substitution principle). So, if we eliminate the propositional notion of primitive equality, we will need somewhere else to hang those coats.

The rest of the paper fleshes out these various ideas.
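
For readers who haven't seen apartness relations before, here is the standard definition, summarized from general background rather than excerpted from the chapter. A binary relation # is an apartness relation when it is irreflexive, symmetric, and cotransitive; it is tight when the denial of apartness implies equality:

¬(x # x)                          (irreflexivity)
x # y → y # x                     (symmetry)
x # y → ∀z. (x # z) ∨ (z # y)     (cotransitivity)
¬(x # y) → x = y                  (tightness, for a tight apartness)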

winterkoninkje: shadowcrane (clean) (Default)

A followup to my previous [reddit version]:

The examples are of limited utility. The problem is not a few bad apples or a few bad words; were that the case it would be easier to address. The problem is a subtle one: it's in the tone and tenor of conversation, it's in the things not talked about, in the implicitization of assumptions, and in a decentering of the sorts of communities of engagement that Haskell was founded on.

Back in 2003 and 2005, communities like Haskell Cafe were communities of praxis. That is, we gathered because we do Haskell, and our gathering was a way to meet others who do Haskell. Our discussions were centered on this praxis and on how we could improve our own doing of Haskell. Naturally, as a place of learning it was also a place of teaching— but teaching was never the goal, teaching was a necessary means to the end of improving our own understandings of being lazy with class. The assumptions implicit in the community at the time were that Haskell was a path to explore, and an obscure one at that. It is not The Way™ by any stretch of the imagination. And being a small community it was easy to know every person in it, to converse as you would with a friend not as you would online.

Over time the tone and nature of the Cafe changed considerably. It's hard to explain the shift without overly praising the way things were before or overly condemning the shift. Whereas the Cafe used to be a place for people to encounter one another on their solitary journeys, in time it became less of a resting stop (or dare I say: cafe) and more of a meeting hall. No longer a place to meet those who do Haskell, but rather a place for a certain communal doing of Haskell. I single the Cafe out only because I have the longest history with that community, but the same overall shift has occurred everywhere I've seen. Whereas previously it was a community of praxis, now it is more a community of educationalism. In the public spaces there is more teaching of Haskell than doing of it. There's nothing wrong with teaching, but when teaching becomes the thing-being-done rather than a means to an end, it twists the message. It's no longer people asking for help and receiving personal guidance, it's offering up half-baked monad tutorials to the faceless masses. And from tutorialization it's a very short path to proselytizing and evangelizing. And this weaponization of knowledge always serves to marginalize and exclude very specific voices from the community.

One class of voices being excluded is women. To see an example of this, consider the response to Doaitse Swierstra's comment at the 2012 Haskell Symposium. Stop thinking about the comment. The comment is not the point. The point is, once the problematic nature of the comment was raised, how did the community respond? If you want a specific example, this is it. The example is not in what Swierstra said, the example is in how the Haskell community responded to being called out. If you don't recall how this went down, here's the reddit version; though it's worth pointing out that there were many other conversations outside of reddit. A very small number of people acquitted themselves well. A handful of people knew how to speak the party line but flubbed it by mansplaining, engaging in flamewars, or allowing the conversation to be derailed. And a great many people were showing their asses all over the place. Now I want you to go through and read every single comment there, including the ones below threshold. I want you to read those comments and imagine that this is not an academic debate. Imagine that this is your life. Imagine that you are the unnamed party under discussion. That your feelings are the ones everyone thinks they know so much about. That you personally are the one each commenter is accusing of overreacting. Imagine that you are a woman, that you are walking down the street in the middle of the night in an unfamiliar town after a long day of talks. It was raining earlier so the streets are wet. You're probably wearing flats, but your feet still hurt. You're tired. Perhaps you had a drink over dinner with other conference-goers, or perhaps not. Reading each comment, before going on to the next one, stop and ask yourself: would you feel safe if this commenter decided to follow you home on that darkened street? Do you feel like this person can comprehend that you are a human being on that wet street? Do you trust this person's intentions in being around you late at night? And ask yourself, when some other commenter on that thread follows you home at night and rapes you in the hotel, do you feel safe going to the comment's author to tell them what happened? Because none of this is academic. As a woman you go to conferences and this is how you are treated. And the metric of whether you can be around someone is not whether they seem interesting or smart or anything else, the metric is: do you feel safe? If you can understand anything about what this is like, then reading that thread will make you extremely uncomfortable. The problem is not that some person makes a comment. The problem is that masculinized communities are not safe for women. The problem is that certain modes of interaction are actively hostile to certain participants. The problem is finding yourself in an uncomfortable situation and knowing that noone has your back. Knowing that anyone who agrees with you will remain silent because they do not think you are worth the time and energy to bother supporting. Because that's what silence says. Silence says you are not worth it. Silence says you are not one of us. Silence says I do not think you are entirely human. And for all the upvotes and all the conversation my previous comment has sparked on twitter, irc, and elsewhere, I sure don't hear anyone here speaking up to say they got my back.

This is not a problem about women in Haskell. Women are just the go-to example, the example cis het middle-class educated able white men are used to engaging. Countless voices are excluded by the current atmosphere in Haskell communities. I know they are excluded because I personally watched them walk out the door after incidents like the one above, and I've been watching them leave for a decade. I'm in various communities for queer programmers, and many of the folks there use Haskell but none of them will come within ten feet of "official" Haskell communities. That aversion is even stronger in the transgender/genderqueer community. I personally know at least a dozen trans Haskellers, but I'm the only one who participates in the "official" Haskell community. Last fall I got hatemail from Haskellers for bringing up the violence against trans women of color on my blog, since that blog is syndicated to Planet Haskell. Again, when I brought this up, people would express their dismay in private conversations, but noone would say a damn thing in public nor even acknowledge that I had spoken. Ours has never been a great community for people of color, and when I talk to POC about Haskell I do not even consider directing them to the "official" channels. When Ken Shan gave the program chair report at the Haskell symposium last year, there was a similarly unwholesome response as with Swierstra's comment the year before. A number of people have shared their experiences in response to Ken's call, but overwhelmingly people feel like their stories of being marginalized and excluded "don't count" or "aren't enough to mention". Stop. Think about that. A lot of people are coming forward to talk about how they've been made to feel uncomfortable, and while telling those stories they feel the need to qualify. While actively explaining their own experiences of racism, sexism, heterosexism, cissexism, ablism, sanism, etc, they feel the simultaneous need to point out that these experiences are not out of the ordinary. Experiencing bigotry is so within the ordinary that people feel like they're being a bother to even mention it. This is what I'm talking about. This is what I mean when I say that there is a growing miasma in our community. This is how racism and sexism and ablism work. It's not smacking someone on the ass or using the N-word. It's a pervasive and insidious tone in the conversation, a thousand and one not-so-subtle clues about who gets to be included and who doesn't. And yes the sexual assaults and slurs and all that factor in, but that's the marzipan on top of the cake. The cake is made out of assuming someone who dresses "like a rapper" can't be a hacker. The cake is made out of assuming that "mother" and "professional" are exclusive categories. The cake is made out of well-actuallys and feigned surprise. And it works this way because this is how it avoids being called into question. So when you ask for specific examples you're missing the point. I can give examples, but doing so only contributes to the errant belief that bigotry happens in moments. Bigotry is not a moment. Bigotry is a sustained state of being that permeates one's actions and how one forms and engages with community. 
So knowing about that hatemail, or knowing about when I had to call someone out for sharing titty pictures on Haskell Cafe, or knowing about the formation of #nothaskell, or knowing about how tepid the response to Tim's article or Ken's report were, knowing about none of these specifics helps to engage with the actual problem.

winterkoninkje: shadowcrane (clean) (Default)

Gershom Bazerman gave some excellent advice for activism and teaching. His focus was on teaching Haskell and advocating for Haskell, but the advice is much more widely applicable and I recommend it to anyone interested in activism, social justice, or education. The piece has garnered a good deal of support on reddit— but, some people have expressed their impression that Gershom's advice is targeting a theoretical or future problem, rather than a very concrete and very contemporary one. I gave a reply there about how this is indeed a very real issue, not a wispy one out there in the distance. However, I know that a lot of people like me —i.e., the people who bear the brunt of these problems— tend to avoid reddit because it is an unsafe place for us, and I think my point is deserving of a wider audience. So I've decided to repeat it here:

This is a very real and current problem. (Regardless of whether things are less bad in Haskell communities than in other programming communities.) I used to devote a lot of energy towards teaching folks online about the ideas behind Haskell. However, over time, I've become disinclined to do so as these issues have become more prevalent. I used to commend Haskell communities for offering a safe and welcoming space, until I stopped feeling quite so safe and welcomed myself.

I do not say this to shame anyone here. I say it as an observation about why I have found myself pulling away from the Haskell community over time. It is not a deliberate act, but it is fact all the same. The thing is, if someone like me —who supports the ideology which gave rise to Haskell, who is well-educated on the issues at hand, who uses Haskell professionally, who teaches Haskell professionally, and most importantly: who takes joy in fostering understanding and in building communities— if someone like me starts instinctively pulling away, that's a problem.

There are few specific instances where I was made to feel unsafe directly, but for years there has been a growing ambiance which lets me know that I am not welcome, that I am not seen as being part of the audience. The ambiance (or should I say miasma?) is one that pervades most computer science and programming/tech communities, and things like dogmatic activism, dragon slaying, smarter-than-thou "teaching", anti-intellectualism, hyper-intellectualism, and talking over the people asking questions, are all just examples of the overarching problem of elitism and exclusion. The problem is not that I personally do not feel as welcomed as I once did, the problem is that many people do not feel welcome. The problem is not that my experience and expertise are too valuable to lose, it's that everyone's experience and expertise is too valuable to lose. The problem is not that I can't teach people anymore, it's that people need teachers and mentors and guides. And when the tenor of conversation causes mentors and guides to pull away, causes the silencing of experience and expertise, causes the exclusion and expulsion of large swaths of people, that always has an extremely detrimental impact on the community.

New skin

31 Jul 2014 05:41 pm
winterkoninkje: shadowcrane (clean) (Default)

Hello all,

I just changed the theme/skin for my blog and have been playing around with new fonts, css, handling of footnotes, etc. Let me know what you think and whether you run into any issues (especially on older posts). It's been years since I've done webdev, long before CSS3 and HTML5, so things are a bit different than I'm used to.

In other news, I haven't heard back from the Haskell Planet admins about getting the feed switched over to Haskell/coding/math content only. So, if you've been annoyed by the OT, sorry bout that!

winterkoninkje: shadowcrane (clean) (Default)

Hello all,

This is just a short blurb to announce that I'm changing it so that Haskell Planet only picks up on posts tagged with haskell planet. For the next while here, I'll probably be writing about social justice, gender theory, and personal topics more often than Haskell and theoretical mathematics; so I don't want to flood the Planet with too much off-topic material.

Worry not, the pendulum will swing back again— it always does. But lately, this is where my thoughts have been focused.

winterkoninkje: shadowcrane (clean) (Default)

This summer I've been working on optimizing compilation for a linear algebra DSL. This is an extension of Jeremy Siek's work on Built-to-Order BLAS functions. Oftentimes it's more efficient to have a specialized function which fuses two or more BLAS functions. The idea behind BTO is that we'd like to specify these functions at a high level (i.e., with linear algebra expressions) and then automatically perform the optimizing transformations which have made BLAS such a central component of linear algebra computations.

The current/prior version of BTO already handles loop fusion, memory bandwidth constraints, and more. However, it is not currently aware of any high-level algebraic laws such as the fact that matrix multiplication is associative, addition is associative and commutative, transposition reverse-distributes over multiplication, etc. My goal is to make it aware of these sorts of things.

Along the way, one thing to do is solve the chain multiplication problem: given an expression like ∏[x1,x2...xN], figure out the most efficient associativity for implementing it via binary multiplication. The standard solution is to use a CKY-like dynamic programming algorithm to construct a tree covering the sequence [x1,x2...xN]. This is easy to implement, but it takes O(n^3) time and O(n^2) space.
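
For reference, here is a minimal Haskell sketch of that standard dynamic program. It is my own illustration, not BTO's implementation and not the Hu–Shing algorithm discussed below, and the names are made up.

import Data.Array

-- Matrix i has dimensions dims!!(i-1) × dims!!i, for i in [1..n];
-- we assume at least one matrix. chainCost returns the minimal number
-- of scalar multiplications needed to compute the whole product.
chainCost :: [Int] -> Int
chainCost dims = table ! (1, n)
  where
    n     = length dims - 1
    d     = listArray (0, n) dims :: Array Int Int
    -- table!(i,j) is the cheapest way to multiply out the chain x_i..x_j;
    -- laziness fills in the entries in dependency order.
    table = array ((1,1), (n,n))
        [ ((i,j), cost i j) | i <- [1..n], j <- [i..n] ]
    cost i j
        | i == j    = 0
        | otherwise = minimum
            [ table!(i,k) + table!(k+1,j) + d!(i-1) * d!k * d!j
            | k <- [i .. j-1] ]

For example, chainCost [10,30,5,60] is 4500, corresponding to the association (x1 x2) x3; recovering the tree itself just means also recording the k that achieves each minimum.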

I found a delicious alternative algorithm which solves the problem in O(n*log n) time and O(n) space! The key to this algorithm is to view the problem as determining a triangulation of convex polygons. That is, we can view [x0,x1,x2...xN] as the edges of a convex polygon, where x0 is the result of computing ∏[x1,x2...xN]. This amazing algorithm is described in the tech report by Hu and Shing (1981a), which includes a reference implementation in Pascal. Unfortunately the TR contains a number of typos and typesetting issues, but it's still pretty legible. A cleaner version of Part I is available here. And pay-walled presumably-cleaner versions of Part I and Part II are available from SIAM.

Hu and Shing (1981b) also have an algorithm which is simpler to implement and returns a heuristic answer in O(n) time, with the error ratio bounded by 15%. So if compile times are more important than running times, then you can use this version as well. A pay-walled version of the article is available from Elsevier.

winterkoninkje: shadowcrane (clean) (Default)

Katie Miller is giving a talk about FP outreach and diversity at next month's Erlang User Conference. She sent a questionnaire to the Lambda Ladies mailing list about our experiences, and I thought I'd share my responses here.

What led you to pursue functional programming?

Curiosity. I was in grad school, working towards a masters in computer science. And I was particularly interested in programming languages, though I was only familiar with imperative and OO languages at that point. I saw a course on functional programming, so I signed up. Best CS decision I ever made.

What sort of perception did you have of functional programming before you learnt more about it? Was this a barrier? If so, how did you overcome this?

All I knew at the time was that it was some sort of paradigm completely different from imperative and OO. That's it really. This was before FP started becoming popular; so, long before Clojure or Scala were invented, and long before C++ considered adding lambdas/closures to the language. Even within the FP community, Haskell was still considered the new kid on the block (despite having been around for quite some time).

What were the challenges for becoming part of the FP community?

The main challenge was just in figuring out where the community was and how to take part. As I said, this was long before FP became popular. My first FP language was Haskell, but I'd learned it in that course on functional programming so I didn't really know what the community was like. It was a year or two after taking the class that I decided to start really using Haskell for projects. At the time I was taking part in the Perl community, so I thought I'd go searching for some Haskell mailing lists to join. That's when I found the firehose that is Haskell Cafe.

Why do you think women are underrepresented in FP, more so than in programming generally?

I think there are a number of reasons. One of the big ones is how academic the community is. I don't mean that in the way people usually do. I'm an academic, and I love it here! No, the problem is that this creates a huge selection bias. I only really found FP by stumbling into it, and I only stumbled into it because I had a number of supportive advisors who helped foster my interest in programming languages. By the point I found FP, many women would have already been filtered out. Just getting into and affording college is a huge thing, especially for women of color. Let alone making it through undergrad and then getting into a masters program in CS without a bachelor's in CS. Let alone ending up at a school that can offer good FP classes, and finding those supportive advisors to help you along and guide you in the right direction.

If my story is anything to go by, it takes a lot of privilege (and luck) just to get to the point where you discover FP. After that, then you add on all the issues about maintaining community involvement. Because the community is so academic, this heightens issues of impostor syndrome. (Even men are driven out of FP due to impostor syndrome!) And since FP tends to be sold in a hyper-intellectualized manner, this evokes the "math is hard" brand of anti-intellectualism. While this drives a lot of people away, I think it has a differentially powerful impact on women due to the way we gender the sciences. That is, FP propaganda has a habit of taking the things which cause women to be underrepresented in STEM generally, and then cranking them up to eleven.

Another issue, and one I haven't seen discussed very often, is the fact that many FP communities are also FOSS communities. Women are more underrepresented in FOSS than in other CS communities, so the fact that FP tends to be FOSS means that women will tend to be more underrepresented in FP than other CS communities.

What strategies do you think the community could employ to address this problem and improve the (gender, and other types of) diversity in FP?

Setting up communities which aren't so hyper-intellectualized is a big step. Getting rid of all that propaganda and just treating FP like any other paradigm will do a lot to mitigate the impact of impostor syndrome and "math is hard" anti-intellectualism. It's no panacea, but it's probably the easiest thing we can tackle. Addressing the systemic issues is a lot harder.

Do you think it helps to have a women's group like Lambda Ladies? How has it been helpful for you?

I do think it helps. Over the years I've seen a lot of women come and go (mostly go) on Haskell Cafe. Overall I feel like the Cafe is one of the safer and more welcoming communities, but we've still had our misogynistic flareups. And after each one, I've watched the subsequent evacuation as women no longer feel quite so safe or welcome. By offering a safe space, women's groups are an important form of resistance against this sort of problem. It's a space where you don't always have to be on your guard against harassment. It's a space where you don't have to worry about how you present yourself, don't have to worry that femininity will undermine your credibility, don't have to worry about how asking "stupid" questions will affect the way people think of women as a whole.

Also —we don't really do this on LL, but— women's groups can provide a safe environment for venting about the sorts of problems we encounter online, in the work force, etc. Venting is always a tricky topic, but I think the importance of venting is grossly underrated. Whatever community you're a part of, bad things are going to come up sooner or later. When that happens, having a side community where you can let off steam or discuss why the particular thing is problematic is an important way to deal with the emotional fallout of these bad things. Once you've dealt with it, you can return to the main community; but if you have nowhere to deal with it, then things build up and up until you're just done and you quit the community.

In addition to providing a safe space, women's groups also serve an important role regarding announcements for jobs, conferences, etc. The announcements we get are tailored for women and so include important details like how welcoming they are of women, whether they can offer travel expenses, whether they offer child care, and so on.

For me, LL has been helpful mainly as a place to witness women in FP. Just seeing other women is energizing, and keeps me interested in being out there as part of the general FP community. The bit about announcements has also been helpful.

winterkoninkje: shadowcrane (clean) (Default)

I ended up spending this weekend learning a lot about topos theory. Why? I have no idea. But for whatever reason, this happened. Like many areas of category theory, the biggest hurdle is figuring out the terminology— in particular, figuring out what terminology you actually need to know now vs the terminology you can put off learning until later.

So, with that in mind, I threw together a quick sketch of topos/logos theory. I should emphasize that this is only a very quick sketch. It shows when one kind of category is also another kind of category, but for the most part[1] it doesn't say what the additional characteristics of the subkind are (unlike in my other terminology maps). One major thing missing from this sketch is a notation for when one kind of category is exactly the intersection of two other kinds (e.g., a pretopos is precisely an extensive exact category). Once I figure out how to show that in PSTricks, I'll post a new version. As before, the dashed lines indicate things you get for free (e.g., every pretopos is coherent for free). The dotted line from Heyting categories to well-powered geometric categories is meant to indicate that, technically, all WPG categories are also Heyting categories, but the Heyting structure is not preserved by geometric functors and therefore should not be "relied on" when reasoning about WPG categories as a whole. And finally, the table at the bottom shows what operators exist in the internal logic of these various categories, as well as what structure the subobject posets have.

Despite being far less polished than my other maps, hopefully it'll give you enough to go on so that you can read through the pages at n-lab in the correct order.

[1] For the arcs which are explained, the following abbreviations are used: "So.C." = subobject classifier; "AMC" = axiom of multiple choice; "NNO" = natural numbers object; "LCCC" = locally cartesian closed.

winterkoninkje: shadowcrane (clean) (Default)

I've been mucking about with real analysis lately. One of the things I've always hated about real analysis (and complex analysis, and other calculus) is the unrepentant lack of types. It's easy to get confused about what's going on and what goes where when you think everything is a real number (or complex number, or tensors thereof). In particular, because of this "everything is a real array" assumption, libraries for performing optimization etc end up with utterly inscrutable APIs and other nastiness like needing to manually flatten your parameters into a single array. So this has got me thinking, what should the API for optimization libraries look like? Which in turn brings me back to an age-old conundrum I've toyed around with before, namely: what is the type of the derivative operator?

From the "everything is a real number" perspective we'd say deriv : (ℝ ↝ ℝ) → ℝ → ℝ, where the A ↝ B function space are "nice" functions (e.g., continuous, smooth, analytic, whatever). However, this only works out because we're conflating a number of important differences. For example, for any affine space A we have the difference space ∆A— that is, values in ∆A denote differences or differentials of values in A (e.g., for any a : A and b : A we have a-b : ∆A). However, it turns out that there's a bijection between and ∆ℝ. It also turns out that the difference of arrays, ∆(ℝn), is isomorphic to an array of differences, (∆ℝ)n. Putting these together we get the common conflation between points (n) and vectors (∆(ℝn)). For another example, consider linear transformations, written A ⊸ B. We have a bijection between linear transformations (m ⊸ ℝn) and matrices (n×m), and hence more specifically a bijection between and ℝ ⊸ ℝ.

So here's what I'm currently thinking:

deriv : (F X ↝ Y) → F X → F(∆X ⊸ ∆Y)

For now, just ignore F; instantiate it as the identity functor. Thus, the total derivative of f : X ↝ Y at the point x : X is a linear transformation from minor perturbations about x to minor perturbations about f x. We can think of ∆X ⊸ ∆Y as being the "slope" of this mapping between minor perturbations. In particular, when X=ℝ and Y=ℝ, we get a bijection between ∆ℝ ⊸ ∆ℝ and ℝ, so this coincides with the traditional notion of the slope of a one-dimensional curve.
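
To make that concrete, here's a minimal Haskell sketch of the signature for the special case F = Identity and X = Y = ℝ (modelled as Double), approximating the derivative with a central finite difference. Delta, Lin, and derivApprox are names I've made up for illustration; they aren't from any existing library.

import Data.Functor.Identity (Identity(..))

newtype Delta a = Delta a                   -- a difference/differential of an a
newtype Lin a b = Lin (Delta a -> Delta b)  -- a stand-in for ∆a ⊸ ∆b

-- deriv : (F X ↝ Y) → F X → F(∆X ⊸ ∆Y), specialised and approximated:
derivApprox :: (Identity Double -> Double)
            -> Identity Double
            -> Identity (Lin Double Double)
derivApprox f (Identity x) =
    Identity (Lin (\(Delta dx) -> Delta (slope * dx)))
  where
    h     = 1e-6
    slope = (f (Identity (x + h)) - f (Identity (x - h))) / (2 * h)

For example, derivApprox (\(Identity x) -> x * x) (Identity 3) returns (up to rounding) the linear map that scales a perturbation by 6, matching the usual one-dimensional slope.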

Now, let's think about the gradient rather than the total derivative. Assume that F is a "nice" functor (i.e., representable, traversable, etc). Because the functor is representable, we have an isomorphism between F A and I → A (natural in A), where I is the type of positions/indices in F. Thus, the gradient of a function g : F X ↝ Y at the point z : F X is essentially a function from F-indices, i : I, to the i-th partial derivative of g at z. Why partial derivative? Because the linear transformation only takes ∆X as an argument —not ∆(F X)—, thus we can only modify a single "scalar" at a time, and the other scalars in z must remain fixed.
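
Here's a rough sketch of that reading for a tiny hand-rolled representable functor (two components, indexed by Bool), collapsing each ∆ℝ ⊸ ∆ℝ down to the scalar slope that represents it. V2, gradApprox, and the helpers are invented names, and the finite-difference approximation is only for illustration.

data V2 a = V2 a a deriving Show

-- "Representable": V2 a ≅ (Bool -> a), with Bool playing the role of I.
indexV2 :: V2 a -> Bool -> a
indexV2 (V2 x _) False = x
indexV2 (V2 _ y) True  = y

tabulateV2 :: (Bool -> a) -> V2 a
tabulateV2 f = V2 (f False) (f True)

-- The gradient as a function from indices to partial derivatives: perturb
-- only the scalar at position i, holding the other scalars in z fixed.
gradApprox :: (V2 Double -> Double) -> V2 Double -> V2 Double
gradApprox g z = tabulateV2 partial
  where
    h = 1e-6
    partial i = (g (bump i h) - g (bump i (-h))) / (2 * h)
    bump i d  = tabulateV2 (\j -> indexV2 z j + if j == i then d else 0)

For instance, gradApprox (\(V2 x y) -> x * x * y) (V2 2 3) is approximately V2 12 4, i.e., the two partial derivatives at (2,3).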

Edit 2014.02.17 (#1): Actually, things are a bit more complicated than the above. For example, consider a function f : ℝ^n ↝ ℝ. The derivative of f with respect to its vector of inputs is not a vector of partial derivatives— rather, it's a covector of partial derivatives! (i.e., deriv (f : ℝ^(n×1) ↝ ℝ) : ℝ^(n×1) → ℝ^(1×n).) Thus, we should really have that the return type of deriv is some F^op(∆X ⊸ ∆Y) where F^op is some sort of "dual" of F. It's not clear a priori which notion of "duality" is in play here, however.

Edit 2014.02.17 (#2): And here's what I'm currently thinking about how to incorporate the Jacobian into the above. Consider this particular instance of the derivative operator, deriv : (F X ↝ G Y) → F X → F^op(∆X ⊸ ∆(G Y)). When G is some type of vectors (e.g., G Y = Y^n for some fixed n), we get ∆(G Y) = G(∆Y) as mentioned before. And let's assume ∆X ⊸ G(∆Y) = G(∆X ⊸ ∆Y); presumably this follows immediately by linearity, though I haven't verified that yet. And finally, in our standard real analysis we get that G∘F^op and F^op∘G are the same— that is, vectors-of-covectors and covectors-of-vectors are essentially the same thing; they're just the row-major and column-major representations of matrices, respectively. And then, putting all these together we get that the original return type F^op(∆X ⊸ ∆(G Y)) is isomorphic to G(F^op(∆X ⊸ ∆Y)). So when X=ℝ and Y=ℝ we get the expected deriv : (ℝ^m ↝ ℝ^n) → ℝ^m → ℝ^(n×m). Now, it remains to show how this extends to other choices of F and G...
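
As a throwaway rendering of that last signature, here's a finite-difference Jacobian with ℝ^m represented as [Double] and the ℝ^(n×m) result as a list of n rows (covectors of partials). jacobianApprox is a made-up name for illustration, not anyone's actual API.

jacobianApprox :: ([Double] -> [Double]) -> [Double] -> [[Double]]
jacobianApprox f x =
    [ [ (fPlus !! i - fMinus !! i) / (2 * h)
      | (fPlus, fMinus) <- columns ]
    | i <- [0 .. n - 1] ]
  where
    h = 1e-6
    n = length (f x)
    m = length x
    -- One column per input position j: f evaluated at x ± h·e_j.
    columns  = [ (f (bump j h), f (bump j (-h))) | j <- [0 .. m - 1] ]
    bump j d = [ if k == j then xk + d else xk | (k, xk) <- zip [0 ..] x ]

For example, jacobianApprox (\[x, y] -> [x * y, x + y]) [2, 3] is approximately [[3, 2], [1, 1]].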

Edit 2014.02.17 (#3): It was pointed out on reddit that the above is "well known"— at least for those who are familiar with differentiable manifolds and tangent spaces. Now, I don't know much about manifolds, but it certainly wasn't well-known to me— which, in itself, says a lot about how little this knowledge has spread to the rest of mathematics. I'm glad to hear there's some followup reading to be done here, but I still don't think the necessary work has been done in terms of turning this into a decent API for optimization libraries.

winterkoninkje: shadowcrane (clean) (Default)

So there was a discussion recently on the libraries mailing list about how to deal with MonadPlus. In particular, the following purported law fails all over the place: x >> mzero = mzero. The reason it fails is that we are essentially assuming that any "effects" that x has can be undone once we realize the whole computation is supposed to "fail". Indeed this rule is too strong to make sense for the general idea that MonadPlus provides a notion of choice or addition. I propose that the correct notion for MonadPlus to capture is that of a right-seminearring. (The name right-nearsemiring is also used in the literature.) Below I explain what the heck a (right-)seminearring is.

Monoids

First, I will assume you know what a monoid is. In particular, it's any associative binary operation with a distinguished element which serves as both left- and right-identity for the binary operation. These are ubiquitous and have become fairly well-known in the Haskell community of late. A prime example is (+,0) —that is, addition together with the zero element; for just about any notion of "numbers". Another prime example is (*,1)— multiplication together with unit; again, for just about any notion of "numbers".
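
For concreteness, those two examples are exactly the standard Sum and Product wrappers from Data.Monoid; this snippet is mine, just as a reminder of the Haskell rendering.

import Data.Monoid (Sum(..), Product(..), (<>))

-- (+,0) and (*,1) as Monoid instances over Int:
five :: Sum Int
five = Sum 2 <> Sum 3 <> mempty          -- Sum {getSum = 5}

six :: Product Int
six = Product 2 <> Product 3 <> mempty   -- Product {getProduct = 6}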

An important caveat regarding intuitions is that: both "addition" and "multiplication" of our usual notions of numbers turn out to be commutative monoids. For the non-commutative case, let's turn to regular expressions (regexes). First we have the notion of regex catenation, which captures the notion of sequencing: first we match one regex and then another; let's write this as (*,1) where here we take 1 to mean the regex which matches only the empty string. This catenation of strings is very different from multiplication of numbers because we can't swap things around. The regex a*b will first match a and then match b; whereas the regex b*a will match b first. Nevertheless, catenation (of strings/sequences/regexes/graphs/...) together with the empty element still forms a monoid because catenation is associative and catenating the empty element does nothing, no matter which side you catenate on.

Importantly, the non-deterministic choice for regexes also forms a monoid: (+,0) where we take 0 to be the absurd element. Notably, the empty element (e.g., the singleton set of strings, containing only the empty string) is distinct from the absurd element (e.g., the empty set of strings). We often spell 1 as ε and spell 0 as ∅; but I'm going to stick with the arithmetical notation of 1 and 0.
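
To make the two regex monoids concrete, here's a toy Haskell sketch that models a "regex" as nothing more than the set of strings it matches. Lang, cat, one, plus, and zero are names I've made up for this illustration.

import Data.List (inits, tails)

newtype Lang = Lang { matches :: String -> Bool }

-- The catenation monoid (*,1): 1 matches only the empty string.
one :: Lang
one = Lang null

cat :: Lang -> Lang -> Lang
cat (Lang p) (Lang q) =
    Lang (\s -> or [ p u && q v | (u, v) <- zip (inits s) (tails s) ])

-- The choice monoid (+,0): 0 is the absurd language, matching nothing.
zero :: Lang
zero = Lang (const False)

plus :: Lang -> Lang -> Lang
plus (Lang p) (Lang q) = Lang (\s -> p s || q s)

Catenation with one and choice with zero each satisfy the monoid laws here, but only choice happens to be commutative.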

Seminearrings

Okay, so what the heck is a right-seminearring? First, we assume some ambient set of elements. They could be "numbers" or "strings" or "graphs" or whatever; but we'll just call them elements. Second, we assume we have a semigroup (*)— that is, our * operator is associative, and that's it. Semigroups are just monoids without the identity element. In our particular case, we're going to assume that * is non-commutative. Thus, it's going to work like catenation— except we don't necessarily have an empty element to work with. Third, we assume we have some monoid (+,0). Our + operator is going to act like non-deterministic choice in regexes— but, we're not going to assume that it's commutative! That is, while it represents "choice", it's some sort of biased choice. Maybe we always try the left option first; or maybe we always try the right option first; or maybe we flip a biased coin and try the left option first only 80% of the time; whatever, the point is it's not entirely non-deterministic, so we can't simply flip our additions around. Finally, we require that our (*) semigroup distributes from the right over our (+,0) monoid (or conversely, that we can factor the monoid out from under the semigroup, again only factoring out parts that are on the right). That is, symbolically, we require the following two laws to hold:

0*x = 0
(x+y)*z = (x*z)+(y*z)

So, what have we done here? Well, we have these two interlocking operations where "catenation" distributes over "choice". What the first law means is: (1) if we first do something absurd or impossible and then do x, well that's impossible. We'll never get around to doing x so we might as well just drop that part. The second law means: (2) if we first have a choice between x and y and then catenate whichever one we chose with z, this is the same as saying our choice is really between doing x followed by z vs doing y followed by z.
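
If it helps, here's one way the definition might be rendered as a Haskell class. This is purely a sketch of my own, not a class from any library, and the laws of course live outside the type system.

-- A hypothetical class for right-seminearrings; the operator names are mine.
-- Laws (unchecked by the compiler):
--   (*.) is associative; ((+.), zero) is a monoid;
--   zero *. x     = zero
--   (x +. y) *. z = (x *. z) +. (y *. z)
class RightSeminearring a where
  (*.) :: a -> a -> a   -- catenation/sequencing (a semigroup)
  (+.) :: a -> a -> a   -- biased choice
  zero :: a             -- identity for (+.)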

MonadPlus

Okay, so what does any of this have to do with MonadPlus? Intuitively, our * operator is performing catenation or sequencing of things. Monads are all about sequencing. So how about we use the monad operator (>>) as our "multiplication"! This does what we need it to since (>>) is associative, by the monad laws. In order to turn a monad into a MonadPlus we must define mplus (aka the + operator) and we must define mzero (aka the 0 element). And the laws our MonadPlus instance must uphold are just the two laws about distributing/factoring on the right. In restating them below, I'm going to generalize the laws to use (>>=) in lieu of (>>):

mzero >>= f = mzero
(x `mplus` y) >>= f = (x >>= f) `mplus` (y >>= f)

And the reasons why these laws make sense are just as described before. If we're going to "fail" or do something absurd followed by doing something else, well we'll never get around to that something else because we've already "failed". And if we first make a choice and then end up doing the same thing regardless of the choice we made, well we can just push that continuation down underneath the choice.
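
These aren't proofs, but as a quick sanity check here are the two laws phrased as QuickCheck properties for the list monad (the property names are mine):

import Control.Monad (mplus, mzero)
import Test.QuickCheck

-- Left-zero: mzero >>= f = mzero
prop_leftZero :: Fun Int [Int] -> Bool
prop_leftZero f = (mzero >>= applyFun f) == (mzero :: [Int])

-- Right-distribution: (x `mplus` y) >>= f = (x >>= f) `mplus` (y >>= f)
prop_rightDistrib :: [Int] -> [Int] -> Fun Int [Int] -> Bool
prop_rightDistrib x y f =
    ((x `mplus` y) >>= applyFun f) == ((x >>= applyFun f) `mplus` (y >>= applyFun f))

main :: IO ()
main = do
  quickCheck prop_leftZero
  quickCheck prop_rightDistrib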

Both of these laws make intuitive sense for what we want out of MonadPlus. And given that seminearrings are something which have shown up often enough to be named, it seems reasonable to assume that's the actual pattern we're trying to capture. The one sticking point I could see is my generalization to using (>>=). In the second law, we allow f to be a function which "looks inside" the monad, rather than simply being some fixed monadic value z. There's a chance that some current MonadPlus implementations will break this law because of that extra power. If so, then we can still back off to the weaker claim that MonadPlus should implement a right-seminearring exactly, i.e., with the (>>) operator as our notion of multiplication/catenation. I leave that as an exercise for the reader; it's discussed further in the addendum below.

Notably, from these laws it is impossible to derive x*0 = 0, aka x >> mzero = mzero. And indeed that is a stringent requirement to have, since it means we must be able to undo the "effects" of x, or else avoid doing those "effects" in the first place by looking into the future to know that we will eventually "fail". If we could look into the future to know we will fail, then we could implement backtracking search for logic programming in such a way that we always pick the right answer. Not just return results consistent with always choosing the right answer, which backtracking allows us to do; but rather, to always know the right answer beforehand and so never need to backtrack! If we satisfy the x*0 = 0 law, then we could perform all the "search" during compile time when we're applying the rewrite rule associated with this law.

Addendum

There's a long history of debate between proponents of the generalized distribution law I presented above and proponents of the so-called "catch" law. In particular, Maybe, IO, and STM obey the catch law but do not obey the generalized distribution law. To give an example, consider the following function:

f a' = if a == a' then mzero else return a'

Which is used in the following code and evaluation trace for the Maybe monad:

mplus (return a) b >>= f
⟶ Just a >>= f
⟶ f a
⟶ if a == a then mzero else return a
⟶ mzero

As opposed to the following code and evaluation trace:

mplus (return a >>= f) (b >>= f)
⟶ mplus (f a) (b >>= f)
⟶ mplus mzero (b >>= f)
⟶ b >>= f

But b >>= f is not guaranteed to be identical to mzero. The problem here is, as I suspected, that the generalized distribution law allows the continuation to "look inside". If we revert to the non-generalized distribution law, which uses (>>), then this problem goes away— at least for the Maybe monad.

Second Addendum (2014.02.06)

Even though Maybe satisfies the non-generalized distributivity laws, it's notable that other problematic MonadPlus instances like IO fail even there! For example:

First consider mplus a b >> (c >> mzero). Whenever a succeeds, we get that this is the same as a >> c >> mzero; and if a fails, then this is the same as a' >> b >> c >> mzero where a' is the prefix of a up until failure occurs.

Now instead consider mplus (a >> c >> mzero) (b >> c >> mzero). Here, if a succeeds, then this is the same as a >> c >> b >> c >> mzero; and if a fails, then it's the same as a' >> b >> c >> mzero. So the problem is, depending on whether we distribute or not, the effects of c will occur once or twice.
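
As a concrete illustration of that effect duplication, here's a little program of my own using IO's MonadPlus instance (where mzero throws an IOError and mplus acts as a handler); the trailing `mplus` return () in each line just swallows the final mzero so the program can continue:

import Control.Monad (mplus, mzero)

a, b, c :: IO ()
a = putStrLn "a"
b = putStrLn "b"
c = putStrLn "c"

main :: IO ()
main = do
  putStrLn "-- undistributed: mplus a b >> (c >> mzero)"
  (mplus a b >> (c >> mzero)) `mplus` return ()                  -- prints a, c
  putStrLn "-- distributed: mplus (a >> c >> mzero) (b >> c >> mzero)"
  (mplus (a >> c >> mzero) (b >> c >> mzero)) `mplus` return ()  -- prints a, c, b, c

Since putStrLn always succeeds, this only exercises the "a succeeds" branch, but that's already enough to see c's effect happen once in the first run and twice in the second.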

Notably, the problem we're running into here is exactly the same one we started out with, the failure of x >> mzero = mzero. Were this law to hold for IO (etc) then we wouldn't run into the problem of running c once or twice depending on distributivity.
