winterkoninkje: shadowcrane (clean) (Default)

So there was a discussion recently on the libraries mailing list about how to deal with MonadPlus. In particular, the following purported law fails all over the place: x >> mzero = mzero. The reason it fails is that we are essentially assuming that any "effects" that x has can be undone once we realize the whole computation is supposed to "fail". Indeed this rule is too strong to make sense for our general notion that MonadPlus provides a notion of choice or addition. I propose that the correct notion that MonadPlus should capture is that of a right-seminearring. (The name right-nearsemiring is also used in the literature.) Below I explain what the heck a (right-)seminearring is.

Monoids

First, I will assume you know what a monoid is. In particular, it's any associative binary operation with a distinguished element which serves as both left- and right-identity for the binary operation. These are ubiquitous and have become fairly well-known in the Haskell community of late. A prime example is (+,0) —that is, addition together with the zero element; for just about any any notion of "numbers". Another prime example is (*,1)— multiplication together with unit; again, for just about any notion of "numbers".

An important caveat regarding intuitions is that: both "addition" and "multiplication" of our usual notions of numbers turn out to be commutative monoids. For the non-commutative case, let's turn to regular expressions (regexes). First we have the notion of regex catenation, which captures the notion of sequencing: first we match one regex and then another; let's write this as (*,1) where here we take 1 to mean the regex which matches only the empty string. This catenation of strings is very different from multiplication of numbers because we can't swap things around. The regex a*b will first match a and then match b; whereas the regex b*a will match b first. Nevertheless, catenation (of strings/sequences/regexes/graphs/...) together with the empty element still forms a monoid because catenation is associative and catenating the empty element does nothing, no matter which side you catenate on.

Importantly, the non-deterministic choice for regexes also forms a monoid: (+,0) where we take 0 to be the absurd element. Notably, the empty element (e.g., the singleton set of strings, containing only the empty string) is distinct from the absurd element (e.g., the empty set of strings). We often spell 1 as ε and spell 0 as ; but I'm going to stick with the arithmetical notation of 1 and 0.

Seminearrings

Okay, so what the heck is a right-seminearring? First, we assume some ambient set of elements. They could be "numbers" or "strings" or "graphs" or whatever; but we'll just call them elements. Second, we assume we have a semigroup (*)— that is, our * operator is associative, and that's it. Semigroups are just monoids without the identity element. In our particular case, we're going to assume that * is non-commutative. Thus, it's going to work like catenation— except we don't necessarily have an empty element to work with. Third, we assume we have some monoid (+,0). Our + operator is going to act like non-deterministic choice in regexes— but, we're not going to assume that it's commutative! That is, while it represents "choice", it's some sort of biased choice. Maybe we always try the left option first; or maybe we always try the right option first; or maybe we flip a biased coin and try the left option first only 80% of the time; whatever, the point is it's not entirely non-deterministic, so we can't simply flip our additions around. Finally, we require that our (*) semigroup distributes from the right over our (+,0) monoid (or conversely, that we can factor the monoid out from under the semigroup, again only factoring out parts that are on the right). That is, symbolically, we require the following two laws to hold:

0*x = 0
(x+y)*z = (x*z)+(y*z)

So, what have we done here? Well, we have these two interlocking operations where "catenation" distributes over "choice". What the first law mean is that: (1) if we first do something absurd or impossible and then do x, well that's impossible. We'll never get around to doing x so we might as well just drop that part. The second law means: (2) if we first have a choice between x and y and then we'll catenate whichever one with z, this is the same as saying our choice is really between doing x followed by z vs doing y followed by z.

MonadPlus

Okay, so what does any of this have to do with MonadPlus? Intuitively, our * operator is performing catenation or sequencing of things. Monads are all about sequencing. So how about we use the monad operator (>>) as our "multiplication"! This does what we need it to since (>>) is associative, by the monad laws. In order to turn a monad into a MonadPlus we must define mplus (aka the + operator) and we must define a mzero (aka the 0 element). And the laws our MonadPlus instance must uphold are just the two laws about distributing/factoring on the right. In restating them below, I'm going to generalize the laws to use (>>=) in lieu of (>>):

mzero >>= f = mzero
(x `mplus` y) >>= f = (x >>= f) `mplus` (y >>= f)

And the reason why these laws make sense are just as described before. If we're going to "fail" or do something absurd followed by doing something else, well we'll never get around to that something else because we've already "failed". And if we first make a choice and then end up doing the same thing regardless of the choice we made, well we can just push that continuation down underneath the choice.

Both of these laws make intuitive sense for what we want out of MonadPlus. And given that seminearrings are something which have shown up often enough to be named, it seems reasonable to assume that's the actual pattern we're trying to capture. The one sticking point I could see is my generalization to using (>>=). In the second law, we allow f to be a function which "looks inside" the monad, rather than simply being some fixed monadic value z. There's a chance that some current MonadPlus implementations will break this law because of that insight. If so, then we can still back off to the weaker claim that MonadPlus should implement a right-seminearring exactly, i.e., with the (>>) operator as our notion of multiplication/catenation. This I leave as an exercise for the reader. This is discussed further in the addendum below.

Notably, from these laws it is impossible to derive x*0 = 0, aka x >> mzero = mzero. And indeed that is a stringent requirement to have, since it means we must be able to undo the "effects" of x, or else avoid doing those "effects" in the first place by looking into the future to know that we will eventually "fail". If we could look into the future to know we will fail, then we could implement backtracking search for logic programming in such a way that we always pick the right answer. Not just return results consistent with always choosing the right answer, which backtracking allows us to do; but rather, to always know the right answer beforehand and so never need to backtrack! If we satisfy the x*0 = 0 law, then we could perform all the "search" during compile time when we're applying the rewrite rule associated with this law.

Addendum

There's a long history of debate between proponents of the generalized distribution law I presented above, vs the so-called "catch" law. In particular, Maybe, IO, and STM obey the catch law but do not obey the generalized distribution law. To give an example, consider the following function:

f a' = if a == a' then mzero else return a'

Which is used in the following code and evaluation trace for the Maybe monad:

mplus (return a) b >>= f
⟶ Just a >>= f
⟶ f a
⟶ if a == a then mzero else return a
⟶ mzero

As opposed to the following code and evaluation trace:

mplus (return a >>= f) (b >>= f)
⟶ mplus (f a) (b >>= f)
⟶ mplus mzero (b >>= f)
⟶ b >>= f

But b >>= f is not guaranteed to be identical to mzero. The problem here is, as I suspected, because the generalized distribution law allows the continuation to "look inside". If we revert back to the non-generalized distribution law which uses (>>), then this problem goes away— at least for the Maybe monad.

Second Addendum (2014.02.06)

Even though Maybe satisfies the non-generalized distributivity laws, it's notable that other problematic MonadPlus instances like IO fail even there! For example,

First consider mplus a b >> (c >> mzero). Whenever a succeeds, we get that this is the same as a >> c >> mzero; and if a fails, then this is the same as a' >> b >> c >> mzero where a' is the prefix of a up until failure occurs.

Now instead consider mplus (a >> c >> mzero) (b >> c >> mzero). Here, if a succeeds, then this is the same as a >> c >> b >> c >> mzero; and if a fails, then it's the same as a' >> b >> c >> mzero. So the problem is, depending on whether we distribute or not, the effects of c will occur once or twice.

Notably, the problem we're running into here is exactly the same one we started out with, the failure of x >> mzero = mzero. Were this law to hold for IO (etc) then we wouldn't run into the problem of running c once or twice depending on distributivity.

winterkoninkje: shadowcrane (clean) (Default)

Apparently I messed up my link to the map of ring theory last time. It's fixed now, but the fact that noone commented on it means noone's reading this blog anymore (or those who do don't care). Ah well. Regardless, it's time to finish up this little series for now.

Anyone who's hung around me long enough knows that I'm a great fan of semirings. Partly this is because of their ubiquity in natural language processing, but mostly it's because they're the intersection of two of the most powerful and ubiquitous abstractions in mathematics. Semirings simultaneously generalize rings (by not necessarily having negation) and bounded lattices (by not necessarily being commutative nor having both operators distribute). Therefore, they generalize both arithmetic and ordering, two of the core components of mathematics. Also the fact that the natural numbers do not form a ring is a strong case for taking semirings seriously. I don't have a map this time, instead I have an extensive list of common examples. The examples fall into about half a dozen common patterns.

  • Natural number-like semirings
  • Integer-like rings
  • Tropical and Viterbi semirings
  • Bounded distributive lattices
  • Boolean rings
  • Regular languages
  • Square matrices over a semiring

I'm sure you're already familiar with the natural numbers, integers, etc., so I won't spend much time talking about them. The one thing worth mentioning here, though, is what the heck that last column is for. It's for *-semirings (i.e., closed semirings), not to be confused with *-(semi)rings (i.e., (semi)rings with involution). Jacques Carette introduced me and Russell O'Connor to them, and Russell has written a bit about them. Since he does a good job of explaining them, I won't talk about them just now. But, because they're a bit obscure, I made sure to indicate what the asteration operation is when it exists. The one especially notable point here is that the Alexandroff compactification of the reals is no longer a ring! (Another reason to take semirings seriously.) We give up the ring-ness of the reals to obtain closed-ness and compact-ness.

The tropical and Viterbi semirings are a sort of cross between arithmetic and ordering, and you've probably never heard of them before. They are, however, rather common in computer science. They tend to arise whenever we use dynamic programming in order to find the "best" of some set of things: e.g., the most likely tag sequence, the shortest path, the longest path, etc. This is what part of speech tagging, matrix chain multiplication, and Dijkstra's algorithm are all about. And recognizing that these are semirings is what leads to generalizations like forward–backward and inside–outside, to say nothing of the Gauss–Jordan–Floyd–Warshall–McNaughton–Yamada algorithm that comes from focusing on *-semirings.

The last major class I'll talk about are the bounded distributive lattices. The reason to discuss these is just to point out that this includes all Boolean algebras (hence classical logic and subset inclusion hierarchies), all Heyting algebras (hence intuitionistic logic), and all bounded totally ordered sets.

This chart is released under Creative Commons Attribution-ShareAlike 3.0. Any questions? Anything I missed? Anything that needs further explanation?

winterkoninkje: shadowcrane (clean) (Default)

Last time we talked about sets which support a single binary operator. That's all well and good, but things get really interesting once we have two of those operators. Thus, for lack of a better name, I present a map of ring theory. The majority of this isn't actually part of ring theory proper, of course; but that's as good a name as any, since we'll want to distinguish these from lattice theory which also has two interlocking operators.

In the center of the map are (unital) rings. For those who aren't aware, in older literature, what are now called pseudorings used to be called "rings", and what are now called rings used to be called "unital rings" or "rings with unit". I've run into very few proper pseudorings of interest, so I support this change in terminology. In any case, if we start from rings and move upward we get rid of properties. In addition to pseudorings we also get semirings, which are utterly ubiquitous. In the next post I'll give a few dozen of the most common semirings, so I'll save my rant for then. If we keep moving up then we start getting rid of associativities and distributivities, resulting in things like seminearrings (aka near semirings) which arise naturally in certain kinds of parsing problems. This area is poorly explored, hence the dearth of names in the upper part of the map. This is, however, where I've been spending a lot of time lately; so you'll probably hear more about seminearrings and their ilk in the near future. As the names suggest, we have both left-near and right-near versions of these various structures, though I only draw the right-near ones for clarity.

Moving downward from semirings there are two directions to head. Off to the left we run into Kleene algebras and lattice theory yet again. And to the south we run into the swamp along the way to fields. In spite of their apparent axiomatization, fields are not really algebraic objects, which is a big part of the reason for all this mess. In the lower left we see a chain of inclusions based on some peculiar properties like every ideal being principal, the existence of greatest common divisors (though not necessarily computable with Euclid's algorithm), the ascending chain condition on principal ideals, etc. These properties will be familiar to the actual ring theorists, as would numerous other properties I didn't bother putting on here. Off to the lower right we get a different sort of breakdown. In particular, before we get unique inverses for all non-zero elements, we can instead just have pseudoinverses or strong pseudoinverses. This is similar to the distinction between (von Neumann) regular semigroups vs inverse semigroups.

There's plenty more to say about ring theory and related areas, but that would be a whole series of posts on its own. This is actually the first map I started to make, because this is the region where we find so many mathematicians coming in from different areas and not being familiar with all that has been done before. As I'm sure you notice, quite a lot has been done, both in breadth and in depth. I brought this map up once when chatting with Amr Sabry, and he's the one who convinced me to finally get around to posting all these. So, hopefully these'll give y'all a better lay of the land.

There are some notable omissions from this map as it stands. In particular, complete semirings (aka *-semirings) are missing, as are rings with involution (aka *-rings), as are the various constructive notions of fields (like discrete fields, Heyting fields, residue fields, meadows, etc.). Next time I'll talk a bit about complete semirings; they were omitted mainly for lack of space, but should be included in future versions. The various constructive notions of fields were also omitted mainly for space reasons. I'll probably end up making a close-up map of the swamplands between rings and fields in order to do justice to them. Rings with involution were omitted mainly because I'm not entirely sure where the best place to put them is. As the name suggests, they've been primarily studied in contexts where you have a full-blown ring. However, there's nothing about the involution which really requires the existence of negation. I've started chatting with Jacques Carette about related topics recently, though; so perhaps I'll have more to say about them later.

This map is released under Creative Commons Attribution-ShareAlike 3.0. Any questions? Anything I missed? Anything that needs further explanation?

RSS Atom

April 2017

S M T W T F S
      1
2 345678
9101112131415
161718192021 22
23242526272829
30      

Tags

Page generated 27 May 2017 01:39 pm
Powered by Dreamwidth Studios