winterkoninkje: shadowcrane (clean) (Default)

Meanwhile, back in math land... A couple-few months ago I was doing some work on apartness relations. In particular, I was looking into foundational issues, and into what an apartness-based (rather than equality-based) dependently-typed programming language would look like. Unfortunately, too many folks think "constructive mathematics" only means BHK-style intuitionistic logic— whereas constructive mathematics includes all sorts of other concepts, and they really should be better known!

So I started writing a preamble post, introducing the basic definitions and ideas behind apartnesses, and... well, I kinda got carried away. Instead of a blog post I kinda ended up with a short chapter. And then, well, panic struck. In the interests of Publish Ever, Publish Often, I thought I might as well share it: a brief introduction to apartness relations. As with my blog posts, I'm releasing it under Creative Commons Attribution-NonCommercial-NoDerivs 4.0; so feel free to share it and use it for classes. But, unlike the other columbicubiculomania files, it is not ShareAlike— since I may actually turn it into a published chapter someday. So do respect that. And if you have a book that needs some chapters on apartness relations, get in touch!

The intro goes a little something like this:

We often talk about values being "the same as" or "different from" one another. But how can we formalize these notions? In particular, how should we do so in a constructive setting?

Constructively, we lack a general axiom for double-negation elimination; therefore, every primitive notion gives rise to both strong (strictly positive) and weak (doubly-negated) propositions. Thus, from the denial of (weak) difference we can only conclude weak sameness. Consequently, in the constructive setting it is often desirable to take difference to be a primitive— so that, from the denial of strong difference we can in fact conclude strong sameness.

This ability to "un-negate" sameness is the principal reason for taking difference to be one of our primitive notions. While nice in and of itself, it also causes the strong and weak notions of sameness to become logically equivalent (thm 1.4); enabling us to drop the qualifiers when discussing sameness.
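For readers who haven't seen the definition, the standard axioms for an apartness relation (the strictly positive notion of difference, usually written #) are worth keeping in mind:

```latex
% An apartness relation # on a set A satisfies:
\begin{align*}
\text{irreflexivity:}  &\quad \forall x.\ \neg(x \# x) \\
\text{symmetry:}       &\quad \forall x,y.\ x \# y \to y \# x \\
\text{cotransitivity:} &\quad \forall x,y,z.\ x \# y \to (x \# z \lor z \# y)
\end{align*}
```

Denying apartness, ¬(x # y), is then exactly the (strong) sameness mentioned above.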

But if not being different is enough to be considered the same, then do we still need sameness to be primitive? To simplify our reasoning, we may wish to have sameness be defined as the lack of difference. However, this is not without complications. Sameness has been considered a natural primitive for so long that it has accrued many additional non-propositional properties (e.g., the substitution principle). So, if we eliminate the propositional notion of primitive equality, we will need somewhere else to hang those coats.

The rest of the paper fleshes out these various ideas.


I've been mucking about with real analysis lately. One of the things I've always hated about real analysis (and complex analysis, and other calculus) is the unrepentant lack of types. It's easy to get confused about what's going on and what goes where when you think everything is a real number (or complex number, or tensors thereof). In particular, because of this "everything is a real array" assumption, libraries for performing optimization etc end up with utterly inscrutable APIs and other nastiness like needing to manually flatten your parameters into a single array. So this has got me thinking, what should the API for optimization libraries look like? Which in turn brings me back to an age-old conundrum I've toyed around with before, namely: what is the type of the derivative operator?

From the "everything is a real number" perspective we'd say deriv : (ℝ ↝ ℝ) → ℝ → ℝ, where the A ↝ B function space consists of "nice" functions (e.g., continuous, smooth, analytic, whatever). However, this only works out because we're conflating a number of important differences. For example, for any affine space A we have the difference space ∆A— that is, values in ∆A denote differences or differentials of values in A (e.g., for any a : A and b : A we have a-b : ∆A). However, it turns out that there's a bijection between ℝ and ∆ℝ. It also turns out that the difference of arrays, ∆(ℝn), is isomorphic to an array of differences, (∆ℝ)n. Putting these together we get the common conflation between points (ℝn) and vectors (∆(ℝn)). For another example, consider linear transformations, written A ⊸ B. We have a bijection between linear transformations (ℝm ⊸ ℝn) and matrices (ℝn×m), and hence more specifically a bijection between ℝ and ℝ ⊸ ℝ.

So here's what I'm currently thinking:

deriv : (F X ↝ Y) → F X → F(∆X ⊸ ∆Y)

For now, just ignore F; instantiate it as the identity functor. Thus, the total derivative of f : X ↝ Y at the point x : X is a linear transformation from minor perturbations about x to minor perturbations about f x. We can think of ∆X ⊸ ∆Y as being the "slope" of this mapping between minor perturbations. In particular, when X=ℝ and Y=ℝ, we get a bijection between ∆ℝ ⊸ ∆ℝ and ℝ, so this coincides with the traditional notion of the slope of a one-dimensional curve.
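To make the one-dimensional case concrete, here's a small numeric sketch (my own illustration, not part of the proposal) where we represent the linear map ∆ℝ ⊸ ∆ℝ by its slope, approximated by a finite difference; the step size h is an arbitrary choice:

```haskell
-- A perturbation about a point; specialized to X = Y = ℝ.
newtype Delta = Delta Double

-- deriv f x yields the linear map taking a perturbation dx about x
-- to the corresponding perturbation about f x. The finite-difference
-- slope stands in for the true derivative.
deriv :: (Double -> Double) -> Double -> (Delta -> Delta)
deriv f x = \(Delta dx) -> Delta (slope * dx)
  where
    h     = 1e-6
    slope = (f (x + h) - f x) / h
```

For example, the slope of x² at 3 is (approximately) 6, so a unit perturbation about 3 maps to (approximately) a perturbation of 6 about 9.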

Now, let's think about the gradient rather than the total derivative. Assume that F is a "nice" functor (i.e., representable, traversable, etc). Because the functor is representable, we have an isomorphism between F A and I → A (natural in A), where I is the type of positions/indices in F. Thus, the gradient of a function g : F X ↝ Y at the point z : F X is essentially a function from F-indices, i : I, to the i-th partial derivative of g at z. Why partial derivative? Because the linear transformation only takes ∆X as an argument —not ∆(F X)—, thus we can only modify a single "scalar" at a time, and the other scalars in z must remain fixed.

Edit 2014.02.17 (#1): Actually, things are a bit more complicated than the above. For example, consider a function f : ℝn ↝ ℝ. The derivative of f with respect to its vector of inputs is not a vector of partial derivatives— rather, it's a covector of partial derivatives! (i.e., deriv (f : ℝn×1 ↝ ℝ) : ℝn×1 → ℝ1×n.) Thus, we should really have that the return type of deriv is some Fop(∆X ⊸ ∆Y) where Fop is some sort of "dual" of F. It's not clear a priori which notion of "duality" is in play here, however.

Edit 2014.02.17 (#2): And here's what I'm currently thinking about how to incorporate the Jacobian into the above. Consider this particular instance of the derivative operator, deriv : (F X ↝ G Y) → F X → Fop(∆X ⊸ ∆(G Y)). When G is some type of vectors (e.g., G Y = Yn for some fixed n), we get ∆(G Y) = G(∆Y) as mentioned before. And let's assume ∆X ⊸ G(∆Y) = G(∆X ⊸ ∆Y); presumably this follows immediately by linearity, though I haven't verified that yet. And finally, in our standard real analysis we get that G∘Fop and Fop∘G are the same— that is, vectors-of-covectors and covectors-of-vectors are essentially the same thing; they're just the row-major and column-major representations of matrices, respectively. And then, putting all these together we get that the original return type Fop(∆X ⊸ ∆(G Y)) is isomorphic to G(Fop(∆X ⊸ ∆Y)). So when X=ℝ and Y=ℝ we get the expected deriv : (ℝm ↝ ℝn) → ℝm → ℝn×m. Now, it remains to show how this extends to other choices of F and G...

Edit 2014.02.17 (#3): It was pointed out on reddit that the above is "well known"— at least for those who are familiar with differentiable manifolds and tangent spaces. Now, I don't know much about manifolds, but it certainly wasn't well-known to me— which, in itself, says a lot about how little this knowledge has spread to the rest of mathematics. I'm glad to hear there's some followup reading to be done here, but I still don't think the necessary work has been done in terms of turning this into a decent API for optimization libraries.


So there was a discussion recently on the libraries mailing list about how to deal with MonadPlus. In particular, the following purported law fails all over the place: x >> mzero = mzero. The reason it fails is that we are essentially assuming that any "effects" that x has can be undone once we realize the whole computation is supposed to "fail". Indeed this rule is too strong to make sense for our general notion that MonadPlus provides a notion of choice or addition. I propose that the correct notion that MonadPlus should capture is that of a right-seminearring. (The name right-nearsemiring is also used in the literature.) Below I explain what the heck a (right-)seminearring is.


First, I will assume you know what a monoid is. In particular, it's any associative binary operation with a distinguished element which serves as both left- and right-identity for the binary operation. These are ubiquitous and have become fairly well-known in the Haskell community of late. A prime example is (+,0) —that is, addition together with the zero element; for just about any notion of "numbers". Another prime example is (*,1)— multiplication together with unit; again, for just about any notion of "numbers".

An important caveat regarding intuitions is that: both "addition" and "multiplication" of our usual notions of numbers turn out to be commutative monoids. For the non-commutative case, let's turn to regular expressions (regexes). First we have the notion of regex catenation, which captures the notion of sequencing: first we match one regex and then another; let's write this as (*,1) where here we take 1 to mean the regex which matches only the empty string. This catenation of strings is very different from multiplication of numbers because we can't swap things around. The regex a*b will first match a and then match b; whereas the regex b*a will match b first. Nevertheless, catenation (of strings/sequences/regexes/graphs/...) together with the empty element still forms a monoid because catenation is associative and catenating the empty element does nothing, no matter which side you catenate on.

Importantly, the non-deterministic choice for regexes also forms a monoid: (+,0) where we take 0 to be the absurd element. Notably, the empty element (e.g., the singleton set of strings, containing only the empty string) is distinct from the absurd element (e.g., the empty set of strings). We often spell 1 as ε and spell 0 as ∅; but I'm going to stick with the arithmetical notation of 1 and 0.


Okay, so what the heck is a right-seminearring? First, we assume some ambient set of elements. They could be "numbers" or "strings" or "graphs" or whatever; but we'll just call them elements. Second, we assume we have a semigroup (*)— that is, our * operator is associative, and that's it. Semigroups are just monoids without the identity element. In our particular case, we're going to assume that * is non-commutative. Thus, it's going to work like catenation— except we don't necessarily have an empty element to work with. Third, we assume we have some monoid (+,0). Our + operator is going to act like non-deterministic choice in regexes— but, we're not going to assume that it's commutative! That is, while it represents "choice", it's some sort of biased choice. Maybe we always try the left option first; or maybe we always try the right option first; or maybe we flip a biased coin and try the left option first only 80% of the time; whatever, the point is it's not entirely non-deterministic, so we can't simply flip our additions around. Finally, we require that our (*) semigroup distributes from the right over our (+,0) monoid (or conversely, that we can factor the monoid out from under the semigroup, again only factoring out parts that are on the right). That is, symbolically, we require the following two laws to hold:

0*x = 0
(x+y)*z = (x*z)+(y*z)

So, what have we done here? Well, we have these two interlocking operations where "catenation" distributes over "choice". The first law means: (1) if we first do something absurd or impossible and then do x, well, that's impossible; we'll never get around to doing x, so we might as well just drop that part. The second law means: (2) if we first have a choice between x and y and then we'll catenate whichever one with z, this is the same as saying our choice is really between doing x followed by z vs doing y followed by z.
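To make the two laws concrete, here's a toy model (my own sketch; all the names are made up) where elements are finite sets of strings, represented as lists, standing in for regex languages:

```haskell
-- A "language" is a finite set of strings, represented as a list.
type Lang = [String]

zero :: Lang
zero = []                 -- the absurd element 0: matches nothing

choice :: Lang -> Lang -> Lang
choice = (++)             -- the + monoid: left-biased, non-commutative choice

catenate :: Lang -> Lang -> Lang
catenate xs ys = [x ++ y | x <- xs, y <- ys]   -- the * semigroup
```

In this model both laws can be checked on examples: catenate zero z == zero, and catenate (choice x y) z == choice (catenate x z) (catenate y z). (The mirror image, catenate z zero == zero, happens to hold here too; the point of the seminearring laws is that they don't require it.)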


Okay, so what does any of this have to do with MonadPlus? Intuitively, our * operator is performing catenation or sequencing of things. Monads are all about sequencing. So how about we use the monad operator (>>) as our "multiplication"! This does what we need it to since (>>) is associative, by the monad laws. In order to turn a monad into a MonadPlus we must define mplus (aka the + operator) and we must define an mzero (aka the 0 element). And the laws our MonadPlus instance must uphold are just the two laws about distributing/factoring on the right. In restating them below, I'm going to generalize the laws to use (>>=) in lieu of (>>):

mzero >>= f = mzero
(x `mplus` y) >>= f = (x >>= f) `mplus` (y >>= f)

And the reason why these laws make sense are just as described before. If we're going to "fail" or do something absurd followed by doing something else, well we'll never get around to that something else because we've already "failed". And if we first make a choice and then end up doing the same thing regardless of the choice we made, well we can just push that continuation down underneath the choice.
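For instance, both laws can be checked directly in the list monad, where mzero = [] and mplus = (++); a quick sketch (names lawOne/lawTwo are mine):

```haskell
import Control.Monad (MonadPlus(..))

-- mzero >>= f = mzero, checked for lists
lawOne :: Bool
lawOne = (mzero >>= f) == (mzero :: [Int])
  where f n = [n, n * 10]

-- (x `mplus` y) >>= f = (x >>= f) `mplus` (y >>= f), checked for lists
lawTwo :: Bool
lawTwo = ((x `mplus` y) >>= f) == ((x >>= f) `mplus` (y >>= f))
  where
    f n = [n, n * 10]
    x = [1, 2] :: [Int]
    y = [3]
```

Of course a couple of examples aren't a proof, but for lists both laws do hold in general, since (>>=) is concatMap.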

Both of these laws make intuitive sense for what we want out of MonadPlus. And given that seminearrings are something which have shown up often enough to be named, it seems reasonable to assume that's the actual pattern we're trying to capture. The one sticking point I could see is my generalization to using (>>=). In the second law, we allow f to be a function which "looks inside" the monad, rather than simply being some fixed monadic value z. There's a chance that some current MonadPlus implementations will break this law because of that insight. If so, then we can still back off to the weaker claim that MonadPlus should implement a right-seminearring exactly, i.e., with the (>>) operator as our notion of multiplication/catenation. This I leave as an exercise for the reader. This is discussed further in the addendum below.

Notably, from these laws it is impossible to derive x*0 = 0, aka x >> mzero = mzero. And indeed that is a stringent requirement to have, since it means we must be able to undo the "effects" of x, or else avoid doing those "effects" in the first place by looking into the future to know that we will eventually "fail". If we could look into the future to know we will fail, then we could implement backtracking search for logic programming in such a way that we always pick the right answer. Not just return results consistent with always choosing the right answer, which backtracking allows us to do; but rather, to always know the right answer beforehand and so never need to backtrack! If we satisfy the x*0 = 0 law, then we could perform all the "search" during compile time when we're applying the rewrite rule associated with this law.


There's a long history of debate between proponents of the generalized distribution law I presented above, vs the so-called "catch" law. In particular, Maybe, IO, and STM obey the catch law but do not obey the generalized distribution law. To give an example, consider the following function:

f a' = if a == a' then mzero else return a'

Which is used in the following code and evaluation trace for the Maybe monad:

mplus (return a) b >>= f
⟶ Just a >>= f
⟶ f a
⟶ if a == a then mzero else return a
⟶ mzero

As opposed to the following code and evaluation trace:

mplus (return a >>= f) (b >>= f)
⟶ mplus (f a) (b >>= f)
⟶ mplus mzero (b >>= f)
⟶ b >>= f

But b >>= f is not guaranteed to be identical to mzero. The problem here is, as I suspected, that the generalized distribution law allows the continuation to "look inside". If we revert back to the non-generalized distribution law which uses (>>), then this problem goes away— at least for the Maybe monad.
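For concreteness, the two traces above can be reproduced with actual values (here I fix a = 1 and b = Just 2; any distinct values would do):

```haskell
import Control.Monad (mplus, mzero)

-- The continuation from the example, with a fixed to 1.
f :: Int -> Maybe Int
f a' = if a == a' then mzero else return a'
  where a = 1

-- Left-hand side of the distribution law: collapses to Nothing.
lhs :: Maybe Int
lhs = mplus (return 1) (Just 2) >>= f

-- Right-hand side: the failure of f on the left lets b through,
-- yielding Just 2 instead.
rhs :: Maybe Int
rhs = mplus (return 1 >>= f) (Just 2 >>= f)
```

So lhs is Nothing while rhs is Just 2, violating the generalized law exactly as the traces show.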

Second Addendum (2014.02.06)

Even though Maybe satisfies the non-generalized distributivity laws, it's notable that other problematic MonadPlus instances like IO fail even there! For example,

First consider mplus a b >> (c >> mzero). Whenever a succeeds, we get that this is the same as a >> c >> mzero; and if a fails, then this is the same as a' >> b >> c >> mzero where a' is the prefix of a up until failure occurs.

Now instead consider mplus (a >> c >> mzero) (b >> c >> mzero). Here, if a succeeds, then this is the same as a >> c >> b >> c >> mzero; and if a fails, then it's the same as a' >> b >> c >> mzero. So the problem is, depending on whether we distribute or not, the effects of c will occur once or twice.

Notably, the problem we're running into here is exactly the same one we started out with, the failure of x >> mzero = mzero. Were this law to hold for IO (etc) then we wouldn't run into the problem of running c once or twice depending on distributivity.


It's been a while since I've posted under this tag, but I made a map of common normal modal logics based off the diagram in SEP. I don't have much to say about it, but I'm taking a course on the topic now so hopefully I'll have a bit more to say in the future.

While I was at it, I revised the map for binary relations (first published back here). Not too much has changed overall. I added the definition of "dense" (op-transitive) and "shift-reflexive" relations, since these show up in modal logics and so could be interesting elsewhere. I also tightened up the list of entailments to more explicitly capture the difference between the weak and strong definitions of (anti)symmetry. In the new versions, the entailments all hold in minimal logic assuming the substitution principle for equality (if x=y and P(x) then P(y), for any elements x and y and any property P)— except for those marked with a subscript I, which require intuitionistic logic. In addition, any of the entailments with strong (anti)symmetry as a premise can be strengthened to only require weak (anti)symmetry if we have decidable equality or are working in classical logic.

Edit 2013.10.12: Updated these two again. For the relations, just added a note that shift-reflexivity entails density. For the modal logics, added a bunch of new goodies and cleaned things up a bit.


The final API for (<κ)-commutative operators

Last time I talked about generalizing the notion of quasi-unordered pairs to the notion of quasi-unordered multisets. The final API from last time was:
type Confusion :: * → *

isSingleton :: Confusion a → Maybe a

size :: Confusion a → Cardinal

observe :: Ord r ⇒ (a → r) → Confusion a → [(r, Either (Confusion a) a)]

Now, every function of type Confusion a → b is guaranteed to be a (<κ)-commutative operator, where κ is implicitly given by the definition of Confusion. However, we have no way to construct the arguments to those functions! We need to add a function confuse :: ∀λ. 0<λ<κ ⇒ Vector a λ → Confusion a so that we can construct arguments for our (<κ)-commutative operators. Of course, rather than using bounded quantification and the Vector type, it'd be a lot easier to just define a type which incorporates this quantification directly:

data BoundedList (a::*) :: Nat → * where
    BLNil  :: BoundedList a n
    BLCons :: a → BoundedList a n → BoundedList a (1+n)

data NonEmptyBoundedList (a::*) :: Nat → * where
    NEBLSingleton :: a → NonEmptyBoundedList a 1
    NEBLCons      :: a → NonEmptyBoundedList a n → NonEmptyBoundedList a (1+n)
Now we have:
confuse :: NonEmptyBoundedList a κ → Confusion a

type Commutative a b = Confusion a → b

runCommutative :: Commutative a b → NonEmptyBoundedList a κ → b
runCommutative f xs = f (confuse xs)

Ideally, we'd like to take this a step further and have a version of runCommutative which returns a variadic function of type a → ... a → b for the appropriate number of arguments. This way we'd be able to call them like regular curried functions rather than needing to call them as uncurried functions. There are a number of ways to do variadic functions in Haskell, but discussing them would take us too far afield. Naturally, implementing them will amount to taking advantage of the 4-tuple for folding over multisets, which was defined last time.

Handling κ-commutative operators

Continuing the theme, suppose we really want to handle the case of κ-commutative operators rather than (<κ)-commutative operators. For simplicity, let's restrict ourselves to finite κ, and let's pretend that Haskell has full dependent types. If so, then we can use the following API:

type Confusion :: * → Nat → *

extractSingleton :: Confusion a 1 → a

size :: Confusion a n → Nat
size _ = n

data ConfusionList (r, a :: *) :: Nat → * where
    CLNil  :: ConfusionList r a 0
    CLCons :: r → Confusion a n → ConfusionList r a m → ConfusionList r a (n+m)

observe :: Ord r ⇒ (a → r) → Confusion a n → ConfusionList r a n

confuse :: Vector a (1+n) → Confusion a (1+n)

type Commutative a n b = Confusion a n → b

runCommutative :: Commutative a n b → Vector a n → b
runCommutative f xs = f (confuse xs)


Recently gelisam posted an article on a combinatorial approach to provably commutative binary operators, which has a decent discussion on reddit. Here, I'm going to generalize that idea.

Edit (2013-08-03T05:06:58Z): Okay, I should be done tweaking things now.

Edit (2013-08-03T06:02:40Z): See also the followup post.

Preliminary Definitions

Let A and B be arbitrary types, and let κ be any non-zero cardinal number (we only really care about finite cardinals, or maybe at most aleph-naught). Let the notation A^{κ} denote a multiset of elements drawn from A with cardinality exactly κ. And let the notation A^{<κ} denote a non-empty multiset of elements drawn from A with cardinality not greater than κ.

Commutative operators

A κ-commutative operator from A to B can be thought of as a function of type A^{κ} → B. For example:

  • The 1-commutative functions are all the functions of type A → B.
  • The 2-commutative functions are commutative binary functions, thus a subtype of A → A → B.
  • The 3-commutative functions are ternary functions which return the same result for all permutation orders of their arguments, thus a subtype of A → A → A → B.

A (<κ)-commutative operator from A to B can be thought of as a function of type A^{<κ} → B, i.e., a polymorphic function of type ∀λ. 0<λ<κ ⇒ A^{λ} → B. Thus, the (<κ)-commutative functions can be thought of as a family of λ-commutative functions, for all non-zero λ not greater than κ, satisfying suitable coherence conditions. However, for finite κ, a simpler representation is to think of them in terms of folds over non-empty multisets; that is, as 4-tuples:

(C, h:A→C, f:A→C→C, g:C→B)
    such that
    ∀a,a'∈A, c∈C. f a (f a' c) == f a' (f a c)
    ∀a,a'∈A.      f a (h a')   == f a' (h a)
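As an illustration of the 4-tuple representation (my own example), here is the mean of a non-empty multiset of Doubles, where C tracks a running sum and count; the two coherence laws hold because (+) is commutative:

```haskell
-- The carrier C of the fold: a running (sum, count).
type C = (Double, Int)

-- h injects a single element.
h :: Double -> C
h a = (a, 1)

-- f accumulates one more element; f a (f a' c) == f a' (f a c)
-- because addition commutes.
f :: Double -> C -> C
f a (s, n) = (s + a, n + 1)

-- g finishes off the fold.
g :: C -> Double
g (s, n) = s / fromIntegral n

-- Folding a non-empty list through the 4-tuple. Any permutation of
-- the input gives the same answer, which is exactly the commutativity
-- the laws guarantee.
mean :: [Double] -> Double
mean (x : xs) = g (foldr f (h x) xs)
mean []       = error "mean: empty multiset"
```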

Alternatively, if you are willing to include the case where λ=0, then use 4-tuples:

(C, c0:C, f:A→C→C, g:C→B)
    such that
    ∀a,a'∈A, c∈C. f a (f a' c) == f a' (f a c)

Quasi-unordering pairs

Gelisam's solution to the higher-order problem is to generalize unordered pairs to quasi-unordered pairs: either an unordered pair, or else some summary information about a pair which can't be unordered. The way we do this is to have an observation function we apply to each argument. If that observation function can distinguish the arguments, then we can pass the arguments to the function in a canonical order such that we do not betray the original order of the inputs. However, if we cannot distinguish the two arguments, then the best we can do is to return the observation we made on both of them— since, if we returned the actual arguments themselves then we'd betray their input order. Rather than hard-coding the choice of Bool as gelisam did, we can generalize to any totally ordered set of observations:

distinguish :: Ord r ⇒ (a → r) → Commutative a (Either r (a,a))
distinguish f = Commutative $ \ x y ↦
    let fx = f x
        fy = f y
    in case compare fx fy of
        LT ↦ Right (x,y)
        EQ ↦ Left fx
        GT ↦ Right (y,x)

Quasi-unordering multisets

Now, we want to generalize this to multisets with any non-zero cardinality. Note, because the observations r are totally ordered, the observation function induces an almost-linear partial order on the original domain a. Assume we have some suitable abstract implementation of multisets: Confusion a, which implements a^{<κ} for some particular κ. Because the observation function induces an order on our original domain, there exists a function:

observe_0 :: Ord r ⇒ (a → r) → Confusion a → [Confusion a]
    such that
    ∀ r f xs0. isOrdered (observe_0 f xs0)
        isOrdered (xs:xss) =
            (∀x,x'∈xs. f x == f x')
            ⋀ case xss of
              []   ↦ True
              ys:_ ↦
                  (∀ x∈xs, y∈ys. f x < f y)
                  ⋀ isOrdered xss
That is, every element of the result is a multiset of elements which are equivalent under the observation function, and the elements of earlier multisets precede the elements of later multisets. In other words, we take the input multiset, which is inherently unordered, and return a nearly-linear DAG.

As written observe_0 doesn't get us that far, since the implementation of multisets is abstract, and all we've done is break one multiset up into a bunch of multisets. Now, assume our multisets support the function isSingleton :: Confusion a → Maybe a which tells us whether a multiset is a singleton or not, and gives us the element if it is. Using this, we can strengthen our observing function to:

observe_1 :: Ord r ⇒ (a → r) → Confusion a → [Either (Confusion a) a]
observe_1 f
    = map (\xs ↦ maybe (Left xs) Right (isSingleton xs))
    . observe_0 f
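As a sanity check, here's a toy model (purely illustrative: it represents Confusion a as a plain list and only pretends the order is quotiented away; a real implementation would keep the type abstract) showing how observe_1 could behave:

```haskell
import Data.List (groupBy, sortOn)

-- Toy model: a multiset as a bare list, order notionally irrelevant.
type Confusion a = [a]

isSingleton :: Confusion a -> Maybe a
isSingleton [x] = Just x
isSingleton _   = Nothing

-- Sort by observation, group elements the observation can't tell apart,
-- and expose the singletons.
observe1 :: Ord r => (a -> r) -> Confusion a -> [Either (Confusion a) a]
observe1 f = map (\xs -> maybe (Left xs) Right (isSingleton xs))
           . groupBy (\x y -> f x == f y)
           . sortOn f
```

For instance, observing [3,1,1,2] with id leaves the two 1s confused but exposes 2 and 3.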

That gets us pretty much what we want. However, in order to gain the full functionality we had in the binary case, we'd like to have some way to eliminate the remaining multisets. Probably the most general way is to define the following function which also returns the shared observation for classes of inputs which are still confused:

observe_2 :: Ord r ⇒ (a → r) → Confusion a → [Either (r, Confusion a) a]
-- or, even better
observe_2' :: Ord r ⇒ (a → r) → Confusion a → [(r, Either (Confusion a) a)]
We could implement this in terms of observe_0; however, that would mean calling the observation function redundantly, since observe_0 throws away the results. Thus, it's more efficient to just implement this version directly, in lieu of implementing observe_0 at all. The overall complexity will be the same, and this version is strictly more powerful.

If we're returning the result of observing those confusion sets, then why bother returning the confusion set too? The reason is that this way we can apply our observing function again in order to further distinguish those confusion sets. Therefore, this version is complete for composition of observation functions whereas the version that discards the confusion set is not.

And, as a final note, we'd also want to ensure that our multisets support the operation size :: Confusion a → Cardinal; otherwise, we need to return a triple of the multiset, its size, and its observation. The binary case could avoid this detail since there all confusion sets must be of cardinality 2.


Over on Reddit there's a discussion where one commenter admitted:

"the whole (^) vs (^^) vs (**) [distinction in Haskell] confuses me."
It's clear to me, but it's something they don't teach in primary school, and it's something most programming languages fail to distinguish. The main problem, I think, for both primary ed and for other PLs, is that they have an impoverished notion of what "numbers" could be, and this leads to artificially conflating things which should be kept distinct. I wrote a reply over on Reddit, hoping to elucidate the distinction, but I thought I should repeat it in a more persistent venue so it can reach a wider audience.

First, let us recall the basic laws of powers:

a^0 = e
a^1 = a
(a^x)^y = a^(x*y)
(a^x)*(a^y) = a^(x+y)
(a*b)^x = (a^x)*(b^x)

There are two very important things to notice here. First off, our underlying algebra (the as and bs) only needs to have the notion of multiplication, (*), with identity, e. Second, our powers (the xs and ys) have an entirely different structure; in particular, they form at least a semiring (+,0,*,1). Moreover, if we're willing to give up some of those laws, then we can weaken these requirements. For example, if we get rid of a^0 = e then we no longer need our underlying algebra to be a monoid, being a semigroup is enough. And actually, we don't even need it to be a semigroup. We don't need full associativity, all we need for this to be consistent is power-associativity.

So we can go weaker and more abstract, but let's stick here for now. Any time we have a monoid, we get a notion of powers for free. This notion is simply iterating our multiplication, and we use the commutative semiring (Natural,+,0,*,1) in order to represent our iteration count. This is the notion of powers that Haskell's (^) operator captures. Unfortunately, since Haskell lacks a standard Natural type (or Semiring class), the type signature for (^) lies and says we could use Integer (or actually, Num which is the closest thing we have to Ring), but the documentation warns that negative powers will throw exceptions.
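To spell out that notion, here's a sketch of monoid powers with the iteration count drawn from the naturals, so that negative powers simply cannot be written (the function name mpow is mine):

```haskell
import Numeric.Natural (Natural)

-- (^) for an arbitrary monoid: iterated (<>), with the count drawn
-- from the commutative semiring of naturals. Uses fast squaring,
-- though plain iteration would do.
mpow :: Monoid m => m -> Natural -> m
mpow _ 0 = mempty
mpow a n
  | even n    = b <> b
  | otherwise = a <> (b <> b)
  where b = mpow a (n `div` 2)
```

In the list monoid, mpow [x] n coincides with replicate (fromIntegral n) x, which is exactly the sense in which replicate is a notion of powers.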

Moving on to the (^^) operator: suppose our monoid is actually a group, i.e. it has a notion of reciprocals. Now, we need some way to represent those reciprocals; so if we add subtraction to our powers (yielding the commutative ring (Integer,+,-,0,*,1)), we get the law a^(-x) = 1/(a^x). The important thing here is to recognize that not all monoids form groups. For example, take the monoid of lists with concatenation. We do have a (^) notion of powers, which may be more familiar as the replicate function from the Prelude. But, what is the reciprocal of a string? what is the inverse of concatenation? The replicate function simply truncates things and treats negative powers as if they were zero, which is on par with (^) throwing exceptions. It is because not all monoids are groups that we need a notion of powers for monoids (i.e., (^)) which is different from the notion of powers for groups (i.e., (^^)). And though Haskell fails to do so, we can cleanly capture this difference in the type signatures for these operations.

Further up, we get another notion of powers which Haskell doesn't highlight; namely the notion of powers that arises from the field (Rational,+,-,0,*,/,1). To get here, we take our group and add to it the ability to take roots. The fractions in powers are now taken to represent the roots, as in the law a^(1/y) = root y a. Again note that there's a vast discrepancy between our underlying algebra which has multiplication, reciprocals, and roots vs our powers which have addition, subtraction, multiplication, and division.

Pulling it back a bit, what if our monoid has a notion of roots, but does not have inverses? Here our powers form a semifield; i.e., a commutative semiring with multiplicative inverses; e.g., the non-negative rationals. This notion is rather obscure, so I don't fault Haskell for lacking it, though it's worth mentioning.

Finally, (**) is another beast altogether. In all the previous examples of powers there is a strong distinction between the underlying algebra and the powers over that algebra. But here, we get exponentiation; that is, our algebra has an internal notion of powers over itself! This is remarkably powerful and should not be confused with the basic notion of powers. Again, this is easiest to see by looking at where it fails. Consider multiplication of (square) matrices over some semiring. This multiplication is associative, so we can trivially implement (^). Assuming our semiring is actually a commutative ring then almost all (though not all) matrices have inverses, so we can pretend to implement (^^). For some elements we can even go so far as taking roots, though we run into the problem of there being multiple roots. But as for exponentiation? It's not even clear that (**) should be meaningful on matrices. Or rather, to the extent that it is meaningful, it's not clear that the result should be a matrix.
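For instance, here's the monoid power for hand-rolled 2×2 integer matrices (a sketch; the names are mine): (^)-style powers are perfectly sensible here even though (**) is not.

```haskell
-- A 2x2 integer matrix as a pair of rows.
type M2 = ((Integer, Integer), (Integer, Integer))

identity :: M2
identity = ((1, 0), (0, 1))

-- Associative matrix multiplication: the monoid operation.
mul :: M2 -> M2 -> M2
mul ((a, b), (c, d)) ((e, f), (g, h)) =
  ((a*e + b*g, a*f + b*h), (c*e + d*g, c*f + d*h))

-- The monoid power: iterated multiplication with identity as e.
matPow :: M2 -> Int -> M2
matPow _ 0 = identity
matPow m n = mul m (matPow m (n - 1))
```

A classic use: powers of ((1,1),(1,0)) compute Fibonacci numbers, with matPow of it at n giving ((F(n+1), F n), (F n, F(n-1))).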

N.B., I refer to (**) as exponentials in contrast to (^), (^^), etc as powers, following the standard distinction in category theory and elsewhere. Do note, however, that this notion of exponentials is different again from the notion of the antilog exp, i.e. the inverse of log. The log and antilog are maps between additive monoids and multiplicative monoids, with all the higher structure arising from that. We can, in fact, give a notion of antilog for matrices if we assume enough about the elements of those matrices.

winterkoninkje: shadowcrane (clean) (Default)

An important notion that shows up in algebra is the idea of a free object. For example, we have the following free objects for their corresponding algebraic theories:

  • NonemptyList A — free semigroups
  • List A — free monoids
  • NonemptyBag A — free commutative semigroups
  • Bag A [1] — free commutative monoids
  • NonemptySet A — free commutative bands (band = idempotent semigroup)
  • Set A — free commutative bands with identity

Recently I've been coming across things whose quotient structure looks like:

Foo A B = List (NonemptyMap A (NonemptyList B))
Bar A B = (Foo A B, NonemptyList (A,B))

I'm curious if anyone has encountered either of these as free objects of some algebraic theory?

[1] I.e., a multiset. In order to get the proper quotienting structure, we can implement Bag A by Map A Nat where a `elem` xs = member a xs and multiplicity a xs = maybe 0 (1+) (lookup a xs).
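Here's a sketch of that implementation (names hypothetical). The trick is that the Map stores multiplicity minus one, so entries with multiplicity zero can never occur and the quotient structure holds by construction:

```haskell
import qualified Data.Map as Map
import Data.Map (Map)
import Numeric.Natural (Natural)

-- Invariant: the stored Natural is the multiplicity minus one, so an
-- element is in the Bag iff it is a key of the Map.
newtype Bag a = Bag (Map a Natural) deriving (Eq, Show)

emptyBag :: Bag a
emptyBag = Bag Map.empty

insertBag :: Ord a => a -> Bag a -> Bag a
insertBag a (Bag xs) = Bag (Map.insertWith (\_ old -> old + 1) a 0 xs)

elemBag :: Ord a => a -> Bag a -> Bool
elemBag a (Bag xs) = Map.member a xs

multiplicity :: Ord a => a -> Bag a -> Natural
multiplicity a (Bag xs) = maybe 0 (1 +) (Map.lookup a xs)
```

With this representation, two bags are equal exactly when their Maps are, which is what we want from a free commutative monoid.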


Yesterday I was trying to explain some of the paradoxes of probability theory to a friend who disbelieves in the real numbers. It's not always clear whether this disbelief is actual, or if it's just an affectation; constructivist and devil's-advocate that he is, it could go either way really. In any case, he's always amusing to spar with (not that I have any especial concern for the un/reality of the reals). Midway through, Dylan Thurston came over to listen in and raised a question I've mulled over before but have been turning over again and again since then. What is it that I mean when describing a space (as opposed to a function etc) as "continuous"?

The knee-jerk response is that continuity is the antithesis of discreteness. That is, given some collection or space or other arrangement of things, often we are interested in accumulating some value over the lot of them. In the easiest setting, finite collections, we just sum over each element of that collection. But this process isn't limited to finite collections; we sum over infinite collections like the natural numbers with nary a care, and use the same large sigma notation to do so. So mere countable infinity isn't a problem for the notion of summation or accumulation. In programming we oft take our infinitudes even further. There's nothing special about the natural numbers. We can sum over the collection of trees, or lists, or any other polynomial type with just as little (or as much) concern for how many values are in these types as for how many natural numbers there are. But at some point this breaks down. Somewhere between the polynomial types and the real numbers, everything falls apart. We cannot in any meaningful sense use large sigma to accumulate a value over the vast majority of subsets of the reals. Instead we must turn to a different notion of accumulation: integration. For discrete collections summation is fine, but when we enter the continuous setting we must switch to integration.

The problem, of course, is that integrals are not really well-defined. Regardless of your choice of formalization, they all run into paradoxes and problems[1]. One of these problems rears its head in that probability theoretic paradox I was attempting to explain. Namely, the conception of inhabited sets of measure zero. The paradox arises even before probabilities rear their head. Colloquially, integrals are the area under a curve over some interval of the curve's domain. How do we get the area of some curvy shape? Well, we can approximate the shape by making a bunch of rectangles; and our approximation becomes better and better as those rectangles become thinner and thinner. In the limit, this approximation matches the actual shape and so we can get its area. But, in the limit, those rectangles have thickness zero; and thus, they must have area zero. So how is it that summing all those slivers with area zero can ever result in a non-zero total area? Thus, the paradox.

But pulling things back to the original question: what does it mean for a space to be continuous in the first place? What is it ---exactly--- that causes summation to fail and forces us into this problematic regime of integration? Is the notion of continuity or of the reals or of infinite divisibility or however you want to phrase it, is the notion itself a hack? And if it is a hack, then how do we get away from it? Classical mathematicians are fond of hacks but, while I respect a good hack, as a constructivist myself I prefer to be on surer footing than simply believing something must be the case since the alternative is too absurd to conceive of. So, why do we integrate? I've yet to find a reason I can believe in...

[1] One can make the same complaint about logics (and other areas of mathematics) too. Impredicativity is much the same as the situation in probability theory; the idea is so simple and obvious that we want to believe in it, but to do so naively opens the door to demonstrable unsoundness. The liar's paradox is another close analogy, what with making perfect sense except in the limit where everything breaks down. Indeed, the paradoxes of impredicativity are of the exact same sort as the liar's paradox. But in spite of all these issues, we do not usually say that logic is ill-defined; so perhaps my judgment of calculus is unfair. Though, to my knowledge, people seem to have a better handle on the problems of logic. Or perhaps it's just that the lack of consensus has led to the balkanization of logic, with constructivists and classicalists avoiding one another, whereas in calculus the different sides exchange ideas more freely and so the confusion and disagreements are more in the open...

Math envy

31 Dec 2012 02:22 am

I have often derided those who are susceptible to math envy. Y'know, the idea that math=intelligence. This utter foolishness leads to the simultaneous fear and awe of anyone who throws math around, as if the presence of mere symbols and equations demonstrates the clear superiority of the author's throbbing, bulging,... intellect. This utter foolishness leads, therefore, to authors who feel the need to add superfluous "mathematics" to their writings in order to demonstrate that their... intelligence measures up to that of their colleagues.

Well, turns out, someone finally got around to doing a study on math envy: Kimmo Eriksson (2012) "The nonsense math effect", Judgment and Decision Making 7(6). As expected, those with less training in mathematics tend to rate utterly irrelevant "mathematical content" more highly than its absence. Lest anyone start feeling smugly superior, however, I'll note that I've seen this effect most strongly in those who should know better, i.e., those with just a little mathematical training. This includes, for example, computer scientists who are not formal theoreticians. Not to name names, but I've read more than one NLP paper that throws in some worthless equation just to try to look more worthwhile. (These papers are often fine, in and of themselves, but would have been better had they not succumbed to math envy.)

As Language Log points out in their coverage, this isn't limited just to math. Some people also have brain-scan envy and similar afflictions. That's definitely worth watching out for, but IME people seem more aware of their pernicious effects while being blind to math envy.


Apparently I messed up my link to the map of ring theory last time. It's fixed now, but the fact that no one commented on it means no one's reading this blog anymore (or those who do don't care). Ah well. Regardless, it's time to finish up this little series for now.

Anyone who's hung around me long enough knows that I'm a great fan of semirings. Partly this is because of their ubiquity in natural language processing, but mostly it's because they're the intersection of two of the most powerful and ubiquitous abstractions in mathematics. Semirings simultaneously generalize rings (by not necessarily having negation) and bounded lattices (by not necessarily being commutative nor having both operators distribute). Therefore, they generalize both arithmetic and ordering, two of the core components of mathematics. Also the fact that the natural numbers do not form a ring is a strong case for taking semirings seriously. I don't have a map this time, instead I have an extensive list of common examples. The examples fall into about half a dozen common patterns.

  • Natural number-like semirings
  • Integer-like rings
  • Tropical and Viterbi semirings
  • Bounded distributive lattices
  • Boolean rings
  • Regular languages
  • Square matrices over a semiring

I'm sure you're already familiar with the natural numbers, integers, etc., so I won't spend much time talking about them. The one thing worth mentioning here, though, is what the heck that last column is for. It's for *-semirings (i.e., closed semirings), not to be confused with *-(semi)rings (i.e., (semi)rings with involution). Jacques Carette introduced me and Russell O'Connor to them, and Russell has written a bit about them. Since he does a good job of explaining them, I won't talk about them just now. But, because they're a bit obscure, I made sure to indicate what the asteration operation is when it exists. The one especially notable point here is that the Alexandroff compactification of the reals is no longer a ring! (Another reason to take semirings seriously.) We give up the ring-ness of the reals to obtain closed-ness and compact-ness.

The tropical and Viterbi semirings are a sort of cross between arithmetic and ordering, and you've probably never heard of them before. They are, however, rather common in computer science. They tend to arise whenever we use dynamic programming in order to find the "best" of some set of things: e.g., the most likely tag sequence, the shortest path, the longest path, etc. This is what part of speech tagging, matrix chain multiplication, and Dijkstra's algorithm are all about. And recognizing that these are semirings is what leads to generalizations like forward–backward and inside–outside, to say nothing of the Gauss–Jordan–Floyd–Warshall–McNaughton–Yamada algorithm that comes from focusing on *-semirings.
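To make the pattern concrete, here's a sketch of the tropical (min,+) semiring in Haskell. The class and names are made up for this example (packages like `semirings` offer real versions): "addition" chooses the better cost, "multiplication" accumulates cost along a path, and Infinity plays the role of "no path".

```haskell
-- A minimal semiring class; hypothetical names, not a real package's API.
class Semiring a where
  zero, one :: a
  (<+>), (<.>) :: a -> a -> a

-- The tropical (min,+) semiring over non-negative costs.
data Tropical = Finite Double | Infinity deriving (Eq, Show)

instance Semiring Tropical where
  zero = Infinity   -- identity for min
  one  = Finite 0   -- identity for +
  Infinity <+> y        = y
  x        <+> Infinity = x
  Finite x <+> Finite y = Finite (min x y)
  Infinity <.> _        = Infinity
  _        <.> Infinity = Infinity
  Finite x <.> Finite y = Finite (x + y)

-- The cost of the best path, given the edge costs along each path:
-- multiply (sum) within a path, add (min) across paths.
bestPath :: [[Double]] -> Tropical
bestPath = foldr (<+>) zero . map (foldr (<.>) one . map Finite)
```

Swap min for max and you get the "longest path" semiring; swap in (max,*) over probabilities and you get Viterbi — the dynamic programs are literally the same code over a different instance.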

The last major class I'll talk about are the bounded distributive lattices. The reason to discuss these is just to point out that this includes all Boolean algebras (hence classical logic and subset inclusion hierarchies), all Heyting algebras (hence intuitionistic logic), and all bounded totally ordered sets.

This chart is released under Creative Commons Attribution-ShareAlike 3.0. Any questions? Anything I missed? Anything that needs further explanation?


Last time we talked about sets which support a single binary operator. That's all well and good, but things get really interesting once we have two of those operators. Thus, for lack of a better name, I present a map of ring theory. The majority of this isn't actually part of ring theory proper, of course; but that's as good a name as any, since we'll want to distinguish these from lattice theory which also has two interlocking operators.

In the center of the map are (unital) rings. For those who aren't aware, in older literature, what are now called pseudorings used to be called "rings", and what are now called rings used to be called "unital rings" or "rings with unit". I've run into very few proper pseudorings of interest, so I support this change in terminology. In any case, if we start from rings and move upward we get rid of properties. In addition to pseudorings we also get semirings, which are utterly ubiquitous. In the next post I'll give a few dozen of the most common semirings, so I'll save my rant for then. If we keep moving up then we start getting rid of associativities and distributivities, resulting in things like seminearrings (aka near semirings) which arise naturally in certain kinds of parsing problems. This area is poorly explored, hence the dearth of names in the upper part of the map. This is, however, where I've been spending a lot of time lately; so you'll probably hear more about seminearrings and their ilk in the near future. As the names suggest, we have both left-near and right-near versions of these various structures, though I only draw the right-near ones for clarity.

Moving downward from semirings there are two directions to head. Off to the left we run into Kleene algebras and lattice theory yet again. And to the south we run into the swamp along the way to fields. In spite of their apparent axiomatization, fields are not really algebraic objects, which is a big part of the reason for all this mess. In the lower left we see a chain of inclusions based on some peculiar properties like every ideal being principal, the existence of greatest common divisors (though not necessarily computable with Euclid's algorithm), the ascending chain condition on principal ideals, etc. These properties will be familiar to the actual ring theorists, as would numerous other properties I didn't bother putting on here. Off to the lower right we get a different sort of breakdown. In particular, before we get unique inverses for all non-zero elements, we can instead just have pseudoinverses or strong pseudoinverses. This is similar to the distinction between (von Neumann) regular semigroups vs inverse semigroups.

There's plenty more to say about ring theory and related areas, but that would be a whole series of posts on its own. This is actually the first map I started to make, because this is the region where we find so many mathematicians coming in from different areas and not being familiar with all that has been done before. As I'm sure you notice, quite a lot has been done, both in breadth and in depth. I brought this map up once when chatting with Amr Sabry, and he's the one who convinced me to finally get around to posting all these. So, hopefully these'll give y'all a better lay of the land.

There are some notable omissions from this map as it stands. In particular, complete semirings (aka *-semirings) are missing, as are rings with involution (aka *-rings), as are the various constructive notions of fields (like discrete fields, Heyting fields, residue fields, meadows, etc.). Next time I'll talk a bit about complete semirings; they were omitted mainly for lack of space, but should be included in future versions. The various constructive notions of fields were also omitted mainly for space reasons. I'll probably end up making a close-up map of the swamplands between rings and fields in order to do justice to them. Rings with involution were omitted mainly because I'm not entirely sure where the best place to put them is. As the name suggests, they've been primarily studied in contexts where you have a full-blown ring. However, there's nothing about the involution which really requires the existence of negation. I've started chatting with Jacques Carette about related topics recently, though; so perhaps I'll have more to say about them later.

This map is released under Creative Commons Attribution-ShareAlike 3.0. Any questions? Anything I missed? Anything that needs further explanation?


Continuing the thread from last time, let's move on from relations and consider a map of individual binary operations. In a lot of ways this is even simpler than the binary relations from last time, though the map requires a bit more explanation. This time, rather than having definitions at the top, they're given as labels on the arcs. Arcs in the same color denote the same property, dashed lines represent things you get for free, and black lines are for the odd things; all just like last time.

Most of the study of individual binary operations falls under group theory, which forms the core of this map. The one interesting thing here is that if you have at least monoid structure (i.e., an identity element), then the uniqueness of inverses follows from their mere existence. However, for semigroups which are not monoids, these two properties differ. This'll come up again next time when we start talking about rings and fields.
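The uniqueness argument is the standard one-liner, and it shows exactly where the identity and associativity get used: if b and c are both inverses of a, then

```latex
b = b \cdot e = b \cdot (a \cdot c) = (b \cdot a) \cdot c = e \cdot c = c
```

In a bare semigroup there's no e to anchor either end of the chain, which is why an element of a semigroup can have several distinct inverses.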

Off to the left we veer into lattices. And to the right we get the crazy stuff that comes from non-associative algebra. Quasigroups and loops are somewhat similar to groups in that they have an invertible structure, but unfortunately they don't have associativity. It turns out, there's a whole hierarchy of almost-but-not-quite-associative properties, which is shown on the second page. The strongest property you can get without being fully associative is Moufang, which can be phrased in four different ways. Below this we have left- and right-Bol (if you have both the Bols then you have Moufang). Below that we have alternativity where you choose two of three: left-alternativity, right-alternativity, and flexibility. Below that, of course, you can have just one of those properties. And finally, at the bottom, power associativity means that powers associate (and so "powers" is a well-formed notion) but that's it.

As I said, there's not a whole lot here, but I needed to bring this one up before getting into ring-like structures. This map is released under Creative Commons Attribution-ShareAlike 3.0. Any questions? Anything I missed? Anything that needs further explanation?


Friday I turned in the last of my papers, and last night I turned in the last of my grades. Which means I'm free for a whole two weeks or so. Which is good because for the last couple weeks I've been wanting desperately to blog as a means of escaping all those due dates. So now I get to flush some of the backlog from this semester. In particular, there's a cluster of posts I've been meaning to put up for a while, a sort of Intro to Maths for mathematicians, as it were. To save y'all's friendspages I'll spread these posts out over the next few days.

columbicubiculomania — The compulsion to stick things into pigeonholes. (Jim Matisoff 1990)

Ever since stumbling upon that fabulous word I've wanted to spread its popularity. As a geek with certain obsessive–compulsive tendencies, I'm a bit prone to columbicubiculomania; or rather, as the theoretician that I am, I'm prone to the dual of columbicubiculomania. I don't care so much about the pigeons, they can take care of themselves. But I do have a certain compulsion to design and arrange pigeonholes such that whenever someone feels the columbicubiculocompulsion, they'll have the right places to stuff all their pigeons into. Part of this is also tied up in me trying to figure out (or rather, to convey) where exactly I situate myself in the vast sea of competing academic fields. As the perennial outsider, I'm more interested (it seems) in seeing how everything is connected than are those stuck on the inside.

And so, over the past few years I've been accumulating the jargon from different subfields of mathematics and organizing them into a cohesive whole. In particular, for the next few posts, I'm concerned with the terminology for algebraic objects. Too often I read papers with practitioners of one algebraic field (e.g., group theory, ring theory,...) getting unwittingly caught up in another field and subsequently using such outrageous terms as "semigroup with identity" or "monoid without identity". Because of this ridiculousness, a lot of mathematicians and computer scientists end up not realizing when the thing they're studying has already been studied in depth under another name. So let's see all those names and how they connect. Let's produce a cartography of mathematical objects.

Perhaps the easiest place to start is one of the latter maps I produced. Binary relations are ubiquitous in all areas of mathematics, logic, and computer science. Typically we don't care about all binary relations, we only care about relations with particular properties. Of course, the interaction between properties is nontrivial since properties A and B together can entail that property C must hold. Thus, a map of binary relations is helpful for keeping track of it all. This map requires relatively little explanation, which is why I started here.

All the common properties of interest are defined at the top, and color coded for use in the map. And, constructivist that I am, I've been sure to distinguish between strong and weak versions of the properties (which collapse in the classical setting). The arrowheads in the map are for keeping track of when we're talking about P, anti-P, or co-P (for some particular property P). And the big black dot is the starting point of the map (i.e., a binary relation with no special properties). The dashed lines indicate when some property will follow for free. So, for example, there's no name for a transitive irreflexive relation since trans+irrefl entails antisymmetry and trans+irrefl+antisym is a strict partial order. And the black lines are for when the named object requires some peculiar property that doesn't show up all over the place. These conventions for dashed and black lines are common to all my maps.

Once you're familiar with my conventions, the only thing really left unstated on this map is that the apartness relation used in the definition of a strongly connected binary relation is actually a tight apartness. The constructive significance of apartness relations vs equality relations is something I'll have to talk about another time.

Towards the top of the map we start veering away from binary relations per se and start heading into order theory. Because the complexities of order theory are a bit different from the complexities of binary relations, I've chosen to break them up into different maps. As yet, I haven't actually made the map for order theory, but I'll get around to it eventually. Lattice theory and domain theory also get tied up in all this (in the order theory particularly). I've started work on a lattice theory map, but haven't finished it just yet.

This map is released under Creative Commons Attribution-ShareAlike 3.0. Any questions? Anything I missed? Anything that needs further explanation?


Quoth Paul Taylor:

This result is folklore, which is a technical term for a method of publication in category theory. It means that someone sketched it on the back of an envelope, mimeographed it (whatever that means) and showed it to three people in a seminar in Chicago in 1973, except that the only evidence that we have of these events is a comment that was overheard in another seminar at Columbia in 1976. Nevertheless, if some younger person is so presumptuous as to write out a proper proof and attempt to publish it, they will get shot down in flames.


The integers mod 3 are better thought of as {-1, 0, 1} rather than the traditional {0,1,2}. This makes it clear that the two non-zero elements are additive inverses of one another. And it also makes it clear that multiplication of non-zero elements just returns an "is same/different" judgment, which comes for free from our usual arithmetic: (-1)*(-1) = 1 = 1*1 and (-1)*1 = -1 = 1*(-1).
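A quick sketch of this balanced representation (the function name is invented here):

```haskell
-- Map an integer to its balanced representative in {-1, 0, 1} mod 3.
balancedMod3 :: Integer -> Integer
balancedMod3 n = ((n + 1) `mod` 3) - 1
```

So `balancedMod3 2 = -1`, the additive inverses are manifest since `(-1) + 1 = 0`, and the same/different judgment from multiplication falls out of ordinary arithmetic on the representatives.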
