winterkoninkje: Shadowcrane (Default)
[personal profile] winterkoninkje

I've been mucking about with real analysis lately. One of the things I've always hated about real analysis (and complex analysis, and other calculus) is the unrepentant lack of types. It's easy to get confused about what's going on and what goes where when you think everything is a real number (or complex number, or tensors thereof). In particular, because of this "everything is a real array" assumption, libraries for performing optimization etc end up with utterly inscrutable APIs and other nastiness like needing to manually flatten your parameters into a single array. So this has got me thinking, what should the API for optimization libraries look like? Which in turn brings me back to an age-old conundrum I've toyed around with before, namely: what is the type of the derivative operator?

From the "everything is a real number" perspective we'd say deriv : (ℝ ↝ ℝ) → ℝ → ℝ, where the A ↝ B function space are "nice" functions (e.g., continuous, smooth, analytic, whatever). However, this only works out because we're conflating a number of important differences. For example, for any affine space A we have the difference space ∆A— that is, values in ∆A denote differences or differentials of values in A (e.g., for any a : A and b : A we have a-b : ∆A). However, it turns out that there's a bijection between and ∆ℝ. It also turns out that the difference of arrays, ∆(ℝn), is isomorphic to an array of differences, (∆ℝ)n. Putting these together we get the common conflation between points (n) and vectors (∆(ℝn)). For another example, consider linear transformations, written A ⊸ B. We have a bijection between linear transformations (m ⊸ ℝn) and matrices (n×m), and hence more specifically a bijection between and ℝ ⊸ ℝ.

So here's what I'm currently thinking:

deriv : (F X ↝ Y) → F X → F(∆X ⊸ ∆Y)

For now, just ignore F; instantiate it as the identity functor. Thus, the total derivative of f : X ↝ Y at the point x : X is a linear transformation from minor perturbations about x to minor perturbations about f x. We can think of ∆X ⊸ ∆Y as being the "slope" of this mapping between minor perturbations. In particular, when X=ℝ and Y=ℝ, we get a bijection between ∆ℝ ⊸ ∆ℝ and , so this coincides with the traditional notion of the slope of a one-dimensional curve.

Now, let's think about the gradient rather than the total derivative. Assume that F is a "nice" functor (i.e., representable, traversable, etc). Because the functor is representable, we have an isomorphism between F A and I → A (natural in A), where I is the type of positions/indices in F. Thus, the gradient of a function g : F X ↝ Y at the point z : F X is essentially a function from F-indices, i : I, to the i-th partial derivative of g at z. Why partial derivative? Because the linear transformation only takes ∆X as an argument —not ∆(F X)—, thus we can only modify a single "scalar" at a time, and the other scalars in z must remain fixed.

Edit 2014.02.17 (#1): Actually, things are a bit more complicated than the above. For example, consider a function f : ℝn ↝ ℝ. The derivative of f with respect to its vector of inputs is not a vector of partial derivatives— rather, it's a covector of partial derivatives! (i.e., deriv (f : ℝn×1 ↝ ℝ) : ℝn×1 → ℝ1×n.) Thus, we should really have that the return type of deriv is some Fop(∆X ⊸ ∆Y) where Fop is some sort of "dual" of F. It's not clear a priori which notion of "duality" is in play here, however.

Edit 2014.02.17 (#2): And here's what I'm currently thinking about how to incorporate the Jacobian into the above. Consider this particular instance of the derivative operator, deriv : (F X ↝ G Y) → F X → Fop(∆X ⊸ ∆(G Y)). When G is some type of vectors (e.g., G Y = Yn for some fixed n), we get ∆(G Y) = G(∆Y) as mentioned before. And let's assume ∆X ⊸ G(∆Y) = G(∆X ⊸ ∆Y); presumably this follows immediately by linearity, though I haven't verified that yet. And finally, in our standard real analysis we get that G∘Fop and Fop∘G are the same— that is, vectors-of-covectors and covectors-of-vectors are essentially the same thing; they're just the row-major and column-major representations of matrices, respectively. And then, putting all these together we get that the original return type Fop(∆X ⊸ ∆(G Y)) is isomorphic to G(Fop(∆X ⊸ ∆Y)). So when X=ℝ and Y=ℝ we get the expected deriv : (ℝm ↝ ℝn) → ℝm → ℝn×m. Now, it remains to show how this extends to other choices of F and G...

Edit 2014.02.17 (#3): It was pointed out on reddit that the above is "well known"— at least for those who are familiar with differentiable manifolds and tangent spaces. Now, I don't know much about manifolds, but it certainly wasn't well-known to me— which, in itself, says a lot about how little this knowledge has spread to the rest of mathematics. I'm glad to hear there's some followup reading to be done here, but I still don't think the necessary work has been done in terms of turning this into a decent API for optimization libraries.

(no subject)

13/2/14 14:56 (UTC)
Posted by [identity profile] chrisamaphone.livejournal.com
I have no idea if this is relevant or just superficially similar, but this post reminded me of bob atkey's POPL talk on giving more precise types to things in physics (and presumably corresponding paper, though i haven't read it).

(no subject)

13/2/14 17:45 (UTC)
silmaril: (Default)
Posted by [personal profile] silmaril
You know, this is a perfect example of the winterkoninkje-post-genre "I know most of the words and can follow about 50% of this, but not only have I never thought about these things in these relations to each other, but didn't even know there were people who did think about these things in these relations to each other." IOW, awesome. Thank you.

nice!

13/2/14 18:21 (UTC)
Posted by [identity profile] existentialtype.wordpress.com
I agree with your point of view, and enjoyed reading this. It seems to me that the idea of the tangent bundle in differential geometry is to "get the types right" for derivatives. It's striking to see the use of logical concepts such as exchange, weakening, and currying in the formulation of the tangent bundle.

Profile

winterkoninkje: Shadowcrane (Default)
wren gayle romano

August 2014

S M T W T F S
     12
34 56 789
1011121314 1516
171819 20212223
24252627282930
31      

Common Tags