I've been mucking about with real analysis lately. One of the things I've always hated about real analysis (and complex analysis, and other calculus) is the unrepentant lack of types. It's easy to get confused about what's going on and what goes where when you think everything is a real number (or complex number, or tensors thereof). In particular, because of this "everything is a real array" assumption, libraries for performing optimization and the like end up with utterly inscrutable APIs and other nastiness, like needing to manually flatten your parameters into a single array. So this has got me thinking: what should the API for optimization libraries look like? Which in turn brings me back to an age-old conundrum I've toyed around with before, namely: what is the type of the derivative operator?
From the "everything is a real number" perspective we'd say deriv : (ℝ ↝ ℝ) → ℝ → ℝ
, where the A ↝ B
function space are "nice" functions (e.g., continuous, smooth, analytic, whatever). However, this only works out because we're conflating a number of important differences. For example, for any affine space A
we have the difference space ∆A
— that is, values in ∆A
denote differences or differentials of values in A
(e.g., for any a : A
and b : A
we have a-b : ∆A
). However, it turns out that there's a bijection between ℝ
and ∆ℝ
. It also turns out that the difference of arrays, ∆(ℝn)
, is isomorphic to an array of differences, (∆ℝ)n
. Putting these together we get the common conflation between points (ℝn
) and vectors (∆(ℝn)
). For another example, consider linear transformations, written A ⊸ B
. We have a bijection between linear transformations (ℝm ⊸ ℝn
) and matrices (ℝn×m
), and hence more specifically a bijection between ℝ
and ℝ ⊸ ℝ
.
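To make these distinctions concrete, here's a small Haskell sketch. Every name in it (Delta, Affine, Lin, toSlope, fromSlope) is made up for illustration; this isn't any existing library's API, just one way the types above could be spelled out:

```haskell
{-# LANGUAGE TypeFamilies #-}

-- | The difference space ∆A of an affine space A (hypothetical encoding).
type family Delta a

-- For the reals, ∆ℝ happens to be in bijection with ℝ itself...
type instance Delta Double = Double

-- ...and a difference of arrays is an array of differences, ∆(ℝⁿ) ≅ (∆ℝ)ⁿ
-- (plain lists stand in for arrays here).
type instance Delta [Double] = [Delta Double]

-- | Affine spaces: subtract two points to get a difference,
-- or offset a point by a difference.
class Affine a where
  diff   :: a -> a -> Delta a   -- a - b : ∆A
  offset :: a -> Delta a -> a   -- a + d : A

instance Affine Double where
  diff   = (-)
  offset = (+)

-- | A linear transformation A ⊸ B, kept abstract rather than
-- collapsed to its matrix/scalar representation.
newtype Lin a b = Lin { applyLin :: a -> b }

-- The bijection between ℝ ⊸ ℝ and ℝ: a linear map on the reals
-- is determined by its slope.
toSlope :: Lin Double Double -> Double
toSlope (Lin f) = f 1

fromSlope :: Double -> Lin Double Double
fromSlope s = Lin (s *)
```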
So here's what I'm currently thinking:
deriv : (F X ↝ Y) → F X → F(∆X ⊸ ∆Y)
For now, just ignore F; instantiate it as the identity functor. Thus, the total derivative of f : X ↝ Y at the point x : X is a linear transformation from minor perturbations about x to minor perturbations about f x. We can think of ∆X ⊸ ∆Y as being the "slope" of this mapping between minor perturbations. In particular, when X = ℝ and Y = ℝ, we get a bijection between ∆ℝ ⊸ ∆ℝ and ℝ, so this coincides with the traditional notion of the slope of a one-dimensional curve.
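As a quick sanity check on that reading, here's a minimal sketch of the identity-functor case, with the linear map estimated numerically by central differences. Lin, deriv1, and the step size are all invented for the example:

```haskell
-- (Lin repeated from the sketch above so this snippet stands alone.)
newtype Lin a b = Lin { applyLin :: a -> b }

-- deriv : (X ↝ Y) → X → (∆X ⊸ ∆Y), specialized to X = Y = ℝ with ∆ℝ ≅ ℝ,
-- and the "slope" estimated by a central difference.
deriv1 :: (Double -> Double) -> Double -> Lin Double Double
deriv1 f x = Lin (\dx -> slope * dx)
  where
    h     = 1e-6
    slope = (f (x + h) - f (x - h)) / (2 * h)

-- Example: the derivative of x² at 3 maps a perturbation dx to (≈6)·dx.
example :: Double
example = applyLin (deriv1 (\x -> x * x) 3) 0.5   -- ≈ 3.0
```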
Now, let's think about the gradient rather than the total derivative. Assume that F is a "nice" functor (i.e., representable, traversable, etc.). Because the functor is representable, we have an isomorphism between F A and I → A (natural in A), where I is the type of positions/indices in F. Thus, the gradient of a function g : F X ↝ Y at the point z : F X is essentially a function from F-indices, i : I, to the i-th partial derivative of g at z. Why partial derivatives? Because the linear transformation only takes ∆X as an argument, not ∆(F X), so we can only perturb a single "scalar" at a time; the other scalars in z must remain fixed.
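Here's a rough sketch of that reading, using a hand-rolled stand-in for the Representable class (the real one lives in the adjunctions package) and central differences for the partials. It also glosses over the covector issue raised in the first edit below; all names are illustrative:

```haskell
{-# LANGUAGE TypeFamilies, FlexibleContexts, DeriveFunctor #-}

-- A stand-in for Data.Functor.Rep.Representable from the adjunctions package.
class Functor f => Representable f where
  type Rep f
  index    :: f a -> Rep f -> a
  tabulate :: (Rep f -> a) -> f a

-- A toy container: pairs, indexed by Bool.
data Pair a = Pair a a deriving (Show, Functor)

instance Representable Pair where
  type Rep Pair = Bool
  index (Pair x _) False = x
  index (Pair _ y) True  = y
  tabulate g = Pair (g False) (g True)

-- | Perturb only the i-th "scalar" of the container; all others stay fixed.
perturbAt :: (Representable f, Eq (Rep f))
          => Rep f -> Double -> f Double -> f Double
perturbAt i h z = tabulate (\j -> index z j + if j == i then h else 0)

-- | gradient g z: for each index i, the i-th partial derivative of g at z.
gradient :: (Representable f, Eq (Rep f))
         => (f Double -> Double) -> f Double -> f Double
gradient g z = tabulate partial
  where
    h         = 1e-6
    partial i = (g (perturbAt i h z) - g (perturbAt i (-h) z)) / (2 * h)

-- Example: g (x, y) = x·y² at (3, 2) has partials (y², 2xy) = (4, 12).
exampleGrad :: Pair Double
exampleGrad = gradient (\(Pair x y) -> x * y * y) (Pair 3 2)
```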
Edit 2014.02.17 (#1): Actually, things are a bit more complicated than the above. For example, consider a function f : ℝⁿ ↝ ℝ. The derivative of f with respect to its vector of inputs is not a vector of partial derivatives; rather, it's a covector of partial derivatives! (I.e., deriv (f : ℝⁿˣ¹ ↝ ℝ) : ℝⁿˣ¹ → ℝ¹ˣⁿ.) Thus, we should really have that the return type of deriv is some Fᵒᵖ(∆X ⊸ ∆Y), where Fᵒᵖ is some sort of "dual" of F. It's not clear a priori which notion of "duality" is in play here, however.
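One possible (unverified) reading: if F ℝ is a vector space, then Fᵒᵖ at the scalars could be its dual space of linear functionals, so a "covector of partials" is just a linear map eating a perturbation of the whole input vector. A sketch under that assumption, with made-up names and numeric partials:

```haskell
newtype Lin a b = Lin { applyLin :: a -> b }

-- | The derivative of f : ℝⁿ ↝ ℝ at a point, packaged as a covector:
-- a linear functional on perturbations of the whole input vector
-- (plain lists stand in for ℝⁿ; partials via central differences).
derivCovector :: ([Double] -> Double) -> [Double] -> Lin [Double] Double
derivCovector f xs = Lin (\dxs -> sum (zipWith (*) partials dxs))
  where
    h        = 1e-6
    partials = [ (f (bump i h) - f (bump i (-h))) / (2 * h)
               | i <- [0 .. length xs - 1] ]
    bump i d = [ if j == i then x + d else x | (j, x) <- zip [0 ..] xs ]

-- For f (x, y) = x·y at (3, 2), the covector sends [1, 0] to ≈2 and [0, 1] to ≈3.
exampleCo :: Double
exampleCo = applyLin (derivCovector (\[x, y] -> x * y) [3, 2]) [1, 0]
```

Note that this collapses Fᵒᵖ(∆X ⊸ ∆Y) wholesale into ∆(F X) ⊸ ∆Y, which may or may not be the notion of duality intended above.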
Edit 2014.02.17 (#2): And here's what I'm currently thinking about how to incorporate the Jacobian into the above. Consider this particular instance of the derivative operator, deriv : (F X ↝ G Y) → F X → Fᵒᵖ(∆X ⊸ ∆(G Y)). When G is some type of vectors (e.g., G Y = Yⁿ for some fixed n), we get ∆(G Y) = G(∆Y), as mentioned before. And let's assume ∆X ⊸ G(∆Y) = G(∆X ⊸ ∆Y); presumably this follows immediately by linearity, though I haven't verified that yet. And finally, in our standard real analysis we get that G∘Fᵒᵖ and Fᵒᵖ∘G are the same — that is, vectors-of-covectors and covectors-of-vectors are essentially the same thing; they're just the row-major and column-major representations of matrices, respectively. Putting all these together, we get that the original return type Fᵒᵖ(∆X ⊸ ∆(G Y)) is isomorphic to G(Fᵒᵖ(∆X ⊸ ∆Y)). So when X = ℝ and Y = ℝ we get the expected deriv : (ℝᵐ ↝ ℝⁿ) → ℝᵐ → ℝⁿˣᵐ. Now, it remains to show how this extends to other choices of F and G...
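As a numeric check of that specialization, here's a throwaway sketch that lays the result out row-major, i.e. as G(Fᵒᵖ(∆X ⊸ ∆Y)): one covector of partials per output component. Everything here (plain lists for ℝᵐ and ℝⁿ, the step size, the names) is just for illustration:

```haskell
-- | jacobian f x : one row per output component, one column per input.
jacobian :: ([Double] -> [Double]) -> [Double] -> [[Double]]
jacobian f xs =
  [ [ (comp i (bump j h) - comp i (bump j (-h))) / (2 * h)
    | j <- [0 .. length xs - 1] ]           -- columns: the ℝᵐ inputs
  | i <- [0 .. length (f xs) - 1] ]         -- rows: the ℝⁿ outputs
  where
    h         = 1e-6
    comp i ys = f ys !! i
    bump j d  = [ if k == j then x + d else x | (k, x) <- zip [0 ..] xs ]

-- Example: f (x, y) = (x·y, x + y) at (3, 2) gives ≈ [[2, 3], [1, 1]].
exampleJac :: [[Double]]
exampleJac = jacobian (\[x, y] -> [x * y, x + y]) [3, 2]
```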
Edit 2014.02.17 (#3): It was pointed out on reddit that the above is "well known"— at least for those who are familiar with differentiable manifolds and tangent spaces. Now, I don't know much about manifolds, but it certainly wasn't well-known to me— which, in itself, says a lot about how little this knowledge has spread to the rest of mathematics. I'm glad to hear there's some followup reading to be done here, but I still don't think the necessary work has been done in terms of turning this into a decent API for optimization libraries.
no subject
Date: 2014-02-18 05:08 am (UTC)
If you're interested in dimensional analysis problems, you might also be interested in Maryam Moghadas' MSc thesis on dimensional analysis of inverse problems for fMRI. Most of the bugs in fMRI code come from dimension/type errors, so she was trying to rule those out by performing dimensional analysis in Haskell's type system. The key contribution was figuring out what those units/dimensions actually are (rather than doing the dimensional type-checking per se, which has been done before for standard physical units).
no subject
Date: 2014-02-18 05:16 am (UTC)
FWIW, this one is mostly off-the-cuff rambling in order to get my thoughts down for future reference. It's very much a work in progress. So I'd be curious about that other 50%, and how much is due to my taking shortcuts in writing things down vs. how much is due to conceptual issues.