winterkoninkje: shadowcrane (clean) (Default)

This summer I've been working on optimizing compilation for a linear algebra DSL. This is an extension of Jeremy Siek's work on Built-to-Order BLAS functions. Often times it's more efficient to have a specialized function which fuses two or more BLAS functions. The idea behind BTO is that we'd like to specify these functions at a high level (i.e., with liner algebra expressions) and then automatically perform the optimizing transformations which have made BLAS such a central component of linear algebra computations.

The current/prior version of BTO already handles loop fusion, memory bandwidth constraints, and more. However, it is not currently aware any high-level algebraic laws such as the fact that matrix multiplication is associative, addition is associative and commutative, transposition reverse-distributes over multiplication, etc. My goal is to make it aware of these sort of things.

Along the way, one thing to do is solve the chain multiplication problem: given an expression like ∏[x1,x2...xN] figure out the most efficient associativity for implementing it via binary multiplication. The standard solution is to use a CKY-like dynamic programming algorithm to construct a tree covering the sequence [x1,x2...xN]. This is easy to implement, but it takes O(n^3) time and O(n^2) space.

I found a delicious alternative algorithm which solves the problem in O(n*log n) time and O(n) space! The key to this algorithm is to view the problem as determining a triangulation of convex polygons. That is, we can view [x0,x1,x2...xN] as the edges of a convex polygon, where x0 is the result of computing ∏[x1,x2...xN]. This amazing algorithm is described in the tech report by Hu and Shing (1981a), which includes a reference implementation in Pascal. Unfortunately the TR contains a number of typos and typesetting issues, but it's still pretty legible. A cleaner version of Part I is available here. And pay-walled presumably-cleaner versions of Part I and Part II are available from SIAM.

Hu and Shing (1981b) also have an algorithm which is simpler to implement and returns a heuristic answer in O(n) time, with the error ratio bounded by 15%. So if compile times are more important than running times, then you can use this version as well. A pay-walled version of the article is available from Elsevier.

winterkoninkje: shadowcrane (clean) (Default)

I finally got around to posting the slides for a talk I gave twice this summer: Probability Smoothing for NLP: A case study for functional programming and little languages. The first version of the talk was presented at the McMaster Workshop on Domain Specific Lanaguages (and Ed Kmett has posted a video of that version on YouTube) with the presentation focused on EDSLs, with smoothing given as an example. The second version was presented at the AMMCS minisymposium on Progress and Prospects in Model-Based Scientific Software Development, where the focus was more on the domain itself and how the use of a DSL allows ensuring correctness, modularity, and maintainability of code for developing probability models. The slides are essentially the same for both talks, with the benchmarks updated a bit in the latter.

As you may have surmised, this is but a small facet of the Posta project I was working on last year. I had meant to submit it as a functional pearl for ICFP, but the timing didn't work out for that. After giving the McMaster version of the talk, Ed convinced me that I should publish the code for the smoothing DSL separately from the rest of Posta. So he's the one to blame about my being so slow in releasing the Posta code I promised this summer. Though seriously, I'd been considering breaking up and reorganizing the code anyways. Now that I'm back from ICFP and all my traveling over the summer, I hope to get that code pushed out soon. Sorry for the delay y'all.

RSS Atom

April 2019

S M T W T F S
 123456
78910111213
14151617181920
212223242526 27
282930    

Tags

Page generated 5 Jun 2025 10:33 am
Powered by Dreamwidth Studios