The PLS group are made of awesome.
Now that the summer's here I've been hoping to get back into posting more regularly. Over the last year I've accumulated a large number of "to read" papers on various topics having to do with functional and logic programming, and I'm finally getting a chance to go back and read them. That sounds like two great tastes that taste great together, so let the linkblogging begin.
Functional programming has historically been maligned for having less efficient implementations than imperative languages, mostly because historically that's been the case. Note the emphasis on "historically". However, if you're not careful, you may be perpetuating that sad state of history. If you think that, as a non-academic programmer, academic papers don't affect you, quoth dons on #haskell: programming is not for the naïve.
In functional languages the list data structure is foundationally ubiquitous, and in lazy languages like Haskell it also abstracts over the recursion pattern of loops in imperative languages. Unfortunately this pervasive use of lists places an enormous burden on allocating and reclaiming memory, which has been a major source of historical inefficiencies in functional language implementations. Fortunately, much research has been done to overcome these issues... if you take advantage of it.
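To make the allocation burden concrete, here is a small illustrative sketch (my own example, not from any of the papers below): a typical list pipeline and the hand-fused loop it conceptually costs us, the latter being what deforestation aims to produce automatically.

```haskell
-- Each stage of this pipeline conceptually allocates an intermediate
-- list: [1..n] is consumed by filter, whose output is consumed by map,
-- whose output is finally consumed by sum.
sumDoubledEvens :: Int -> Int
sumDoubledEvens n = sum (map (* 2) (filter even [1 .. n]))

-- The hand-fused equivalent: a single loop over an accumulator,
-- allocating no intermediate cons cells at all.
sumDoubledEvens' :: Int -> Int
sumDoubledEvens' n = go 1 0
  where
    go i acc
      | i > n     = acc
      | even i    = go (i + 1) (acc + 2 * i)
      | otherwise = go (i + 1) acc
```

The point of the fusion work below is that you get to write the first version and the compiler produces (something like) the second.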
One of the great claims of high-level (mathematically rigorous) languages is that the altitude gives the programmer the ability to program prettily and idiomatically, while making the true invariants more visible to the compiler, thus allowing it the freedom to optimize as far as (or further than) illegible hand-tuned code would have let you. Today's papers approach that goal through "deforestation" techniques, a subset of "fusion" optimizations. All of today's papers are about automating these optimizations, but compilers are complex and it takes time to work current research into them. So today's post is also a PSA: only you can start forest fires; use libraries which optimize your lists away.
@InProceedings{ CSL2006:rewriting,
  author    = "Duncan Coutts and Don Stewart and Roman Leshchinskiy",
  title     = "Rewriting {H}askell {S}trings",
  booktitle = "PADL 2007: Practical Aspects of Declarative Languages, 8th International Symposium",
  year      = 2007,
  month     = jan,
  pages     = "50--64",
  publisher = "Springer-Verlag",
  url       = {http://www.cse.unsw.edu.au/~dons/papers/CSL06.html},
  url       = {http://www.cse.unsw.edu.au/~dons/fps.html},
  abstract  = { The Haskell \textit{String} type is notoriously inefficient. We introduce a new data type, \textit{ByteString}, based on lazy lists of byte arrays, combining the speed benefits of strict arrays with lazy evaluation. Equational transformations based on term rewriting are used to deforest intermediate ByteStrings automatically. We describe novel fusion combinators with improved expressivity and performance over previous functional array fusion strategies. A library for ByteStrings is implemented, providing a purely functional interface, and approaches the speed of low-level mutable arrays in C. }
}
If your code does any string processing of note, you'll want to use Data.ByteString, available on Hackage. This library provides the standard C representation of unboxed arrays of characters, but with the standard Haskell interface of high-level functions for treating them like lists. This paper is also one of the best introductions to the field of deforestation I've seen, so it's a good read for getting a handle on the issues and the solutions out there.
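A brief taste of what "list interface over a packed array" looks like in practice, using the strict `Data.ByteString.Char8` API (the function names below — `pack`, `filter`, `map`, `length` — are the library's list-mirroring exports; the pipeline example is my own):

```haskell
import Data.Char (toUpper)
import qualified Data.ByteString.Char8 as B

-- Same high-level style you'd write against Data.List, but over a
-- packed byte array; the rewrite rules described in the paper fuse
-- such pipelines so the intermediate ByteString from 'filter' need
-- never be materialised.
countSpaces :: B.ByteString -> Int
countSpaces = B.length . B.filter (== ' ')

shout :: B.ByteString -> B.ByteString
shout = B.map toUpper
```

You keep the idiomatic composition style and the library worries about making it run like C.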
@InProceedings{ Gill93:shortcut,
  author    = "Andrew Gill and John Launchbury and Simon~L. Peyton Jones",
  title     = "A Short Cut to Deforestation",
  booktitle = "FPCA '93: Conference on Functional Programming Languages and Computer Architecture",
  location  = "Copenhagen, Denmark",
  pages     = "223--232",
  month     = jun,
  year      = 1993,
  publisher = "ACM Press",
  address   = "New York, NY, USA",
  isbn      = "0-89791-595-X",
  url       = {http://citeseer.ist.psu.edu/gill93short.html},
  url       = {http://portal.acm.org/citation.cfm?id=165214},
  doi       = {http://doi.acm.org/10.1145/165180.165214},
  abstract  = { Lists are often used as ``glue'' to connect separate parts of a program together. We propose an automatic technique for improving the efficiency of such programs, by removing many of these intermediate lists, based on a single, simple, local transformation. We have implemented the method in the Glasgow Haskell compiler. }
}
This is an older paper and one of the first to present a general method applicable to all permissible programs rather than a more restricted subset. The optimizations described here are already implemented for stock GHC lists, so you don't even need a library to take advantage of them. This is more of a background paper for contextualizing the PLS papers.
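The whole trick of the paper fits in one rewrite rule: `foldr k z (build g) = g k z`. A producer written with `build` makes no commitment to cons cells, so when a `foldr` consumes it the list vanishes. A minimal sketch (GHC's real `build` lives in `GHC.Exts`; I define a local copy here, and `upto`/`sumUpto` are my own illustrative names):

```haskell
{-# LANGUAGE RankNTypes #-}

-- A producer abstracted over what "cons" and "nil" mean.
build :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build g = g (:) []

-- An enumeration written in build form: it never mentions (:) or []
-- directly, so a consumer can substitute its own operations.
upto :: Int -> Int -> [Int]
upto lo hi = build (\cons nil ->
  let go i | i > hi    = nil
           | otherwise = cons i (go (i + 1))
  in  go lo)

-- The short-cut rule  foldr k z (build g) = g k z  lets the compiler
-- rewrite this to a direct loop with no intermediate list.
sumUpto :: Int -> Int -> Int
sumUpto lo hi = foldr (+) 0 (upto lo hi)
```

GHC applies this rule (via RULES pragmas in the standard libraries) whenever the producer and consumer are written in this style, which is why stock list pipelines often deforest for free.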
@InProceedings{ CLS2007:stream,
  author    = "Duncan Coutts and Roman Leshchinskiy and Don Stewart",
  title     = "Stream {F}usion: From Lists to Streams to Nothing at All",
  booktitle = "ICFP 2007: Proceedings of the ACM SIGPLAN International Conference on Functional Programming",
  year      = 2007,
  month     = apr,
  note      = "To appear",
  url       = {http://www.cse.unsw.edu.au/~dons/papers/CLS07.html},
  url       = {http://www.cse.unsw.edu.au/~dons/streams.html},
  abstract  = { This paper presents an automatic fusion system, \textit{stream fusion}, based on equational transformations, that fuses a wider range of functions than existing short-cut fusion systems. In particular, stream fusion is able to fuse zips, left folds and functions over nested lists, including list comprehensions. A distinguishing feature of the stream fusion framework is its simplicity: by transforming list functions to expose their structure, intermediate values are eliminated by general purpose compiler optimisations. We have reimplemented the entire Haskell standard list library on top of our framework, providing stream fusion for Haskell lists. By allowing a wider range of functions to fuse, we see an increase in the number of occurrences of fusion in typical Haskell programs. We present benchmarks documenting time and space improvements. }
}
This paper is a follow-up to Rewriting which applies the fusion techniques to all lists (though not the unboxed-arrays part of Data.ByteString). This too is provided as a library, Data.List.Stream, also on Hackage. The plan is for this library to eventually replace the whole Data.List module and ship with GHC itself.
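The core idea of the paper is to convert list functions into functions over a stream: a step function plus an explicit seed state, where the extra `Skip` constructor lets functions like `filter` be non-recursive and therefore inlinable. A condensed sketch of that representation (constructor and function names follow the paper's scheme, though details here are simplified):

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- One step of producing a stream over seed state s.
data Step s a = Done | Skip s | Yield a s

-- A stream hides its state type: just a step function and a seed.
data Stream a = forall s. Stream (s -> Step s a) s

-- Convert a list to a stream; the remaining list is the state.
stream :: [a] -> Stream a
stream = Stream next
  where next []       = Done
        next (x : xs) = Yield x xs

-- Convert back; this is the only recursive loop in the pipeline.
unstream :: Stream a -> [a]
unstream (Stream next s0) = go s0
  where go s = case next s of
          Done       -> []
          Skip s'    -> go s'
          Yield x s' -> x : go s'

-- filter over streams: note the step function is NOT recursive, so
-- GHC can inline and fuse it with neighbouring stream functions.
filterS :: (a -> Bool) -> Stream a -> Stream a
filterS p (Stream next s0) = Stream next' s0
  where next' s = case next s of
          Done                    -> Done
          Skip s'                 -> Skip s'
          Yield x s' | p x        -> Yield x s'
                     | otherwise  -> Skip s'
```

With every list function written this way, a pipeline becomes `unstream . f . g . stream`, adjacent `stream . unstream` pairs cancel by a rewrite rule, and the intermediate lists are gone.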
One of the reasons the PLS folks are made of awesome is the empirical benchmarking they give along with the theory. Sadly, due to some limitations in GHC's general optimization techniques, this library is still experimental and still somewhat hit or miss (as of 2007 at least). For the majority of programs there's marginal gain over the foldr/build approach, and for some there are amazing gains (because foldr/build can't apply), but for some deeply nested loops those limitations in GHC mean we can't completely clean up after ourselves, so there's actually performance degradation. The moral of the story is to profile your code and use this library whenever possible (and help the maintainers when it breaks). As for the profiling, stay tuned...