Parallel Map

… in JavaScript.

A number of events and coincidences have reminded me about functional programming and parallelism lately, especially an example I saw of Clojure’s pmap:

pmap
function
Usage: (pmap f coll)
       (pmap f coll & colls)

Like map, except f is applied in parallel. Semi-lazy in that the parallel computation stays ahead of the consumption, but doesn’t realize the entire result unless required. Only useful for computationally intensive functions where the time of f dominates the coordination overhead.

I’ve heard, for a long time, the argument that functional programming offers a programming style better suited for parallelism. After all, MapReduce is perhaps the ultimate form of famous-brand massive parallelization. In a typical map operation, you don’t have to worry so much about all the nasty little parallelism problems since your code, by convention, is side-effect free.

parallelmap

So, as my contribution to the world of JavaScript hackery, here’s parallel map in JavaScript:

parallelmap(proc, tasks, callback)

where proc is a function that takes one argument, tasks is an array of things, and callback is a function that will be called with the results when processing has completed. The computation will occur in parallel and show dramatic speedups on processing-intensive tasks.

You can try the jspmap.js benchmark for yourself (if you have a recent build of Chrome) in a little test runner that I put together. If the test runner says “This isn’t going to work on your browser”, then it isn’t going to work on your browser. Sorry. There are a lot of details in the standards that haven’t been completely worked out yet, so I have to count on vendor-specific implementations to do what I want them to do.

My completely artificial benchmark shows close to a quadruple speedup on my quad-core machine.

var fun = function(x) {
    var q = 3;
    for(i = 0; i < x*300000; i++) {
        q *= 3.1;
        q = q % 10000000;
    }
    return q;
}
var list = [];
for(var i = 0; i < 50; i++) {
    list.push(i);
}
map(fun, list); // a lot of milliseconds
parallelmap(fun, list, alert); // a lot fewer milliseconds

Time to publish? Just don’t let anyone see the downsides:

  • High overhead
  • Relies on not-yet-standardized behaviors
  • Relies on unintended usage (some would call it abuse) of not-yet-standardized behaviors
  • proc must be serializable
  • proc can’t use browser’s native functions
  • proc can’t refer to variables outside of itself
  • List items and result items must be serializable
  • Insane

parallelworkers

The parallelmap function is admittedly a hack, but there’s also a more practical function included called parallelworkers(url, list, callback) that creates a Web Worker in the more natural way of specifying an external script to run. There’s the overhead of fetching another object, but it’s a lot more practical for writing complex scripts and parallelizing them, and it has the same freedoms and limitations as regular Web Workers.

parallelworkers("modulo.js", [1,2,3,4,5], alert);

How it works

The thing works by using Web Workers that only process one thing. A trivial that squares a number might look like:

self.onmessage = function(event) {
    self.postMessage( event.data * event.data )
}

The test runner simply tests multiplication, so each worker multiplies one number and returns the result. The main thread (browser thread) handles assigning work and collecting the results. Note that we don’t have to worry about the typical problems associated with concurrency—no need for locks or mutexes or similar tricky stuff. The implementation handles the gory message-passing details, and the main thread is just a single thread anyways. The architecture of web workers also means there is zero resource sharing—no object references get shared and the workers have no access to the main thread’s data.

Being able to pass in a function to parallelmap is just a dirty illusion, however. Browsers will happily convert a function to a string for you, and even pretty-print it in the process. Thus, attempting to do the following won’t work:

Math.sqrt.toString()
// function sqrt() { [native code] }
parallelmap(Math.sqrt,
    [1,2,3,4,5],
    alert)
// SyntaxError: Unexpected identifier

var x = 5
var fun = function(y) { return x*y }
fun.toString()
// function (y) { return x*y }
parallelmap(fun,
    [1,2,3,4,5],
    alert)
// ReferenceError: x not defined

Because Web Workers can only be created with a URL that points to the JavaScript to run, we have to look at special URLs to get around this. In parallelmap, we create a “blob” object with the (serialized) code. Blobs let us store arbitrary binary data client-side like this:

var bb = new BlobBuilder()
bb.append("Hello world!")
var url = window.URL.createObjectURL(bb.getBlob())

We’re given an identifier that looks like blob:http://notes.ericjiang.com/733d29b6-65ce-4318-9d41-9b6c893c2c6a. That satisfies the requirement for Web Workers (at least in Chrome), but other browsers may have a different idea of what constitutes “same-origin” for a Web Worker, and might refuse to play along with this stunt. The parallelworkers function is more practical, if you don’t mind the overhead of fetching another resource over the network (more overhead!).

Source code

If you’re still reading, the jspmap.js code can be found on GitHub. It provides map, parallelmap, and parallelworkers.

2 Responses to “Parallel Map”

  1. JavaScript Weekly #57 | island205 writes:

    [...] 使用Web Worker实现一个JavaScript的并行Map [...]

  2. Sam writes:

    Awesome that you have tried this and succeeded at it !

Leave a Reply