Parallel Map
Thursday, 8 December 2011
… in JavaScript.
A number of events and coincidences have reminded me about functional programming and parallelism lately, especially an example I saw of Clojure’s pmap:
pmap
function
Usage: (pmap f coll)
(pmap f coll & colls)
Like map, except f is applied in parallel. Semi-lazy in that the parallel computation stays ahead of the consumption, but doesn’t realize the entire result unless required. Only useful for computationally intensive functions where the time of f dominates the coordination overhead.
I’ve heard, for a long time, the argument that functional programming offers a programming style better suited for parallelism. After all, MapReduce is perhaps the ultimate form of famous-brand massive parallelization. In a typical map operation, you don’t have to worry so much about all the nasty little parallelism problems since your code, by convention, is side-effect free.
parallelmap
So, as my contribution to the world of JavaScript hackery, here’s parallel map in JavaScript:
parallelmap(proc, tasks, callback)
where proc
is a function that takes one argument, tasks
is an array of things, and callback
is a function that will be called with the results when processing has completed. The computation will occur in parallel and show dramatic speedups on processing-intensive tasks.
You can try the jspmap.js benchmark for yourself (if you have a recent build of Chrome) in a little test runner that I put together. If the test runner says “This isn’t going to work on your browser”, then it isn’t going to work on your browser. Sorry. There are a lot of details in the standards that haven’t been completely worked out yet, so I have to count on vendor-specific implementations to do what I want them to do.
My completely artificial benchmark shows close to a quadruple speedup on my quad-core machine.
[javascript]
var fun = function(x) {
var q = 3;
for(i = 0; i < x*300000; i++) {
q *= 3.1;
q = q % 10000000;
}
return q;
}
var list = [];
for(var i = 0; i < 50; i++) {
list.push(i);
}
map(fun, list); // a lot of milliseconds
parallelmap(fun, list, alert); // a lot fewer milliseconds
[/javascript]
Time to publish? Just don’t let anyone see the downsides:
- High overhead
- Relies on not-yet-standardized behaviors
- Relies on unintended usage (some would call it abuse) of not-yet-standardized behaviors
proc
must be serializableproc
can’t use browser’s native functionsproc
can’t refer to variables outside of itself- List items and result items must be serializable
- Insane
parallelworkers
The parallelmap
function is admittedly a hack, but there’s also a more practical function included called parallelworkers(url, list, callback)
that creates a Web Worker in the more natural way of specifying an external script to run. There’s the overhead of fetching another object, but it’s a lot more practical for writing complex scripts and parallelizing them, and it has the same freedoms and limitations as regular Web Workers.
[javascript]
parallelworkers("modulo.js", [1,2,3,4,5], alert);
[/javascript]
How it works
The thing works by using Web Workers that only process one thing. A trivial that squares a number might look like:
[javascript]
self.onmessage = function(event) {
self.postMessage( event.data * event.data )
}
[/javascript]
The test runner simply tests multiplication, so each worker multiplies one number and returns the result. The main thread
(browser thread) handles assigning work and collecting the results. Note that we don’t have to worry about the typical problems associated with concurrency—no need for locks or mutexes or similar tricky stuff. The implementation handles the gory message-passing details, and the main thread is just a single thread anyways. The architecture of web workers also means there is zero resource sharing—no object references get shared and the workers have no access to the main thread’s data.
Being able to pass in a function to parallelmap
is just a dirty illusion, however. Browsers will happily convert a function to a string for you, and even pretty-print it in the process. Thus, attempting to do the following won’t work:
[javascript]
Math.sqrt.toString()
// function sqrt() { [native code] }
parallelmap(Math.sqrt,
[1,2,3,4,5],
alert)
// SyntaxError: Unexpected identifier
var x = 5
var fun = function(y) { return x*y }
fun.toString()
// function (y) { return x*y }
parallelmap(fun,
[1,2,3,4,5],
alert)
// ReferenceError: x not defined
[/javascript]
Because Web Workers can only be created with a URL that points to the JavaScript to run, we have to look at special URLs to get around this. In parallelmap
, we create a “blob” object with the (serialized) code. Blobs let us store arbitrary binary data client-side like this:
[javascript]
var bb = new BlobBuilder()
bb.append("Hello world!")
var url = window.URL.createObjectURL(bb.getBlob())
[/javascript]
We’re given an identifier that looks like blob:http://notes.ericjiang.com/733d29b6-65ce-4318-9d41-9b6c893c2c6a
. That satisfies the requirement for Web Workers (at least in Chrome), but other browsers may have a different idea of what constitutes “same-origin” for a Web Worker, and might refuse to play along with this stunt. The parallelworkers
function is more practical, if you don’t mind the overhead of fetching another resource over the network (more overhead!).
Source code
If you’re still reading, the jspmap.js code can be found on GitHub. It provides map
, parallelmap
, and parallelworkers
.
No. 1 — December 19th, 2011 at 2:02 am
[…] ??Web Worker????JavaScript???Map […]
No. 2 — December 20th, 2011 at 10:16 am
Awesome that you have tried this and succeeded at it !