What concurrency in Node.js could have been
Wednesday, 2 July 2014
People wrote a lot of good comments on my last post about Node.js. (They also wrote some bad comments, but the worst have been modded out.)
One of the key points I was trying to make was that the way concurrency is written in Node.js sucks, so it blew my mind that people kept referring to new libraries that supposedly fixed things: bluebird, fibers, etc. You see, the callback-hell problem is something that is impossible to fix in “user space”. You either have to modify Node.js internals or design a new language that compiles to JavaScript.
Let me explain.
In a language that allows you to write concurrent code, you might write something like:
[javascript]
function bakeCookies(cookie_mix) {
    print("Baking cookies…");
    var oven = new Oven(cookie_mix);
    var cookies = oven.bake(15); // bake for fifteen minutes
    cookies.decorate();
    return cookies;
}
[/javascript]
In Node.js, oven.bake would be callback based and return immediately (otherwise it would wedge your process):
[javascript]
function bakeCookies(cookie_mix, callback) {
    print("Baking cookies…");
    var oven = new Oven(cookie_mix);
    oven.bake(15, function(cookies) { // bake for fifteen minutes
        cookies.decorate();
        callback(cookies);
    });
}
[/javascript]
I want to say again – it’s impossible for any user-space library to turn this callback-based version into the first sequential version. Not Q, not bluebird, not fibers, none of it. But you might think that callbacks are needed for your code to be concurrent, that callback hell is the price we pay in order to serve two HTTP requests at once. Wrong! You can have your cake and eat it too!
Why? A good runtime will handle multitasking and the continuations for us:
[javascript]
function bakeCookies(cookie_mix) {
    print("Baking cookies…");
    var oven = new Oven(cookie_mix);
    var cookies = oven.bake(15); // When oven.bake begins, it will yield to other code.
    cookies.decorate(); // and the runtime will handle the continuation.
    return cookies; // The yield and continuation are hidden from the programmer.
}
[/javascript]
This isn’t fiction. It isn’t new, either. Languages like Go, Haskell, and others let you do this already. It’s not any harder to implement than what JS engines already do. In fact, it’s so straightforward even an undergraduate could do it.
In node.js, you don’t have a choice. All “async” functions will return immediately and go on to the next piece of code, and it’s beyond the control of user-space code. No matter how you slice it, in the end you’re stuck with the overhead of handling callbacks, yields, promises, wrappers, or something else. The only thing that the callback-based model gave us was more ways to shoot ourselves in the foot. There is actually nothing extra you can do with callbacks that you couldn’t do otherwise.
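To make "return immediately" concrete, here is a minimal sketch you can run under Node.js. The Oven below is a hypothetical stand-in built on setTimeout (nothing in this post defines a real one), and ten milliseconds play the role of fifteen minutes.

```javascript
// Hypothetical stand-in for the Oven used above; not a real library.
var events = [];

function Oven(cookieMix) {
  this.cookieMix = cookieMix;
}

// Callback-based bake: schedules the work and returns immediately.
Oven.prototype.bake = function (minutes, callback) {
  var mix = this.cookieMix;
  setTimeout(function () {
    callback({ mix: mix, minutes: minutes });
  }, 10); // assumption: 10 ms stands in for the real baking time
};

var oven = new Oven("chocolate chip");
events.push("before bake");
oven.bake(15, function (cookies) {
  events.push("cookies ready");
  console.log(events.join(" -> "));
});
events.push("after bake"); // runs before the callback fires
```

Running it should print `before bake -> after bake -> cookies ready`: the line after oven.bake runs first, which is exactly why the result has to arrive through a callback.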
But what if we designed a Node.js equivalent that actually did handle the yielding and continuation management for us? One that made it pleasant to write code that was expressed logically, while still having good performance characteristics? And what if, from the start, we abandoned the idea of making everything callbacks?
It already exists, and it’s called StratifiedJS.
[javascript]
// file: shouter.sjs
function shouter(i) {
    for(;;) {
        console.log("Strata " + i + " says HEY");
        // hey! no callbacks!
        longRunningComputation();
    }
}
// this could be a network request, database access, etc.
function longRunningComputation() {
    // blocks execution for 1000-1400 ms
    hold(1000 + (Math.random() * 400));
}
for(var i = 0; i < 5; i++) {
    spawn shouter(i);
}
[/javascript]
Try running this piece of code with StratifiedJS. Each “strata” (or what some call a “lightweight thread”) executes sequentially. Did we write callbacks, or sprinkle “yield” liberally? No. Spawning 5 threads is just spawning 5 threads. And it’s definitely not using operating system threads (which aren’t “web scale”) because it’s running on top of Node.js.
This is what Node.js could have been.
If I still cared, I might try to hype up StratifiedJS. But it’s too late. We have too many people who believe that callbacks are required to make concurrency work. In all of the comments on my previous post, not a single person mentioned StratifiedJS. The closest was streamline.js, where you replace callbacks with underscores (never mind that there’s a popular library that’s also called underscore) and it compiles your code down into CPS-ish JavaScript.
The people who realize how bad Node.js is at concurrency have been jumping ship to Go and other languages. And the people who don’t mind callback hell are going to keep using whatever Node.js continuation-management library is in vogue at the moment. I wonder if Go would’ve gotten as popular if Node.js had a good concurrency story from the get-go, but it’s too late to change the mountains of callback-style code that’s been written plus all of the blog posts on how to program Node.js using continuations.
I don’t actually care about StratifiedJS. It’s almost certainly not the only solution. But it’s proof that we could have done things sanely this whole time.
So, StratifiedJS gives you JavaScript with easy, performant, sane concurrency while still maintaining compatibility with the JavaScript ecosystem, and it’s been around since 2010. It must be really popular. How popular? Let’s ask Google:
No. 1 — July 4th, 2014 at 23:36
Well, you should look into Node.js workers, which take even better advantage of a multi-core server (better than cluster, which is available by default).
So there is plenty of room for improvement, just that some things aren’t so well advertised.
No. 2 — July 7th, 2014 at 21:39
> I want to say again – it’s impossible for any user-space library to turn this callback-based version into the first sequential version. Not Q, not bluebird, not fibers, none of it.
This is wrong: neither Q nor bluebird will do it. But fibers will! With the fibers futures library, you can write:
var bakeCookies = function(cookie_mix) {
    print("Baking cookies…");
    var oven = new Oven(cookie_mix);
    var cookies = oven.bake(15).wait(); // will yield
    cookies.decorate();
    return cookies;
}.future(); // caller will be able to wait() on it.
Now, you can call bakeCookies from another async function:
var bakeManyCookies = function(cookie_mix, n) {
    for (var i = 0; i < n; i++) bakeCookies(cookie_mix).wait();
}.future();
There is a little bit of extra noise (.wait() and .future()) but the control flow is the same as in your sync version. You can also use regular exception handling (try/catch).
The only gotcha is that you need a Fiber.run(reqHandler) in your HTTP dispatcher and that you have to wrap low level async APIs with futures (see the node-fibers readme). But all the intermediate code (between your dispatcher and the low level APIs that you have wrapped) becomes quasi-synchronous (with wait() calls everywhere it yields).
This is possible because fibers code breaks the run-to-completion semantics of JS by enabling deep continuations (the .wait() calls). This cannot be done in pure JS but it can be done with a C++ addon.
Streamline.js also solves it (but as you say it is not a pure library but a preprocessor). Your code becomes:
function bakeCookies(cookie_mix, _) {
    print("Baking cookies…");
    var oven = new Oven(cookie_mix);
    var cookies = oven.bake(15, _);
    cookies.decorate();
    return cookies;
}
Here, the _ indicates all the points where code yields.
ES6 generators also allow you to write in sync-like style. For example with galaxy:
function* bakeCookies(cookie_mix) {
    print("Baking cookies…");
    var oven = new Oven(cookie_mix);
    var cookies = yield oven.bake(15);
    cookies.decorate();
    return cookies;
}
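For readers wondering what sits underneath a generator-based style like this, here is a rough sketch of the general technique: a small runner that resumes a generator each time a yielded promise settles. The names `run` and `bake` are hypothetical and written only for this illustration; this is not galaxy's actual API, and the oven is faked with setTimeout.

```javascript
// Hypothetical mini-runner: drives a generator that yields promises.
// A sketch of the general technique, not the API of any real library.
function run(genFn) {
  var args = Array.prototype.slice.call(arguments, 1);
  return new Promise(function (resolve, reject) {
    var gen = genFn.apply(null, args);
    function step(method, value) {
      var result;
      try {
        result = gen[method](value); // resume the generator with a value or an error
      } catch (err) {
        return reject(err);
      }
      if (result.done) return resolve(result.value);
      Promise.resolve(result.value).then(
        function (v) { step("next", v); },
        function (e) { step("throw", e); }
      );
    }
    step("next", undefined);
  });
}

// Hypothetical promise-returning oven, standing in for oven.bake(15).
function bake(minutes) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve({ minutes: minutes, decorated: false });
    }, 10); // assumption: 10 ms stands in for the real baking time
  });
}

function* bakeCookies(minutes) {
  var cookies = yield bake(minutes); // suspends here until the promise settles
  cookies.decorated = true;
  return cookies;
}

var baked = run(bakeCookies, 15);
baked.then(function (cookies) {
  console.log(cookies.minutes, cookies.decorated);
});
```

Note that the yield only suspends the generator function it appears in, so every function on the call path has to be a generator too, which is the "extra noise" trade-off compared with a runtime that hides the continuation.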
Also, to set things straight, what you say about the impossibility for user-space libraries to turn callbacks into sequential code is not true, even with vanilla ES5 (no fibers, no generators)!!!
Like you, I believed it was impossible but then I read this: http://smellegantcode.wordpress.com/2012/12/28/a-pure-library-approach-to-asyncawait-in-standard-javascript/
Amazing, isn't it? Of course, this makes it theoretically possible, but still practically impossible!
No. 3 — July 17th, 2014 at 22:06
Hi,
You have a valid point. Async and callbacks are what make Node special, and they are its concurrency model. We don’t want to change Node, but we can build on top of it. And some have:
* Fibers: a way to achieve what you describe as lightweight threads. Meteor[0] uses Fibers and has been successful with it.
* Generators (an ES6 feature): this is the language change you suggested. See koa[1], which uses generators and lets us write callback-free web apps.
[0] – https://www.meteor.com/
[1] – http://koajs.com/