[transl.] You went overseas—are you happy? (“出国的你 快乐吗”)

Found the below article through so-called “social media”. No affiliation with the author or the WeChat account, but I thought it was worth translating.

Original post at: http://mp.weixin.qq.com/s?__biz=MzA4ODQwMTkyOQ%3D%3D&mid=200035572&idx=1&sn=6fc8fba95e4c22b96616f5cd79b99437


[n.b. photos from original post omitted]

大家看着留学生在国外照的照片,不停说,好羡慕,好美好的生活。
Everyone looking at the photos from students studying abroad keeps saying, “Wow, what a wonderful life. I’m jealous.”
千万不要羡慕,这条路看去光鲜亮丽,其实每走一步就多一个伤痕。
Don’t be jealous, because the road they take might seem bright and glamorous, but every step taken is another scar suffered.

那些没有熟悉友人的日子,必须要学会一个人去熬,夜深人静的时候也是一个人独自听歌发呆,有时候听着听着眼泪就不停往下掉,手机里翻不到一个可以打电话诉说的人,太晚了,大家都睡了。
Those days when you don’t have a friend, you have to learn to endure them alone. In the dead of night, spacing out and listening to music, the tears start to flow. Flipping through your phone’s contacts, you can’t find a single person to talk to. It’s too late—everyone’s asleep.

那些没有家人陪伴的日子,一个人发烧到晕倒的时候,也只能扶着旁边的东西慢慢站起来,自己去弄吃的。就算饮食不规律,饿出了胃病,也得一个人上街去买药,拿着手机词典一个个查该买哪种药,是什么解释。
Those days when you don’t have family close by, when you catch a fever and feel like passing out, you can only lean against something and shuffle your way out to find something to eat. Or when your eating habits are thrown off and you feel sick, you’re still out there by yourself buying medicine, holding your phone trying to translate every unfamiliar label.

那些没有家乡菜的日子,每天都买面包,吃披萨,看着网上朋友们发的照片,心里不由得又难受起来,常常会想,哪,要我在家里就好了,我可以吃到好多好多好吃的菜,等我回去我一定要吃!
Those days when you don’t have your hometown foods, when you’re buying bread and eating pizza every day, you see your friends’ photos online. You suddenly feel down, thinking, “Man, if only I were back home eating all those delicious things. I’m definitely going to eat them when I go back!”

在国外,不能再像在家里一样,遇到什么就打电话回家抱怨,因为你知道,你在这头抱怨,父母在那头翻来覆去睡不着觉,等着第二天再不停打电话给你,巴不得马上跑到你身边帮你解决所有困难。于是学着自己去承受,遇到天大的事情自己顶着,顶累了就找个地方休息一下,想想爸妈的样子,嗯,你可以的,你可以撑过去。
Overseas, you can’t do what you did back home and just phone home to vent whenever something happens, because you know that complaining on this end will only leave your parents on the other end tossing and turning, unable to sleep, waiting to call you non-stop the next day. They’d run to your side and fix every problem for you, if only they could. So you learn to bear things yourself. When something huge comes up, you carry it on your own, and when that wears you down, you find a place to rest and think of mom and dad… “I can do it. I can pull through.”

在国外,遇到知心朋友不容易,大家都在不同环境下长大,需要很多磨合,所以一旦遇到一个谈得来的,就紧紧抓住不肯放,在外面能遇到一个可以无话不说的朋友是多么幸运的事情,你说你要去他的家乡玩,他也说会到你的家乡探望你。
Overseas, it’s rare to find a friend who truly understands you. It takes a lot of adjusting to people who were raised differently, who grew up in a different environment. So when you do find someone you get along with, you hold on to them as tight as you can. How lucky it is to find the kind of friend you can say anything to! You tell them you’ll go visit their hometown, and they say they’ll come visit yours.

在国外,遇到一大堆歧视华人的外国人,听他们说话的时候真想一拳揍在他们脸上,但是你必须冷静,就算咬牙切齿也要冷静,因为你要告诉他们,真正没有素质的是说是非的他们,不是中国人,因为你要向所有人证明,中国留学生可以帮中国赢回面子,我们有气度。
Overseas, you see a lot of foreigners who are racist against Chinese people. When you hear what they say, you just want to go up and slap them across the face. But you have to stay cool. Just grit your teeth and stay cool, because you need to show them that the ones who don’t know right from wrong are them, not us. Because you need to show everyone that Chinese overseas students can win back some respect for China, that we can be the bigger man.

在国外,孤独,空虚,无助,这些都太明显了,当新鲜感过了之后,所有的不安都统统涌来,那时候心里的不安向谁述说,不能写在网上,因为会有很多人过问,你只想懂你的人来和你说说话,于是,你要学会和寂寞交朋友。
Overseas, the loneliness, emptiness, and helplessness are all too obvious. When the novelty of it all wears off, all of your fears and uneasiness well up inside you. Who are you going to tell this to? You can’t write it online—too many people will ask what’s up. You just want to talk to someone who understands you, so you learn to befriend loneliness.

在国外,不能随时都打电话回来,因为有时差。难过的时候谁陪你,影子陪你。别忘了,你还有一大堆英文没背,你没有时间浪费,你要做给那些曾经瞧不起你的人看,你要比那些外国人强,你要做给你自己看,不逼自己一把,你不知道自己有多优秀。
Overseas, you can’t call home whenever you want with the timezone difference. When you’re down in the dumps, who’s by your side? Just your shadow. And don’t forget about all the English vocab you still haven’t memorized. No time to waste—you have to show the people who once looked down on you, you have to be better than those foreigners, you have to prove it to yourself. If you don’t push yourself, you’ll never know how good you really are.

异地恋变得好有压力,彼此有了各自的交友圈,每次谈话都在谈以前的事情,你说你遇到的事情,他不懂,他说他遇到的新鲜事,你却不了解。
Keeping up a long-distance relationship is so stressful. You each have your own circle of friends and just keep talking about what happened in the past. You tell him about something that happened, but he doesn’t get it; he tells you about something new, but you don’t really understand.
我想找一个和我在同一个地方的人,我们一起努力,为了去同一个大学而努力,睁开眼睛可以去找他,周末就去图书馆看书背单词,放假就背着书包去旅游,然后一起打工。
I want to find someone in the same place as me. We’ll work hard to get into the same college, to be able to see each other with our own eyes. On weekends we’ll cram vocabulary in the library. On vacations, with our backpacks on our back, we’ll travel, and then together we’ll get jobs.

有个人陪你,再多的苦和难好像有减轻了,不愁时差不同,做完作业发一句晚安。
With someone beside you, all the burdens of the world feel lighter. No worrying about timezones—when you’re done with homework just send him a “good night”.
有个人陪你,再多的空虚和孤独都有个人和你一起承担,有了更多的梦想,虽然不切实际,但是想为青春疯狂一次。
With someone beside you, there’s someone else to share the emptiness and loneliness with. There are more dreams, unrealistic maybe, but you want to go crazy for your youth just once.
留学生,一点也不容易。
There’s nothing easy about being an overseas student.

要承受的东西太多,每走一步都承载着好多压力。
With all these things to worry about, you feel the pressure of every step.
很想念在国内的朋友,很遗憾没有和他们照毕业照,遗憾没有和他们一起喝酒聊天,很怀念曾经走过的时光。
You miss all of your friends back home. You regret not being in their graduation pictures. You regret not sharing in the good times, not chatting with them over drinks. What happened to those days?
后来,只能在网上互相留言,留的最多的便是,I MISS U。
The Internet becomes your only connection to them. Your most common post becomes, “I MISS U”.
我是真的很想念你们。
I really miss you all.

想和你们一起去唱歌喝酒,想在繁华的家乡和一群朋友压马路,我从来不在乎去哪里,只在乎和谁在一起。
I want to go out with you all and sing and drink. I want to get a group of friends together and wander the bustling streets of my hometown. I’ve never cared about where I go, only who I go with.
接下来的路,还很长,留学,毕业,未来,生活。
The road ahead is still long: studying abroad, graduation, the future, life.
留学生多牛逼,一直坚持下去,别中途放弃,好好学,只有这样这条路才有价值,才对得起父母花的钱。
Overseas students are seriously tough. Keep at it, don’t give up halfway, study hard. Only then is this road worth anything, only then do we deserve the money our parents are spending.

记得有句话是这么讲的:出国就像拍A片,看着的人爽,拍的人的艰辛没有人体会到>0<
Remember how the saying goes: going overseas is like filming a porno. The viewers get a kick out of it, but they don't know the pain that the actors go through >0<
欢迎在外国的华人学子以及想出国的童鞋关注我们的微信号:海外咨询快报(haiwai66)☺
All Chinese students overseas, as well as those who want to go abroad, come follow our WeChat account: 海外咨询快报 (haiwai66)☺


Again, no relation to the author. All views expressed are their own, not mine nor those of my affiliated orgs. Any faults in the English translation are mine alone. Jeez.

Workaround for HipChat on Linux: “can’t find build id”, “HashElfTextSection”

The new version of HipChat added support for video and screen-sharing. It also introduced the new requirement of OpenGL 2.0. On my computer, HipChat would crash on startup with repeated messages of

can't find build id
HashElfTextSection
can't find build id
HashElfTextSection
can't find build id
HashElfTextSection
can't find build id
HashElfTextSection

I wasn’t going to try video chatting on my old netbook anyways, so there’s no reason I needed hardware support for rendering text and emoticons.

First, get your system’s OpenGL info by running `glxinfo | grep OpenGL` and find your version string:

OpenGL vendor string: Tungsten Graphics, Inc
OpenGL renderer string: Mesa DRI Intel(R) 945GME x86/MMX/SSE2
OpenGL version string: 1.4 Mesa 8.0.4

If it’s a Mesa driver, you can force software rendering by setting LIBGL_ALWAYS_SOFTWARE=1. Try running HipChat like so:

LIBGL_ALWAYS_SOFTWARE=1 hipchat

And see if HipChat will run. You can also check what your OpenGL version is with software rendering by running

LIBGL_ALWAYS_SOFTWARE=1 glxinfo | grep OpenGL

and make sure that the output lists 2.0+:

OpenGL vendor string: VMware, Inc.
OpenGL renderer string: Gallium 0.4 on llvmpipe (LLVM 0x300)
OpenGL version string: 2.1 Mesa 8.0.4

What concurrency in Node.js could have been

People wrote a lot of good comments on my last post about Node.js. (They also wrote some bad comments, but the worst have been modded out.)

One of the key points I was trying to make was that the way concurrency is written in Node.js sucks, so it blew my mind that people kept referring to new libraries that supposedly fixed things: bluebird, fibers, etc. You see, the callback-hell problem is something that is impossible to fix in “user space”. You either have to modify Node.js internals or design a new language that compiles to JavaScript.

Let me explain.

In a language that allows you to write concurrent code, you might write something like:

function bakeCookies(cookie_mix) {
  print("Baking cookies...");
  var oven = new Oven(cookie_mix);
  var cookies = oven.bake(15);  // bake for fifteen minutes
  cookies.decorate();
  return cookies;
}

In Node.js, oven.bake would be callback based and return immediately (otherwise it would wedge your process):

function bakeCookies(cookie_mix, callback) {
  print("Baking cookies...");
  var oven = new Oven(cookie_mix);
  oven.bake(15, function(cookies) {
    cookies.decorate();
    callback(cookies);
  });  // bake for fifteen minutes
}

I want to say again - it's impossible for any user-space library to turn this callback-based version into the first sequential version. Not Q, not bluebird, not fibers, none of it. But you might think that callbacks are needed for your code to be concurrent, that callback hell is the price we pay in order to serve two HTTP requests at once. Wrong! You can have your cake and eat it too!

Why? A good runtime will handle multitasking and the continuations for us:

function bakeCookies(cookie_mix) {
  print("Baking cookies...");
  var oven = new Oven(cookie_mix);
  var cookies = oven.bake(15);  // When oven.bake begins, it will yield to other code.
  cookies.decorate();           // and the runtime will handle the continuation.
  return cookies;               // The yield and continuation are hidden from the programmer.
}

This isn't fiction. It isn't new, either. Languages like Go, Haskell, and others let you do this already. It's not any harder to implement than what JS engines already do. In fact, it's so straightforward even an undergraduate could do it.
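
For comparison, here’s a rough sketch of the same bakeCookies flow in Go; Oven and Cookies are made-up stand-ins, not any real library. Bake simply blocks its own goroutine, the runtime keeps every other goroutine running in the meantime, and no callbacks appear anywhere:

package main

import (
  "fmt"
  "time"
)

type Cookies struct{}

func (c *Cookies) Decorate() { fmt.Println("decorating cookies") }

type Oven struct{ mix string }

// Bake blocks only the calling goroutine; the Go runtime keeps
// scheduling every other goroutine while we "wait for the oven".
func (o *Oven) Bake(minutes int) *Cookies {
  time.Sleep(time.Duration(minutes) * time.Millisecond) // pretend minutes are milliseconds
  return &Cookies{}
}

func bakeCookies(mix string) *Cookies {
  fmt.Println("Baking cookies...")
  oven := &Oven{mix: mix}
  cookies := oven.Bake(15) // reads sequentially, yet nothing else is blocked
  cookies.Decorate()
  return cookies
}

func main() {
  done := make(chan *Cookies)
  for i := 0; i < 3; i++ {
    go func() { done <- bakeCookies("chocolate chip") }()
  }
  for i := 0; i < 3; i++ {
    <-done
  }
}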

In node.js, you don't have a choice. All "async" functions will return immediately and go on to the next piece of code, and it's beyond the control of user-space code. No matter how you slice it, in the end you're stuck with the overhead of handling callbacks, yields, promises, wrappers, or something else. The only thing that the callback-based model gave us was more ways to shoot ourselves in the foot. There is actually nothing extra you can do with callbacks that you couldn't do otherwise.

But what if we designed a Node.js equivalent that actually did handle the yielding and continuation management for us? One that made it pleasant to write code that reads logically, while still having good performance characteristics? One where, from the start, we abandoned the idea of making everything callbacks?

It already exists, and it's called StratifiedJS.

// file: shouter.sjs
function shouter(i) {
    for(;;) {
        console.log("Strata " + i + " says HEY");
        // hey! no callbacks!
        longRunningComputation();
    }
}

// this could be a network request, database access, etc.
function longRunningComputation() {
    // blocks execution for 1000-1400 ms
    hold(1000 + (Math.random() * 400));
}

for(var i = 0; i < 5; i++) {
    spawn shouter(i);
}

Try running this piece of code with StratifiedJS. Each "strata" (or what some call a "lightweight thread") executes sequentially. Did we write callbacks, or sprinkle "yield" liberally? No. Spawning 5 threads is just spawning 5 threads. And it's definitely not using operating system threads (which aren't "web scale") because it's running on top of Node.js.

This is what Node.js could have been.

If I still cared, I might try to hype up StratifiedJS. But it's too late. We have too many people who believe that callbacks are required to make concurrency work. In all of the comments on my previous post, not a single person mentioned StratifiedJS. The closest was streamline.js, where you replace callbacks with underscores (never mind that there's a popular library that's also an underscore) and it compiles your code down into CPS-ish JavaScript.

The people who realize how bad Node.js is at concurrency have been jumping ship to Go and other languages. And the people who don't mind callback hell are going to keep using whatever Node.js continuation-management library is in vogue at the moment. I wonder if Go would've gotten as popular if Node.js had a good concurrency story from the get-go, but it's too late to change the mountains of callback-style code that's been written plus all of the blog posts on how to program Node.js using continuations.

I don't actually care about StratifiedJS. It's almost certainly not the only solution. But it's proof that we could have done things sanely this whole time.

So, StratifiedJS gives you JavaScript with easy, performant, sane concurrency while still maintaining compatibility with the JavaScript ecosystem, and it's been around since 2010. It must be really popular. How popular? Let's ask Google:

Comments also on Hacker News

The emperor’s new clothes were built with Node.js

There are plenty of people lambasting Node.js (see the infamous “Node.js is cancer”) but proponents tend to misunderstand the message and come up with irrelevant counterpoints. It’s made worse because there are two very different classes of people that use Node.js. The first kind of people are those who need highly concurrent servers that can handle many connections at once: HTTP proxies, Websocket chat servers, etc. The second are those who are so dependent on JavaScript that they need to use JS across their browser, server, database, and laundry machine.

I want to address one-by-one all of the strange and misguided arguments for Node.js in one place.

Update: Please keep sending in anything I missed! Moderation of comments will probably continue to lag behind, but I do read them and will continue fixing and tightening up this article as best as I can.

TL;DR: what’s clothing that doesn’t use threads and doesn’t block (anything)?

Node.js is fast!

This is actually too imprecise. Let’s break it down into two separate claims:

a. JavaScript running on V8 is fast!

You have to give the V8 developers some kudos. V8 has done incredible things to run JavaScript code really fast. How fast? Anywhere from 1x to 5x slower than Java, at least on the Benchmarks Game. (Some of you may not realize that “slower” is not a typo.)

If you look at their benchmarks, you’ll notice that V8 ships a really freakin’ good regex engine. Conclusion? Node.js is best suited for CPU-bound regex-heavy workloads.

So if we take the Benchmarks Game to be gospel, then what languages/implementations are typically faster than JavaScript/V8? Oh, just some unproductive ones like Java, Go, Erlang (HiPE), Clojure, C#, F#, Haskell (GHC), OCaml, Lisp (SBCL). Nothing that you could write a web server in.

And it’s good that you don’t need to use multiple cores at once, since the interpreter is single-threaded. (Comments will no doubt point out that you can run multiple processes in Node.js, something that you can’t do with any other language.)

b. Node.js is non-blocking! It has super concurrency! It’s evented!

Sometimes I wonder whether people even understand what they’re saying.

Node.js is in this weird spot where you don’t get the convenience of light-weight threads but you’re manually doing all the work a light-weight threads implementation would do for you. Since JavaScript doesn’t have built-in support for any sort of sane concurrency, what grew out of it was a library of functions that use callbacks. PL folks will realize that it’s just a crappy version of continuation-passing style (Sussman and Steele 1975), but instead of being used to work around recursion growing the stack, it’s used to work around a language without built-in support for concurrency.

So yes, Node.js can effectively deal with many connections in a single-threaded application, but it wasn’t the first or only runtime to do so. Look at Vert.x, Erlang, Stackless Python, GHC, Go…

The best part is all the people jumping through hoops to create their MVP in Node.js because they think it’ll make their site faster for their swarms of future users. (Never mind that loading 500K of Backbone.js code and miscellaneous libraries is not very high performance anyways.)

Node.js makes concurrency easy!

JavaScript doesn’t have built-in language features for concurrency, Node.js doesn’t provide that magic, and there are no metaprogramming capabilities. You have to manage all of your continuations manually, or with the help of (lots of different) libraries that push JavaScript syntax to its absurd limits. (BTW, I find should.js both horrific and convenient.) It’s the modern-day equivalent of using GOTO because your language doesn’t have for loops.

Let’s compare.

In node.js, you might write this function for some business task:

function dostuff(callback) {
  task1(function(x) {
    task2(x, function(y) {
      task3(y, function(z) {
        if (z < 0) {
          callback(0);
        } else {
          callback(z);
        }
      });
    });
  });
}

Clear as mud. Let’s use Q promises instead!

function dostuff() {
  return task1()
    .then(task2)
    .then(task3)
    .then(function(z) {
      if (z < 0) {
        return 0;
      } else {
        return z;
      }
    });
}

A lot more readable, but still dumb. One side effect is that Q eats your exceptions unless you remember to finish your chain with “.done()”, and there are plenty of other pitfalls that aren’t obvious. Of course, most libraries in Node.js don’t use Q, so you’re still stuck using callbacks anyways. What if task2 didn’t return a Q promise?

function dostuff() {
  return task1()
    .then(function(x) {
      var deferred = Q.defer();
      task2(x, deferred.resolve);
      return deferred;
    })
    .then(task3)
    .then(function(z) {
      if (z < 0) {
        return 0;
      } else {
        return z;
      }
    })
    .done();
}

The code above is broken. Can you spot why? By the way, we also forgot to handle exceptions. Let’s fix these issues:

function dostuff() {
  return task1()
    .then(function(x) {
      var deferred = Q.defer();
      task2(x, function(err, res) {
        if (err) {
          deferred.reject(err);
        } else {
          deferred.resolve(res);
        }
      });
      return deferred.promise;
    },
    function(e) {
      console.log("Task 1 failed.");
    })
    .then(task3, function(e) {
      console.log("Task 2 failed.");
    })
    .then(function(z) {
      if (z < 0) {
        return 0;
      } else {
        return z;
      }
    },
    function(e) {
      console.log("Task 3 failed.");
    })
    .done();
}

Notice how the error handling and the tasks they correspond to are interleaved. Are we having fun yet?

In Go, you can write code like this:

func dostuff() int {
  z := task3(task2(task1()))
  if z < 0 {
    return 0
  }
  return z
}

Or with error handling:

func dostuff() (int, error) {
  x, err := task1()
  if err != nil {
    log.Print("Task 1 failed.")
    return 0, err
  }
  y, err := task2(x)
  if err != nil {
    log.Print("Task 2 failed.")
    return 0, err
  }
  z, err := task3(y)
  if err != nil {
    log.Print("Task 3 failed.")
    return 0, err
  }
  if z < 0 {
    return 0, nil
  }
  return z, nil
}

Realize that both the Go and Node.js versions are basically equivalent, except Go handles the yielding and waiting. In Node.js, we have to manage our continuations manually because we have to work against the built-in control flow.

Oh, before you actually do any of this stuff, you have to learn not to release Zalgo, possibly by using synthetic deferrals (say what?) so that you don’t make your API’s users unhappy. In the world of “lean” and MEAN MVPs, who has time to learn about leaky abstractions on top of some obtuse runtime?

By the way, Q is super slow (or so the Internet says). Check out this handy performance guide comparing 21 different ways of handling asynchronous calls!

No wonder people love Node.js. It gives you the same performance as lightweight threads but with the clarity and usability of x86 assembly.

When people point out how unpleasant it is to manually handle control flow in Node.js, the proponents say, “Use libraries to handle that, like async.js!” So you start using library functions to run a list of tasks in parallel or compose two functions, which is exactly what you’d do with any threaded language, except worse.
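
For contrast, here’s roughly what “run a list of tasks in parallel” looks like when the runtime gives you threads/goroutines; task1 and task2 are placeholders, not anyone’s real API:

package main

import (
  "fmt"
  "sync"
)

// Stand-in tasks; imagine network calls or database queries here.
func task1() int { return 1 }
func task2() int { return 2 }

func main() {
  var wg sync.WaitGroup
  results := make([]int, 2)

  wg.Add(2)
  go func() { defer wg.Done(); results[0] = task1() }()
  go func() { defer wg.Done(); results[1] = task2() }()
  wg.Wait() // that's the whole "async.parallel"

  fmt.Println(results)
}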

LinkedIn went from 30 servers to 3 by switching to Node.js!

Quoth Hacker News: “I switched from a dump truck to a motorbike and now I drive a lot faster!”

PayPal and Wal-Mart have also had high-profile switches to Node.js. Of course, they’re comparing two completely different things to make Node.js look better. In these too-good-to-be-true stories, they’re switching from a gigantic enterprisey codebase to a Node.js app written from scratch. Is it any surprise that the rewrite was faster? They could have switched to pretty much anything and gotten a performance gain.

In LinkedIn’s case, they had proxies running on Mongrel with a concurrency of 1. It’s like switching from using one finger to type on a QWERTY keyboard to using ten fingers on a Dvorak keyboard and giving all the credit to Dvorak for a better keyboard layout.

This is classic hype: real-world stories misunderstood and twisted to confuse the unwitting.

It lets you leverage your existing JavaScript expertise!

Let’s be more specific and break this down into a couple parts:

a. The frontend devs can work on the backend!

Where was JavaScript used previously? Primarily browser-side front-end code to animate buttons or smoosh JSON into fancy interfaces. By leveraging JavaScript on the backend, you let your ninja UI devs hack on mission-critical networking code. Since it’s JS on both ends, there’s nothing to learn! (Right?)

Wait until they find out that they can’t use return normally (because concurrency!), they can’t use throw/catch normally (because concurrency!), and everything they call is callback based, returns a Q promise, returns a native promise, is a generator, is a pipe, or some other weird thing because it’s Node.js. (Just tell them to check the type signatures.)

Have some faith in your frontend devs’ ability to learn a different backend language. Because if a different language for the backend is too big an obstacle, then so is figuring out how to mix and match all the different callbacks/promises/generators into code that doesn’t collapse every time a change is made.

b. We can share code between the backend and frontend!

You’re then limiting your server-side code to use only language features that browsers support. For example, your shared code can’t use JS 1.7 generators until browsers support them too, and we know from experience that browser adoption can take years.

Effectively, we can’t improve the server language in Node in substantial ways without drifting away from the browser language. Node.js has so many gaping holes that are left to libraries to fix, but since it’s chained to the language we call JavaScript, it can’t strike out on its own to address these things at the language level.
It’s an awkward situation where the language doesn’t give you much, but you can’t change the language so you keep doing npm install band-aid.

This can be fixed by running some sort of compilation step to transform new language features into older features so you can write for the server and still run on regular JavaScript. Your choices are either something that’s 95% JavaScript (TypeScript, CoffeeScript) or not JavaScript at all (ClojureScript, perhaps).

More worrying is that this argument implies that you actually muddle the concerns of your server and frontend. In the real world, you’ll find that your backend turns into a JSON API that handles all of the validation, processing, etc., and you have multiple (sometimes third-party) consumers of that API. For example, when you decide to build iPhone and Android apps, you’ll have to decide between a native app in Java, Obj-C, or C#, or packing your one-page Backbone.js/Angular.js app using Phonegap/Cordova. The code you share between the server and client may end up being a liability, depending on what platform you go with.

NPM is so great!

I think NPM has attained a status of “not awful”, which puts it ahead of many other package managers. Like most ecosystems, NPM is pretty cluttered with multiple redundant implementations of the same thing. Say you need a library for sending Android push notifications. On NPM, you’ll find: gcm, node-gcm, node-gcm-service, dpush, gcm4node, libgcm, and ngcm, not to mention all the libraries that support multiple notification services. Which are reliable? Which are abandoned? In the end, you just pick the one that has the most downloads (but why can’t we sort results by popularity?).

NPM also has a less-than-stellar operations track record. It used to go down quite often and it was hilarious seeing all the companies that suddenly couldn’t deploy code because NPM was having troubles again. Its up-time is quite a bit better now, but who knows if they will suddenly break your deployment process because they can.

We somehow managed to deploy code in the past without introducing a deploy-time dependency on a young, volunteer-run, created-from-scratch package repository. We even did such blasphemous things as including a copy of the library source locally!

I’m not worried about NPM—in some sense, it’s part of the ecosystem but not the language, and it generally gets the job done.

I’m so productive with Node.js! Agile! Fast! MVP!

There seems to be a weird dichotomy in the minds of Node.js programmers: either you’re running mod_php or some Java EE monstrosity and therefore a dinosaur, or you’re on Node.js and super lean and fast. This might explain why you don’t see as many people bragging about how they went from Python to Node.js.* Certainly, if you come from an over-engineered system where doing anything requires an AbstractFactoryFactorySingletonBean, the lack of structure in Node.js is refreshing. But to say that this makes Node.js more productive is an error of omission—namely, they leave out all the things that suck.

Here’s what a newcomer to Node.js might do:

1. This function might fail and I need to throw an exception, so I’ll write throw new Error("it broke");.
2. The exception isn’t caught by my try-catch!
3. Using process.on("uncaughtException") seemed to do it.
4. I’m not getting the stacktrace I expected, and StackOverflow says that this way violates best practices anyways.
5. Maybe if I try using domains?
6. Oh, callbacks typically take the error as the first parameter. I should go back and change my function calls.
7. Someone else told me to use promises instead.
8. After reading the examples ten or twelve times, I think I have it working.
9. Except that it ate my exceptions. Wait, I needed to put .done() at the end of the chain.

Here’s a Python programmer:

1. raise Exception("it broke");

Here’s a Go programmer:

1. I’ll add err to my return signature and change my return statements to add a second return value.

There is a lot of stuff in Node.js that actually gets in the way of producing an MVP. The MVP isn’t where you should be worrying about returning an HTTP response 40ms faster or how many simultaneous connections your DigitalOcean “droplet” can support. You don’t have time to become an expert on concurrency paradigms (and you’re clearly not because you wouldn’t be using Node otherwise!).

* Check out this great post about switching from Python to Node.js. The money quote is, “Specifically, the deferred programming model is difficult for developers to quickly grasp and debug. It tended to be ‘fail’ deadly, in that if a developer didn’t fully understand Twisted Python, they would make many innocent mistakes.” So they switched to another difficult system that fails in subtle ways if you don’t fully understand it and make innocent mistakes!

I love Node.js! Node.js is life!

Does your local Node.js Meetup group need a presenter? I am available for paid speaking engagements. Email me for more info.

These opinions do not represent that of and were not reviewed by my employer or colleagues. Also, feel free to wrap all lines with <sarcasm> tags.

OpenStreetMap provider CloudMade shuts its doors on small users

(Original email at bottom.)

CloudMade, a company selling mapping services (many based on OpenStreetMap data) that competed head-to-head with Google, let its users know that as of May 1st, they’ll stop serving anyone who’s not on an enterprise plan. This is rather sad, because they were one of the main alternatives for custom OpenStreetMap tiles.

Their map tiles definitely left something to be desired. The OSM data that they were using seems to have been last refreshed around the time Steve Coast left (maybe that’s a wee bit of an exaggeration) and the rendering was never very polished—ugly icons and labels getting cut off on tile boundaries. But for $25/1M tiles (with the first 500k free), could you really complain?

CloudMade even listed Steve Coast, founder of OpenStreetMap, as a co-founder. Steve Coast left in 2010, and it was hard to tell what the company was trying to become. Now, we see that they’re gunning for enterprise services, along the lines of Navteq and TomTom. Instead of dealing with small fries like us, they’re apparently focusing on bigger deals like providing data for hardware and consumer electronics.

Maybe they just got tired of my emails to support asking why this or that was broken or when they’d update their data. Now, we’re left with almost no options for custom hosted OSM tiles. MapBox is one popular choice, but their online map customizer is elementary compared to CloudMade’s (and CloudMade’s was not super advanced). MapBox also has stricter terms for how their map tiles can be used. No proxying/caching of MapBox tiles is allowed, for example, especially since they charge based on usage.

CloudMade helpfully gave some alternative providers for us small fries to switch to. Still, one less provider means more risk when using a hosted provider. For example, who are we going to turn to when MapQuest decides to shut off its routing services?

Here’s to hoping people will step up and fill the gap that CloudMade is leaving. Then we little users who only pay a couple hundred dollars per month will have somewhere else to go.

This is what came through today:

Hi [username],

We want to let you know about some changes we’re making to the CloudMade APIs. As of May 1st we’re switching to an enterprise model that supports the medium to large sized users of the CloudMade APIs. As part of this transition we’ll stop serving Map Tile, Geocoding, Routing and Vector Stream Server requests coming from your API keys below as of May 1st, unless you take action.

Your active CloudMade API keys are: W,X,Y,Z

If you wish to continue using the CloudMade services after April 30th you’ll need to upgrade to an enterprise plan. Enterprise plans are available for customers with 10,000,000 or more transactions per month. The plans include dedicated hosting, custom SLAs, 24×7 support from a named customer support representative and custom data loading. You can find out more about upgrading and request more information on the Web Portals page.

If your monthly usage is less than 10,000,000 transactions, or you don’t wish to upgrade to an enterprise plan, you should take action to update the app or website that’s using the CloudMade API keys shown above to use an alternative provider. There are a number of alternative providers of Map Tiles, Geocoding and Routing services based on OpenStreetMap data, for example:

- Mapquest (Map Tiles, Routing, Geocoding)

- MapBox (Styled Map Tiles)

Thanks for using CloudMade’s APIs over the past months and years. If you don’t switch to an enterprise plan, we wish you a smooth transition to the new service provider you choose.

[...]

Disclaimer: Nothing written here represents my employer in any way. I am/was a mostly satisfied user of many OSM-based services out there, including MapBox, MapQuest, and CloudMade.

Good things happen when you subtract datetimes in MySQL

Of course, you know that “good things” and “MySQL” don’t go together. File this one under the category of “small ways in which MySQL is broken”.

Let’s fire up MySQL 5.1.72-2-log or 5.5.34-log.

mysql> create temporary table blah
    -> (alpha datetime, beta datetime);
Query OK, 0 rows affected (0.01 sec)

mysql> describe blah;
+-------+----------+------+-----+---------+-------+
| Field | Type     | Null | Key | Default | Extra |
+-------+----------+------+-----+---------+-------+
| alpha | datetime | YES  |     | NULL    |       |
| beta  | datetime | YES  |     | NULL    |       |
+-------+----------+------+-----+---------+-------+
2 rows in set (0.00 sec)

OK, so we have two datetimes in a table. Let’s try adding a row:

mysql> insert into blah (alpha, beta)
    -> VALUES ('2014-01-01 03:00:00', '2014-01-01 03:00:37'); 
Query OK, 1 row affected (0.00 sec)

What happens if we try subtracting two datetimes?

mysql> select alpha, beta, beta - alpha from blah;
+---------------------+---------------------+--------------+
| alpha               | beta                | beta - alpha |
+---------------------+---------------------+--------------+
| 2014-01-01 03:00:00 | 2014-01-01 03:00:37 |    37.000000 |
+---------------------+---------------------+--------------+
1 row in set (0.00 sec)

So we got the number of seconds between the two datetimes. Let’s try that again with two datetimes a minute apart.

mysql> insert into blah (alpha, beta)
    -> VALUES ('2014-01-01 03:00:00', '2014-01-01 03:01:00');
Query OK, 1 row affected (0.00 sec)

mysql> select alpha, beta, beta - alpha from blah;
+---------------------+---------------------+--------------+
| alpha               | beta                | beta - alpha |
+---------------------+---------------------+--------------+
| 2014-01-01 03:00:00 | 2014-01-01 03:00:37 |    37.000000 |
| 2014-01-01 03:00:00 | 2014-01-01 03:01:00 |   100.000000 |
+---------------------+---------------------+--------------+
2 rows in set (0.00 sec)

So, 100 seconds in a minute? Yikes. Obviously, this isn’t how you’re supposed to subtract datetimes in MySQL (that’s what TIMESTAMPDIFF and TIMEDIFF are for). But the great part is that it kind of works! You get a number back that correlates with the actual interval between the two, and if you’re measuring lots of small intervals, you might not notice that your data is somewhere between 100% and 167% of what it should be. Excellent puzzle to drive a junior dev crazy!

Wait, any reasonable database would have known that we were making a mistake, right?

mysql> show warnings;
Empty set (0.00 sec)

tuntuntun – Combine Multiple Internet Connections Into One

GitHub repo: https://github.com/erjiang/tuntuntun (proof of concept status)

I was trying to play Minecraft by tethering over a Sprint data connection but was having awful random latency and dropped packets. The Sprint hotspot seems to only allow a limited number of connections to utilize the bandwidth at a time – a download in Chrome would sometimes stall all other connections. This was a huge problem in Minecraft, as loading the world chunks would stall my movements, meaning that I could teleport somewhere and die to an enemy by the time the map finished loading.

I’ve been seeing the idea of channel bonding here and there for a while, and it always seems like a cool idea without any popular, free implementations. Most of the approaches, though, were restricted to assigning different connections to different network interfaces. Essentially, a connection to YouTube might go over one link, while a software download might go out another. This works OK if you’re trying to watch YouTube while downloading updates, and it works great for many-connection uses like BitTorrent, but in this case I wanted to create a single load-balanced connection. So, I created tuntuntun.

Somewhat like a VPN

[tuntuntun diagram]

This requires the help of an outside server to act as the proxy for all of the connections, because most current Internet protocols require connections to originate from one address. The idea is to establish a connection to the proxy using a custom protocol that allows data to be split between two links. Tuntuntun works by encapsulating IP traffic in UDP packets, and does this in userspace using tun interfaces. A tun interface is a virtual network interface that has a userspace program as the other end. This means that any packets sent to the tun are read by the userspace program, and anything that the userspace program writes to it becomes “real” packets in the kernel.

Tuntuntun establishes a UDP socket on each interface that you bond. In a hypothetical scenario where eth0 and eth1 are combined, tuntuntun will open two sockets and use SO_BINDTODEVICE to force each socket to only use eth0 or eth1, respectively. Then, it creates another interface, tun0, and modifies the routing table so that the operating system routes all Internet traffic through tun0. This way, tuntuntun can intercept all Internet traffic on the machine.
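
For illustration, here’s a from-scratch sketch (not the actual tuntuntun source) of pinning a UDP socket to a specific interface with SO_BINDTODEVICE. It assumes Linux and root privileges, and the device names are just examples:

package main

import (
  "log"
  "syscall"
)

// udpSocketOnDevice opens a UDP socket and pins it to one physical
// interface, so the kernel's normal routing can't move its traffic
// onto a different link.
func udpSocketOnDevice(device string) (int, error) {
  fd, err := syscall.Socket(syscall.AF_INET, syscall.SOCK_DGRAM, 0)
  if err != nil {
    return -1, err
  }
  // SO_BINDTODEVICE is Linux-only and needs root (or CAP_NET_RAW).
  err = syscall.SetsockoptString(fd, syscall.SOL_SOCKET, syscall.SO_BINDTODEVICE, device)
  if err != nil {
    syscall.Close(fd)
    return -1, err
  }
  return fd, nil
}

func main() {
  for _, dev := range []string{"eth0", "eth1"} {
    fd, err := udpSocketOnDevice(dev)
    if err != nil {
      log.Fatalf("binding to %s: %v", dev, err)
    }
    log.Printf("fd %d pinned to %s", fd, dev)
  }
}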

On the server side, the server process listens on a certain port for UDP packets and also opens a tun interface. It does the same basic job of passing packets back and forth, like a middleman. This is effectively what software like OpenVPN does, except the purpose is quite different.

Load-balancing packets

The client, currently, just uses a round-robin load-balancer that will send all even packets out eth0 and all odd packets out eth1. Effectively, all normal packets that go into tun0 on the client are wrapped in a UDP packet and then sent to the server. The server just needs to receive these packets, extract the original packet, and push it out to the Internet. Going the other way, any packets that the server receives from the Internet that are meant for the client are wrapped in a UDP packet and sent back to the client. This introduces a small data overhead – the MTU for the load-balanced connection is a few bytes smaller than the smallest of the individual links.
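
A minimal sketch of what that round-robin scheduling might look like; the roundRobin type and its API are made up for illustration, not tuntuntun’s actual code:

package bond

import "net"

// roundRobin alternates outgoing packets across the bonded UDP sockets.
type roundRobin struct {
  links []net.Conn // one UDP "connection" to the proxy server per interface
  next  int
}

// send takes one raw IP packet read from tun0 and ships it out the
// next link in the rotation, wrapped in that link's UDP socket.
func (r *roundRobin) send(pkt []byte) error {
  link := r.links[r.next]
  r.next = (r.next + 1) % len(r.links)
  _, err := link.Write(pkt)
  return err
}

A weighted scheduler, like the 70/30 split described below, would only need to change how the next link is picked.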

There are some things that aren’t implemented yet. Smarter scheduling strategies could be used for certain use cases. For example, if you are using a metered connection and an unmetered connection, you could choose to have all packets sent out the unmetered connection until it’s maxed out, and any extra packets would go out the metered connection. Or, in the case of two asymmetric links, send 70% of packets out the higher-bandwidth link, and send the other 30% out the lower-bandwidth link.

The other problem is dealing with out-of-order packets. If one connection has higher latency than the other (and this is going to be the case if you’re using two carriers for redundancy), then packets will arrive out of order. Normally, TCP deals with this, but since we have a custom protocol between the client and server, we can write a sequence number into every packet and do our own reordering of received packets.
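
Here’s one way that reordering could look, assuming (hypothetically) that each encapsulated packet starts with a 4-byte big-endian sequence number; this is a sketch, not what tuntuntun currently implements:

package bond

import "encoding/binary"

// reorderBuf releases packets to the tun device strictly in sequence
// order, holding early arrivals until the gap before them is filled.
type reorderBuf struct {
  next    uint32            // next sequence number we expect to deliver
  pending map[uint32][]byte // out-of-order payloads waiting for their turn
}

func newReorderBuf() *reorderBuf {
  return &reorderBuf{pending: make(map[uint32][]byte)}
}

// push takes one received packet (4-byte seq header + payload) and
// returns every payload that is now deliverable in order.
func (r *reorderBuf) push(pkt []byte) [][]byte {
  seq := binary.BigEndian.Uint32(pkt[:4])
  r.pending[seq] = pkt[4:]

  var ready [][]byte
  for {
    payload, ok := r.pending[r.next]
    if !ok {
      break
    }
    ready = append(ready, payload)
    delete(r.pending, r.next)
    r.next++
  }
  return ready
}

A real implementation would also need a timeout, so that a single lost packet doesn’t hold up everything behind it forever.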

Why Go?

Most of the program is written in Go. Why Go? Go’s concurrency made it easier to listen to multiple sockets at once. For sending/receiving data split across multiple connections, it was natural to model receiving and sending as one channel each. We don’t care about which interface a packet came in through, so we stick them all in the same pipe.

[gochans diagram] http://notes.ericjiang.com/posts/676
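
A rough sketch of that fan-in pattern, with one reader goroutine per bonded socket feeding a shared channel (names are illustrative):

package bond

import "net"

// fanIn starts one reader goroutine per bonded socket and merges
// everything they receive into a single channel, so the rest of the
// code never knows (or cares) which link a packet arrived on.
func fanIn(links []net.Conn) <-chan []byte {
  out := make(chan []byte)
  for _, link := range links {
    go func(c net.Conn) {
      buf := make([]byte, 65535)
      for {
        n, err := c.Read(buf)
        if err != nil {
          return // link went away; a real version would report this
        }
        pkt := make([]byte, n)
        copy(pkt, buf[:n])
        out <- pkt
      }
    }(link)
  }
  return out
}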

Plus, I had to integrate some C code, and the types in stdint.h and Go match up quite well. Using cgo to call some C functions from Go was a breeze. It was unfortunate that dealing with the OS required me to scrap a bunch of Go code after I realized that I hit the limits of Go’s net library, but the amount of C is relatively small.

Go’s type system is OK. Just having a static type system saved me a lot of development time, and Go’s interface type helped me replace the underlying sockets code by mimicking the original interface, so I didn’t have to modify the calling code. Is Go’s type system the best? No, but I’d rather have it than a dynamically typed language.

Go’s approach to error handling worked quite well. The likely alternative language, C, makes it easy to not handle errors, and that can be a deadly footgun when dealing with a lot of library calls (do I check errno? or the return code? or just ignore it and pray?). With Go, ignoring errors is explicit and intentional.

Future work…

Currently, tuntuntun is very rough around the edges.

Absolutely no security is implemented yet. Some mechanism for authentication between the server and client would be nice, and as of now, I recommend that you don’t run this on the public Internet. Server-client encryption is probably outside the scope of this project – you can instead run a real VPN through a tuntuntun connection to get encrypted transport.

There is more code that could be written to deal with interfaces that go away or come back. For example, if wlan0 is a USB radio that accidentally gets unplugged, tuntuntun continues to try to use it. But ideally, it would catch the error when trying to use wlan0, drop that interface from the list, and then periodically try recreating a socket on wlan0.

Partially implemented but not used yet is a circular buffer that can be used for packet reordering, and a simple ping test that can be used to measure latency or detect broken connections. A ping test would enable automatic connection failover.
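
Such a failover check could be as simple as a periodic probe per link, along the lines of this sketch (the probe format and the three-strikes threshold are made up):

package bond

import (
  "net"
  "time"
)

// watchLink sends a small keepalive out one link every second and
// reports the link as dead after three consecutive send failures.
// A real version would also wait for echo replies to measure latency.
func watchLink(link net.Conn, dead chan<- net.Conn) {
  failures := 0
  for range time.Tick(time.Second) {
    if _, err := link.Write([]byte("ping")); err != nil {
      failures++
    } else {
      failures = 0
    }
    if failures >= 3 {
      dead <- link
      return
    }
  }
}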

For monitoring the status of tuntuntun (e.g. monitoring link status, bandwidth, ping, etc.), the current plan is to record statistics in the iface struct and then expose a JSON API that can be combined with an Angular UI. Since writing web servers in Go is so easy, this seems like a natural way of doing it.

GitHub repo: https://github.com/erjiang/tuntuntun (proof of concept status)

Raise your hand and ask

College lecturers (and teachers in general, I suppose) assume they need to ask if the class has any questions. The working theory is that if the class doesn’t have any questions, they understood the material, and if there are questions, the lecturer should slow down a bit and maybe review it in a bit more detail.

It doesn’t work. Each class may have one or two students that play along with this and actually ask when they can’t follow along. Everyone else, when faced with inscrutable material, tends to shut up and sit through it.

Asking questions in public can be a lot of pressure. You worry that you’ll annoy other people by holding up the class. You worry that everyone else in the room already knows. You worry that you’re missing something really obvious. You worry that you’ll look like an idiot. And despite whatever cheerleading there is to encourage questions, all of these are possibilities.

What would you do if someone asked:

Can’t you just run `make -j` to parallelize your code?

Snickers? Laughs? (Look clueless because you’re not a computer programmer?)

It’s easier for someone with experience and accomplishments to ask questions—the experience means that you probably know as much or more than other people, so it won’t be a dumb question, and the accomplishments create a solid ego that won’t bruise so easily. The people without experience—the beginners—should be asking more questions, not fewer, but if they don’t have much in the way of experience or achievements (or have trouble internalizing them), then it can be a very scary deal.

We can do more to help people ask questions. The Internet is great for dumb questions—just check Yahoo! Answers. Every time I can Google my dumb question in private (“light stove without electricity”), I feel better knowing that someone else took the fall.

And they probably asked under a pseudonym too. What if we created this tolerant environment for students? Several of my courses had simple CGI message boards where any student could post questions or reply to others, and for every question asked, there were probably several more students who had wanted to ask it.

We could take this one step further and make it an anonymous board, where administrators could, if needed, unmask users (for cheating, harassment, etc.), but people with questions wouldn’t be so afraid of asking a dumb question. A college could create such a feature for their entire school—maybe even go to the extent of not even letting the teacher know the poster’s identity without going through some bureaucracy.

There is value in keeping course-related questions within the school. Teachers can monitor what people are asking about. Students can feel some camaraderie in their problems (misery loves company). Homework-specific questions often require a lot of context, like the homework questions that constantly pop up on Stack Overflow. And really, missing out on all of the help that could be provided in a school setting is a shame.

Nobody should be intimidated into not asking.

Considering a Computer Science major? Read this first

What school should I choose?

Look for a school that’s either big, or has a strong focus on providing a good CS education. Big schools offer more choice, so that you can skip or maneuver around poor teachers or take classes that might not be available in smaller schools. Undergrad-focused schools may have better quality education, although they’re typically smaller and more expensive (Harvey Mudd, Rose-Hulman, etc.).

What do I need to be good at in order to start?

  • Decent typing skills—not being a touch typist will make everything a bit more difficult.
  • Basic computer usage—know how to download and install programs, navigate and organize directories, find keyboard shortcuts, manage files.
  • Decent reading skills—specifically in English (if English isn’t your native language, work on improving it). Misunderstanding or glossing over a few words might mean missing an important step here or there.
  • High-school algebra or geometry—a lot of the thinking in CS is similar to the kind you’d use in algebra (moving variables and symbols around without screwing up) or geometry (being able to figure out a proof for why a line is perpendicular to this other line).
  • Google skills—knowing that you can figure things out with just you and your buddy Google is super useful.

Sometimes people ask about calculus. I personally think you can get by with zero calculus. Even if it’s needed, the probability that everyone else in the class is good at calculus is basically nil, so everyone can suffer together.

Do I need a specific/powerful computer?

It doesn’t hurt to have a more powerful computer, but often, you’ll be using a regular text editor and a web browser. Lately I’ve split most of my work between a 2009 netbook and a 2006 Tablet PC.

A regular mid-range laptop will be fine. A MacBook or MacBook Air has the bonus that OS X comes with many Unix tools built in, and Macs are a popular choice amongst CS majors. You may end up doing a lot of work by connecting remotely into a school server anyways. Pick a portable computer that doesn’t hurt to carry around.

What type of stuff gets covered in a 4-year CS undergrad program anyways?

An undergrad CS education is roughly split into two parts: the craft, and the theory. The craft includes all the little things around the act of writing code and making things. The theory includes the math and logic that informs your decisions and designs as you create things. Time for a bad analogy: it’s like becoming a painter—artists spend a lot of time practicing how to mix paints and move a brush back and forth on a canvas, but they also take art-history and theory courses to cover important ideas, color theory, historical movements, etc.

The first couple courses are usually just introducing the very basics of how to write things that a computer can understand. These classes are designed for people with no experience, so some students skip the first class if they already have experience. At the same time, the intro classes start to introduce some theoretical concepts to prepare for later courses.

Are there classes on iPhone apps/Ruby on Rails/video games/[currently trendy thing]?

Don’t count on it. These trendy things come and go every few years, and the average CS major is capable of self-learning these things by their sophomore year. There are a million different niches to focus on, but the school’s curriculum is there to teach the core concepts and give some exposure to different fields. Be confident that you can explore these things on your own after 2-3 semesters of CS courses. See “Do side projects” below.

There may be small topics classes or student-taught classes about some of these things. Also, clubs and student organizations may form around topics like game development or web startups.

I already have N years of experience programming. Can I just skip some classes?

Divide the number of years of experience you have by 2, and skip that many semesters of intro classes if you can. But be aware that some classes and topics (PLT, comp. architecture, etc.) are likely to be missed by self-taught programmers. Ask around and see what those classes actually cover before you skip them.

I have a year/summer/month of free time before I start. What should I do to prepare?

If you have no experience, then get started with a basic programming tutorial in any language. I recommend doing as much as you can of Learn Python the Hard Way. Going into anything with a tiny bit of experience always beats having no experience.

How can I keep up / succeed?

  • Get to know other students.

    If there’s a specific lab that CS majors tend to hang out in, then try to spend more time in there, and don’t be shy about asking about what other people are doing. If there’s one thing about nerd stereotypes that’s true, it’s that they love to tell anybody about what they’re working on. These are the people that will help you down the road with debugging your homework, explaining tough concepts, bringing you job offers, etc.

  • Ask upperclassmen about what classes to take.

    This is so important that it’s frustrating how many students don’t do this. If everybody you meet tells you that the teacher for CS2xx is horrible, incompetent, incomprehensible, and sadistic, then why on earth would you sign up for that class? Rule of thumb: pick courses based on professors, not on their topics. Choosing good profs might mean a 3x difference in what you get out of four years.

  • Do side projects.

    Two main benefits:

    The extra experience boosts your skills in many different ways. Here’s a bad analogy: if you want to be a painter, sitting through 8 semesters of college isn’t going to make you a great painter. You have to spend time breathing in turpentine to get the practice needed. Another bad analogy: doing side quests in RPGs will make your chars higher leveled than if you went straight through the main storyline. Real talk though: you’ll learn so much about what people in the real world are doing and thinking about that you wouldn’t get from the classroom.

    (See above section on “trendy topics”.) Besides, companies are falling over themselves trying to hire [currently trendy thing] programmers. Even making a crappy music organizer using [currently trendy thing] is a huge benefit when looking for part-time/internship/full-time positions.

    It doesn’t really matter what you make. One of the things that I made was a tool that automatically checked Craigslist and alerted me when someone listed an iPhone. I was the only person to ever use it and it was really just a bunch of sample code I found from various places on the Internet duct-taped together, but it was different from things I had done before and actually helped me out quite a few times.

    Need ideas? Check out Hacker News once every couple of days and look for posts that start with “Show HN” to see what everyone else is up to.

  • Ask for help.

    Use office hours. Nobody will know what you don’t know until you ask for help. Suffering in silence helps nobody. Office hours certainly aren’t the solution to all problems, but if you don’t try to use them, that’s your problem.

    Also, is everyone in the lab working on this week’s assignment? Chances are that someone is willing to whiteboard out a concept that’s difficult to understand.

Comments, suggestions, etc. appreciated.

Batch organize photos by date

I wanted to get a pile of JPEGs organized by year, then by month and day. For example, IMG00001.JPG would go in 2013/06-04 based on its EXIF creation date. The handy exiftool can do this via its own virtual “Directory” tag. The “Directory” tag is not an actual EXIF tag, but exiftool will move the photo if you write to it.

Here are two snippets: one to copy photos to a new directory based on the date (-o dummy/ forces a copy):

exiftool -o dummy/ -d %Y/%m-%d "-directory<datetimeoriginal" /path/to/unorganized/photos/*.JPG

And one to move photos to a new directory based on the date:

exiftool -d %Y/%m-%d "-directory<datetimeoriginal" /path/to/unorganized/photos/*.JPG

This will automatically create directories of the form $PWD/year/month-day.