Not all STEM fields are created equal

Alternate title: “Why do people forget about psychology when talking about STEM?”

STEM majors! Everyone needs to be a STEM major! We have a huge shortage of STEM majors! Quick, get your sons and daughters (especially daughters) to major in a STEM field!

STEM stands for “science, technology, engineering, and mathematics” and we often use it as a big bucket of Things We Want More Of. You hear the president promoting STEM fields, along with the Department of Education. A lot of money is being thrown at convincing more people to major in STEM fields in college, and especially at getting more women into STEM.

“STEM” is often found in the same sentence as “high-paying jobs”, but what’s often glossed over is that not all STEM fields are created equal. Often, when people say “STEM fields”, they mean “STEM fields that have high wages and a shortage of job candidates or certain minorities”. When we think of STEM, we often think of fields like computer science, electrical engineering, and mathematics. But many fields in STEM are not particularly high-paying or male-dominated.

Biology and psychology are the two most popular STEM fields, if you consider psychology to fall under the STEM umbrella. They have some of the highest women-to-men ratios. They are also two of the lowest-paying STEM fields. In fact, if you compare gender ratios in some of the most popular STEM fields with their median incomes, you see a trend:

[Figure: Gender ratios vs STEM fields]
(more…)

[transl.] You went overseas—are you happy? (“出国的你 快乐吗”)

Found the article below through so-called “social media”. I don’t have any affiliation with the author or the WeChat group, but thought it was worth translating.

Original post at: http://mp.weixin.qq.com/s?__biz=MzA4ODQwMTkyOQ%3D%3D&mid=200035572&idx=1&sn=6fc8fba95e4c22b96616f5cd79b99437


[n.b. photos from original post omitted]

大家看着留学生在国外照的照片,不停说,好羡慕,好美好的生活。
Everyone looking at the photos from students studying abroad keeps saying, “Wow, what a wonderful life. I’m jealous.”
千万不要羡慕,这条路看去光鲜亮丽,其实每走一步就多一个伤痕。
Don’t be jealous, because the road they take might seem bright and glamorous, but every step taken is another scar suffered.

(more…)

Workaround for HipChat on Linux: “can’t find build id”, “HashElfTextSection”

The new version of HipChat added support for video and screen-sharing. It also introduced a new requirement: OpenGL 2.0. On my computer, HipChat would crash on startup with repeated messages of

can't find build id
HashElfTextSection
can't find build id
HashElfTextSection
can't find build id
HashElfTextSection
can't find build id
HashElfTextSection

I wasn’t going to try video chatting on my old netbook anyway, so there was no reason I needed hardware support for rendering text and emoticons.

First, get your system’s OpenGL info by running `glxinfo | grep OpenGL` and find your version string:

(more…)

What concurrency in Node.js could have been

People wrote a lot of good comments on my last post about Node.js. (They also wrote some bad comments, but the worst have been modded out.)

One of the key points I was trying to make was that the way concurrency is written in Node.js sucks, so it blew my mind that people kept referring to new libraries that supposedly fixed things: bluebird, fibers, etc. You see, the callback-hell problem is something that is impossible to fix in “user space”. You either have to modify Node.js internals or design a new language that compiles to JavaScript.

Let me explain.

In a language that allows you to write concurrent code, you might write something like:

(more…)

The emperor’s new clothes were built with Node.js

There are plenty of people lambasting Node.js (see the infamous “Node.js is cancer”) but proponents tend to misunderstand the message and come up with irrelevant counterpoints. It’s made worse because there are two very different classes of people who use Node.js. The first kind of people are those who need highly concurrent servers that can handle many connections at once: HTTP proxies, WebSocket chat servers, etc. The second are those who are so dependent on JavaScript that they need to use JS across their browser, server, database, and laundry machine.

I want to address one-by-one all of the strange and misguided arguments for Node.js in one place.

Update: Please keep sending in anything I missed! Moderation of comments will probably continue to lag behind, but I do read them and will continue fixing and tightening up this article as best as I can.

TL;DR: what’s clothing that doesn’t use threads and doesn’t block (anything)?

Node.js is fast!

This is actually too imprecise. Let’s break it down into two separate claims:

(more…)

OpenStreetMap provider CloudMade shuts its doors on small users

(Original email at bottom.)

CloudMade, a company selling mapping services (many based on OpenStreetMap data) that competed head-to-head with Google, let its users know that as of May 1st, they’ll stop serving anyone who’s not on an enterprise plan. This is rather sad, because they were one of the main alternatives for custom OpenStreetMap tiles.

Their map tiles definitely left something to be desired. The OSM data that they were using seems to have been last refreshed around the time Steve Coast left (maybe that’s a wee bit of an exaggeration) and the rendering was never very polished—ugly icons and labels getting cut off on tile boundaries. But for $25/1M tiles (with the first 500k free), could you really complain?

CloudMade even listed Steve Coast, founder of OpenStreetMap, as a co-founder. Steve Coast left in 2010, and it was hard to tell what the company was trying to become. Now, we see that they’re gunning for enterprise services, along the lines of Navteq and TomTom. Instead of dealing with small fries like us, they’re apparently focusing on bigger deals like providing data for hardware and consumer electronics.

Maybe they just got tired of my emails to support asking why this or that was broken or when they’d update their data. Now, we’re left with almost no options for custom hosted OSM tiles. MapBox is one popular choice, but their online map customizer is elementary compared to CloudMade’s (and CloudMade’s was not super advanced). MapBox also has stricter terms on how their map tiles can be used. No proxying/caching of MapBox tiles is allowed, for example, especially since they charge based on usage.

CloudMade helpfully gave some alternative providers for us small fries to switch to. Still, one less provider means more risk when using a hosted provider. For example, who are we going to turn to when MapQuest decides to shut off its routing services?

Here’s to hoping people will step up and fill the gap that CloudMade is leaving. Us little users who will only pay a couple hundred dollars per month will then have somewhere else to go.

This is what came through today:

Hi [username],

We want to let you know about some changes we’re making to the CloudMade APIs. As of May 1st we’re switching to an enterprise model that supports the medium to large sized users of the CloudMade APIs. As part of this transition we’ll stop serving Map Tile, Geocoding, Routing and Vector Stream Server requests coming from your API keys below as of May 1st, unless you take action.

Your active CloudMade API keys are: W,X,Y,Z

If you wish to continue using the CloudMade services after April 30th you’ll need to upgrade to an enterprise plan. Enterprise plans are available for customers with 10,000,000 or more transactions per month. The plans include dedicated hosting, custom SLAs, 24×7 support from a named customer support representative and custom data loading. You can find out more about upgrading and request more information on the Web Portals page.

If your monthly usage is less than 10,000,000 transactions, or you don’t wish to upgrade to an enterprise plan, you should take action to update the app or website that’s using the CloudMade API keys shown above to use an alternative provider. There are a number of alternative providers of Map Tiles, Geocoding and Routing services based on OpenStreetMap data, for example:

– Mapquest (Map Tiles, Routing, Geocoding)

– MapBox (Styled Map Tiles)

Thanks for using CloudMade’s APIs over the past months and years. If you don’t switch to an enterprise plan, we wish you a smooth transition to the new service provider you choose.

[...]

Disclaimer: Nothing written here represents my employer in any way. I am/was a mostly satisfied user of many OSM-based services out there, including MapBox, MapQuest, and CloudMade.

Good things happen when you subtract datetimes in MySQL

Of course, you know that “good things” and “MySQL” don’t go together. File this one under the category of “small ways in which MySQL is broken”.

Let’s fire up MySQL 5.1.72-2-log or 5.5.34-log.

mysql> create temporary table blah
    -> (alpha datetime, beta datetime);
Query OK, 0 rows affected (0.01 sec)

mysql> describe blah;
+-------+----------+------+-----+---------+-------+
| Field | Type     | Null | Key | Default | Extra |
+-------+----------+------+-----+---------+-------+
| alpha | datetime | YES  |     | NULL    |       |
| beta  | datetime | YES  |     | NULL    |       |
+-------+----------+------+-----+---------+-------+
2 rows in set (0.00 sec)

OK, so we have two datetimes in a table. Let’s try adding a row:

mysql> insert into blah (alpha, beta)
    -> VALUES ('2014-01-01 03:00:00', '2014-01-01 03:00:37'); 
Query OK, 1 row affected (0.00 sec)

What happens if we try subtracting two datetimes?

mysql> select alpha, beta, beta - alpha from blah;
+---------------------+---------------------+--------------+
| alpha               | beta                | beta - alpha |
+---------------------+---------------------+--------------+
| 2014-01-01 03:00:00 | 2014-01-01 03:00:37 |    37.000000 |
+---------------------+---------------------+--------------+
1 row in set (0.00 sec)

So we got the number of seconds between the two datetimes. Let’s try that again with two datetimes a minute apart.

mysql> insert into blah (alpha, beta)
    -> VALUES ('2014-01-01 03:00:00', '2014-01-01 03:01:00');
Query OK, 1 row affected (0.00 sec)

mysql> select alpha, beta, beta - alpha from blah;
+---------------------+---------------------+--------------+
| alpha               | beta                | beta - alpha |
+---------------------+---------------------+--------------+
| 2014-01-01 03:00:00 | 2014-01-01 03:00:37 |    37.000000 |
| 2014-01-01 03:00:00 | 2014-01-01 03:01:00 |   100.000000 |
+---------------------+---------------------+--------------+
2 rows in set (0.00 sec)

So, 100 seconds in a minute? Yikes. Obviously, this isn’t how you’re supposed to subtract datetimes in MySQL. But the great part is that it kind of works! You get a number back that correlates to the actual interval of time between the two, and if you’re measuring lots of small intervals, you might not notice that every interval crossing a minute boundary comes back inflated (a real 60 seconds reads as 100). Excellent puzzle to drive a junior dev crazy!
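For the curious, the likely explanation is MySQL’s documented habit of turning a DATETIME into a number of the form YYYYMMDDHHMMSS when it shows up in numeric context, so the subtraction operates on digits rather than on time. Here is a rough Python sketch of that behavior. The helper names are made up for illustration and this is a model of the observed result, not MySQL’s actual internals.

```python
from datetime import datetime

def as_mysql_number(dt):
    """Mimic a DATETIME read in numeric context: the digits YYYYMMDDHHMMSS."""
    return int(dt.strftime("%Y%m%d%H%M%S"))

def naive_difference(alpha, beta):
    """Model of `beta - alpha`: a plain subtraction of the two digit strings."""
    return as_mysql_number(beta) - as_mysql_number(alpha)

def elapsed_seconds(alpha, beta):
    """The interval we actually wanted, in seconds."""
    return int((beta - alpha).total_seconds())

start = datetime(2014, 1, 1, 3, 0, 0)
for end in (datetime(2014, 1, 1, 3, 0, 37), datetime(2014, 1, 1, 3, 1, 0)):
    # Prints the end time, the naive digit difference, and the true seconds.
    print(end, naive_difference(start, end), elapsed_seconds(start, end))
```

The naive difference comes out as 37 and 100 for the two rows above, matching what MySQL printed, while the real intervals are 37 and 60 seconds. On the MySQL side, `TIMESTAMPDIFF(SECOND, alpha, beta)` is the documented way to ask for the interval in seconds.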

Wait, any reasonable database would have known that we were making a mistake, right?

mysql> show warnings;
Empty set (0.00 sec)

tuntuntun – Combine Multiple Internet Connections Into One

GitHub repo: https://github.com/erjiang/tuntuntun (proof of concept status)

I was trying to play Minecraft by tethering over a Sprint data connection but was having awful random latency and dropped packets. The Sprint hotspot seems to only allow a limited number of connections to utilize the bandwidth at a time – a download in Chrome would sometimes stall all other connections. This was a huge problem in Minecraft, as loading the world chunks would stall my movements, meaning that I could teleport somewhere and die to an enemy by the time the map finished loading.

I’ve been seeing the idea of channel bonding here and there for a while, and it always seems like a cool idea without any popular and free implementations. Most of the approaches, though, were restricted to assigning different connections to different network interfaces. Essentially, a connection to YouTube might go over one link, while a software download might go out another. This works OK if you’re trying to watch YouTube while downloading updates, and it works great for many-connection uses like BitTorrent, but in this case, I wanted to create a single load-balanced connection. So, I created tuntuntun.

Somewhat like a VPN

[Figure: tuntuntun diagram]

This requires the help of an outside server to act as the proxy for all of the connections, because most current Internet protocols require connections to originate from one address. The idea is to establish a connection to the proxy using a custom protocol that allows data to be split between two links. Tuntuntun works by encapsulating IP traffic in UDP packets, and does this in userspace using tun interfaces. A tun interface is a virtual network interface that has a userspace program as the other end. This means that any packets sent to the tun are read by the userspace program, and anything that the userspace program writes to it becomes “real” packets in the kernel.
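To make the splitting idea concrete, here is a toy Python sketch of spreading packets over two links and putting them back in order on the other side. It is not tuntuntun’s real wire format: the 4-byte sequence-number header and all function names are assumptions made up for illustration, and the “links” are plain lists instead of UDP sockets and tun devices so that it runs without root.

```python
import struct

# Hypothetical header: one 32-bit sequence number in network byte order.
HEADER = struct.Struct("!I")

def encapsulate(seq, ip_packet):
    """Wrap a raw IP packet in a datagram payload tagged with its sequence number."""
    return HEADER.pack(seq) + ip_packet

def decapsulate(datagram):
    """Split a datagram payload back into (sequence number, raw IP packet)."""
    (seq,) = HEADER.unpack_from(datagram)
    return seq, datagram[HEADER.size:]

def split_across_links(packets, num_links=2):
    """Assign packets to links round-robin, one outgoing queue per link."""
    links = [[] for _ in range(num_links)]
    for seq, packet in enumerate(packets):
        links[seq % num_links].append(encapsulate(seq, packet))
    return links

def reassemble(datagrams):
    """Proxy side: restore the original order no matter which link delivered first."""
    return [packet for _, packet in sorted(decapsulate(d) for d in datagrams)]

packets = [b"pkt-a", b"pkt-b", b"pkt-c"]
link_a, link_b = split_across_links(packets)
# Pretend the second link delivered its datagrams before the first one did.
print(reassemble(link_b + link_a) == packets)
```

A real implementation would also have to cope with lost datagrams and links of different speeds, which plain round-robin ignores.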

(more…)

Raise your hand and ask

College lecturers (and teachers in general, I suppose) assume they need to ask if the class has any questions. The benchmark is that if the class doesn’t have any questions, then they understood the material, and if there were questions, then the lecturer should slow down a bit and maybe review it in a bit more detail.

It doesn’t work. Each class may have one or two students who play along with this and actually ask when they can’t follow along. Everyone else, when faced with inscrutable material, tends to shut up and sit through it.

Asking questions in public can be a lot of pressure. You worry that you’ll annoy other people by holding up the class. You worry that everyone else in the room already knows. You worry that you’re missing something really obvious. You worry that you’ll look like an idiot. And despite whatever cheerleading there is to encourage questions, all of these are possibilities.

What would you do if someone asked:

Can’t you just run `make -j` to parallelize your code?

Snickers? Laughs? (Look clueless because you’re not a computer programmer?)

It’s easier for someone with experience and accomplishments to ask questions—the experience means that you probably know as much or more than other people, so it won’t be a dumb question, and the accomplishments create a solid ego that won’t bruise so easily. The people without experience—the beginners—should be asking more questions, not fewer, but if they don’t have much in the way of experience or achievements (or have trouble internalizing them), then it can be a very scary deal.

We can do more to help people ask questions. The Internet is great for dumb questions: just check Yahoo! Answers. Every time I can Google my dumb question in private (“light stove without electricity”), I feel better knowing that someone else took the fall.

And they probably asked under a pseudonym too. What if we created this tolerant environment for students? Several of my courses had simple CGI message boards where any student could post questions or reply to others, and for every asked question, there were probably several who wanted to ask it.

We could take this one step further and make it an anonymous board, where administrators could, if needed, unmask users (for cheating, harassment, etc.), but people with questions wouldn’t be so afraid of asking a dumb question. A college could create such a feature for their entire school—maybe even go to the extent of not even letting the teacher know the poster’s identity without going through some bureaucracy.

There is value in keeping course-related questions within the school. Teachers can monitor what people are asking about. Students can feel some camaraderie in their problems (misery loves company). Homework-specific questions often require a lot of context, like the homework questions that constantly pop up on Stack Overflow. And really, missing out on all of the help that could be provided in a school setting is a shame.

Nobody should be intimidated into not asking.

Considering a Computer Science major? Read this first

What school should I choose?

Look for a school that’s either big, or has a strong focus on providing a good CS education. Big schools offer more choice, so that you can skip or maneuver around poor teachers or take classes that might not be available in smaller schools. Undergrad-focused schools may have better quality education, although they’re typically smaller and more expensive (Harvey Mudd, Rose-Hulman, etc.).

What do I need to be good at in order to start?

  • Decent typing skills—not being a touch typist will make everything a bit more difficult.
  • Basic computer usage—know how to download and install programs, navigate and organize directories, find keyboard shortcuts, manage files.
  • Decent reading skills—specifically, in English (e.g. improve your English, if that is not your native language). Misunderstanding or glossing over a few words might mean missing an important step here or there.
  • High-school algebra or geometry—a lot of the thinking in CS is similar to the kind you’d use in algebra (moving variables and symbols around without screwing up) or geometry (being able to figure out a proof for why a line is perpendicular to this other line).
  • Google skills—knowing that you can figure things out with just you and your buddy Google is super useful.

Sometimes people ask about calculus. I personally think you can get by with zero calculus. Even if it’s needed, the probability that everyone else in the class is good at calculus is basically nil, so everyone can suffer together.

Do I need a specific/powerful computer?

It doesn’t hurt to have a more powerful computer, but often, you’ll be using a regular text editor and a web browser. Lately I’ve split most of my work between a 2009 netbook and a 2006 Tablet PC.

A regular mid-range laptop will be fine. A MacBook or MacBook Air has the bonus that OS X comes with many Unix tools built in, and Macs are a popular choice amongst CS majors. You may end up doing a lot of work by connecting remotely into a school server anyways. Pick a portable computer that doesn’t hurt to carry around.

What type of stuff gets covered in a 4-year CS undergrad program anyways?

An undergrad CS education is roughly split into two parts: the craft, and the theory. The craft includes all the little things around the act of writing code and making things. The theory includes the math and logic that informs your decisions and designs as you create things. Time for a bad analogy: it’s like becoming a painter—artists spend a lot of time practicing how to mix paints and move a brush back and forth on a canvas, but they also take art-history and theory courses to cover important ideas, color theory, historical movements, etc.

The first couple courses are usually just introducing the very basics of how to write things that a computer can understand. These classes are designed for people with no experience, so some students skip the first class if they already have experience. At the same time, the intro classes start to introduce some theoretical concepts to prepare for later courses.

Are there classes on iPhone apps/Ruby on Rails/video games/[currently trendy thing]?

Don’t count on it. These trendy things come and go every few years, and the average CS major is capable of self-learning these things by their sophomore year. There are a million different niches to focus on, but the school’s curriculum is there to teach the core concepts and give some exposure to different fields. Be confident that you can explore these things on your own after 2-3 semesters of CS courses. See “Do side projects” below.

There may be small topics classes or student-taught classes about some of these things. Also, clubs and student organizations may form around topics like game development or web startups.

I already have N years of experience programming. Can I just skip some classes?

Divide the number of years of experience you have by 2, and skip that many semesters of intro classes if you can. But be aware that some classes and topics (PLT, comp. architecture, etc.) are likely to be missed by self-taught programmers. Ask around and see what those classes actually cover before you skip them.

I have a year/summer/month of free time before I start. What should I do to prepare?

If you have no experience, then get started with a basic programming tutorial in any language. I recommend doing as much as you can of Learn Python the Hard Way. Going into anything with a tiny bit of experience always beats having no experience.

How can I keep up / succeed?

  • Get to know other students.

    If there’s a specific lab that CS majors tend to hang out in, then try to spend more time in there, and don’t be shy about asking about what other people are doing. If there’s one thing about nerd stereotypes that’s true, it’s that they love to tell anybody about what they’re working on. These are the people that will help you down the road with debugging your homework, explaining tough concepts, bringing you job offers, etc.

  • Ask upperclassmen about what classes to take.

    This is so important that it’s frustrating how many students don’t do this. If everybody you meet tells you that the teacher for CS2xx is horrible, incompetent, incomprehensible, and sadistic, then why on earth would you sign up for that class? Rule of thumb: pick courses based on professors, not on their topics. Choosing good profs might mean a 3x difference in what you get out of four years.

  • Do side projects.

    Two main benefits:

    1. The extra experience boosts your skills in many different ways. Here’s a bad analogy: if you want to be a painter, sitting through 8 semesters of college isn’t going to make you a great painter. You have to spend time breathing in turpentine to get the practice needed. Another bad analogy: doing side quests in RPGs will make your chars higher leveled than if you went straight through the main storyline. Real talk though: you’ll learn so much about what people in the real world are doing and thinking about that you wouldn’t get from the classroom.

    2. (See above section on “trendy topics”.) Besides, companies are falling over themselves trying to hire [currently trendy thing] programmers. Even making a crappy music organizer using [currently trendy thing] is a huge benefit when looking for part-time/internship/full-time positions.

    It doesn’t really matter what you make. One of the things that I made was a tool that automatically checked Craigslist and alerted me when someone listed an iPhone. I was the only person to ever use it and it was really just a bunch of sample code I found from various places on the Internet duct-taped together, but it was different from things I had done before and actually helped me out quite a few times.

    Need ideas? Check out Hacker News once every couple of days and look for posts that start with “Show HN” to see what everyone else is up to.

  • Ask for help.

    Use office hours. Nobody will know what you don’t know until you ask for help. Suffering in silence helps nobody. Office hours certainly aren’t the solution to all problems, but if you don’t try to use them, that’s your problem.

    Also, is everyone in the lab working on this week’s assignment? Chances are that someone is willing to whiteboard out a concept that’s difficult to understand.

Comments, suggestions, etc. appreciated.