Good things happen when you subtract datetimes in MySQL

Of course, you know that “good things” and “MySQL” don’t go together. File this one under the category of “small ways in which MySQL is broken”.

Let’s fire up MySQL 5.1.72-2-log or 5.5.34-log.

mysql> create temporary table blah
    -> (alpha datetime, beta datetime);
Query OK, 0 rows affected (0.01 sec)

mysql> describe blah;
+-------+----------+------+-----+---------+-------+
| Field | Type     | Null | Key | Default | Extra |
+-------+----------+------+-----+---------+-------+
| alpha | datetime | YES  |     | NULL    |       |
| beta  | datetime | YES  |     | NULL    |       |
+-------+----------+------+-----+---------+-------+
2 rows in set (0.00 sec)

OK, so we have two datetimes in a table. Let’s try adding a row:

mysql> insert into blah (alpha, beta)
    -> VALUES ('2014-01-01 03:00:00', '2014-01-01 03:00:37'); 
Query OK, 1 row affected (0.00 sec)

What happens if we try subtracting two datetimes?

mysql> select alpha, beta, beta - alpha from blah;
+---------------------+---------------------+--------------+
| alpha               | beta                | beta - alpha |
+---------------------+---------------------+--------------+
| 2014-01-01 03:00:00 | 2014-01-01 03:00:37 |    37.000000 |
+---------------------+---------------------+--------------+
1 row in set (0.00 sec)

So we got the number of seconds between the two datetimes. Let’s try that again with two datetimes a minute apart.

mysql> insert into blah (alpha, beta)
    -> VALUES ('2014-01-01 03:00:00', '2014-01-01 03:01:00');
Query OK, 1 row affected (0.00 sec)

mysql> select alpha, beta, beta - alpha from blah;
+---------------------+---------------------+--------------+
| alpha               | beta                | beta - alpha |
+---------------------+---------------------+--------------+
| 2014-01-01 03:00:00 | 2014-01-01 03:00:37 |    37.000000 |
| 2014-01-01 03:00:00 | 2014-01-01 03:01:00 |   100.000000 |
+---------------------+---------------------+--------------+
2 rows in set (0.00 sec)

So, 100 seconds in a minute? Yikes. Obviously, this isn’t how you’re supposed to subtract datetimes in MySQL. But the great part is that it kind of works! You get a number back that correlates to the actual interval of time between the two, and if you’re measuring lots of small intervals, you might not notice that your data is between 100% and 167% of what it should be. Excellent puzzle to drive a junior dev crazy!

Wait, any reasonable database would have known that we were making a mistake, right?

mysql> show warnings;
Empty set (0.00 sec)

tuntuntun – Combine Multiple Internet Connections Into One

GitHub repo: https://github.com/erjiang/tuntuntun (proof of concept status)

I was trying to play Minecraft by tethering over a Sprint data connection but was having awful random latency and dropped packets. The Sprint hotspot seems to only allow a limited number of connections to utilize the bandwidth at a time – a download in Chrome would sometimes stall all other connections. This was a huge problem in Minecraft, as loading the world chunks would stall my movements, meaning that I could teleport somewhere and die to an enemy by the time the map finished loading.

I’ve been seeing the idea of channel bonding here and there for a while, and it always seems like a cool idea without any popular and free implementations. Most of the approaches, though, were restricted to assigning different connections to different network interfaces. Essentially, a connection to YouTube might go over one link, while a software download might go out another. This works OK if you’re trying to watch YouTube while downloading updates, and it works great for many-connection uses like BitTorrent but in this case, I wanted to create a single load-balanced connection. So, I created tuntuntun.

Somewhat like a VPN

tuntuntun diagram

This requires the help of an outside server to act as the proxy for all of the connections, because most current Internet protocols require connections to originate from one address. The idea is to establish a connection to the proxy using a custom protocol that allows data to be split between two links. Tuntuntun works by encapsulating IP traffic in UDP packets, and does this in userspace using tun interfaces. A tun interface is a virtual network interface that has a userspace program as the other end. This means that any packets sent to the tun are read by the userspace program, and anything that the userspace program writes to it becomes “real” packets in the kernel.

Continue reading tuntuntun – Combine Multiple Internet Connections Into One

Raise your hand and ask

College lecturers (and teachers in general, I suppose) assume they need to ask if the class has any questions. The benchmark is that if the class doesn’t have any questions, then they understood the material, and if there were questions, then the lecturer should slow down a bit and maybe review it in a bit more detail.

It doesn’t work. Each class may have one or two students that play along with this and actually ask when they can’t follow along. Everyone else, when faced with inscrutable material, tends to shut up and sit through it.

Asking questions in public can be a lot of pressure. You worry that you’ll annoy other people by holding up the class. You worry that everyone else in the room already knows. You worry that you’re missing something really obvious. You worry that you’ll look like an idiot. And despite whatever cheerleading there is to encourage questions, all of these are possibilities.

What would you do if someone asked:

Can’t you just run `make -j` to parallelize your code?

Snickers? Laughs? (Look clueless because you’re not a computer programmer?)

It’s easier for someone with experience and accomplishments to ask questions—the experience means that you probably know as much or more than other people, so it won’t be a dumb question, and the accomplishments create a solid ego that won’t bruise so easily. The people without experience—the beginners—should be asking more questions, not fewer, but if they don’t have much in the way of experience or achievements (or have trouble internalizing them), then it can be a very scary deal.

We can do more to help people ask questions. The Internet is great for dumb questions—just check Yahoo! Answers. Every time I can Google my dumb question in private (“light stove without electricity”), then I feel better knowing that someone else took the fall.

And they probably asked under a pseudonym too. What if we created this tolerant environment for students? Several of my courses had simple CGI message boards where any student could post questions or reply to others, and for every asked question, there were probably several who wanted to ask it.

We could take this one step further and make it an anonymous board, where administrators could, if needed, unmask users (for cheating, harassment, etc.), but people with questions wouldn’t be so afraid of asking a dumb question. A college could create such a feature for their entire school—maybe even go to the extent of not even letting the teacher know the poster’s identity without going through some bureaucracy.

There is value into keeping course-related questions within the school. Teachers can monitor what people are asking about. Students can feel some camaraderie in their problems (misery loves company). Homework-specific questions often require a lot of context, like the homework questions that constantly pop up on Stack Overflow. And really, missing out on all of the help that could be provided in a school setting is a shame.

Nobody should be intimidated into not asking.

Considering a Computer Science major? Read this first

What school should I choose?

Look for a school that’s either big, or has a strong focus on providing a good CS education. Big schools offer more choice, so that you can skip or maneuver around poor teachers or take classes that might not be available in smaller schools. Undergrad-focused schools may have better quality education, although they’re typically smaller and more expensive (Harvey Mudd, Rose-Hulman, etc.).

What do I need to be good at in order to start?

  • Decent typing skills—not being a touch typist will make everything a bit more difficult.
  • Basic computer usage—know how to download and install programs, navigate and organize directories, find keyboard shortcuts, manage files.
  • Decent reading skills—specifically, in English (e.g. improve your English, if that is not your native language). Misunderstanding or glossing over a few words might mean missing an important step here or there.
  • High-school algebra or geometry—a lot of the thinking in CS is similar to the kind you’d use in algebra (moving variables and symbols around without screwing up) or geometry (being able to figure out a proof for why a line is perpendicular to this other line).
  • Google skills—knowing that you can figure things out with just you and your buddy Google is super useful.

Sometimes people ask about calculus. I personally think you can get by with zero calculus. Even if it’s needed, the probability that everyone else in the class is good at calculus is basically nil, so everyone can suffer together.

Do I need a specific/powerful computer?

It doesn’t hurt to have a more powerful computer, but often, you’ll be using a regular text editor and a web browser. Lately I’ve split most of my work between a 2009 netbook and a 2006 Tablet PC.

A regular mid-range laptop will be fine. A MacBook or MacBook Air has the bonus that OS X comes with many Unix tools built in, and Macs are a popular choice amongst CS majors. You may end up doing a lot of work by connecting remotely into a school server anyways. Pick a portable computer that doesn’t hurt to carry around.

What type of stuff gets covered in a 4-year CS undergrad program anyways?

An undergrad CS education is roughly split into two parts: the craft, and the theory. The craft includes all the little things around the act of writing code and making things. The theory includes the math and logic that informs your decisions and designs as you create things. Time for a bad analogy: it’s like becoming a painter—artists spend a lot of time practicing how to mix paints and move a brush back and forth on a canvas, but they also take art-history and theory courses to cover important ideas, color theory, historical movements, etc.

The first couple courses are usually just introducing the very basics of how to write things that a computer can understand. These classes are designed for people with no experience, so some students skip the first class if they already have experience. At the same time, the intro classes start to introduce some theoretical concepts to prepare for later courses.

Are there classes on iPhone apps/Ruby on Rails/video games/[currently trendy thing]?

Don’t count on it. These trendy things come and go every few years, and the average CS major is capable of self-learning these things by their sophomore year. There are a million different niches to focus on, but the school’s curriculum is there to teach the core concepts and give some exposure to different fields. Be confident that you can explore these things on your own after 2-3 semesters of CS courses. See “Do side projects” below.

There may be small topics classes or student-taught classes about some of these things. Also, clubs and student organizations may form around topics like game development or web startups.

I already have N years of experience programming. Can I just skip some classes?

Divide the number of years of experience you have by 2, and skip that many semesters of intro classes if you can. But be aware that some classes and topics (PLT, comp. architecture, etc.) are likely to be missed by self-taught programmers. Ask around and see what those classes actually cover before you skip them.

I have a year/summer/month of free time before I start. What should I do to prepare?

If you have no experience, then get started with a basic programming tutorial in any language. I recommend doing as much as you can of Learn Python the Hard Way. Going into anything with a tiny bit of experience always beats having no experience.

How can I keep up / succeed?

  • Get to know other students.

    If there’s a specific lab that CS majors tend to hang out in, then try to spend more time in there, and don’t be shy about asking about what other people are doing. If there’s one thing about nerd stereotypes that’s true, it’s that they love to tell anybody about what they’re working on. These are the people that will help you down the road with debugging your homework, explaining tough concepts, bringing you job offers, etc.

  • Ask upperclassmen about what classes to take.

    This is so important that it’s frustrating how many students don’t do this. If everybody you meet tells you that the teacher for CS2xx is horrible, incompetent, incomprehensible, and sadistic, then why on earth would you sign up for that class? Rule of thumb: pick courses based on professors, not on their topics. Choosing good profs might mean a 3x difference in what you get out of four years.

  • Do side projects.

    Two main benefits:

    The extra experience boosts your skills in many different ways. Here’s a bad analogy: if you want to be a painter, sitting through 8 semesters of college isn’t going to make you a great painter. You have to spend time breathing in turpentine to get the practice needed. Another bad analogy: doing side quests in RPGs will make your chars higher leveled than if you went straight through the main storyline. Real talk though: you’ll learn so much about what people in the real world are doing and thinking about that you wouldn’t get from the classroom.

    (See above section on “trendy topics”.) Besides, companies are falling over themselves trying to hire [currently trendy thing] programmers. Even making a crappy music organizer using [currently trendy thing] is a huge benefit when looking for part-time/internship/full-time positions.

    It doesn’t really matter what you make. One of the things that I made was a tool that automatically checked Craigslist and alerted me when someone listed an iPhone. I was the only person to ever use it and it was really just a bunch of sample code I found from various places on the Internet duct-taped together, but it was different from things I had done before and actually helped me out quite a few times.

    Need ideas? Check out Hacker News once every couple of days and look for posts that start with “Show HN” to see what everyone else is up to.

  • Ask for help.

    Use office hours. Nobody will know what you don’t know until you ask for help. Suffering in silence helps nobody. Office hours certainly aren’t the solution to all problems, but if you don’t try to use them, that’s your problem.

    Also, is everyone in the lab working on this week’s assignment? Chances are that someone is willing to whiteboard out a concept that’s difficult to understand.

Comments, suggestions, etc. appreciated.

Batch organize photos by date

I wanted to get a pile of jpegs organized by the year, and then the day. For example, IMG00001.JPG would go in 2013/06-04 based on its EXIF creation date. The handy exiftool can do this, via its own virtual “Directory” tag. The “Directory” tag is not an actual EXIF tag, but exiftool will move the photo if you write to it.

Here’s two snippets, one to copy photos to a new directory based on the date (-o dummy/ forces a copy):

exiftool -o dummy/ -d %Y/%m-%d "-directory<datetimeoriginal" /path/to/unorganized/photos/*.JPG

And one to move photos to a new directory based on the date:

exiftool -d %Y/%m-%d "-directory<datetimeoriginal" /path/to/unorganized/photos/*.JPG

This will automatically create directories called $PWD/year/day.

Using bcrypt in CakePHP 2.3

CakePHP 2.3 adds native support for the bcrypt hashing algorithm, which is often recommended because of the amount of analysis that’s gone into it and its configurable cost function.

Using it isn’t obvious, however. The 2.3 migration notes merely say, You can now use Blowfish in your $authenticate array to allow bcrypt passwords to be used.

Due to limitations in how Auth works, a new authentication type was added in addition to a new hash type. So to use bcrypt, your $authenticate variable should look like this:

$this->Auth->authenticate = array(
            AuthComponent::ALL => array(
                'scope' => array('User.active' => 0)
            ),
            'Blowfish',
        );

That only affects checking the password. To hash passwords using bcrypt, you also need to modify your model because Auth::password won’t work.

    public function beforeSave($options = array()) {
        if (isset($this->data[$this->alias]['password'])) {
            $this->data[$this->alias]['password'] =
                Security::hash($this->data[$this->alias]['password'], "blowfish");
        }
        return true;
    }

Note that you can configure the cost by using Security::setCost (the default is 10).

Googling for bugs: hundreds of date-formatting mistakes

Here’s something really easy to screw up:

<?php
echo date("Y-m-d H:m:s");
?>

Spot the mistake? The “m” format code can only mean one thing, which is “months”, not “minutes”. Yet, when you’re writing code in a hurry, it’s so easy to quickly write this code and assume it works. After all, it raises no errors or warnings, and always generates valid datetime strings. It’s just wrong.

Googling for ‘site:github.com “Y-m-d H:m:s” php’ returns hundreds of examples of mistakes. I tried to fix a couple instances but realized that there’s so much abandoned code on Github that it would’ve been useless.

This is a prime example of something that static analysis could warn about: just look for “H:m:s” or “h:m:s” in the first argument to date. This mistake isn’t limited to PHP, of course, as this Java example shows.

We don’t remember your startup

Yeah, your startup. You know, the one that had the nifty HTML5 responsive launch page? The one that promised big things and nifty features? The one where we typed in our email address so we could be notified when it launched? You spent the last five months working your butt off to productionize and bugfix your startup, and when the time came, you crafted together a short but sweet email asking us to try out the beta.

Thing is, you forgot to remind us what your startup does. So I get this email from some website with a sufficiently hip name asking me to check them out right this instant. I don’t recall ever seeing them before, so it must be spam, and the few seconds I have before my train arrives, I trash it and move on to the next piece of mail.

What is a divshot and why should I care about it?

After working on your startup for the last year and spending every waking moment thinking and talking about it, it’s easy to assume people know about what you’re doing.

But for the rest of us, it wouldn’t be too much work to include your one-line pitch, would it?

Have you ever been computer illiterate?

If you’re like most hackers, you’ve spent so much time on the computer that the jargon and lingo have become natural. You instinctively know where the OK and Cancel buttons are, and get upset when they’re switched around for no apparent reason. You scan dialog boxes and make the right decision in a second, because who hasn’t seen “Do you want to set Firefox as your default browser?” a million times?

We are perplexed when users click through confirmation windows without a second thought. We sigh when someone doesn’t know where to find their terminal emulator. We become frustrated when people complain about poor performance but can’t even tell us whether they’ve maxed out their RAM and have swapping.

Look at these words! We’ve churned out so many abbreviations and neologisms that other developers can’t even understand us! But what about our users? What if they’re not computer professionals with 10+ years of Internet experience? What if they don’t know what we’re talking about? Or what if they’re using your product in something other than their native language? Why aren’t our users like us?

Try this. Switch your software into a language that you don’t know very well. If you took Spanish throughout high school or a couple years’ German in college, switch your computer to that. It’s very easy to do in OS X and iOS, and almost as easy on Ubuntu. Google products can be switched by adding the “hl” param to most Google URLs. For example, “hl=ja” switches the page to Japanese.

Just drag’n’drop another language to the top.

It’s a completely different experience.

Don’t just switch back after a day. Force yourself to use your computer in another language. After all, there are millions of people who simply can’t switch their computer to their native language. Use your now-alien computer, and you’ll find your habits changing.

Your eyes don’t immediately zero in on the correct list item any more. You avoid using the menu bar until you know you have to, and when you do, you’re still guided by a spatial-positional “feel” of where things are. And if it’s a rare need, like changing your accessibility settings, you sit there and slowly read each menu item until one of them vaguely sounds like what you’re looking for.

And it’s not just an issue of knowing words. You might know the words, for some superficial meaning of the word “know”, but when ordinary words are applied to computers, it often takes mental processing to understand their new meaning.

When you click “Cancel” in a dialog box, what are you cancelling? Is it cancelling in the same sense that you would cancel a reservation or a plane ticket? (We are all guilty of genericizing “OK” and “Cancel” as the default dialog-box buttons.) When I see “Ventana”, I think of a big hole in my wall filled with a glass panel. I can’t help it—that’s the meaning I learned—and it takes a second or so to recognize that that’s where I should go to find a list of my terminal instances. And when I first saw “終端”, which could be defined as “end extremity”, it was only by recognizing its icon that I knew it meant “terminal”.

Why would expect people to know that “terminal” means that little window with monospace text and funny symbols? We don’t even think to ourselves that historic terminals were the endpoints for mainframe communications.

So notice how your usage changes in an unfamiliar environment. You’re afraid of clicking words you don’t understand, because it might be even harder to undo your changes (e.g. setting your keyboard to “드보락”). When you don’t know how to get the functionality you need, you’d rather work around the problem or Google for the answer, because you can’t skim long drop-down menus very quickly. And textual documentation is useless, because the time and energy it takes to decipher the documentation is more than you’re willing to spend on your problem anyways.

Suddenly, customizing your computer becomes a chore, made worse by the nagging uncertainty of being able to remember how to reset those options. You notice subtle differences in localization between vendors as well—why is it that you some times 复制 and other times 拷贝 to send the selection to your system clipboard? You tend to stick to applications you’re familiar with, because you’d much rather get things done than spend so much time figuring out a new interface. You can feel your textual comprehension speed drop by a factor compared to what you’re used to.

And when you’re not sure, you rely on visual cues to confirm your guesses. The System Preferences icon to change your language back might be “言語とテキスト”, but it looks especially promising because it has a wavy UN flag as its icon. (A globe or a couple characters from different character systems would be great too.) The default options in dialog boxes, along with their positions, become important too, because you don’t feel like reading the actual message, and you’re likely to pick the rightmost or highlighted option anyways.

You like things with fewer options, because it takes less time to read them all, and it means fewer chances to mess something weird up. Visual layouts help with understanding things, because you can draw analogies to what you know. And it’s not always obvious which things are action widgets that you can click on, especially if they’re not styled like a button (I’m looking at you, Android).

So go ahead, change your system language and use it for a month. You might be surprised at what you’ve taken for granted.

PHP’s wonderful pseudo-functions

There is a pseudo-function in PHP called empty, that tells you whether a variable is “empty” for someone’s definition of empty. The empty string, zero, null, and undefined variables are all considered to be empty.

<?php
$x = null;
echo empty($x); // prints "1"
echo empty($nothere); // prints "1"

Well, how did they make it work for undefined variables? They made empty a language construct, not a function, so that its argument isn’t evaluated before the empty check happens. This leads to some great errors:

$x = 0;
echo empty($x); // prints "1"
echo empty(0); // PHP Parse error:  syntax error, unexpected T_LNUMBER in php shell code on line 1

Or better yet,

function y() { return 0; }
echo y(); // prints "0"
echo empty(y()); // PHP Fatal error:  Can't use function return value in write context in php shell code on line 1

PLT is basically magick.