Optimizing Python With Cython

This is a mirror of a 2015 article I wrote about using Cython.

Python is slow. It’s still faster than a lot of other languages, but describing Python code as “fast” will probably get some weird looks from people who use compiled or JITted languages.

Still, I love Python for its design and speed at which I can write good code. Most of the time, when writing Python for distributed or web applications, it doesn’t matter. Your runtime is usually dominated by network traffic or data accesses than it is by raw computational speed. In those cases, the correct optimization is often reducing network requests or indexing your data.

Sometimes—just occasionally—we actually do need fast computation. Traditional Python wisdom has been to profile your code (perhaps using cProfile), identify the hot spot, and rewrite it in C as a Python extension. I’ve had to do that before for a handwriting recognition algorithm with great success (it turned a 2–3 second computation into a practically real-time one), but writing C code and then figuring out the FFI was a pain. I had to rewrite the algorithm in C, verify that my C version was correct and didn’t have memory errors, and then figure out the right incantations to get it to cooperate with Python.

Let’s try something different this time.


At DoubleMap, geographic distance calculation is something that gets used a lot, and we use the haversine function to calculate distance between two points on Earth. In some of our data analysis, we might run the haversine function millions of times in a tight loop, and it was causing some reports to take way too long.

Profiling the code showed that the two heaviest operations were fetching data from the data store and the haversine function.

Play along at home: Clone the repo at https://github.com/doublemap/haversine-optimization and start from the first commit to see how the code changes step by step.

The commands you should know are pip install cython and python setup.py build_ext --inplace.

Original code

from math import sin, cos, acos

def haversine(coord1, coord2):
    """Given two (lat, lng) tuples, returns the distance between them in
    lat1, lng1 = coord1
    lat2, lng2 = coord2

    if lat1 > 90 or lat1 < -90 or lat2 > 90 or lat2 < -90:
        raise ValueError("Invalid latitude (should be between +/- 90)")
    if lng1 > 180 or lng1 < -180 or lng2 > 180 or lng2 < -180:
        raise ValueError("Invalid longitude (should be between +/- 180)")

    phi1 = (90.0 - lat1) * 0.0174532925
    phi2 = (90.0 - lat2) * 0.0174532925
    theta1 = lng1 * 0.0174532925
    theta2 = lng2 * 0.0174532925

    c = (sin(phi1) * sin(phi2) * cos(theta1 - theta2) + cos(phi1) * cos(phi2))
    arc = acos(c)
    return arc * 6367444.7

Also, here’s the benchmark code that is used throughout:

from __future__ import print_function
from timeit import timeit

print(timeit("haversine((39.132213, -86.12439), (38.55213, -86.94910))",
             setup="from haversine import haversine",

Pretty straightforward. Let’s try pre-compiling this code into a C extension to see if that helps.

Following the Cython Basic Tutorial, I renamed haversine.py to haversine.pyx and created a setup.py file:

from distutils.core import setup
from Cython.Build import cythonize

    ext_modules = cythonize("haversine.pyx")

Running python setup.py build_ext --inplace built haversine.so out of my unmodified Python code. Easy-peasy.

Benchmark results (300,000 iterations):

Original code: 2.85 seconds
Compiled: 2.01 seconds

So, 29% time savings without any modification to the original code.

C math functions

Currently, we’re still using Python’s math functions. Let’s see if we can cut out some overhead by importing math.h instead.

Except, since this is Cython, all we need to do is:

--- a/haversine.pyx
+++ b/haversine.pyx
@@ -1,4 +1,4 @@
-from math import sin, cos, acos
+from libc.math cimport sin, cos, acos

Benchmark results (300,000 iterations):

Original code: 2.85 seconds
Compiled: 2.01 seconds
libc.math: 1.33 seconds

So far we’ve saved 53% of the time from the original code.

C types

Cython’s biggest extension of the Python language is its type annotations. Speed ups can be had by telling Cython the types of each variable in advance. We are dealing with geographic coordinates, so we’ll be using doubles for pretty much everything:

def haversine(tuple coord1, tuple coord2):
    """Given two (lat, lng) tuples, returns the distance between them in
    cdef double lat1
    cdef double lng1
    cdef double lat2
    cdef double lng2
    lat1, lng1 = coord1
    lat2, lng2 = coord2

    if lat1 > 90 or lat1 < -90 or lat2 > 90 or lat2 < -90:
        raise ValueError("Invalid latitude (should be between +/- 90)")
    if lng1 > 180 or lng1 < -180 or lng2 > 180 or lng2 < -180:
        raise ValueError("Invalid longitude (should be between +/- 180)")

    cdef double ph1
    cdef double ph2
    cdef double theta1
    cdef double theta2
    cdef double c
    cdef double arc

    phi1 = (90.0 - lat1) * 0.0174532925
    phi2 = (90.0 - lat2) * 0.0174532925
    theta1 = lng1 * 0.0174532925
    theta2 = lng2 * 0.0174532925

    c = (sin(phi1) * sin(phi2) * cos(theta1 - theta2) + cos(phi1) * cos(phi2))
    arc = acos(c)
    return arc * 6367444.7

Original code: 2.85 seconds
Compiled: 2.01 seconds
libc.math: 1.33 seconds
C types: 0.466 seconds

In three easy steps, we’ve reduced the run time of this function by 84%. More importantly, we’re still writing what’s basically Python. We don’t have to remember to free our variables or check our array bounds. That additional safety is not free (and there are various flags to disable Cython safety checks) but with Cython, optimizing a Python function doesn’t have to be a rewrite-everything task.

We probably could have gone further in optimizing this, but these speed-ups got us into “good enough” territory, and that’s good enough.


Our optimized haversine function is available on PyPI under the name “cHaversine”. One key difference is that the distributed tarball includes the C code generated by Cython so that you, as the package user, don’t need Cython to download and build the C extension.

Comments on Hacker News

NFTs are tearing the art community apart

Currently, the collision of artists and cryptocurrencies is playing out across Twitter and other platforms, dividing artists who normally praise and support each other into two camps. One camp wants to make money selling their art as NFTs, and the other camp hates the idea.

If you haven’t heard already, NFTs are enjoying increasing popularity alongside the current record-high cryptocurrency values. NFT stands for Non-Fungible Token – an atomic digital token that can be traded for cryptocurrency. Many artists are now selling “digital editions” of their artwork as NFTs on special-purpose marketplaces – the buyer can call themselves the owner of a piece of digital art, similar to how one might purchase a limited edition print. Some of these art NFTs have been trading for the equivalent of tens of thousands of dollars, despite granting the buyer no physical copy, no reproduction rights, or anything other than the ability to say, “Yes, I own this token,” and to sell it to someone else.

What’s been particularly divisive is the objection to the shockingly high energy usage of the most popular cryptocurrencies. One estimate puts the energy consumption involved in a single NFT sale at around 340 kWh, equivalent to driving 1,000km in an ICE car. Most of this is due to the proof-of-work algorithm that’s core to Ethereum[1], the blockchain that most of these NFT sales take place on. Essentially, many objectors see making and selling NFTs as something akin to clearcutting rainforests in order to grow crops.


Customize Huion tablet buttons on Linux

The “Huion Kamvas Pro (2019)” comes with 16 physical buttons and two touch strips along the sides of the tablet. On Windows, you can configure the buttons to send custom keystrokes (e.g. toggling painting tools or changing brush size). However, the key mapping is mirrored from the left to the right side, so you can only have 10 distinct assignments.

On Linux, the pen works out-of-the-box on my Ubuntu machines, but there’s no good way to customize the buttons. You can force X to use the wacom drivers (xf86-input-wacom) which lets you use xsetwacom to customize some of the buttons, but the touch strips don’t seem to be supported, nor do buttons 13 through 16 (the bottom right buttons).

I have published a Python program that watches the raw data coming from the tablet and sends commands using xdotool. It currently has a few rough edges: setting it up requires a bit of compilation in order to link to libxdo, and you also need to have permission to read from the /dev/hidraw device which you don’t have by default. For the permissions, I’m sure someone more knowledgeable can suggest a better way to set things up, but you can run the program as root or a setuid program, or manually grant read access on the /dev/hidraw file.

Configuring it should be pretty easy. Here are some buttons that I have configured for use with Krita, for example:


Using TypeScript to check for missing cases

TL;DR: use your type system to keep you from forgetting to handle all cases in switch statements and object keys.

Often in programming, you have to deal with a list of distinct options. In some languages, this would be expressed by an enum, but in TypeScript it’s more common to express them as specific strings:

type CarType = “mazda-miata” | “honda-s2000” | “toyota-mr2” | “pontiac-solstice”;

(Let’s say we’re building a racing game featuring old roadsters in this contrived example…)

Rather than enumerate all of the types in one CarType declaration, we might have varying properties for each car, expressed as TypeScript discriminated unions:


Who was Herman Wasserman?

Probably relatively well-known in music circles in his day, Herman Wasserman seems to only pop up today in conversation associated with George Gershwin. It seems like he was also a teacher to Ferde Grofé, who orchestrated Gershwin’s Rhapsody in Blue.

If you search library catalogs, he turns up as having edited George Gershwin’s Song-Book as well as having arranged a simplified piano solo version of Rhapsody in Blue.

In the foreword of that version of Rhapsody in Blue, it claims:


QListView not accepting drag and drop

Python + Qt (in the form of PyQt5 or PySide2) is a weird mash-up of the famously slow interpreted dynamic language plus a heavyweight C++ GUI library. It certainly has its advantages over writing in C++, but I’m really wondering if there aren’t better ways to write cross-platform desktop apps.

Anyways, in Qt, you’re supposed to be able to accept drag-and-drop in a widget by doing something like:

# install some event handlers
myWidget.dragMoveEvent = ...
myWidget.dropEvent = ...

If you use a QListView to present an explorer-style view of some items, this mysteriously doesn’t work. Why? Qt is object-oriented from top to bottom, and you think you’re using a QListView but it’s actually a QAbstractItemView which is actually a QAbstractScrollArea. But a QAbstractScrollArea is very different conceptually from a QListView.

It turns out that when you drop something onto the QListView, it’s not the QListView that gets the drop events. Instead, its internal viewport (the scroll area) gets the event.

And if you’ve setAcceptDrops(True) on myWidget, that’s not what you actually want. The viewport is its own QWidget with its own acceptDrops flag that isn’t True.

class MyWidget(QListView):
    def __init__(self):
        # NOT self.setAcceptDrops(True)

    def eventFilter(self, obj, event):
        # grab any drag and drop events that the viewport receives...

Is this a bug? It certainly seems like poor API design, because if doing a setAcceptDrops(True) on a QListView doesn’t also setAcceptDrops(True) on its internal viewport, then what does it accomplish?

Technically, this behavior seems to be mentioned in this old Qt 4.6 page: https://doc.qt.io/archives/4.6/model-view-dnd.html

But this kind of requirement that I cobble together bits and pieces of Qt 4, C++, PyQt*, and StackOverflow documentation to figure out what to do in PySide2 is apparently the state of using Qt with Python. Excuse me while I go back to complaining about JavaScript…

Fixing only left/right channels working on Logitech headsets

I have a Logitech G430 headset. It’s one of a series of Logitech headsets that offer fake surround sound as a marketing ploy. (I will write up something someday about why surround sound headphones are 95% marketing B.S.)

Using it on Windows, at some point all audio from channels other than left and right disappeared. I.e. 6 of the 8 channels were just completely inaudible.

Toggling the surround sound effect in Logitech Gaming Software didn’t work. I looked around and there were similar complaints from people with other Logitech headsets.

Here’s what did work, and how to test it:


100% Unbreakable Encryption is Achievable!

There were two common cryptography misconceptions that we unlearned in school, and this post is about the first one we learned about. (And Cryptonomicon helped with this one too.)

We hear a lot about how “strong” encryption is. That our files would take bazillion years to decrypt via brute force or that our Bitcoin account would take the fastest supercomputer a gajillion years to break into. But what about an encryption scheme that an adversary could never, ever brute force? Sounds pretty useful, doesn’t it?

You may be surprised to learn that yes, we can do it! And we’ve been doing perfectly unbreakable encryption in the military since before WWII.

Let’s say we have a plain-text message:


And we encrypted it with a random key that’s the same length as the message, where each character of the key is just added to the corresponding plain-text letter of the English alphabet (A = 0, so B + C = D, Y + B = Z and so on…).


Reverse-Engineered Dumpling Sauce Recipe

This was reverse-engineered from Wei-Chuan brand dumpling sauce. It’s not an exact replica but should provide a similar taste profile.

Makes approximately 200 mL.



`git add -p` has made me a better programmer

If you don’t know about this already, then file it under your collection of “One Simple Trick articles”…

git add -p (AKA git add --patch) will interactively show you each change in your repo and ask you if you want to stage it.

Do you ever use git commit --all? Have you ever accidentally committed more than you meant to?

By using git add -p, you get a chance to review each change. That helps you catch mistakes, like leaving in debugging printlns that you don’t need anymore, or other temporary hacks. Since git add -p will present each change separately, you can even include some changes and exclude others within the same file.

And, it’s not uncommon for me to fix small, unrelated issues while working on a major feature. I’d like for those fixes to go into a separate commit, and git add -p gives me a chance to catch those fixes if I haven’t committed them already.

The other major benefit is that it gives you a refresher on what all you changed before you need to write your commit message. If I work on a big changeset, sometimes it’s hard for me to remember all the important changes that I should write about in my commit message. Quickly running through the changes interactively means that, instead of writing “Add feature X”, I can write more detailed commit messages that list the major areas that have been touched.

Overall, it’s made my commits cleaner and me a more thoughtful programmer.

Don’t blindly commit files. Try git add -p.

P.S. Even though I now mostly use VS Code, I still prefer to review changes in the built-in terminal instead of the Source Control pane. Since git add‘s interactive mode uses single-character commands, I can quickly step through the changes without fiddling with the mouse.