Learning Chinese (well enough to order from the menu) using MTurk

TL;DR: I used MTurk and PostgreSQL to create vocab flash cards for Chinese restaurant menus, ranked from most common to least common terms.

It’s an embarrassing scenario: you’re dining with friends at a Chinese restaurant when the server drops off the menus, including the non-English “secret menu”. All eyes turn to you, as the only Chinese person in the group. “You’re Chinese right? Can you read this?”

I’ve created a tool to solve this problem: a deck of 1,214 flash cards featuring vocabulary found on Chinese menus. The cards are ordered from most common (鸡: chicken) to least common, so if you study them in order, you’ll quickly learn the characters and phrases you’re most likely to encounter.
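The ordering idea can be sketched in a few lines. This is only an illustration in Python, not the PostgreSQL query used for the real deck; the sample menus are made up, and counting each term once per menu is an assumption about how "most common" is measured.

```python
from collections import Counter

def rank_terms(menus):
    """Return (term, count) pairs, most common first.

    `menus` is a list of menus, each a list of already-segmented terms.
    Assumption: a term is counted once per menu, so one long menu
    cannot dominate the ranking.
    """
    counts = Counter()
    for terms in menus:
        counts.update(set(terms))
    # Sort by descending count, then by term so ties have a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical sample data, not taken from the real menu set.
menus = [
    ["鸡", "牛肉", "豆腐"],
    ["鸡", "豆腐", "鸡"],
    ["鸡", "面"],
]
ranked = rank_terms(menus)
```

Studying the cards in the order of `ranked` puts the terms that appear on the most menus first.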

Montage of thumbnails of photographs or screenshots of Chinese restaurant menus.

To do this, I compiled a set of photos of Chinese restaurant menus from places like restaurant websites and Yelp. What I needed was to extract all of the Chinese text, but the images weren’t good enough for OCR. Different fonts, complex layouts, and low-quality phone photos of menus (with glare, perspective skew, etc.) made it impossible for OCR to process the text reliably. And even if OCR could get it 90% right, the transcription errors would pollute the data, potentially leaving nonsensical entries in the dataset.


Visualizing LUT data in Blender

This animated cube contains the data in a 3D LUT:

“Chemical 168.CUBE” from Rocketstock

A Look-Up Table, as the name describes, is simply a large table of numbers. Given an input color RᵢGᵢBᵢ, you go to the corresponding row in the table and find your new color RⱼGⱼBⱼ. Each dot in the video is an RGB value: its position is the input color (each axis goes from 0.0 to 1.0) and its color is the output color RⱼGⱼBⱼ.

The contents of “Chemical 168.CUBE”. It goes on for quite a few lines…

This simplified description glosses over a few details… the main one being that even in 8-bit depth, it’s not practical or useful to include all 256×256×256=16,777,216 possible table rows. If we try looking up a color that’s in between our data points, we need to interpolate between the nearest points. In fact, the cube animation above includes only a subset of the LUT data, and the LUT itself only has 32 points per color axis (about 33k rows).
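Here is a minimal sketch of that in-between lookup using trilinear interpolation, which is one common choice (real tools may use other schemes, such as tetrahedral interpolation). It assumes the table is already loaded into memory as a nested list indexed `lut[r][g][b]`; parsing a .CUBE file is left out, and the identity LUT is just a stand-in for real data.

```python
def make_identity_lut(n):
    """Build an n×n×n LUT (n >= 2) that maps every color to itself."""
    step = 1.0 / (n - 1)
    return [[[(r * step, g * step, b * step) for b in range(n)]
             for g in range(n)] for r in range(n)]

def lerp(a, b, t):
    """Linearly interpolate between two RGB tuples."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))

def lookup(lut, rgb):
    """Trilinearly interpolate the LUT at an input color in 0.0-1.0."""
    n = len(lut)
    idx, frac = [], []
    for channel in rgb:
        # Scale the clamped channel to grid coordinates.
        pos = min(max(channel, 0.0), 1.0) * (n - 1)
        i = min(int(pos), n - 2)  # lower corner; keep i + 1 in range
        idx.append(i)
        frac.append(pos - i)
    (r, g, b), (fr, fg, fb) = idx, frac
    # Blend the 8 surrounding points: along blue, then green, then red.
    c00 = lerp(lut[r][g][b], lut[r][g][b + 1], fb)
    c01 = lerp(lut[r][g + 1][b], lut[r][g + 1][b + 1], fb)
    c10 = lerp(lut[r + 1][g][b], lut[r + 1][g][b + 1], fb)
    c11 = lerp(lut[r + 1][g + 1][b], lut[r + 1][g + 1][b + 1], fb)
    c0 = lerp(c00, c01, fg)
    c1 = lerp(c10, c11, fg)
    return lerp(c0, c1, fr)

lut = make_identity_lut(5)
color = lookup(lut, (0.3, 0.6, 0.9))
```

With an identity LUT the output should match the input, which makes it a handy sanity check before swapping in real table data.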


HDR light probes for cheap

TL;DR: A Ricoh Theta S can be had for under $60 on eBay. Combined with pfstools, you can easily capture HDRI environment maps and merge them into OpenEXR images.

The Ricoh Theta S is Ricoh’s cheapest 360-degree camera that has automatic bracketing functionality. I bought mine for about $60 on eBay second-hand. The image quality certainly isn’t as good as the flagship Ricoh Theta Z1 ($1,050) but it does the job.

Honestly, I haven’t seen the camera in any color other than black or white.

The automatic bracketing is easy to set up in the app. When run, it takes each shot in sequence, with a few seconds of thinking in between.


A look back at the Argus C44 rangefinder camera

The Argus C44 was not a particularly fancy camera. First sold in 1956, it was the latest and greatest (and last) in a line of Argus rangefinders. Most people who know the name Argus know it for the Argus C3, a boxy “everyman’s” camera, but consider that just three years later, in 1959, Nikon released the famous Nikon F. Nobody would choose to find an Argus at a garage sale over a Nikon F. I was never given the choice.

Anyways, I’m writing this so that at least one person on the Internet will have said positive things about the Argus C44.

The Argus C44’s metal construction feels weighty in my hands. It comes with an attractive set of three lenses (tele, normal, and wide) in individual leather cases. It includes a small viewfinder attachment that sits in the flash shoe and simulates each lens’ field of view. It doesn’t have a light meter but does come with a faint whiff of cigarette smoke (or at least mine did, anyways).

Many people have complained about the infuriating lens mount, but that’s OK—just use the 50mm f/2.8 and keep the other lenses as display pieces. More than sixty years later, it holds up pretty well. Legend holds that it was one of the first camera lenses ever designed with the help of an electronic computer, University of Michigan’s MIDAC. Considering those were the days of vacuum tubes and magnetic-drum memory, that’s a pretty impressive selling point.

The focus ring on my camera is quite difficult to turn, probably due to corrosion or cigarette tar. And looking through the rangefinder only gives an approximate confirmation of focus. So the rough procedure for taking the above photo was:


Is Krita ready for HDR painting?

Right this minute, you can open up Krita and start a new document in linear ACEScg with either 16fp or 32fp encoding. And it works! You can open floating-point OpenEXR files or use the color picker to choose colors like RGB[3.5, 3.0, 1.5] where normally you would be limited to 0.0–1.0. You can paint in a sun with a value of 60.0 if you want, or erase the sun, or whatever! Mostly!

There are still plenty of parts in Krita that do not understand color values above 1.0. For example, I absolutely need to be able to adjust my view “exposure” when working in HDR. Krita has a LUT Management dockable which theoretically uses OCIO to let you choose a look, exposure, etc., but at least on my recent master build it doesn’t seem to do anything. Adding a Slope/Offset/Power filter layer works in a pinch, but unfortunately makes the color picker useless.

Exposure? Gamma? No?

Jef Raskin on Car Talk

Listening to episode #2326 of Car Talk, titled “Bill’s Chevy Maliboo-boo,” I noticed an interesting attribution for the Puzzler—it was “inspired by Jef Raskin” of Pacifica, California. The actual spelling is not given, but it seems reasonable that this is the Jef Raskin of Macintosh fame.

This Car Talk episode is a re-run (or some kind of cobbled-together edit) of an older episode, probably from the 1990s. Jef Raskin was probably fairly well known by the time the original episode aired, although I couldn’t find any mention of this connection between him and Car Talk on the Web. And I suppose it’s entirely possible that there was a Jeff Rasken also living in Pacifica at the time, but it seems rather unlikely.

Anyways, the puzzler is titled “Chrome Plated” and involves propeller airplanes. If you’re interested, the puzzler and the answer can be found on the Car Talk website.

See Procreate thumbnails on Linux

This is an old thing I made that I never wrote about here: a thumbnailer for Procreate files. Procreate is a popular iPad drawing and painting app that has its own native *.procreate file format. Most people probably never have to think about this since you can’t access files directly on an iPad and PSD is probably more commonly used for interchange, but it’s pretty straightforward to create thumbnails for Procreate files.

The Procreate files themselves are zip archives and contain a preview image. The thumbnailer just needs to extract the image. Instructions for installing it are in the README file in the linked repo.
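The extraction step can be sketched with Python's standard zipfile module. This is not the code from the repo; the member name `QuickLook/Thumbnail.png` is an assumption about where the preview lives inside the archive, so check it against a real file before relying on it. The demo builds a fake archive in memory rather than reading a real Procreate file.

```python
import io
import zipfile

# Assumed location of the preview image inside the archive.
THUMBNAIL_MEMBER = "QuickLook/Thumbnail.png"

def extract_thumbnail(procreate_file):
    """Return the raw bytes of the embedded preview image.

    `procreate_file` may be a path or a file-like object.
    Raises KeyError if the archive has no such member.
    """
    with zipfile.ZipFile(procreate_file) as archive:
        return archive.read(THUMBNAIL_MEMBER)

# Demo with a fake archive built in memory (not a real Procreate file).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(THUMBNAIL_MEMBER, b"fake png bytes")
buffer.seek(0)
thumbnail = extract_thumbnail(buffer)
```

A desktop thumbnailer would then scale those bytes to the requested size and write them wherever the file manager expects.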

MacSD SCSI adapter

My Macintosh SE had the original 20MB hard drive still full of software, and although it was painfully slow (too much bloat, maybe?) I didn’t want to lose anything on it. That meant I needed to get a new hard drive that I could play around with.

Since the Macintosh SE uses SCSI, I wanted to get a SCSI to flash memory card adapter, of which several exist. There is the SCSI2SD, the open-source BlueSCSI, and a newer one called MacSD.

As far as I can tell, MacSD only popped up online in late 2020. Maybe it was a pandemic project? So the number of people using it is relatively small, but there were a couple of key things that made me buy it over the others.

It features a lot of flexibility in how you manage your images. You can edit the macsd.ini file to define a “CD changer” with multiple CD images — just eject to go to the next CD. And you can load a bunch of assorted floppy images in various formats into a “composite” device.

The other nifty feature is MacSD Commander, which lets you copy files from the SD Card’s FAT32 partition directly into System 6/7. You just copy files straight onto the SD card from your modern Windows/Linux/Mac computer and then transfer them on your classic Mac.

I did run into some bugs but the creator was very responsive, worked with me to fix them, and even gave me some credit online 🙂

Screenshot from MacSD.com

MacSD is not an open-source product, but it might be the right one for your classic SCSI-equipped computer!

Should I use swap on Linux?

TL;DR: I turned off swap on my machine and it’s been fine.

I haven’t been able to find clear guidance on whether I should use swap or not on my machine, and if so, how much to set. There are some arguments in favor of using swap that I’ve not been able to fully understand, essentially saying that swap lets you have more useful pages in memory because less useful pages can be swapped out.

But what does that really mean? If I have 20 GB of RAM and 4 GB of swap on a hard drive, isn’t that worse than just adding another 4 GB of RAM and having no swap? And if I have 24 GB of RAM, are you going to tell me I still need swap?

My total RAM + swap will forever be finite, and at some point I will run out of memory. And if I’m to believe that having swap is “a method of increasing equality of reclamation”, what if I put my swap in RAM? That sounds silly, but if simply adding 4 GB more RAM is not better than having 4 GB more swap, then doesn’t that follow?

So where are the benchmarks? And how would you even benchmark something like this? My use case as a desktop user is certainly different from a constantly loaded server, but here are the very real downsides to me using swap:

  • Hard drive wear is real. Flash memory wears down, and spinning disks cannot rest if they are constantly asked for swap.
  • The out-of-memory (OOM) case is ridiculously painful. Without swap, the offending process (usually a runaway program) gets killed. With swap, my entire computer grinds to a halt and becomes unresponsive until the offending process has managed to fill up the swap as well. That could take 10 minutes or longer.

Turning off swap means that any process that uses too much memory is immediately killed. That sounds bad except it would have gotten killed even if I did have swap, just much later.

And after turning off swap, I haven’t noticed any performance hits at all. In fact, performance has been better because idle processes don’t get pushed into swap during times of high memory pressure, so I don’t need to incur the penalty of swapping them back in (the infamous alt-tab-is-slow-because-of-swap problem).

If you’re still using swap on your desktop machine out of superstition, try turning it off. Maybe you don’t need it.
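Before trying it, you may want to check whether swap is configured at all. A small sketch, assuming a Linux system where /proc/meminfo reports a `SwapTotal:` line in kB; the sample text below is made up for illustration.

```python
def swap_total_kib(meminfo_text):
    """Return the SwapTotal value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            return int(line.split()[1])
    return 0  # no SwapTotal line found

def read_swap_total_kib(path="/proc/meminfo"):
    """Read the live value; only works where /proc/meminfo exists."""
    with open(path) as f:
        return swap_total_kib(f.read())

# Made-up sample of what a swapless machine might report.
sample = (
    "MemTotal:       24576000 kB\n"
    "SwapTotal:             0 kB\n"
    "SwapFree:              0 kB\n"
)
has_swap = swap_total_kib(sample) > 0
```

If the live value is nonzero, running `swapoff -a` as root disables swap until the next reboot, which is an easy way to try things out without editing any configuration.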

Cleaning up narration audio in Audacity

The steps below have worked great for me for cleaning up narration audio I’ve recorded. After running through this process, the audio should be ready to be assembled into a video, podcast, etc.

When recording:

  • Maintain consistent distance between mouth and microphone
  • Don’t record too loud (past maximum) or too soft (too close to noise floor)
  • Leave 5 seconds of silence before start for noise reduction

After recording:

  • Noise Reduction using first five seconds of silence
  • Compressor
    • Threshold: -12 dB
    • Noise Floor: -25 dB
    • Ratio: 5:1
    • Attack Time: 0.10 secs
    • Release Time: 1.0 secs
    • enable “Compress based on Peaks”
  • De-Clicker: https://forum.audacityteam.org/viewtopic.php?t=79278
    • default parameters
    • optional – listen to audio first
  • Normalize
    • Normalize Peak Amplitude: -1 dB
    • Do separately for each section if levels inconsistent
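For the curious, the decibel values above map to linear amplitudes with a simple formula, and peak normalization is just a single gain applied to every sample. The sketch below shows the arithmetic only; it is not Audacity's implementation, and the sample values are made up.

```python
def db_to_linear(db):
    """Convert a level in dB (relative to full scale) to a linear amplitude."""
    return 10 ** (db / 20)

def normalize_peak(samples, target_db=-1.0):
    """Scale samples so the loudest one sits at target_db."""
    if not samples:
        return []
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    gain = db_to_linear(target_db) / peak
    return [s * gain for s in samples]

# Made-up samples in the range -1.0 to 1.0.
quiet = [0.0, 0.1, -0.25, 0.2]
normalized = normalize_peak(quiet)
new_peak = max(abs(s) for s in normalized)
```

Because the gain depends on the loudest sample in the selection, sections recorded at different levels end up with different gains, which is why normalizing each section separately helps when levels are inconsistent.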

These steps will fix noise and levels, giving a good baseline for then incorporating into a video or presentation.