Command-line snippets for processing scans
Sunday, 14 August 2011
I originally wrote this in 2009 as a cheatsheet for processing black-and-white scanned sheet music. It’s a collection of tips for converting and processing scanned images so that you end up with a nice, high-quality PDF suitable for archiving.
The programs convert and mogrify are part of the ImageMagick suite, pdfimages is part of the Xpdf suite (Poppler ships it too), and tiff2pdf and tiffcp come with libtiff.
- To extract all images from BestSongEver.pdf to bse-000.ppm, bse-001.ppm, bse-002.ppm, … (color images end with ppm, b/w images end with pbm)
$ pdfimages BestSongEver.pdf bse
- To convert every *.pbm image to TIFF using Group4 compression with a nominal resolution of 300 pixels/inch
$ mogrify -format tiff -compress Group4 -density 300 *.pbm
- To convert a series of TIFF images, in alphabetical order, into one PDF document
$ convert *.tiff MyNewDocument.pdf
- To convert a series of TIFF images into one PDF document, applying Group4 compression and a fixed resolution of 300 pixels/inch
$ convert *.tiff -compress Group4 -density 300 MyNewDocument.pdf
- To convert all TIFF images in-place to GRAYSCALE
$ mogrify -type Grayscale *.tiff
- To convert all TIFF images in-place to BLACK and WHITE
$ mogrify -type Bilevel *.tiff
- To convert a multi-page TIFF document into a multi-page PDF
$ tiff2pdf -o MyDocument.pdf MyDocument.tiff
- To concatenate a series of TIFF images into a single multi-page TIFF
$ tiffcp page*.tiff MyNewDocument.tiff
- To split a two-page scan original.tiff down the middle into page-0.tiff and page-1.tiff
$ convert original.tiff -crop 50%x100% +repage +adjoin page.tiff
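The individual snippets above can be chained into one pass. What follows is only a minimal sketch, assuming the source PDF contains nothing but b/w page images (so pdfimages writes *.pbm files) and that the chosen prefix does not collide with existing files. The names scans_to_pdf, run and DRY_RUN are made up for this illustration and are not part of ImageMagick, Xpdf or libtiff.

```shell
#!/bin/sh
# Sketch only: scans_to_pdf, run and DRY_RUN are hypothetical helper names.

# run CMD ARGS...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# scans_to_pdf INPUT.pdf PREFIX OUTPUT.pdf
scans_to_pdf() {
    input=$1
    prefix=$2
    output=$3

    # 1. Extract the embedded images as PREFIX-000.pbm, PREFIX-001.pbm, ...
    #    (assumes b/w pages; color pages would come out as *.ppm instead)
    run pdfimages "$input" "$prefix" || return 1

    # 2. Convert every extracted image to Group4-compressed TIFF at 300 pixels/inch.
    run mogrify -format tiff -compress Group4 -density 300 "$prefix"-*.pbm || return 1

    # 3. Assemble the TIFFs, in alphabetical order, into one PDF.
    run convert "$prefix"-*.tiff -compress Group4 -density 300 "$output" || return 1
}
```

With these functions loaded into the current shell, a call would look like
$ scans_to_pdf BestSongEver.pdf bse MyNewDocument.pdf
Setting DRY_RUN=1 beforehand prints the three commands without running them, which is handy for checking the prefix and file names first.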