When scanning past exam papers, older scanners often don't let you set the scan density, so the text comes out too light. I wrote a script for exactly that situation: you've scanned something and the text is too faint to read. All it does is binarize the PDF (every pixel becomes either pure black or pure white), but that alone works quite well.

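As preparation, install ImageMagick. The script also calls pdftoppm, which is not part of ImageMagick but of Poppler (the poppler-utils package on Debian and Ubuntu), so install that too. Both are available from most Linux package managers; building ImageMagick from source goes like this: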

wget ftp://ftp.imagemagick.org/pub/ImageMagick/ImageMagick.tar.gz
tar xvf ImageMagick.tar.gz
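cd ImageMagick-*/    # directory name depends on the version (ImageMagick-6.7.9-6 at the time of writing)
./configure
make
sudo make install
sudo ldconfig /usr/local/lib    # refresh the shared-library cache on Linux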
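Before writing the script, it can help to confirm that the three commands it calls (pdftoppm, mogrify and convert) are actually on your PATH. The following is a minimal Ruby sketch of that check; the helper name command_available? is made up for this example and is not part of the original script.

```ruby
#!/usr/bin/env ruby
# Hypothetical helper (not part of ScanDataConvert.rb): report whether each
# external command the script relies on can be found on the PATH.

def command_available?(name)
  ENV.fetch("PATH", "").split(File::PATH_SEPARATOR).any? do |dir|
    path = File.join(dir, name)
    File.file?(path) && File.executable?(path)
  end
end

if __FILE__ == $0
  %w[pdftoppm mogrify convert].each do |cmd|
    status = command_available?(cmd) ? "found" : "MISSING"
    puts "#{cmd}: #{status}"
  end
end
```

If any of them shows up as MISSING, install the corresponding package before going further.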

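Once installed, save the following code to a file (here called ScanDataConvert.rb) and make it executable with chmod +x ScanDataConvert.rb. depth = 63000 is the binarization threshold: in the default 16-bit build of ImageMagick, pixel values run from 0 (black) to 65535 (white), and every pixel brighter than the threshold becomes white while everything else becomes black. Raise it if the text is still too faint, and lower it if the page comes out too dark.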

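#!/usr/bin/env ruby
require 'shellwords'

#Text density (63000): binarization threshold, 0 (black) to 65535 (white) in a 16-bit ImageMagick build
depth = 63000

#Input and output PDFs are both required
if ARGV[0] == nil || ARGV[1] == nil
	abort "usage: #{$0} input.pdf output.pdf"
end

#Print a command, run it through the shell, and stop if it fails
def run(com)
	p com
	system(com) or abort("failed: #{com}")
end

#Workspace (mkdir without -p stops the script if this already exists, e.g. after a failed run, instead of mixing in old pages)
workspace = "/tmp/workspace/"

#Create workspace
run "mkdir #{workspace}"

#Split pdf file into one PPM image per page (aaa-<page>.ppm)
run "pdftoppm #{ARGV[0].shellescape} #{workspace}aaa"

#Convert each page to JPEG
run "mogrify -format jpg #{workspace}*.ppm"

#Process each file: pixels brighter than depth become white, the rest black.
#%03d zero-pads the page index so the pages stay in order in the next step.
run "convert #{workspace}aaa*.jpg -threshold #{depth} #{workspace}result-%03d.jpg"

#Combine each file to create a new file
run "convert #{workspace}result-*.jpg #{ARGV[1].shellescape}"

#Delete each file
run "rm -rf #{workspace}"

print "\n\ndone.\n"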
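A pitfall worth knowing about when combining the pages: if convert is given a plain output name such as result.jpg for several pages, it numbers them result-0.jpg, result-1.jpg and so on without zero padding, and a shell glob sorts names as text. For a document with more than ten pages, result-10.jpg then comes before result-2.jpg and the pages end up out of order. The short Ruby sketch below shows the difference and one way to sort by the embedded page number instead; the file names are only examples.

```ruby
# Example names as convert produces them for a 12-page document.
names = (0..11).map { |i| "result-#{i}.jpg" }.shuffle

# Plain string sort, which is what a shell glob gives you.
lexical = names.sort
# Sort by the number embedded in the name instead.
numeric = names.sort_by { |n| n[/\d+/].to_i }

puts lexical.first(4).inspect
# => ["result-0.jpg", "result-1.jpg", "result-10.jpg", "result-11.jpg"]
puts numeric.first(4).inspect
# => ["result-0.jpg", "result-1.jpg", "result-2.jpg", "result-3.jpg"]
```

Either sorting the files this way before passing them to the final convert, or zero-padding the number in the output name (for example result-%03d.jpg), keeps the pages in their original order.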

Then run the script, giving it the scanned PDF and the name of the output file:

./ScanDataConvert.rb before_processing.pdf after_processing.pdf
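If the result is still too light or too dark, it helps to know exactly what depth controls. Here is a small pure-Ruby sketch of a global threshold applied to single grayscale values, assuming the 0 to 65535 range of a 16-bit (Q16) ImageMagick build. It mirrors the documented behaviour of -threshold (values above the threshold become the maximum, everything else becomes 0), but it is only an illustration: the script itself leaves the real work to ImageMagick, and the helper names here are made up.

```ruby
QUANTUM_MAX = 65_535  # assumed: maximum channel value of a 16-bit (Q16) build

# Map one grayscale value to pure white or pure black, like -threshold does.
def binarize(value, depth)
  value > depth ? QUANTUM_MAX : 0
end

# Express an absolute threshold as a percentage of the full range.
def threshold_percent(depth)
  (depth * 100.0 / QUANTUM_MAX).round(1)
end

depth = 63_000
faint_text = 50_000    # light gray stroke: ends up black
paper      = 64_500    # near-white background: stays white

puts binarize(faint_text, depth)   # => 0
puts binarize(paper, depth)        # => 65535
puts threshold_percent(depth)      # => 96.1
```

Since 63000 is about 96% of the full range, only near-white paper survives as white and light gray strokes turn solid black. ImageMagick's -threshold also accepts a percentage (for example -threshold 96%), which gives roughly the same cut-off without depending on the build's bit depth.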