Fred's ImageMagick Scripts



    Licensing:

    Copyright © Fred Weinhaus

    My scripts are available free of charge for non-commercial use, ONLY.

    For use of my scripts in commercial (for-profit) environments or non-free applications, please contact me (Fred Weinhaus) for licensing arrangements. My email address is fmw at alink dot net.

    If you: 1) redistribute, 2) incorporate any of these scripts into other free applications or 3) reprogram them in another scripting language, then you must contact me for permission, especially if the result might be used in a commercial or for-profit environment.

    Usage, whether stated or not in the script, is restricted to the above licensing arrangements. It is also subject, in a subordinate manner, to the ImageMagick license, which can be found at: http://www.imagemagick.org/script/license.php

TEXTCLEANER


Processes a scanned document of text to clean the text background

last modified: August 27, 2013



USAGE: textcleaner [-r rotate] [-l layout] [-c cropoff] [-g] [-e enhance] [-f filtersize] [-o offset] [-u] [-t threshold] [-s sharpamt] [-S saturation] [-a adaptblur] [-T] [-p padamt] [-b bgcolor] infile outfile
USAGE: textcleaner [-help]

-r .... rotate .......... rotate image 90 degrees in direction specified if
......................... aspect ratio does not match layout; options are cw
......................... (or clockwise), ccw (or counterclockwise) and n
......................... (or none); default=none or no rotation
-l .... layout .......... desired layout; options are p (or portrait) or
......................... l (or landscape); default=portrait
-c .... cropoff ......... image cropping offsets after potential rotate 90;
......................... choices: one, two or four non-negative integer comma
......................... separated values; one value will crop all around;
......................... two values will crop at left/right,top/bottom;
......................... four values will crop left,top,right,bottom
-g ...................... convert document to grayscale before enhancing
-e .... enhance ......... enhance image brightness before cleaning;
......................... choices are: none, stretch or normalize;
......................... default=stretch
-f .... filtersize ...... size of filter used to clean background;
......................... integer>0; default=15
-o .... offset .......... offset of filter in percent used to reduce noise;
......................... integer>=0; default=5
-u ...................... unrotate image; cannot unrotate more than
......................... about 5 degrees
-t .... threshold ....... text smoothing threshold; 0<=threshold<=100;
......................... nominal value is about 50; default is no smoothing
-s .... sharpamt ........ sharpening amount in pixels; float>=0;
......................... nominal about 1; default=0
-S .... saturation ...... color saturation expressed as percent; integer>=0;
......................... only applicable if -g not set; 100 is no change;
......................... default=200 (double saturation)
-a .... adaptblur ....... alternate text smoothing using adaptive blur;
......................... floats>=0; default=0 (no smoothing)
-T ...................... trim background around outer part of image
-p .... padamt .......... border pad amount around outer part of image;
......................... integer>=0; default=0
-b .... bgcolor ......... desired color for background; default=white

PURPOSE: To process a scanned document of text to clean the text background.

DESCRIPTION: TEXTCLEANER processes a scanned document of text to clean the text background and enhance the text. The order of processing is:
1) optional 90 degree rotate if aspect does not match layout,
2) optional crop,
3) optional convert to grayscale,
4) optional enhance,
5) filter to clean background and remove noise,
6) optional unrotate (limited to about 5 degrees or less),
7) optional text smoothing,
8) optional sharpening,
9) optional saturation change (if -g is not specified),
10) optional alternate text smoothing via adaptive blur,
11) optional auto trim of border (effective only if background well-cleaned),
12) optional pad of border

ARGUMENTS:

-r rotate ... ROTATE image either clockwise or counterclockwise by 90 degrees, if image aspect ratio does not match the layout mode. Choices are: cw (or clockwise), ccw (or counterclockwise) and n (or none). The default is no rotation.

-l layout ... LAYOUT for determining if rotation is to be applied. The choices are p (or portrait) or l (or landscape). The image will be rotated if rotate is specified and the aspect ratio of the image does not match the layout chosen. The default is portrait.
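
The interaction between -r and -l can be sketched as a small shell function. This is only an illustration of the rule described above, not the script's own code: the function name rotation_angle is hypothetical, and treating a square image as matching either layout is an assumption.

```shell
# Decide whether a 90 degree rotation is needed, following the -r / -l rule.
# Assumption: "portrait" means height >= width and "landscape" means
# width >= height; the script's handling of square images may differ.
# Prints the rotation in degrees: 90 (cw), -90 (ccw) or 0 (none).
rotation_angle() {
    local width="$1" height="$2"
    local layout="${3:-portrait}" rotate="${4:-none}"
    local matches=no
    case "$layout" in
        p|portrait)  if [ "$height" -ge "$width" ]; then matches=yes; fi ;;
        l|landscape) if [ "$width" -ge "$height" ]; then matches=yes; fi ;;
        *) echo "invalid layout: $layout" >&2; return 1 ;;
    esac
    # Aspect ratio already matches the layout: never rotate.
    if [ "$matches" = yes ]; then
        printf '%s\n' 0
        return 0
    fi
    case "$rotate" in
        cw|clockwise)         printf '%s\n' 90 ;;
        ccw|counterclockwise) printf '%s\n' -90 ;;
        n|none)               printf '%s\n' 0 ;;
        *) echo "invalid rotate: $rotate" >&2; return 1 ;;
    esac
}

# A 1600x1200 (landscape) scan with portrait layout and -r cw:
rotation_angle 1600 1200 portrait cw    # prints 90
```

With the defaults (portrait layout, no rotation) the function always prints 0, which mirrors the documented behavior that nothing is rotated unless -r is given.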

-c cropoff ... CROPOFF is the set of image cropping offsets, applied after the potential 90 degree rotation. Choices are one, two or four non-negative integer values, separated by commas. One value will crop all around. Two values will crop at left/right,top/bottom. Four values will crop left,top,right,bottom.
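
The one/two/four value convention can be made concrete with a short sketch that expands a -c value into explicit left, top, right and bottom offsets. The function name expand_cropoff is hypothetical, and the script itself may parse the value differently.

```shell
# Expand a -c style value into explicit "left top right bottom" offsets.
# One value: same offset on all sides. Two values: left/right, top/bottom.
# Four values: left, top, right, bottom.
expand_cropoff() {
    local spec="$1"
    # Reject empty input, anything other than digits and commas,
    # and empty fields such as "10,,5".
    case "$spec" in
        ''|*[!0-9,]*|,*|*,|*,,*)
            echo "invalid crop offsets: $spec" >&2
            return 1 ;;
    esac
    # Split on commas into the positional parameters.
    local old_ifs="$IFS"
    IFS=','
    set -- $spec
    IFS="$old_ifs"
    case $# in
        1) printf '%s %s %s %s\n' "$1" "$1" "$1" "$1" ;;
        2) printf '%s %s %s %s\n' "$1" "$2" "$1" "$2" ;;
        4) printf '%s %s %s %s\n' "$1" "$2" "$3" "$4" ;;
        *) echo "expected one, two or four values" >&2; return 1 ;;
    esac
}

# The value used in Example 8 removes 50 pixels from the top only:
expand_cropoff 0,50,0,0    # prints: 0 50 0 0
```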

-g ... Convert the document to grayscale.

-e enhance ... ENHANCE brightness of image. The choices are: none, stretch, or normalize. The default=stretch.

-f filtersize ... FILTERSIZE is the size of the filter used to clean up the background. Values are integers>0. The filtersize needs to be larger than the thickness of the writing, but the smaller the better beyond this. Making it larger will increase the processing time and may lose text. The default is 15.

-o offset ... OFFSET is the offset threshold in percent used by the filter to eliminate noise. Values are integers>=0. Values too small will leave much noise and artifacts in the result. Values too large will remove too much text leaving gaps. The default is 5.

-t threshold ... THRESHOLD is the text smoothing threshold. Values are integers between 0 and 100. Smaller values smooth/thicken the text more. Larger values thin, but can result in gaps in the text. Nominal value is in the middle at about 50. The default is to disable smoothing.

-s sharpamt ... SHARPAMT is the amount of pixel sharpening to be applied to the resulting text. Values are floats>=0. If used, it should be small (suggested about 1). The default=0 (no sharpening).

-S saturation ... SATURATION is the desired color saturation of the text expressed as a percentage. Values are integers>=0. A value of 100 is no change. Larger values will make the text colors more saturated. The default=200 indicates double saturation. Not applicable when -g option specified.

-a adaptblur ... ADAPTBLUR applies an alternate text smoothing using an adaptive blur. The values are floats>=0. The default=0 indicates no blurring.

-u ... UNROTATE the image. This is limited to about 5 degrees or less.

-T ... TRIM the border around the image.

-p padamt ... PADAMT is the border pad amount in pixels. The default=0.

-b bgcolor ... BGCOLOR is the desired background color after it has been cleaned up. Any valid IM color may be used. The default is white.
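
When textcleaner is called from another script, the numeric ranges listed above can be checked before the call. The following is a minimal sketch under that assumption. The helper names (is_int, is_float, validate_option) are hypothetical, and the script's own parameter trapping is not shown on this page and may differ.

```shell
# Succeeds if $1 is a non-negative integer.
is_int() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    return 0
}

# Succeeds if $1 is a non-negative decimal number such as 1, 1.5 or .5.
is_float() {
    case "$1" in
        ''|.|*[!0-9.]*|*.*.*) return 1 ;;
    esac
    return 0
}

# Check one option value against the range given in ARGUMENTS.
# Returns 0 if valid, 1 if out of range, 2 for an unknown option name.
validate_option() {
    local name="$1" value="$2"
    case "$name" in
        filtersize)               is_int "$value" && [ "$value" -gt 0 ] ;;
        offset|saturation|padamt) is_int "$value" ;;
        threshold)                is_int "$value" && [ "$value" -le 100 ] ;;
        sharpamt|adaptblur)       is_float "$value" ;;
        *) return 2 ;;
    esac
}

if validate_option threshold 50; then
    echo "threshold 50 is in range"
fi
```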

CAVEAT: No guarantee that this script will work on all platforms, nor that trapping of inconsistent parameters is complete and foolproof. Use At Your Own Risk.


EXAMPLES


Example 1

 

Original

 

Arguments:
-g -e none -f 15 -o 20

 

Arguments:
-g -e stretch -f 15 -o 20

 

Arguments:
-g -e stretch -f 25 -o 20

 

Arguments:
-g -e stretch -f 25 -o 20 -s 1

 

Arguments:
-g -e stretch -f 25 -o 20 -u -s 1 -T -p 20

 



Example 2

 

Original

 

Arguments:
-g -e stretch -f 25 -o 10 -u -s 1 -T -p 10


Example 3

 

Original

 

Arguments:
-g -e stretch -f 25 -o 5 -s 1


Example 4

 

Original

 

Arguments:
-g -e stretch -f 25 -o 5 -s 1


Example 5

 

Original

 

Arguments:
-g -e stretch -f 15 -o 5 -s 1


Example 6

 

Original

 

Arguments:
-g -e stretch -f 25 -o 10 -s 1


Example 7

 

Original

 

Arguments:
-e normalize -f 15 -o 5 -S 200

 

Arguments:
-e normalize -f 15 -o 5 -S 200 -s 1

 

Arguments:
-e normalize -f 15 -o 5 -S 400


Example 8
(Provided by BrScan Tecnologia, Brazil)

 

Original

 

Arguments:
-g -e none -f 15 -o 10

 

Arguments:
-g -e normalize -f 15 -o 10

 

Arguments:
-g -e normalize -f 15 -o 10 -s 1

 

Arguments:
-c "0,50,0,0" -g -e normalize -f 15 -o 10 -u -s 2 -T -p 20
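
Any of the argument sets above can be applied to a whole directory of scans with a small wrapper. This is a sketch only: it assumes textcleaner is installed on the PATH and that the scans are PNG files, and the function name clean_scans and the DRY_RUN switch are hypothetical.

```shell
# Apply one set of textcleaner arguments to every PNG in a directory.
# Usage: clean_scans INDIR OUTDIR [textcleaner arguments...]
# With DRY_RUN=1 the commands are printed instead of executed.
clean_scans() {
    local indir="$1" outdir="$2"
    shift 2
    local infile outfile
    for infile in "$indir"/*.png; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$infile" ] || continue
        outfile="$outdir/$(basename "$infile")"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf 'textcleaner %s %s %s\n' "$*" "$infile" "$outfile"
        else
            mkdir -p "$outdir" || return 1
            textcleaner "$@" "$infile" "$outfile" || return 1
        fi
    done
}

# Preview the commands for Example 2's arguments without running them:
# DRY_RUN=1 clean_scans scans cleaned -g -e stretch -f 25 -o 10 -u -s 1 -T -p 10
```

Running it once with DRY_RUN=1 shows the exact commands before any output files are written.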


What the script does is as follows:

  • Optionally, crops the image
  • Optionally, converts to grayscale
  • Optionally, enhances the image using stretch or normalize
  • Creates a grayscale version of the enhanced image, applies a
    local area threshold using -lat and, optionally, some blurring
    for antialiasing to create a mask
  • Composites the mask with the corrected image to make the background white
  • Optionally unrotates, sharpens, changes the saturation, trims and pads

This is equivalent to the following IM commands for the case of grayscale, stretch,
unrotate and sharpen=1:

  • convert \( "$infile" -colorspace gray -type grayscale -contrast-stretch 0 \) \
    \( -clone 0 -colorspace gray -negate -lat ${filtersize}x${filtersize}+${offset}% -contrast-stretch 0 \) \
    -compose copy_opacity -composite -fill "$bgcolor" -opaque none +matte \
    -deskew 40% -sharpen 0x1 "$outfile"
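
For experimenting with filtersize and offset outside the script, the command above can be wrapped in a function. This sketch reuses the same ImageMagick options and fills in the documented defaults (filtersize=15, offset=5, bgcolor=white). It assumes an ImageMagick 6 style convert on the PATH, and the function name clean_gray_stretch and the PRINT_ONLY switch are hypothetical.

```shell
# The grayscale / stretch / unrotate / sharpen=1 case as a function.
# Usage: clean_gray_stretch INFILE OUTFILE [FILTERSIZE] [OFFSET] [BGCOLOR]
# With PRINT_ONLY=1 the command is printed instead of executed.
clean_gray_stretch() {
    local infile="$1" outfile="$2"
    local filtersize="${3:-15}" offset="${4:-5}" bgcolor="${5:-white}"
    # Build the command as positional parameters so every argument
    # stays a separate word, including the literal parentheses.
    set -- convert \
        '(' "$infile" -colorspace gray -type grayscale -contrast-stretch 0 ')' \
        '(' -clone 0 -colorspace gray -negate \
            -lat "${filtersize}x${filtersize}+${offset}%" -contrast-stretch 0 ')' \
        -compose copy_opacity -composite \
        -fill "$bgcolor" -opaque none +matte \
        -deskew 40% -sharpen 0x1 \
        "$outfile"
    if [ "${PRINT_ONLY:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Show the command for filtersize=25 and offset=10 without running it:
PRINT_ONLY=1 clean_gray_stretch scan.png clean.png 25 10
```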