Fred's ImageMagick Scripts: UNPERSPECTIVE

Licensing:

My scripts are available free of charge for non-commercial (non-profit) use, ONLY.

For use of my scripts in commercial (for-profit) environments or non-free applications, please contact me (Fred Weinhaus) for licensing arrangements. My email address is fmw at alink dot net.

If you: 1) redistribute, 2) incorporate any of these scripts into other free applications or 3) reprogram them in another scripting language, then you must contact me for permission, especially if the result might be used in a commercial or for-profit environment.

Usage, whether stated or not in the script, is restricted to the above licensing arrangements. It is also subject, in a subordinate manner, to the ImageMagick license, which can be found at: http://www.imagemagick.org/script/license.php

Please read the Pointers For Use on my home page to properly install and customize my scripts.

UNPERSPECTIVE

Automatically removes perspective distortion from an image.

last modified: November 22, 2023

USAGE: unperspective [-P prerotate] [-p procedure] [-C channel] [-c coords] [-b bgcolor] [-f fuzzval] [-F filter] [-A area] [-a aspect] [-w width] [-h height] [-d default] [-m method] [-t thresh] [-s smooth] [-S sharpen] [-B blur] [-r rotate] [-M] [-i images] [-k kind] [-ma maxaspect] [-ml minlength] [-mp maxpeaks] [-mr maxratio] [-T traps] [-V] infile outfile
USAGE: unperspective [-help]

-P ... prerotate .... prerotate image; choices are: autorotate (a), 90,
..................... 180, 270; default is no prerotate; autorotate only
..................... works if the image has auto-orient metadata
-p ... procedure .... background extraction procedure; choices are:
..................... floodfill (f), threshold (t), autothresh (a);
..................... autothresh requires my otsuthresh script;
..................... default=floodfill
-C ... channel ...... image channel to use for non-floodfill background
..................... extraction; choices are: gray, red, green, blue, cyan,
..................... magenta, yellow, black; default is no specific channel
-c ... coords ....... pixel coordinate to extract background color;
..................... may be expressed as gravity value
..................... (e.g. northwest) or as "x,y" value;
..................... default is 0,0
-b ... bgcolor ...... background color outside the distorted
..................... quadrilateral; any valid IM color;
..................... default determined by coords argument
-f ... fuzzval ...... fuzz value for isolating quadrilateral from
..................... background; 0<=float<=100; default=10
-F ... filter ....... morphology filter to smooth gaps and bumps on the
..................... mask boundary; integer>=0; default=0
-A ... area ......... area threshold for connected components filtering of
..................... mask image expressed as percent of image area;
..................... 0<integer<100; default is no connected components
..................... filtering; requires IM 6.9.2-8 or higher
-a ... aspect ....... desired width/height aspect ratio; float>0;
..................... default will be computed automatically
-w ... width ........ desired width of output; default determined
..................... automatically from "default" parameter below;
..................... only one of width or height may be specified
-h ... height ....... desired height of output; default determined
..................... automatically from "default" parameter below;
..................... only one of width or height may be specified
-d ... default ...... default output dimension; choices are: el
..................... (length of first edge of quadrilateral used
..................... as height), bh (quadrilateral bounding box
..................... height), bw (quadrilateral bounding box width),
..................... h (input image height), w (input image width);
..................... default=el
-m ... method ....... method of determining quadrilateral corners
..................... from peaks in depolar image; choices are:
..................... peak (p) or derivative (d); default=peak
-t ... thresh ....... threshold value for removing false peaks; integer>=0;
..................... default=4 for method=peak;
..................... default=10 for method=derivative
-s ... smooth ....... smoothing amount to remove false peaks; float>=0;
..................... default=1 for method=peak;
..................... default=5 for method=derivative
-S ... sharpen ...... sharpening amount to amplify true peaks; float>=0;
..................... default=5 for method=peak;
..................... default=0 for method=derivative
-B ... blur ......... blurring amount for preprocessing images of
..................... text with no quadrilateral outline; float>=0
-r ... rotate ....... desired rotation of output image; choices are:
..................... 90, 180 or 270; default is no rotation
-M .................. monitor and display textual information about
..................... processing to the terminal
-i ... images ....... keep ancillary processing images; choices are:
..................... view or save; default is neither
-k ... kind ......... kind of ancillary processing images; choices are:
..................... mask, polar, edge or all; default=mask
-ma ... maxaspect ... trap for maximum aspect ratio; integer>0;
..................... default=10
-ml ... minlength ... trap for minimum edge length; integer>0;
..................... default=10
-mp ... maxpeaks .... trap for maximum number of false peaks before
..................... filtering to remove false peaks; integer>0;
..................... default=40
-mr ... maxratio .... trap for maximum intermediate/input dimension ratio;
..................... integer>0; default=10
-T .... traps ....... turn off internal traps; choices are: maxaspect (ma),
..................... minlength (ml), maxpeaks (mp), maxratio (mr) or all (a)
-V .................. disable viewport crop of output

PURPOSE: To automatically remove perspective distortion from an image.

DESCRIPTION: UNPERSPECTIVE attempts to automatically remove perspective distortion from an image without the need to manually pick control points. This technique is limited and relies upon the ability to isolate the outline or boundary of the distorted quadrilateral in the input image from its surrounding background. It does not look for internal edges or other details to assess the distortion. It also works to correct affine distortions such as rotation and/or skew.

The basic principle is to isolate the quadrilateral of the distorted region from its background to form a binary mask. The mask is converted from Cartesian coordinates to polar coordinates and averaged down to one row. This row is then processed to find either the highest peaks or the highest second-derivative peaks. The four peaks identified are then converted back to Cartesian coordinates and used with the output dimensions determined from the user-specified (or computed) aspect ratio and user-specified dimension.
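
As a rough illustration of the polar idea (a sketch, not the script's actual code), the Python below samples the distance from the centroid of a convex quadrilateral to its boundary at each angle. This is approximately what averaging a polar-unwrapped mask down to one row measures. Local maxima of that profile are then taken as corners and converted back to Cartesian coordinates. All function names here are hypothetical, and the quadrilateral is given directly by its corners rather than extracted from a mask.

```python
import math

def ray_edge_distance(cx, cy, theta, p, q):
    """Distance along the ray from (cx, cy) at angle theta to segment p-q, or None."""
    dx, dy = math.cos(theta), math.sin(theta)
    ex, ey = q[0] - p[0], q[1] - p[1]
    denom = dx * ey - dy * ex
    if abs(denom) < 1e-12:
        return None  # ray is parallel to this edge
    wx, wy = p[0] - cx, p[1] - cy
    t = (wx * ey - wy * ex) / denom  # distance along the ray
    u = (wx * dy - wy * dx) / denom  # position along the edge, 0..1
    if t > 0 and -1e-9 <= u <= 1 + 1e-9:
        return t
    return None

def radial_profile(corners, samples=360):
    """Return (center, row), where row[k] is the boundary distance at angle k."""
    n = len(corners)
    cx = sum(x for x, _ in corners) / n
    cy = sum(y for _, y in corners) / n
    row = []
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        hits = [ray_edge_distance(cx, cy, theta, corners[i], corners[(i + 1) % n])
                for i in range(n)]
        hits = [h for h in hits if h is not None]
        row.append(max(hits) if hits else 0.0)
    return (cx, cy), row

def circular_peaks(row):
    """Indices of local maxima, treating the row as wrapping around."""
    n = len(row)
    return [i for i in range(n)
            if row[i] > row[i - 1] and row[i] >= row[(i + 1) % n]]

def polar_to_cartesian(center, index, radius, samples=360):
    """Convert a peak (angle index, radius) back to an x,y coordinate."""
    theta = 2 * math.pi * index / samples
    return (center[0] + radius * math.cos(theta),
            center[1] + radius * math.sin(theta))
```

For a clean quadrilateral the profile has exactly four maxima, one per corner. A noisy mask adds false maxima, which is why the script exposes thresh, smooth and sharpen arguments.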

The processing is done using +distort perspective and thus a larger intermediate image is computed to encompass the undistorted result. Potential errors in finding the peaks may cause excessively large intermediate images to be generated, taking a long time and much disk space. Therefore, several traps have been specified, though one may disable them if desired.

Also to avoid erroneous situations, there is an option to view or save some ancillary images. These include the quadrilateral mask, a graph of the 1D polar domain image or its second derivative, the outline of the edges drawn between the four corners found from the peaks, or all of them. If this mode is enabled, then processing will pause at a prompt on the terminal after each image is displayed (not saved) so that one may decide to continue or quit so as to avoid erroneous situations.

The most important one is the mask. If the mask shows that the fuzzy floodfill to remove the outside background color is either insufficient to isolate the quadrilateral cleanly or has incursions into the quadrilateral, then the processing will fail. Note that the peak method (default) is more robust to such flaws than the derivative method. It is also more robust to evenly rounded corners on the quadrilateral region. However, the peak method does not work well for aspect ratios greater than about five, whereas the derivative method is more robust in this regard and can work with aspect ratios of at least 10.

It is highly recommended that one ensures that the quadrilateral is cleanly separated from the background. This can be done either by enabling the mask view in the script, so you can quit if there is insufficient separation, or by doing your own fuzzy floodfill prior to running the script. See http://www.imagemagick.org/Usage/draw/#matte

One may also enable monitoring of the processing steps to get textual information about the progress and also numerical information about the peaks, corner coordinates found, aspect ratio and sizes and ratios of the intermediate images. This is helpful when the result is not satisfactory or has quit from the traps so that one may refine some of the arguments.

ARGUMENTS:

-P prerotate ... PREROTATE the input image. The choices are: autorotate (a), 90, 180, 270. The default is no prerotate. Autorotate only works if the image has auto-orient metadata.

-p procedure ... background extraction PROCEDURE. The choices are: floodfill (f), threshold (t), autothresh (a). NOTE: -p autothresh requires my otsuthresh script. The default=floodfill.

-C channel ... the image CHANNEL to use for non-floodfill background extraction. The choices are: gray, red, green, blue, cyan, magenta, yellow, black. The default is no specific channel. This can be useful if one can select a channel that will have the highest contrast between the foreground and background color. Typically that will be the complement of the background color's closest primary or secondary color. For example, if the background color is yellowish, then choose the blue channel.
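
The channel rule of thumb above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, the background color is given as 8-bit RGB, and "closest" is taken to mean smallest squared RGB distance, which may not match how one would judge it by eye.

```python
def suggest_channel(bg_rgb):
    """Return the complement of the primary/secondary color nearest to bg_rgb."""
    named = {
        "red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255),
        "cyan": (0, 255, 255), "magenta": (255, 0, 255), "yellow": (255, 255, 0),
    }
    complement = {
        "red": "cyan", "green": "magenta", "blue": "yellow",
        "cyan": "red", "magenta": "green", "yellow": "blue",
    }
    # Nearest named color by squared Euclidean distance in RGB (an assumption).
    nearest = min(
        named,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(bg_rgb, named[name])),
    )
    return complement[nearest]
```

For a yellowish background such as (230, 220, 90) this suggests the blue channel, matching the example in the text. For near-gray backgrounds no channel is clearly better and the result is not meaningful.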

-c coords ... COORDS is any location within the input image for the algorithm to find the background color. It may be specified in terms of gravity parameters (NorthWest, North, NorthEast, East, SouthEast, South, SouthWest or West) or as a pixel coordinate "x,y". The default is the upper left corner = NorthWest = "0,0".
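
A minimal sketch of how a coords value might be turned into a pixel location, assuming that gravity names map to the image corners and edge midpoints. The function names and the exact midpoint convention are assumptions for illustration, not taken from the script.

```python
def gravity_to_xy(gravity, width, height):
    """Map a gravity name to a pixel coordinate inside a width x height image."""
    right, bottom = width - 1, height - 1
    mid_x, mid_y = right // 2, bottom // 2
    table = {
        "northwest": (0, 0), "north": (mid_x, 0), "northeast": (right, 0),
        "east": (right, mid_y), "southeast": (right, bottom),
        "south": (mid_x, bottom), "southwest": (0, bottom), "west": (0, mid_y),
    }
    return table[gravity.lower()]

def parse_coords(coords, width, height):
    """Accept either an "x,y" string or a gravity name."""
    if "," in coords:
        x, y = (int(v) for v in coords.split(","))
        return (x, y)
    return gravity_to_xy(coords, width, height)
```

With a 100x50 image, "NorthWest" and "0,0" both give the upper left pixel, which is the documented default.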

-b bgcolor ... BGCOLOR is the background color outside the distorted quadrilateral. Any valid IM color is allowed. The background color is used to do a fuzzy trim of the image to the bounding box around the quadrilateral. The default color is determined by the above "coords" argument.

-f fuzzval ... FUZZVAL is the fuzz amount specified as a float percent 0 to 100 (without the % sign). It is used 1) for trimming the image to the bounding box about the quadrilateral, 2) for floodfilling the background to convert the quadrilateral into a binary mask and 3) for trimming the output image. The default=10. Use a value that will produce a mask that cleanly corresponds to the distorted quadrilateral area of the image. Note that method=peak is fairly robust to minor imperfections in the mask, but method=derivative is not.
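
To show why the fuzz value matters for the mask, here is a simplified Python model of a fuzzy floodfill on a grayscale grid. It is a sketch under loud assumptions: the tolerance is taken as a percentage of 255 and compared against the seed pixel, which is a stand-in for, not a copy of, ImageMagick's color-distance computation. The function name is hypothetical.

```python
from collections import deque

def fuzzy_floodfill_mask(gray, seed, fuzz_percent):
    """Return a mask with 0 for flooded background and 1 for everything else."""
    height, width = len(gray), len(gray[0])
    tolerance = 255 * fuzz_percent / 100.0  # simplified fuzz model (assumption)
    sx, sy = seed
    target = gray[sy][sx]
    mask = [[1] * width for _ in range(height)]
    mask[sy][sx] = 0
    queue = deque([(sx, sy)])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height and mask[ny][nx] == 1:
                if abs(gray[ny][nx] - target) <= tolerance:
                    mask[ny][nx] = 0  # close enough to the background color
                    queue.append((nx, ny))
    return mask
```

If the quadrilateral contains a dark pixel touching the background, a small fuzz value leaves it in the mask, while a larger one lets the fill leak into the quadrilateral. That leak is the kind of incursion the mask view is meant to reveal.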

-F filter ... FILTER is a morphology filter to smooth gaps and bumps on the mask boundary. Values are integers>=0. The default=0.

-A area ... AREA threshold for connected components filtering of the internally generated mask image expressed as percent of image area. Values are in the range of 0<integers<100. The default is no connected components filtering. If a value is provided, then this option requires IM 6.9.2-8 or higher.
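
The idea of area-based connected components filtering can be sketched as follows. This is an assumption-laden illustration: it uses 4-connectivity, only removes small foreground regions, and simply drops them, whereas ImageMagick's own connected components processing may merge regions differently. The function name is hypothetical.

```python
from collections import deque

def filter_small_components(mask, area_percent):
    """Keep only foreground (1) regions covering at least area_percent of the image."""
    height, width = len(mask), len(mask[0])
    min_area = width * height * area_percent / 100.0
    seen = [[False] * width for _ in range(height)]
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if mask[y][x] != 1 or seen[y][x]:
                continue
            # Collect one 4-connected region with a breadth-first search.
            component = []
            seen[y][x] = True
            queue = deque([(x, y)])
            while queue:
                px, py = queue.popleft()
                component.append((px, py))
                for nx, ny in ((px + 1, py), (px - 1, py), (px, py + 1), (px, py - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(component) >= min_area:
                for px, py in component:
                    result[py][px] = 1
    return result
```

Isolated specks in the mask would otherwise produce extra peaks in the polar row, so removing them before the polar conversion can help on noisy backgrounds.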

-a aspect ... ASPECT is the desired output width/height aspect ratio. Values are floats>0. The default will be computed automatically.

-w width ... WIDTH is the desired width of the output. The default is determined automatically from the "default" parameter below. Only one of width or height may be specified. Note: the output size will only be close to the value specified.

-h height ... HEIGHT is the desired height of the output. The default is determined automatically from the "default" parameter below. Only one of width or height may be specified. Note: the output size will only be close to the value specified.

-d default ... DEFAULT is the default output dimension. Choices are: el (length of first edge of quadrilateral used as height), bh (quadrilateral bounding box height), bw (quadrilateral bounding box width), h (input image height), or w (input image width). The default is el.
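
The interplay of aspect, width, height and the default dimension can be sketched as below. This is a hypothetical helper, not the script's code. It assumes the "first edge" runs from the first to the second corner in the list, and that the remaining dimension always follows from the width/height aspect ratio.

```python
import math

def output_size(corners, aspect, default="el", image_size=None,
                width=None, height=None):
    """Return (width, height) of the output for a given width/height aspect."""
    if width is not None and height is not None:
        raise ValueError("only one of width or height may be specified")
    if width is not None:
        return (width, round(width / aspect))
    if height is not None:
        return (round(height * aspect), height)
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    if default == "el":
        # Assumption: the "first edge" runs from corners[0] to corners[1].
        (x0, y0), (x1, y1) = corners[0], corners[1]
        out_h = math.hypot(x1 - x0, y1 - y0)
    elif default == "bh":
        out_h = max(ys) - min(ys)             # bounding box height
    elif default == "bw":
        out_h = (max(xs) - min(xs)) / aspect  # bounding box width fixes the width
    elif default == "h":
        out_h = image_size[1]                 # input image height
    elif default == "w":
        out_h = image_size[0] / aspect        # input image width fixes the width
    else:
        raise ValueError("unknown default dimension: " + default)
    return (round(out_h * aspect), round(out_h))
```

As the notes on width and height say, the real output will only be close to such computed values, since the result is trimmed after the distortion.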

-m method ... METHOD is the method of determining the quadrilateral corners from peaks in the polar image. Choices are: peak (p) or derivative (d or deriv). The default=peak.

-t thresh ... THRESH is the threshold value for removing false peaks. Values are integers>=0. The default=4 for method=peak and the default=10 for method=derivative. Higher values remove more peaks and if too high can remove true corner peaks.

-s smooth ... SMOOTH is the smoothing amount used to help remove false peaks. Values are floats>=0. This is a filtering step applied to the 1D polar image. The default=1 for method=peak and the default=5 for method=derivative. Larger values will remove more peaks and if too high can remove true corner peaks.

-S sharpen ... SHARPEN is the sharpening amount used to amplify true peaks. Values are floats>=0. This is a filtering step applied after the smoothing to the 1D polar images. The default=5 for method=peak and the default=0 for method=derivative. Larger values will amplify the peaks. This is necessary with method=peak, because obtuse angles in the quadrilateral show up as plateaus rather than peaks. The sharpening raises them enough to be turned into peaks. Sharpening also helps locate the true peaks more accurately when they are rounded and not sharp.
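
The effect of smoothing followed by sharpening on a 1D row can be illustrated with the sketch below. It swaps in a circular box average and an unsharp-mask style sharpen, with integer radii, as simple stand-ins. The script's own filters and the meaning of its float amounts are not reproduced here, and the function names are hypothetical.

```python
def circular_smooth(row, radius):
    """Box-average each sample with its neighbors, wrapping around the ends."""
    n = len(row)
    if radius <= 0:
        return list(row)
    window = 2 * radius + 1
    return [
        sum(row[(i + k) % n] for k in range(-radius, radius + 1)) / window
        for i in range(n)
    ]

def circular_sharpen(row, radius, amount):
    """Unsharp mask: add back 'amount' times the difference from a blurred copy."""
    blurred = circular_smooth(row, radius)
    return [v + amount * (v - b) for v, b in zip(row, blurred)]
```

Applied to a row with a one-sample spike and a broader bump, smoothing flattens the spike far more than the bump, and sharpening the smoothed row then raises the bump again relative to its surroundings.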

-B blur ... BLUR is the blurring amount for preprocessing images of text with no quadrilateral outline. Values are floats>=0. This option is only useful for text images with no quadrilateral outline when the text is evenly distributed, especially to the four corners of the quadrilateral. The option only works, if at all, with method=peak due to its more robust nature.

-r rotate ... ROTATE is the desired rotation of output image. Choices are: 90, 180 or 270. The default is no rotation. The technique used here can correct rotation when the top left corner of the distorted quadrilateral is in the top left or top right quadrant of the input image and the bottom left corner is in the bottom left quadrant of the input image. Otherwise, the technique cannot automatically determine 90 degree increment rotations.

-M ... ENABLE MONITORING and display textual information about processing to the terminal. Such information is useful for determining progress as the non-IM bash scripting is extensive and slow. It is also useful for determining changes to the parameters when the process fails to create a good image.

-i images ... IMAGES permits the viewing or saving of the ancillary images generated during the processing. The choices are: view or save. The default is neither. This option is important because it allows one to view the extracted quadrilateral (mask) to make sure it is well-isolated from the background. Other images may also be useful for determining why the process failed to create a good image so as to permit changing of the arguments. When images=view, processing will pause while one views the image and can be restarted or quit from a prompt at the terminal. This is important as a bad mask image can lead to bad corner points being extracted, which can lead to huge or strange output images with big memory or disk requirements.

-k kind ... KIND is the kind of ancillary images that can be viewed or saved. The choices are: mask (m), polar (p), edge (e) or all (a). Mask is the quadrilateral isolated from the background and binarized by the fuzzy floodfill process. Polar is the conversion of the mask from cartesian to polar coordinates, averaged down to one row and presented as a graph image. Edge is an image showing the lines connecting the extracted corner coordinates from the mask image. The default is mask.

-ma maxaspect ... MAXASPECT is a trap on the maximum aspect ratio to permit, especially from the automatic aspect ratio computation. Values are integers>0. The default=10.

-ml minlength ... MINLENGTH is a trap on the minimum edge length to permit from the lengths between connected corners of the extracted quadrilateral. Values are integers>0. The default=10.

-mp maxpeaks ... MAXPEAKS is a trap on the maximum number of peaks to permit before filtering on false peaks. If too many peaks are located, then processing will take a long time to filter and may end up with erroneous peaks. This trap permits the user to change the thresh and smooth arguments to reduce the number of peaks. Values are integers>0. The default=40.

-mr maxratio ... MAXRATIO is a trap on the maximum ratio between the intermediate and input image's dimensions. Due to the use of +distort perspective to unwarp the image, a very large image may be produced before it is trimmed to its bounding box for output. This trap tries to prevent such situations as this may cause long processing times and need large memory and/or disk resources. Values are integers>0. The default=10.
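
The four traps described above amount to simple sanity checks. The sketch below shows one way they could be expressed, using the documented default thresholds. The function name, the way each quantity is measured (for example, testing the aspect ratio in both orientations and taking the larger of the width and height ratios) and the return format are assumptions for illustration only.

```python
import math

def check_traps(corners, aspect, peak_count, intermediate_size, input_size,
                maxaspect=10, minlength=10, maxpeaks=40, maxratio=10,
                disabled=()):
    """Return the names of the traps that would stop processing."""
    n = len(corners)
    edges = [
        math.hypot(corners[(i + 1) % n][0] - corners[i][0],
                   corners[(i + 1) % n][1] - corners[i][1])
        for i in range(n)
    ]
    tripped = []
    if max(aspect, 1.0 / aspect) > maxaspect:
        tripped.append("maxaspect")
    if min(edges) < minlength:
        tripped.append("minlength")
    if peak_count > maxpeaks:
        tripped.append("maxpeaks")
    ratio = max(intermediate_size[0] / input_size[0],
                intermediate_size[1] / input_size[1])
    if ratio > maxratio:
        tripped.append("maxratio")
    if "all" in disabled:
        return []
    return [name for name in tripped if name not in disabled]
```

A quadrilateral with a 3 pixel edge and an intermediate image 12.5 times wider than the input would trip both minlength and maxratio, which usually points to badly located corner peaks rather than a real need for a huge output.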

-T traps ... TURNS OFF (disables) the internal TRAPS that are used to stop processing if their thresholds have been exceeded. The traps are used to avoid situations where the result may be erroneous and produce huge output images. The choices are: maxaspect (ma), minlength (ml), maxpeaks (mp), maxratio (mr) or all (a). By default all traps are on.

-V ... disables the viewport crop of the output image and allows +distort to compute a larger output image before doing a fuzzy trim. When disabled, processing time will be longer. Disabling the viewport crop is useful when the resulting image is cropped too much, such as can occur when the quadrilateral has rounded corners.

REFERENCES:
http://stackoverflow.com/questions/3790445/1d-multiple-peak-detection
http://www.sagenb.org/home/pub/704/
http://research.microsoft.com/users/zhang/Papers/WhiteboardRectification.pdf
http://research.microsoft.com/en-us/um/people/zhang/papers/tr03-39.pdf

REQUIREMENTS: Method=peak requires IM 6.4.2.6 or higher due to the use of -distort depolar. Method=derivative requires IM 6.5.9.2 due to the use of -morphology. NetPBM is needed for both methods.

CAVEAT: No guarantee that this script will work on all platforms, nor that trapping of inconsistent parameters is complete and foolproof. Use At Your Own Risk.

EXAMPLES

NOTE: Some slight differences may occur with the examples not using -V. This may occur because the original version of the script used the equivalent of -V all the time. The current version runs faster without the -V, but may crop slightly differently.

Example 1
Pre-Distorted Image	Distorted Image

Arguments: -f 20

Example 2 -- Rotation
Pre-Distorted Image	Distorted Image

Arguments: -f 20	Arguments: -f 20 -r 270

Example 3 -- Fuzz Robustness
Pre-Distorted Image	Distorted Image

Arguments: -f 20	Mask

Arguments: -f 52 -t 8 -s 2	Mask

Example 4 -- Variation In Default Dimension
Pre-Distorted Image	Distorted Image

Arguments: -f 20 -d el	Arguments: -f 20 -d bh	Arguments: -f 20 -d h

Example 5 -- Rounded Corners
Pre-Distorted Image	Distorted Image

Arguments: -f 20 -s 2 -V

Example 6
Pre-Distorted Image	Distorted Image

Arguments: -f 10 (default)

Example 7
Pre-Distorted Image (http://www.entheosweb.com/photoshop/glass_effects.asp)	Distorted Image

Arguments: -f 10 (default)

Example 8 -- Fuzz Variation And Width
Original Image (http://www.beechman.co.uk/receipt_paper/111mm_wide_receipt.jpg)

Arguments: -f 7 -V	Mask (noisy)

Arguments: -f 20 -V	Mask

Arguments: -f 20 -w 500 -V

Example 9 -- Noisy Background
Original Image (http://www.forkparty.com/wp-content/uploads/2011/02/Funny-Receipt-Fail.jpg)

Arguments: -f 50	Mask (noisy)

Arguments: -f 50 -w 200

Example 10 -- No Formal Quadrilateral Boundary
Pre-Distorted Image	Distorted Image

Arguments: -f 10 -t 10 -s 2 -S 0 -B 3 -V	Mask

What the script does is as follows:

Applies a fuzzy trim to the image
Applies a fuzzy floodfill to the trimmed image to create a binary mask
Converts the binary mask from Cartesian to Polar Coordinates using -distort depolar
Averages the polar image down to one row
Finds the peaks in the row image
Filters the peaks to get the 4 primary peaks
Converts the 4 primary peaks from Polar back to Cartesian coordinates
Determines the aspect ratio from the 4 Cartesian coordinates
Uses the aspect ratio and the first edge of the mask to determine the output dimensions
Uses -distort perspective to warp the image to remove the perspective distortion
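
The final step maps the four detected corners onto the corners of the output rectangle. A perspective distortion of this kind is defined by four pairs of control points, and the underlying math can be sketched as below. This is a generic Python illustration of fitting a perspective transform from four point pairs, not ImageMagick's implementation. The function names and the sample coordinates are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve a square linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def perspective_coefficients(src, dst):
    """Fit the 8 coefficients a..h that map each src (x, y) to its dst (u, v):
    u = (a*x + b*y + c) / (g*x + h*y + 1)
    v = (d*x + e*y + f) / (g*x + h*y + 1)
    """
    matrix, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        matrix.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        matrix.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    return solve_linear(matrix, rhs)

def apply_perspective(coeffs, x, y):
    """Map one point through the fitted perspective transform."""
    a, b, c, d, e, f, g, h = coeffs
    w = g * x + h * y + 1
    return ((a * x + b * y + c) / w, (d * x + e * y + f) / w)
```

For example, fitting the corners (12,8), (190,30), (175,140), (5,120) to a 200x100 rectangle gives coefficients that send each detected corner to the matching rectangle corner. Errors in the detected corners translate directly into a wrong transform, which is why the mask and peak checks earlier in the list matter.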