Tests Of Perceptual Hash (PHASH) Compare Metric

(Author: Fred Weinhaus)

As of version 6.8.8.3, ImageMagick has a new compare metric: PHASH. The syntax is as follows:

compare -metric phash image1 image2 diffimage

or

compare -metric phash -subimage-search image1 image2 diffimage

(Note: in version 6.8.8.2, the phash metric was incorrect)

The phash metric was revised in IM 6.8.8.5, but the values are only slightly different.

This metric is different from all the rest. The two images do not have to be the same size when not using the -subimage-search setting. If the sizes differ, then the order is not important.

When doing subimage searches, the order is important and the larger image must be listed first. However, using this metric in a subimage search is exceedingly slow and works only in limited circumstances. An example will be presented below.

For color images, this metric seems to be insensitive to a variety of mild to moderate attacks (variations in processing). These attacks include: resize, rotation (with black background), translation (with black background), brightness, contrast, gamma, blur, JPEG compression, noise, mirror/transpose, watermarking, arc distortion, barrel distortion, pincushion distortion and shear distortion.

The phash metric for color images is a modified version of one presented in http://www.naturalspublishing.com/files/published/54515x71g3omq1.pdf and is based upon the concept of Image Moments.

The main difference is that the ImageMagick version uses the RGB and HCLp channels rather than the HSI and YCbCr channels described in the reference. For grayscale images, only the single gray channel is used. Only the first seven Hu moments listed in the Wikipedia reference were used.

The basis of this approach is that the Hu moments are scale, rotation and translation invariant for each of the RGB channels. However, they do not seem to be as robust to some of the other image attacks. The addition of the HCLp channels slightly decreases the insensitivity to scale, rotation, translation and compression, but increases the insensitivity to other attacks, such as brightness, contrast and gamma.
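The invariance claim can be checked on a toy image. The following is a minimal sketch, not the ImageMagick code: it computes the seven Hu invariants from normalized central moments using the standard textbook formulas, for a single channel stored as a nested list of graylevels in the range 0 to 1. The function name hu_moments and the helper close are made up for this illustration.

```python
import math


def hu_moments(img):
    """Return the 7 Hu invariants of one channel (2-D list, values in 0..1)."""
    h, w = len(img), len(img[0])
    m00 = sum(sum(row) for row in img)
    if m00 == 0:
        # A black channel has no mass, so report all moments as zero.
        return [0.0] * 7
    # Centroid of the graylevel "mass".
    cx = sum(x * img[y][x] for y in range(h) for x in range(w)) / m00
    cy = sum(y * img[y][x] for y in range(h) for x in range(w)) / m00

    def eta(p, q):
        # Central moment mu_pq, normalized for scale invariance.
        mu = sum((x - cx) ** p * (y - cy) ** q * img[y][x]
                 for y in range(h) for x in range(w))
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return [
        n20 + n02,                                              # I1
        (n20 - n02) ** 2 + 4 * n11 ** 2,                        # I2
        c ** 2 + d ** 2,                                        # I3
        a ** 2 + b ** 2,                                        # I4
        c * a * (a * a - 3 * b * b) + d * b * (3 * a * a - b * b),  # I5
        (n20 - n02) * (a * a - b * b) + 4 * n11 * a * b,        # I6
        d * a * (a * a - 3 * b * b) - c * b * (3 * a * a - b * b),  # I7
    ]


def close(u, v):
    """True when two moment lists agree to within rounding error."""
    return all(math.isclose(p, q, rel_tol=1e-9, abs_tol=1e-12)
               for p, q in zip(u, v))


img = [[0.1, 0.9, 0.3, 0.0],
       [0.5, 0.2, 0.8, 0.4],
       [0.0, 0.7, 0.6, 1.0]]
rotated_180 = [row[::-1] for row in img[::-1]]
flopped = [row[::-1] for row in img]

print(close(hu_moments(img), hu_moments(rotated_180)))
print(close([abs(m) for m in hu_moments(img)],
            [abs(m) for m in hu_moments(flopped)]))
```

In this sketch a 180 degree rotation leaves all seven values unchanged, while a mirror (flop) only changes the sign of I7, which is why the absolute value is relevant for mirror-like attacks. Padding the image with black pixels moves the centroid but not the central moments, which is consistent with the black-background condition stated for rotation and translation.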

The hash is created by taking the log of the absolute value of each of the first 7 floating point Hu image moments from each channel, using graylevels normalized to the range 0 to 1 (as of IM 6.8.8.5). The log is used to mitigate the large variation in the exponents of the moments. The absolute value is used to make the moments insensitive to mirror-like attacks.

For color images, there are 42 floating point values (7 moments for each of the 6 RGB and HCLp channels). For grayscale images, there are only 7 floating point values. Therefore, the hash lengths differ for color and grayscale images, and a color image cannot be compared directly to a grayscale image. The phash metric computes the sum of the squared differences of all 42 or 7 log(abs(moment)) values between the two images. One can review the Hu moments with the following command:

identify -verbose -moments yourimage

CAUTION: A constant color image channel will have zero values for Hu image moments I2-I7. For a black image, all moments will be zero. Since log(0) is negative infinity, the moments are clamped to a minimum of 1e-11 (as of IM 6.8.8.5).
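The hash and the metric described above can be sketched in a few lines. This is an illustration only, not the ImageMagick source. It assumes a base-10 log (the text says only "log") and assumes the moments are already available as plain lists of floats, for example copied from the identify output. The names phash_vector and phash_distance are made up for this sketch.

```python
import math

EPS = 1e-11  # minimum moment magnitude, per the CAUTION note above


def phash_vector(moments):
    """Log of the absolute value of each moment, clamped to avoid log(0).

    Assumption: base-10 log; the text does not name the base.
    """
    return [math.log10(max(abs(m), EPS)) for m in moments]


def phash_distance(moments_a, moments_b):
    """Sum of squared differences between two hashes of equal length."""
    if len(moments_a) != len(moments_b):
        # 42 values (color) cannot be compared with 7 values (grayscale).
        raise ValueError("hash lengths differ: %d vs %d"
                         % (len(moments_a), len(moments_b)))
    ha, hb = phash_vector(moments_a), phash_vector(moments_b)
    return sum((a - b) ** 2 for a, b in zip(ha, hb))


# A sign flip (as produced by a mirror) does not change the distance.
print(phash_distance([2e-4, -3e-7], [2e-4, 3e-7]))
# A black channel (all moments zero) still yields a finite hash.
print(phash_vector([0.0] * 7))
```

Because the distance is a sum of squared differences, it is symmetric in its two arguments, which matches the statement that image order does not matter outside of subimage searches.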

Tests were done using 510x510 center cropped versions of 8 (512x512) original images from the USC SIPI open source image library. The cropping was done to remove 1px border artifacts in the 512x512 images.

Color results and graphs and the subimage search example were done using IM 6.8.8.5. The grayscale results and graphs were generated using IM 6.8.8.3.

This is the list of all image attack variations shown in the graphs

NOTE: False positives (attack scores larger than the lowest image vs image score) may happen for other images or for attacks beyond the range presented. Also false negatives may occur for other images in relation to the threshold range of this limited test set.



Color Results

Original Color Images
airplane.png barbara.png boats.png lena.png
mandril.png peppers.png tiffany.png zelda.png
watermark image -- sphinx2.gif


Test Results

Threshold At 21

Five False Positives Out Of 704 Tests

No false positives, if

  • barrel limited to ≤2
  • color noise limited to ≤1.5
  • compression limited to ≥25
  • shear limited to ≤3
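The false positive count can be reproduced from the per-attack notes in this section. The sketch below assumes that each note of the form "5,78.6" is a pair of attack parameter and phash score; that reading is an interpretation, not something the notes state explicitly. The variable and function names are made up for this illustration.

```python
THRESHOLD = 21.0
TOTAL_TESTS = 704

# (image, attack, parameter, score), read from the per-attack notes.
# Assumption: "5,78.6" means parameter 5 gave a score of 78.6.
REPORTED = [
    ("airplane.png", "compression", 5, 78.6),
    ("airplane.png", "compression", 10, 58.6),
    ("peppers.png", "color noise", 2, 29.5),
    ("zelda.png", "barrel", 3, 36.4),
    ("lena.png", "shear", 4, 22.9),
]

# Limits from the list above: compression is a lower bound, the rest upper.
LIMITS = {
    "barrel": lambda p: p <= 2,
    "color noise": lambda p: p <= 1.5,
    "compression": lambda p: p >= 25,
    "shear": lambda p: p <= 3,
}


def false_positives(results, threshold=THRESHOLD):
    """Attacked images whose score exceeds the threshold."""
    return [r for r in results if r[3] > threshold]


def within_limits(results):
    """Keep only results whose attack parameter satisfies the limits."""
    return [r for r in results if LIMITS.get(r[1], lambda p: True)(r[2])]


print(len(false_positives(REPORTED)), "of", TOTAL_TESTS)
print(len(false_positives(within_limits(REPORTED))))
```

All five reported cases fall outside the listed limits, so none of them remains once the limits are applied.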

Results Table



Image vs Image
(minimum non-self match at 21.9)


Brightness Variations


Contrast Variations


Gamma Variations


Blur Variations


JPEG Compression Variations
(two false positives for airplane.png at 5,78.6 and 10,58.6)


Color Noise Variations
(one false positive for peppers.png at 2,29.5)


Grayscale Noise Variations


Scaling (Resize) Variations


Rotation (Black Background) Variations
(does not work as well with white background)


Mirror -- Flip, Flop, Transpose, Transverse


Translate (Black Background) Variations
Image Placed in North, West and Northwest Corners of 1024x1024 Black Background
(does not work as well with white background)


Watermark


Arc


Barrel
(one false positive for zelda.png at 3,36.4)


Pincushion


Shear
(one false positive for lena.png at 4,22.9)


Grayscale Results

Original Grayscale Images
airplane_gray.png barbara_gray.png boats_gray.png lena_gray.png
mandril_gray.png peppers_gray.png tiffany_gray.png zelda_gray.png
watermark image -- sphinx2.gif


Test Results

Threshold At 3.7

Many More False Positives Than In Color Tests For The Same Ranges

Only resize, rotate, translate, mirror, blur and compress
are as good or better than in the color tests.
The rest are considerably worse.

Results Table



Image vs Image
(minimum non-self match at 3.74)


Brightness Variations


Contrast Variations


Gamma Variations


Blur Variations


JPEG Compression Variations
(two false positives for airplane.png at 5,78.6 and 10,58.6)


Grayscale Noise Variations


Scaling (Resize) Variations


Rotation (Black Background) Variations
(does not work as well with white background)


Mirror -- Flip, Flop, Transpose, Transverse


Translate (Black Background) Variations
Image Placed in North, West and Northwest Corners of 1024x1024 Black Background
(does not work as well with white background)


Watermark


Arc
(one false positive for lena.png at 3,22.5)


Barrel


Pincushion


Shear


Subimage Search For Rotated Copies

Large Image
original apple located at 150,150
180 deg rotated apple located at 30,30
Small Image
original unrotated apple
Match Score Image
(negated so that best match is white)
Match Score Image
(-linear-stretch 95x0% applied)
The resulting metric value is 0 at 150,150 (a perfect match),
but closer inspection shows another perfect match, also 0, at 30,30.
The search took about 15 minutes on an Intel Mac Mini
(2.66 GHz dual core), using only one thread.
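One likely reason for the long run time is the number of window positions. The sketch below assumes that the search evaluates the metric at every offset where the small image fits inside the large one, and that the moments are recomputed for each window. The image sizes in the example are hypothetical, since the sizes of the apple images are not given here, and the function name window_count is made up.

```python
def window_count(large_w, large_h, small_w, small_h):
    """Number of offsets at which the small image fits in the large one."""
    if small_w > large_w or small_h > large_h:
        raise ValueError("the larger image must be listed first")
    return (large_w - small_w + 1) * (large_h - small_h + 1)


# Hypothetical sizes: a 64x64 template searched in a 256x256 image.
print(window_count(256, 256, 64, 64))  # 193 * 193 = 37249 windows
```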