What is Mutual Information?
When you consider the pixel values of images A and B to be random variables "a" and "b", and estimate the entropy of their distributions, you get

H(a) = - Sum p(a) * log2( p(a) )
Note that log2 is the logarithm in base two, not the natural logarithm that, unfortunately, is commonly used.
When you use log2(), the units of entropy are "bits". It is again unfortunate that the usage of "bit" as a unit of information measurement has been distorted into a symbol for binary encoding, or a unit of measurement for the raw capacity of memory storage (along with its derivatives: the byte, kilobyte, megabyte...).
A digital image whose pixels are encoded with M bits can have 2^M different grayscale values in a pixel, and therefore its entropy can go up to the theoretical maximum of log2(2^M) which, not coincidentally, is equal to M.
In other words, if you compute the entropy of an image with PixelType unsigned char, whose pixels have grayscale values following a uniform distribution, the maximum value that you can get is "8", and if you want to be formal, you should mention the units and say:
The Entropy of this image is: "8 bits"
In practice, of course, you get lower values. For example, the entropy of the well-known Lena image (the cropped version that is politically correct) is
Lena Entropy = 7.44 bits
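The entropy estimate described above can be sketched with a plain histogram computation. This is a minimal Python sketch (plain Python, not ITK code), using synthetic pixel data as a stand-in for a real image such as Lena:

```python
import math

def entropy_bits(pixels, bins=256, max_value=256):
    """Estimate the entropy (in bits) of a pixel-value distribution
    by building a histogram with the given number of bins."""
    histogram = [0] * bins
    for p in pixels:
        histogram[p * bins // max_value] += 1
    n = len(pixels)
    h = 0.0
    for count in histogram:
        if count > 0:
            prob = count / n
            h -= prob * math.log2(prob)  # log base two -> units are bits
    return h

# A uniformly distributed 8-bit image reaches the theoretical maximum:
uniform = list(range(256)) * 64   # every gray level equally likely
print(entropy_bits(uniform))      # -> 8.0

# A constant image carries no information:
flat = [128] * (256 * 64)
print(entropy_bits(flat))         # -> 0.0
```

The uniform case reproduces the log2(2^M) = M bound discussed above for M = 8.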
Now if you consider the mutual information measure, you have the following situation:
MI(A,B) = H(A) + H(B) - H(A,B)
In general, both H(A) and H(B) are bounded in [0, M], where "M" is the number of bits used for encoding their pixels.

H(A,B) is, in theory, bounded in [0, 2M], since the joint histogram can have up to 2^M x 2^M bins.
Note that if you use histograms with *fewer* bins than the actual digital encoding of the image, then your estimate of the entropy is bounded by log2 of the number of bins in your histogram.
For example, if you use a histogram with 20 bins instead of 256 in order to estimate Lena's entropy, you will not get 7.44 bits but a lower value, which reflects the fact that by quantizing the grayscale values into larger bins you lose information from the original image.
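The effect of coarser binning can be observed directly. This Python sketch (again using a synthetic non-uniform image as a stand-in for Lena) estimates the same distribution's entropy with 256 bins and with 20 bins:

```python
import math

def entropy_bits(pixels, bins, max_value=256):
    # Histogram-based entropy estimate, in bits (log base two).
    histogram = [0] * bins
    for p in pixels:
        histogram[p * bins // max_value] += 1
    n = len(pixels)
    return -sum(c / n * math.log2(c / n) for c in histogram if c > 0)

# Synthetic 8-bit image with a non-uniform gray-level distribution:
pixels = [(i * i) % 256 for i in range(10000)]

h_256 = entropy_bits(pixels, bins=256)
h_20 = entropy_bits(pixels, bins=20)
print(h_256, h_20)  # the 20-bin estimate is lower
```

Merging gray levels into wider bins can only decrease (or preserve) the estimated entropy, never increase it.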
For the particular case of self-similarity, the entropies H(A) and H(B) are expected to be pretty much the same. Their differences arise only from interpolation errors and from the possible effect of one image having corners outside of the extent of the other (e.g. if the image is rotated).
So, in general Mutual Information will give you
MI(A,T(A)) = H(A) + H(T(A)) - H(A,T(A))
where T(A) is the transformed version of A, e.g. under a translation, rotation, or affine transform.
If T = Identity and the interpolator is not approximating, your measure of Mutual Information becomes
MI(A,A) = 2 H(A) - H(A,A)
and the joint entropy H(A,A) happens to be equal to the entropy of the single image H(A), therefore the expected value of Mutual Information is equal to the image Entropy (of course measured in bits).
MI(A,A) = H(A) bits
That means that if you evaluate the Mutual Information measure between Lena and itself, you should get

MI(Lena, Lena) = H(Lena) = 7.44 bits
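The identity MI(A,A) = H(A) can be checked numerically with a joint-histogram computation. This is a minimal Python sketch (plain Python, not the ITK implementation), assuming identical 8-bit inputs and no interpolation:

```python
import math

def entropy_bits(counts, n):
    # Entropy in bits from histogram counts, with n total samples.
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)

def mutual_information_bits(a, b, bins=256):
    """MI(A,B) = H(A) + H(B) - H(A,B), estimated from histograms."""
    n = len(a)
    hist_a = [0] * bins
    hist_b = [0] * bins
    joint = {}  # sparse joint histogram over (a, b) pairs
    for pa, pb in zip(a, b):
        hist_a[pa] += 1
        hist_b[pb] += 1
        joint[(pa, pb)] = joint.get((pa, pb), 0) + 1
    return (entropy_bits(hist_a, n) + entropy_bits(hist_b, n)
            - entropy_bits(joint.values(), n))

image = [(7 * i) % 256 for i in range(10000)]  # synthetic 8-bit image

h = entropy_bits([image.count(v) for v in set(image)], len(image))
mi = mutual_information_bits(image, image)
print(abs(mi - h) < 1e-9)  # MI(A,A) equals H(A)
```

With identical inputs the joint histogram collapses onto the diagonal, so H(A,A) = H(A) and the formula reduces to 2 H(A) - H(A) = H(A), as stated above.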
Note that the Mutual Information measure is reported as a negative number in ITK because traditionally it has been used as a cost function for minimization.
However, in principle, Mutual Information should be reported in the range [0, H(A)], where zero corresponds to two totally uncorrelated images and H(A) corresponds to perfectly correlated images, in which case H(A) = H(B).
To summarize, note that ITK is not using log2() but just log(), and note that the measure is reported as a negative number.
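Because ITK uses the natural logarithm, its values are in nats rather than bits; a value in nats converts to bits by dividing by ln(2). A one-line sketch of the conversion:

```python
import math

def nats_to_bits(h_nats):
    # Entropy/MI computed with log() is in nats; divide by ln(2) for bits.
    return h_nats / math.log(2)

# e.g. an entropy of 8 bits corresponds to 8 * ln(2) ~ 5.545 nats:
print(nats_to_bits(8 * math.log(2)))  # -> 8.0
```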
We just added a simple example for computing the Entropy of the pixel value distribution of an image to the directory:
You may find it interesting to play with this example, e.g. by trying the effect of changing the number of bins in the histogram.