Table of Content

Home

Fourier transform for image manipulation

1. Quick introduction: Fourier transform

The key point of this post is to review image manipulation, including efficient image blurring and images registration, through using Fourier transform . Recall that discrete Fourier aims to transfer a 2D image from spatial domain (pixel space), i.e. $I(x,y)$ into frequency domain, i.e. $F(u,v)$, where each point ($u,v$) in $F$ represents a particular frequency. For a given pair of frequencies $(u,v), where $$u\in\{1,\dots,N\},v\in\{1,\dots,M\}$ with $N, M$ is size of image $I$, the Fourier transform of $I(x,y)$ is obtained as follows:

$$F(u,v) = \sum_{x=0}^{N-1}\sum_{y=0}^{M-1}I(x,y)\exp(-j2\pi(\frac{ux}{N}+\frac{vy}{M})).$$ Interestingly, $F(0,0)$, which shows zero frequency in the image, is the summation of image's pixels intensity. Note that, for an image of size $NxM$, its representation in frequency domain has the same size.

The image in frequency domain are complex number, consisting of two parts: a real part (denoted as $R(u,v)$) and an imaginary part ($I(u,v)$), i.e. $F(u,v) = R(u,v) + I(u,v)j$. As it is shown in Figure 1, the complex number of $F(u,v)$ can be represented in polar coordinate system, leading to:

$$F(u,v) = \underbrace{|F(u,v)|}_{\text{Magnitude}}\exp(j \underbrace{\phi(u,v)}_{\text{phase}}),$$ where $|F(u,v)|=\sqrt{R(u,v)^2+I(u,v)^2}$ and $\phi(u,v) = \arctan(I(u,v)/R(u,v))$ are called magnitude (spectrum) and phase at a given frequency $u,v$, respectively.
Figure 1: representing a complex number ($x+iy$) in polar coordinate $re^{j\theta}$.

2. Image blurring

A blurred image can be achieved by obscuring high frequencies of a given image in the frequency domain. To filter out high frequencies, different variants of low pass filter exist, such as Gaussian low pass (GLP) and ideal filters, among others. For example, the 1st image at 2nd row in figure 2 is a Gaussian low pass filter with band-width 10. Filtering high frequencies is equivalent to removing edges in the spatial domains, which can be achieved by convolving the image with a spatial kernel filter (e.g. mean filter). However, regarding computational complexity, image blurring in domain frequency can simply be done by one simple multiplication, instead of convolving it in spatial domain.

2.1. Code snippet for image blurring

Here, we define a GLP in frequency domain with a specified band-width, then multiply it by an image, represented in its frequency domain, to obtain a smooth (blurred) image. Note an image in frequency domain is zero-centered so that its zero frequency is placed at center of image (using fftshift function from fft package of numpy).
import numpy as np
from numpy import 
def Gaussian_filter(shape, kernel_width, Low=True ):

    X , Y = np.meshgrid(range(shape[1]),range(shape[0]))
    # center of image
    Cx , Cy =  shape[1]/2, shape[0]/2
    # Gaussian filter with given kernel_width
    fltr = np.exp(-((X-Cx)**2+(Y-Cy)**2)/(2*(kernel_width)**2))

    # high pass filter is simply 1-fltr
    if not Low:
        fltr = 1-fltr

    return fltr

orig_frq = np.fft.fftshift(np.fft.fft2(I)) #computing 2D fft of the given image using numpy, then zero-center shifting it.
GLP_10 = Gaussian_filter(I.shape, 10) #create a GLP with band-width=10
filter_img_frq=orig_frq* GLP_10  #multiply it with image, represented in its frequency domain

filter_img_spt=np.fft.ifft2(np.fft.ifftshift(filter_img_frq)) # back the filtered image to spatial domain by inverse of 2D fft

Results

In figure 2, the 1st row shows the input image, the filtered image, their difference (removed pixel). 2nd row represents a Gaussian low pass filter (band-width=10) and log of amplitude ($log |F|$) of the above images in frequency domain. Similarly, the 3rd row exhibits the corresponding images by their phase in frequency domain. The last column represents densities of pixel intensity (1st row), amplitude (2nd row) and phase (third row) of the original image and the filtered one.

Notice that while the pixel densities and the magnitude densities of the filtered and original images are changed due to removing high frequencies (equivalently removing edges), their phase densities are equal. This can show filtering image using such as low-pass filter (kernel band-width=10) do not change their pahses, but only their magnitude. But the interesting question is this is the case for all GLPs with large band-width?

Figure 2: Image blurring; using a low pass filter (kernel width= 10). 1st row: original image, filtered one, and their difference; 2nd row: amplitude of the images, 3rd row: phase of the images. Note that intensity of phases of the images (original image and the blurred one) are identical (at 3rd column of 3rd row).

In figure 3, each row is exhibiting the original image (1st column), its filtered image (2nd column) using a Gaussian low pass (GLP) filter (with a given kernel width), difference between the original image and its filtered one (3rd column). and (4th column) pixel density of images in the 1st and 2nd columns.

Notice that the GLP filter with larger (smaller) kernel width leads to less (higher) blurriness, meaning it allows to pass more (less) higher frequencies.

band-width= 1
band-width= 5
band-width= 10
Figure 3: Gaussian LP (GLP) filters with various kernel bandwidths leads to different level of blurriness.

3. Image registration

Image registration tends to align two images through affine transformation including translation, rotation, sheer, and etc. Here, we see how to register two translated images using Fourier transform, which is simple and efficient. Consider two images $I$ and $I_1$ that the latter is the translated version of the former, i.e. $I_1(x,y) = I(x-x_0, y-y_0)$. Phase only correlation can be used to obtain the precise values for $(x_0, y_0)$. Specifically, by representing the images in their frequency domain, we have $F_1(u,v) = e^{-j(ux_0+vy_0)} F(u,v)$. As it can be seen, magnitude of the images are equal, while their phases are different. Thus, their phase difference can be used to estimate the amount of translation. But how exactly? regarding cross-correlation between $F$ and $F_2$, we have: $$C(u,v)=\frac{F(u,v)F^{*}_{1}(u,v)}{|F(u,v)||F^{*}_{1}(u,v)|}= e^{-j2\pi(\frac{u}{N}x_0+\frac{v}{M}y_0)}$$ Interestingly, inverse of Fourier transform of $C(u,v)$ leads to delta function $c(x,y) = \delta(x-x_0, y-y_0)$, which has only 1 at entry $(x_0,y_0)$. Therefore, the values of translation (displacement) is $\arg\max c(x,y)$. Therefore, regarding the result of corss-correlation in spatial domain, we know that it has a single peak.

Computing the difference in phase of two images, then transfer their difference (in phase) back to the spatial domain. Using argmax, we can find the $(x, y)$ value of displacement.

3.1. Python code snippet

the following code can be used to find the amount of displacement, then translate the first image accordingly.


    import numpy as np
    import cv2
    def trans_value(I1, I2):

    #im1: convert to rgb2gray
    if np.ndim(I1)==3:
        I1 = I1[:,:,0]

    #im2: convert to rgb2gray
    if np.ndim(I2)==3:
        I2 = I2[:,:, 0]

    #compute fft of im1
    I1_f = (fft.fft2(I1))
    #compute phase of fft(im1)
    phase_1= np.angle(I1_f)

    #the above operations repeated for im2
    I2_f = (fft.fft2(I2))
    phase_2 = np.angle(I2_f)

    #compute difference in the phase of images as follows
    diff_phase = np.exp(1j*(phase_1-phase_2))

    #inverse of fft, then shift it
    poc = fft.fftshift(fft.ifft2(diff_phase))


    #finally find the peak in poc
    #index of maximum value in matrix poc shows translation values (ty,tx)
    ty,tx  = np.unravel_index(poc.argmax(), poc.shape)

    # origin is now at upper left corner
    ty = (-ty+int(I1.shape[0]//2))
    tx = (-tx+int(I1.shape[1]//2))

    # define translation matrix
    T = np.float32([[1, 0, tx], [0, 1, ty]])

    img_translation = cv2.warpAffine(I1, T, (I1.shape[1], I1.shape[0]))

    return tx,ty, poc, img_translation

Results

Two images are given, one is shifted version of the other. Using POC (phase only correlation), we can find how much the first is
Figure 4: finding the amount of translation between left and middle images using POC method. Then applying the translation matrix to the left image to translate it (resulting to right image).