Autofocus using image phase congruency

Yibin Tian

doi:10.1364/OE.19.000261

1. Introduction

In a simple optical imaging system (Fig. 1), the image is focused when the object distance, image distance and lens focal length satisfy the lens equation

\frac{1}{p} + \frac{1}{q} = \frac{1}{f}

where p and q are the object and image distances respectively, and f is the focal length of the thin lens. In Fig. 1, when the image sensor is at location A, the object O is focused on the sensor as image I. This perfect alignment of the object, lens and image sensor is achieved by focusing, a process in which the object, the lens, the image sensor, or some combination of the three is adjusted.
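As a small worked illustration of the lens equation (not part of the original study), the sketch below solves it for the image distance q. The function name and the numerical values are hypothetical and chosen only for the example.

```python
def image_distance(p: float, f: float) -> float:
    """Solve 1/p + 1/q = 1/f for the image distance q (same units as p and f)."""
    if p == f:
        raise ValueError("object at the focal point: the image forms at infinity")
    return p * f / (p - f)


# Hypothetical example: a 50 mm thin lens and an object 2000 mm away.
q = image_distance(p=2000.0, f=50.0)
print(f"focused image distance: {q:.3f} mm")
```

Placing the sensor at any other distance than this q leaves a blur circle on the sensor, which is the situation labeled B in Fig. 1.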

Fig. 1 Thin-lens imaging system model. O, I and C represent the object, image and lens center respectively. The object distance, image distance, lens focal length and lens diameter are labeled p, q, f and d respectively. The image is focused when the image sensor is positioned at A, but blurred when the image sensor is positioned at B (the diameter of the blur circle is b).


Numerous optical imaging systems, from microscopes and industrial inspection tools to consumer cameras, provide autofocus. As most of these imaging systems are now digital, passive autofocus techniques have mostly replaced active ones to reduce cost and improve robustness. Passive autofocus relies on analyzing defocus (optical blur) in one or more intermediate images to determine the right direction of focusing adjustment [1]. As such, a robust focus measure is a critical element of a passive autofocus system. In this report, defocus may refer either to the focusing status of the lens or to the resulting optical blur in the images; the intended meaning should be clear from context. In addition, optical blur in this report refers only to blur caused by defocus, excluding non-defocus aberrations.

In most applications, it is impossible to obtain absolute values of optical blur for images because the ground truth is unknown. Thus all practical focus measures rely on some trial-and-error mechanism to obtain the best focused image [1,2]. Defocus acts as a low-pass filter: for the same target, a less defocused image contains more high spatial frequencies than its more defocused counterparts. Therefore functions sensitive to the frequency composition of an image can be focus measure candidates. The fundamental requirements for a focus measure are unimodality and monotonicity, which guarantee that the focus measure has only one extreme value and is monotonic on each side of its peak or valley [2]. For practical purposes, a good focus measure also needs to satisfy a few additional criteria, such as low sensitivity to noise, high sensitivity to defocus, wide effective range, robustness and reasonable computational efficiency [2]. Focus measures are used not only in passive autofocus but also in a number of other optical imaging and machine vision applications, such as depth from focus or defocus [3–8], image fusion [9,10], and shape from focus or defocus [11,12].
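The two fundamental requirements can be checked numerically on a sampled focus measure curve. The following is a minimal sketch under the assumption that the curve has a peak (a curve with a valley can be negated first); the function name and the sample values are hypothetical.

```python
from typing import Sequence


def is_unimodal_peak(values: Sequence[float]) -> bool:
    """True if the samples rise strictly to one maximum and then fall strictly."""
    peak = max(range(len(values)), key=values.__getitem__)
    rising = all(a < b for a, b in zip(values[:peak], values[1:peak + 1]))
    falling = all(a > b for a, b in zip(values[peak:], values[peak + 1:]))
    return rising and falling


# Hypothetical focus curves sampled at five lens positions.
good_curve = [0.2, 0.5, 1.0, 0.6, 0.3]   # single peak, monotonic on both sides
bad_curve = [0.2, 0.6, 0.4, 1.0, 0.5]    # a false local peak before the true one

print(is_unimodal_peak(good_curve), is_unimodal_peak(bad_curve))
```

A hill-climbing autofocus search can stop at the false peak of the second curve, which is why a curve that fails this test is unsuitable as a focus measure.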

Many focus measures have been proposed and studied in the literature [2,9,10,13]. Some of these focus measures, such as Image Variance and Energy of Image Gradient, offer a good balance of sensitivity to defocus, effective range and computational efficiency [2,13]. However, these focus measures are generally based on image intensity, and thus are not robust under variable imaging conditions, such as partial occlusion and varying illumination. Recently, Aydin and Akgul proposed a focus measure insensitive to occlusion for depth from focus [14]. In this report we demonstrate a focus measure using image phase congruency that is robust for noisy imaging sensors under varying illumination. The proposed focus measure also offers a good balance of defocus sensitivity, noise sensitivity and effective range. Its advantages are shown with a number of synthetic image sequences. For simplicity, we only consider its application in autofocus, but it can be readily adapted to the other application areas mentioned above.

2. Method

2.1 Focus measurement using phase congruency

The phases of the Fourier components of images contain important information about image features, and phase congruency was proposed as a feature perception and detection mechanism [15–17]. A local phase coherence model has also been proposed as a blur perception mechanism in human vision; the rationale of the model is that blur disrupts local phase structure [18]. Phase congruency is closely related to local energy [17]. The energy of salient local image features, such as edges, junctions or other textures, is reduced by the blurring effects of defocus, which translates into a reduction of phase congruency at the locations of these features. In other words, a blurred image has less total phase congruency than its focused counterpart. Thus phase congruency can be utilized as an indicator of defocus. In one dimension, phase congruency is defined as

\mathrm{PC}(x) = \max_{ϕ(x) \in [0, 2π]} \frac{\sum_{n} A_{n} \cos [ϕ_{n}(x) - ϕ(x)]}{\sum_{n} A_{n}} .

where $A_{n}$ and $ϕ_{n}(x)$ are the amplitude and local phase at location x of the nth Fourier component, and $ϕ(x)$ is the amplitude weighted mean of the local phases of all Fourier components at location x.
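The maximization in the one-dimensional definition has a closed form: the sum of $A_{n} \cos [ϕ_{n}(x) - ϕ(x)]$ is largest when $ϕ(x)$ is the angle of the amplitude weighted phasor sum, and the maximum equals the magnitude of that sum. The sketch below uses this identity; the function name and the square-wave example are illustrative assumptions, not taken from the original study.

```python
import cmath
import math


def phase_congruency_1d(amplitudes, phases):
    """PC = |sum_n A_n exp(i*phi_n)| / sum_n A_n for one location x."""
    total = sum(amplitudes)
    if total == 0:
        return 0.0
    resultant = sum(a * cmath.exp(1j * p) for a, p in zip(amplitudes, phases))
    return abs(resultant) / total


# Truncated square wave: sum over odd k of (1/k) * sin(k*x).
# Since sin(k*x) = cos(k*x - pi/2), the local phase of component k is k*x - pi/2.
odd = [1, 3, 5, 7, 9]
amps = [1.0 / k for k in odd]
pc_at_step = phase_congruency_1d(amps, [k * 0.0 - math.pi / 2 for k in odd])
pc_off_step = phase_congruency_1d(amps, [k * 0.7 - math.pi / 2 for k in odd])
print(pc_at_step, pc_off_step)
```

At the step (x = 0) all components share the same local phase, so phase congruency reaches its maximum of 1; away from the step the phases disagree and the value drops.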

As it is difficult to compute phase congruency as defined in Eq. (1), Kovesi proposed a computational definition of phase congruency [19] using the logarithmic Gabor wavelets widely adopted in the human vision and perception community [20]. In a 2-dimensional image, the phase congruency of a pixel at location $(x, y)$ is computed as

\mathrm{PC}(x, y) = \frac{\sum_{i = 1}^{m} \sum_{j = 1}^{n} W_{i}(x, y) \, f_{p} [ A_{ji}(x, y) \, Δϕ_{ji}(x, y) - T_{i} ]}{\sum_{i = 1}^{m} \sum_{j = 1}^{n} A_{ji}(x, y)} .

where m and n are the numbers of orientations and scales, $A_{ji}(x, y)$ and $Δϕ_{ji}(x, y)$ are the amplitude and local phase deviation for the jth scale logarithmic Gabor function at the ith orientation, $T_{i}$ is the estimated noise at the ith orientation, $W_{i}(x, y)$ is the weighting function at the ith orientation, and $f_{p}(u) = (u > 0) \, u$, that is, $f_{p}(u) = u$ for $u > 0$ and 0 otherwise. The local phase deviation is computed as

Δ ϕ (x, y) = \cos [ϕ_{n} (x, y) - ϕ (x, y)] - | \sin [ϕ_{n} (x, y) - ϕ (x, y)] | .

where $ϕ(x, y)$ is the amplitude weighted mean of local phases at location $(x, y)$.
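To make the structure of the two-dimensional computation concrete, the following is a simplified sketch and not the implementation used in this report or in [19]. It assumes $W_{i} = 1$, replaces the estimated noise $T_{i}$ by a constant user parameter `noise_t`, and adds a small constant `eps` to the denominator to avoid division by zero. The filter parameters (`min_wavelength`, `mult`, `sigma_onf`, the angular bandwidth) are assumed values, and all names are hypothetical.

```python
import numpy as np


def phase_congruency_2d(image, n_scales=4, n_orient=4, min_wavelength=3.0,
                        mult=2.1, sigma_onf=0.55, noise_t=0.0, eps=1e-4):
    """Simplified phase congruency map (W_i = 1, constant T_i = noise_t)."""
    rows, cols = image.shape
    fy = np.fft.fftfreq(rows)[:, None]
    fx = np.fft.fftfreq(cols)[None, :]
    radius = np.hypot(fx, fy)
    radius[0, 0] = 1.0                       # avoid log(0); DC is zeroed below
    theta = np.arctan2(fy, fx)
    spectrum = np.fft.fft2(image.astype(float))
    sigma_theta = np.pi / n_orient / 1.3     # assumed angular bandwidth

    numerator = np.zeros((rows, cols))
    denominator = np.zeros((rows, cols))
    for i in range(n_orient):
        angle = i * np.pi / n_orient
        d_theta = np.abs(np.arctan2(np.sin(theta - angle), np.cos(theta - angle)))
        spread = np.exp(-d_theta**2 / (2 * sigma_theta**2))
        even, odd, amp = [], [], []
        for j in range(n_scales):
            f0 = 1.0 / (min_wavelength * mult**j)        # centre frequency of scale j
            radial = np.exp(-np.log(radius / f0)**2 / (2 * np.log(sigma_onf)**2))
            radial[0, 0] = 0.0                           # remove the DC component
            response = np.fft.ifft2(spectrum * radial * spread)
            even.append(response.real)                   # even-symmetric response
            odd.append(response.imag)                    # odd-symmetric response
            amp.append(np.abs(response))                 # amplitude A_ji
        sum_e, sum_o = sum(even), sum(odd)
        norm = np.hypot(sum_e, sum_o) + eps
        mean_e, mean_o = sum_e / norm, sum_o / norm      # direction of the mean phase
        for e, o, a in zip(even, odd, amp):
            # A * (cos(d) - |sin(d)|), written with dot and cross products
            a_dphi = (e * mean_e + o * mean_o) - np.abs(e * mean_o - o * mean_e)
            numerator += np.maximum(a_dphi - noise_t, 0.0)   # f_p(A * dphi - T)
            denominator += a
    return numerator / (denominator + eps)


rng = np.random.default_rng(0)
pc_map = phase_congruency_2d(rng.random((32, 32)))
print(pc_map.shape, float(pc_map.min()), float(pc_map.max()))
```

The one-sided angular filter yields a complex response whose real and imaginary parts play the role of the even and odd filter outputs, so the amplitude and the phase deviation can be obtained without computing phase angles explicitly.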

Similar to the popular gradient-based focus measures, the Image Phase Congruency (IPC) focus measure of a target image (i) can be simply computed as the sum of the phase congruency at all pixels within its focus window ( $i_{f w}$ )

\mathrm{IPC}(i) = \iint_{(x, y) \in i_{fw}} \mathrm{PC}(x, y) \, dx \, dy .
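In discrete form the integral becomes a sum of the phase congruency map over the pixels of the focus window. The sketch below assumes that a phase congruency map is already available as a NumPy array and that the focus window is a centered rectangle covering a given fraction of each image dimension; the names and the fraction of 0.5 are hypothetical.

```python
import numpy as np


def central_window(shape, fraction=0.5):
    """Slices selecting a centered rectangle covering `fraction` of each axis."""
    rows, cols = shape
    h = max(1, int(rows * fraction))
    w = max(1, int(cols * fraction))
    top, left = (rows - h) // 2, (cols - w) // 2
    return slice(top, top + h), slice(left, left + w)


def ipc_focus_measure(pc_map, window=None, per_pixel=False):
    """Sum (or per-pixel mean) of phase congruency inside the focus window."""
    if window is None:
        window = central_window(pc_map.shape)
    patch = pc_map[window]
    return float(patch.mean() if per_pixel else patch.sum())


demo_map = np.ones((8, 8))             # placeholder phase congruency map
print(ipc_focus_measure(demo_map))     # sum over the central 4x4 window
```

Setting `per_pixel=True` divides by the number of pixels in the window, which makes values comparable across windows of different sizes.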

Four representative focus measures are chosen for comparison: Image Variance (VAR) and Energy of Image Gradient (EIG) in the spatial domain, and Energy of Spectrum (EOS) and Wavelet Band Ratio (WBR) in the frequency/wavelet domain [2]. To reduce the impact of image contrast (illumination), the spatial domain focus measures VAR and EIG are slightly modified to be normalized by the average intensity of the image. For the readers' convenience, these focus measures are briefly described in Appendix A. The values of the focus measures defined above depend on the size of the focus window; each of them can also be normalized by dividing by the number of pixels in the focus window.

2.2 Focus window

The focus window is the region of interest for autofocus in the target image. Except for special cases, the focus window is usually not the whole target image, for two equally important reasons. The first is computational cost: a smaller focus window is cheaper to process, which matters because autofocus systems are mostly used in real-time applications. The second is that it is optically impossible to obtain an all-focused image when objects are not at the same object distance unless advanced post-processing is utilized, and choosing an unnecessarily large focus window is likely to result in an all-blurred image [21]. Accurate identification of the focus window involves advanced image segmentation and pattern recognition. Two simple methods have been reported to address this issue: identifying the region of interest in the image by tracking the photographer's pupil [21], or segmenting human skin in the image as the focus window [22]. In commercial digital cameras, face detection has been utilized to help the photographer choose focus windows. All these methods have limitations, and focus window selection is still an unresolved issue that we do not address in this report. The targets in the image sequences used are planar, so it is reasonable to use a rectangular area in the central portion of the target image as the focus window. The focus window is fixed for all the cases analyzed in this report.

2.3 Phase congruency focus measure evaluation

Point spread functions (PSFs) from defocus are modeled as Gaussian functions for simplicity. The energy under the Gaussian envelope is always normalized to 1, and the standard deviation (σ) is set to 1/3 of the blur radius (Fig. 2). In other words, we assume that 3σ of the Gaussian PSF is equal to the blur radius, which is in turn determined by the camera parameters [23]

σ = \frac{R q}{3} (\frac{1}{f} - \frac{1}{q} - \frac{1}{p}) .

where R and f are the radius and focal length of the camera lens, and p and q are the object and image distances respectively.

Fig. 2 Point spread functions (PSFs) for 9 different amounts of defocus. Each PSF is a Gaussian function of 11x11 pixels. The standard deviation of each PSF is 1/3 of the blur radius from the corresponding amount of defocus. The energy of each PSF is normalized to 1.
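The PSF model can be sketched as follows. The formula for σ is the one given in the text; everything else is an assumption made for illustration: the magnitude of σ is used because the sign only indicates the side of focus, σ is converted to pixels with a hypothetical pixel pitch, and the camera numbers are invented.

```python
import numpy as np


def defocus_sigma(R, f, p, q):
    """Signed sigma = (R*q/3) * (1/f - 1/q - 1/p), in the units of R."""
    return R * q / 3.0 * (1.0 / f - 1.0 / q - 1.0 / p)


def gaussian_psf(sigma_px, size=11):
    """size x size Gaussian PSF with unit energy; sigma_px is in pixels."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    if sigma_px <= 0:
        psf = ((x == 0) & (y == 0)).astype(float)   # in focus: delta function
    else:
        psf = np.exp(-(x**2 + y**2) / (2.0 * sigma_px**2))
    return psf / psf.sum()


# Hypothetical camera: lens radius 10 mm, focal length 50 mm, object at 2000 mm,
# sensor at 51.5 mm, and an assumed pixel pitch of 0.01 mm.
sigma_mm = abs(defocus_sigma(R=10.0, f=50.0, p=2000.0, q=51.5))
psf = gaussian_psf(sigma_mm / 0.01)
print(round(sigma_mm, 4), psf.shape, round(float(psf.sum()), 6))
```

When the sensor sits exactly at the focused image distance, the bracketed term vanishes and the PSF degenerates to a delta function, so the image is left unchanged.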


A gray-scale eye chart image is used as the perfectly focused image. The defocused images are generated by convolving the perfectly focused image with the corresponding PSFs. Different types of noise and/or contrast adjustment can be further imposed on the defocused images. Afterwards, values of the focus measures described above are computed for the chosen rectangular focus window. For each defocused image sequence, the absolute values of different focus measures vary dramatically. For easier visualization and comparison, the values of each focus measure are normalized so that the maximum is always 1. It should be noted that as we only consider optical blur caused by defocus, the focus measure curves will be symmetric if the amount of optical blur is the same for under-focus and over-focus. Therefore, only one side of each focus measure curve is presented.
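The image generation and normalization steps can be sketched as below. This is an illustrative NumPy-only version: the boundary handling (edge replication) is an assumption, since the report does not state how image borders were treated, and the checkerboard and box kernel merely stand in for the eye chart and the Gaussian PSFs.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def blur(image, psf):
    """2-D convolution of image with an odd-sized PSF, same output size."""
    half_r, half_c = psf.shape[0] // 2, psf.shape[1] // 2
    padded = np.pad(image.astype(float), ((half_r, half_r), (half_c, half_c)),
                    mode="edge")                      # assumed border handling
    windows = sliding_window_view(padded, psf.shape)  # shape (H, W, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, psf[::-1, ::-1])


def normalize_curve(values):
    """Scale a focus measure curve so that its maximum is 1."""
    values = np.asarray(values, dtype=float)
    return values / values.max()


# Stand-in target and PSF (a checkerboard and a 3x3 box kernel).
target = (np.indices((16, 16)).sum(axis=0) % 2).astype(float)
box = np.full((3, 3), 1.0 / 9.0)
defocused = blur(target, box)
curve = normalize_curve([target.var(), defocused.var()])
print(curve)
```

Because the PSF has unit energy and is non-negative, blurring lowers the intensity variance of the target, which is the low-pass behavior that intensity-based focus measures rely on.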

Unimodality and monotonicity are the fundamental properties that a focus measure must possess. In addition, we evaluate three other important aspects of a focus measure: sensitivity to defocus, sensitivity to noise, and robustness under variable illumination conditions. The effective range of a focus measure is generally inversely correlated with its sensitivity to defocus.

3. Experimental results

3.1 Ideal case - no noise

When there is no noise in the images (Fig. 3, top row), all focus measures evaluated are unimodal and monotonic, and the sensitivity to defocus and effective range of IPC are similar to those of EOS, between those of VAR, EIG and WBR (Fig. 4a).

Fig. 3 Images with variable amounts of defocus and three types of noise.


Fig. 4 Focus measure performance with (a) no noise, (b) Poisson noise, (c) Gaussian noise and (d) Speckle noise.


3.2 With three types of noise

The imposed Gaussian (additive) and Speckle (multiplicative) noise is of zero mean and 5% variance. Poisson noise is neither additive nor multiplicative. For a given input pixel value, the output pixel value with Poisson noise is generated from a Poisson distribution with its mean equal to the input pixel value. Figure 3 rows 2-4 show some sample defocused images with the three types of noise imposed, and Fig. 4b-d illustrate the performance of the focus measures.
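The three noise models can be sketched as follows. Two assumptions are made here that the report does not state explicitly: pixel values are taken to lie in [0, 1], so that "5% variance" is read as a variance of 0.05 on that scale, and Poisson noise is generated on an assumed 8-bit count scale (`peak=255`). No clipping is applied; all names are hypothetical.

```python
import numpy as np


def add_gaussian(image, var=0.05, rng=None):
    """Additive zero-mean Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    return image + rng.normal(0.0, np.sqrt(var), image.shape)


def add_speckle(image, var=0.05, rng=None):
    """Multiplicative noise: image + image * n, with n zero-mean Gaussian."""
    rng = np.random.default_rng() if rng is None else rng
    return image + image * rng.normal(0.0, np.sqrt(var), image.shape)


def add_poisson(image, peak=255.0, rng=None):
    """Each output pixel is drawn from a Poisson law whose mean is the input."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.poisson(image * peak) / peak


rng = np.random.default_rng(42)
gray = np.full((200, 200), 0.5)          # uniform mid-gray test image
noisy_g = add_gaussian(gray, rng=rng)
noisy_s = add_speckle(gray, rng=rng)
noisy_p = add_poisson(gray, rng=rng)
print(noisy_g.var(), noisy_s.var(), noisy_p.var())
```

On the uniform test image the speckle variance is the Gaussian variance scaled by the squared intensity, and the Poisson variance depends on the intensity itself, which is why the three noise types affect bright and dark regions differently.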

In all cases, the performance of IPC is least affected by the noise: it maintains unimodality and monotonicity, and its defocus sensitivity reduction is trivial while its effective range is almost intact. The other 4 focus measures are all very sensitive to the level of noise imposed: VAR and WBR are not even unimodal or monotonic in some cases, and the defocus sensitivity of EIG and EOS is significantly reduced, particularly in the presence of Gaussian and Speckle noise.

Additional filtering can be carried out to reduce noise in the target images (Fig. 5) to mitigate the impact of noise on the performance of EIG, EOS, VAR and WBR (Fig. 6). Preprocessing with a median filter before computing these focus measures improves their defocus sensitivity and effective range (except for VAR, which still loses unimodality/monotonicity). However, the negative impact of noise on these focus measures is still significant after filtering.
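A median filter of the kind used for this preprocessing can be sketched with NumPy alone. The 5x5 window size follows the figure captions; the edge-replicating border handling and the function name are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter(image, size=5):
    """size x size median filter with edge-replicated borders."""
    half = size // 2
    padded = np.pad(image, half, mode="edge")
    windows = sliding_window_view(padded, (size, size))   # (H, W, size, size)
    return np.median(windows, axis=(-2, -1))


# A single bright outlier on a dark background is removed completely.
impulse = np.zeros((9, 9))
impulse[4, 4] = 1.0
cleaned = median_filter(impulse)
print(cleaned.max())
```

The filter removes isolated outliers well, but it also smooths fine detail, so some of the high-frequency content that the focus measures depend on is lost along with the noise.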

Fig. 5 Sample images with variable amount of defocus and different types of noise, further processed by a 5x5 median filter.


Fig. 6 Focus measure performance with (a) no noise, (b) Poisson noise, (c) Gaussian noise and (d) Speckle noise. Except for IPC, noise in input images is reduced by a 5x5 median filter.


3.3 With image contrast variations

Image contrast is manipulated to simulate illumination variations. For simplicity, the manipulation consists of scaling the pixel values of every other input image by a factor k (between 0 and 1) and adding an offset u (Fig. 7).
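The contrast manipulation can be sketched as below. The default values k = 0.5 and u = 20 follow the figure captions; the choice of which alternate frames are altered (the odd-indexed ones here) and the function name are assumptions.

```python
import numpy as np


def vary_contrast(frames, k=0.5, u=20.0):
    """Apply k * frame + u to every other frame (odd indices) of a sequence."""
    return [k * frame + u if idx % 2 else frame
            for idx, frame in enumerate(frames)]


# Four identical stand-in frames with a mean of 100 and some texture.
base = 100.0 + 10.0 * (np.indices((4, 4)).sum(axis=0) % 2 - 0.5)
sequence = vary_contrast([base.copy() for _ in range(4)])
print([round(float(f.var()), 2) for f in sequence])
```

Since the variance of k·i + u equals k² times the variance of i, the intensity variance of the altered frames drops by a factor of four for k = 0.5 even though the defocus is unchanged, which illustrates how contrast variations can break the unimodality of intensity-based focus measures.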

Fig. 7 Sample images with variable amount of defocus, different types of noise, and variable contrast (scaling factor 0.5, offset 20), further processed by a 5x5 median filter.


Even when noise is absent, VAR, EIG and EOS are no longer unimodal and monotonic in the presence of significant image contrast variations. When both image contrast variations and noise are significantly present, IPC is the only focus measure that maintains robust performance in terms of unimodality, monotonicity, low noise sensitivity, and balanced defocus sensitivity and effective range (Fig. 8).

Fig. 8 Focus measure performance with contrast variations (scaling factor 0.5, offset 20): (a) no noise, (b) Poisson noise, (c) Gaussian noise and (d) Speckle noise. Except for IPC, noise in input images is reduced by a 5x5 median filter.


4. Discussion and conclusion

In the presence of significant image contrast variations and noise (Poisson, Gaussian and Speckle), IPC is the only evaluated focus measure that maintains robust performance in terms of unimodality, monotonicity, low noise sensitivity, and defocus sensitivity and effective range, and VAR is the least robust. As a de-noising measure, a median filter can reduce the negative impact of noise on EIG, EOS and WBR.

More sophisticated filtering that is optimal to the specific noise type may further improve the performance of VAR, EIG, EOS and WBR in the presence of significant noise. But insensitivity to image contrast (illumination) is more difficult to achieve using these image intensity based focus measures. The superior performance of IPC over them on noisy and contrast-varying target images is due to two factors: first, IPC as computed in this report has a built-in adaptive de-noising mechanism, that is, noise in each image is locally estimated and removed in the process of phase congruency calculation; second, IPC utilizes local image phases, which are insensitive to image contrast variations.

It should be noted that IPC is more computationally intensive than the other 4 focus measures compared. However, as the embedded signal and image processors in imaging systems become more and more powerful, this is much less of an issue than it would have been a few years ago.

In conclusion, image phase congruency can be used as a robust focus measure for noisy imaging sensors in varying illumination conditions for the purpose of autofocus.

Appendix A:

Four representative focus measures used for comparisons

Assume the focus window of a grayscale target image is $i (x, y)$ , and its Fourier transform $I (u, v)$ . The four image intensity based focus measures used in this report are normalized Image Variance (VAR), normalized Energy of Image Gradient (EIG), Energy of Spectrum (EOS) and Wavelet Band Ratio (WBR); more details can be found in reference [2].

Normalized Image Variance (VAR) is defined as

\mathrm{VAR}(i) = \iint \frac{[i(x, y) - \bar{i(x, y)}]^{2}}{{\bar{i(x, y)}}^{2}} \, dx \, dy .

where $\bar{i(x, y)}$ is the mean value of the image.

Normalized Energy of Image Gradient (EIG) is defined as

\mathrm{EIG}(i) = \iint \frac{(\frac{\partial i}{\partial x})^{2} + (\frac{\partial i}{\partial y})^{2}}{{\bar{i(x, y)}}^{2}} \, dx \, dy .
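Discrete versions of the two normalized spatial domain measures can be sketched as follows. The integrals are replaced by sums over the pixels of the focus window and the partial derivatives by finite differences (`numpy.gradient`); both choices, and the function names, are assumptions made for illustration.

```python
import numpy as np


def normalized_variance(window):
    """Discrete VAR: sum of squared deviations divided by the squared mean."""
    w = window.astype(float)
    mean = w.mean()
    return float(((w - mean) ** 2).sum() / mean**2)


def normalized_gradient_energy(window):
    """Discrete EIG: sum of squared finite differences over the squared mean."""
    w = window.astype(float)
    gy, gx = np.gradient(w)          # derivatives along rows and columns
    return float((gx**2 + gy**2).sum() / w.mean() ** 2)


rng = np.random.default_rng(1)
patch = 50.0 + 100.0 * rng.random((16, 16))
print(normalized_variance(patch), normalized_variance(0.5 * patch),
      normalized_variance(0.5 * patch + 20.0))
```

Dividing by the squared mean makes both measures invariant to a pure intensity scaling, but not to an added offset, which changes the mean without changing the deviations or the gradients.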

Energy of Spectrum (EOS) is defined as

\mathrm{EOS}(i) = \int_{0+}^{\infty} \int_{0+}^{\infty} I^{2}(u, v) \, du \, dv .

which is the sum of the spectrum energy without the DC component and the first two low frequency components.

Wavelet Band Ratio (WBR) is defined as

\mathrm{WBR}(i) = \frac{\iint \sum_{k = 2}^{4} [c_{k}(u, v)]^{2} \, du \, dv}{\iint c_{1}^{2}(u, v) \, du \, dv} .

where $c_{1}(u, v)$ and $c_{k}(u, v)$ (k = 2, 3, 4) are the first-level and second-level outputs of the two-level Daubechies wavelet transform of $i(x, y)$.

References and links

1. M. Subbarao, T. Choi, and A. Nikzad, “Focusing techniques,” Opt. Eng. 32(11), 2824–2836 (1993).

2. Y. Tian, K. Shieh, and C. F. Wildsoet, “Performance of focus measures in the presence of nondefocus aberrations,” J. Opt. Soc. Am. A 24(12), B165–B173 (2007), http://www.opticsinfobase.org/abstract.cfm?URI=josaa-24-12-B165.

3. M. Subbarao and G. Surya, “Depth from defocus: A spatial domain approach,” Int. J. Comput. Vis. 13(3), 271–294 (1994).

4. A. Pentland, S. Scherock, T. Darrell, and B. Girod, “Simple range cameras based on focal error,” J. Opt. Soc. Am. A 11(11), 2925–2934 (1994), http://www.opticsinfobase.org/abstract.cfm?URI=josaa-11-11-2925.

5. S. K. Nayar, M. Watanabe, and M. Noguchi, “Real-time focus range sensor,” IEEE Trans. Pattern Anal. Mach. Intell. 18(12), 1186–1198 (1996).

6. A. N. Rajagopalan and S. Chaudhuri, “A variational approach to recovering depth from defocused images,” IEEE Trans. Pattern Anal. Mach. Intell. 19(10), 1158–1164 (1997).

7. V. Aslantas and D. T. Pham, “Depth from automatic defocusing,” Opt. Express 15(3), 1011–1023 (2007), http://www.opticsinfobase.org/abstract.cfm?URI=oe-15-3-1011.

8. S. O. Shim and T. S. Choi, “Depth from focus based on combinatorial optimization,” Opt. Lett. 35(12), 1956–1958 (2010), http://www.opticsinfobase.org/abstract.cfm?URI=ol-35-12-1956.

9. W. Huang and Z. Jing, “Evaluation of focus measures in multi-focus image fusion,” Pattern Recognit. Lett. 28(4), 493–500 (2007).

10. V. Aslantas and R. Kurban, “A comparison of criterion functions for fusion of multi-focus noisy images,” Opt. Commun. 82(16), 3231–3242 (2009).

11. S. Nayar and Y. Nakagawa, “Shape from focus,” IEEE Trans. Pattern Anal. Mach. Intell. 16(8), 824–831 (1994).

12. P. Favaro, S. Soatto, M. Burger, and S. J. Osher, “Shape from defocus via diffusion,” IEEE Trans. Pattern Anal. Mach. Intell. 30(3), 518–531 (2008).

13. M. Subbarao and J. K. Tyan, “Selecting the optimal focus measure for autofocusing and depth-from-focus,” IEEE Trans. Pattern Anal. Mach. Intell. 20(8), 864–870 (1998).

14. T. Aydin and Y. S. Akgul, “An occlusion insensitive adaptive focus measurement method,” Opt. Express 18(13), 14212–14224 (2010), http://www.opticsinfobase.org/abstract.cfm?URI=oe-18-13-14212.

15. M. C. Morrone, J. R. Ross, D. C. Burr, and R. A. Owens, “Mach bands are phase dependent,” Nature 324, 250–253 (1986).

16. M. C. Morrone and R. A. Owens, “Feature detection from local energy,” Pattern Recognit. Lett. 6(5), 303–313 (1987).

17. M. C. Morrone and D. C. Burr, “Feature detection in human vision: a phase-dependent energy model,” Proc. R. Soc. Lond. B Biol. Sci. 235(1280), 221–245 (1988).

18. Z. Wang and E. P. Simoncelli, “Local phase coherence and the perception of blur,” in Advances in Neural Information Processing Systems, S. Thurn, L. Saul, and B. Schölkopf, ed. (MIT Press, Boston, M.A., 2004), pp. 1435–1442.

19. P. Kovesi, “Image features from phase congruency,” VIDERE: J. Comput. Vis. Res. 1, 1–27 (1999).

20. D. J. Field, “Relations between the statistics of natural image and the response properties of cortical cells,” J. Opt. Soc. Am. 4(12), 2379–2394 (1987), http://www.opticsinfobase.org/abstract.cfm?URI=josaa-4-12-2379.

21. Y. Tian, H. Feng, Z. Xu, and J. Huang, “Dynamic focus window selection strategy for digital cameras,” Proc. SPIE 5678, 219–229 (2005).

22. Y. Tian, “Dynamic focus window selection using a statistical color model,” Proc. SPIE 6069, 98–106 (2006).

23. H. Lin and K. Gu, “Depth recovery using defocus blur as infinity,” in Proceedings of the 19th International Conference on Pattern Recognitions, (IEEE, Tampa, FL, 2008), pp. 1–4.


Optics Express