
Extended depth-of-field infrared imaging with deeply learned wavefront coding

Open Access

Abstract

Wavefront coding (WFC) techniques, which combine an optical coding stage with digital image processing, can significantly extend the depth of field of imaging systems. In this study, we demonstrate a deeply learned far-infrared WFC camera with an extended depth of field. We designed and optimized a high-order polynomial phase mask with a genetic algorithm; it exhibits higher defocus consistency of the modulation transfer functions than previously published designs. For the digital processing stage, we trained a generative adversarial network on a synthesized WFC dataset, which is more effective and robust than conventional decoding methods. We then captured real-world infrared images with the WFC camera at far, middle, and near object distances. After wavefront coding and decoding, the deeply learned model improved image quality and signal-to-noise ratio significantly and rapidly. This work therefore establishes a deeply learned WFC approach to optical imaging, demonstrated at infrared wavelengths but not limited to them, with good potential for practical application in “smart” imaging and large-range target detection.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Wavefront coding (WFC) is a technique that combines an optical coding stage with digital image processing to extend the depth of field (DOF); it was initially proposed by Dowski and Cathey [1]. Typically, a WFC imaging system comprises an optical coding stage, in which a cubic phase mask is inserted at the pupil plane, and a digital decoding stage that restores the encoded images captured by the detector. The encoded images are blurred after optical coding, and both the point spread function (PSF) and modulation transfer function (MTF) show significantly enhanced consistency over an extended defocus range. The phase mask is designed to generate a series of defocus-invariant PSFs/MTFs, so a sharp image can be restored via an appropriate deconvolution algorithm with a given PSF. Compared with traditional optical focusing approaches [2,3], WFC technology can achieve an extended DOF and a large aperture at the same time. Given these advantages, the WFC technique has been widely used in many fields, such as extension of the DOF [4–6], athermalization of infrared optical systems [7], long-range dynamic target detection [5], microscopic imaging [8,9], etc.

However, it remains challenging to improve imaging performance and increase the signal-to-noise ratio in current WFC systems, and considerable effort has been devoted to these problems. On the one hand, to enhance the defocus consistency of the MTF, the cubic phase mask can be replaced by elements with other types of phase modulation, such as logarithmic [10], square-root [11], tangent [12], inverse-tangent [13], and generic polynomial [14,15] masks. Apart from cubic phase masks, however, most phase masks have been applied and studied only in ideal optical systems [16]. On the other hand, the noise of the imaging system is easily amplified in the decoding stage. The traditional decoding methods in common use are the Wiener deconvolution algorithm [17] and the Lucy-Richardson (LR) algorithm [18]; however, Wiener filtering easily generates artifacts, and nonlinear signal processing takes a long time to compute [19]. The deep learning framework, known as a more effective decoding tool, has been proven valid in WFC systems [20], and deep-learning-based WFC has been applied in the visible band [5,21]. In addition, end-to-end design using deep learning is a new research direction that can realize the joint optimization of optical design and digital image processing [22–24].

In this work, to further extend the DOF and reduce the influence of intrinsic aberrations and environmental noise in a far-infrared WFC camera, we designed and optimized a high-order polynomial phase mask with a genetic algorithm and combined it with a generative adversarial network (GAN) as the decoding algorithm. The proposed high-order polynomial phase mask exhibits higher MTF consistency than previous designs, leading to at least a tenfold extension of the DOF. For the digital image processing stage, a synthetic WFC dataset with various types of PSFs and noise was quickly generated to train the GAN model. Results on the synthetic dataset demonstrate that the neural network is more robust to aberrated images than traditional algorithms, especially under z-tilt aberration, with an average peak signal-to-noise ratio (PSNR) of up to 21 dB on the synthetic dataset. Furthermore, the WFC camera was validated at 3 m, 10 m, and 100 m working distances in the real world. The real-scene results reveal that imaging performance is significantly improved by jointly optimizing the phase mask and digital processing stages: image details are restored and noise is suppressed. In conclusion, this work developed and implemented a practical deeply learned WFC infrared imaging technique. The proposed approach is more effective, robust, and faster than earlier works, combining high imaging performance with a high DOF extension ratio, and shows promising potential for novel high-performance extended-DOF imaging systems working in, but not limited to, infrared bands.

2. Design and methods

2.1 Schematic of the infrared WFC camera

To verify the effect of the WFC technique, a conventional infrared imaging camera was designed with an F-number of 1, consisting of three infrared chalcogenide lenses and an uncooled infrared detector (Xcore HD series). The optical specifications of the conventional infrared camera are shown in Table 1.

Table 1. Parameters of the conventional infrared camera

According to the empirical formula [25], the depth of focus of the infrared camera was ${\pm} 2\lambda {F^2} = {\pm} 20$ µm, which is the DOF in image space. The relationship between image distance and object distance in the ideal case is given by:

$$1/f = 1/{l_0} + 1/{l_i}$$
where ${l_0}$ is the object distance, ${l_i}$ is the image distance, and f is the focal length. When ${l_i}$ = 48.25 ${\pm}$ 0.02 mm, the DOF is 1.484 m. Likewise, when ${l_i}$ is 48.25 ${\pm}$ 0.20 mm, the DOF is 40.96 m. Extending the depth of focus in image space is therefore equivalent to extending the DOF in object space.
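As a numeric check, the following minimal sketch inverts Eq. (1) for both focal-depth budgets. The focal length is not restated in this section; f = 48 mm is an assumption that reproduces the quoted 1.484 m and 40.96 m values.

```python
F_MM = 48.0  # assumed focal length in mm (not restated here; see Table 1)

def object_distance(l_i, f=F_MM):
    """Invert Eq. (1), 1/f = 1/l_o + 1/l_i, for the object distance l_o (mm)."""
    return 1.0 / (1.0 / f - 1.0 / l_i)

for delta in (0.02, 0.20):  # image-space focal depth in mm
    near = object_distance(48.25 + delta)  # larger l_i -> nearer object
    far = object_distance(48.25 - delta)   # smaller l_i -> farther object
    print(f"±{delta:.2f} mm focal depth -> DOF ≈ {(far - near) / 1e3:.3f} m")
# Prints DOF ≈ 1.484 m and DOF ≈ 40.960 m, matching the values above.
```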

Then, a pre-designed germanium phase mask was placed at the pupil plane of the conventional infrared camera to form a WFC camera, and a digital processing stage was used to decode the images captured by the camera, as shown in Fig. 1(a). The WFC demonstrator camera is shown in Fig. 1(b). We designed a clamp for the phase mask that allows the infrared camera to switch between conventional and WFC modes, making it easy to compare imaging performance between them, including DOF extension and image quality.

Fig. 1. (a) The WFC imaging system consists of a phase mask, a conventional camera, and a digital decoding stage. (b) The WFC demonstrator camera.

2.2 Design and optimization of the phase mask

In this work, a new high-order polynomial (HOP) phase mask is proposed that achieves better defocus consistency than other phase mask types. The surface function of the HOP phase mask is:

$$z(x,y) = \alpha ({x^3} + {y^3}) + \beta ({x^7} + {y^7}) + \gamma ({x^{11}} + {y^{11}})$$
where $\alpha $, $\beta $, and $\gamma $ are the HOP phase mask parameters, and x and y are x- and y-coordinates in lens units.
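For reference, a minimal sketch evaluating the Eq. (2) sag on a normalized pupil grid is given below; the coefficients here are placeholders, not the optimized values of Table 2.

```python
import numpy as np

# Placeholder coefficients; the optimized alpha, beta, gamma are in Table 2.
alpha, beta, gamma = 1e-4, 5e-5, 2e-5

x = np.linspace(-1.0, 1.0, 512)  # pupil coordinates in lens units
X, Y = np.meshgrid(x, x)
Z = (alpha * (X**3 + Y**3)       # cubic term
     + beta * (X**7 + Y**7)      # 7th-order term
     + gamma * (X**11 + Y**11))  # 11th-order term

# All powers are odd, so z(-x, -y) = -z(x, y): the x-positive half axis
# determines the whole profile (cf. Fig. 4(a)).
assert np.allclose(Z, -Z[::-1, ::-1])
```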

To compare the performance of different phase masks, their parameters should be optimized under the same conditions. The cosine similarity evaluates the similarity between two functions and is defined as:

$$\cos (\theta )= \frac{{\left\langle {a,b} \right\rangle }}{{||a ||\cdot ||b ||}},\theta \in (0,2\pi )$$
where the symbol $\left\langle \cdot \right\rangle$ is the inner product and $||\cdot ||$ is the norm. In Eq. (3), the smaller the angle between functions a and b, the more similar the two functions are. We used cosine similarity to compare the in-focus MTF with the defocused MTFs. The parameters in Eq. (2) should balance recoverability against similarity [26], and the merit function can be written as:
$$\left\{ {\begin{array}{l} {\arg \min \sum\limits_\lambda {\sum\limits_\psi {\sum\limits_i^{FOV(i)} {{{\cos }^{ - 1}}\left( {\frac{{\left\langle {H(\psi ,FOV(i),\lambda ),H(0,FOV(i),\lambda )} \right\rangle }}{{||{H(\psi ,FOV(i),\lambda )} ||\cdot ||{H(0,FOV(i),\lambda )} ||}}} \right)} } } }\\ {s.t.\min (H(0,0,\lambda )) \ge TH} \end{array}} \right.$$
where $H(\psi ,FOV(i),\lambda )$ is the sampled MTF array, $\psi$ is the defocus distance, $FOV(i)$ is the field angle, $\lambda$ is the wavelength, and $TH$ is the minimum acceptable MTF threshold. The MTF at the primary wavelength, in-focus position, and central FOV was taken as the reference function. The sum of the angles between the reference function and the other MTFs served as the merit function of the genetic algorithm, comprehensively accounting for aberrations, FOVs, defocus distances, wavelengths, recoverability, and other factors.
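A compact sketch of this merit function over precomputed MTF samples follows. In the actual pipeline the MTFs come from Zemax OpticStudio (see below); here `mtf` is assumed to be a dictionary of sampled curves, and the constraint check follows our reading of Eq. (4).

```python
import numpy as np

def mtf_angle(h, h_ref):
    """Eq. (3): angle between two sampled MTF curves; a smaller angle
    means the defocused MTF is closer to the in-focus reference."""
    c = np.dot(h, h_ref) / (np.linalg.norm(h) * np.linalg.norm(h_ref))
    return np.arccos(np.clip(c, -1.0, 1.0))

def merit(mtf, th=0.1):
    """Eq. (4): sum of angles between each MTF and its in-focus,
    same-FOV, same-wavelength reference. `mtf[(psi, fov, lam)]` holds a
    sampled MTF curve (e.g. 0-40 lp/mm at 0.2 lp/mm steps); psi = 0.0 is
    the in-focus case. Returns NaN when the in-focus on-axis MTF falls
    below TH (the paper's GA checks the primary wavelength)."""
    psis = sorted({k[0] for k in mtf})
    fovs = sorted({k[1] for k in mtf})
    lams = sorted({k[2] for k in mtf})
    if min(mtf[(0.0, 0.0, lam)].min() for lam in lams) < th:
        return float("nan")  # constraint violated -> invalid individual
    return float(sum(mtf_angle(mtf[(p, v, l)], mtf[(0.0, v, l)])
                     for p in psis for v in fovs for l in lams))
```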

Figure 2 shows the optimization process of the phase mask parameters. We used a dynamic data exchange (DDE) interface between Zemax OpticStudio and MATLAB to optimize the parameters. The Nyquist frequency of the optical system in Zemax OpticStudio was 40 lp/mm, and the MTF sampling interval was 0.2 lp/mm. The optimization was based on a genetic algorithm (GA): a random initial population was generated and then iterated over generations until the convergence condition was satisfied, applying three main genetic operators per iteration: selection, crossover, and mutation. Zemax OpticStudio received the parameters returned by the GA and calculated MTF values of the WFC optical system over multiple wavelengths, defocus distances, and FOVs, with a defocus range from -0.2 mm to 0.2 mm; the focal depth of the system was thereby extended tenfold. The MTF values were returned to the MATLAB program, which evaluated the fitness function and returned new parameters. If the minimum MTF value at the primary wavelength was less than $TH = 0.1$, the fitness function returned NaN (not a number) and the parameters were treated as invalid.
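The sketch below is a schematic of that loop. The Zemax DDE round trip is replaced by a toy merit function so the skeleton runs on its own; the operator details (blend crossover, Gaussian mutation) are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_merit(params):
    """Toy stand-in for the DDE round trip, which in the real pipeline
    sends (alpha, beta, gamma) to OpticStudio and receives the Eq. (4)
    merit value (or NaN). Here: squared distance to an arbitrary target."""
    target = np.array([3e-4, -1e-4, 5e-5])
    return float(np.sum((params - target) ** 2))

def ga_optimize(evaluate=toy_merit, pop_size=40, n_gen=100,
                bounds=(-1e-3, 1e-3), p_mut=0.1):
    pop = rng.uniform(*bounds, size=(pop_size, 3))  # (alpha, beta, gamma)
    for _ in range(n_gen):
        fit = np.array([evaluate(p) for p in pop])
        fit = np.where(np.isnan(fit), np.inf, fit)  # invalid -> worst
        # Selection: keep the better half (minimization).
        parents = pop[np.argsort(fit)[: pop_size // 2]]
        # Crossover: blend random parent pairs.
        i = rng.integers(len(parents), size=pop_size)
        j = rng.integers(len(parents), size=pop_size)
        w = rng.random((pop_size, 1))
        pop = w * parents[i] + (1 - w) * parents[j]
        # Mutation: occasional small random perturbation.
        mask = rng.random(pop.shape) < p_mut
        pop[mask] += rng.normal(scale=1e-5, size=int(mask.sum()))
    fit = np.nan_to_num([evaluate(p) for p in pop], nan=np.inf)
    return pop[int(np.argmin(fit))]

best = ga_optimize()  # converges toward the toy target coefficients
```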

Fig. 2. Optimization process of the phase mask parameters. Left: The process of calculating MTFs of the WFC camera in Zemax OpticStudio. Right: The process of the genetic algorithm program.

2.3 Defocus consistency of the HOP phase mask

Compared with previously reported phase mask types, including logarithmic [10], cubic [1], square-root [11], tangent [12], and inverse-tangent [13] masks, the HOP phase mask offers better defocus consistency. The parameters of all six phase masks were optimized under the same conditions, and the phase functions were implemented as C-compiled DLL files in Zemax OpticStudio, accounting for the practical infrared camera. The final optimized parameters are given in Table 2. The HOP phase mask has the smallest merit function value, indicating the best defocus consistency.

Tables Icon

Table 2. Optimization results of phase mask parameters

Figure 3 compares the MTF curves of the phase masks. The MTF curves of the WFC camera show high defocus consistency across different defocus distances and FOVs, with no zeros over the whole frequency range. The HOP phase mask effectively suppresses the oscillation of the MTFs across all frequencies, especially at low frequencies (2∼13 lp/mm), demonstrating a superior ability to preserve contour information.

Fig. 3. Simulated MTF curves of WFC systems at different FOVs and defocus distances, corresponding to different phase masks: logarithmic (a), cubic (b), square root (c), tangent (d), inverse tangent (e), and ours (f). The defocus distances are 0 mm, 0.1 mm, and 0.2 mm. The FOVs are 0°, 6°, and 9°. T: tangential. S: sagittal direction. The green rectangle amplifies the oscillations at the low spatial frequency (2∼13 lp/mm).

The 2D phase mask profiles are shown in Fig. 4(a). Since these phase masks are odd-symmetric, only the x-positive half axis is shown. As shown in Fig. 4(b), the farther from the focus position, the larger the merit function value over the defocus range of -0.2 mm to 0.2 mm. The defocusing behavior of the phase masks is also symmetric, which means the WFC camera extends the DOF equally on either side of the nominal working distance. The HOP phase mask has the smallest average merit function value of all the masks, and thus the best defocus consistency and DOF extension.

Fig. 4. (a) Profiles of cubic, square root, tangent, inverse tangent, and our phase mask. (b) Merit functions of cubic, square root, tangent, inverse tangent, and our phase masks at different defocus distances.

2.4 Comparison of the infrared conventional and WFC camera

Before optimization, the size of the simulated PSFs at the central FOV changes rapidly with increasing defocus distance, causing optical information to be lost; the camera can take a sharp image only near the focal plane, as shown in Fig. 5(a). After optimization, the simulated PSFs at the central FOV are highly consistent, shaped like a series of similar isosceles triangles, as shown in Fig. 5(b). Despite their larger size, the high consistency of the PSFs benefits the recovery of encoded images, which a single filter can then decode.

Fig. 5. (a) Simulated PSFs of the conventional infrared camera. (b) Simulated PSFs of the WFC camera. The PSFs are at the centre FOV, and the defocus distance range is from -0.4 mm to 0.4 mm with an increment of 0.04 mm defocus.

2.5 Tolerance analysis of the phase mask

Tolerance analysis is an important step in optical design. Table 3 shows the surface error and tolerance values of the phase mask. In the Monte Carlo simulation, there is an 80% probability that the MTF at 40 lp/mm is higher than 0.075. The phase mask of the WFC system therefore has loose tolerances.

Table 3. Surface error and assembly tolerance of phase mask

Single-point diamond turning on an Ametek Precitech Freeform ultra-precision machine tool was used to fabricate the phase mask. The total aperture of the phase mask was 48 mm with a thickness of 4 mm. An infrared antireflection coating for the 8∼12 µm band was designed and deposited on the phase mask substrate. The measured surface figure error of the phase mask is shown in Fig. 6. The peak-to-valley (PV) value was 2.7 µm, which meets the processing requirement.

Fig. 6. Surface figure error of the phase mask.

3. Decoding model

3.1 Dataset generation

When the x or y decenter increases, the shape of the PSFs on the image plane is distorted along the corresponding direction in the WFC system. A small x or y tilt has little effect on the shape of the PSFs, but a z tilt strongly affects their rotation angle. Simulated PSFs with decenter and tilt errors of the WFC camera are shown in Fig. 7(a). Twenty simulated PSFs with different aberrations were collected, covering defocus, decenter, and tilt errors: defocus distances from -2 mm to 2 mm, x- or y-direction decenter distances from -3 mm to 3 mm, x- or y-direction tilt angles from -10° to 10°, and z-direction tilt angles from 0° to 180°.
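One way to script these configurations is to sample uniformly within the stated ranges, as in the sketch below; the PSF for each configuration is then computed by the optical model, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_aberration():
    """Draw one aberration configuration within the ranges listed above;
    the corresponding PSF is then simulated by the lens model."""
    return {
        "defocus_mm": rng.uniform(-2.0, 2.0),
        "decenter_x_mm": rng.uniform(-3.0, 3.0),
        "decenter_y_mm": rng.uniform(-3.0, 3.0),
        "tilt_x_deg": rng.uniform(-10.0, 10.0),
        "tilt_y_deg": rng.uniform(-10.0, 10.0),
        "tilt_z_deg": rng.uniform(0.0, 180.0),
    }

configs = [sample_aberration() for _ in range(20)]  # 20 simulated PSFs
```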

Fig. 7. (a) Simulated PSFs with decenter and tilt errors. The x- or y-direction decenter distances are 0, 3, and 5 mm. The x- or y-direction tilt angles are 0°, 5°, and 10°. The z-direction tilt angles are 0°, 5°, and 20°. (b) The process of WFC dataset generation. The PSFs consist of calibrated PSFs and simulated PSFs.

In addition, 10 calibrated PSFs further enriched the dataset, obtained from actual coded/sharp image pairs without additional measurements or calibration hardware. We calibrated the PSFs via a scale-invariant feature transform (SIFT) descriptor and random sample consensus (RANSAC) applied to similar image pairs [27]. The RANSAC algorithm can ignore partial mismatches (such as defocus differences and motion) between image pairs. The main steps of the PSF calibration are as follows (a code sketch follows the list):

  • 1: Use the conventional and WFC cameras to capture two sets of similar scenes (sharp/coded). Use the SIFT descriptor to register the images ${I^{sharp}}$ and ${I^{WFC}}$, which are evenly divided into N equal-size pairs of images $I_i^{sharp}$ and $I_i^{WFC}$, $i \in \{ 1,2,\ldots ,N\} $.
  • 2: Randomly select one image pair to compute the PSF:
    $${K_i} = {F^{ - 1}}\left\{ {\frac{{{F^\ast }(I_i^{sharp})F(I_i^{WFC})}}{{{{|{F(I_i^{sharp})} |}^2}}}} \right\}$$
    where F and ${F^{ - 1}}$ are the Fourier transform and the inverse Fourier transform, respectively, and the symbol ${\ast} $ represents the complex conjugate.
  • 3: Use ${K_i}$ to calculate the $j$-th error function ${E_j}$ of the different image pairs over all regions:
    $${E_j} = {(I_{ij}^{WFC} - I_{ij}^{sharp} \otimes {K_i})^2},j \in \{ 1,2,\ldots ,N\}$$
    where ${\otimes}$ is the convolution operator.
  • 4: Compute the set of inliers based on ${E_j}$ from whole image pairs. Re-estimate the PSF using all the inlier pairs.
    $${K_i} = {F^{ - 1}}\left\{ {\frac{{\sum\nolimits_{i = 1}^M {{F^\ast }(I_i^{sharp})F(I_i^{WFC})} }}{{\sum\nolimits_{i = 1}^M {{{|{F(I_i^{sharp})} |}^2}} }}} \right\}$$
  • 5: Repeat steps 3-4 until the number of inliers stops increasing, and take the resulting PSF.
  • 6: Use a low-pass filter to remove noise from the PSF.
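A minimal NumPy sketch of steps 2-4 is given below, assuming pre-registered, equal-size image pairs. The small regularizer `eps` is an implementation choice to stabilize the spectral division; it is not specified in the text.

```python
import numpy as np
from numpy.fft import fft2, ifft2

def estimate_psf(sharp, coded, eps=1e-3):
    """Step 2 / Eq. (5): least-squares PSF estimate from one image pair,
    K = F^-1{ F*(sharp) F(coded) / (|F(sharp)|^2 + eps) }. The kernel is
    returned in FFT layout (origin at pixel (0, 0))."""
    S, C = fft2(sharp), fft2(coded)
    return np.real(ifft2(np.conj(S) * C / (np.abs(S) ** 2 + eps)))

def refine_psf(pairs, psf, tol, eps=1e-3):
    """Steps 3-4 / Eqs. (6)-(7): score every pair by its residual, keep
    the inliers, and re-estimate the PSF from all inliers at once."""
    def residual(sharp, coded):
        pred = np.real(ifft2(fft2(sharp) * fft2(psf)))  # circular conv.
        return float(np.mean((coded - pred) ** 2))
    inliers = [(s, c) for s, c in pairs if residual(s, c) < tol]
    if not inliers:
        return psf, 0
    num = sum(np.conj(fft2(s)) * fft2(c) for s, c in inliers)
    den = sum(np.abs(fft2(s)) ** 2 for s, c in inliers) + eps
    return np.real(ifft2(num / den)), len(inliers)
```

Step 5 then calls `refine_psf` repeatedly until the inlier count stops growing, and step 6 low-pass filters the result.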

To avoid overfitting the neural network, data enhancement of the PSFs across the various aberrations is necessary, including flips, rotations, zooms, random additive noise, etc. Training and validation datasets were then produced from original image datasets using these physically calibrated optical aberrations of the WFC camera. The original datasets in this work were as follows:

  • Far-infrared datasets, LLVIP [28] and FLIR-ADAS [29], from which 6,000 sharp images were randomly selected.
  • High-contrast visible datasets, DIV2K [30] and Flickr2K [31], from which 2,715 sharp images were randomly selected and gamma-corrected.

These sharp images were selected to generate a WFC encoded dataset. Each image of the dataset was convolved with a randomly varying PSF to obtain an intermediate encoded image.

Additive Gaussian white noise (AGWN) of 10∼20 dB was randomly added to make the dataset more realistic. The degradation function is given in Eq. (8):

$${I^{WFC}} = {I^{ori}} \otimes PS{F_{enhanced}} + {\varepsilon _{AGWN}}$$
where ${I^{WFC}}$ is the image in the generated WFC encoded dataset, ${I^{ori}}$ is the image in the original dataset, ${\otimes}$ is the convolution operator, $PS{F_{enhanced}}$ is a kernel randomly selected from the enhanced PSFs, and ${\varepsilon _{AGWN}}$ is random AGWN.

The process of generating the WFC dataset is shown in Fig. 7(b). We quickly obtained the dataset that conforms to the aberration distribution and sample diversity. Therefore, the approach to generating the synthesized WFC dataset reduces the work required to obtain the actual dataset and enriches the training data.
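A sketch of this degradation step is shown below. Reading the 10∼20 dB figure as a signal-to-noise ratio range is our interpretation; the PSF bank is assumed to hold the enhanced kernels described above.

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)

def encode(image, psf_bank, snr_db_range=(10.0, 20.0)):
    """Eq. (8): convolve a sharp image with a randomly selected enhanced
    PSF and add white Gaussian noise at a random SNR in snr_db_range."""
    psf = psf_bank[rng.integers(len(psf_bank))]
    coded = fftconvolve(image, psf / psf.sum(), mode="same")
    snr_db = rng.uniform(*snr_db_range)
    noise_power = np.mean(coded ** 2) / 10.0 ** (snr_db / 10.0)
    noise = rng.normal(scale=np.sqrt(noise_power), size=coded.shape)
    return coded + noise
```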

3.2 Generative adversarial network framework

The decoding model follows DeblurGAN [32], a classic deblurring framework composed of a generator and a discriminator and built on ResNet blocks. To reduce training time, the generator in this work used five residual blocks instead of nine, as shown in Fig. 8. The model restores intermediate encoded images to sharp decoded images at 256 × 256 resolution; it was implemented in the PyTorch deep learning framework and trained for approximately 56 hours on an NVIDIA GeForce RTX 2060. All subsequent experiments used the same computer configuration. The generator restores spatial information and suppresses noise during decoding, while the discriminator distinguishes the generator’s fake data from real data during training.

Fig. 8. Overview of the decoding model architecture.

The generator contained 3 downsampled convolutional layers, 5 ResBlocks, and 3 transposed convolution layers. Each ResBlock contained a convolution layer, instance normalization layer, and ReLU activation. The loss function of the generator was a combination of adversarial loss and content loss:

$$L = {L_{adv}} + 100{L_{content}}$$
where ${L_{adv}}$ is the adversarial loss and ${L_{content}}$ is the content loss. The adversarial loss of the generator follows WGAN-GP [33]:
$${L_{adv}} = \sum\limits_{n = 1}^N { - {D_{{\theta _D}}}} ({G_{{\theta _G}}}(I))$$
where ${D_{{\theta _D}}}$ and ${G_{{\theta _G}}}$ are the discriminator and generator with parameters ${\theta _D}$ and ${\theta _G}$, respectively, and N is the number of critic iterations per generator iteration.

The content loss was a perceptual loss, which was based on Euclidean distance [34]:

$${L_{content}} = \frac{1}{{{W_{i,j}}{H_{i,j}}}}\sum\limits_{x = 1}^{{W_{i,j}}} {\sum\limits_{y = 1}^{{H_{i,j}}} {{{({\phi _{i,j}}{{({I^S})}_{x,y}} - {\phi _{i,j}}{{({G_{{\theta _G}}}({I^B}))}_{x,y}})}^2}} }$$
where ${\phi _{i,j}}$ is the feature map obtained from the $j$-th convolution before the $i$-th max-pooling layer of the VGG19 network, and ${W_{i,j}}$ and ${H_{i,j}}$ are the width and height of the feature map.

The discriminator contained 5 convolutional layers, 4 LeakyReLU activations, and 3 InstanceNorm layers, identical to the 70 × 70 PatchGAN [35], with a 256 × 256 input. The discriminator loss adopted WGAN-GP [33].
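For concreteness, a PyTorch sketch of the reduced generator is given below. We follow the standard DeblurGAN layout (one 7 × 7 convolution plus two stride-2 convolutions, mirrored by two transposed convolutions and a final 7 × 7 convolution); the channel widths and kernel sizes are typical DeblurGAN choices and are assumptions, not values confirmed here.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block as in Section 3.2: conv + InstanceNorm + ReLU,
    wrapped in a skip connection."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.InstanceNorm2d(ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return x + self.body(x)

class Generator(nn.Module):
    """Reduced DeblurGAN-style generator with 5 ResBlocks instead of 9."""
    def __init__(self, in_ch=3, base=64):
        super().__init__()
        layers = [nn.Conv2d(in_ch, base, 7, padding=3),
                  nn.InstanceNorm2d(base), nn.ReLU(True)]
        ch = base
        for _ in range(2):  # downsampling convolutions
            layers += [nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1),
                       nn.InstanceNorm2d(ch * 2), nn.ReLU(True)]
            ch *= 2
        layers += [ResBlock(ch) for _ in range(5)]  # 5 residual blocks
        for _ in range(2):  # transposed convolutions back to full size
            layers += [nn.ConvTranspose2d(ch, ch // 2, 3, stride=2,
                                          padding=1, output_padding=1),
                       nn.InstanceNorm2d(ch // 2), nn.ReLU(True)]
            ch //= 2
        layers += [nn.Conv2d(ch, in_ch, 7, padding=3), nn.Tanh()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

g = Generator()
out = g(torch.randn(1, 3, 256, 256))  # 256 x 256 inputs, as in the text
assert out.shape == (1, 3, 256, 256)
```

The total loss of Eq. (9) is then `loss = adv_loss + 100 * content_loss`, with the adversarial term computed by the WGAN-GP critic and the content term by the L2 distance between VGG19 feature maps.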

4. Simulation results and analysis

The average PSNR values on the synthetic training and validation WFC datasets are 20.62 and 21 dB, respectively, and the average structural similarity index (SSIM) values are 0.8254 and 0.8164. Decoding results of the different approaches, including Wiener [17], LR [18], CycleGAN [36], and ours, are shown in Fig. 9. The results on the synthetic WFC dataset show that the proposed GAN framework effectively restores details and suppresses noise. The decoding speed of the proposed model was approximately 43 fps for 256 × 256 resolution images.
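For reference, the PSNR reported above can be computed as in the sketch below; for SSIM, a standard implementation such as `skimage.metrics.structural_similarity` can be used (a tooling choice on our part, not stated in the text).

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between the original (reference)
    image and the decoded image, as averaged over the dataset above."""
    ref = ref.astype(np.float64)
    test = test.astype(np.float64)
    mse = np.mean((ref - test) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak**2 / mse)
```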

Fig. 9. Decoding results on the synthetic WFC dataset (256 × 256 resolution). The columns from left to right are: original images, encoded images, Wiener, LR, CycleGAN, and ours.

In practice, assembly errors of the optical system lead to serious aberrations [37]. Traditional decoding algorithms rely on a fixed PSF, so the allowable installation error of the phase mask is very small, especially the z-tilt error. As shown in Fig. 10(a), a z-tilt error occurs whenever the HOP phase mask of the WFC camera is rotated, which changes the orientation of the PSFs. Traditional approaches must recalibrate the PSF every time the phase mask is reinstalled, which is not conducive to fast switching between the WFC and conventional camera modes. The GAN model, by contrast, can decode the intermediate encoded image when the PSF is unknown: after reassembling the phase mask, the network works well without recalibration or retraining. As shown in Fig. 10(b), as the PSF angle error increases, the simulated results of the traditional decoding algorithms deteriorate, while the GAN model still performs well. These comparisons show that the neural network model is robust to z-rotation aberrations.

Fig. 10. (a) Schematic diagram of the rotation effect. (b) Simulated results on binary images for different decoding algorithms: Wiener (uncalibrated), LR (uncalibrated), and ours.

5. Experimental results and analysis

In this section, experimental results on infrared imaging are presented. The camera's working distance is changed by manually adjusting the focus ring of the lens. The conventional camera produces sharp images only within the working distance range and blurred images beyond it; the greater the working distance, the greater the DOF.

An optical setup was built to compare the camera's performance in conventional and WFC modes on real-world imagery. The first image group was taken with the lens focused at 100 m and target object distances beyond 100 m, as shown in Fig. 11(a). Although the targets lay beyond the nominal working range of the WFC camera and the encoded image showed serious artifacts at the building edges, clear details could still be decoded. The second image group was taken with a 10 m focus distance and 10 m∼50 m target object distances, as shown in Fig. 11(b). Targets beyond 10 m were blurred in the conventional image but sharp in the decoded image; in addition, the obvious haze around some target edges in the encoded image was removed in the decoded image. The third image group was taken with a 3 m focus distance and 1.5 m∼11 m target object distances, as shown in Fig. 11(c). The results demonstrate that parts of the scene nearer or farther than the focus distance were clearly imaged after decoding. The proposed method is therefore capable of enhancing detail and suppressing noise in decoded real-world images. The model decodes at 3.8∼4.5 fps for 1024 × 768 resolution images.

Fig. 11. Experimental capture results of the conventional and WFC cameras at different object distances. (a) Far scene. (b) Middle scene. (c) Near scene. Left: image from the conventional camera. Center: encoded image from the WFC camera. Right: decoded image from the WFC camera. Below: magnified views of the red (green) box regions.

To quantify the imaging performance of the wavefront coding camera relative to the conventional camera, we applied the BRISQUE image quality evaluator [38] to the images in Fig. 11; a smaller BRISQUE score indicates better perceptual quality. The scores of the conventional, intermediate encoded, and decoded images are given in Table 4, showing that the proposed model achieves lower scores than the conventional camera in the actual experiment. Comparing real-world infrared images captured by the WFC camera at far, middle, and near object distances, the proposed approach clearly yields a significant improvement in image quality and signal-to-noise ratio.

Table 4. BRISQUE values of the conventional and WFC images

6. Conclusion and future work

In this paper, a deeply learned WFC computational imaging system with a large DOF is presented. It is based on a proposed HOP phase mask with improved defocus consistency and a far-infrared WFC camera with an F-number of 1. In the decoding stage, a WFC dataset covering various PSFs is generated and a GAN model is trained, which performs better than other methods in both simulation and experiment.

Future work could be focused on applying rotationally symmetric phase masks, improving the architecture of the neural networks, increasing computational efficiency, optimizing in an end-to-end optimization pipeline, and expanding the application field of WFC technology.

Funding

National Key Research and Development Program of China (2020YFB2007600); Natural Science Foundation of Jiangxi Province (20212BAB202026).

Acknowledgments

The authors thank the Shanghai Engineering Research Centre of Ultra-Precision Optical Manufacturing, Fudan University, Shanghai Engineering Research Center of AI & Robotics, and Engineering Research Center of AI & Robotics, Ministry of Education, for their support.

Disclosures

The authors declare no conflicts of interest.

Data availability

The authors declare that all data and methods supporting the results reported in this study are available within the main text. Additional data used for the study are available from the corresponding author upon reasonable request.

References

1. E. R. Dowski and W. T. Cathey, “Extended depth of field through wave-front coding,” Appl. Opt. 34(11), 1859–1866 (1995). [CrossRef]  

2. J. Wang, A. Amani, C. Zhu, and J. Bai, “Design of a compact varifocal panoramic system based on the mechanical zoom method,” Appl. Opt. 60(22), 6448–6455 (2021). [CrossRef]  

3. L. Sun, S. Sheng, W. Meng, Y. Wang, Q. Ou, and X. Pu, “Design of spherical aberration free liquid-filled cylindrical zoom lenses over a wide focal length range based on ZEMAX,” Opt. Express 28(5), 6806–6819 (2020). [CrossRef]  

4. S. Colburn, A. Zhan, and A. Majumdar, “Varifocal zoom imaging with large area focal length adjustable metalenses,” Optica 5(7), 825–831 (2018). [CrossRef]  

5. U. Akpinar, E. Sahin, M. Meem, R. Menon, and A. Gotchev, “Learning Wavefront Coding for Extended Depth of Field Imaging,” IEEE Transactions on Image Processing 30, 3307–3320 (2021). [CrossRef]  

6. C.-F. Lee and C.-C. Lee, “Application of a cubic phase plate to a reflecting telescope for extension of depth of field,” Appl. Opt. 59(14), 4410–4415 (2020). [CrossRef]  

7. H. Xie, Y. Su, M. Zhu, L. Yang, S. Wang, X. Wang, and T. Yang, “Athermalization of infrared optical system through wavefront coding,” Opt. Commun. 441, 106–112 (2019). [CrossRef]  

8. J. Chung, G. W. Martinez, K. C. Lencioni, S. R. Sadda, and C. Yang, “Computational aberration compensation by coded-aperture-based correction of aberration obtained from optical Fourier coding and blur estimation,” Optica 6(5), 647–661 (2019). [CrossRef]  

9. X. Wei, J. Han, S. Xie, B. Yang, X. Wan, and W. Zhang, “Experimental analysis of a wavefront coding system with a phase plate in different surfaces,” Appl. Opt. 58(33), 9195–9200 (2019). [CrossRef]  

10. H. Zhao, Q. Li, and H. Feng, “Improved logarithmic phase mask to extend the depth of field of an incoherent imaging system,” Opt. Lett. 33(11), 1171–1173 (2008). [CrossRef]  

11. V. N. Le, Z. Fan, M. N. Pham, and S. Chen, “Optimized square-root phase mask to generate defocus-invariant modulation transfer function in hybrid imaging systems,” Opt. Eng. 54, 1–7 (2015). [CrossRef]  

12. V. N. Le, S. Chen, and Z. Fan, “Optimized asymmetrical tangent phase mask to obtain defocus invariant modulation transfer function in incoherent imaging systems,” Opt. Lett. 39(7), 2171–2174 (2014). [CrossRef]  

13. M. Takahashi and S. Komatsu, “Evaluation of inverse tangent phase mask in wavefront coding,” in 22nd Microoptics Conference (MOC) (IEEE, 2017), 158–159.

14. N. Caron and Y. Sheng, “Polynomial phase masks for extending the depth of field of a microscope,” Appl. Opt. 47(22), E39–E43 (2008). [CrossRef]  

15. Y. Takahashi and S. Komatsu, “Optimized free-form phase mask for extension of depth of field in wavefront-coded imaging,” Opt. Lett. 33(13), 1515–1517 (2008). [CrossRef]  

16. Y. Lu, T. Zhao, X. Zhang, R. Qiu, A. Liu, R. Chen, and F. Yu, “Integrative optimization of the practical wavefront coding systems for depth-of-field extension,” Optik 144, 621–627 (2017). [CrossRef]  

17. J. Arines, R. O. Hernandez, S. Sinzinger, A. Grewe, and E. Acosta, “Wavefront-coding technique for inexpensive and robust retinal imaging,” Opt. Lett. 39(13), 3986–3988 (2014). [CrossRef]  

18. W. H. Richardson, “Bayesian-Based Iterative Method of Image Restoration,” J. Opt. Soc. Am. 62(1), 55–59 (1972). [CrossRef]  

19. R. N. Zahreddine, R. H. Cormack, and C. J. Cogswell, “Noise removal in extended depth of field microscope images through nonlinear signal processing,” Appl. Opt. 52(10), D1–D11 (2013). [CrossRef]  

20. H. Du, L. Dong, M. Liu, Y. Zhao, W. Jia, X. Liu, M. Hui, L. Kong, and Q. Hao, “Image Restoration Based on Deep Convolutional Network in Wavefront Coding Imaging System,” in Digital Image Computing: Techniques and Applications (DICTA, 2018), 1–8.

21. S. Tan, Y. Wu, S.-I. Yu, and A. Veeraraghavan, “Codedstereo: Learned phase masks for large depth-of-field stereo,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021), 7170–7179.

22. S. Elmalem, R. Giryes, and E. Marom, “Learned phase coded aperture for the benefit of depth of field extension,” Opt. Express 26(12), 15316–15331 (2018). [CrossRef]  

23. V. Sitzmann, S. Diamond, Y. Peng, X. Dun, S. Boyd, W. Heidrich, F. Heide, and G. Wetzstein, “End-to-end optimization of optics and image processing for achromatic extended depth of field and super-resolution imaging,” ACM Trans. Graph. 37(4), Article 114 (2018).

24. U. Akpinar, E. Sahin, and A. Gotchev, “Learning Optimal Phase-Coded Aperture for Depth of Field Extension,” in IEEE International Conference on Image Processing (ICIP) (2019), 4315–4319.

25. J. Zhang, Y. Huang, and F. Xiong, “Short-focus and ultra-wide-angle lens design in wavefront coding,” International Symposium on Optoelectronic Technology and Application (SPIE, 2016), Vol. 10154.

26. H. Du, L. Dong, M. Liu, Y. Zhao, Y. Wu, X. Li, W. Jia, X. Liu, M. Hui, and L. Kong, “Increasing aperture and depth of field simultaneously with wavefront coding technology,” Appl. Opt. 58(17), 4746–4752 (2019). [CrossRef]  

27. J. Duan, G. Meng, S. Xiang, and C. Pan, “Removing out-of-focus blur from similar image pairs,” in IEEE International Conference on Acoustics, Speech and Signal Processing (2013), 1617–1621.

28. X. Jia, C. Zhu, M. Li, W. Tang, and W. Zhou, “LLVIP: A Visible-infrared Paired Dataset for Low-light Vision,” in IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) (2021), 3489–3497.

29. Teledyne FLIR, “FREE Teledyne FLIR Thermal Dataset for Algorithm Training,” (2021), https://www.flir.in/oem/adas/adas-dataset-form/.

30. E. Agustsson and R. Timofte, “NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study,” in IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2017), 1122–1131.

31. E. Agustsson and R. Timofte, “NTIRE 2017 Challenge on Single Image Super-Resolution: Methods and Results,” in IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2017), 1110–1121.

32. O. Kupyn, V. Budzan, M. Mykhailych, D. Mishkin, and J. Matas, “DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018), 8183–8192.

33. I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville, “Improved training of wasserstein GANs,” in Proceedings of the 31st International Conference on Neural Information Processing Systems (Curran Associates Inc., 2017), pp. 5769–5779.

34. J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual Losses for Real-Time Style Transfer and Super-Resolution,” in Computer Vision – ECCV (Springer, 2016), 694–711.

35. P. Isola, J. Zhu, T. Zhou, and A. A. Efros, “Image-to-Image Translation with Conditional Adversarial Networks,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), 5967–5976.

36. J. Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks,” in IEEE International Conference on Computer Vision (ICCV) (2017), 2242–2251.

37. Q. Fan, W. Xu, X. Hu, W. Zhu, T. Yue, C. Zhang, F. Yan, L. Chen, H. J. Lezec, Y. Lu, A. Agrawal, and T. Xu, “Trilobite-inspired neural nanophotonic light-field camera with extreme depth-of-field,” Nat. Commun. 13(1), 2130 (2022). [CrossRef]  

38. A. Mittal, A. K. Moorthy, and A. C. Bovik, “No-Reference Image Quality Assessment in the Spatial Domain,” IEEE Trans. on Image Process. 21(12), 4695–4708 (2012). [CrossRef]  


