
Image restoration for real-world under-display imaging

Open Access

Abstract

The under-display imaging technique was recently proposed to enlarge the screen-to-body ratio of full-screen devices. However, existing image restoration algorithms have difficulty generalizing to real-world under-display (UD) images, especially images containing strong light sources. To address this issue, we propose a novel method for building a synthetic dataset (the CalibPSF dataset) and introduce a two-stage neural network to solve the under-display imaging degradation problem. The CalibPSF dataset is generated using the calibrated high dynamic range point spread function (PSF) of the under-display optical system and contains various simulated light sources. The two-stage network solves the color distortion and diffraction degradation in order. We evaluate the performance of our algorithm on our captured real-world test set. Comprehensive experiments demonstrate the superiority of our method in scenes of different dynamic ranges.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

With the increasing demand for full-screen mobile devices, under-display imaging technology has developed rapidly in recent years. As a new type of optical imaging system, under-display imaging requires the display to be placed in front of the camera. With the help of an appropriate restoration algorithm, good imaging quality can be maintained while redundant bezels and buttons are removed, yielding a higher screen-to-body ratio. In addition, under-display imaging can provide better human-computer interaction. By placing the camera in the center of the display, the teleconference experience is enhanced through perfect line-of-sight tracking, which is increasingly suitable for larger display devices such as laptops and TVs [1]. However, the image quality of under-display devices is degraded. The display attenuates the energy of the transmitted light and introduces color distortion. In addition, since the display has a microscale periodic structure, the optical diffraction effect is amplified, which leads to diffraction degradation. Under-display imaging therefore faces multifaceted challenges. First, it needs to solve various types of image degradation, including color distortion, blurring, noise, and diffraction artifacts around light sources. In addition, the difficulty of image registration [2–4] limits the possibility of capturing a large-scale dataset. Although current image dediffraction technology has made great progress [5,6], existing algorithms have limited generalization ability for real-world under-display images, especially for images containing strong light sources.

Recently, Zhou et al. [5] first proposed the image degradation model for under-display imaging as:

$$\mathbf{y} = (\gamma \mathbf{x})*\mathbf{k}+\mathbf{n} .$$
where $\mathbf{x}$ is the target object and $\mathbf{y}$ is the degraded observation, $\gamma$ is the intensity scaling factor denoting the degree of color shift, $\mathbf{n}$ is zero-mean signal-dependent noise, and $\mathbf{k}$ is the diffraction PSF. For a monochromatic plane wave with amplitude 1 and wavelength $\lambda$, the PSF $\boldsymbol{k}_\lambda$ is proportional to the Fourier transform of the transmittance function:
$$\boldsymbol{k}_{\lambda}(u,v,\lambda) \propto \left| \iint^{\infty }_{-\infty } 1 \cdot T^{U\!D}_p (m,n,\lambda)\exp\!\left[{-}j\frac{2\pi}{\lambda f}(mu+nv)\right] \,dm\,dn \right|^2 .$$
where $T^{U\!D}_p(m,n,\lambda )$ is the transmittance function of the display and f is the focal length. However, the display is assumed to be at the principal plane of the lens, which may not be satisfied in real scenarios. This inspires us to calibrate the PSF of the optical system experimentally through multiexposure images. Although Zhou et al. obtained impressive results on their monitor-camera imaging system (MCIS) dataset, they did not present the performance on real-world images, especially results for diffraction artifacts around light sources.

Traditional image restoration technology usually starts from the image degradation model, finds appropriate image priors, and designs a reasonable optimization algorithm to output the final results. Existing image priors include the local smoothing prior [7,8], gradient sparsity prior [9,10], dark channel prior [11,12], and network-based image priors [13–16]. However, the performance of traditional algorithms depends heavily on the accuracy of the degradation model. Recently, deep learning has made great progress in the field of low-level vision, such as denoising [17–20], super-resolution [21–23], deblurring [24–27], high dynamic range imaging [28–31], reflection removal [32–34], flare removal [35], and optical aberration correction [36,37]. As deep learning is a data-driven method and can effectively fit the complex mapping process, it is suitable for under-display image restoration, which is a complex problem with various degradations.

In this paper, we propose a novel pipeline for building a synthetic dataset and train a two-stage neural network for under-display image restoration. Specifically, we capture a real-world dataset and carefully select well-aligned image pairs. We use the real-world image pairs to generate a color-correct dataset. In addition, we calibrate the PSF of our under-display camera, normalize its intensity, and use it to generate the CalibPSF dataset. We introduce artificial light sources into the dataset so that our network can effectively deal with diffraction artifacts. The two-stage network contains 1) a color-correct network (CCN), which is responsible for color correction, and 2) an image restoration network (IRN), which deals with diffraction degradation. The CCN is trained on the color-correct dataset, and the IRN is trained on the CalibPSF dataset. At the inference stage, the original UD image is first sent to the CCN to align its color with that of the no-display (ND) image. Then the color-corrected UD images are fed into the IRN to output our final solution. Note that our model needs two differently exposed UD images (marked as EV0 and EV-3) to address the diffraction artifacts. All the proposed methods and pipelines operate in the RAW domain. We test the performance of our model on a real-world test set containing various scenes of different dynamic ranges. Experimental results show that our algorithm can effectively deal with color distortion and diffraction degradation simultaneously and achieves competitive objective scores compared with other methods. As shown in Fig. 1, our results even surpass the ND images visually with better contrast. Our final image restoration model is shown in Fig. 2.

Fig. 1. Our results for under-display image restoration. The final solution is even sharper and clearer than the no-display (ND) image. In addition, diffraction artifacts around the light source are removed effectively.

Fig. 2. Our two-stage algorithm. We feed the EV0 UD image into the color-correct network and use the same output color vector to rectify the color of the EV0 and EV-3 UD images. Then we put two color-corrected UD images into the image restoration network to output our final solution.

2. Analysis of the optical system

In this paper, we focus on solving under-display image degradation for OLED displays. Note that the display is nonactive in our setting since, in a real scenario, the OLED can be turned off locally when the camera is in operation to 1) reduce unnecessary difficulty from display contents while not affecting the user experience and 2) indicate the status of the device to users and thus ensure privacy. This strategy is commonly adopted by most under-display smartphones at present.

We first calculate the PSF from the optical principle. Then we acquire the calibrated PSF through a multiexposure calibration experiment. We denote the calculated PSF as SynPSF and the calibrated PSF as CalibPSF. The SynPSFs and CalibPSFs are shown in Fig. 3, and the acquisition processes are described as follows.

Fig. 3. Our display pattern with the calculated PSFs and calibrated PSFs. (a) The transmittance function of the display. (b) The pupil function. (c) UD SynPSF. (d) ND SynPSF. (e) UD CalibPSF. (f) ND CalibPSF.

2.1 Calculating the SynPSFs

Our goal is to calculate a three-channel PSF for both the under-display and no-display cameras. We model the display as a transparency with complex amplitude transmittance $T^{U\!D}(m,n)$ at Cartesian coordinates $(m, n)$. Let the camera pupil function be $P(m, n)$, which is shown in Fig. 3(b). Then the transmittance function of the display in Fig. 3(a) becomes:

$$T^{U\!D}_p(m,n) = T^{U\!D}(m,n) \cdot P(m,n) .$$

For the no-display case, as there is no display in front of the camera, the transmittance function equals the pupil function:

$$T^{N\!D}_p(m,n) = 1 \cdot P(m,n) = P(m,n) .$$

For a monochromatic plane wave with a wavelength of $\lambda$, the amplitude attenuation coefficient of the display is assumed to be $t_p(\lambda )$. Considering the amplitude attenuation, the above transmittance functions are modified to:

$$T^{U\!D}_p(m,n,\lambda) = t_p(\lambda) \cdot T^{U\!D}_p(m,n) = t_p(\lambda) \cdot T^{U\!D}(m,n) \cdot P(m,n) .$$
$$T^{N\!D}_p(m,n,\lambda) = 1 \cdot T^{N\!D}_p(m,n) = P(m,n) .$$

We substitute (5) into (2) to calculate the under-display PSF:

$$\boldsymbol{k^{U\!D}_\lambda}(u,v,\lambda) \propto \left| \mathscr{F}(T^{U\!D}_p(m,n,\lambda)) \right|^2 = {t_p}^2(\lambda) \cdot \left| \mathscr{F}(T^{U\!D}_p(m,n)) \right|^2 .$$
where $\mathscr {F}(\cdot )$ is the Fourier transform operator. We denote the energy attenuation coefficient of the display as $\gamma _d(\lambda )$. The coefficient can be modeled as:
$$\gamma_d(\lambda) = \frac{\sum_{u,v} \left| \mathscr{F}(T^{U\!D}_p(m,n,\lambda)) \right|^2}{\sum_{u,v} \left| \mathscr{F}(T^{N\!D}_p(m,n,\lambda)) \right|^2} = \frac{\sum_{u,v} \left| \mathscr{F}(t_p(\lambda) \cdot T^{U\!D}_p(m,n)) \right|^2}{\sum_{u,v} \left| \mathscr{F}(P(m,n)) \right|^2} \propto {t_p}^2(\lambda) .$$

Thus the under-display PSF is:

$$\boldsymbol{k^{U\!D}_\lambda}(u,v,\lambda) \propto \gamma_d(\lambda) \cdot \left| \mathscr{F}(T^{U\!D}_p(m,n)) \right|^2 .$$

To obtain a three-channel kernel, we weight $\boldsymbol{k^{U\!D}_\lambda}$ by the sensor’s spectral energy attenuation curve $\gamma _s(\lambda )$. For a channel c, the spectral curve of the channel is denoted $\gamma _{s,c}(\lambda )$. Then the PSF is:

$$\boldsymbol{k^{U\!D}_c}(u,v) \propto \sum_{\lambda} \Big(\gamma_{s,c}(\lambda) \cdot \boldsymbol{k^{U\!D}_\lambda}(u,v,\lambda)\Big) \propto \sum_{\lambda} \Big(\gamma_{s,c}(\lambda) \cdot \gamma_d(\lambda) \cdot \left| \mathscr{F}_{\downarrow r(\lambda)}(T^{U\!D}_p(m,n)) \right|^2\Big) .$$
where c represents the R, G, or B channel. For the discrete Fourier transform (DFT), $\mathscr {F}(T^{U\!D}_p(m,n))$ needs spatial downsampling, and the downsampling factor r is a function of the wavelength $\lambda$. Details can be found in [5]. Similarly, we replace $T^{U\!D}_p(m,n,\lambda )$ with $T^{N\!D}_p(m,n,\lambda )$ in (2), and then the ND PSF can be calculated by:
$$\boldsymbol{k^{N\!D}_c}(u,v) \propto \sum_{\lambda} \Big(\gamma_{s,c}(\lambda) \cdot \left| \mathscr{F}_{\downarrow r(\lambda)}(T^{N\!D}_p(m,n)) \right|^2\Big) .$$

Finally, we normalize the SynPSFs. Thus the three-channel SynPSFs become:

$$\boldsymbol{k^{U\!D}_{syn}}(u,v) = \Big(\frac{\boldsymbol{k^{U\!D}_r}(u,v)}{\sum_{u,v}{\boldsymbol{k^{U\!D}_r}(u,v)}}, \frac{\boldsymbol{k^{U\!D}_g}(u,v)}{\sum_{u,v}{\boldsymbol{k^{U\!D}_g}(u,v)}}, \frac{\boldsymbol{k^{U\!D}_b}(u,v)}{\sum_{u,v}{\boldsymbol{k^{U\!D}_b}(u,v)}}\Big) .$$
$$\boldsymbol{k^{N\!D}_{syn}}(u,v) = \Big(\frac{\boldsymbol{k^{N\!D}_r}(u,v)}{\sum_{u,v}{\boldsymbol{k^{N\!D}_r}(u,v)}}, \frac{\boldsymbol{k^{N\!D}_g}(u,v)}{\sum_{u,v}{\boldsymbol{k^{N\!D}_g}(u,v)}}, \frac{\boldsymbol{k^{N\!D}_b}(u,v)}{\sum_{u,v}{\boldsymbol{k^{N\!D}_b}(u,v)}}\Big) .$$

We calibrate the display’s spectral response curve $\gamma _d(\lambda )$ and the sensor’s curve $\gamma _s(\lambda )$, which are shown in Fig. 4. The calculated three-channel PSFs are shown in Fig. 3(c) and (d), which are $\boldsymbol{k^{U\!D}_{syn}}(u,v)$ and $\boldsymbol{k^{N\!D}_{syn}}(u,v)$, respectively.
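For clarity, the per-channel SynPSF of Eqs. (10)–(13) reduces to a wavelength-weighted sum of squared Fourier magnitudes of the aperture. Below is a minimal NumPy sketch of this computation; the aperture stack, spectral curves, and the wavelength-dependent resampling $r(\lambda)$ (omitted here for brevity) are assumptions standing in for the actual calibrated data.

```python
import numpy as np

def syn_psf_channel(T_p, gamma_s_c, gamma_d=None):
    """One color channel of the SynPSF (Eqs. (10)-(13); resampling r(lambda) omitted).

    T_p       : (W, N, N) stack of complex apertures, one per sampled wavelength
                (display transmittance times pupil mask, assumed pre-resampled)
    gamma_s_c : (W,) sensor response of this channel at the sampled wavelengths
    gamma_d   : (W,) display energy attenuation, or None for the no-display case
    """
    psf = np.zeros(T_p.shape[1:], dtype=np.float64)
    for i in range(T_p.shape[0]):
        intensity = np.abs(np.fft.fftshift(np.fft.fft2(T_p[i]))) ** 2   # |F(T_p)|^2
        weight = gamma_s_c[i] * (gamma_d[i] if gamma_d is not None else 1.0)
        psf += weight * intensity
    return psf / psf.sum()    # per-channel normalization as in Eqs. (12)-(13)
```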

Fig. 4. Two spectral response curves. (a) Display’s curve $\gamma _d(\lambda )$. (b) Sensor’s curve $\gamma _s(\lambda )$.

2.2 Calibrating the CalibPSFs

To calibrate the UD and ND CalibPSFs, we place a light source behind a pinhole in a dark room to simulate a point light source. Note that the pinhole has a finite size and can be considered an ideal point only if it is placed far enough from the camera. Given the pinhole size, the focal length, and the sensor’s pixel size, we calculate the critical distance through the Gaussian lens formula. The distance between the pinhole and our imaging system is set to be larger than the critical distance, so that the image of the pinhole on the sensor is smaller than one pixel and the pinhole can be considered an ideal point.
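As a reference, the critical distance follows from thin-lens geometry: the geometric image of the pinhole must be no larger than one sensor pixel. The sketch below illustrates this relation; the numeric values are purely illustrative and are not our hardware parameters.

```python
def critical_distance(pinhole_diam_mm, focal_length_mm, pixel_pitch_mm):
    """Smallest object distance at which the pinhole images onto at most one pixel.

    Thin-lens geometry: image size = pinhole * f / (d - f) <= pixel
    =>  d >= f * (1 + pinhole / pixel).
    """
    return focal_length_mm * (1.0 + pinhole_diam_mm / pixel_pitch_mm)

# illustrative values only: 0.1 mm pinhole, 4 mm focal length, 1 um pixel pitch
d_crit_mm = critical_distance(0.1, 4.0, 0.001)   # 404 mm
```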

We capture 16 differently exposed images of the point light source with 16 frames per exposure. Then, we average the 16 frames per exposure to suppress noise and improve calibration accuracy. Finally, we synthesize the denoised multiexposure image into a single high dynamic range (HDR) image of the point light source and normalize the intensity of the three channels separately. We denote the i-th LDR image from the 16 differently exposed images as $L_i$, and its exposure time and gain are denoted as $t_i$ and $g_i$, respectively. The HDR image $H$ can be calculated through Algorithm 1.
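Algorithm 1 is not reproduced here; the sketch below gives one plausible reading of it, assuming linear raw frames that have already been averaged per exposure and are merged by normalizing each unsaturated pixel by its exposure-gain product.

```python
import numpy as np

def merge_hdr(ldr_frames, exposure_times, gains, sat_level=0.95):
    """Merge averaged LDR frames L_i into an HDR point-source image H (cf. Algorithm 1).

    ldr_frames     : list of float arrays in [0, 1], one noise-averaged frame per exposure
    exposure_times : list of exposure times t_i (ms)
    gains          : list of analog gains g_i
    """
    num = np.zeros_like(ldr_frames[0])
    den = np.zeros_like(ldr_frames[0])
    for L, t, g in zip(ldr_frames, exposure_times, gains):
        valid = (L < sat_level).astype(np.float64)   # ignore saturated pixels
        num += valid * L / (t * g)                   # scale back to relative scene radiance
        den += valid
    H = num / np.maximum(den, 1e-8)
    return H   # the three channels are normalized separately afterwards, as in the text
```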

We calibrate both the UD PSF and the ND PSF shown in Fig. 5 with the exposure settings listed in Table 1 and align them spatially with a gravity-center alignment algorithm. Note that although the ND CalibPSF is still larger than one pixel, this is caused by optical aberrations and diffraction rather than by the pinhole, which is placed beyond the critical distance.

Fig. 5. Our multiexposure experiment to acquire the CalibPSFs. We calculate the HDR UD and ND CalibPSFs from the multiexposure images and normalize the PSFs’ RGB channels. (a) Captured multiexposure UD images. (b) Captured multiexposure ND images. (c) Normalized UD CalibPSF. (d) Normalized ND CalibPSF.

Table 1. Exposure parameter settings in the multiexposure experiment. We record the exposure time (ms) and gain for UD images in Fig. 5(a) and ND images in Fig. 5(b).

3. Under-display imaging dataset

The color distortion problem is related to illumination conditions. The CalibPSFs are calibrated under a specific illumination condition, which means that the PSFs cannot adequately simulate the color distortion in complicated scenes. Therefore, we normalize the PSFs to generate the CalibPSF dataset and build the color-correct dataset to specifically solve the color-distortion problem.

We capture a real-world under-display imaging dataset containing UD and ND image pairs. The real-world dataset is split into two parts: one contains most of the image pairs and is used to generate the color-correct dataset, while the other is used to evaluate the performance of different methods. We denote the former as the real-world training set and the latter as the real-world test set. In addition, we use our normalized CalibPSFs to generate the CalibPSF dataset. The color-correct dataset is the training set for the CCN, and the CalibPSF dataset is the training set for the IRN. To show the superiority of our datasets, we use the SynPSFs to generate a SynPSF dataset and capture an MCIS dataset. To demonstrate the effectiveness of our two-stage network, we use the unnormalized CalibPSFs to generate a CalibPSF end2end dataset.

3.1 Capturing the real-world dataset

Our captured real-world dataset contains various indoor and outdoor scenes, including many high dynamic range scenes with different types of light sources. When taking a UD image, we capture an EV0 (normal-exposure) frame and an EV-3 (under-exposure) frame. For the corresponding ND image, we preserve only the normal-exposure frame, which has the same exposure settings as the EV0 UD image. We capture 5 frames per exposure and average them to suppress noise. After that, we apply a feature-point-based registration algorithm to the UD-ND image pairs. Since most image pairs still contain misregistered areas, we cut the images into patches and manually select well-aligned UD-ND pairs. Some samples of our real-world dataset are shown in Fig. 6.

Fig. 6. Our captured real-world dataset with UD and ND images of different dynamic ranges.

3.2 Generating the color-correct dataset

We use the normally exposed image pairs in the real-world training set and calculate the average RGB values of the UD and ND images separately. Then, we divide the ND average values by those of the UD image to obtain a color gain vector that indicates the color difference between the UD and ND images. After that, we multiply the original UD image by the color gain vector to obtain a color-corrected UD GT image. The UD and color-corrected UD GT pairs form the color-correct dataset. When training the CCN, we input the original normal-exposure UD image and use the color-corrected UD GT image as the reference to supervise the output (the color-corrected UD image). Our pipeline to generate the color-correct dataset is shown in Fig. 7, and a minimal sketch of it follows.
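The sketch assumes registered, white-balanced linear RGB arrays for the UD and ND images; the variable names are illustrative.

```python
import numpy as np

def color_correct_gt(ud_rgb, nd_rgb):
    """Build the color gain vector and the color-corrected UD GT from a UD/ND pair.

    ud_rgb, nd_rgb : float arrays of shape (H, W, 3) in linear RGB, range [0, 1]
    """
    gain = nd_rgb.reshape(-1, 3).mean(axis=0) / ud_rgb.reshape(-1, 3).mean(axis=0)
    ud_gt = np.clip(ud_rgb * gain, 0.0, 1.0)     # color-corrected UD ground truth
    return ud_gt, gain
```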

Fig. 7. Our color-correct pipeline.

3.3 Generating the CalibPSF dataset

We use the two normalized CalibPSFs to generate the CalibPSF dataset. Specifically, we collect normally exposed raw images from Adobe5K as our source materials. The source raw image is first downsampled [38] to output a three-channel scene image. To address diffraction artifacts, we generate several artificial light sources (random light images) with random shapes, sizes and intensities. We denote the superposition of the random light image and scene image as the hypothetical HDR scene, which indicates the light intensity distribution of the hypothetical scene:

$$I_h = I_s + I_l .$$
where $I_h$, $I_s$, and $I_l$ are the hypothetical HDR scene, the scene image, and the random light image, respectively. When creating the normal-exposure UD image $x_0$, we convolve $I_h$ with the normalized UD PSF and perform Bayer sampling to acquire a raw UD image. Afterward, we decrease its intensity to obtain the under-exposure UD image $x_{-3}$. We add heteroscedastic Gaussian noise [39,40] to the generated UD images:
$$n(I)\sim \mathcal{N}(0, \sigma_s I + \sigma_r) .$$
where the noise variance depends on the pixel intensity $I$. The noise parameters $\sigma _s$ and $\sigma _r$ are determined by the gain values and camera sensor. Then, we clip the intensity range of both UD images into [0, 1]. The simulation becomes:
$$x_0 = \mathrm{clip}\Big(I_h * \boldsymbol{k^{U\!D}_{calib}} + n(I_h * \boldsymbol{k^{U\!D}_{calib}})\Big) .$$
$$x_{{-}3} = \mathrm{clip}\Big(\frac{1}{8}I_h * \boldsymbol{k^{U\!D}_{calib}} + n\Big(\frac{1}{8}I_h * \boldsymbol{k^{U\!D}_{calib}}\Big)\Big) .$$

Theoretically, the normal-exposure ND image $y_0$ would be synthesized by replacing $\boldsymbol{k^{U\!D}_{calib}}$ with $\boldsymbol{k^{N\!D}_{calib}}$. However, to instruct our network to output sharper and clearer results, we skip the convolution on $I_s$ and convolve only $I_l$ with the normalized ND PSF. That is:

$$y_0 = clip\Big(I_s + I_l * \boldsymbol{k^{N\!D}_{calib}}\Big) .$$

The synthetic UD-ND image pairs are fed into the IRN for training. Although we calibrate the PSFs at a fixed object distance, the experimental results show that our model generalizes well to real-world data of different depths. Our simulation pipeline is shown in Fig. 8, and a compact sketch of the pair synthesis follows.
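The sketch below puts Eqs. (14)–(18) together (per-channel 2-D convolution, heteroscedastic noise, intensity clipping); the Bayer sampling step is omitted for brevity, and the PSFs, scene image, and light image are assumed to be given.

```python
import numpy as np
from scipy.signal import fftconvolve

def conv3(img, psf):
    """Per-channel 2-D convolution of an (H, W, 3) image with an (h, w, 3) PSF."""
    return np.stack([fftconvolve(img[..., c], psf[..., c], mode="same")
                     for c in range(3)], axis=-1)

def add_noise(img, sigma_s, sigma_r):
    """Heteroscedastic Gaussian noise of Eq. (15): variance = sigma_s * I + sigma_r."""
    return img + np.random.normal(size=img.shape) * np.sqrt(sigma_s * img + sigma_r)

def synth_pair(I_s, I_l, k_ud, k_nd, sigma_s=1e-5, sigma_r=1e-5):
    I_h = I_s + I_l                                                         # Eq. (14)
    x0 = np.clip(add_noise(conv3(I_h, k_ud), sigma_s, sigma_r), 0, 1)       # EV0 UD,  Eq. (16)
    xm3 = np.clip(add_noise(conv3(I_h / 8, k_ud), sigma_s, sigma_r), 0, 1)  # EV-3 UD, Eq. (17)
    y0 = np.clip(I_s + conv3(I_l, k_nd), 0, 1)                              # sharp ND target, Eq. (18)
    return x0, xm3, y0
```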

Fig. 8. Our pipeline to generate the CalibPSF dataset.

3.4 Other datasets

To prove the superiority of the CalibPSFs, we replace the CalibPSFs with the SynPSFs and use the same pipeline to generate a SynPSF dataset. We use the unnormalized CalibPSFs to generate a CalibPSF end2end dataset, which is used to train an end-to-end model. In addition, we also capture pictures displayed on an HD monitor and create an MCIS dataset [5]. The real-world training set is also used to train a model for the comparative experiment.

4. Image restoration network

For the CCN, we build a VGG-based neural network (as shown in Fig. 9) to rectify the color distortion. The network calculates a vector after several convolution and downsampling operations. The vector is multiplied by the input UD image to obtain the color-corrected UD image. We use the $L_1$ loss to train the CCN. The output of the network is denoted as $\hat {I}_C$, while the color-corrected UD GT image is $I_C$. The color loss function $L_C$ is:

$$L_C = \big\| \hat{I}_C - I_C \big\| _1 \;$$
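A minimal PyTorch sketch of the CCN described above is given below; the layer widths and depths are illustrative assumptions rather than the exact architecture of Fig. 9, and the input is the EV0 UD image packed into its four Bayer channels.

```python
import torch
import torch.nn as nn

class ColorCorrectNet(nn.Module):
    """Predict a per-channel gain vector [R, G, G, B] from the packed EV0 UD image."""
    def __init__(self, in_ch=4, width=32):
        super().__init__()
        layers, ch = [], in_ch
        for w in (width, width * 2, width * 4, width * 8):   # VGG-style conv + downsample
            layers += [nn.Conv2d(ch, w, 3, padding=1), nn.ReLU(inplace=True),
                       nn.Conv2d(w, w, 3, stride=2, padding=1), nn.ReLU(inplace=True)]
            ch = w
        self.features = nn.Sequential(*layers)
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(ch, in_ch), nn.Softplus())   # positive gains

    def forward(self, x):
        gain = self.head(self.features(x))           # (B, 4) color gain vector
        return x * gain[:, :, None, None], gain      # color-corrected UD image and the gains
```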

Fig. 9. Our VGG-based color-correct network.

For the IRN, we use our CalibPSF dataset to train a UNet-based convolutional neural network. We split the UD image into four channels according to its Bayer pattern and concatenate the normal-exposure image and under-exposure image to obtain an eight-channel input. The network accepts the input and calculates feature maps of four different scales. For each scale, we replace the convolution layers with a ResBlock. In addition, we add a global connection between the input and output to conduct residual learning. Our IRN is shown in Fig. 10. We minimize the $L_1$ loss between the output $\hat {I}_{R}$ and ND image $I_{R}$ in the intensity and gradient domains. Thus, the restoration loss function $L_R$ can be expressed as:

$$L_R = \big\| \hat{I}_R - I_R \big\| _1 + \lambda \big\| \nabla\hat{I}_R - \nabla I_R \big\| _1 \;$$
where $\lambda$ is used to balance two loss functions.
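For reference, the restoration loss of Eq. (20) can be implemented with finite-difference gradients; the sketch below uses simple forward differences, and the weight $\lambda$ is a placeholder value.

```python
import torch.nn.functional as F

def image_gradients(img):
    """Forward-difference gradients of a (B, C, H, W) tensor along height and width."""
    dy = img[:, :, 1:, :] - img[:, :, :-1, :]
    dx = img[:, :, :, 1:] - img[:, :, :, :-1]
    return dy, dx

def restoration_loss(pred, target, lam=0.5):
    """L_R = ||pred - target||_1 + lam * ||grad(pred) - grad(target)||_1 (Eq. (20))."""
    dy_p, dx_p = image_gradients(pred)
    dy_t, dx_t = image_gradients(target)
    return (F.l1_loss(pred, target)
            + lam * (F.l1_loss(dy_p, dy_t) + F.l1_loss(dx_p, dx_t)))
```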

The above two networks solve the color distortion and diffraction degradation, respectively, and we then cascade them to form our final image restoration model. When testing on real-world data, we first input the normal-exposure UD image into the CCN and obtain the predicted color gain vector. Then we use the vector to correct both the EV0 UD image and the EV-3 UD image and feed them into the IRN to obtain our final output, as shown in Fig. 2. This inference procedure can be summarized as follows.
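In the sketch below, `ccn`, `irn`, and `pack_bayer` are placeholders for the trained networks and the Bayer-packing step; it only illustrates the data flow of Fig. 2.

```python
import torch

def restore(ud_ev0_raw, ud_evm3_raw, ccn, irn, pack_bayer):
    """Two-stage inference: predict a color gain from EV0, apply it to both exposures,
    then restore with the IRN on the concatenated eight-channel input."""
    ev0 = pack_bayer(ud_ev0_raw)                        # (1, 4, H/2, W/2)
    evm3 = pack_bayer(ud_evm3_raw)
    with torch.no_grad():
        _, gain = ccn(ev0)                              # color gain predicted from EV0 only
        ev0_cc = ev0 * gain[:, :, None, None]
        evm3_cc = evm3 * gain[:, :, None, None]         # the same gain rectifies both exposures
        out = irn(torch.cat([ev0_cc, evm3_cc], dim=1))  # eight-channel input to the IRN
    return out
```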

Fig. 10. Our UNet-based image restoration network.

5. Experimental results

We use the captured real-world test set mentioned in Section 3.1 to evaluate the performance of our method. The test set contains 50 images of normal-intensity scenes and 40 images containing strong light sources. We evaluate the performance of our two networks separately. We compare the CCN with a fixed-coefficient correction method. We train the IRN on the CalibPSF dataset, SynPSF dataset, real-world training set, and MCIS dataset, respectively. For the real-world dataset and the MCIS dataset, we use the color-corrected UD GT image as the input so that the IRN only processes image pairs without color distortion. In addition, we train an end-to-end model using the CalibPSF end2end dataset. We assess the final outputs of the corresponding cascaded models.

Details of the training sets are as follows. The real-world training set contains 2354 image pairs with 1632$\times$1632 resolution. As the color-correct dataset is derived from the real-world training set, it has the same scale. For the CalibPSF dataset, we crop the original raw images from the Adobe5K dataset into 1632$\times$1632 patches and obtain a total of 21826 patches. These patches are downsampled and serve as the scene images mentioned in Section 3.3. Then, we randomly generate 4000 light images, and each light image contains 1 to 4 artificial light sources. The shapes of the artificial light sources are randomly drawn from common light source shapes, including circles, ellipses, rectangles, straight lines, etc. The intensity of the light sources is randomly sampled from [40, 200], while the dynamic range of the scene image is [0, 1]. Scene images and random light images are randomly combined to create synthetic UD images of different exposures. Note that not all of the synthetic UD images contain an artificial light source: we add artificial light sources to 70$\%$ of the scene images. The noise parameters $\sigma _s$ and $\sigma _r$ are both sampled from [$10^{-5}$, $10^{-4}$]. Settings for the SynPSF dataset and the CalibPSF end2end dataset are the same as those of the CalibPSF dataset. The MCIS dataset contains 690 raw image pairs with a resolution of 2120$\times$4210 from DIV2K.
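As an illustration of the random light images described above, the sketch below draws 1 to 4 sources with random shapes, sizes, and intensities in [40, 200]; the OpenCV drawing primitives and the size ranges are our own assumptions.

```python
import numpy as np
import cv2

def random_light_image(h, w, rng):
    """One random light image I_l with 1-4 artificial light sources."""
    img = np.zeros((h, w, 3), dtype=np.float32)
    for _ in range(int(rng.integers(1, 5))):
        intensity = float(rng.uniform(40, 200))
        color = (intensity, intensity, intensity)
        cx, cy = int(rng.integers(0, w)), int(rng.integers(0, h))
        shape = rng.choice(["circle", "ellipse", "rectangle", "line"])
        if shape == "circle":
            cv2.circle(img, (cx, cy), int(rng.integers(2, 20)), color, -1)
        elif shape == "ellipse":
            axes = (int(rng.integers(4, 30)), int(rng.integers(2, 15)))
            cv2.ellipse(img, (cx, cy), axes, float(rng.uniform(0, 180)), 0, 360, color, -1)
        elif shape == "rectangle":
            dx, dy = int(rng.integers(3, 25)), int(rng.integers(3, 25))
            cv2.rectangle(img, (cx, cy), (cx + dx, cy + dy), color, -1)
        else:
            ex, ey = int(rng.integers(0, w)), int(rng.integers(0, h))
            cv2.line(img, (cx, cy), (ex, ey), color, int(rng.integers(1, 4)))
    return img

# usage: I_l = random_light_image(408, 408, np.random.default_rng(0))
```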

When training our CCN and IRN, the patch size is set to 512 and the batch size is 8. We use the Adam optimizer with an initial learning rate of $10^{-4}$. For the CCN, we iterate 200 epochs in total and decrease the learning rate by half after every 60 epochs. For the IRN, we iterate 200 epochs for the CalibPSF dataset and SynPSF dataset, 220 epochs for the real-world training set, and 800 epochs for the MCIS dataset. Every 60 epochs, we decrease the learning rate by half for the CalibPSF dataset, SynPSF dataset, and real-world training set. For the MCIS dataset, we set the milestone to 180 epochs. For the end-to-end model, we iterate 200 epochs in total and set the milestone to 60 epochs.

5.1 Color-correction results

In this part, we compare the performance of our CCN with a spectrum-calibration method. Specifically, we multiply the display’s spectral curve by the sensor’s spectral curve. Then we sum the values over wavelength for the three channels separately and obtain a three-dimensional vector. In addition, we sum the values of the sensor curve and obtain another vector. Finally, the former vector is divided by the latter to calculate the color-correct vector $v_s$, since we consider that the image sensor does not cause the color distortion between the UD and ND images. The process can be described as:

$$v_s = \Big( \frac{\sum_\lambda{(\gamma_d(\lambda) \cdot \gamma_{s,r}(\lambda)})}{\sum_\lambda{\gamma_{s,r}(\lambda)}}, \frac{\sum_\lambda{(\gamma_d(\lambda) \cdot \gamma_{s,g}(\lambda)})}{\sum_\lambda{\gamma_{s,g}(\lambda)}}, \frac{\sum_\lambda{(\gamma_d(\lambda) \cdot \gamma_{s,b}(\lambda)})}{\sum_\lambda{\gamma_{s,b}(\lambda)}} \Big) .$$
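The fixed correction vector of Eq. (21) is a direct weighted average of the two calibrated curves; a sketch is given below (the measured curves themselves are not reproduced here).

```python
import numpy as np

def spectrum_correction_vector(gamma_d, gamma_s_rgb):
    """Fixed color-correct vector v_s from the spectral curves, Eq. (21).

    gamma_d     : (W,) display attenuation at the sampled wavelengths
    gamma_s_rgb : (W, 3) sensor response of the R, G, B channels
    """
    num = (gamma_d[:, None] * gamma_s_rgb).sum(axis=0)   # display-attenuated response
    den = gamma_s_rgb.sum(axis=0)                        # unattenuated response
    return num / den
```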

We use $v_s$ to correct the color distortion. $v_s$ plays the same role as the vector $[\hat {R}, \hat {G}, \hat {G}, \hat {B}]$ in Fig. 9 but does not vary with the input UD image. The results of the calibration method are denoted as calibration UD. We calculate the PSNR and angular error [41] of the two color-correction methods on our test set with the ND images as the reference. Objective results are shown in Table 2. Note that we exclude overexposed pixels from the angular error calculation to avoid interference. Both methods can effectively solve the color-distortion problem. Our CNN-based color-correction method can self-adaptively adjust the color according to the input UD image and is superior to the fixed-coefficient correction method. This is evidence that the color distortion depends on the scene and indicates that the unnormalized CalibPSFs cannot adequately simulate the color distortion in complicated scenes. The visual results are shown in Fig. 11. Our results are closer to the ND images in color and brightness.
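For completeness, the angular error in Table 2 can be computed per pixel as the angle between the RGB vectors of the corrected UD image and the ND reference, averaged over non-overexposed pixels; the saturation threshold in this sketch is an assumed value.

```python
import numpy as np

def angular_error_deg(pred_rgb, ref_rgb, sat_level=0.98, eps=1e-8):
    """Mean per-pixel angle (degrees) between RGB vectors, excluding overexposed pixels."""
    p = pred_rgb.reshape(-1, 3)
    r = ref_rgb.reshape(-1, 3)
    valid = (p.max(axis=1) < sat_level) & (r.max(axis=1) < sat_level)
    p, r = p[valid], r[valid]
    cos = (p * r).sum(axis=1) / (np.linalg.norm(p, axis=1) * np.linalg.norm(r, axis=1) + eps)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))).mean())
```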

Fig. 11. Visual comparison between two color-correct methods.

Table 2. PSNR and angular error (degree) of two different color-correct methods.

5.2 Cascade results of different types of datasets

In this part, we compare the cascade results of the different datasets. As shown in Fig. 12, the models trained on the SynPSF dataset and the MCIS dataset struggle to generalize to real-world data in various scenes. Although the model trained on the real-world dataset produces reasonably good results for normal-exposure scenes, diffraction artifacts cannot be well eliminated when strong light sources exist. Color distortion and diffraction artifacts remain in the outputs of the CalibPSF end2end model, which demonstrates the necessity of treating color distortion and diffraction degradation separately. The results of our CalibPSF-dataset-based model surpass the other results, with higher contrast and better visual quality. In addition, diffraction artifacts are removed effectively, and the texture near the light source is restored to a certain extent. As we use the original scene images as ground truth when generating the CalibPSF dataset, we can obtain even sharper outputs than the real-world ND images. We calculate PSNR, SSIM, and NIQE [42] for the five models on the test set. As Table 3 shows, our CalibPSF dataset achieves competitive scores.

Fig. 12. Visual comparison of the results on five different models. Our CalibPSF dataset-based model obtains the best contrast and even surpasses the ND images. Diffraction artifacts are effectively suppressed, and the overexposure area around strong light sources is reduced.

Table 3. Objective metrics of the different models on our test set. All five models can effectively improve the quality of the UD images. However, our CalibPSF dataset-based model exhibits competitive performance and achieves the best PSNR and NIQE.

6. Conclusion

In this paper, we introduced an effective two-stage neural network for under-display image restoration based on a novel CalibPSF dataset. By calibrating the PSFs of the under-display optical system, we conveniently generated a large quantity of well-aligned synthetic data. In addition, we designed a two-stage neural network that self-adaptively solves the color-distortion problem and restores diffraction-degraded images. Our method generalizes well to real-world data compared with other methods and even surpasses the ND images with better contrast.

Funding

ZJU-Sunny Photonics Innovation Center (2020-08).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. V. D. J. Evans, X. Jiang, A. E. Rubin, M. Hershenson, and X. Miao, “Optical sensors disposed beneath the display of an electronic device,” (2019). US Patent App. 16/450, 727.

2. G. Ward, “Fast, robust image registration for compositing high dynamic range photographs from hand-held exposures,” J. graphics tools 8(2), 17–30 (2003). [CrossRef]  

3. D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Comput. Vis. 60(2), 91–110 (2004). [CrossRef]  

4. P. Weinzaepfel, J. Revaud, Z. Harchaoui, and C. Schmid, “Deepflow: Large displacement optical flow with deep matching,” in Proceedings of the IEEE International Conference on Computer Vision (IEEE, 2013), pp. 1385–1392.

5. Y. Zhou, D. Ren, N. Emerton, S. Lim, and T. Large, “Image restoration for under-display camera,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2021), pp. 9179–9188.

6. R. Feng, C. Li, H. Chen, S. Li, C. C. Loy, and J. Gu, “Removing diffraction image artifacts in under-display camera via dynamic skip connection network,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2021), pp. 662–671.

7. S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin, “An iterative regularization method for total variation-based image restoration,” Multiscale Model. & Simul. 4(2), 460–489 (2005). [CrossRef]  

8. J. Xu and S. Osher, “Iterative regularization and nonlinear inverse scale space applied to wavelet-based denoising,” IEEE Trans. on Image Process. 16(2), 534–544 (2007). [CrossRef]  

9. R. Fergus, B. Singh, A. Hertzmann, S. T. Roweis, and W. T. Freeman, “Removing camera shake from a single photograph,” in ACM SIGGRAPH (2006), pp. 787–794.

10. L. Xu, S. Zheng, and J. Jia, “Unnatural l0 sparse representation for natural image deblurring,” in Proceedings of the IEEE conference on Computer Vision and Pattern Recognition (IEEE, 2013), pp. 1107–1114.

11. K. He, J. Sun, and X. Tang, “Single image haze removal using dark channel prior,” IEEE Trans. Pattern Anal. Mach. Intell. 33(12), 2341–2353 (2011). [CrossRef]  

12. J. Pan, D. Sun, H. Pfister, and M.-H. Yang, “Blind image deblurring using dark channel prior,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2016), pp. 1628–1636.

13. D. Ulyanov, A. Vedaldi, and V. Lempitsky, “Deep image prior,” in Proceedings of the IEEE conference on Computer Vision and Pattern Recognition (IEEE, 2018), pp. 9446–9454.

14. Y. Gandelsman, A. Shocher, and M. Irani, “Double-DIP: Unsupervised image decomposition via coupled deep image priors,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2019), pp. 11026–11035.

15. X. Pan, X. Zhan, B. Dai, D. Lin, C. C. Loy, and P. Luo, “Exploiting deep generative prior for versatile image restoration and manipulation,” in European Conference on Computer Vision (Springer, 2020), pp. 262–277.

16. T. R. Shaham, T. Dekel, and T. Michaeli, “Singan: Learning a generative model from a single natural image,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (IEEE, 2019), pp. 4570–4580.

17. K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising,” IEEE Trans. on Image Process. 26(7), 3142–3155 (2017). [CrossRef]  

18. K. Zhang, W. Zuo, and L. Zhang, “FFDNet: Toward a fast and flexible solution for cnn-based image denoising,” IEEE Trans. on Image Process. 27(9), 4608–4622 (2018). [CrossRef]  

19. S. Guo, Z. Yan, K. Zhang, W. Zuo, and L. Zhang, “Toward convolutional blind denoising of real photographs,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2019), pp. 1712–1722.

20. M. Chang, Q. Li, H. Feng, and Z. Xu, “Spatial-adaptive network for single image denoising,” in European Conference on Computer Vision (Springer, 2020), pp. 171–187.

21. C. Dong, C. C. Loy, K. He, and X. Tang, “Learning a deep convolutional network for image super-resolution,” in European Conference on Computer Vision (Springer, 2014), pp. 184–199.

22. C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, “Photo-realistic single image super-resolution using a generative adversarial network,” in Proceedings of the IEEE conference on Computer Vision and Pattern Recognition (IEEE, 2017), pp. 4681–4690.

23. W. Yang, F. Zhou, R. Zhu, K. Fukui, G. Wang, and J.-H. Xue, “Deep learning for image super-resolution,” Neurocomputing 398, 291–292 (2020). [CrossRef]  

24. C. J. Schuler, M. Hirsch, S. Harmeling, and B. Schölkopf, “Learning to deblur,” IEEE Trans. Pattern Anal. Mach. Intell. 38(7), 1439–1451 (2015). [CrossRef]  

25. S. Nah, T. Hyun Kim, and K. Mu Lee, “Deep multi-scale convolutional neural network for dynamic scene deblurring,” in Proceedings of the IEEE conference on Computer Vision and Pattern Recognition (IEEE, 2017), pp. 3883–3891.

26. O. Kupyn, V. Budzan, M. Mykhailych, D. Mishkin, and J. Matas, “Deblurgan: Blind motion deblurring using conditional adversarial networks,” in Proceedings of the IEEE conference on Computer Vision and Pattern Recognition (IEEE, 2018), pp. 8183–8192.

27. H. Zhang, Y. Dai, H. Li, and P. Koniusz, “Deep stacked hierarchical multi-patch network for image deblurring,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2019), pp. 5978–5986.

28. J. Li and P. Fang, “Hdrnet: Single-image-based hdr reconstruction using channel attention cnn,” in Proceedings of the 2019 4th International Conference on Multimedia Systems and Signal Processing (2019), pp. 119–124.

29. Q. Yan, D. Gong, Q. Shi, A. v. d. Hengel, C. Shen, I. Reid, and Y. Zhang, “Attention-guided network for ghost-free high dynamic range imaging,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2019), pp. 1751–1760.

30. N. K. Kalantari and R. Ramamoorthi, “Deep high dynamic range imaging of dynamic scenes,” ACM Trans. Graph. 36(4), 1–12 (2017). [CrossRef]  

31. Q. Sun, E. Tseng, Q. Fu, W. Heidrich, and F. Heide, “Learning rank-1 diffractive optics for single-shot high dynamic range imaging,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2020), pp. 1386–1396.

32. Q. Fan, J. Yang, G. Hua, B. Chen, and D. Wipf, “A generic deep architecture for single image reflection removal and image smoothing,” in Proceedings of the IEEE International Conference on Computer Vision (IEEE, 2017), pp. 3238–3247.

33. K. Wei, J. Yang, Y. Fu, D. Wipf, and H. Huang, “Single image reflection removal exploiting misaligned training data and network enhancements,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2019), pp. 8178–8187.

34. X. Zhang, R. Ng, and Q. Chen, “Single image reflection separation with perceptual losses,” in Proceedings of the IEEE conference on Computer Vision and Pattern Recognition (IEEE, 2018), pp. 4786–4794.

35. Y. Wu, Q. He, T. Xue, R. Garg, J. Chen, A. Veeraraghavan, and J. Barron, “Single-image lens flare removal,” arXiv preprint arXiv:2011.12485 (2020).

36. S. Chen, H. Feng, D. Pan, Z. Xu, Q. Li, and Y. Chen, “Optical aberrations correction in postprocessing using imaging simulation,” ACM Trans. Graph. 40(5), 1–15 (2021). [CrossRef]

37. S. Chen, H. Feng, K. Gao, Z. Xu, and Y. Chen, “Extreme-quality computational imaging via degradation framework,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (IEEE, 2021), pp. 2632–2641.

38. X. Xu, Y. Ma, and W. Sun, “Towards real scene super-resolution with raw images,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2019), pp. 1723–1731.

39. C. Liu, R. Szeliski, S. B. Kang, C. L. Zitnick, and W. T. Freeman, “Automatic estimation and removal of noise from a single image,” IEEE Trans. Pattern Anal. Mach. Intell. 30(2), 299–314 (2008). [CrossRef]

40. T. Brooks, B. Mildenhall, T. Xue, J. Chen, D. Sharlet, and J. T. Barron, “Unprocessing images for learned raw denoising,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2019), pp. 11036–11045.

41. A. Gijsenij, T. Gevers, and M. P. Lucassen, “Perceptual analysis of distance measures for color constancy algorithms,” J. Opt. Soc. Am. A 26(10), 2243–2256 (2009). [CrossRef]  

42. A. Mittal, R. Soundararajan, and A. C. Bovik, “Making a "completely blind" image quality analyzer,” IEEE Signal processing letters 20(3), 209–212 (2012). [CrossRef]  

