Polarized image super-resolution via a deep convolutional neural network

Haofeng Hu; Haofeng Hu; Shiyao Yang; Xiaobo Li; Zhenzhou Cheng; Tiegen Liu; Jingsheng Zhai

doi:10.1364/OE.479700

1. Introduction

Polarimetric imaging systems can measure the polarization state of light reflected or transmitted by the target objects, and thus have been widely used in various applications, including imaging in remote sensing [1–3], surface detection [4], object identification for autonomous vehicles [5], etc. However, the reduced image resolution, which may be caused by the inherent limitation of imaging sensors or the down-sampling introduced by related processes, significantly affects the practical performance of these applications. In other words, detailed polarization information may be lost, and such small targets or targets with weak polarization signals cannot be detected [6]. Therefore, an effective resolution improvement, or super-resolution (SR) reconstruction algorithm for polarized images is of great significance.

SR reconstruction aims at recovering a high-resolution (HR) signal, image as an example, from a low-resolution (LR) one. Traditional SR methods mainly include the interpolation-based methods (e.g., the bilinear interpolation and the bicubic interpolation [7],) and model-based methods (e.g., based on the sparse representation and statistical prediction [8,9].) Yet, the interpolation-based methods can produce rapid results, but suffer low interpolation accuracy. While the model-based methods always provide better performance at the expense of high computational complexity [10]. Recently, convolutional neural networks (CNNs) have been successfully applied for the SR and achieved remarkable reconstruction performance with higher computational efficiency [11–16].

Although the above-mentioned SR methods have achieved satisfactory results, they can not be directly applied to the task for polarized images. This is because that, compared with intensity-mode images, polarized images have more channels, which correspond to different polarization proprieties, making it difficult to handle the reconstruction processing in all channels. Besides, most polarization parameters, i.e., the degree of linear polarization (DoLP) and the angle of polarization (AoP), were calculated from these channels via nonlinear ways; therefore, polarization SR reconstruction must consider the cross-link among such channels and should pay additional attention not only to recovering intensity information, but to polarization information. In other words, how to balance the reconstruction of both intensity and polarization information is a major challenge to the polarized image SR task.

It’s worth noting that, a similar topic is the demosaicing for polarized images [17–20]. For example, Zhang et al. designed a network (PDCNN) [19], in which the mosaic polarized images are input into a CNN using skip connections to output full-resolution four-channel polarized images. The Fork-Net, proposed by Zeng et al. [20], uses an end-to-end structure from mosaic polarized images to full-resolution intensity, DoLP, and AoP images. All the used mosaic (i.e., their so-called LR) polarized images in these methods are generated by only a down-sampling process. Although both demosaicing and the SR focus on improving images’ resolution, the degradation mechanisms of the two are different. In particular, when performing SR operation, we must consider mechanisms of blurring, down-sampling, and noising [21].

In this paper, different from traditional demosaicing methods, we considered two degradation models of polarized image to simulate image degradation in practice. Based on the models, a CNN-based main structure and a well-designed polarization loss function for the polarization SR reconstruction are proposed, which balance the restoration of intensity information and polarization information. Experimental results show that the proposed method is superior to other methods in both quantitative evaluation, i.e., the peak-signal-to-noise-ratio (PSNR) [22] and the structural similarity (SSIM) [23], and visual effect evaluation for the images obtained by the two degradation models. Especially, the method can effectively restore the polarization information from LR images with a maximum scaling factor of four.

2. Proposed method

2.1 Super-resolution model of polarized images based on Stokes measurement

In practice, we record four intensity images with different polarization directions, i.e., $0^{\circ }$, $45^{\circ }$, $90^{\circ }$, and $135^{\circ }$. Let $I_{0}$, $I_{45}$, $I_{90}$, and $I_{135}$ to calculate the linear Stokes vector, i.e., $\mathbf {S} = \left [S_0, S_1, S_2\right ]^T$. The calculation is expressed by Eq. (1) [24,25]:

(1)$$\begin{aligned} & S_{0}=\frac{1}{2}(I_{0}+I_{45}+I_{90}+I_{135}),\\ & S_{1}=I_{0}-I_{90},\\ & S_{2}=I_{45}-I_{135}. \end{aligned}$$

Based on the Stokes vector, we can deduce two other essential polarization information, i.e., DoLP and AoP, by the following expressions:

(2)$$DoLP=\frac{\sqrt{S_{1}^2+S_{2}^2}} {S_{0}},\quad AoP=\frac{1}{2} \tan^{{-}1}(\frac{S_{2}}{S_{1}}).$$

The SR of the Stokes vector is based on reconstructing the HR polarized image $I^{HR}_\theta$ from the LR polarized image $I^{LR}_\theta$, and then getting the Stokes vector S of HR polarized image. The basic model can be expressed as [26,27]:

(3)$$I^{LR}_\theta=D(I^{HR}_\theta) =D(\mathbb{W}_\theta \mathbf{S}),\quad \theta = \{0^{{\circ}}, 45^{{\circ}}, 90^{{\circ}}, 135^{{\circ}}\},$$

where $D(\cdot )$ denotes the degradation operator; $\mathbb {W}$ defined in Eq. (4) is the measurement matrix depending on the polarimetric imaging system, and each row vector, i.e., $\mathbb {W}_\theta$, is the measurement vector related to the four different polarization directions [28].

(4)$${\mathbb{W}} = \frac{1}{2}\left[ {\begin{array}{*{20}{c}} 1 & 1 & 0\\ 1 & 0 & 1\\ 1 & { - 1} & 0\\ 1 & 0 & { - 1} \end{array}} \right].$$

The degradation model in Eq. (3) defines the relationship between the HR and LR images. Notably, in some special applications, the practical image degradation always suffers from the influence of three factors, i.e., the blurring, down-sampling and noising [29]. Based on this fact, there are two representative degradation models including

• the traditional degradation (TD) model, which only considers the down-sampling with the scaling factor $\alpha$, i.e., $I_{\theta }^{LR}=(I_{\theta }^{HR})_{\downarrow \alpha }$[7];
• the improved degradation (ID) model, which also considers the influence of the noising (i.e., $N$) and blurring (i.e., $\otimes k$), i.e., $I_{\theta }^{LR}=(I_{\theta }^{HR}\otimes k)_{\downarrow \alpha }+N$ [30].

The scaling factor $\alpha$ is the magnification of the LR image resolution, and its value needs to change for varying users’ demands. The ID model degraded the resolution of an image by three steps: 1) blurring the image $I_{\theta }^{HR}$ using a convolutional operation with a blur kernel $k$; 2) down-sampling the blurred images via an operation $\downarrow \alpha$ with the scaling factor $\alpha$; and 3) adding noise $N$ into the final image. In this paper, we chose a Gaussian kernel as the blur kernel and used white Gaussian noise with standard deviation $\sigma$ as $N$ [31] for simplicity. As the addition of white Gaussian noises may make some pixel values negative, it is necessary to clip the resulting image with the values between 0 and 255, i.e., to revise the negative values to zero. The operation order of ID cannot be interchanged; this is because it is determined by the real imaging process using an imager/camera under non-ideal conditions [32,33]. Specifically, the blurring appears first as it is caused by factors, such as the object is not positioned on the focal plane; the down-sampling is due to the limitation of the imaging sensor during the photon capturing; and the noise is finally considered because it is generated in the photoelectric conversion process. To clarify the difference between the two models, we present the corresponding schematic diagram in Fig. 1.

Fig. 1. Schematic diagram of two degradation models, and the diagram dataset capturing setup is inserted in the bottom left of the figure.

Download Full Size | PDF

As an effective and simplified way, the TD model is always applied as a high-efficiency tool for storing, transferring, and sharing large-sized images. This is because the down-sampling process can significantly save storage, efficiently utilize bandwidth, and be easily suitable for screens with different resolutions while maintaining visually valid information [34]. Compared to the TD model, the ID model is more complex but close to the real optical imaging system, especially for such optical imagers with a reduced resolution and under non-ideal imaging environments. In short, the two models are both important. In the following sections, we will verify that the proposed polarization SR solution is effective for the two models.

2.2 Structure of the neural network

To reconstruct LR polarized images into HR ones, we proposed a polarization SR network (PSRNet), whose structure was presented in Fig. 2. The network is supervised and based on the residual dense network and consists of four components: shallow feature extraction (SFE), a series of residual dense blocks (RDBs), the global feature fusion (GFF) and up-sampling module. The applied residual dense network enjoys advantages in enriching hierarchical representations from LR features by the residual learning and fully using different LR feature levels by the feature fusion. All the advantages make reconstructing the details of polarized images better.

Fig. 2. The structure of the proposed method. (a) The architecture of the PSRNet for the scaling factor of four. (b) The architecture of residual dense block (RDB).

Download Full Size | PDF

The four-channel LR input (with dimensions of $H\times W\times 4$, where $H$ and $W$ denote the height and weight of the input, respectively) of the network consists of $0^{\circ }$, $45^{\circ }$, $90^{\circ }$, and $135^{\circ }$ intensity images. The SFE consists of two convolutional (Conv.) layers, which serve to extract the polarization information in the LR space. The extracted features are inputted to a series of RDBs, where the features are densely connected and residually learned. The output of each RDB is concatenated at concatenation (Concat.) layer in the channel dimension. The output of the first Conv. layer of SFE is added to the concatenated result; that is, the global residual learning is completed. Then, the result is sent through two Conv. layers which follow the rectified linear unit (ReLU) activation function for further feature extraction. The function of the up-sampling module is to upscale features to dimensions of ${\alpha H\times \alpha W\times C}$. $C$ denotes the channel number of the convolutional kernel. In the proposed network, the upscaling is performed using the pixel shufflers [12]. Each pixel shuffler can upscale the feature by a factor of $2H_f \times 2W_f$, where $H_f$ and $W_f$ denote the height and weight of the former layer’s output. The final upsampled features are then sent through a Conv. layer to obtain an SR polarized image of dimensions of ${\alpha H\times \alpha W\times 4}$.

Detailed parameters of each layer of the network are given in Table 1. $G$ is the number of RDBs. $I$ denotes the number of Conv-ReLU blocks per RDB. And $64\times \ i$ is the input channel of the $i^{th}$ Conv-ReLU block.

Table 1. Parameters of each network layer

View Table | View all tables in this article

2.3 Loss function

For polarization SR reconstruction, directly computing the loss of the output and the label can not well match polarization information images with the ground truth. To solve this problem, one of effective ways is to use polarization representation losses [35]. Besides, perceptual losses [36] have been applied to many SR tasks [15], and achieved satisfactory results in the works based on perceptual similarity [37]. As such, combing the polarization losses and the perceptual losses may propose an effective solution for the polarization SR reconstruction. Specially, the final loss function needs to include both content loss and polarization perceptual loss:

• Content loss focuses more attention on the reconstructed intensity image. As the proposed network is supervised, the loss is given by calculating the difference between the label (i.e., HR) and output: $(5)$$L_{cont}={\Vert f(I^{LR};\Theta)-I^{HR} \Vert}_{2},$$$ where $\Theta$ denotes the trainable parameters in the proposed network. $f(\cdot )$ denotes the SR reconstruction operation predicted by network, ${\Vert \cdot \Vert }_2$ is the $L_2$ norm operation.
• Polarization perceptual loss focuses more on the reconstruction of the polarization information, i.e., DoLP and AoP images in this paper, which is given by: $(6)$$\begin{aligned} L_{polar} & ={\Vert \phi(DoLP(f(I^{LR};\Theta)))-\phi(DoLP(I^{HR})) \Vert}_{2}\\ & +{\Vert \phi(AoP(f(I^{LR};\Theta)))-\phi(AoP(I^{HR})) \Vert}_{2}, \end{aligned}$$$ where $DoLP(\cdot )$ and $AoP(\cdot )$ denotes the calculation operations of DoLP and AoP. $\phi (\cdot )$ denotes the feature map obtained by the last Conv. layer before the first maxpooling layer within the VGG-19 network [38].

Therefore, the total loss for the network is:

(7)$$L=L_{cont}+\lambda \times L_{polar},$$

where the balance parameter $\lambda$ is an empirical value, and its function is to balance the weights of content loss and polarization perception loss (i.e., $L_{cont}$ and $L_{polar}$) and constrain the two loss values on the same order of magnitude [18,20,39].

3. Experiment and results

3.1 Dataset and training setting

The dataset for polarization SR reconstruction includes 150 groups of image; each group contains HR polarized images, LR polarized images deduced via the TD model, and LR polarized images deduced via the ID model. Notably, all HR polarized images (with a spatial resolution of $1224 \times 1024$) contain four different sub-images corresponding to polarization directions, i.e., $0^\circ, 45^\circ, 90^\circ,$ and $135^\circ$, which are the so-called ground truth, and are captured by a commercial division of focal plane (DoFP) polarization camera (LUCID, PHX050S-PC) under a natural illumination (the diagram dataset capturing setup is shown in the bottom left of Fig. 1.) In the following, we will demonstrate the performance of the network, i.e., PSRNet, based on the two degradation models (i.e., TD and ID) presented in Fig. 1, respectively. For the TD model, the degradation scaling factors are two and four (with a spatial resolution of $4 \times 612 \times 512$ for the scaling factor of two and a spatial resolution of $4 \times 306 \times 256$ for the scaling factor of four) respectively. For the ID model, the Gaussian blur kernel’s size is set to $7 \times 7$ and its standard deviation is one. The down-sampling scaling factor is set to four and the added Gaussian noise satisfying zero mean and one standard deviation.

To enlarge the scale of the dataset, we applied a window with a size of $32 \times 32$ to crop the LR images with a step size of 16. Accordingly, the window and step sizes corresponding to the HR images cropping could be determined; i.e., the patch sizes of the HR images were $64 \times 64$ and $128 \times 128$ when the scaling factors are two and four, respectively.

In the proposed network, 120 of the 150 groups data were served as the training set, and the rest 30 groups were served as the validation set (15) and the test set (15), respectively. When training, all weights were randomly initialized by a normal distribution with 0.01 standard deviation and zero mean value. The size of the mini-batch was set to 16. The Adam optimizer [40] was used to update training parameters. The number of RDBs $G$ was set to 16, and each RDB satisfies eight Conv-ReLU blocks (i.e., $I=8$). The balance parameter, i.e., $\lambda$ was set to 0.01. The learning rate was initialized to 0.0001 and decreased with a rate of 0.5 for every 20 epochs. The number of max-epoch was set to 110. The proposed model is trained with an NVIDIA RTX 3090 GPU.

3.2 Results and discussion

To verify that our method is effective for both two models, we compared it with some representative methods in terms of quantitative and visual effect evaluation. These methods include one traditional interpolation method, i.e., bicubic interpolation, and three learning-based SR methods, i.e., SRCNN [11], PDCNN [19], and SRResNet [41]. Notably, as the PDCNN is designed for polarization demosaicing, its input is four-channel polarized images interpolated by bicubic. On the contrary, SRCNN and SRResNet are primordially designed for RGB images; we modified their inputs to four-channel polarized images interpolated by bicubic and four-channel LR polarized images, respectively, to make a comparison fair. In the comparison, all learning-based SR methods were set up according to the description in the original files and trained with the same training set described in Section 3.1.

3.2.1 Results of TD model

In this section, we first verify the effectiveness and superiority of the proposed method, i.e., PSRNet, based on the LR images deduced via the TD model.

Figures 3 and 4 show the comparison results generated from SR reconstructed polarized images using the TD model with the two scaling factors (i.e., $\times 2$ and $\times 4$) by different methods. From the enlarged views, we can confirm that our method better restores the DoLP and AoP images. Especially in the DoLP images, as shown in Fig. 3, other methods have incorrect filling in the hollow words and triangles, while our method can clearly divide the edges. It is worth noting that, compared with the images with a scaling factor of two, more details were lost in the LR images with a scaling factor of four. Even so, our method reconstructs the resolution best on the three images and retains almost all image details, especially the ruler’s scales in the AoP image. On the contrary, although the bicubic method seems to handle the reduced-resolution issue in the intensity and DoLP images, the reconstruction of AoP images totally fails. The other three learning-based methods, of course, well addressed the intensity and DoLP images as well as reconstructed most details in the AoP image; but there are obvious artifacts in the AoP image and make some weak ruler’s scales undistinguished.

Fig. 3. Intensity, DoLP, and AoP images obtained through different methods using TD model with the scaling factor of two.

Download Full Size | PDF

Fig. 4. Intensity, DoLP, and AoP images obtained through different methods using TD model with the scaling factor of four.

Download Full Size | PDF

In terms of quantitative evaluation, the PSNR and SSIM were selected as metrics to evaluate the resolution reconstruction performance [42,43]. Table 2 shows the average PSNR and SSIM values of intensity, DoLP, and AoP images obtained from different methods in the test set. From Table 2, one may observe that the PSNR and SSIM values of PSRNet in intensity, DoLP, and AoP are higher than other methods. Especially in DoLP and AoP images, PSNR and SSIM values have a great improvement.

Table 2. Average PSNR/SSIM of different methods on the simulation results with TD.

View Table | View all tables in this article

3.2.2 Results of ID degradation model

Compared with the TD model, reconstructing LR images obtained by the ID model is more challenging. Since DoLP and AoP are derived from the intensity images of four polarization directions by nonlinear operators, they are more sensitive to noises [44].

Figure 5 presents the image of intensity, DoLP and AoP generated from SR reconstructed polarized images using the ID model by different methods. From the figure, we can see that our method not only restores the image details, but also remove the image noise; as a result, the image features in all images (especially the DoLP and AoP images) become more obvious. On the contrary, the ruler’s scales shown in the enlarged views are difficult to distinguish in the reconstructed AoP images by other methods. It is worth noting that, the reconstructed AoP image in Fig. 5 is visually better than the ground truth because the learning-based method is data-driven, and its result mainly depends on the scale of training data and the network architecture. Therefore, it may be possible to appear the above phenomenon. To make a solid verification, we compare the average PSNR and SSIM values of reconstructed intensity, DoLP, and AoP images obtained from different methods in Table 3. We see that our method outperforms other methods in most cases, and only the SSIM value of AoP is slightly worse than other methods because the noises in the ground truth and the reconstructed AoP image amplify their differences, and this problem can be solved by optimizing the caption processing of the ground truth and improving its quality.

Fig. 5. Intensity, DoLP, and AoP images obtained through different methods using ID model.

Download Full Size | PDF

Table 3. Average PSNR/SSIM and running/inference time of different methods on the simulation results with ID.

View Table | View all tables in this article

Besides, we also measure the running time of Bicubic and inference times of network-based methods in the case of the ID model with the scaling factors of four and compared them in the last column of Table 3. Results show that the proposed PSRNet achieves better performance at the cost of inference time.

In addition, to verify the robustness of PSRNet, we compare the results with different blur kernels. Based on the ID model and dataset in Section 3.1, we compare the test sets (with the Gaussian blur kernel sizes of $3 \times 3$, $5 \times 5$, $7 \times 7$, and $9 \times 9$, respectively) when the training Gaussian kernel size is $3 \times 3$ or $7 \times 7$. The results are shown in Table 4. From the table, we may find that the network trained by blur kernels of different sizes can effectively reconstruct LR polarized images generated by blur kernels of different sizes, which verifies the robustness of the proposed network.

Table 4. Average PSNR/SSIM of different methods on the simulation results with TD of different blur kernels.

View Table | View all tables in this article

To verify the effectiveness of the proposed method on real data, we perform the proposed PSRNet on the red channel of images captured by a color DoFP polarization camera ($2448 \times 2048$, PHX050S-QC). In this case, the pixel resolution of each polarized image is reduced to $612 \times 512$ and will be reconstructed to $2448 \times 2048$. Figure 6 presents the example images of reconstructed intensity, DoLP, and AoP. From the results, which show high-resolution reconstructed images, we can conclude that the proposed method is equally effective for real polarized image datasets.

Fig. 6. Real images from the PHX050S-QC polarization camera and reconstructed results of intensity, DoLP, and AoP.

Download Full Size | PDF

4. Conclusion

In this paper, we proposed an effective solution to improve the resolution of polarized images. Considering different image degradation operations, i.e., blurring, down-sampling, and noising, two degradation models of polarized images are analyzed and applied to generate the polarized image dataset, including LR and HR image groups. Based on the degradation models, a CNN and a polarization loss function for polarization SR are designed. Compared with the bicubic interpolation and deep learning methods based on intensity-mode images, the proposed method focuses more attention on reconstructions of polarization information and recovers polarization information details closer to the ground truth. In addition, we tested the method with LR images generated by different scaling factors, and the proposed method can effectively recover the details of three polarization parameters, i.e., intensity, DoLP, and AoP images in LR images with a maximum scaling factor of four.

Funding

National Natural Science Foundation of China (62205243, 62075161).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. J. S. Lee and E. Pottier, Polarimetric radar imaging: from basics to applications (CRC, 2017).

2. L. Yan, Y. Li, V. Chandrasekar, H. Mortimer, J. Peltoniemi, and Y. Lin, “General review of optical polarization remote sensing,” International Journal of Remote Sensing 41(13), 4853–4864 (2020). [CrossRef]

3. X. Li, L. Zhang, P. Qi, Z. Zhu, J. Xu, T. Liu, J. Zhai, and H. Hu, “Are indices of polarimetric purity excellent metrics for object identification in scattering media?” Remote Sens. 14(17), 4148 (2022). [CrossRef]

4. Y. Huang, M. Sang, L. Xing, H. Hu, and T. Liu, “Unsupervised anomaly detection of mems in low illumination based on polarimetric support vector data description,” Opt. Express 29(22), 35651–35663 (2021). [CrossRef]

5. R. Blin, S. Ainouz, S. Canu, and F. Meriaudeau, “Road scenes analysis in adverse weather conditions by polarization-encoded images and adapted deep learning,” in 2019 IEEE Intelligent Transportation Systems Conference (ITSC), (IEEE, 2019), pp, 27–32.

6. M. Haris, G. Shakhnarovich, and N. Ukita, “Task-driven super resolution: Object detection in low-resolution images,” in International Conference on Neural Information Processing, (Springer, 2021), pp. 387–395.

7. R. Keys, “Cubic convolution interpolation for digital image processing,” IEEE Trans. Acoust., Speech, Signal Process. 29(6), 1153–1160 (1981). [CrossRef]

8. J. Yang, J. Wright, T. S. Huang, and Y. Ma, “Image super-resolution via sparse representation,” IEEE Transactions on Image Processing 19(11), 2861–2873 (2010). [CrossRef]

9. V. A. Rahiman and S. N. George, “Single image super resolution using neighbor embedding and statistical prediction model,” Computers & Electrical Engineering 62, 281–292 (2017). [CrossRef]

10. Y. K. Ooi and H. Ibrahim, “Deep learning algorithms for single image super-resolution: a systematic review,” Electronics 10(7), 867 (2021). [CrossRef]

11. C. Dong, C. C. Loy, K. He, and X. Tang, “Learning a deep convolutional network for image super-resolution,” in European Conference on Computer Vision, (Springer, 2014), pp. 184–199.

12. W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2016), pp. 1874–1883.

13. T. Tong, G. Li, X. Liu, and Q. Gao, “Image super-resolution using dense skip connections,” in Proceedings of the IEEE International Conference on Computer Vision, (2017), pp. 4799–4807.

14. B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee, “Enhanced deep residual networks for single image super-resolution,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition workshops, (2017), pp. 136–144.

15. X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C. C. Loy, “Esrgan: Enhanced super-resolution generative adversarial networks,” in The European Conference on Computer Vision Workshops (ECCVW), (2018).

16. Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu, “Residual dense network for image super-resolution,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2018), pp. 2472–2481.

17. R. Wu, Y. Zhao, N. Li, and S. G. Kong, “Polarization image demosaicking using polarization channel difference prior,” Opt. Express 29(14), 22066–22079 (2021). [CrossRef]

18. X. Liu, X. Li, and S.-C. Chen, “Enhanced polarization demosaicking network via a precise angle of polarization loss calculation method,” Opt. Lett. 47(5), 1065–1069 (2022). [CrossRef]

19. J. Zhang, J. Shao, H. Luo, X. Zhang, B. Hui, Z. Chang, and R. Liang, “Learning a convolutional demosaicing network for microgrid polarimeter imagery,” Opt. Lett. 43(18), 4534–4537 (2018). [CrossRef]

20. X. Zeng, Y. Luo, X. Zhao, and W. Ye, “An end-to-end fully-convolutional neural network for division of focal plane sensors to reconstruct s 0, dolp, and aop,” Opt. Express 27(6), 8566–8577 (2019). [CrossRef]

21. R. Zhou, R. Achanta, and S. Süsstrunk, “Deep residual network for joint demosaicing and super-resolution,” in Color and imaging conference, vol. 2018 (Society for Imaging Science and Technology, 2018), pp. 75–80.

22. A. Hore and D. Ziou, “Image quality metrics: Psnr vs. ssim,” in 2010 20th International Conference on Pattern Recognition, (IEEE, 2010), pp. 2366–2369.

23. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing 13(4), 600–612 (2004). [CrossRef]

24. Y. Sun, J. Zhang, and R. Liang, “Color polarization demosaicking by a convolutional neural network,” Opt. Lett. 46(17), 4338–4341 (2021). [CrossRef]

25. X. Li, B. Le Teurnier, M. Boffety, T. Liu, H. Hu, and F. Goudail, “Theory of autocalibration feasibility and precision in full stokes polarization imagers,” Opt. Express 28(10), 15268–15283 (2020). [CrossRef]

26. H. Hu, Y. Lin, X. Li, P. Qi, and T. Liu, “IPLNet: a neural network for intensity-polarization imaging in low light,” Opt. Lett. 45(22), 6162–6165 (2020). [CrossRef]

27. X. Li, T. Liu, B. Huang, Z. Song, and H. Hu, “Optimal distribution of integration time for intensity measurements in stokes polarimetry,” Opt. Express 23(21), 27690–27699 (2015). [CrossRef]

28. X. Li, H. Hu, F. Goudail, and T. Liu, “Fundamental precision limits of full stokes polarimeters based on dofp polarization cameras for an arbitrary number of acquisitions,” Opt. Express 27(22), 31261–31272 (2019). [CrossRef]

29. K. Zhang, J. Liang, L. Van Gool, and R. Timofte, “Designing a practical degradation model for deep blind image super-resolution,” in Proceedings of the IEEE International Conference on Computer Vision, (2021), pp. 4791–4800.

30. K. Zhang, L. V. Gool, and R. Timofte, “Deep unfolding network for image super-resolution,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2020), pp. 3217–3226.

31. Z. Long, T. Wang, C. You, Z. Yang, K. Wang, and J. Liu, “Terahertz image super-resolution based on a deep convolutional neural network,” Appl. Opt. 58(10), 2731–2735 (2019). [CrossRef]

32. M. Elad and A. Feuer, “Restoration of a single superresolution image from several blurred, noisy, and undersampled measured images,” IEEE Transactions on Image Processing 6(12), 1646–1658 (1997). [CrossRef]

33. S. Farsiu, D. Robinson, M. Elad, and P. Milanfar, “Advances and challenges in super-resolution,” International Journal of Imaging Systems and Technology 14(2), 47–57 (2004). [CrossRef]

34. M. Xiao, S. Zheng, C. Liu, Y. Wang, D. He, G. Ke, J. Bian, Z. Lin, and T.-Y. Liu, “Invertible image rescaling,” in European Conference on Computer Vision, (Springer, 2020), pp. 126–144.

35. H. Liu, Y. Zhang, Z. Cheng, J. Zhai, and H. Hu, “Attention-based neural network for polarimetric image denoising,” Opt. Lett. 47(11), 2726–2729 (2022). [CrossRef]

36. J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in European Conference on Computer Vision, (Springer, 2016), pp. 694–711.

37. R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2018), pp. 586–595.

38. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv, arXiv:1409.1556 (2014). [CrossRef]

39. P. Qi, X. Li, Y. Han, L. Zhang, J. Xu, Z. Cheng, T. Liu, J. Zhai, and H. Hu, “U2r-pgan: Unpaired underwater-image recovery with polarimetric generative adversarial network,” Optics and Lasers in Engineering 157, 107112 (2022). [CrossRef]

40. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv, arXiv:1412.6980 (2014). [CrossRef]

41. C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, “Photo-realistic single image super-resolution using a generative adversarial network,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Pecognition, (2017), pp. 4681–4690.

42. X. Li, J. Xu, L. Zhang, H. Hu, and S.-C. Chen, “Underwater image restoration via stokes decomposition,” Opt. Lett. 47(11), 2854–2857 (2022). [CrossRef]

43. H. Hu, Y. Han, X. Li, L. Jiang, L. Che, T. Liu, and J. Zhai, “Physics-informed neural network for polarimetric underwater imaging,” Opt. Express 30(13), 22512–22522 (2022). [CrossRef]

44. X. Li, H. Li, Y. Lin, J. Guo, J. Yang, H. Yue, K. Li, C. Li, Z. Cheng, H. Hu, and T. Liu, “Learning-based denoising for polarimetric images,” Opt. Express 28(11), 16309–16321 (2020). [CrossRef]

part	layer	input channel	output channel	kernel size
SFE	Conv. 1	$4$	$64$	$(3, 3)$
SFE	Conv. 2	$64$	$64$	$(3, 3)$
RDBs(a total of $G$ )	Conv-ReLU. $i$	$64 \times i$	$64$	$(3, 3)$
	Concat.	/	/	/
	Conv.	$64 \times (I + 1)$	$64$	$(1, 1)$
GFF	Concat.	/	/	/
	Conv. 1	$64 \times G$	64	$(1, 1)$
	Conv. 2	$64$	$64$	$(3, 3)$
Up-sampling module	Conv-ReLU. 1	$64$	$64$	$(5, 5)$
	Conv-ReLU. 2	$64$	$64$	$(3, 3)$
	Conv-Pixel Shuffler. 1	$64$	$64$	$(3, 3)$
	Conv-Pixel Shuffler. 2(only for the scaling factor of four)	$64$	$64$	$(3, 3)$
	Conv.	$64$	$4$	$(3, 3)$

Method	TD ( $\times 2$ )			TD ( $\times 4$ )
Method	Intensity	DoLP	AoP	Intensity	DoLP	AoP
Bicubic	40.363/0.976	25.932/0.770	16.881/0.431	33.187/0.910	23.628/0.657	15.305/0.215
SRCNN	42.668/0.982	27.153/0.795	17.276/0.425	34.745/0.921	23.854/0.664	16.333/0.233
PDCNN	41.384/0.978	26.467/0.769	17.088/0.423	34.783/0.918	24.346/0.692	16.559/0.263
SRResNet	43.879/0.985	28.690/0.832	17.537/0.485	35.842/0.938	24.540/0.694	16.144/0.269
PSRNet	44.147/0.986	30.123/0.851	18.393/0.523	36.099/0.947	25.558/0.727	17.392/0.307

Method	ID ( $\times 4$ )
Method	Intensity	DoLP	AoP	Running/inference time (ms)
Bicubic	34.108/0.907	23.414/0.602	14.636/0.165	6.939
SRCNN	35.805/0.924	24.124/0.661	16.098/0.224	16.649
PDCNN	35.658/0.929	23.883/0.664	16.306/0.201	120.873
SRResNet	36.810/0.936	24.348/0.677	16.171/0.231	41.056
PSRNet	37.059/0.945	25.285/0.697	17.000/0.221	200.519

The size of blur kernels	ID ( $3 \times 3$ )			ID ( $7 \times 7$ )
The size of blur kernels	Intensity	DoLP	AoP	Intensity	DoLP	AoP
$3 \times 3$	37.008/0.948	25.334/0.700	17.039/0.226	36.554/0.945	25.289/0.697	16.950/0.225
$5 \times 5$	36.934/0.943	24.890/0.692	17.068/0.222	37.075/0.945	25.333/0.698	17.030/0.223
$7 \times 7$	36.870/0.942	24.810/0.690	17.069/0.221	37.059/0.945	25.285/0.697	17.000/0.221
$9 \times 9$	36.868/0.942	24.811/0.690	17.069/0.222	37.079/0.944	25.261/0.697	17.041/0.223

part	layer	input channel	output channel	kernel size
SFE	Conv. 1	$4$	$64$	$(3, 3)$
SFE	Conv. 2	$64$	$64$	$(3, 3)$
RDBs(a total of $G$ )	Conv-ReLU. $i$	$64 \times i$	$64$	$(3, 3)$
	Concat.	/	/	/
	Conv.	$64 \times (I + 1)$	$64$	$(1, 1)$
GFF	Concat.	/	/	/
	Conv. 1	$64 \times G$	64	$(1, 1)$
	Conv. 2	$64$	$64$	$(3, 3)$
Up-sampling module	Conv-ReLU. 1	$64$	$64$	$(5, 5)$
	Conv-ReLU. 2	$64$	$64$	$(3, 3)$
	Conv-Pixel Shuffler. 1	$64$	$64$	$(3, 3)$
	Conv-Pixel Shuffler. 2(only for the scaling factor of four)	$64$	$64$	$(3, 3)$
	Conv.	$64$	$4$	$(3, 3)$

Polarized image super-resolution via a deep convolutional neural network

Abstract

1. Introduction

2. Proposed method

2.1 Super-resolution model of polarized images based on Stokes measurement

2.2 Structure of the neural network

2.3 Loss function

3. Experiment and results

3.1 Dataset and training setting

3.2 Results and discussion

3.2.1 Results of TD model

3.2.2 Results of ID degradation model

4. Conclusion

Funding

Disclosures

Data availability

References

Data availability

Cited By

Figures (6)

Tables (4)

Equations (7)

Optics Express