Joint artifact correction and super-resolution of image slicing and mapping system via a convolutional neural network

Anqi Liu; Anqi Liu; Xianzi Zeng; Xianzi Zeng; Yan Yuan; Lijuan Su; Wanyue Wang

doi:10.1364/OE.413076

1. Introduction

The image mapping spectrometer (IMS) is a snapshot imaging spectrometer that can acquire the three-dimensional (3D) datacube $(x,y,\lambda )$ of the scene during a single exposure. To realize multidimensional imaging, the key strategy is utilizing an optical component, which is termed as the image mapper, to slice the image of the target into segments and redistribute the image segments spatially on the detector. However, the image mapper introduces complex degradation in the reconstructed images. Few image processing algorithms have been investigated to solve the problem.

The spatial sampling of the IMS is intrinsically determined by the facet density of the image mapper. Fabricating the image mapper with a smaller cutting tool was proved to be efficient to improve the sampling density [1]. Nonetheless, it requires advanced manufacturing technique and increases cost. In addition, the edge eating effect becomes more significant for thinner facets and reduces the light throughput of the system [2]. In recent years, single-image super-resolution (SR) is an emerging technique to reconstruct a high resolution (HR) image from a low resolution (LR) image. Compared with conventional methods based on interpolation [3,4] or image statistics [5–7], deep convolutional neural networks (CNNs) have demonstrated excellent performance to improve the SR results, such as EDSR [8], SRResNet [9], and ERSGAN [10].

Moreover, the image mapper introduces intensity artifacts in the reconstructed images of the IMS [11–14], which can be observed at all wavelengths. The artifacts can be sorted into nonuniform intensity and missing data. The nonuniform intensity might be attributed to the variation of the pixel response and the reflectivity of the facets. Besides, the system alignment error causes defocus of the image lines [15]. The edge-eating effect [2] and the shadowing effect [16] lead to thinning or even incompleteness of the image lines. Considering the quantum efficiency of the detector, some weak signals might be lower than the detection level. The above issues in the raw measurement cause intensity decrease in the reconstructed image and finally lead to nonuniform intensity and missing data. The defects of the fabricated image mapper also contribute to the missing data. Meanwhile, image inpainting methods [17–20] have been proposed to repair the missing regions. The above methods fill missing pixels of an image by borrowing information from surrounding regions, which suffer no other degradation. However, the intensity artifacts of the IMS include not only missing data but also intensity variation. As a result, the learning-based image inpainting methods [21,22] can not effectively handle the problem either. Hence, an artifact correction method is needed for the specific intensity artifacts of the IMS.

With the advancement of deep learning, recent studies attempt to deal with joint image degradation problems [23–26], which are intensity artifacts and resolution decrease in our case. There are mainly two kinds of solutions to the joint problem. The straightforward approach is to develop a specific network for each problem and utilize the output of the first network as the input of the second network [25]. Meanwhile, the concatenation method tends to deliver the estimation error of the first network to the second one. The other approach to the joint problem is using a single network, which executes the two tasks in parallel [26]. It exploits the correlation between the two tasks to reduce error accumulation.

In this research, we propose a novel image enhancing method based on CNN to handle the complex image degradation induced by the image mapper. We present an image enhancing network dedicated to the panchromatic images acquired by the system without the dispersive element, which is denoted as the image slicing and mapping system (ISMS) for clarity. The proposed joint network contains two branches that extract features from a degraded LR image for artifact correction and SR respectively, then combines the above features with an attention fusion module, and finally reconstructs an artifact-free HR image (see Section 3 for details). We compare our network with state-of-the-art methods and extend the network architecture and training methodology to other image mapper designs. Both the simulation results in Section 4 and the experimental results in Section 5 demonstrate the efficiency and generalization of the proposed method.

2. System description

2.1 Optical layout

The optical layout of the ISMS is shown in Fig. 1. The target is imaged by the fore optics (focal length (FL) = 60mm, f/2.8D, Nikon), which is telecentric in the image space. The image mapper is a custom-fabricated reflective element comprised of $M \times N$ long and narrow mirror facets with two-dimensional tilt angles $(\alpha _{m},\beta _{m})$, where $m=1,2,\ldots ,M$. $M$ is the number of tilt angles and $N$ is the number of periodic blocks. The design parameters of the image mapper are listed in Table 1. It is fabricated by diamond raster fly cutting on the aluminum substrate, as displayed in Fig. 1(c). The chief rays incident on the image mapper are parallel to the optical axis, the directions of the reflected rays are determined by the tilt angles of the corresponding facets. As a result, the image mapper is capable of slicing the image of the target into segments and redirecting them to different directions. The reflected rays are collimated by the collimating lens (FL = 50mm, f/1.4D, Nikon) and converged by the reimaging lens array (FL = 12mm, diameter D = 4mm, Edmund Optics, 63704), as shown in Fig. 1(d). A 1:1 relay lens (GCO-2301, Daheng Optics) is used to adjust the working distance between the collimating lens and the lens array. The raw image is captured by the detector (VA-29M, Vieworks, $4384 \times 6596$, pixel size: $5.5\mathrm{\mu} \mathrm {m}\times 5.5\mathrm{\mu} \mathrm {m}$).

Fig. 1. Optical layout of the ISMS. (a) Schematic description. (b) Prototype setup. Close up images of (c) image mapper, and (d) reimaging lens array.

Download Full Size | PDF

Table 1. Design parameters of the image mapper.

View Table | View all tables in this article

2.2 Degradation of reconstructed image

As shown in Fig. 2, the image degradation process of the ISMS can be modeled by:

(1)$$\mathbf{I}^\mathrm{d}=\mathcal{R}\{\mathcal{M}\{\mathbf{I}^\mathrm{HR}\}\},$$

where $\mathbf {I}^{\mathrm {HR}}$, $\mathbf {I}^{\mathrm {d}}$ denote the input HR image and the degraded reconstructed image respectively, and $\mathcal {M}\{\cdot \}$, $\mathcal {R}\{\cdot \}$ are the mapping and remapping operators. The mapping operator is based on the accurate ray tracing model proposed in [16], where only the shadowing effect of the image mapper is considered. Using the design parameters in Table 1, the shadowing effect leads to missing data and image line thinning in the simulated raw measurement. The remapping operator uses the lookup table to obtain the reconstructed image, which suffers intensity artifacts and low spatial resolution. To solve the joint problem, we propose an image processing method based on CNN to perform artifact correction and resolution improvement simultaneously.

Fig. 2. Image degradation process of the ISMS.

Download Full Size | PDF

3. Proposed method

In this section, we introduce the architecture of the proposed joint network and the training process. The goal of the network is to learn a non-linear function $f$, which can transform the degraded LR image $\mathbf {I}^{\mathrm {d}}$ to an artifact-free HR image $\hat {\mathbf {I}}^{\mathrm {HR}}$ and minimize the difference between the output $\hat {\mathbf {I}}^{\mathrm {HR}}$ and the ground truth $\mathbf {I}^{\mathrm {HR}}$.

3.1 Network architecture

The joint network integrates the artifact correction task and SR task in a single network. As illustrated in Fig. 3(a), the proposed network consists of four modules. (I) An artifact correction module that focuses on extracting features to repair missing regions and improve uniformity. (II) An SR module that can extract high dimensional features for image SR. (III) A feature fusion module based on channel attention [27] to fuse the features extracted by the artifact correction module and the SR module. (IV) A reconstruction module that can generate the final artifact-free HR image from the fused features.

Fig. 3. Overview of the proposed joint network. (a) Architecture of the network. (b) Structure of the attention fusion module.

Download Full Size | PDF

Artifact correction module. The artifact correction module aims to extract artifact features from the input degraded LR image $\mathbf {I}^{\mathrm {d}}$. One convolutional layer with the kernel size of 3$\times$3 and 16 residual-in-residual dense blocks (RRDBs) [10] are used to extract the deep feature expression. To enlarge the receptive field, three flat-convolutional layers with different kernel sizes of 3$\times$3, 5$\times$5, and 7$\times$7 are used to generate the artifact features by concatenating the convolutional responses. In order to calculate the loss function described in Section 3.2.2, another 3$\times$3 convolutional layer is added to reconstruct the artifact-free LR image $\hat {\mathbf {I}}^{\mathrm {LR}}$.

SR module. The SR module is composed of a 3$\times$3 convolutional layer and 16 RRDBs [10] to extract the high-dimensional features for SR. In order to accelerate the convergence of the network, skip connections are added to prevent the gradient vanishing caused by the deep network architecture. The output features of the SR module are denoted as SR features, which will be fed into the attention fusion module for feature fusion.

Attention fusion module. The artifact and SR features are visualized in Fig. 3(b). The artifact features have a higher response in the missing regions and the SR features contain more spatial details in the input image. A trainable feature fusion module based on channel attention [26] is constructed to selectively combine the features from the two modules, and the structure of the attention fusion module is illustrated in Fig. 3(b).

Reconstruction module. The fused features are fed into the upsampling layer to improve the spatial resolution. Two additional convolutional layers are used to reconstruct the artifact-free HR image $\hat {\mathbf {I}}^{\mathrm {HR}}$. The number of the feature maps in each convolutional layer is set to be 64.

3.2 Training

3.2.1 Dataset

Similar to many methods based on deep learning, the network requires a large amount of data with ground truth for training. We use 800 training images in the dataset DIV2K [28] to generate sufficient data for the network training. The test dataset is obtained by 200 images in dataset DIV2K and Urban100 [29] to evaluate the performance. Firstly, the HR images are cropped into $336\times 483$ patches to acquire HR ground truth images $\mathbf {I}^{\mathrm {HR}}$. According to Eq. (1), the HR patches are transformed to $48\times 69$ degraded images $\mathbf {I}^{\mathrm {d}}$ by MATLAB, using the parameters detailed in Section 2. Meanwhile, the HR patches are also downscaled by 7$\times$ to obtain the LR ground truth images $\mathbf {I}^{\mathrm {LR}}$ via the bicubic kernel function. In total, we obtain 800 triplets of {$\mathbf {I}^{\mathrm {d}}, \mathbf {I}^{\mathrm {LR}}, \mathbf {I}^{\mathrm {HR}}$} for training. $\mathbf {I}^{\mathrm {d}}$ is the input of the network, $\mathbf {I}^{\mathrm {LR}}$ is used to train the artifact correction module for generating the artifact-free LR image $\hat {\mathbf {I}}^{\mathrm {LR}}$, and $\mathbf {I}^{\mathrm {HR}}$ is dedicated to training the SR module for reconstructing the artifact-free HR image $\hat {\mathbf {I}}^{\mathrm {HR}}$.

3.2.2 Loss function

Mean squared error (MSE) loss. The MSE loss is the most widely used optimization target for image restoration tasks [9,30], which is based on the pixel-to-pixel operation. We use the MSE loss to minimize the difference between the ground truth and the output of the network. Given an image with a size of $W\times H$, the pixel-wise MSE loss function is expressed as:

(2)$$L_\mathrm{P}=\frac{1}{WH}\sum_{i=1}^{W}\sum_{j=1}^{H}(\hat{\mathbf{I}}-\mathbf{I})^2.$$

Perceptual loss. Meanwhile, the pixel-wise MSE loss may lead to the lack of high-frequency details and produce unreal images with overly smooth textures. To ensure the perceptual similarity, we add the perceptual loss to measure the semantic difference between the features of the output and the ground truth, which can be extracted from a pre-trained deep network named VGG16 [31]. The perceptual loss function can be expressed as:

(3)$$L_\mathrm{F}=\left|\phi(\hat{\mathbf{I}})-\phi(\mathbf{I})\right|,$$

where $\phi$ is the network VGG16, which is composed of thirteen convolutional layers and three fully connected layers and can produce the feature maps of a given image.

Overall loss. For the artifact correction module and SR module, the loss function of each module is the weighted combination of the above loss functions:

(4)$$\mathcal{L}_\mathrm{AC}=\omega_1L^\mathrm{LR}_\mathrm{P}+\omega_2L^\mathrm{LR}_\mathrm{F},$$

(5)$$\mathcal{L}_\mathrm{SR}=\lambda_1L^\mathrm{HR}_\mathrm{P}+\lambda_2L^\mathrm{HR}_\mathrm{F},$$

where $\omega _1,\omega _2,\lambda _1,\lambda _2$ are the weights of different loss functions.

To perform joint artifact correction and SR on the degraded image, the network is trained by optimizing the joint loss function:

(6)$$\min\quad\xi\mathcal{L}_\mathrm{AC}(\hat{\mathbf{I}}^\mathrm{LR}, \mathbf{I}^\mathrm{LR})+\gamma\mathcal{L}_\mathrm{SR}(\hat{\mathbf{I}}^\mathrm{HR}, \mathbf{I}^\mathrm{HR}),$$

where $\xi , \gamma$ are the weights to balance the loss of the two modules.

3.2.3 Training details

The proposed network is optimized by Adam optimizer [32] with $\beta _1=0.9,\beta _2=0.99$. The learning rate is initialized as $1\times 10^{-4}$ and reduced by the scale of 0.5 every $1\times 10^{5}$ iterations. The parameters in Eqs. (4), (5), and (6) are set to be $\omega _1=10$, $\omega _2=1$, $\lambda _1=10$, $\lambda _2=1$, $\xi =0.5$, and $\gamma =0.5$. A large receptive field can capture more semantic information, which is advantageous for the training of a deep network, but it also costs extensive training time and computational resources. Therefore, the mini-batch size is set to be 16 and the spatial size of the randomly cropped LR patch is 48$\times$48. The models are implemented with the PyTorch framework and performed on NVIDIA Titan Xp GPU with $5\times 10^5$ iterations.

4. Simulation results and analysis

In this section, we assess the performance of the proposed network in simulation. We compare our method with several state-of-the-art methods. Then, we perform an ablation study to demonstrate the efficacy of each component of the joint network. Next, we analyze the influence of different image mapper designs.

4.1 Comparison with state-of-the-art methods

We treat the joint problem with the traditional bicubic interpolation method [3], the learning-based SR methods (SRResNet [9] and ESRGAN [10]), the concatenation of SR methods and image inpainting methods (contextual attention (CA) [21] and edge connect (EC) [22]), and the proposed joint method. For a fair comparison, we use the same parameter settings recommended in the papers and retrain the above methods on our dataset.

Quantitative evaluation. Peak signal to noise ratio (PSNR) [33] and structural similarity (SSIM) [34] are selected to evaluate the performance of different methods. Table 2 tabulates the average PSNR and SSIM on the test datasets DIV2K and Urban100 at the magnification factors of 2 and 4. The concatenation methods simply execute inpainting and SR algorithms in sequence and do not always get better performance against sole SR methods. The reason might be the error delivered from the inpainting algorithm is magnified by the SR algorithm. The error accumulation is random and difficult to control. In comparison, our method makes stable and significant improvements in terms of PSNR and SSIM.

Table 2. Average PSNR and SSIM of different methods on the simulation results.

View Table | View all tables in this article

Qualitative evaluation. The visual comparison of several images from the test dataset is shown in Fig. 4. The best-performing SR method and concatenation method from Table 2, which are ESRGAN [10] and CA [21]+ESRGAN [10] respectively, are selected to present the results. The bicubic method cannot remove the intensity artifacts, the SR method and the concatenation method tend to generate unsmooth textures, while our method produces more natural details. It can be concluded that the proposed method generates artifact-free HR images of higher quality.

Fig. 4. Visual comparison of different methods on the simulation results at the magnification factor of 2.

Download Full Size | PDF

4.2 Ablation study

To demonstrate the improvements achieved by each component of the proposed network, we perform an ablation study with three ablation configurations: SR (only use SR module), AC-SR (use artifact correction and SR modules, and concatenate the features), and AC-F-SR (use the attention fusion module to combine the artifact correction and SR features). As shown in Fig. 5, AC-SR performs better than SR, which means the artifact correction module is effective to the specific artifacts introduced by the image mapper. Furthermore, AC-F-SR performs favorably against AC-SR because the attention fusion module plays an important role in connecting the artifact correction and SR modules and optimizing the performance of the joint network.

Fig. 5. Convergence curves of different ablation configurations in terms of (a) PSNR and (b) SSIM. All the configurations are trained with the same hyper-parameters.

Download Full Size | PDF

4.3 Different image mapper designs

Compared with traditional imaging processing algorithms designed for specific features, our CNN-based approach has an advantage in generalization. The joint network is targeted at the image degradation induced by the image mapper, therefore it should be flexible to different image mapper designs to promote the application.

4.3.1 Different configurations

According to [2], the sequence of tilted facets must be optimized to minimize the height difference between neighboring facets. As shown in Fig. 6(a), the sequential configuration minimizes the height difference of adjacent facets within a block of $\alpha _{m}$-tilt-only facets. However, the height difference between repeating blocks is the largest. Figure 6(b) shows the staggered configuration that minimizes the height difference between repeating blocks but increases the height differences within a block.

Fig. 6. Layout of the tilted facets. (a) Sequential configuration. (b)Staggered configuration.

Download Full Size | PDF

Consequently, the distribution of intensity artifacts in the reconstructed image varies with the configuration of the image mapper. The presented prototype used the sequential configuration. To prove the generalization of the proposed network, we retrained it on the simulated dataset based on the staggered configuration. The network performance on corresponding test datasets at the magnification factor of 2 is shown in Table 3.

Table 3. Network performance on different configurations of image mapper ($2\times$).

View Table | View all tables in this article

As shown in Fig. 7, the intensity artifacts in the reconstructed images are different, but our network still achieves almost the same performance on different configurations.

Fig. 7. Visual comparison of the network performance on different configurations of image mapper ($2\times$).

Download Full Size | PDF

4.3.2 Different facet densities

The spatial resolution of the reconstructed image is limited by the facet density of the image mapper. In the presented prototype, the facet width is 165$\mathrm{\mu} \mathrm {m}$ and the facet density is 6.06 facets/mm. As a result, the image sampling is $48 \times 69$, which is relatively low. With advanced fabrication level, the minimum facet width can be decreased to 70$\mathrm{\mu} \mathrm {m}$. We decreased the facet width to 118$\mathrm{\mu} \mathrm {m}$ and 70$\mathrm{\mu} \mathrm {m}$ respectively. Meanwhile, the image mappers still used the sequential configuration with the same tilt angles. The samplings of the degraded images are $67 \times 92$ and $114 \times 161$ respectively. Then, we retrained the network on the datasets corresponding to different facet densities. As presented in Table 4, the proposed network still makes distinct improvement in the case of higher facet density. It should also be noted that as the facet density increases, the performance of the network decreases. This is due to lower image quality before enhancement. As shown in the first row of Fig. 7 and Fig. 8, the system with higher facet density suffers more severe shadowing effect due to thinner facets.

Fig. 8. Visual comparison of the network performance on different facet densities of image mapper ($2\times$).

Download Full Size | PDF

Table 4. Network performance on different facet densities of image mapper ($2\times$).

View Table | View all tables in this article

In conclusion, the proposed joint network architecture and training methodology can be generalized to other image mapper designs.

5. Experimental results

To further verify the effectiveness of the proposed method, we conducted experiments on real targets. Forty-five pictures from the training dataset were printed out and used as the imaging targets. Considering only the shadowing effect is considered in the simulation, the intensity artifacts in the experimental results are different from those in the simulation results. As a result, the network trained on the simulation images cannot perform well on the experimental images. To solve this problem, we adopted transfer learning [35] to finely tune the pre-trained network by adding the experimental images to the training dataset. It is a kind of training strategy which has been proved to effectively address the generalization issue in other imaging applications [36].

To determine the optimal number of experimental images used for transfer learning, we gradually increased the number of experimental images from 5 to 40 with a step of 5. The network performance on the test images is presented in Fig. 9. As the number of experimental images increases, the PSNR and SSIM of each test image increase fast in the beginning. Then, the increasing rate gradually slows down and the result becomes stable.

Fig. 9. Network performance with different numbers of experimental images used for transfer learning. (a) PSNR. (b) SSIM.

Download Full Size | PDF

Finally, in this research, we chose to use 35 experimental images for transfer learning and 10 experimental images for testing. As shown in Table 5, transfer learning efficiently improves the performance of all learning-based methods on the experimental images. In comparison, our tuned network still has an advantage over the other tuned networks. Figure 10 visually demonstrates the excellent performance of our tuned network on the experimental images.

Fig. 10. Visual comparison of different methods on the experimental results at the magnification factor of 2. The methods with * sign are finely tuned by transfer learning.

Download Full Size | PDF

Table 5. Average PSNR and SSIM of different methods on the experimental results. The methods with * sign are finely tuned by transfer learning.

View Table | View all tables in this article

6. Discussion and conclusion

In this paper, we proposed a joint image processing method based on CNN to improve the quality of the reconstructed image of the ISMS. The network utilized two branches to deal with the joint problem. The artifact correction module was designed specifically to remove the artifacts in the image and the SR module was used to improve the spatial resolution. The attention fusion module was developed to fuse the features extracted by the artifact correction and SR modules. Finally, the network reconstructed an artifact-free HR image from the fused features. The simulation results demonstrated the proposed network outperformed the state-of-the-art methods and could be generalized to other image mapper designs. The efficacy of the proposed methodology was validated by the experimental results.

In the case of limited experimental data, transfer learning was an efficient approach to bridge the gap between the simulation and experimental images. However, the PSNR and SSIM of the experimental results after transfer learning were lower than those of the simulation results. One reason might be the number of experimental images used for transfer learning was not sufficient enough. Another reason might be that the digital image before printing was regarded as the ground truth image and the printed images were already degraded. Enlightened by the display-capture lab setup presented in [37], capturing targets displayed on a high-resolution LCD monitor can be a time-efficient approach to obtain abundant experimental data for training in the future.

The proposed method made remarkable improvements in the remapped images of the ISMS. By incorporating a dispersive element, the ISMS becomes the spectral imaging system IMS, which can obtain spectral images at different wavelengths. Based on the proposed model, we will focus on training multiple models that can process the spectral images at corresponding wavelengths. By adding spectral constraints, the models can be combined as an ensemble model, which is expected to perform artifact correction and SR on a datacube.

In conclusion, this paper demonstrates an effective image processing method based on CNN for enhancing the image quality of the ISMS. It is meaningful for the development of the image processing pipeline of the IMS and improving the instrument’s performance in applications.

Funding

National Natural Science Foundation of China (61635002); Fundamental Research Funds for the Central Universities.

Acknowledgments

Anqi Liu was responsible for the simulation and experiment of the ISMS and Xianzi Zeng built the network and completed the training. We are grateful to the editors and reviewers. Their advice helped us improve the quality of this paper.

Disclosures

The authors declare no conflicts of interest.

References

1. L. Gao, R. T. Kester, N. Hagen, and T. S. Tkaczyk, “Snapshot Image Mapping Spectrometer (IMS) with high sampling density for hyperspectral microscopy,” Opt. Express 18(14), 14330–14344 (2010). [CrossRef]

2. R. T. Kester, L. Gao, and T. S. Tkaczyk, “Development of image mappers for hyperspectral biomedical imaging applications,” Appl. Opt. 49(10), 1886–1899 (2010). [CrossRef]

3. R. G. Keys, “Cubic convolution interpolation for digital image processing,” IEEE Trans. Acoust., Speech, Signal Process. 29(6), 1153–1160 (1981). [CrossRef]

4. L. Zhang and X. Wu, “An edge-guided image interpolation algorithm via directional filtering and data fusion,” IEEE Transactions on Image Processing 15(8), 2226–2238 (2006). [CrossRef]

5. W. T. Freeman, T. R. Jones, and E. C. Pasztor, “Example-based super-resolution,” IEEE Comput. Grap. Appl. 22(2), 56–65 (2002). [CrossRef]

6. H. Chang, D.-Y. Yeung, and Y. Xiong, “Super-resolution through neighbor embedding,” in Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, (IEEE, 2004), pp. 1–8.

7. J. Yang, J. Wright, T. Huang, and Y. Ma, “Image super-resolution as sparse representation of raw image patches,” in 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, (IEEE, 2008), pp. 1–8.

8. B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee, “Enhanced deep residual networks for single image super-resolution,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, (IEEE, 2017), pp. 136–144.

9. C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and S. Wenzhe, “Photo-realistic single image super-resolution using a generative adversarial network,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2017), pp. 4681–4690.

10. X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C. Change Loy, “Esrgan: Enhanced super-resolution generative adversarial networks,” in Proceedings of the European Conference on Computer Vision (ECCV) Workshops, (Springer, 2018), pp. 63–79.

11. L. Gao, A. D. Elliott, R. T. Kester, N. Bedard, N. Hagen, D. W. Piston, and T. S. Tkaczyk, “Image mapping spectrometer (IMS) for real time hyperspectral fluorescence microscopy,” in Frontiers in Optics, (Optical Society of America, 2010), p. FML2.

12. R. T. Kester, L. Gao, N. Bedard, and T. S. Tkaczyk, “Real-time hyperspectral endoscope for early cancer diagnostics,” Proc. SPIE 7555, 75550A (2010). [CrossRef]

13. R. T. Kester, N. Bedard, L. Gao, and T. S. Tkaczyk, “Real-time snapshot hyperspectral imaging endoscope,” J. Biomed. Opt. 16(5), 056005 (2011). [CrossRef]

14. L. Gao, R. T. Smith, and T. S. Tkaczyk, “Snapshot hyperspectral retinal camera with the Image Mapping Spectrometer (IMS),” Biomed. Opt. Express 3(1), 48–54 (2012). [CrossRef]

15. N. Bedard, N. Hagen, L. Gao, and T. S. Tkaczyk, “Image mapping spectrometry: calibration and characterization,” Opt. Eng. 51(11), 111711 (2012). [CrossRef]

16. A. Liu, L. Su, Y. Yuan, and X. Ding, “Accurate ray tracing model of an imaging system based on image mapper,” Opt. Express 28(2), 2251–2262 (2020). [CrossRef]

17. T. Ružić and A. Pižurica, “Context-aware patch-based image inpainting using Markov random field modeling,” IEEE Trans. on Image Process. 24(1), 444–456 (2015). [CrossRef]

18. H. Li, W. Luo, and J. Huang, “Localization of diffusion-based inpainting in digital images,” IEEE Transactions on Inf. Forensics and Secur. 12(12), 3050–3064 (2017). [CrossRef]

19. X. Jin, Y. Su, L. Zou, Y. Wang, P. Jing, and Z. J. Wang, “Sparsity-based image inpainting detection via canonical correlation analysis with low-rank constraints,” IEEE Access 6, 49967–49978 (2018). [CrossRef]

20. D. Ding, S. Ram, and J. J. Rodríguez, “Image inpainting using nonlocal texture matching and nonlinear filtering,” IEEE Trans. on Image Process. 28(4), 1705–1719 (2019). [CrossRef]

21. J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang, “Generative image inpainting with contextual attention,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (IEEE, 2018), pp. 5505–5514.

22. K. Nazeri, E. Ng, T. Joseph, F. Qureshi, and M. Ebrahimi, “Edgeconnect: Generative image inpainting with adversarial edge learning,” https://arxiv.org/abs/1901.00212 (2019).

23. X. Liu, M. Suganuma, X. Luo, and T. Okatani, “Restoring images with unknown degradation factors by recurrent use of a multi-branch network,” https://arxiv.org/abs/1907.04508 (2019).

24. A. Kendall, Y. Gal, and R. Cipolla, “Multi-task learning using uncertainty to weigh losses for scene geometry and semantics,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (IEEE, 2018), pp. 7482–7491.

25. G. Zhao, J. Liu, J. Jiang, and W. Wang, “A deep cascade of neural networks for image inpainting, deblurring and denoising,” Multimed. Tools Appl. 77(22), 29589–29604 (2018). [CrossRef]

26. X. Zhang, H. Dong, Z. Hu, W.-S. Lai, F. Wang, and M.-H. Yang, “Gated fusion network for joint image deblurring and super-resolution,” https://arxiv.org/abs/1807.10806 (2018).

27. J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (IEEE, 2018), pp. 7132–7141.

28. E. Agustsson and R. Timofte, “Ntire 2017 challenge on single image super-resolution: Dataset and study,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, (IEEE, 2017), pp. 126–135.

29. J.-B. Huang, A. Singh, and N. Ahuja, “Single image super-resolution from transformed self-exemplars,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (IEEE, 2015), pp. 5197–5206.

30. S. Iizuka, E. Simo-Serra, and H. Ishikawa, “Globally and locally consistent image completion,” ACM Trans. Graph. 36(4), 1–14 (2017). [CrossRef]

31. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” https://arxiv.org/abs/1409.1556 (2014).

32. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” https://arxiv.org/abs/1412.6980 (2014).

33. A. Horé and D. Ziou, “Image quality metrics: PSNR vs. SSIM,” in 2010 20th International Conference on Pattern Recognition, (IEEE, 2010), pp. 2366–2369.

34. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. on Image Process. 13(4), 600–612 (2004). [CrossRef]

35. S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2010). [CrossRef]

36. Y. Jin, J. Chen, C. Wu, Z. Chen, X. Zhang, H. Shen, W. Gong, and K. Si, “Wavefront reconstruction based on deep transfer learning for microscopy,” Opt. Express 28(14), 20738–20747 (2020). [CrossRef]

37. Y. Peng, Q. Sun, X. Dun, G. Wetzstein, W. Heidrich, and F. Heide, “Learned large field-of-view imaging with thin-plate optics,” ACM Trans. Graph. 38(6), 1–14 (2019). [CrossRef]

Parameter	Value	Parameter	Value	Parameter	Value
No. of tilt angles, $M$	23	Facet width, $b$	165 $μ m$	Length, $l$	8mm
No. of blocks, $N$	3	Overall tilt angle, $θ$	$25^{\circ}$	Width, $w$	11.385mm
$α_{m} /^{\circ}$	5.422, 5.259, 4.991, 4.033, 3.868, 2.731, 2.649, 2.513, 1.351,
	1.296, 0.000, 0.000, 0.000, -1.351, -1.296, -2.731, -2.649, -2.513,
	-4.033, -3.868, -5.422, -5.259, -4.991
$β_{m} /^{\circ}$	4.878, -0.186, -5.242, 2.455, -2.664, 5.045, -0.047, -5.137, 2.558,
	-2.582, 5.102, 0.000, -5.102, 2.558, -2.582, 5.045, -0.047, -5.137,
	2.455, -2.664, 4.878, -0.186, -5.242

Method	DIV2K		Urban100
	PSNR(dB)	SSIM	PSNR(dB)	SSIM
	$2 \times$ / $4 \times$	$2 \times$ / $4 \times$	$2 \times$ / $4 \times$	$2 \times$ / $4 \times$
Bicubic	17.07/16.55	0.39/0.36	15.54/14.79	0.30/0.26
SRResNet [9]	17.42/15.88	0.40/0.27	15.37/15.18	0.31/0.26
ESRGAN [10]	17.73/16.53	0.44/0.34	15.60/14.17	0.33/0.24
CA [21]+SRResNet [9]	17.36/15.83	0.40/0.25	15.00/13.53	0.29/0.17
EC [22]+SRResNet [9]	17.24/15.90	0.39/0.27	15.11/13.80	0.30/0.19
CA [21]+ESRGAN [10]	17.97/16.62	0.44/0.34	15.79/14.02	0.34/0.22
EC [22]+ESRGAN [10]	17.90/16.65	0.44/0.34	15.51/14.25	0.32/0.24
Ours	24.37/22.53	0.81/0.64	21.97/19.71	0.76/0.55

Configuration	Method	DIV2K		Urban100
		PSNR(dB)	SSIM	PSNR(dB)	SSIM
Sequential	Bicubic	17.07	0.39	15.54	0.30
	Ours	24.37	0.81	21.97	0.76
Staggered	Bicubic	17.01	0.39	15.51	0.30
	Ours	24.64	0.82	22.19	0.77

Facet density	Facet width	Method	DIV2K		Urban100
(facets/mm)	( $μ m$ )		PSNR(dB)	SSIM	PSNR(dB)	SSIM
Low (6.06)	165	Bicubic	17.07	0.39	15.54	0.30
		Ours	24.37	0.81	21.97	0.76
Medium (8.47)	118	Bicubic	15.89	0.30	14.47	0.23
		Ours	22.12	0.70	19.33	0.60
High (14.29)	70	Bicubic	13.82	0.20	12.48	0.15
		Ours	22.01	0.68	18.75	0.59

Method	PSNR(dB)	SSIM	Method	PSNR(dB)	SSIM
	$2 \times$ / $4 \times$	$2 \times$ / $4 \times$		$2 \times$ / $4 \times$	$2 \times$ / $4 \times$
Bicubic	9.93/9.81	0.26/0.27	-	-	-
SRResNet [9]	10.51/9.83	0.28/0.20	SRResNet $^{*}$ [9]	14.13/13.31	0.27/0.19
ESRGAN [10]	10.36/10.06	0.28/0.24	ESRGAN $^{*}$ [10]	14.83/14.53	0.35/0.30
CA [21]+SRResNet [9]	10.50/9.83	0.27/0.20	CA [21]+SRResNet $^{*}$ [9]	14.40/11.19	0.31/0.11
EC [22]+SRResNet [9]	10.64/9.88	0.27/0.20	EC [22]+SRResNet $^{*}$ [9]	14.39/13.63	0.30/0.20
CA [21]+ESRGAN [10]	10.42/10.04	0.28/0.23	CA [21]+ESRGAN $^{*}$ [10]	15.16/14.43	0.37/0.30
EC [22]+ESRGAN [10]	10.71/10.19	0.28/0.25	EC [22]+ESRGAN $^{*}$ [10]	14.98/14.41	0.36/0.29
Ours	17.38/12.28	0.51/0.45	Ours $^{*}$	19.25/15.02	0.66/0.41

Joint artifact correction and super-resolution of image slicing and mapping system via a convolutional neural network

Abstract

1. Introduction

2. System description

2.1 Optical layout

2.2 Degradation of reconstructed image

3. Proposed method

3.1 Network architecture

3.2 Training

3.2.1 Dataset

3.2.2 Loss function

3.2.3 Training details

4. Simulation results and analysis

4.1 Comparison with state-of-the-art methods

4.2 Ablation study

4.3 Different image mapper designs

4.3.1 Different configurations

4.3.2 Different facet densities

5. Experimental results

6. Discussion and conclusion

Funding

Acknowledgments

Disclosures

References

Cited By

Figures (10)

Tables (5)

Equations (6)

Optics Express