
Hyperspectral imaging from a raw mosaic image with end-to-end learning


Abstract

Hyperspectral imaging provides rich spatial-spectral-temporal information with wide applications. However, most existing hyperspectral imaging systems require light splitting/filtering devices for spectral modulation, which make the system complex and expensive and sacrifice spatial or temporal resolution. In this paper, we report an end-to-end deep learning method to reconstruct hyperspectral images directly from a raw mosaic image. It omits the separate demosaicing process required by other methods, which reconstructs full-resolution RGB data from the raw mosaic image, and thus reduces computational complexity and accumulative error. Three different networks were designed based on the state-of-the-art models in the literature, including the residual network, the multiscale network and the parallel-multiscale network. They were trained and tested on public hyperspectral image datasets. Benefiting from the parallel propagation and information fusion of different-resolution feature maps, the parallel-multiscale network performs best among the three networks, with the average peak signal-to-noise ratio reaching 46.83 dB. The reported method can be directly integrated to boost an RGB camera for hyperspectral imaging.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Hyperspectral imaging provides the spectral characteristics of each spatial location of the target scene, representing the scene’s responses at different wavelengths (optical frequencies) [1]. Benefiting from the rich spatial-spectral-temporal information, hyperspectral imaging has been widely used in various fields including agriculture, geology and medicine, for tasks such as crop monitoring, mineral detection and pathological examination [2–4]. Hyperspectral imaging has also enabled continuous breakthroughs in computer vision and graphics, including object tracking, image segmentation and scene rendering [5–7].

Most of the existing hyperspectral imaging systems require light splitting/filtering devices for spectral modulation, which make the system complex and expensive. Specifically, the existing hyperspectral imaging methods can be classified into scanning methods and single-shot methods. The scanning methods can be realized in the spatial domain [8] or in the spectral domain [9]. Spatial scanning utilizes a spectrometer to scan the scene point by point, and spectral scanning leverages a camera to capture the scene channel by channel using a series of narrow-band filters. Both scanning approaches sacrifice temporal resolution in exchange for spectral information. The single-shot methods [10–14] have higher temporal resolution, but rely on specially designed optics to convert the 3D spatial-spectral information into 1D or 2D measurements in the spatial domain. Correspondingly, spatial resolution is sacrificed [15], and reconstruction algorithms are required to recover hyperspectral images from the low-dimensional measurements. For example, in the well-known coded aperture snapshot spectral imaging (CASSI) system [14], a prism and a coded aperture are integrated with a sensor to convert the 3D hyperspectral information into 2D measurements, and compressive sensing based algorithms are applied for reconstruction.

In recent years, hyperspectral imaging using an RGB camera has attracted growing attention due to its simple setup [16–19]. An RGB camera is constructed by placing a color filter array (CFA) in front of a sensor. Since the CFA’s spectral modulation couples wide spectral bands, hyperspectral information can be decoupled by statistical learning from the acquired data. B. Arad et al. [16] created an overcomplete dictionary of hyperspectral bases and corresponding RGB projections, and then used it to reconstruct hyperspectral images. S. Wug Oh et al. [17] utilized a group of RGB cameras with different spectral sensitivities to yield different RGB measurements of the scene. Hyperspectral information was then decoupled from the RGB measurements using principal component analysis (PCA). R. M. Nguyen et al. [18] proposed a training-based method, which learns the mapping between white-balanced RGB images and their spectral reflectances based on radial basis functions. Z. Xiong et al. [19] first upsampled an RGB image into hyperspectral channels by interpolation, and then used a residual network to improve hyperspectral reconstruction quality. All the above algorithms use the full-resolution three-channel RGB images as input, while an RGB camera directly acquires only a single image, i.e., the raw mosaic image. To obtain the full-resolution RGB images, demosaicing algorithms are required, which increases computational complexity and accumulative reconstruction error [20].

In this work, we aim to reconstruct hyperspectral images directly from a single raw mosaic image, without the separate demosaicing process. This reduces computational complexity and facilitates hardware integration. The reconstruction can be regarded as a dimension-ascending process from the perspective of spectral information, and as a super-resolution process from the perspective of spatial information. Deep learning [21] is well suited for dimension-ascending tasks. For example, it has been successfully applied in single-pixel imaging (reconstructing 2D images from a 1D measurement sequence) [22–24] and 3D modeling (reconstructing 3D geometry from 2D images) [25–27]. In addition, deep learning is widely used for image super resolution with great performance [28–30]. Therefore, we resort to deep learning to tackle the hyperspectral reconstruction task.

Figure 1 compares the conventional hyperspectral reconstruction algorithms operating on full-resolution RGB images with our method operating on a single mosaic image. The mosaic image and the corresponding hyperspectral images are set as the input and output of the proposed network, respectively, for end-to-end learning. In order to obtain state-of-the-art hyperspectral reconstruction, we designed three networks with different structures based on the state-of-the-art models in the literature, including the residual network, the multiscale network and the parallel-multiscale network. A series of experiments were conducted to compare their performance. The experiments validate that the proposed method can effectively realize hyperspectral decoupling from a single raw mosaic image. Taking advantage of the end-to-end learning’s low computational complexity and high reconstruction quality, the proposed method can be directly integrated to boost an RGB camera for hyperspectral imaging.


Fig. 1. Comparison among different hyperspectral imaging methods using an RGB camera. The conventional methods require a demosaicing algorithm to first reconstruct the full-resolution RGB images from the raw mosaic image, and then apply different algorithms for hyperspectral reconstruction. The reported method recovers hyperspectral images directly from the raw mosaic image with end-to-end learning. CNN stands for convolutional neural network.


2. Methods

Mathematically, the hyperspectral images can be described as $S(x, y, \lambda )$, where $(x, y)$ index the spatial coordinates, and $\lambda$ indexes the spectral coordinate. The raw mosaic image acquired by an RGB camera can be described as

$$I(x, y)=\sum_{i=1}^{n} F\left(x, y, \lambda_{i}\right) S\left(x, y, \lambda_{i}\right) L\left(\lambda_{i}\right),$$
where $F$ represents the CFA’s spectral response, $L$ denotes the spectrum of illumination, and $n$ is the total number of spectrum channels. We aim to reconstruct the hyperspectral images $S(x, y, \lambda )$ from the single mosaic image $I(x, y)$ using the deep learning technique.
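For illustration, the following NumPy sketch simulates the raw mosaic measurement of Eq. (1), assuming a $2\times 2$ Bayer layout; the function and variable names (simulate_mosaic, F, L, pattern) are illustrative and not part of the original work.

```python
import numpy as np

def simulate_mosaic(S, F, L, pattern="RGGB"):
    """Simulate the raw mosaic measurement of Eq. (1).

    S : (H, W, n) hyperspectral cube S(x, y, lambda_i)
    F : (3, n)    R, G, B spectral responses of the CFA
    L : (n,)      illumination spectrum L(lambda_i)
    The Bayer layout and variable names are illustrative assumptions.
    """
    H, W, n = S.shape
    # Full-resolution RGB projection: sum_i F * S * L over the spectral axis.
    rgb = np.einsum("cn,hwn,n->hwc", F, S, L)            # (H, W, 3)

    # Keep only one color per pixel according to the 2x2 Bayer pattern.
    idx = {"R": 0, "G": 1, "B": 2}
    layout = np.array([[idx[pattern[0]], idx[pattern[1]]],
                       [idx[pattern[2]], idx[pattern[3]]]])
    rows, cols = np.indices((H, W))
    channel = layout[rows % 2, cols % 2]                  # color seen by each pixel
    mosaic = np.take_along_axis(rgb, channel[..., None], axis=2)[..., 0]
    return mosaic                                         # (H, W) raw mosaic image I(x, y)
```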

To explore the best solution, we designed three different networks based on the state-of-the-art models in the literature, including the residual network, the multiscale network and the parallel-multiscale network. The residual-learning based network was first proposed by He et al. [31], and has been widely applied in image classification and restoration tasks [28,29,31]. It eases the training of very deep networks and resolves the degradation problem of deep networks. The multiscale network is able to extract the correlation information among different-scale feature maps [32], which improves the final decoupling. It has been successfully applied in image segmentation and medical image analysis [32,33]. The parallel-multiscale network maintains high-resolution feature maps and leverages low-resolution representations to further boost high-resolution reconstruction [34], producing more fine structures and image details. It has been applied in object detection and semantic segmentation [35]. We summarize the three designed networks in Fig. 2, all of which take the raw mosaic image as input and output the recovered hyperspectral images. Their specific structures are detailed as follows.


Fig. 2. Structures of the three designed networks, including the residual network, the multiscale network and the parallel-multiscale network. All of the networks use the single raw mosaic image as input, and output the reconstructed hyperspectral images.


Residual network: The residual network contains two stages. The first stage operates on the input mosaic image with 64 convolution filters of size $3\times 3$, followed by a ReLU layer for activation. This is followed by $k$ layers for feature extraction, each consisting of 64 filters of size $3\times 3$. The last layer of the first stage contains 3 filters of size $3\times 3$, and its output is added to the input before entering the second stage. The second stage comprises two phases. The first phase performs dimension ascending: for each spectrum channel, the convolution operation is performed with $c$ filters of size $1\times 1$, and the results are added together to produce high-dimensional data. The second phase learns the residual between the images after dimension ascending and the corresponding ground-truth hyperspectral images. The convolution kernel size is the same as in stage 1, and there are $m$ layers in total. The last layer, used for generating the residual, consists of $c$ convolution filters of size $3\times 3$. The residual is finally added to the high-dimensional data to output the reconstructed hyperspectral images. A minimal sketch of this architecture is given below.
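The following PyTorch sketch illustrates one possible reading of the description above; the layer ordering, activation placement, and the broadcasting of the single-channel input in the skip connection are assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

class ResidualNet(nn.Module):
    """Rough sketch of the described residual network (details are assumptions)."""
    def __init__(self, k=6, m=9, c=31):
        super().__init__()
        # Stage 1: feature extraction on the 1-channel mosaic image.
        body = [nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(k):
            body += [nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True)]
        body += [nn.Conv2d(64, 3, 3, padding=1)]        # 3-channel output of stage 1
        self.stage1 = nn.Sequential(*body)

        # Stage 2, phase 1: dimension ascending from 3 channels to c spectral channels
        # (implemented here as a single 1x1 convolution).
        self.ascend = nn.Conv2d(3, c, 1)

        # Stage 2, phase 2: residual learning on the c-channel data.
        tail = []
        for _ in range(m):
            tail += [nn.Conv2d(64 if tail else c, 64, 3, padding=1), nn.ReLU(inplace=True)]
        tail += [nn.Conv2d(64, c, 3, padding=1)]
        self.stage2 = nn.Sequential(*tail)

    def forward(self, mosaic):                           # mosaic: (B, 1, H, W)
        rgb = self.stage1(mosaic) + mosaic               # 1-channel input broadcasts over 3 channels
        hs0 = self.ascend(rgb)                           # coarse c-channel estimate
        return hs0 + self.stage2(hs0)                    # add the learned residual
```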

Multiscale network: The multiscale network consists of two stages. The first stage is for dimension ascending, which is the same as the phase 1 of stage 2 in the proposed residual network. The second stage contains four downsampling operations and four upsampling operations to reconstruct hyperspectral images. The image resolution is reduced to half with every downsampling operation, and the width (number of channels) is doubled. A downsampling operation consists of a Conv_block and a Maxpool layer. The Conv_block has two convolution filters of size $3\times 3$, two ReLU layers and two batchnorm layers. An upsampling operation comprises an Up_block, a skip connection and a Conv_block. An Up_block consists of an interpolation operation, a convolution layer of size $3\times 3$, a ReLU layer and a batchnorm layer. The Up_block’s output is concatenated with the previous feature maps of the same resolution through skip connection. The Conv_block is utilized to further extract high-level features with more spatial structure information. The final layer, used for hyperspectral output, consists of $c$ filters of size $1\times 1$.
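As a rough illustration, the Conv_block, Up_block and one upsampling step described above might be written as follows in PyTorch; the exact ordering of batch norm and ReLU and the interpolation mode are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout):
    # Conv_block: two 3x3 convolutions, each followed by batch norm and ReLU.
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class UpBlock(nn.Module):
    # Up_block: interpolation, then a 3x3 convolution with batch norm and ReLU.
    def __init__(self, cin, cout):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                                  nn.BatchNorm2d(cout), nn.ReLU(inplace=True))
    def forward(self, x):
        return self.conv(F.interpolate(x, scale_factor=2, mode="bilinear",
                                       align_corners=False))

class UpStep(nn.Module):
    # One upsampling step: Up_block, skip connection (concatenation), then Conv_block.
    def __init__(self, cin, cskip, cout):
        super().__init__()
        self.up = UpBlock(cin, cout)
        self.fuse = conv_block(cout + cskip, cout)
    def forward(self, x, skip):
        return self.fuse(torch.cat([self.up(x), skip], dim=1))
```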

Parallel-multiscale network: The parallel-multiscale network consists of two stages. The first stage is for dimension ascending, which is the same as that mentioned above. The second stage is for hyperspectral completion. We first perform feature extraction with a Conv_block and four Bottleneck blocks. The Bottleneck block is the same as that in the ResNet50 network [31]. To better preserve spatial structures, we design four parallel subnets. For each subnet (each row), feature extraction is first performed with four BasicBlocks, the same as those in the ResNet50 network, and then the feature maps are fused with those of the other subnets. From one subnet to the next, the image resolution is reduced to half and the width (number of channels) is doubled. We use a strided $3\times 3$ convolution layer (stride 2) for image downsampling. Nearest-neighbor interpolation and a convolution filter of size $1\times 1$ are respectively used for image upsampling and width matching. Finally, a convolution layer with $c$ filters of size $1\times 1$ is applied to produce the hyperspectral images. A sketch of the multiscale fusion step follows.
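As an illustration of how feature maps from the four subnets could be fused into one target resolution (HRNet-style), here is a hedged sketch; the class name FuseToBranch and the per-subnet widths are hypothetical, and only the downsampling/upsampling rules stated above are taken from the text.

```python
import torch.nn as nn
import torch.nn.functional as F

class FuseToBranch(nn.Module):
    """Fuse feature maps from all subnets into the target subnet (sketch)."""
    def __init__(self, widths, target):
        # widths: channel width of each subnet, e.g. [64, 128, 256, 512]; target: subnet index.
        super().__init__()
        self.target = target
        self.match = nn.ModuleList()
        for src, w in enumerate(widths):
            if src < target:
                # Higher-resolution source: repeated strided 3x3 convolutions (stride 2).
                downs, cin = [], w
                for _ in range(target - src):
                    downs += [nn.Conv2d(cin, widths[target], 3, stride=2, padding=1)]
                    cin = widths[target]
                self.match.append(nn.Sequential(*downs))
            elif src > target:
                # Lower-resolution source: 1x1 convolution for width matching;
                # nearest-neighbor interpolation is applied in forward().
                self.match.append(nn.Conv2d(w, widths[target], 1))
            else:
                self.match.append(nn.Identity())

    def forward(self, feats):                      # feats: list of 4 feature maps
        out = 0
        size = feats[self.target].shape[-2:]
        for src, x in enumerate(feats):
            y = self.match[src](x)
            if src > self.target:                  # upsample low-resolution maps
                y = F.interpolate(y, size=size, mode="nearest")
            out = out + y
        return out
```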

In order to obtain the optimal parameters of the networks, we use the mean squared error (MSE) between the reconstructed hyperspectral images $S(x, y, \lambda )$ and corresponding ground truth $S^{\prime }(x, y, \lambda )$ as the objective function, which is calculated as

$$MSE = \frac{1}{m \times n} \sum_{j=1}^{m} \sum_{i=1}^{n}\left\|S_j(x, y, \lambda_{i})-S_j^{\prime}(x, y, \lambda_{i})\right \|^{2},$$
where $m$ is the total number of training data, and $n$ is the total number of spectrum channels.

3. Experiments

3.1 Simulation settings

We used three public hyperspectral datasets to train the designed networks, including the B. Arad dataset [16], the CAVE dataset [36] and the Nascimento dataset [37]. These three datasets contain a large number of hyperspectral images of both indoor and outdoor scenes, ensuring the diversity of training and testing data. They have been widely used in hyperspectral imaging research [16,19,36,37]. The resolution of the B. Arad dataset is $1392\times 1300\times 31$, and that of the CAVE dataset is $512\times 512\times 31$. The spectral range of both datasets is from 400 nm to 700 nm, with a resolution of 10 nm. The resolution of the Nascimento dataset is $1024\times 1344\times 33$, and the spectrum ranges from 400 nm to 720 nm with a resolution of 10 nm. We took 30 samples from the three datasets in total for testing, and the rest of the data was used for training. In order to expand the training data and make full use of the datasets, we randomly cropped the images into patches of size $128\times 128\times 31$, and finally obtained a total of 14900 samples for training. The corresponding mosaic images were simulated using the standard CIE color matching functions for hyperspectral-to-RGB projection [19].
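A small sketch of this data-preparation step (random $128\times 128$ crops plus mosaic simulation via the simulate_mosaic helper sketched earlier) is shown below; the sampling strategy, patch count per scene, and function names are assumptions.

```python
import numpy as np

def random_patches(cube, F_cfa, L, num_patches, size=128, rng=None):
    """Yield (mosaic, ground-truth patch) training pairs from one hyperspectral cube.

    cube  : (H, W, n) hyperspectral scene
    F_cfa : (3, n)    CFA spectral responses (e.g. CIE color matching functions)
    L     : (n,)      illumination spectrum
    """
    rng = rng or np.random.default_rng()
    H, W, _ = cube.shape
    for _ in range(num_patches):
        r = rng.integers(0, H - size + 1)
        c = rng.integers(0, W - size + 1)
        patch = cube[r:r + size, c:c + size, :]          # 128 x 128 x n ground truth
        mosaic = simulate_mosaic(patch, F_cfa, L)        # network input via Eq. (1)
        yield mosaic, patch
```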

In the experiments, we set $k$ = 6 and $m$ = 9 for the residual network. The batch size of all the networks was set to 32. We used the Adam solver [38] for gradient optimization, and set the weight decay to 0.0001. The learning rate was initialized to 0.001, and decreased by a factor of 10 every 40 epochs. A total of 120 epochs were trained to reach convergence. We used the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) [39] as quantitative metrics to evaluate reconstruction quality. PSNR measures the global intensity difference between the reconstructed hyperspectral images and the corresponding ground truth, and SSIM evaluates their structural similarity. Both metrics are widely used in the image processing community [15]. We first calculated the PSNR and SSIM of each channel separately, and then averaged them for a comparison among the three networks.
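The stated hyperparameters map directly onto a standard PyTorch training setup; the sketch below reflects them, while everything not stated in the text (loop structure, device handling, the peak value used for PSNR) is an assumption.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=120, device="cuda"):
    """Training loop matching the stated hyperparameters (other details assumed)."""
    model.to(device)
    criterion = nn.MSELoss()                                   # objective of Eq. (2)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
    # Decrease the learning rate by a factor of 10 every 40 epochs.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.1)
    for _ in range(epochs):
        for mosaic, target in loader:                          # batch size 32 in the paper
            optimizer.zero_grad()
            loss = criterion(model(mosaic.to(device)), target.to(device))
            loss.backward()
            optimizer.step()
        scheduler.step()

def channel_psnr(pred, target, peak=1.0):
    """Per-channel PSNR for a (B, C, H, W) batch; channel values are averaged afterwards."""
    mse = ((pred - target) ** 2).mean(dim=(0, 2, 3))           # one MSE value per channel
    return 10.0 * torch.log10(peak ** 2 / mse)
```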

3.2 Results on synthetic data

Table 1 presents the quantitative reconstruction results of the three networks. We can see that the parallel-multiscale network obtains the best overall reconstruction quality, with the average PSNR reaching 46.83 dB and SSIM reaching 0.9942. The residual network performs worst among the three, with the average PSNR being 36.17 dB and SSIM reaching 0.9587. In terms of network complexity, the residual network has the lowest computational complexity and the fewest parameters. The multiscale network has more parameters than the other two networks, but its computational complexity is lower than that of the parallel-multiscale network. The reason is that the parallel-multiscale network contains more high-resolution feature maps, which lead to heavier computation.


Table 1. Reconstruction performance and network complexity of the reported three networks. M stands for million, and B stands for billion.

Table 1 also shows the reconstruction stability of the three networks on different spectrum channels, which is evaluated by the variation of the metrics. From the results, we can see that the multiscale network maintains the highest reconstruction stability on different spectrum channels, with the PSNR and SSIM variations being 12.03 and 1.6e-5, respectively. The metric variation of the parallel-multiscale network is slightly higher. This indicates that the improvement of the parallel-multiscale network over the multiscale network does not apply to every spectrum channel. Figure 3 presents a visual PSNR comparison among the three networks, which further validates that the parallel-multiscale network has a slightly wider PSNR range but maintains the best overall performance.


Fig. 3. Comparison of reconstruction stability on different spectrum channels and different samples of the three networks. The solid plots present the average PSNR of different channels, and the corresponding color areas indicate the PSNR range.


Figure 4 presents a visual comparison of five selected channels among the three networks. The parallel-multiscale network outperforms the other two networks at most spectrum channels, but may produce larger errors at a few channels (such as 400 nm) than the multiscale network. This further explains why the parallel-multiscale network has a wider PSNR range than the multiscale network. Figure 5 shows exemplar spectrum reconstructions at 4 selected spatial locations from 4 scenes. It further validates that the multiscale network and the parallel-multiscale network achieve higher-fidelity reconstruction than the residual network.


Fig. 4. Reconstructed hyperspectral images and corresponding error maps of five selected channels using the reported mosaic-to-hyperspectral method. From left to right: ground-truth reference images, results of the residual network and corresponding error maps, results of the multiscale network and corresponding error maps, results of the parallel-multiscale network and corresponding error maps.



Fig. 5. Reconstructed spectra of 4 selected spatial locations from 4 scenes. The X axis represents wavelength (nm), and the Y axis represents spectrum intensity.


The different reconstruction performances of the three networks are attributed to their intrinsic structures. Although the residual network performs well on the conventional single-image super-resolution task [28], it produces relatively large error in hyperspectral reconstruction. The reason is that the residual network can only learn residuals between low-resolution images and corresponding high-resolution ones under the premise that they are highly similar in spatial structure. However, the hyperspectral images have a high spectral dimensionality, and the differences between the mosaic image and corresponding hyperspectral images are relatively large, which makes the residual difficult to learn. Another drawback of the residual network is that as the network depth increases, it is hard for the first few feature maps with rich spatial and structural information to contribute to the final reconstruction [29]. To improve reconstruction quality, it is necessary to preserve structure information as much as possible as the network goes deeper. The multiscale network utilizes skip connection to propagate feature maps of different resolutions to the final reconstruction, and thus preserves more spatial structures. To further improve reconstruction, the parallel-multiscale network leverages parallel propagation and multiscale fusion of different-resolution feature maps, which boost high-resolution representations with the help of low-resolution ones [34], and produce the highest average metric evaluation. We note that the parallel-multiscale network has a wider PSNR range than the multiscale network, as shown in Table 1 and Fig. 3. The reason may be that the parallel-multiscale network has repeated image upsampling operations and multiscale fusion, which may introduce extra errors and accumulate the errors into high-resolution feature maps at certain spectrum channels.

3.3 Results on experiment data

We also conducted experiments with real captured data to validate the effectiveness of the proposed method. We first utilized a tunable bandpass filter (Thorlabs Kurios-VB1/M) to calibrate the RGB spectral responses of a commercial RGB camera (Sony FCB-EV7520A, as shown in Fig. 6(a)). An edgepass filter (Thorlabs FELH0750, cut-on wavelength 750 nm) was integrated to eliminate the influence of infrared light. The illumination was provided by natural sunlight. We set a whiteboard as the calibration target, and used the RGB camera to capture images band by band (from 420 nm to 720 nm, at 1 nm intervals). The RGB spectral responses of the camera can then be calculated by dividing the captured data by the spectrum of the tunable bandpass filter. Figure 6(b) presents the calibrated RGB spectra. We note that the calibrated spectra already contain the illumination spectrum. In addition, the three networks were re-trained with synthetic mosaic images generated using the calibrated RGB spectra, because the calibrated RGB spectra differ from the CIE color spectra used in the simulations, and the training and testing images should follow the same mosaic formation.
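The division step of this calibration can be written compactly; in the sketch below, the array shapes and the helper name calibrate_response are assumptions, and, as noted above, the result still contains the illumination spectrum.

```python
import numpy as np

def calibrate_response(captured_rgb, filter_spectrum):
    """Estimate the camera's RGB spectral response from band-by-band whiteboard captures.

    captured_rgb    : (n_bands, 3) mean RGB values of the whiteboard at each filter setting
    filter_spectrum : (n_bands,)   transmission spectrum of the tunable bandpass filter
    Returns the (n_bands, 3) response, which still includes the illumination spectrum.
    """
    return captured_rgb / filter_spectrum[:, None]
```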


Fig. 6. Experiment results of imaging a color checker. (a) shows the RGB camera used in our experiment. (b) presents its calibrated RGB spectral responses. (c) is the Macbeth color checker. (d - f) show reconstructed spectra of 3 exemplar color blocks. For (b) and (d - f), the X axis represents wavelength (nm), and the Y axis represents spectrum intensity.


We first used the camera to image a Macbeth color checker. The three networks were applied to the acquired raw mosaic image for hyperspectral reconstruction. To obtain the ground truth as reference, we used the RGB camera to acquire hyperspectral images band by band with the tunable bandpass filter, and normalized the images by the camera’s spectral responses. Figures 6(d)-(f) show three exemplar reconstructed spectra of the Macbeth color checker. We can see that the parallel-multiscale network achieves the highest reconstruction accuracy among the three models.

We also imaged an outdoor scene on campus, as shown on the top left of Fig. 7. Reconstructed spectra at two exemplar locations are presented on the top right of Fig. 7. Reconstructed hyperspectral images are shown on the bottom, which reveal that the multiscale network and the parallel-multiscale network have higher reconstruction quality than the residual network. The results are consistent with the simulations.


Fig. 7. Reconstruction results of an outdoor scene using the reported mosaic-to-hyperspectral method. The first row presents the target scene, the acquired raw mosaic image, and the reconstructed spectra of two locations. The X axis represents wavelength (nm), and Y axis represents spectrum intensity. Ref abbreviates reference, Res denotes the residual network, MS represents the multiscale network, and PMS denotes the parallel-multiscale network. The reconstructed hyperspectral images of three channels and corresponding error maps are shown below for a comparison.


4. Conclusion

In this work, we reported a deep learning method to directly reconstruct hyperspectral images from a single raw mosaic image acquired by an RGB camera. Its advantages over the conventional hyperspectral imaging methods lie in three aspects. First, the setup of the proposed method is simple, with only one RGB camera. Second, the end-to-end learning framework eliminates the requirement of separate image demosaicing, which simplifies reconstruction and reduces computational complexity and accumulative error. Third, the state-of-the-art network structure is explored by a comprehensive model comparison, including the residual model, the multiscale model and the parallel-multiscale model, and their reconstruction performance is evaluated on various scenes. Note that the networks should be re-trained for different RGB cameras, to conform to the mosaic formation of different RGB spectra. Experiments on both synthetic and real captured data show that the parallel-multiscale network performs best among the three networks, with the average PSNR reaching 46.83 dB. This performance originates from the parallel propagation and information fusion of different-resolution feature maps. We believe that the proposed framework can promote the development and applications of hyperspectral imaging with reduced hardware and software complexity.

Funding

Fundamental Research Funds for the Central Universities (3052019024); National Natural Science Foundation of China (61827901, 61971045).

Disclosures

The authors declare that there are no conflicts of interest related to this article.

References

1. C.-I. Chang, Hyperspectral Imaging: Techniques for Spectral Detection and Classification (Kluwer Academic/Plenum Publishers, 2003).

2. D. Haboudane, J. R. Miller, E. Pattey, P. J. Zarco-Tejada, and I. B. Strachan, “Hyperspectral vegetation indices and novel algorithms for predicting green LAI of crop canopies: Modeling and validation in the context of precision agriculture,” Remote Sens. Environ. 90(3), 337–352 (2004). [CrossRef]  

3. L.-J. Cheng and G. F. Reyes, “AOTF polarimetric hyperspectral imaging for mine detection,” in Detection Technologies for Mines and Minelike Targets, vol. 2496 (International Society for Optics and Photonics, 1995), pp. 305–311.

4. V. Backman, M. B. Wallace, L. T. Perelman, J. T. Arendt, R. Gurjar, M. G. Muller, Q. Zhang, G. Zonios, E. Kline, T. McGillican, S. Shapshay, T. Valdez, K. Badizadegan, J. M. Crawford, M. Fitzmaurice, S. Kabani, H. S. Levin, M. Seiler, R. R. Dasari, I. Itzkan, J. Van Dam, and M. S. Feld, “Detection of preinvasive cancer cells,” Nature 406(6791), 35–36 (2000). [CrossRef]  

5. J. Suo, L. Bian, F. Chen, and Q. Dai, “Bispectral coding: compressive and high-quality acquisition of fluorescence and reflectance,” Opt. Express 22(2), 1697–1712 (2014). [CrossRef]  

6. A. Sobral, S. Javed, S. Ki Jung, T. Bouwmans, and E.-H. Zahzah, “Online stochastic tensor decomposition for background subtraction in multispectral video sequences,” in IEEE I. Conf. Comp. Vis., pp. 946–953.

7. C. McElfresh, T. Harrington, and K. S. Vecchio, “Application of a novel new multispectral nanoparticle tracking technique,” Meas. Sci. Technol. 29(6), 065002 (2018). [CrossRef]  

8. A. F. Goetz, G. Vane, J. E. Solomon, and B. N. Rock, “Imaging spectrometry for earth remote sensing,” Science 228(4704), 1147–1153 (1985). [CrossRef]  

9. N. Gat, “Imaging spectroscopy using tunable filters: a review,” in Wavelet Applications VII, vol. 4056 (International Society for Optics and Photonics, 2000), pp. 50–64.

10. X. Cao, H. Du, X. Tong, Q. Dai, and S. Lin, “A prism-mask system for multispectral video acquisition,” IEEE Trans. Pattern Anal. Mach. Intell. 33(12), 2423–2435 (2011). [CrossRef]  

11. M. Gehm, R. John, D. Brady, R. Willett, and T. Schulz, “Single-shot compressive spectral imaging with a dual-disperser architecture,” Opt. Express 15(21), 14013–14027 (2007). [CrossRef]  

12. X. Lin, Y. Liu, J. Wu, and Q. Dai, “Spatial-spectral encoded compressive hyperspectral imaging,” ACM Trans. Graph. 33(6), 1–11 (2014). [CrossRef]  

13. M. Descour and E. Dereniak, “Computed-tomography imaging spectrometer: experimental calibration and reconstruction results,” Appl. Opt. 34(22), 4817–4826 (1995). [CrossRef]  

14. G. R. Arce, D. J. Brady, L. Carin, H. Arguello, and D. S. Kittle, “Compressive coded aperture spectral imaging,” IEEE Signal Process. Mag. 31(1), 105–115 (2014). [CrossRef]  

15. N. A. Hagen and M. W. Kudenov, “Review of snapshot spectral imaging technologies,” Opt. Eng. 52(9), 090901 (2013). [CrossRef]  

16. B. Arad and O. Ben-Shahar, “Sparse recovery of hyperspectral signal from natural RGB images,” in IEEE European Conference on Computer Vision, pp. 19–34.

17. S. Wug Oh, M. S. Brown, M. Pollefeys, and S. Joo Kim, “Do it yourself hyperspectral imaging with everyday digital cameras,” in IEEE I. Conf. Comp. Vis. Patt. Recog., pp. 2461–2469.

18. R. M. Nguyen, D. K. Prasad, and M. S. Brown, “Training-based spectral reconstruction from a single RGB image,” in IEEE European Conference on Computer Vision, pp. 186–201.

19. Z. Xiong, Z. Shi, H. Li, L. Wang, D. Liu, and F. Wu, “HSCNN: CNN-based hyperspectral image recovery from spectrally undersampled projections,” in IEEE I. Conf. Comp. Vis., pp. 518–525.

20. O. Losson, L. Macaire, and Y. Yang, “Comparison of color demosaicing methods,” Adv. Imaging Electron Phys. 162, 173–265 (2010). [CrossRef]  

21. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521(7553), 436–444 (2015). [CrossRef]  

22. M. Lyu, W. Wang, H. Wang, H. Wang, G. Li, N. Chen, and G. Situ, “Deep-learning-based ghost imaging,” Sci. Rep. 7(1), 17865 (2017). [CrossRef]  

23. C. F. Higham, R. Murray-Smith, M. J. Padgett, and M. P. Edgar, “Deep learning for real-time single-pixel video,” Sci. Rep. 8(1), 2369 (2018). [CrossRef]  

24. T. Shimobaba, Y. Endo, T. Nishitsuji, T. Takahashi, Y. Nagahama, S. Hasegawa, M. Sano, R. Hirayama, T. Kakue, A. Shiraki, and T. Ito, “Computational ghost imaging using deep learning,” Opt. Commun. 413, 147–151 (2018). [CrossRef]  

25. Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao, “3D ShapeNets: A deep representation for volumetric shapes,” in IEEE I. Conf. Comp. Vis. Patt. Recog., pp. 1912–1920.

26. A.-I. Popa, M. Zanfir, and C. Sminchisescu, “Deep multitask architecture for integrated 2D and 3D human sensing,” in IEEE I. Conf. Comp. Vis. Patt. Recog., pp. 6289–6298.

27. J. Wu, Y. Wang, T. Xue, X. Sun, B. Freeman, and J. Tenenbaum, “MarrNet: 3D shape reconstruction via 2.5D sketches,” in Adv. Neur. In., pp. 540–550.

28. J. Kim, J. Kwon Lee, and K. Mu Lee, “Accurate image super-resolution using very deep convolutional networks,” in IEEE I. Conf. Comp. Vis., pp. 1646–1654.

29. B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee, “Enhanced deep residual networks for single image super-resolution,” in IEEE I. Conf. Comp. Vis. Patt. Recog., pp. 136–144.

30. Y. Tai, J. Yang, X. Liu, and C. Xu, “MemNet: A persistent memory network for image restoration,” in IEEE I. Conf. Comp. Vis., pp. 4539–4547.

31. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE I. Conf. Comp. Vis. Patt. Recog., pp. 770–778.

32. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (Springer, 2015), pp. 234–241.

33. G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. Van Der Laak, B. Van Ginneken, and C. I. Sánchez, “A survey on deep learning in medical image analysis,” Med. Image Anal. 42, 60–88 (2017). [CrossRef]  

34. K. Sun, B. Xiao, D. Liu, and J. Wang, “Deep high-resolution representation learning for human pose estimation,” arXiv preprint arXiv:1902.09212 (2019).

35. K. Sun, Y. Zhao, B. Jiang, T. Cheng, B. Xiao, D. Liu, Y. Mu, X. Wang, W. Liu, and J. Wang, “High-resolution representations for labeling pixels and regions,” CoRR, abs/1904.04514 (2019).

36. F. Yasuma, T. Mitsunaga, D. Iso, and S. K. Nayar, “Generalized assorted pixel camera: postcapture control of resolution, dynamic range, and spectrum,” IEEE Trans. on Image Process. 19(9), 2241–2253 (2010). [CrossRef]  

37. S. M. Nascimento, K. Amano, and D. H. Foster, “Spatial distributions of local illumination color in natural scenes,” Vision Res. 120, 39–44 (2016). [CrossRef]  

38. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980 (2014).

39. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. on Image Process. 13(4), 600–612 (2004). [CrossRef]  
