Simultaneous coded aperture and dictionary optimization in compressive spectral imaging via coherence minimization

Chenning Tao; Huanzheng Zhu; Peng Sun; Rengmao Wu; Zhenrong Zheng

doi:10.1364/OE.396260

1. Introduction

Spectral imaging captures both spatial and spectral information of a scene. The captured spatial (2D) and spectral (1D) information form a three-dimensional (3D) hyperspectral data cube, which is extensively used in remote sensing [1,2], astronomy [3,4], agriculture [5,6], and medical imaging [7–10]. The 3D data cube can be acquired with a full sampling scheme that employs a scanning process, which is time-consuming and of limited use for dynamic scenes [11]. Compressive spectral imaging reduces the time cost by simultaneously sensing and compressing the spatio-spectral information based on compressive sensing (CS) theory [12]. CS theory asserts that certain signals can be recovered from measurements taken at a sampling rate far lower than that required by the Nyquist-Shannon sampling theorem [13]. The coded aperture snapshot spectral imager (CASSI) [14–17] is a typical compressive spectral imaging system that has attracted increasing interest due to its potential for reconstructing hyperspectral images from a single shot. Improvements have been made to the CASSI system to ensure both reconstruction quality and imaging speed. The first CASSI was proposed with a symmetrically placed dual-disperser (DD) architecture [18]. Soon afterwards, a single-disperser (SD) CASSI system was developed [19,20] and the use of multiple snapshots was discussed [21]. The CASSI system was then extended to a colored version by using an RGB image sensor in place of the original monochrome image sensor [22] or by replacing the binary coded aperture (CA) with a colored coded aperture [23,24].

The sensing process of the CASSI system can be described by the system matrix H, which is related to the system structure and the CA. With the sparse basis or the dictionary D, the projection of CASSI can be expressed as A = HD, which is the sensing matrix in the CS problem. In CS theory, the Restricted Isometry Property (RIP) is used to analyze the robustness of compressive sampling. However, as it is difficult to verify the RIP directly, the notion of coherence offers an equivalent approach to evaluate CS measurements. The coherence of A is determined by the correlation between the rows of H and the columns of D. Therefore, the coherence of A can be minimized by optimizing H and D. Recent works on coherence minimization of A mainly focus either on optimizing H with a fixed D (usually the discrete cosine transform basis, a wavelet basis or their combination), through binary (block-unblock) CA optimization [25], colored CA optimization [23,24,26] and blue noise CA optimization [27,28]; or on directly decreasing the coherence of D without regard to H, through a decorrelation step [29], iterative projections and rotations [30] or rank shrinkage [31]. In other related works on coherence minimization with system matrix or sparse basis optimization for the universal CS problem [32–34], the system matrix is not structured in the form of the CASSI system and is therefore unsuitable for CASSI. A simultaneous optimization of the system matrix H and the sparse basis or dictionary D is required for efficient coherence minimization of the sensing matrix A and better sparse representation performance in CASSI.

In this paper, a simultaneous system matrix H and overcomplete dictionary D optimization algorithm is proposed and demonstrated with the reconstruction of hyperspectral images in a DD CASSI system with an RGB image sensor (DD-RGB CASSI). To minimize the coherence and improve the sampling efficiency, the Frobenius norm coherence is introduced as the target for CA and dictionary optimization. The CA and the dictionary are optimized by the genetic algorithm and the gradient descent method, respectively. The optimized CA and dictionary significantly improve the reconstruction performance, and the sampling efficiency is also appreciably increased due to the lower coherence value.

The rest of this paper is organized as follows. In Subsection 2.1, the configuration of the DD-RGB CASSI system and the sensing process are introduced. Subsection 2.2 illustrates the concept of coherence and its importance in CS, and shows the calculation of the mutual coherence and the sensing coherence. Subsection 2.3 elaborates the dictionary optimization algorithm based on the gradient descent method, the coded aperture optimization algorithm solved by the genetic algorithm for coherence minimization, and the simultaneous optimization of the system. Simulations and discussion are then presented in Section 3 to demonstrate the performance of the proposed algorithm, before we conclude our work in Section 4.

2. System configuration and method

2.1 System model

The sensing mechanism of the DD-RGB CASSI system is depicted in Fig. 1. The DD-RGB CASSI system is a modified version of the original DD CASSI in which the monochrome image sensor is replaced with an RGB image sensor. The input 3D data cube is denoted as f_mnl with the spatial coordinates m, n, and the spectral coordinate l. f_mnl describes a data cube of resolution N × N × L (0 ≤ m, n ≤ N − 1, 0 ≤ l ≤ L − 1). The data cube is successively modulated by the first disperser, the coded aperture, and the second disperser. As the two dispersers are symmetrically placed on the two sides of the coded aperture, the dispersion in the spatial dimension is counteracted. With the binary coded aperture T_mn, the light field g_mn before the sensor can be written as

(1)$${g_{mn}} = \sum\limits_{l = 0}^{L - 1} {{f_{mnl}}} {T_{({m + l} )n}}$$

Fig. 1. Sensing mechanism of the DD-RGB CASSI system. The input data cube is dispersed by a disperser and then filtered by the CA. The spatio-spectral modulated data is dispersed by a second disperser to counteract the dispersion. Finally, the projection is detected by an RGB image sensor.

As mentioned above, an RGB camera with the Foveon structure is adopted here as the sensor of the DD-RGB CASSI system. There are three layers of pixels with identical pixel numbers and pixel size in the Foveon structure, which detect red, green and blue in sequence, as shown in Fig. 1. The response of the sensor is given by

(2)$$\begin{aligned} {g_{mn,R}} &= \sum\limits_{l = 0}^{L - 1} {{\omega _{l,R}}{f_{mnl}}} {T_{({m + l} )n}}\\ {g_{mn,G}} &= \sum\limits_{l = 0}^{L - 1} {{\omega _{l,G}}{f_{mnl}}} {T_{({m + l} )n}}\\ {g_{mn,B}} &= \sum\limits_{l = 0}^{L - 1} {{\omega _{l,B}}{f_{mnl}}} {T_{({m + l} )n}} \end{aligned}$$

where ω_R, ω_G and ω_B are the spectral responses of the RGB camera for the red, green and blue layers, respectively. Equation (2) can be expressed in matrix form as

(3)$$\textbf{g} = {\textbf{H}_{DD - RGB}}\textbf{f}$$

H_DD-RGB is the system matrix of the DD-RGB CASSI system.
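As a minimal sketch of Eqs. (1) and (2), the following NumPy snippet applies the DD-RGB forward model to a small random data cube. The array layout (f of shape N × N × L, T of shape (N + L − 1) × N), the function name `dd_rgb_forward`, and the random spectral responses `omega` are assumptions made for illustration and are not taken from the system described here.

```python
import numpy as np

def dd_rgb_forward(f, T, omega):
    """Apply Eqs. (1)-(2).

    f     : (N, N, L) data cube, indexed f[m, n, l]
    T     : (N + L - 1, N) binary coded aperture, indexed T[m, n]
    omega : (3, L) spectral responses of the R, G, B layers (assumed layout)
    Returns g of shape (3, N, N), one measurement per color layer.
    """
    N, _, L = f.shape
    g = np.zeros((3, N, N))
    for l in range(L):
        # f_mnl * T_(m+l)n: band l sees the aperture shifted by l rows
        coded = f[:, :, l] * T[l:l + N, :]
        for c in range(3):
            g[c] += omega[c, l] * coded
    return g

rng = np.random.default_rng(0)
N, L = 4, 6
f = rng.random((N, N, L))
T = rng.integers(0, 2, size=(N + L - 1, N)).astype(float)
omega = rng.random((3, L))  # hypothetical responses, for illustration only
g = dd_rgb_forward(f, T, omega)
print(g.shape)
```

Setting all entries of `omega` to 1 recovers the monochrome DD measurement of Eq. (1) in each of the three output layers.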

It should be noted that a single shot of compressive measurement may not provide sufficient information for reconstruction of a target scene that includes rich details in both the spatial and spectral dimensions. Therefore, a set of K snapshots is taken, each corresponding to a different coded aperture. Figure 2 illustrates the structure of H_DD-RGB for N = 4, L = 6 using K = 2 shots with random binary CA. Instead of the binary elements in the system matrix of a conventional CASSI, the gray elements in H_DD-RGB indicate the spectral response of the RGB sensor, which differs among the red, green, and blue layers.

Fig. 2. The structure of H_DD-RGB is illustrated for N = 4, L = 6 using K = 2 shots with random binary CA.

Compared to a common SD CASSI system, the unique features of the proposed DD-RGB CASSI system are the symmetric dual-disperser structure and the RGB sensor. The DD structure is preferred because it is better suited to block-wise reconstruction, which may significantly relieve the computational burden. In the conventional SD structure, as there is only one disperser shearing the data cube in the spatial dimension, the block-wise reconstruction is strongly influenced by the neighboring sub data cubes (neighboring blocks) due to overlapping. In the DD structure, however, the spatial shear from the first disperser is counteracted by the second disperser, so the spatial shear is applied only to the CA. The overlapping of the neighboring sub data cubes is eliminated, and therefore spatial division and block-wise reconstruction are applicable. Moreover, the coherence of the sensing matrix A is lower for the DD structure than for the SD structure, which will be discussed later. An RGB sensor functions similarly to a colored CA: it provides an additional modulation in the spectral dimension through its spectral response (spectral transmission for the colored CA). As the Foveon structure absorbs all light arriving at the sensor, the light efficiency of the DD-RGB CASSI system is identical to that of the common SD CASSI or DD CASSI system, which is generally about twice that of the colored CA systems [35]. Furthermore, the coherence of A can be further decreased with the DD-RGB CASSI system, which is also discussed later in the results part.

2.2 Coherence

In CS theory, a given signal $\textbf{f} \in {{\mathbb R}^{{N^2}L}}$ can be sparsely represented under a fixed basis D by $\textbf{f} = \textbf{D}\boldsymbol{\theta}$, where $\textbf{D} = [{{\textbf{d}_1}\;{\textbf{d}_2} \ldots {\textbf{d}_d}} ]\in {{\mathbb R}^{{N^2}L \times d}}$ (N²L ≤ d, where equality holds for an orthogonal basis and strict inequality holds for an overcomplete dictionary) and $\boldsymbol{\theta} \in {{\mathbb R}^d}$ is a sparse representation of f. In the CASSI regime, the signal f is the vectorized 3D data cube. With the system matrix $\textbf{H} = {[{\textbf{h}_1^T,\; \textbf{h}_2^T, \ldots ,\;\textbf{h}_{XK}^T} ]^T} \in {{\mathbb R}^{XK \times {N^2}L}}$ (X = N² for the DD system, X = N(N + L − 1) for the SD system), the projection between the sparse representation θ and the available measurement g is ${\textbf{g}} = {\textbf{HD}}\boldsymbol{\theta}$. Therefore, the CS problem for the reconstruction of hyperspectral images in CASSI is the nonlinear optimization

(4)$$\hat{\textbf{f}} = {\textbf{D}} \mathop{\arg \min }\limits_{\boldsymbol{\theta}} ({||{{\textbf{A}}{\boldsymbol{\theta}} - {\textbf{g}}} ||_2^2 + \tau {{||\boldsymbol{\theta}||}_1}} )$$

where A = HD is the sensing matrix and τ is a regularization parameter.

As mentioned above, the coherence is an equivalent evaluation criterion for RIP judgement in CS. The coherence of a matrix is defined as the largest absolute inner product between any two different columns after column normalization, which is also named the mutual coherence [33]

(5)$$\mu (\textbf{A} )= \mathop {\max }\limits_{1 \le i,j \le d\atop {i} \ne j} \frac{{\left|{\left\langle {{\textbf{a}_i},{\textbf{a}_j}} \right\rangle } \right|}}{{||{{\textbf{a}_i}} ||||{{\textbf{a}_j}} ||}}$$

where a_i is a column of matrix A. As A = HD, the mutual coherence can also be expressed as $\mu ({\textbf{H},\; \textbf{D}} )= \mathop {\max }\nolimits_{1 \le i \le {N^2}K,\; 1 \le j \le d} |{{\textbf{h}_i}{\textbf{d}_j}} |$ after row/column normalization of H/D [36]. In CS theory, if the number of measurements m satisfies the condition [13] that

(6)$$m \ge c{\mu ^2}({\textbf{H},\textbf{D}} )S{N^2}L\log ({{N^2}L} )$$

with a positive constant c and the sparsity S of θ, the signal f can be exactly reconstructed from the optimization problem [Eq. (4)] with overwhelming probability. This theorem indicates that by decreasing the sparsity S or the mutual coherence μ(H, D), fewer samples are required and the sampling efficiency can be significantly improved. To reduce the sparsity S, the sparse basis or the dictionary D should represent the data as sparsely as possible. To reduce the mutual coherence μ(H, D), the correlation between H and D should be reduced by optimization. However, decreasing the mutual coherence is equivalent to solving the problem $\mathop{\min}_{\textbf{H},\textbf{D}} f({\textbf{H},\; \textbf{D}} )= ||{\textbf{D}^T}{\textbf{H}^T}\textbf{HD} - {\textbf{I}}||_{\infty}$, where the objective under the ℓ_∞-norm is nonsmooth and nonconvex in H and D. To ease this problem, the Frobenius norm is adopted as a substitute for the ℓ_∞-norm [36,37]

(7)$${\cal J}({\textbf{H},\textbf{D}} )\buildrel \Delta \over = ||{{\textbf{D}^T}{\textbf{H}^T}\textbf{HD} - \textbf{I}} ||_F^2$$

The minimization of ${\cal J}$(H, D) aims to eliminate the off-diagonal elements in the Gram matrix D^TH^THD. In the following discussion, ${\cal J}$(H, D) is named the sensing coherence for convenience.
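The two coherence measures of this subsection can be computed directly from their definitions. The sketch below evaluates the mutual coherence of Eq. (5) and the sensing coherence of Eq. (7) exactly as written (no extra normalization); the function names and the small random H and D are assumptions chosen only to make the example self-contained.

```python
import numpy as np

def mutual_coherence(A):
    """Eq. (5): largest absolute normalized inner product between two distinct columns."""
    norms = np.linalg.norm(A, axis=0)
    A_n = A / np.where(norms > 0, norms, 1.0)   # guard against all-zero columns
    G = np.abs(A_n.T @ A_n)
    np.fill_diagonal(G, 0.0)                    # exclude i == j
    return float(G.max())

def sensing_coherence(H, D):
    """Eq. (7): squared Frobenius distance between the Gram matrix of A = HD and I."""
    A = H @ D
    G = A.T @ A
    return float(np.linalg.norm(G - np.eye(G.shape[0]), "fro") ** 2)

# Toy sizes, for illustration only
rng = np.random.default_rng(0)
H = rng.integers(0, 2, size=(12, 20)).astype(float)
D = rng.standard_normal((20, 30))
D /= np.linalg.norm(D, axis=0)                  # unit-norm atoms
print(mutual_coherence(H @ D), sensing_coherence(H, D))
```

Because A has more columns than rows in this toy setting, its Gram matrix cannot equal the identity, so the sensing coherence has a strictly positive lower bound.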

2.3 Optimization

In order to minimize the sensing coherence ${\cal J}$(H, D), algorithms for joint optimization of the over-complete dictionary D and the coded aperture (corresponding to H) are proposed. Instead of the commonly used sparse bases e.g. DCT basis and wavelet basis, the trainable over-complete dictionary is used. The over-complete dictionary can not only represent the training data with better sparsity, but also retain a necessary space for the optimization of D, as the coherence of the common orthogonal basis is close to zero. To begin with, an initial dictionary D₀ is trained independently using the K-SVD algorithm [38] with samples from a hyperspectral image dataset X. Then a gradient descent method is applied to optimize D with fixed H for efficient coherence minimization. The gradient of ${\cal J}$ with respect to D can be computed by (see Appendix A)

(8)$$\frac{{\partial {\cal J}}}{{\partial \textbf{D}}} = 4{\textbf{H}^T}\textbf{HD}{\textbf{D}^T}{\textbf{H}^T}\textbf{HD} - 4{\textbf{H}^T}\textbf{HD}$$

The update equation of the dictionary is

(9)$${\textbf{D}_{q + 1}} = {\textbf{D}_q} - \beta (4{\textbf{H}^T}\textbf{H}{\textbf{D}_q}\textbf{D}_q^T{\textbf{H}^T}\textbf{H}{\textbf{D}_q} - 4{\textbf{H}^T}\textbf{H}{\textbf{D}_q})$$

where q is the iteration index and β is the stepsize. However, the gradient descent optimization applied to the dictionary only considers coherence minimization and is unconstrained by the training dataset or the original dictionary. To maintain the sparse representation property of the coherence-minimized dictionary for the training dataset, the gradient descent is coupled with K-SVD. The pseudocode of the coupled algorithm is shown in Algorithm 1. After Q iterations of gradient descent update (with result D_Q), another K-SVD dictionary training is performed with the function KSVD(X, D), which is initialized by the given dictionary D = D_Q. The coupled gradient descent with K-SVD is iterated M times to decrease the coherence of the sensing matrix A and obtain a dictionary that sparsely represents the training dataset X.

Algorithm 1: Dictionary optimization
Input: Dataset X, dictionary D₀, system matrix H, stepsize β, number of iterations M, Q
Output: Dictionary D
begin
    D ← D₀
    for m = 1 : M do
        for q = 1 : Q do
            D ← D − β(4H^T HDD^T H^T HD − 4H^T HD)
        end
        D ← KSVD(X, D)
    end
end
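The loop structure of Algorithm 1 can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: `ksvd` is a hypothetical user-supplied callable standing in for KSVD(X, D) (no particular library is implied), and the toy sizes and stepsize are chosen only so that the example runs quickly. The stepsize β = 10⁻¹³ quoted later applies to the problem scale of the experiments, not to this toy problem.

```python
import numpy as np

def sensing_coherence(H, D):
    """Eq. (7)."""
    G = (H @ D).T @ (H @ D)
    return float(np.sum((G - np.eye(G.shape[0])) ** 2))

def grad_J(H, D):
    """Eq. (8): 4 H^T H D D^T H^T H D - 4 H^T H D."""
    M = H.T @ (H @ D)                 # H^T H D
    return 4.0 * M @ (D.T @ M) - 4.0 * M

def optimize_dictionary(X, D0, H, beta, M, Q, ksvd=None):
    """Algorithm 1: M rounds of Q gradient steps, each followed by a K-SVD refit.

    ksvd is a hypothetical callable (X, D) -> D; if None, the refit is skipped.
    """
    D = D0.copy()
    for _ in range(M):
        for _ in range(Q):
            D = D - beta * grad_J(H, D)
        if ksvd is not None:
            D = ksvd(X, D)
    return D

# Toy problem, for illustration only
rng = np.random.default_rng(0)
H = rng.standard_normal((6, 8)) / np.sqrt(8)
D0 = rng.standard_normal((8, 12))
D0 /= np.linalg.norm(D0, axis=0)
J_before = sensing_coherence(H, D0)
D = optimize_dictionary(X=None, D0=D0, H=H, beta=1e-4, M=2, Q=200)
J_after = sensing_coherence(H, D)
print(J_before, J_after)
```

With the K-SVD refit skipped, the loop is plain gradient descent on the sensing coherence, so the printed value after optimization should be lower than the one before.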

As the structure of a system matrix H is defined by the system configuration, when the spectral response ω of the detector is determined, the optimization for H is performed with the coded aperture optimization. Define the translation matrix as

(10)$${\boldsymbol{\Gamma}_{l,k}} = \left[ {\begin{array}{ccc} {{\textbf{0}_{{N^2} \times N({l - 1} )+ N({N + L - 1} )({k - 1} )}}}&{{\textbf{I}_{{N^2}}}}&{{\textbf{0}_{{N^2} \times N({L - l} )+ N({N + L - 1} )({K - k} )}}} \end{array}} \right]$$

where l = 1, 2, …, L is the spectral coordinate, k = 1, 2, …, K is the shot number and 0 is the matrix of zeros. With the vectorized coded aperture $\textbf{p} \in {{\mathbb R}^{NK({N + L - 1} )}}$, the system matrix H_DD-RGB for DD-RGB CASSI of one shot (K = 1) is

(11)$${\textbf{H}_{DD - RGB}} = \left[ {\begin{array}{cccc} {{\omega_R}(1)diag({\boldsymbol{\Gamma}_{1,1}}\textbf{p})}&{{\omega_R}(2)diag({\boldsymbol{\Gamma}_{2,1}}\textbf{p})}& \cdots &{{\omega_R}(L)diag({\boldsymbol{\Gamma}_{L,1}}\textbf{p})}\\ {{\omega_G}(1)diag({\boldsymbol{\Gamma}_{1,1}}\textbf{p})}&{{\omega_G}(2)diag({\boldsymbol{\Gamma}_{2,1}}\textbf{p})}& \cdots &{{\omega_G}(L)diag({\boldsymbol{\Gamma}_{L,1}}\textbf{p})}\\ {{\omega_B}(1)diag({\boldsymbol{\Gamma}_{1,1}}\textbf{p})}&{{\omega_B}(2)diag({\boldsymbol{\Gamma}_{2,1}}\textbf{p})}& \cdots &{{\omega_B}(L)diag({\boldsymbol{\Gamma}_{L,1}}\textbf{p})} \end{array}} \right]$$

where diag(v) forms a diagonal matrix with the diagonal elements from a vector v. The objective for optimization is ${\cal J}$(p) = ${\cal J}$(H_DD-RGB(p), D) with a fixed D. The sensing coherence is then minimized by the genetic algorithm (GA), with p constrained to be a binary block-unblock coded aperture.
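To make the construction of Eqs. (10) and (11) concrete, the sketch below assembles H_DD-RGB from a vectorized coded aperture p and stacks the blocks of several shots. The vectorization order (p as the row-major flattening of an (N + L − 1) × N aperture per shot, f stored band by band) and the function name `build_H` are assumptions made here so that Γ_{l,k} p reduces to a contiguous slice of p.

```python
import numpy as np

def build_H(p, omega, N, L, K=1):
    """Assemble H_DD-RGB per Eq. (11), stacked over K shots.

    p     : length N*K*(N+L-1) binary vector (K vectorized apertures)
    omega : (3, L) spectral responses of the R, G, B layers
    Returns a (3*N*N*K, N*N*L) matrix.
    """
    shot_len = N * (N + L - 1)
    blocks = []
    for k in range(K):
        p_k = p[k * shot_len:(k + 1) * shot_len]
        # Gamma_{l,k} p keeps N^2 consecutive entries starting at offset N*(l-1)
        diags = [np.diag(p_k[N * l:N * l + N * N]) for l in range(L)]
        for c in range(3):
            blocks.append(np.hstack([omega[c, l] * diags[l] for l in range(L)]))
    return np.vstack(blocks)

rng = np.random.default_rng(0)
N, L, K = 3, 4, 2
omega = rng.random((3, L))                       # hypothetical responses
p = rng.integers(0, 2, size=N * K * (N + L - 1)).astype(float)
H = build_H(p, omega, N, L, K)

f = rng.random((N, N, L))
f_vec = f.transpose(2, 0, 1).ravel()             # band-by-band vectorization
g = (H @ f_vec).reshape(K, 3, N, N)              # shot, color layer, m, n
print(H.shape, p.mean())                         # matrix size and transmittance
```

The objective 𝒥(p) is then the sensing coherence of `build_H(p, ...)` with the fixed dictionary D, which a binary-constrained search such as the GA can evaluate for each candidate p.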

The sensing coherence ${\cal J}$(H, D) is determined by the correlation between H and D. Therefore, a simultaneous optimization algorithm is required to decrease the coherence further. Instead of fixing H or D throughout the optimization, the individually optimized H and D are updated iteratively, as shown in Algorithm 2. The simultaneous optimization is performed until the coherence converges, that is, until the difference in the coherence value between iterations is smaller than a given threshold. At the end of each optimization of the dictionary or the CA, reconstructions are performed and the quality metrics are calculated to validate the optimization. Note that the reconstructions are not required for the optimization; they are listed in Algorithm 2 only for clarity.

Algorithm 2: Simultaneous optimization
Input: Dictionary D₀
Output: Dictionary D, system matrix H_DD-RGB
begin
    D ← D₀
    repeat until convergence:
        For D fixed, update H_DD-RGB with optimized p from GA
        For H_DD-RGB fixed, update D using Algorithm 1
        Solve Eq. (4) with H_DD-RGB and D, and compute quality metrics
end
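The alternation in Algorithm 2 can be illustrated with a toy example. The sketch below is only a stand-in under several assumptions: the CA update uses a very small elitist GA with uniform crossover and bit-flip mutation (the GA operators and settings of the experiments are not reproduced), the dictionary update performs the gradient steps of Eq. (9) without the K-SVD refit, the reconstruction step is omitted, and a single shot is used. All function names and parameter values are hypothetical.

```python
import numpy as np

def build_H(p, omega, N, L):
    """Single-shot H from Eq. (11); p has length N*(N+L-1), omega is (3, L)."""
    diags = [np.diag(p[N * l:N * l + N * N]) for l in range(L)]
    return np.vstack([np.hstack([w[l] * diags[l] for l in range(L)]) for w in omega])

def J(H, D):
    """Sensing coherence, Eq. (7)."""
    G = (H @ D).T @ (H @ D)
    return float(np.sum((G - np.eye(G.shape[0])) ** 2))

def update_ca(D, p0, omega, N, L, rng, pop=16, gens=10, p_mut=0.05):
    """Toy elitist GA over binary p; a stand-in for the GA step of Algorithm 2."""
    cost = lambda p: J(build_H(p, omega, N, L), D)
    P = rng.integers(0, 2, size=(pop, p0.size)).astype(float)
    P[0] = p0                                    # keep the incumbent aperture
    for _ in range(gens):
        P = P[np.argsort([cost(p) for p in P])]
        elite = P[:pop // 2]                     # survivors are copied unchanged
        pairs = elite[rng.integers(0, len(elite), size=(pop - len(elite), 2))]
        take_first = rng.random(pairs[:, 0].shape) < 0.5
        children = np.where(take_first, pairs[:, 0], pairs[:, 1])  # uniform crossover
        flip = rng.random(children.shape) < p_mut
        children = np.where(flip, 1.0 - children, children)        # bit-flip mutation
        P = np.vstack([elite, children])
    return min(P, key=cost)

def update_dictionary(H, D, beta, Q):
    """Q gradient steps of Eq. (9); the K-SVD refit of Algorithm 1 is omitted."""
    S = H.T @ H
    for _ in range(Q):
        M = S @ D
        D = D - beta * (4.0 * M @ (D.T @ M) - 4.0 * M)
    return D

def simultaneous_optimization(D0, p0, omega, N, L, rng,
                              beta=1e-5, Q=50, tol=0.01, max_iter=5):
    D, p = D0.copy(), p0.copy()
    history = [J(build_H(p, omega, N, L), D)]
    for _ in range(max_iter):
        p = update_ca(D, p, omega, N, L, rng)                # D fixed, update CA
        D = update_dictionary(build_H(p, omega, N, L), D, beta, Q)  # H fixed, update D
        history.append(J(build_H(p, omega, N, L), D))
        if abs(history[-2] - history[-1]) < tol * history[-1]:  # relative change
            break
    return D, p, history

rng = np.random.default_rng(0)
N, L, d = 3, 4, 48
omega = rng.random((3, L))                       # hypothetical responses
D0 = rng.standard_normal((N * N * L, d))
D0 /= np.linalg.norm(D0, axis=0)
p0 = rng.integers(0, 2, size=N * (N + L - 1)).astype(float)
D, p, history = simultaneous_optimization(D0, p0, omega, N, L, rng)
print(history)
```

Keeping the incumbent aperture in the GA population means the CA step cannot increase the coherence, which makes the recorded history easy to interpret as a convergence trace.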

3. Results and discussions

The mutual coherence and the sensing coherence of the SD CASSI with a monochrome image sensor (SD-mono), the SD CASSI with an RGB image sensor (SD-RGB), the DD CASSI with a monochrome image sensor (DD-mono) and the DD CASSI with an RGB image sensor (DD-RGB) are listed in Table 1. The highly coherent dictionary (μ(D) = 1) is trained by K-SVD with the hyperspectral images from the Natural Scenes 2015 (NS 2015) dataset [39]; the dictionary contains 3000 atoms and the atom size is 8 × 8 × 31 (spatial × spatial × spectral). The coherence is calculated over 1000 Monte Carlo trials, each with one random block-unblock coded aperture. With this dictionary, the mutual coherence between the system matrix and the dictionary is not influenced by the system structure and is constantly 1. The sensing coherence for all four system structures is above 0.98, indicating a limited contribution of the system structure to decreasing the coherence when the dictionary is highly coherent. For the dictionary (μ(D) = 0.6) decorrelated by INK-SVD [29], both the mutual coherence and the sensing coherence decrease significantly as the system structure is improved, which reveals the advantage of DD/RGB over SD/monochrome in terms of coherence and therefore indicates that the DD-RGB system structure provides higher sampling efficiency. Moreover, the sensing coherence for an optimized blue noise (BN) CA [28] with a non-optimized dictionary is lower than that for the random CA in all four configurations, and the DD-RGB structure maintains the lowest sensing coherence. The coherence comparison in Table 1 demonstrates that optimization of the CA and the dictionary efficiently decreases the coherence, and that the advantage of the DD-RGB structure in terms of coherence persists regardless of such optimization.

Table 1. Coherence comparison of four CASSI system configurations

The NS 2015 dataset is used for the dictionary training in our optimization algorithm. The NS 2015 dataset consists of 30 hyperspectral images, of which 24 are randomly selected for training and the other 6 are used for testing. To form the training dataset, non-overlapping patches of size 8 × 8 × 31 (spatial × spatial × spectral) are extracted from the hyperspectral images. In Algorithm 1, the parameters are set to M = 10, Q = 100, and β = 10⁻¹³. The optimized dictionary contains 3000 atoms of size 8 × 8 × 31 (spatial × spatial × spectral), with a sparsity level of 3, which limits the maximum number of atoms for representing one hyperspectral image. To ensure binary values of the elements in the coded aperture during the GA optimization, the lower and upper limits for the elements in p are set to 0 and 1, respectively, and the precision of the variable is set to restrict the elements to be binary. The initial coded apertures are randomly generated with a transmittance of 50%. The number of generations and the population size of each generation are both set to 200, the crossover probability is set to 0.8, the mutation probability is set to 0.1, and the inversion probability is set to 0.2.

In order to evaluate the performance of the dictionary and the coded aperture optimized by the proposed algorithms, three quantitative image quality metrics are adopted here, including full-reference quality metric peak signal-to-noise ratio (PSNR)

(12)$$\textrm{PSNR} = 10{\log _{10}}(\frac{{\max {{(\textbf{f})}^2}}}{{\frac{1}{{{N^2}L}}||{\textbf{f}\textrm{ - }\hat{\textbf{f}}} ||_2^2}})$$

the structural similarity (SSIM) [40], which is related to the spatial fidelity of the reconstruction

(13)$$\textrm{SSIM} = \frac{\left( 2{\nu _{\textbf{f}}}{\nu _{\hat{\textbf{f}}}} + {C_1} \right)\left( 2{\sigma _{\textbf{f}\hat{\textbf{f}}}} + {C_2} \right)}{\left( \nu _{\textbf{f}}^2 + \nu _{\hat{\textbf{f}}}^2 + {C_1} \right)\left( \sigma _{\textbf{f}}^2 + \sigma _{\hat{\textbf{f}}}^2 + {C_2} \right)}$$

where ${\nu _\textbf{f}},\; \sigma _\textbf{f}^2$ and ${\sigma _{\textbf{f}\hat{\textbf{f}}}}$ are the arithmetic mean, the variance and the covariance respectively, and C₁ and C₂ are constants; and the spectrum angular mapper (SAM) [41], which describes the spectral fidelity of the reconstruction

(14)$$\textrm{SAM} = \frac{1}{{{N^2}}}\sum\limits_{i = 1}^{{N^2}} {{{\cos }^{ - 1}}\left( {\frac{{\left\langle {{\textbf{f}_{i,L}},{{\hat{\textbf{f}}}_{i,L}}} \right\rangle }}{{{{||{{\textbf{f}_{i,L}}} ||}_2} \cdot {{||{{{\hat{\textbf{f}}}_{i,L}}} ||}_2}}}} \right)}$$

where f_i,L is an L-length vector extracted from f as the spectrum of a spatial pixel i. Higher PSNR and SSIM values, and lower SAM indicate better reconstruction quality.
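The three metrics of Eqs. (12) to (14) can be sketched in NumPy as below. Two simplifications are assumptions of this sketch: SSIM is evaluated with statistics taken over the whole cube (a single window) rather than with local windows, and the defaults C₁ = 10⁻⁴ and C₂ = 9 × 10⁻⁴ follow the common convention for data scaled to [0, 1], which is not stated in the text. Cubes are assumed to have shape N × N × L.

```python
import numpy as np

def psnr(f, f_hat):
    """Eq. (12): peak is the maximum of the reference cube f."""
    mse = np.mean((f - f_hat) ** 2)
    return float(10.0 * np.log10(f.max() ** 2 / mse))

def ssim_global(f, f_hat, C1=1e-4, C2=9e-4):
    """Eq. (13) with global statistics (single window; assumed simplification)."""
    mu_f, mu_h = f.mean(), f_hat.mean()
    var_f, var_h = f.var(), f_hat.var()
    cov = np.mean((f - mu_f) * (f_hat - mu_h))
    num = (2 * mu_f * mu_h + C1) * (2 * cov + C2)
    den = (mu_f ** 2 + mu_h ** 2 + C1) * (var_f + var_h + C2)
    return float(num / den)

def sam(f, f_hat, eps=1e-12):
    """Eq. (14): mean spectral angle in radians over all spatial pixels."""
    a = f.reshape(-1, f.shape[-1])               # one spectrum per row
    b = f_hat.reshape(-1, f_hat.shape[-1])
    denom = np.maximum(np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1), eps)
    cos = np.sum(a * b, axis=1) / denom
    return float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))

# Toy cube with values in [0, 1] and a reconstruction offset by 0.1 everywhere
f = np.linspace(0.0, 1.0, 4 * 4 * 5).reshape(4, 4, 5)
f_hat = f + 0.1
print(psnr(f, f_hat), ssim_global(f, f_hat), sam(f, f_hat))
```

Since SAM depends only on the angle between spectra, a reconstruction that differs from the truth by a positive scale factor has zero SAM even though its PSNR is finite.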

For K = 3 shots, the sensing coherence of the optimized system and the PSNR, SSIM, SAM values of the reconstruction (solving (4)) of three test images with OMP [42] are shown in Table 2. The non-optimized CA and dictionary are denoted with CA₀ and D₀, which are random block-unblock CA and dictionary trained with K-SVD, respectively. For CA₀ D₁ which is not included in the iteration of Algorithm 2, the dictionary is individually optimized with Algorithm 1 and would not influence the following steps in Table 2. The sensing coherence of CA₀ D₀ is close to 1, which corresponds to the default system structure of DD-RGB. With CA optimized (CA₁ D₀), the sensing coherence is still high, and meanwhile the quality metrics of the reconstruction are slightly better than CA₀ D₀, indicating limited improvement of mere CA optimization. The optimization of dictionary CA₀ D₁ shows significant reduction on the sensing coherence and simultaneous improvement of the quality metrics. However, if only the dictionary is optimized (CA₀ D₁), the coherence and the quality metrics cannot reach the best values (CA₂ D₂). The convergence is reached at CA₂ D₂, with the coherence difference between iterations smaller than 1% of the current coherence value.

Table 2. Reconstruction quality analysis of three test images at each iteration step

The optimized CA (CA of iteration step CA₂ D₂) is shown in Fig. 3. Each shot is shown separately, with the black pixels denoting block and the white pixels denoting unblock. The size of the CA for one shot is 8 × 38 (N × (N + L − 1)). The transmittance of the optimized CA is slightly lower than 50%, which is a relatively high value compared to the state-of-the-art blue noise CA. The reason for the higher transmittance value is related to the optimization approach. For direct optimization of the (colored) coded aperture according to RIP, the CAs of different shots in one set are required to be complementary [15,28]. For the optimization based on the equivalent notion of coherence, in contrast, the complementary CA constraint is removed to avoid double restriction. Therefore, there might be multiple measurements for a spatial point, which contributes to a high transmittance value.

Fig. 3. The optimized CA for K = 3 with the corresponding transmittance.

To intuitively demonstrate the reconstruction performance of the simultaneously optimized CA and dictionary, the corresponding RGB images for the reconstructed hyperspectral images of three test images are shown in Fig. 4, compared with the reconstruction results of the non-optimized DD-RGB system, blue noise CA [28] /orthogonal dictionary (OD) [43] on SD-mono system, and other conventional system structures. The test images are 512 × 512 × 31 in resolution with a wavelength range of 400-700 nm at an interval of 10 nm.

Fig. 4. Visual comparison of three test images from the reconstructions of the SD-mono system, the SD-RGB system, the DD-mono system, the DD-RGB system, the SD-mono system with orthogonal dictionary, the SD-mono system with blue noise coded aperture and the optimized DD-RGB system. (P1-P3) are the spectral signatures for three spatial points which are indicated by the red points in the ground truth.

The ground truths of the three test images are shown in the first column in Fig. 4. Parts of the images inside the red dashed boxes are magnified and shown in insets for clear comparison. The corresponding quality metrics of the reconstructions are shown in Table 3. For the SD-mono and the SD-RGB with single disperser configuration, as there is spatial shear in the projection, the spatial fidelity indicated with SSIM is generally worse than DD-mono and DD-RGB respectively. In the insets of Fig. 4, both the SD-based structures show more apparent spatial noise compared to the DD-based configurations. On the other hand, the spectral fidelity of the reconstruction for SD-mono and SD-RGB is higher than that of DD-mono and DD-RGB respectively, indicated by the lower SAM values. Overall, DD-mono and DD-RGB are slightly better compared to the corresponding SD-based structures in terms of PSNR. State-of-the-art algorithms of individual CA optimization with blue noise patterns and individual dictionary optimization with fast orthogonal dictionary learning improve the corresponding quality metrics compared to the non-optimized SD-mono system. For the DD-RGB with the simultaneously optimized CA and dictionary, all quality metrics show significant improvement over the non-optimized DD-RGB, individually optimized dictionary/CA, or other structures. The point spectra of three spatial points (P1, P2, and P3 shown in the ground truth, respectively) are shown for the non-optimized DD-RGB, the orthogonal dictionary SD-mono, the blue noise CA SD-mono, and the optimized DD-RGB. For the non-optimized DD-RGB, the orthogonal dictionary SD-mono, and the blue noise CA SD-mono, the spectra are roughly close to the spectra of the ground truth. However, spectral details are lost in the reconstruction of these configurations. Whereas for the optimized DD-RGB, the spectra are very close to the ground truth, with error in the form of fluctuation around the truth. 
Therefore, the optimized DD-RGB can appreciably increase both the spatial and spectral fidelity of the reconstructed hyperspectral images, especially compared to the conventional SD-monochrome structure with non-optimized CA and dictionary.

Table 3. Reconstruction quantitative assessment of different CASSI system configurations

The reconstruction quality of CASSI is strongly determined by the number of measurement shots. Generally, a larger number of shots yields higher fidelity in reconstruction. In Fig. 5, the reconstruction quality metrics of the DD-RGB with optimized CA and dictionary are compared to those of the non-optimized DD-RGB and the conventional SD-monochrome system structure. For test image 1 and image 2, with only 1 measurement shot in the optimized DD-RGB, the PSNR and the SSIM for the reconstructed image are higher than the results with 10 shots for the non-optimized DD-RGB and SD-monochrome, and the corresponding SAM leads to the same conclusion. For test image 3, the reconstruction quality for 5 shots in SD-monochrome is similar to that of a 1-shot measurement in the optimized DD-RGB. The fewer measurements required for the reconstruction in the DD-RGB with optimized CA and dictionary indicate higher sampling efficiency, which is in good accordance with the conclusion of the coherence analysis.

Fig. 5. Reconstruction quality analysis of three test images from the optimized DD-RGB system, the DD-RGB system and the SD-monochrome system with respect to the number of shots.

To investigate the robustness of the optimization of the CA and the dictionary, white Gaussian noise with different noise levels is added to the measurements, and the quantitative assessment of the reconstruction is shown in Table 4. Before the noise is added, the original data are normalized according to the maximum of each hyperspectral image. As the noise level increases from σ = 0.001 to 0.01, the quality metrics of the optimized DD-RGB degrade. A high noise level is mainly detrimental to the reconstruction in the spectral dimension, as indicated by the significant increase of the SAM values, whereas in the spatial dimension the influence is weak, with only a slight decrease of SSIM. Although the optimized DD-RGB is more sensitive to noise than the non-optimized DD-RGB, its quality metrics are still better in both the spatial and spectral dimensions. Furthermore, for test images 1 and 2, the quality metrics for the optimized DD-RGB at a noise level of σ = 0.01 are better than those of the non-optimized DD-RGB without noise. Therefore, the optimized CA and dictionary are robust in most cases, and the optimization remains advantageous even in the presence of noise.
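The noise protocol described above can be sketched as follows. The sketch assumes that σ denotes the standard deviation of zero-mean Gaussian noise in the units of the normalized data, which is one plausible reading of the text; the function name and the random stand-in for the system matrix are hypothetical.

```python
import numpy as np

def noisy_measurement(H, f, sigma, rng):
    """Normalize the cube by its maximum, project it, and add white Gaussian noise.

    sigma is interpreted as the noise standard deviation (assumption).
    """
    f_norm = f / f.max()
    g = H @ f_norm.ravel()
    return g + sigma * rng.standard_normal(g.shape)

rng = np.random.default_rng(0)
f = rng.random(200) * 50.0                       # arbitrary-scale toy signal
H = rng.integers(0, 2, size=(5000, 200)).astype(float)  # stand-in system matrix
g_clean = noisy_measurement(H, f, 0.0, rng)
for sigma in (0.001, 0.01):
    g_noisy = noisy_measurement(H, f, sigma, rng)
    print(sigma, np.std(g_noisy - g_clean))
```

Normalizing before projection keeps the stated noise levels comparable across hyperspectral images with different dynamic ranges.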

Table 4. Reconstruction quantitative assessment of the DD-RGB system and the optimized DD-RGB system at different noise levels

The complexity analysis of the proposed simultaneous optimization of the CA and the dictionary is shown in Table 5 for one iteration of gradient descent, genetic algorithm, and K-SVD, respectively. The variable Pop in the genetic algorithm denotes the population size, t is the number of vectorized hyperspectral images in the training dataset X, and S corresponds to the inner iterations in K-SVD. The dimension of the vectorized hyperspectral image is N²L, and therefore the time complexity is mostly governed by powers of this dimension. Although the third or second power of N²L appears in the complexity, block-wise measurement and reconstruction help to mitigate the high computational demand. Furthermore, for a given hyperspectral image dataset and a determined system configuration, the optimization of the dictionary and the CA is required only once and is not repeated in subsequent measurements and reconstructions.

Table 5. Complexity analysis for one iteration

4. Conclusion

In summary, we propose an algorithm that simultaneously optimizes the system matrix H and the overcomplete dictionary D to minimize the coherence and improve the sampling efficiency of the DD-RGB CASSI system. Compared with the other system structures, DD-RGB CASSI has the lowest coherence of the sensing matrix. To minimize the Frobenius norm coherence, the CA and the dictionary are optimized by the genetic algorithm and the gradient descent method, respectively. With the optimized CA and dictionary, the sensing coherence of the system is significantly reduced, and the quality metrics of the reconstruction show appreciable improvement over the non-optimized ones. The lower coherence also increases the sampling efficiency: with the optimized CA and dictionary, the reconstruction quality is maintained with fewer measurements. The simultaneous optimization of the system matrix and the dictionary relieves the data acquisition burden in CASSI through high sampling efficiency and improves the reconstruction quality of hyperspectral images. The concept of simultaneously optimizing the system projection and the sparse representation basis opens opportunities for further development of system structures and reconstruction algorithms in other compressive-sensing-based imaging systems, e.g., magnetic resonance imaging, compressive X-ray tomography, and single-pixel imaging.

Appendix A

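The gradient of ${\cal J}$ with respect to D is calculated with H held fixed. The constant term Tr(I) that arises when the trace is expanded is omitted because its derivative vanishes, and the symmetry of $\textbf{H}^T\textbf{H}$ is used in the last two steps: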

(15)$$\begin{aligned} \frac{{\partial {\cal J}}}{{\partial \textbf{D}}} &= \frac{\partial }{{\partial \textbf{D}}}||{{\textbf{D}^T}{\textbf{H}^T}\textbf{HD} - \textbf{I}} ||_F^2\\ &= \frac{\partial }{{\partial \textbf{D}}}Tr\{{{{({{\textbf{D}^T}{\textbf{H}^T}\textbf{HD} - \textbf{I}} )}^T}({{\textbf{D}^T}{\textbf{H}^T}\textbf{HD} - \textbf{I}} )} \}\\ &= \frac{\partial }{{\partial \textbf{D}}}[{Tr({{\textbf{D}^T}{\textbf{H}^T}\textbf{HD}{\textbf{D}^T}{\textbf{H}^T}\textbf{HD}} )- 2Tr({{\textbf{D}^T}{\textbf{H}^T}\textbf{HD}} )} ]\\ &= 4{\textbf{H}^T}\textbf{HD}{\textbf{D}^T}{\textbf{H}^T}\textbf{HD} - \frac{\partial }{{\partial \textbf{D}}}2Tr({{\textbf{D}^T}{\textbf{H}^T}\textbf{HD}} )\\ &= 4{\textbf{H}^T}\textbf{HD}{\textbf{D}^T}{\textbf{H}^T}\textbf{HD} - 4{\textbf{H}^T}\textbf{HD} \end{aligned}$$
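The closed-form result of Eq. (15) can be checked numerically against central finite differences. The following is a minimal sketch on small random matrices; the sizes, the random H (which does not have the CASSI structure), and the function names are assumptions made for illustration only.

```python
import numpy as np


def coherence_objective(H, D):
    """J(D) = || D^T H^T H D - I ||_F^2."""
    A = H @ D                                   # sensing matrix A = HD
    gram = A.T @ A                              # D^T H^T H D
    return np.linalg.norm(gram - np.eye(D.shape[1]), "fro") ** 2


def coherence_gradient(H, D):
    """Closed form of Eq. (15): 4 H^T H D D^T H^T H D - 4 H^T H D."""
    G = H.T @ H
    GD = G @ D
    return 4.0 * GD @ (D.T @ GD) - 4.0 * GD


def numerical_gradient(H, D, eps=1e-6):
    """Central finite differences, perturbing one entry of D at a time."""
    num = np.zeros_like(D)
    for idx in np.ndindex(*D.shape):
        step = np.zeros_like(D)
        step[idx] = eps
        num[idx] = (coherence_objective(H, D + step)
                    - coherence_objective(H, D - step)) / (2.0 * eps)
    return num


rng = np.random.default_rng(0)
H = rng.standard_normal((6, 12))                # toy system matrix
D = 0.3 * rng.standard_normal((12, 8))          # toy dictionary

analytic = coherence_gradient(H, D)
numeric = numerical_gradient(H, D)
rel_err = np.linalg.norm(analytic - numeric) / np.linalg.norm(numeric)
print(f"relative error between analytic and numerical gradient: {rel_err:.2e}")
```

A small relative error indicates that the analytic expression matches the objective. A gradient descent update would then subtract a step-size multiple of `coherence_gradient(H, D)` from D; the step size is a tuning parameter that is not specified in this section.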

Funding

National Natural Science Foundation of China (61327902).

Disclosures

The authors declare no conflicts of interest.

References

1. M. Borengasser, W. S. Hungate, and R. L. Watkins, Hyperspectral Remote Sensing: Principles and Applications (CRC, 2008).

2. I. Makki, R. Younes, C. Francis, T. Bianchi, and M. Zucchetti, “A survey of landmine detection using hyperspectral imaging,” ISPRS J. Photogramm. 124, 40–53 (2017).

3. A. S. K. Kumar and A. R. Chowdhury, “Hyper-Spectral Imager in visible and near-infrared band for lunar compositional mapping,” Proc. - Indian Acad. Sci., Earth Planet. Sci. 114(6), 721–724 (2005).

4. A. S. K. Kumar, A. R. Chowdhury, A. Banerjee, A. B. Dave, B. N. Sharma, K. J. Shah, K. R. Murali, S. Mehta, S. R. Joshi, and S. S. Sarkar, “Hyper Spectral Imager for lunar mineral mapping in visible and near infrared band,” Curr. Sci. 96(4), 496–499 (2009).

5. J. Im and J. R. Jensen, “Hyperspectral remote sensing of vegetation,” Geogr. Compass 2(6), 1943–1961 (2008).

6. D. Wu and D.-W. Sun, “Advanced applications of hyperspectral imaging technology for food quality and safety analysis and assessment: A review—Part I: Fundamentals,” Innovative Food Sci. Emerging Technol. 19, 1–14 (2013).

7. G. Lu and B. Fei, “Medical hyperspectral imaging: a review,” J. Biomed. Opt. 19(1), 010901 (2014).

8. A. Bjorgan and L. L. Randeberg, “Towards real-time medical diagnostics using hyperspectral imaging technology,” in Proceedings of the OSA European Conference on Biomedical Optics, (Optical Society of America, 2015), pp. 953712.

9. W. R. Johnson, D. W. Wilson, W. Fink, M. S. Humayun, and G. H. Bearman, “Snapshot hyperspectral imaging in ophthalmology,” J. Biomed. Opt. 12(1), 014036 (2007).

10. C. Tao and Z. Zheng, “Compressive Hyperspectral Imaging Enhanced Biomedical Imaging,” BJSTR 22(4), 16805–16807 (2019).

11. Y. Garini, I. T. Young, and G. McNamara, “Spectral imaging: Principles and applications,” Cytometry, Part A 69A(8), 735–747 (2006).

12. N. Hagen and M. W. Kudenov, “Review of snapshot spectral imaging technologies,” Opt. Eng. 52(9), 090901 (2013).

13. E. J. Candes and M. B. Wakin, “An Introduction to Compressive Sampling,” IEEE Signal Process. Mag. 25(2), 21–30 (2008).

14. Y. Wu, I. O. Mirza, G. R. Arce, and D. W. Prather, “Development of a digital-micromirror-device-based multishot snapshot spectral imaging system,” Opt. Lett. 36(14), 2692–2694 (2011).

15. G. Arce, D. Brady, C. Lawrence, H. Arguello, and D. Kittle, “Compressive Coded Aperture Spectral Imaging: An Introduction,” IEEE Signal Process. Mag. 31(1), 105–115 (2014).

16. L. Wang, T. Zhang, Y. Fu, and H. Huang, “HyperReconNet: Joint Coded Aperture Optimization and Image Reconstruction for Compressive Hyperspectral Imaging,” IEEE Trans. on Image Process. 28(5), 2257–2270 (2019).

17. C. Tao, H. Zhu, P. Sun, R. Wu, and Z. Zheng, “Hyperspectral image recovery based on fusion of coded aperture snapshot spectral imaging and RGB images by guided filtering,” Opt. Commun. 458, 124804 (2020).

18. M. E. Gehm, R. John, D. J. Brady, R. M. Willett, and T. J. Schulz, “Single-shot compressive spectral imaging with a dual-disperser architecture,” Opt. Express 15(21), 14013–14027 (2007).

19. A. Wagadarikar, R. John, R. Willett, and D. Brady, “Single disperser design for coded aperture snapshot spectral imaging,” Appl. Opt. 47(10), B44–B51 (2008).

20. A. A. Wagadarikar, N. P. Pitsianis, X. Sun, and D. J. Brady, “Video rate spectral imaging using a coded aperture snapshot spectral imager,” Opt. Express 17(8), 6368–6388 (2009).

21. D. Kittle, K. Choi, A. A. Wagadarikar, and D. J. Brady, “Multiframe image estimation for coded aperture snapshot spectral imagers,” Appl. Opt. 49(36), 6824–6833 (2010).

22. H. Rueda, D. Lau, and G. R. Arce, “Multi-spectral compressive snapshot imaging using RGB image sensors,” Opt. Express 23(9), 12207–12221 (2015).

23. H. Arguello and G. R. Arce, “Colored coded aperture design by concentration of measure in compressive spectral imaging,” IEEE Trans. on Image Process. 23(4), 1896–1908 (2014).

24. A. Parada-Mayorga and G. R. Arce, “Colored coded aperture design in compressive spectral imaging via minimum coherence,” IEEE Trans. Comput. Imag. 3(2), 202–216 (2017).

25. H. Arguello and G. R. Arce, “Restricted isometry property in coded aperture compressive spectral imaging,” in IEEE Statistical Signal Processing Workshop (SSP) (IEEE, 2012), pp. 716–719.

26. L. Galvis, E. Mojica, H. Arguello, and G. Arce, “Shifting colored coded aperture design for spectral imaging,” Appl. Opt. 58(7), B28–B38 (2019).

27. N. Diaz, C. Hinojosa, and H. Arguello, “Adaptive grayscale compressive spectral imaging using optimal blue noise coding patterns,” Opt. Laser Technol. 117, 147–157 (2019).

28. C. V. Correa, H. Arguello, and G. R. Arce, “Spatiotemporal blue noise coded aperture design for multi-shot compressive spectral imaging,” J. Opt. Soc. Am. A 33(12), 2312–2322 (2016).

29. B. Mailhé, D. Barchiesi, and M. D. Plumbley, “INK-SVD: Learning incoherent dictionaries for sparse representations,” in 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (IEEE, 2012), pp. 3573–3576.

30. D. Barchiesi and M. D. Plumbley, “Learning incoherent dictionaries for sparse approximation using iterative projections and rotations,” IEEE Trans. Signal Process. 61(8), 2055–2065 (2013).

31. S. Ubaru, A.-K. Seghouane, and Y. Saad, “Improving the incoherence of a learned dictionary via rank shrinkage,” Neural Comput. 29(1), 263–285 (2017).

32. J. M. Duarte-Carvajalino and G. Sapiro, “Learning to sense sparse signals: Simultaneous sensing matrix and sparsifying dictionary optimization,” IEEE Trans. Image Process. 18(7), 1395–1408 (2009).

33. V. Abolghasemi, S. Ferdowsi, and S. Sanei, “A gradient-based alternating minimization approach for optimization of the measurement matrix in compressive sensing,” Signal Process. 92(4), 999–1009 (2012).

34. C. Lu, H. Li, and Z. Lin, “Optimized projections for compressed sensing via direct mutual coherence minimization,” Signal Process. 151, 45–55 (2018).

35. M. Marquez, P. Meza, H. Arguello, and E. Vera, “Compressive spectral imaging via deformable mirror and colored-mosaic detector,” Opt. Express 27(13), 17795–17808 (2019).

36. X. Cao, T. Yue, X. Lin, S. Lin, X. Yuan, Q. Dai, L. Carin, and D. J. Brady, “Computational snapshot multispectral cameras: Toward dynamic capture of the spectral world,” IEEE Signal Process. Mag. 33(5), 95–108 (2016).

37. A. P. Cuadros and G. R. Arce, “Coded aperture optimization in compressive X-ray tomography: a gradient descent approach,” Opt. Express 25(20), 23833–23849 (2017).

38. M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. Signal Process. 54(11), 4311–4322 (2006).

39. S. M. C. Nascimento, K. Amano, and D. H. Foster, “Spatial distributions of local illumination color in natural scenes,” Vision Res. 120, 39–44 (2016).

40. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process. 13(4), 600–612 (2004).

41. F. A. Kruse, A. Lefkoff, J. Boardman, K. Heidebrecht, A. Shapiro, P. Barloon, and A. J. Goetz, “The spectral image processing system (SIPS)—interactive visualization and analysis of imaging spectrometer data,” Remote Sens. Environ. 44(2-3), 145–163 (1993).

42. J. A. Tropp and A. C. Gilbert, “Signal recovery from random measurements via orthogonal matching pursuit,” IEEE Trans. Inf. Theory 53(12), 4655–4666 (2007).

43. Y. Fonseca, T. Gelvez, and H. Arguello, “Robust Compressive Spectral Image Recovery Algorithm Using Dictionary Learning and Transform Tensor SVD,” in 2019 IEEE 27th European Signal Processing Conference (EUSIPCO), (IEEE, 2019), pp. 1–5.

Dictionary	CA	Index	SD-mono	SD-RGB	DD-mono	DD-RGB
μ(D) = 1	Random binary	μ(H, D)	1	1	1	1
μ(D) = 1	Random binary	$J$ (H, D)	0.9872	0.9857	0.9848	0.9817
μ(D) = 0.6	Random binary	μ(H, D)	0.9537	0.9408	0.9176	0.8507
μ(D) = 0.6	Random binary	$J$ (H, D)	0.6709	0.6400	0.5675	0.5016
μ(D) = 1	BN	μ(H, D)	1	1	1	1
μ(D) = 1	BN	$J$ (H, D)	0.9838	0.9824	0.9822	0.9801

Quality metrics	Image number	CA₀ D₀	CA₁ D₀	CA₀ D₁	CA₁ D₁	CA₂ D₁	CA₂ D₂
$J$ (H, D)	/	0.9817	0.9805	0.8267	0.8116	0.8096	0.8189
PSNR	Img 1	34.06	34.63	40.83	41.44	41.24	42.10
PSNR	Img 2	33.28	34.05	38.92	39.26	39.12	39.34
PSNR	Img 3	36.46	37.13	40.61	40.58	41.01	41.02
SSIM	Img 1	0.9905	0.9916	0.9980	0.9983	0.9982	0.9985
SSIM	Img 2	0.9825	0.9853	0.9952	0.9956	0.9954	0.9957
SSIM	Img 3	0.9863	0.9883	0.9948	0.9948	0.9953	0.9953
SAM	Img 1	0.1262	0.1190	0.05134	0.04753	0.04915	0.04332
SAM	Img 2	0.1268	0.1165	0.07322	0.06753	0.06840	0.06611
SAM	Img 3	0.1431	0.1362	0.07754	0.07651	0.07444	0.07440

Image number	Quality metrics	SD-mono	SD-RGB	DD-mono	DD-RGB	OD SD-mono	BN SD-mono	Optimized DD-RGB
Img 1	PSNR	32.23	34.30	33.24	34.32	36.05	34.48	42.10
Img 1	SSIM	0.9858	0.9910	0.9910	0.9909	0.9940	0.9914	0.9985
Img 1	SAM	0.1222	0.1081	0.1375	0.1202	0.08163	0.1080	0.04332
Img 2	PSNR	32.79	33.67	33.52	33.81	33.98	34.74	39.34
Img 2	SSIM	0.9802	0.9838	0.9783	0.9843	0.9852	0.9873	0.9957
Img 2	SAM	0.1015	0.09960	0.1242	0.1192	0.1016	0.0954	0.06611
Img 3	PSNR	34.49	36.65	35.70	36.62	34.73	36.83	41.02
Img 3	SSIM	0.9805	0.9875	0.9849	0.9873	0.9801	0.9884	0.9953
Img 3	SAM	0.1250	0.1155	0.1366	0.1406	0.1334	0.1126	0.07440

Image number	Quality metrics	DD-RGB (σ = 0.001)	Optimized DD-RGB (σ = 0.001)	DD-RGB (σ = 0.005)	Optimized DD-RGB (σ = 0.005)	DD-RGB (σ = 0.01)	Optimized DD-RGB (σ = 0.01)
Img 1	PSNR	34.28	41.96	33.55	39.16	33.08	36.34
Img 1	SSIM	0.9909	0.9985	0.9854	0.9971	0.9839	0.9930
Img 1	SAM	0.1203	0.04568	0.1257	0.06940	0.1385	0.1054
Img 2	PSNR	33.81	39.25	33.04	37.42	32.52	35.41
Img 2	SSIM	0.9843	0.9956	0.9785	0.9933	0.9805	0.9866
Img 2	SAM	0.1195	0.06702	0.1247	0.08333	0.1362	0.1153
Img 3	PSNR	36.61	40.82	36.21	38.51	34.73	36.05
Img 3	SSIM	0.9873	0.9950	0.9811	0.9916	0.9821	0.9885
Img 3	SAM	0.1407	0.07760	0.1498	0.1147	0.1739	0.1658

Algorithm	Complexity
Gradient descent	$N^{6} L^{2} K + 3 N^{4} L^{2} d + 2 N^{6} L^{3}$
Genetic algorithm	$\mathrm{Pop} \cdot (N^{2} K d^{2} + N^{2} K^{2} d^{2})$
K-SVD	$t \cdot (8 N^{4} L^{2} + 4 N^{2} L S^{2} + 32 N^{2} L S + S^{3}) + 80 N^{6} L^{3}$

Optics Express