
Multi-threshold splitting tree algorithm to reduce the number of filters in programmable hyperspectral imaging for fast multi-target classification

Open Access

Abstract

Programmable hyperspectral imaging is a promising and efficient technique for fast target classification that codes hyperspectral post-processing algorithms as spectral transmittances, enabling such post-processing to be performed directly by a special optical dispersive element during optical imaging. Compared with conventional hyperspectral imaging and post-processing techniques, it offers significant advantages: fast image acquisition, no numerical post-processing, and a much lower load of data transmission and storage. However, in multi-target classification tasks, the speed decreases severely because a large number of filters is required. In this study, a novel splitting strategy is proposed to reduce the number of filters in programmable hyperspectral imaging for fast multi-target classification while maintaining the classification performance. Numerical simulation experiments were performed on six publicly available hyperspectral data sets. Compared with conventional splitting strategies, the proposed splitting strategy reduces the number of filters by 25% to 80% while achieving similar classification performance, which is of great significance for improving the speed of multi-target classification with the programmable hyperspectral imaging technique.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Programmable hyperspectral imaging is an optical computational imaging technique that typically codes hyperspectral post-processing algorithms as spectral transmittances. Such post-processing can be performed directly during the optical imaging process through a special optical dispersive element, i.e., a programmable optical filter [1]. The schematic of the programmable hyperspectral imaging technique is shown in Fig. 1. More specifically, by coding a specific hyperspectral post-processing algorithm (e.g., a hyperspectral image classification model) as the spectral transmittance, optical imaging of a real-world scene through this spectral transmittance becomes equivalent to numerical post-processing of the hyperspectral data of the same scene, i.e., the acquired optical measurements are theoretically consistent with the outputs of numerical post-processing. The employed post-processing algorithms inherently enhance the optical signals at wavelengths with larger target-background differences while suppressing the others [2]. Thus, the programmable hyperspectral imaging technique is expected to be a promising and efficient approach for target classification.


Fig. 1. The schematic of programmable hyperspectral imaging technique.


In contrast to conventional hyperspectral imaging techniques, which typically perform such wavelength selection in the post-processing step, the programmable hyperspectral imaging technique brings it forward into the physical process of optical imaging. Although such a technique cannot obtain the full spectrum at each pixel, the spectral information most critical to target classification is still retained in the acquired optical image. Therefore, the programmable hyperspectral imaging technique benefits from fast image acquisition [3,4], freedom from numerical post-processing [5,6], and a much lower load of data transmission and storage [7]. In addition, since the final output of such a technique is in essence the integration of optical signals at multiple wavelengths, its signal-to-noise ratio (SNR) can be dozens to hundreds of times higher than that of conventional hyperspectral imaging techniques [5,8]. The programmable hyperspectral imaging technique has been reported to achieve ultra-fast target classification for a large field of view (512 $\times$ 217 pixels) within an average time cost of only 0.066 s, only 0.35% of the time cost of a conventional hyperspectral imaging-based target classification method, while the SNR can also be improved by approximately ten times [9]. Therefore, the programmable hyperspectral imaging technique is expected to achieve fast or even real-time target classification in a snapshot.

Based on the principle of the programmable hyperspectral imaging technique, its performance is closely related to the design of the filter’s spectral transmittance used for imaging. Such spectral transmittance is frequently designed and optimized from the regression coefficient vectors (RCVs) of specific linear models [10]. This is because the input feature vectors of the linear model can be regarded as the inherent hyperspectral data of the real-world scene, and the RCV of the linear model can be coded as the filter’s spectral transmittance. According to the theory of optical imaging, the acquired optical measurements of a real-world scene are essentially the integration of optical signals at different wavelengths with certain transmittances, which can be represented as the multiplication between the intrinsic spectral information (i.e., the spectra) of the scene and the filter’s spectral transmittance; in numerical post-processing, the final outputs used for classification decision making can likewise be represented as the multiplication between the spectra of the scene and the RCV. Both the filter’s spectral transmittance and the RCV therefore essentially function to select optical signals at specific wavelengths with certain weights: the spectral transmittance performs this selection during the optical imaging process, whereas the RCV performs it during numerical post-processing. If the spectral transmittance coded by the RCV achieves the same selection, the acquired optical measurements through such a coding spectral transmittance are consistent with the outputs of numerical post-processing through the RCV. In practice, the coding spectral transmittance can be realized by a programmable optical filter, which can produce an arbitrary spectral transmittance.
Its detailed working principles can be found in Section 1.1 of Supplement 1 and a published paper [1]. Although there might be negative transmittances in the coding spectral transmittance, this problem can be solved by introducing a compensation spectral transmittance, in which only one additional optical image through an all-pass spectral transmittance (i.e., 100% transmittance at all wavelengths) is sufficient to compensate for all kinds of coding spectral transmittances (for different target classification tasks) when imaging each real-world scene. The detailed procedure and derivation can be found in Section 1.2 of Supplement 1.
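The compensation can be illustrated with a minimal numerical sketch, assuming a standard offset-and-rescale scheme (the RCV and spectrum values below are arbitrary illustrative numbers; the exact derivation is in Section 1.2 of Supplement 1):

```python
import numpy as np

# Hypothetical RCV with negative coefficients and one pixel's spectrum
w = np.array([0.6, -0.3, 0.2, -0.5])
x = np.array([1.0, 2.0, 3.0, 4.0])

# Shift and rescale w into a physically realizable transmittance in [0, 1]
a = -w.min()                      # offset making all entries non-negative
b = (w + a).max()                 # scale bounding the entries by 1
t = (w + a) / b                   # coding spectral transmittance

coded = t @ x                     # measurement through the coding filter
allpass = np.ones_like(w) @ x     # measurement through the all-pass filter
recovered = b * coded - a * allpass
print(np.isclose(recovered, w @ x))   # True: w^T x is recovered exactly
```

Because the all-pass measurement depends only on the scene, one such image can compensate every coding transmittance applied to that scene, consistent with the single-additional-exposure claim above.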

Since only one RCV is typically required to differentiate target and background in numerical post-processing, it can be reasonably speculated that the programmable hyperspectral imaging technique also requires only one coding spectral transmittance to complete a single-target classification task. However, when the programmable hyperspectral imaging technique encounters a multi-target classification task, such a task is commonly decomposed into multiple single-target classification tasks using multiple filters with different spectral transmittances, so the acquisition time increases dramatically with the required number of filters. Therefore, reducing the number of filters is extremely important for multi-target classification with the programmable hyperspectral imaging technique. This problem can be simplified to reducing the number of RCVs used for coding the spectral transmittances.

In this paper, a multi-threshold splitting tree (MTST) splitting strategy is proposed to reduce the number of filters, i.e., the number of RCVs, for fast multi-target classification with the programmable hyperspectral imaging technique. The multi-threshold design and tree structure extract more discriminative capability from each RCV and minimize the number of RCVs required. The proposed MTST splitting strategy is numerically investigated on six publicly available hyperspectral data sets. The results demonstrate that this splitting strategy can reduce the number of filters by 25% to 80% while maintaining classification performance similar to that of conventional splitting strategies, which is of great significance for improving the speed of multi-target classification with the programmable hyperspectral imaging technique.

2. Related works

Because the filter’s spectral transmittance used for imaging has a decisive impact on the performance of the programmable hyperspectral imaging technique, various studies have explored the design of such coding spectral transmittances. Based on the multivariate curve resolution (MCR) [11] and total least squares (TLS) [10] algorithms, Wilcox et al. designed two types of binary spectral transmittances (i.e., with only 0% or 100% transmittance) to realize single-target classification of different chemicals. Rehrauer et al. [12] further modified this approach into a pair of complementary binary spectral transmittances that selectively direct photons associated with two different chemicals to two detectors, so that a pair of images can be captured simultaneously to classify the two chemicals. Scotté et al. [13] broke the constraint of binary spectral transmittance and developed a Cramer-Rao lower bound-based linear model to optimize the spectral transmittance, achieving fast single-target classification for microcalcification chemical imaging. Recently, we proposed a method to code convolutional neural networks (CNNs) as spectral transmittances to realize precise multi-target classification, demonstrating significant advantages over the spectral transmittances designed by conventional methods [9]. Such a multi-target classification task was actually decomposed into multiple single-target classification tasks following the one-vs.-rest (OvR) splitting strategy [14,15], where each single-target classification task requires one RCV.

In addition to the OvR splitting strategy, other conventional splitting strategies can achieve similar decompositions. These conventional splitting strategies [16] can mainly be categorized into one-vs.-one (OvO) [14], OvR, and many-vs.-many (MvM). According to error-correcting output codes (ECOC) [17], the MvM splitting strategy can be implemented with five different coding modes: ordinal coding [17], dense random coding [14], sparse random coding [14], binary complete coding [14], and ternary complete coding [18,19]. The detailed rules of these splitting strategies are described as follows.

  • 1. OvO: All targets are paired with all possible combinations. In each pair of targets, one target is set as positive and the other one is set as negative. Thus, an $N$-target classification task always requires $N(N-1)/2$ RCVs in this splitting strategy.
  • 2. OvR: One target is set as positive, and the remaining targets are set as negative. The above procedure is repeated until each target has been set as positive once. Thus, an $N$-target classification task always requires $N$ RCVs in this splitting strategy.
  • 3. Ordinal coding: When the first base-classifier is trained, the first target is set as positive and the other targets are set as negative. When the second base-classifier is trained, the first two targets are set as positive and the other targets are set as negative. For an $N$-target classification task, the procedure stops only after the $(N-1)$th base-classifier has been trained; thus $N-1$ RCVs are always required in this coding mode of MvM.
  • 4. Dense random coding: When each base-classifier is trained, all targets are randomly set as positive or negative with the same probability. Each target must be set as positive and negative at least once. If the settings of any two or more base-classifiers are the same or opposite, one of them is retained and the rest are reset. For an $N$-target classification task, the number of RCVs required in this coding mode of MvM is uncertain but close to ${10\log }_2{N}$.
  • 5. Sparse random coding: When each base-classifier is trained, all targets are randomly set as positive, negative, or ignored with the probability of 0.25, 0.25 and 0.5, respectively. Similar to the dense random coding, each target must be set as positive and negative at least once. If the settings of any two or more base-classifiers are the same or opposite (excluding ignored targets), one of them is retained and the rest are reset. For an $N$-target classification task, the number of RCVs required in this coding mode of MvM is also uncertain but close to ${15\log }_2{N}$.
  • 6. Binary complete coding: All targets are set as positive or negative with all possible combinations except all positive or all negative. For each pair of combinations, which are opposite to each other, only one of them is retained for training the base-classifier. Thus, an $N$-target classification task always requires $2^{N-1}-1$ RCVs in this coding mode of MvM.
  • 7. Ternary complete coding: All targets are set as positive, negative, or ignored with all possible combinations except all positive, all negative or all ignored. For each pair of combinations, which are opposite to each other (excluding ignored targets), only one of them is retained for training the base-classifier. Thus, an $N$-target classification task always requires ${(3^{N}-2^{N+1}+1)}/2$ RCVs in this coding mode of MvM.
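For quick comparison, the RCV counts implied by the seven rules above can be computed with a short sketch (the random-coding counts are the approximate values quoted above, not exact):

```python
import math

def rcv_count(strategy, n):
    """Number of RCVs (i.e., filters) required by each splitting
    strategy for an n-target classification task."""
    if strategy == "OvO":
        return n * (n - 1) // 2
    if strategy == "OvR":
        return n
    if strategy == "ordinal":
        return n - 1
    if strategy == "dense_random":       # approximately 10*log2(n)
        return round(10 * math.log2(n))
    if strategy == "sparse_random":      # approximately 15*log2(n)
        return round(15 * math.log2(n))
    if strategy == "binary_complete":
        return 2 ** (n - 1) - 1
    if strategy == "ternary_complete":
        return (3 ** n - 2 ** (n + 1) + 1) // 2
    raise ValueError(f"unknown strategy: {strategy}")

# A 16-target classification task:
print(rcv_count("OvO", 16))               # 120
print(rcv_count("binary_complete", 16))   # 32767
print(rcv_count("ternary_complete", 16))  # 21457825
```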

3. Methodology

In the programmable hyperspectral imaging technique, the hyperspectral data cube of a real-world scene is projected into a low-dimensional subspace directly through optical imaging with specific spectral transmittances, which are typically coded by the RCVs of a classification model trained on hyperspectral data cubes of similar scenes containing similar targets, aiming to maximize the discrimination of different targets. For multi-target classification with the programmable hyperspectral imaging technique, a multi-target classification model is necessary to code the required spectral transmittances of the different filters, and the acquisition time increases with the required number of filters. A multi-target classification model is generally composed of a splitting strategy and certain base-classifiers, in which each base-classifier contains one RCV. The function of the splitting strategy is to split the multi-target classification model into multiple base-classifiers to be trained and to connect the trained base-classifiers to output the prediction results. Thus, the splitting strategy significantly affects the number of base-classifiers, i.e., the number of RCVs. As mentioned above, the number of filters required in programmable hyperspectral imaging is proportional to the number of RCVs in the multi-target classification model. In this study, a multi-threshold linear base-classifier template and the MTST splitting strategy are specially designed to reduce the number of RCVs, i.e., to reduce the number of filters in programmable hyperspectral imaging for fast multi-target classification. It should be noted that the base-classifier is replaceable as long as it conforms to the multi-threshold linear base-classifier template below.

3.1 Multi-threshold linear base-classifier template

The conventional single-threshold linear base-classifier template can be described as

$$f(\boldsymbol{x})=\left\{\begin{array}{ll} 0, & \boldsymbol{w}^{T} \boldsymbol{x} \geq \theta \\ 1, & \boldsymbol{w}^{T} \boldsymbol{x}<\theta \end{array}\right.,$$
where a target or cluster-target (each cluster-target consists of one or multiple targets) can be classified by the output of $f(\boldsymbol {x})$, column vector $\boldsymbol {x}$ is the input feature vector, column vector $\boldsymbol {w}$ is the RCV, and $\theta$ is a threshold. When a multi-target classification task is decomposed into multiple classification subtasks, multiple targets or cluster-targets are classified one by one with the employment of multiple base-classifiers with the single-threshold linear base-classifier template, indicating that the multiple RCVs are always required accordingly.

The multi-threshold linear base-classifier template, which can complete an $(N+1)$-target classification task with only one RCV and $N$ thresholds, can be described as

$$f(\boldsymbol{x})=\left\{\begin{array}{cc} 0, & \boldsymbol{w}^{T} \boldsymbol{x} \in\left[\theta_{1},+\infty\right) \\ 1, & \boldsymbol{w}^{T} \boldsymbol{x} \in\left[\theta_{2}, \theta_{1}\right) \\ \vdots & \vdots \\ N, & \boldsymbol{w}^{T} \boldsymbol{x} \in\left(-\infty, \theta_{N}\right) \end{array}\right.,$$
where $\theta _1,\theta _2,\ldots,\theta _N$ are $N$ thresholds. In contrast to the single-threshold linear base-classifier template, the multi-threshold linear base-classifier template can perform multi-target classification by only increasing the number of thresholds, but without increasing the number of RCVs.
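A minimal sketch of Eq. (2), with an arbitrary illustrative RCV and thresholds, shows how one RCV plus $N$ descending thresholds labels $N+1$ targets (Eq. (1) is recovered as the $N=1$ case):

```python
import numpy as np

def multi_threshold_classify(x, w, thresholds):
    """Multi-threshold linear base-classifier template of Eq. (2).
    thresholds must be in descending order: theta_1 > ... > theta_N.
    Returns a label in {0, 1, ..., N} for N thresholds."""
    s = float(w @ x)                 # projection w^T x
    for k, theta in enumerate(thresholds):
        if s >= theta:               # s falls in [theta, previous theta)
            return k
    return len(thresholds)           # s falls in (-inf, theta_N)

w = np.array([0.5, -0.2, 0.8])      # illustrative RCV
x = np.array([1.0, 2.0, 0.5])       # illustrative feature vector
print(multi_threshold_classify(x, w, [1.0, 0.0, -1.0]))   # 1
```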

The multi-threshold linear base-classifier template involves two types of parameters to be trained: the RCV $\boldsymbol {w}$ and the multiple thresholds $\theta _1,\theta _2,\ldots,\theta _N$. The feasible methods to train the RCV can mainly be categorized into label-dependent and label-independent methods. For those label-dependent methods that usually utilize the idea of regression, such as ordinary least squares (OLS) [20] and support vector regression (SVR) [21], the label values of different targets participate in the training and optimization of the RCV, thus the different settings of label values on targets can produce different optimal solutions of the RCV and further influence the classification performance. In contrast, label-independent methods usually utilize the idea of clustering, such as linear discriminant analysis (LDA) [22] and contrastive loss [23], to train the RCV, in which the label values only represent different targets other than being involved in training. Because the number of possible combinations of label values exponentially increases with the number of targets when training the RCV, the label-independent methods are more recommended to obtain the optimal solution of the RCV efficiently and consistently.

After obtaining the RCV, the multiple thresholds $\theta _1,\theta _2,\ldots,\theta _N$ are determined by maximizing Youden’s index [24]. More specifically, given an $(N+1)$-target labeled training set $\left \{X_{i}\right \}_{i=1}^{N+1}$, let $m_i$ be the number of samples of the $i$th target; then the mean projection value of the $i$th target can be calculated by

$$u_{i}=\frac{1}{m_{i}} \sum_{\boldsymbol{x} \in X_{i}} \boldsymbol{w}^{T} \boldsymbol{x}.$$

The $\left \{X_{i}\right \}_{i=1}^{N+1}$ are then sorted in descending order according to the value of $u_i$ to obtain $\left \{X_{i^{\prime }}\right \}_{i^{\prime }=1}^{N+1}$, where $u_{i^{\prime }}\geq u_{i^{\prime }+1}$. The numbers of positive and negative samples are likely to be unbalanced each time a threshold is determined. Therefore, both the true positive rate (TPR) and the false positive rate (FPR) are considered when determining the $N$ thresholds for classifying each target. The $N$ thresholds can be determined by

$$\theta_{d}=\arg \max _{\theta}\left[\mathrm{TPR}_{d}(\theta)-\mathrm{FPR}_{d}(\theta)\right],$$
where ${\rm TPR}_d\left (\theta \right )$ and ${\rm FPR}_d\left (\theta \right )$ are the TPR and FPR, respectively, when determining the $d$th threshold $\theta _d$, and can be calculated by
$$\operatorname{TPR}_{d}(\theta)=\frac{1}{\left|\left\{X_{i^{\prime}}\right\}_{i^{\prime}=1}^{d}\right|} \sum_{\boldsymbol{x} \in\left\{X_{i^{\prime}}\right\}_{i^{\prime}=1}^{d}} \mathrm{I}\left(\boldsymbol{w}^{T} \boldsymbol{x} \geq \theta\right),$$
$$\mathrm{FPR}_{d}(\theta)=\frac{1}{\left|\left\{X_{i^{\prime}}\right\}_{i^{\prime}=d+1}^{N+1}\right|} \sum_{\boldsymbol{x} \in\left\{X_{i^{\prime}}\right\}_{i^{\prime}=d+1}^{N+1}} \mathrm{I}\left(\boldsymbol{w}^{T} \boldsymbol{x} \geq \theta\right),$$
where $\mathrm {I}(\cdot )$ is an indicator function that outputs 1 or 0 in response to a true or false input. It should be noted that maximizing the difference between TPR and FPR in Eq. (4) is exactly equivalent to maximizing the difference between the true negative rate (TNR) and the false negative rate (FNR). The reason for choosing TPR and FPR rather than TNR and FNR is that they are more frequently used; e.g., TPR and FPR are generally used to derive the receiver operating characteristic (ROC) curve [25].
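The threshold search of Eqs. (3)–(6) can be sketched as follows; the Youden's index maximization here uses a simple exhaustive scan over the projected training samples as candidate thresholds (the paper does not prescribe a particular search routine), and the data are arbitrary illustrative numbers:

```python
import numpy as np

def fit_thresholds(class_samples, w):
    """Determine the N thresholds of a multi-threshold base-classifier
    by maximizing Youden's index, Eqs. (3)-(6).
    class_samples: list of (m_i, n_features) arrays, one per target."""
    proj = [X @ w for X in class_samples]             # w^T x per sample
    order = np.argsort([-p.mean() for p in proj])     # sort by descending u_i
    proj = [proj[i] for i in order]
    thetas = []
    for d in range(1, len(proj)):                     # d = 1 .. N
        pos = np.concatenate(proj[:d])                # targets 1' .. d
        neg = np.concatenate(proj[d:])                # targets d+1 .. N+1
        cands = np.concatenate([pos, neg])            # candidate thresholds
        youden = [(pos >= t).mean() - (neg >= t).mean() for t in cands]
        thetas.append(float(cands[int(np.argmax(youden))]))
    return thetas, order

# Three well-separated 1-D targets (illustrative data)
A = np.array([[10.0], [11.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[-10.0], [-9.0]])
thetas, order = fit_thresholds([A, B, C], np.array([1.0]))
print(thetas)    # [10.0, 0.0]
```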

3.2 MTST splitting strategy based on the multi-threshold linear base-classifier template

Since the RCV $\boldsymbol {w}$ in the multi-threshold linear base-classifier template can be interpreted as defining a projection line $l$ through the origin, $\boldsymbol {w}^{T} \boldsymbol {x}$ is the projection of the input feature vector $\boldsymbol {x}$ onto $l$. In this paper, the concept of “multi-threshold linear separability” is introduced in Definition 3.1. Definition 3.1 If there is no intersection between the distributions of the projection points from any pair of targets on the projection line $l$, these targets are defined to be “multi-threshold linearly separable” for this projection line $l$; otherwise, they are defined to be “multi-threshold linearly inseparable” for this projection line $l$.

In fact, “multi-threshold linear separability”, as shown in Fig. 2(a), is an ideal situation that rarely exists in real applications, because some projection points may be sparsely located at the margin of the distributions, as shown in Fig. 2(b); these are named margin-located projection points for simplicity in the following paragraphs. Although the margin-located projection points are assumed to contribute only a very small portion of all projection points, they usually occupy a large interval along the projection line and significantly disperse the distribution of each target’s projection points, which makes it very difficult to find optimal decision boundaries to classify different targets. By ignoring the margin-located projection points, the remaining projection points can fulfill “multi-threshold linear separability” with only a slight degradation in classification accuracy, as shown in Fig. 2(c). Therefore, to minimize such degradation, the concept of the “$p$-minimum interval” is put forward in Definition 3.2, and the pseudo-code to determine the “$p$-minimum interval” is shown as Algorithm 1.


Algorithm 1. Determination of the "p-minimum interval"


Fig. 2. The distributions of the projection points on the projection line: (a) A “multi-threshold linearly separable” situation without intersection between any pair of targets. (b) A “multi-threshold linearly inseparable” situation with intersections, which more commonly exists in real applications. (c) By ignoring the margin-located projection points of (b) via the “$p$-minimum interval” ($p$ = 0.9), the remaining projection points can be assumed to fulfill the “multi-threshold linearly separable” situation with a slight degradation in classification accuracy.


Definition 3.2 Let a target have $m$ projection points on the projection line $l$; then the “$p$-minimum interval” is defined as the interval containing at least $p\cdot m$ projection points with the minimum interval length, where $p\in (0,1]$ is a scale factor.
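Definition 3.2 can be realized with a sliding-window search over sorted projection points, sketched below (the paper's exact Algorithm 1 may differ in detail):

```python
import math
import numpy as np

def p_minimum_interval(points, p):
    """Shortest interval containing at least ceil(p*m) of the m
    projection points of one target (Definition 3.2)."""
    pts = np.sort(np.asarray(points, dtype=float))
    m = len(pts)
    k = math.ceil(p * m)                  # points the interval must cover
    # every window of k consecutive sorted points is a candidate;
    # the narrowest such window is the p-minimum interval
    widths = pts[k - 1:] - pts[: m - k + 1]
    i = int(np.argmin(widths))
    return pts[i], pts[i + k - 1]

# One margin-located outlier (100) is excluded when p = 0.9
print(p_minimum_interval([0, 1, 2, 3, 4, 5, 6, 7, 8, 100], 0.9))   # (0.0, 8.0)
```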

By introducing the concept of the “$p$-minimum interval”, Definition 3.1 can be further modified as Definition 3.1*. Definition 3.1* If the “$p$-minimum intervals” of any pair of targets do not intersect with each other on the projection line $l$, these targets are defined to be “$p$-minimum interval multi-threshold linearly separable” for this projection line $l$; otherwise, they are defined to be “$p$-minimum interval multi-threshold linearly inseparable” for this projection line $l$.

However, in most cases, especially when the number of different targets increases and the projection points of each target follow a complex distribution, the condition of “$p$-minimum interval multi-threshold linear separability” is also rarely fulfilled, so multiple base-classifiers are required to classify all targets step by step. To minimize the number of base-classifiers, the following steps are proposed to classify different targets gradually with a divide-and-conquer strategy. The schematic of the MTST-based programmable hyperspectral imaging process is shown in Fig. 3, and the pseudo-code to generate an MTST is shown as Algorithm 2.

  • 1. The RCV $\boldsymbol {w}$ of a base-classifier with the multi-threshold linear base-classifier template for classifying each target is calculated to obtain the projection line $l$. Then, multiple cluster-targets, each consisting of one or multiple targets, are found that are “$p$-minimum interval multi-threshold linearly separable” from each other according to Definition 3.1*.
  • 2. The targets in each cluster-target serve as the inputs of each branch node. Step 1 is repeated recursively until a node contains only one target or multiple targets that are “$p$-minimum interval multi-threshold linearly inseparable”.
  • 3. At each non-leaf node, the RCV $\boldsymbol {w}$ for classifying each cluster-target (rather than “each target” as in step 1) is recalculated to replace the original one. Then, the corresponding multiple thresholds are determined by Eq. (4).
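The steps above can be sketched as a recursive tree construction. The snippet below is a simplified illustration in which the RCV trainer is a stand-in (the leading singular vector of the centered class means, in the spirit of the label-independent methods recommended earlier), and the per-node threshold fitting of step 3 is omitted for brevity:

```python
import math
import numpy as np

def fit_rcv(class_samples):
    # Stand-in label-independent RCV: leading right singular vector
    # of the centered class means
    means = np.array([X.mean(axis=0) for X in class_samples])
    _, _, vt = np.linalg.svd(means - means.mean(axis=0), full_matrices=False)
    return vt[0]

def p_min_interval(points, p):
    # "p-minimum interval" of Definition 3.2 (sliding-window search)
    pts = np.sort(points)
    k = max(1, math.ceil(p * len(pts)))
    widths = pts[k - 1:] - pts[: len(pts) - k + 1]
    i = int(np.argmin(widths))
    return pts[i], pts[i + k - 1]

def grow_mtst(class_samples, labels, p):
    """Steps 1-2: split the node's targets into cluster-targets whose
    p-minimum intervals do not intersect, then recurse per cluster."""
    if len(labels) == 1:
        return {"leaf": labels[0]}
    w = fit_rcv(class_samples)
    ivals = [p_min_interval(X @ w, p) for X in class_samples]
    # merge targets with intersecting intervals into cluster-targets
    clusters, hi = [], None
    for i in sorted(range(len(ivals)), key=lambda i: ivals[i][0]):
        lo_i, hi_i = ivals[i]
        if clusters and lo_i <= hi:
            clusters[-1].append(i)
            hi = max(hi, hi_i)
        else:
            clusters.append([i])
            hi = hi_i
    if len(clusters) == 1:        # inseparable at this node: stop
        return {"leaf": labels}
    return {"rcv": w, "children": [
        grow_mtst([class_samples[i] for i in c], [labels[i] for i in c], p)
        for c in clusters]}
```

For three well-separated targets, this yields a root node with three leaf children, i.e., a single RCV with two thresholds would suffice for the whole task.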

Algorithm 2. Generation of a multi-threshold splitting tree


Fig. 3. The schematic of multi-target classification with the programmable hyperspectral imaging technique based on the MTST splitting strategy: in the first stage, the hyperspectral information about targets is compressed into multiple optical images through multiple specific spectral transmittances; in the second stage, the area of each cluster-target in the different nodes of the MTST is determined by applying one or more thresholds to the gray values in each optical image; in the third stage, the final multi-target classification result is obtained by processing the judgment results of the different nodes of the MTST, where each node processes only the colored area of the corresponding subgraph.


From the above, it can be noted that only the scale factor $p$ needs to be adjusted in the MTST splitting strategy, which makes it an easy-to-use method. Changing the scale factor $p$ adjusts the strictness of the “$p$-minimum interval multi-threshold linear separability” judgment, which in turn influences the classification accuracy and the number of RCVs required in the generated MTST. More specifically, when $p$ is small, the judgment of “$p$-minimum interval multi-threshold linear separability” is relatively loose, so the depth of the MTST (i.e., the maximum of the path lengths between all nodes and the root node) tends to decrease, which reduces the number of RCVs required; however, if $p$ is too small, targets that are seriously “multi-threshold linearly inseparable” may be judged to be “$p$-minimum interval multi-threshold linearly separable”, which may degrade the classification accuracy. In contrast, when $p$ is large, the judgment of “$p$-minimum interval multi-threshold linear separability” is relatively strict, so the depth of the MTST tends to increase, which increases the number of RCVs required; however, if $p$ is too large, the recursion stop condition (i.e., a node contains only targets that are “$p$-minimum interval multi-threshold linearly inseparable”) may be met earlier than expected, so the classification accuracy can also be degraded. Therefore, the scale factor $p$ should be adjusted appropriately according to the distribution characteristics of the input data, or by using the grid search method [26], to achieve optimal performance.

In general, the MTST first employs a set of parallel decision boundaries to classify all targets into several cluster-targets, then recursively sub-classifies the targets in each cluster-target with another set of parallel decision boundaries until all targets have been completely classified. Compared to conventional splitting strategies, the superior performance of the MTST can mainly be attributed to the following three aspects:
  • 1. The MTST employs a multi-threshold strategy instead of the conventional single-threshold strategy, so a relatively simple multi-classification task can be completed directly by one base-classifier with multiple thresholds rather than by multiple single-threshold base-classifiers; thus the total number of base-classifiers and RCVs required for multi-target classification can be significantly reduced.
  • 2. The MTST introduces the “divide-and-conquer” splitting idea through its tree structure instead of the conventional “one-size-fits-all” splitting idea. This allows the upper base-classifiers (i.e., those closer to the root node) to share their classification outcomes with the lower base-classifiers (i.e., those farther from the root node), so that targets already classified by the upper base-classifiers are not repeatedly classified by the lower base-classifiers; thus the number of base-classifiers can be saved.
  • 3. The MTST introduces a scale factor $p$ to adjust the compromise between the overall classification accuracy and the number of base-classifiers, so that an acceptable overall classification accuracy can be achieved by an optimal tree structure with the least number of base-classifiers.

4. Experiments

4.1 Data set description

Six publicly available hyperspectral data sets [27] were used in this study. The detailed information about those hyperspectral data sets is described as follows.

  • 1. KSC Data Set: This data set was collected by an AVIRIS sensor over Kennedy Space Center (KSC) in Florida with a spatial resolution of 18-meter pixels, consisting of 512 ${\times }$ 614 pixels and 224 spectral bands in the wavelength range of 400–2500 nm. After removing the low SNR and water absorption spectral bands, a total of 176 spectral bands were used for further analysis. In this data set, 13 different targets were defined and labeled.
  • 2. Pavia Centre Data Set: This data set was collected by a ROSIS sensor over Pavia in northern Italy with a spatial resolution of 1.3-meter pixels, consisting of 1096 ${\times }$ 1096 pixels and 102 spectral bands in the wavelength range of 430–860 nm. Some pixels containing no information in this data set were discarded, while the remaining pixels were mainly categorized into 9 different targets.
  • 3. Pavia University Data Set: This data set was also collected by a ROSIS sensor over Pavia in northern Italy with a spatial resolution of 1.3-meter pixels, consisting of 610 ${\times }$ 610 pixels and 103 spectral bands in the wavelength range of 430–860 nm. In this data set, most pixels were categorized into 9 different targets, but some pixels containing no information were discarded.
  • 4. Salinas Data Set: This data set was collected by an AVIRIS sensor over Salinas Valley in California with a spatial resolution of 3.7-meter pixels, consisting of 512 ${\times }$ 217 pixels and 224 spectral bands in the wavelength range of 400–2500 nm. Twenty spectral bands covering the range of water absorption were removed. In this data set, 16 different targets were defined and labeled.
  • 5. Indian Pines Data Set: This data set was collected by an AVIRIS sensor over the Indian Pines test site in northwestern Indiana with a spatial resolution of 20-meter pixels, consisting of 145 ${\times }$ 145 pixels and 224 spectral bands in the wavelength range of 400–2500 nm. Twenty spectral bands covering the range of water absorption were removed. In this data set, 16 different targets were defined and labeled.
  • 6. Botswana Data Set: This data set was collected by the NASA EO-1 satellite over the Okavango Delta in Botswana with a spatial resolution of 30-meter pixels, consisting of 1476 ${\times }$ 256 pixels and 242 spectral bands in the wavelength range of 400–2500 nm. After removing the spectral bands related to water absorption, a total of 145 spectral bands were used for further analysis. In this data set, 14 different targets were defined and labeled.

4.2 Comparative experimental setups

Three conventional splitting strategies described in Section 2, i.e., one-vs.-one (OvO), one-vs.-rest (OvR), and many-vs.-many (MvM), were considered for comparison with the proposed MTST splitting strategy. The numbers of RCVs required by OvO, binary complete coding, and ternary complete coding grow as $\mathcal {O}(N^{2})$, $\mathcal {O}(2^{N})$ and $\mathcal {O}(3^{N})$, respectively. If these three splitting strategies were employed for multi-target classification based on the programmable hyperspectral imaging technique, a large number of filters would be necessary, possibly even exceeding the number of detection channels of the conventional hyperspectral imaging technique, especially when many targets must be classified. For example, the DAIS sensor-based conventional hyperspectral imaging technique requires only 79 detection channels for a 16-target classification task [28]; to complete the same task with the programmable hyperspectral imaging technique, the OvO splitting strategy requires 120 detection channels, binary complete coding requires 32,767, and ternary complete coding requires as many as 21,457,825, so the imaging speed of the programmable hyperspectral imaging technique would be much slower than that of the conventional hyperspectral imaging technique. Therefore, only the other 4 splitting strategies with relatively low complexities, i.e., OvR, ordinal coding, dense random coding, and sparse random coding, are finally used for comparison.
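The quoted RCV counts follow from the standard closed-form formulas for these splitting strategies; a short Python sanity check (an illustrative sketch, not the paper's code) reproduces the 16-target numbers:

```python
# Number of RCVs (i.e., filters) required by the high-complexity splitting
# strategies for an N-target classification task.
from math import comb

def rcv_ovo(n):      # one-vs.-one: one RCV per target pair, N(N-1)/2
    return comb(n, 2)

def rcv_binary(n):   # binary complete coding: 2^(N-1) - 1 distinct dichotomies
    return 2 ** (n - 1) - 1

def rcv_ternary(n):  # ternary complete coding: (3^N - 2^(N+1) + 1) / 2
    return (3 ** n - 2 ** (n + 1) + 1) // 2

n = 16  # the 16-target example in the text
print(rcv_ovo(n), rcv_binary(n), rcv_ternary(n))  # 120 32767 21457825
```

The exponential formulas make clear why binary and ternary complete coding become impractical as spectral transmittances well before N reaches 16.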

A dimensionality reduction method, i.e., principal component analysis (PCA) [29] or the partial least squares method (PLS) [30], combined with linear discriminant analysis (LDA), was employed to derive the base-classifiers because of the inherent label-independent characteristic mentioned above. It should be noted that PCA or PLS can also be coded as a spectral transmittance and performed during the physical process of optical imaging in programmable hyperspectral imaging, because they involve only linear operations [29,30]; moreover, no matter how many components of PCA or PLS are used, these linear operations can be merged with the linear operations of LDA to code a single spectral transmittance. The detailed derivation of coding PCA-LDA or PLS-LDA as a single spectral transmittance can be found in Section 2 of Supplement 1. It is well known that the physical process of optical imaging can be formulated numerically as the dot product of the detected spectrum at each pixel and the spectral transmittance used for imaging. In the programmable hyperspectral imaging technique, by treating the detected spectrum as an input feature vector $\boldsymbol {x}=\left (x_1,x_2,\ldots,x_n\right )$, where $x_\lambda$ is the intensity of the $\lambda$th spectral band, if the RCV is coded as a spectral transmittance, the corresponding numerical computation can be replaced by the physical process of acquiring an optical image through a filter with the coded spectral transmittance; thus, fast or even real-time target classification can be achieved by judging the gray values at each pixel of the acquired optical image. Because the number of RCVs equals the number of filters used in programmable hyperspectral imaging, the number of RCVs can be used as an alternative criterion to evaluate its imaging speed.
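The collapse of PCA (or PLS) and LDA into one RCV can be illustrated numerically. The matrices below are random stand-ins for fitted models (an assumption for illustration, not the paper's derivation); the point is only that two chained linear maps equal one dot product:

```python
# Two chained linear operators (dimensionality reduction, then LDA projection)
# merge into a single regression coefficient vector (RCV), i.e., one filter.
import numpy as np

rng = np.random.default_rng(0)
n_bands, n_comp = 100, 5
P = rng.normal(size=(n_bands, n_comp))   # stand-in loading matrix (bands -> components)
w_lda = rng.normal(size=n_comp)          # stand-in LDA projection (components -> score)

rcv = P @ w_lda                          # merged RCV, shape (n_bands,)

x = rng.normal(size=n_bands)             # a detected spectrum at one pixel
score_two_step = (x @ P) @ w_lda         # numerical post-processing
score_one_step = x @ rcv                 # one dot product: "imaging through a filter"
assert np.allclose(score_two_step, score_one_step)
```

This is exactly why the number of components used by PCA or PLS does not change the number of filters: the merged operator is always a single vector over the spectral bands.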

To evaluate the proposed splitting strategy in an unbiased manner, stratified 10-fold cross-validation [31] is employed, in which the scale factor $p$ and the number of PCA or PLS components are determined through the grid search method [26] until the optimal average overall accuracy on the cross-validation sets is achieved. More specifically, values of the scale factor $p$ from 0.1% to 100% with a step size of 0.1% for the MTST splitting strategy, and numbers of components from 1 to the maximum for all methods, are tested through stratified 10-fold cross-validation. The parameters that achieve the optimal average overall accuracy on the cross-validation sets are used to establish the models and subsequently to code the spectral transmittances. The imaging speeds of the proposed MTST splitting strategy and the other splitting strategies are compared by assessing the number of RCVs, i.e., the number of spectral transmittances used for programmable hyperspectral imaging. For the splitting strategies requiring an uncertain number of RCVs (i.e., dense random coding and sparse random coding), the maximum number of RCVs over all folds is used. The multi-target classification performance with MTST and the other conventional splitting strategies is quantitatively evaluated based on overall accuracies. In addition, other criteria [32], including the accuracies, precisions, false-alarm rates and miss-alarm rates of each target as well as their averages, are employed to supplement the evaluation. All code and experiments are written and run in MATLAB R2019b and R2020b on a computer with an i9-10980XE CPU (18 physical cores) and 64 GB RAM.
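The tuning protocol above can be sketched in a few lines. The snippet is a pure-NumPy illustration (the paper uses MATLAB): the nearest-class-mean classifier and the data are hypothetical stand-ins for a PCA-LDA model, and the grid here covers only a toy "number of components" parameter:

```python
# Sketch of stratified 10-fold cross-validation combined with a grid search,
# keeping the parameter value with the best average overall accuracy.
import numpy as np

def stratified_folds(y, k=10, seed=0):
    """Assign each sample a fold index so every class is spread over all folds."""
    rng = np.random.default_rng(seed)
    folds = np.empty(len(y), dtype=int)
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        folds[idx] = np.arange(len(idx)) % k
    return folds

def grid_search(X, y, grid, fit_predict, k=10):
    """fit_predict(train_X, train_y, test_X, param) -> predicted labels."""
    folds = stratified_folds(y, k)
    best_param, best_acc = None, -1.0
    for param in grid:
        accs = []
        for f in range(k):
            tr, te = folds != f, folds == f
            accs.append(np.mean(fit_predict(X[tr], y[tr], X[te], param) == y[te]))
        if np.mean(accs) > best_acc:
            best_param, best_acc = param, float(np.mean(accs))
    return best_param, best_acc

# Toy base-classifier: nearest class mean on the first `param` features,
# standing in for a PCA-LDA model with `param` components.
def nearest_mean(train_X, train_y, test_X, param):
    classes = np.unique(train_y)
    means = np.stack([train_X[train_y == c, :param].mean(0) for c in classes])
    d = ((test_X[:, None, :param] - means[None]) ** 2).sum(-1)
    return classes[np.argmin(d, axis=1)]

rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(c, 1.0, size=(60, 8)) for c in range(3)])
y = np.repeat(np.arange(3), 60)
best_param, best_acc = grid_search(X, y, range(1, 9), nearest_mean)
```

Stratification matters here because the hyperspectral data sets have strongly imbalanced target counts; without it, a fold could miss a rare target entirely.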

4.3 Results and discussions

The comparisons of the number of RCVs, overall accuracies, and related parameters between the proposed MTST splitting strategy and the conventional splitting strategies are shown in Table 1. In addition, more detailed classification results with the MTST splitting strategy and the conventional splitting strategies, including the accuracies, precisions, false-alarm rates and miss-alarm rates of each target as well as their averages, are shown in Tables S1–S12 in Section 3 of Supplement 1.


Table 1. The number of regression coefficient vectors (RCVs), overall accuracies (OA), the number of components (Comps) of PCA or PLS, and the scale factor $p$ of MTST for the six hyperspectral data sets by using different splitting strategies

Comparing the classification performance of the MTST splitting strategy with PCA-LDA base-classifiers against the conventional splitting strategies with PCA-LDA or PLS-LDA base-classifiers, it can be found that the conventional splitting strategies usually pay a very high cost (i.e., a significant increase in the number of RCVs) for an increase in classification performance, while removing even one RCV can lead to a significant decline in classification performance, as in the OvR and ordinal coding splitting strategies. The proposed MTST splitting strategy, however, can remove a large number of RCVs while maintaining classification performance similar to the best achieved by the conventional splitting strategies. Specifically, compared with the ordinal coding and dense random coding splitting strategies, the proposed MTST splitting strategy achieves the best classification performance on all data sets with the fewest RCVs. Compared with the OvR splitting strategy, the MTST splitting strategy achieves the best classification performance in most cases with the fewest RCVs, with only a slight degradation in overall accuracy for the Indian Pines Data Set and in average precision for the Indian Pines and Pavia University Data Sets. The classification performances of the MTST and sparse random coding splitting strategies are similar on the whole, but the MTST splitting strategy requires far fewer RCVs. The results of the Pavia Centre Data Set, including the coded spectral transmittances and the corresponding gray values of different targets on simulated optical images, are chosen as representative results to demonstrate the capability of the proposed MTST splitting strategy for fast multi-target classification with the programmable hyperspectral imaging technique, as shown in Fig. 4.


Fig. 4. The target classification results of the Pavia Centre Data Set by using the MTST splitting strategy with base-classifiers of PCA-LDA and 7 RCVs. (a1)–(a7) The spectral transmittances coded from the RCVs in nodes 1–7. (b1)–(b7) The scatter plots of gray values of different targets on simulated optical images in nodes 1–7. (c) The structure of the employed MTST.


In summary, while maintaining similar classification performance, the proposed MTST splitting strategy can reduce the number of RCVs by at least 25%, 33%, 75% and 80% compared to OvR, ordinal coding, dense random coding, and sparse random coding, respectively. In other words, the number of filters can be reduced by the same percentages, which should be proportional to the improvement in the speed of programmable hyperspectral imaging for multi-target classification. Moreover, the proposed MTST splitting strategy allows users to adjust the required number of RCVs arbitrarily according to the application environment and the acceptable error range, which is impossible for the conventional splitting strategies.

The training and prediction times of the proposed MTST splitting strategy and the other conventional splitting strategies are also compared, with the dimensionality reduction step removed to ensure fair comparisons. More specifically, values of the scale factor $p$ from 0.1% to 100% with a step size of 0.1% were tested, the corresponding computational time costs were recorded, and the longest was treated as the final computational time listed in Table 2. It can be found that the training and prediction speeds of the proposed MTST splitting strategy are also faster than those of the other conventional splitting strategies. This advantage accelerates parameter adjustment and performance testing, so the time cost to obtain the optimal spectral transmittance remains affordable, commonly from several minutes to several hours. To further demonstrate the influence of $p$ on training and prediction time, curves of computational time for different values of the scale factor $p$ are plotted in Section 4 of Supplement 1.


Table 2. The training and prediction time for the six hyperspectral data sets by using different splitting strategies with LDA as the base-classifier

The potential limitations of the programmable hyperspectral imaging technique are the requirement of prior knowledge when designing spectral transmittances and the need to switch spectral transmittances when imaging totally different scenes. In practical applications, the hyperspectral data cube of a certain scene can first be acquired to code the spectral transmittance. Once a spectral transmittance has been designed for a certain scene, it can also be used to image similar scenes consisting of similar targets, in which unexpected targets rarely exist. In fact, similar assumptions have already been made in many industrial and clinical applications, such as crop identification [33,34], defect detection [35,36], pharmaceutical quality control [4,37] and tumor margin assessment [38,39], where the goal is to quickly find subtle spectral differences among groups rather than to find a new species. Besides, the "independent and identically distributed (i.i.d.)" relation between the training set and test set is a general assumption of most machine learning algorithms (e.g., deep neural networks) [40] and has been widely adopted in practical applications. Based on this general assumption, we can infer that the classification accuracy should drop only slightly in most cases when the training and test scenes are similar and meet the i.i.d. assumption. In this manner, the spectral transmittance can be designed beforehand and needs to be optimized only once, so the time cost of designing the spectral transmittance is negligible in practical applications of the programmable hyperspectral imaging technique. In addition, although different spectral transmittances may be needed in different applications, thanks to the ultra-fast binary pattern rate of digital micromirror devices (DMDs), arbitrary spectral transmittances can be switched rapidly without adjusting the hardware of the programmable hyperspectral imaging system, as proved in Section 1.1 of Supplement 1.

5. Conclusion

In this paper, the proposed MTST splitting strategy is evaluated on six publicly available hyperspectral data sets through numerical simulation. The results show that, while maintaining classification performance similar to the conventional splitting strategies, it reduces the number of filters used in programmable hyperspectral imaging for multi-target classification by 25% to 80%. Such a splitting strategy is therefore expected to enable fast or even real-time multi-target classification with programmable hyperspectral imaging at high classification performance. The proposed MTST splitting strategy can also be adapted to other base-classifiers that conform to the multi-threshold linear base-classifier template, providing a general framework for filter design in programmable hyperspectral imaging for fast multi-target classification.

Funding

Fundamental Research Funds for the Central Universities (N2119002); National Natural Science Foundation of China (61605025).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are available in Ref. [27].

Supplemental document

See Supplement 1 for supporting content.

References

1. J. Lu, Y. Ren, W. Xu, X. Cui, L. Xie, S. Chen, J. Guo, and Y. Yao, “A Programmable Optical Filter With Arbitrary Transmittance for Fast Spectroscopic Imaging and Spectral Data Post-Processing,” IEEE Access 7, 119294–119308 (2019). [CrossRef]  

2. D. Cebeci, B. R. Mankani, and D. Ben-Amotz, “Recent Trends in Compressive Raman Spectroscopy Using DMD-Based Binary Detection,” J. Imaging 5(1), 1–15 (2018). [CrossRef]  

3. B. Sturm, F. Soldevila, E. Tajahuerce, S. Gigan, H. Rigneault, and H. B. D. Aguiar, “High-sensitivity high-speed compressive spectrometer for Raman imaging,” ACS Photonics 6(6), 1409–1415 (2019). [CrossRef]  

4. D. Cebeci-Maltas, R. McCann, P. Wang, R. Pinal, R. Romanach, and D. Ben-Amotz, “Pharmaceutical Application of Fast Raman Hyperspectral Imaging with Compressive Detection Strategy,” J Pharm. Innov. 9(1), 1–4 (2014). [CrossRef]  

5. J. E. V. Jr., A. Dong, R. W. Boyd, and Z. Shi, “Multiple-output multivariate optical computing for spectrum recognition,” Opt. Express 22(21), 25005–25014 (2014). [CrossRef]  

6. T. C. Corcoran, “Compressive Detection of Highly Overlapped Spectra Using Walsh–Hadamard-Based Filter Functions,” Appl. Spectrosc. 72(3), 392–403 (2018). [CrossRef]  

7. N. T. Quyen, E. D. Silva, N. Q. Dao, and M. D. Jouan, “New Raman Spectrometer Using a Digital Micromirror Device and a Photomultiplier Tube Detector for Rapid On-Line Industrial Analysis. Part I: Description of the Prototype and Preliminary Results,” Appl. Spectrosc. 62(3), 273–278 (2008). [CrossRef]  

8. B. M. Davis, A. J. Hemphill, D. C. Maltas, M. A. Zipper, P. Wang, and D. Ben-Amotz, “Multivariate hyperspectral Raman imaging using compressive detection,” Anal. Chem. 83(13), 5086–5092 (2011). [CrossRef]  

9. F. Zhang, Z. Zhang, L. Wei, W. Xu, X. Cui, and S. Chen, “Coding Convolutional Neural Networks as Spectral Transmittance for Intelligent Hyperspectral Remote Sensing in a Snapshot,” IEEE Geosci. Remote. Sens. Lett. 18(9), 1635–1639 (2021). [CrossRef]  

10. D. S. Wilcox, G. T. Buzzard, B. J. Lucier, P. Wang, and D. Ben-Amotz, “Photon level chemical classification using digital compressive detection,” Anal. Chim. Acta 755, 17–27 (2012). [CrossRef]  

11. D. S. Wilcox, G. T. Buzzard, B. J. Lucier, O. G. Rehrauer, P. Wang, and D. Ben-Amotz, “Digital compressive chemical quantitation and hyperspectral imaging,” Analyst 138(17), 4982–4990 (2013). [CrossRef]  

12. O. G. Rehrauer, V. C. Dinh, B. R. Mankani, G. T. Buzzard, B. J. Lucier, and D. Ben-Amotz, “Binary Complementary Filters for Compressive Raman Spectroscopy,” Appl. Spectrosc. 72(1), 69–78 (2017). [CrossRef]  

13. C. Scotté, H. B. D. Aguiar, D. Marguet, E. M. Green, P. Bouzy, S. Vergnole, C. P. Winlove, N. Stone, and H. Rigneault, “Assessment of compressive Raman versus hyperspectral Raman for microcalcification chemical imaging,” Anal. Chem. 90(12), 7197–7203 (2018). [CrossRef]  

14. E. L. Allwein, R. E. Schapire, and Y. Singer, “Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers,” J. Mach. Learn. Res. 1(2), 113–141 (2001).

15. J. Fürnkranz, “Round Robin Classification,” J. Mach. Learn. Res. 2(4), 721–747 (2002).

16. Z. Zhou, Machine Learning (Tsinghua University Press, 2016), 60–63.

17. T. G. Dietterich and G. Bakiri, “Solving Multiclass Learning Problems via Error-Correcting Output Codes,” J. Artif. Intell. Res. 2(2), 263–286 (1995). [CrossRef]  

18. S. Escalera, O. Pujol, and P. Radeva, “Separability of ternary codes for sparse designs of error-correcting output codes,” Pattern Recognit. Lett. 30(3), 285–297 (2009). [CrossRef]  

19. S. Escalera, O. Pujol, and P. Radeva, “On the Decoding Process in Ternary Error-Correcting Output Codes,” IEEE Transactions on Pattern Analysis Mach. Intell. 32(1), 120–134 (2010). [CrossRef]  

20. G. D. Hutcheson, The Multivariate Social Scientist (SAGE Publications, 1999).

21. J. A. Smola and B. Schölkopf, “A tutorial on support vector regression,” Stat. Comput. 14(3), 199–222 (2004). [CrossRef]  

22. R. A. Fisher, “The use of multiple measurements in taxonomic problems,” Annals Eugen. 7(2), 179–188 (1936). [CrossRef]  

23. R. Hadsell, S. Chopra, and Y. LeCun, “Dimensionality Reduction by Learning an Invariant Mapping,” in Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR) (2006).

24. J. Hilden and P. Glasziou, “Regret Graphs, Diagnostic Uncertainty and Youden’s Index,” Stat. Medicine 15(10), 969–986 (1996). [CrossRef]  

25. L. Gonçalves, A. Subtil, M. R. Oliveira, and P. Z. Bermudez, “ROC curve estimation: An overview,” REVSTAT-Statistical J. 12(1), 1–20 (2014).

26. R. Liu, E. Liu, J. Yang, M. Li, and F. Wang, “Optimizing the Hyper-parameters for SVM by Combining Evolution Strategies with a Grid Search,” Intell. Control. Autom. 344, 712–721 (2006). [CrossRef]  

27. M. Graña, M. A. Veganzons, and B. Ayerdi, “Hyperspectral Remote Sensing Scenes,” Grupo de Inteligencia Computacional (2021) [retrieved 9 Jan. 2021], http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes.

28. R. Richter, “Atmospheric Correction of DAIS Hyperspectral Image Data,” Comput. Geosci. 22(7), 785–793 (1996). [CrossRef]  

29. H. Hotelling, “Analysis of a complex of statistical variables into principal components,” J. Educ. Psychol. 24(6), 417–441 (1933). [CrossRef]  

30. S. D. Jong, “SIMPLS: an alternative approach to partial least squares regression,” Chemom. Intell. Lab. Syst. 18(3), 251–263 (1993). [CrossRef]  

31. R. Kohavi, “A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection,” in Proceedings of the 14th International Joint Conference on Artificial Intelligence (1995).

32. C.-I. Chang, “Statistical Detection Theory Approach to Hyperspectral Image Classification,” IEEE Transactions on Geosci. Remote. Sens. 57(4), 2057–2074 (2019). [CrossRef]  

33. G. Camps-Valls, L. Gómez-Chova, J. Calpe-Maravilla, E. Soria-Olivas, J. D. Martín-Guerrero, and J. Moreno, “Support Vector Machines for Crop Classification Using Hyperspectral Data,” in Proceedings of the Iberian Conference on Pattern Recognition and Image Analysis (2003).

34. G. Hong and H. T. A. El-Hamid, “Hyperspectral imaging using multivariate analysis for simulation and prediction of agricultural crops in Ningxia, China,” Comput. Electron. Agric. 172, 105355 (2020). [CrossRef]  

35. D. P. Ariana and R. Lu, “Hyperspectral waveband selection for internal defect detection of pickling cucumbers and whole pickles,” Comput. Electron. Agric. 74(1), 137–144 (2010). [CrossRef]  

36. B. Zhang, J. Li, S. Fan, W. Huang, C. Zhao, C. Liu, and D. Huang, “Hyperspectral imaging combined with multivariate analysis and band math for detection of common defects on peaches (Prunus persica),” Comput. Electron. Agric. 114, 14–24 (2015). [CrossRef]  

37. I. Vermaak, A. Viljoen, and S. W. Lindström, “Hyperspectral imaging in the quality control of herbal medicines – The case of neurotoxic Japanese star anise,” J. Pharm. Biomed. Anal. 75, 207–213 (2012). [CrossRef]  

38. B. Fei, G. Lu, X. Wang, H. Zhang, J. V. Little, M. R. Patel, C. C. Griffith, M. W. El-Diery, and A. Y. Chen, “Label-free reflectance hyperspectral imaging for tumor margin assessment: a pilot study on surgical specimens of cancer patients,” J. Biomed. Opt. 22(08), 1 (2017). [CrossRef]  

39. E. Kho, L. L. D. Boer, K. K. V. D. Vijver, F. V. Duijnhoven, M.-J. T. Vrancken Peeters, H. J. Sterenborg, and T. J. Ruers, “Hyperspectral imaging for resection margin assessment during cancer surgery,” Clin. Cancer Res. 25(12), 3572–3580 (2019). [CrossRef]  

40. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (The MIT Press, 2016), 111.




Figures (4)

Fig. 1. The schematic of programmable hyperspectral imaging technique.
Fig. 2. The distributions of the projection points on the projection line: (a) A “multi-threshold linearly separable” situation without intersection between any pair of targets. (b) A “multi-threshold linearly inseparable” situation with intersections, which more commonly exists in real applications. (c) By ignoring the margin-located projection points of (b) through the “$p$-minimum interval” ($p$ = 0.9), the remaining projection points can be assumed to fulfill the “multi-threshold linearly separable” situation with a slight degradation in classification accuracy.
Fig. 3. The schematic of multi-target classification with programmable hyperspectral imaging technique based on the MTST splitting strategy: in the first stage, the hyperspectral information about targets is compressed into multiple optical images through multiple specific spectral transmittances; in the second stage, the area of each cluster-target in different nodes of the MTST is judged by applying one or more thresholds to the gray values in each optical image; in the third stage, the final multi-target classification result is obtained by processing the judgment results of different nodes of the MTST, where each node only processes the colorful area of the corresponding subgraph.
Fig. 4. The target classification results of the Pavia Centre Data Set by using the MTST splitting strategy with base-classifiers of PCA-LDA and 7 RCVs. (a1)–(a7) The spectral transmittances coded from the RCVs in nodes 1–7. (b1)–(b7) The scatter plots of gray values of different targets on simulated optical images in nodes 1–7. (c) The structure of the employed MTST.

Tables (4)

Algorithm 1. Determination of the "$p$-minimum interval"
Algorithm 2. Generation of a multi-threshold splitting tree
Table 1. The number of regression coefficient vectors (RCVs), overall accuracies (OA), the number of components (Comps) of PCA or PLS, and the scale factor $p$ of MTST for the six hyperspectral data sets by using different splitting strategies
Table 2. The training and prediction time for the six hyperspectral data sets by using different splitting strategies with LDA as the base-classifier

Equations (6)

$$f(\boldsymbol{x})=\begin{cases}0, & \boldsymbol{w}^{T}\boldsymbol{x}\geq\theta\\ 1, & \boldsymbol{w}^{T}\boldsymbol{x}<\theta\end{cases}$$
$$f(\boldsymbol{x})=\begin{cases}0, & \boldsymbol{w}^{T}\boldsymbol{x}\in[\theta_{1},+\infty)\\ 1, & \boldsymbol{w}^{T}\boldsymbol{x}\in[\theta_{2},\theta_{1})\\ \vdots & \\ N, & \boldsymbol{w}^{T}\boldsymbol{x}\in(-\infty,\theta_{N})\end{cases}$$
$$u_{i}=\frac{1}{|X_{i}|}\sum_{\boldsymbol{x}\in X_{i}}\boldsymbol{w}^{T}\boldsymbol{x}.$$
$$\theta_{d}=\arg\max_{\theta}\left[\mathrm{TPR}_{d}(\theta)-\mathrm{FPR}_{d}(\theta)\right],$$
$$\mathrm{TPR}_{d}(\theta)=\frac{1}{\left|\{X_{i}\}_{i=1}^{d}\right|}\sum_{\boldsymbol{x}\in\{X_{i}\}_{i=1}^{d}}I\left(\boldsymbol{w}^{T}\boldsymbol{x}\geq\theta\right),$$
$$\mathrm{FPR}_{d}(\theta)=\frac{1}{\left|\{X_{i}\}_{i=d+1}^{N+1}\right|}\sum_{\boldsymbol{x}\in\{X_{i}\}_{i=d+1}^{N+1}}I\left(\boldsymbol{w}^{T}\boldsymbol{x}\geq\theta\right),$$