## Abstract

We present an approach to the problems of weak plume detection and sub-pixel target detection in hyperspectral imagery that operates in a two-dimensional space. In this space, one axis is a matched-filter projection of the data and the other axis is the magnitude of the residual after matched-filter subtraction. Although it is only two-dimensional, this space is rich enough to include several well-known signal detection algorithms, including the adaptive matched filter, the adaptive coherence estimator, and the finite-target matched filter. Because this space is only two-dimensional, adaptive machine learning methods can produce new plume detectors without being stymied by the curse of dimensionality. We investigate, in particular, the utility of the support vector machine for learning boundaries in this matched-filter-residual space, and compare the performance of the resulting nonlinearly adaptive detector to well-known alternatives.

©2009 Optical Society of America

## 1. Introduction

Hyperspectral remote sensing imagers, which combine a large number of spectral bands (>100) with a large number of pixels (>10^{5}), have the potential for accurate and comprehensive scene characterization [1, 2]. The spectral sensitivity (from many spectral bands and high signal-to-noise ratio) enables even very similar materials to be distinguished. Hyperspectral imagery has been used to detect chemical spills and airborne plumes in the infrared wavelength region [3, 4], where chemicals have distinct signatures even when observed only in atmospheric transmission windows (3–5 *µ*m, 8–14 *µ*m). The exquisite spectral resolution also enables the detection of solid targets smaller than the size of a pixel [5, 6]. Exploiting the high-dimensional data can be difficult, however, because much of the intuition about clustering and detection algorithms that is helpful in multispectral data does not apply, and there is often high correlation among spectral bands.

In this paper, we address the challenge of high dimensional data analysis by mapping hyperspectral data to a two-dimensional space in which one axis is the projection of the data on the matched-filter (MF) direction and the other axis is the magnitude of the residual (R) orthogonal to the MF. In this matched-filter-residual (MFR) plane, the adaptive matched filter (AMF) [7, 8] and the adaptive coherence estimator (ACE) [9] both have decision regions defined by simple straight lines. (ACE is similar to the t statistic used in regression [10]; see the Appendix.) While the AMF is optimal when the background clutter is Gaussian, the ACE has some useful invariance properties and often exhibits better performance when the tails of the clutter distribution are much fatter than Gaussian. The finite-target matched filter (FTMF) [5] can also be expressed in the MFR space, but exhibits a curved boundary. In the present work, we describe detectors with decision boundaries in the MFR plane that are learned from data. Although our approach is more computationally complex than the AMF, ACE, or FTMF, it can improve target detection by adapting to an *a priori* unknown, and typically non-Gaussian, background clutter. Working in a two-dimensional MFR space, rather than in the high dimensions of the original data, allows machine learning algorithms to more effectively produce good target detectors. If some model distribution for the clutter is accurate, we can implement a Generalized Likelihood Ratio Test (GLRT); when no good model can be found, a machine learning approach has an advantage. The lower dimension also aids intuition about decision boundaries.

The layout of this paper is as follows. Section 2 describes the plume detection and subpixel target problems in terms of the measured radiance. Section 3 introduces the MFR space, which represents the hyperspectral data in a reduced number of dimensions corresponding to the matched filter (MF) and residual (*R*) quantities. Section 4 introduces the generation of “matched-pair” data for evaluating the performance of detectors, and Section 5 extends the notion to the use of machine learning to develop new detectors. Adaptive detection in MFR space is discussed in Section 5.1. We apply the nonlinear MFR detectors in Section 6 to simulated non-Gaussian LWIR data and to hyperspectral data from the AVIRIS instrument, and show that the nonlinear MFR has advantages over the AMF and ACE detectors. Finally, Section 7 concludes with a discussion of limations and some variations on this approach.

## 2. Targets and background

#### 2.1. Weak gas plume

A detailed treatment of the radiative signature of a chemical plume in the long-wave infrared (LWIR) would include the nonlinear aspects of radiative emission, absorption, and transport through the atmosphere [11]. For the practically important case of a weak chemical plume, however, the following linear model is often adequate [3, 4, 12, 13, 14]:

In Eq. (1), *x*′ is the spectral radiance observed at one pixel in a hyperspectral image, *ε*′ is a scalar representing the strength of the plume, s is the spectral signature of the plume, and w′ is the spectral radiance of the background. Since the spectral imaging sensor has *N* spectral bands, the vectors x′ and w′ are *N*-dimensional at each of the *m* pixels in the image, and the target signature s is treated as a known vector that is constant over the whole image. (The vector s incorporates the atmospheric transmission, which may vary, but is usually approximately constant over one scene.) The background clutter w′ arises from ground radiance (thermal emission and reflected downwelling), atmospheric transmission, atmospheric path radiance, and sensor noise [11]. While sensor noise may be approximately Gaussian, the ground clutter is highly structured and non-Gaussian in most cases [15, 16]. The plume strength *ε*′ is a product of the column density of the chemical and the thermal contrast between the plume and the ground [17]. The vector x′ corresponds to what we measure, while w′ signifies the clutter that we aim to suppress in order to retrieve the plume strength.

In practice, we work with the mean-subtracted radiance. Define

**x**=**x**′-〈**x**′〉

**w**=**w**′-〈**w**′〉

**ε**=**ε**′-〈**ε**′〉

where the angle brackets correspond to the average over the image. This leads to the signal model

which has the same form as Eq. (1). It is often the case in hyperspectral target detection (particularly for broad area search applications) that the target signature s is contained in only a few pixels, or (particularly for weak chemical plumes) it is present as a small fraction of the observed radiance. Then we can write *µ*=〈**x**〈≈〈**w**〈, and the target pixels have a negligible effect on the sample mean and covariance. We make this “weak sparse target” assumption from this point forward. There are other remote sensing applications in which the sample mean and covariance are influenced by target pixels; this effect is discussed in more detail elsewhere [13, 14, 18, 19].

Alternative physical derivations that incorporate more subtle details, and produce more complicated equations than Eq. (2), appear in Refs. [14, 17]. One may reformulate the basic signal model of Eq. (2) to incorporate atmospheric compensation [20]; to the extent that the compensation is accurate, it provides a number of advantages, from simplifying the ground clutter, to reducing the wavelength dependence of the plume strength. Ref. [4] discusses a more general signal model than Eq. (2) in which a scalar multiplies the clutter term; the consequences are discussed in the Appendix.

## 2.2. Solid sub-pixel target

The target may be a solid object with radiance spectrum **r** that occupies some fraction *f* of the pixel under investigation. In this case, a “radiance replacement” signal model is more appropriate [5, 18]:

In the reflective spectral regions, radiance reflected from the target object occurs at the expense of the background radiance. By defining the target spectrum s=**r**-*µ*, the model can be rewritten in the mean-subtracted form:

We will use this signal model in connection with target detection in the ultraviolet to short-wave-infrared spectral region in Section 6.2. Signal models (2) and (4) are the same in the small target limit, motivating the use of the adaptive matched filter in many target detection applications.

## 3. Detection quantities and matched-filter-residual (MFR) space

In the Appendix we derive two detectors, the adaptive matched filter (AMF) and the *t* statistic, that are familiar from the remote sensing and signal processing literature. In particular, the AMF is used widely on hyperspectral data and in radar; it was used in early work on chemical detection in infrared spectroscopy [21, 22]. The *t* statistic is closely related to the *ACE* detector [9]. In the Appendix these statistics are expressed in terms of two quantities that are computed from a pixel spectrum: the matched filter output (MF) and the residual (*R*) (see Eqs. (26) and (27)), which can be cast in the forms:

where **C** is the sample covariance matrix, a property of the background clutter. Then the adaptive matched filter output is simply given by MF, and

expresses the *t* statistic as a ratio of MF and *R*. The AMF, ACE, and the t statistic can all be derived as GLRT detectors under different assumptions [8, 9, 18].

The two-dimensional matched-filter-residual (MFR) space, with axes given by MF and *R*, was introduced in Ref. [23], and provides a convenient way to visualize a variety of detection algorithms. The quantities MF and *R* are both dimensionless but are scaled in a compatible way. In particular, the distance to the origin in the MFR plot is equal to the Mahalanobis distance to the centroid of the original data:

The hyperspectral data can be exhibited on a scatter plot by using the scalar values of Eqs. (5,6) as coordinates. This MFR plot provides a revealing two-dimensional representation of detector behavior on high dimensional data. The two dimensions are chosen to provide useful information about the shape of the high-dimensional data cloud in the direction of interest, that of the target signature.

Further, since the AMF and *t* statistic detectors depend only on MF and *R*, the decision boundaries based on these statistics can be represented in this two-dimensional MFR space. In particular, the contours of the AMF detector will be lines of constant MF, which will be horizontal lines in the MFR plot. And since *t* is the ratio of MF to *R*, a detection contour for *t* on the MFR plot will be a sloped line emanating from the origin. We remark that other contours in MFR space could lead to different detectors, such as those proposed in Ref. [24].

This is illustrated in Figs. 1(a,b) for two synthetic datasets, one Gaussian and one non-Gaussian. The latter is an elliptically-contoured (EC) multivariate *t* distribution, advocated by Marden and Manolakis [25] for hyperspectral data in the vis-SWIR wavelength regions. A parameter ν characterizes the degree of non-Gaussianity. In each case, the dimension is *N*=128 spectral bands, and the decision contours drawn correspond to a false alarm rate *P*
_{fa}=0.005. There are two clouds of points in the figure: the lower one represents background clutter only, and the upper one represents background clutter with target signature added, discussed next.

## 4. Performance evaluation with matched-pair data

Although the ideal way to evaluate detection strategies for chemical plumes is by analyzing experimental data from controlled, quantified plume releases, these are rarely available with enough accuracy and in enough quantity to permit statistical evaluation. As an alternative, we employ “matched-pair” data sets [23]. We start with a hyperspectral image in which the target signature does not appear (or else it appears only weakly or rarely), and we generate a second dataset by artificially adding the target signature to every pixel in the image. This produces on-plume and off-plume pixels. The off-plume pixels are taken directly from observed data, while the on-plume pixels are generated by

where *ε*
_{mp} corresponds to the strength of the plume. These matched pairs are shown in Fig. 1(a,b). By using these matched pair data, the ROC (receiver operating characteristic) curves in Fig. 1(c,d) can be generated by shifting the threshold for the detection statistic, and computing Pfa from the fraction of **x**
_{off} above the boundary, and Pd from the fraction of **x**
_{on} above the boundary. Each value of *ε*
_{mp} leads to a particular ROC curve. A family of ROC curves can be obtained for a range of values of *ε*
_{mp}, corresponding to different plume strengths; or relative performance can be examined at a single representative value of *ε*
_{mp}.

Although the on-plume pixels are artificial, the matched-pair datasets have certain advantages over controlled release data sets: they are readily generated from either synthetic or real data; they provide a large number of both on-target and background-only pixels; and the signal strength in the on-target pixels is known precisely. Further, they can be used for any signal model appropriate to the problem at hand, including those more sophisticated than Eqs. (2,4).

For purely Gaussian data (Figs. 1(a,c)), the AMF has higher performance (as expected from Neyman-Pearson optimality), but the *t* statistic is only slightly worse. For the heavy-tailed EC data in Fig. 1(b), the *t* statistic substantially outperforms the AMF at low *P*
_{fa} (Fig. 1(d)). At higher *P*
_{fa}, however, the AMF is better.

## 5. Binary classification with matched-pair data

Although the use of matched-pair data for the *evaluation* of existing detection algorithms is straightforward [23], we propose a further use for these matched-pair datasets: as training data for machine learning methods, leading to the *development* of new detection algorithms.

In general, the use of machine learning for producing detection algorithms can be problematic, but the proposed approach deals with at least two common problems. By using matched-pair data, an ample supply of both on-target and background-only pixels is available; and by working in the reduced-dimensional MFR space, the curse of dimensionality is addressed.

Once a matched-pair dataset is generated, the aim is to find a boundary that separates the **x**
_{on} pixels from the **x**
_{off} pixels. This is a binary classification problem, and the solution is generally expressed with a scalar function *g*(**x**): where *g*(**x**) is positive, **x** is predicted to be on-plume, and where g(x) is negative, x is predicted to be off-plume. For the adaptive matched filter, we take *g*(**x**)=**q**
^{T}
**x**+*c*, with **q** defined in Eq. (23), and *c* a scalar offset that corresponds to the matched filter threshold. But, by considering a broader class of functions *g*(**x**), more complicated (and potentially more discriminating) classifiers can be found.

For the plume detection problem, the matched pair formula is given in Eq. (9). If we know the precise amount of target signal that we are looking for, we can employ Eq. (9) using a corresponding value of *ε*
_{mp}. In practice, we use *ε*=*ε*
_{mp} as a constant corresponding to a signal-to-clutter ratio (SCR) in the range of interest for the problem; other options will be discussed in Section 5.2.

In the case of the sub-pixel target problem, the matched pair should be generated with the signal model of Eq. (4); that is:

In this expression, *f*
_{mp} is the fraction of the pixel occupied by the target; a fixed value is used for all of the data **x**
* _{i}*. Additional changes for the detection algorithm will be discussed below.

## 5.1. Adaptive detection in MFR space

The MFR plot in Fig. 1(b) suggests that curved decision boundaries might outperform linear boundaries when the clutter is non-Gaussian. The idea of curved boundaries in MFR space is the key concept in the adaptive detection algorithm we propose.

This algorithm is described in Table 1. Here, steps 1, 2, and 4 are necessary for using the standard AMF and *t* statistics. The matched pair in Step 3 can be generated for any signal model appropriate for the problem; it is shown here for the additive plume signal model in Eq. (2), but more elaborate signal models are also possible. Step 5 of the algorithm, generating a nonlinear decision boundary, is the key step that requires additional effort. But since the data have been reduced from the original high dimension to just two dimensions, one can effectively apply nonlinear methods, such as the support vector machine (SVM) algorithm [26].

Note: Among the parameters specified to the SVM is the ratio of weights on the two classes; greater weight on the non-plume class reduces the *P*
_{fa}. Using a sequence of ratios, one may generate a sequence of discriminating functions whose performance traces out the ROC curve.

The operation of the proposed algorithm is shown graphically in Fig. 2. The data points, initially treated as if they were background only, are first plotted in MFR space (Fig. 2(a)). A matched-pair data set is generated using a suitable value of *ε*
_{mp}. An SVM (or any binary classification algorithm) is used to produce a curved decision boundary for the matched pair at some desired alarm rate (Fig. 2(b)). Then the original data are classified according to this boundary, and a small number of samples is reclassified as containing the target signature (circled points in Fig. 2(c)).

The data in Fig. 2 are from the same EC distribution as shown in Fig. 1(b). Decision boundaries were obtained using the publicly-available libSVM routines [27]. SVM produces boundaries for arbitrary distributions of labeled data, by optimizing a surrogate loss function that approximates classification error but is convex. One can choose a variety of kernels in the SVM algorithm and set kernel parameters that affect the degree of curvature of the boundaries [26]. Fig. 3 shows the results of the linear, polynomial (degree 2), and Gaussian radial basis function kernels. Using the linear kernel produces linear decision boundaries that look like the t statistic (Fig. 1(a,b)) but are shifted vertically. Choosing a nonlinear kernel gives a very different set of contours. Although the contours are close to linear at small values of *R*, they become quite curved for larger residual values. For this simulated data, however, there is not much data at larger values of *R*, so the detection performance of the three different SVM kernels is about the same.

## 5.2. Choosing the matched-pair separation parameter εmp

In Step 3 of the algorithm and in Eq. (9), the matched pair is generated with a constant *ε*=*ε*
_{mp}. The classifier resulting from a constant *ε*
_{mp} will be optimized for finding a specific strength of plume, but in general this is not explicitly known *a priori*. In real-world situations with plumes and sub-pixel targets, the target strength is variable from pixel to pixel, and one must choose a value of *ε*
_{mp} close to the detection limit, where the false alarm rate is tolerable. One can intentionally set:

where *n* is the “number of sigmas” (square root of the SCR; see Eq. (31)) at the desired level of detection, such that strong target pixels lie above threshold and the weakest lie below. In Fig. 2(b), we chose *n*=3; nearby values gave similar results.

## 6. Applications of the proposed algorithm

In this section, we present two demonstrations to illustrate performance gains made possible by curved boundaries in MFR space. We do not intend to make general claims about the absolute field performance of these detectors - that performance will always depend on specific details of the clutter distribution and of the targets being sought.

## 6.1. Plume detection in synthetic LWIR data

The first example is a synthetic hyperspectral infrared scene. We synthesized a cluttered scene of long-wavelength infrared (LWIR) radiance at 300 K, consisting of 5 common materials with reflectance spectra shown in Fig. 4(a). The spectrum of a common chemical, chlorodifluoromethane, is used as the target to be detected, Fig. 4(b). The spectra have *N*=128 wavelengths, and a large number of pixels (*m*=16384) to yield a reliable covariance matrix. The principal source of variability in the clutter is the spectral radiance emitted by the 5 components. We also added a small amount of white Gaussian noise (~0.05% of the total radiance) to simulate instrument noise, although this has little effect.

We further modified the synthetic spectra to mimic the variability of the emissivity for a single material (e.g. asphalt) that is observed in real data [6, 14, 28]. Spectral emissivity can be altered by chemical variations, weathering, and environmental influences. We simulated material inhomogeneity by amplitude-modulating the radiance spectra with a non-Gaussian multiplier, similar to the procedure suggested by Marden and Manolakis [25]. We modified the procedure by using a multiplier with a spectral dependence, shown in Fig. 4(a), that augments the inhomogeneity in the spectral regions of the chemical absorption. The effect of this is to create a heavy-tailed distribution of MF values. The intent here is not to simulate some detailed physical origin of heavy-tailed data, but to inject enough heavy-tailed character in the clutter to show the advantage of one detection algorithm over another.

Details of the resulting radiance spectra are shown in Fig. 5. Fluctuations about the mean radiance (Fig. 5(a)) are substantial. Brightness temperature spectra of typical pixels are shown in Fig. 5(b). Two spectra are plotted, with and without added target signature. A difference spectrum in Fig. 5(c) shows that the amount of added target signature is very small: this is a situation corresponding to a very weak chemical plume, so that detection rates can be expected to be small.

A typical atmospheric radiance spectrum was added to the simulated clutter, and is evident in Fig. 5(b), but since it was constant, it does not factor into the detection problem. Atmospheric variability is not explicitly addressed in this paper, although in practice it becomes an additional source of clutter.

Figure 6 is an MFR plot of the synthetic LWIR data. The SCR of the matched filter output for this matched pair is small (~2.6), producing significant overlap of the two data clouds. A subset of the 2-D data were processed by the SVM algorithm and yielded the decision boundaries shown as thick lines. In order to get a range of boundaries with different *P*
_{fa} values, we weighted the two classes differently in the SVM analysis. Using the rbf kernel, selection of the SVM parameters can produce more or less curvature of the two boundaries, but there is little difference in performance. Repeated SVM runs on different subsets of the data produced essentially the same boundaries.

The SVM yields decision boundaries that are different from the contours associated with the *t* statistic and AMF detectors (Figs. 6(b,c)). The contours of Fig. 6 are drawn at the same set of *P*
_{fa} values. The SVM contours tend to converge at low *R*, unlike the AMF contours (which are constrained to remain parallel), and similar to the *t* statistic contours. Unlike the *t*, however, the convergence occurs at a point shifted vertically from the origin.

As in Section 4, the matched-pair framework provides a direct way to evaluate the detection and false alarm rates achieved by different algorithms. Figure 7 illustrates the detection performance of the three detectors on ROC curves. As with the simulated EC data, we see a strong divergence of the *t* statistic and AMF performance at low *P*
_{fa}, with the *t* statistic being the better performer. At higher *P*
_{fa}, the *t* statistic falters and the AMF is superior. Near this crossover point, the nonlinear SVM-based detector has better performance than either. Furthermore, the nonlinear detector is able to match the performance of the *t* statistic down to the lowest values of *P*
_{fa}. (Below *P*
_{fa}=0.02, shown in the Fig. 7(c), the SVM curve is essentially identical to the *t* statistic.) Over the full range of *P*
_{fa}, the nonlinear detector is better than or comparable to both the AMF and the *t* statistic. The advantage of the curved decision boundaries is that they can mimic the linear detectors in some regions of the MFR plot, but improve upon them in others. The AMF and *t* statistic are good choices in general, but nonlinear boundaries offer additional performance when needed.

## 6.2. Subpixel target detection in AVIRIS data

Figure 8 illustrates an extracted portion of hyperspectral data from the AVIRIS sensor [29, 30]. This sensor detects solar reflectance in the visible to short-wave infrared (SWIR) region, acquiring spectra in 224 channels, and *m*=10201 pixels are displayed in the figure. The scene has typical urban clutter and some areas of vegetation. Spectra of selected pixels are shown in Fig. 8(b), along with a target spectrum used to generate the matched pair. The target spectrum was taken from one pixel in the scene extracted by means of endmember analysis [31].

We generated a matched-pair dataset from the cluttered scene, using the radiance-replacement model of Eq. (4). We chose a small value for the target pixel fraction, *f*=0.08, corresponding to a difficult situation for detection that will yield low detection rates. As in the previous example, the matched pair is generated by synthetically adding the target signature to every pixel in the original data, producing a large number of pixels for performance comparison. The MFR plot for these conditions, shown in Fig. 9(a), is different from the MFR plot for the additive signal model because the upper cloud of points is shifted to the left and is smaller in extent, reflecting a reduced-size covariance matrix [5].

Performance of the detectors is compared in Fig. 9(b) as ROC curves. As in Fig. 7, these curves are computed numerically using the same matched pair data that was used to generate the SVM decision boundaries. While detection curves on real target-containing hyperspectral data might not be as straightforward to interpret, we feel that this in-sample performance comparison serves to demonstrate the advantage of curved decision boundaries over linear. The matched filter and the t statistic perform poorly under these conditions. They effectively assume that the target-containing pixels have the same covariance as the background pixels, and do not adjust for the fact that the target pixels have smaller variance (and are shifted to the left on the MFR plot). A more appropriate version of the matched filter for comparison purposes is the Finite-Target Matched Filter (FTMF) proposed by Schaum and Stocker, which is a GLRT detector based on Gaussian clutter and the replacement signal model [5]. This algorithm takes proper account of the covariance change, and yields a detection boundary that is curved in MFR space: it is another version of a nonlinear MFR detector. One can show with some additional algebra that the FTMF detection statistic has the following logarithmic form.

where *f* is the fractional coverage of the pixel by the target object. The maximum likelihood estimate of *f* is obtained by differentiating the log-likelihood with respect to *f* and setting the result to zero. This leads to:

where *α*=**x**
^{T}**C**
^{-1}
**x**/*N*,*β*=**s**
^{T}**C**
^{-1}
**x**/*N*,*γ*=**s**
^{T}**C**
^{-1}
**s**/*N* (following the notation in [5]). Note that *f*̂ is a function of the pixel under test, so it is variable from one sample of clutter to another. The FTMF detector can be visualized in MFR space by plotting a contour of constant *D*
_{FTMF}(**x**, *f*=*f*̂), as illustrated in Fig. 9(a). Both FTMF and SVM yield curved decision boundaries. The curves are different because SVM optimizes a different criterion, and because the AVIRIS data are not Gaussian. The SVM-based detector has better performance than the FTMF for a wide range of *P*
_{fa}. We remark that FTMF and SVM are both substantially better than the AMF and the *t*-statistic.

Figure 10 shows detection performance on a power curve: the fraction of target-containing pixels that are properly classified is plotted as a function of the strength of the signal. All four detector thresholds are set to the same *P*
_{fa}. For the SVM result, the detector was initially trained on a matched pair with SCR=0.5 (*P*
_{d}≈0.2), and the same detector was used at other SCR values. (This is accomplished by making a series of matched pairs with successively increasing f values in Eq. (10). As before, *P*
_{d} is evaluated on the matched pair.) The SVM detector performs better than the other three approaches except at very high SCR, which is outside of the weak target regime. One consideration in an SVM-based MFR detector is the value of *f* chosen in the matched pair for optimizing the decision boundary. (The AMF, *t*, and FTMF do not require a matched-pair dataset, and therefore are independent of *f*). For this example, we find that the SVM results are only weakly sensitive to the value of target strength *f* used in the matched-pair for training. We find that optimizing at any point in a range of *P*
_{d}=0.1 to 0.7 gives comparable results.

## 7. Discussion and conclusions

#### 7.1. Limitations

Because the representation of the hyperspectral data is in a two-dimensional space, the usual difficulty of adaptive learning, namely the overfitting of the data by too-complicated boundaries in too-high dimensional spaces, is largely alleviated. However, compared to standard approaches such as AMF and ACE, the extra flexibility offered by a nonlinear boundary does provide some risk that data will be overfitted. This risk is greatest where the data density is lowest, and the behavior of the support vector machine, for instance, on the tails of the distribution, can be sensitive to just a few data points. With some SVM parameters, it is possible to get disjoint decision regions in MFR space that are clearly unphysical or otherwise undesirable. Experimentation with the data in this paper suggests that overall detection performance is not strongly sensitive to the SVM parameters; nonetheless, we recommend that the practitioner examine the SVM decision boundary on the MFR plot and avoid SVM parameters that produce counter-intuitive results. The two-dimensional nature of the MFR space facilitates this visualization.

The problem of signal contamination occurs when the data include pixels that contain the target signature. This situation is inevitable in a broad area search scenario, since the target pixel locations are *a priori* unknown. Contamination is a general problem that affects all detectors, and its impact on the AMF has been previously discussed [13, 14, 18, 19]. If we generate a matched pair and label the two sets as off-plume and on-plume, there will be pixels which actually contain the plume signature but have been labeled as off-plume. The increased flexibility of the nonlinear MFR method then becomes a liability, since the procedure will try to draw a decision boundary that curves around these pixels to keep them on the off-plume side. In a few test cases, we have observed that strong target-containing pixels will influence the decision boundary, and may even decrease detection performance compared to the linear MFR boundaries. But we also found that a small number of weak target pixels is not problematic, since the decision boundary is still dominated by background-only pixels.

It may be possible to pre-process the data with the AMF and identify the strongest target-containing pixels for removal from the data for a second iteration, as suggested in Refs. [4, 18].

## 7.2. Variations and extensions

The flexibility of the nonlinear MFR approach allows for a number of variations, which permit adaptation to other signal processing problems, and can enable it to deal with some of the limitations discussed previously.

One simplification that we did not explore, except briefly in Fig. 3(a), is the use of linear decision boundaries. The AMF and the *t* statistic both provide linear boundaries, but the AMF has zero slope and the *t* statistic has zero intercept. It is straightforward to generalize to linear boundaries with arbitrary slope and intercept. This includes the AMF and *t* statistic as special cases, but should be less prone to overfitting than the more flexible nonlinear boundaries. Implementation of an SVM with a linear kernel is straightforward.

A variant of this simplification is to consider nonlinear boundaries, but with the nonlinearity constrained to a simple model, a conic section for instance, or a higher order polynomial curve. Again the idea is to address the trade-off between flexibility and robustness to overfitting. A principled way to obtain nonlinear boundaries with a small number of free parameters is to identify a form for the distribution of the background clutter, and then to derive optimal boundaries in the context of that background. For instance, when the background is Gaussian, the optimal boundaries are given by the AMF, but for more general elliptically-contoured distributions, one can derive optimal boundaries using, for instance, the generalized likelihood ratio test (GLRT) [8, 32]. These boundaries can be displayed in MFR space, and in some cases the decision statistic may be reducible to analytic form [24].

When using a specific value of the target signal strength (*ε*
_{mp} or *f*
_{mp}) the detector is optimized for that signal strength, and may perform less effectively on either weaker or stronger signals. We showed that there is only limited sensitivity to the choice of target strength, but it may be desirable for some problems to use a distribution of plume strengths in the matched pair, in Eq. (9) or Eq. (10). The idea is that for each off-plume pixel, one or possibly several matched pixels will be generated using a distribution of *ε*
_{mp} (or *f*
_{mp}) values. This effectively implements a Bayesian likelihood ratio test [33].

An extension to the problem of detecting multiple targets simultaneously would employ a subspace matched filter that can be used in place of the AMF. Working in a higher-than-two dimensional space might also be beneficial: for instance, one could identify separate MF coordinates for each of the target signatures, along with a residual R direction. This higher dimensional space would relinquish some of the advantages of the two-dimensional space that we have advocated, but this space would still be of much lower dimension than the number N of channels in the hyperspectral imagery.

## 7.3. Conclusions

We have presented an approach to the detection of targets in hyperspectral remote sensing data. It is based on two long-established detection metrics, the Adaptive Matched Filter (AMF) and the *t* statistic, but it employs a nonlinear classification strategy to achieve increased performance for a broader range of distributions and signal models. For simple cases (*e*.*g*. Gaussian clutter and additive target signal), the algorithm reduces to simple detectors (*e*.*g*. AMF). We have reported performance improvement for examples involving weak gaseous plume detection and solid subpixel target detection.

Transforming the data to a two-dimensional space simplifies high-dimensional data for which traditional detector boundaries can be difficult to visualize. In this paper, we chose the matched filter (MF) and the residual (*R*) as two useful dimensions for displaying the data. Decision boundaries for the AMF and the *t* statistic are lines in MFR space. The AMF performs well when the clutter distribution is accurately modeled as Gaussian and the signal model is additive; in other circumstances, however, a curved decision boundary is a natural extension that offers increased flexibility.

While our development of this approach is cast in terms of the plume detection problem, we have shown that it is readily extended to other signal models, in particular more sophisticated models than those implied by AMF and ACE. One example is the Finite Target Matched Filter (FTMF) [5], a nonlinear detector that performs much better than the standard AMF for the sub-pixel target problem because it takes into account the change in covariance from background class to target class. Like the AMF, the FTMF is optimized for Gaussian clutter, and although the detection boundary is nonlinear, it can be exhibited in an MFR space. By considering more general nonlinear boundaries in MFR space, one can achieve better performance than the FTMF with non-Gaussian clutter. By using an SVM to find such a nonlinear boundary, we are employing essentially the same algorithm we used for the plume detection problem; the only difference is in how we set up the matched-pair training data.

## Appendix: derivation of MF, R, and t statistic

In this section, we derive expressions for MF, *R*, and the *t* statistic, from the perspective of Generalized Least Squares (GLS). Although this can be found in textbooks [34, 35], we review it here to clarify the notation and terminology, and give motivation for MFR space.

The model for the observation is:

$$E\left\{\mathbf{w}\right\}=\mathbf{0}$$

$$E\left\{{\mathbf{ww}}^{T}\right\}=\mathbf{C}\left(\mathrm{known}\right)$$

The expectation operator is *E*, the hyperspectral data have covariance matrix **C**, *N* spectral bands and *m* pixels, and *p* denotes the number of targets that potentially account for an observed pixel spectrum **x**. The unknown background clutter is denoted **w**. The quantity **H θ** is a linear combination of some known target spectra, where

**H**is

*N*×

*p*and θ is

*p*×1.

We minimize a residual sum of squares that is a whitened version of that for ordinary least squares (OLS). The vector of residuals (**x**-**H θ**) is multiplied by the whitening matrix, and the sum of squares of components of that vector is minimized:

$$={\left(\mathbf{x}-\mathbf{H}\theta \right)}^{T}{\mathbf{C}}^{-1}\left(\mathbf{x}-\mathbf{H}\theta \right)$$

The best estimate of *θ* is the value that minimizes *RSS*. It can be shown to be [34, 35]:

The sum of squares due to error, SSE, is the value of *RSS* with *θ*̂ substituted for *θ* :

$$={\left(\mathbf{x}-\hat{\mathbf{x}}\right)}^{T}{\mathbf{C}}^{-1}\left(\mathbf{x}-\hat{\mathbf{x}}\right)$$

$$={\mathbf{x}}^{T}{\mathbf{C}}^{-1}\mathbf{x}+{\hat{\mathbf{x}}}^{T}{\mathbf{C}}^{-1}\hat{\mathbf{x}}-2{\hat{\mathbf{x}}}^{T}{\mathbf{C}}^{-1}\mathbf{x}$$

where **x**̂=**H**
*θ*̂. Using (16), one can verify that

which simplifies SSE to:

We have identified the total sum of squares (SST) and the sum of squares due to the regression (SSR) in this equation, following common usage of these terms [35]. The formal definition of the F statistic is the ratio of two of these sums of squares, adjusted by the number of degrees of freedom [34]:

Substituting for the sums of squares yields:

$$=\genfrac{}{}{0.1ex}{}{{\mathbf{x}}^{T}{\mathbf{C}}^{-1}\mathbf{H}{\left({\mathbf{H}}^{T}{\mathbf{C}}^{-1}\mathbf{H}\right)}^{-1}{\mathbf{H}}^{T}{\mathbf{C}}^{-1}\mathbf{x}}{{\mathbf{x}}^{T}{\mathbf{C}}^{-1}\mathbf{x}-{\mathbf{x}}^{T}{\mathbf{C}}^{-1}\mathbf{H}{\left({\mathbf{H}}^{T}{\mathbf{C}}^{-1}\mathbf{H}\right)}^{-1}{\mathbf{H}}^{T}{\mathbf{C}}^{-1}\mathbf{x}}\genfrac{}{}{0.1ex}{}{N-p}{p}$$

The expression holds when **H** shrinks down to just a single vector **s**, sometimes called the “rank-1 case,” *p*=1. In that case, the expression simplifies:

For convenience, we define the matched filter vector **q** as:

This allows a compact expression for the *F* statistic:

In the rank-1 case, the *t* statistic is simply the square root of the *F* statistic [35]:

If the signal model, Eq. (14), is modified so that a scalar multiplies the clutter term **w**, the covariance of the on-plume pixels differs from the off-plume pixels by a scalar multiplier. The Generalized Likelihood Ratio Test (GLRT) then yields the *F* statistic [4, 9, 34]. Ref. [10] derives Eq. (25) as a detector that is invariant to the magnitude of **x**, calling it the CFAR matched subspace detector.

We define MF as the numerator of the t statistic, and R as the denominator:

This definition of MF is consistent with the signal processing literature [7]. Strictly speaking, the covariance matrix of the background clutter is not “known,” but must be computed from the hyperspectral image data itself (see below). It is sometimes referred to as the “adaptive” matched filter (AMF) to reflect this [36].

From the standpoint of likelihood-ratio testing, when **w** is Gaussian and ε is a known constant, the matched filter is Neyman-Pearson optimal. In the more usual case where ε varies among the target pixels of interest, the matched filter is Uniformly Most Powerful [8]. Among linear filters, the AMF is optimal for a range of distributions, including elliptically contoured (EC) distributions [15]. The residual *R* corresponds to the portion of the observed data left over after the matched filter has been applied and subtracted. It also has an interesting geometrical interpretation [10, 35].

In the remote sensing literature, the *t* statistic, or ${t}_{N}=t\u2044\sqrt{N-1}$, is used less often than the closely related Adaptive Coherence Estimator (ACE) [9].

$$=\genfrac{}{}{0.1ex}{}{{t}_{N}}{\sqrt{{t}_{N}^{2}+1}}={\left(\genfrac{}{}{0.1ex}{}{1}{{t}_{N}^{2}}+1\right)}^{-1\u20442}$$

Since ACE and *t* are monotonically related, their decision boundaries and ROC curves are identical. Relationships among these detectors have been discussed in detail in Ref. [9].

The mean value of the matched filter output MF for the target-containing half of the matched pair has a simple relationship to the signal-to-clutter ratio SCR:

$$={\epsilon}_{\mathrm{mp}}{\left(\genfrac{}{}{0.1ex}{}{{\mathbf{C}}^{-1}\mathbf{s}}{\sqrt{{\mathbf{s}}^{T}{\mathbf{C}}^{-1}\mathbf{s}}}\right)}^{T}\mathbf{s}={\epsilon}_{\mathrm{mp}}\sqrt{{\mathbf{s}}^{T}{\mathbf{C}}^{-1}\mathbf{s}}$$

$$=\sqrt{\mathrm{SCR}}$$

Here, SCR is the square of the ratio of the signal part of MF, *ε*
**q**
^{T}**s**, to the variance of MF on the background pixels. The latter quantity is:

(Note that 〈**q**
^{T}**w**〉=0.) Then

(Reed *et al.* [7] call this the “signal-to-noise ratio.”) Thus, SCR characterizes the vertical separation between the two classes of points on an MFR plot.

One well-known operational problem with the AMF (and other covariance-based approaches, such as linear discriminant analysis for terrain classification) is that the sample covariance for only a small number of background samples is not well determined and may lead to degraded performance, particularly when the dimension of the data (number of spectral bands) is high. This point has been discussed in the literature [7, 37, 38]. Fortunately, modern hyperspectral sensors generate many thousands of pixels, and the sample covariance **C** closely approximates the true covariance **C**
_{true}. Reed *et al*. [7] showed that in the case of Gaussian background clutter, the SCR obtained using the sample covariance has an expectation value given by:

While this expression was derived for Gaussian clutter, we can reasonably expect fast convergence with other types of clutter as well. In practice, there are so many background pixels that frequently *N*/*m*<10^{-2}, and we restrict our attention to those cases in this paper.

## Acknowledgments

This work was supported by the Laboratory Directed Research and Development program at Los Alamos National Laboratory. Additional support came from the Department of Energy Office of Nonproliferation Research and Engineering. We thank Brian McVey for LWIR simulation software, and Brad Henderson for useful discussions.

## References and links

**1. **J. A. Richards and X. Jia, *Remote Sensing Digital Image Analysis* (Springer, New York, 2006).
[CrossRef]

**2. **J. B. Adams and A. R. Gillespie, *Remote Sensing of Landscapes with Spectral Images* (Cambridge Univ. Press, New York, 2006).
[CrossRef]

**3. **A. Hayden, E. Niple, and B. Boyce, “Determination of trace-gas amounts in plumes by the use of orthogonal digital filtering of thermal-emission spectra,” Appl. Opt. **35**, 2803–2809 (1996).
[CrossRef]

**4. **D. Manolakis, L. Jairam, D. Zhang, and M. Rossacci, “Statistical models for LWIR hyperspectral backgrounds and their applications in chemical agent detection,” Proc. SPIE **6565**, 656525-1, (2007).
[CrossRef]

**5. **A. Schaum and A. Stocker, “Spectrally-selective target detection,” in *Proc. Intl. Symp. Spectral Sens. Research, San Diego*,
B. A. Mandel, ed., p. 23. Available at http://leupold.gis.usu.edu/docs/protected/procs/isssr/1997/.

**6. **L. Kirkland, K. Herr, E. Keim, P. Adams, J. Salisbury, J. Hackwell, and A. Treiman, “First use of an airborne thermal infrared hyperspectral scanner for compositional mapping,” Remote Sens. Environ. **80**, 447–59 (2002).
[CrossRef]

**7. **I. S. Reed, J. D. Mallett, and L. E. Brennan, “Rapid convergence rate in adaptive arrays,” IEEE Trans. Aerosp. Electron. Syst. **10**, 853–863 (1974).
[CrossRef]

**8. **S. Kay, *Fundamentals of Statistical Signal Processing, Volume 2: Detection Theory* (Prentice Hall, NJ, 1998).

**9. **S. Kraut, L. L. Scharf, and L. T. McWhorter, “Adaptive subspace detectors,” IEEE Trans. Signal Process. **49**, 1–16 (2001).
[CrossRef]

**10. **L. L. Scharf, *Statistical Signal Processing* (Addison-Wesley, Reading, MA, 1990).

**11. **J. Schott, *Remote Sensing: the Image Chain Approach* (Oxford University Press, New York, 1997).

**12. **N. B. Gallagher, B. M. Wise, and D. M. Sheen, “Estimation of trace vapor concentration-pathlength in plumes for remote sensing applications from hyperspectral images,” Analytica Chimica Acta **490**, 139–152 (2003).
[CrossRef]

**13. **J. Theiler and B. R. Foy, “Effect of signal contamination in matched-filter detection of the signal on a cluttered background,” IEEE Geosci. Remote Sens. Lett. **3**, 98–102 (2006).
[CrossRef]

**14. **S. J. Young, “Detection and Quantification of Gases in Industrial-Stack Plumes Using Thermal-Infrared Hyperspectral Imaging,” Tech. Rep. ATR-2002(8407)-1, The Aerospace Corporation (2002).

**15. **J. Theiler, B. R. Foy, and A. M. Fraser, “Characterizing non-Gaussian clutter and detecting weak gaseous plumes in hyperspectral imagery,” Proc. SPIE **5806**, 182–193 (2005).
[CrossRef]

**16. **D. Manolakis, D. Marden, J. Kerekes, and G. Shaw, “On the Statistics of Hyperspectral Imaging Data,” Proc. SPIE **4381**, 308–316 (2001).
[CrossRef]

**17. **B. R. Foy, R. R. Petrin, C. R. Quick, T. Shimada, and J. J. Tiee, “Comparisons between hyperspectral passive and multispectral active sensor measurements.” Proc. SPIE **4722**, 98–109 (2002).
[CrossRef]

**18. **D. Manolakis, “Taxonomy of detection algorithms for hyperspectral imaging applications,” Opt. Eng. **44**, 066403 (2005).
[CrossRef]

**19. **J. Theiler, B. R. Foy, and A. M. Fraser, “Nonlinear signal contamination effects for gaseous plume detection in hyperspectral imagery,” Proc. SPIE **6233**, 62331U-1 (2006).

**20. **S. J. Young, B. R. Johnson, and J. A. Hackwell, “An in-scene method for atmospheric compensation of thermal hyperspectral data,” J. Geophys. Res. Atm. **107**, 4774 (2002).
[CrossRef]

**21. **D. Morgan, “Spectral absorption pattern detection and estimation. I. Analytical techniques,” Appl. Spectrosc. **31**, 404–415 (1977).
[CrossRef]

**22. **D. Morgan, “Spectral absorption pattern detection and estimation. II. System applications and design procedures,” Appl. Spectrosc. **31**, 415–424 (1977).
[CrossRef]

**23. **J. Theiler, B. Foy, and A. Fraser, “Beyond the adaptive matched filter: nonlinear detectors for weak signals in high-dimensional clutter,” Proc. SPIE **6565**, 656503-1 (2007).

**24. **J. Theiler and B. Foy, “EC-GLRT: Detecting weak plumes in non-Gaussian hyperspectral clutter using an elliptically-contoured generalized likelihood ratio test,” Proc. IEEE Intl. Geosci. Remote Sensing Symp. (2008).

**25. **D. B. Marden and D. Manolakis, “Using elliptically contoured distributions to model hyperspectral imaging data and generate statistically similar synthetic data,” Proc. SPIE **5425**, 558–572 (2004).
[CrossRef]

**26. **B. Schölkopf and A. J. Smola, *Learning with Kernels* (MIT Press, Cambridge, MA, 2002).

**27. **C.-C. Chang and C.-J. Lin, LIBSVM: a library for support vector machines (2001). Software available at www.csie.ntu.edu.tw/~cjlin/libsvm.

**28. **T. J. Cudahy, J. Wilson, R. Hewson, P. Linton, P. Harris, M. Sears, K. Okada, and J. A. Hackwell, “Mapping variations in plagioclase felspar mineralogy using airborne hyperspectral TIR imaging data,” in *Proc. IEEE 2001 Intl. Geosci. Remote Sensing Symp., Sydney, Australia*,
T. Milne and B. Cechet, eds., pp. 730–732 (IEEE, Piscataway, NJ, 2001).

**29. **G. Vane, R. O. Green, T. G. Chrien, H. T. Enmark, E. G. Hansen, and W. M. Porter, “The Airborne Visible/Infrared Imaging Spectrometer (AVIRIS),” Remote Sens. Environ. **44**, 127–143 (1993).
[CrossRef]

**30. **AVIRIS (Airborne Visible/Infrared Imaging Spectrometer) Free Standard Data Products, Jet Propulsion Laboratory (JPL), National Aeronautics and Space Administration (NASA), http://aviris.jpl.nasa.gov/html/aviris.freedata.html.

**31. **M. E. Winter, “N-FINDR: an algorithm for fast autonomous spectral end-member determination in hyperspectral data,” Proc. SPIE3753, 266–275 (1999).
[CrossRef]

**32. **A. M. Mood, F. A. Graybill, and D. C. Boes, *Introduction to the Theory of Statistics*, 3rd ed. (McGraw-Hill, New York, 1974).

**33. **A. Schaum, “Hyperspectral Target Detection using a Bayesian Likelihood Ratio Test,” Proc. IEEE Aerospace Conf. **3**, 1537–1540 (2002).

**34. **A. C. Rencher, *Linear Models in Statistics* (Wiley, New York, 2000).

**35. **N. R. Draper and H. Smith, *Applied Regression Analysis* (Wiley, New York, 1998).

**36. **F. C. Fuhrmann, D. R. Robey, E. J. Kelly, and R. Nitzberg, “A CFAR adaptive matched filter detector,” IEEE Trans. Aerosp. Electron. Syst. **28**, 208–216 (1992).
[CrossRef]

**37. **C. Lee and D. A. Landgrebe, “Analyzing high-dimensional multispectral data,” IEEE Trans. Geosci. Remote Sens. **31**, 792–800 (1993).
[CrossRef]

**38. **P. V. Villeneuve, H. A. Fry, J. Theiler, B. W. Smith, and A. D. Stocker, “Improved matched-filter detection techniques,” Proc. SPIE **3753**, 278–285 (1999).
[CrossRef]