The synthesis of a new category of spatial filters that produces sharp output correlation peaks with controlled peak values is considered. The sharp nature of the correlation peak is the major feature emphasized, since it facilitates target detection. Since these filters minimize the average correlation plane energy as the first step in filter synthesis, we refer to them as minimum average correlation energy filters. Experimental laboratory results from optical implementation of the filters are also presented and discussed.
© 1987 Optical Society of America
The technique of using matched spatial filters (MSFs) for optical pattern recognition has been well investigated, and several methods have been proposed to use it for recognition of 2-D images in the presence of noise and geometric distortions. Correlators are one of the most powerful techniques for locating multiple objects in parallel, and the MSF is optimal for recognition of targets in the presence of white noise. However, the MSF has two major limitations: (1) the output correlation peak degrades rapidly with geometric image distortions; and (2) the MSF (matched to one given image) cannot be used for multiclass pattern recognition.
The concept of MSFs has been greatly extended in recent years by several types of generalized filter. These methods can be broadly classified into two categories. The first category concerns in-plane 2-D scaling and rotation distortions. Such methods include the use of space-variant transforms and circular harmonic functions (CHFs). In these techniques, the intensity at the origin of the correlation function cannot generally be specified during filter synthesis. The second category of filters uses training images that are sufficiently descriptive and representative of the expected distortions. These filters can be viewed as generalizations of MSFs for the identification of multiple targets in the presence of virtually any type of distortion (i.e., 3-D distortions). The intensity at the center of the cross-correlation function (defined as the filter output) can be specified for each training image during synthesis, and several objects can be handled by one filter by including all object classes in the training set.
Several versions of this second class of filter exist. The most well known is the synthetic discriminant function (SDF) and its variations. We refer to all such combination filters as correlation filters since they are designed for implementation in optical correlators. We also use the terms center and origin interchangeably to refer to the point of interest (the true peak) in the correlation plane (i.e., the filter output). This loses no generality since correlation is a shift-invariant operation. When only one image is present, the conventional SDF reduces to the MSF of that image. The use of more training set data is intended (and required) to reduce sensitivity to image distortions. An SDF with minimum variance has been derived and is one optimal filter. However, it controls SNR at the correlation peak only. None of the prior filters offers optimum detection. To aid in detecting correlation peaks, correlation filters using shifted versions of each training set image have been suggested to control the shape of true correlation peaks. Peak-to-sidelobe ratio (PSR) filters have also been used. However, these maximize PSR only in the vicinity of the peak, not in the full correlation plane. PSR filters do not allow control of the correlation peak, and both PSR and correlation filters typically require 5 times more training set images (for four shifts of each training set image). The motivation for these last two filters is to suppress the presence of extraneous correlation peaks (away from the central peak) that make detection hard.
For the MSF, when the correct image is present at the input, the output of the correlator is the autocorrelation function. Thus locating an image with its MSF is simple, since the peak of the autocorrelation function is easy to identify. However, the linear combination correlation filters lack a sharp correlation peak, since the input image cross-correlates with all images in the training set. This often produces sidelobes of high intensity and degrades correlation plane PSR. The proposed filter in this paper uses a new technique for producing sharp correlation peaks and allowing easy detection in the full correlation plane as well as control of the correlation peak value. As before, training set images are used to reduce the filter's sensitivity to 3-D object distortions.
Section II contains a description of the mathematical notation and terminology used in this paper. The filter design problem is formulated in Sec. III, and its solution is discussed in Sec. IV. Several interesting properties of the filter are then discussed in Sec. V. A gradient descent-type procedure is proposed in Sec. VI for obtaining relatively constant correlation plane energies for all training images. Section VII summarizes the results of initial computer simulations and quantitative data to evaluate filter performance. Section VIII discusses the results obtained on an optically implemented laboratory version of the filter.
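The ith training image is described as a 1-D discrete sequence (obtained by lexicographically ordering the rows of the image) denoted by xi(n). Its discrete Fourier transform (DFT) is denoted by Xi(k). In this discussion, we describe the discrete image sequence as a column vector xi of dimensionality d equal to the number of pixels in the image xi(n), i.e.,

$$\mathbf{x}_i = [x_i(1), x_i(2), \ldots, x_i(d)]^{T}. \tag{1}$$

With gi(n) denoting the correlation of xi(n) with the filter h(n), and Gi(k) and H(k) the corresponding DFTs, the energy of the ith correlation plane is

$$E_i = \sum_{n=1}^{d} |g_i(n)|^2 = \frac{1}{d}\sum_{k=1}^{d} |G_i(k)|^2 = \frac{1}{d}\sum_{k=1}^{d} |H(k)|^2\,|X_i(k)|^2. \tag{3}$$

Equation (3) is a direct realization of Parseval's theorem. Using the vector form of the image sequence, we can also write Eq. (3) as

$$E_i = \frac{1}{d}\,\mathbf{H}^{+}\mathbf{D}_i\mathbf{H}, \tag{4}$$

where the superscript + denotes the conjugate transpose, H is the column vector of the filter samples H(k), and Di is the d × d diagonal matrix with elements Di(k,k) = |Xi(k)|².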
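As a quick numerical check of the Parseval relation that this section relies on, the sketch below correlates a random sequence with a random filter and compares the space-domain energy with its frequency-domain and vector forms. It assumes circular correlation and NumPy's unnormalized forward DFT (hence the explicit 1/d factor); the function name `correlation_energy` and the random test data are illustrative and not taken from the paper.

```python
import numpy as np

def correlation_energy(x, h):
    """Energy of the circular correlation of x with h, computed three ways."""
    d = x.size
    X = np.fft.fft(x)
    H = np.fft.fft(h)
    G = H * np.conj(X)                   # DFT of the circular cross-correlation
    g = np.fft.ifft(G)                   # correlation sequence g(n)
    e_space = np.sum(np.abs(g) ** 2)     # sum over n of |g(n)|^2
    e_freq = np.sum(np.abs(H) ** 2 * np.abs(X) ** 2) / d
    D_i = np.diag(np.abs(X) ** 2)        # diagonal matrix with entries |X(k)|^2
    e_vec = np.real(np.conj(H) @ D_i @ H) / d   # vector form H^+ D_i H / d
    return e_space, e_freq, e_vec

rng = np.random.default_rng(0)
e_space, e_freq, e_vec = correlation_energy(rng.standard_normal(64),
                                            rng.standard_normal(64))
print(e_space, e_freq, e_vec)
```

All three values should agree to rounding error; with a different DFT normalization the 1/d factor would move or disappear.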
III. Problem Definition
We now state the pattern recognition problem to be solved. We wish to design a correlation filter that ensures sharp correlation peaks while allowing constraints on the correlation peak values and retaining shift invariance. We also seek to improve tolerance to distortion using a reduced number of training images. Our main concern in this paper is the production of sharp easily detected correlation peaks (since the distortion tolerance of correlation filters has been addressed elsewhere). To achieve good detection it is necessary to reduce correlation function levels at all points except at the origin of the correlation plane, where the imposed constraint on the peak value must also be met. Specifically, the value of the correlation function must be at a user specified value at the origin but is free to vary elsewhere. This is equivalent to minimizing the energy of the correlation function while satisfying intensity constraints at the origin.
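In vector notation, the correlation peak amplitude constraint is

$$\mathbf{X}_i^{+}\mathbf{H} = u_i, \qquad i = 1, 2, \ldots, N, \tag{6a}$$

where the inner product on the left of Eq. (6a) is, to within the DFT scale factor, gi(0), the value of the output correlation at the peak (origin). This filter must also minimize the correlation plane energy, which is written for all images as

$$E_i = \frac{1}{d}\,\mathbf{H}^{+}\mathbf{D}_i\mathbf{H}, \qquad i = 1, 2, \ldots, N. \tag{6b}$$

With the matrix X = [X1, X2, …, XN] and the constraint vector u = [u1, u2, …, uN]^T, the N constraints in Eq. (6a) are written compactly as

$$\mathbf{X}^{+}\mathbf{H} = \mathbf{u}. \tag{7}$$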
The solution to this problem does not exist because the simultaneous constrained minimization of all Ei (i = 1,2, … ,N) is not possible. We, therefore, attempt to minimize the average value of Ei (average correlation energy) in Eq. (6b) while meeting the linear constraints in Eq. (7). Hence we refer to the proposed filter as a minimum average correlation energy (MACE) filter.
We make several observations at this point regarding the differences between the MACE filter and the peak-to-sidelobe ratio correlation filter. These PSR filters optimize the correlation plane PSR (but only in a small region about the peak). The PSR filter requires shifted images (and hence more training set images) for synthesis and does not allow control of the peak value as in Eq. (7). The MACE filter is attractive and unique since it does not require shifted images (during filter synthesis) and since it allows control of correlation peak intensities. Moreover the MACE filter is synthesized in the frequency domain, whereas the PSR and all other correlation filters are synthesized in the space domain. The algorithm for PSR filter synthesis maximizes the ratio of average true and false class correlation peak intensities (with shifted images being false class images). Our MACE filter seeks to minimize the average correlation plane energy for the training images. The algorithm for PSR filter synthesis maximizes a quadratic ratio by solving a generalized eigenvector problem, while the synthesis algorithm for MACE filters uses the method of Lagrange multipliers (as will be shown in Sec. IV) to minimize a quadratic term subject to linear constraints. Thus MACE filters differ significantly in concept and mathematical detail from PSR filters.
A. MACE Filter Solution
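The average correlation plane energy is

$$E_{av} = \frac{1}{N}\sum_{i=1}^{N} E_i = \frac{1}{Nd}\sum_{i=1}^{N} \mathbf{H}^{+}\mathbf{D}_i\mathbf{H}. \tag{8}$$

We define the diagonal matrix

$$\mathbf{D} = \sum_{i=1}^{N} \alpha_i \mathbf{D}_i, \tag{9}$$

where the αi are constants. When all αi = 1, we can rewrite Eq. (8) as

$$E_{av} = \frac{1}{Nd}\,\mathbf{H}^{+}\mathbf{D}\mathbf{H}. \tag{10}$$

Since the minimum is not affected by a scale factor, we must minimize $\mathbf{H}^{+}\mathbf{D}\mathbf{H}$ subject to the linear constraints $\mathbf{X}^{+}\mathbf{H} = \mathbf{u}$. In Sec. VI, we discuss selection of nonequal αi to improve performance.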
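The solution to this problem may be found using the method of Lagrange multipliers. This solution method is possible since we solve for the filter in the frequency domain. The vector H is given by

$$\mathbf{H} = \mathbf{D}^{-1}\mathbf{X}\,(\mathbf{X}^{+}\mathbf{D}^{-1}\mathbf{X})^{-1}\mathbf{u}, \tag{11}$$

as derived in the Appendix.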
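The synthesis step can be sketched numerically as follows. This is a minimal illustration, assuming the closed form H = D⁻¹X(X⁺D⁻¹X)⁻¹u (the general solution of Sec. IV.B with A = D⁻¹), a 2-D DFT of each image followed by lexicographic ordering, and random stand-in images; the function name `mace_filter` and its arguments are hypothetical.

```python
import numpy as np

def mace_filter(images, u, alpha=None):
    """Sketch of MACE synthesis: H = D^{-1} X (X^+ D^{-1} X)^{-1} u.

    images : (N, rows, cols) array of training images
    u      : (N,) desired correlation peak values
    alpha  : optional (N,) weights alpha_i used to form D (default: all ones)
    Returns the frequency-domain filter H (length d) and the d x N matrix X.
    """
    images = np.asarray(images, dtype=float)
    N = images.shape[0]
    # Columns of X are the lexicographically ordered 2-D DFTs of the images.
    X = np.fft.fft2(images, axes=(1, 2)).reshape(N, -1).T
    alpha = np.ones(N) if alpha is None else np.asarray(alpha, dtype=float)
    D = (np.abs(X) ** 2) @ alpha             # diagonal of D, stored as a vector
    DinvX = X / D[:, None]                   # D^{-1} X without forming D
    rhs = np.asarray(u, dtype=complex)
    H = DinvX @ np.linalg.solve(X.conj().T @ DinvX, rhs)
    return H, X

rng = np.random.default_rng(0)
train = rng.random((4, 8, 8))                # four hypothetical 8 x 8 images
u = np.array([1.0, 1.0, 0.296, 0.296])       # two true-class and two false-class peaks
H, X = mace_filter(train, u)
peaks = X.conj().T @ H                       # should reproduce u
print(np.round(peaks.real, 6))
```

Only an N × N system is solved, because D is diagonal and is never formed as a full d × d matrix.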
B. General Solution
Assume that the d × d matrix A is nonsingular. A general solution vector H given by

$$\mathbf{H} = \mathbf{A}\mathbf{X}\,(\mathbf{X}^{+}\mathbf{A}\mathbf{X})^{-1}\mathbf{u} \tag{12}$$

minimizes the quadratic term $\mathbf{H}^{+}\mathbf{A}^{-1}\mathbf{H}$ subject to the linear constraint $\mathbf{X}^{+}\mathbf{H} = \mathbf{u}$. This result is well known and used extensively in many areas of research. In this paper, we limit our attention to the relationship between the SDFs and the general family of solution vectors H. In fact, the general solution form in Eq. (12) unifies several existing types of SDF and provides a common ground for comparison, as we now discuss.
Kallman has suggested a minimax formulation of the problem to maximize the correlation peak PSR. In this approach, the filter is assumed to be a linear combination of the training images. Moreover, the constraint vector u is allowed to be complex to obtain a complex image plane correlation filter. Computing the solution to the minimax problem (as suggested by Kallman) requires enormous amounts of computer time, and an exhaustive search must be carried out to optimize the proposed criterion. Our method does not restrict the solution to be a linear combination of the training images and requires far less CPU time. Although Kallman's filter is ideal, the performance of the MACE filter (as shown in Sec. VII) excels that of most other existing correlation filters. As stated before, the MACE filter can be synthesized in a relatively short time with fewer training images and is hence preferable.
If A is the identity matrix (A = I), the filter vector reduces to the conventional SDF-1 or projection SDF and is given by $\mathbf{H} = \mathbf{X}(\mathbf{X}^{+}\mathbf{X})^{-1}\mathbf{u}$. Recall that all terms refer to quantities in the frequency domain. Therefore, this expression represents SDF-1 (or projection SDF) filters in the frequency domain. Equivalent formulations in the space domain are also possible. Thus, when A is the identity matrix, we have the more familiar space domain expression $\mathbf{h} = \mathbf{x}(\mathbf{x}^{T}\mathbf{x})^{-1}\mathbf{u}$. Note that in this form x contains the training images, not their Fourier transforms. The vector u must be scaled by the constant d (the dimensionality of the training vectors) to obtain the same projection outputs as in the case of the frequency domain method.
A second example is the minimum variance synthetic discriminant function (MVSDF) proposed by Kumar. Assume that the images to be identified are corrupted by additive zero mean noise with a covariance matrix C. It was shown that when $\mathbf{A} = \mathbf{C}^{-1}$, the resulting filter maximizes the output SNR at the correlation peak by minimizing the variance of the filter output peak value.
We have provided a third choice for A in this paper. We have shown (in the frequency domain) that when A is diagonal and its nonzero elements constitute a sequence which is the reciprocal of the average power spectrum of the training data, the resulting filter H minimizes the average cross-correlation energy of the training data and the filter. An equivalent formulation in the space domain is possible where the matrix A would be Toeplitz. The problem was formulated in the frequency domain for the sake of analytical simplicity.
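The unifying role of the matrix A can be illustrated with a short numerical sketch. It assumes the general form H = AX(X⁺AX)⁻¹u, which reduces to the projection SDF quoted above when A = I, and compares the choices A = I and A = D⁻¹ on random stand-in data. The helper names `general_filter` and `avg_energy` are hypothetical, and the MVSDF choice A = C⁻¹ is omitted because it requires a noise covariance that is not given here.

```python
import numpy as np

def general_filter(X, u, A):
    """H = A X (X^+ A X)^{-1} u, which minimizes H^+ A^{-1} H subject to X^+ H = u."""
    AX = A @ X
    return AX @ np.linalg.solve(X.conj().T @ AX, u)

rng = np.random.default_rng(1)
d, N = 16, 3
X = np.fft.fft(rng.standard_normal((N, d)), axis=1).T   # d x N matrix of DFT columns
u = np.array([1.0, 1.0, 0.5])

D = np.diag(np.sum(np.abs(X) ** 2, axis=1))             # D = sum_i D_i (all alpha_i = 1)
H_sdf = general_filter(X, u, np.eye(d))                 # A = I: projection SDF
H_mace = general_filter(X, u, np.linalg.inv(D))         # A = D^{-1}: MACE

def avg_energy(H):
    # Average correlation plane energy, (1 / (N d)) H^+ D H
    return float(np.real(H.conj() @ D @ H)) / (N * d)

print(avg_energy(H_sdf), avg_energy(H_mace))
```

Both filters meet the same peak constraints, and the MACE average energy should never exceed that of the projection SDF, since it is the constrained minimizer of H⁺DH.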
V. Properties of the MACE Filter
In this section, we discuss three noteworthy properties of the proposed filter. The structure of the MACE filter as a cascade of a whitening filter and a projection SDF is examined in Sec. V.A. Sections V.B and V.C discuss special aspects of the filter's performance, proving that correlation energies obtained with the MACE filter cannot be further reduced and that in the extreme case a delta function is obtained in the output correlation plane.
A. Structure of the Optimal MACE filter
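In this section, we show that the MACE filter can be interpreted as the cascade of two stages. The first stage has a transfer function related directly to the average power spectrum of the training data, and the second stage is a simple projection SDF based on the training images filtered by the first stage. Recall that

$$\mathbf{H} = \mathbf{D}^{-1}\mathbf{X}\,(\mathbf{X}^{+}\mathbf{D}^{-1}\mathbf{X})^{-1}\mathbf{u}.$$

Let $\mathbf{D}^{-0.5} = \mathbf{P}$, i.e., P is a diagonal matrix with its diagonal elements being the reciprocal square roots of the diagonal elements of D. Then

$$\mathbf{H} = \mathbf{P}\,\tilde{\mathbf{X}}(\tilde{\mathbf{X}}^{+}\tilde{\mathbf{X}})^{-1}\mathbf{u} = \mathbf{P}\tilde{\mathbf{H}}, \qquad \tilde{\mathbf{X}} = \mathbf{P}\mathbf{X}, \tag{15}$$

where $\tilde{\mathbf{H}} = \tilde{\mathbf{X}}(\tilde{\mathbf{X}}^{+}\tilde{\mathbf{X}})^{-1}\mathbf{u}$. Equation (15) can be described by the block diagram in Fig. 1. The input FT data Xi is first filtered by P (which may be viewed as a spectrum whitening filter), and then filtered by $\tilde{\mathbf{H}}$ (the projection SDF based on the filtered data) to obtain the final output ui. The above discussion is important for the following two reasons.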
- The MACE filter is the same as the conventional SDF operating on preprocessed (filtered) data, where the preprocessor forces the average (over all training images) power spectrum of the training images to become white.
- The MACE filter is also optimal for target recognition in the presence of noise for which P is the whitening filter. This is a direct consequence of the earlier results which show that the optimal filter for a particular type of input noise is a cascade of the whitening filter and the conventional SDF based on the transformed data.
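The cascade interpretation can be checked numerically. The sketch below, on random stand-in data, builds the filter directly and as a whitening stage P = D^(-1/2) followed by a projection SDF on the whitened data, and also confirms that the summed power spectrum of the whitened training data is flat. Variable names are illustrative, and D is formed with all αi = 1.

```python
import numpy as np

rng = np.random.default_rng(2)
d, N = 32, 4
X = np.fft.fft(rng.standard_normal((N, d)), axis=1).T   # d x N matrix of DFT columns
u = np.ones(N)
D = np.sum(np.abs(X) ** 2, axis=1)                      # diagonal of D as a vector

# Direct form: H = D^{-1} X (X^+ D^{-1} X)^{-1} u
DinvX = X / D[:, None]
H_direct = DinvX @ np.linalg.solve(X.conj().T @ DinvX, u)

# Cascade form: whiten with P, then apply a projection SDF built on the whitened data
P = 1.0 / np.sqrt(D)
Xw = P[:, None] * X                                     # whitened training spectra
H_sdf_w = Xw @ np.linalg.solve(Xw.conj().T @ Xw, u)     # projection SDF on whitened data
H_cascade = P * H_sdf_w

summed_power = np.sum(np.abs(Xw) ** 2, axis=1)          # equals 1 at every frequency
print(np.max(np.abs(H_direct - H_cascade)))
```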
B. Preprocessing Invariance
In this section we prove that no linear preprocessing of the training data can alter or improve on the performance of the MACE filter. In other words, we show that, although the filter structure changes if the training data are linearly preprocessed, the correlation energies Ei do not change. This means that high pass, low pass, or any bandpass filtering of the data is of no consequence. This statement is significant because intuitively one may feel that high pass filtered data should yield lower correlation energies. If such filtering is useful, it is included automatically in the filter.
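Assume that we prefilter the training data by a linear shift-invariant filter whose DFT is given by F(k), taken to be nonzero at every k. Let $\tilde{X}_i(k)$ be the DFT of the filtered data. Then, in the frequency domain, we have

$$\tilde{X}_i(k) = F(k)\,X_i(k).$$

Define the diagonal matrix S so that $S^{-0.5}(k,k) = F(k)$. In matrix vector notation, we then have

$$\tilde{\mathbf{X}} = \mathbf{S}^{-0.5}\mathbf{X}, \qquad \tilde{\mathbf{D}}_i = (\mathbf{S}^{-0.5})^{+}\mathbf{S}^{-0.5}\mathbf{D}_i, \qquad \tilde{\mathbf{D}} = (\mathbf{S}^{-0.5})^{+}\mathbf{S}^{-0.5}\mathbf{D},$$

since each diagonal element of Di is multiplied by |F(k)|². The MACE filter synthesized from the filtered data is

$$\tilde{\mathbf{H}} = \tilde{\mathbf{D}}^{-1}\tilde{\mathbf{X}}\,(\tilde{\mathbf{X}}^{+}\tilde{\mathbf{D}}^{-1}\tilde{\mathbf{X}})^{-1}\mathbf{u}.$$

Because diagonal matrices commute, $\tilde{\mathbf{X}}^{+}\tilde{\mathbf{D}}^{-1}\tilde{\mathbf{X}} = \mathbf{X}^{+}\mathbf{D}^{-1}\mathbf{X}$ and $\tilde{\mathbf{D}}^{-1}\tilde{\mathbf{X}} = [(\mathbf{S}^{-0.5})^{+}]^{-1}\mathbf{D}^{-1}\mathbf{X}$, so that $\tilde{\mathbf{H}} = [(\mathbf{S}^{-0.5})^{+}]^{-1}\mathbf{H}$. Substituting into the expression for the correlation energy, we get

$$\tilde{E}_i = \frac{1}{d}\,\tilde{\mathbf{H}}^{+}\tilde{\mathbf{D}}_i\tilde{\mathbf{H}} = \frac{1}{d}\,\mathbf{H}^{+}\mathbf{D}_i\mathbf{H} = E_i.$$

Thus we conclude that the MACE filter's performance cannot be changed by preprocessing the input images with any linear filter. In this sense, the MACE filter achieves the lowest possible values of Ei while meeting linear projection constraints.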
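The invariance claim lends itself to a direct numerical check. The sketch below synthesizes a MACE filter from random stand-in spectra and from the same spectra multiplied by an arbitrary nonzero complex prefilter F(k), then compares the individual correlation energies. It assumes the closed-form MACE solution with all αi = 1; the helper names `mace` and `energies` are hypothetical.

```python
import numpy as np

def mace(X, u):
    """MACE filter H = D^{-1} X (X^+ D^{-1} X)^{-1} u with all alpha_i = 1."""
    D = np.sum(np.abs(X) ** 2, axis=1)
    DinvX = X / D[:, None]
    return DinvX @ np.linalg.solve(X.conj().T @ DinvX, u.astype(complex))

def energies(H, X):
    """Correlation plane energies E_i = (1/d) sum_k |H(k)|^2 |X_i(k)|^2."""
    return (np.abs(H) ** 2) @ (np.abs(X) ** 2) / X.shape[0]

rng = np.random.default_rng(3)
d, N = 32, 3
X = np.fft.fft(rng.standard_normal((N, d)), axis=1).T
u = np.array([1.0, 0.5, 0.25])

F = rng.standard_normal(d) + 1j * rng.standard_normal(d)   # arbitrary nonzero prefilter
X_f = F[:, None] * X                                       # prefiltered training spectra

E_plain = energies(mace(X, u), X)
E_filtered = energies(mace(X_f, u), X_f)
print(E_plain, E_filtered)
```

The two energy vectors should agree to rounding error even though the two filters themselves differ.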
C. Single Training Image
Consider the case when N = 1. Let X represent the DFT of the single training image. The diagonal elements of D are then given by

$$D(k,k) = |X(k)|^2.$$

Consequently,

$$\mathbf{X}^{+}\mathbf{D}^{-1}\mathbf{X} = \sum_{k=1}^{d} \frac{|X(k)|^2}{|X(k)|^2} = d,$$

so that Eq. (11) reduces to $\mathbf{H} = (u/d)\,\mathbf{D}^{-1}\mathbf{X}$. Since H(k) corresponds to the kth element of H, we have the following frequency domain expression for the single training image MACE filter:

$$H(k) = \frac{u}{d}\,\frac{X(k)}{|X(k)|^2}. \tag{29}$$

Equation (29) is identical to the phase-correlation filter proposed by Pearson et al. Thus their filter is a special case of the MACE filter and is obtained when N = 1. Note that the single training image MACE filter is not a phase-only filter since its magnitude is given by |H(k)| = (|u|/d) · [1/|X(k)|] and in general is not constant. When X*(k) is input to the system (as in the case of the MSF) with the filter in Eq. (29), the data leaving the frequency plane are

$$X^{*}(k)\,H(k) = \frac{u}{d},$$

which is constant for all k. Since this represents the product of the Fourier transform of the input and the filter, the output correlation is its inverse transform, a delta function at the origin.
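A short numerical sketch of the single-image case follows. It assumes NumPy's unnormalized DFT, under which the single-image filter takes the form H(k) = (u/d)·X(k)/|X(k)|², and uses a random stand-in sequence; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 64
x = rng.standard_normal(d)                # stand-in training sequence
u = 1.0                                   # specified peak constraint

X = np.fft.fft(x)
H = (u / d) * X / np.abs(X) ** 2          # single-image MACE filter

peak_constraint = np.vdot(X, H)           # X^+ H (vdot conjugates its first argument)
g = np.fft.ifft(np.conj(X) * H)           # output correlation plane

print(abs(peak_constraint), np.max(np.abs(g[1:])))
```

The constraint X⁺H = u is met, every off-origin correlation sample vanishes to rounding error, and |H(k)| varies with k, so the filter is not phase-only.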
VI. Iterative Energy Scatter Reduction
The MACE filter described in Sec. IV minimizes the average correlation energy Eav in Eq. (10) for the training set. This ensures that (on the average) all training data correlation planes yield the sharpest possible peaks while meeting the imposed constraints. We now advance a further iterative improvement to the filter. We note that some individual Ei values may lie much below Eav while others may exceed Eav by a large amount. We thus consider a filter that minimizes the largest of the individual Ei to reduce the scatter in the Ei values. This increases the value of Eav by a small amount but reduces the scatter in the Ei values, and this is preferable. The scatter for all the Ei is

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (E_i - E_{av})^2.$$
This final MACE filter is derived from the optimal MACE filter in Eq. (11) as a minimax optimization problem using a simple gradient descent procedure. To reduce the correlation energies of those images whose Ei (from the optimal MACE filter) are large, we alter their αi coefficients in Eq. (9). The coefficients αi determine the contribution of Di towards D and hence the contribution of Ei towards Eav. When the αi are not all exactly equal to 1, the weighted sum of the Ei is not the exact average, so we distinguish this weighted sum from Eav. The filter is then recomputed from Eq. (11) with the altered αi. This process is continued until the scatter reaches a minimum. The algorithm is summarized in Table I. In our tests, we used P = 3 in Table I. Smaller values of P result in slower descent, and large P values may cause oscillations. A formal proof for the convergence of this procedure does not yet exist. However, this algorithm was found to converge in all cases examined.
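The iteration can be sketched as follows. Because Table I is not reproduced here, the update rule in this sketch is an assumption and not the authors' algorithm: each αi is multiplied by (Ei/Emax)^P, so that images with low energy are down-weighted, and the loop stops as soon as the scatter stops decreasing. The scatter is taken as the variance of the Ei with 1/N normalization, and all function names are hypothetical.

```python
import numpy as np

def mace_from_dft(X, u, alpha):
    """MACE filter for D = sum_i alpha_i D_i."""
    D = (np.abs(X) ** 2) @ alpha
    DinvX = X / D[:, None]
    return DinvX @ np.linalg.solve(X.conj().T @ DinvX, u.astype(complex))

def plane_energies(H, X):
    """E_i = (1/d) sum_k |H(k)|^2 |X_i(k)|^2 for every training image."""
    return (np.abs(H) ** 2) @ (np.abs(X) ** 2) / X.shape[0]

def reduce_scatter(X, u, P=3.0, max_iter=20):
    """Assumed scatter-reduction loop; returns the best filter found."""
    alpha = np.ones(X.shape[1])
    best_H = mace_from_dft(X, u, alpha)
    E = plane_energies(best_H, X)
    best_scatter = np.var(E)
    history = [best_scatter]
    for _ in range(max_iter):
        alpha = alpha * (E / E.max()) ** P    # assumed rule, not taken from Table I
        alpha = alpha / alpha.max()           # keep the largest weight at 1
        H = mace_from_dft(X, u, alpha)
        E = plane_energies(H, X)
        scatter = np.var(E)
        history.append(scatter)
        if scatter >= best_scatter:           # stop once the scatter no longer drops
            break
        best_H, best_scatter = H, scatter
    return best_H, best_scatter, history

rng = np.random.default_rng(5)
X = np.fft.fft(rng.standard_normal((6, 64)), axis=1).T
u = np.array([1.0, 1.0, 1.0, 0.296, 0.296, 0.296])
H_best, scatter_best, history = reduce_scatter(X, u)
print(history[0], scatter_best)
```

Whatever weights are used, the recomputed filter still satisfies the peak constraints exactly; only the distribution of the individual energies changes.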
VII. Initial Simulation Results
The new suboptimal MACE filter was synthesized to discriminate between a tank and an armored personnel carrier (APC). Thirty-six images of each object were available from a 20° depression angle at 10° increments in aspect. Six images of each object were chosen at aspect intervals of 60° for a two-class training set of twelve images. Each image contained 32 × 32 pixels with the pixel values coded to 256 gray levels. Edge enhancement was not performed. We now report the test results obtained by correlating the training images with the MACE filter. Correlation output amplitudes of 1.0 and 0.296 were arbitrarily specified for true (class 1, tank) and false (class 2, APC) class targets. Since detection is in intensity mode, we expect to measure output correlation intensities of 1.0 and 0.296² = 0.0876. The total CPU time for filter synthesis (including iterative scatter reduction) based on these twelve training images was 50.4 s on a VAX 11/750.
The results of the iterative procedure for minimizing correlation energy scatter are shown in Fig. 2. The ● points in Fig. 2 are the initial individual correlation energy levels Ei of the training images prior to iterative reduction of scatter σ². The first six training images are tanks, and training images 7 to 12 are APCs. The average correlation energy value is found to be 3.875. This is the minimum produced by the initial algorithm for the given training set. The scatter σ² is 2.13. The × points in Fig. 2 show the correlation energies for each image after two iterative cycles of scatter reduction with P = 3. The new correlation energy average is 4.09 (an increase of 0.22), while the scatter is reduced to 1.03 (a reduction of 1.10). A relatively significant decrease in the scatter of the correlation energies is obtained at the cost of an acceptably small increase in the average energy value, as shown in Fig. 2. Note that E4 (the correlation energy for the fourth training image) increased while all other true class Ei, i = 1, 2, … ,6 decreased. This can be attributed to the fact that the initial value of E4 was sufficiently smaller than Emax to cause α4 to be small. In general, the results of the scatter reduction algorithm are data dependent, and no significant comment can be made on the behavior of individual Ei values.
Typical 3-D plots of class 1 and class 2 correlation planes are shown in Figs. 3(A) and (B), respectively. The sharpness of the correlation peak is excellent, and the sidelobes away from the peaks are very low in both cases. Tables II and III list the statistics for true and false class training data, respectively. These data include the test data identifiers, the intensity at the center of the correlation peak, the largest value anywhere in the correlation plane, the location of the largest correlation plane peak, plus two measures (N and PSR) of the sharpness of the correlation peak described below. The parameter N listed is the number of standard deviations by which the peak lies above the mean of the correlation plane. PSR is the peak-to-sidelobe ratio measured in an 11 × 11 pixel region around the peak. The full 64 × 64 pixel correlation plane was stored in 64 × 64 memory arrays. Pixel (33,33) is the center at which the value is user specified (1.0 or 0.0876). We note that all twelve correlation planes satisfied the imposed constraints at the center. For the true class 1 images, the peak at the center is also the largest peak and has very high N and PSR measures indicating a sharp peak. The largest peak for the false class objects to be rejected is not always at the center but is three pixels off-center. Its value is much lower than the true class peak, and thus detection of the class 1 objects is improved. As seen, the largest peaks anywhere for class 2 objects have low values between 0.08 and 0.21. In general, the N and PSR measures are much larger for true class 1 than for false class 2 peaks, as expected.
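The two sharpness measures can be sketched in code. The exact definitions behind Tables II and III are not spelled out above, so this sketch assumes that N is (peak minus plane mean) divided by the plane standard deviation, and that PSR is the peak divided by the largest other value inside an 11 × 11 window centred on the peak. The function name `peak_metrics` and the synthetic test plane are hypothetical.

```python
import numpy as np

def peak_metrics(plane, window=11):
    """Return (location, peak, N, PSR) for a correlation intensity plane."""
    plane = np.asarray(plane, dtype=float)
    loc = np.unravel_index(np.argmax(plane), plane.shape)
    peak = plane[loc]
    n_sigma = (peak - plane.mean()) / plane.std()      # assumed definition of N
    half = window // 2
    r0, r1 = max(loc[0] - half, 0), min(loc[0] + half + 1, plane.shape[0])
    c0, c1 = max(loc[1] - half, 0), min(loc[1] + half + 1, plane.shape[1])
    region = plane[r0:r1, c0:c1].copy()
    region[loc[0] - r0, loc[1] - c0] = -np.inf         # exclude the peak itself
    psr = peak / region.max()                          # assumed definition of PSR
    return loc, peak, n_sigma, psr

# Synthetic 64 x 64 plane: a unit peak at the center plus one weak sidelobe.
plane = np.zeros((64, 64))
plane[32, 32] = 1.0        # pixel (33,33) in 1-based indexing
plane[34, 33] = 0.1
loc, peak, n_sigma, psr = peak_metrics(plane)
print(loc, peak, n_sigma, psr)
```

The sketch assumes at least one nonzero sidelobe inside the window; a plane that is exactly zero there would need a guard against division by zero.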
VIII. Experimental Results
We now present initial optical laboratory results obtained with MACE filters. Twelve high resolution (256 × 256) images were used for filter synthesis, six images of the tank and six of the APC. Output correlation peak amplitudes of 1.0 (for the tank) and 0.707 (for the APC) were specified. The measured output intensities are expected to be 1.0 and 0.5 for class 1 and class 2 images, respectively. The MACE filter was synthesized in the frequency plane, and the image plane MACE filter was obtained by an inverse DFT. It is real since the training data are real. The 2-D discrete filter image h(i,j) was regenerated from h(n) by reordering the samples appropriately. The resulting gray level image plane filter was then recorded on a laser printer using halftoning techniques with sixty-four gray levels employed. The image plane MACE filter from the laser printer was photoreduced to 0.5 × 0.5 cm². Its frequency domain matched filter was formed optically at λ = 633 nm with an fL = 371 mm lens and an optical reference beam at 20°.
A test scene of two tanks and two APCs was generated (Fig. 4). It contains two training set images (the 0° tank and APC at the left) and two nontraining set images (the 10° rotated tank and APC shown at the right). This test scene was recorded with sixty-four gray levels on a laser printer, photoreduced to 1 × 1 cm², and placed in the input plane of an optical frequency plane correlator, with the MACE filter in the frequency plane. Figure 5 shows cross sections of the correlation output for the two object classes. The cross section of the correlation output for the two tank inputs is shown in Fig. 5(A). Two large and sharp peaks occur. Figure 5(B) shows the outputs for the class 1 (tank) as the left peak and the class 2 object (APC) as the right peak. The nearly equal peaks in Fig. 5(A) demonstrate the distortion tolerance of the filter. The tank correlation peak at the left in Fig. 5(B) is of the same height. The APC correlation peak to the right in Fig. 5(B) is seen to be half of the height of the tank peak, in agreement with theory. This verifies the ability to control correlation peak values and reject one class of object. The sidelobes are seen to be low, and hence the PSR of the MACE filter is demonstrated.
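Appendix

Let A be a nonsingular matrix. We wish to find a vector H so that the quadratic term $\mathbf{H}^{+}\mathbf{A}\mathbf{H}$ is minimized subject to the linear constraint

$$\mathbf{X}^{+}\mathbf{H} = \mathbf{u}. \tag{A1}$$

Introducing a vector λ of Lagrange multipliers and setting the gradient of the Lagrangian with respect to H to zero gives AH = Xλ (with constant factors absorbed into λ). Since the matrix A is invertible, we can rewrite H as

$$\mathbf{H} = \mathbf{A}^{-1}\mathbf{X}\boldsymbol{\lambda}. \tag{A5}$$

Substituting Eq. (A5) into Eq. (A1), the constraint equation becomes

$$\mathbf{X}^{+}\mathbf{A}^{-1}\mathbf{X}\boldsymbol{\lambda} = \mathbf{u},$$

so that

$$\boldsymbol{\lambda} = (\mathbf{X}^{+}\mathbf{A}^{-1}\mathbf{X})^{-1}\mathbf{u}. \tag{A7}$$

Substituting Eq. (A7) into Eq. (A5), we obtain the final expression for H as

$$\mathbf{H} = \mathbf{A}^{-1}\mathbf{X}\,(\mathbf{X}^{+}\mathbf{A}^{-1}\mathbf{X})^{-1}\mathbf{u}. \tag{A8}$$

The vector H in Eq. (A8) simultaneously satisfies $\mathbf{X}^{+}\mathbf{H} = \mathbf{u}$ and minimizes $\mathbf{H}^{+}\mathbf{A}\mathbf{H}$.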
The authors acknowledge the support of this research by the independent research and development funds of General Dynamics–Pomona. We thank J. Z. Song for helpful laboratory assistance.
1. A. B. VanderLugt, “Signal Detection by Complex Matched Spatial Filtering,” IEEE Trans. Inf. Theory IT-10, 139 (1964).
8. B. V. K. Vijaya Kumar, “Minimum Variance Synthetic Discriminant Functions,” J. Opt. Soc. Am. A 3, 1579 (1986).
9. B. V. K. Vijaya Kumar and A. Mahalanobis, “Alternate Interpretation for Minimum Variance Synthetic Discriminant Functions,” Appl. Opt. 25, 2484 (1986).
10. J. J. Pearson, D. C. Hines Jr., S. Golosman, and C. D. Kuglin, “Video-Rate Image Correlation Processor,” Proc. Soc. Photo-Opt. Instrum. Eng. 119, 197 (1977).