Automated Target Acquisition for the DEMO III Program

Sandor Der, Alex Chan, Gary Stolovy, Michael Lander, and Matthew Thielke
Sensors and Electron Devices Directorate, ARL

ARL-TR-2683, August 2002
Army Research Laboratory, Adelphi, MD 20783-1197

Approved for public release; distribution unlimited.

The findings in this report are not to be construed as an official Department of the Army position unless so designated by other authorized documents. Citation of manufacturer's or trade names does not constitute an official endorsement or approval of the use thereof.

Contents

1. Introduction
2. The Detection Algorithm
   2.1 The Data
   2.2 The Features
      2.2.1 Maximum Grey Level–Feature 0
      2.2.2 Contrastbox–Feature 1
      2.2.3 Average Gradient Strength–Feature 2
      2.2.4 Local Variation–Feature 3
      2.2.5 Straight Edge–Feature 4
      2.2.6 Rectangular Gradient Strength–Feature 5
      2.2.7 Vertical Gradient Strength–Feature 6
      2.2.8 How the Features Were Selected
   2.3 Combining the Features
   2.4 Experimental Results
3. The Clutter Rejection Algorithm
   3.1 PCA
   3.2 MLP
   3.3 Experimental Results
4. Target Recognition
   4.1 Introduction
   4.2 The Data
   4.3 Algorithm Architecture
      4.3.1 PCA Decomposition/Reconstruction Architecture
      4.3.2 Linear Weighting of Reconstruction Error
      4.3.3 Scale and Shift Search Space
   4.4 Experimental Results
5. Conclusions and Future Work
References
Report Documentation Page

List of Figures

Figure 1. ROC curve on Hunter-Liggett April 1992 imagery. The horizontal axis gives the average number of false alarms per frame; the vertical axis is the target detection probability.
Figure 2. ROC curve on Yuma July 1992 imagery.
Figure 3. ROC curve on Greyling August 1992 imagery.
Figure 4. Easy image from Hunter-Liggett April 1992 data set.
Figure 5. Results on previous image.
Figure 6. Moderate image from Hunter-Liggett April 1992 data set.
Figure 7. Results on previous image.
Figure 8. ROC curve on 12-bit 2001 Fort Indiantown Gap data.
Figure 9. Histogram of grey levels of 37 Fort Indiantown Gap images with no targets.
Figure 10. Histogram of grey levels of Fort Indiantown Gap images with no targets. The y axis has been magnified 100× to show the tail of the distribution.
Figure 11. Histogram of grey levels of 60 Fort Indiantown Gap images with targets.
Figure 12. Histogram of grey levels of Fort Indiantown Gap images with targets. The y axis has been magnified 100× to show the tail of the distribution.
Figure 13. 100 most dominant PCA eigenvectors extracted from the target chips.
Figure 14. Performance curves.
Figure 15. Eigenvectors of HMMWV front side.
Figure 16. Eigenvectors of HMMWV left side.
Figure 17. Eigenvectors of HMMWV back side.
Figure 18. Eigenvectors of HMMWV right side.
Figure 19. Eigenvectors of M113 front side.
Figure 20. Eigenvectors of M113 left side.
Figure 21. Eigenvectors of M113 back side.
Figure 22. Eigenvectors of M113 right side.
Figure 23. Eigenvectors of target board 1.
Figure 24. Eigenvectors of target board 2.
Figure 25. A simple image containing only clutter.
Figure 26. An image of the left side of an M113.
Figure 27. Side view of an M113.
Figure 28. Front view of an M113, on the road near the center of the image.
Figure 29. View of target board type II.

List of Tables

Table 1. Confusion matrix on test set.

1. Introduction

This work was performed for the DEMO III Unmanned Ground Vehicle (UGV) program, which is developing UGVs that will assist U.S. Army scouts. The Electro-Optics Infrared (EOIR) Image Processing Branch (AMSRL-SE-SE) has been tasked with developing algorithms for acquiring and recognizing targets imaged by the Wescam Forward-Looking Infrared (FLIR) sensor. These images are sent back to the user upon request or when the automatic target recognizer (ATR) indicates a location of interest. The user makes the ultimate decision about whether an object in an image is actually a target. The ATR reduces the bandwidth requirement of the communication link because the imagery can be sent back at reduced resolution, except for those regions that the ATR indicates as possible targets. The algorithms consist of a front-end detector, a clutter rejector, and a recognizer, chained as sketched below. The next three sections describe these components.
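To make the staged data flow concrete, the following is a minimal, self-contained Python sketch of such a three-stage chain. The stage functions, their names, and the detection threshold are illustrative assumptions, not the DEMO III implementation:

```python
import numpy as np

def detect(image):
    """Hypothetical front-end detector stub: flag pixels much hotter than
    the image mean as candidate target locations (row, col)."""
    thresh = image.mean() + 5.0 * image.std()
    rows, cols = np.nonzero(image > thresh)
    return list(zip(rows.tolist(), cols.tolist()))

def reject_clutter(image, loc):
    """Hypothetical clutter rejector stub: return True if the candidate at
    loc should be discarded as natural clutter. This stub keeps everything."""
    return False

def recognize(image, loc):
    """Hypothetical recognizer stub: return a class label for loc."""
    return "unknown"

def run_atr(image):
    """Chain the three stages in the order the report describes:
    detector -> clutter rejector -> recognizer."""
    candidates = detect(image)
    survivors = [c for c in candidates if not reject_clutter(image, c)]
    return [(c, recognize(image, c)) for c in survivors]

# Tiny demonstration on a synthetic frame with one hot pixel.
frame = np.zeros((480, 720))
frame[100, 200] = 1.0
print(run_atr(frame))  # [((100, 200), 'unknown')]
```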
2. The Detection Algorithm

The algorithm described in this report was designed to address the need for a detection algorithm with wide applicability, one that could serve as a prescreener/detector for a number of applications. Most automatic target detection/recognition (ATD/R) algorithms use a great deal of problem-specific knowledge to improve performance, but the result is an algorithm that is tailored to specific target types and poses. The approximate range to target is often required, with varying amounts of tolerance. For example, in some scenarios, it is assumed that the range is known to within a meter from a laser range finder or a digital map. In other scenarios, only the range to the center of the field of view and the depression angle are known, so that a flat-earth approximation provides the best estimate. Many algorithms, both model-based and learning-based, either require accurate range information or compensate for inaccurate information by attempting to detect targets at a number of different ranges within the tolerance of the range. Because many such algorithms are quite sensitive to scale, even a modest range tolerance requires that the algorithm iterate through a large number of closely spaced scales, driving up both the computational complexity and the false alarm rate. Such algorithms have often used statistical methods [1] or view-based neural networks [2, 3, 4].

The proximate motivation for the development of the scale-insensitive algorithm was to provide a fast prescreener for a robotic application for which no range information was available. The algorithm instead attempted to find targets at all ranges between some reasonable minimum, determined from operational requirements, and the maximum effective range of the sensor. Another motivation was to develop an algorithm that could be applied to a wide variety of image sets and sensor types, which required it to perform consistently on new data, without the severe degradation in performance that commonly occurs with learning algorithms, such as neural networks and principal component analysis (PCA)-based methods, that have been trained on a limited variety of sensor types, terrain types, and environmental conditions. While we recognize that with a suitable training set, learning algorithms will often perform better than other methods, they typically require a large and expensive training set, which is sometimes not feasible.

2.1 The Data

The data set used in training and testing this system was the April 1992 Comanche FLIR collection at Fort Hunter-Liggett, CA. This data set consists of 1225 images, each 720 by 480 pixels, with a field of view of approximately 1.75° square. Each of the images contains one or two targets in a hilly, wooded background. Ground truth was available that provided the target centroid, range to target, target type, target aspect, range to center of field of view, and depression angle. The target centroid and range to target were used to score the algorithm, as described in the experimental results section, but none of the target-specific information was used in the testing process. The algorithm assumes only that the vertical and horizontal fields of view and the number of pixels horizontally and vertically are known. The only range information used is the operational minimum range and the maximum effective range of the sensor.

2.2 The Features

Each of the features is calculated for every pixel in the image. As more complex features are added in the future, it might become beneficial to calculate some of the features only at those locations for which the other feature values are high. While each of the features assumes knowledge of the range to determine the approximate target size, the features are not highly range sensitive, so the algorithm calculates each of them at coarsely sampled ranges between the minimum and maximum allowed range; a sketch of this range sweep follows. Each feature was chosen based on intuition, with the criteria that it be monotonic and computationally simple. The features are described below in decreasing order of importance.
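To illustrate how a known field of view converts a hypothesized range into an expected target size in pixels, and how the detector can sweep coarsely sampled ranges, here is a minimal sketch under stated assumptions: the 720 × 480 image and 1.75° square field of view from section 2.1, and the nominal 7 m × 3 m target rectangle used in section 2.2.1 below. The geometric sampling step and the example range limits are illustrative choices, not values from the report.

```python
import numpy as np

# Sensor geometry from section 2.1 (assumed constant across the image).
IMAGE_WIDTH_PX = 720
FOV_DEG = 1.75                                   # square field of view
RAD_PER_PX = np.radians(FOV_DEG) / IMAGE_WIDTH_PX

# Nominal target-sized rectangle from section 2.2.1.
TARGET_WIDTH_M = 7.0
TARGET_HEIGHT_M = 3.0

def target_size_px(range_m):
    """Expected target size in pixels at a given range, via the small-angle
    approximation: size_px = size_m / (range_m * radians_per_pixel)."""
    width = TARGET_WIDTH_M / (range_m * RAD_PER_PX)
    height = TARGET_HEIGHT_M / (range_m * RAD_PER_PX)
    return max(1, round(width)), max(1, round(height))

def sampled_ranges(min_range_m, max_range_m, step=1.25):
    """Coarsely sampled ranges between the operational minimum and the
    sensor's maximum effective range. Geometric spacing keeps the window
    size changing by a fixed ratio (here 25% per step, an assumed value)."""
    ranges = []
    r = min_range_m
    while r <= max_range_m:
        ranges.append(r)
        r *= step
    return ranges

# Example: window sizes swept from 500 m to 4 km.
for r in sampled_ranges(500.0, 4000.0):
    print(f"range {r:7.1f} m -> target window {target_size_px(r)} px")
```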
2.2.1 Maximum Grey Level–Feature 0

The maximum grey level is the highest grey level within a roughly target-sized rectangle centered on the pixel. It was chosen because in many FLIR images of vehicles, there are a few pixels that are significantly hotter than the rest of the target or the background. These pixels are usually on the engine, the exhaust manifold, or the exhaust pipe. The feature is defined as

    F^0_{i,j} = \max_{(k,l) \in N_{in}(i,j)} f(k,l) ,    (1)

where f(k,l) is the grey level of the pixel in the kth row and lth column, and N_in(i,j) is the neighborhood of pixel (i,j), defined as a rectangle whose width is the length of the longest vehicle in the target set and whose height is the height of the tallest vehicle in the target set. For the applications that we have considered, the width is 7 m and the height is 3 m.

2.2.2 Contrastbox–Feature 1

The contrastbox feature measures the average grey level over a target-sized region and compares it to the grey level of the local background. It was chosen because many pixels that are not on the engine or on other particularly hot portions of the target are still somewhat warmer than the natural background. This feature has been used by a large number of authors. The feature is defined as

    F^1_{i,j} = \frac{1}{n_{in}} \sum_{(k,l) \in N_{in}(i,j)} f(k,l) - \frac{1}{n_{out}} \sum_{(k,l) \in N_{out}(i,j)} f(k,l) ,    (2)

where n_out is the number of pixels in N_out(i,j), n_in is the number of pixels in N_in(i,j), N_in(i,j) is the target-sized neighborhood defined above, and the neighborhood N_out(i,j) contains all of the pixels in a larger rectangle around (i,j), except those pixels that are in N_in(i,j).

2.2.3 Average Gradient Strength–Feature 2

The gradient strength feature was chosen because manmade objects tend to show sharper internal detail than natural objects, even when the average intensity is similar. To prevent large regions of background with higher than normal variation from producing a high value for this feature, the average gradient strength of the local background is subtracted from that of the target-sized region. The feature is calculated as

    F^2_{i,j} = \frac{1}{n_{in}} \sum_{(k,l) \in N_{in}(i,j)} G_{in}(k,l) - \frac{1}{n_{out}} \sum_{(k,l) \in N_{out}(i,j)} G_{out}(k,l) ,    (3)

where

    G_{in}(i,j) = G^h_{in}(i,j) + G^v_{in}(i,j) ,    (4)

    G^h_{in}(i,j) = | f(i,j) - f(i,j+1) | ,    (5)

    G^v_{in}(i,j) = | f(i,j) - f(i+1,j) | ,    (6)

and G_out(i,j) is defined similarly.

2.2.4 Local Variation–Feature 3

The local variation feature was chosen because manmade objects often show greater variation in temperature than natural objects. This feature determines the average absolute difference between each pixel and the mean of the internal region, and compares it to the same measurement for a local background region. The feature is calculated as

    F^3_{i,j} = \frac{L_{in}(i,j)}{n_{in}} - \frac{L_{out}(i,j)}{n_{out}} ,    (7)

where

    L_{in}(i,j) = \sum_{(k,l) \in N_{in}(i,j)} | f(k,l) - \mu_{in}(i,j) | ,    (8)

and

    \mu_{in}(i,j) = \frac{1}{n_{in}} \sum_{(k,l) \in N_{in}(i,j)} f(k,l) ,    (9)

and L_out(i,j) and \mu_{out}(i,j) are defined similarly.

2.2.5 Straight Edge–Feature 4

The straight edge feature was chosen because manmade objects often display straighter temperature gradients than natural objects, especially in the near-vertical and near-horizontal directions. This feature measures the strength of a straight edge that extends for several pixels, and then determines whether the edge values in a target-sized region differ from those of the local background. The feature is calculated as

    F^4_{i,j} = \frac{1}{n_{in}} \sum_{(k,l) \in N_{in}(i,j)} H_{in}(k,l) - \frac{1}{n_{out}} \sum_{(k,l) \in N_{out}(i,j)} H_{out}(k,l) ,    (10)

where

    H_{in}(i,j) = H^h_{in}(i,j) + H^v_{in}(i,j) ,    (11)

    H^h_{in}(i,j) = \sum_{k : |k-i| \le \ell} | f(k,j) - f(k,j+1) | ,    (12)

with the sum taken over the rows k within the edge length \ell of row i.
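Because each feature compares a statistic over a sliding target-sized rectangle N_in with the same statistic over a surrounding background region N_out, the per-pixel feature maps can be computed for the whole image at once with window filters rather than explicit loops. The following is a minimal NumPy/SciPy sketch of features 0 and 1 under that reading; it is not the fielded code, the window sizes would come from a range hypothesis (for example, the target_size_px sketch above), and the background margin is an assumed value.

```python
import numpy as np
from scipy.ndimage import maximum_filter, uniform_filter

def feature0_max_grey(image, in_size):
    """Feature 0 (eq 1): maximum grey level over the target-sized
    rectangle N_in centered on each pixel. in_size = (height, width)."""
    return maximum_filter(image, size=in_size, mode="nearest")

def feature1_contrastbox(image, in_size, margin=8):
    """Feature 1 (eq 2): mean over N_in minus mean over the surrounding
    region N_out (a larger rectangle with N_in removed). The margin in
    pixels is an illustrative choice."""
    img = image.astype(np.float64)
    h, w = in_size
    H, W = h + 2 * margin, w + 2 * margin

    # uniform_filter gives the mean over a centered rectangle.
    mean_in = uniform_filter(img, size=(h, w), mode="nearest")
    mean_big = uniform_filter(img, size=(H, W), mode="nearest")

    # Convert the big-rectangle mean into the N_out mean by removing the
    # inner rectangle's contribution: sums subtract, then renormalize.
    # (Exact in the image interior; approximate near the borders.)
    n_in, n_big = h * w, H * W
    sum_out = mean_big * n_big - mean_in * n_in
    mean_out = sum_out / (n_big - n_in)
    return mean_in - mean_out

# Example: score a synthetic frame with a bright 6 x 14 pixel "target".
img = np.random.default_rng(0).normal(100.0, 5.0, size=(480, 720))
img[200:206, 300:314] += 40.0
f1 = feature1_contrastbox(img, in_size=(6, 14))
print("peak contrastbox response at", np.unravel_index(f1.argmax(), f1.shape))
```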