Long-distance surveillance is a challenging task because atmospheric turbulence causes time-varying shifts and blurs in images. These distortions become more significant as the imaging distance increases. This paper presents a new method for compensating image shifting in a video sequence while keeping real moving objects in the video unharmed. In this approach, a highly accurate and fast optical flow technique is first applied to estimate the motion vector maps of the input frames, and a centroid algorithm is employed to generate a geometrically correct frame that contains no moving objects. The second step applies an algorithm for detecting real moving objects in the video sequence and then restores the sequence with those objects unaffected. The performance of the proposed method is verified by comparing it with that of a state-of-the-art approach. Simulation experiments using both synthetic and real-life surveillance videos demonstrate that this method significantly improves the accuracy of image restoration while preserving moving objects.
© 2015 Optical Society of America
1. Introduction
Atmospheric turbulence can be troublesome when a scene is viewed through any significant atmospheric path. Such a phenomenon can be observed on a hot day when objects near the ground shimmer or at night when stars or distant lights twinkle. Atmospheric turbulence arises from wind pressure, temperature and density that cause random fluctuations in the index of refraction of the atmosphere [1, 2]. Because this index is not uniform along a path and varies over time, it distorts the clean electromagnetic wave passing through it. As a result, images captured by ground-level telescopes or cameras are affected by random geometric distortions and non-uniform blurring which makes it more difficult for an observer to differentiate between real moving objects and turbulence-induced motions.
A variety of methods address the problem of restoring geometrically distorted videos containing real motions, and several others do not consider the presence of moving objects. As the latter may fail when moving objects are present, because information about these objects may be lost, such methods are not discussed in this study. Frakes et al. [3] proposed an adaptive control grid interpolation approach for restoring turbulence-degraded videos. Firstly, the dense motion vector fields of turbulent frames are computed using bilinear parameter estimations which, in turn, are used as part of a turbulence compensation algorithm. Then, a pixel whose motion vector component magnitude exceeds both the median value in that frame and the median value for the same pixel over the sequence by given percentages is considered to contain real motion. In [4], a new approach for motion detection in real-time surveillance based on temporal differencing, optical flow, double background filtering and morphological processing was proposed. However, this method is applicable only to short-distance surveillance videos, i.e., those with low distortions. In [5], another approach based on non-rigid image registration and the Gaussian mixture model (GMM) for detecting moving objects in turbulent conditions was proposed. In it, the image registration compensates for geometric distortions while moving objects are detected using the GMM-based background subtraction technique. However, this method uses the temporal average of the input frames as the reference frame for image registration, which provides inaccurate estimations of the motion vector fields.
Gepshtein et al. [6] and Yaroslavsky et al. [7] presented similar approaches in which the algorithm estimates the motion vector maps of warped frames using a differential elastic image registration and uses them to generate a stable frame. The moving object regions are located using these maps through a hard thresholding technique. As the algorithm then regenerates a corrected motion vector map for each warped frame, in which the motions due to moving objects are compensated, each dewarped frame contains both a stable background and moving objects. Fishbain et al. [8, 9] proposed a method that compensates turbulence-induced warping while keeping real motions unaffected. The method first generates a stable background image using a temporal median filter. Then, a mask is generated for each warped frame from its difference image with respect to the stable image. The algorithm also uses the amplitudes and angles of the motion vectors of the warped frames (with respect to the stable image) to reconstruct the masks, which are finally used to extract moving objects from the initial video sequence. The limitations of this algorithm are that it uses a temporal filter to generate a stable image and constant values for thresholding, which may not work properly in dynamically turbulent conditions.
Lou et al. [10] applied a homomorphic filter to detect moving objects in the presence of illumination variance. Their algorithm separates the illumination and reflectance components of an image by automatically selecting thresholds for every pixel. Although this method performs well under lighting changes and shadows, it may not work well in the presence of geometric distortions. In [11], an atmospheric distortion-mitigating algorithm based on region-based image fusion was proposed, with the fusion performed in the dual tree complex wavelet transform domain because of its near-shift-invariant and directional selectivity features. This method uses a frame selection algorithm to collect informative regions of interest (ROIs) from good-quality frames. These ROIs are then pre-processed before being passed to image registration for motion compensation, which is followed by image fusion and contrast enhancement. As the presence of moving objects is handled under only limited conditions, by the authors’ own admission, further improvement is required for full motion videos. Oreifej et al. [12] proposed a new approach for video stabilization and moving object detection in turbulent conditions. It uses a three-term low-rank matrix decomposition approach and decomposes the turbulent video into three components: background, turbulence-induced motion and moving objects. This method is based on the assumption that moving objects are sparse and their motions are linear, while random turbulence-induced motions are Gaussian-distributed around zero displacements. Although this method works well for turbulence mitigation, it is very computationally expensive.
In this work, an efficient method for removing geometrical distortions in long-range videos degraded by the atmospheric path is presented. At this point, other ill-effects, such as space-time blur, low contrast and CCD noise, are not considered. The regions of moving objects in the frames are handled separately to preserve them in the restored frames without distortions. Firstly, an image dewarping algorithm based on the estimated motion vector fields of the distorted frames is applied to generate a geometrically stable frame. Then, a two-stage moving object detection algorithm is used to generate a mask for each distorted frame to preserve any moving objects present. The performance of the proposed method is compared with that of the method of Fishbain et al. [8, 9], referred to here as the Fishbain method, by applying both to synthetic and real-life surveillance videos. Both quantitative and qualitative measures are presented, and better results are obtained by the proposed approach.
The rest of this paper is organized as follows. Section 2 briefly describes the optical flow estimation technique and the method for generating a geometrically stable frame from a distorted video sequence. Section 3 presents the proposed two-stage moving object detection algorithm. The results and a comparison with those from the Fishbain method are presented in Section 4. Finally, Section 5 concludes the paper and provides future research directions.
2. Turbulence compensation
2.1. Image registration
Image registration is the process of determining geometrical deformations that align two or more images of the same scene taken at different times. According to the gray-value constancy assumption, pixel intensities are conserved between images [14]. Although Eq. (2) is somewhat complicated, since a function in the continuous spatial domain needs to be optimized, it is possible to derive the optimization from a discrete version of Eq. (2). Under the incremental flow framework, it is assumed that an estimate s of the flow field is known and that one needs to estimate the best increment ds = (dsx, dsy). The energy function in Eq. (2) becomes
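Assuming Fz(p) = F(p+s) − F(p), Fx(p) = ∂F(p+s)/∂x and Fy(p) = ∂F(p+s)/∂y, F(p+s+ds) − F(p) can be linearized using the first-order Taylor series as

\[ F(p+s+ds) - F(p) \approx F_z(p) + F_x(p)\,ds_x(p) + F_y(p)\,ds_y(p). \qquad (4) \]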
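As a numerical illustration of this linearization, the following minimal sketch estimates a single global increment (dsx, dsy) by solving Fz + Fx dsx + Fy dsy ≈ 0 in the least-squares sense. It is not the robust, regularized estimator used in this paper: the function name `global_increment`, the use of one increment for the whole image, the choice s = 0 and the synthetic test pattern are all assumptions made for illustration.

```python
import numpy as np

def global_increment(f_ref, f_cur):
    """Estimate one global increment (dsx, dsy) from the linearized residual.

    f_ref : reference image F(p)
    f_cur : second image sampled at p + s (here s = 0 for simplicity)
    Solves F_z + F_x * dsx + F_y * dsy ~ 0 in the least-squares sense.
    """
    f_ref = np.asarray(f_ref, dtype=float)
    f_cur = np.asarray(f_cur, dtype=float)
    f_y, f_x = np.gradient(f_cur)   # derivatives along rows (y) and columns (x)
    f_z = f_cur - f_ref             # difference term F_z
    a = np.stack([f_x.ravel(), f_y.ravel()], axis=1)
    ds, *_ = np.linalg.lstsq(a, -f_z.ravel(), rcond=None)
    return ds                       # array([dsx, dsy])

# Smooth pattern shifted by a known sub-pixel amount (hypothetical test data).
y, x = np.mgrid[0:64, 0:64].astype(float)
true_shift = (0.3, -0.2)
f_ref = np.sin(0.2 * x) + np.cos(0.15 * y)
f_cur = np.sin(0.2 * (x - true_shift[0])) + np.cos(0.15 * (y - true_shift[1]))
dsx, dsy = global_increment(f_ref, f_cur)
```

For small displacements the recovered increment should be close to the true shift; larger displacements are the reason a coarse-to-fine scheme is normally required.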
Let SX, SY, dSX and dSY be the vectorized forms of sx, sy, dsx and dsy, respectively; let Fx = diag(Fx) and Fy = diag(Fy) be diagonal matrices whose diagonals are the frames Fx and Fy, respectively; let Dx and Dy be the matrices corresponding to the x- and y-derivative filters; and let δp be the column vector that has only one non-zero value, at location p. Then, the continuous function in Eq. (3) can be written in the discrete form as
Assuming and , Eq. (5) can be rewritten as
Now the goal is to find dSX and dSY that minimize the energy function in Eq. (6). An iteratively reweighted least squares (IRLS) algorithm is used for the optimization [14, 15], and the motion vector maps are estimated through a coarse-to-fine refinement scheme on a dense Gaussian pyramid. Figures 1(c) and 1(d) show the distributions of the motion vector maps in the x- and y-directions, respectively, calculated for the sample distorted frame shown in Fig. 1(a) with respect to the warped reference frame in Fig. 1(b). The lighter regions in Figs. 1(c) and 1(d) indicate larger displacements of the pixel intensities in those regions, while the darker ones dominate in areas with similar pixel intensities.
2.2. Generating stable frame
The first stage of the proposed image restoration method involves generating a geometrically stable frame which, since it is further used to detect moving objects in the sequence, should not contain any moving object. Therefore, it is assumed that a sufficiently large number of frames is available to generate a turbulence-free and moving object-free frame. For the algorithm, it is assumed that the wander of each point on a remote object over a certain period of time follows a Gaussian distribution and oscillates around its true position, i.e., the average wander will be zero. Therefore, the average wander of a pixel’s intensity with respect to any fixed position will represent its displacement from the true location. Consider a sequence of short-exposure frames that need to be restored to be as close as possible to the frames that would ideally be acquired in the absence of any turbulence. By calculating the motion vector maps with respect to a reference frame using image registration, it is possible to return each pixel’s intensity in the warped frame to its true location. For simplicity, the first warped frame is taken as the reference frame. The algorithm for image dewarping is described in the following steps.
- Apply image registration to compute the motion vector maps sx(p) and sy(p) of all the frames with respect to the reference frame.
- Calculate the centroids of the motion vector maps c = (cx,cy) by computing the pixel-wise median.
where F* represents the dewarped version of the input frame F.
Finally, the pixel-wise median of the dewarped frames is computed to obtain a geometrically stable frame. Although both the mean and median may have the same effect for a large number of frames, the median is preferable as it increases the likelihood of excluding any moving object’s effect on the stable frame whereas the mean will guarantee its retention, no matter how little that may be.
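The dewarping steps and the final median can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the motion vector maps `sx` and `sy` are assumed to be already available from image registration, the dewarping rule F*(p) = F(p + s(p) − c(p)) is an assumed sign convention, and the helper names `sample_bilinear` and `stable_frame` are hypothetical.

```python
import numpy as np

def sample_bilinear(img, ys, xs):
    """Sample img at real-valued (ys, xs), clamping coordinates at the border."""
    h, w = img.shape
    ys = np.clip(ys, 0, h - 1)
    xs = np.clip(xs, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy, wx = ys - y0, xs - x0
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bottom = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bottom * wy

def stable_frame(frames, sx, sy):
    """frames, sx, sy: arrays of shape (T, H, W). Returns (stable, dewarped)."""
    frames = np.asarray(frames, dtype=float)
    # Centroid of the motion vector maps, taken here as the pixel-wise median.
    cx = np.median(sx, axis=0)
    cy = np.median(sy, axis=0)
    t, h, w = frames.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    # Assumed dewarping rule: F*(p) = F(p + s(p) - c(p)).
    dewarped = np.stack([
        sample_bilinear(frames[k], yy + sy[k] - cy, xx + sx[k] - cx)
        for k in range(t)
    ])
    # Pixel-wise temporal median of the dewarped frames.
    return np.median(dewarped, axis=0), dewarped
```

Replacing `np.median` with `np.mean` in the last line gives the mean-based alternative discussed above, in which any moving object leaves a residual trace in the stable frame.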
3. Detection of moving object
The motion vector maps s used to dewarp the warped frames to their true locations only work successfully for restoring static objects and background scenes in the frames. At locations containing moving objects, the motion vector fields may not be accurate, as the reference frame does not contain these objects. Therefore, the dewarped frames obtained from Section 2 will contain moving objects in distorted forms. This problem can be solved by generating a mask for each input warped frame to extract moving objects from it. This mask is generated in two stages using the difference image between a warped frame and the geometrically stable frame, and the estimated motion vector maps. The first stage detects moving object regions but may include false regions due to strong turbulence, while the second refines the mask by eliminating falsely detected regions.
3.1. Mask generation using difference image
The first stage involves generating a mask for each warped frame using its difference image with respect to the geometrically stable frame. As the restored frames may contain visible unnatural edges surrounding moving objects if a binary mask is used, in order to avoid these edges, a three-level decision-making algorithm is considered. Let F and denote a warped and geometrically stable frame, respectively. Then, the difference image D is
In order to generate a mask from D, an adaptive thresholding technique is considered, with the threshold determined as
The accuracy of this mask depends greatly on the values of the offsets and gain. If GD and KDL are too low compared with the turbulence effect, the mask may contain undesired regions with no moving objects; on the other hand, higher values may lead to moving object regions being truncated. Similarly, a lower value of KDH causes stronger turbulence to be treated as real motion whereas a too-high value hampers the extraction of the true intensities of moving objects from the input frames.
Figures 2(a) and 2(b) show a sample warped frame containing a moving object (people) and the geometrically stable frame generated as described in Section 2, respectively. The difference image and the mask generated using Eq. (11) are shown in Figs. 2(c) and 2(d), respectively. Values of 2% and 8% of the maximum image intensity for the offsets KDL and KDH, respectively, and a value of 2 for the gain are chosen for the whole frame. It is noted in Fig. 2(d) that lower turbulence effects are truncated, which keeps the moving object region unaltered. Although a few unwanted effects due to stronger turbulence still exist in the mask, only approximately 5% of the pixels in the mask contain non-zero values, which means that almost 95% of the pixels in the frame, containing the background scene, are truncated.
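A three-level mask of this kind can be sketched as follows. The exact form of the adaptive threshold is not reproduced here, so the sketch assumes two thresholds built from the gain times the mean of the difference image plus the low and high offsets, and a linear ramp for the intermediate level; the function name `difference_mask` and these forms are assumptions, while the default values (gain 2, offsets of 2% and 8% of the maximum intensity) follow the example above.

```python
import numpy as np

def difference_mask(frame, stable, gain=2.0, k_low=0.02, k_high=0.08,
                    max_intensity=255.0):
    """Return (D, mask) with mask values in [0, 1].

    mask = 0 where D is below the low threshold (treated as turbulence),
    mask = 1 where D is above the high threshold (treated as real motion),
    and a linear ramp in between to avoid unnatural edges.
    """
    d = np.abs(np.asarray(frame, dtype=float) - np.asarray(stable, dtype=float))
    base = gain * d.mean()                      # assumed adaptive term
    t_low = base + k_low * max_intensity        # assumed low threshold
    t_high = base + k_high * max_intensity      # assumed high threshold
    mask = np.clip((d - t_low) / (t_high - t_low), 0.0, 1.0)
    return d, mask
```

With this form, raising the gain or the low offset suppresses more turbulence at the risk of truncating object regions, which mirrors the trade-off described above.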
3.2. Mask generation using motion vector map
A different algorithm is applied to detect moving object regions based on the estimated motion vector maps of the warped frames. The magnitudes of the corrected motion vector maps R are calculated as
In this case, the single threshold value in Eq. (13) (TM) is chosen for R. It is defined such that any value below it is considered a turbulence-induced effect and any above it a moving object effect.
A connected component labeling algorithm is applied to MM and the connected components that are too small are truncated. The rationale behind this is that regions with very few non-zero (one) elements are due to either inaccuracies in their pixel registrations or severe intensity distortions rather than moving objects. Finally, the smallest regions with non-zero (one) elements are determined and considered as the real moving object regions.
The values of the gain GM and offset KM are chosen according to the strength of the turbulence. High values of these constants may truncate motions due to real moving objects, whereas too-low ones lead to the selection of undesired regions. Figure 2(e) shows a magnitude map of the corrected motion vectors for the frame in Fig. 2(a), with lighter regions indicating larger motions. The motion vector-based mask MM overlaid with the difference image-based mask MD is shown in Fig. 2(f). In this example, a gain of 2 and an offset of 2 pixels are chosen empirically, with only the regions detected by both masks treated as real motion regions. In this case, almost 20% of the non-zero pixels in the mask in Fig. 2(d) are truncated, so that almost 96% of the pixels are truncated with respect to the difference image in Fig. 2(c).
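The second-stage mask can be sketched as follows. The corrected motion vectors are taken as s − c, and the threshold is assumed to be the gain times the mean magnitude plus the offset; this form, the 4-connectivity, the `min_size` parameter and the function names are assumptions for illustration rather than the definitions used in this paper.

```python
import numpy as np
from collections import deque

def remove_small_components(mask, min_size):
    """Zero out 4-connected components of a boolean mask smaller than min_size."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    seen = np.zeros_like(mask)
    out = np.zeros_like(mask)
    for y in range(h):
        for x in range(w):
            if not mask[y, x] or seen[y, x]:
                continue
            # Breadth-first flood fill collects one connected component.
            queue, component = deque([(y, x)]), []
            seen[y, x] = True
            while queue:
                qy, qx = queue.popleft()
                component.append((qy, qx))
                for ny, nx in ((qy - 1, qx), (qy + 1, qx), (qy, qx - 1), (qy, qx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(component) >= min_size:
                rows, cols = zip(*component)
                out[list(rows), list(cols)] = True
    return out

def motion_mask(sx, sy, cx, cy, gain=2.0, offset=2.0, min_size=20):
    """Return (R, MM): corrected motion magnitude and the cleaned binary mask."""
    r = np.hypot(sx - cx, sy - cy)      # magnitude of corrected motion vectors
    t_m = gain * r.mean() + offset      # assumed threshold form, in pixels
    return r, remove_small_components(r > t_m, min_size)
```

A library routine such as a standard connected component labeling function could replace the explicit flood fill; it is written out here only to keep the sketch self-contained.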
3.3. Combined mask generation and preservation of moving objects
After masks MD and MM are calculated, they are combined to generate the final mask MC as
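One reading consistent with Section 3.2, in which only the regions detected by both masks are treated as real motion regions, is an element-wise product of the two masks. The sketch below uses that reading and then blends the input frame with the stable frame; both the product and the blending rule are assumptions, and `combine_and_restore` is a hypothetical name.

```python
import numpy as np

def combine_and_restore(frame, stable, mask_d, mask_m):
    """Combine the two masks and preserve moving objects in the restored frame.

    mask_d : difference-image mask with values in [0, 1]
    mask_m : motion-vector mask (boolean or 0/1)
    """
    mask_c = np.asarray(mask_d, dtype=float) * np.asarray(mask_m, dtype=float)
    frame = np.asarray(frame, dtype=float)
    stable = np.asarray(stable, dtype=float)
    # Moving-object pixels come from the input frame, the rest from the stable frame.
    restored = mask_c * frame + (1.0 - mask_c) * stable
    return mask_c, restored
```

Intermediate mask values produce a soft transition between the object and the background, which is the purpose of the three-level mask.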
4. Simulation experiments
The complete method developed in this paper was implemented in MATLAB and tested on both synthetic and real degraded video sequences, and its results were compared with those from the state-of-the-art Fishbain method [8, 9], which handles the same imaging problems. As the image registration algorithm, the most computationally expensive part of both approaches, was written in C++ MEX code, the processing time was significantly reduced, requiring approximately 0.55 seconds to register a pair of images, each of 512×512 pixels. The synthetic video sequence considered in this study consists of 60 warped frames, each of 512×512 pixels, and was produced from a real-life sequence containing moving objects without any turbulence-induced distortions. In order to incorporate turbulence effects, each pixel in the video frames was given a temporally smooth random walk with a maximum displacement of 3 pixels around its unwarped location on the grid. This provided a reasonably good representation of a video sequence that could be expected from real measurements of geometric distortions.
Both qualitative (visual) and quantitative analyses were carried out to compare the methods. The average times required to process each frame of the synthetic sequence were approximately 11.9 and 1.5 seconds for the Fishbain and proposed methods, respectively. Figure 3(a) shows three sample warped frames from the synthetic sequence; the first one contains only geometric distortions and the last two contain both geometric distortions and a moving object. The generated masks and restored frames of the three samples using both methods are shown in Figs. 3(b) to 3(e). The masks generated by the Fishbain method contain more erroneous regions than those from the proposed method, which causes undesired extractions from the input warped frames. Also, the background scene is more accurately extracted by the proposed method, as this approach takes the temporal median of the dewarped frames as the stable background frame.
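Displacement fields of the kind described for the synthetic sequence can be generated along the following lines. This is only a sketch of one plausible construction: the step size, the moving-average window, the clipping used to enforce the 3-pixel limit and the function name `smooth_random_walk_shifts` are assumptions, and the generator actually used for the experiments may differ.

```python
import numpy as np

def smooth_random_walk_shifts(n_frames, shape, max_disp=3.0, step=0.5,
                              window=5, seed=0):
    """Per-pixel, temporally smooth random-walk displacements of shape (T, H, W)."""
    rng = np.random.default_rng(seed)
    steps = rng.normal(0.0, step, size=(n_frames,) + tuple(shape))
    walk = np.cumsum(steps, axis=0)     # random walk over time for every pixel
    walk -= walk.mean(axis=0)           # centre each walk on the unwarped location
    kernel = np.ones(window) / window   # moving average for temporal smoothness
    smooth = np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="same"), 0, walk)
    return np.clip(smooth, -max_disp, max_disp)

# One field per direction, e.g. for 60 frames of a small 64 x 64 test grid.
shift_x = smooth_random_walk_shifts(60, (64, 64), seed=1)
shift_y = smooth_random_walk_shifts(60, (64, 64), seed=2)
```

Each frame of the clean sequence would then be resampled at the displaced coordinates to obtain the warped frames.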
For quantitative comparison, the normalized mean-squared error (NMSE) of each restored frame is calculated with respect to its non-turbulent ground-truth frame, using the definition in Eq. (17). Table 1 summarizes the performance of the methods on the synthetic sequence. The NMSE values for the sample frames are lower for the proposed method than for the state-of-the-art method. With higher accuracy in both background scene estimation and mask generation, the proposed method provides better restoration of the geometrically distorted frames, while keeping the moving objects unaltered.
Figure 4 shows a plot of the NMSE values between successive frames of the synthetic sequence and of its restored versions obtained by the two methods. These NMSE values are also calculated using the definition in Eq. (17), where the R and G frames are two consecutive frames, at indices t and t+1. Since the moving object changes its position throughout the sequence, the NMSE value changes from frame to frame. The lower NMSE values for the first few frames are due to the absence of a moving object, i.e., they reflect only noise in the background. The proposed method consistently yields lower NMSE values throughout the sequence. The average NMSE values of the sequences obtained by the Fishbain and proposed methods are 0.32% and 0.22%, respectively, whereas that of the input sequence is 0.38%. The graph confirms that the sequence restored using the proposed method is more stable than that restored using the Fishbain method.
After good restoration performance was obtained from the proposed method on a synthetically warped sequence, it was applied to a real sequence, which contains 120 frames, each of 486×720 pixels, with three sample frames shown in Fig. 5(a). Processing a single frame takes about 15.6 and 2.1 seconds with the Fishbain and proposed methods, respectively. Figures 5(b) to 5(e) show the generated masks and restored frames using both methods. A careful visual comparison reveals that the proposed method provides more accurate masks and, hence, better-restored frames. Graphs of the NMSE values between successive frames of the warped input sequence and of the restored sequences obtained using the two methods are shown in Fig. 6. The NMSE curve for the proposed method is consistently lower than that of the Fishbain method. The average NMSE values for the Fishbain and proposed methods are 0.24% and 0.20%, respectively. The overall performance measures confirm the higher geometric restoration fidelity of the proposed method compared to the state-of-the-art method.
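Both uses of the NMSE, against a ground-truth frame and between consecutive frames, can be computed as in the sketch below. It assumes the common definition in which the summed squared difference between the R and G frames is normalized by the summed squared intensity of R; the normalization used in Eq. (17) may differ, and the function names are hypothetical.

```python
import numpy as np

def nmse(r, g):
    """Assumed NMSE: sum((R - G)^2) / sum(R^2)."""
    r = np.asarray(r, dtype=float)
    g = np.asarray(g, dtype=float)
    return np.sum((r - g) ** 2) / np.sum(r ** 2)

def successive_nmse(frames):
    """NMSE between consecutive frames t and t+1 of a (T, H, W) sequence."""
    frames = np.asarray(frames, dtype=float)
    return np.array([nmse(frames[t], frames[t + 1])
                     for t in range(len(frames) - 1)])
```

Averaging the output of `successive_nmse` over a sequence gives a single stability figure comparable in spirit to the averages quoted above.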
5. Discussion and conclusions
In this work, we considered the problem of recovering distorted images when imaging through a turbulent medium. The main challenge was to distinguish between moving objects and turbulence mis-compensation due to inaccuracies in the estimated translation vectors. We successfully implemented a simple and efficient image restoration method for geometric correction of turbulence-degraded video sequences while preserving real moving objects in the sequence. This method contains two main building blocks: (1) generation of a geometrically stable frame; and (2) detection of real moving objects and their accurate preservation in the restored frames. The pixel registration technique performs well for estimating the geometric deformations of the warped frames. Once the motion vector maps are estimated, a stable frame is generated by a dewarping algorithm. In order to preserve real moving objects in frames, the real motion regions are detected and proper masks are generated to extract them from the input frames.
The proposed method is intended to be used in surveillance where the scene under observation is stationary and objects such as people, tanks and cars happen to move through it. The main advantage of this method over others is that it has lower computational complexity and is simpler to implement. It was tested on both synthetic and real-world videos containing real moving objects. A comparison with the state-of-the-art Fishbain method showed that the proposed method significantly improves image restoration in terms of both computational time and accuracy. However, one immediate limitation is that it cannot remove turbulence-induced distortions from moving objects. Future work will aim to overcome this limitation, make the method more robust and take into account other atmospheric effects such as blurring. Also, the complete method could be implemented on graphics processing units or field programmable gate arrays in order to achieve higher computational efficiency.
References and links
1. R. K. Tyson, Introduction to Adaptive Optics (SPIE, 2000).
3. D. H. Frakes, J. W. Monaco, and M. J. T. Smith, “Suppression of atmospheric turbulence in video using an adaptive control grid interpolation approach,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (Institute of Electrical and Electronics Engineers, 2001), pp. 1881–1884.
4. N. Lu, J. Wang, Q. H. Wu, and L. Yang, “An improved motion detection method for real-time surveillance,” IAENG Int. J. Comput. Sci. 35, 119 (2008).
5. A. S. Deshmukh, S. S. Medasani, and G. R. Reddy, “Moving object detection from images distorted by atmospheric turbulence,” in Proceedings of the International Conference on Intelligent Systems and Signal Processing (Institute of Electrical and Electronics Engineers, 2013), pp. 122–127.
6. S. Gepshtein, A. Shtainman, B. Fishbain, and L. P. Yaroslavsky, “Restoration of atmospheric turbulent video containing real motion using rank filtering and elastic image registration,” in Proceedings of the European Signal Processing Conference (European Association for Signal Processing, 2004), pp. 477–480.
7. L. P. Yaroslavsky, B. Fishbain, A. Shteinman, and S. Gepshtein, “Processing and fusion of thermal and video sequences for terrestrial long range observation systems,” in Proceedings of the International Conference on Information Fusion (International Society of Information Fusion, 2004), pp. 848–855.
8. B. Fishbain, L. P. Yaroslavsky, and I. A. Ideses, “Real-time stabilization of long range observation system turbulent video,” J. Real-Time Image Proc. 2, 11–22 (2007).
9. B. Fishbain, L. P. Yaroslavsky, and I. A. Ideses, “Spatial, temporal, and interchannel image data fusion for long-distance terrestrial observation systems,” Adv. Opt. Technol. 2008, 546808 (2008).
10. J. Lou, H. Yang, W. Hu, and T. Tan, “An illumination invariant change detection algorithm,” in Proceedings of the Asian Conference on Computer Vision (Asian Federation of Computer Vision Societies, 2002), pp. 1–6.
11. N. Anantrasirichai, A. Achim, N. G. Kingsbury, and D. Bull, “Atmospheric turbulence mitigation using complex wavelet-based fusion,” IEEE Trans. Image Process. 22, 2398–2408 (2013).
12. O. Oreifej, X. Li, and M. Shah, “Simultaneous video stabilization and moving object detection in turbulence,” IEEE Trans. Pattern Anal. Mach. Intell. 35, 450–462 (2013).
13. T. Brox, A. Bruhn, N. Papenberg, and J. Weickert, “High accuracy optical flow estimation based on a theory for warping,” Lect. Notes Comput. Sci. 3024, 25–36 (2004).
14. C. Liu, “Beyond pixels: exploring new representations and applications for motion analysis,” Ph.D. dissertation (Massachusetts Institute of Technology, 2009).
16. M. Tahtali, A. J. Lambert, and D. Fraser, “Restoration of nonuniformly warped images using accurate frame by frame shiftmap accumulation,” Proc. SPIE 6316, 631603 (2006).
17. MathWorks, “Motion-based multiple object tracking,” http://www.mathworks.com.
18. N. Goyette, P. -M. Jodoin, F. Porikli, J. Konrad, and P. Ishwar, “changedetection.net: A new change detection benchmark dataset,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (Institute of Electrical and Electronics Engineers, 2012), pp. 1–8.