A profile preferentially partial occlusion removal method for integral imaging is presented. The profile of an occlusion always contains details with significant texture structure, and regions with significant texture structure often lead to reliable depth estimation. Taking advantage of this, the profile of the occlusion is processed first, and the entire occlusion region is then determined via region spreading from the accurate profile. The details of the occlusion can be accurately removed while the occluded scene is retained to the maximum degree. In our method, elemental images are integrated into a four-dimensional light field to provide consistently reliable depth estimation and occlusion decisions among all elemental images. Experimental results show that the proposed method handles the details of the occlusion effectively and is robust to occlusions with different kinds of texture structure.
© 2016 Optical Society of America
1. Introduction
Integral imaging is one of the promising three-dimensional (3D) sensing and imaging methods [1–3]. In integral imaging, a lenslet array or a camera array captures the scene from multiple perspectives, and the redundant information makes recognition of a partially occluded scene possible, because a region occluded in one perspective can be visible in another. The traditional and most notable methods for partially occluded scene recognition were presented in [4,5]. With volumetric computational reconstruction [6,7], the scene was reconstructed at the depth of the partially occluded scene. Only the scene at that depth was in focus, and other regions were blurred. As a result, the partially occluded scene became visible. As a side effect, however, the blurred foreground occlusion acts as noise and obscures the focused occluded scene. To address this problem, light rays emitted from the occlusion were eliminated when the scene was computationally reconstructed [8–14]. On the one hand, exploiting the periodicity of the lenslet array, the occlusion was filtered in the frequency domain [8–10]. In those methods, scene content at different depths was filtered at different frequencies. However, since images are complex signals and spectral aliasing between depths always exists, the occlusion cannot be completely filtered out. On the other hand, some methods decide the occlusion directly in the spatial domain, which is free from spectral aliasing and makes it possible to remove the occlusion completely. In spatial-domain methods, the quality of occlusion removal depends on the quality of the occlusion decision, and previous works were devoted to making that decision accurate. The basic idea of the occlusion decision is depth comparison, since the occlusion is always in front of the scene of interest.
Common stereo matching methods based on block-wise matching with global regularization [11,12] and block matching methods using a windowing technique [13,14] were used for depth estimation, but they struggle with the details of the occlusion and excessively expand its range, because a block is larger than a pixel and the global regularization causes over-smoothing. The luminance variance based methods [15,16] took advantage of the fact that an actual object point in the scene has a consistent luminance when elemental images are projected. The depth value that minimized the luminance variance of the reconstructed light rays was taken as the actual depth of the point. However, the luminance variance based methods are sensitive to the texture structure of the scene, and they fail when the occlusion has a simple texture structure.
To realize an accurate and robust occlusion decision, the texture structure is taken into consideration when the depth information is estimated. As only significant texture structure contains reliable depth information and the occlusion always has a significant profile, reliable depth information on the profile of the occlusion can be obtained first. Together with global depth completion and region spreading according to the profile in the four-dimensional (4D) light field, depth information can be spread from one elemental image to another or within an elemental image, so missing or wrong depth information can be repaired from adjacent pixels or adjacent elemental images, and the occlusion can be decided globally. Thus a consistent occlusion decision with pixel-wise precision is realized among all elemental images, and light rays emitted from the occlusion are well eliminated when the scene is reconstructed. Moreover, as the occlusion profile is obtained first, the proposed method is robust to partial occlusions with different texture structures.
In Section 2, the basic principle of our method is presented. In Section 3, the details of the occlusion decision method are described. In Section 4, experiments show that the proposed method is effective and robust for occlusions with different texture features.
2. The basic principle of the proposed method
2.1 Relationship between integral imaging and the light field
The integral imaging sensing system records the light rays of a 3D scene from different perspectives. Each light ray can be represented by a 4D function of x, y, m and n whose value is the intensity of the light ray in the normalized RGB space, where r, g, b are the normalized red, green and blue intensity values. x and y are the column and the row number of the elemental image, respectively, and the remaining two coordinates locate the pixel of the light ray on the imaging plane at the mth column and the nth row. All the recorded light rays form a 4D light field [17,18]. As shown in Fig. 1, for the sake of simplicity, a 2D integral imaging sensing system with constant x and m is considered, in which the pinhole array records the light rays emitted from objects at different depths. In the resulting 2D light field, the projections of the same 3D point in different elemental images form a line, and the slope K of the line is related to the depth of the object by Eq. (1).
2.2 Profile preferential depth estimation
Starting from the luminance variance depth estimation method, and using the fact that the projections of the same 3D point in each elemental image have a consistent luminance, a consistency score Cs is defined for an X by Y lenslet array as a function of the disparity d. The range of d can also be roughly estimated from the scene depth z according to Eq. (1). W is a window function with a bandwidth B, which filters out the light rays whose intensities are too far from that of the reference light ray and makes the depth estimation robust in partially occluded regions; a wider bandwidth B allows the same ray to have a larger intensity variance across elemental images. When Cs reaches its maximum value within the possible range of the disparity d, the luminance variance reaches its minimum, and the actual depth is obtained.
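The disparity search described above can be sketched in Python for a 2D light field. The score formula itself is not reproduced in this text, so the sketch rests on assumptions: the window W is taken to be rectangular (a ray is accepted when its largest per-channel difference from the reference ray is at most B), the score is the fraction of accepted elemental images, and the candidate line is straight with d pixels of shift per elemental image. The layout `epi[n][y]` and the names `consistency_scores` and `best_disparity` are hypothetical.

```python
def consistency_scores(epi, n0, y0, disparities, bandwidth=0.1):
    """Score each candidate disparity for the reference ray epi[n0][y0].

    epi[n][y] is an (r, g, b) tuple in normalized RGB: one image row per
    elemental image n (assumed layout).
    """
    num_views = len(epi)
    width = len(epi[0])
    reference = epi[n0][y0]
    scores = []
    for d in disparities:
        accepted = 0
        visited = 0
        for n in range(num_views):
            # Candidate line through (n0, y0): d pixels of shift per view.
            y = round(y0 + d * (n - n0))
            if not 0 <= y < width:
                continue
            visited += 1
            diff = max(abs(a - b) for a, b in zip(epi[n][y], reference))
            if diff <= bandwidth:  # assumed rectangular window W
                accepted += 1
        # The reference view itself is always visited, so visited >= 1.
        scores.append(accepted / visited)
    return scores


def best_disparity(epi, n0, y0, disparities, bandwidth=0.1):
    """Return the disparity with the highest score, plus all scores."""
    scores = consistency_scores(epi, n0, y0, disparities, bandwidth)
    best = max(range(len(scores)), key=scores.__getitem__)
    return disparities[best], scores
```

In a homogeneous region every candidate line collects nearly identical intensities, so under these assumptions all disparities score close to 1 and the maximum carries no depth information.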
Examples of depth estimation are shown in Fig. 2. Pixel A and pixel B lie in regions with significant texture structure, and pixel C lies in a homogeneous region. The luminance consistency score distributions of pixel A and pixel B each have a significant peak, so the optimal depth can be decided confidently at the peak. The peak score of pixel B is lower than that of pixel A because pixel B is partially occluded. The depth of pixel C is difficult to decide, as its score distribution has no significant peak even though the scores are high. Therefore, depth estimation is reliable only in regions with significant texture structure. Because the foreground is always clearly distinguished from its background and provides significant texture structure at its profile, a reliable depth estimate can be obtained there. The profile preferential depth estimation, together with the linear pattern in the light field, gives a robust and accurate occlusion decision.
3. Profile preferentially partial occlusion removal
According to the basic principle of the proposed method, the accurate profile preferential depth estimation is performed first, and the resulting local depth pixels are then statistically regressed to the linear pattern in the light field with global depth completion. Finally, the occlusion is decided via region spreading according to the linear pattern.
3.1 Profile preferentially local depth estimation
Taking advantage of the fact that the occlusion is always significantly distinguished from its background, reliable depth information at the profile of the occlusion is estimated. As shown in Fig. 3(b), the region with significant texture structure, which covers the profile of the occlusion, is extracted according to the color gradient value computed with the gradient operators Gh and Gv in Eqs. (2)-(4). Because the depth estimation there is reliable, the depth maps for horizontal texture and vertical texture in Fig. 3(c) and 3(d) are both noiseless.
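The extraction of textured regions by color gradient can be illustrated with a short Python sketch. The operators Gh and Gv of Eqs. (2)-(4) are not reproduced in this text, so central differences on each color channel are assumed in their place, and the threshold value is hypothetical. Which of the two masks corresponds to the "horizontal texture" of Fig. 3(c) is likewise an assumption.

```python
def texture_masks(image, threshold=0.2):
    """Mark pixels with a strong color gradient.

    image[row][col] is an (r, g, b) tuple in normalized RGB.
    Returns (gh_mask, gv_mask): gh_mask is True where the largest
    per-channel central difference along the row direction exceeds
    `threshold`; gv_mask does the same along the column direction.
    Border pixels reuse their own value for the missing neighbour.
    """
    rows, cols = len(image), len(image[0])
    gh_mask = [[False] * cols for _ in range(rows)]
    gv_mask = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            left = image[r][max(c - 1, 0)]
            right = image[r][min(c + 1, cols - 1)]
            up = image[max(r - 1, 0)][c]
            down = image[min(r + 1, rows - 1)][c]
            # Assumed stand-ins for the gradient operators Gh and Gv.
            gh = max(abs(a - b) for a, b in zip(right, left)) / 2
            gv = max(abs(a - b) for a, b in zip(down, up)) / 2
            gh_mask[r][c] = gh > threshold
            gv_mask[r][c] = gv > threshold
    return gh_mask, gv_mask
```

Only the pixels selected by these masks would then be passed to the depth estimation step, which keeps homogeneous regions out of the local depth map.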
3.2 Global depth completion
As shown in Fig. 4(a), in the depth map of the 2D light field, depth pixels with the same depth value should form a line. According to this rule, the depth estimation is further repaired by linearly regressing the discrete depth pixels into depth lines using a statistical method. Given a depth pixel in the 2D light field with a disparity d, the depth line that crosses the pixel is defined as,
If many other depth pixels with the same depth value lie on the line, the line is considered a reliable depth line, and those discrete depth pixels can convincingly be represented by it. By counting the supporting depth pixels for every possible depth line, the reliable depth lines are statistically selected as those with NUMsupport > Tsupport, where NUMsupport is the number of supporting depth pixels and Tsupport is the threshold on that number; Tsupport is suggested to be 50% of the height of the 2D light field. Depth pixels with few supporting depth pixels are removed, and the discrete pixels of the occlusion are replaced with depth lines. As shown in Fig. 4(b), the discrete pixels of the profile are linked into lines, the missing parts of the profile are repaired, and the discrete noise is eliminated.
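The support counting can be sketched as a simple voting scheme in Python. The depth line equation is not reproduced in this text, so the sketch assumes integer disparities and a line of the form y = y_start + d * n in the 2D light field; the function name, the dictionary layout, and the rule that the larger disparity wins where two lines cross are all hypothetical choices.

```python
def complete_depth_lines(depth_pixels, num_views, width, t_support):
    """Replace sparse depth pixels with reliable depth lines.

    depth_pixels: dict {(n, y): d} of local disparity estimates, where n is
    the elemental image index and y the pixel position (assumed layout).
    Returns a dict {(n, y): d} in which every pixel of each line with more
    than `t_support` supporting depth pixels is filled in; depth pixels on
    weakly supported lines are dropped as noise.
    """
    # Vote: each depth pixel supports the line through it with slope d.
    support = {}
    for (n, y), d in depth_pixels.items():
        line = (d, y - d * n)  # (disparity, position at view 0)
        support[line] = support.get(line, 0) + 1

    completed = {}
    for (d, y_start), count in support.items():
        if count <= t_support:
            continue
        for n in range(num_views):
            y = y_start + d * n
            if 0 <= y < width:
                # Assumption: where lines cross, keep the larger disparity.
                completed[(n, y)] = max(d, completed.get((n, y), d))
    return completed
```

With 11 elemental images and a threshold of 5, for example, a line observed in 8 images is completed in the remaining 3, while an isolated depth pixel is discarded.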
3.3 Occlusion decision via profile spreading in light field
Since the complete profile of the occlusion is obtained after global depth completion, the entire occlusion can be obtained by region spreading. Taking both the horizontal and the vertical texture structure into consideration, the region spreading direction is decided in the 4D light field. Similar to the watershed algorithm, the region spreads from the background pixels and is labeled as background scene until the occlusion profile is met, and the rest of the region is labeled as occlusion. Figures 5(a) and 5(b) show examples of region spreading results in the xm domain and in the yn domain, where the horizontal and the vertical texture structures are considered, respectively. More visual results are given in the elemental image. The bright green region is the occlusion region in the profile. As Fig. 5(a) takes only the horizontal texture into consideration and Fig. 5(b) only the vertical texture, many background regions are missed when the background is labeled. Therefore, considering the horizontal and the vertical texture structures together, only the regions labeled as occlusion in both Fig. 5(a) and Fig. 5(b) are labeled as occlusion, as shown in Fig. 5(c), where the white mask is the occlusion. After this inter-perspective spreading of depth information, inner-perspective spreading is performed in the xy domain, and the final occlusion decision in the light field is shown in Fig. 5(d). Comparing Fig. 5(c) with Fig. 5(d), the background is better decided in Fig. 5(d) after the inner-perspective spreading. As the occlusion decision is made in the light field instead of in a single elemental image, the depth information contained in one elemental image can be propagated to other elemental images through region spreading. Compared with previous works, the background scene is better retained after the occlusion is removed.
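A much simplified version of the region spreading step can be written in Python as a flood fill on a single 2D slice. This is only a sketch under stated assumptions: the spreading is 4-connected, the background seed pixels are supplied by the caller, and the combination of the two inter-perspective results is a plain logical AND. The names `spread_background` and `intersect_masks` are hypothetical, and the direction selection in the 4D light field is not modeled.

```python
from collections import deque


def spread_background(profile, seeds):
    """Label occlusion by spreading background up to the profile.

    profile[r][c] is True on occlusion-profile pixels. The background
    spreads from the seed pixels over 4-connected neighbours and stops
    at the profile. Returns a mask that is True wherever the spreading
    did not reach, which includes the profile pixels themselves.
    """
    rows, cols = len(profile), len(profile[0])
    background = [[False] * cols for _ in range(rows)]
    queue = deque()
    for r, c in seeds:
        if not profile[r][c] and not background[r][c]:
            background[r][c] = True
            queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and not profile[nr][nc] and not background[nr][nc]):
                background[nr][nc] = True
                queue.append((nr, nc))
    return [[not b for b in row] for row in background]


def intersect_masks(mask_a, mask_b):
    """Keep only pixels labeled as occlusion in both masks."""
    return [[a and b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]
```

A closed profile therefore yields a filled occlusion region, while a gap in the profile lets the background leak inside, which is why the profile is completed into depth lines before spreading.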
3.4 Computational integral imaging reconstruction with occlusion removal
According to the back projection and geometrical optics techniques, the 3D scene can be reconstructed as a superposition of the shifted elemental images,
The occlusion mask eliminates the light rays from occlusion.
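As an illustration of masked shift-and-sum reconstruction, the Python sketch below reconstructs one image row from one row of each elemental image. The reconstruction equation is not reproduced in this text, so the form used here is an assumption: each elemental image n is shifted by d * n pixels, masked rays are skipped, and the remaining rays are averaged. The function name and the grayscale layout are hypothetical.

```python
def reconstruct_row(views, occluded, d):
    """Reconstruct one row at the plane with disparity d.

    views[n][y]: intensity of pixel y in elemental image n (one row each).
    occluded[n][y]: True where the occlusion mask marks the ray.
    d: integer shift in pixels per elemental image (assumed model).
    Returns averaged intensities; a position reached by no unoccluded ray
    is set to 0.0, which appears as a black hole.
    """
    width = len(views[0])
    out = []
    for y in range(width):
        total, count = 0.0, 0
        for n, row in enumerate(views):
            ys = y + d * n  # shifted position in elemental image n
            if 0 <= ys < width and not occluded[n][ys]:
                total += row[ys]
                count += 1
        out.append(total / count if count else 0.0)
    return out
```

Passing an all-False mask reduces this to ordinary shift-and-sum reconstruction, in which rays from the occlusion are averaged into the result.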
4. Experiment and discussion
Experiments are carried out to demonstrate the effectiveness of the proposed method. Two groups of elemental images with different texture structures are used. Figure 6 shows the elemental images of the two groups. The occlusion has a complex texture in Fig. 6(a) and is smooth in Fig. 6(b). The occlusion is 1.817 m in front of the camera array, and the background is at 5 m. The baseline of the camera array is 5 mm. Operations in Eqs. (2)-(8) are computed in the normalized RGB space, and the bandwidth B in Eq. (5) is set to 0.1. The threshold on the number of supporting depth pixels is set to 5 for both the xm and yn domains in the global depth completion process.
4.1 Accurate consistent occlusion decision among elemental images
In the proposed method, the accurate profile of the occlusion is first obtained with the profile preferential depth estimation method. Moreover, depth lines in the 4D light field are introduced in place of traditional depth pixels to make the depth estimation reliable and consistent among all elemental images, because a depth line is a statistical result of the depth information in the elemental images and is easier to detect and more reliable than a single depth pixel. As shown in Fig. 7, the mask of the occlusion is painted white, and the occlusion details are preserved in the mask. Even the small background “holes” in the occlusion are correctly decided. The occlusion is well decided, and the information of the occluded scene is retained to the maximum. Moreover, the masks are consistent among all 11 by 11 elemental images.
4.2 Robust to occlusion with different texture structures
As shown in Fig. 8, the proposed method is compared with previous methods on the two groups of 11 by 11 elemental images. Figures 8(c) and 8(d) show the occlusion decision results of the block matching and global regularization based methods [11,12]. The depth map is estimated with a typical block matching and global regularization based method, which is effective for an occlusion with a smooth structure and no details, as in Fig. 8(d). However, it cannot deal with a 3D scene with complex details, as in Fig. 8(c). Compared with the proposed method, the occlusion is over-smoothed and the small “holes” of the occluded scene are closed in Fig. 8(c). Figures 8(e) and 8(f) show the occlusion decision results of the variance based method. With pixel-wise depth estimation, the occlusion details are well decided, as in Fig. 8(e). However, the variance based method is sensitive to the texture structure, and the depth estimation fails for a texture-less occlusion. As shown in Fig. 8(f), the profile of the square is well decided but the decision at the center of the square is wrong. Moreover, because the depth information is not considered globally, some occlusions are missed, as shown in Fig. 8(e). In the proposed method, as shown in Fig. 8(g) and 8(h), the occlusion profile, which always contains significant texture structure, is decided first, and the whole occlusion is then decided via region spreading. The occlusion profile is accurate, and the method is robust to the texture structure of the occlusion.
4.3 Scene reconstruction
Optical reconstruction results for the elemental images shown in Fig. 9(a) are compared with those of the proposed method in Fig. 9. The letters “BUPT” are easier to identify in Fig. 9(g) than in the optical reconstruction and the computational reconstruction in Fig. 9(c) and Fig. 9(e). There are black holes in Fig. 9(g) and Fig. 9(h), because the light rays from the occlusion are removed and no available light ray crosses the black regions.
Computational reconstructions of the two kinds of occlusion with the above three methods are shown in Fig. 10, with the PSNR marked on each reconstruction. All the images in Fig. 10 are reconstructed at d = 1.9. As shown in Fig. 10(c), the block matching and global regularization based methods fail for the occlusion with complex texture structure, because the occlusion decision is over-smoothed, as shown in Fig. 8(c), and the background is wrongly decided. Comparing Fig. 10(f) with Fig. 10(h), for the occlusion with smooth texture structure the PSNR of the variance based method is significantly lower than that of the proposed method, which shows that the variance based method is not applicable to the smooth texture case. The PSNRs of the proposed method for both kinds of occlusion are significantly higher than those of the other two methods, which illustrates the effectiveness of the proposed method.
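For reference, PSNR in its standard form can be computed as in the Python sketch below. The text does not state which reference image or peak value was used for the figures, so both are left as parameters here; intensities normalized to [0, 1] and the function name `psnr` are assumptions.

```python
import math


def psnr(reference, reconstructed, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Both images are lists of rows of intensities in [0, peak].
    Returns math.inf when the images are identical.
    """
    squared_error, count = 0.0, 0
    for ref_row, rec_row in zip(reference, reconstructed):
        for a, b in zip(ref_row, rec_row):
            squared_error += (a - b) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)
```

A uniform error of 0.1 on a unit intensity scale gives a mean squared error of 0.01 and hence a PSNR of about 20 dB.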
5. Conclusion
An auxiliary visual method for occlusion removal in integral imaging is presented. Taking the texture structure into consideration and starting from the region with reliable depth information, the accurate profile of the occlusion is first obtained with pixel-wise accuracy, and the region is then spread globally in the 4D light field to decide the entire occlusion. Moreover, the proposed method is robust to different texture structures of the occlusion. The reconstruction of the occluded scene is clear, without the noise of light rays emitted from the occlusion. The proposed method is based on the luminance of light rays, so the profile of the occlusion should be sufficiently separated from the occluded scene in intensity or color, and the scene should satisfy the Lambertian surface condition, as in previous works.
“863” Program (2015AA015902); National Natural Science Foundation of China (61575025); State Key Laboratory of Information Photonics and Optical Communications; Program of Beijing Science and Technology Plan (D121100004812001).
References and links
1. A. Stern and B. Javidi, “Three-dimensional image sensing, visualization, and processing using integral imaging,” Proc. IEEE 94(3), 591–607 (2006).
3. X. Xiao, B. Javidi, M. Martinez-Corral, and A. Stern, “Advances in three-dimensional integral imaging: sensing, display, and applications [Invited],” Appl. Opt. 52(4), 546–560 (2013).
5. S.-H. Hong and B. Javidi, “Three-dimensional visualization of partially occluded objects using integral imaging,” J. Disp. Technol. 1(2), 354–359 (2005).
8. G. Saavedra, R. Martínez-Cuenca, M. Martínez-Corral, H. Navarro, M. Daneshpanah, and B. Javidi, “Digital slicing of 3D scenes by Fourier filtering of integral images,” Opt. Express 16(22), 17154–17160 (2008).
10. J. Y. Jang, D. Shin, and E. S. Kim, “Optical three-dimensional refocusing from elemental images based on a sifting property of the periodic δ-function array in integral-imaging,” Opt. Express 22(2), 1533–1550 (2014).
11. D. H. Shin, B. G. Lee, and J. J. Lee, “Occlusion removal method of partially occluded 3D object using sub-image block matching in computational integral imaging,” Opt. Express 16(21), 16294–16304 (2008).
12. T. Ryu, B. Lee, and S. Lee, “Mutual constraint using partial occlusion artifact removal for computational integral imaging reconstruction,” Appl. Opt. 54(13), 4147–4153 (2015).
13. H. Yoo, “Depth extraction for 3D objects via windowing technique in computational integral imaging with a lenslet array,” Opt. Lasers Eng. 51(7), 912–915 (2013).
14. H. Yoo, D. Shin, and M. Cho, “Improved depth extraction method of 3D objects using computational integral imaging reconstruction based on multiple windowing techniques,” Opt. Lasers Eng. 66, 105–111 (2015).
15. B. G. Lee, H. H. Kang, and E. S. Kim, “Occlusion removal method of partially occluded object using variance in computational integral imaging,” 3D Research 1(2), 2–10 (2010).
16. X. Xiao, M. Daneshpanah, and B. Javidi, “Occlusion removal using depth mapping in three-dimensional integral imaging,” J. Disp. Technol. 8(8), 483–490 (2012).
17. S. J. Gortler, R. Grzeszczuk, R. Szeliski, and M. F. Cohen, “The lumigraph,” ACM Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, 43–54 (1996).
18. E. H. Adelson and J. R. Bergen, “The plenoptic function and the elements of early vision,” Vision and Modeling Group, Media Laboratory, Massachusetts Institute of Technology (1991).
19. F. Meyer, “Color image segmentation,” International Conference on IET in Image Processing and its Applications, 303–306 (1992).