
Poisson disk sampling with randomized satellite points for projected texture stereo

Open Access

Abstract

A stereo camera is fundamental for 3D sensing. Only simple stereo algorithms are fast enough for real-time applications; however, they have low accuracy. Adding texture by projection is a solution to this problem, and such a system is called projected texture stereo. The projected light pattern is essential and deeply related to the matching algorithm in the system. This article proposes a projection pattern and a new projected texture stereo camera system with a suitable stereo matching algorithm. Experimental results show that our system is superior to Intel RealSense, the commercially successful projected texture stereo camera.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

A stereo camera is fundamental for 3D sensing, and real-time stereo matching is essential for such sensing. Stereo matching [1] has various approaches, ranging from high computational cost with high accuracy to low computational cost with low accuracy.

Low-computational-cost methods for real-time processing are unstable in weakly textured regions. Optical light support is a solution to this problem [2]. Active projection onto textureless regions can enrich the texture of any region and improve matching accuracy. This kind of stereo matching is called projected texture stereo [3], and various patterns have been proposed [4-8].

Various commercial 3D sensors have been developed [9], and Intel RealSense [10] is a representative commercial product and a projected texture stereo camera (RealSense was discontinued in 2021). The sensor projects a random dot pattern and uses the census transform with Hamming distance measurement [11] for stereo matching. The algorithm runs on an application-specific integrated circuit (ASIC), which outputs Wide VGA ($848 \times 480$) at 90 fps or HDTV ($1280\times 720$) at 30 fps. The census-based method is highly resistant to lighting fluctuations since it does not directly compare luminance values. Also, the binary comparison reduces the effect of small noise, and the method is suitable for hardware processing. The RealSense projection mask [12] blocks much of the opening area. The mask therefore requires a strong light, and the light spreads easily; also, the number of characterizing points tends to decrease. Fanello et al. [13] use Kinect V1’s projector for projected texture stereo with learned census-like matching. The projector is not designed for texture stereo; thus, the method is combined with the high-cost matching of PatchMatch [14], which is a randomized algorithm and is not suitable for ASICs and field-programmable gate arrays (FPGAs). Various applications [15-18] use random speckle patterns [4-6], which consist of blobs with a fixed size of more than one pixel. Randomization is also used to support coded projections [19,20]. Yin et al. [20] use such a pattern to support fringe projection profilometry [21-24]. However, these conventional patterns require a certain dot size, resulting in low depth resolution, and they require advanced algorithms, making hardware design difficult.

In this paper, we develop a new projected texture stereo system, shown in Fig. 1. The system captures SXGA ($1280 \times 1024$) at 64 fps on an FPGA. Our contributions are as follows:

  • Our projection system uses a photomask in which added points block the light. Our projector can pass most of the light and add many small dots without spreading.
  • Our projection pattern uses blue noise sampling, namely Poisson disk sampling with randomized satellite points, to prevent bias in depth accuracy.
  • Our image processing unit has an efficient FPGA implementation based on a skipped census transform, which draws out the performance of our projection pattern.

Fig. 1. Proposed projected texture stereo camera system.

2. Proposed method

2.1 System setup

Our system consists of two cameras, a pattern projector, and an image processing unit (IPU). The cameras have 8-bit $1280\times 1024$ CMOS global-shutter monochrome sensors. The IPU uses a Xilinx Kintex-7 FPGA. Our system size is $120\times 70\times 40$ mm. The field of view (FoV) of our system is $26^{\circ}$, and its focal length is 12.7 mm. Our prototypes have two baseline lengths, 68 mm and 78 mm, but this paper mainly reports the 68 mm version. Table 1 shows the datasheet of the proposed system and the competing RealSense. Our system has a narrower field of view and baseline than RealSense, resulting in greater distance accuracy, as shown in the experimental results section. Note that RealSense can raise the frame rate up to 90 fps by downsampling the resolution, but the maximum resolution setting is used in this paper.

Table 1. Datasheet of the proposed system and Intel RealSense.

We calibrate both cameras by Zhang’s method [25] and then remove the lens distortion of both captured images using the estimated parameters. Then, we rectify the stereo images to have parallel epipolar lines. Our IPU performs the rectification. Figure 2 shows the rectification results. The rectified image is rotated slightly and undistorted non-linearly with Brown’s distortion model [26]. Highly distorted lenses and a rough camera setup cause large deformation [27], while we use low-distortion lenses and a careful setup to suppress the deformation. The slight distortion saves line buffer size in the circuit. Note that the displayed image renders the ideal grid, not a warping of the input image; thus, the boundary area does not disappear. A sketch of this calibration and rectification pipeline is shown below.
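
For reference, the following is a minimal OpenCV sketch of the offline calibration and rectification pipeline described above (Zhang's method with Brown's distortion model). The checkerboard-corner inputs and all variable names are placeholders, and the actual IPU applies an equivalent warp in hardware.

```python
import cv2

def build_rectification_maps(obj_pts, img_pts_l, img_pts_r, image_size):
    """obj_pts: list of Nx3 board coordinates, img_pts_*: lists of Nx2 detected
    corners (Zhang's method), image_size: (width, height). All placeholders."""
    # Per-camera intrinsics and Brown distortion coefficients (k1, k2, p1, p2, k3).
    _, K_l, d_l, _, _ = cv2.calibrateCamera(obj_pts, img_pts_l, image_size, None, None)
    _, K_r, d_r, _, _ = cv2.calibrateCamera(obj_pts, img_pts_r, image_size, None, None)

    # Relative pose between the two cameras, keeping the intrinsics fixed.
    _, K_l, d_l, K_r, d_r, R, T, _, _ = cv2.stereoCalibrate(
        obj_pts, img_pts_l, img_pts_r, K_l, d_l, K_r, d_r, image_size,
        flags=cv2.CALIB_FIX_INTRINSIC)

    # Rectifying rotations/projections so that epipolar lines become horizontal.
    R_l, R_r, P_l, P_r, Q, _, _ = cv2.stereoRectify(K_l, d_l, K_r, d_r, image_size, R, T)

    # Undistortion + rectification lookup tables (the IPU applies an equivalent warp).
    map_l = cv2.initUndistortRectifyMap(K_l, d_l, R_l, P_l, image_size, cv2.CV_32FC1)
    map_r = cv2.initUndistortRectifyMap(K_r, d_r, R_r, P_r, image_size, cv2.CV_32FC1)
    return map_l, map_r, Q

# Usage: rect_l = cv2.remap(img_l, *map_l, cv2.INTER_LINEAR)
```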

Fig. 2. Input image grid and its transformed image grid. The radial distortion parameters of Brown’s distortion model [26] are $k_1=-8.60\times 10^{-2}$, $k_2=5.90\times 10^{-1}$, $k_3=-1.78$, while our RealSense parameters are $k_1=0.12$, $k_2=-0.29$, $k_3=0.10$.

Our pattern projector consists of a lens, an LED, and a dot pattern printed on a photomask (Fig. 3). The projector uses the same lens and f-number as our stereo camera; thus, the pattern has the same resolution and projected region. Also, our photomask can narrow down the projected light to the pixel level. The sensor’s pixel size is 5.3 $\mu$m, and the dot size is slightly larger at 6.7 $\mu$m, because if the dot size perfectly matched the pixel size, slight installation errors would cause aliasing. For illumination, we use a blue LED with a peak wavelength of 455 nm rated at 4.2 W (operated at 2-3 W). Also, we use a band-pass filter as a lens filter: peak wavelength 455$\pm$5 nm, half-value width 60$\pm$8 nm, transmittance at least 85%.

Fig. 3. Projector for the random dot pattern and its photomask.

2.2 Random dot generator

Projected texture stereo does not require a priori knowledge of the projection pattern, unlike a structured light camera (Microsoft Kinect) or a coded light camera (Apple iPhone X). The pattern structure should be unique for each matching area and should cover the same capture area as the camera resolution.

Random sampling, which determines whether light passes or not, can generate such a mask pattern. However, pure random sampling produces density shading and bias. Thus, we use blue noise sampling [28]. Blue noise sampling finds samples that satisfy the following condition:

$$\|p_i-p_j\|_2>d \quad \forall p_j\in\mathcal{P},$$
where $p_i, p_j \in \mathcal{P}$ are the samples, and all samples have a pairwise distance greater than $d$. To find the samples, we use the Poisson disk sampling (PDS) algorithm; we use a fast implementation of PDS [29]. However, PDS generates lattice-like sampling, which has low uniqueness for matching areas. Therefore, we randomize each sampled point by adding randomly selected satellite points:
$$\phi(p_i, \mathrm{rnd}())=\{q_i, \ldots\}\subset{\mathcal{Q}} \quad \forall p_i\in\mathcal{P},$$
where $q_i\in \mathcal{Q}$ is an output sample, and $\phi$ is a randomizing function with random generator $\mathrm{rnd}()$, which randomly selects a pattern from a prepared set. We prepare 25 randomized satellite point patterns of size $3\times 3$ (see Fig. 4). Figure 5 shows the resulting patterns. RealSense illuminates samples locally with a near-infrared laser; thus, the projected area turns white as texture, while our proposed method creates black dots as texture by blocking light locally. A strong light is necessary to add more light to an already bright area, and the dot size then tends to be large to increase light intensity. When we instead create a pattern by locally darkening areas, we only need to block the light with a mask. Also, we attach a band-pass filter matched to the wavelength of the projected LED to our stereo camera to reduce the effects of external disturbances. A mask generation sketch is shown below.
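
The following is a minimal NumPy sketch of the mask generation under stated assumptions: Bridson's Poisson disk sampling [29] followed by stamping a randomly chosen $3\times 3$ satellite pattern at each sample. The satellite patterns here are random placeholders, not the actual 25 patterns of Fig. 4, and the physical aperture geometry of the photomask is not modeled.

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_disk(width, height, d, k=30):
    """Bridson-style Poisson disk sampling [29]: all samples are at least d apart."""
    cell = d / np.sqrt(2.0)
    gw, gh = int(np.ceil(width / cell)), int(np.ceil(height / cell))
    grid = -np.ones((gh, gw), dtype=int)          # index of the sample in each grid cell
    samples, active = [], []

    def far_enough(p):
        gx, gy = int(p[0] / cell), int(p[1] / cell)
        for y in range(max(gy - 2, 0), min(gy + 3, gh)):
            for x in range(max(gx - 2, 0), min(gx + 3, gw)):
                j = grid[y, x]
                if j >= 0 and np.linalg.norm(samples[j] - p) < d:
                    return False
        return True

    p0 = np.array([rng.uniform(0, width), rng.uniform(0, height)])
    samples.append(p0)
    active.append(0)
    grid[int(p0[1] / cell), int(p0[0] / cell)] = 0
    while active:
        i = active[rng.integers(len(active))]
        for _ in range(k):                         # try k candidates around sample i
            r = rng.uniform(d, 2 * d)
            a = rng.uniform(0, 2 * np.pi)
            q = samples[i] + r * np.array([np.cos(a), np.sin(a)])
            if 0 <= q[0] < width and 0 <= q[1] < height and far_enough(q):
                grid[int(q[1] / cell), int(q[0] / cell)] = len(samples)
                samples.append(q)
                active.append(len(samples) - 1)
                break
        else:                                      # no candidate fits: retire sample i
            active.remove(i)
    return np.array(samples)

# Placeholder 3x3 satellite patterns; the actual 25 patterns are those of Fig. 4.
SATELLITES = [rng.integers(0, 2, (3, 3), dtype=np.uint8) for _ in range(25)]

def build_mask(width, height, d):
    """Binary photomask: 1 = light blocked (black dot in the projection)."""
    mask = np.zeros((height, width), dtype=np.uint8)
    for x, y in np.floor(poisson_disk(width, height, d)).astype(int):
        mask[y, x] = 1
        if 1 <= x < width - 1 and 1 <= y < height - 1:
            # Randomize the PDS sample by stamping a randomly chosen satellite pattern.
            mask[y - 1:y + 2, x - 1:x + 2] |= SATELLITES[rng.integers(25)]
    return mask
```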

Fig. 4. 25 cases of randomized satellite points.

Fig. 5. Projected random dot patterns.

2.3 Stereo matching

Simple stereo matching uses the sum of absolute differences (SAD) as its cost function. However, SAD cannot exploit the additional projection well. We instead use a census transform to compute the stereo matching cost. The census transform searches for corresponding points by comparing bit sequences created from the relationship between the center pixel and its neighboring pixels.

Let an input image be $I:\mathcal{S}\mapsto \mathcal{R}$, where $\mathcal{S}\subset \mathbb{N}^{2}$ is the spatial domain and $\mathcal{R}\subset [0,R=256) \subset \mathbb{N}$ is the range domain. We denote $\boldsymbol{p}\in \mathcal{S}$ as a pixel position, $I_{\boldsymbol{p}}\in \mathcal{R}$ as its intensity, and $\mathcal{N}_{\boldsymbol{p}}=\{\boldsymbol{q}_1, \boldsymbol{q}_2, \ldots, \boldsymbol{q}_N\}\subset \mathcal{S}$ as its neighboring pixels. The census transform is

$$H(I_{\boldsymbol{p}}-I_{\boldsymbol{q}}) \quad \forall \boldsymbol{q} \in \mathcal{N}_{\boldsymbol{p}},$$
where $H: \mathbb {Z} \mapsto \{0, 1\}$ is the unit step function;
$$H(x) = \begin{cases} 1 & (x >0) \\ 0 & (x \leq 0). \end{cases}$$

The census-transformed output is represented as a vector, $\boldsymbol{J}_{\boldsymbol{p}}=(H_{\boldsymbol{q}_1}, H_{\boldsymbol{q}_2}, \ldots, H_{\boldsymbol{q}_N})$, where we write $H(I_{\boldsymbol{p}}-I_{\boldsymbol{q}})$ as $H_{\boldsymbol{q}}$ for short. The vector is a binary bit sequence; thus, it can be represented by an integer.

After census-transforming the left and right images into $\boldsymbol{J}^{L}$ and $\boldsymbol{J}^{R}$, we match them by measuring the Hamming distance, $\mathcal{H}$, between the codes. Also, we accumulate the distance over a local support window, $\mathcal{M}_{\boldsymbol{p}} \subset \mathcal{S}$. Finally, we find the resulting disparity, $D_{\boldsymbol{p}} \in \mathbb{Z}$, by minimizing the cost between matched pixels over the disparity range $D=\{d_{\min}, \ldots, d_{\max}\}\subset \mathbb{Z}$:

$$D_{\mathbf{p}}=\arg \min_{d \in D} \sum_{\mathbf{q} \in \mathcal{M}_{\mathbf{p}}} \mathcal{H}(\mathbf{J}^{R}_{\mathbf{p}}, \mathbf{J}^{L}_{\mathbf{p}+d}),$$
where $\mathcal {M}_{\boldsymbol{p}}$ can be set independently of the window $\mathcal {N}_{\boldsymbol{p}}$ of a census-transformation.

Usually, the census transform window $\mathcal{N}$ is a densely sampled square; however, such a shape requires many line buffers. To reduce the line buffers required in the FPGA, we change the pixel selection from a square to a horizontal rectangle. Also, to keep the census transform values within 2 bytes while supporting a wide range, we reduce the sampling to 16 skipped pixels, which we call the skipped census transform (see Fig. 6). This implementation accelerates stereo matching; a sketch is given below.
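
The following is a minimal NumPy sketch of the skipped census transform and the winner-take-all matching of Eqs. (3)-(5). The 16-sample layout inside the $13\times 3$ window is a hypothetical example (the exact offsets of Fig. 6 are not reproduced here), and image borders are handled by wrap-around for brevity.

```python
import numpy as np
from scipy.ndimage import uniform_filter

# Hypothetical 16-sample layout inside a 13x3 window (every other column, center excluded).
OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-6, -4, -2, 0, 2, 4, 6)
           if not (dy == 0 and dx == 0)][:16]

def skipped_census(img):
    """Skipped census transform: one 16-bit code per pixel (Eqs. (3) and (4))."""
    img = img.astype(np.int64)
    code = np.zeros(img.shape, dtype=np.int64)
    for k, (dy, dx) in enumerate(OFFSETS):
        neighbor = np.roll(img, shift=(-dy, -dx), axis=(0, 1))  # I_q aligned with I_p
        code |= ((img - neighbor) > 0).astype(np.int64) << k
    return code.astype(np.uint16)

def hamming(a, b):
    """Per-pixel Hamming distance between two 16-bit census code images."""
    x = np.bitwise_xor(a, b)
    bits = np.unpackbits(x.view(np.uint8), axis=-1)
    return bits.reshape(x.shape + (16,)).sum(axis=-1).astype(np.float32)

def block_match(right, left, d_min, d_max, win=9):
    """Winner-take-all disparity of Eq. (5), with the cost aggregated over a
    win x win support window; borders wrap around in this sketch."""
    jr, jl = skipped_census(right), skipped_census(left)
    best = np.full(right.shape, np.inf, dtype=np.float32)
    disp = np.zeros(right.shape, dtype=np.int32)
    for d in range(d_min, d_max + 1):
        # Compare J^R at p with J^L at p + d (shift the left codes left by d).
        cost = uniform_filter(hamming(jr, np.roll(jl, -d, axis=1)), size=win)
        better = cost < best
        best[better] = cost[better]
        disp[better] = d
    return disp
```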

Fig. 6. Skipped census transform ($13\times 3$).

3. Experimental results

We conducted experiments both in simulation and in real environments. We used simulation to compare our pattern with other random projection patterns, namely pure random sampling and PDS. We also compared our system with RealSense in real environments.

3.1 Simulation

We evaluated the proposed method by simulation. We used the Middlebury stereo dataset [30,31], which contains ground truth depth maps. Our experiment added random dot patterns to the input stereo pairs to darken the images. Figure 7 shows the input and simulated images. The dataset contains color images; thus, we first converted each color image to grayscale and then added the projection patterns. The details of generating the simulated images are given in the Appendix. We also consider noisy conditions: we add Gaussian noise ($\sigma =5$) and apply a gamma transform only to the right image ($\gamma = 1.2$), as shown in Fig. 8; a sketch of this degradation is given below.
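
A small sketch of the simulated degradation follows. Whether the Gaussian noise is applied to both images or only to the right one is not fully specified in the text, and the gamma convention is likewise an assumption (a brightening transform, consistent with the intensity increase reported in the caption of Fig. 8).

```python
import numpy as np

rng = np.random.default_rng(0)

def degrade_pair(left, right, sigma=5.0, gamma=1.2):
    """Noisy simulation condition: Gaussian noise (sigma=5) plus a gamma
    transform (gamma=1.2) applied only to the right image."""
    def add_noise(img):
        noisy = img.astype(np.float64) + rng.normal(0.0, sigma, img.shape)
        return np.clip(noisy, 0.0, 255.0)

    left_n = add_noise(left)          # assumption: noise is added to both images
    right_n = add_noise(right)
    # Assumed (brightening) gamma convention applied to the right image only.
    right_n = 255.0 * (right_n / 255.0) ** (1.0 / gamma)
    return left_n.astype(np.uint8), right_n.astype(np.uint8)
```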

Fig. 7. Input and simulated projected stereo pair (Plastic).

Fig. 8. Raw and noisy input stereo pairs with the proposed projection. The average intensities of the right images in the flat gray region are 184 (raw) and 190 (noise offset by the gamma transform), respectively.

Figure 9 shows the output disparity maps for the various projections, encoded in 8 bits. Note that the actual output has an additional 6 bits for subpixel accuracy. Also, Table 2 shows the objective evaluation with the ratio of the number of bad pixels (NBP) [1], averaged over all 27 pairs with and without noise. NBP counts the pixels whose absolute difference from the ground truth disparity $d^{\rm{GT}}_p$ is at least a threshold $T$, i.e., $|d_p-d^{\rm{GT}}_p|\geq T$, as sketched below. The proposed method has better metrics than the output without projection. The appearance of the projected results is almost the same; however, the error maps show that the proposed method suppresses errors. Pure random sampling is better than PDS; however, the random method has a drawback: we cannot ensure the accuracy of local pixels because we cannot control the density of the dot pattern. This issue is serious for industrial products.
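
A minimal sketch of the NBP metric as defined above; the handling of pixels without ground truth is an assumption of this sketch.

```python
import numpy as np

def bad_pixel_ratio(disp, disp_gt, T=1.0):
    """Ratio of the number of bad pixels (NBP): fraction of pixels whose disparity
    error |d_p - d_p^GT| is at least T. Pixels without ground truth are assumed to
    be marked as zero or non-finite and are excluded."""
    valid = np.isfinite(disp_gt) & (disp_gt > 0)
    err = np.abs(disp.astype(np.float64) - disp_gt.astype(np.float64))
    return np.count_nonzero(err[valid] >= T) / np.count_nonzero(valid)
```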

Fig. 9. Disparity maps of various projection methods with error maps ($T=1$).

Table 2. Ratio of the number of bad pixels ($T = 1.0$) for various patterns in noisy and raw inputs. The best performance for each of the noisy and raw inputs is listed in bold font.

Next, we compare the proposed skipped census transform (SCT) with various cost functions: the dense census transform (CT), the sum of absolute differences (SAD), the sum of squared differences (SSD), Sobel-prefiltered SAD (SSAD), and Sobel-prefiltered SSD (SSSD). Sobel prefiltering for SAD and SSD, as used in the OpenCV library, suppresses the influence of the gamma curves (photoelectric conversion characteristics) of the left and right cameras. For dense CT, we use the following $7\times 3$ transform, since our system’s line buffer is limited to 3 lines and the system’s FPGA uses 16-bit computation:

$$\begin{bmatrix} 0 & 1 & 1 & 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 0 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 & 1 & 1 & 0 \\ \end{bmatrix}.$$

Table 3 shows the NBP for each cost function with and without noise. In both cases, the proposed method is better than the others. SAD is the second best in the noise-free case; however, the SAD cost function degrades in noisy cases. The Sobel prefilter helps suppress noise-induced degradation but is inferior to the census-based methods. Comparing dense CT and skipped CT, the proposed skipped CT is better.

Table 3. Ratio of the number of bad pixels ($T = 1.0$) for various cost functions in noisy and raw inputs. The best performance for each of the noisy and raw inputs is listed in bold font.

Figure 10 shows the cost functions at the position $(634,447)$, where the search range is from 50 to 200 pixels. The actual matching point is at 180. Our method has a correct peak, while PDS is flat; thus, our method enhances the texture well.

Fig. 10. Cost function at a pixel for ours and PDS.

3.2 Real environment

We verified our system in a real environment. Our IPU uses 46708 FPGA slice LUTs and 118669 slice registers. The processing frame rate was over 64 fps for $1280\times 1024$ images.

Figure 11 demonstrates our system in a real environment. Our system can capture depth values even on reflective objects thanks to our band-pass filter. The filter successfully suppresses the effects of ambient light, while RealSense fails to capture the scene because its projector light is too strong. RealSense is designed for general purposes; thus, its projected light has high power. Ours is designed for indoor use, especially for factory automation.

Fig. 11. Real environment results. Black regions in the RealSense results are unavailable regions.

Figure 12 shows the point clouds of Fig. 11. The proposed method can restore higher resolution than RealSense. Moreover, the proposed system is placed farther from the photographed object than RealSense, since its lenses and baseline are designed for factory automation.

Fig. 12. Point clouds of the proposed system and RealSense in Fig. 11. Note that the cameras are positioned differently to capture the same object in the field of view on systems with different focal lengths.

Figure 13 shows the theoretical accuracy. The relationships between the distance $Z$ and its disparity $D$, and between the distance error $\delta_Z$ and the disparity error $\delta_D$, are as follows:

$$Z=\frac{l_{\rm{bl}}l_{\rm{px}}}{2\tan(\theta_h/2)}\frac{1}{D}, \quad\quad \delta_Z = \frac{l_{\rm{bl}}l_{\rm{px}}}{2\tan(\theta_h/2)}\left(\frac{1}{D}-\frac{1}{D+\delta_D}\right)$$
where $\theta_h$ is the horizontal field of view, $l_{\rm{bl}}$ is the baseline length, and $l_{\rm{px}}$ is the horizontal pixel length. In this paper, we assume that $\delta_D=0.25$; however, the error depends on the state of the texture or object. In most cases, the error is within 0.25. The proposed system has a long baseline and a small FoV, resulting in high $z$-resolution with low error. In exchange, RealSense can capture a wider scene. A sketch of this error model is given below.
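
A small sketch of Eq. (7), interpreting $l_{\rm{px}}$ as the horizontal resolution in pixels (an assumption) and using the system values reported above (26° FoV, 68 mm baseline, 1280 horizontal pixels, $\delta_D=0.25$).

```python
import numpy as np

def depth_error(Z_mm, fov_h_deg, baseline_mm, width_px, delta_d=0.25):
    """Theoretical depth error at distance Z (Eq. (7))."""
    k = baseline_mm * width_px / (2.0 * np.tan(np.radians(fov_h_deg) / 2.0))
    D = k / Z_mm                              # disparity (pixels) at distance Z
    return k * (1.0 / D - 1.0 / (D + delta_d))

# Example: proposed system (26 deg horizontal FoV, 68 mm baseline, 1280 px width).
print(depth_error(np.array([500.0, 1000.0, 2000.0]), 26.0, 68.0, 1280))
```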

Fig. 13. $Z$-error of each sensor for each distance.

Figure 14 shows additional outputs of our system for various objects. Figures 14(a) and (b) show the results for white resin objects. The objects have low texture, but we can stably capture their depth. Figures 14(c) and (d) show the results for black resin objects, i.e., lens caps. The objects have low reflectance, but we can still stably capture their depth. Figures 14(a)(b) and 14(c)(d) use the same exposure time. Generally, stereo matching becomes difficult under over- and underexposure. These experiments support that our system can capture depth maps over a wide dynamic range. Figures 14(e)(f) and 14(g)(h) show metal objects under changing illumination conditions. In Figs. 14(e) and (f), depth is captured well under good lighting conditions. In Figs. 14(g) and (h), depth is partially lost, since the luminance values are saturated in the strongly reflecting area. Note that we lit the object partially to degrade the depth map quality.

Fig. 14. Additional results in real environments.

4. Conclusion

In this paper, we built a new projected texture stereo system. We also proposed a novel projection pattern named Poisson disk sampling with randomized satellite points. Experimental results showed that our pattern is superior to the other competitive patterns, and that our stereo system is more robust than the conventional Intel RealSense system. Our hardware-efficient skipped census transform also achieves a higher output frame rate than Intel RealSense.

One advantage of our projection pattern, although not exploited in this study, is that the projected points hardly pollute the captured images. If the projected points can be removed, optimizations using the edge information of the input image as a reference become possible [32]. We aim to achieve higher accuracy using such edges in future work.

Appendix: Simulation of pattern projection

The pattern is generated as follows. Since the projecting aperture size of the photomask, $S_a$, is larger than the pixel pitch, we use the following point spread function (PSF) to darken pixels:

$$W=d_w \begin{bmatrix} w^{2} & w & w^{2} \\ w & 1 & w \\ w^{2} & w & w^{2} \\ \end{bmatrix},$$
where $w = 0.5(S_a-1.0)$. In our system, $S_a=6.7/5.3=1.264$. The darkening ratio $d_w$ is $d_w=1/12$, verified by experiments. The PSF is convolved with a mask pattern $m_C$ to generate the actual pattern $M_C$:
$$M_C=W*m_C,$$
where $*$ is the convolution operator. The mask position is the center of the left and right cameras; however, the dataset does not provide a disparity map for this position. We generate the center disparity map $D_C$ from the disparity maps of the left and right cameras by warping both maps to the center position and then merging them. If both cameras have disparity values, the output is their average; if only one is available, that one is used. Finally, the projected left and right images $I^{M}_L$ and $I^{M}_R$ are generated. Here, the subscript $p$ of a pixel position is written in expanded form, e.g., $I_p=I(x,y)$, and the symbol $y$ is omitted since disparity moves only horizontally, $I_p=I(x)$. The disparity map has sub-pixel accuracy; the projected patterns at sub-pixel positions are linearly interpolated. The projected images are as follows:
$$I^{M}_L(x) = \Big\{\bar{\alpha}(x)M_C(x+D_C'(x))+\alpha(x) M_C(x+D_C'(x)+1)\Big\} I_L(x),$$
$$I^{M}_R(x) = \Big\{\bar{\alpha}(x)M_C(x-D_C'(x))+\alpha(x) M_C(x-D_C'(x)-1)\Big\} I_R(x)$$
where $I_L$ and $I_R$ are the source left and right images, $D_C'(x) = \lceil D_C(x)\rceil$ is an integer disparity map, and $\alpha(x) = D_C(x)-D_C'(x)$ and $\bar{\alpha}(x)=1-\alpha(x)$ are the coefficients for linear interpolation. A sketch of this simulation is given below.
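
A minimal NumPy sketch of this simulation (Eqs. (8)-(11)). The integer disparity is taken here as the floor of $D_C$ so that the interpolation weight lies in $[0,1)$, and the pattern lookup clamps at the image border; both are assumptions of this sketch.

```python
import numpy as np
from scipy.ndimage import convolve

def project_pattern(I_L, I_R, m_C, D_C, S_a=6.7 / 5.3, d_w=1.0 / 12.0):
    """Darken a rectified stereo pair with the photomask pattern m_C warped by
    the center disparity map D_C (sub-pixel)."""
    w = 0.5 * (S_a - 1.0)
    W = d_w * np.array([[w * w, w, w * w],
                        [w,     1.0, w],
                        [w * w, w, w * w]])
    M_C = convolve(m_C.astype(np.float64), W, mode='nearest')   # Eq. (9)

    h, width = I_L.shape
    # Integer part and interpolation weight of the sub-pixel disparity
    # (floor is used here, as noted above, so that 0 <= alpha < 1).
    D_int = np.floor(D_C).astype(int)
    alpha = D_C - D_int
    y = np.arange(h)[:, None]

    def lookup(shift):
        # Clamp-to-edge horizontal lookup of the pattern at x + shift.
        xs = np.clip(np.arange(width)[None, :] + shift, 0, width - 1)
        return M_C[y, xs]

    gain_L = (1.0 - alpha) * lookup(D_int) + alpha * lookup(D_int + 1)    # Eq. (10)
    gain_R = (1.0 - alpha) * lookup(-D_int) + alpha * lookup(-D_int - 1)  # Eq. (11)
    return gain_L * I_L, gain_R * I_R
```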

Funding

Kowa Optronics Co., Ltd.; Japan Society for the Promotion of Science KAKENHI (21H03465).

Disclosures

Conflicts of interest are as follows. Jun Takeda: Kowa Optronics Co., Ltd. (I, E, P), Kowa Company, Limited (I, E, P). Norishige Fukushima: Kowa Optronics Co., Ltd. (F, P).

Data availability

The simulation code and data are available in [33].

Supplemental document

See Supplement 1 for supporting content.

References

1. D. Scharstein and R. Szeliski, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” Int. J. Comput. Vis. 47(1/3), 7–42 (2002). [CrossRef]  

2. J. Geng, “Structured-light 3D surface imaging: a tutorial,” Adv. Opt. Photonics 3(2), 128–160 (2011). [CrossRef]  

3. K. Konolige, “Projected texture stereo,” in IEEE International Conference on Robotics and Automation, (2010), pp. 148–155.

4. P. Bing, X. Hui-min, X. Bo-qin, and D. Fu-long, “Performance of sub-pixel registration algorithms in digital image correlation,” Meas. Sci. Technol. 17(6), 1615–1621 (2006). [CrossRef]  

5. B. Pan, Z. Lu, and H. Xie, “Mean intensity gradient: An effective global parameter for quality assessment of the speckle patterns used in digital image correlation,” Opt. Lasers Eng. 48(4), 469–477 (2010). [CrossRef]  

6. B. Pan, H. Xie, Z. Wang, K. Qian, and Z. Wang, “Study on subset size selection in digital image correlation for speckle patterns,” Opt. Express 16(10), 7037–7048 (2008). [CrossRef]  

7. J. Lim, “Optimized projection pattern supplementing stereo systems,” in IEEE International Conference on Robotics and Automation, (2009), pp. 2823–2829.

8. P. Mirdehghan, W. Chen, and K. N. Kutulakos, “Optimal structured light a la carte,” in IEEE Conference on Computer Vision and Pattern Recognition, (2018).

9. S. Giancola, M. Valenti, and R. Sala, A Survey on 3D Cameras: Metrological Comparison of Time-of-Flight, Structured-Light and Active Stereoscopy Technologies (Springer, 2018).

10. L. Keselman, J. Iselin Woodfill, A. Grunnet-Jepsen, and A. Bhowmik, “Intel RealSense stereoscopic depth cameras,” in IEEE Conference on Computer Vision and Pattern Recognition Workshops, (2017).

11. R. Zabih and J. Woodfill, “Non-parametric local transforms for computing visual correspondence,” in European Conference on Computer Vision (ECCV), (1994), pp. 151–158.

12. A. Grunnet-Jepsen, J. N. Sweetser, P. Winer, A. Takagi, and J. Woodfill, “Projectors for D400 series depth cameras,” Intel RealSense white paper.

13. S. R. Fanello, J. Valentin, C. Rhemann, A. Kowdle, V. Tankovich, P. Davidson, and S. Izadi, “Ultrastereo: efficient learning-based matching for active stereo systems,” in IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2017).

14. M. Bleyer, C. Rhemann, and C. Rother, “PatchMatch stereo - stereo matching with slanted support windows,” in British Machine Vision Conference, (2011).

15. J. García, Z. Zalevsky, P. García-Martínez, C. Ferreira, M. Teicher, and Y. Beiderman, “Three-dimensional mapping and range measurement by means of projected speckle patterns,” Appl. Opt. 47(16), 3032–3040 (2008). [CrossRef]  

16. D. Khan, M. A. Shirazi, and M. Y. Kim, “Single shot laser speckle based 3D acquisition system for medical applications,” Opt. Lasers Eng. 105, 43–53 (2018). [CrossRef]  

17. J. Guo, X. Peng, A. Li, X. Liu, and J. Yu, “Automatic and rapid whole-body 3D shape measurement based on multinode 3D sensing and speckle projection,” Appl. Opt. 56(31), 8759–8768 (2017). [CrossRef]  

18. X. Zhang, D. Li, and R. Wang, “Active speckle deflectometry based on 3D digital image correlation,” Opt. Express 29(18), 28427–28440 (2021). [CrossRef]  

19. Y. Wang and S. Zhang, “Three-dimensional shape measurement with binary dithered patterns,” Appl. Opt. 51(27), 6631–6636 (2012). [CrossRef]  

20. W. Yin, S. Feng, T. Tao, L. Huang, M. Trusiak, Q. Chen, and C. Zuo, “High-speed 3D shape measurement using the optimized composite fringe patterns and stereo-assisted structured light system,” Opt. Express 27(3), 2411–2431 (2019). [CrossRef]  

21. S. S. Gorthi and P. Rastogi, “Fringe projection techniques: whither we are?” Opt. Lasers Eng. 48(2), 133–140 (2010). [CrossRef]  

22. X. Su and Q. Zhang, “Dynamic 3D shape measurement method: a review,” Opt. Lasers Eng. 48(2), 191–204 (2010). [CrossRef]  

23. S. Zhang, “High-speed 3D shape measurement with structured light methods: A review,” Opt. Lasers Eng. 106, 119–131 (2018). [CrossRef]  

24. S. Feng, L. Zhang, C. Zuo, T. Tao, Q. Chen, and G. Gu, “High dynamic range 3D measurements with fringe projection profilometry: a review,” Meas. Sci. Technol. 29(12), 122001 (2018). [CrossRef]  

25. Z. Zhang, “A flexible new technique for camera calibration,” IEEE Trans. Pattern Anal. Machine Intell. 22(11), 1330–1334 (2000). [CrossRef]  

26. D. C. Brown, “Decentering distortion of lenses,” Photogrammetric Engineering and Remote Sensing (1966).

27. X. Ding, L. Xu, H. Wang, X. Wang, and G. Lv, “Stereo depth estimation under different camera calibration and alignment errors,” Appl. Opt. 50(10), 1289–1301 (2011). [CrossRef]  

28. R. L. Cook, “Stochastic sampling in computer graphics,” ACM Trans. Graph. 5(1), 51–72 (1986). [CrossRef]  

29. R. Bridson, “Fast Poisson disk sampling in arbitrary dimensions,” SIGGRAPH Sketches 10, 1 (2007). [CrossRef]  

30. D. Scharstein and R. Szeliski, “High-accuracy stereo depth maps using structured light,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, (2003).

31. D. Scharstein and C. Pal, “Learning conditional random fields for stereo,” in IEEE Conference on Computer Vision and Pattern Recognition, (2007).

32. T. Matsuo, S. Fujita, N. Fukushima, and Y. Ishibashi, “Efficient edge-awareness propagation via single-map filtering for edge-preserving stereo matching,” in Proc. Three-Dimensional Image Processing, Measurement (3DIPM), and Applications, (2015).

33. “Project page of poisson disk sampling with randomized satellite points for projected texture stereo,” https://norishigefukushima.github.io/PDSRSP_ProjectedTextureStereo/.
