
Deep learning-based Phase Measuring Deflectometry for single-shot 3D shape measurement and defect detection of specular objects


Abstract

Phase Measuring Deflectometry (PMD) and the Structured-Light Modulation Analysis Technique (SMAT) perform effectively in shape and defect measurements of specular objects, but the difficulty of balancing accuracy and speed has restricted their further development and application. Inspired by recent successes of deep learning techniques in computational imaging, we demonstrate for the first time that deep learning techniques can be used to recover high-precision modulation distributions of specular surfaces from a single-frame fringe pattern under SMAT, enabling fast and high-quality defect detection of specular surfaces. The method can also be applied to recover higher-precision phase distributions of specular surfaces from a single-frame fringe pattern under PMD, so as to realize 3D shape measurement. In this paper, we combine depthwise separable convolution, a residual structure and U-Net to build an improved U-Net network. The experimental results prove that the method has excellent performance in the phase and modulation retrieval of specular surfaces, almost reaching the accuracy of results obtained by the ten-step phase-shifting method.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Optical three-dimensional (3D) measurement technology has been widely used in many fields such as intelligent manufacturing, reverse engineering and product inspection [1,2], especially in the measurement of diffusely reflecting objects [3–5]. In recent years, the measurement of specular objects, for instance, rapid automatic detection of the surface topography and paint defects of car bodies, has become increasingly important. However, traditional optical measurement methods, such as Phase Measuring Profilometry (PMP) [6,7] and Modulation Measuring Profilometry (MMP) [8–10], have difficulty dealing with specular objects. Therefore, Phase Measuring Deflectometry (PMD) was proposed for the 3D shape measurement of specular objects [11]. Inspired by PMD and MMP, the Structured-Light Modulation Analysis Technique (SMAT) was proposed for the defect detection of specular objects [12].

PMD is a 3D shape measurement method for specular objects based on fringe reflection, which has the characteristics of high-precision and incoherent optical full-field measurement. The measurement accuracy of PMD can reach the nanometer level. Further research in PMD has been conducted in many aspects, such as modulation analysis (SMAT) [12], fringe pattern design [13] and carrier removal [14]. In SMAT, the modulation information of specular objects is retrieved from the captured reflected fringe patterns for defect detection. Therefore, how to accurately obtain the phase information and modulation information from the captured deformed fringe patterns, that is, the fringe analysis method, is one of the key factors affecting the accuracy of PMD and SMAT.

The fringe analysis methods are mainly divided into two categories: spatial phase demodulation methods and phase-shifting methods. Spatial phase demodulation methods only need to collect a single-frame fringe pattern to obtain the phase information and modulation information. However, they perform poorly in areas with abrupt, discontinuous, and detailed fringes. Compared with spatial phase demodulation methods, phase-shifting methods have higher accuracy and higher resolution. However, phase-shifting methods usually require at least three frames of fringe patterns to be acquired, and obtaining higher accuracy and resolution requires more phase-shifting steps, at the cost of more acquisition time. Therefore, it remains a challenging problem for researchers to obtain high-precision and high-resolution phase and modulation information from a single-frame fringe pattern [7].

With the continuous development of computer technology and algorithms, deep learning techniques have been effectively applied in optical fields, such as optical imaging [15] and computer vision [16]. To improve the performance of deep learning techniques, many new network structures have been proposed. For example, to optimize network parameters and improve network performance, Google’s Chollet et al. proposed the Xception architecture, which replaces the traditional convolution layer with the depthwise separable convolution layer, separating the cross-channel correlation and spatial correlation learned by the neural network [17]. He et al. introduced a residual structure to solve the problem that accuracy decreases as network depth increases in deep neural networks [18]. Ronneberger et al. proposed the U-Net network to obtain accurate output from a small training dataset [19]. In recent years, deep learning techniques have been widely used in optical 3D shape measurement. For PMP, deep learning has been applied to fringe pattern analysis [20,21], fringe pattern enhancement [22], phase demodulation [23–25] and phase unwrapping [26–29]. For PMD, Qiao et al. introduced a method based on dual neural networks constructed with depthwise separable residual convolution blocks, which can recover the phase distributions of specular object surfaces from a single-frame fringe pattern [30], but the current accuracy of phase retrieval still needs to be improved. In SMAT, current traditional methods cannot complete high-quality modulation extraction from only a single-frame fringe pattern of the object surface. Therefore, extracting high-precision modulation information from a single-frame fringe pattern would be of great significance for fast, even on-line, high-quality defect detection on specular surfaces. Deep learning may be a solution to this problem. However, owing to the difficulty of this task, no related research has been reported in the literature to date.

Inspired by the application of deep learning techniques in optical 3D shape measurement, we demonstrate for the first time that deep learning techniques can be used to recover high-precision modulation distributions on specular object surfaces from a single-frame fringe pattern under SMAT, enabling fast and high-quality defect detection of specular surfaces. The proposed network can also be used to recover higher-precision phase distributions on specular object surfaces from a single-frame fringe pattern under PMD, enabling fast and higher-quality 3D shape measurement of specular surfaces. In this work, we combine depthwise separable convolution, a residual structure and U-Net to build an improved U-Net network. For modulation retrieval, experimental results show that the proposed network can detect scratches with depth $1.5\mu m$ and width $5\mu m$. For phase retrieval, experimental results show that the phase recovery accuracy of the proposed network is about 5 times that in Ref. [30]. The proposed network exhibits excellent performance in extracting both the phase distributions and modulation distributions of specular objects from a single-frame fringe pattern, which means that this method can be used for simultaneous high-speed and high-accuracy 3D shape measurement and defect detection. In Section 2, the principles of PMD and SMAT and the detailed network architecture are introduced. In Section 3, experimental verifications and comparison results are presented in detail. Section 4 outlines the conclusions of this work.

2. Principle

2.1. Fringe analysis with PMD and SMAT

PMD and SMAT use the same system structure and the same fringe analysis methods. With PMD, we obtain phase information from the distorted fringe images and realize the 3D shape measurement of specular objects; with SMAT, we obtain modulation information from the distorted fringe images and realize the defect detection of specular objects.

Both SMAT and PMD are based on a structured-light illumination system. Figure 1 shows the principle of PMD and SMAT. A PMD/SMAT system usually contains a display screen, an imaging device (normally a camera) and a computer. Fringe patterns are generated in software and displayed on the screen. The camera captures the distorted fringe patterns modulated by the measured surface. In a PMD system, phase information is demodulated from the captured deformed fringe patterns to reconstruct the 3D shape. In a SMAT system, modulation information is extracted from the captured deformed fringe patterns to detect defects of the specular surface.

Fig. 1. Schematic diagram of PMD and SMAT. The angle of incidence is doubly changed due to the angle change caused by the slopes of the sample.

It is difficult to achieve high-precision phase measurement using spatial phase demodulation methods, so the training data are obtained by the ten-step phase-shifting method.

The fringe patterns collected by the camera can be expressed as:

$${I_n}(x,y) = A(x,y) + B(x,y) \cdot \cos [\Phi (x,y) + \frac{{2\pi n}}{N}], $$
where ${I_n}(x,y)$ is the light intensity of the fringe pattern, $(x,y)$ is the pixel coordinate of the camera, $A(x,y)$ is the average light intensity of the fringe pattern, $B(x,y)$ is the modulation, $\Phi (x,y)$ is the absolute phase distribution, $N$ is the number of phase-shifting steps, ${{2\pi n} / N}$ is the phase shift, and $n = 0,1,\ldots,N - 1$.

With a least-squares algorithm, the wrapped phase $\varphi (x,y)$ can be retrieved from an inverse trigonometric function:

$$\varphi (x,y) = \arctan \frac{{\sum\nolimits_{n = 0}^{N - 1} {{I_n}} (x,y) \sin (\frac{{2\pi n}}{N})}}{{\sum\nolimits_{n = 0}^{N - 1} {{I_n}} (x,y) \cos (\frac{{2\pi n}}{N})}} = \arctan \frac{{M(x,y)}}{{D(x,y)}}, $$
the numerator and the denominator of the arctangent function can be expressed as:
$$M(x,y) = \sum\nolimits_{n = 0}^{N - 1} {{I_n}(x,y)\sin (\frac{{2\pi n}}{N}) = \frac{N}{2}B(x,y)\sin \varphi (x,y)}, $$
$$D(x,y) = \sum\nolimits_{n = 0}^{N - 1} {{I_n}(x,y)\cos (\frac{{2\pi n}}{N}) = \frac{N}{2}B(x,y)\cos \varphi (x,y)}. $$

Furthermore, the modulation $B(x,y)$ can be expressed as:

$$B(x,y) = \frac{2}{N}\sqrt {M{{(x,y)}^2} + D{{(x,y)}^2}}. $$
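For concreteness, the following NumPy sketch implements Eqs. (2)–(5) for an $N$-step fringe sequence. The array layout and function name are our own illustration, not the authors' code.

import numpy as np

def phase_shift_analysis(frames):
    """N-step phase-shifting analysis, Eqs. (2)-(5); frames has shape (N, H, W)."""
    N = frames.shape[0]
    n = np.arange(N).reshape(-1, 1, 1)
    M = np.sum(frames * np.sin(2 * np.pi * n / N), axis=0)   # numerator, Eq. (3)
    D = np.sum(frames * np.cos(2 * np.pi * n / N), axis=0)   # denominator, Eq. (4)
    phi = np.arctan2(M, D)                                   # wrapped phase, Eq. (2)
    B = (2.0 / N) * np.sqrt(M**2 + D**2)                     # modulation, Eq. (5)
    return phi, B, M, D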

Modulation measurement based on SMAT is almost unaffected by ambient light, and the modulation values at defects are much smaller than those at clean and intact areas, so extracting high-precision and clear modulation information is of great significance for defect detection on specular object surfaces [12].

2.2. Phase and modulation retrieval through deep neural networks

Inspired by recent successes of deep learning techniques in phase analysis, we combine deep convolutional neural networks with PMD and SMAT to develop a novel method for predicting high-quality phase information and modulation information from a single-frame fringe pattern.

From the above analysis, it is feasible to predict the wrapped phase $\varphi (x,y)$ from the fringe pattern ${I_n}(x,y)$ by the neural network method, and then the unwrapped phase $\Phi (x,y)$ is obtained by a global phase unwrapping algorithm based on the least-squares method. However, directly predicting the wrapped phase $\varphi (x,y)$ from the fringe pattern ${I_n}(x,y)$ performs poorly in terms of detail in our case. This may be because the envelope of the wrapped phase $\varphi (x,y)$ is difficult to predict directly, and an inaccurate wrapped phase envelope leads to a decrease in accuracy. It has been experimentally proved that learning to predict the numerator $M(x,y)$ and denominator $D(x,y)$ can finally yield a high-precision unwrapped phase $\Phi (x,y)$, while directly linking the fringe pattern ${I_n}(x,y)$ with the wrapped phase $\varphi (x,y)$ causes the accuracy to decrease sharply [23,30]. Therefore, we choose to predict the numerator $M(x,y)$ and denominator $D(x,y)$; the wrapped phase $\varphi (x,y)$ is then obtained by Eq. (2), and finally the unwrapped phase $\Phi (x,y)$ is obtained by the global phase unwrapping algorithm. The flow chart of the proposed method is shown in Fig. 2(a).

Fig. 2. Flowcharts of the proposed method. (a) Phase retrieval. (b) Modulation retrieval.
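A minimal sketch of the flow in Fig. 2(a) follows, assuming a trained network `phase_net` with two output channels and some least-squares unwrapper `unwrap_least_squares`; both names are hypothetical, not the authors' code.

import numpy as np

def retrieve_phase(fringe, phase_net, unwrap_least_squares):
    """Single-frame phase retrieval: fringe -> (M, D) -> wrapped -> absolute phase."""
    out = phase_net.predict(fringe[None, ..., None])     # shape (1, H, W, 2)
    M, D = out[0, ..., 0], out[0, ..., 1]                # predicted numerator/denominator
    phi = np.arctan2(M, D)                               # wrapped phase, Eq. (2)
    return unwrap_least_squares(phi)                     # absolute phase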

However, the proposed method performs poorly when predicting the numerator $M(x,y)$ and denominator $D(x,y)$ to calculate the modulation $B(x,y)$ by Eq. (5). This may be because the error of the numerator $M(x,y)$ and denominator $D(x,y)$ will be magnified in Eq. (5), which will lead to residual fringes in the image of the modulation $B(x,y)$. Therefore, we choose to directly link the fringe pattern ${I_n}(x,y)$ with the modulation $B(x,y)$ to obtain high-quality modulation information. The flow chart of the proposed method is shown in Fig. 2(b).
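The corresponding flow in Fig. 2(b) is simpler, since the network maps the fringe image directly to the modulation; `modulation_net` is again a hypothetical name.

def retrieve_modulation(fringe, modulation_net):
    """Single-frame modulation retrieval: fringe -> B(x, y), no intermediate M, D."""
    return modulation_net.predict(fringe[None, ..., None])[0, ..., 0]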

In order to extract phase and modulation information from a single-frame fringe pattern respectively, two neural networks with the same core architecture were constructed. The core architecture is shown in Fig. 3(a). The difference between the two networks is that one predicts the numerator $M(x,y)$ and denominator $D(x,y)$ of Eq. (2) to obtain the wrapped phase $\varphi (x,y)$, and thus needs two output channels, while the other directly connects the distorted fringe image $I(x,y)$ with the modulation $B(x,y)$.

Fig. 3. Structure diagrams of the improved U-Net. (a) Improved U-Net network architecture. (b) Depthwise separable residual convolution block.

The depthwise separable residual convolution block employed by the improved U-Net is shown in Fig. 3(b), which is mainly composed of one 3 × 3 depthwise convolutional layer and two 1 × 1 point convolutional layers. The first point convolutional layer doubles the number of channels (from N to 2N) to extract more image features. Each channel is then filtered individually with a 3 × 3 depthwise convolutional layer. The second point convolutional layer reduces the number of channels from 2N to N, so that the output can be added to the input channels to form a residual structure. This design separates the mapping of cross-channel correlation and spatial correlation learned by the neural network [17]: the point convolution layers map only the cross-channel correlation, and the 3 × 3 depthwise convolution layer maps only the spatial correlation, which optimizes network parameters and improves network performance. The residual structure directly correlates the input channels and the output channels, which better resolves the accuracy decline in complex networks and improves the training speed of the network [18]. A rectified linear unit (ReLU) is added after the first point convolutional layer and the 3 × 3 depthwise convolutional layer to improve network performance. ReLU is not added after the second point convolutional layer, because more information is inevitably lost when ReLU processes layers with fewer channels [31]. Adding a batch normalization (BN) layer before each ReLU speeds up the convergence of the network.
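A Keras sketch of this block under our reading of Fig. 3(b) is given below; the paper is built on TensorFlow (Section 3), but exact layer settings such as initializers are assumptions.

import tensorflow as tf
from tensorflow.keras import layers

def ds_residual_block(x):
    """Fig. 3(b): 1x1 conv (N->2N) + BN + ReLU, 3x3 depthwise + BN + ReLU,
    1x1 conv (2N->N) without ReLU, then residual addition."""
    n = x.shape[-1]
    y = layers.Conv2D(2 * n, 1, padding="same")(x)       # point conv, N -> 2N
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.DepthwiseConv2D(3, padding="same")(y)     # per-channel 3x3 filtering
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(n, 1, padding="same")(y)           # point conv, 2N -> N, no ReLU
    return layers.Add()([x, y])                          # residual connection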

As shown in Fig. 3(a), the core architecture of this network consists of a contracting path (left side, also called the encoder) and an expansive path (right side, also called the decoder). Each contracting layer consists of a 2 × 2 max pooling layer, a 3 × 3 normal convolutional layer (followed by a BN and a ReLU) and a depthwise separable residual convolutional layer, and all convolutional layers in the network use “SAME” padding. It should be explained that “SAME” padding simply adds rows or columns of zero pixels around the image to ensure that the output image after convolution has the same size as the input image. In our data set, specular objects are surrounded by the background area, where the pixel values have been set to zero during data preprocessing, so “SAME” padding has no influence on the target specular objects. The 2 × 2 max pooling layer is used for down-sampling so that the size of the feature map becomes (H/2, W/2). The 3 × 3 normal convolutional layer is used to expand the number of channels to extract more image features. On the one hand, the depthwise separable residual convolution layer separates the mapping of cross-channel correlation and spatial correlation, which optimizes network parameters and improves network performance; at the same time, the residual structure better resolves the accuracy decline in complex networks and improves the training speed of the network. On the other hand, the depthwise separable residual convolutional layer also increases the depth of the network and extracts higher-level image features. It should be noted that the first contracting layer does not perform max pooling, but directly consists of a 3 × 3 normal convolutional layer (followed by a BN and a ReLU) and a depthwise separable residual convolutional layer, which ensures that the final output feature map has the same size as the original feature map. The fifth contracting layer consists of a 2 × 2 max pooling layer, a 3 × 3 normal convolutional layer (followed by a BN and a ReLU), a depthwise separable residual convolutional layer and a 2 × 2 transposed convolutional layer (up-sampling). The 2 × 2 transposed convolutional layer up-samples the feature map, doubling its size and halving the number of feature channels, allowing it to be concatenated with the output of the corresponding contracting layer on the left and input to the expansive layer on the right.

Each expansive layer consists of a concatenate layer, a 3 × 3 normal convolutional layer (followed by a BN and a ReLU), a depthwise separable residual convolutional layer and a 2 × 2 transposed convolutional layer (up-sampling). The concatenate layer combines the output of the corresponding contracting layer with the output of the previous expansive layer to preserve more dimensional and positional image features. This key operation facilitates the fusion of shallow and deep features in the subsequent expansive layers, which better extracts image features. Instead of performing a 2 × 2 transposed convolution, the last output layer applies a 1 × 1 point convolution (“SAME” padding) that maps the features to the output of the network. It should be noted that when extracting the phase, the 1 × 1 point convolution has two outputs, corresponding to the numerator $M(x,y)$ and denominator $D(x,y)$ respectively; when extracting the modulation, the 1 × 1 point convolution has only one output, corresponding to the modulation $B(x,y)$. The improved U-Net network has a total of 37 convolutional layers, excluding max pooling layers and transposed convolutional layers.
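Putting the pieces together, a compact functional-API sketch of the whole architecture (reusing ds_residual_block from above) might look as follows; the base channel count and the placement of the transposed convolution at the start of each expansive step are our assumptions where the text does not fully specify Fig. 3(a).

import tensorflow as tf
from tensorflow.keras import layers

def improved_unet(input_shape=(480, 640, 1), out_channels=2, base=32):
    """Illustrative improved U-Net; out_channels=2 for phase (M, D), 1 for modulation."""
    inputs = layers.Input(input_shape)
    skips, x = [], inputs
    for i in range(5):                                    # contracting path
        if i > 0:
            x = layers.MaxPooling2D(2)(x)                 # down-sample H, W by 2
        x = layers.Conv2D(base * 2**i, 3, padding="same")(x)
        x = layers.ReLU()(layers.BatchNormalization()(x))
        x = ds_residual_block(x)
        skips.append(x)
    for i in reversed(range(4)):                          # expansive path
        x = layers.Conv2DTranspose(base * 2**i, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([skips[i], x])           # fuse shallow and deep features
        x = layers.Conv2D(base * 2**i, 3, padding="same")(x)
        x = layers.ReLU()(layers.BatchNormalization()(x))
        x = ds_residual_block(x)
    outputs = layers.Conv2D(out_channels, 1, padding="same")(x)
    return tf.keras.Model(inputs, outputs)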

The proposed improved U-Net structure combines depthwise separable convolution, a residual structure, and U-Net, which means that the improved U-Net inherits the advantages of all three structures and has excellent performance. Compared with the traditional U-Net, the proposed network has three main advantages. First, the number of channels in the network is reduced by half, which greatly reduces the parameters of the network, improves the training speed and reduces the risk of overfitting. Second, using depthwise separable residual convolution blocks increases the network depth, which benefits feature extraction. The traditional U-Net network has a total of 19 convolutional layers (excluding max pooling and transposed convolutional layers), while the proposed improved U-Net network has a total of 37 convolutional layers, so our network is almost twice as deep as the traditional U-Net. Third, the depthwise separable residual convolution blocks separate the mapping of cross-channel correlations and spatial correlations learned by the neural network, which optimizes network parameters and improves network performance.

For clarity, in this paper the dual neural networks CNN1 and CNN2 denote the networks used in Ref. [30]. The depthwise separable residual convolution blocks used by CNN1 and CNN2 differ slightly from the block in Fig. 3(b): in our block, BN is not added after the second 1 × 1 point convolutional layer. BN is usually used between a convolutional layer and a ReLU layer, not after a convolutional layer alone, in order to batch-normalize the data before the linear rectification; since no ReLU follows the second 1 × 1 point convolutional layer, the design without BN there is more reasonable. In addition, CNN1 and CNN2 achieve feature extraction only by stacking multiple depthwise separable residual convolution blocks and expanding the number of channels, and their structure is relatively simple, so their performance in extracting features of different dimensions is limited. Owing to the repeated down-sampling and up-sampling in the proposed network, features of multiple dimensions can be extracted efficiently. It should be noted that CNN1 and CNN2 can only be used to extract phase information, and perform poorly in extracting modulation information, whereas the proposed network can extract high-precision and clear phase and modulation information. Therefore, the proposed network has better performance and can better recover the detailed information of measured surfaces.

3. Experiments and results

To verify the effectiveness of the proposed method, we constructed an experimental system that includes a 1920×1080 resolution LCD screen (PHILIPS 243V5QSB) and a 1600×1200 resolution CCD camera (AVT-GT1660C, 8-bit pixel depth). In order to reduce the parameters of the network and improve its training speed, we scale the 1600×1200 images captured by the CCD camera to 640×480. The inputs for training the phase-retrieval improved U-Net are the 640×480 fringe patterns ${I^c}$, and the outputs are the numerators ${M^c}$ and the denominators ${D^c}$. The inputs for training the modulation-retrieval improved U-Net are also the 640×480 fringe patterns ${I^c}$, and the outputs are the modulations ${B^c}$, where ${I^c}$ are obtained by the ten-step phase-shifting method, and ${M^c}$, ${D^c}$ and ${B^c}$ are calculated by the fringe pattern analysis method based on PMD and SMAT.
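The down-scaling step could be done as follows with OpenCV; the interpolation method is an assumption, not the authors' stated choice.

import cv2

# Scale a 1600x1200 capture to 640x480 before feeding it to the network.
# INTER_AREA is a reasonable default for down-sampling.
img_small = cv2.resize(img_full, (640, 480), interpolation=cv2.INTER_AREA)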

As a high-accuracy fringe analysis method, the phase-shifting method is widely adopted not only in PMD and SMAT, but also in the field of fringe projection profilometry. In deep learning-based fringe projection profilometry, the high-accuracy phase information used as the ground truth is usually obtained from a 10-step or 12-step phase-shifting method [20,23]. In the proposed method, the ground truth used to train the neural network is likewise obtained from the 10-step phase-shifting method.

The quality of the data sets is critical to the training of neural networks. For the ten-step phase-shifting method, a single scene can indeed provide 10 distorted fringe images. However, 10 different training samples from one set of ten-step phase-shifting images would share the same label, which makes it difficult for the neural network to converge and may result in the loss of image details. Therefore, in the proposed method only one training sample from each scene was used to build the data sets. A total of 26 sets of phase-shifting images were taken from three different types of surfaces (a Samsung mobile phone glass panel, an iPhone glass panel, and a concave mirror), and 26 distorted fringe images were obtained to compose the original data sets. The number of samples was then expanded from 26 to 200 by small-angle rotation, as sketched below. Finally, we divided them into training, validation, and test sets in a ratio of 8:1:1. Some typical network data sets are shown in Fig. 4. It should be noted that the proposed method performed well on our data sets, which are small by deep learning standards; we attribute this to the powerful feature extraction capability of our improved network structure.
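A sketch of this expansion and split, assuming rotations drawn uniformly from a small range (the paper does not state the angles used):

import numpy as np
from scipy.ndimage import rotate

def augment_and_split(samples, labels, n_target=200, seed=0):
    """Expand paired images/labels by small-angle rotation, then split 8:1:1."""
    rng = np.random.default_rng(seed)
    xs, ys = list(samples), list(labels)
    while len(xs) < n_target:
        i = rng.integers(len(samples))
        angle = rng.uniform(-5.0, 5.0)                   # small-angle rotation (assumed range)
        xs.append(rotate(samples[i], angle, reshape=False, order=1))
        ys.append(rotate(labels[i], angle, reshape=False, order=1))
    idx = rng.permutation(n_target)
    a, b = int(0.8 * n_target), int(0.9 * n_target)      # 8:1:1 split
    return ([xs[i] for i in idx[:a]], [ys[i] for i in idx[:a]],
            [xs[i] for i in idx[a:b]], [ys[i] for i in idx[a:b]],
            [xs[i] for i in idx[b:]], [ys[i] for i in idx[b:]])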

Fig. 4. Part of the network data sets. The first column, ${I^c}$, are the inputs to the networks that extract the phase and the modulation. The second column, the numerators ${M^c}$, and the third column, the denominators ${D^c}$, are the labels of the network that extracts the phase. The fourth column, the modulations ${B^c}$, are the labels of the network that extracts the modulation. (a) An iPhone glass panel. (b) A Samsung mobile phone glass panel.

The improved U-Net neural network is built on a laptop with an Intel Core i5-9300H CPU and a GeForce GTX 1650 GPU (NVIDIA), using the Python language and the TensorFlow platform (Google). The loss function is the Mean Squared Error (MSE). The optimizer is the Adam optimizer, which updates the weight parameters of the network according to the loss value for better gradient descent, and its initial learning rate is set to 0.0001.
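In Keras terms, the stated configuration corresponds to something like the following; the batch size is our assumption, and the epoch counts are taken from Subsections 3.1 and 3.2.

# Phase network: two output channels (M and D); for modulation use out_channels=1.
model = improved_unet(input_shape=(480, 640, 1), out_channels=2)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="mse")
# model.fit(train_x, train_y, validation_data=(val_x, val_y),
#           epochs=400, batch_size=2)   # 400 epochs (phase), 300 epochs (modulation)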

3.1 Phase retrieval based on deep learning and PMD

In Subsection 2.2 we introduced the method of phase retrieval: we choose to predict the numerator and denominator and then obtain the wrapped phase by Eq. (2), rather than directly linking the fringe pattern with the wrapped phase, which improves the accuracy of the wrapped phase. In order to prove the superiority of this method, we conducted a comparative experiment on the above two methods for obtaining the wrapped phase, and compared the MAEs of the numerator, the denominator, the wrapped phase obtained by Eq. (2), and the wrapped phase directly predicted by the network, as shown in Table 1. The experimental results show that although the MAE of the wrapped phase calculated by Eq. (2) is magnified about 4 times relative to that of the numerator or denominator, it is still only about one eighth of that of the phase directly predicted by the network.

Table 1. The comparison of the MAEs
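The MAE reported throughout this section is, in essence, the following quantity; masking out the background region, as the red dashed boxes in Fig. 5 suggest, is our assumption.

import numpy as np

def masked_mae(pred, truth, mask):
    """Mean absolute error over the object region only (mask is boolean)."""
    return float(np.mean(np.abs(pred[mask] - truth[mask])))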

After training for 400 epochs, the loss function of the network converged; the training took about 12 hours. The test data sets were then input into the trained neural network to predict the numerators ${M^{dl}}$ and the denominators ${D^{dl}}$, from which we finally obtained the absolute phases.

Figure 5 shows the two test samples and the results of the neural network prediction. The trained neural network took the test fringe patterns ${I^{dl}}$ shown in Fig. 5(a) as inputs to predict the numerators ${M^{dl}}$ and denominators ${D^{dl}}$, which are shown in Fig. 5(b) and Fig. 5(c), respectively. The predicted numerators ${M^{dl}}$ and denominators ${D^{dl}}$ were entered into Eq. (2) to calculate the wrapped phases ${\varphi ^{dl}}$, as shown in Fig. 5(d).

Fig. 5. Prediction results for two test samples. (a) Input fringe patterns ${I^{dl}}$. (b) Numerators ${M^{dl}}$ predicted by the network. (c) Denominators ${D^{dl}}$ predicted by the network. (d) Wrapped phases ${\varphi ^{dl}}$ calculated from (b) and (c). In order to avoid the influence of the background area and reflect the recovery accuracy of the absolute phases as realistically as possible, we only unwrapped the red dashed box area to obtain the absolute phases.

To verify the effectiveness of the proposed method, the network constructed in this research was compared with ground truth, which was obtained by ten-step phase-shifting method and a global phase unwrapping algorithm based on the least-squares method, as shown in Fig. 6(a). To obtain the absolute phases, the wrapped phases predicted by the network were unwrapped into the phase distribution shown in Fig. 6(b). As can be seen from Fig. 6, the surface of the first mobile phone glass panel is relatively smooth and has fewer pits, and the surface of the second mobile phone glass panel is relatively rough and has more pits. Figure 6(c) shows the unwrapped residual distribution between the neural network and ground truth. To quantitatively analyze the unwrapped phase quality, we calculated the mean absolute error (MAE) between the network result and ground truth, and recorded the MAE under the residual distribution map, as shown in Fig. 6. The results show that the proposed network reconstructs the surface topography of these two mobile phone glass panels well.

Fig. 6. Grayscale maps of the ground truth (ten-step phase-shifting method) and the absolute phases predicted by the network, corresponding to the red dashed box area in Fig. 5(d). (a) Grayscale map of the ground truth. (b) Grayscale map of our method. (c) The residual distribution map between (a) and (b). The solid and dashed lines on test2 were used for one-dimensional comparison in the horizontal and vertical directions respectively, as shown in Fig. 7.

We further compared the absolute phase values along the horizontal and vertical lines on test2 (solid line for the horizontal direction and dashed line for the vertical direction), as shown in Fig. 7. Figures 7(a) and 7(b) show that the results of our method almost coincide with the absolute phase curves of the ground truth; the phase recovery accuracy of this method is therefore almost comparable to the ground truth. For further comparison, the minimum MAEs of the predicted phase for the proposed method and the method in Ref. [30] are listed in Table 2. The networks in Ref. [30], which use traditional depthwise separable residual convolution blocks, have a minimum MAE of 0.028 rad, while our network has a minimum MAE of 0.0056 rad. Therefore, compared with the result in Ref. [30], the phase recovery accuracy of our network is about 5 times higher. These experiments prove that the method can recover high-precision absolute phase information from a single-frame fringe pattern.

Fig. 7. One-dimensional comparison between the results of our method and the ground truth. (a) Comparison of phase values at the solid line of Fig. 6. (b) Comparison of phase values at the dashed line of Fig. 6.

Table 2. The minimum MAEs of the proposed method and the method in Ref. [30]

To further quantitatively analyze the quality of the 3D surface reconstruction, the reconstructed 3D shapes of a mobile phone glass panel and a concave mirror are shown in Fig. 8. Experimental results indicate that the MAEs between the results of the proposed method and the ground truth are $2.8112 \times {10^{ - 5}}mm$ (the iPhone glass panel) and $1.3489 \times {10^{ - 4}}mm$ (the concave mirror), respectively. Compared with the MAE of the 3D shape result in Ref. [30] ($0.00075mm$), the error of the proposed method is thus reduced by at least 80%. This experiment proves that the proposed method not only extracts high-precision phase distributions from a single fringe pattern, but also completes high-quality single-shot 3D shape measurement.

Fig. 8. 3D measurement results of a mobile phone glass panel and a concave mirror. The height distribution is obtained by integrating the unwrapped phases in both the X and Y directions. (a) Distorted fringe patterns in both directions. (b) The 3D measurement results of our method. (c) The 3D measurement results of the ground truth. (d) The residual distribution between (b) and (c).
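The paper does not spell out its slope-integration scheme. A common least-squares choice for recovering height from the two gradient fields is the Frankot–Chellappa algorithm, sketched here for slope maps gx and gy derived from the unwrapped phases (the phase-to-slope scaling is system-dependent and omitted); this is an illustration, not necessarily the authors' implementation.

import numpy as np

def frankot_chellappa(gx, gy):
    """Least-squares integration of a gradient field via the Fourier domain."""
    H, W = gx.shape
    fx = np.fft.fftfreq(W).reshape(1, -1)                # spatial frequencies in x
    fy = np.fft.fftfreq(H).reshape(-1, 1)                # spatial frequencies in y
    Gx, Gy = np.fft.fft2(gx), np.fft.fft2(gy)
    denom = (2j * np.pi) * (fx**2 + fy**2)
    denom[0, 0] = 1.0                                    # avoid division by zero at DC
    Z = (fx * Gx + fy * Gy) / denom
    Z[0, 0] = 0.0                                        # zero-mean height
    return np.real(np.fft.ifft2(Z))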

3.2 Modulation retrieval based on deep learning and SMAT

After training for 300 epochs, the loss function of the network converged; the training took about 11 hours. The test data sets were then input into the trained neural network to predict the modulations ${B^{dl}}$.

Figure 9 shows the two test samples and the results of the neural network prediction. The trained neural network took the test fringe patterns ${I^{dl}}$ shown in Fig. 9(a) as inputs to predict the modulations ${B^{dl}}$, as shown in Fig. 9(b). To verify the effectiveness of the proposed method, the neural network built in this research was compared with ground truth ${B^t}$, which is shown in Fig. 9(c). Figure 9(d) shows the residual distribution between our method and ground truth. To quantitatively analyze the quality of the modulation, we calculated MAE between our method and ground truth, and recorded the MAE under the residual distribution map.

Fig. 9. Prediction results for two test samples. (a) Input fringe patterns ${I^{dl}}$. (b) Modulations ${B^{dl}}$ predicted by the network. (c) Ground truth ${B^t}$. (d) Residual distribution between (b) and (c). The solid and dashed lines on test2 were used for one-dimensional comparison in the horizontal and vertical directions respectively, as shown in Fig. 10.

We further compared the modulation values along the horizontal and vertical lines on test2 (solid line for the horizontal direction and dashed line for the vertical direction), as shown in Fig. 10. Figures 10(a) and 10(b) show that the results of our method almost coincide with the modulation curves of the ground truth; the modulation recovery accuracy of the proposed method is therefore almost comparable to the ground truth. These experiments prove that the proposed method can recover high-precision modulation information from a single-frame fringe pattern.

Fig. 10. One-dimensional comparison between the results of our method and the ground truth. (a) Comparison of modulation values at the solid line of Fig. 9. (b) Comparison of modulation values at the dashed line of Fig. 9.

To further evaluate the performance of the proposed method for detecting defects on specular object surfaces, we measured a reticle with scratches of different depths and widths, and compared the measurement result with the ground truth. The measurement result is shown in Fig. 11. The scratch depths are $1.5\mu m$, $2\mu m$, $5\mu m$ and $10\mu m$ from left to right, and the scratch widths are $30\mu m$, $30\mu m$, $20\mu m$, $20\mu m$, $10\mu m$, $10\mu m$, $5\mu m$ and $5\mu m$ from top to bottom. It can be seen from Fig. 11(b) and Fig. 11(c) that the network successfully detects all scratches on the reticle. Figure 11(d) shows the residual distribution between the proposed method and the ground truth. To quantitatively analyze the quality of modulation detection, we calculated the MAE between the proposed method and the ground truth, and recorded it under the residual distribution map.

Fig. 11. Measurement results of the reticle. (a) Input fringe pattern ${I^{dl}}$. (b) Modulation ${B^{dl}}$ predicted by the network. (c) Ground truth ${B^t}$. (d) Residual distribution between (b) and (c). The dashed line on the reticle was used for one-dimensional comparison in the vertical direction, as shown in Fig. 12.

We further compared the modulation values along the vertical line on the reticle (dashed line), as shown in Fig. 12. Figure 12 shows that the result of our method almost coincides with the modulation curve of the ground truth, and all eight scratches are successfully detected. This experiment proves that the proposed method can detect scratches with a depth of $1.5\mu m$ and a width of $5\mu m$. Therefore, the proposed method has excellent performance in defect detection on specular surfaces.

Fig. 12. One-dimensional comparison between the results of our method and the ground truth. Comparison of modulation values at the dashed line.

From the above experimental results, we can see that the proposed method has the following advantages. First, the proposed network achieves excellent training and testing results with a small data set (200 samples in this paper). Second, the proposed network extracts high-precision modulation information from a single-frame fringe pattern and can detect scratches with a depth of $1.5\mu m$ and a width of $5\mu m$. Third, the proposed network extracts higher-precision phase information from a single-frame fringe pattern, with a phase recovery accuracy about 5 times that in Ref. [30].

It should be noted that, compared with phase retrieval, deep learning-based single-frame, high-precision modulation retrieval is more difficult, since the defects reflected by the modulation often occupy few pixels in the image. When training a deep learning model for prediction, problems of overfitting, fringe residues and detail loss occur frequently, which seriously affect the detection of defect information. We finally realized single-shot, high-precision modulation retrieval based on deep learning through optimization of the network structure, careful production of the data sets, adjustment of parameters, etc. This is the first success of deep learning in the field of modulation information prediction, which is of great significance for high-speed, high-accuracy defect detection of specular surfaces and for implementing the well-known modulation-ordering-based phase unwrapping algorithm [32] with deep learning techniques.

4. Conclusions

In this work, we propose a deep learning-based phase measuring deflectometry for 3D shape measurement and defect detection of specular objects. We demonstrate for the first time that deep learning techniques can be used to recover high-precision modulation distributions from a single-frame fringe pattern under SMAT, enabling fast and high-quality defect detection of specular surfaces. The proposed network can also be used to recover higher-precision phase distributions on specular object surfaces from a single-frame fringe pattern under PMD, enabling fast and higher-quality 3D shape measurement of specular surfaces. The experimental results show that, compared with the existing deep learning method in PMD, the accuracy of the phase results predicted by this method is about 5 times higher and almost comparable to the ground truth (ten-step phase-shifting method). The accuracy of the modulation results is also close to the ground truth, and the proposed method can detect scratches with a depth of $1.5\mu m$ and a width of $5\mu m$.

The phase and modulation retrieval experiments verify that the network is highly portable, and the successful detection of different types of specular surfaces demonstrates the generalization ability of the network. The effectiveness of the proposed method opens up possibilities for the combined use of PMD and SMAT to improve each other's performance and provides a broad prospect for further research and development. We believe that this method can provide new insights for fast and high-quality 3D shape measurement and defect detection of specular objects.

Funding

National Natural Science Foundation of China (62075032, 61875033).

Disclosures

The authors declare that there are no conflicts of interest related to this article.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. J. Qian, S. Feng, T. Tao, Y. Hu, K. Liu, S. Wu, Q. Chen, and C. Zuo, “High-resolution real-time 360° 3D model reconstruction of a handheld object with fringe projection profilometry,” Opt. Lett. 44(23), 5751–5754 (2019). [CrossRef]

2. J. Salvi, S. Fernandez, T. Pribanic, and X. Llado, “A state of the art in structured light patterns for surface profilometry,” Pattern Recognit. 43(8), 2666–2680 (2010). [CrossRef]

3. F. Chen, G. M. Brown, and M. Song, “Overview of three-dimensional shape measurement using optical methods,” Opt. Eng. 39(1), 8–22 (2000). [CrossRef]  

4. S. S. Gorthi and P. Rastogi, “Fringe Projection Techniques: Whither we are?” Opt. Lasers Eng. 48(2), 133–140 (2010). [CrossRef]  

5. S. Zhang, “Recent progresses on real-time 3D shape measurement using digital fringe projection techniques,” Opt. Lasers Eng. 48(2), 149–158 (2010). [CrossRef]  

6. V. Srinivasan, H. C. Liu, and M. Halioua, “Automated phase-measuring profilometry of 3-D diffuse objects,” Appl. Opt. 23(18), 3105–3108 (1984). [CrossRef]  

7. Z. Wu, W. Guo, and Q. Zhang, “High-speed three-dimensional shape measurement based on shifting Gray-code light,” Opt. Express 27(16), 22631–22644 (2019). [CrossRef]  

8. L. Su, X. Su, W. Li, and L. Xiang, “Application of modulation measurement profilometry to objects with surface holes,” Appl. Opt. 38(7), 1153–1158 (1999). [CrossRef]  

9. M. Lu, X. Su, Y. Cao, Z. You, and M. Zhong, “Modulation measuring profilometry with cross grating projection and single shot for dynamic 3D shape measurement,” Opt. Lasers Eng. 87, 103–110 (2016). [CrossRef]  

10. J. Huang, W. Chen, and X. Su, “Application of two-dimensional wavelet transform in the modulation measurement profilometry,” Opt. Eng. 56(3), 034105 (2017). [CrossRef]  

11. L. Huang, M. Idir, C. Zuo, and A. Asundi, “Review of phase measuring deflectometry,” Opt. Lasers Eng. 107, 247–257 (2018). [CrossRef]  

12. Y. Huang, H. Yue, Y. Fang, W. Wang, and Y. Liu, “Structured-light modulation analysis technique for contamination and defect detection of specular surfaces and transparent objects,” Opt. Express 27(26), 37721–37735 (2019). [CrossRef]  

13. Z. Zhang, Y. Wang, S. Huang, Y. Liu, C. Chang, F. Gao, and X. Jiang, “Three-Dimensional Shape Measurements of Specular Objects Using Phase-Measuring Deflectometry,” Sensors 17(12), 2835 (2017). [CrossRef]  

14. H. Yue, Y. Wu, B. Zhao, Z. Ou, and Y. Liu, “A carrier removal method in phase measuring deflectometry based on the analytical carrier phase description,” Opt. Express 21(19), 21756–21765 (2013). [CrossRef]  

15. F. Buggenthin, F. Buettner, P. S. Hoppe, M. Endele, M. Kroiss, M. Strasser, M. Schwarzfischer, D. Loeffler, K. D. Kokkaliaris, and O. Hilsenbeck, “Prospective identification of hematopoietic lineage choice by deep learning,” Nat. Methods 14(4), 403–406 (2017). [CrossRef]  

16. S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: towards real-time object detection with region proposal networks,” in Neural Information Processing Systems (2015), pp. 91–99.

17. F. Chollet, “Xception: Deep Learning with Depthwise Separable Convolutions,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2017), pp. 1800–1807.

18. K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2016), pp. 770–778.

19. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) (Springer, 2015).

20. S. Feng, C. Zuo, Y. Hu, Y. Li, and Q. Chen, “Deep-learning-based fringe-pattern analysis with uncertainty estimation,” Optica 8(12), 1507–1510 (2021). [CrossRef]  

21. S. Feng, C. Zuo, L. Zhang, W. Yin, and Q. Chen, “Generalized framework for non-sinusoidal fringe analysis using deep learning,” Photonics Res. 9(6), 1084–1098 (2021). [CrossRef]  

22. J. Shi, X. Zhu, H. Wang, L. Song, and Q. Guo, “Label enhanced and patch based deep learning for phase retrieval from single frame fringe pattern in fringe projection 3d measurement,” Opt. Express 27(20), 28929–28943 (2019). [CrossRef]  

23. S. Feng, Q. Chen, G. Gu, T. Tao, L. Zhang, Y. Hu, W. Yin, and C. Zuo, “Fringe pattern analysis using deep learning,” Adv. Photonics 1(02), 1 (2019). [CrossRef]  

24. S. Feng, C. Zuo, W. Yin, G. Gu, and Q. Chen, “Micro deep learning profilometry for high-speed 3d surface imaging,” Opt. Lasers Eng. 121, 416–427 (2019). [CrossRef]  

25. S. Van der Jeught and J. J. Dirckx, “Deep neural networks for single shot structured light profilometry,” Opt. Express 27(12), 17091–17101 (2019). [CrossRef]  

26. W. Yin, Q. Chen, S. Feng, T. Tao, L. Huang, M. Trusiak, A. Asundi, and C. Zuo, “Temporal phase unwrapping using deep learning,” Sci. Rep. 9(1), 20175 (2019). [CrossRef]  

27. J. Qian, S. Feng, T. Tao, Y. Hu, Y. Li, Q. Chen, and C. Zuo, “Deep-learning-enabled geometric constraints and phase unwrapping for single-shot absolute 3d shape measurement,” APL Photonics 5(4), 046105 (2020). [CrossRef]  

28. J. Qian, S. Feng, Y. Li, T. Tao, J. Han, Q. Chen, and C. Zuo, “Single-shot absolute 3d shape measurement with deep-learning-based color fringe projection profilometry,” Opt. Lett. 45(7), 1842–1845 (2020). [CrossRef]  

29. Y. Li, J. Qian, S. Feng, Q. Chen, and C. Zuo, “Composite fringe projection deep learning profilometry for single-shot absolute 3D shape measurement,” Opt. Express 30(3), 3424–3442 (2022). [CrossRef]  

30. G. Qiao, Y. Huang, Y. Song, H. Yue, and Y. Liu, “A single-shot phase retrieval method for phase measuring deflectometry based on deep learning,” Opt. Commun. 476, 126303 (2020). [CrossRef]

31. M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L. Chen, “MobileNetV2: Inverted Residuals and Linear Bottlenecks,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2018), pp. 4510–4520.

32. Y. Zhu, Z. Luan, Q. Yang, D. Li, W. Lu, and L. Liu, “Improved reliability-guided phase unwrapping algorithm based on the fringe modulation and second-order phase difference,” Optik 118(4), 175–180 (2007). [CrossRef]  

