
Single-shot 3D measurement of highly reflective objects with deep learning


Abstract

Three-dimensional (3D) measurement methods based on fringe projection profilometry (FPP) have been widely applied in industrial manufacturing. Most FPP methods adopt phase-shifting techniques and require multiple fringe images, which limits their application in dynamic scenes. Moreover, industrial parts often have highly reflective areas that lead to overexposure. In this work, a single-shot high dynamic range 3D measurement method combining FPP with deep learning is proposed. The proposed deep learning model includes two convolutional neural networks: an exposure selection network (ExSNet) and a fringe analysis network (FrANet). The ExSNet utilizes a self-attention mechanism to enhance highly reflective areas prone to overexposure, achieving high dynamic range in single-shot 3D measurement. The FrANet consists of three modules that predict wrapped phase maps and absolute phase maps. A training strategy that directly opts for the best measurement accuracy is proposed. Experiments on an FPP system showed that the proposed method predicts an accurate optimal exposure time under the single-shot condition. A pair of moving standard spheres with overexposure was measured for quantitative evaluation. The proposed method reconstructed the standard spheres over a large range of exposure levels, with diameter prediction errors of 73 µm (left) and 64 µm (right) and a center-distance prediction error of 49 µm. An ablation study and a comparison with other high dynamic range methods were also conducted.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

In recent years, three-dimensional (3D) optical measurement has gained great interest in research and applications for its non-contact and low-cost nature. Growing demand for high-speed 3D measurement is witnessed in biomedicine, computer vision, and industrial manufacturing. Fringe projection profilometry (FPP) is the most popular 3D optical measurement method. By projecting a set of fringe patterns onto objects, FPP derives 3D coordinates using fringe analysis and triangulation.

The first step in FPP is the calibration of the structured light system. Calibration methods can be classified into phase-height models and triangulation-based models [1]. The key steps of fringe analysis are extracting the wrapped phase from fringe patterns and unwrapping it to obtain the continuous phase information of the target. Phase-shifting [2] and temporal phase unwrapping (TPU) [3] are commonly used phase retrieval and unwrapping methods in FPP. Zeng et al. [4] proposed a self-unwrapping phase-shifting method for 3D shape measurement. They embedded a space-varying phase shift (SPS) that uniquely determines the fringe order into sinusoidal patterns, and then extracted it to retrieve the absolute phase by pixelwise calculation. The absolute phase was retrieved without external information or priors. Wang et al. [5] presented a 3D measurement method for rigid moving objects based on phase-shifting and three-pitch heterodyne phase unwrapping (TPHPU) algorithms. This method preprocesses the fringe patterns of moving objects in 3D space instead of applying complex phase compensation, and inherits the accuracy and robustness of phase-shifting and TPHPU algorithms. Hu et al. [6] addressed microscopic 3D measurement of shiny surfaces with a multi-frequency phase-shifting scheme. They calculated the phase of highlighted areas from a subset of the phase-shifted fringe images, solving the problem that defocus of dense fringes and complex surface reflectivity degrade fringe quality. Li et al. [7] designed an improved temporal phase unwrapping method based on super-grayscale multi-frequency grating projection. The captured super-grayscale patterns yield high-resolution phase information, so that the reliable unit-frequency phase can guide the low-frequency and high-frequency fringe order calculation more precisely and improve measurement accuracy. Li et al. [8] proposed a 3D reconstruction framework based on a modified three-wavelength phase unwrapping algorithm and a phase error compensation method. The three-wavelength phase unwrapping algorithm improves the 3D frame rate, and the phase error compensation method reduces the 3D measurement error. Servin et al. [9] combined co-phased profilometry and two-step temporal phase unwrapping for 3D fringe projection profilometry. The proposed profilometer can measure highly discontinuous objects while minimizing shadows and maximizing phase sensitivity.

With the development of deep learning, convolutional neural networks have been applied in 3D measurement. Lin et al. [10] designed a multi-stage convolutional neural network for fringe pattern denoising. Their method is competitive with state-of-the-art denoising methods in the spatial or transform domain. Yao et al. [11] proposed a multi-purpose neural network to calculate the absolute phase map from a few patterns, including one sinusoidal fringe pattern and two code-based patterns. The well-trained multi-purpose network can retrieve the absolute phase map, reducing the number of patterns required by phase-shifting techniques. Yao et al. [12] introduced a super-resolution technique for dense 3D reconstruction, in which the fringe resolution is extended using a dual-dense block super-resolution network (DdBSRN).

However, the methods above require phase-shifting fringe images at multiple frequencies, which reduces measurement speed. Although they provide highly accurate measurement data for static objects by capturing multiple fringe images, their performance can degrade due to vibration and movement between image shots, which hinders their use in dynamic scenes.

Single-shot methods extract phase maps from one fringe image, so they are robust against movement and offer fast fringe acquisition and low cost. Thus, single-shot methods are desirable for dynamic 3D measurement. Methods based on spatial demodulation are commonly used in single-shot 3D measurement, including Fourier transform profilometry (FTP) [13], windowed Fourier transform profilometry (WFTP) [14], and wavelet transform profilometry (WTP) [15]. Despite their viability in dynamic scenes, spatial demodulation methods suffer from spectrum aliasing and spectrum leakage; they are therefore sensitive to noise and have low accuracy, especially when measuring objects with strong texture. Single-shot methods based on deep learning are more accurate and robust than traditional methods, making them more viable in practical applications.

Single-shot methods based on deep learning started by introducing deep networks into phase retrieval. Deep networks were first used to predict the numerator and denominator of the wrapped phase [16–18]. The ground-truth numerator and denominator are prepared using phase-shifting techniques. These methods yielded better results than traditional demodulation methods. Networks that directly output wrapped phase maps were also designed [19–22]. Variations of U-Net were adopted as the network architecture for fringe analysis, with the calculation of the numerator and denominator concealed in hidden layers. Instead of retrieving the wrapped phase from a single fringe image, some researchers proposed to transform a single fringe image into phase-shifting fringe images [23–25]. The predicted fringe images were then processed using phase-shifting techniques, which serves the same purpose as single-shot 3D measurement. In addition to phase retrieval, phase unwrapping methods based on deep learning have been developed [26–30]. These methods use deep networks to predict fringe order maps from wrapped phase maps, which either avoids the requirement of multi-frequency fringe images or improves the accuracy of phase unwrapping. End-to-end networks that directly predict height maps have also been researched [31–33]. These networks adopt a large number of simulated or real fringe images with ground-truth height maps for supervised training. FCN, AEN, and U-Net have been evaluated as end-to-end architectures, among which U-Net yielded the most accurate results. However, end-to-end networks cannot take advantage of important intermediate information.

The objects measured in industrial manufacturing often have highly reflective areas, and acquiring fringes of these objects leads to overexposure and information loss. Thus, high dynamic range (HDR) measurement is needed to handle the large range of exposure levels in fringe images.

Multi-exposure and single-exposure HDR methods have been proposed for 3D measurement and other applications. Jiang et al. [34] presented a 3D scanning technique for highly reflective surfaces based on phase-shifting fringe projection. By fusing raw fringe patterns acquired with different camera exposure times and different projector illumination intensities, a synthetic fringe image avoiding saturation and under-illumination was generated. Yonesaka et al. [35] introduced a digital holography method using HDR imaging to improve the quality of the reconstructed image. The HDR imaging process includes estimating the camera response function and synthesizing multiple holograms. Song et al. [36] developed an HDR method using binary pattern projection for 3D measurement. They proposed to calculate HDR fringe images from radiance maps estimated over multiple exposures to relieve saturation. Multi-exposure methods use different exposure times or light intensities to capture fringe images and fuse them to recover the details lost in highly reflective areas; they are time-consuming and cannot fulfil the need for real-time 3D measurement. Cogalan et al. [37] proposed a single-exposure HDR method using camera sensors that perform per-pixel exposure modulation, and developed a joint frame deinterlacing and denoising algorithm using deep neural networks. Jiang et al. [38] introduced a simple HDR method that projects inverted fringe patterns to complement regular patterns without multiple exposures. Inverted fringe patterns were used in lieu of, or combined with, regular patterns depending on the saturation of the regular patterns. Wu et al. [39] reconstructed HDR objects using motion tracking and phase-shifting profilometry, exploiting motion to change the position of saturated points on objects for single-exposure HDR measurement. Single-exposure methods mostly aim to increase the number of unsaturated fringe patterns in highly reflective areas. They either require additional or advanced devices that increase the cost of the FPP system, or lose accuracy in phase computation due to extra fringes or tracking. Most HDR methods are incapable of measuring moving objects in dynamic scenes.

Few deep-learning-based single-shot 3D optical measurement methods attempt to solve the high dynamic range problem. Zhang et al. [40] increased the dynamic range of 3D measurement by using deep learning for phase calculation, which broadens the dynamic range of three-step phase shifting by a factor of 4.8. Yang et al. [41] designed a deep network to detect low-modulation regions and enhance fringe pattern details in these regions. A standard metal gauge block with a height of 5 mm was measured, with the RMSE improving from 0.55 mm to 0.06 mm. These methods are fully algorithm-based and cannot entirely avoid information loss in overexposed areas. Liu et al. [42] developed a hand-crafted metric for exposure selection and used deep learning networks to enhance the fringe image, achieving coverage rates similar to an HDR method with ten exposures (97.6% versus 98.0%). Despite not being single-shot, their method increases the accuracy in highly reflective areas within a single exposure.

In this work, a single-shot high dynamic range 3D measurement method with deep-learning-based exposure selection and fringe analysis is proposed. The proposed method adopts two convolutional neural networks that cooperate in training and measurement: an exposure selection network (ExSNet) and a fringe analysis network (FrANet). The ExSNet utilizes a self-attention mechanism to enhance overexposed and underexposed areas and solve the high dynamic range problem in single-shot 3D measurement. To address the problem that, in traditional methods, the phase map error increases as the number of captured fringe patterns decreases, the FrANet is constructed to predict wrapped phase maps and absolute phase maps, increasing the accuracy of phase retrieval and phase unwrapping. A novel training strategy that enables the ExSNet to learn optimal exposures is introduced. During measurement, the ExSNet predicts the optimal exposure time and the exposure-adjusted fringe image is processed by the FrANet. The proposed method is evaluated on a real-world FPP system to verify the effectiveness of the deep-learning-based exposure selection and fringe analysis. Results and discussions are also presented.

2. Single-shot 3D measurement of highly reflective objects

The proposed method utilizes two deep neural networks, the ExSNet and the FrANet, to select the optimal exposure time and to conduct fringe analysis. The FrANet is first trained using fringe images captured under different exposure times. Then the exposure time corresponding to the best network performance is used as the ground truth to train the ExSNet. The process of attaching the ground-truth optimal exposure time to fringe images for training is named optimal exposure time annotation. In single-shot measurement, a fringe image with an inappropriate exposure time is fed into the ExSNet; the exposure time is adjusted based on the result, and the recaptured image is processed by the FrANet. An overview of the proposed method is shown in Fig. 1. In contrast to methods that enhance fringe images with overexposure or underexposure, the proposed method adjusts the exposure time to avoid information loss that cannot be fully recovered by such algorithm-based methods. The ground-truth optimal exposure time is selected directly from the FrANet results, which guarantees accurate 3D measurement using exposure-selected images. Details of the network architectures and training strategies for the FrANet and ExSNet are introduced in the following sections.

Fig. 1. Overview of the proposed method.

2.1 Architecture of the fringe analysis network

The fringe analysis network FrANet fulfils the tasks of phase retrieval and phase unwrapping, addressing the problem that, in traditional methods, the phase map error increases as the number of captured fringe patterns decreases. The architecture of the network is depicted in Fig. 2. The network consists of three improved U-Net modules for phase retrieval, phase unwrapping, and refinement. These modules share a similar U-Net-variant architecture with several improvements: stacked convolutional layers at the lowest resolution in all modules, additional down-sampling and up-sampling in the phase unwrapping module to enlarge the receptive field, and removal of batch normalization in the refinement module. U-Net, a widely applied network architecture originally designed for segmentation, is known for requiring few training samples and for high efficiency in feature extraction and image resolution recovery. This is beneficial for processing fringe images, so U-Net has been adopted in many single-shot methods based on deep learning. However, single-shot methods often adopt a single U-Net for phase retrieval, phase unwrapping, or height map prediction. Phase unwrapping, and height map prediction (which essentially contains phase unwrapping), are ill-posed tasks that need long-range information to remove phase ambiguity. A single U-Net cannot guarantee the acquisition of long-range information, so this work adopts two extra improved U-Net modules to perform phase unwrapping and to refine the absolute phase map after the wrapped phase map is retrieved. The phase unwrapping module, with a large receptive field, gathers long-range information, and the refinement module, with a small receptive field, refines the details. The phase retrieval module predicts the wrapped phase map from a single-shot fringe image. The phase unwrapping module takes both the predicted wrapped phase map and the fringe image to generate the absolute phase map. The refinement module takes the fringe image and all predicted phase maps as input and outputs the refined absolute phase map. Despite using three modules, the FrANet is a lightweight network, with at most 48 filters in any layer. In contrast to large networks with 512 or more filters in one layer, the FrANet is less time-consuming and more viable for 3D measurement.
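The three-module pipeline can be summarized in a brief sketch (a minimal illustration, not the authors' released code: `TinyUNet` is an assumed stand-in for the improved U-Net module of Fig. 3, and the channel counts of the inputs are inferred from the module descriptions above):

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Stand-in for the improved U-Net module of Fig. 3 (internals omitted)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.LeakyReLU(),
            nn.Conv2d(16, out_ch, 1), nn.Sigmoid(),  # sigmoid output in [0, 1]
        )
    def forward(self, x):
        return self.body(x)

class FrANet(nn.Module):
    """Three-stage pipeline: phase retrieval -> phase unwrapping -> refinement."""
    def __init__(self):
        super().__init__()
        self.retrieval = TinyUNet(1, 1)    # fringe image -> wrapped phase
        self.unwrapping = TinyUNet(2, 1)   # fringe + wrapped phase -> absolute phase
        self.refinement = TinyUNet(3, 1)   # fringe + both phase maps -> refined phase

    def forward(self, fringe):
        # Sigmoid outputs are scaled to each module's phase range (Section 2.1).
        wrapped = 2 * torch.pi * self.retrieval(fringe)
        absolute = 128 * torch.pi * self.unwrapping(torch.cat([fringe, wrapped], 1))
        refined = 128 * torch.pi * self.refinement(
            torch.cat([fringe, wrapped, absolute], 1))
        return wrapped, absolute, refined
```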

Fig. 2. Architecture of the FrANet.

The architecture of the improved U-Net module used in this work is shown in Fig. 3, including convolutional layers, down-sampling layers, up-sampling layers, concatenation layers, and an output layer. Convolutional layers use convolution with kernel size 3 × 3 and stride 1, batch normalization (BN), and leaky ReLU as the activation function. Down-sampling layers consist of convolution with kernel size 2 × 2 and stride 2, batch normalization, and leaky ReLU. The numbers of filters are identical at the same resolution of each module: 16, 24, 24, 32, 32, and 48 corresponding to 1, 1/2, 1/4, 1/8, 1/16, and 1/32 of the input resolution. Residual blocks operate at the lowest resolution using four convolutional layers with skip connections; the number of filters is 48 for these layers. Up-sampling layers utilize transposed convolution with kernel size 2 × 2 and stride 2 to recover the features to higher resolution, also followed by batch normalization and leaky ReLU. The number of filters at each resolution is 16, 24, 24, 16, and 8 from left to right. Up-sampled features are concatenated with the output of the convolutional layers at the same resolution in the down-sampling path. After that, two convolutional layers with kernel sizes 1 × 1 and 3 × 3, respectively, decrease the number of channels and conduct further feature extraction; both include convolution with stride 1, batch normalization, and leaky ReLU. The output layer uses convolution and a sigmoid function to convert features into the output dimension and value range of each module. The value range is from zero to one after the sigmoid activation and is then adjusted by multiplying by a scalar: the range of predicted wrapped phases is from 0 to 2π, while the range of predicted and refined absolute phases is from 0 to 128π.
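A hedged sketch of these basic layers (kernel sizes, strides, and activations as stated in the text; the helper names and the 1 × 1 kernel of the output convolution are our assumptions):

```python
import torch
import torch.nn as nn

def conv_layer(in_ch, out_ch):
    # Convolutional layer: 3x3 convolution (stride 1), BN, leaky ReLU.
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, stride=1, padding=1),
                         nn.BatchNorm2d(out_ch), nn.LeakyReLU())

def down_layer(in_ch, out_ch):
    # Down-sampling layer: 2x2 convolution with stride 2, BN, leaky ReLU.
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 2, stride=2),
                         nn.BatchNorm2d(out_ch), nn.LeakyReLU())

def up_layer(in_ch, out_ch):
    # Up-sampling layer: 2x2 transposed convolution with stride 2, BN, leaky ReLU.
    return nn.Sequential(nn.ConvTranspose2d(in_ch, out_ch, 2, stride=2),
                         nn.BatchNorm2d(out_ch), nn.LeakyReLU())

class OutputLayer(nn.Module):
    # Convolution + sigmoid, scaled to the module's range
    # (2*pi for wrapped phase, 128*pi for absolute phase).
    def __init__(self, in_ch, out_ch, scale):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 1)  # kernel size assumed 1x1
        self.scale = scale
    def forward(self, x):
        return self.scale * torch.sigmoid(self.conv(x))
```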

Fig. 3. Architecture of the improved U-Net module used in this work.

2.2 Architecture of the exposure selection network

The exposure selection network ExSNet fulfils the task of predicting the optimal exposure time and solves the high dynamic range problem in single-shot 3D measurement. The architecture of the ExSNet is shown in Fig. 4. The network adopts a convolutional architecture similar to classifiers: it extracts features at multiple resolutions, and the number of filters increases as the resolution decreases. The network consists of down-sampling layers, convolutional layers, one attention module, and one pooling layer. Each down-sampling layer consists of convolution with kernel size 2 × 2 and stride 2, batch normalization, and leaky ReLU as the activation function, followed by a convolutional layer with kernel size 3 × 3 and stride 1, batch normalization, and leaky ReLU. The numbers of filters remain the same at the same resolution: 16, 32, 64, 128, 256, 512, and 1024 corresponding to 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the input resolution. The optimal exposure time is determined by the highly reflective areas of the measured objects, which can be a tiny portion of the fringe image. Using convolutional layers alone, these areas are easily overwhelmed and the optimal exposure time cannot be accurately predicted. A self-attention mechanism is therefore adopted in the ExSNet: the attention module explores long-range spatial relationships and emphasizes the critical areas. The architecture of the attention module in the ExSNet is shown in Fig. 5.

Fig. 4. Architecture of the ExSNet.

Fig. 5. Architecture of attention module in ExSNet.

A popular self-attention mechanism known as scaled dot product attention can be expressed as:

$$A(Q,K,V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V,$$
where A is the result of the attention operation, Q is the query, K is the key, V is the value, and $d_k$ is the dimension of K. In image processing, the query, key, and value are obtained by convolutions of the features. When calculating the dot product, the image features are stretched into one dimension so that the attention mechanism can be executed in the spatial domain.

The attention module uses multi-head scaled dot-product attention followed by convolution with kernel size 1 × 1. The number of heads is set to 8. The pooling layer uses global average pooling over the spatial dimensions to convert each feature map into a single number. A subsequent convolution and sigmoid function further convert these numbers into one scalar in the range of exposure times.
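A minimal sketch of this design, assuming standard multi-head scaled dot-product attention over flattened spatial positions (PyTorch's nn.MultiheadAttention stands in for the authors' implementation, a linear layer stands in for the final convolution, and the exposure-time bounds are taken from Section 3):

```python
import torch
import torch.nn as nn

class SpatialSelfAttention(nn.Module):
    """Multi-head self-attention over flattened spatial positions, then 1x1 conv."""
    def __init__(self, channels, num_heads=8):
        super().__init__()
        # channels must be divisible by num_heads.
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):                   # x: (B, C, H, W)
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)  # stretch features to (B, H*W, C)
        out, _ = self.attn(seq, seq, seq)   # Q, K, V from the same features
        return self.proj(out.transpose(1, 2).reshape(b, c, h, w))

class ExposureHead(nn.Module):
    """Global average pooling, then a sigmoid mapped to the exposure-time range."""
    def __init__(self, channels, t_min=3000.0, t_max=45000.0):  # range from Sec. 3
        super().__init__()
        self.fc = nn.Linear(channels, 1)
        self.t_min, self.t_max = t_min, t_max

    def forward(self, x):                   # x: (B, C, H, W)
        pooled = x.mean(dim=(2, 3))         # spatial global average pooling
        return self.t_min + (self.t_max - self.t_min) * torch.sigmoid(self.fc(pooled))
```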

2.3 Training strategy

In the training process, the FrANet is trained first and the ExSNet is then trained based on its performance. Networks for fringe analysis often require a large amount of training data, and data acquisition is time-consuming because the ground truth is prepared using phase-shifting techniques. In this work, unsupervised learning [43] is adopted for pre-training of the FrANet. Unsupervised learning does not require ground-truth phase maps and is therefore more efficient in data acquisition. Fringe patterns at two frequencies are projected onto the measured objects, and fringe images are captured using a real-world FPP system. A large number of images is collected to form the unsupervised data set, which meets the data requirements of deep learning and avoids overfitting during training. Since unsupervised learning does not rely on real phase maps or height maps as ground truth, each training sample only needs two fringe images: one single-shot fringe image, without phase shifting, at each frequency. Thus, time-consuming phase-shifting projection during data collection is avoided and the memory required for data storage is reduced. After the reprojection process, the unsupervised loss is calculated between the real and reprojected fringe images:

$$L_{\mathrm{unsup}} = L(I_h,\bar{I}_h(\bar{\varphi}_h)) + L(I_h,\bar{I}_h(\bar{\Phi}_h)) + L(I_h,\bar{I}_h(\bar{\Phi}_h^r)) + L(I_l,\bar{I}_l(\bar{\Phi}_l)) + L(I_l,\bar{I}_l(\bar{\Phi}_l^r)),$$
where $L$ denotes the L1 loss function, $\bar{I}$ is the reprojected fringe image, the subscripts h and l denote the high and low frequencies, $\bar{\varphi}$ is the predicted wrapped phase map, and $\bar{\Phi}$ and $\bar{\Phi}^r$ are the predicted primary and refined absolute phase maps, respectively. Since the gradient scales of the five loss terms are similar, the same weight is applied to each term. The network is trained with the unsupervised loss until convergence.
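A sketch of Eq. (2), assuming a reproject function that renders a fringe image from a predicted phase map (its internals follow the reprojection model of [43] and are omitted here; the dictionary keys are our naming):

```python
import torch.nn.functional as F

def unsupervised_loss(I_h, I_l, preds, reproject):
    """L1 reprojection loss of Eq. (2). `preds` holds the five predicted phase
    maps; reproject(phase_map, freq) renders a fringe image from a phase map."""
    # Equal weights on the five terms, since their gradient scales are similar.
    return (F.l1_loss(I_h, reproject(preds["wrapped_h"], "high"))
            + F.l1_loss(I_h, reproject(preds["abs_h"], "high"))
            + F.l1_loss(I_h, reproject(preds["abs_h_refined"], "high"))
            + F.l1_loss(I_l, reproject(preds["abs_l"], "low"))
            + F.l1_loss(I_l, reproject(preds["abs_l_refined"], "low")))
```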

After pre-training, the FrANet is trained using supervised learning, which enables the network to learn directly from ground-truth phase maps prepared using phase-shifting profilometry. Twelve-step phase-shifting fringe patterns at four frequencies are projected onto the measured objects, again using the real-world FPP system. These 48 fringe images are captured under different exposure times for image fusion using the time-consuming multi-exposure HDR method. The 48 fused fringe images are then used to obtain the ground-truth phase maps of one group of training samples at different exposure times. The intensity of a phase-shifting fringe image can be described as:

$${I_n}(x,y) = {I_b}(x,y) + {I_a}(x,y)\cos [\varphi (x,y) + \frac{{2\pi n}}{N}],$$
where n is the index of the phase-shifting pattern, $I_n$ is the intensity of the image, x and y are the horizontal and vertical pixel coordinates, $I_b$ is the background intensity, $I_a$ is the fringe amplitude, $\varphi$ is the wrapped phase, and N is the number of phase-shifting steps. The wrapped phase can be retrieved using the following equation:
$$\varphi (x,y) = \arctan \frac{{\sum\limits_{n = 0}^{N - 1} {{I_n}(x,y)\sin \left( {\frac{{2\pi n}}{N}} \right)} }}{{\sum\limits_{n = 0}^{N - 1} {{I_n}(x,y)\cos \left( {\frac{{2\pi n}}{N}} \right)} }}.$$
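Equations (3) and (4) translate directly into a few lines of code; a minimal sketch (NumPy-based, with arctan2 used to resolve the quadrant of the arctangent):

```python
import numpy as np

def wrapped_phase(images):
    """Wrapped phase via Eq. (4) from an (N, H, W) stack of N-step
    phase-shifted fringe images."""
    N = images.shape[0]
    n = np.arange(N).reshape(-1, 1, 1)
    num = np.sum(images * np.sin(2 * np.pi * n / N), axis=0)
    den = np.sum(images * np.cos(2 * np.pi * n / N), axis=0)
    # arctan2 resolves the quadrant; the modulo maps the result into
    # [0, 2*pi), matching the wrapped-phase range used by the FrANet.
    return np.arctan2(num, den) % (2 * np.pi)
```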

Absolute phase maps are calculated using temporal phase unwrapping. The fringe order ${k_h}$ of the high-frequency image is computed using the following equation:

$$k_h(x,y) = \mathrm{round}\left(\frac{(f_h/f_l)\,\Phi_l(x,y) - \varphi_h(x,y)}{2\pi}\right),$$
where the round function rounds its argument to the nearest integer, $f_h$ is the high frequency, $f_l$ is the low frequency, $\Phi_l$ denotes the absolute phase of the low-frequency image, and $\varphi_h$ denotes the wrapped phase of the high-frequency image. At a frequency of 1, the absolute phase equals the wrapped phase; thus, the absolute phase of the high-frequency images can be calculated. Compared to fully supervised methods, fewer training samples are needed to fine-tune the network with prior knowledge, and the effect of overfitting is also reduced. Thus, less time is required to collect extra fringe images for computing ground-truth phase maps.
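A sketch of Eq. (5) and the resulting unwrapping step (the function name is ours):

```python
import numpy as np

def unwrap_high_frequency(phi_h, Phi_l, f_h, f_l):
    """Temporal phase unwrapping: fringe order k_h from Eq. (5), then
    the absolute phase Phi_h = phi_h + 2*pi*k_h."""
    k_h = np.round(((f_h / f_l) * Phi_l - phi_h) / (2 * np.pi))
    return phi_h + 2 * np.pi * k_h
```

Supervised loss is computed as: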
$$L_{\mathrm{sup}} = L(\varphi,\bar{\varphi}) + L(\Phi,\bar{\Phi}) + L(\Phi,\bar{\Phi}^r).$$

The ExSNet aims to predict the optimal exposure time for 3D measurement; the ideal prediction therefore corresponds to the highest 3D measurement accuracy achieved by the FrANet. The 3D reconstruction results of the FrANet are compared with the ground truth, and the accuracy for each set of fringe images is evaluated. The ground-truth optimal exposure time is the exposure time with the highest accuracy. Images captured at different exposure times serve as input to the ExSNet, and the L1 loss is calculated between the predicted exposure time and the ground truth. The ExSNet is trained using supervised learning. Since predicting the optimal exposure time is much easier than dense phase prediction, the ExSNet does not suffer from overfitting, and no specially designed training strategy is required once the ground-truth optimal exposure time has been obtained.

During 3D measurement, a single-shot fringe pattern at one frequency is projected onto the measured objects and a fringe image is captured using the real-world FPP system. The captured fringe image, with a possibly inappropriate exposure time, is used as the test sample and fed to the trained ExSNet to obtain the predicted optimal exposure time. The fringe image is then recaptured with the predicted exposure time and fed to the trained FrANet to obtain the refined absolute phase map of the measured object. The test samples were not used for training the FrANet. Lastly, single-shot 3D measurement is achieved by converting the refined absolute phase map into 3D coordinates using triangulation.
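The full measurement loop can be summarized in a short sketch (the camera interface, the triangulate routine, and the calibration object are placeholders for the actual system code, not a published API):

```python
def measure_single_shot(camera, exsnet, franet, triangulate, calib):
    """Single-shot HDR measurement: capture, select exposure, recapture,
    analyze fringes, triangulate."""
    image = camera.capture()                  # possibly over- or underexposed
    t_opt = exsnet(image)                     # predicted optimal exposure time (us)
    camera.set_exposure(t_opt)
    image = camera.capture()                  # recaptured at the selected exposure
    _, _, refined_phase = franet(image)       # refined absolute phase map
    return triangulate(refined_phase, calib)  # 3D coordinates of the scene
```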

3. Experiments

To verify the effectiveness of the proposed method, an FPP system consisting of a blue-light projector (Tengju Technology, China) and two digital cameras (Daheng Imaging, China) is built. The FPP system is shown in Fig. 6(a); only the left camera is used in the experiments. Figure 6(b) displays an industrial part from the test set, which is used for evaluation of the proposed method. The camera captures three-channel images with a resolution of 1296 × 964 pixels, and the resolution of the projector is 1280 × 720 pixels. During data acquisition, the projector projects vertical blue fringe patterns onto the measured objects. Pre-training of the FrANet uses 2000 groups of fringe images in each epoch, where each group contains fringe images of the same scene at two frequencies. The high and low frequencies are set to 64 and 9, respectively. For the data acquisition in supervised learning, 15 exposure times from 3000 µs to 45000 µs are evenly selected. Twenty groups of twelve-step phase-shifting fringe images are captured at four frequencies (64, 16, 4, and 1), each group covering all 15 exposure times. In the following sections, implementation details of the ExSNet and FrANet are first provided; the optimal exposure time predicted by the network and the 3D measurement accuracy are then evaluated; an ablation study on exposure selection, attention mechanism, and fringe analysis is conducted; and lastly the proposed method is compared with other HDR methods for 3D measurement.

Fig. 6. FPP system and an industrial part used for 3D measurement.

3.1 Implementation details

The proposed method is implemented in PyTorch and trained on a single RTX 2070 SUPER GPU. The Adam optimizer is used with parameters $\beta_1 = 0.9$ and $\beta_2 = 0.999$. During pre-training of the FrANet, the initial learning rate is set to $1 \times 10^{-3}$ and decreases to $1 \times 10^{-4}$ after 30 epochs; the whole pre-training process takes 40 epochs. Images are cropped to $1280 \times 768$ pixels. During fine-tuning of the FrANet, the initial learning rate is set to $3 \times 10^{-4}$ for 100 epochs and then changes to $8 \times 10^{-5}$ for another 100 epochs. During training of the ExSNet, the learning rate is set to $1 \times 10^{-3}$ for 200 epochs. A batch size of 2 and multiple data augmentations, including hue adjustment and random noise, are adopted. In one iteration, a batch of fringe images is processed by the networks; the size of the input tensor is [batch size, number of channels, height, width]. After training, the network performs single-shot high dynamic range measurement with a run time of 1.23 s in the same setup. The processes included in the run-time calculation are capturing a fringe image, predicting the optimal exposure time using the ExSNet, recapturing the fringe image, and yielding 3D coordinates using the FrANet.
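The pre-training optimizer and schedule correspond to a setup along these lines (a sketch; the stand-in model and the MultiStepLR milestone are our rendering of the stated schedule):

```python
import torch
import torch.nn as nn

# Stand-in model; in practice this is the FrANet of Section 2.1.
franet = nn.Conv2d(1, 1, kernel_size=3, padding=1)

# Adam with beta1 = 0.9, beta2 = 0.999; pre-training runs at lr 1e-3
# for 30 epochs, then 1e-4 for the remaining 10 of the 40 epochs.
optimizer = torch.optim.Adam(franet.parameters(), lr=1e-3, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30], gamma=0.1)

for epoch in range(40):
    # ... one pass over the 2000 image groups with the Eq. (2) loss ...
    scheduler.step()
```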

3.2 Exposure selection evaluation

To evaluate whether the ExSNet can predict an accurate optimal exposure time, fringe images of an industrial part not in the training set are first captured, and the ground-truth absolute phase maps are obtained using the multi-exposure HDR method and twelve-step phase-shifting techniques. Since the goal of exposure selection is accurate 3D measurement, the exposure time whose fringe image yields the highest 3D measurement accuracy with the FrANet is taken as the ground-truth optimal exposure time. By processing fringe images captured at a set of exposure times from 3000 µs to 45000 µs with the FrANet, the ground-truth optimal exposure time corresponding to the best network performance is found to be 18000 µs. A single fringe image is processed by the ExSNet to predict the optimal exposure time, and the exposure-selected fringe image generates the predicted absolute phase map through the FrANet. The single fringe image adopted as input is the image with zero phase shift, corresponding to $n = 0$ in Eq. (3). To compare the ExSNet with other exposure selection methods, the exposure-selected fringe images derived from the other methods are also processed by the FrANet, and the absolute phase error maps are calculated. The results, including exposure-selected fringe images and absolute phase error maps, are shown in Fig. 7. Exposure selection methods in computer vision aim to improve image quality for better detection, tracking, or other vision tasks; they include a gradient-based method and an entropy-based method. Liu et al. designed an exposure selection method for 3D measurement [42]. The result of this method depends on whether the fringe image is cropped, due to the influence of background with modulation; thus, results with and without cropping are both presented.

Fig. 7. Exposure selection results using different methods. The left column is exposure selected fringe images and the right column is absolute phase error maps.

The exposure times selected by the gradient-based method, the entropy-based method, the uncropped exposure selection method [42], the cropped exposure selection method [42], and the proposed ExSNet are 36207 µs, 25841 µs, 34155 µs, 14565 µs, and 17593 µs, respectively. The corresponding average absolute phase errors are 23.36 mrad, 12.74 mrad, 15.55 mrad, 8.39 mrad, and 5.22 mrad. The optimal exposure time, the selected exposure times and their errors, and the corresponding average absolute phase errors of the different methods are shown in Fig. 8.

Fig. 8. Optimal exposure time, selected exposure time and error, and corresponding average absolute phase error using different methods.

Since the gradient-based and entropy-based methods focus on the general quality of images, they are not sensitive to overexposure and select exposure times longer than optimal. The exposure selection method [42] applied to uncropped images is inaccurate due to large areas of modulated background, while the cropped variant is more accurate but yields slightly underexposed images. The results show that the ExSNet predicts an accurate optimal exposure time, at which the FrANet almost reaches peak performance, and yields the most accurate absolute phase map among the compared exposure selection methods.

3.3 3D measurement evaluation

To verify the effectiveness of the proposed method for single-shot high dynamic range 3D measurement, a fringe image with overexposed areas is captured and processed. For a detailed evaluation of the optimal exposure time and the phase maps predicted by the ExSNet and FrANet, the corresponding ground truths are also calculated. As in the exposure selection evaluation, fringe images at a set of exposure times from 3000 µs to 45000 µs are used to generate the ground-truth phase maps. Single-shot images at different exposure times are then processed by the FrANet to determine the ground-truth optimal exposure time, which is 12000 µs. The evaluation of exposure selection using the ExSNet and phase map prediction using the FrANet is shown in Fig. 9. The exposure time selected by the ExSNet is 13559 µs, which falls between the optimal exposure time and its adjacent value in the set. The average errors of the predicted wrapped phase map, the predicted absolute phase map, and the refined absolute phase map are 362.3 mrad, 19.1 mrad, and 5.5 mrad, respectively. Due to the learning principle of deep neural networks, the predicted phase maps contain values even in areas without modulation. Since the inputs of all three modules in the FrANet include the single-shot fringe image, the wrapped phase map predicted by the phase retrieval module can still be refined during phase unwrapping. The predicted absolute phase map is largely accurate, with some error at the borders between different fringe orders. The refined absolute phase map, as the final output of the FrANet, fixes the error at these borders and has high accuracy. The right side of the industrial part is overexposed in the original fringe image, but the network predictions in this area are equally accurate and show no blur. This demonstrates that the proposed method is capable of accurate single-shot high dynamic range 3D measurement.

Fig. 9. Evaluation of exposure selection using ExSNet and phase map prediction using FrANet.

To quantitatively evaluate the proposed method in 3D measurement of moving objects, we also measure moving standard spheres with a diameter of 30 mm ± 2 µm and a center distance of 100 mm ± 2 µm. The results are shown in Fig. 10. The dynamic scene is shown in Fig. 10(a), where the standard spheres are attached to an inextensible rope to form a pendulum. Fringe images are captured when the standard spheres reach the lowest height and highest speed. The pendulum length is 695 mm and the standard spheres are released at a horizontal deviation of 150 mm, corresponding to a highest speed of 0.567 m/s. The original fringe image is used for exposure selection, and the predicted optimal exposure time is 9012 µs. The exposure-adjusted image shown in Fig. 10(b) is fed into the FrANet to complete the 3D measurement. The network predicts the wrapped phase map and the absolute phase map, as shown in Figs. 10(c) and (d). The predicted absolute phase map is converted into 3D coordinates using previously obtained calibration parameters. The predicted diameters are 30.073 mm for the left sphere and 29.936 mm for the right sphere, with errors of 73 ± 2 µm and 64 ± 2 µm; the predicted center distance is 100.049 mm, with an error of 49 ± 2 µm. Note that the white standard spheres and the black rod differ greatly in exposure level, which makes it challenging to reconstruct them in one exposure. The proposed method can recover information in underexposed regions and reconstruct the standard spheres and rod from one fringe image.
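As a consistency check, the quoted speed follows from energy conservation for a pendulum of length $L = 695$ mm released at a horizontal deviation $d = 150$ mm (assuming $g = 9.8\ \mathrm{m/s^2}$):

$$v = \sqrt{2g\left(L - \sqrt{L^2 - d^2}\right)} = \sqrt{2 \times 9.8 \times \left(0.695 - \sqrt{0.695^2 - 0.150^2}\right)}\ \mathrm{m/s} \approx 0.567\ \mathrm{m/s},$$

where $L - \sqrt{L^2 - d^2}$ is the height drop between the release point and the lowest point.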

Fig. 10. Results of 3D measurement of moving standard spheres.

3.4 Ablation study

To provide a clear view of the contributions made by each technique proposed in this work, an ablation study on exposure selection, attention mechanism, and fringe analysis is conducted. The results are shown in Table 1. The relative root-mean-square error (relative RMSE) between predicted and ground-truth 3D coordinates on the test set is selected as the metric. The relative RMSE can be expressed as:

$$\mathrm{Relative\ RMSE} = \sqrt{\frac{\sum_{x,y} |c(x,y) - g(x,y)|^2}{\sum_{x,y} |a(x,y) - g(x,y)|^2}},$$
where $(x,y)$ is the pixel location, c is the predicted 3D coordinates using the current subset of techniques, g is the ground-truth 3D coordinates, and a is the predicted 3D coordinates using all techniques. The test set mainly consists of fringe images with overexposed areas. The ground-truth 3D coordinates are calculated using twelve-step phase shifting and temporal phase unwrapping. "Without exposure selection" means that raw fringe images without exposure adjustment are used as the input of the FrANet. "With exposure selection but without attention mechanism" means that the attention module in the ExSNet is replaced with a convolutional layer. "FTP for fringe analysis" means that, after exposure selection, fringe images are processed using FTP instead of the FrANet. The results demonstrate the contribution of each technique to the 3D measurement accuracy.
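A sketch of Eq. (7) (the arrays hold per-pixel 3D coordinates; the function name is ours):

```python
import numpy as np

def relative_rmse(pred_current, pred_full, ground_truth):
    """Relative RMSE of Eq. (7): error of the current (ablated) variant,
    normalized by the error of the method with all techniques enabled."""
    num = np.sum(np.abs(pred_current - ground_truth) ** 2)
    den = np.sum(np.abs(pred_full - ground_truth) ** 2)
    return np.sqrt(num / den)
```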

Table 1. Results of ablation study on exposure selection, attention mechanism and fringe analysis techniques.

3.5 Comparison with other methods

The proposed method is compared with other high dynamic range methods for 3D measurement: the multi-exposure HDR method using three exposures and the single-exposure method of Liu et al. [42]. Since multi-exposure HDR methods are time-consuming, only three exposures are used for fringe image fusion: 12000 µs, 18000 µs, and 24000 µs. Phase maps are calculated using four-step phase-shifting techniques and temporal phase unwrapping. The single-exposure method of Liu et al. [42] includes exposure selection using a hand-crafted metric and image enhancement using a deep neural network; the enhanced fringe images also generate phase maps using four-step phase-shifting techniques and temporal phase unwrapping. This method is the one most closely related to the proposed method. Instead of using a hand-crafted metric, the proposed method uses deep-learning-based exposure selection that directly opts for the best measurement accuracy. Moreover, image enhancement is integrated into the FrANet for single-shot measurement instead of requiring a separate network.

Results of the different high dynamic range 3D measurement methods are shown in Figs. 11–13. For the multi-exposure HDR method, three exposures cannot fully recover the underexposed and overexposed areas: in the fused fringe image, the left side of the industrial part is underexposed and a large portion of it is overexposed. Thus, the absolute phase map calculated by this method has the largest error. The single-exposure method of Liu et al. [42] conducts exposure selection using its hand-crafted metric and enhances the selected fringe image. The enhancement alleviates overexposure and underexposure by making the modulation more evenly distributed. However, image enhancement by a deep neural network cannot predict accurate modulation in overexposed regions due to information loss, which introduces extra error into the absolute phase map. The proposed method selects the optimal exposure time directly based on FrANet performance. Since the FrANet integrates the process of image enhancement, no extra enhancement is needed. The absolute phase map predicted by the proposed method is the most accurate among these methods.

Fig. 11. Results of multi-exposure HDR method using three exposures.

Fig. 12. Results of single-exposure method of Liu et al. [42].

Fig. 13. Results of the proposed method.

The absolute phase maps of the different methods are converted into 3D coordinates using triangulation; the 3D reconstruction results are shown in Fig. 14. The result of the multi-exposure method with three exposures misses the left side of the industrial part due to the large phase error in this area. The single-exposure method of Liu et al. [42] cannot fully recover the information lost in image enhancement, which leads to scattered points in the reconstruction result. The proposed method accurately reconstructs the industrial part from a single-shot fringe image; most of the remaining reconstruction error comes from the challenging underexposed region on the left side. The absolute phase map errors and reconstruction errors of the different high dynamic range methods are shown in Fig. 15. The absolute phase map error is the mean absolute error between the absolute phase map retrieved in 3D measurement and the ground-truth absolute phase map; the reconstruction error is the root-mean-square error between the measured 3D coordinates and the ground truth. The proposed method is superior to the other HDR methods in both phase map prediction and reconstruction.

Fig. 14. 3D reconstruction using different high dynamic range methods.

Fig. 15. Absolute phase map error and reconstruction error of different high dynamic range methods.

4. Discussions

Single-shot measurement of objects with highly reflective areas has always been challenging. Traditional HDR methods that capture fringe images in a single exposure mostly rely on additional or advanced devices, which increases the cost of the FPP system. HDR methods with multiple exposures can hardly achieve real-time measurement and are not viable for single-shot measurement in dynamic scenes. Prior attempts at single-shot HDR 3D measurement have been very limited. Methods using deep neural networks for image enhancement and fringe analysis have been proposed; they generally rely on the image enhancement ability of deep learning to increase the accuracy in highly reflective or underexposed areas. The problem with such algorithm-based methods is that the information loss in overexposed and underexposed regions cannot be fully recovered, since only one fringe image is processed.

In this work, a method combining deep-learning-based exposure selection with deep-learning-based fringe analysis is proposed to achieve single-shot HDR 3D measurement. The proposed method has multiple advantages over other 3D measurement methods. Compared to multi-shot methods using phase-shifting techniques and temporal phase unwrapping, the proposed method offers higher measurement speed and can measure moving objects. Multi-exposure HDR methods are more time-consuming than the proposed method and cannot achieve real-time measurement. Single-exposure HDR methods are either more costly, requiring additional or advanced devices, or less accurate, compensating for information loss with algorithms alone. The proposed method only requires one projector and one camera in the FPP system. The exposure time is selected by the ExSNet, directly opting for optimal FrANet performance, which avoids information loss in overexposed areas and enables single-shot HDR measurement. Predicting the optimal exposure time with a deep network and recapturing the fringe image improves upon the accuracy of fully algorithm-based methods. Only a single fringe image is required for phase map calculation, which enables the proposed method to measure objects that move between image shots. Further research may include optimizing the architectures of the ExSNet and FrANet, refining the set of exposure times to reduce the exposure selection interval, and applying the proposed method to 3D measurement with temporal noise.

5. Conclusion

To tackle the high dynamic range problem in single-shot 3D measurement, an ExSNet for exposure selection combined with a FrANet for fringe analysis of the exposure-adjusted fringe images has been proposed. The ExSNet adopts a self-attention mechanism to enhance highly reflective and underexposed areas, achieving high dynamic range single-shot 3D measurement. The FrANet consists of three modules that predict wrapped phase maps and absolute phase maps, improving the accuracy of phase retrieval and phase unwrapping. Experiments on an FPP system have shown that the proposed method predicts an accurate optimal exposure time and that the corresponding absolute phase map prediction is superior to that of other exposure selection methods. Moreover, accurate high dynamic range 3D measurement was performed on industrial parts, and moving standard spheres with overexposure were measured: the prediction errors for the diameters were 73 µm (left) and 64 µm (right), and the prediction error for the center distance was 49 µm. The ablation study has shown that exposure selection, attention mechanism, and fringe analysis techniques all contributed to the 3D measurement accuracy. The multi-exposure HDR method using three exposures and the single-exposure method of other researchers have been compared with the proposed method; the proposed method yielded the most accurate results in absolute phase map prediction and 3D reconstruction.

Funding

National Natural Science Foundation of China (52075100).

Disclosures

The authors declare no conflict of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. S. Feng, C. Zuo, L. Zhang, T. Tao, Y. Hu, W. Yin, J. Qian, and Q. Chen, “Calibration of fringe projection profilometry: A comparative review,” Opt. Lasers Eng. 143, 106622 (2021). [CrossRef]  

2. C. Zuo, S. Feng, L. Huang, T. Tao, W. Yin, and Q. Chen, “Phase shifting algorithms for fringe projection profilometry: a review,” Opt. Lasers Eng. 109, 23–59 (2018). [CrossRef]  

3. C. Zuo, L. Huang, M. Zhang, Q. Chen, and A. Asundi, “Temporal phase unwrapping algorithms for fringe projection profilometry: a comparative review,” Opt. Lasers Eng. 85, 84–103 (2016). [CrossRef]  

4. J. Zeng, W. Ma, W. Jia, Y. Li, H. Li, X. Liu, and M. Tan, “Self-Unwrapping Phase-Shifting for Fast and Accurate 3-D Shape Measurement,” IEEE Trans. Instrum. Meas. 71, 1–12 (2022). [CrossRef]  

5. J. Wang, Y. Yang, M. Shao, and Y. Zhou, “Three-Dimensional Measurement for Rigid Moving Objects Based on Multi-Fringe Projection,” IEEE Photonics J. 12(4), 1–14 (2020). [CrossRef]  

6. Y. Hu, Q. Chen, Y. Liang, S. Feng, T. Tao, and C. Zuo, “Microscopic 3D measurement of shiny surfaces based on a multi-frequency phase-shifting scheme,” Opt. Lasers Eng. 122, 1–7 (2019). [CrossRef]  

7. H. Li, Y. Cao, Y. Wan, C. Xu, H. Zhang, H. An, and H. Wu, “An improved temporal phase unwrapping based on super-grayscale multi-frequency grating projection,” Opt. Lasers Eng. 153, 106990 (2022). [CrossRef]  

8. L. Li, Y. Zheng, K. Yang, X. Su, Y. Wang, X. Chen, Y. Wang, and B. Li, “Modified three-wavelength phase unwrapping algorithm for dynamic three-dimensional shape measurement,” Opt. Commun. 480, 126409 (2021). [CrossRef]  

9. M. Servin, M. Padilla, G. Garnica, and E. Gonzalez, “Profilometry of three-dimensional discontinuous solids by combining two-steps temporal phase unwrapping, co-phased profilometry and phase-shifting interferometry,” Opt. Lasers Eng. 87, 75–82 (2016). [CrossRef]  

10. B. Lin, S. Fu, C. Zhang, F. Wang, and Y. Li, “Optical fringe patterns filtering based on multi-stage convolution neural network,” Opt. Lasers Eng. 126, 105853 (2020). [CrossRef]  

11. P. Yao, S. Gai, and F. Da, “Coding-Net: A multi-purpose neural network for Fringe Projection Profilometry,” Opt. Commun. 489, 126887 (2021). [CrossRef]  

12. P. Yao, S. Gai, and F. Da, “Super-resolution technique for dense 3D reconstruction in fringe projection profilometry,” Opt. Lett. 46(18), 4442–4445 (2021). [CrossRef]  

13. M. Takeda and K. Mutoh, “Fourier transform profilometry for the automatic measurement of 3-D object shapes,” Appl. Opt. 22(24), 3977–3982 (1983). [CrossRef]  

14. Q. Kemao, “Two-dimensional windowed Fourier transform for fringe pattern analysis: Principles, applications and implementations,” Opt. Lasers Eng. 45(2), 304–317 (2007). [CrossRef]  

15. J. Zhong and J. Weng, “Spatial carrier-fringe pattern analysis by means of wavelet transform: Wavelet transform profilometry,” Appl. Opt. 43(26), 4993–4998 (2004). [CrossRef]  

16. S. Feng, Q. Chen, G. Gu, T. Tao, L. Zhang, Y. Hu, W. Yin, and C. Zuo, “Fringe pattern analysis using deep learning,” Adv. Photonics 1(02), 1 (2019). [CrossRef]  

17. W. Yin, J. Zhong, S. Feng, T. Tao, J. Han, L. Huang, Q. Chen, and C. Zuo, “Composite deep learning framework for absolute 3D shape measurement based on single fringe phase retrieval and speckle correlation,” J. Phys. Photonics 2(4), 045009 (2020). [CrossRef]

18. B. Zhang, S. Lin, J. Lin, and K. Jiang, “Single-shot high-precision 3D reconstruction with color fringe projection profilometry based BP neural network,” Opt. Commun. 517, 128323 (2022). [CrossRef]  

19. W. Hu, H. Miao, K. Yan, and Y. Fu, “A fringe phase extraction method based on neural network,” Sensors 21(5), 1664 (2021). [CrossRef]  

20. H. Nguyen, E. Novak, and Z. Wang, “Accurate 3D reconstruction via fringe-to-phase network,” Measurement 190, 110663 (2022). [CrossRef]  

21. J. Liang, J. Zhang, J. Shao, B. Song, B. Yao, and R. Liang, “Deep convolutional neural network phase unwrapping for fringe projection 3D imaging,” Sensors 20(13), 3691 (2020). [CrossRef]  

22. J. Shi, X. Zhu, H. Wang, L. Song, and Q. Guo, “Label enhanced and patch based deep learning for phase retrieval from single frame fringe pattern in fringe projection 3D measurement,” Opt. Express 27(20), 28929–28943 (2019). [CrossRef]  

23. H. Yu, X. Chen, Z. Zhang, C. Zuo, Y. Zhang, and D. Zheng, “Dynamic 3-D measurement based on fringe-to-fringe transformation using deep learning,” Opt. Express 28(7), 9405–9418 (2020). [CrossRef]  

24. Y. Yang, Q. Hou, Y. Li, Z. Cai, X. Liu, J. Xi, and X. Peng, “Phase error compensation based on Tree-Net using deep learning,” Opt. Lasers Eng. 143, 106628 (2021). [CrossRef]  

25. H. Nguyen and Z. Wang, “Accurate 3D shape reconstruction from single structured-light image via fringe-to-fringe network,” Photonics 8(11), 459 (2021). [CrossRef]  

26. W. Yin, Q. Chen, S. Feng, T. Tao, L. Huang, M. Trusiak, A. Asundi, and C. Zuo, “Temporal phase unwrapping using deep learning,” Sci. Rep. 9(1), 20175 (2019). [CrossRef]  

27. J. Qian, S. Feng, T. Tao, Y. Hu, Y. Li, Q. Chen, and C. Zuo, “Deep-learning-enabled geometric constraints and phase unwrapping for single-shot absolute 3d shape measurement,” APL Photonics 5(4), 046105 (2020). [CrossRef]  

28. P. Yao, S. Gai, Y. Chen, W. Chen, and F. Da, “A multi-code 3D measurement technique based on deep learning,” Opt. Lasers Eng. 143, 106623 (2021). [CrossRef]  

29. W. Li, J. Yu, S. Gai, and F. Da, “Absolute phase retrieval for a single-shot fringe projection profilometry based on deep learning,” Opt. Eng. 60(06), 064104 (2021). [CrossRef]  

30. A. Nguyen, O. Rees, and Z. Wang, “Learning-based 3D imaging from single structured-light image,” Graphical Models 126, 101171 (2023). [CrossRef]  

31. S. Van der Jeught and J. J. Dirckx, “Deep neural networks for single shot structured light profilometry,” Opt. Express 27(12), 17091–17101 (2019). [CrossRef]  

32. H. Nguyen, Y. Wang, and Z. Wang, “Single-shot 3d shape reconstruction using structured light and deep convolutional neural networks,” Sensors 20(13), 3718 (2020). [CrossRef]  

33. H. Nguyen, T. Tan, Y. Wang, and Z. Wang, “Three-dimensional shape reconstruction from single-shot speckle image using deep convolutional neural networks,” Opt. Lasers Eng. 143, 106639 (2021). [CrossRef]  

34. H. Jiang, H. Zhao, and X. Li, “High dynamic range fringe acquisition: A novel 3-D scanning technique for high-reflective surfaces,” Opt. Lasers Eng. 50(10), 1484–1493 (2012). [CrossRef]  

35. R. Yonesaka, Y. Lee, P. Xia, T. Tahara, Y. Awatsuji, K. Nishio, and O. Matoba, “High dynamic range digital holography and its demonstration by off-axis configuration,” IEEE Trans. Ind. Inform. 12(5), 1658–1663 (2016). [CrossRef]  

36. Z. Song, H. Jiang, H. Lin, and S. Tang, “A high dynamic range structured light means for the 3D measurement of specular surface,” Opt. Lasers Eng. 95, 8–16 (2017). [CrossRef]  

37. U. Cogalan and A. O. Akyuz, “Deep Joint Deinterlacing and Denoising for Single Shot Dual-ISO HDR Reconstruction,” IEEE Trans. Image Process. 29, 7511–7524 (2020). [CrossRef]

38. C. Jiang, T. Bell, and S. Zhang, “High dynamic range real-time 3d shape measurement,” Opt. Express 24(7), 7337–7346 (2016). [CrossRef]  

39. K. Wu, Y. Xie, L. Lu, Y. Yin, and J. Xi, “Three-dimensional reconstruction of moving HDR object based on PSP,” Opt. Lasers Eng. 163, 107451 (2023). [CrossRef]  

40. L. Zhang, Q. Chen, C. Zuo, and S. Feng, “High-speed high dynamic range 3D shape measurement based on deep learning,” Opt. Lasers Eng. 134, 106245 (2020). [CrossRef]  

41. G. Yang, G. Yang, M. Yang, N. Zhou, and Y. Wang, “High dynamic range fringe pattern acquisition based on deep neural network,” Opt. Commun. 512, 127765 (2022). [CrossRef]  

42. X. Liu, W. Chen, H. Madhusudanan, J. Ge, and Y. Sun, “Optical measurement of highly reflective surfaces from a single exposure,” IEEE Trans. Ind. Inform. 17(3), 1882–1891 (2021). [CrossRef]  

43. S. Fan, S. Liu, X. Zhang, H. Huang, W. Liu, and P. Jin, “Unsupervised deep learning for 3D reconstruction with dual-frequency fringe projection profilometry,” Opt. Express 29(20), 32547–32566 (2021). [CrossRef]  
