Depth and thermal sensor fusion to enhance 3D thermographic reconstruction

Yanpeng Cao; Baobei Xu; Zhangyu Ye; Jiangxin Yang; Yanlong Cao; Christel-Loic Tisse; Xin Li

doi:10.1364/OE.26.008179

1. Introduction

Infrared thermography provides a passive and noncontact tool to measure surface temperature of target objects. It has been applied in a broad range of applications such as thermal inspection, energy assessment, nondestructive testing, and medical diagnosis. However, existing thermography solutions are mostly based on 2D thermal images and their performance is inherently limited [1]. Static 2D thermal projection of 3D objects cannot provide sufficient information for large-scale and continuous thermal inspection tasks. In comparison, dynamic 3D thermographic model provides a better way to depict temperature distribution of 3D object [2, 3]. 3D geometrical model with incorporated thermal information enables quantitative analysis of object temperature distribution and fast identification of Regions of Interest (ROIs). Moreover, effective fusion of abundant data captured by multimodal sensors leads to noise reduction of range sensor and accuracy improvement of temperature measurement.

Recently, fusion of geometrical and thermal information has received extensive attention as a promising solution to extend thermal imaging from 2D to 3D. However the majority of the existing 3D thermal scanning systems are either stationary or viewpoint restricted [4]. Without a reliable 3D registration functionality, these systems are not capable of performing mobile 3D reconstruction and commonly require offline data processing. The democratization of low-cost depth sensors (e.g., Microsoft Kinect) and GPU-based high performance computing have facilitated great progress in real-time mobile 3D reconstruction [5–7]. The KinectFusion algorithm [5] is one of the most notable depth data based 3D reconstruction solutions. In real-time, it performs dense reconstruction of a desk-sized scene at sub-centimeter resolution by fusing individually captured depth maps into a single volumetric representation. However, KinectFusion algorithm utilizes geometric information alone for camera pose tracking. The resulting reconstruction becomes unstable when inter-frame camera displacement is large. Therefore, 3D thermal scanning systems based on KinectFusion require to move slowly and smoothly around the target object. This constraint severely decreases system mobility, which is critical for dynamic and large-scale thermal monitoring tasks.

In this paper, we present a robust method for mobile and real-time 3D thermographic reconstruction to overcome the aforementioned limitations. Our observation is that thermal information remains robust against viewpoint and illumination changes, and thus it provides stable references to improve performance of 3D geometrical reconstruction. Through effective fusion of thermal and depth information, the proposed Thermal-guided ICP method can generate more accurate 3D reconstruction results using significantly less scanning data. A thermal camera and an RGB-D sensor are geometrically calibrated and used for multimodal data capturing. A coarse-to-fine methodology is utilized to improve robustness of camera pose estimation. This allows us to establish correct correspondences between a pair of 3D point clouds taken from different viewpoints (wide baseline 3D alignment). The pose of sensing device is initially estimated using 3D correspondences found through the maximization of thermal consistency between consecutive infrared images. The coarse pose estimate is further refined by optimizing a combined cost function which imposes both geometric and thermographic constraints. Through multimodal sensor fusion, the proposed Thermal-guided ICP method substantially improves robustness and accuracy of 3D thermographic reconstruction. The contribution of this paper is twofold.

Firstly, unlike most existing 3D thermal imaging solutions in which depth and infrared data are individually used for geometrical reconstruction and thermal mapping, our proposed Thermal-guided ICP combines multimodal information (thermal and depth images) to improve both robustness and accuracy of 3D thermal reconstruction. To the best of our knowledge, this is the first attempt to explore both depth and thermal information to enhance performance of 3D thermographic reconstruction.

Secondly, our method employs an effective coarse-to-fine methodology to improve robustness of camera pose estimation which enables to handle significant camera motion and therefore facilitates dynamic and large-scale 3D thermal scanning applications. Moreover, we demonstrate that the combination of both dense geometric and thermographic camera pose constraints leads to accuracy improvement of 3D modeling.

The paper is organized as follows. We start by reviewing some existing solutions for 3D reconstruction and thermal mapping in Sec. 2. Our multimodal sensing system and the Thermal-guided ICP algorithm are presented in Sec. 3 and Sec. 4, respectively. We then show an extensive experimental comparison of methods for dense 3D modeling in Sec. 5 and conclude the paper in Sec. 6.

2. Related work

In the past decades, a large variety of 3D reconstruction methods have been proposed. One possible solution is to exploit prior knowledge of the scene (e.g., parallelism or coplanarity constraints) to perform single image based 3D reconstruction [8, 9]. Although these approaches only require single images as input, their applicability is limited since can only reconstruct simple 3D geometric shapes. Another possible solution is based on Structure from Motion (SfM) technique which has been successfully applied to generate large-scale 3D geometrical models [10]. Given corresponding 2D image points found in multiple views, sparse 3D point clouds are generated through triangulation. However, this approach only works well under high illumination conditions. Note that such conditions are typically not guaranteed in infrared monitoring tasks (e.g., night vision and security surveillance). In addition, SfM-based methods only produce sparse 3D point clouds, and the generated 3D models have no absolute scale information which is not favorable for thermal diagnosis applications. In [11], expensive terrestrial laser scanning (TLS) systems are used to acquire denser and more accurate 3D geometric information of building facades. However, TLS devices are heavy and bulky which undesirably decreases the mobility of 3D thermal imaging systems.

Recently, a large number of RGB-D based approaches have been proposed for dense 3D reconstruction due to the popularization of low-cost depth sensors (e.g., Microsoft Kinect and Asus Xtion) [5–7, 12]. Given sequentially acquired depth data, KinectFusion [5] applies the iterative closest point (ICP) algorithm to align the current view with the global model. A GPU-based processing pipeline is utilized to accelerate computational performance for real-time applications. An extension to the KinectFusion algorithm is presented in [6, 13, 14]. Dense real-time 3D reconstruction of large-scale scenes is achieved by incorporating a visual odometry in the geometric ICP pipeline. Nießner et al. make use of an extra inertial measurement unit to obtain camera initial position and thus improve the robustness of ICP when large camera motion exists [15]. In [3], a Video-Based Pose Estimation (VBPE) method is combined with the ICP algorithm to reduce the occurrence of wrong pose estimation. However, VBPE method is based on the detection and selection of Good Features To Track (GFTT) [16] which cannot produce good correspondences between wide-baseline images. In [17], an N-frame graph matching algorithm is presented to handle partial matching for robust 3D reconstruction. The algorithm is capable of aligning data sets with small overlap and is robust for handling noisy data obtained by a low-cost RGB-D camera. However, this algorithm involves a number of processing steps, such as feature extraction and multi-frame graph matching, which are computationally expensive and not suitable for real-time implementation.

The idea of combining 3D depth data with 2D thermal images has begun to attract considerable attention, and a number of effective multimodal fusion frameworks have been presented to achieve 3D thermographic reconstruction [2, 4, 18–21]. In [22], the relative pose between a thermal camera and a 3D laser is calculated based on a number of manually selected control points, and 2D thermal images are projected onto laser scanned 3D point clouds. Once again, the mobility of this system is unsatisfactory, and 3D geometric reconstruction is performed offline. In [2], a system comprising a thermal camera and a range sensor is calibrated geometrically. In this system, the external poses of these devices are computed using the ICP algorithm. It is the first handheld system which generates dense 3D object models with both visual and temperature information. However, this ICP-based 3D thermal mapping system is still prone to fail when camera motion is significant. In [18], a computational framework is presented to determine the sub-pixel corresponding temperature for each 3D point, and a real-time system is built to achieve simultaneous measurement of geometry and temperature. Recently, a series of experiments were conducted to identify the major factors that affect the real-time generation of 3D thermograms including distance to object, surface reflection, and sensor motion [4]. Once more, it was shown that a high fidelity 3D thermogram can only be generated by slowly moving the sensors in front of the target object. We noticed that in all existing 3D thermal imaging systems [2, 4, 22], depth and infrared images are individually used for geometrical reconstruction and thermal mapping. In this paper we demonstrate that complementary information captured by multimodal sensors (thermal and depth cameras) can be utilized simultaneously to improve both robustness and accuracy of 3D thermal reconstruction. Our system is capable of completing 360 degrees thermal reconstruction in real-time using only dozens of captured images instead of hundreds as required in existing systems [2, 5, 6].

3. Our system

The 3D thermal imaging system consists of a Xenics Gobi 640 long-wave infrared camera (working spectral band is 8–14 µm) and a Microsoft Kinect v2 RGB-D sensor. Two sensors are rigidly attached together to preserve their relative position and orientation as shown in Fig. 1. The resolution of the infrared camera is 640 × 480 pixels and its frame rate is 50 fps. This infrared camera has been radiometrically calibrated and outputs thermal measurement with ±2°C accuracy for the 0°C–400°C temperature range. The RGB-D camera simultaneously captures a 512 × 424 depth image and a 1920 × 1080 color image, both at a rate of 30 fps. The effective working distance of this depth sensor ranges from 0.5 m to 2.0 m.

Fig. 1 Thermal camera and depth sensor are rigidly attached using an acrylic frame.

Download Full Size | PDF

As shown in Fig. 2(a), a raw infrared image contains noticeable distortion effects. It is required to perform intrinsic calibration of the infrared camera for lens distortion correction. We captured a number of thermal images of a calibration target board made of materials with different emissivities [23] and then applied the algorithm presented by Zhang et al. [24] to calibrate the infrared camera. Figure 2(b) shows the result of lens distortion correction using the computed intrinsic matrix. Then we performed synchronization of RGBD and thermal cameras via motion statistic alignment (TV-L1 optical flow) and computed the optimal time offset that temporally aligns the RGBD and thermal streams according to the method described in [25]. Finally, we followed the mask-based approach described in [26] to compute the relative pose between the thermal camera and the depth sensor. The estimated six degrees of freedom (6DoF) extrinsic matrix is used to accurately align coordinate systems of two sensors. Each point in the 2D thermal images is mapped to its counterpart in the 3D point cloud as shown in Fig. 3.

Fig. 2 Distortion correction of an infrared camera. (a) A raw infrared image with apparent lens distortions and (b) Result of lens distortion compensation. The edge of a square calibration board appears a straight line after distortion correction.

Download Full Size | PDF

Fig. 3 3D alignment of thermal and depth camera coordinate systems. (a) Initial alignment without extrinsic calibration, (b) Alignment result using the estimated extrinsic matrix. Note the heat colored point clouds are obtained by mapping each pixel of 2D thermal image to its counterpart in the 3D point cloud.

Download Full Size | PDF

4. Thermal-guided ICP

We present a Thermal-guided ICP method to improve performance of 3D thermographic reconstruction. The underlying principle is that thermal information remains robust against illumination and viewpoint changes, and thus it provides useful information to support registration of 3D geometric data. This method consists of two major processing steps as illustrated in Fig. 4 and applies a coarse-to-fine scheme for robust camera pose estimation. Firstly, the initial pose of sensing device is estimated using correspondences identified through maximizing the thermal consistency of two consecutive infrared images. Then, the coarse estimation results are further refined by finding the motion parameters that minimize a combined geometric and thermographic loss function. By utilizing complementary information captured by multimodal sensors, our proposed method can handle large camera motion and generate more accurate 3D reconstruction results. A

Fig. 4 Flowchart of the proposed Thermal-guided ICP method.

Download Full Size | PDF

4.1. Initial pose estimation

The ICP algorithm is the most commonly used method for computing camera pose to facilitate global registration of individually captured 3D data. Given the previous global camera pose T_i₋₁, ICP iteratively calculates 3D correspondences between two point clouds captured at time frames i − 1 and i to refine the current global camera pose T_i. Here T =(R|t) contains a 3 × 3 rotation matrix R and a 3D translation vector t. If the camera pose at frame i is not properly initialized (e.g., let’s assume that camera pose is unchanged from frame i − 1 to i and set its initial pose $T_{i}^{0} = T_{i - 1}$ ), many correspondences found through projective data association [5] are incorrect matches and would cause the ICP to get trapped in bad local minima. Therefore, when significant camera motion exists between consecutive frames, ICP-based alignments are prone to fail [17].

Thermographic camera captures thermal radiations emitted or transmitted by object, and thus its output is robust against variations of illumination and shading. Moreover, emissivity of dielectric materials remains invariant to small camera viewing angle changes (i.e., the angle between the direction of emission and the normal to the emitting surface is smaller than 60°) [27]. Therefore, we can consider that thermal information remains consistent during viewpoint and illumination changes and provides a useful reference to improve performance of 3D geometrical reconstruction. Between two consecutive infrared images I_i₋₁ and I_i, we first calculate an approximated transformation that is used to obtain an initial alignment between the two wide baseline 3D point clouds. With the camera moving around the 3D target object, the 2D infrared images can be correlated using an essential matrix [28]. However, using a 6DoF motion model to warp images can only produce satisfactory results when camera motion is small [29]. In our method, we use a simplified 2D translation model to approximate the transformation between consecutive infrared images. It offers a more efficient yet sufficient estimation to the initial alignments of 3D point clouds.

For that we calculate a 2D displacement vector u which maximizes the thermal consistency between I_i₋₁ and I_i. The thermal consistency energy E_tc is defined as

E_{t c} (u) = \sum_{x \in Ω} {(I_{i} (x + u) - I_{i - 1} (x))}^{2},

where x denotes pixel coordinates on the 2D thermal image plane Ω. Note E_tc is a nonlinear least-squares objective and can be minimized iteratively using the Gauss-Newton method. During the k-th iteration, we update u from its previous value u^k−1 by

u^{k} = u^{k - 1} + Δ u .

Then the cost term E_tc becomes

\begin{array}{l} E_{t c} (Δ u) \approx \sum_{x \in Ω} {(\nabla I_{i} (x + u^{k - 1}) Δ u + I_{i} (x + u^{k - 1}) - I_{i - 1} (x))}^{2} \\ = {‖ J_{r_{t c}} Δ u + r_{t c} ‖}^{2}, \end{array}

where r_tc is the residual vector calculated at each image pixel using u^k⁻¹ and

J_{r_{t c}}

is the Jacobian of r_tc, and ∆u is the solution of the following linear system,

J_{r_{t c}}^{⊤} J_{r_{t c}} Δ u = - J_{r_{t c}}^{⊤} r_{t c} .

Compared with common visible images, thermal images contain much less textures and details [30]. Consequently, this objective function E_tc defined using thermal information is smoother and has fewer local minima, which makes the above optimization quite stable and numerically efficient.

We use the computed optimal 2D translation $\hat{u}$ to establish correspondences between two wide baseline 3D point clouds. Let V_i be a 3D vertex map at frame i which maps a point x in the 2D depth image to its corresponding 3D vertex V_i(x), and $V_{i - 1}^{g}$ be a globally registered 3D vertex map at frame i − 1. Firstly, the global camera pose T_i at frame i is initialized with T_i₋₁, and each global 3D point $V_{i - 1}^{g} (x)$ is transformed to the current camera coordinate space and projected into a 2D image pixel with coordinates p. Then this 2D position is shifted by $\hat{u}$ and its new position $p + \hat{u}$ is used as a lookup into the current view 3D vertex map V_i to find $V (p + \hat{u})$ which is the corresponding 3D point of $V_{i - 1}^{g} (x)$ . As shown in Fig. 5, the 2D image translational shift, which is computed by maximizing the thermal consistency between two infrared images, provides useful information to establish correct correspondences between two wide baseline 3D scenes. Finally, we evaluate the compatibility of corresponding points to reject outliers by testing whether their depth difference is lower than a threshold. In our implementation, we empirically set this depth threshold to 0.1 m which is suitable for 3D scanning of desk-size objects using Kinect sensor [5]. Since valid correspondences from two widely separated 3D models might have large spatial distance, their Euclidean distance cannot be used at this initial pose estimation step to reject outliers. We implement this algorithm using GPU and its pseudocode is given in Algorithm 1.

Fig. 5 The computed optimal 2D translation $\hat{u}$ can be used to shift the projection p and establish correct correspondences between two wide baseline scenes. The correspondences are shown in thermal images for visualization.

Download Full Size | PDF

After establishing a number of correspondences between 3D point clouds captured at frames i − 1 and i, we can simply solve a least square fitting [31] to get the rigid transformation $T_{i}^{0}$ that aligns these two corresponded point sets. This $T_{i}^{0}$ is a good rough alignment between the point clouds from frames i − 1 and i. It will be further refined using geometric and thermographic constraints in the next section.

4.2. Pose refinement

Using the initial pose estimation $T_{i}^{0}$ for frame i, the projective data association in ICP could now find reasonably good corresponding point pairs from point clouds of frames i − 1 and i. During each projective data association step, a unique global 3D point $V_{i - 1}^{g} (x)$ in the previous frame i − 1 is transformed to the coordinate system of current view and projected to a 2D pixel p in the image plane. Then projected position p is directly used as a lookup into the current view vertex map V_i to find the closest 3D correspondence V_i(p) for $V_{i - 1}^{g} (x)$ . Each GPU thread calculates the Euclidean distance between two corresponding points as well as the angle between their normals to reject outliers. We empirically set the distance threshold to 0.1 m and the normal threshold to 20° according to the parameter setting of KinectFusion algorithm [5]. For each valid geometrical correspondence (a 3D point which is observable at both frames i − 1 and i), we evaluate their temperature information to generate a number of thermal points of interest. If a 3D point provides valid thermal measurement in previous and current views (the camera viewing angles are both smaller than 60°) and its temperature variation is lower than a threshold (we set this temperature threshold to 1°C), we consider it as a thermal interest point. Algorithm 2 lists the steps to populate a list of geometrical correspondences and a list of thermal points of interest by considering both geometrical and thermographic information.

Algorithm 1. Correspondence Accumulation For Wide Baseline 3D Alignment

View Table | View all tables in this article

Algorithm 2. Accumulation of 3D Correspondences and Thermal Interest Points at k-th Iteration for Pose Refinement

View Table | View all tables in this article

The motion parameters are found by maximizing both geometric and thermographic consistencies between sequential thermal-depth data. Two cost functions, which correspond to geometric and thermographic information respectively, are combined in a weighted sum as

E (T_{i}) = E_{i c p} (T_{i}) + ω E_{t d} (T_{i}),

where ω is the weight which is set empirically to 0.1 to balance the strength of the ICP term E_icp and thermal term E_td. The ICP cost term E_icp calculates the distance of each point in the current view to its corresponding point in the global model as

\begin{array}{l} E_{i c p} (T_{i}) = \frac{1}{M} \sum_{(x, p) \in T} {‖ (V_{i - 1}^{g} (x) - T_{i} V_{i} (p)) \cdot N_{i - 1}^{g} (x) ‖}^{2} \\ = \frac{1}{M} \sum_{(x, p) \in T} {‖ (v_{i - 1}^{g} - T_{i} v_{i}) \cdot n_{i - 1}^{g} ‖}^{2}, \end{array}

where

T

indicates projected 2D positions of reasonable 3D correspondences found through Algorithm 2, M is the number of correspondences in the list

T

,

V_{i - 1}^{g}

and

N_{i - 1}^{g}

are the 3D vertex and normal maps of the previous view in the global coordinate space, v_i = V_i(p) is the corresponding point of

v_{i - 1}^{g} = V_{i - 1}^{g} (x)

in the current frame, and T_i is the estimate of rigid transformation from current frame to the global coordinate system. The thermal cost term E_td in Eq. (5) measures the squared temperature difference of a 3D point shown in two consecutive thermal images I_i₋₁ and I_i using

\begin{array}{l} E_{t d} (T_{i}) = \frac{1}{N} {\sum_{x \in P} (I_{i} (κ (Ψ (V_{i - 1}^{g} (x), T_{i}))) - I_{i - 1} (x))}^{2} \\ = \frac{1}{N} {\sum_{x \in P} (I_{i} (κ (Ψ (v_{i - 1}^{g}, T_{i}))) - I_{i - 1} (x))}^{2}, \end{array}

where

P

is the list of thermal interest points, N is the number of points in the list

P

,

Ψ (v_{i - 1}^{g}, T_{i})

defines a rigid transformation

Ψ (v_{i - 1}^{g}, T_{i}) = T_{i}^{- 1} v_{i - 1}^{g},

and κ(v) is the function that projects a 3D vertex v = (v_x, v_y, v_z) into the image plane:

κ (v) = {[\frac{v_{x} f_{x}}{v_{z}} + c_{x}, \frac{v_{y} f_{y}}{v_{z}} + c_{y}]}^{⊤},

in which f_x and f_y are the focal lengths of thermal camera and (c_x, c_y)^⊤ is the principal point.

Both E_icp and E_td are nonlinear least-squares objectives and can be minimized using the Gauss-Newton method. We use a 6-dimensional vector ξ = (α, β, γ, t_x, t_y, t_z)^⊤ to represent an incremental transformation ∆T, and iteratively solve $T_{i}^{k}$ , following the approach of [29],

\begin{matrix} T_{i}^{k} = Δ T T_{i}^{k - 1} \\ \approx (I + \hat{ξ}) T_{i}^{k - 1}, \end{matrix}

where

\hat{ξ} = (\begin{matrix} 0 & - γ & β & t_{x} \\ γ & 0 & - α & t_{y} \\ - β & α & 0 & t_{z} \\ 0 & 0 & 0 & 0 \end{matrix}) .

The cost term E_icp becomes

\begin{array}{l} E_{i c p} (ξ) \approx \frac{1}{M} {\sum_{(x, p) \in T} ‖ (v_{i - 1}^{g} - (I + \hat{ξ}) T_{i}^{k - 1} v_{i}) \cdot n_{i - 1}^{g} ‖}^{2} \\ = \frac{1}{M} \sum_{(x, p) \in T} ∥ {[\begin{matrix} - v_{i} \times n_{i - 1}^{g} \\ - n_{i - 1}^{g} \end{matrix}]}^{⊤} ξ + (v_{i - 1}^{g} - T_{i}^{k - 1} v_{i}) \cdot n_{i - 1}^{g} ∥^{2} \\ = \frac{1}{M} ∥ J_{r_{i c p}} ξ + r_{i c p} ∥^{2}, \end{array}

where r_icp is the residual vector which is constructed at each correspondence using

T_{i}^{k - 1}

and

J_{r_{i c p}}

is the Jacobian of r_icp. Similarly the thermal cost term E_td becomes

\begin{array}{l} E_{t d} (ξ) \approx \frac{1}{N} \sum_{x \in P} {(I_{i} (κ (Ψ (v_{i - 1}^{g}, (I + \hat{ξ}) T_{i}^{k - 1}))) - I_{i - 1} (x))}^{2} \\ \approx \frac{1}{N} \sum_{x \in P} {(\nabla I_{i} (κ) J_{κ} (Ψ) J_{Ψ} (ξ) |_{ξ = 0} ξ + I_{i} (κ (Ψ (v_{i - 1}^{g}, T_{i}^{k - 1}))) - I_{i - 1} (x))}^{2} \\ = \frac{1}{N} ∥ J_{r_{t d}} ξ + r_{t d} ∥^{2}, \end{array}

where J_κ(Ψ) is the Jacobian of κ, derived from Eq. 9 and J_Ψ(ξ) is the Jacobian of Ψ with respect to ξ, derived from Eq. 8, 10, 11. Then we minimize the weighted sum E defined in Eq. 5 by solving the following normal equations

{[\begin{matrix} \frac{1}{\sqrt{M}} J_{r_{i c p}} \\ \frac{μ}{\sqrt{N}} J_{r_{t d}} \end{matrix}]}^{⊤} [\begin{matrix} \frac{1}{\sqrt{M}} J_{r_{i c p}} \\ \frac{μ}{\sqrt{N}} J_{r_{t d}} \end{matrix}] ξ = - {[\begin{matrix} \frac{1}{\sqrt{M}} J_{r_{i c p}} \\ \frac{μ}{\sqrt{N}} J_{r_{t d}} \end{matrix}]}^{⊤} [\begin{matrix} \frac{1}{\sqrt{M}} r_{i c p} \\ \frac{μ}{\sqrt{N}} r_{t d} \end{matrix}],

(\frac{1}{M} J_{r_{i c p}}^{⊤} J_{r_{i c p}} + \frac{ω}{N} J_{r_{t d}}^{⊤} J_{r_{t d}}) ξ = - \frac{1}{M} J_{r_{i c p}}^{⊤} r_{i c p} - \frac{μ}{N} J_{r_{t d}}^{⊤} J_{r_{t d}},

where

μ = \sqrt{ω}

. The final estimate returns the optimal camera pose which jointly minimizes the temperature difference between the current and previous frames and the geometric error between the current 3D data and the global model. Since the temperature of objects remains consistent against both viewpoint and illumination changes, thus the selected thermal interest points provide a stable reference to improve 3D geometrical registration. Through the fusion of thermal and geometrical information, our proposed Thermal-guided ICP method can handle large camera motion and produce more accurate 3D reconstruction results. Qualitative and quantitative evaluation results are provided in Sec. 5.

5. Experimental results

In this section, we evaluate our method in terms of robustness against viewpoint changes and accuracy of 3D reconstruction. The method is implemented in a standard desktop PC running Ubuntu 16.04 with an Intel Core i7-6700K CPU at 4.00GHz, 16GB of RAM and an nVidia GeForce GTX 1080 GPU with 8GB of memory. Our approach takes less than 20 ms to process a newly scanned thermal-depth data which is sufficient for real-time 3D thermal reconstruction at 50.0 Hz.

5.1. Benchmark dataset

A multimodal benchmark (RGB, thermal and depth) is presented for the evaluation of 3D thermographic reconstruction solutions. We slowly move the multimodal sensing system 360 degrees around target objects to record a set of multimodal image sequences. Given densely captured 3D point clouds, we firstly apply the KinectFusion algorithm to compute relative pose between every two consecutive frames. Then we manually check alignment between a frame and its N spatially adjacent frames (we set N = 100 in our experiments) to detect valid loop closures. We construct a graph of camera poses in which each vertex is the camera pose of a frame and each edge represents the relative pose between two successfully aligned frames. Finally, we use the g2o framework presented by Kümmerle et al. [32] for optimizing graph-based nonlinear error functions. The loop closures and map optimization compensate camera drift accumulated during incremental 3D alignment. The corrected camera trajectory allows to construct an accurate 3D model which we consider as the high-accuracy reference. In total, we captured four image sequences covering two human bodies (with and without clothes) and two objects (a water boiler and an office chair) as shown in Fig. 6.

Fig. 6 The multimodal benchmark used for evaluation of 3D thermographic reconstruction solutions including (a) A person with clothes, (b) A person without clothes, (c) A water boiler, (d) An office chair.

Download Full Size | PDF

5.2. Robustness against large displacement

In the first set of experiments we evaluate our proposed coarse-to-fine scheme for robust camera pose estimation. In our experiments, we gradually down-sample the frame rate to increase the displacement between two consecutive frames and apply various methods to perform 3D reconstruction. We compare our proposed T-ICP with state-of-the-art 3D reconstruction approaches including RGB-based visual odometry (RGB) [29], ICP-based KinectFusion (ICP) [5] and RGBD-ICP-based Kintinuous (RGBD-ICP) [12]. We also consider an alternative Visible-guided ICP (V-ICP) in which the initial camera pose is calculated by shifting two visible images instead of thermal images. To evaluate whether the 3D reconstruction is successful, a visual check of the fused geometric and thermal information is performed as illustrared in Fig. 7. For each image sequence, we perform 3D reconstruction for 10 times using randomly selected starting frames and compute the averaged minimum number of frames required for 3D reconstruction. The calculated values are documented in Table 1 and some comparative results are shown in Fig. 8 and Visualization 1. The experiments show that our T-ICP is capable of completing 360 degrees reconstruction using significantly fewer frames. This improvement is very helpful for dynamic and large-scale 3D thermal scanning applications. In Fig. 9 we show the last frame of the video sequence which can be successfully aligned using different methods. Since thermal information remains more robust against viewpoint changes, the 2D optimal translation which maximizes the thermal consistency between two infrared images can be used to establish correct correspondences between two wide baseline 3D point clouds. As the result, our method can successfully handle large camera displacements. It is also worth mentioning that thermal images contain significantly less high-frequency signals compared with visible images [30]. Therefore, using thermal information to estimate the 2D translations is numerically more stable and efficient than using visual images. In our experiments, T-ICP always improves robustness of camera pose estimation while V-ICP sometimes performs even worse compared with the standard ICP algorithm.

Fig. 7 Samples of 3D thermal reconstruction. A successful 3D reconstruction (left) and a failed example (right).

Download Full Size | PDF

Table 1. Averaged minimum number of frames required to complete 3D reconstruction.

View Table | View all tables in this article

Fig. 8 3D thermal reconstruction results when the frame rate is decreased to 1/72. (a) Results of ICP, (b) Results of RGBD-ICP, and (c) Results of T-ICP. More comparative results are provided in Visualization 1.

Download Full Size | PDF

Fig. 9 Wide baseline alignment using ICP [5], Visible-guided ICP (V-ICP) and Thermal-guided ICP (T-ICP).

Download Full Size | PDF

5.3. Accuracy of 3D reconstruction

To evaluate the benefit of combined thermogrpahic and geometric error minimization, we decrease the frame rate to 1/30 and use the method described in Sec. 4.1 to compute the initial camera pose. Then we apply different combinations of pose constraints to refine the initial estimate to obtain the final 3D reconstruction output. We compare the accuracy of 3D reconstructions performed by the aforementioned ICP, RGBD-ICP, T-ICP. The accuracy is evaluated using the camera trajectory, by calculating the root mean squared error (RMSE) [13] between the computed one and the reference (obtained from densely sampled scan sequence). The comparative results are documented in Table 2. T-ICP produces the most accurate camera trajectories in all the experiments.

Table 2. Absolute trajectory error (RMSE between the estimated and the ground truth trajectories) of camera pose estimation using different pose constrains.

View Table | View all tables in this article

Figure 10 and Visualization 2 show 3D thermal models generated using different pose constraints. Both ICP and RGBD-ICP based methods contain significant mis-alignment errors, as highlighted in red bounding boxes. In contrast, our method effectively compensates such errors through the fusion of infrared and depth information. Our experimental results demonstrate that the combination of thermal and depth information enables accuracy improvement of 3D reconstruction when large camera displacement exists, confirming our previously mentioned observation that thermal information remains consistent during viewpoint and illumination changes.

Fig. 10 3D thermographic reconstruction by considering different pose constrains. (a) Results of ICP, (b) Results of RGBD-ICP, and (c) Results of T-ICP. Please see Visualization 2 to check the highlighted details.

Download Full Size | PDF

6. Conclusions

In this paper, we present a multimodal fusion method capable of producing high-fidelity 3D thermal models in real-time. Based on the observation that thermal information remains robust against viewpoint and illumination changes and that it could provide a stable reference for 3D geometric registration, we propose Thermal-guided ICP (T-ICP) method. T-ICP can handle large inter-frame motion where standard ICP-based solutions are prone to fail. For efficient and robust camera pose estimation, we perform a coarse-to-fine registration strategy. The pose of sensing device is initially estimated using 3D correspondences found through the maximization of thermal consistency of consecutive infrared images. This coarse pose estimate is then refined by our T-ICP integrating both geometric and thermographic constraints. Unlike the existing 3D thermal imaging solutions [2, 22] in which depth and infrared images are individually utilized for geometrical reconstruction and thermal mapping, our method utilizes multimodal information from both thermal and depth images, and it significantly improves the robustness and accuracy of 3D thermal reconstruction. In this paper, our proposed method is focused on thermal measurements of dielectric materials with high emissivity and it is not suitable for measuring temperatures of high-reflectivity metal objects. In the future, we plan to incorporate some reflection removal techniques (e.g., [33, 34]) into our proposed 3D thermal reconstruction framework to improve its applicability. We will also investigate techniques to optimize the fusion of pose estimates from different modalities to cope with more challenging 3D reconstruction tasks. Finally we plan to explore the fused knowledge of temperature and geometry to enable quantitative thermal analysis for a diverse range of applications such as energy efficiency monitoring, non-destructive evaluation, mechanical and electrical diagnosis, fire detection, and medical diagnosis.

Funding

National Natural Science Foundation of China (NSFC) (51605428, 51575486, U1664264).

References and links

1. S. Vidas and P. Moghadam, “Heatwave: A handheld 3D thermography system for energy auditing,” Energy Build. 66(5), 445–460 (2013). [CrossRef]

2. S. Vidas, P. Moghadam, and M. Bosse, “3D thermal mapping of building interiors using an RGB-D and thermal camera,” in Proceedings of IEEE International Conference on Robotics and Automation (IEEE, 2013), pp. 2311–2318.

3. S. Vidas, P. Moghadam, and S. Sridharan, “Real-time mobile 3D temperature mapping,” IEEE Sensors J. 15(2), 1145–1152 (2015). [CrossRef]

4. A. O. Müller and A. Kroll, “Generating high fidelity 3-D thermograms with a handheld real-time thermal imaging system,” IEEE Sensors J. 17(3), 774–783 (2017). [CrossRef]

5. S. Izadi, D. Kim, O. Hilliges, D. Molyneaux, R. Newcombe, P. Kohli, J. Shotton, S. Hodges, D. Freeman, A. Davison, and A. Fitzgibbon, “KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera,” in, Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, (ACM, 2011), pp. 559–568.

6. T. Whelan, H. Johannsson, M. Kaess, J. J. Leonard, and J. McDonald, “Robust real-time visual odometry for dense RGB-D mapping,” in Proceedings of IEEE International Conference on Robotics and Automation (IEEE, 2013), pp. 5724–5731.

7. P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox, “RGB-D mapping: Using kinect-style depth cameras for dense 3D modeling of indoor environments,” The Int. J. Robotics Res. 31(5), 647–663 (2012). [CrossRef]

8. A. Saxena, S. H. Chung, and A. Y. Ng, “3-D depth reconstruction from a single still image,” Int. J. Comput. Vis. 76(1), 53–69 (2008). [CrossRef]

9. Y. Cao and J. McDonald, “Improved feature extraction and matching in urban environments based on 3D viewpoint normalization,” Comput. Vis. Image Underst. 116(1), 86–101 (2012). [CrossRef]

10. M. Pollefeys, D. Nistér, J.-M. Frahm, A. Akbarzadeh, P. Mordohai, B. Clipp, C. Engels, D. Gallup, S.-J. Kim, P. Merrell, C. Salmi, S. Sinha, B. Talton, L. Wang, Q. Yang, H. Stewenius, R. Yang, G. Welch, and H. Towles, “Detailed real-time urban 3D reconstruction from video,” Int. J. Comput. Vis. 78(2), 143–167 (2008). [CrossRef]

11. Y. K. Cho, Y. Ham, and M. Golpavar-Fard, “3D as-is building energy modeling and diagnostics: A review of the state-of-the-art,” Adv. Eng. Informatics 29(2), 184–195 (2015). [CrossRef]

12. T. Whelan, M. Kaess, H. Johannsson, M. Fallon, J. J. Leonard, and J. McDonald, “Real-time large-scale dense RGB-D SLAM with volumetric fusion,” The Int. J. Robotics Res. 34(4-5), 598–626 (2015). [CrossRef]

13. J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, “A benchmark for the evaluation of RGB-D SLAM systems,” in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IEEE, 2012), pp. 573–580.

14. S. Zhang, D. Royer, and S.-T. Yau, “GPU-assisted high-resolution, real-time 3-D shape measurement,” Opt. Express 14(20), 9120–9129 (2006). [CrossRef] [PubMed]

15. M. Nießner, A. Dai, and M. Fisher, “Combining inertial navigation and ICP for real-time 3D surface reconstruction,” in Proceedings of Eurographics (Short Papers), (ACM, 2014), pp. 13–16.

16. J. Shi and C. Tomasi, “Good features to track,” in Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition (IEEE, 1994), pp. 593–600.

17. S. Zheng, J. Hong, K. Zhang, B. Li, and X. Li, “A multi-frame graph matching algorithm for low-bandwidth RGB-D SLAM,” Comput. Des. 78(C), 107–117 (2016).

18. Y. An and S. Zhang, “High-resolution, real-time simultaneous 3D surface geometry and temperature measurement,” Opt. Express 24(13), 14552–14563 (2016). [CrossRef] [PubMed]

19. H. Metzmacher, D. Wölki, C. Schmidt, J. Frisch, and C. van Treeck, “Real-time human skin temperature analysis using thermal image recognition for thermal comfort assessment,” Energy Build. 158, 1063–1078 (2018). [CrossRef]

20. G. G. Demisse, D. Borrmann, and A. Nüchter, “Interpreting thermal 3D models of indoor environments for energy efficiency,” J. Intell. & Robotic Syst. 77(1), 55–72 (2015). [CrossRef]

21. D. Borrmann, J. Elseberg, and A. Nüchter, “Thermal 3D mapping of building façades,” in Proceedings of International Conference on Intelligent Autonomous Systems (Springer, 2012) pp. 173–182.

22. S. Lagüela, J. Martínez, J. Armesto, and P. Arias, “Energy efficiency studies through 3D laser scanning and thermographic technologies,” Energy Build. 43(6), 1216–1221 (2011). [CrossRef]

23. W. Nakagawa, K. Matsumoto, F. de Sorbier, M. Sugimoto, H. Saito, S. Senda, T. Shibata, and A. Iketani, “Visualization of temperature change using RGB-D camera and thermal camera,” in Proceedings of European Conference on Computer Vision Workshops (Springer, 2014), pp. 386–400.

24. Z. Zhang, “A flexible new technique for camera calibration,” IEEE Transactions on Pattern Analysis Mach. Intell. 22(11), 1330–1334 (2000). [CrossRef]

25. J.T Lussier and S. Thrun, “Automatic calibration of RGBD and thermal cameras,” in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IEEE, 2014), pp. 451–458.

26. S. Vidas, R. Lakemond, S. Denman, C. Fookes, S. Sridharan, and T. Wark, “A mask-based approach for the geometric calibration of thermal-infrared cameras,” IEEE Transactions on Instrumentation Meas. 61(6), 1625–1635 (2012). [CrossRef]

27. G. Cardone, A. Ianiro, G. Dello Ioio, and A. Passaro, “Temperature maps measurements on 3D surfaces with infrared thermography,” Exp. fluids 52(2), 375–385 (2012). [CrossRef]

28. R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, (Cambridge University, 2003).

29. F. Steinbrücker, J. Sturm, and D. Cremers, “Real-time visual odometry from dense RGB-D images,” in Proceedings of IEEE International Conference on Computer Vision Workshops (IEEE, 2011), pp. 719–722.

30. N. J. Morris, S. Avidan, W. Matusik, and H. Pfister, “Statistics of infrared images,” in Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition (IEEE, 2007), pp. 1–7.

31. K. S. Arun, T. S. Huang, and S. D. Blostein, “Least-squares fitting of two 3-D point sets,” IEEE Transactions on Pattern Analysis Mach. Intell. 9(5), 698–700 (1987). [CrossRef]

32. R. Kümmerle, G. Grisetti, H. Strasdat, K. Konolige, and W. Burgard, “g2o: A general framework for graph optimization,” in Proceedings of IEEE International Conference on Robotics and Automation (IEEE, 2011), pp. 3607–3613.

33. B. Zeise and B. Wagner, “Temperature Correction and Reflection Removal in Thermal Images using 3D Temperature Mapping,” in Proceedings of International Conference on Informatics in Control, Automation and Robotics (Springer, 2016), pp. 158–165.

34. Y. Li and M.S. Brown, “Exploiting reflection change for automatic reflection removal,” in Proceedings of IEEE International Conference on Computer Vision (IEEE, 2013), pp. 2432–2439.

	Person 1	Person 2	Object 1	Object 2
RGB	804.0	1005.8	Failed	871.5
ICP	285.7	169.4	113.7	226.8
RGBD-ICP	370.5	187.6	120.3	230.5
V-ICP	187.8	192.9	60.7	74.2
T-ICP	21.3	26.1	15.6	38.6

	Person 1	Person 2	Object 1	Object 2
ICP	0.0578 m	0.0343 m	0.0126 m	0.0192 m
RGBD-ICP	0.0347 m	0.0268 m	0.0119 m	0.0184 m
T-ICP	0.0223 m	0.0191 m	0.0105 m	0.0158 m

	Person 1	Person 2	Object 1	Object 2
RGB	804.0	1005.8	Failed	871.5
ICP	285.7	169.4	113.7	226.8
RGBD-ICP	370.5	187.6	120.3	230.5
V-ICP	187.8	192.9	60.7	74.2
T-ICP	21.3	26.1	15.6	38.6

	Person 1	Person 2	Object 1	Object 2
ICP	0.0578 m	0.0343 m	0.0126 m	0.0192 m
RGBD-ICP	0.0347 m	0.0268 m	0.0119 m	0.0184 m
T-ICP	0.0223 m	0.0191 m	0.0105 m	0.0158 m

Depth and thermal sensor fusion to enhance 3D thermographic reconstruction

Abstract

Corrections

1. Introduction

2. Related work

3. Our system

4. Thermal-guided ICP

4.1. Initial pose estimation

4.2. Pose refinement

5. Experimental results

5.1. Benchmark dataset

5.2. Robustness against large displacement

5.3. Accuracy of 3D reconstruction

6. Conclusions

Funding

References and links

Supplementary Material (2)

Cited By

Figures (10)

Tables (4)

Equations (15)

Optics Express

Name	Description
Visualization 1	Robustness against large displacement
Visualization 2	Accuracy of 3D reconstruction