
Laser reflectance feature assisted accurate extrinsic calibration for non-repetitive scanning LiDAR and camera systems

Open Access

Abstract

Non-repetitive scanning Light Detection And Ranging (LiDAR)-camera systems are commonly used in autonomous navigation industries, benefiting from their low-cost and high-perception characteristics. However, because of the irregular scanning pattern of such LiDARs, feature extraction from the point cloud suffers from non-uniform density and reflectance intensity distributions, so accurate extrinsic calibration remains a challenging task. To solve this problem, this paper presents an open-source calibration method using only a printed chessboard. We design a two-stage coarse-to-fine pipeline for 3D corner extraction. Firstly, a Gaussian Mixture Model (GMM)-based intensity clustering approach is proposed to adaptively identify point segments belonging to the different color blocks of the chessboard. Secondly, a novel Iterative Lowest-cost Pose (ILP) algorithm is designed to fit the chessboard grid and refine the 3D corners iteratively. This scheme is unique in turning the corner feature extraction problem into a grid alignment problem. After the corresponding 3D-2D points are obtained, the extrinsic parameters are solved by the Perspective-n-Point (PnP) method and refined by nonlinear optimization. Extensive simulation and real-world experimental results show that our method achieves subpixel-level precision in terms of reprojection error and outperforms existing methods in both effectiveness and accuracy.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

The deployment of multi-sensor fusion techniques for commercial applications has undergone rapid growth in recent decades. Many tasks, such as autonomous navigation [1,2] and remote sensing [3,4], place high demands on the reliability of perception. Multi-sensor fusion is highly effective in addressing these demands and has thus gained broad attention in recent years. Among the multiple candidates, the combination of a LiDAR and an optical camera is a favorable scheme due to their complementary characteristics. Specifically, the LiDAR-camera system can obtain the scene’s rich intensity and texture information from the visual sensor and perform lighting-invariant, accurate 3D perception with the LiDAR, enabling long-range detection and highly robust perception. However, to realize the full potential of these superior characteristics, high-precision extrinsic parameter calibration is the most critical prerequisite.

At present, most existing LiDAR-camera calibration algorithms are designed for the typical multi-line mechanical spinning LiDARs with a repetitive scanning pattern, such as the Velodyne VLP-16, HDL-32e, and HDL-64e. This kind of LiDAR has been widely deployed in the self-driving and photography industries [5]. Meanwhile, the high demands in these industries have given rise to a series of lightweight LiDARs with irregular scanning models, namely non-repetitive scanning patterns. The unique non-repetitive scanning pattern means that such LiDARs can obtain a high-resolution point cloud by integrating consecutive scans in the time domain and deliver image-like detail of the objects in the field of view (FoV). The DJI Livox MID-40 [6] series LiDAR is a representative non-repetitive scanning LiDAR. The Livox MID-40 has a circular FoV of $38.4^{\circ}$, and the time-domain point cloud integration enables it to cover up to 20% of the FoV in 0.1 s and 90% in 1 s. It brings new opportunities to LiDAR applications owing to its long detection range (up to 260 m), high range precision (2 cm), and dramatically low cost (\$599).

Although non-repetitive scanning LiDAR has many advantages, there are challenges ahead, especially in the extrinsic calibration task. Solving the LiDAR-camera extrinsic calibration problem can be summarized as aligning features from different data. Essentially, the difficulty lies in accurately extracting and matching the common features in the RGB image and the point cloud. There are substantial differences between the characteristics of the two kinds of sensor data. The RGB image provides rich texture details along object boundaries with intensity variations, which is an easily accessible visual feature. On the other hand, the LiDAR data provide sparse geometric features, which can be extracted based on the spatial arrangement of 3D points within local neighborhoods [7]. However, the geometric features behave differently with respect to the search radius and point density [8]. Thus, geometric feature extraction is highly sensitive to the scene or target, and associating 3D geometric features with 2D image features is difficult. Another idea is to utilize the laser reflectance as the point cloud feature, which is the fourth dimension of LiDAR data. This scheme takes advantage of the distinct laser reflectance distributions of the white and black blocks, so that 3D corner extraction and matching can be far more robust and accurate when a chessboard is used as the reference target. However, for non-repetitive scanning LiDAR, due to the non-uniform scanning pattern, reflectance intensity feature extraction encounters the challenge of inconsistent point cloud density and reflectance intensity distribution. This problem has not been adequately studied and remains an unresolved challenge [9]. In addition, non-repetitive scanning LiDARs usually have a smaller FoV than spinning LiDARs. The smaller the FoV, the smaller the overlap between the sensors and the fewer positions at which the target can be placed. A reduced common view area results in feature aggregation in 3D space, which may provide insufficient constraints for solving the extrinsic parameters. The solution may then converge to a local optimum, leading to a suboptimal extrinsic estimate. Higher-precision point cloud feature extraction is therefore demanded in this case.

To address the challenges mentioned above, we propose a reflectance-intensity-assisted automatic LiDAR-camera calibration method that uses only a printed chessboard as the calibration target and aims at high-precision estimation. The main contributions of the proposed method can be summarized as follows:

  • A GMM-based intensity clustering strategy is utilized to adaptively segment the point cloud into the different blocks. The key idea is to exploit the fact that the laser reflectance intensity exhibits a distinct bimodal distribution over the white and black blocks of the chessboard.
  • A novel nonlinear-optimization-based grid fitting method is proposed to refine the 3D corners from the non-uniform point cloud. By generating a standard grid to fit the chessboard point cloud and applying edge constraints on the intensity of the block point cloud, we iteratively find the best alignment pose between the standard grid and the chessboard point cloud.
  • An Iterative Lowest-cost Pose (ILP) algorithm is proposed to solve the 2D pose alignment optimization problem. In each iteration, ILP searches for new neighboring object points under the updated pose to update the cost.

Furthermore, we analyzed a high volume of indoor and outdoor LiDAR data and propose a LiDAR-camera data simulation kit. This toolkit enables us to synthesize various LiDAR and camera data with a perfect ground-truth relative pose. In terms of engineering, we have fully considered processing efficiency and implemented the approach in C++; the source code and the simulation kit are available as open-source software on GitHub [10]. We believe that this framework has substantial potential benefits for the robotics community and the automotive industry.

2. Related works

Previously, numerous methods have been proposed to solve extrinsic calibration problems involving LiDAR-camera systems. These methods can be grouped into two broad classes according to the presence or absence of known targets in the scene. Target-based methods calculate the extrinsic parameters by aligning geometric and visual features extracted from flat targets (such as chessboards and polygons) or 3D targets (such as cubes and balls). Target-less methods, on the other hand, mainly fuse the information obtained directly from the raw scene and find connections between the data from different sources.

As a recently developed idea, the target-less approach has received great attention owing to its high convenience and wide adaptability, but this often comes at the expense of accuracy. Ishikawa et al. [11] converted the problem of solving extrinsic parameters into a trajectory registration problem, which does not require extracting common features. However, if the structural features of the scene are not sufficient, the trajectory obtained by LiDAR is neither reliable nor accurate enough for calibration, since the ICP algorithms applied by LiDAR odometry suffer from degeneration. Pandey et al. [12] innovatively introduced a mutual information (MI)-based calibration method. They estimated the extrinsic parameters by maximizing the mutual information between the sensor-measured surface gray-scale intensity and the reflectance intensity. Subsequent works [13-15] inherited this idea and mainly improved the optimization method and the feature correlation. Another branch of target-less methods directly uses geometric features (i.e., 3D edges and planes) existing in the natural scene. A representative work is the online automatic calibration system proposed by Levinson et al. [16], which corrects sensor drift by minimizing the alignment residual of corresponding laser edges and image edges. Since they only considered vertical edge features, errors in other directions cannot be fully constrained, resulting in large variance in complex scenes. To solve these problems, Zhang et al. [17] further used line features as alignment targets and obtained relatively higher accuracy. Similarly, Yuan et al. [18] proposed a pixel-level extrinsic self-calibration method and revealed that depth-discontinuous edge features should be excluded as calibration references because they are not reliable. Their method achieved accuracy at the level of target-based methods, but the premise is that the scene must have rich architectural characteristics, which limits its convenience as a target-less method.

Target-based calibration methods are the most common approaches due to their relatively higher accuracy. Zhang et al. [19] proposed a method that innovatively exploits a planar chessboard pattern as the target and calibrates the extrinsic parameters between a camera and a range finder using plane constraints. Thereafter, Unnikrishnan et al. [20] followed up on this method and extended it to calibrate a 3D laser scanner and a perspective camera system. Later, Pandey et al. [21] further extended it to an omnidirectional camera. They applied constraints between the normal of the planar surface and the relative position of the laser scanner to form a nonlinear optimization problem. Since then, many methods based on this framework have been derived, most of which improve the calibration target to make the feature extraction more robust and accurate, such as polygonal boards [22-24], circular boards [25], and multiple planes [26,27]. In some methods, circular holes are cut in the planar plate as the target, and the center of each circle, i.e., the calibration reference, is determined by detecting the edge of the circular hole [28,29]. The methods based on target geometric features over-rely on edge detection, which makes them unavoidably suffer from the issue of "bleeding points" [18], since the points on depth-discontinuous edges of the board are not reliable. Wang et al. [30] proposed the ILCC method, which creatively extracts the 3D coordinates of the chessboard corners based on the reflectance intensity of the points, a constraint beyond the geometric features. Their method has been deployed for the collection of an autonomous driving dataset and achieved a reprojection error of $<$ 2 pixels [31]. However, this method is designed for repetitive scanning LiDARs and becomes very time-consuming when the point cloud is dense. Cui et al. proposed ACSC [9], which further improved the ILCC framework and applied it to calibrating non-repetitive scanning LiDAR-camera systems. According to their experimental results, the ACSC method achieved state-of-the-art sub-pixel-level precision and, for the first time, introduced a reflectance intensity-based method to calibrate a non-repetitive scanning LiDAR system.

Our method is most similar to ACSC and ILCC. From the perspective of algorithm theory, the primary difference between our method and the others lies in feature extraction. ACSC and ILCC directly apply the corners as the constraint and use all the planar points to construct the extraction cost. In contrast, our method applies a constrained grid to fit the point cloud and constructs the matching cost from the neighboring points of the block grid. This scheme outperforms the other methods in three aspects: (1) the number of grid constraints is nearly twice the number of corner constraints, which makes our alignment more accurate and robust; (2) constructing the cost from neighboring points enables us to handle the non-uniform point cloud without a uniform filter, which could introduce a potential shift error; (3) like the other methods, the cost function we construct is spatially discrete, but we approximate its gradient by extending the neighborhood range, which allows us to use Gauss-Newton methods for faster and more stable optimization. In addition, in terms of engineering implementation, we analyzed the error sources of the point cloud from the basic principles of LiDAR and used several corresponding strategies to reduce their impact on feature extraction. We compare the three methods mentioned above in the experiments to further evaluate and illustrate the advantages of our method.

3. Methodology

3.1 Notation

The notation used to describe the algorithm is defined as follows. Except where otherwise noted, we use upper-case letters to represent 3D points and vectors, lower-case letters to represent 2D points and vectors, and lower-case italic letters to represent scalars. Sets are represented by the corresponding boldface letters. $\mathrm {P}=\left [ x,y,z \right ] ^{\top }$ is the coordinate vector of a 3D point, and $\mathbf {P}=\left \{ \mathrm {P}_1,\mathrm {P}_2,\ldots \right \}$ is a set of 3D points, i.e., the point cloud. Similarly, let $\mathrm {x}_c=\left [ u,v \right ] ^{\top }$ and $\mathbf {x}_c=\left \{ \mathrm {x}_{c1},\mathrm {x}_{c2},\ldots \right \}$ be the coordinates of a 2D pixel point and its set, respectively. A 3D point $\mathrm {P}$ can be projected into the pixel coordinates through the perspective projection function: $\mathrm {x}_c=\pi \left ( \mathrm {\bar {P}} \right ) =[ \mathrm {K}\left ( x_{\mathrm {p}}/z_{\mathrm {p}},y_{\mathrm {p}}/z_{\mathrm {p}},1 \right ) ^{\top } ] |_{0:1}$, where $\mathrm {\bar {P}}$ is the normalized homogeneous coordinate vector corresponding to $\mathrm {P}$ and $\mathrm {K}$ is the camera intrinsic matrix. Accordingly, $\mathrm {\bar {P}}=\pi ^{-1}\left ( \mathrm {x}_c \right )$ is the inverse mapping of the projection process. For simplification, $\left. \cdot \right |_{0:1}$ denotes the vector formed by the first two elements of the vector $\left ( \cdot \right )$. Let $\mathrm {L}$ denote a line in space defined by the direction vector $n_{\mathrm {L}}$ and an inlier 3D point $\mathrm {P}_\mathrm {L}$. $\mathbf {L}=\left \{ \mathrm {L}_1,\mathrm {L}_2,\ldots \right \}$ is a set of lines in space. Let $\mathrm {T}_{cl} \in \mathbb {S}\mathbb {E}\left ( 3 \right )$ be the transformation matrix of a 3D point from the LiDAR coordinate system $\mathcal {C}_{lidar}$ to the camera coordinate system $\mathcal {C}_{camera}$. $\mathrm {R}_{cl}$ and $\mathrm {t}_{cl}$ are the corresponding rotation matrix and translation vector, respectively. The 3D rigid-body transformation process is defined as $\mathrm {P}^c=\mathcal {T}_{3D}\left ( \mathrm {T}_{cl},\mathrm {P}^l \right ) =\mathrm {R}_{cl}\cdot \mathrm {P}^l+\mathrm {t}_{cl}$. Similarly, the 2D rigid-body transformation process is defined as $\tilde {\mathrm {p}}=\mathcal {T}_{2D} \left ( \xi,\mathrm {p}\right )$.
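To make the notation concrete, the following minimal NumPy sketch spells out $\pi$, $\pi^{-1}$, and $\mathcal{T}_{3D}$ for a single point. The code is illustrative only (the released implementation is in C++), and the function names are ours.

```python
import numpy as np

def transform_3d(T_cl, P_l):
    """T_3D(T_cl, P_l) = R_cl * P_l + t_cl for a single 3D point."""
    return T_cl[:3, :3] @ P_l + T_cl[:3, 3]

def project(P, K):
    """pi(P): perspective projection of a camera-frame 3D point to pixel coordinates."""
    uvw = K @ (P / P[2])            # K * (x/z, y/z, 1)^T
    return uvw[:2]

def back_project(x_c, K):
    """pi^{-1}(x_c): normalized homogeneous coordinates of a pixel."""
    return np.linalg.solve(K, np.array([x_c[0], x_c[1], 1.0]))
```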

3.2 Proposed method overview

In this section, we briefly introduce the framework of the proposed method. The modular structure is shown in Fig. 1, which contains several modules denoted by different colors.

Fig. 1. Overview of the proposed method.

A printed planar chessboard pattern composed of $N_w\times N_h$ black-and-white squares is applied as the calibration target. The target is placed in a region that can be simultaneously observed by the LiDAR and the camera. As module A in Fig. 1 indicates, to obtain a more informative LiDAR measurement of the target, we apply time-domain integration in point cloud acquisition, namely accumulating multiple LiDAR frames of the same static scene. In module B, the visual feature extraction is carried out using OpenCV (the Intel open-source computer vision library), and Zhang’s [32] calibration method is utilized to obtain the intrinsic parameters of the camera. The core part of our method can be summarized in two stages: feature extraction and feature alignment. A coarse-to-fine strategy is applied in the feature extraction stage. In the coarse extraction part, we first segment the chessboard from the point cloud and then perform a rough extraction of the 3D corner points. These two steps correspond to modules C and D and are introduced in Section 3.3. In the fine extraction part, the coarsely extracted 3D corners are used as the initial positions for the grid fitting refinement, as illustrated by module E. This part is introduced in detail in Section 3.4. Finally, in the back-end feature alignment part, i.e., module F, the extrinsics are solved using PnP and a nonlinear-optimization method, which is introduced in Section 3.5.

3.3 Chessboard and 3D corner extraction

3.3.1 Chessboard segmentation

To extract the chessboard point cloud from the dense raw point cloud obtained by time-domain integration, we first deploy a widely used planar segmentation algorithm implemented in the Point Cloud Library [33]. This algorithm can effectively split the dense point cloud into several segments distinguished by planarity and bounds.

In practice, the calibration scene might be complex, and the segments may contain background walls, floors, and surrounding debris. Two conditions are applied to reject the distractors and extract the target chessboard segment. Firstly, the angle between the normal vector of the segment plane and the sensor optical axis must be below $45 ^{\circ }$. Secondly, the number of points in the segment must be above $N_{thd}$. In practice, small noise segments can be rejected by setting $N_{thd}$ to $1\%$ of the total number of points in the raw point cloud. Meanwhile, we calculate the centers of the segments that satisfy the above conditions, and the closest one is identified as the chessboard point cloud. This strategy works robustly enough to segment and extract the chessboard point cloud in different scenarios.
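A minimal Python sketch of this selection logic is given below. The segment representation, the helper name, and the assumption that the Livox optical axis is the $+x$ axis are ours, not taken from the paper.

```python
import numpy as np

def pick_chessboard_segment(segments, total_points, max_angle_deg=45.0, min_frac=0.01):
    """Keep planar segments whose normal is within 45 deg of the sensor optical
    axis and that contain at least 1% of the raw points, then return the
    remaining segment whose centroid is closest to the sensor.
    segments: list of (N_i, 3) arrays in the LiDAR frame."""
    axis = np.array([1.0, 0.0, 0.0])             # optical axis assumed to be +x (Livox convention)
    best, best_dist = None, np.inf
    for seg in segments:
        if len(seg) < min_frac * total_points:   # reject small noise segments
            continue
        c = seg.mean(axis=0)
        _, _, Vt = np.linalg.svd(seg - c, full_matrices=False)
        n = Vt[-1]                               # plane normal: least-variance direction
        if abs(n @ axis) < np.cos(np.radians(max_angle_deg)):
            continue                             # normal tilted more than 45 deg
        if np.linalg.norm(c) < best_dist:
            best, best_dist = seg, np.linalg.norm(c)
    return best
```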

3.3.2 3D corner extraction

In this subsection, we introduce how to roughly extract corners from the segmented chessboard point cloud. We first analyze the noise characteristics of the LiDAR to facilitate the description of the strategies used in our method.

According to the principles of LiDAR, each laser beam has a certain divergence angle (Livox MID-40: vertical 0.28$^\circ \times$ horizontal 0.03$^\circ$), as illustrated by the yellow beam in Fig. 2. Each measurement is affected by the pulse energy and duration over a certain reflected spot area on the object [34,35]. Consequently, as also indicated in [18] and [36], large measurement deviations usually result from two conditions: (i) the LiDAR scans an object with a discontinuous structure. The peak energy of the echo pulse may be affected when the laser scans the edge of the object, since the spot is split across different depths by the edge; (ii) the LiDAR scans an object made of low-reflectivity materials. Some materials, such as grass and black materials with high optical absorption, may return low reflected pulse energy, making the peak energy of the echo pulse ambiguous.

Fig. 2. Simplified process of single scan line ranging of a LiDAR with a divergence angle. The ellipse represents the cross-section perpendicular to the optical axis of the laser beam. A deeper color represents a higher intensity of the reflected laser. Two conditions of axial ranging instability on the calibration plate point cloud are shown in areas A and B.

In this study, the edge parts of the segmented chessboard point cloud exhibit large noise (i.e., area A in Fig. 2), which corresponds to condition (i) mentioned above. The actual distance of the intersection point (i.e., on the centroid optic axis) may not be accurately measured from the peak value of the laser echo energy. The echo energy depends on which part of the split spot is dominant in reflectivity, since the laser detector may receive two echo pulses reflected from the foreground and background objects, respectively. Therefore, some extra points exceed the actual boundary of the calibration plate (i.e., area A in Fig. 2); those red points are often called "bleeding points". In this study, the calibration plate is designed to reserve some redundant white space along the edge. The bleeding points can then be rejected by cutting the edge, and their interference is reduced. On the other hand, the black parts of the chessboard absorb most of the laser energy, which results in low reflectivity (i.e., condition (ii) mentioned above). The corresponding point cloud exhibits numerical instability, as illustrated by area B in Fig. 2. Therefore, we prefer to trust the higher-reflectance-intensity points that fall on the white blocks of the chessboard. To roughly distinguish the points of the black and white blocks of the chessboard, an intensity threshold $I_{thd}$ needs to be defined adaptively.

The grayscale of the object surface brightness is positively correlated with the laser reflectance intensity [37]. There are only two grayscale values in the pattern, black and white. Thus, theoretically, the histogram of the reflectance intensity exhibits the characteristics of a bimodal distribution, as shown in Fig. 3(e). Hence, we assume that the distribution of the point intensity conforms to a two-component Gaussian Mixture Model (GMM). The probability density function of the intensity is defined in Eq. (1), represented as a weighted sum of Gaussian densities:

$$p\left( I \right) =\pi _w p_w\left( \left. I \right|\mu _{w}^{r},\sigma _{w}^{r} \right)+\pi _b p_b\left( \left. I \right|\mu _{b}^{r},\sigma _{b}^{r}\right), \pi _w + \pi _b=1 ,$$
where $I$ is the reflectance intensity, and $\pi _w$ and $\pi _b$ are the mixing proportions of the white and black point clouds, respectively. The distributions of the point intensity in the white blocks $\mathcal {N}(\mu _{w}^{r},\sigma _{w}^{r})$ and the black blocks $\mathcal {N}(\mu _{b}^{r},\sigma _{b}^{r})$ can be obtained by fitting the GMM with the Expectation-Maximization (EM) algorithm [38]. Thus, $I_{thd}=\mu _{b}^{r}+3\times \sigma _{b}^{r}$ is taken as the intensity threshold that adaptively distinguishes the points of the black and white blocks. We fit a plane model to the white block points iteratively; the corresponding plane parameters are obtained by minimizing the sum of the point distances to the fitting plane. Then, we project all the points onto this plane along the optical-axis ray so that all points fall on it. In this way, the projected point cloud best conforms to the actual plane model, and the subsequent 3D point processing can be converted into 2D point processing. All of the above processes correspond to step 1 in Fig. 3, and the pre-processed chessboard point cloud in the LiDAR coordinate system is denoted as $\mathbf {P}_l$.
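As a concrete illustration of this step, the following sketch (our illustrative Python code, not the paper's C++ implementation) fits the two-component GMM with scikit-learn's EM implementation and returns the adaptive threshold $I_{thd}=\mu_{b}^{r}+3\sigma_{b}^{r}$.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def adaptive_intensity_threshold(intensities):
    """Fit a two-component GMM to the reflectance intensities and return
    I_thd = mu_b + 3*sigma_b, where b is the lower-mean (black) component."""
    I = np.asarray(intensities, dtype=float).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(I)
    means = gmm.means_.ravel()
    stds = np.sqrt(gmm.covariances_.reshape(2))
    b = int(np.argmin(means))                  # lower-mean component -> black blocks
    return means[b] + 3.0 * stds[b]
```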

Fig. 3. (a) Raw chessboard point cloud segmented by planar segmentation. (b) Projected points lying on the fitted plane model. (c) Segmented grid blocks distinguished by different colors. (d) Coarsely extracted 3D corners. (e) Histogram of the point cloud reflectance intensity.

In the second step of the procedure, the black blocks of the chessboard are extracted from $\mathbf {P}_l$ using the intensity threshold $I_{thd}$. Then the blocks are segmented using a Euclidean-distance-based clustering method [33]; the results are shown in Fig. 3(c), where the block segments are distinguished by different colors and the centroid of each block is calculated and marked with a black point. In step 3, by calculating the midpoints between the centroids of adjacent black blocks, all the 3D corner points can be obtained. Note that since the reflectance and the number of points in each block area may not be uniform, the corners obtained in this section are only used as initial values for the further optimization in the next subsection.
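The following sketch illustrates steps 2-3 with scikit-learn's Euclidean clustering (DBSCAN); the clustering parameters and the diagonal-distance test used to decide which black blocks share a corner are our illustrative choices and would need tuning for real data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def coarse_corners(black_pts_2d, square_size, eps_frac=0.3, tol=0.2):
    """Cluster the low-intensity (black) in-plane points into blocks, then take
    the midpoint of every pair of block centroids that are one block diagonal
    (sqrt(2) * square_size) apart: diagonally adjacent black blocks share a corner."""
    labels = DBSCAN(eps=eps_frac * square_size, min_samples=10).fit_predict(black_pts_2d)
    centroids = np.array([black_pts_2d[labels == k].mean(axis=0)
                          for k in range(labels.max() + 1)])
    diag = np.sqrt(2.0) * square_size
    corners = []
    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            d = np.linalg.norm(centroids[i] - centroids[j])
            if abs(d - diag) < tol * square_size:   # diagonally adjacent blocks
                corners.append(0.5 * (centroids[i] + centroids[j]))
    return np.array(corners)
```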

3.4 Refinement of chessboard grid features

This section presents our approach to refining the grid corner features using a nonlinear-optimization method. After the preprocessing procedure described in the previous sections, the planar board point cloud $\mathbf {P}_l$ and the corresponding unit normal vector $n_{board}$ are obtained. The initial 3D corner points are roughly extracted; we index them from the upper-left to the lower-right corner and denote them as $\mathbf {P}_{corner}=\left \{ \mathrm {P}_{r,c}|r=0,1,\ldots,N_h-1;c=0,1,\ldots,N_w-1 \right \}$. By fitting the points in the same line, the unit direction vectors of each grid row and column can be obtained: $n_{row}^{r},n_{col}^{c}$. To create a body coordinate system of the chessboard $\mathcal {C}_{board}$, we choose a corner $\mathrm {P}_{k,m} \in \mathbf {P}_{corner}$ close to the center of the board as the origin and the $k$-th row direction vector as the $x$ axis. Let $n_{board}$ be the $z$ axis so that the board plane is the $xoy$ coordinate plane. According to the orthogonality relationship, the $y$ axis is $n_{board}\times n_{row}^{k}$. The three axes $x,y,z$ are normalized and mutually orthogonal, as illustrated in Fig. 4.

Fig. 4. The transform of $\mathbf {P}_l$ from $\mathcal {C}_{lidar}$ to $\mathcal {C}_{board}$. Since in $\mathcal {C}_{board}$ all points fall on the $xoy$ plane, $\mathbf {P}_l$ can be treated as the 2D set $\mathbf {p}_b$. The relative positions of the camera and LiDAR are set up as shown in the coordinate system on the right.

Then the transform of $\mathbf {P}_l$ from $\mathcal {C}_{lidar}$ to $\mathcal {C}_{board}$ is applied by $\mathbf {P}_b=\mathcal {T}_{3D}\left ( \mathrm {T}_{bl},\mathbf {P}_l \right )$, where:

$$\mathrm{T}_{bl}=\left[ \begin{matrix} \mathrm{R}_{lb}^{\top} & -\mathrm{R}_{lb}^{\top}\mathrm{t}_{lb}\\ 0^{\top} & 1\\ \end{matrix} \right], \mathrm{R}_{lb}=\left[ \begin{matrix} n_{row}^{k} & n_{board}\times n_{row}^{k} & n_{board}\\ \end{matrix} \right] , \mathrm{t}_{lb}=\mathrm{P}_{k,m}.$$

Since the converted points $\mathbf {P}_b$ all fall on the $xoy$ plane of $\mathcal {C}_{board}$, we take the $x$ and $y$ coordinates of $\mathbf {P}_b$ to form a 2D point set $\mathbf {p}_b$, and all subsequent processing in this section is performed on $\mathbf {p}_b$. According to the actual size of the chessboard, we generate a standard grid on the $xoy$ plane of $\mathcal {C}_{board}$. The extraction of 3D corners can thus be turned into a 2D grid alignment problem.
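A compact sketch of this change of coordinates, following Eq. (2); the array shapes and the row-wise matrix form are our own conventions.

```python
import numpy as np

def lidar_to_board_2d(P_l, n_row_k, n_board, P_km):
    """Build C_board as in Eq. (2): x = k-th row direction, z = board normal,
    y = z x x, origin at the corner P_km; then express the board points in that
    frame and keep only their (x, y) coordinates."""
    x = n_row_k / np.linalg.norm(n_row_k)
    z = n_board / np.linalg.norm(n_board)
    y = np.cross(z, x)
    R_lb = np.column_stack([x, y, z])      # board axes expressed in C_lidar
    P_b = (P_l - P_km) @ R_lb              # row-wise R_lb^T (P_l - t_lb)
    return P_b[:, :2]
```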

The reflectance intensity characteristics of the chessboard point cloud are the key information used to conduct the grid alignment. For ease of understanding, the reflectance intensity of $\mathbf {p}_b$ is modeled as a 2D surface w.r.t. the $x$ and $y$ variables, as shown in Fig. 5. Obviously, the reflectance intensity of the white and black blocks exhibits a large difference, and it is extremely steep around the grid lines. Inspired by this characteristic, we apply the reflectance intensity gradient to form the grid alignment cost.

Fig. 5. Modeled 2D surface of point cloud reflectance intensity w.r.t $x$ and $y$ coordinate.

Figure 6 illustrates the grid alignment process. The midpoint of the horizontal side of each standard grid block is defined as $\mathrm {p}^{row}\in \mathbf {p}_{\mathrm {std}}^{row}$, indicated by yellow dots in Fig. 6. Correspondingly, the vertical one is defined as $\mathrm {p}^{col}\in \mathbf {p}_\mathrm {std}^{col}$ and indicated by green dots. The two sets form the reference point set $\mathbf {p}_{\mathrm {std}}=\mathbf {p}_{\mathrm {std}}^{col}\cup \mathbf {p}_{\mathrm {std}}^{row}$. In each iteration, the target point cloud $\mathbf {p}_{\mathrm {b}}$ is inserted into a k-dimensional tree (KDTree). Thereafter, the neighboring points within distance $r$ of each reference point $\mathrm {p} \in \mathbf {p}_{\mathrm {std}}$ are extracted by KDTree-based searching. The neighboring points lying on the left side of $\mathrm {p}$ are denoted as the set $\mathcal {L}_{r}^\mathrm {p}$, the right side $\mathcal {R}_{r}^\mathrm {p}$, the upper side $\mathcal {U}_{r}^\mathrm {p}$, and the lower side $\mathcal {D}_{r}^\mathrm {p}$, as illustrated in Fig. 6.

Fig. 6. Alignment process between the generated standard grid (the cyan grid) and the point cloud (mapped with pseudo-color w.r.t. reflectance intensity). The row and column grid reference points are represented by yellow and green dots, respectively. The neighboring points used for calculating the cost are further bolded with an intensity-mapped gray value.

If the generated standard grid can perfectly fit the features of $\mathbf {p}_{\mathrm {b}}$ through the $\xi$ transformation, then for any reference point $\mathrm {p}\in \mathbf {p}_{\mathrm {std}}^{col}$, the difference of the summed reflectance intensity between the left and right side regions should reach its maximum value. Meanwhile, the corresponding difference between the upper and lower side regions should reach its minimum. The opposite holds for reference points $\mathrm {p}\in \mathbf {p}_{\mathrm {std}}^{row}$. Therefore, the differences of the summed reflectance intensity between the upper and lower, and the left and right regions can be utilized as the optimization cost. The optimization variable is the 2D pose: $\xi =\left [ t,\theta \right ] ^{\top }\in \mathfrak {s}\mathfrak {e}\left ( 2 \right )$, $t=\left [ \triangle x,\triangle y \right ] ^{\top }\in \mathbb {R}^2$, where $\triangle x,\triangle y$ are the displacements and $\theta$ is the deflection angle around the $z$ axis. The grid alignment process is equivalent to minimizing the cost w.r.t. the reflectance intensity difference by iterating the 2D pose $\xi$. The optimization problem is defined as follows:

$$\xi ^*=\mathrm{arg} \min_{\xi} Cost\left( \xi ,\mathbf{p}_{\mathrm{std}} \right).$$

Since the grid alignment is an iterative optimization problem, in each iteration we need to transform the standard grid (i.e., transform the 2D reference points $\mathrm {p} \in \mathbf {p}_{\mathrm {std}}$) to evaluate the cost of $\xi$ in the current iteration. The 2D transformation process for the reference point $\mathrm {p}$ is defined as:

$$\mathrm{\tilde{p}}=\mathcal{T}_{2D}\left( \xi ,\mathrm{p} \right) =\left. \left( \xi ^{\land}\cdot \left[ x_{\mathrm{p}} \quad y_{\mathrm{p}} \quad 1 \right] ^{\top} \right) \right|_{0:1},$$
where $\mathrm {\tilde {p}}$ is the transformed point, and $\xi ^{\land }$ is the 2D pose transformation matrix:
$$\xi ^{\land}=\left[ \begin{matrix} \cos\mathrm{\theta} & -\sin\mathrm{\theta} & \bigtriangleup x\\ \sin\mathrm{\theta} & \cos\mathrm{\theta} & \bigtriangleup y\\ 0 & 0 & 1\\ \end{matrix} \right] \in \mathbb{S}\mathbb{E}\left( 2 \right).$$

The cost function is defined as:

$$\begin{aligned} Cost\left( \xi ,\mathbf{p}_{\mathrm{std}} \right) &=\sum_{\mathrm{p}\in \mathbf{\tilde{p}}_{\mathrm{std}}}{C_{r}^{\mathrm{p}}}, \\ \mathbf{\tilde{p}}_{\mathrm{std}}&=\left\{ \left. \mathcal{T}_{2D}\left( \xi ,\mathrm{p}_{\mathrm{std}} \right) \right|\mathrm{p}_{\mathrm{std}}\in \mathbf{p}_{\mathrm{std}} \right\}, \end{aligned}$$
where:
$$\begin{aligned} C_{r}^{\mathrm{p}}=&\mathrm{\gamma}\left[ C\left( \mathcal{R}_{r}^{\mathrm{p}} \right) -C\left( \mathcal{L}_{r}^{\mathrm{p}} \right) \right] ^2-\mathrm{\gamma}\left[ C\left( \mathcal{U}_{r}^{\mathrm{p}} \right) -C\left( \mathcal{D}_{r}^{\mathrm{p}} \right) \right] ^2 \\ =&\mathrm{\gamma}\left( \frac{1}{\left\| \mathcal{R}_{r}^{\mathrm{p}} \right\|}\sum_{\mathrm{p}_b\in \mathcal{R}_{r}^{\mathrm{p}}}{I_{\mathrm{p}_b}}-\frac{1}{\left\| \mathcal{L}_{r}^{\mathrm{p}} \right\|}\sum_{\mathrm{p}_b\in \mathcal{L}_{r}^{\mathrm{p}}}{I_{\mathrm{p}_b}} \right) ^2- \mathrm{\gamma}\left( \frac{1}{\left\| \mathcal{U}_{r}^{\mathrm{p}} \right\|}\sum_{\mathrm{p}_b\in \mathcal{U}_{r}^{\mathrm{p}}}{I_{\mathrm{p}_b}}-\frac{1}{\left\| \mathcal{D}_{r}^{\mathrm{p}} \right\|}\sum_{\mathrm{p}_b\in \mathcal{D}_{r}^{\mathrm{p}}}{I_{\mathrm{p}_b}} \right) ^2, \end{aligned}$$
$$\gamma= \begin{cases} -1 & \mathrm{p}\in \mathbf{\tilde{p}}_{\mathrm{std}}^{col}\\ 1 & \mathrm{p}\in \mathbf{\tilde{p}}_{\mathrm{std}}^{row}. \end{cases}$$
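A minimal Python sketch of the per-reference-point cost $C_{r}^{\mathrm{p}}$ of Eqs. (6)-(8) is given below. The KD-tree query and the left/right/upper/lower partition follow the description above, while the function name, the use of SciPy's cKDTree, and the handling of empty neighborhoods are our own illustrative choices.

```python
import numpy as np
from scipy.spatial import cKDTree

def block_cost(p, gamma, tree, pts, intensities, r):
    """C_r^p of Eq. (7): squared left/right mean-intensity difference minus the
    squared upper/lower difference, signed by gamma (+1 for row reference
    points, -1 for column reference points, Eq. (8))."""
    idx = tree.query_ball_point(p, r)
    if not idx:
        return 0.0
    d = pts[idx] - p
    I = intensities[idx]

    def side_mean(mask):
        return I[mask].mean() if mask.any() else 0.0

    right, left = side_mean(d[:, 0] > 0), side_mean(d[:, 0] <= 0)
    upper, lower = side_mean(d[:, 1] > 0), side_mean(d[:, 1] <= 0)
    return gamma * (right - left) ** 2 - gamma * (upper - lower) ** 2
```

Here `pts` is the 2D board point set $\mathbf{p}_b$ (an (N,2) array), `intensities` holds its reflectance values, and `tree = cKDTree(pts)` is built once per alignment.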

In brief, the cost function in Eq. (7) measures the difference between the expected and actual intensity gradients in the x and y directions. The idea of calculating the directional gradient of the cost function is similar to the method often used for calculating pixel gradients in an image, since both utilize the average neighboring values in a discontinuous intensity field. Likewise, to calculate the partial derivatives of the cost function in the x and y directions on the intensity field, the point cloud search radius is expanded to $r_g=\sqrt {2}r$, such that the area of target neighboring points used for calculating the cost is doubled along the x and y directions. The partial derivatives of the cost w.r.t. $x$ and $y$ can be obtained by:

$$\frac{\partial C_{r}^{\mathrm{p}}}{\partial x}=2\gamma \cdot \left[ C\left( \mathcal{R}_{r_g}^{\mathrm{p}} \right) -C\left( \mathcal{R}_{r}^{\mathrm{p}} \right) -C\left( \mathcal{L}_{r_g}^{\mathrm{p}} \right) +C\left( \mathcal{L}_{r}^{\mathrm{p}} \right) \right] ,$$
$$\frac{\partial C_{r}^{\mathrm{p}}}{\partial y}=2\gamma \cdot \left[ C\left( \mathcal{U}_{r_g}^{\mathrm{p}} \right) -C\left( \mathcal{U}_{r}^{\mathrm{p}} \right) -C\left( \mathcal{D}_{r_g}^{\mathrm{p}} \right) +C\left( \mathcal{D}_{r}^{\mathrm{p}} \right) \right] .$$

The derivative of the cost w.r.t 2D point $\mathrm {p}$ is defined as:

$$\frac{\partial Cost\left( \xi ,\mathbf{p}_{\mathrm{std}} \right)}{\partial \mathrm{p}}=\sum_{\mathrm{p}\in \mathbf{p}_{\mathrm{std}}}{\left[ \frac{\partial C_{r}^{\mathrm{p}}}{\partial x},\frac{\partial C_{r}^{\mathrm{p}}}{\partial y} \right]}.$$

According to Eq. (4), the derivative of the 2D coordinates w.r.t. the pose $\xi$ is:

$$\frac{\partial \mathrm{p}}{\partial \xi}=\left[ \begin{matrix} 1 & 0 & -x\sin\mathrm{\theta}-y\cos\mathrm{\theta}\\ 0 & 1 & x\cos\mathrm{\theta}-y\sin\mathrm{\theta}\\ \end{matrix} \right].$$

By applying the chain rule, i.e., combining Eq. (11) and Eq. (12), the Jacobian matrix can be obtained:

$$J=\frac{\partial Cost\left( \xi ,\mathbf{p}_{\mathrm{std}} \right)}{\partial \xi}=\frac{\partial Cost\left( \xi ,\mathbf{p}_{\mathrm{std}} \right)}{\partial \mathrm{p}}\cdot \frac{\partial \mathrm{p}}{\partial \xi}.$$

Thus, the update step and iteration direction can be obtained by using the Gauss-Newton algorithm with Levenberg-Marquardt (LM) damping.

By minimizing the cost in the discrete field, the best grid alignment pose can be obtained. In the engineering implementation, the Iterative Lowest-cost Pose (ILP) algorithm is proposed to conduct the alignment. Inspired by the Iterative Closest Point (ICP) algorithm, the ILP algorithm applies a KD-tree-based point searching method to update the neighboring points and the cost in each iteration. This scheme runs efficiently since it only evaluates the cost in the neighborhood of the reference points. The pseudo-code of ILP is presented in Algorithm 1.

Algorithm 1. Grid Line Feature Optimization by the ILP algorithm
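Algorithm 1 itself is not reproduced here; the following sketch only outlines the ILP iteration under two assumptions: `cost_fn(p, gamma)` evaluates $C_{r}^{\mathrm{p}}$ at a transformed reference point by re-searching its neighborhood (for example the `block_cost` sketch above), and a simple central difference stands in for the expanded-radius gradient of Eqs. (9)-(10). The damped step is a schematic Gauss-Newton/LM-style update built from the scalar cost gradient, not the authors' exact C++ implementation.

```python
import numpy as np

def se2(xi):
    """xi = [dx, dy, theta] -> 3x3 SE(2) matrix of Eq. (5)."""
    dx, dy, th = xi
    c, s = np.cos(th), np.sin(th)
    return np.array([[c, -s, dx], [s, c, dy], [0.0, 0.0, 1.0]])

def ilp_align(p_std, gammas, cost_fn, h=2e-3, n_iter=100, lam=1e-3):
    """Schematic ILP loop: transform the reference grid, re-query the
    neighborhoods through cost_fn, and take a damped descent step."""
    xi = np.zeros(3)

    def total_cost_and_jac(xi):
        T = se2(xi)
        cost, J = 0.0, np.zeros(3)
        for p0, g in zip(p_std, gammas):
            p = T[:2, :2] @ p0 + T[:2, 2]
            cost += cost_fn(p, g)
            # numerical x/y partials of C_r^p (central differences)
            dcx = (cost_fn(p + [h, 0.0], g) - cost_fn(p - [h, 0.0], g)) / (2 * h)
            dcy = (cost_fn(p + [0.0, h], g) - cost_fn(p - [0.0, h], g)) / (2 * h)
            # dp/dxi, obtained by differentiating Eq. (4)
            dpdxi = np.array([[1.0, 0.0, -p0[0] * np.sin(xi[2]) - p0[1] * np.cos(xi[2])],
                              [0.0, 1.0,  p0[0] * np.cos(xi[2]) - p0[1] * np.sin(xi[2])]])
            J += np.array([dcx, dcy]) @ dpdxi
        return cost, J

    cost, J = total_cost_and_jac(xi)
    for _ in range(n_iter):
        H = np.outer(J, J) + lam * np.eye(3)
        step = -np.linalg.solve(H, J)            # damped descent direction
        new_cost, new_J = total_cost_and_jac(xi + step)
        if new_cost < cost:                      # LM-style damping update
            xi, cost, J, lam = xi + step, new_cost, new_J, lam * 0.5
        else:
            lam *= 10.0
        if np.linalg.norm(step) < 1e-6:
            break
    return xi
```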

The alignment results are shown in Fig. 7. By applying the inverse transformations of $\xi$ and $\mathrm {T}_{bl}$, all feature lines are transformed from $\mathcal {C}_{board}$ to $\mathcal {C}_{lidar}$ and denoted as $\mathbf {L}_{grid}$ to facilitate their use in the 3D-2D feature alignment process later.

Fig. 7. (a) Coarse alignment result. (b) Refined alignment result, which is satisfactory in both the row and column directions.

3.5 Feature alignment and extrinsic optimization

In this section, we describe the calculation and optimization process of the extrinsic parameters. This process can be considered as the alignment of the features extracted from the image and the point cloud.

The visual features, i.e., the 2D pixel corners in the image, are denoted as $\mathbf {x}_{c}$. By applying the constraint of the actual grid block size, the 3D corners on the grid $\mathbf {L}_{grid}$ can be obtained and are denoted as $\mathbf {P}^{*}_{corner}$. The calibration is then converted into a typical pose estimation problem, so the extrinsic parameters can be solved by the Perspective-n-Point (PnP) algorithm [39]. The result is used as the initial value of the optimization and is denoted as $\mathrm {T}_{cl}^{0}$. Note that as long as the points in $\mathbf {P}^{*}_{corner}$ and $\mathbf {x}_{c}$ are matched in a pre-defined sequence, we can use multiple frames for a joint optimization to reduce the error. For a 3D corner point $\mathrm {P}_l\in \mathbf {P}_{corner}^{*}$ in the LiDAR coordinate system, the corresponding back-projected normalized point in the camera coordinate system can be obtained through the transformation w.r.t. the relative pose: $\mathrm {\bar {x}}_{c}^{\prime }=\left. \bar {\mathcal {T}_{3D}}\left ( \mathrm {T}_{cl},\mathrm {P}_l \right ) \right |_{0:1}$. The Euclidean distance between the projected point $\mathrm {\bar {x}}_{c}^{\prime }$ and the matched 2D normalized point $\mathrm {\bar {x}}_{c}=\pi ^{-1}\left ( \mathrm {x}_c \right )$ is considered as the residual, which is called the reprojection error.

As also noted in [9] and [30], the scale of the reprojection error may be affected by the corner distance, since the pixels in the camera are discrete. The farther away the chessboard corner point is, the less accurately the reprojection error can be measured. To reduce this effect, we apply the normalized reprojection error as the residual, which is defined as:

$$E_{prj}\left( \mathrm{T}_{cl},\mathbf{x}_c,\mathbf{P}_{corner}^{*} \right) =\sum_{\left( \mathrm{P}_l,\mathrm{x}_c \right) \in \left( \mathbf{P}_{corner}^{*},\mathbf{x}_c \right)}{\frac{d_{\max}}{\left\| \mathcal{T}_{3D}\left( \mathrm{T}_{cl},\mathrm{P}_l \right) \right\|}\left\| \pi ^{{-}1}\left( \mathrm{x}_c \right) -\left. \bar{\mathcal{T}_{3D}}\left( \mathrm{T}_{cl},\mathrm{P}_l \right) \right|_{0:1} \right\|},$$
where $\left ( \mathrm {P}_l,\mathrm {x}_c \right ) \in \left ( \mathbf {P}_{corner}^{*},\mathbf {x}_c \right )$ indicates that $\mathrm {P}_l$ and $\mathrm {x}_c$ are a matched pair of 3D-2D corner points. All corners in $\mathbf {P}_{corner}^{*}$ are traversed to apply the $\mathrm {T}_{cl}$ transform, and the corner point that is farthest from the camera determines the distance $d_{\max }$. In this way, the optimization relies more on the reprojection error evaluated at closer points, since the weight is inversely proportional to the point distance. The optimization problem can be formulated as:
$$\mathrm{T}_{cl}^{*}=\mathrm{arg} \mathop {min} _{\mathrm{T}_{cl}}E_{prj}\left( \mathrm{T}_{cl},\mathbf{x}_c,\mathbf{P}_{corner}^{*} \right) ,$$

By applying the Levenberg-Marquardt algorithm again to iteratively solve this problem, the final optimal extrinsic parameters are obtained and denoted as $\mathrm {T}_{cl}^{*}$.
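The back-end can be sketched as follows with OpenCV and SciPy. Note that `least_squares` minimizes the sum of squared weighted residuals, a standard least-squares variant of Eq. (15); distortion handling and multi-frame joint optimization are omitted for brevity, and the function name is ours.

```python
import numpy as np
import cv2
from scipy.optimize import least_squares

def refine_extrinsics(P_corner, x_pix, K, dist=None):
    """Initialize T_cl with RANSAC PnP, then refine it by minimizing the
    distance-weighted reprojection residual of Eq. (14) on the normalized
    image plane. P_corner: (N,3) LiDAR-frame corners, x_pix: (N,2) pixels."""
    _, rvec, tvec, _ = cv2.solvePnPRansac(P_corner.astype(np.float64),
                                          x_pix.astype(np.float64), K, dist)
    # pi^{-1}(x_c): normalized image coordinates of the detected 2D corners
    x_norm = cv2.undistortPoints(x_pix.reshape(-1, 1, 2).astype(np.float64),
                                 K, dist).reshape(-1, 2)

    def residuals(params):
        R, _ = cv2.Rodrigues(params[:3].reshape(3, 1))
        Pc = P_corner @ R.T + params[3:]             # T_3D(T_cl, P_l)
        w = np.linalg.norm(Pc, axis=1)
        w = w.max() / w                              # d_max / ||P^c||
        err = Pc[:, :2] / Pc[:, 2:3] - x_norm        # back-projection error
        return (w[:, None] * err).ravel()

    sol = least_squares(residuals, np.hstack([rvec.ravel(), tvec.ravel()]), method="lm")
    R_cl, _ = cv2.Rodrigues(sol.x[:3].reshape(3, 1))
    return R_cl, sol.x[3:]
```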

4. Experiment

To demonstrate the feasibility and accuracy of the proposed method, simulation and real-world experiments were performed to evaluate the calibration results in this section. A Gazebo-based LiDAR-camera simulation test suite is proposed for the simulation experiment; the data simulation process for a variety of non-repetitive scanning LiDARs is presented in Sections 4.1.1 and 4.1.2. Synthetic data enable us to evaluate the 3D corner estimation and the absolute error of the extrinsic parameters. The evaluation results are presented in Sections 4.1.3 and 4.1.4. The method has also been tested in real-world indoor and outdoor environments; the evaluation and final results are presented in Section 4.2.

4.1 Simulation experiment

4.1.1 Establishment of simulation environment

As an open-source 3D robotics simulator, Gazebo can simulate the data acquisition process of various sensors, including cameras and LiDARs. With its powerful physics engine, the Gazebo simulator enables us to simulate LiDAR and camera data with perfect ground truth. As shown in Fig. 8(a), we built a virtual environment containing our chessboard, a Livox LiDAR, and a camera. For cross-method comparison, we also built the calibration setups of the other reflectance-based calibration methods, i.e., ILCC [30] and ACSC [9], according to their literature, as shown in Fig. 8(b). The chessboard square size and the sensor intrinsic parameters are consistent with those used in the real-world experiment. The LiDAR and camera are rigidly connected, and their relative pose, i.e., the final extrinsic parameters, is known. Some extra interference targets are placed, and the lighting conditions and the surrounding environment simulate the real-world environment as closely as possible.

Fig. 8. Simulated scene: (a) our calibration chessboard setup. (b) ILCC and ACSC's calibration chessboard setup. The chessboard setup in (b) is generated according to their literature. To make a fair comparison, the two kinds of chessboards have the same dimension of 8$\times$6 and the same chessboard square size of 0.1 m; the only difference is that our board reserves some white space along the edge.

We simulated the scan modes of three different Livox LiDARs, the MID-40, Avia, and Horizon, for point cloud acquisition. The sensor parameters are listed in Table 1. For each acquisition, we keep the scene static and integrate the point cloud over a certain integration time. The integration time of the point cloud is 5 seconds for the Avia and Horizon. Considering that the MID-40 LiDAR scans relatively sparsely and its frame rate is low, its integration time is expanded to 10 s for a higher point density. Meanwhile, an image with a resolution of 2208$\times$1242 is captured with each point cloud integration. To control the variables, the chessboard target placement is kept the same for each acquisition across the different methods.

Table 1. The detailed parameters of the sensors we simulated for evaluation.

4.1.2 Data simulation

This section introduces how we model and simulate the reflectance intensity of the point cloud on the chessboard. In practice, many factors, such as the atmosphere, illumination, incidence angle, and material properties, may affect the reflectance intensity of the point cloud. More detailed error and intensity modeling is described in [34,40]. As this is not the main subject of this paper, for the purpose of calibration evaluation we only consider the impact of the block color of the chessboard grid on the point cloud reflectance intensity.

As previously mentioned in Section 3.3.2, a two-component GMM is used to fit the intensity distribution. We collected extensive sets of real calibration board point cloud data in indoor and outdoor scenes and fitted the point cloud intensity to estimate the parameters $\mu _{w}^{r},\sigma _{w}^{r}$ and $\mu _{b}^{r},\sigma _{b}^{r}$ of the GMM. Taking the indoor data as an example, ten samples of the point intensity histograms are shown as data1$\sim$data10 in Fig. 9.

Fig. 9. Histograms of reflectance intensity samples of chessboard point clouds, denoted as data1$\sim$data10. The samples are plotted as thin curves in ten different colors. The bold red curve represents the mean of all samples. The bold blue curve represents the intensity histogram of the simulated chessboard point cloud obtained by GMM fitting.

A laser beam with a divergence angle may irradiate a white area and a black area at the same time, as illustrated by the elliptical cross-section in Fig. 10. Let the area of the ellipse falling in the white blocks be $S_w$ and the area in the black blocks be $S_b$. The intensity probability density of the point is obtained according to the area ratio, as illustrated in Eq. (16):

$$p(I)=\frac{S_w}{S_w+S_b}\mathcal{N}\left( \left. I \right|\mu _w,\sigma _w \right) +\frac{S_b}{S_w+S_b}\mathcal{N}\left( \left. I \right|\mu _b,\sigma _b \right) .$$

Fig. 10. The process of point cloud reflectance intensity mapping. Due to the influence of the LiDAR divergence angle, there is a smooth transition from the black intensity to the white intensity.
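A sample from this mixture can be drawn as in the sketch below; the function name and argument list are illustrative, with the white or black Gaussian component chosen in proportion to the spot areas $S_w$ and $S_b$.

```python
import numpy as np

def simulate_point_intensity(S_w, S_b, mu_w, sigma_w, mu_b, sigma_b, rng=None):
    """Draw one reflectance sample from the mixture of Eq. (16): the white or
    black component is selected with probability proportional to the spot
    area falling on each colour."""
    rng = rng or np.random.default_rng()
    if rng.random() < S_w / (S_w + S_b):
        return rng.normal(mu_w, sigma_w)
    return rng.normal(mu_b, sigma_b)
```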

Considering that the chessboard point cloud intensity and flatness in indoor and outdoor environments may be quite different due to changes in lighting conditions, we collected and analyzed indoor and outdoor data in the real world and then performed the simulation according to the intensity and flatness distributions of the collected data.

A total of eighty samples collected in indoor and outdoor scenes were used for the analysis. A zero-mean Gaussian distribution model was used to fit the distance of each point to the fitted plane. On the other hand, a Gaussian distribution model was used to fit the reflectance intensity. The distribution results are shown in Table 2. The data indicate that the outdoor data have greater reflectance intensity variance and flatness variance, and the mean intensity difference between the black and white point clouds is larger as well. For the calibration task, the outdoor data are more challenging.

Table 2. Intensity and flatness distributions of the indoor and outdoor data, which also serve as the parameters for point cloud simulation (N1 and N2).

After the real-world data distributions were analyzed, we simulated two kinds of datasets, N1 and N2, according to the indoor and outdoor data distributions, respectively. In summary, we simulated calibration data acquired by three kinds of LiDAR (Avia, Horizon, MID-40) under two kinds of scene (N1, N2). Each of the 3$\times$2 datasets has 20 target placements (point cloud and image pairs).

4.1.3 Evaluation of 3D corner estimation

This experiment investigates the accuracy and robustness of 3D corner extraction from point clouds. We applied the 3D corner detection of ACSC [9], ILCC [30], and the proposed method to the simulated data obtained in the previous section. Since most previous reflectance intensity-based calibration methods are designed for sparse ring-based scanning LiDARs (e.g., the Velodyne HDL-64) and strictly rely on the line scan pattern, they fail at board and corner detection on non-repetitive scanning point clouds. We therefore resample the integrated points to imitate a 128-channel LiDAR pattern, as ACSC [9] does, which enables ILCC to obtain viable results for comparison.

The corner detection comparison results of the three methods are shown in Fig. 11. The box plots illustrate that our approach has higher precision than the other methods across different LiDARs and scene types. More specifically, ACSC and ILCC apply the distance between the points and the edges of the standard board as a constraint. However, according to our analysis in Section 3.3.2, the points around the edges are neither reliable nor accurate, so these constraints introduce errors into the corner extraction. To minimize this influence, the proposed method only considers the reflectance intensity of the inlier block points as the constraint. On the other hand, ACSC and ILCC construct the cost directly at each target 3D corner; thus they have in total $N_w\times N_h=$35 constraints in this experiment, while our method has a total of 67 block grid constraints, as illustrated by the yellow and green points in Fig. 6. More constraints enable the optimization to converge to the global optimum more reliably and accurately. These advantages give our method much higher accuracy. In addition, the box plots show that the other methods have a higher error variance. This is mainly due to the non-uniform distribution of density and reflectance intensity. The other methods traverse the whole point cloud to construct the costs, which can result in unevenly distributed constraint weights and leads to inconsistent errors at different corners. In our method, the costs are constructed from the neighboring points of each grid edge, and each cost is normalized by the number of points, which reduces the influence of uneven point cloud density. Meanwhile, defining the cost with the reflectance difference of adjacent black-and-white point clouds addresses the uneven reflectance distribution problem. Thus, the proposed method maintains a smaller and more consistent error in corner extraction.

Fig. 11. Cross-method results comparison. The error distributions of the 3D corner detection are shown as box plots, illustrating the performance of the algorithms for different simulated scene types (N1 and N2) and sensors (Avia, Horizon, MID-40).

4.1.4 Evaluation of calibration results on synthetic Data

In this section, we evaluate the absolute error of the calibration results. The ground truth extrinsic parameters of the LiDAR and camera ($\mathrm {t}_{cl}^{gt},\mathrm {R}_{cl}^{gt}$) can be retrieved from the Gazebo simulator. Following [9], the absolute translation and rotation errors of the calibration results ($\mathrm {t}_{cl}^{*},\mathrm {R}_{cl}^{*}$) are obtained by:

$$\begin{aligned} e_{\mathrm{t}} &= \left\| \mathrm{t}_{cl}^{*}-\mathrm{t}_{cl}^{gt} \right\|, \\ e_{\mathrm{r}} &= \left\| \mathrm{I}-\left( \mathrm{R}_{cl}^{gt} \right) ^{{-}1}\mathrm{R}_{cl}^{*} \right\| _F, \end{aligned}$$
where $\|\cdot \|$ denotes the norm and $\|\cdot \|_F$ is the Frobenius norm.
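These two metrics translate directly into code; the sketch below assumes that the rotations and translations are given as NumPy arrays, and the function name is ours.

```python
import numpy as np

def extrinsic_errors(R_est, t_est, R_gt, t_gt):
    """Absolute translation and rotation errors of Eq. (17)."""
    e_t = np.linalg.norm(t_est - t_gt)
    e_r = np.linalg.norm(np.eye(3) - np.linalg.inv(R_gt) @ R_est, ord="fro")
    return e_t, e_r
```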

We evaluated the multi-frame joint optimization performance of the three methods. The calibration error distributions under different numbers of frames for joint optimization are shown in Fig. 12. In each test, a total of 20 set (image and point cloud pair) combinations were randomly selected to evaluate the stability of the calibration results. The box plot limits indicate the minimum and maximum errors over the 20 sets; "+" and "$\circ$" denote the outliers on N1 and N2, respectively. The mark on the box represents the median error.

Fig. 12. Multi-frame extrinsic calibration performance comparison of our method, ACSC, and ILCC under the N1 and N2 simulation setups. The tests are applied to three different LiDAR sensors (from left to right: Avia, Horizon, MID-40). The box plots in the first row show the rotation error; the second row shows the translation error.

The overall results are in line with the corner extraction accuracy evaluation in the previous section. The calibration results on the Avia data are better than those of the other LiDARs, since in addition to the denser and more uniform point cloud, the larger FoV provides more effective constraints in space. Unlike our method and ILCC, ACSC does not impose strict geometric constraints on the estimated points. Therefore, its corner estimation accuracy exhibits a greater variance, which results in more outliers in the calibration results. However, after computing the extrinsics with RANSAC-based PnP, ACSC can obtain better results on certain sets. Our method uses RANSAC-based PnP for the initial value, like ILCC, and further optimizes the extrinsic parameters by minimizing the corner reprojection error. For ILCC, although its corner estimation is slightly more accurate than that of ACSC, the geometric constraints on the corners instead result in an overall offset in the estimation. The RANSAC and optimization steps cannot reject this deviation, so it cannot be optimized to a satisfactory result. The accuracy of corner estimation in our method is high enough to mitigate this effect in the extrinsic solving. It maintains a more stable and lower error once the number of frames for joint optimization reaches 6, while the others remain suboptimal.

4.2 Real world experiment

4.2.1 Setting

According to the properties of the non-repetitive scanning LiDAR (taking the lowest-scanning-frequency sensor, the MID-40, as an example), the farther the distance, the smaller the distance error rate ($error/distance$) [6]. However, the pixel error of the visual features becomes larger with increasing distance. The typical chessboard-like pattern can maintain good visual feature extraction accuracy within 6 m [41]. Combining the above factors, a compromise solution is to place the chessboard 3-5 m in front of the LiDAR. The chessboard used in this experiment has a 0.1 m square size and an 8$\times$5 grid dimension. By setting the integration time to 10 s, each acquisition contains about $10^6$ points. Considering the circular FoV of $38.4^\circ$, the point cloud resolution on a plane at 5 m horizontal distance is, on average, approximately 10 points/cm$^2$. Such a high-density point cloud enables millimeter-level precision estimation.

To verify the effectiveness of the proposed method in real-world practice, this section presents a variety of evaluation results in two kinds of scenes: indoor and outdoor environments. We use the most challenging LiDAR sensor, the MID-40, combined with a ZED2 stereo camera for data acquisition, as Fig. 13(a)(b) shows. In each acquisition, the dense point cloud and the stereo images (left and right) are obtained simultaneously. We collected data for 40 different target placements for each of the indoor and outdoor experiments. Care was taken to adjust the chessboard's pose so that the placements cover the co-observed area as much as possible, as Fig. 13(c) illustrates. To verify the calibration stability of the algorithm under different poses, the extrinsic parameters of the LiDAR-left-camera and LiDAR-right-camera pairs are estimated independently.

Fig. 13. (a) Indoor scene, (b) outdoor scene, and (c) the overall experimental setup, showing that the target placements cover the common view area of the LiDAR and camera.

4.2.2 Re-projection evaluation

Since the ground-truth extrinsic parameters between the camera and LiDAR cannot be obtained under real conditions, we use the reprojection consistency of the 3D-2D features under the estimated relative pose as a quantitative evaluation index in this experiment. As mentioned in Section 3.5, the reprojection error needs to be re-scaled by the corner point distance to reduce the bias resulting from the perspective projection of the camera. We follow [9] and apply the Normalized Reprojection Error (NRE) defined in Eq. (18) as the residual, where $reprj(\cdot )$ denotes the transformation and projection of $\mathrm {P}_l\in \mathbf {P}_{corner}^{*}$ from $\mathcal {C}_{lidar}$ to image pixels.

$$\begin{aligned} NRE=&\frac{1}{\left\| \mathbf{P}_{corner}^{*} \right\|}\sum_{\left( \mathrm{P}_l,\mathrm{x}_c \right) \in \left( \mathbf{P}_{corner}^{*},\mathbf{x}_c \right)}{\frac{\left\| reprj\left( \mathrm{P}_l \right) -\mathbf{x}_c \right\|}{d_{max}}}, \\ d_{max}=&\max \left\{ \left. \left\| \mathcal{T}_{3D}\left( \mathrm{T}_{cl},\mathrm{P}_l \right) \right\| \right|\mathrm{P}_l\in \mathbf{P}_{corner}^{*} \right\} . \end{aligned}$$
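A sketch of the per-frame NRE computation is given below; it mirrors Eq. (18) using OpenCV's projection conventions and is not tied to the paper's implementation.

```python
import numpy as np
import cv2

def normalized_reprojection_error(P_corner, x_pix, R_cl, t_cl, K, dist=None):
    """NRE of Eq. (18): mean pixel reprojection error of one frame, divided by
    the largest corner-to-camera distance d_max of that frame."""
    rvec, _ = cv2.Rodrigues(R_cl)
    proj, _ = cv2.projectPoints(P_corner.astype(np.float64), rvec,
                                t_cl.reshape(3, 1).astype(np.float64), K, dist)
    err = np.linalg.norm(proj.reshape(-1, 2) - x_pix, axis=1)
    d_max = np.linalg.norm(P_corner @ R_cl.T + t_cl, axis=1).max()
    return err.mean() / d_max
```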

The $NRE$ under different numbers of frames for joint optimization is used as the quantitative indicator. $N$ LiDAR frame and image pairs are randomly chosen from the previously collected indoor and outdoor datasets. We run 30 trials for each $N \in [1,20]$. The final $NRE$ box plots of the indoor and outdoor data are shown in Fig. 14. The error distribution indicates that when the number of fused frames $N$ reaches six, our method obtains an $NRE$ of less than 0.8 pixels in both indoor and outdoor environments.

Fig. 14. Multi-frame extrinsic calibration evaluation w.r.t. NRE for the MID-40 + left camera (left figure) and the MID-40 + right camera (right figure).

4.2.3 Cross validation

In this experiment, we use the ZED2 stereo camera for LiDAR-camera and camera-camera cross-validation to further verify the accuracy of our method. Let $\mathrm {T}_{\mathrm {LC}_l}$, $\mathrm {T}_{\mathrm {LC}_r} \in \mathbb{SE}(3)$ be the relative poses of the LiDAR-left-camera and LiDAR-right-camera pairs, respectively. The two calibrations are processed independently. The relative pose between the left camera and the right camera can then be estimated by the transformation:

$$\mathrm{T}_{\mathrm{C}_l\mathrm{C}_r}^{est}=\mathrm{T}_{\mathrm{LC}_l}^{{-}1}\cdot\mathrm{T}_{\mathrm{LC}_r}.$$

The relative pose between the binocular cameras can also be calibrated accurately in advance using the MATLAB stereo calibration toolbox. We treat it as the reference pose $\mathrm {T}_{\mathrm {C}_l\mathrm {C}_r}^{ref}\in \mathbb{SE}(3)$ and apply Eq. (17) to evaluate the difference between $\mathrm {T}_{\mathrm {C}_l\mathrm {C}_r}^{est}$ and $\mathrm {T}_{\mathrm {C}_l\mathrm {C}_r}^{ref}$ in terms of translation and rotation. This experiment evaluates the consistency of the calibration results across different sensors and methods. Although $\mathrm {T}_{\mathrm {C}_l\mathrm {C}_r}^{ref}$ may not be the perfect relative pose between the cameras, it serves as a practical reference for evaluating the calibrated LiDAR-camera extrinsic parameters. The results of the previous experiment are evaluated, and the box plots are shown in Fig. 15.
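The cross-validation metric can be written compactly as follows; this is a minimal Eigen-based sketch with illustrative names, where the rotation and translation differences follow the form of Eq. (17).

```cpp
#include <Eigen/Dense>
#include <utility>

// Compose the estimated left-to-right camera pose from the two LiDAR-camera
// calibrations and compare it with the stereo reference pose.
// Returns {rotation difference (Frobenius norm), translation difference (m)}.
std::pair<double, double> crossValidate(const Eigen::Isometry3d& T_LCl,       // LiDAR-left-camera pose
                                        const Eigen::Isometry3d& T_LCr,       // LiDAR-right-camera pose
                                        const Eigen::Isometry3d& T_ClCr_ref)  // stereo reference pose
{
  Eigen::Isometry3d T_est = T_LCl.inverse() * T_LCr;        // estimated left-to-right pose
  double e_r = (Eigen::Matrix3d::Identity() -
                T_ClCr_ref.rotation().inverse() * T_est.rotation()).norm();   // Frobenius norm
  double e_t = (T_est.translation() - T_ClCr_ref.translation()).norm();
  return {e_r, e_t};
}
```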

Fig. 15. Rotation difference (left figure) and translation difference (right figure) between the estimated pose $\mathrm {T}_{\mathrm {C}_l\mathrm {C}_r}^{est}$ and the reference pose $\mathrm {T}_{\mathrm {C}_l\mathrm {C}_r}^{ref}$.

The box plots in Fig. 15 show that, as the number of frames for joint optimization reaches six, the RMSD (Root Mean Square Difference) of translation is 6.07e-3 m (indoor) and 6.11e-3 m (outdoor), and the RMSD of rotation is 7.83e-4 deg (indoor) and 9.85e-4 deg (outdoor). The differences become small and stable, which is consistent with the conclusion drawn from the simulation experiments.

4.2.4 Re-projection results

For qualitative evaluation, we apply the final extrinsic parameters as the relative pose between the sensors and align the LiDAR data with the image by coordinate transformation and re-projection.

In long-range scenes, any extrinsic error is amplified, resulting in pixel shifts during the coordinate transformation and re-projection. In our experiments on the long-range scene shown in Fig. 16, even when the point cloud distance reaches 75 m, the points still correspond well to the image; e.g., in area A, the glass that does not reflect the laser and the window frame that does can still be distinguished and aligned after re-projection. The points at the top of the tower in area B are up to about 105 m away, and the re-projection error there is still within 1 pixel.

Fig. 16. Point clouds re-projected onto the image with a grey value mapped from distance, showing the re-projection results of two long-range scenes. The actual distance of area A is $\geq$ 75 m, and that of area B is $\geq$ 100 m.

We accumulated 30 seconds of LiDAR frames to densify the point cloud and capture more scene details. The point cloud is then colorized by the re-projected pixels, as shown in Fig. 17. The detailed views of areas A, B, and C illustrate that even long-distance points are well aligned with the image pixels, which qualitatively verifies the viability and accuracy of our algorithm.
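The colorization step amounts to transforming each point into the camera frame, projecting it with the intrinsics, and sampling the pixel color. The sketch below assumes Eigen and OpenCV and an undistorted image; names and the simple pinhole projection are illustrative rather than the released code.

```cpp
#include <Eigen/Dense>
#include <opencv2/core.hpp>
#include <vector>

struct ColoredPoint { Eigen::Vector3d xyz; cv::Vec3b bgr; };

std::vector<ColoredPoint> colorize(const std::vector<Eigen::Vector3d>& cloud,  // LiDAR frame
                                   const Eigen::Isometry3d& T_cl,              // LiDAR -> camera
                                   const Eigen::Matrix3d& K,
                                   const cv::Mat& image) {                     // CV_8UC3, undistorted
  std::vector<ColoredPoint> out;
  for (const auto& P_l : cloud) {
    Eigen::Vector3d P_c = T_cl * P_l;                        // transform into the camera frame
    if (P_c.z() <= 0.0) continue;                            // behind the camera
    Eigen::Vector3d uvw = K * P_c;                           // pinhole projection
    int u = static_cast<int>(uvw.x() / uvw.z());
    int v = static_cast<int>(uvw.y() / uvw.z());
    if (u < 0 || v < 0 || u >= image.cols || v >= image.rows) continue;  // outside the image
    out.push_back({P_l, image.at<cv::Vec3b>(v, u)});         // assign the pixel color to the point
  }
  return out;
}
```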

Fig. 17. Colorized point cloud of a long-range scene. The first row shows the global point cloud from two perspectives; the second row shows point cloud details of areas A, B, and C, respectively.

5. Conclusion and discussion

We developed an automatic extrinsic calibration method for non-repetitive scanning LiDAR and camera systems, which uses only a printed chessboard as the calibration target and achieves high-precision estimation. The proposed method can be applied to any high-resolution LiDAR, such as the Livox series, and is particularly useful for applications that require high-precision calibration in advance. The quantitative evaluations in simulation and real-world environments verified that the method reaches sub-pixel re-projection accuracy in both indoor and outdoor scenes. In particular, for the indoor environment, when the number of frames for joint optimization reaches six, our method obtains a re-projection error of less than 0.5 pixels. In the qualitative evaluation, the calibrated extrinsic parameters were used for long-range data transformation and re-projection, verifying the alignment accuracy at long distances. In addition, our method is implemented in C++, and its efficiency and accuracy give it great potential for practical applications.

The main limitation of the proposed method is that it applies only to non-repetitive scanning LiDAR and does not work well on sparse, repetitive scanning LiDAR data. In addition, we have only evaluated the proposed method on the Livox MID-40 sensor in the real-world experiments. We will supplement calibration data for other types of LiDAR on the project home page in the future, and follow-up research will extend the method to multi-LiDAR calibration.

Funding

International Science and Technology Cooperation Programme (2015DFR10830).

Acknowledgments

The authors appreciate Prof. Shaohui Zhang of Beijing Institute of Technology for his helpful and constructive comments and suggestions.

Disclosures

The authors have no conflicts of interest to declare.

Data availability

The raw data and code supporting the results of this article are available at GitHub [10].

References

1. L. Claussmann, M. Revilloud, D. Gruyer, and S. Glaser, “A review of motion planning for highway autonomous driving,” IEEE Trans. Intell. Transport. Syst. 21(5), 1826–1848 (2020). [CrossRef]  

2. A. Cherubini, F. Spindler, and F. Chaumette, “Autonomous visual navigation and laser-based moving obstacle avoidance,” IEEE Trans. Intell. Transport. Syst. 15(5), 2101–2110 (2014). [CrossRef]  

3. T. Hu, X. Sun, Y. Su, H. Guan, Q. Sun, M. Kelly, and Q. Guo, “Development and performance evaluation of a very low-cost uav-lidar system for forestry applications,” Remote Sens. 13(1), 77 (2020). [CrossRef]  

4. A. Salach, K. Bakuła, M. Pilarska, W. Ostrowski, K. Górski, and Z. Kurczyński, “Accuracy assessment of point clouds from lidar and dense image matching acquired using the uav platform for dtm creation,” ISPRS Int. J. Geo-Information 7(9), 342 (2018). [CrossRef]  

5. W. Zhen, Y. Hu, J. Liu, and S. Scherer, “A joint optimization approach of lidar-camera fusion for accurate dense 3-d reconstructions,” IEEE Robot. Autom. Lett. 4(4), 3585–3592 (2019). [CrossRef]  

6. Livox LiDAR, “Livox products,” https://www.livoxtech.com/.

7. M. Weinmann, B. Jutzi, C. Mallet, and M. Weinmann, “Geometric features and their relevance for 3d point cloud classification,” ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci. IV-1/W1, 157–164 (2017). [CrossRef]  

8. E. M. Farella, A. Torresani, and F. Remondino, “Sparse point cloud filtering based on covariance features,” Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 42, 465–472 (2019). [CrossRef]  

9. J. Cui, J. Niu, Z. Ouyang, Y. He, and D. Liu, “Acsc: Automatic calibration for non-repetitive scanning solid-state lidar and camera systems,” arXiv preprint arXiv:2011.08516 (2020).

10. Z. Lai, Y. Wang, S. Guo, X. Meng, J. Li, W. Li, and S. Han, “Data and code for laser reflectance feature assisted accurate extrinsic calibration for non-repetitive scanning lidar and camera systems,” GitHub (2022), https://github.com/zhijianglu/RCLC.

11. R. Ishikawa, T. Oishi, and K. Ikeuchi, “Lidar and camera calibration using motions estimated by sensor fusion odometry,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), (2018), pp. 7342–7349.

12. G. Pandey, J. R. McBride, S. Savarese, and R. M. Eustice, “Automatic targetless extrinsic calibration of a 3d lidar and camera by maximizing mutual information,” in Proceedings of the AAAI National Conference on Artificial Intelligence, (2012), pp. 2053–2059.

13. Z. Taylor and J. Nieto, “Automatic calibration of lidar and camera images using normalized mutual information,” in Robotics and Automation (ICRA), 2013 IEEE International Conference on, (2013).

14. K. Irie, M. Sugiyama, and M. Tomono, “Target-less camera-lidar extrinsic calibration using a bagged dependence estimator,” in 2016 IEEE International Conference on Automation Science and Engineering (CASE), (2016), pp. 1340–1347.

15. Z. Taylor, J. Nieto, and D. Johnson, “Multi-modal sensor calibration using a gradient orientation measure,” J. Field Robotics 32(5), 675–695 (2015). [CrossRef]  

16. J. Levinson and S. Thrun, “Automatic online calibration of cameras and lasers,” in Robotics: Science and Systems, vol. 2 (2013), p. 7.

17. X. Zhang, S. Zhu, S. Guo, J. Li, and H. Liu, “Line-based automatic extrinsic calibration of lidar and camera,” in 2021 IEEE International Conference on Robotics and Automation (ICRA), (2021), pp. 9347–9353.

18. C. Yuan, X. Liu, X. Hong, and F. Zhang, “Pixel-level extrinsic self calibration of high resolution lidar and camera in targetless environments,” IEEE Robot. Autom. Lett. 6(4), 7517–7524 (2021). [CrossRef]  

19. Q. Zhang and R. Pless, “Extrinsic calibration of a camera and laser range finder (improves camera calibration),” in 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No.04CH37566), vol. 3 (2004), pp. 2301–2306.

20. R. Unnikrishnan and M. Hebert, “Fast extrinsic calibration of a laser rangefinder to a camera,” Technical report, Carnegie Mellon University, Robotics Institute (2005).

21. G. Pandey, J. McBride, S. Savarese, and R. Eustice, “Extrinsic calibration of a 3d laser scanner and an omnidirectional camera,” IFAC Proc. Vol. 43(16), 336–341 (2010). [CrossRef]  

22. Y. Park, S. Yun, C. S. Won, K. Cho, K. Um, and S. Sim, “Calibration between color camera and 3d lidar instruments with a polygonal planar board,” Sensors 14(3), 5333–5353 (2014). [CrossRef]  

23. X. Xu, L. Zhang, J. Yang, C. Liu, Y. Xiong, M. Luo, Z. Tan, and B. Liu, “Lidar-camera calibration method based on ranging statistical characteristics and improved ransac algorithm,” Robotics Auton. Syst. 141, 103776 (2021). [CrossRef]  

24. Q. Liao, Z. Chen, Y. Liu, Z. Wang, and M. Liu, “Extrinsic calibration of lidar and camera with polygon,” in 2018 IEEE International Conference on Robotics and Biomimetics (ROBIO), (2018), pp. 200–205.

25. Z. Deng, L. Xiong, D. Yin, and F. Shan, “Joint calibration of dual lidars and camera using a circular chessboard,” in WCX SAE World Congress Experience, (2020).

26. Z. Pusztai and L. Hajder, “Accurate calibration of lidar-camera systems using ordinary boxes,” in 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), (2017), pp. 394–402.

27. J. Jiao, Q. Liao, Y. Zhu, T. Liu, Y. Yu, R. Fan, L. Wang, and M. Liu, “A novel dual-lidar calibration algorithm using planar surfaces,” in 2019 IEEE Intelligent Vehicles Symposium (IV), (2019), pp. 1499–1504.

28. S. A. Rodriguez F., V. Fremont, and P. Bonnifait, “Extrinsic calibration between a multi-layer lidar and a camera,” in 2008 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, (2008), pp. 214–219.

29. J. Beltrán, C. Guindel, and F. García, “Automatic extrinsic calibration method for lidar and camera sensor setups,” arXiv preprint arXiv:2101.04431 (2021).

30. W. Wang, K. Sakurada, and N. Kawaguchi, “Reflectance intensity assisted automatic and accurate extrinsic calibration of 3d lidar and panoramic camera using a printed chessboard,” Remote Sens. 9(8), 851 (2017). [CrossRef]  

31. Q.-H. Pham, P. Sevestre, R. S. Pahwa, H. Zhan, C. H. Pang, Y. Chen, A. Mustafa, V. Chandrasekhar, and J. Lin, “A 3d dataset: Towards autonomous driving in challenging environments,” in 2020 IEEE International Conference on Robotics and Automation (ICRA), (2020), pp. 2267–2273.

32. Z. Zhang, “A flexible new technique for camera calibration,” IEEE Trans. Pattern Anal. Machine Intell. 22(11), 1330–1334 (2000). [CrossRef]  

33. R. B. Rusu and S. Cousins, “3d is here: Point cloud library (pcl),” in 2011 IEEE International Conference on Robotics and Automation, (2011), pp. 1–4.

34. C. Glennie and P. Eng, “Rigorous 3d error analysis of kinematic scanning lidar systems,” J. Appl. Geodesy 1(3), 147–157 (2007). [CrossRef]  

35. P. McManamon, LiDAR Technologies and Systems (Society of Photo-Optical Instrumentation Engineers (SPIE), 2019).

36. Q. Zhou, Z. Tan, and C. Yang, “Theoretical limit evaluation of ranging accuracy and power for lidar systems in autonomous cars,” Opt. Eng. 57(9), 1–8 (2018). [CrossRef]  

37. S. Kaasalainen, E. Ahokas, J. Hyyppa, and J. Suomalainen, “Study of surface brightness from backscattered laser intensity: calibration of laser data,” IEEE Geosci. Remote Sensing Lett. 2(3), 255–259 (2005). [CrossRef]  

38. A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the em algorithm,” J. Royal Stat. Soc. 39(1), 1–22 (1977). [CrossRef]  

39. V. Lepetit, F. Moreno-Noguer, and P. Fua, “Epnp: An accurate o(n) solution to the pnp problem,” Int. J. Comput. Vis. 81(2), 155–166 (2009). [CrossRef]  

40. P. Chmelar, L. Beran, and N. Kudriavtseva, “The laser color detection for 3d range scanning using gaussian mixture model,” in 2015 25th International Conference Radioelektronika (RADIOELEKTRONIKA), (2015), pp. 248–253.

41. L. Zhou and Z. Deng, “Extrinsic calibration of a camera and a lidar based on decoupling the rotation from the translation,” in 2012 IEEE Intelligent Vehicles Symposium, (2012), pp. 642–648.


