## Abstract

We present UWSPSM, an uncertainty-weighted stereopsis pose solution method based on the projection vector, which solves the pose estimation problem for feature-point-based stereo vision measurement systems. First, we use a covariance matrix to represent the direction uncertainty of the feature points, and use a projection matrix to integrate this uncertainty into stereo-vision pose estimation. Then the optimal translation vector is solved from the projection vectors of the feature points, and the depths are updated from the same projection vectors. In the absolute orientation stage, the singular value decomposition algorithm is used to calculate the relative attitude matrix, and the two stages are iterated until the result converges. Finally, the convergence of the proposed algorithm is proved theoretically by the global convergence theorem. When extended to stereo vision, the fixed relative pose between the cameras is introduced as a constraint on the stereoscopic pose estimation, so that only one pose parameter of the two captured images is optimized in each iteration and the two cameras are effectively bound into one; this improves accuracy and efficiency while enhancing measurement reliability. The experimental results show that the proposed pose estimation algorithm converges quickly, has high precision and good robustness, and can tolerate different degrees of error uncertainty, so it has promising practical application prospects.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

Vision-based pose estimation [1–3] estimates the motion state of a moving object relative to the world coordinate system from the moving target and its projection coordinates, mainly to obtain the three-dimensional pose. This can be reduced to solving the motion parameters from *n* point correspondences, i.e., the P*n*P (Perspective-n-Point) [4–6] problem.

Pose estimation based on stereo vision has obvious advantages over monocular vision [7–11]. Stereo vision obtains three-dimensional information about the moving target through epipolar geometry and the structural constraint between the cameras; it requires no prior constraints on the moving object and achieves high pose estimation precision, making it the more practical choice. The stereoscopic pose estimation process uses multiple points (n ≥ 3) to match the image points of the moving target, and establishes a stereo-vision projection imaging model of the target to solve its relative motion state [12–15]. The *n*-point perspective problem is generally solved by iterative optimization: the usual approach is to construct a minimized objective function and reduce the target error by loop iteration until the optimal solution is reached [16–18]. Cui et al. [19] established a binocular visual objective function based on minimizing the spatial coordinate error, and optimized the camera parameters with a nonlinear optimization algorithm to obtain globally optimal camera parameters. El Batteoui et al. [20] proposed a camera calibration method with weakened constraints. Li et al. [21] proposed a bundle-adjustment method that obtains the camera model parameters without scene information and considers only Gaussian noise. Censi et al. [22] used the non-metric multivariate scaling method to establish an objective function minimizing Euclidean distance, and improved the SK optimization algorithm to calibrate the camera and enhance robustness. Haralick [23] proposed a relative pose determination algorithm based on point features by introducing the depth-of-field variables of the feature points; the algorithm has global convergence.
To overcome the shortcomings of the Haralick algorithm, such as slow convergence, many scholars have modified and improved it, applying it to different scenarios such as planar targets and general camera models and proposing a variety of effective globally convergent iterative algorithms [24].

At present, objective functions for pose estimation based on collinearity error [25–28] assume that the imaging errors of the feature points follow a uniform Gaussian distribution. The resulting pose estimate therefore only has statistical significance and is not the actual optimum. In real camera imaging, owing to factors such as imaging sensor assembly and the imaging environment, the measurement noise at a feature point does not follow a uniform Gaussian distribution, and its error becomes directionally uncertain. The pose obtained by traditional optimization is then not the optimal solution in practice: the solution found by loop iteration is not an absolute optimum but one adapted to the current conditions, and under different conditions the optimality criteria differ.

This paper is inspired by [29] and [30]. Fully considering the accuracy and robustness required in practical visual measurement, and focusing on the direction and position uncertainty of the feature points, we propose a stereoscopic pose estimation method with image-weighted measurement error based on the projection vector. The projection vector of a feature is introduced to describe the collinearity error, combined with the depth information to eliminate the nonlinearity of the model. A covariance matrix describes the direction uncertainty of each feature point, and a projection transformation incorporates this uncertainty into the collinearity equation, enabling the method to handle visual pose estimation with direction uncertainty. Extended to stereo vision, the algorithm improves efficiency by using the fixed pose constraint between the stereo cameras; multiple cameras expand the field of view of the system, and the redundant measurement information enhances its robustness to noise.

The remainder of this paper is organized as follows: Section 2 gives a description of the relevant problem. Section 3 introduces the algorithm of stereoscopic pose estimation with uncertainty error based on the projection vector. Section 4 theoretically proves the convergence of the algorithm in this paper. Experiments are conducted to verify the effectiveness of the algorithm in Section 5 and conclusion in Section 6.

## 2. Related problem description

#### 2.1 Stereoscopic collinear equation based on projection vector

Generally speaking, visual pose estimation, also called the absolute orientation problem, arises widely in visual applications such as rendezvous and docking, bionic robots, and moving target tracking [31–35]. General algorithms use iterative optimization to solve for the optimal transformation parameters, and stereo vision can use multiple points to add redundant information and improve the accuracy of vision-based pose estimation.

As shown in Fig. 1, ${O_w} - {X_w}{Y_w}{Z_w}$ is the world coordinate system, and the coordinates of the feature points in the world coordinate system are $\{{P_i^w = {{({x_i^w,y_i^w,z_i^w} )}^T},i = 1,2, \cdots n} \}$. ${O_{cl}} - {X_{ls}}{Y_{ls}}{Z_{ls}}$ and ${O_{cr}} - {X_{rs}}{Y_{rs}}{Z_{rs}}$ are the camera coordinate systems, and the coordinates of the feature points in the corresponding camera coordinate systems are $\{{P_{ij}^c = {{({x_{ij}^c,y_{ij}^c,z_{ij}^c} )}^T},i = 1,2, \cdots n,j = l,r} \}$. The normalized image coordinates are ${\hat{C}_{ij}} = {({{u_{ij}},{v_{ij}},1} )^T}$. From rigid-body motion we have $$P_{ij}^c = {{\boldsymbol R}_j}P_i^w + {{\boldsymbol T}_j}.$$

Here ${{\boldsymbol R}_j}$ is the relative attitude rotation matrix and ${{\boldsymbol T}_j}$ the translation vector between the camera coordinate system and the body coordinate system. The relationship between the image point ${C_{ij}} = {({{u_{ij}},{v_{ij}}} )^T}$, its physical image coordinates $C_{ij}^c = ({{x_{ij}},{y_{ij}},{f_j}} )$ in the camera coordinate system, and $P_i^w = {({x_i^w,y_i^w,z_i^w} )^T}$ in the world coordinate system is given by the projection equation, where ${\boldsymbol X} = {({P_i^t,1} )^T}$ is the homogeneous coordinate, ${{\boldsymbol H}_j}$ is the transformation matrix, ${{\boldsymbol A}_j}$ is the camera intrinsic matrix, ${f_j}$ is the focal length of the camera lens, $({{u_{j0}},{v_{j0}}} )$ is the camera optical center, and ${\gamma _j}$ is the skew factor between the ${u_j}$ and ${v_j}$ axes, which is very small and can be neglected. ${1/d{x_j}}$ and ${1/d{y_j}}$ represent the physical length of a unit pixel in the *u* and *v* directions, respectively.
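The chain described above (rigid-body transform into the camera frame, perspective division, then pixel conversion via the intrinsic parameters) can be sketched as follows. This is a minimal sketch under the stated model with the skew factor neglected; the function names are illustrative, not from the paper.

```python
import numpy as np

def project_point(P_w, R, T, f, u0, v0, dx, dy):
    """Project a world point into pixel coordinates with a pinhole model.

    R, T: pose of the world frame in the camera frame (P_c = R @ P_w + T).
    f: focal length; (u0, v0): principal point; dx, dy: pixel pitch.
    The skew factor gamma is neglected, as in the text."""
    P_c = R @ P_w + T                                  # rigid-body transform into the camera frame
    x, y = f * P_c[0] / P_c[2], f * P_c[1] / P_c[2]    # perspective division onto the image plane
    u = u0 + x / dx                                    # physical coordinates -> pixels
    v = v0 + y / dy
    return np.array([u, v])

def projection_vector(u, v):
    """Unit projection vector of the normalized image point (u, v, 1)^T."""
    c = np.array([u, v, 1.0])
    return c / np.linalg.norm(c)
```

A point on the optical axis projects to the principal point, which is a quick sanity check for the sign conventions above.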

Ideally, the feature point, the optical center, and the image point are collinear on the ray ${O_{jc}}P_{ij}^c$, as shown in Fig. 1. The unit projection vector ${\vec{\boldsymbol v}_{ij}}$ is:

The collinear equation model of the stereo-vision measurement cameras used in this paper extends the monocular collinear equation model. The exterior orientation elements in the model are the pose of the right camera relative to the left camera in the world coordinate system, comprising a rotation matrix and a translation vector. The relationship between the elements of the left and right cameras is:

where *l* and *r* respectively denote the left and right camera; no special explanation will be given later. According to the choice of the system coordinate system, ${\boldsymbol R}$ and ${\boldsymbol T}$ can refer to either ${\boldsymbol R}_{l}$ and ${\boldsymbol T}_{l}$ of the left camera or ${\boldsymbol R}_{r}$ and ${\boldsymbol T}_{r}$ of the right camera.

The objective error function is based on the spatial collinearity error of all target feature points in a unified camera reference coordinate system. Compared with a single camera, the stereo cameras optimize the target feature information from all cameras through the fixed relative pose constraint between the left and right cameras, which both strengthens the constraint and adds redundant information, so the optimized pose solution is more stable and more accurate.

Assuming that *N* pairs of images have been acquired, there is such a constraint relationship for any two pairs of images:

where *m* and *h* respectively represent any two pairs of images.

The fixed pose relationship between the cameras is added as a mandatory constraint to the collinearity error equation of the stereoscopic pose estimation. Compared with the traditional stereo pose estimation process, the dimension of the collinearity error equation is effectively reduced. During the iteration, only one of the two images' translation vectors is optimized, which better combines the two cameras into one and effectively increases the number of features per pose estimate, making the optimization more accurate for the same number of iterations. At the same time, the dimension of the collinearity error matrix is halved compared with the conventional algorithm, which improves the optimization precision and reduces the optimization time.

#### 2.2 Characteristic measurement error uncertainty

In practical visual measurement, different feature points have different gray-scale distributions in the imaging plane, and an orientation of the gray distribution is introduced when the image points are extracted, reflected in the *u* and *v* directions. The uncertainty of the image point error can therefore represent the anisotropic, non-identically-distributed nature of the extraction error: different image point errors have different magnitudes, anisotropic distributions, and uncertainties. Referring to [30,36–40], we use the inverse covariance matrix of the measurement error to model the uncertainty of an image point, described as follows:

where the two axes correspond to the *u* and *v* directions, $\omega$ is the sum of the gray levels of the pixels in the region $\aleph$, an elliptical or circular region in the image plane *I*, and ${\boldsymbol Q}$ is the covariance matrix of the image point. Fig. 2 gives the geometric description of the uncertainty.

As shown in Fig. 2, ${{\boldsymbol Q}^{ - 1}}$ represents the measurement error, which depends on the elliptical region at the point ${x_i} = {({{u_i},{v_i}} )^T}$. The lengths of the ellipse's major and minor axes *a* and *b* indicate the scale uncertainty of the feature point ${x_i}$, and the uncertain direction is expressed by the angles of those axes with respect to the *u* and *v* directions. Fig. 2 shows three different types of image point measurement error uncertainty.
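One concrete way to obtain such a covariance is the gray-level-weighted second moment of the region $\aleph$ around the extracted point; the sketch below follows that idea. It is an assumption for illustration (the paper's exact weighting may differ), and `point_covariance` is a hypothetical name.

```python
import numpy as np

def point_covariance(patch, u0, v0):
    """Gray-level-weighted 2x2 covariance Q of an image region around (u0, v0).

    patch: 2-D array of gray levels over the region; the total gray level
    (omega in the text) normalises the weights. Q^{-1} then weights the
    image point error as described above."""
    vs, us = np.mgrid[0:patch.shape[0], 0:patch.shape[1]]
    w = patch / patch.sum()                 # omega normalisation
    du, dv = us - u0, vs - v0               # offsets from the point estimate
    Q = np.array([[np.sum(w * du * du), np.sum(w * du * dv)],
                  [np.sum(w * du * dv), np.sum(w * dv * dv)]])
    return Q
```

For a symmetric, isotropic patch the off-diagonal terms vanish and both variances are equal, i.e. the case of Fig. 2 (II); an elongated gray-level pattern yields the anisotropic cases (I) and (III).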

If the image point error is as in Fig. 2 (II), the uncertainty of the feature point is isotropic: there is no direction uncertainty, only scale uncertainty, and the globally optimal pose can be obtained by iterating the objective function in the traditional way. If the image point errors are as in Fig. 2 (I) and (III), the uncertainty of the feature points is directional; the direction uncertainty of the image point errors in the actual situation must then be considered, and the objective function cannot be constructed simply by the traditional method.

## 3. Stereoscopic pose estimation with uncertainty based on projection vector

#### 3.1 Processing method of uncertainty

Section 2 described the stereo-vision collinearity error equation with the help of the fixed pose constraint between the stereo cameras, as in Eqs. (8), (9) and (10), which satisfies the stereo-vision constraint of Eq. (11). We now explain how the error uncertainty is incorporated into the stereo-vision collinearity equation. First, the imaging error of an object point is modeled as in Eq. (12). To integrate the measurement error uncertainty of the imaging feature points into the pose estimation, the imaging feature points and the re-projection points are transformed into the uncertainty-weighted covariance data space by the affine transformation matrix ${\boldsymbol F}$. The re-projection errors of the feature points are determined in this transformed data space, thereby incorporating the image point measurement error uncertainty into the objective function built from the re-projection errors.

The covariance matrix ${\boldsymbol Q}$ of the image point measurement errors is symmetric and positive semi-definite, so it admits a singular value decomposition:

Wherein, $\Sigma = diag({\sigma_1^2,\sigma_2^2,1} )$; ${\sigma _1}$ and ${\sigma _2}$ are the standard deviations of the image measurement error uncertainty along the two principal directions in the image coordinate system, whose sizes are the major and minor axes of the elliptical region, and ${\boldsymbol U}$ is a 3×3 real orthogonal rotation matrix. The inverse of the image point error covariance matrix follows with ${\Sigma ^{ - 1}} = diag({1/\sigma_1^2,\,1/\sigma_2^2,\,1} )$. Using the covariance matrix ${\boldsymbol Q}$, we define the matrix ${\boldsymbol F} = {\Sigma ^{ - 1/2}}{{\boldsymbol U}^T}$: ${{\boldsymbol U}^T}$ rotates the error ellipse so that its axes align with the *u* direction and the *v* direction, and ${\Sigma ^{ - 1/2}}$ combines ${\sigma _1}$ and ${\sigma _2}$ to inversely transform the ellipse into a unit circle, that is, the case of Fig. 2 (II). The error then becomes uncorrelated in the *u* and *v* directions, isotropic and independently distributed. ${\boldsymbol F}$ is a 3×3 affine transformation matrix determined by ${\sigma _1}$ and ${\sigma _2}$, which transforms the imaging feature points and the re-projected image points into the uncertainty-weighted covariance data space. Let the coordinates of an imaged feature point on the image plane be ${C_i}$, the coordinates of the corresponding re-projected image point be ${C^{\prime}_i}$, and the points obtained after the ${\boldsymbol F}$-transformation be ${\hat{C}_i}$ and ${\hat{C^{\prime}}_i}$, respectively. The transformation process is $${\hat{C}_i} = {\boldsymbol F}{C_i},\qquad {\hat{C^{\prime}}_i} = {\boldsymbol F}{C^{\prime}_i}.$$ After the ${\boldsymbol F}$-transformation, the uncertainty of the image feature points is passed to the re-projection coordinates. The mathematical description of the imaging feature point uncertainty weighting process is:
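The whitening effect of the ${\boldsymbol F}$-transformation can be checked numerically: building ${\boldsymbol F}$ from the SVD of ${\boldsymbol Q}$ and transforming the data space turns the anisotropic covariance into the identity. A minimal sketch (the helper name is illustrative):

```python
import numpy as np

def affine_whitening(Q):
    """Build F = Sigma^{-1/2} U^T from a 3x3 symmetric PSD covariance Q,
    as in Step 2 of the algorithm. F maps the error ellipse of a point
    into a unit circle (the case of Fig. 2 (II))."""
    U, s, _ = np.linalg.svd(Q)            # for symmetric PSD Q: Q = U Sigma U^T
    F = np.diag(1.0 / np.sqrt(s)) @ U.T   # Sigma^{-1/2} U^T
    return F

# Example: an anisotropic covariance with sigma_1^2 = 4, sigma_2^2 = 1.
Q = np.diag([4.0, 1.0, 1.0])
F = affine_whitening(Q)
# F Q F^T = I, so transformed points C_hat = F @ C carry unit, isotropic error.
```

Since $F Q F^T = \Sigma^{-1/2} U^T (U \Sigma U^T) U \Sigma^{-1/2} = I$, the transformed errors are exactly the isotropic case that the traditional objective function assumes.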

#### 3.2 Error-weighted objective function based on projection vector

In fact, when establishing the objective equation of pose estimation, the direction uncertainty of the feature measurement error must be considered. In this paper, the direction uncertainty of the error is weighted by transforming into the covariance data space with the affine transformation matrix ${\boldsymbol F}$, and the objective function of pose estimation is established to minimize the re-projection error, thereby handling the direction uncertainty of the feature measurement error.

According to Eqs. (2), (10) and (16), we can get the following conversion:


Because the modified singular-value-decomposition algorithm proposed by Umeyama is robust to measurement errors and has low computational complexity, this paper adopts it to solve the absolute orientation. The Umeyama algorithm is detailed in Ref. [44].
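For reference, the rotation part of the Umeyama SVD solution can be sketched as below. This is a simplified sketch of the standard method (Ref. [44] gives the full algorithm, including the scale factor); the function name is illustrative.

```python
import numpy as np

def umeyama_rotation(src, dst):
    """Rotation between two matched 3-D point sets by the SVD method.

    src, dst: (N, 3) arrays of corresponding points, N >= 3 and
    non-degenerate. Returns R such that dst ~ R @ src (+ translation)."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    H = (dst - mu_d).T @ (src - mu_s)      # cross-covariance of the centred sets
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection: force det(R) = +1.
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ S @ Vt
```

Subtracting the centroids first decouples the rotation from the translation, which is why this step can be iterated independently of the translation update in the algorithm of Section 3.3.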

#### 3.3 Stereoscopic pose estimation algorithm flow based on projection vector

In this paper, the coordinate system of the right (or left) camera of the stereo measurement rig is used as the measurement coordinate system (i.e., the world coordinate system), and the relative pose ${\boldsymbol R}$, ${\boldsymbol T}$ between the camera coordinate system and the target coordinate system is the optimization target. The objective function established in this paper is the 2-norm of the spatial collinearity error of all target feature points obtained by all cameras, including the uncertainty-weighted information of the feature points. The algorithm flow is as follows, where $j = l$ or $j = r$.

Step 1: Initialize ${{\boldsymbol R}^{(0 )}}$ and solve the projection vector ${\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\rightharpoonup}$}} {\boldsymbol v}_{ij}}$.

Step 2: Computing affine transform matrix ${{\boldsymbol F}_{ij}} = {\Sigma ^{{{ - 1} \mathord{\left/ {\vphantom {{ - 1} 2}} \right.} 2}}}{\boldsymbol U}_{ij}^T$ based on the covariance matrix ${{\boldsymbol Q}_{ij}}$.

Step 3: Calculate the optimal translation vector ${\boldsymbol T}({\boldsymbol R} )$ in Eq. (25).

Step 4: Update depth $s_{ij}^{(k )}({\boldsymbol R} )\textrm{ = }\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\rightharpoonup}$}} {\boldsymbol v}_{ij}^T({{{\boldsymbol R}_j}P_i^t + {\boldsymbol T}({\boldsymbol R} )} )$ and reconstruct feature points:

Step 5: Solve the (*k* + 1)^{th} iterative value of the rotation matrix by the singular value decomposition (Umeyama) algorithm:

Step 8: If *k* is less than the preset maximum iteration number ${k_{\max }}$, return to Step 3; otherwise end and output the result.
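The steps above can be sketched for a single camera as follows. This is a minimal sketch under simplifying assumptions: the uncertainty weighting is omitted (${\boldsymbol F} = {\boldsymbol I}$), the translation step uses an unweighted least-squares solution in place of Eq. (25), and `iterate_pose` is a hypothetical name; it follows the orthogonal-iteration pattern of the flow above.

```python
import numpy as np

def iterate_pose(P_w, v, k_max=100):
    """Simplified Step 1-8 loop for one camera, without uncertainty weighting.

    P_w: (N, 3) target points; v: (N, 3) unit projection vectors."""
    N = len(P_w)
    R = np.eye(3)                                # Step 1: initialise R^(0)
    for _ in range(k_max):                       # Step 8: fixed iteration budget
        # Step 3: optimal translation for the current R, minimising
        # sum ||(I - v_i v_i^T)(R p_i + T)||^2 in the least-squares sense.
        A = np.zeros((3, 3)); b = np.zeros(3)
        for i in range(N):
            Wi = np.eye(3) - np.outer(v[i], v[i])   # projector off the line of sight
            A += Wi
            b -= Wi @ (R @ P_w[i])
        T = np.linalg.solve(A, b)
        # Step 4: update the depths and reconstruct the camera-frame points.
        s = np.array([v[i] @ (R @ P_w[i] + T) for i in range(N)])
        P_c = s[:, None] * v
        # Step 5: next rotation by SVD (Umeyama-style absolute orientation).
        H = (P_c - P_c.mean(0)).T @ (P_w - P_w.mean(0))
        U, _, Vt = np.linalg.svd(H)
        S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
        R = U @ S @ Vt
    return R, T
```

With noiseless projection vectors the loop reaches the exact pose, since the true (R, T) is a fixed point of the translation, depth, and rotation updates.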

To present the algorithm intuitively, its flow chart is given in Fig. 3.

## 4. Global convergence analysis

We analyze the global convergence of the proposed algorithm under arbitrary initial conditions and any set of target feature points, using the following definitions and the global convergence theorem [45].

**Theorem:** Assume that $\Theta$ is the solution set and ${\boldsymbol R}$ is an algorithm on $P = \{{{p^{(k )}}} \}$. Given an initial value ${p^{(0 )}} \in P$, iterate as follows: if ${p^{(k )}} \in \Theta$, stop; otherwise set ${p^{({k + 1} )}} = {\boldsymbol R}({{p^{(k )}}} )$ and repeat. This generates a sequence $\{{{p^{(k )}}} \}$. Suppose:

- (1) The mapping ${\boldsymbol R}$ is closed on the complement of $\Theta$;
- (2) The sequence $\{{{p^{(k )}}} \}$ is contained in a compact subset of *P*;
- (3) There exists a continuous descent function for $\Theta$ and ${\boldsymbol R}$.

**Proof:** First we prove that the mapping ${\boldsymbol R}$ on the complement of $\Theta$ is closed.

**Definition:** Let $P = \{{{p^{(k )}}} \}$ and $Q = \{{{q^{(k )}}} \}$ ($k = 1,2, \cdots ,n$, $n \ge 3$) lie in non-empty closed sets ${\Phi _p}$ and ${\Phi _q}$, respectively, and let ${\boldsymbol R}:P \to Q$ be a point-to-set mapping. If ① ${p^{(k )}} \to p$, ${p^{(k )}} \in P$, and ② ${q^{(k )}} \to q$, ${q^{(k )}} \in {\boldsymbol R}({{p^{(k )}}} )$ together imply $q \in {\boldsymbol R}(p )$, then the mapping ${\boldsymbol R}$ is closed at $p \in P$. If the mapping ${\boldsymbol R}$ is closed at every point of a collection ${\boldsymbol M} \subset P$, then ${\boldsymbol R}$ is said to be closed within ${\boldsymbol M}$.

In the algorithm, Steps 3 and 4 are continuous point-to-point mappings and therefore closed point-to-set mappings, and in Step 5 the solution of the attitude matrix is also a closed mapping; therefore the first convergence condition is satisfied.

Now we prove the second condition. Note that the essence of the algorithm is the solution of the pose matrix ${\boldsymbol R}$, an orthogonal matrix satisfying ${\boldsymbol R}{{\boldsymbol R}^T} = {\boldsymbol I}$. Each iteration output satisfies the constraint ${{\boldsymbol R}^{({k + 1} )}}{({{{\boldsymbol R}^{({k + 1} )}}} )^T} = {\boldsymbol I}$, so ${{\boldsymbol R}^{({k + 1} )}}$ lies in the set of orthogonal matrices, which is closed and bounded and hence compact; the second condition is therefore satisfied.

Finally, the third condition is demonstrated. The error of the objective function after the (k + 1)^{th} iteration is:

In summary, the objective function can satisfy the three conditions of global convergence, which proves that the iterative algorithm has global convergence.

## 5. Analysis of the experiment and results

#### 5.1 Simulation experiment and result analysis

Extensive experiments were performed and compared with different algorithms to verify the superiority and effectiveness of the proposed algorithm. For ease of description, each method is referred to by its abbreviation:

- (1) Conventional Stereopsis Pose Solution Method (CSPSM): treats the error as uniform, co-directional Gaussian noise.
- (2) Scale Weighted Stereopsis Pose Solution Method (SWSPSM): treats the error as non-uniform, co-directional Gaussian noise.
- (3) Direction Weighted Stereopsis Pose Solution Method (DWSPSM): treats the error as uniform, non-codirectional Gaussian noise.
- (4) Uncertainty Weighted Stereopsis Pose Solution Method (UWSPSM): treats the error as non-uniform, non-codirectional Gaussian noise.

The initial parameters of the measuring system are set as follows:

Triaxial relative attitude angle:

Relative position:

Two camera parameters:

In the space [-0.2, 0.2] × [-0.2, 0.2] × [0, 0.5] m^{3}, 20 target points are generated, randomly and uniformly distributed. The directional uncertainty of the measurement error of the feature image points is expressed by the ellipticity $r = {\sigma _1}/{\sigma _2}$, where ${\sigma _1}$ and ${\sigma _2}$ are the major and minor axes of the elliptical uncertainty region. For each feature point, the orientation of the elliptical noise is chosen randomly from 0 to 180°.
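The anisotropic image noise of this set-up can be sampled as below: each point's covariance is an ellipse with axes ${\sigma _1}$, ${\sigma _2}$ rotated by a random orientation. A minimal sketch of the simulation set-up; the function name is illustrative.

```python
import numpy as np

def elliptical_noise(n, sigma1, sigma2, rng=None):
    """Sample anisotropic 2-D image-point noise with ellipticity r = sigma1/sigma2.

    For each of the n points, the ellipse orientation is drawn uniformly
    from [0, 180) degrees, matching the set-up described above."""
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.uniform(0.0, np.pi, n)            # random orientation per point
    eps = np.empty((n, 2))
    for i, t in enumerate(theta):
        U = np.array([[np.cos(t), -np.sin(t)],
                      [np.sin(t),  np.cos(t)]])   # rotate the principal axes
        Q = U @ np.diag([sigma1**2, sigma2**2]) @ U.T  # per-point covariance
        eps[i] = rng.multivariate_normal([0.0, 0.0], Q)
    return eps
```

Regardless of orientation, the total variance of each sample is $\sigma_1^2 + \sigma_2^2$, which gives a simple statistical check on the generator.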

In this section, the experimental results under each condition are averaged over 100 runs to make them statistically meaningful.

### 5.1.1 Accuracy and convergence analysis

Set the ellipticity *r* = 2; CSPSM, SWSPSM and DWSPSM are compared with the proposed method using the initial pose estimate and the initial simulation parameters given in the previous section. The results are displayed in Fig. 4.

Fig. 4 shows that the relative position and three-axis attitude angles of the four methods converge after 8∼14 iterations, and after 15 iterations they all converge to within the allowable error range. The proposed UWSPSM converges faster than the other three methods, verifying the rapidity of the algorithm; its small convergence error verifies the validity of its global convergence.

The influence of the number of feature points on the algorithm accuracy is analyzed next, using 5∼20 of the feature points above with elliptical noise *r* = 10 and all other conditions unchanged. The four methods are compared by the absolute value of the pose error; the simulation results are shown in Fig. 5.

As can be seen from Fig. 5, the pose estimates of all four methods have large, unstable errors when the feature points are few and the uncertainty is large. As the number of feature points increases, the pose estimation errors of the four algorithms trend downward. CSPSM has the worst precision; DWSPSM and SWSPSM come second, with little difference between them; and UWSPSM has the highest precision, which shows the advantage of the proposed method.

To further demonstrate the validity of the proposed algorithm, we examine the relationship between the pose estimation error and the number of feature points for five common stereo-vision pose estimation methods applied to the moving object. The number of target feature points is increased from 5 to 20, with the elliptical noise uncertainty of the image points fixed at *r* = 2. The experimental results are shown in Fig. 6.

In Fig. 6, the accuracy of all pose estimation algorithms improves as the number of target feature points increases. Comparing the curves of attitude and position estimation error against the number of target feature points in Fig. 6, the proposed algorithm has the highest position and attitude accuracy, followed by the EPNP and POSIT algorithms, while the DLT algorithm has the worst accuracy. The comparative analysis shows that the proposed algorithm maintains pose estimation accuracy with high stability.

Next, four different types of errors are added to the simulation data to analyze the stability and applicability of the method.

Type 1: Uniform, co-directional Gaussian noise: added to all feature points with zero mean and covariance (0.25 mm^{2})·${{\boldsymbol I}_3}$.

Type 2: Non-uniform, co-directional Gaussian noise: added to the *i*-th feature point (*i* = 1,2,…,*N*) with zero mean and covariance $\sigma {{\boldsymbol I}_3}$, where $\sigma$ is chosen randomly from [0 mm^{2}, 0.5 mm^{2}].

Type 3: Uniform, non-codirectional Gaussian noise: added to all feature points with zero mean and covariance $diag({{\sigma_1}, {\sigma_2}, {\sigma_3}} )$, where ${\sigma _1}$, ${\sigma _2}$, ${\sigma _3}$ are mutually independent and chosen randomly from [0 mm^{2}, 0.5 mm^{2}].

Type 4: Non-uniform, non-codirectional Gaussian noise: added to the *i*-th feature point (*i* = 1,2,…,*N*) with zero mean and covariance $diag({{\sigma_{i1}},\; {\sigma_{i2}},\; {\sigma_{i3}}} )$, where ${\sigma _{i1}}$, ${\sigma _{i2}}$, ${\sigma _{i3}}$ are mutually independent and chosen randomly from [0 mm^{2}, 0.5 mm^{2}].
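The four noise types above differ only in whether the covariance is shared across points ("uniform") and whether it is isotropic ("co-directional"); a sketch of generating the per-point covariances, with an illustrative function name:

```python
import numpy as np

def noise_covariances(err_type, n, rng, s_max=0.5):
    """Per-point noise covariances (mm^2) for the four error types above.

    'Uniform' means all n points share one covariance; 'co-directional'
    means the covariance is isotropic (a multiple of I)."""
    if err_type == 1:                                  # uniform, co-directional
        return [0.25 * np.eye(3)] * n
    if err_type == 2:                                  # non-uniform, co-directional
        return [rng.uniform(0.0, s_max) * np.eye(3) for _ in range(n)]
    if err_type == 3:                                  # uniform, non-codirectional
        Q = np.diag(rng.uniform(0.0, s_max, 3))        # one anisotropic Q, shared
        return [Q] * n
    if err_type == 4:                                  # non-uniform, non-codirectional
        return [np.diag(rng.uniform(0.0, s_max, 3)) for _ in range(n)]
    raise ValueError(err_type)
```

Sampling each point's noise as a zero-mean Gaussian with its returned covariance then reproduces the four simulation conditions.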

The mean and mean-square pose estimation errors of the four methods under the four error types are obtained from 100 simulations each. The results are shown in Fig. 7.

Fig. 7 shows that the estimation errors of the four methods differ little under uniform, co-directional error (Type 1), because all four methods account for this error type. SWSPSM and UWSPSM are more accurate than the other two methods under non-uniform, co-directional error (Type 2), because both take the error scale into account. UWSPSM is the most accurate, followed by DWSPSM, under uniform, non-codirectional error (Type 3), because both take the direction uncertainty of the error into account. UWSPSM is more accurate than the other three methods under non-uniform, non-codirectional error (Type 4), which involves both scale and direction uncertainty. The pose estimation results under all four error types are thus well verified.

To judge whether the proposed pose estimation model significantly increases the computation time while ensuring accuracy, the relationship between the number of reference points and the computation (convergence) time of the four algorithms is shown in Fig. 8, where the number of feature points varies from 5 to 20 with noise ellipticity *r* = 10.

In Fig. 8, the computation time of each method increases with the number of feature points. Compared with SWSPSM and DWSPSM, the proposed algorithm runs longer, mainly because it accounts for the uncertainties of both noise scale and noise direction, resulting in a slightly longer but acceptable computation time for an iterative method. The robustness and stability of UWSPSM should not be neglected, and the algorithm can be further optimized in subsequent work.

### 5.1.2 Robustness analysis

In this paper, ${{\boldsymbol R}_{true}}$ and ${{\boldsymbol t}_{true}}$ denote the true values, while ${{\boldsymbol R}_{est}}$ and ${{\boldsymbol t}_{est}}$ denote the estimated pose. The $rot\_err$ and $pos\_err$ are defined as:
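One common realisation of such metrics, sketched below, takes the rotation error as the angle of the residual rotation ${\boldsymbol R}_{est}^T{{\boldsymbol R}_{true}}$ and the position error as the relative translation deviation; the paper's exact formulas are given in its equations, so this is an assumed, illustrative form.

```python
import numpy as np

def rot_err(R_true, R_est):
    """Rotation error as the angle (degrees) of the residual rotation
    R_est^T R_true; clipping guards arccos against round-off."""
    c = (np.trace(R_est.T @ R_true) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

def pos_err(t_true, t_est):
    """Relative translation error ||t_est - t_true|| / ||t_true||."""
    return np.linalg.norm(t_est - t_true) / np.linalg.norm(t_true)
```

Both metrics are zero exactly when the estimate matches the truth, and the angular form is invariant to the choice of reference frame.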

To evaluate the influence of different degrees of noise uncertainty on the pose estimation results, we set the number of control points to 20 and hold ${\sigma _2}$ = 0.01 constant, while ${\sigma _1}$ varies from 0.01 to 0.3 and the ellipse orientation varies randomly, so that *r* gradually changes from 1 to 30. When *r* = 1, the noise is isotropic and independent, and the image point errors have equal weight in the objective function. When $r \ne 1$, the noise becomes non-uniformly and non-codirectionally distributed; ${\sigma _1}$ and ${\sigma _2}$ then determine the contribution weight of each image point's uncertainty error in the objective function.

Fig. 9 shows that the pose estimation error gradually increases with the ellipticity uncertainty. When the elliptical noise uncertainty is *r* = 2, the pose estimation error corresponds to that for 20 target feature points in Fig. 5, so the two results can be verified against each other. Comparing how the algorithms vary with the elliptical noise, the proposed UWSPSM has smaller error and better stability under different error uncertainties than the other three methods, showing that it can accommodate different degrees of error uncertainty. Even when the sampling error is relatively large, the method still maintains high calculation accuracy, which is a great advantage.

The experiment was repeated fifty times for the uncertain noise added to the image points. Five pose estimation methods were used to estimate the relationship between the pose estimation error of the moving object in stereo vision and the uncertainty of the feature points. The experimental results are shown in Fig. 10.

As can be seen from Fig. 10, as the uncertainty of the elliptic noise increases, the estimation errors of all the pose estimation algorithms also gradually increase. Comparing the position estimation error curves in Fig. 10(a) with the attitude estimation error curves in Fig. 10(b) shows that the position and attitude error curves of the three remaining algorithms are flatter than those of the EPNP and POSIT algorithms, so their pose estimation is more stable than that of the other two, with the proposed algorithm being the most stable. This indicates that the proposed algorithm can adapt to different degrees of image uncertainty. When the number of target feature points is 20, the pose estimation error (Fig. 6) corresponds to that at elliptic noise uncertainty *r* = 2 in Fig. 10, so the two results verify each other. Comparing the accuracy of pose estimation, the proposed algorithm has the highest accuracy, while the DLT algorithm has the worst.

### 5.2 Actual experiment and result analysis

In order to further verify the effect of the algorithm in practical application, it is compared with the traditional stereo-vision measurement method. The algorithm is verified by the measurement system shown in Fig. 11(a).

The measurement test uses a Mikrotron EoSens 3CL series high-speed, high-sensitivity camera, model MC3010, with a resolution of 1280×1024 pixels, a pixel size of 0.008 mm/pixel, and an AF Zoom-Nikkor 24-85mm 1:2.8-4D lens. The main parameters of the stereo-vision camera calibration and testing are shown in Table 1.

When testing, the angle range of the motion simulation device is set to $[-12^\circ, +12^\circ]$, the swing speed is $1^\circ/\mathrm{s}$, the angular positioning accuracy is $0.1^\circ$, and the position repeat positioning accuracy is 0.1 mm.

The angle test experiment is divided into two parts. The first is a static test: groups of pitch and yaw angles are tested at 2° intervals, and the swing angle error combines the errors in the two directions. The second part is a dynamic test. The measurement error statistics of the motion parameters obtained in the tests are shown in Fig. 12 and Fig. 13.
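The text does not spell out how the two directional errors are combined; a common choice, which we assume here purely for illustration, is the root-sum-square of the pitch and yaw angle errors:

```python
import math

def swing_angle_error(pitch_err_deg, yaw_err_deg):
    """Combine pitch and yaw angle errors into a single swing-angle
    error via root-sum-square (an assumed convention, not stated
    in the paper)."""
    return math.hypot(pitch_err_deg, yaw_err_deg)

# Example: 0.06 deg pitch error and 0.08 deg yaw error combine
# to approximately 0.1 deg of swing angle error.
err = swing_angle_error(0.06, 0.08)
```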

The position measurement is converted to the cone-pendulum coordinate system, and the absolute value $|d |$ of the distance error with respect to the zero-position pendulum is used as the position measurement error. The measurement results are shown in Fig. 14.

The overall test results of the experiment are shown in Table 2.

From the comprehensive experimental results, it can be seen from Fig. 12 and Fig. 13 that the measurement error is smallest when the swing angle is near the zero position and that the accuracy of UWSPSM is clearly higher than that of CSPSM. The main reasons are that the measurement system parameters are calibrated at the zero position of the motion device and that our method takes the error uncertainty into account. The swing angle measurement error increases as the swing angle approaches ±12°, but the overall error stays within 0.1°, which represents high measurement accuracy. From Table 2 and Fig. 14, we can see that the mean value of the pendulum position measurement error is 0.083 mm, which is high precision. The measurement errors of angle and position depend directly on the error with which the camera matches the target feature points; this matching is the basis of the whole visual measurement and directly determines the measurement accuracy of the motion parameters. The UWSPSM method is therefore insensitive to image measurement errors compared with other methods, and has high measurement accuracy and strong robustness.

## 6. Conclusion

To address the problem of stereoscopic pose estimation based on feature points in practical applications, a stereoscopic pose estimation method with uncertainty based on the projection vector is proposed. Starting from the error non-uniformity of the measured image, the directional uncertainty of the feature points is incorporated into the pose estimation method. First, we use a covariance matrix to represent the directional uncertainty, and then construct an affine transformation matrix, so that the original data are transformed into the weighted covariance data space to handle the directional uncertainty. Then, the projection vector is introduced, through which the collinearity error can be expressed by the depth information of the feature points. Introducing the depth information effectively eliminates the model nonlinearity caused by camera perspective projection and ensures the global convergence and robustness of the method, which is proved by the global convergence theorem. To improve efficiency, the fixed constraint relationship between the stereo-vision cameras is introduced: the error equations of the new method and the number of motion parameters to be optimized are half those of traditional algorithms, and a one-step calculation realizes the two-step optimization process, which improves efficiency and strengthens the constraints. Simulation and actual experiments verify that the method has good convergence and robustness, can adapt to different error uncertainties, and has strong practicability.

## Funding

National Natural Science Foundation of China (61603034); China Postdoctoral Science Foundation (2019M653870XB); Fundamental Research Funds for the Central Universities (XJS191315); Natural Science Foundation of Beijing Municipality (3182027).

## Acknowledgments

This work was supported by the National Science Fund of China under Grants 61603034, China Postdoctoral Science Foundation under Grant 2019M653870XB, Beijing Municipal Natural Science Foundation (3182027) and Fundamental Research Funds for the Central Universities, China, XJS191315.

## Disclosures

The authors declare no conflict of interest.

## References

**1. **R. Laganiere, S. Gibert, and G. Roth, “Robust object pose estimation from feature-based stereo,” IEEE Trans. Instrum. Meas. **55**(4), 1270–1280 (2006). [CrossRef]

**2. **W. B. Dong and V. Isler, “A novel method for the extrinsic calibration of a 2D laser rangefinder and a camera,” IEEE Sensors J. **18**(10), 4200–4211 (2018). [CrossRef]

**3. **K. Zhou, X. J. Xiang, Z. Wang, H. Wei, and L. Yin, “Complete initial solutions for iterative pose estimation from planar objects,” IEEE Access **6**, 22257–22266 (2018). [CrossRef]

**4. **K. Zhang, Z. Q. Cao, J. R. Liu, Z. J. Fang, and M. Tan, “Real-Time Visual Measurement With Opponent Hitting Behavior for Table Tennis Robot,” IEEE Trans. Instrum. Meas. **67**(4), 811–820 (2018). [CrossRef]

**5. **J. Wang, X. Wang, F. Liu, Y. Gong, H. H. Wang, and Z. Qin, “Modeling of binocular stereo vision for remote coordinate measurement and fast calibration,” Opt. Lasers Eng. **54**(1), 269–274 (2014). [CrossRef]

**6. **Y. Liu, Z. Chen, W. J. Zheng, H. Wang, and J. G. Liu, “Monocular visual-inertial SLAM: continuous preintegration and reliable initialization,” Sensors **17**(11), 2613 (2017). [CrossRef]

**7. **V. Lepetit, F. Moreno-Noguer, and P. Fua, “EPnP: An accurate O(n) solution to the PnP problem,” Int. J. Comput. Vis. **81**(2), 155–166 (2009). [CrossRef]

**8. **R. Valenti, N. Sebe, and T. Gevers, “Combining head pose and eye location information for gaze estimation,” IEEE Trans. on Image Process. **21**(2), 802–815 (2012). [CrossRef]

**9. **S. Li, C. Xu, and M. Xie, “A robust O(n) solution to the perspective-n-point problem,” IEEE Trans. Pattern Anal. Mach. Intell. **34**(7), 1444–1450 (2012). [CrossRef]

**10. **Y. Zheng, S. Sugimoto, and M. Okutomi, “ASPnP: An accurate and scalable solution to the perspective-n-point problem,” IEICE Trans. Inf. Syst. **E96-D**(7), 1525–1535 (2013). [CrossRef]

**11. **C. K. Sun, H. Dong, B. S. Zhang, and P. Wang, “An orthogonal iteration pose estimation algorithm based on an incident ray tracking model,” Meas. Sci. Technol. **29**(9), 095402 (2018). [CrossRef]

**12. **M. Y. Li and K. Hashimoto, “Accurate object pose estimation using depth only,” Sensors **18**(4), 1045 (2018). [CrossRef]

**13. **J. S. Cui, J. Huo, and M. Yang, “Novel method of calibration with restrictive constrains for stereo-vision system,” J. Mod. Opt. **63**(9), 835–846 (2016). [CrossRef]

**14. **J. Schlobohm, A. Pösch, E. Reithmeier, and B. Rosenhahn, “Improving contour based pose estimation for fast 3D measurement of free form objects,” Measurement **92**, 79–82 (2016). [CrossRef]

**15. **K. Yan, R. Zhao, H. Tian, E. Liu, and Z. Zhang, “A high accuracy method for pose estimation based on rotation parameters,” Measurement **122**, 392–401 (2018). [CrossRef]

**16. **H. Nguyen, D. Nguyen, Z. Wang, H. Kieu, and M. Le, “Real-time, high-accuracy 3d imaging and shape measurement,” Appl. Opt. **54**(1), A9–A17 (2015). [CrossRef]

**17. **P. Kellnhofer, T. Ritschel, K. Myszkowski, and H. P. Seidel, “Optimizing disparity for motion in depth,” Comput. Graph. Forum. **32**(4), 143–152 (2013). [CrossRef]

**18. **Z. Cai, X. Liu, A. Li, Q. Tang, X. Peng, and B. Z. Gao, “Phase-3D mapping method developed from back projection stereovision model for fringe projection profilometry,” Opt. Express **25**(2), 1262–1277 (2017). [CrossRef]

**19. **Y. Cui, F. Q. Zhou, Y. X. Wang, L. Liu, and H. Gao, “Precise calibration of binocular vision system used for vision measurement,” Opt. Express **22**(8), 9134–9149 (2014). [CrossRef]

**20. **N. El Batteoui, M. Merras, A. Saaidi, and K. Satori, “Camera self-calibration with varying intrinsic parameters by an unknown three-dimensional scene,” Vis. Comput. **30**(5), 519–530 (2014). [CrossRef]

**21. **W. M. Li and D. Zhang, “A high-accuracy monocular self-calibration method based on the essential matrix and bundle adjustment,” J. Mod. Opt. **61**(19), 1556–1563 (2014). [CrossRef]

**22. **A. Censi and D. Scaramuzza, “Calibration by correlation using metric embedding from nonmetric similarities,” IEEE Trans. Pattern Anal. Mach. Intell. **35**(10), 2357–2370 (2013). [CrossRef]

**23. **R. M. Haralick, H. Joo, C. N. Lee, X. Zhang, V. G. Vaidya, and M. B. Kim, “Pose estimation from corresponding point data,” IEEE Trans. Syst., Man, Cybern. **19**(6), 1426–1446 (1989). [CrossRef]

**24. **C. P. Lu, G. D. Hager, and E. Mjolsness, “Fast and globally convergent pose estimation from video images,” IEEE Trans. Pattern Anal. Mach. Intell. **22**(6), 610–622 (2000). [CrossRef]

**25. **G. Schweighofer and A. Pinz, “Robust pose estimation from a planar target,” IEEE Trans. Pattern Anal. Mach. Intell. **28**(12), 2024–2030 (2006). [CrossRef]

**26. **C. Xu, L. Zhang, L. Cheng, and R. Koch, “Pose estimation from line correspondences: A complete analysis and a series of solutions,” IEEE Trans. Pattern Anal. Mach. Intell. **39**(6), 1209–1222 (2017). [CrossRef]

**27. **G. Caron, A. Dame, and E. Marchand, “Direct model based visual tracking and pose estimation using mutual information,” Image Vis. Comput. **32**(1), 54–63 (2014). [CrossRef]

**28. **J. Huo, G. Y. Zhang, J. S. Cui, and M. Yang, “A novel algorithm for pose estimation based on generalized orthogonal iteration with uncertainty-weighted measuring error of feature points,” J. Mod. Opt. **65**(3), 331–341 (2018). [CrossRef]

**29. **X. K. Miao, F. Zhu, and Y. M. Hao, “A new pose estimation method based on uncertainty-weighted errors of the feature points,” J. Optoelectronics Laser **23**(7), 1348–1355 (2012).

**30. **J. S. Cui, C. W. Min, X. Y. Bai, and J. R. Cui, “An Improved Pose Estimation Method Based on Projection Vector with Noise Error Uncertainty,” IEEE Photon. J. **11**(2), 1–16 (2019). [CrossRef]

**31. **S. Ghosh, R. Ray, S. R. Vadali, S. N. Shome, and S. Nandy, “Reliable pose estimation of underwater dock using single camera: a scene invariant approach,” Mach. Vis. Appl. **27**(2), 221–236 (2016). [CrossRef]

**32. **D. Raviv, C. Barsi, N. Naik, M. Feigin, and R. Raskar, “Pose estimation using time-resolved inversion of diffuse light,” Opt. Express **22**(17), 20164–20176 (2014). [CrossRef]

**33. **L. M. Zhang, F. Zhu, Y. M. Hao, and W. Pan, “Rectangular-structure-based pose estimation method for non-cooperative rendezvous,” Appl. Opt. **57**(21), 6164–6173 (2018). [CrossRef]

**34. **J. G. Wang, X. Xiao, and B. Javidi, “Three-dimensional integral imaging with flexible sensing,” Opt. Lett. **39**(24), 6855–6858 (2014). [CrossRef]

**35. **Z. F. Luo, K. Zhang, Z. G. Wang, J. Zheng, and Y. X. Chen, “3D pose estimation of large and complicated workpieces based on binocular stereo vision,” Appl. Opt. **56**(24), 6822–6836 (2017). [CrossRef]

**36. **R. M. Steele and C. Jaynes, “Feature uncertainty arising from covariant image noise,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR) (2005), pp. 1063–1070.

**37. **P. J. Huber and E. M. Ronchetti, *Robust Statistics*, John Wiley & Sons, New Jersey, 2009.

**38. **G. B. Chang, T. H. Xu, and Q. X. Wang, “M-estimator for the 3D symmetric Helmert coordinate transformation,” J. Geod. **92**(1), 47–58 (2018). [CrossRef]

**39. **G. B. Chang, T. H. Xu, and Q. X. Wang, “Error analysis of the 3D similarity coordinate transformation,” GPS Solut. **21**(3), 963–971 (2017). [CrossRef]

**40. **G. B. Chang, T. H. Xu, Q. X. Wang, and M. Liu, “Analytical solution to and error analysis of the quaternion based similarity transformation considering measurement errors in both frames,” Measurement **110**, 1–10 (2017). [CrossRef]

**41. **O. D. Faugeras and M. Hebert, “The representation, recognition, and locating of 3D shapes from range data,” Mach. Intell. Pattern Recogn. **3**, 13–51 (1986). [CrossRef]

**42. **B. K. P. Horn, “Closed-form solution of absolute orientation using unit quaternions,” J. Opt. Soc. Am. A **4**(4), 629–642 (1987). [CrossRef]

**43. **K. S. Arun, T. S. Huang, and S. D. Blostein, “Least-squares fitting of two 3-d point sets,” IEEE Trans. Pattern Anal. **PAMI-9**(5), 698–700 (1987). [CrossRef]

**44. **D. G. Luenberger and Y. Y. Ye, *Linear and Nonlinear Programming*, 3rd ed., Springer, New York, 2008.

**45. **P. Anandan and M. Irani, “Factorization with uncertainty,” Int. J. Comput. Vis. **49**(2/3), 101–116 (2002). [CrossRef]