Convenient calibration method for unsynchronized camera networks using an inaccurate small reference object

Jae-Hean Kim; Bon-Ki Koo

doi:10.1364/OE.20.025292

1. Introduction

To perform vision-based tasks such as 3D reconstruction and tracking using camera networks, the cameras must be calibrated in advance. To obtain accurate calibration results, the first method developed consists in setting up a calibration object with great accuracy [1]. In cases where the image capture volume is large, the calibration object should also be as large as possible. This is due to the fact that the points on the object should cover the volume so as to provide enough reference information. However, in practice it is difficult to handle and maintain a large calibration object; while manufacturing a large object with accuracy is both expensive and poses a difficult problem to resolve.

To avoid the problem of having to use a 3D calibration object, a method using a simple 2D object was proposed in [2, 3]. In practice, this method is very handy where a small capture volume is involved. However, certain drawbacks remain when the capture volume is increased. The large 2D object should be flat and must be moved by hand to obtain multiple images so as to make calibration possible. Moreover, for camera networks in which the cameras are arranged around the capture volume, it is impossible for the cameras on the opposite side to see the reference points concurrently when using a 2D object. This is one of the disadvantages of using a 2D object compared to using a 3D object. To acquire good relative extrinsic parameters between the cameras, it is desirable to obtain as many reference points observed in a common coordinate system as possible.

Svoboda et al. suggested a more convenient method in which a freely moving bright spot is used as the calibration object [4]. This method, which employs the self-calibration technique, requires electronically synchronized cameras that can capture the multiple images simultaneously. However, since the cameras may have different shutter lag and no external trigger input for Genlock, this requirement cannot always be guaranteed. If synchronization is not ensured for the method, then a large number of still images should be captured, as in stop-motion animation, which is a very difficult task. Consequently, such a method is unsuitable for unsynchronized systems, so in this case we would have no choice but to use large 2D or 3D reference objects, as in previous methods. Moreover, methods based on the self-calibration technique require a great deal of overlap among the many cameras to ensure stable calibration results. This implies the need for densely distributed cameras. The self-calibration technique also suffers from critical configurations of cameras and reference points, for which self-calibration is not possible [1].

Many methods adopting the idea of using moving bright spots have been suggested for the calibration of sparsely distributed camera networks. Kurillo et al. and Medeiros et al. each proposed a method based on pairwise calibration [5, 6]. They used a rod of known length with LED lights at each end to establish a consistent scale for sparsely distributed cameras. Chen et al. used an iterative approach to refine the calibration results from unsynchronized cameras [7]. However, all these methods assume that the intrinsic parameters are given a priori, which contributes to stable calibration results for sparsely distributed cameras.

Sinha and Pollefeys developed a multi-camera calibration method using the motion of silhouettes [8]. This method makes use of discrete sampling on the silhouette boundary of a subject to find the frontier points. The latter can be used to determine the epipolar geometry between any two views. This elegant approach has the advantage of not using any calibration object. However, many frontier points should be spread over the images (See Fig.8 in [8]) to obtain reliable results. These can be obtained when a subject moves widely over the capture volume. However, if the cameras are unsynchronized, the subject must act as it would in a stop-motion animation. Although this work also suggested a scheme for solving the problem of unsynchronized video streams, an approximate interpolation strategy was used to treat sub-frame discrepancy, and it can only be applied when the subject’s inter-frame motion is small.

Another method which also uses object silhouettes, by introducing an optimization process over a continuous space, also has been suggested [9]. Although this method alleviates the above requirements concerning the distribution of the frontier points, it requires good initialization of the camera parameters.

In this paper, we propose a new calibration algorithm using a small 3D reference object for camera networks. The purpose is to overcome the problem stated above and to eliminate the need for large 2D or 3D reference objects in the case of unsynchronized camera systems. The method is very useful in that it is easier to manufacture, move, and maintain a small reference object. The extrinsic and intrinsic parameters are recovered simultaneously by capturing the object, which is placed arbitrarily in different locations, in the capture volume. The calibration object is maneuvered so that the reference points on the object can fully cover the capture volume. The relative pose parameters between the objects are obtained simultaneously with the camera parameters. Thus the object acts as a large virtual calibration object, and the proposed method provides a simple and accurate means of calibrating camera networks. Moreover, since the proposed method can compensate the inaccuracy of the object, the manufacturing cost of the 3D reference object can be reduced. The radial lens distortion is also calibrated simultaneously using a similar method to [2].

We also think that if the reference object has many feature points on the surface and the object is positioned in many locations, many corresponding image points can be obtained and these can also be used as the input of the algorithm of [4]. In this paper, this issue is also considered in an experimental comparison, which shows that the previous methods cannot obtain stable estimation results in the case of sparsely distributed cameras compared with the proposed method.

2. Related works

To calibrate camera networks, Kassebaum et al. proposed a method of placing a 3D calibration object in only one position in each pairwise view overlap [10]. They assumed that the intrinsic parameters were given by a separate estimation method. Although the method can obtain reasonable results by virtue of the given parameters, the need for a large object for accuracy cannot be avoided. Tests are presented in this paper to show that the results obtained with that approach are substantially less accurate than the ones achieved with our algorithm.

Most self-calibration methods based on the factorization approach decompose a point measurement matrix into the camera and point parameters [4, 11–14]. The proposed method newly suggests a factorization-based approach which decomposes projection matrices corresponding to each object location into camera and object parameters. The proposed method is inspired by the works of [3, 15, 16], in which the measurement matrix of primitives, e.g., rectangles or parallelepipeds, was decomposed. The proposed method applies this type of work to multi-camera calibration using a small reference object. Since the projection matrices have more geometric information than points, it is very helpful in practice for sparsely distributed cameras compared to the point-based self-calibration technique.

The point-based factorization method used an iterative approach to estimate the scale factors corresponding to the projective depths of each point [4, 12]. However, in the proposed framework, very simple approaches can be used to estimate the scale factors linearly. Two alternative approaches are introduced. One of them was originated from the work of Wilczkowiak et al. [16]. In that work, they also needed a rank-3 factorization as in our case to decompose the measurement matrix of parallelepipeds. In this paper, we analyze the two approaches and suggest which one is better in terms of accuracy.

The work most closely related with the proposed method is that of Ueshiba and Tomita [3]. However, this method has the disadvantage of using a 2D reference object as mentioned above and, consequently, has many missing entries in the measurement matrix when used in camera network environments. They did not suggest any method to resolve this problem as in [13, 14], and only performed experiments for a small capture volume. However, in the proposed method using a 3D object, a simple approach is introduced to resolve this problem.

3. Preliminaries

3.1. Camera parameterization

The camera is represented using a pinhole model. The projection model of a point X in the 3D world coordinate system to a point x in the 2D image of the ith camera is expressed as follows:

x ≅ K_{i} [\begin{matrix} R_{i} & t_{i} \end{matrix}] X ≅ M_{i} X,

where x and X are homogeneous coordinates of the points and ’≅’ indicates equality up to scale. K_i is the camera’s intrinsic matrix, given by [1]. The rotation matrix R_i and the vector t_i represent the camera’s orientation and position, respectively. The 3 × 4 matrix M_i encapsulates the camera’s intrinsic and extrinsic parameters.

3.2. Object pose parameterization

Let X_ref be the homogeneous coordinates of a reference point with respect to the coordinate system attached to the 3D object. When the object is located at the jth position on the world coordinate system, the point is represented as follows:

X = [\begin{matrix} S^{j} & v^{j} \\ 0^{T} & 1 \end{matrix}] X_{ref} = N^{j} X_{ref},

where the rotation matrix S^j represents the object’s orientation and the vector v^j its position.

4. Image measurements

4.1. Projection matrix

Consider the projection of a reference point into a camera image plane. Using the results from Section 3, the projection of the reference points in the image is:

x ≅ M_{i} N^{j} X_{ref} = P_{i}^{j} X_{ref},

where

P_{i}^{j} = M_{i} N^{j} = K_{i} [\begin{matrix} R_{i} S^{j} & R_{i} v^{j} + t_{i} \end{matrix}]

.

The 3 × 4 matrix $P_{i}^{j}$ will be called the projection matrix. It represents a perspective projection that maps the reference points on the object located at the jth position onto the corresponding imaged points of the ith camera. This is illustrated in Fig 1. The rotation matrix R_iS^j and the vector R_iv^j + t_i represent the ith camera’s rotation and position with respect to the coordinate system attached to the jth object.

Fig. 1 Camera and 3D object pose parameters

Download Full Size | PDF

When at least six reference points in general position on the object can be viewed, the projection matrix can be computed up to scale [17]. Assume that the object is positioned in jth location and can be viewed from the ith camera. Let $X_{ref}^{k}$ be the kth point on the reference object with respect to the coordinate system attached to the object. Let ${\tilde{x}}_{i}^{j_{k}} (= {[u, v]}^{T})$ be the corresponding point measured from the captured image. For each point correspondence $X_{ref}^{k} \leftrightarrow {\tilde{x}}_{i}^{j_{k}}$ , we can derive a relationship

[\begin{matrix} 0^{T} & {- X_{ref}^{k}}^{T} & {v X_{ref}^{k}}^{T} \\ {X_{ref}^{k}}^{T} & 0^{T} & {- u X_{ref}^{k}}^{T} \\ {- v X_{ref}^{k}}^{T} & {u X_{ref}^{k}}^{T} & 0^{T} \end{matrix}] p = 0

where

p^{T} = [\begin{matrix} P_{i}^{j 1} & P_{i}^{j 2} & P_{i}^{j 3} \end{matrix}]

, and

P_{i}^{j l}

is a 4-vector, the lth row of

P_{i}^{j}

. From set of n point correspondences, a 2n × 12 matrix Λ can be obtained by stacking up the Eq. (4). The projection matrix is computed up to scale by solving the set of equations

Λ p = 0 .

The parameters of the projection matrix are further refined by minimizing the re-projection errors defined as follows:

\sum_{k \in I_{i}^{j}} {‖ {\tilde{x}}_{i}^{j_{k}} - {\hat{x}}_{i}^{j_{k}} (P_{i}^{j}, X_{ref}^{k}) ‖}^{2},

where

I_{i}^{j}

is the set of indices of the reference points that can be visible from the ith camera when the object is placed in the jth location, and

{\hat{x}}_{i}^{j_{k}}

represents the image points synthesized with

P_{i}^{j}

and

X_{ref}^{k}

. This non-linear minimization problem is solved with the Levenberg-Marquardt algorithm initialized with the results from Eq. (5).

4.2. Measurement matrix

Let us now consider the case in which an object is located at n different locations and seen by m cameras. Let ${\tilde{P}}_{i}^{j}$ be the estimation result of the projection matrix associated with the ith camera and the jth object pose and $λ_{i}^{j}$ a scale factor such that the following equality can be written:

λ_{i}^{j} {\tilde{P}}_{i}^{j} = M_{i} N^{j} = P_{i}^{j} .

We may gather the estimated projection matrix for all m cameras and n object poses into the following single matrix:

\tilde{W} = [\begin{matrix} {\tilde{P}}_{1}^{1} & \dots & {\tilde{P}}_{1}^{n} \\ ⋮ & ⋱ & ⋮ \\ {\tilde{P}}_{m}^{1} & \dots & {\tilde{P}}_{m}^{n} \end{matrix}] .

The matrix W̃ will be called the measurement matrix. When the scale factors are recovered, the measurement matrix can be factorized as follows:

[\begin{matrix} λ_{1}^{1} {\tilde{P}}_{1}^{1} & \dots & λ_{1}^{n} {\tilde{P}}_{1}^{n} \\ ⋮ & ⋱ & ⋮ \\ λ_{m}^{1} {\tilde{P}}_{m}^{1} & \dots & λ_{m}^{n} {\tilde{P}}_{m}^{n} \end{matrix}] = [\begin{matrix} P_{1}^{1} & \dots & P_{1}^{n} \\ ⋮ & ⋱ & ⋮ \\ P_{m}^{1} & \dots & P_{m}^{n} \end{matrix}] = [\begin{matrix} M_{1} \\ ⋮ \\ M_{m} \end{matrix}] [\begin{matrix} N^{1} & \dots & N^{n} \end{matrix}]

However, since the projection matrices are obtained up to scale, the measurement matrix cannot be factorized at its current form.

5. Parameter estimation

5.1. Rescaling the measurement matrix (Method 1)

In this section, we will describe how to solve the scale factor problem referred to in Section 4.2. Let ${\tilde{H}}_{i}^{j}$ be the leading 3 × 3 sub-matrix of ${\tilde{P}}_{i}^{j}$ , which can be written as ${\tilde{H}}_{i}^{j} ≅ K_{i} R_{i} S^{j}$ . Assume that, for alternative scale factors ${ρ_{i}^{j}}$ , the following reduced measurement matrix can be factorized as follows:

\tilde{Y} = [\begin{matrix} ρ_{1}^{1} {\tilde{H}}_{1}^{1} & \dots & ρ_{1}^{n} {\tilde{H}}_{1}^{n} \\ ⋮ & ⋱ & ⋮ \\ ρ_{m}^{1} {\tilde{H}}_{m}^{1} & \dots & ρ_{m}^{n} {\tilde{H}}_{m}^{n} \end{matrix}] = [\begin{matrix} a_{1} K_{1} R_{1} \\ ⋮ \\ a_{m} K_{m} R_{m} \end{matrix}] [\begin{matrix} S^{1} & \dots & S^{n} \end{matrix}] .

If {a_i} have the values such that

det (a_{i} K_{i} R_{i}) = 1,

for all the i’s, then

det (ρ_{i}^{j} {\tilde{H}}_{i}^{j}) = 1 .

From this observation, we can see that if all the

{\tilde{H}}_{i}^{j}

’s are rescaled to have unit determinants, the above factorization is possible. Then, the alternative scale factors can be determined as

ρ_{i}^{j} = det {({\tilde{H}}_{i}^{j})}^{- 1 / 3} .

5.2. Rescaling the measurement matrix (Method 2)

In fact, after obtaining ${\tilde{P}}_{i}^{j}$ , we can acquire ${λ_{i}^{j}}$ by RQ-decomposition of ${\tilde{H}}_{i}^{j}$ ’s. Assume that ${\tilde{H}}_{i}^{j}$ is decomposed as ${\tilde{H}}_{i}^{j} = Σ Θ$ , where Σ is an upper triangular matrix and Θ is an orthogonal matrix whose determinant is 1. Since K_i is an upper triangular matrix and (R_iS^j) is also rotation matrix, we can think that $Σ = μ_{i}^{j} K_{i}$ and Θ = R_iS^j. Moreover, since (K_i)₃₃ is 1, (Σ)₃₃ is $μ_{i}^{j}$ .

Because $H_{i}^{j} = K_{i} R_{i} S^{j}$ , where $H_{i}^{j}$ is the leading 3 × 3 sub-matrix of $P_{i}^{j}$ , the scale factors can be determined as $λ_{i}^{j} = 1 / μ_{i}^{j}$ .

5.3. Analysis of the two rescaling methods

In this section, a more detailed discussion of the two rescailing methods described in Sections 5.1 and 5.2 is given.

Suppose that ${\tilde{H}}_{i}^{j}$ has RQ-decomposition into the product of Σ and Θ, where Σ is an upper triangular matrix and Θ is an orthogonal matrix whose determinant is 1. Let the diagonal elements of Σ be a, b, and c. Then, the scale factor from Method 2 is 1/c. It is worthwhile to note that {(K_i)₁₁, (K_i)₂₂}(= {f_u, f_v}) = {a/c, b/c}. The determinant of ${\tilde{H}}_{i}^{j}$ is the product of the diagonal elements, abc.

Consequently, the scale factor $ρ_{i}^{j}$ from Method 1 is (abc)^−1/3 and $λ_{i}^{j}$ from Method 2 is (ccc)^−1/3. However, it is worthwhile to note that these scale factors are not ideal values because they are computed from ${\tilde{H}}_{i}^{j}$ contaminated with image noise. From this observation, we can see that $ρ_{i}^{j}$ has better numerical conditioning than $λ_{i}^{j}$ in the presence of noise. When ${\tilde{H}}_{i}^{j}$ are contaminated with image noise, the values of a, b, and c are also deviated from their true values. Since these variables are independent of each other, the product of a, b, and c has noise reduction effect compared to ccc. On the other hand, the product of three c’s amplifies its noise.

To support this analysis, we performed the following simulated experiments. We generated 3000 random ${\tilde{H}}_{i}^{j}$ ’s. Zero-mean Gaussian-distributed noise was added to the elements of these matrices. The standard derivation used for each element was 10% of the magnitude of the element. Fig. 2 shows the distribution of the relative error of abc and ccc. For example, the relative error of abc is ‖abc − ãb̃c̃‖/abc, where {a, b, c} are obtained from $H_{i}^{j}$ and {ã, b̃, c̃} from contaminated ${\tilde{H}}_{i}^{j}$ . The experimental comparison and evaluation of the two methods are presented in Section 6.1.2.

Fig. 2 The distribution of the relative error of abc and ccc. The variables, a, b, and c are diagonal elements of the upper triangular matrix obtained through RQ-decomposition of H(= KRS).

Download Full Size | PDF

5.4. Factorization

As usual in the previous factorization approaches, the SVD (Singular Value Decomposition) is used to obtain the low-rank factorization of Ỹ. Let the SVD of Ỹ be given as:

\tilde{Y} = U_{3 m \times 3 n} D_{3 n \times 3 n} V_{2 n \times 3 n}^{T} .

Assume that the diagonal matrix D contains the singular values of Ỹ: σ₁≥σ₂≥. . . ≥σ₃_n. In the absence of noise, Ỹ satisfying Eq. (10) has rank 3 and consequently σ₄ = σ₅ = . . . = σ₃_n = 0. If noise were present, this would not be the case. If we want to find the rank 3 matrix which is closest to Ỹ in the Frobenius norm, such a matrix can be obtained by setting all the singular values, besides the three largest ones, to zero. Let Ū and V̄ be the matrices consisting of the first three columns of U and V, respectively. Then, the rank-3 factorization result can be given as:

\begin{array}{l} \bar{Y} & = & {\bar{U}}_{3 m \times 3} \sqrt{D} \cdot {{\bar{V}}_{2 n \times 3} \sqrt{D}}^{T} \\ = & {\hat{U}}_{3 m \times 3} {\hat{V}}_{2 n \times 3}^{T}, \end{array}

where

\sqrt{D} = diag (\sqrt{σ_{1}}, \sqrt{σ_{2}}, \sqrt{σ_{3}})

. However, the factorization result is not unique because the following is also a valid factorization:

\bar{Y} = ({\hat{U}}_{3 m \times 3} T) (T^{- 1} {\hat{V}}_{2 n \times 3}^{T}),

where T is an arbitrary non-singular 3 × 3 matrix. To obtain the camera and object pose parameters, we have to resolve this affine ambiguity. This issue is considered in the next section. It is worthwhile noting that this ambiguity always exists even if the method described in section 5.2 is used.

5.5. Resolving affine ambiguity

Let the matrices Û_3m_×3 and V̂₂_n_×3 in Eq. (16) be decomposed in the 3 × 3 and 2 × 3 submatrices, respectively: ${\hat{U}}_{3 m \times 3}^{T} = [\begin{matrix} U_{1}^{T} & \dots & U_{m}^{T} \end{matrix}]$ and ${\hat{V}}_{2 n \times 3}^{T} = [\begin{matrix} V_{1}^{T} & \dots & V_{n}^{T} \end{matrix}]$ .

From Eq. (10) and (16), we have $T^{- 1} V_{j}^{T} = S^{j}$ . This can be written as:

T T^{T} = {V_{j}}^{T} V_{j} .

From the summation of Eq. (17) for all j’s, it can be seen that TT^T ≅ D because V̂^TV̂ = D and, consequently, that

T ≅ \sqrt{D} R_{w}

, where R_w is an arbitrary rotation matrix representing the fact that the global Euclidean reference frame can be chosen arbitrarily.

5.6. Reconstructing the camera and object pose parameters

We can obtain K_i and R_i from RQ-decomposition of U_iT and S_j from the rotation matrix closest to $T^{- 1} V_{j}^{T}$ [18].

Let ${\tilde{h}}_{i}^{j}$ be the fourth column of ${\tilde{P}}_{i}^{j}$ , which can be written as:

ρ_{i}^{j} {\tilde{h}}_{i}^{j} = μ_{i}^{j} (K_{i} R_{i} v^{j} + K_{i} t_{i}),

where

μ_{i}^{j}

can be obtained from

det (ρ_{i}^{j} {\tilde{H}}_{i}^{j}) = det (μ_{i}^{j} K_{i} R_{i} S_{j})

or

μ_{i}^{j} = {[det (ρ_{i}^{j} {\tilde{H}}_{i}^{j}) / det (K_{i})]}^{1 / 3}

. From Eq. (18), three independent linear equations are obtained and these equations for all pairs of the camera and object pose can be solved to obtain t_i and v^j.

5.7. Filling missing entries

In practice, for camera networks where all the fields of view of the cameras do not overlap, the calibration object placed in different locations cannot be seen from all the cameras simultaneously. In this case, some entries in the matrix Ỹ are missing and the factorization cannot proceed. To fill the missing entries, the following simple deduction process can be used.

{\tilde{H}}_{i}^{j} ≅ {\tilde{H}}_{i}^{l} {({\tilde{H}}_{k}^{l})}^{- 1} {\tilde{H}}_{k}^{j},

where

{\tilde{H}}_{i}^{l}

,

{\tilde{H}}_{k}^{l}

, and

{\tilde{H}}_{k}^{j}

can be measured for the kth camera and the lth object position. The candidates for the intermediate kth camera and the lth object position may be multiple. In this case, all the candidates can be used simultaneously to compensate the image noise.

Owing to the simple loop style Eq. (19), the filling process can be completed much more simply compared to the complex process of the point-based factorization methods [13, 14], which fit a low rank matrix to the measurement matrix.

5.8. Refining the parameters

The parameters are further refined by minimizing the re-projection errors defined as follows:

\sum_{i = 1}^{m} \sum_{j = 1}^{n} \sum_{k \in I_{i}^{j}} {‖ {\tilde{x}}_{i}^{j_{k}} - {\hat{x}}_{i}^{j_{k}} (K_{i}, R_{i}, t_{i}, S^{j}, v^{j}, X_{ref}^{k}) ‖}^{2},

where

X_{ref}^{k}

is the kth point on the reference object and

I_{i}^{j}

is the set of indices of the reference points that can be visible from the ith camera when the object is placed in the jth location.

{\tilde{x}}_{i}^{j_{k}}

represents the image points measured from the captured images while

{\hat{x}}_{i}^{j_{k}}

represents the image points synthesized with the camera and object parameters. This non-linear minimization problem is solved with the Levenberg-Marquardt algorithm initialized with the estimation results obtained using the technique described in the previous sections. Due to the multi-view constraints imposed on the factorization, consistency of the rigid transformations among cameras and objects is guaranteed. These consistent estimation results can be used directly to the initialization without any pretreatment.

5.9. Compensating the inaccuracy of a reference object

Critical issues when using a 3D reference object are manufacturing cost and accuracy. There is a trade-off between these two factors. The manufacturing accuracy severely affects the calibration quality in the previous method using a 3D object. On the other hand, since the images of the same object placed in different locations are captured using the proposed method, it is possible to compensate the inaccuracy of the object.

It is assumed that the object is composed of planes and that the planes are totally flat. Because the size of the object in the proposed method is small, it is not so difficult to satisfy this requirement for individual small planes. As such, it can be thought that the inaccuracy is caused while these planes are being assembled and patterns printed on the surface. Since it is easy to measure the relative 2D positions between the feature points on each surface plane, we can limit the variables regarding the inaccuracy to the rotation matrices R_f and the translation vectors t_f of the planes with respect to the coordinate attached to the object. These variables are described in Fig. 3 for one example plane. Since the planes at different positions can be viewed from more than two cameras concurrently, as in Fig. 3, and the variables are not changed while the object moves, they can be computed to satisfy overall consistency. The re-projection error function is extended from Eq. (20) as follows:

\sum_{i = 1}^{m} \sum_{j = 1}^{n} \sum_{k \in I_{i}^{j}} {‖ {\tilde{x}}_{i}^{j_{k}} - {\hat{x}}_{i}^{j_{k}} (K_{i}, R_{i}, t_{i}, S^{j}, v^{j}, X_{ref}^{k} (R_{f (k)}, t_{f (k)})) ‖}^{2},

where f(k) is the index of the face including the kth point. R_f(k) and t_f(k) are initialized according to the original drawing of the object. The other parameters are initialized with the estimation results obtained up to section 5.7.

Fig. 3 Rotation matrix R_f and translation vector t_f representing the inaccurcy of the reference object.

Download Full Size | PDF

5.10. Dealing with radial distortion

Up to now, the lens distortion of cameras has not been considered. In the proposed method, the first two terms of radial distortion are considered because it has been found that a more elaborate model offers negligible improvement when compared with sensor quantization and just causes numerical instability [2]. A similar method used in [2] is adopted. ${\hat{x}}_{i}^{j_{k}}$ of Eq. (21) is extended to include the radial lens distortion parameters. The real (distorted) normalized image coordinates (x̆, y̆) are represented as follows:

\overset{⌣}{x} = x + x [k_{1} (x^{2} + y^{2}) + k_{2} {(x^{2} + y^{2})}^{2}]

\overset{⌣}{y} = y + y [k_{1} (x^{2} + y^{2}) + k_{2} {(x^{2} + y^{2})}^{2}],

where (x, y) are distortion-free normalized image coordinates, and k₁ and k₂ are the coefficients of the radial distortion. An initial guess of k₁ and k₂ can be obtained through similar equations to Eq. (13) in [2], which can be modified to be proper for the proposed method or simply by setting them to 0.

The initial values of the other parameters can be set as the estimation results obtained up to Section 5.9. Otherwise, both parameters related to object inaccuracy and radial distortion are refined simultaneously by initializing the other parameters with the estimation results obtained up to Section 5.7. It was found that the results of the former and latter approaches are almost the same, but that the former approach requires more computational time. All of the experimental results presented in this paper were obtained using the latter approach.

6. Experimental results

6.1. Simulated experiments

Simulated experiments were performed to analyze the performance of the proposed algorithm. The advantage of simulated experiments instead of real image experiments is that a rigorous ground truth is available, while the relevant parameters are varied systematically.

Two types of experimental environments were constructed. First, six cameras were evenly located on a circle with a radius of 3m (See Fig. 4(a)). Since all the cameras looked towards the center, all the fields of view of the cameras overlapped. In the second environment, five cameras were mounted on the side walls of a corridor (See Fig. 4(b)). Since the optical axes of all the cameras were set to be parallel to the end wall of the corridor, not all of the fields of view of the cameras overlapped. The length, width, and height of the corridor were 5m, 6m, and 2m, respectively. We refer to these two environments as ENV1 and ENV2, respectively.

Fig. 4 The environments for the simulated experiments. (a) ENV1. (b) ENV2.

Download Full Size | PDF

Synthetic 1600×1200 images were taken by the cameras with the following intrinsic parameters: (f_u, f_v, s, u₀, v₀)=(1600, 1600, 0, 800, 600). Zero-mean Gaussian-distributed noise with standard deviation σ was added to the image projections. It was assumed that there were no gross outliers owing to false correspondence. However, to consider outliers in feature point detection owing to perspective and photometric effect, 10% of the noise-vectors are replaced with those from another Gaussian distribution with 2σ for all simulated experiments.

The 3D object having the shape shown in Fig. 7(a) was used and each face had 9 reference points. The number of trials was 100 for each simulation and the results shown are the averages. All results were obtained with the method described in Section 5.1 in determining the scale factors except for the results of Section 6.1.2. Since the performances obtained for all intrinsic parameters are highly similar, the estimation results for f_v, u_o, and v_o are not depicted here.

6.1.1. Performance w.r.t noise level, object size, and number of object positions

In this section, we analyze the performance with respect to the noise magnitude, the size of the reference object, and the number of object positions.

All of the results are displayed in Table 1. From the results, it is clear that the proposed method can obtain accurate estimates with a relatively small calibration object. When the number of object positions is increased in the case of a smaller object (240mm), error compensation can be observed. The object with a size of 240mm had a reasonable minimum size for acquiring stable results for the given environments. It is worthwhile noting that the number of object positions is equal to the number of shots required to calibrate the network. In ENV2, more object positions were needed than in ENV1, because not all the cameras viewed the same region and the object was placed in several separate view overlaps. Fig. 5 shows the visibility map for the object positions in ENV2 when the number of object positions is 25. From this map, it can be seen that the results for ENV2 validate the process of filling the missing entries described in Section 5.7.

Fig. 5 The visibility map for the object positions in ENV2 when the number of object positions is 25. The horizontal and vertical axes represent the object position index and camera index, respectively.

Download Full Size | PDF

Table 1. Results of the simulated experiments for performance evaluation. σ, W, and P represent the noise standard deviation in pixels, the object width in mm, and the number of object positions, respectively. RPE is the re-projection error in pixels. E_{f_u} is error for f_u. Rotation error E_R is measured by the Frobenius norm of the discrepancy between the true and estimated rotation matrix. E_t is the norm of error for the translational vector in mm.

View Table | View all tables in this article

The last four rows of Table 1 are to compare the proposed method with the method suggested by Kassebaum et al. [10]. As expected, the method could not yield accurate estimation results without a sufficiently sized object. To obtain estimation results comparable with those of the proposed method, that method would have needed a very large object with a width of 1450mm in these simulation environments. Moreover, even though a large object was used, that method obtained worse results than those of the proposed method when the intrinsic parameters were not given. This is due to the fact that the feature points cannot be placed inside the object. Unlike when using a large object placed in only one position, the proposed method can provide a great number of feature points evenly distributed over the capture volume.

6.1.2. Comparison of the two rescaling methods

We suggested two methods of determining the scale factors in Sections 5.1 and 5.2. The comparison of the two methods is summarized in Table 2. The number of object positions was 25 in this experiment and the environment was ENV2. After carrying out the factorization step, the RPE results obtained from Method 1 were better than those obtained with the Method 2. These results can be expected from the analysis of Section 5.3.

Table 2. Results of the simulated experiments used to compare the methods in Sections 5.1 and 5.2. RPE1 is the re-projection error obtained after the factorization step and RPE2 after the refinement step.

View Table | View all tables in this article

These results are the initial values for the non-linear optimization described in Section 5.8. In the case of a small noise level (σ = 0.5) and a larger object (360mm), the final results after the refinement step are fortunately similar. However, the differences in the initial values affect the final results when the noise level increases or a smaller object (240mm) is used.

The feasibility of the factorization step depends on the rank 3-ness of the reduced measurement matrix Ỹ rescaled by the two methods. Fig. 6 shows the magnitude of the singular values of rescaled Ỹ for each environment of Table 2. To be close to rank 3, the ratio of the 3rd and 4th singular values should be large with respect to the ratio of the 1st and 3rd singular values. We can see that this ratio of Method 1 is larger than that of Method 2. From these experimental results, it can be concluded that Method 1 is more desirable than Method 2.

Fig. 6 The magnitude of the singular values of rescaled Ỹ for each environment of Table 2.

Download Full Size | PDF

6.1.3. Performance w.r.t object inaccuracy and radial distortion

In this section, we test the feasibility of the approach described in Sections 5.9 and 5.10. The width of the object was 360mm. The image noise was zero-mean Gaussian-distributed noise with standard deviation of 0.5. The number of object positions was 9 for ENV1 and 25 for ENV2, respectively. The results are summarized in Table 3. It can be seen that the proposed method can be used even if the reference object is inaccurate. The radial distortion parameters were also well estimated. The proposed algorithm could not converge for the inaccuracy level more than (θ̂_f, ‖t̂_f‖)=(3°, 5mm) in ENV1 and (1°, 3mm) in ENV2 (See Table 3 for an explanation of the parameters). Since there were many missing entries in ENV2 compared with ENV1, error propagation during the filling process caused this difference. However, this performance means that we can allow a considerable degree of inaccuracy with the object and decrease the manufacturing cost.

Table 3. Results of the simulated experiments on object inaccuracy and radial distortion. (θ̂_f, ‖t̂_f‖) represent the pose discrepancy of the object’s faces from an original drawing. Each face rotated around a random axis with angle θ̂_f(°) and moved in a random direction t̂_f (mm). (k₁, k₂) are the lens distortion parameters. E_{t̂_f} is the norm of error for t̂_f. (E_k₁, E_k₂) are errors for (k₁, k₂).

View Table | View all tables in this article

6.1.4. Comparison with the method of [4]

As mentioned in Section 1, the feature points detected for the proposed method can be used as the input of the algorithm of [4]. Unlike a moving bright spot, which can be viewed from almost any direction, all the features on the object cannot be seen from each camera because the object cannot be made to be transparent. Consequently, the survival duration of the features along the cameras is relatively short compared to instances where a moving bright spot is used. If the cameras are arranged more densely, more stable results may be obtained due to the effects of chaining local stable reconstruction. The simulation results for the method of [4] using the features detected for the proposed method are shown in Table 4. The width of the object was 360mm. The image noise was zero-mean Gaussian-distributed noise with standard deviation of 0.5. Since the results of [4] are up to scale, a similarity transformation was applied so that the distances between the estimated camera positions and their corresponding ground truths were minimal. From the results, we can see that satisfactory results cannot be obtained even if the number of object positions increases. Since the cameras of ENV2 look down at the capture volume, the points on the upper plane of the object can be viewed at three adjacent cameras simultaneously. This is the reason why the calibration results can be obtained in ENV2 compared to the results in ENV1 when the number of cameras is small (5 or 6). To identify the effects of the number of cameras, more cameras were located evenly in the environments while fixing the number of object positions. As mentioned above, more accurate calibration results were obtained. This implies that the method of [4] can be applied only in the case of densely distributed cameras when the feature points are extracted from a 3D reference object for unsynchronized camera networks.

Table 4. Results of the simulated experiments for the method of [4] using the features detected for the proposed method. C and P represent the number of cameras and object positions, respectively. The case in which calibration results cannot be obtained from the method is indicated by ”×”.

View Table | View all tables in this article

6.2. Real image experiments

6.2.1. Manufacturing a reference object

Owing to the advantage of the proposed algorithm described in Section 6.1.3, we simply constructed the reference object with cheap foam boards manually (See Fig. 7(a)). Eighteen 2D patterns containing nine concentric circles were printed on sheets of paper which were then attached to the reference object. Since at least two planes should be visible from the cameras in any direction in order to compute the projection matrices, we considered this symmetric design to be rational. The width was about 360mm.

6.2.2. Extraction of feature points

The feature points on the reference object were extracted automatically. First, circle candidates were extracted via edge detection and blob coloring. Then, two concentric circles having the nearest mass centers were paired to discriminate the probable circles on the background. By recognizing the characters shown in Fig. 7(a), the indices of the faces and circles were identified. After obtaining the ellipse parameters of the imaged circles using the method of [19], the bi-tangents of every pair of circles on the same plane were computed and the intersection points between the bi-tangents were used as feature points. The extraction results are shown in Fig. 7(b). It is worthwhile noting that these point coordinates should be normalized to obtain stable calibration results [20]. This was also performed for the simulated experiments outlined above.

6.2.3. Actual camera network environments

The proposed method was evaluated on two actual camera networks. The types of environments were similar to ENV1 in Fig. 4(a) and ENV2 in Fig. 4(b). We refer to these environments as Network1 and Network2, respectively. Fig. 8 shows the illustration of the camera networks. Table 5 shows the detailed specifications of the camera networks.

Fig. 7 (a) 3D reference object used in real image experiments. (b) Automatic feature extraction results for some circle pairs. The blue lines represent the bi-tangents. The extracted features are represented with red +’s.

Download Full Size | PDF

Fig. 8 Two actual camera networks for evaluation of the proposed algorithm.

Download Full Size | PDF

Table 5. The specifications of the actual camera network environments. C represents the number of cameras used in the experiment. P represents the number of object positions.

View Table | View all tables in this article

Network1 is shown in the left-hand image of Fig. 8. Images captured from some cameras of Network1 are shown in Fig. 9. Fig. 10 shows the recovered cameras and objects. From these figure, it can be seen that the qualitative positions of cameras and objects were estimated correctly. Since the ground truth is not available in real image experiments, the re-projection results are good indices of the performance of the proposed method. The average re-projection error(RPE) of the reference points was 0.0835 pixels. When the inaccuracy of the object was not compensated by the technique described in Section 5.9, the average RPE was 0.284 pixels. This implies that the object had inaccuracy and the pose discrepancy of the object’s faces from an original drawing was well estimated. The average of the estimated discrepancy of the position was 2.32mm.

Fig. 9 Example of captured images for Network1.

Download Full Size | PDF

Fig. 10 Visualization of the calibration results for Network1. (a) Top and (b) side views are shown, respectively.

Download Full Size | PDF

The right-hand image of Fig. 8 shows Network2. Fig. 11 shows the images of the reference object captured from some cameras. Fig. 12 shows the visibility map for the object positions in Network2. The filling process described in Section 5.7 was conducted successfully in this experiment. The visualization of the calibration results are shown in Fig. 13. The average RPE was 0.0473 pixels. When the inaccuracy of the object was not compensated, the average RPE is 0.148 pixels.

Fig. 11 Example of captured images for Network2.

Download Full Size | PDF

Fig. 12 The visibility map for the object positions in Network2. The horizontal and vertical axes represent the object position index and the camera index, respectively.

Download Full Size | PDF

Fig. 13 Visualization of the calibration results for Network2. (a) Top and (b) side views are shown, respectively.

Download Full Size | PDF

In order to investigate the consistency of the proposed method, we applied it to the camera set {0,2,4} and {1,3}, respectively, in both Network1 and Network2. The calibration results are shown in Table 6 and 7. We can see that the differences between the results from the subgroups of the cameras and those from all the cameras are quite small. These results imply that the proposed algorithm provides consistent and stable estimation results.

Table 6. Results of the real image experiments for consistency evaluation. These results are obtained from the subgroups of the cameras, {0,2,4} and {1,3}. Compare these results with those of Table 7.

View Table | View all tables in this article

Table 7. Results of the real image experiments for consistency evaluation. These results are obtained from all the cameras. Compare these results with those of Table 6.

View Table | View all tables in this article

The results of calibration could not be obtained using the method of [4] with the feature points detected in either Network1 or Network2. In these environments, no features could be viewed from more than two cameras simultaneously due to the sparse distribution of the cameras.

7. Conclusions

The intrinsic and extrinsic parameters of camera networks were recovered only by capturing a small inaccurate reference object. The proposed method provides a simple and accurate means of calibrating unsynchronized camera networks without the need for a large reference object for accuracy. The algorithm is based on the factorization of projection matrices into the camera and object pose parameters. Since the camera and object pose parameters can be acquired simultaneously, consistency in the rigid transformations among cameras and objects is obtained. These consistent results were used as the initial values for the non-linear optimization step for further refinement. It was shown that the proposed method can compensate the inaccuracy of the object and, consequently, decrease the manufacturing cost of the 3D reference object. The radial lens distortion was also calibrated simultaneously. It was also shown that the method of [4] is not suitable for sparsely distributed cameras when a 3D reference object is used for unsynchronized camera networks. Various performance analyses of the proposed method were also presented through the simulated and real image experiments.

Acknowledgments

This work was supported by the strategic technology development program of MCST/MKE/KEIT [KI001798, Development of Full 3D Reconstruction Technology for Broadcasting Communication Fusion]. The authors would also like to thank the anonymous reviewers for their valuable comments to improve the quality of the paper, especially for the suggestions and ideas about the analysis of the two rescaling methods and the noise modelling.

References and links

1. R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd. ed. (Cambridge University Press, 2003).

2. Z. Zhang, “A flexible new technique for camera calibration,” Tech. Rep. MSR-TR-98-71, Microsoft Corporation (1998).

3. T. Ueshiba and F. Tomita, “Plane-based calibration algorithm for multi-camera systems via factorization of homography matrices,” in “Proc. IEEE International Conference on Computer Vision,” (Nice, France, 2003), pp. 966–973. [CrossRef]

4. T. Svoboda, D. Martinec, and T. Pajdla, “A convenient multi-camera selfcalibration for virtual environments,” Presence: Teleop. Virt. Environ. 14, 407–422 (2005). [CrossRef]

5. G. Kurillo, Z. Li, and R. Bajcsy, “Wide-area external multi-camera calibration using vision graphs and virtual calibration object,” in “Proc. ACM/IEEE International Conference on Distributed Smart Cameras,” (2008), pp. 1–9. [CrossRef]

6. H. Medeiros, H. Iwaki, and J. Park, “Online distributed calibration of a large network of wireless cameras using dynamic clustering,” in “Proc. ACM/IEEE International Conference on Distributed Smart Cameras,” (2008), pp. 1–10. [CrossRef]

7. X. Chen, J. Davis, and P. Slusallek, “Wide area camera calibration using virtual calibration objects,” in “Proc. IEEE International Conference on Computer Vision and Pattern Recognition,” Hilton Head, SC, USA (2000), pp. 520–527.

8. S. N. Sinha and M. Pollefeys, “Camera network calibration and synchronization from silhouettes in archived video,” Int. J. Comput. Vision 87, 266–283 (2010). [CrossRef]

9. E. Boyer, “On using silhouettes for camera calibration,” in “Proc. Asian Conference on Computer Vision,” Hyderabad, India (2006), pp. 1–10.

10. J. Kassebaum, N. Bulusu, and W.-C. Feng, “3-d target-based distributed smart camera network localization,” IEEE Trans. Image Process. 19, 2530–2539 (2010). [CrossRef] [PubMed]

11. C. Tomasi and T. Kanade, “Shape and motion from image streams under orthography: a factorization method,” Int. J. Comput. Vision 9, 137–154 (1992). [CrossRef]

12. P. Sturm and B. Triggs, “A factorization based algorithm for multi-image projective structure and motion,” in “Proc. European Conference on Computer Vision,” Cambridge, UK (1996), pp. 709–720.

13. D. Jacobs, “Linear fitting with missing data: Applications to structure from motion and to characterizing intensity images,” in “Proc. IEEE International Conference on Computer Vision and Pattern Recognition,” San Juan, Puerto Rico (1997), pp. 206–212.

14. D. Martinec and T. Pajdla, “Structure from many perspective images with occlusions,” in “Proc. European Conference on Computer Vision,” Copenhagen, Denmark, (2002), pp. 355–369.

15. P. Sturm, “Algorithms for plane-based pose estimation,” in “Proc. IEEE International Conference on Computer Vision and Pattern Recognition,” Hilton Head Island, SC, USA, (2000), pp. 706–711.

16. M. Wilczkowiak, P. Sturm, and E. Boyer, “Using geometric constraints through parallelepipeds for calibration and 3D modelling,” IEEE Trans. Pattern Anal. Mach. Intell. 27, 194–207 (2005). [CrossRef] [PubMed]

17. O. Faugeras, Three-Dimensional Computer Vision (The MIT Press, 1993).

18. K. S. Arun, T. S. Huang, and S. D. Blosten, “Least-squares fitting of two 3-D point sets,” IEEE Trans. Pattern Anal. Mach. Intell. PAMI-9, 698–700 (1987). [CrossRef]

19. A. Fitzgibbon, M. Pilu, and R. B. Fisher, “Direct least square fitting of ellipses,” IEEE Trans. Pattern Anal. Mach. Intell. 21, 476–480 (1999). [CrossRef]

20. R. Hartley, “In defence of the 8-point algorithm,” in “Proc. International Conference on Computer Vision,” Sendai, Japan (1995), pp. 1064–1070.

ENV	σ	W	P	RPE	E_{f_u}	E_R	E_t	Method

1	0.5	360	9	0.447	1.63	1.58×10⁻³	2.01	Our method
	1.0	360	9	0.845	2.23	2.98×10⁻³	3.14

	0.5	240	9	0.447	1.91	1.96×10⁻³	2.08
	1.0	240	9	0.840	2.57	3.88×10⁻³	5.19

	0.5	240	13	0.446	1.37	1.28×10⁻³	1.90
	1.0	240	13	0.846	2.20	2.62×10⁻³	2.97

	0.5	240	1	0.420	0	10.40×10⁻³	19.49	[10]
	0.5	240	1	0.415	298.4	0.222	593.9	[10] without given K

	0.5	1450	1	0.449	0	1.36×10⁻³	2.14	[10]
	0.5	1450	1	0.416	4.89	7.88×10⁻³	7.81	[10] without given K

2	0.5	360	25	0.438	1.60	1.57×10⁻³	2.06	Our method
	1.0	360	25	0.829	2.12	3.20×10⁻³	3.69

	0.5	240	25	0.446	2.01	2.27×10⁻³	2.19
	1.0	240	25	0.852	2.99	3.76×10⁻³	5.01

	0.5	240	30	0.448	1.73	1.42×10⁻³	2.01
	1.0	240	30	0.857	2.31	3.17×10⁻³	3.77

	0.5	240	3	0.426	0	54.0×10⁻³	61.3	[10]
	0.5	240	3	0.435	280.0	0.441	654.1	[10] without given K

	0.5	1450	3	0.428	0	2.05×10⁻³	3.98	[10]
	0.5	1450	3	0.402	7.94	6.27×10⁻³	10.59	[10] without given K

Rescaling Method	σ	W	RPE1	RPE2	E_{f_u}	E_R	E_t

1	0.5	360	17.60	0.438	1.60	1.57×10⁻³	2.06
	0.5	240	37.65	0.446	2.01	2.27×10⁻³	2.19
	1.0	360	40.65	0.829	2.12	3.20×10⁻³	3.69

2	0.5	360	41.58	0.439	1.44	1.38×10⁻³	2.86
	0.5	240	71.40	0.442	2.27	3.57×10⁻³	4.78
	1.0	360	69.74	0.834	3.75	3.88×10⁻³	5.14

ENV	(θ̂_f, ‖t̂_f‖)	(k₁, k₂)	RPE	E_{f_u}	E_R	E_t	E_{t̂_f}	(E_k₁, E_k₂)

1	(0.5,2.0)	(−0.2,0.05)	0.443	2.42	3.46×10⁻³	4.04	0.866	(5.28,20.9)×10⁻³
	(3.0,5.0)	(−0.2,0.05)	0.432	2.59	4.27×10⁻³	4.54	1.523	(4.33,23.9)×10⁻³
	(3.0,5.0)	(−0.4,0.1)	0.444	2.90	4.17×10⁻³	4.27	1.789	(3.28,17.1)×10⁻³

2	(0.5,2.0)	(−0.2,0.05)	0.439	2.45	5.45×10⁻³	4.42	0.936	(3.34,11.2)×10⁻³
	(1.0,3.0)	(−0.2,0.05)	0.442	2.39	6.40×10⁻³	5.08	1.491	(3.88,12.5)×10⁻³
	(1.0,3.0)	(−0.4,0.1)	0.443	3.01	5.64×10⁻³	4.21	1.984	(4.69,9.89)×10⁻³

ENV	C	P	RPE	E_{f_u}	E_R	E_t

1	6	9∼13	×

	7	9	0.439	4.68	5.13×10⁻³	7.89
	8		0.422	2.00	4.66×10⁻³	4.44
	10		0.481	1.93	2.11×10⁻³	2.75

2		25	0.421	275.8	177.2×10⁻³	362.9
	5	50	0.428	129.6	73.8×10⁻³	160.6
		100	0.433	36.0	20.1×10⁻³	43.3

	6	25	0.428	121.9	87.7×10⁻³	166.8
	8		0.459	5.30	4.53×10⁻³	8.32
	10		0.460	2.31	2.07×10⁻³	3.88

Network	Camera Model	Resolution	ENV	Dim.(m)	C	P
1	Basler piA1600	1600 × 1200	1	3(radius)	5	13
2	Panasonic 1500G	1280 × 720	2	10×10×2	5	19

Convenient calibration method for unsynchronized camera networks using an inaccurate small reference object

Abstract

1. Introduction

2. Related works

3. Preliminaries

3.1. Camera parameterization

3.2. Object pose parameterization

4. Image measurements

4.1. Projection matrix

4.2. Measurement matrix

5. Parameter estimation

5.1. Rescaling the measurement matrix (Method 1)

5.2. Rescaling the measurement matrix (Method 2)

5.3. Analysis of the two rescaling methods

5.4. Factorization

5.5. Resolving affine ambiguity

5.6. Reconstructing the camera and object pose parameters

5.7. Filling missing entries

5.8. Refining the parameters

5.9. Compensating the inaccuracy of a reference object

5.10. Dealing with radial distortion

6. Experimental results

6.1. Simulated experiments

6.1.1. Performance w.r.t noise level, object size, and number of object positions

6.1.2. Comparison of the two rescaling methods

6.1.3. Performance w.r.t object inaccuracy and radial distortion

6.1.4. Comparison with the method of [4]

6.2. Real image experiments

6.2.1. Manufacturing a reference object

6.2.2. Extraction of feature points

6.2.3. Actual camera network environments

7. Conclusions

Acknowledgments

References and links

Cited By

Figures (13)

Tables (7)

Equations (23)

Optics Express

Network	K,t	Camera{0,2,4}			Camera{1,3}
Network	K,t	0	2	4	1	3

1	f_u	1745.73	1740.40	1743.33	1747.48	1743.15
	f_v	1743.01	1737.43	1742.00	1744.57	1740.84
	s	−0.3795	−0.5610	0.02034	−0.2894	−2.1760
	u_o	816.25	833.64	821.03	819.95	833.66
	v_o	590.89	594.26	598.93	615.79	576.97

	t_x	−176.42	−342.55	416.03	−677.26	339.80
	t_y	−640.04	−1024.97	−854.01	−881.20	−977.50
	t_z	3047.91	4048.38	3290.12	3461.50	3971.07

	RPE	0.0847			0.0768

2	f_u	1524.15	1357.37	1603.03	1534.47	1380.02
	f_v	1525.22	1357.87	1603.49	1533.81	1380.21
	s	−1.6401	−2.4949	−2.7770	−2.4302	−2.5791
	u_o	639.34	633.60	642.31	641.50	647.50
	v_o	366.96	368.32	368.36	359.63	355.32

	t_x	25.60	41.48	22.97	7.90	−14.70
	t_y	560.21	532.06	550.36	557.68	578.22
	t_z	5479.83	4871.16	5868.46	5503.51	4877.54

	RPE	0.0432			0.0477

Network	K,t	Camera{0,1,2,3,4}
Network	K,t	0	1	2	3	4

1	f_u	1745.05	1744.78	1739.60	1746.30	1743.98
	f_v	1742.52	1742.32	1736.83	1743.32	1741.43
	s	−0.0215	−0.3153	−0.3797	−0.6097	−0.1731
	u_o	816.57	818.52	831.71	830.48	821.75
	v_o	590.94	614.79	591.20	576.19	601.18

	t_x	−176.93	−671.89	−337.55	349.36	414.27
	t_y	−640.20	−880.30	−1017.88	−972.99	−858.35
	t_z	3048.31	3461.22	4047.81	3972.03	3288.95

	RPE	0.0835

2	f_u	1522.87	1541.83	1352.17	1379.59	1599.15
	f_v	1523.57	1541.32	1352.96	1379.16	1598.92
	s	−0.0821	−0.5686	−0.7422	−0.7249	−1.0024
	u_o	641.29	640.56	635.88	647.21	640.38
	v_o	367.28	357.57	367.22	356.75	368.43

	t_x	18.34	7.58	35.24	−16.91	31.83
	t_y	559.08	564.20	535.97	572.52	549.83
	t_z	5471.89	5523.79	4852.43	4871.84	5849.98

	RPE	0.0473