
Innovative K-Means based machine learning method for determination of non-uniform image coordinate system in panoramic imaging: a case study with Ladybug2 camera

Open Access

Abstract

Currently, the practical implementations of panoramic cameras range from vehicle navigation to space studies, owing in particular to their 360-degree imaging capability. In this variety of uses, it is possible to calculate three-dimensional coordinates from a panoramic image, especially using the Direct Linear Transformation (DLT) method. There are several types of omnidirectional cameras for 360-degree imaging, which can be classified mainly as central and non-central cameras. Central omnidirectional cameras are those that satisfy the single-viewpoint property. Multi-camera systems are usually developed for applications for which two-image stereo vision is not flexible enough to capture the environment surrounding a moving platform. Although the technology based on multi-view geometry is inexpensive, accessible, and highly customizable, multi-camera panoramic imaging systems make it difficult to obtain a single projection center for the cameras. In this study, a method for defining a non-uniform image coordinate system is suggested by means of the K-Means algorithm for a single panoramic image, captured with a Ladybug2 panoramic camera in a panoramic calibration room, together with the definition of an elliptical panoramic projection coordinate system by the Singular Value Decomposition (SVD) method in the panoramic view. The results of the suggested method have been compared with those of the DLT algorithm applied to the same single panoramic image using a conventional photogrammetric image coordinate system.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Capturing a single image of the surrounding environment, a multi-camera system offers great potential for geomatics instrumentation, robotics, car navigation, entertainment systems, and even space applications (such as spacecraft docking navigation systems). In parallel with the increasing importance of 3D information, the significance and use of machine vision studies as a means of obtaining three-dimensional information have also increased.

The first panoramic camera was invented by P. Puchberger of Austria in 1843. It was a hand-crank-driven swing-lens panoramic camera capable of capturing a 150° image. The rotating camera invented by M. Garella of England in 1857 extended the field of view to a full 360° [1]. However, photogrammetry was unable to benefit from these early panoramic cameras, owing to issues with the rotational mechanics during image acquisition and to the complexity and time-consuming nature of image modeling during image correction [2].

The use of panoramic cameras in the field of photogrammetry has increased even further in recent years thanks to continuous advancements in technology. An omnidirectional system with multiple cameras can be designed with a conventional geometric optical system (e.g., perspective lenses) or a non-conventional geometric optical system (e.g., fisheye lenses). Conventional cameras have a narrow FOV, so a larger number of them is needed to cover a full-spherical FOV. Polydioptric systems with fisheye lenses have been the primary choice in the field due to their large FOV [3]. The main contribution of [4] is to express a multi-camera system as a single imaging unit and then to derive the structure-from-motion constraint equations.

Currently, there are many applications built with multiple camera systems. For example, a study conducted by [5] shows that a multi-camera system with six cameras, mainly used for land applications, can be used in underwater work. [6] proposes and evaluates a low-cost and lightweight Personal Mobile Terrestrial System with a versatile camera. [7] registers panoramic images collected with a mobile vehicle to mobile LiDAR data using an automatic mutual-information method; in their study, the panoramic images are obtained from the Ladybug3 camera, which consists of six fisheye lenses. [8] focuses on the problem of computing the extrinsic calibration of a 3D Velodyne LiDAR with respect to a rigidly connected camera while simultaneously estimating its intrinsic parameters; to solve this problem, it is divided into two least-squares sub-problems, each solved analytically to determine an exact initial estimate for the unknown parameters. The purpose of the study conducted by [9] is to develop an absolute visual positioning system for an omni-wheeled robot for indoor navigation. In the field of photogrammetry, [10] investigates the potential of immersive videography with strategically arranged multiple cameras, each facing a specific angle, to create captivating videos. In a study conducted by [11], a conical stereo imaging system with six and twelve sensors is presented, together with a mathematical model of this system. [12] investigates the feasibility of using a six-sensor omnidirectional/fisheye camera system and reports on its performance. In the study of [13], to create a stereo panoramic application, several panoramic images were taken from two distinct locations in the calibration room, and photogrammetric three-dimensional coordinates were calculated for the identified targets. [14] introduces a dataset developed to enable the design of local and global energy-aware navigation and path-planning algorithms for planetary environments; the onboard sensors included an Occam Vision Group omnidirectional stereo camera composed of 10 individual RGB cameras. [15] presents a drone detection system that uses an omnidirectional camera, aiming to improve the performance of small-object detection, which is generally regarded as a challenge for object detection using CNNs. A UAV equipped with a multi-camera imaging system can capture oblique images from almost all angles of view. To highlight the extensive potential of 360° imaging for photogrammetric measurements, some studies have focused on camera calibration and photogrammetric applications of these multi-camera systems based on fisheye lenses [3].

Considering that these systems are composed of multiple cameras, to enable the use of a single set of exterior orientation parameters (EOPs) for the component images and generate composite images, it is necessary to estimate the relative orientation parameters (ROPs). In these recent commercial systems, cameras are arranged in the same compact and closed structure, which hinders direct ROP measurements and thus requires indirect estimation. The calibration of spherical cameras requires that object points are evenly distributed around the camera (to produce suitable geometry) and also that they are accurately measured in the images. Therefore, a proper 360° camera calibration field is required. The extraction of accurate image point coordinates is a critical process in camera calibration [16].

The fundamental image coordinate system in photogrammetry, in which the 2D image coordinates of target signals are measured, adopts the image center as its datum. This datum point, the projection of the perspective center onto the image plane, is called the principal point.

Clustering the image measurements in photogrammetry by means of several clustering methods is possible [17]. In the literature, the term “K-Means” was first used by James MacQueen in 1967 [18]; K-Means clustering is one of the most commonly used unsupervised machine learning algorithms for partitioning a given data set into a set of k groups. K-Means is a non-hierarchical data clustering method that attempts to partition existing data into one or more clusters/groups. This method partitions the data so that data with the same characteristics are grouped into the same cluster and data with different characteristics are grouped into other clusters. The purpose of data clustering is to minimize the objective function set in the clustering process, which generally attempts to minimize variation within a cluster and maximize the variation between clusters. Due to its simplicity of implementation, fast processing, efficiency, versatility, strong scalability, and the ease of interpreting its clustering results, this algorithm has become the most well-known and commonly used algorithm in various research fields [19,20]. A distance formula is used to calculate the distance from each data object to the cluster center points, so as to realize the classification of the dataset. The main disadvantage of the K-Means method is that the number k is often not known a priori [21]. In the literature, there are several studies that have explored the use of eigenvalues to classify LiDAR points; the use of metrics based on eigenvalues together with the K-Means method to carry out the classification is proposed in [22]. R libraries have been shown to be effective in remote sensing data processing tasks, such as classification using K-Means clustering and computing the Normalized Difference Vegetation Index (NDVI) [23]. K-Means generates hyper-elliptical (i.e., elliptical over more than two dimensions), unconstrained clusters, offering the benefit of fast processing and a constrained number of clusters. However, the method requires the number of clusters to be specified beforehand, which limits its usefulness in data mining and often means that the technique produces clusters that meet the “required answer” [24].
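For illustration only, the following minimal Python sketch (using NumPy and scikit-learn, which are not the tools of this study; the MATLAB-based workflow is described in Section 3) partitions synthetic 3D points into k groups and reports the within-cluster sum of squared errors, showing the behaviour described above and the need to fix k beforehand:

```python
# Minimal K-Means illustration on synthetic 3D points (not the study's data).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
points = rng.normal(size=(300, 3))       # synthetic 3D observations

k = 4                                    # k must be chosen beforehand
model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points)

print("cluster labels:", model.labels_[:10])
print("cluster centers:\n", model.cluster_centers_)
print("within-cluster SSE (inertia):", model.inertia_)
```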

By redefining the image coordinate system, the accuracy of the DLT on the panoramic image, and hence of the positions that can be obtained from the DLT, can be improved. In this study, an elliptical panoramic coordinate system is defined in a three-dimensional 360-degree calibration area with the Ladybug2 camera, and a non-uniform image coordinate system is obtained on the panoramic image using the K-Means segmentation method. With the obtained non-uniform coordinate system, the DLT parameters and three-dimensional control points are compared with the DLT results obtained from the conventional image coordinate system. To determine the datum of the image coordinate system in panoramic imaging, a single panoramic image is obtained in the middle of the calibration room and the photogrammetric EOPs are calculated with the DLT algorithm. However, the EOPs calculated with the DLT algorithm from image coordinates measured in the conventional image coordinate system have high residuals for the coordinates of the 3D control points, depending on the panoramic projection lines of the single panoramic image projection. This problem can be solved by defining a non-uniform image coordinate system, calculated through segmentation of the 3D coordinates in the panoramic projection of a single image of the 360-degree calibration room. For this purpose, the K-Means segmentation method has been used to define the non-uniform image coordinate system for the panoramic Ladybug2 camera projection model. Detailed information is presented in the application section. In accordance with the suggested method, the object points to be studied photogrammetrically are obtained from a single panoramic view. Thus, image coordinates measured both in the photogrammetric image coordinate system and in the suggested non-uniform coordinate system are used for the calculation of the control points in the three-dimensional object coordinate system. The residuals, parameters, and accuracies calculated by DLT are compared with each other and reported accordingly. The fundamentals of the suggested process in this study are presented as a flowchart in Fig. 1.

Fig. 1. Flowchart of the study.

In the second section of this paper, information about the geometric model of the projection of panoramic images and the associated coordinate systems is given. The third section focuses on the application of the research; it describes the K-Means application and the definition of a non-uniform image coordinate system with camera parameters according to the panoramic projection model of the Ladybug2 panoramic camera. The fourth section describes the results of the experiment and includes a discussion of their analysis. The last section is the conclusion, which provides concluding remarks on the study.

2. Coordinate systems in panoramic imaging

Panoramic modeling is designed to provide users with full 360-degree landscape viewing and observation. A spherical panoramic structure is a variation of the sphere model; in this model, the observation point is located in the center of the sphere [25]. The coordinate systems used in panoramic cameras go beyond the Cartesian object coordinate system and the image coordinate system. It is also necessary to define a supporting coordinate system (ellipsoidal, cylindrical, spherical, conical, etc.) that is set up to define the specific camera projection geometry. Figure 2 shows the general transformation steps, including a panoramic coordinate system.

Fig. 2. The coordinate systems.

Figure 3 presents the transformation geometry, including an image coordinate system, between these auxiliary coordinate systems for this study.

Fig. 3. Structure of coordinate systems in panoramic imaging.

The coordinates of the panoramic projection system are calculated from the control points whose coordinates are known in the 3D calibration room, which was set up in 2009 [26]. Let the panoramic image width be a and the radius of the spherical model be R; after the image is mapped, the relationship between a and R should be a = 2πR, so the aspect ratio of the panoramic image should be 2:1. Let the pixel coordinates of the panoramic image be p'(u, v) and the corresponding 3D point coordinates on the panoramic model be P'(X, Y, Z); the mapping relationship follows the literature [27]. The correspondence between the two-dimensional point p' and the three-dimensional point P' on the spherical surface can be obtained from the image coordinates by the derivation of the pixel coordinates. In this case, conversion between the two systems is provided by Eqs. (1), (2), and (3) [28].

$$\begin{aligned} X &= \frac{1}{2}R\sqrt {{\mu ^2} + {v^2} - {\mu ^2}{v^2} - 1} \cos \varPhi \qquad - \pi \le \varPhi \le \pi \\ Y &= \frac{1}{2}R\sqrt {{\mu ^2} + {v^2} - {\mu ^2}{v^2} - 1} \sin \varPhi \qquad 1 \le \mu \\ Z &={-} \frac{1}{2}R\mu v\qquad - 1 \le v \le 1,\quad R \ge 0\;\textrm{and constant} \end{aligned}$$
where the angle $\varPhi $ is expressed in radians.
$$\begin{aligned} r_1^2 = {x^2} + {y^2} + {\left( {z + \frac{R}{2}} \right)^2}\\ r_2^2 = {x^2} + {y^2} + {\left( {z - \frac{R}{2}} \right)^2} \end{aligned}$$
$$\mu = \frac{{{r_1} + {r_2}}}{2}\qquad \quad v = \frac{{{r_1} - {r_2}}}{2}$$
where ${r_1}$ and ${r_2}$ are the elliptic radii.
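As a minimal sketch of Eqs. (1)–(3), the following Python functions (NumPy only; symbol names follow the text, and the code is illustrative rather than the study's MATLAB implementation) map elliptic parameters to 3D model coordinates and recover (μ, v) from the elliptic radii:

```python
# Sketch of the elliptic mapping of Eqs. (1)-(3). R is the model radius,
# (mu, v, phi) are the elliptic parameters, (r1, r2) are the elliptic radii.
import numpy as np

def elliptic_to_cartesian(mu, v, phi, R):
    """Eq. (1): map elliptic parameters to 3D model coordinates."""
    s = np.sqrt(mu**2 + v**2 - mu**2 * v**2 - 1.0)
    X = 0.5 * R * s * np.cos(phi)
    Y = 0.5 * R * s * np.sin(phi)
    Z = -0.5 * R * mu * v
    return X, Y, Z

def cartesian_to_elliptic(x, y, z, R):
    """Eqs. (2)-(3): recover (mu, v) from 3D coordinates via the elliptic radii."""
    r1 = np.sqrt(x**2 + y**2 + (z + R / 2.0) ** 2)
    r2 = np.sqrt(x**2 + y**2 + (z - R / 2.0) ** 2)
    mu = (r1 + r2) / 2.0   # note: some formulations normalise r1 +/- r2 by R instead of 2
    v = (r1 - r2) / 2.0
    return mu, v
```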

3. Application

In this application, the Ladybug2 360-degree panoramic camera, which has a resolution of 2048 × 1024 pixels with a 0.00465 mm pixel size, has been used for image acquisition. A simple interface has been developed in MATLAB software for this study. The interface and data acquisition process are shown in Fig. 4.

Fig. 4. MATLAB interface and panoramic data acquisition in the study.

The Ladybug2 panoramic camera, which consists of six identical cameras, has 1/3-inch Sony ICX204AK sensors and fisheye lenses. The conventional panoramic coordinate system mainly provides a cylindrical projection. In the Ladybug2 camera used in this study, there is only one camera in the zenith direction. In a panoramic image with a resolution of 2048 × 1024 pixels, it is necessary to define an image coordinate system that can be segmented non-uniformly in both the X and Y directions. Accordingly, when creating a panoramic coordinate system that can be segmented in both the x and y directions, the preferred mathematical model is the elliptical system. If the same segmentation were conducted in the conventional (cylindrical) system, the parameters on the y-axis of the cylindrical projection would not change: the segmentation along the 1024-pixel y-axis of the panoramic image would not be non-uniform and would undergo a scaling process resulting in shrinkage. In the elliptical system, the short image axis y (1024 pixels), covered by the single zenith camera, can be modeled accurately as a non-uniform axis. From a mathematical point of view, the elliptical system is expected to yield a parametrically more precise non-uniform coordinate system than the single radius value of the conventional (cylindrical) system, because the elliptical system has two radius values. Likewise, Ref. [29] is also considered for the fisheye projection.

Three-dimensional elliptical coordinates have been calculated from the image coordinates of the control points using Eq. (1) in the 360-degree calibration room. The paraboloid surface was calculated through the Singular Value Decomposition (SVD) method, and the corresponding reference paraboloid projection function for the image plane was derived. The adoption of the SVD method stemmed from its suitability for solving symmetric matrix problems. In linear algebra, the SVD of a matrix is a factorization of that matrix into three matrices. It has well-known algebraic properties and conveys important geometrical and theoretical insights about linear transformations, as given by Eq. (4) [30].

$$\begin{aligned} A &= U\ast S\ast {V^t} = \mathop \sum \limits_{i = 1}^n ({{\sigma_i}\ast {u_i}\ast v_i^t} )\\ S &= \left( {\begin{array}{ccc} {{\sigma_1}}& \cdots &0\\ \vdots & \ddots & \vdots \\ 0& \cdots &{{\sigma_n}} \end{array}} \right) \end{aligned}$$
where ${\sigma _i}$ are the singular values and the rank of the matrix A is the number of non-zero singular values; U is an orthogonal matrix of order m × m, V is an orthogonal matrix of order n × n, and S is a “pseudo-diagonal” matrix of rank r.
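A compact NumPy sketch of the factorization in Eq. (4) is given below; the matrix A here is an arbitrary placeholder, not the projection data of this study:

```python
# SVD factorisation of Eq. (4): A = U * S * V^T; the rank is the number of
# non-zero singular values. The matrix below is a placeholder example.
import numpy as np

A = np.array([[2.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])          # m x n observation/design matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)
rank = np.sum(s > 1e-12)                  # count of non-zero singular values

A_rebuilt = (U * s) @ Vt                  # sum of sigma_i * u_i * v_i^T
print("singular values:", s, "rank:", rank)
print("max reconstruction error:", np.max(np.abs(A - A_rebuilt)))
```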

One of the reasons for choosing the SVD method here is that it is used in symmetric matrix solutions [31]. The panoramic camera used is expected to have a symmetrical structure because of the panoramic projection geometry on the plane for 360-degree imaging. In the literature, the SVD method has also been employed to adjust data sets and project them orthogonally onto a plane [32]. The SVD matrix structure is given in Fig. 5.

Fig. 5. Matrix structure of SVD algorithm [33].

Thus, residuals in the spherical coordinate system caused by the EOPs of the camera were also normalized by SVD [31]. The SVD results of the projection surface are given in Table 1.

Table 1. SVD results of elliptic coordinate system

After the panoramic projection for the adjusted paraboloid surface was defined from the SVD method, the hyperbolic paraboloid projection matrix was determined for the perspective projection of the panoramic coordinates onto the image plane by Eq. (5).

$$\frac{{\boldsymbol z}}{{\boldsymbol c}} = \frac{{{{\boldsymbol x}^2}}}{{{{\boldsymbol a}^2}}} - \frac{{{{\boldsymbol y}^2}}}{{{{\boldsymbol b}^2}}}$$
where $a,\; b,\; \textrm{and}\; c$ are the hyperbolic paraboloid coefficients.
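Because Eq. (5) can be rearranged as z = (c/a²)x² − (c/b²)y², the two combined coefficients p = c/a² and q = c/b² can be estimated by ordinary least squares. The sketch below illustrates this with synthetic points; it is an illustrative formulation, not the fitting code used in the study:

```python
# Least-squares fit of the hyperbolic paraboloid of Eq. (5),
# rewritten as z = p*x**2 - q*y**2 with p = c/a**2 and q = c/b**2.
import numpy as np

def fit_hyperbolic_paraboloid(x, y, z):
    """Return (p, q) minimising || z - (p*x^2 - q*y^2) ||."""
    A = np.column_stack([x**2, -(y**2)])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coeffs  # p, q

# Synthetic check with known coefficients.
rng = np.random.default_rng(1)
x, y = rng.uniform(-1, 1, 200), rng.uniform(-1, 1, 200)
z = 0.8 * x**2 - 0.5 * y**2 + rng.normal(scale=0.01, size=200)
p, q = fit_hyperbolic_paraboloid(x, y, z)
print("estimated p, q:", p, q)   # close to 0.8 and 0.5
```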

In the literature, there have been similar studies concerning the paraboloid function generated by elliptical coordinates, especially in the context of fisheye lenses [29]. For the Ladybug2 panoramic camera model, the elliptical coordinate system and the hyperbolic paraboloid camera projection matrix, created using image coordinate data from 92 control points in the 360-degree calibration room, have been compared with the corresponding 3D object coordinates of the control points. This comparison was made with a 3D affine transformation, resulting in the determination of the projection residuals. Notably, the identified residuals were reduced compared with the standard cylindrical projection used in the panoramic coordinate system. After the SVD adjustment, the 3D projection coordinates were calculated more accurately for the 92 control points in the panoramic calibration room. In Fig. 6, (a) illustrates the 3D hyperbolic paraboloid projection matrix, (b) illustrates the projection matrix in 2D, and (c) shows the 3D affine transformation residuals of the control points in the calibration room.

Fig. 6. Definition of elliptic panoramic projection.

After this process, the panoramic camera projection matrix has been detailed along each 3D coordinate axis. The projection (camera) matrix shows that 11 different clusters occur when the projection of a single panoramic view of the Ladybug2 panoramic camera is carried out. In Fig. 7, (a) illustrates the ellipsoidal coordinates of the 92 control points in the XZ direction, (b) illustrates the same points in 3D, and (c) illustrates the Z-differences matrix of the projection obtained via linear interpolation.

Fig. 7. Z differences matrix for the projection system.

Thus, the machine-learning K-Means segmentation is carried out using the elliptical coordinate system defined above. The K-Means approach is a centroid-based partitional clustering method, where the centroids are the arithmetically calculated centers of the clusters and k is the number of clusters [34]. In the calibration room, the projection matrix has 11 different Z (depth) values depending on the distance from the projection center. Hence, the cluster number for the K-Means segmentation was set to 11 before the training process. The K-Means algorithm is composed of the following steps:

  • 1. Initially, select k centroids/cluster centers. It is preferable to locate them near the data but not at the same point as each other.
  • 2. Then, allocate each data point to the nearest centroid.
  • 3. Move the centroids to the average position of the data points allocated to them.
  • 4. Repeat the preceding two steps until the allocations don’t change.
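A minimal from-scratch sketch of these four steps is shown below (Lloyd's algorithm with Euclidean distance; the input array stands for the elliptic 3D coordinates of the control points, and k = 11 follows the cluster count stated above — this is an illustration, not the study's MATLAB code):

```python
# From-scratch K-Means following steps 1-4 above (Euclidean distance).
import numpy as np

def kmeans(points, k=11, max_iter=1000, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k initial centroids from the data points themselves.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: allocate each point to its nearest centroid.
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 3: move each centroid to the mean of its allocated points.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)])
        # Step 4: stop when the centroids (and hence allocations) no longer change.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```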

In this study, the K-Means segmentation matrix is calculated using the Euclidean distance, with the SVD-adjusted elliptic three-dimensional coordinates of the control points as input and one thousand iterations. Then, by calculating the average value of each cluster, new cluster centers are determined and the object-to-center distances are examined again. The total squared error criterion, SSE (Summed Squared Error), is most commonly used to evaluate K-Means clustering; the clustering result with the lowest SSE value gives the best result. The sum of the squares of the distances of the objects from the central points of their respective clusters is calculated by Eq. (6) and Eq. (7) [35,36].

$$\begin{aligned} SSE = \mathop \sum \limits_{i = 1}^K \mathop \sum \limits_{x \in {C_i}} dis{t^2}({{x_i},c} )\\ d({x,c} )= 1 - \frac{{({x - \overrightarrow {\bar{x}} } ){{({c - \overrightarrow {\bar{c}} } )}^{\prime}}}}{{\sqrt {({x - \overrightarrow {\bar{x}} } ){{({x - \overrightarrow {\bar{x}} } )}^{\prime}}} \sqrt {({c - \overrightarrow {\bar{c}} } ){{({c - \overrightarrow {\bar{c}} } )}^{\prime}}} }} \end{aligned}$$
where;
$$\begin{aligned} \overrightarrow {\bar{x}} = \frac{1}{p}\left( {\mathop \sum \limits_{j = 1}^p {x_j}\; } \right)\overrightarrow {{1_p}} \\ \overrightarrow {\bar{c}} = \frac{1}{p}\left( {\mathop \sum \limits_{j = 1}^p {c_j}\; } \right)\overrightarrow {{1_p}} \end{aligned}$$
where x is an observation (that is, a row of X), c is a centroid (a row vector), p is the space dimension, and $\overrightarrow {{1_p}} $ is a row vector of p ones.
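The two quantities in Eqs. (6) and (7) can be evaluated directly; the short sketch below assumes labels and centroids such as those returned by the K-Means routine sketched above, and is illustrative only:

```python
# SSE of Eq. (6) and the correlation distance d(x, c) of Eqs. (6)-(7).
import numpy as np

def sse(points, labels, centroids):
    """Summed squared Euclidean error over all clusters."""
    diffs = points - centroids[labels]
    return np.sum(diffs**2)

def correlation_distance(x, c):
    """d(x, c) = 1 - Pearson correlation between observation x and centroid c."""
    xc = x - x.mean()     # x - x_bar, Eq. (7)
    cc = c - c.mean()     # c - c_bar, Eq. (7)
    return 1.0 - (xc @ cc) / (np.sqrt(xc @ xc) * np.sqrt(cc @ cc))
```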

The K-Means algorithm is based on the main idea that a central point represents each cluster [37]. According to the working mechanism of the K-Means algorithm, k objects are selected, each of which represents the center or mean of a cluster. The remaining objects are assigned to the clusters to which they are most similar, taking into account their distance from the mean values of the clusters, as shown in Fig. 8.

Fig. 8. K-Means segmentation: (a) illustrates clusters for 92 control points via K-Means segmentation, (b) illustrates K-Means cluster centers of the image plane.

The K-Means segmentation matrix and the depth (Z) values are interpolated linearly together. Thus, grid points are determined on the image for the non-uniform image coordinate system. Non-uniform grids are created on the image with the lines passing through these grid points, as shown in Fig. 9.
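One possible way to realise such a linear interpolation of depth values over the image plane is scipy.interpolate.griddata; in the sketch below the arrays cluster_uv and cluster_depth are synthetic placeholders (not the study's cluster centres), and the grid spacing is arbitrary:

```python
# Linear interpolation of cluster-centre depth (Z) values over the image plane
# to obtain grid nodes. All data below are synthetic and illustrative only.
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(2)
cluster_uv = rng.uniform([0, 0], [2048, 1024], size=(11, 2))   # image positions of 11 centres
cluster_depth = rng.uniform(1.0, 5.0, size=11)                 # one depth (Z) value per cluster

# Regular sampling positions over the 2048 x 1024 panoramic image.
grid_u, grid_v = np.meshgrid(np.linspace(0, 2048, 33), np.linspace(0, 1024, 17))

# Linear interpolation of depth over the image plane (NaN outside the convex hull).
grid_depth = griddata(cluster_uv, cluster_depth, (grid_u, grid_v), method="linear")
```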

Fig. 9. K-Means segmentation: (a) illustrates non-uniform grid nodes from K-Means segmentation, (b) illustrates non-uniform grid lines for K-Means segmentation.

At the end of the application, the non-uniform coordinate system has been transformed onto the panoramic image, as shown in Fig. 10.

Fig. 10. Non-uniform image coordinate system calculated via K-Means method.

4. Results and discussion

In this study, a K-Means clustering-based framework is developed for the determination of a non-uniform image coordinate system. When the non-uniform image coordinate grid is determined with the suggested K-Means method, the camera matrix is expanded with the K-Means grids and the projection matrix of the proposed K-Means method is obtained. The coordinates of the control points are then calculated with the help of a three-dimensional affine transformation for the resulting K-Means projection matrix. The resulting projection matrix and the newly calculated three-dimensional coordinates of the control points are shown in Fig. 11.

Fig. 11. Definition of K-Means derived elliptic panoramic projection: (a) illustrates projection matrix via K-Means derived in 3D, (b) illustrates projection matrix via K-Means derived in 2D, (c) is 3D affine transformation residuals of control points calculated from K-Means derived system in the calibration room.

In this study, the coordinate differences between the three-dimensional control point coordinates calculated with the proposed K-Means projection matrix and the three-dimensional coordinates in the calibration room were investigated by a 3D affine transformation employing Eq. (8).

$$\begin{aligned} X^{\prime} &= {a_0} + {a_1}X + {a_2}Y + {a_3}Z\\ Y^{\prime} &= {b_0} + {b_1}X + {b_2}Y + {b_3}Z\\ Z^{\prime} &= {c_0} + {c_1}X + {c_2}Y + {c_3}Z \end{aligned}$$
where $(X^{\prime}, Y^{\prime}, Z^{\prime})$ are the transformed coordinates, ${a_0}$, ${b_0}$, and ${c_0}$ are the three translation parameters, and ${a_1},{a_2},{a_3},{b_1},{b_2},{b_3},{c_1},{c_2},\;\textrm{and}\;{c_3}$ are the coefficients of the transformation matrix. According to the results, the RMSE values of the 3D affine transformation of the coordinates obtained with the proposed K-Means projection matrix are given in Table 2; relative to the residuals against the object coordinates in the calibration room, they represent an 88.40% improvement over the coordinates obtained with the conventional projection camera matrix. The graphs of the object coordinate residuals of the 3D affine transformation, calculated along the three axes (a, b, c), are also given in Fig. 12.
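For reference, the 12-parameter transformation of Eq. (8) can be estimated by linear least squares; the sketch below is a generic formulation (the arrays src and dst stand for the projected and reference control-point coordinates and are placeholders, not the study's data):

```python
# 12-parameter 3D affine transformation of Eq. (8) estimated by least squares,
# with the per-axis RMSE of the residuals.
import numpy as np

def fit_affine_3d(src, dst):
    """Estimate X' = a0 + a1*X + a2*Y + a3*Z (and likewise for Y', Z')."""
    n = len(src)
    A = np.hstack([np.ones((n, 1)), src])             # design matrix [1, X, Y, Z]
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)   # (4, 3) parameter matrix
    residuals = dst - A @ params
    rmse = np.sqrt(np.mean(residuals**2, axis=0))      # RMSE per coordinate axis
    return params, residuals, rmse
```

Each column of the 4 × 3 parameter matrix then holds (a0, a1, a2, a3), (b0, b1, b2, b3), and (c0, c1, c2, c3), respectively.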

Fig. 12. Residuals of 3D control points in the calibration room.

Table 2. RMSE of 3D affine transformation parameter

In the second step of the discussion, the image coordinates of the 92 control points were transformed from the conventional photogrammetric image coordinate system to the suggested system by a 2D affine transformation, in order to eliminate operator-dependent image coordinate measurement errors for the same control point targets. Then, the DLT results of these two image coordinate systems were compared with each other. Based on the results of the suggested method, the residuals of the coordinates obtained with the proposed K-Means projection matrix are indicated in Table 3; they represent an 84.6% improvement over the coordinates obtained with the conventional projection camera matrix when compared with the actual coordinates in the calibration room.
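For context, the eleven DLT coefficients used in such a comparison can be obtained linearly from 3D control points and their 2D image coordinates; the sketch below shows the standard textbook formulation and is not claimed to be the study's implementation:

```python
# Standard 11-parameter DLT solved linearly from 3D control points (obj_xyz)
# and their 2D image coordinates (img_uv). Generic formulation, for illustration.
import numpy as np

def dlt_parameters(obj_xyz, img_uv):
    """Solve u = (L1 X + L2 Y + L3 Z + L4)/(L9 X + L10 Y + L11 Z + 1), likewise for v."""
    rows, rhs = [], []
    for (X, Y, Z), (u, v) in zip(obj_xyz, img_uv):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z])
        rhs.append(u)
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z])
        rhs.append(v)
    A = np.asarray(rows, dtype=float)
    b = np.asarray(rhs, dtype=float)
    L, *_ = np.linalg.lstsq(A, b, rcond=None)   # L1 ... L11
    return L
```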

Table 3. DLT Parameters of K-Means Segmented Non-Uniform Grid

Figure 13 shows that the principal point obtained by DLT from the coordinate system conforms to the geometry of the camera matrix because of the scale of the L values. The image coordinate residual graph is also given in Fig. 14.

Fig. 13. DLT residuals for conventional and K-Means segmented image coordinate system.

Fig. 14. Image coordinate residuals for control points via DLT.

In this comparison, the residuals of the 3D object coordinates determined with DLT from the suggested non-uniform image coordinate system are considerably more accurate than those of the conventional image coordinate system. Not only has the projection center of the suggested system reached better precision, but the suggested system also provides the DLT parameters from a single panoramic image. The RMSE values of the DLT parameters are given in Table 4, and the corresponding graph is shown in Fig. 15.

Fig. 15. RMSE graphic of DLT parameters.

Table 4. RMSE of DLT parameters

Based on the results of the suggested method, the interior orientation parameters (IOPs) and exterior orientation parameters (EOPs) were also derived from the Direct Linear Transformation (DLT) for the non-uniform image grid. The IOPs are shown in Table 5.
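For reference, the sketch below applies the commonly used closed-form relations that recover the principal point and the projection centre from the eleven DLT coefficients; this is a generic formulation, not the exact derivation used in the study:

```python
# Principal point (cx, cy) and projection centre recovered from the eleven DLT
# coefficients using the usual closed-form relations (generic formulation).
import numpy as np

def dlt_iop_eop(L):
    L1, L2, L3, L4, L5, L6, L7, L8, L9, L10, L11 = L
    D2 = 1.0 / (L9**2 + L10**2 + L11**2)
    cx = D2 * (L1 * L9 + L2 * L10 + L3 * L11)      # principal point x
    cy = D2 * (L5 * L9 + L6 * L10 + L7 * L11)      # principal point y
    # Projection centre: solution of M [X0, Y0, Z0]^T = -[L4, L8, 1]^T.
    M = np.array([[L1, L2, L3], [L5, L6, L7], [L9, L10, L11]])
    X0 = np.linalg.solve(M, -np.array([L4, L8, 1.0]))
    return (cx, cy), X0
```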

Table 5. IOP values via DLT

As a result of the DLT, a difference of 0.4486 is observed between the conventional method and the proposed method in the L parameter. Accordingly, differences at the meter level are seen in the X, Y, and Z directions of the three-dimensional coordinates of the camera projection center in Table 6.

Table 6. EOP values via DLT

At the same time, when the cx and cy values obtained from the proposed K-Means projection are compared with those obtained from the conventional coordinate system, a 109% improvement is evident. When the accuracy of the K-Means framework recommended for the Ladybug2 camera used in the application is examined, all cluster centers are detected correctly, while only cluster center number 2 (CP2) has an error rate of 1.1%. This causes the residuals in that area of the camera to be calculated as higher, which is interpreted as an issue related to the manufacture of the camera. To the best of the author's knowledge, this paper is the first report on the use of the K-Means clustering algorithm to define a non-uniform image coordinate system for panoramic cameras.

5. Conclusion

The three-dimensional coordinates of the signalized targets were calculated using geodetic surveying techniques in the calibration room. The coordinates estimated with conventional surveying techniques are assumed to be the true coordinates and were produced at ±0.04 mm accuracy at real scale when the calibration room was set up [13]. The control points are assumed to be geometrically error-free, so the estimated errors of the adjustment do not depend on the weight matrix of the target points. The camera matrix is expanded with the K-Means grids, and the projection matrix is provided for the proposed K-Means method. This means that, during the determination of the K-Means-suggested image coordinate system in a panoramic image, the DLT parameters can be used not only as EOPs but also as IOPs for calculation from a single panoramic view. In the DLT analysis utilizing the image coordinate system obtained through the proposed method, it was observed that the cx and cy values approached zero. This finding indicates that the hyperbolic paraboloid projection matrix determined in the study expresses the projection of the Ladybug2 camera quite well for the elliptic coordinate system.

This study shows that it is possible to use the elliptic Cartesian coordinate system in panoramic cameras as well as in fisheye cameras. Remaining issues might be addressed through the K-Means setup, which here is based on the Pearson correlation distance (for instance by increasing the number of K-Means iterations), so that the effect of errors in the scale differences of the K-Means centers can be reduced. This study used one thousand iterations to determine the K-Means segmentation matrix. As a result of the study, the general advantages and disadvantages of the presented approach for data acquisition are briefly given below.

Advantages:

  • • Accuracy for the control points in a three-dimensional object coordinate system
  • • Convenient IOP and EOP calculation potential with the DLT Method
  • • Accuracy for the DLT parameters and its residuals

Disadvantages:

  • • The experimentation is limited to the ellipsoidal panoramic coordinate system, and its applicability to other coordinate systems remains unknown.
  • • A 360-degree three-dimensional panoramic calibration area is required
  • • It has been exclusively tested on the Ladybug2 camera, and its efficacy on various other panoramic cameras is yet to be investigated.

In conclusion, in the process of determining the suggested image coordinate system for panoramic cameras, the DLT parameters can be used for a single panoramic view. They are used directly in the orientation process without a conventional camera file. In the future, software used for panoramic images might be developed with this approach. To achieve this, the suggested non-uniform coordinate system first needs to be integrated into the digital camera. The digital camera can then be applied to various applications such as space studies on navigation and close-range photogrammetric image processing. The suggested method can also be tested with different types of cameras, such as catadioptric cameras, for the definition of a non-uniform image coordinate system in a single panoramic view. This technical approach can also be beneficial in close-range photogrammetric applications for architectural and cultural heritage documentation.

In the future, this approach can be integrated into any panoramic sensor technology used in survey imaging, including photogrammetry and machine vision. Moreover, the non-uniform image coordinate systems produced on the panoramic image by different machine learning segmentation methods (C-Means, SVM, etc.), within the elliptic coordinate system created by the panoramic techniques, will be investigated.

Disclosures

The author declares no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the author upon reasonable request.

References

1. O. Faugeras, Panoramic Vision: Sensors, Theory, and Applications (Springer Science & Business Media, 2001).

2. T. Luhmann, “A historical review on panorama photogrammetry,” in Proceedings of International Society for Photogrammetry and Remote Sensing (2004).

3. L. F. Castanheiro, “Geometric model of a dual-fisheye system composed of hyper-hemispherical lenses,” Master’s Thesis, School of Sciences and Technology of São Paulo State University, Brazil (2020).

4. R. Pless, “Using many cameras as one,” in Computer Society Conference on Computer Vision and Pattern Recognition (IEEE,2003).

5. J. Bosch, N. Gracias, P. Ridao, et al., “Omnidirectional underwater camera design and calibration,” Sensors 15(3), 6033–6065 (2015). [CrossRef]  

6. M. B. Campos, A. M. G. Tommaselli, E. Honkavaara, et al., “A backpack-mounted omnidirectional camera with off-the-shelf navigation sensors for mobile terrestrial mapping: Development and forest application,” Sensors 18(3), 827 (2018). [CrossRef]  

7. R. Wang, F. P. Ferrie, and J. Macfarlane, “Automatic registration of mobile LiDAR and spherical panoramas,” in Computer Society Conference on Computer Vision and Pattern Recognition Workshops (IEEE, 2012), pp. 33–40.

8. F. M. Mirzaei, D. G. Kottas, S. I. Roumeliotis, et al., eds. (2016), pp. 183–200.

9. A. S. Kundu, O. Mazumder, A. Dhar, et al., “Scanning camera and augmented reality based localization of omnidirectional robot for indoor application,” Procedia. Comput. Sci. 105, 27–33 (2017). [CrossRef]  

10. K. Kwiatek and R. Tokarczyk, “Photogrammetric applications of immersive video cameras,” ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci. II-5, 211–218 (2014). [CrossRef]  

11. P. Firoozfam, “Multi-camera imaging for 3D mapping and positioning: Stereo and panoramic conical views,” PhD. Thesis, University of Miami, USA (2004).

12. A. Ladai, C. Toth, and Z. Tóth, “Indoor Mapping with AN Omnidirectional Camera System: Performance Analysis,” Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. XLIII-B1-2022, 347–352 (2022). [CrossRef]  

13. C. Sahin and B. Ergun, “Indoor stereo photogrammetry via omnidirectional multicamera system case study: Ladybug2,” in Physical and Chemical Sensors: Design, Applications & Networks, S. Y. Yurish, eds. (2019), pp. 197–224.

14. O. Lamarre, O. Limoyo, F. Marić, et al., “The Canadian planetary emulation terrain energy-aware rover navigation dataset,” Int. J. Robot. Res. 39(6), 641–650 (2020). [CrossRef]  

15. M. Hirabayashi, K. Kurosawa, R. Yokota, et al., “Flying object detection system using an omnidirectional camera,” Forensic Sci. Int. 35, 301027 (2020). [CrossRef]  

16. M. B. Campos, A. M. G. Tommaselli, J. Marcato Junior, et al., “Geometric model and assessment of a dual-fisheye imaging system,” Photogramm. Rec. 33(162), 243–263 (2018). [CrossRef]  

17. B. Ergun, T. Kavzoglu, I. Colkesen, et al., “Data filtering with support vector machines in geometric camera calibration,” Opt. Express 18(3), 1927–1936 (2010). [CrossRef]  

18. J. B. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proceedings of Fifth Berkeley Symposium on Mathematical Statistics and Probability (1967), pp. 281–297.

19. A. M. Ikotun, A. E. Ezugwu, L. Abualigah, et al., “K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data,” Inf. Sci. 622, 178–210 (2023). [CrossRef]  

20. A. K. Jain, “Data clustering: 50 years beyond K-means,” Pattern Recognit. Lett. 31(8), 651–666 (2010). [CrossRef]  

21. M. D. S. Lubis, H. Mawengkang, and S. Suwilo, “Performance analysis of entropy methods on K means in clustering process,” J. Phys.: Conf. Ser. 930(1), 012028 (2017). [CrossRef]  

22. R. C. D. Santos, M. Galo, and V. M. Tachibana, “Classification of LiDAR data over building roofs using k-means and principal component analysis,” Bol. Ciênc. Geod. 24(1), 69–84 (2018). [CrossRef]  

23. P. Lemenkova and O. Debeir, “R Libraries for remote sensing data classification by k-means clustering and NDVI computation in Congo River Basin, DRC,” Appl. Sci. 12(24), 12554 (2022). [CrossRef]  

24. R. Hyde, R. Hossaini, and A. A. Leeson, “Cluster-based analysis of multi-model climate ensembles,” Geosci. Model Dev. 11(6), 2033–2048 (2018). [CrossRef]  

25. S. Liu, L. Zhao, J. Li, et al., “Multi-resolution panoramic modeling based on spherical projective geometry,” in Proceedings of 2nd International Conference on Computer Science and Network Technology (2012), pp. 2171–2174.

26. B. Ergun, S. Kulur, A. Alkis, et al., “Three dimensional calibration room design and application for architectural documentation methods,” in Proceedings of 22nd CIPA Symposium (2009).

27. D. Schneider and E. Schwalbe, “Design and testing of mathematical models for a full-spherical camera on the basis of a rotating linear array sensor and a fisheye lens,” in Proceedings of 7th Conference on Optical 3-D Measurement Techniques (2005), pp. 245–254.

28. W. Schweizer, Special Functions in Physics with MATLAB (Springer Cham, 2021).

29. H. Zhu, X. Wang, and C. Yi, “An elliptical function model for fisheye camera correction,” in Proceedings of 9th World Congress on Intelligent Control and Automation (2011), pp. 248–253.

30. T. Satogata, “SVD Orbit Correction for ALPHA,” (2014), http://toddsatogata.net/Papers/TN-14-030.pdf.

31. V. Guruswami and R. Kannan, Computer Science Theory for the Information Age, Carnegie Mellon University (2018), https://www.cs.cmu.edu/~venkatg/teaching/CStheory-infoage/hopcroft-kannan-feb2012.pdf.

32. F. Wang, A. Louys, N. Piasco, et al., “PlaNeRF: SVD Unsupervised 3D Plane Regularization for NeRF Large-Scale Scene Reconstruction,” arXiv, arXiv:2305.16914v3 (2023). [CrossRef]  

33. J. M. Phillips, Data Mining: Algorithms, Geometry, and Probability, University of Utah, https://users.cs.utah.edu/~jeffp/DMBook/L14-SVD.pdf.

34. P. N. Tan, M. Steinbach, A. Karpatne, et al., Introduction to Data Mining, 2nd ed. (Pearson Education, 2019).

35. A. K. Jain, M. N. Murty, and P. J. Flynn, “Data clustering: a review,” ACM Comput. Surv. 31(3), 264–323 (1999). [CrossRef]  

36. L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis (John Wiley & Sons, 1990).

37. U. M. Fayyad, G. Piatetsky-Shapiro, R. Smyth, et al., Advances in Knowledge Discovery and Data Mining (AAAI/MIT Press, 1996).
