Compact and lightweight panoramic annular lens for computer vision tasks

Shaohua Gao; Lei Sun; Qi Jiang; Hao Shi; Jia Wang; Kaiwei Wang; Jian Bai

doi:10.1364/OE.465888

1. Introduction

The panoramic annular lens (PAL) system is an optical system with a large field of view (FoV) that is widely used in security surveillance, intelligent robot positioning, and map building [1,2]. Compared with the widely used fisheye optical system, the PAL system is more compact, and it focuses on surround-view scene acquisition and perception with lower distortion at the edge of the FoV. In addition, the PAL system has higher and more uniform relative illumination than a fisheye optical system with the same FoV. These features make the images captured by the PAL system more conducive to computer vision tasks and make the PAL system a more suitable solution for panoramic environment perception. Based on the catadioptric design concept, researchers have designed a variety of outstanding PAL systems featuring ultra-wide FoV [3–5], low distortion [6,7], high imaging quality [8], stereo imaging [9], long focal length [10,11], and multiple views [12]. To correct aberrations over a large FoV, conventional PAL systems are often designed with multiple spherical lenses or aspherical surfaces. Conventional spherical PAL designs typically have tight tolerances and use more than 7 lenses. Using multiple spherical lenses or aspherical surfaces provides more optimization variables, but crosstalk among the parameters hampers the optimization process. Besides, simply increasing the number of spherical lenses or using aspherical surfaces also leads to redundant and wasted optimization variables. In addition, a PAL system designed with more spherical lenses and aspherical surfaces brings further problems such as complex system structure, large volume, tight tolerances, high price, and low manufacturing yield. These factors make traditional PAL systems unsuitable for applications where space and weight are limited. All these drawbacks make a simpler PAL system combined with image quality enhancement algorithms a preferable choice.

The rapid development of novel sensors and artificial intelligence technology greatly promotes the advance of intelligent sensing optical systems. The combination of optical design and image processing algorithms is transforming traditional optical technology. To enable applications in various space- and weight-constrained scenarios, the panoramic optical system is expected to be compact and lightweight. Thus, to maintain both a small optical system size and high-quality imaging, we choose to design a compact PAL system with acceptable imaging quality and equip it with powerful deep learning based image processing methods to enhance the image quality.

In this work, we propose a principle of focal power distribution for compact and lightweight PAL system design, which is based on the correction of the Petzval sum. The designed and implemented system has a large FoV of 360$^{\circ }\times$(25$^{\circ }$-100$^{\circ }$) with an F-number of 5.5. The whole optical system consists of only 3 spherical lenses. The focal length of the proposed compact PAL system is -1.17 mm. It has loose tolerances for easy manufacturing and mass production. Moreover, we design a novel neural network model named PAL-Restormer to recover the images degraded by the residual aberration of our compact PAL system, and extensive experiments show the effectiveness of the proposed model in improving image quality. Further, we show the applications of our compact PAL system in downstream computer vision tasks such as object detection and optical flow estimation. Equipped with PAL-Restormer, our compact PAL system balances high-quality imaging and small size, which makes it promising for panoramic imaging in space- and volume-constrained scenarios.

The paper is organized as follows. In Section 2, we introduce the design principles of the compact PAL system. Then, we describe the design flow and verify the performance of the implemented PAL system in Section 3. The principle of quality enhancement with computational imaging is described in Section 4. In Section 5, we analyze the experimental results of the quality enhancement method. In addition, we present and discuss various computer vision tasks and their experimental results in Section 6. Finally, Section 7 summarizes the paper and gives an outlook.

2. Design principles

2.1 PAL system imaging principle

The PAL system is mainly composed of three parts, namely a PAL block, a relay lens group, and a sensor, as shown in Fig. 1. The PAL block is of the catadioptric type and converts the incident light from the surrounding 360$^{\circ }$ FoV into a small-angle beam that enters the relay lens group. The relay lens group provides positive focal power and corrects aberrations in different FoVs for imaging. The PAL system is a large FoV imaging optical system, which usually conforms to the principle of equidistant projection. The relationship among image height $y$, half FoV $\theta$, and focal length $f$ can be expressed as follows:

(1)$$y = f \cdot \theta.$$

Fig. 1. Imaging principle of PAL system.

The PAL system forms an annular image between the minimum half FoV $\theta _{\min }$ and the maximum half FoV $\theta _{\max }$. Due to the occlusion of the second mirror, a central blind area is formed in the center of the PAL image. The blind area radius can be calculated from the minimum half FoV $\theta _{\min }$, the focal length $f$, and Eq. (1). The maximum radius of the imaging circle can be calculated from the maximum half FoV $\theta _{\max }$, the focal length $f$, and Eq. (1). To evaluate the ratio of the imaging area of the PAL system to the area of the maximum imaging circle, we define the effective imaging area ratio as in Eq. (2).

(2)$$R_{efficient}=\left[1-\frac{\pi\left(f \theta_{\min }\right)^{2}}{\pi\left(f \theta_{\max }\right)^{2}}\right] \times 100\%.$$

To obtain a high effective imaging area ratio, we expect the minimum FoV $\theta _{\min }$ to be as small as possible under the condition of satisfying imaging quality.
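As a worked instance of Eq. (2), the short sketch below evaluates the effective imaging area ratio for the FoV limits quoted in the Introduction (25$^{\circ }$ and 100$^{\circ }$). It assumes the ideal $f$-$\theta$ mapping of Eq. (1), under which the focal length cancels; the function name is illustrative and not taken from the paper.

```python
import math


def effective_area_ratio(theta_min_deg, theta_max_deg):
    """Eq. (2) as a fraction: share of the maximum imaging circle that is imaged.

    Under the f-theta mapping of Eq. (1) the focal length cancels, so only
    the two half-FoV limits matter.
    """
    t_min = math.radians(theta_min_deg)
    t_max = math.radians(theta_max_deg)
    return 1.0 - (t_min / t_max) ** 2


ratio = effective_area_ratio(25.0, 100.0)
print(f"effective imaging area ratio: {ratio:.2%}")
```

Under these assumptions the blind area occupies $(25/100)^2 = 6.25\%$ of the imaging circle, which illustrates why a small $\theta _{\min }$ is desirable.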

2.2 Focal power distribution theory for compact PAL initial structure calculation

The PAL system is an ultra-wide-angle optical system. The key aberration of a large FoV system is field curvature. When the field curvature of the initial structure is small, the structure has the potential to be optimized into a design with good imaging quality. Therefore, it is crucial to obtain a suitable initial structure of the optical system. The initial structure is usually obtained by $PW$ method calculation, from experience, or from a patent search. The $PW$ method facilitates the design of simple thin-lens systems with small FoVs, but it is not suitable for the initial structure calculation of large FoV PAL systems with catadioptric structures. Consequently, the design of PAL systems often relies on traditional design experience. To design a compact and lightweight PAL system, we construct a PAL focal power distribution theory that removes the dependence on conventional PAL initial structures with tight tolerances and complex structures. The theory is based on the correction of the Petzval sum and is used to construct the initial structure of a compact and lightweight PAL system. The construction of the initial structure can be divided into two parts: (1) the establishment of the PAL block; (2) the establishment of the relay lens group. According to the Seidel aberration theory [13], the field curvature of an optical system is determined by the fourth Seidel factor, the Petzval sum $P$, as given in Eq. (3):

(3)$$P=J^{2} \sum_{i=1}^{n} \frac{\phi_{i}}{n_{i}},$$

where $J$ denotes the Lagrange invariant, $\phi _{i}$ is the focal power of the $i$-th lens, and $n_{i}$ is the refractive index of the $i$-th lens. When designing a PAL system with a large FoV, we should minimize the field curvature of the system to facilitate subsequent aberration optimization. For the system to be well conditioned for image quality optimization, the Petzval sum must be close to zero, as shown in Eq. (4).

(4)$$P=J^{2} \sum_{i=1}^{n} \frac{\phi_{i}}{n_{i}}=0.$$

The relay lens group usually has a positive focal power for focusing the light emerging from the PAL block. Therefore, the PAL block needs to be designed with negative focal power so that the Petzval sum of the whole system is close to zero. Previous studies have shown that the reflection on the refraction surfaces of the PAL block is a source of strong stray light [14]. For PAL blocks built from a triplet or a doublet lens, the refractive indexes of the front glass, the rear glass, and the glue of the cemented surface are usually not the same. When light propagates in the PAL block, the stray light generated by the Fresnel loss is refracted and reflected multiple times in the PAL block, resulting in more serious stray light. A PAL block designed with a single lens avoids cementing and reduces the number of refraction surfaces encountered by the light in the PAL block, thereby effectively reducing stray light. At the same time, considering the manufacturing cost and weight of the PAL system, this design adopts a single lens as the PAL block. To characterize the PAL block parameters, we trace a single chief ray entering the PAL block and establish the PAL parameter definition model shown in Fig. 2.

To minimize the axial length of the PAL system, the aperture stop is placed on the rear surface of the PAL block. The incident light enters through the first transmission surface of the PAL block, is then reflected on the first reflection surface and the second reflection surface in turn, and finally exits through the second transmission surface. The emergent ray passes through the intersection of the optical axis and the second transmission surface. We define the curvature of the first transmission surface as $c_{1}$ (the reciprocal of the radius $R_{1}$ of the first transmission surface). The refractive index of the first transmission surface before refraction is $n_{1}$, and the refractive index after refraction is $n_{1}^{\prime }$. Then we define the curvature $c_{i}$ and the indexes $n_{i}$, $n_{i}^{\prime }$ for the first reflection surface, the second reflection surface, and the second transmission surface in turn, where $i$ denotes the $i$-th surface. When a ray hits a reflection surface, $n_{i}^{\prime }$=-$n_{i}$. We can express Eq. (4) as Eq. (5):

(5)$$P=\sum_{i=1}^{n} P_{i}=J^{2} \sum_{i=1}^{n} \frac{n_{i}^{\prime}-n_{i}}{n_{i}^{\prime} n_{i}} c_{i}.$$

We define the parameter $\rho _{_{\rm {PAL\ block}}}$ of the PAL block to evaluate the field curvature Seidel aberration introduced by the PAL block. $\rho _{_{\rm {PAL\ block}}}$ is defined in Eq. (6):

(6)$$\rho_{_{\rm{PAL\ block}}}=\sum_{i=1}^{4} \frac{n_{i}^{\prime}-n_{i}}{n_{i}^{\prime} n_{i}} c_{i}.$$

To initialize the PAL system model parameter calculation and ensure ray tracing performance, the scaled PAL block parameters of a previous PAL design can serve as an effective guide for the initial values of the PAL block. With the PAL block constructed from such initial parameters, a suitable catadioptric structure can be produced quickly. Since the first reflection surface and the second transmission surface generally have the same curvature and are the same surface, we have $R_{4}$=$R_{2}$. The initial values of $R_{1}$, $R_{2}$, and $R_{3}$ are set to 25 mm, -10 mm, and -30 mm, respectively. Because the system is more sensitive to the reflection surfaces of the PAL block, the radii $R_{2}$ of the first reflection surface and $R_{3}$ of the second reflection surface must be restricted to smaller value ranges to ensure the correctness of ray tracing and a reasonable PAL block structure. The value ranges of $R_{1}$, $R_{2}$, and $R_{3}$ around their initial values are ${\pm }$4 mm, ${\pm }$2 mm, and ${\pm }$2 mm, respectively. The variation of $\rho _{_{\rm {PAL\ block}}}$ with the refractive index $n_{_{\rm {PAL\ block}}}$ of the PAL block is shown in Fig. 3.
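The sum in Eq. (6) can be evaluated directly for the initial radii quoted above. The following is only a minimal sketch under a stated assumption: it uses the common paraxial sign convention in which the index changes sign at every reflection, so the ray travels with index $-n$ between the two mirrors and with $+n$ again after the second mirror. The paper states only $n_{i}^{\prime }=-n_{i}$ at a reflection, so the index bookkeeping between the mirrors, like the function names, is an assumption.

```python
def petzval_term(n_before, n_after, radius_mm):
    """One summand of Eq. (6): (n' - n) / (n' * n) * c, with c = 1 / R."""
    return (n_after - n_before) / (n_after * n_before) / radius_mm


def rho_pal_block(n_glass, r1_mm, r2_mm, r3_mm):
    """Eq. (6) for a single-lens PAL block with R4 = R2, in 1/mm.

    ASSUMPTION: the refractive index flips sign after each reflection.
    """
    surfaces = [
        (1.0, n_glass, r1_mm),        # first transmission surface (air -> glass)
        (n_glass, -n_glass, r2_mm),   # first reflection surface
        (-n_glass, n_glass, r3_mm),   # second reflection surface
        (n_glass, 1.0, r2_mm),        # second transmission surface (R4 = R2)
    ]
    return sum(petzval_term(n, n_p, r) for n, n_p, r in surfaces)


# Initial values from the text: n = 1.5, R1 = 25 mm, R2 = -10 mm, R3 = -30 mm.
rho = rho_pal_block(1.5, 25.0, -10.0, -30.0)
print(f"rho_PAL_block = {rho:.5f} 1/mm")
```

With this convention the initial PAL block contributes a negative value of about -0.042 mm$^{-1}$, which the relay lens group then has to balance with a positive contribution.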

Fig. 2. Definition of each work surface of the PAL block and its related parameters.

Fig. 3. Relationship between the refractive index $n_{_{\rm {PAL\ block}}}$ of the PAL block and the Seidel aberration factor $\rho _{_{\rm {PAL\ block}}}$.

When the PAL block uses a high refractive index material, the field curvature Seidel aberration factor $\rho _{_{\rm {PAL\ block}}}$ provided by the PAL block decreases non-linearly. Therefore, it is much easier to correct the Petzval sum with high refractive index materials, thereby reducing the field curvature of the system. However, since the original intention of our design is a low-cost and compact PAL system, cost should be given high priority as a key factor. Although the Petzval field curvature introduced by the PAL block gradually decreases as its refractive index increases, the system still requires a relay lens group for imaging, and the relay lens group can use materials with higher refractive indexes to correct the Petzval field curvature. The PAL block has the largest volume and weight in the entire PAL system. Therefore, selecting a low refractive index material for the PAL block can effectively reduce the material cost of the system. Taking these factors into consideration, using low-cost, readily available optical glass for the PAL block can greatly reduce the cost of the entire PAL system, which is an important consideration for mass production. For this reason, we set the initial refractive index of the PAL block to 1.5. To further reduce cost and improve manufacturing yield, we model the entire optical system using spherical surfaces.

Compared with the first transmission surface, the first reflection surface and the second reflection surface have the greatest influence on the direction of the light. Therefore, it is essential to study the influence of the radius $R_{2}$ of the first reflection surface and the radius $R_{3}$ of the second reflection surface on the field curvature Seidel aberration factor $\rho _{_{\rm {PAL\ block}}}$ provided by the PAL block. The relationship among these parameters is illustrated in Fig. 4. As the absolute values of $R_{2}$ and $R_{3}$ gradually increase, the absolute value of $\rho _{_{\rm {PAL\ block}}}$ decreases.

Fig. 4. Relationship of first reflection surface radius $R_{2}$ and second reflection surface radius $R_{3}$ to PAL block Seidel aberration factor $\rho _{_{\rm {PAL\ block}}}$.

The second part of the initial structure calculation is to design the relay lens group. The main purpose of the relay lens group is to correct the aberration of the light from the PAL block. To reduce the complexity of the PAL system, the relay lens group should correct the aberration as simply and efficiently as possible. This design concept avoids conflicts among too many optimization parameters, effectively improves the compactness of the PAL system, and reduces its size and weight. We first design the relay lens group as a single plano-convex positive lens to provide positive focal power. The front surface is flat and the rear surface is convex. The parameters of the single positive lens used as the relay lens group are defined in Fig. 5(a). The refractive index in front of the plane of the single lens is $n_{5}$. The refractive index behind the plane is $n_{5}^{\prime }$. The curvature of the plane is $c_{5}=0$. The refractive index in front of the convex surface of the single lens is $n_{6}$. The refractive index behind the convex surface is $n_{6}^{\prime }$. The curvature of the convex surface is $c_{6}$. The sum of the field curvature Seidel aberration factor $\rho _{_{\rm {PAL\ block}}}$ provided by the PAL block and the factor $\rho _{_{\rm {Relay\ lens\ group}}}$ provided by the relay lens group should be close to zero, as shown in Eq. (7).

(7)$$\rho_{_{\rm{Relay\ lens\ group}}}+\rho_{_{\rm{PAL\ block}}}=0.$$

Fig. 5. (a) Definition of relay lens group parameters for a single lens. (b) Definition of relay lens group parameters for doublet lenses.

PAL systems usually achieve good imaging quality when the field curvature Seidel aberration factor is less than 0.01. Since a single plano-convex lens used as the relay lens group has to provide a high positive focal power, we set its initial refractive index to 1.8 to reduce the curvature of the single convex surface for easier fabrication. The field curvature Seidel aberration factor $\rho _{_{\rm {Relay\ lens\ group}}}$ of the relay lens group can then be expressed by Eq. (8).

(8)$$\rho_{_{\rm{Relay\ lens\ group}}}= \frac{n_{5}^{\prime}-n_{5}}{n_{5}^{\prime} n_{5}} c_{5}+\frac{n_{6}^{\prime}-n_{6}}{n_{6}^{\prime} n_{6}} c_{6}.$$

Based on Eqs. (7) and (8), we can calculate the curvature $c_{6}$ and the radius $R_{6}$ of the convex surface when a single lens is used as the relay lens group. Then, Eq. (9) can be used to calculate the focal power $\Phi _{_{\rm {Relay\ lens\ group}}}$ of the relay lens group, where $d_1$ is the thickness of the plano-convex lens, with an initial value of 4 mm.

(9)$$\Phi_{_{\rm{Relay\ lens\ group}}}=(n_{5}^{\prime}-1)(c_{5}-c_{6})+\frac{(n_{5}^{\prime}-1)^{2}d_{1}c_{5}c_{6}}{n_{5}^{\prime}}.$$

To maintain the compactness of the PAL system while further improving the aberration correction capability of the relay lens group, we cement a meniscus lens to the plano-convex lens to reduce the focal power borne by a single convex surface. The two curvatures $c_{6}$ and $c_{7}$ of the concave and convex surfaces of the meniscus lens are defined as shown in Fig. 5(b), and they have the same initial values.
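The single-lens relay calculation can be sketched numerically as follows: the Petzval balance of Eq. (7) fixes $\rho _{_{\rm {Relay\ lens\ group}}}$, Eq. (8) with $c_{5}=0$ gives $c_{6}$, and the thick-lens expression of Eq. (9) gives the focal power. The input value of $\rho _{_{\rm {PAL\ block}}}$ used below (-0.04 mm$^{-1}$) is purely illustrative and not a value from the paper, and the function name is hypothetical.

```python
def relay_single_lens(rho_pal_block, n_lens=1.8, d1_mm=4.0):
    """Plano-convex relay lens from Eqs. (7)-(9).

    Returns (c6 in 1/mm, R6 in mm, focal power in 1/mm).
    """
    # Eq. (7): the relay group must cancel the PAL block contribution.
    rho_relay = -rho_pal_block
    # Eq. (8): the flat front surface (c5 = 0) contributes nothing; the rear
    # surface goes from glass (n6 = n_lens) to air (n6' = 1).
    c5 = 0.0
    n6, n6_prime = n_lens, 1.0
    c6 = rho_relay / ((n6_prime - n6) / (n6_prime * n6))
    # Eq. (9): thick-lens focal power; the thickness term vanishes for c5 = 0.
    phi = (n_lens - 1.0) * (c5 - c6) + (n_lens - 1.0) ** 2 * d1_mm * c5 * c6 / n_lens
    return c6, 1.0 / c6, phi


# Illustrative input only: rho_PAL_block = -0.04 1/mm is an assumed value.
c6, r6, phi = relay_single_lens(-0.04)
print(f"c6 = {c6:.4f} 1/mm, R6 = {r6:.2f} mm, focal power = {phi:.4f} 1/mm")
```

Because the front surface is flat, the thickness $d_1$ drops out of Eq. (9) at this stage; it only becomes relevant once the surface curvatures are released during optimization.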

The focal power $\Phi _{_{\rm {Relay\ lens\ group}}}$ of the single plano-convex relay lens is split between a plano-convex lens with a smaller focal power and a meniscus lens. After the split, the sum of the focal powers of the plano-convex lens and the meniscus lens equals the focal power of the original single plano-convex lens, which satisfies Eq. (10).

(10)$$\Phi_{_{\rm{Plano-convex\ lens}}}+\Phi_{_{\rm{Meniscus\ lens}}}=\Phi_{_{\rm{Relay\ lens\ group}}}.$$

The focal power of the plano-convex lens after the split can be expressed by Eq. (11).

(11)$$\Phi_{_{\rm{Plano-convex\ lens}}}=(n_{5}^{\prime}-1)({-}c_{6}).$$

The focal power of the meniscus lens after the split is given by Eq. (12), where $d_2$ is the thickness of the meniscus lens, with an initial value of 2 mm.

(12)$$\Phi_{_{\rm{Meniscus\ lens}}}=\frac{(n_{6}^{\prime}-1)^{2}d_{2}{c_{6}}^{2}}{n_{6}^{\prime}}.$$
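Since the meniscus surfaces start with equal curvatures, Eqs. (10)-(12) reduce to a single quadratic in $c_{6}$. The sketch below solves it under two assumptions that are not given in the text: the meniscus glass index (set to 1.8 here) and the choice of the negative root, which keeps the plano-convex power of Eq. (11) positive. The relay power of 0.072 mm$^{-1}$ is an illustrative input.

```python
import math


def split_relay(phi_relay, n_plano=1.8, n_meniscus=1.8, d2_mm=2.0):
    """Solve Eqs. (10)-(12) for the shared initial curvature c6 (= c7).

    ASSUMPTIONS: n_meniscus is not specified in the text, and the negative
    root is selected so that the plano-convex lens stays positive.
    """
    a = (n_meniscus - 1.0) ** 2 * d2_mm / n_meniscus  # Eq. (12): phi_men = a * c6^2
    b = n_plano - 1.0                                 # Eq. (11): phi_pc = -b * c6
    # Eq. (10): a * c6^2 - b * c6 - phi_relay = 0
    c6 = (b - math.sqrt(b * b + 4.0 * a * phi_relay)) / (2.0 * a)
    phi_plano = -b * c6
    phi_meniscus = a * c6 * c6
    return c6, phi_plano, phi_meniscus


c6, phi_pc, phi_men = split_relay(0.072)
print(f"c6 = c7 = {c6:.5f} 1/mm")
print(f"plano-convex power = {phi_pc:.5f}, meniscus power = {phi_men:.5f}")
```

For these inputs the magnitude of $c_{6}$ comes out smaller than the 0.09 mm$^{-1}$ that a single plano-convex lens of the same total power would need, which is the intended relief of the convex surface.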

By solving Eqs. (10), (11), and (12) jointly, $c_{6}$, $c_{7}$, $R_{6}$, and $R_{7}$ can be calculated. Thus, the calculation of the initial structure of the compact and lightweight PAL system is completed. The whole PAL system uses only 3 spherical lenses for imaging. The initial structure already corrects the Petzval field curvature, so excellent image quality can be achieved with only moderate optimization. In contrast to traditional design methods that only scale previous PAL systems, our theoretical design model does not require precise PAL system values. The parameters of the PAL can be defined in a relatively loose range, and the compact PAL system structure can be quickly calculated from the mathematical model relating the PAL system parameters, the Petzval field curvature, and the focal power distribution. This design principle avoids crosstalk and redundant parameters among aspherical and multiple spherical surfaces, greatly reducing system complexity, size, and weight. The design approach enables low-cost, large FoV imaging designs that are particularly suitable for space- and weight-constrained applications such as portable wearable devices, miniature detection robots, and UAVs.

3. Design and performance of compact and lightweight PAL system

3.1 Design process of the PAL system

The design flow of the compact and lightweight PAL system is shown in Fig. 6. It is mainly divided into three parts: initial structure calculation, design and optimization, and execution of computer vision tasks. Our ultimate goal is to prove that the compact and lightweight PAL system is able to implement panoramic environment perception.

Fig. 6. Design flow of the compact and lightweight PAL system.

3.2 Optimization and image quality of the designed PAL system

According to Section 2, we first construct a PAL system in which the relay lens group is composed of a single plano-convex lens, as shown in Fig. 7(a). The axial displacement of the focal point in the focal plane between the minimum and maximum FoV is less than 0.6 mm, as shown in the magnified detail view near the focal plane. This shows that the initial structure of the compact PAL system constructed with the Petzval field curvature correction theory is reasonable. Figure 7(b) shows the PAL system whose relay lens group consists of a plano-convex lens and a meniscus lens. The axial displacement of the focal points of the minimum and maximum FoV in the focal plane is close to that of Fig. 7(a), which also verifies the rationality of our focal power distribution theory for designing the compact PAL system.

Fig. 7. (a) PAL system with a relay lens group consisting of a single plano-convex lens. (b) PAL system with a relay lens group composed of a plano-convex lens and a meniscus lens.

We choose a sensor with a resolution of 1280$\times$1024 and a pixel size of 4 μm. The focal length of the initial structure is scaled to match the PAL system to the sensor. The obtained initial structure consists of virtual glass. It is necessary to replace the virtual glass with real glass and optimize the parameters of the initial structure to obtain a PAL system with good imaging quality. Since the field curvature of the compact PAL initial structure designed by the focal power distribution theory has already been corrected, only a simple optimization is needed to complete the design of a PAL system with a large FoV and good imaging quality.
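The matching between the PAL system and the sensor can be checked with a few lines of arithmetic. The sketch below assumes the ideal $f$-$\theta$ mapping of Eq. (1) and uses the magnitude of the focal length quoted in the Introduction (1.17 mm); the residual $f$-$\theta$ distortion of the real design is ignored, so the numbers are indicative only.

```python
import math

PIXEL_MM = 0.004                  # 4 um pixel pitch
WIDTH_PX, HEIGHT_PX = 1280, 1024  # sensor resolution
FOCAL_MM = 1.17                   # |f| of the final design (Introduction)

# Sensor Nyquist frequency in line pairs per millimetre.
nyquist_lp_mm = 1.0 / (2.0 * PIXEL_MM)

# Half of the short sensor side limits the radius of the image circle.
half_short_side_mm = HEIGHT_PX * PIXEL_MM / 2.0

# Eq. (1): outer image radius and blind-area radius for 100 deg and 25 deg.
r_max_mm = FOCAL_MM * math.radians(100.0)
r_min_mm = FOCAL_MM * math.radians(25.0)

print(f"Nyquist frequency : {nyquist_lp_mm:.0f} lp/mm")
print(f"half short side   : {half_short_side_mm:.3f} mm")
print(f"outer image radius: {r_max_mm:.3f} mm")
print(f"blind-area radius : {r_min_mm:.3f} mm")
```

Under these assumptions the outer image radius (about 2.04 mm) fits just inside half of the short sensor side (2.048 mm), and the Nyquist frequency evaluates to 125 lp/mm.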

The optimized PAL system parameters are shown in Table 1. As shown in Fig. 8, the designed system corrects the field curvature of the initial structure well. The maximum chief ray angle is 3.43$^{\circ }$, so the system is close to telecentric on the image side.

Fig. 8. Ultimate structure of the compact and light-weight PAL system.

Table 1. Specifications of the PAL system

The spot diagram of the designed PAL system is shown in Fig. 9, and the Airy disk radius is smaller than the pixel size. The root mean square spot radius is less than 1 pixel in 0.25$\sim$0.65 FoV and less than 3 pixels in 0.65$\sim$1 FoV. This shows that our system meets the imaging requirements in 0.25$\sim$0.65 FoV. Residual uncorrected aberration remains in 0.65$\sim$1 FoV, and Section 4 introduces a method for enhancing the quality of images degraded by such aberrations.
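The comparison between the Airy disk and the pixel size can be reproduced with the standard diffraction formula $r_{\rm Airy}=1.22\,\lambda\,F$. The wavelength is not stated at this point in the text, so 0.55 μm is assumed below as a representative mid-visible value; the F-number of 5.5 and the 4 μm pixel come from the paper.

```python
def airy_radius_um(wavelength_um, f_number):
    """Radius of the first dark ring of the Airy pattern: 1.22 * lambda * F."""
    return 1.22 * wavelength_um * f_number


PIXEL_UM = 4.0
# ASSUMPTION: 0.55 um is a representative wavelength, not a value from the paper.
airy = airy_radius_um(0.55, 5.5)

print(f"Airy disk radius: {airy:.2f} um (pixel size {PIXEL_UM:.0f} um)")
print(f"RMS spot bounds : < {1 * PIXEL_UM:.0f} um (0.25-0.65 FoV), "
      f"< {3 * PIXEL_UM:.0f} um (0.65-1 FoV)")
```

With these inputs the Airy radius is roughly 3.7 μm, slightly below the 4 μm pixel, which is consistent with the statement above.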

Fig. 9. Spot diagram of the compact PAL.

The modulation transfer function (MTF) for 0.8 FoV is greater than 0.25 at 125 lp/mm, and the MTF over the entire FoV is greater than 0.1, as shown in Fig. 10(a). Figure 10(b) shows that the field curvature is less than 0.1 mm and the $f$-$\theta$ distortion is less than 4.5%.

Fig. 10. (a) MTF of the compact PAL system. (b) Field curvature and $f$-$\theta$ distortion of the compact PAL.

Figure 11(a) shows that the relative illumination over the entire FoV of the PAL system is higher than 0.95. Figure 11(b) shows that the PAL system maintains good imaging quality over a large focal depth range. The MTF over the entire FoV is higher than 0.58 at 0.1 mm, which indicates that our designed PAL system has loose tolerances.

Fig. 11. (a) Relative illumination of the compact PAL. (b) Through-focus MTF of the compact PAL.

3.3 Tolerance analysis

Tolerance analysis enables the evaluation of the system’s mass manufacturing yield. At the Nyquist frequency of 125 lp/mm, the PAL system has an average MTF > 0.2 at a cumulative probability of 90$\%$ for the entire FoV. The tolerance values are shown in Table 2 and indicate that our system has extremely loose tolerances with high manufacturing yield and stability.

Table 2. Tolerance values of the PAL system

3.4 Implemented system and performance validation

Compared with traditional PAL systems, the prototype of the entire PAL is very small and light. The prototype of the compact PAL is shown in Fig. 12(a), and its weight is only 20 g. Its volume is comparable to that of a coin, as shown in Fig. 12(b). Its small size and light weight make it particularly suitable for applications in wearable devices (such as visual aids for the blind and tourist guide devices) and miniaturized robots. Figure 12(c) demonstrates that our design achieves a large FoV image of 360$^{\circ }\times$(25$^{\circ }$-100$^{\circ }$). We remove the image degradation caused by the residual aberration of the optical system over the large FoV with a deep learning based image processing method (Section 4).

Fig. 12. (a) Prototype of the compact PAL. (b) The compact PAL is comparable in size to a coin. (c) Original image captured by the compact PAL.

4. Quality enhancement with computational imaging

4.1 PAL image unfolding

Most computer vision tasks are designed for images with normal pixel configurations (i.e., the width direction represents the azimuth angle and the height direction represents the polar angle). To facilitate human observation and benefit from modern computer vision algorithms, unfolding is the first step of PAL image processing. We use the omnidirectional camera model proposed by Scaramuzza et al. [15] as the camera model for the designed PAL camera, in which a three-dimensional vector $P \propto g(\mathbf {u})$ in object space is reflected onto the image plane by a rotationally symmetric reflector determined by a Taylor polynomial $f(\rho )$:

(13)$$\left\{\begin{array}{c} \begin{aligned} & P \propto g(\mathbf{u})=(u, v, f(\rho))^{\mathrm{T}}, \\ & f(\rho) = \alpha_0 + \alpha_2 \cdot \rho^{2} + \cdots+ \alpha_N \cdot \rho^{N}, \end{aligned} \end{array}\right.$$

where $\rho =\sqrt {u^{2}+v^{2}}$ is the radial distance between the 2D vector $\mathbf {u}$ on the image plane and the center of the image plane, $\alpha$ denotes the polynomial coefficients, $N$ is the degree of the polynomial (set to 10 in our calibration process), and $(u, v)$ are the Cartesian coordinates on the image plane. The coefficients of the polynomial can be directly obtained by internal calibration of the PAL using the OmniCalib calibration toolbox [16]. After completing the camera calibration, we need to unfold the original PAL image into the equirectangular panorama format for subsequent vision tasks. We first project the annular image onto the sphere according to the above camera model. Given the longitude $\phi$ and latitude $\theta$ of a sphere vector, we have $(\phi, \theta ) \in [0,2\pi ] \times [0,\pi ]$. We convert the angular position $(\phi, \theta )$ into the unfolded coordinate $P'=(p_x,p_y)$ by:

(14)$$\begin{aligned} \left\{ \begin{aligned} & p_x = R(\phi-\phi_0)\cos\theta_1,\\ & p_y = R(\theta-\theta_1), \end{aligned} \right. \end{aligned}$$

where $\phi _0$ is the center meridian of the map, $\theta _1$ are the standard parallels, i.e., the north and south of the equator, and $R$ is the radius of the sphere.
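The two mappings above can be sketched in a few lines: Eq. (13) turns an image-plane point into a viewing direction, and Eq. (14) turns a spherical angle pair into unfolded coordinates. The polynomial coefficients used in the example are made up for illustration (real values come from calibration), and the function names are hypothetical.

```python
import math


def backproject(u, v, alphas):
    """Eq. (13): unit viewing direction for the image-plane point (u, v).

    alphas[k] multiplies rho**k; the model has no linear term, so alphas[1]
    is expected to be 0.
    """
    rho = math.hypot(u, v)
    z = sum(a * rho ** k for k, a in enumerate(alphas))
    norm = math.sqrt(u * u + v * v + z * z)
    if norm == 0.0:
        raise ValueError("degenerate direction: u = v = f(rho) = 0")
    return (u / norm, v / norm, z / norm)


def unfold(phi, theta, radius, phi0=0.0, theta1=0.0):
    """Eq. (14): equirectangular coordinates (p_x, p_y) of the angles (phi, theta)."""
    p_x = radius * (phi - phi0) * math.cos(theta1)
    p_y = radius * (theta - theta1)
    return p_x, p_y


# Made-up coefficients for illustration only (alpha_0, alpha_1 = 0, alpha_2).
ray = backproject(30.0, 40.0, [-180.0, 0.0, 0.002])
p_x, p_y = unfold(math.pi, math.pi / 2.0, radius=100.0)
print(ray, p_x, p_y)
```

In practice the unfolded image is usually filled by iterating over the output pixels and sampling the annular image at the corresponding source location, which avoids holes in the panorama.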

4.2 Synthetic dataset with aberration

We simulate the imaging process of our proposed PAL design to generate a synthetic dataset for image quality enhancement. Starting from a publicly available dataset of clear perspective images, we approximately treat the unfolded PAL image as a perspective image. The degradation caused by aberration is then simulated on the perspective image according to the aberration distribution of the designed PAL on the unfolded image plane. The point spread functions (PSFs) of our PAL system can be calculated with Zemax. Because the PSFs vary greatly with the FoV, we sample FoVs in the range of 25$^{\circ }\sim$100$^{\circ }$ with a step of 0.75$^{\circ }$ for 100 different PSFs. The spectral response of the selected sensor is also considered by fusing 31 groups of PSFs sampled at wavelengths from 400 nm to 700 nm at 10 nm intervals into the RGB channels. In addition, we apply patch-wise convolution between the calculated PSFs and the corresponding image patches of the raw image. We note that the effect of aberration is simulated on the raw image produced by an inverse image signal processing (ISP) pipeline, and the final degraded image is generated by passing the result through the ISP [17]. This operation is consistent with the imaging process of our PAL system.
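The spatially varying blur can be illustrated with a simplified stand-in for the patch-wise convolution: the image is divided into horizontal bands (rows of the unfolded image correspond to FoV), and each band is blurred with its own PSF. This is only a sketch under stated assumptions, not the authors' pipeline: it works on a single channel, uses equal-height bands and toy PSFs, and omits the spectral fusion and the ISP steps.

```python
import numpy as np


def convolve2d_same(image, kernel):
    """'Same'-size 2D convolution via zero-padded FFT (NumPy only)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    fh, fw = ih + kh - 1, iw + kw - 1
    spectrum = np.fft.rfft2(image, s=(fh, fw)) * np.fft.rfft2(kernel, s=(fh, fw))
    full = np.fft.irfft2(spectrum, s=(fh, fw))
    top, left = (kh - 1) // 2, (kw - 1) // 2
    return full[top:top + ih, left:left + iw]


def bandwise_blur(image, psfs):
    """Blur each horizontal band of `image` with the PSF assigned to it.

    ASSUMPTION: bands have equal height and PSFs are ordered top to bottom.
    The full image is blurred per PSF and only the band rows are kept, which
    is simple but not efficient.
    """
    height = image.shape[0]
    edges = np.linspace(0, height, len(psfs) + 1).astype(int)
    out = np.empty_like(image, dtype=float)
    for psf, r0, r1 in zip(psfs, edges[:-1], edges[1:]):
        psf = psf / psf.sum()  # normalize so the blur preserves energy
        out[r0:r1] = convolve2d_same(image, psf)[r0:r1]
    return out


# Toy example: sharp PSF for the upper band, 3x3 box blur for the lower band.
img = np.zeros((9, 9))
img[4, 4] = 1.0
blurred = bandwise_blur(img, [np.ones((1, 1)), np.ones((3, 3))])
print(blurred[3:6, 3:6])
```

A production implementation would convolve overlapping patches instead of the full image and blend across patch borders to avoid visible seams.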

4.3 PAL-Restormer for image enhancement

Inspired by Restormer [18], we propose PAL-Restormer for PAL image restoration. The transformer block is robust to local detail perturbations because the multi-head self-attention mechanism focuses on the global information of the image despite small local disturbances [19,20]. Consequently, when the simulated image degradation deviates from that of the manufactured system, our PAL-Restormer is robust enough to handle the synthetic-to-real gap. Figure 13 shows an overview of our model, which has an encoder-decoder architecture. We concatenate the PSF and the downsample rate information with the image as input. The Channel Transformer Block from Restormer [18] is adopted as our basic block.

Fig. 13. Overview of our model architecture.

However, in the Channel Transformer Block, self-attention is only applied at the channel level, which is not enough for a PAL image with a large FoV. To model the long-distance dependencies in the PAL image, we introduce the Spatial Attention Block between the encoder and the decoder of the model. The vanilla transformer block in Vision Transformer [21], however, has a high computational cost, which is prohibitive for PAL images with a large spatial size. As Fig. 13 shows, to reduce the computational cost, we downsample the feature maps of $Q$ and $K$ to $1/s$ of the original input size using a strided convolution layer. The computational complexity is thereby reduced from $O(h^{2}w^{2})$ to $O(\frac {h^{2}w^{2}}{s^{2}})$. We empirically set $s$ to 2 in our experiments.
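To give a feeling for the saving, the snippet below simply evaluates the two complexity expressions quoted above for an example feature map. The 64$\times$64 size is an arbitrary illustration, and the function only counts the leading-order number of query-key products; it makes no claim about how the reduction is implemented in the network.

```python
def attention_cost(h, w, s=1):
    """Leading-order number of query-key products, h^2 * w^2 / s^2."""
    return (h * w) ** 2 // (s * s)


full = attention_cost(64, 64)          # vanilla spatial self-attention
reduced = attention_cost(64, 64, s=2)  # with the downsampling factor s = 2
print(full, reduced, full // reduced)
```

For $s=2$ the count drops by a factor of 4, independent of the feature map size.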

Another vital factor is utilizing the PSF information effectively. Apart from concatenating the PSF with the image at the input level, we propose the PSF Aggregating Block after the encoder-decoder, inspired by [22]. In the PSF Aggregating Block, the concatenated input is fused with the feature maps in a non-self-attention fashion. The fused feature maps are fed to the refine block, which consists of two Channel Transformer Blocks. Finally, we add the predicted residual to the input image [23].

5. Experiments

5.1 Datasets and settings

Because of the lack of real-world low-quality/high-quality PAL image pairs, we train our model on synthetic datasets and show qualitative results on our collected PAL images. We choose DIV2K [24] as the base dataset, which contains 800, 100, and 100 diverse high-resolution images for training, validation, and testing, respectively. Then, according to the aforementioned PAL image enhancement targets, we generate simulated degraded images based on PSFs from Zemax. With our proposed PAL, we collect images of various scenes for evaluating the quality enhancement methods. The scenes include a campus, streets, and busy shopping areas, which allow us to verify the performance of our PAL in panoramic vision tasks.

We train our network from scratch on $256\times 256$ random crops from the training set of the synthetic dataset. Horizontal flipping is applied for data augmentation. We use the AdamW [25] optimizer with an initial learning rate of $3\times 10^{-4}$ and a cosine learning rate annealing strategy with a minimum learning rate of $1\times 10^{-7}$. We train our model with a batch size of $8$ for 400k iterations on 4 TITAN RTX GPUs. To make a fair comparison, we train our model and the other methods with the same training settings and configurations.
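The learning-rate schedule can be sketched as follows. This is a minimal illustration that assumes a single cosine cycle over all 400k iterations with no warm-up or restarts, which the text does not specify; `cosine_lr` is a hypothetical helper name.

```python
import math

LR_MAX = 3e-4          # initial learning rate
LR_MIN = 1e-7          # minimum learning rate
TOTAL_ITERS = 400_000  # training iterations


def cosine_lr(step: int) -> float:
    """Learning rate at a given iteration under single-cycle cosine annealing."""
    step = min(max(step, 0), TOTAL_ITERS)
    cos_term = 0.5 * (1.0 + math.cos(math.pi * step / TOTAL_ITERS))
    return LR_MIN + (LR_MAX - LR_MIN) * cos_term


for step in (0, 100_000, 200_000, 300_000, 400_000):
    print(step, f"{cosine_lr(step):.3e}")
```

The rate starts at $3\times 10^{-4}$, passes through the midpoint of the two bounds halfway through training, and ends at $1\times 10^{-7}$.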

5.2 Ablation study

We first conduct ablation studies to analyze the effect of the model design and the proposed modules. The “Baseline” in Table 3 is the base model from Restormer [18]. Simply equipping Restormer with our PSF input and PSF Aggregating Block yields an improvement of 1.09 dB in PSNR and 0.011 in SSIM. Simply adding our Spatial Transformer Block (STB) between the encoder and the decoder of Restormer improves the PSNR by 0.12 dB. On top of this, concatenating the PSF with the image at the input boosts the PSNR by a further 1.22 dB. We also ablate the PSF Aggregating Block without the PSF; in this condition, the inputs of the PSF Aggregating Block are the input image and the feature maps from the decoder. The PSF Aggregating Block improves the PSNR by 0.11 dB without the PSF and by 0.19 dB with the PSF. This shows that our PSF Aggregating Block effectively utilizes the PSF information.
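The gains quoted above follow directly from the PSNR column of Table 3. The short script below recomputes them; the dictionary layout and the `gain` helper are illustrative choices, while the numbers are copied from the table.

```python
# PSNR (dB) values copied from Table 3.
# Key: (architecture, PSF concatenated at input, PSF Aggregating Block used)
PSNR = {
    ("Baseline", False, False): 27.12,
    ("Baseline", True, True): 28.21,
    ("Baseline + STB", False, False): 27.24,
    ("Baseline + STB", True, False): 28.46,
    ("Baseline + STB", False, True): 27.35,
    ("Baseline + STB", True, True): 28.65,
}


def gain(after, before) -> float:
    """PSNR difference in dB between two Table 3 rows, rounded to 2 decimals."""
    return round(PSNR[after] - PSNR[before], 2)


B, S = "Baseline", "Baseline + STB"
gains = {
    "PSF + aggregating block on Restormer": gain((B, True, True), (B, False, False)),
    "STB alone": gain((S, False, False), (B, False, False)),
    "PSF input (with STB)": gain((S, True, False), (S, False, False)),
    "aggregating block without PSF": gain((S, False, True), (S, False, False)),
    "aggregating block with PSF": gain((S, True, True), (S, True, False)),
}
for name, value in gains.items():
    print(f"{name}: +{value:.2f} dB")
```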

Table 3. Ablation of various components of PAL-Restormer on synthetic dataset^a

5.3 Comparisons with state-of-the-art methods

We compare our approach with other state-of-the-art methods: Restormer [18] and SRN [26]. All methods are trained with the same training strategy on our DIV2K-based dataset, whose degradation is caused by optical aberration. Table 4 shows the quantitative results. Our method surpasses all the other methods in both PSNR and SSIM on the validation set of the DIV2K dataset.
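For reference, PSNR follows the standard definition $10\log_{10}(\mathrm{MAX}^{2}/\mathrm{MSE})$. The sketch below implements it for flattened pixel lists; the peak value of 1.0 and the tiny example signal are assumptions, since the text does not state the value range or color space used for evaluation.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float],
         peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized signals."""
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - x) ** 2 for r, x in zip(reference, restored)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical pixels in [0, 1]; every restored pixel is off by 0.1.
reference = [0.0, 0.5, 1.0]
restored = [0.1, 0.4, 0.9]
value = psnr(reference, restored)
print(f"{value:.2f} dB")
```

A uniform error of 0.1 on a unit range gives an MSE of 0.01, that is, about 20 dB; because the scale is logarithmic, a 1 dB gain corresponds to roughly a 21% reduction in MSE.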

Table 4. Comparison with state-of-the-art methods on the synthetic dataset

Figure 14 shows qualitative results: images captured by our compact PAL system (top) and the corresponding images restored by our proposed PAL-Restormer (bottom). Due to the limited aberration-correction capability of the three-lens PAL system, some residual aberrations remain at the edge of the FoV. In addition, because different FoVs occupy different numbers of pixels on the annular image, the corresponding regions of the unfolded image are upsampled or downsampled during unfolding. In our case, the PAL image is unfolded at the 90$^{\circ }$ half FoV, so regions with smaller FoVs are upsampled while regions with larger FoVs are downsampled. Consequently, regions of all FoVs on the unfolded image suffer from degradation. The original image is hazy in details and textures (people, windows, and the shop sign in the enlarged windows). With the proposed PAL-Restormer, the processed images are clearer and sharper in richly textured areas than the original image, while smooth areas (e.g., sky) remain unchanged.
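The resampling introduced by unfolding can be estimated with a simple ratio. The sketch below assumes that the unfolded width equals the circumference of the 90$^{\circ}$ ring and that the ring radius grows linearly with the half-FoV angle (an ideal $f$-$\theta$ mapping that ignores the residual distortion of the real lens); `unfold_scale` is a hypothetical helper.

```python
def unfold_scale(half_fov_deg: float, ref_half_fov_deg: float = 90.0) -> float:
    """Horizontal resampling factor of one unfolded row.

    Under an ideal f-theta mapping the ring radius is proportional to the
    half FoV, so the factor is ref / half_fov. Values above 1 mean the row
    is upsampled; values below 1 mean it is downsampled.
    """
    if half_fov_deg <= 0 or ref_half_fov_deg <= 0:
        raise ValueError("angles must be positive")
    return ref_half_fov_deg / half_fov_deg


for fov in (25.0, 60.0, 90.0, 100.0):
    factor = unfold_scale(fov)
    kind = "upsampled" if factor > 1 else "downsampled" if factor < 1 else "unchanged"
    print(f"half FoV {fov:5.1f} deg: x{factor:.2f} ({kind})")
```

Under these assumptions the innermost ring (25$^{\circ}$) is stretched by a factor of 3.6, while the outermost ring (100$^{\circ}$) is compressed to 0.9 of its native sampling.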

Fig. 14. (a) Image quality enhancement of urban long-distance scenes. (b) Image quality enhancement for urban close range scenes. (Top: original image, Bottom: quality-enhanced image).

6. Evaluation on computer vision tasks

6.1 Object detection

To test the quality of the restored image for computer vision tasks, we first perform object detection on the original image and on the image restored by our PAL-Restormer. We use YOLOv5 [27] as our object detection method and train it on the COCO dataset [28]. We evaluate YOLOv5 on the degraded and restored PAL images, respectively. The original image already shows potential for object detection (Fig. 15(a)). Figure 15(b) shows the result on the image enhanced by PAL-Restormer. The confidence of the detected objects (cars and people) increases in the enhanced image, and more people in the image are detected, which demonstrates the effectiveness of the PAL-Restormer.

Fig. 15. (a) Outdoor object detection results in original images. (b) Outdoor object detection results in restored images. (Top: sunny day, Bottom: cloudy day).

6.2 Optical flow

Panoramic optical flow estimation can obtain temporal cues of surrounding scenes in one-shot computation. We use the state-of-the-art 360$^{\circ }$ flow estimation pipeline PanoFlow [29] to explore the effect of PAL image restoration on the quality of optical flow estimation. The qualitative results are shown in Fig. 16. Note that on the restored image, the optical flow estimates for foreground objects such as pedestrians and vehicles are more accurate. For example, the flow field of the restored image clearly delineates cyclists, while that of the original image yields blurry shapes. Panoramic optical flow estimation thus benefits from our image restoration: the result on the restored panoramic image is smoother, and the edges are sharper.

Fig. 16. (a) Panoramic optical flow estimation results in original images. (b) Panoramic optical flow estimation results in restored images.

7. Conclusion and future work

In this paper, we propose a focal power distribution theory for the compact and lightweight PAL system based on Petzval sum correction. According to the theory, we design a coin-sized PAL system with a large FoV of 360$^{\circ }\times$(25$^{\circ }$-100$^{\circ }$) that weighs only 20 g. The designed PAL system tackles three major challenges of traditional PAL systems: large volume, complex structure, and tight tolerances. Benefiting from deep learning techniques, we eliminate the effects of residual aberration on the captured image and demonstrate the potential in downstream computer vision tasks while maintaining a small size and loose tolerances. The proposed PAL-Restormer improves both image quality and computer vision task performance. The design concept demonstrates the applications of the compact PAL system in scenarios where the volume and weight of the optical system are strictly limited, and brings a promising approach to miniaturized panoramic environmental perception.

In future research, we will continue to reduce the volume and weight of the PAL based on emerging optical techniques and theories, to expand the application scenarios of the panoramic system under space and volume constraints.

Funding

National Natural Science Foundation of China (12174341).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. S. Gao, E. A. Tsyganok, and X. Xu, “Design of a compact dual-channel panoramic annular lens with a large aperture and high resolution,” Appl. Opt. 60(11), 3094–3102 (2021).

2. D. Wang, J. Wang, Y. Tian, K. Hu, and M. Xu, “PAL-SLAM: a feature-based slam system for a panoramic annular lens,” Opt. Express 30(2), 1099–1113 (2022).

3. D. Cheng, C. Gong, C. Xu, and Y. Wang, “Design of an ultrawide angle catadioptric lens with an annularly stitched aspherical surface,” Opt. Express 24(3), 2664–2677 (2016).

4. X. Wang, X. Zhong, R. Zhu, F. Gao, and Z. Li, “Extremely wide-angle lens with transmissive and catadioptric integration,” Appl. Opt. 58(16), 4381–4389 (2019).

5. K. Zhang, X. Zhong, L. Zhang, and T. Zhang, “Design of a panoramic annular lens with ultrawide angle and small blind area,” Appl. Opt. 59(19), 5737–5744 (2020).

6. X. Zhou, J. Bai, C. Wang, X. Hou, and K. Wang, “Comparison of two panoramic front unit arrangements in design of a super wide angle panoramic annular lens,” Appl. Opt. 55(12), 3219–3225 (2016).

7. Q. Zhou, Y. Tian, J. Wang, and M. Xu, “Design and implementation of a high-performance panoramic annular lens,” Appl. Opt. 59(36), 11246–11252 (2020).

8. J. Wang, K. Yang, S. Gao, L. Sun, C. Zhu, K. Wang, and J. Bai, “High-performance panoramic annular lens design for real-time semantic segmentation on aerial imagery,” Opt. Eng. 61(03), 035101 (2022).

9. J. Wang, J. Bai, K. Wang, and S. Gao, “Design of stereo imaging system with a panoramic annular lens and a convex mirror,” Opt. Express 30(11), 19017–19029 (2022).

10. S. Niu, J. Bai, X. Hou, and G. Yang, “Design of a panoramic annular lens with a long focal length,” Appl. Opt. 46(32), 7850–7857 (2007).

11. J. Wang, Y. Liang, and M. Xu, “Design of panoramic lens based on ogive and aspheric surface,” Opt. Express 23(15), 19489–19499 (2015).

12. Y. Huang, Z. Liu, Y. Fu, and H. Zhang, “Design of a compact two-channel panoramic optical system,” Opt. Express 25(22), 27691–27705 (2017).

13. X. Li and Z. Cen, Geometrical Optics, Aberrations and Optical Design (Zhejiang University Press, 2014).

14. Z. Huang, J. Bai, T. Lu, and X. Hou, “Stray light analysis and suppression of panoramic annular lens,” Opt. Express 21(9), 10810–10820 (2013).

15. D. Scaramuzza, A. Martinelli, and R. Siegwart, “A flexible technique for accurate omnidirectional camera calibration and structure from motion,” in Fourth IEEE International Conference on Computer Vision Systems (ICVS’06) (IEEE, 2006), p. 45.

16. D. Scaramuzza, A. Martinelli, and R. Siegwart, “A toolbox for easily calibrating omnidirectional cameras,” in 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems (IEEE, 2006), pp. 5695–5701.

17. T. Brooks, B. Mildenhall, T. Xue, J. Chen, and J. T. Barron, “Unprocessing images for learned raw denoising,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019).

18. S. W. Zamir, A. Arora, S. Khan, M. Hayat, F. S. Khan, and M.-H. Yang, “Restormer: Efficient transformer for high-resolution image restoration,” arXiv:2111.09881 (2021).

19. C. Si, W. Yu, P. Zhou, Y. Zhou, X. Wang, and S. Yan, “Inception transformer,” arXiv:2205.12956 (2022).

20. N. Park and S. Kim, “How do vision transformers work?” arXiv:2202.06709 (2022).

21. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv:2010.11929 (2021).

22. L. Sun, C. Sakaridis, J. Liang, Q. Jiang, K. Yang, P. Sun, Y. Ye, K. Wang, and L. Van Gool, “Mefnet: Multi-scale event fusion network for motion deblurring,” arXiv:2112.00167 (2021).

23. L. Sun, J. Wang, K. Yang, K. Wu, X. Zhou, K. Wang, and J. Bai, “Aerial-pass: panoramic annular scene segmentation in drone videos,” in 2021 European Conference on Mobile Robots (ECMR) (IEEE, 2021), pp. 1–6.

24. E. Agustsson and R. Timofte, “Ntire 2017 challenge on single image super-resolution: Dataset and study,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2017), pp. 1122–1131.

25. I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” arXiv:1711.05101 (2017).

26. X. Tao, H. Gao, X. Shen, J. Wang, and J. Jia, “Scale-recurrent network for deep image deblurring,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018), pp. 8174–8182.

27. G. Jocher, A. Stoken, J. Borovec, et al., “ultralytics/yolov5: v3.1 - Bug Fixes and Performance Improvements,” (2020).

28. T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft coco: Common objects in context,” in European Conference on Computer Vision (Springer, 2014), pp. 740–755.

29. H. Shi, Y. Zhou, K. Yang, Y. Ye, X. Yin, Z. Yin, S. Meng, and K. Wang, “Panoflow: Learning optical flow for panoramic images,” arXiv:2202.13388 (2022).

Parameters	Specifications
Wavelength	486-656 nm
F-number	5.5
Effective focal length	-1.17 mm
FoV	360$^{\circ}\times$(25$^{\circ}$-100$^{\circ}$)
Nyquist frequency	125 lp/mm
$f$-$\theta$ distortion	4.5%
Relative illumination	$\geq 0.95$
$R_{efficient}$	93.7%
Total length	29.2 mm

Tolerance items	Value
Radius (fringe)	$\leq 3$
Thickness (mm)	$\pm 0.08$
Surface decenter (mm)	$\pm 0.03$
Element tilt ($^{\circ}$)	$\pm 0.1$
Element decenter (mm)	$\pm 0.05$
Surface irregularity (fringe)	$\leq 0.5$
Refractive index	$\pm 0.001$
Abbe number (%)	$\pm 1$

Architecture	PSF	PSF aggregating block	PSNR (dB) $\uparrow$	SSIM $\uparrow$
Baseline	$\times$	$\times$	27.12	0.867
Baseline	$\checkmark$	$\checkmark$	28.21	0.878
Baseline + STB	$\times$	$\times$	27.24	0.871
Baseline + STB	$\checkmark$	$\times$	28.46	0.879
Baseline + STB	$\times$	$\checkmark$	27.35	0.876
Baseline + STB	$\checkmark$	$\checkmark$	28.65	0.88

Method	PSNR (dB) $\uparrow$	SSIM $\uparrow$
Restormer [18]	27.12	0.867
SRN [26]	27.65	0.874
PAL-Restormer (Ours)	28.65	0.88
