## Abstract

Recovering the real light field, including the light field intensity distributions and the continuous volumetric data in the object space, is an attractive and important topic with the development of light-field imaging. In this paper, a blind light field reconstruction method is proposed to recover the intensity distributions and continuous volumetric data without the assistance of prior geometric information. Based on an analysis of the image formation process, the light field reconstruction problem is approximated as a summation of localized reconstructions. Blind volumetric information derivation is then proposed based on backward image formation modeling to exploit the correspondence among the deconvolved results. Finally, the light field is blindly reconstructed via the proposed inverse image formation approximation and wave propagation. We demonstrate that the method can blindly recover the light field intensity together with continuous volumetric data. It can be further extended to other light field imaging systems whenever the backward image formation model can be derived.

© 2018 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

Recovering the real light field in the object space is an attractive and important topic with the development of light-field imaging. According to wave-optics models, the light field in the object space consists of the intensity distributions and continuous volumetric data. Reconstructing them using only the spatial and angular information recorded on the sensor is challenging, since the volumetric information is lost during acquisition and the spatial resolution of the acquired data is limited.

The existing light field recovery works mainly use data acquired by plenoptic cameras, since these cameras record the direction of light rays in a single shot [1–3]. They reconstruct the light field by computationally synthesizing 3D focal stacks across the scene based on ray-optics [3,4]. However, the volumetric information they provide is only a relative depth among the virtual focal planes, and the compromise between lateral and angular resolution causes resolution loss during reconstruction. Zhang *et al.* reconstructed a 3D object by moving a plenoptic camera around the object and updating the structure-from-motion method [5]. Although a point cloud can be reconstructed, multiple light-field images need to be captured and registered, which is only applicable to static objects. S. Shroff *et al.* proposed wave-optics models to reconstruct the light field through point-spread-function (PSF) deconvolution [6–8]. This mitigates the resolution loss of the reconstructed light field, but prior information, such as the distance of each object, or the distance of each object point in the extreme case, is needed. We previously proposed a light field reconstruction model to tackle scenarios in which imaging noise exists in the acquired data [9]; however, the exact distance of the object plane is still needed for reconstruction. C. Guo *et al.* extended the work to the microscopic scale and reconstructed 3D volumetric information, but the geometric information of the scene is still required [10]. M. Broxton *et al.* used the Richardson-Lucy algorithm to recover the 3D scene [11]. However, their work cannot obtain the exact object distance, which makes it difficult to clarify which object at which depth generates the reconstructed intensity; nor can they reconstruct the light field for a specific object.

In this paper, therefore, a blind light field reconstruction method is proposed to recover the intensity distributions and continuous volumetric data without the assistance of prior geometric information. Plenoptic camera 2.0 [12–14], which inserts a microlens array behind the image plane of the main lens for an improved spatial resolution of the acquired light field, is exploited to benefit from its distinct image response. By analyzing the image responses among the neighboring microlenses, we propose to approximate the light field reconstruction problem as a summation of localized reconstructions. Under this approximation, blind light field reconstruction reduces to blindly deriving the distance correspondence from the reconstructions generated by the microlens images. Blind volumetric information derivation is proposed based on backward image formation modeling to exploit the correspondence among the deconvolved results. Finally, the light field is blindly reconstructed via the proposed inverse image formation approximation and wave propagation. We demonstrate that the method can blindly recover the light field intensity together with continuous volumetric data, and that it can be further extended to other light field imaging systems whenever the backward image formation model can be derived.

The paper is organized as follows. The proposed light field reconstruction approximation is described in detail in Section 2. Section 3 describes the proposed blind light field reconstruction method. Section 4 provides experimental results to demonstrate the effectiveness of the proposed method. Section 5 concludes the paper.

## 2. Light field reconstruction approximation

#### 2.1 Image formation of plenoptic camera 2.0 and light field reconstruction modeling

The optical configuration of plenoptic camera 2.0 is shown in Fig. 1.

As shown in the figure, a microlens array is inserted between the image plane of the main lens and the imaging sensor. Rays coming from an object on the focal plane (the green rays) pass through the main lens and focus on the image plane. Then, treating the light field on the image plane as a new object, the microlens array reimages it onto the sensor. Thus, by dividing the relay imaging system into several sub-imaging-systems, our previous work modeled the image formation process of plenoptic camera 2.0 for a point light source placed at (*x*_{0}, *y*_{0}) at depth *d*_{1} by wave-optics as [9,15]:

$$\begin{aligned} h(x,y,{x}_{0},{y}_{0})=&\frac{1}{{(j\lambda )}^{3}{d}_{1}({d}_{2}+{d}_{3})l}\iint \iint \exp \left\{\frac{jk}{2{d}_{1}}\left[{({x}_{main}-{x}_{0})}^{2}+{({y}_{main}-{y}_{0})}^{2}\right]\right\}{t}_{main}({x}_{main},{y}_{main})\\ &\times \exp \left\{\frac{jk}{2({d}_{2}+{d}_{3})}\left[{({x}_{m}-{x}_{main})}^{2}+{({y}_{m}-{y}_{main})}^{2}\right]\right\}{t}_{micro}({x}_{m},{y}_{m})\\ &\times \exp \left\{\frac{jk}{2l}\left[{(x-{x}_{m})}^{2}+{(y-{y}_{m})}^{2}\right]\right\}d{x}_{main}\,d{y}_{main}\,d{x}_{m}\,d{y}_{m} \end{aligned}$$

where $({x}_{0},{y}_{0})$, $({x}_{main},{y}_{main})$, $({x}_{1},{y}_{1})$, $({x}_{m},{y}_{m})$ and $(x,y)$ are the coordinates of a point on the object plane, the main lens plane, the image plane of the main lens, the microlens array plane and the sensor plane, respectively; ${d}_{1}$ and ${d}_{2}$ are the distance between the object and the main lens and that between the main lens and the image plane, respectively; ${d}_{3}$ and $l$ are the distance between the image plane and the microlens array and that between the microlens array and the sensor plane, respectively; $\lambda$ is the wavelength of the light; $k$ is the wave number equaling $2\pi /\lambda$; ${t}_{main}({x}_{main},{y}_{main})$ and ${t}_{micro}({x}_{m},{y}_{m})$ are the phase correction factors of the main lens and of a single microlens, respectively.

${t}_{main}({x}_{main},{y}_{main})$ represents the optical characteristic of the main lens, which is given by [15]:

$${t}_{main}({x}_{main},{y}_{main})={P}_{1}({x}_{main},{y}_{main})\exp \left[-\frac{jk}{2{f}_{1}}\left({x}_{main}^{2}+{y}_{main}^{2}\right)\right]$$

where ${P}_{1}({x}_{main},{y}_{main})$ is the pupil function of the main lens, and ${f}_{1}$ is the focal length of the main lens. ${t}_{micro}({x}_{m},{y}_{m})$ represents the optical characteristic of a single microlens, which is given by [15]:

$${t}_{micro}({x}_{m},{y}_{m})={P}_{2}({x}_{m},{y}_{m})\exp \left[-\frac{jk}{2{f}_{2}}\left({x}_{m}^{2}+{y}_{m}^{2}\right)\right]$$

where ${P}_{2}({x}_{m},{y}_{m})$ is the pupil function of the microlens, and ${f}_{2}$ is the focal length of a single microlens.

*h*(*x*, *y*, *x*_{0}, *y*_{0}) describes the imaging response of a point light source, called the PSF of the imaging system. Treating a real imaging target as a set of point light sources, a pixel on the sensor actually records the summation of the imaging responses from all the light sources. Since only the intensity value is recorded, the intensity of a pixel on the sensor, *I* (*x*, *y*), can be formulated as:

$$I(x,y)=\sum_{({x}_{0},{y}_{0})}{\left|h(x,y,{x}_{0},{y}_{0})\right|}^{2}I({x}_{0},{y}_{0})$$

where $I({x}_{0},{y}_{0})$ is the intensity of point $({x}_{0},{y}_{0})$. Thus, a linear forward imaging model can be established [4–7,16,17] as:

$${I}_{s}^{{d}_{1n}}={H}_{{d}_{1n}}{I}_{0{d}_{1n}}$$

where ${I}_{0{d}_{1n}}$, a $\left({P}_{0}\times {Q}_{0},1\right)$ vector, represents an object consisting of ${P}_{0}\times {Q}_{0}$ point light sources on the object plane ${d}_{1}={d}_{1n}$ away from the main lens. Each entry in ${I}_{0{d}_{1n}}$ represents the intensity of a point light source. Similarly, ${I}_{s}^{{d}_{1n}}$ is a $\left({P}_{s}\times {Q}_{s},1\right)$ vector corresponding to the ${P}_{s}\times {Q}_{s}$ sensor data generated by the point sources at depth ${d}_{1n}$. ${H}_{{d}_{1n}}$ is a $\left({P}_{s}\times {Q}_{s},{P}_{0}\times {Q}_{0}\right)$ system transmission matrix for object distance ${d}_{1n}$. Column $i$ in ${H}_{{d}_{1n}}$ describes how the light rays coming from the $i$^{th} object point contribute to all the pixels on the sensor, while row $j$ in ${H}_{{d}_{1n}}$ describes how the object space points contribute to the $j$^{th} pixel on the sensor.
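As a concrete illustration of this linear model, the following sketch assembles $H$ column by column from per-point sensor responses and applies the forward model. The Gaussian spot used here is a toy numerical stand-in for the paper's wave-optics PSF $h$, and all dimensions are illustrative:

```python
import numpy as np

def toy_psf(px, py, P_s=8, Q_s=8, sigma=1.0):
    """Stand-in intensity PSF of the object point (px, py): a Gaussian
    spot on a P_s x Q_s sensor (the paper would use |h|^2 instead)."""
    ys, xs = np.mgrid[0:P_s, 0:Q_s]
    psf = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * sigma ** 2))
    return psf.ravel()  # one column of H: contribution to every sensor pixel

def build_H(points, P_s=8, Q_s=8):
    """Column i of H is the sensor response of the i-th object point."""
    return np.stack([toy_psf(px, py, P_s, Q_s) for px, py in points], axis=1)

points = [(2, 2), (5, 5)]          # two object points on one depth plane
H = build_H(points)                 # shape (P_s*Q_s, number of points)
I0 = np.array([1.0, 0.5])           # intensities of the point sources
Is = H @ I0                         # linear forward model: I_s = H * I_0
```

Each row of `H` then tells how all object points contribute to one sensor pixel, mirroring the column/row interpretation above.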

So, to recover the light field intensity in the object space, ${I}_{0{d}_{1n}}$ in Eq. (5), the inverse problem of Eq. (5) can be formulated with Tikhonov regularization [18,19], considering the existence of imaging noise and the possibility of a singular ${H}_{{d}_{1n}}$ [9]. It is given by:

$${I}_{0{d}_{1n}}={\left({H}_{{d}_{1n}}^{T}{H}_{{d}_{1n}}+\tau {\Gamma}^{T}\Gamma \right)}^{-1}{H}_{{d}_{1n}}^{T}{I}_{s}^{{d}_{1n}}$$

where $\tau$ is the regularization weight and $\Gamma$ is the smoothness regularization operator.

If ${d}_{1n}$ is known, which corresponds to the prior geometric information of the imaging target being known, ${I}_{0{d}_{1n}}$ can be recovered with high accuracy via the derivation of ${H}_{{d}_{1n}}$. However, such prior information is generally unknown, which makes blind light field reconstruction challenging.
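A minimal numerical sketch of the Tikhonov inversion above, assuming the identity matrix as the regularization operator $\Gamma$ (plain ridge regularization) and a toy transmission matrix in place of the wave-optics ${H}_{{d}_{1n}}$:

```python
import numpy as np

def tikhonov_reconstruct(H, Is, tau=1e-2):
    """Tikhonov-regularized inverse of I_s = H @ I_0.
    Gamma is the identity here; the paper's smoothness operator
    would replace it."""
    Gamma = np.eye(H.shape[1])
    A = H.T @ H + tau * Gamma.T @ Gamma
    return np.linalg.solve(A, H.T @ Is)

# toy system: with H known at the correct depth, the intensities come back
rng = np.random.default_rng(0)
H = rng.random((64, 9))                          # stand-in for H_{d1n}
I0 = rng.random(9)                               # true point-source intensities
Is = H @ I0 + 1e-4 * rng.standard_normal(64)     # sensor data with mild noise
I0_hat = tikhonov_reconstruct(H, Is, tau=1e-6)
print(np.max(np.abs(I0_hat - I0)))               # small residual
```

The point of the section follows directly from this sketch: the recovery works only if the `H` passed in corresponds to the (unknown) true depth.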

#### 2.2 The proposed light field reconstruction approximation

Considering the information that can be exploited is limited to the sensor data and the optical configuration of the imaging system, we propose to approximate the light field reconstruction problem by exploiting the optical structure of plenoptic camera 2.0. Referring to the system structure shown in Fig. 1, the image on the sensor can also be treated as the summation of the imaging responses from all the microlenses. So, the image formation process of plenoptic camera 2.0 with *M* × *N* microlenses can be reformulated by [14]:

$${I}_{s}^{{d}_{1n}}=\sum_{\stackrel{\rightharpoonup}{m}}{I}_{s}^{\stackrel{\rightharpoonup}{m},{d}_{1n}}=\sum_{\stackrel{\rightharpoonup}{m}}{H}_{{d}_{1n}}^{\stackrel{\rightharpoonup}{m}}{I}_{0{d}_{1n}}$$

where ${I}_{s}^{\stackrel{\rightharpoonup}{m},{d}_{1n}}$ is the imaging response of the $({m}_{x},{m}_{y})$^{th} microlens, $\stackrel{\rightharpoonup}{m}=({m}_{x},{m}_{y})$; and ${H}_{{d}_{1n}}^{\stackrel{\rightharpoonup}{m}}$ is the $({m}_{x},{m}_{y})$^{th} microlens's system transmission matrix at object distance ${d}_{1n}$. Theoretically, the dimension of ${I}_{s}^{\stackrel{\rightharpoonup}{m},{d}_{1n}}$ is the same as that of ${I}_{s}^{{d}_{1n}}$, and the dimension of ${H}_{{d}_{1n}}^{\stackrel{\rightharpoonup}{m}}$ is the same as that of ${H}_{{d}_{1n}}$. However, saving $M\times N$ transmission matrices is expensive in storage, and retrieving ${I}_{s}^{\stackrel{\rightharpoonup}{m},{d}_{1n}}$ is impractical for a real image.

To simplify this, the image formation process is further analyzed using ray-optics to discover the ray contributions on the sensor. For a point light source at (*x*_{0}, *y*_{0}) in the object space, as shown in Fig. 1, the rays coming from it pass through the main lens and converge at (*x*_{1}, *y*_{1}). (*x*_{1}, *y*_{1}) and (*x*_{0}, *y*_{0}) satisfy:

$${x}_{1}=-\frac{{d}_{2}}{{d}_{1}}{x}_{0},\qquad {y}_{1}=-\frac{{d}_{2}}{{d}_{1}}{y}_{0}$$

where ${d}_{1}$ and ${d}_{2}$ satisfy the Gaussian equation:

$$\frac{1}{{d}_{1}}+\frac{1}{{d}_{2}}=\frac{1}{{f}_{1}}$$
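As a quick numerical check with the system parameters used later in Section 4 ($f_1$ = 40mm, the focal plane at $d_1$ = 65mm, $L$ = 122.49mm), the Gaussian equation gives $d_2$ = 104mm, so $L > d_2$ and the rays indeed converge before the microlens array:

```python
f1 = 40.0      # focal length of the main lens (mm), from Section 4
d1 = 65.0      # object distance (mm), the system focal plane
L = 122.49     # main lens plane to microlens array plane distance (mm)

d2 = 1.0 / (1.0 / f1 - 1.0 / d1)   # Gaussian equation: 1/d1 + 1/d2 = 1/f1
magnification = -d2 / d1           # y1 = -(d2/d1) * y0
print(d2, magnification, L > d2)
```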

Then, treating the point at $({x}_{1},{y}_{1})$ as a new object, the microlenses within the imaging range reimage it. The rays passing through the edge of the main lens, shown as *Ray*1 and *Ray*2 in Fig. 1, determine the imaging range on the microlens array. Using the coordinates in the vertical direction as an instance, the microlens coordinates within the imaging range can be derived as follows. For *Ray*1, the vertical coordinate of its intersection with the microlens array is ${y}_{m1}$. It is given by:

$${y}_{m1}=\frac{L}{{d}_{2}}{y}_{1}-\frac{R(L-{d}_{2})}{{d}_{2}}$$

where $R$ is the radius of the main lens, and $L$ is the distance between the main lens plane and the microlens array plane. Similarly, the vertical coordinate of the intersection of *Ray*2 in Fig. 1 with the microlens array is ${y}_{m2}$. It equals:

$${y}_{m2}=\frac{L}{{d}_{2}}{y}_{1}+\frac{R(L-{d}_{2})}{{d}_{2}}$$

When $L$ is bigger than ${d}_{2}$, which corresponds to imaging objects whose rays converge before the microlens array, the focused image ${y}_{1}$ is between the main lens and the microlens array, and ${y}_{m1}$ is vertically below ${y}_{m2}$. The vertical coordinate of a microlens, ${m}_{y}\in \left[1,N\right]$, in the imaging range satisfies:

$${y}_{m1}\le (2{m}_{y}-N-1)r\le {y}_{m2}$$

where $r$ is the radius of a microlens. Based on the above equations, the microlens ${m}_{y}$ within the imaging range satisfies:

$$\frac{{y}_{m1}}{2r}+\frac{N+1}{2}\le {m}_{y}\le \frac{{y}_{m2}}{2r}+\frac{N+1}{2}$$

Substituting Eqs. (9) and (10) into it, we have:

$$\frac{N+1}{2}-\frac{1}{2r}\left[\frac{L}{{d}_{1}}{y}_{0}+R\left(\frac{L({d}_{1}-{f}_{1})}{{d}_{1}{f}_{1}}-1\right)\right]\le {m}_{y}\le \frac{N+1}{2}-\frac{1}{2r}\left[\frac{L}{{d}_{1}}{y}_{0}-R\left(\frac{L({d}_{1}-{f}_{1})}{{d}_{1}{f}_{1}}-1\right)\right]$$

When $L$ is smaller than ${d}_{2}$, which corresponds to imaging objects whose rays converge behind the microlens array, the focused image ${y}_{1}$ is behind the microlens array and ${y}_{m1}$ is vertically above ${y}_{m2}$. Similarly, ${m}_{y}$ within the imaging range satisfies:

$$\frac{N+1}{2}-\frac{1}{2r}\left[\frac{L}{{d}_{1}}{y}_{0}-R\left(\frac{L({d}_{1}-{f}_{1})}{{d}_{1}{f}_{1}}-1\right)\right]\le {m}_{y}\le \frac{N+1}{2}-\frac{1}{2r}\left[\frac{L}{{d}_{1}}{y}_{0}+R\left(\frac{L({d}_{1}-{f}_{1})}{{d}_{1}{f}_{1}}-1\right)\right]$$

The above derivation can be performed equally for the *x* dimension. Combining Eqs. (15) and (16), it is found that for a specific object point $({x}_{0},{y}_{0})$, no matter whether it is focused or defocused, only some microlenses $({m}_{x},{m}_{y})$, together with the pixels under them, record its information.
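The edge-ray bound above can be sketched numerically. This is a simplified illustration, assuming the edge-ray intersection expressions for ${y}_{m1}$ and ${y}_{m2}$, a microlens array centred on the optical axis, and the system parameters of Section 4:

```python
def imaging_range(y0, d1, f1=40.0, L=122.49, R=4.0, r=0.16, N=3):
    """Which microlens rows (1..N) can see the object point y0.
    Pure ray-geometry sketch; all lengths in mm."""
    d2 = 1.0 / (1.0 / f1 - 1.0 / d1)        # Gaussian equation of the main lens
    y1 = -(d2 / d1) * y0                     # intermediate image height
    ym1 = (L / d2) * y1 - R * (L - d2) / d2  # edge ray through +R
    ym2 = (L / d2) * y1 + R * (L - d2) / d2  # edge ray through -R
    lo, hi = min(ym1, ym2), max(ym1, ym2)
    centre = lambda my: (2 * my - N - 1) * r  # centre of the my-th microlens
    return [my for my in range(1, N + 1) if lo <= centre(my) <= hi]

print(imaging_range(0.0, 65.0))   # rows of the 3x3 array that see an on-axis point
```

With the Section 4 parameters, an on-axis point is seen by all three microlens rows, which is consistent with the same object appearing under several microlenses in the experiments.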

Further analyzing the imaging response of $({x}_{0},{y}_{0})$, i.e. the pixels it generates on the sensor, the converge point $({x}_{2},{y}_{2})$ generated behind microlens $({m}_{x},{m}_{y})$ is given by:

$${y}_{2}=\left({m}_{y}-\frac{N+1}{2}\right)D-\frac{{d}_{4}}{L-{d}_{2}}\left[{y}_{1}-\left({m}_{y}-\frac{N+1}{2}\right)D\right]$$

where $D$ is the pitch of a microlens, equaling $2r$, and ${d}_{4}$ and $\left(L-{d}_{2}\right)$ satisfy the Gaussian equation:

$$\frac{1}{L-{d}_{2}}+\frac{1}{{d}_{4}}=\frac{1}{{f}_{2}}$$

After the rays converge at $({x}_{2},{y}_{2})$, if the object point is on the focal plane of the main lens, like the green rays shown in Fig. 1, the imaging result behind microlens $({m}_{x},{m}_{y})$ will be a pixel $(x,y)$. If the object point is not on the focal plane, like the red rays shown in Fig. 1, the rays will propagate from $({x}_{2},{y}_{2})$ to the sensor, which results in a bright disk area on the sensor. As the image formation properties are similar between the center of the disk and the points around it, we use the center point $(x,y)$ of the disk in the following derivations because of its simplicity in mathematical expression. The center of the disk is:

$$y=\left({m}_{y}-\frac{N+1}{2}\right)D+\frac{l}{{d}_{4}}\left[{y}_{2}-\left({m}_{y}-\frac{N+1}{2}\right)D\right]$$

This shows that for a specific pixel $(x,y)$ on the sensor, only a small group of object points $({x}_{0},{y}_{0})$ and some microlenses $({m}_{x},{m}_{y})$ contribute to it.

Thus, combining these two observations with the design constraints described in [3], namely that the image-side f-number must match the microlens f-number to prevent microlens image overlap and to maximize the illuminated area behind each microlens, we propose to approximate the image formation process of plenoptic camera 2.0 with *M* × *N* microlenses by the summation of the localized responses of the microlenses as:

$${I}_{s}^{{d}_{1n}}\approx \sum_{\stackrel{\rightharpoonup}{m}}{I}_{s}^{\stackrel{\rightharpoonup}{m},{d}_{1n}}=\sum_{\stackrel{\rightharpoonup}{m}}{H}_{{d}_{1n}}^{\stackrel{\rightharpoonup}{m}}{I}_{0{d}_{1n}}^{\stackrel{\rightharpoonup}{m}}$$

where ${I}_{s}^{\stackrel{\rightharpoonup}{m},{d}_{1n}}$ is the image under the $\stackrel{\rightharpoonup}{m}$^{th} microlens, which is spatially cropped from ${I}_{s}^{{d}_{1n}}$; ${I}_{0{d}_{1n}}^{\stackrel{\rightharpoonup}{m}}$ represents the point light sources that contribute to the $\stackrel{\rightharpoonup}{m}$^{th} microlens; and ${H}_{{d}_{1n}}^{\stackrel{\rightharpoonup}{m}}$ is the $\stackrel{\rightharpoonup}{m}$^{th} microlens's system transmission matrix, approximated by keeping the rows in ${H}_{{d}_{1n}}$ that correspond to the pixels in ${I}_{s}^{\stackrel{\rightharpoonup}{m},{d}_{1n}}$ unchanged while setting the other rows to zero, based on the observation above that only a small group of object points $({x}_{0},{y}_{0})$ and some microlenses $({m}_{x},{m}_{y})$ contribute to specific pixels on the sensor. Considering that ${I}_{0{d}_{1n}}$ consists of point light sources from multiple objects at depth ${d}_{1n}$, and that a group of object points only contributes to a specific number of pixels on the sensor, we propose to further simplify Eq. (21) as:

$${I}_{s}^{{d}_{1n}}\approx \sum_{\stackrel{\rightharpoonup}{m}}\sum_{k=1}^{{O}_{n}}{H}_{{d}_{1n}}^{\stackrel{\rightharpoonup}{m}}{I}_{0{d}_{1n},{\Omega}_{k}}^{\stackrel{\rightharpoonup}{m}}=\sum_{\stackrel{\rightharpoonup}{m}}\sum_{k=1}^{{O}_{n}}{I}_{s,{\Omega}_{k}}^{\stackrel{\rightharpoonup}{m},{d}_{1n}}$$

where ${\Omega}_{k}$ is the set of point light sources belonging to the $k$^{th} object at depth ${d}_{1n}$; ${O}_{n}$ is the total number of objects at ${d}_{1n}$; ${I}_{0{d}_{1n},{\Omega}_{k}}^{\stackrel{\rightharpoonup}{m}}$ is the intensity of the point sources in ${\Omega}_{k}$ that contribute to the $\stackrel{\rightharpoonup}{m}$^{th} microlens; and ${I}_{s,{\Omega}_{k}}^{\stackrel{\rightharpoonup}{m},{d}_{1n}}$ is the image corresponding to ${\Omega}_{k}$ under microlens $\stackrel{\rightharpoonup}{m}$.

Since exchanging the order of summation will not affect the result in Eq. (22), recovering the light field intensity in the object space, ${I}_{0{d}_{1n}}$ in Eq. (7), is finally proposed to be approximated by:

$${I}_{0{d}_{1n}}\approx \sum_{k=1}^{{O}_{n}}\sum_{\stackrel{\rightharpoonup}{m}}{\left({\left({H}_{{d}_{1n}}^{\stackrel{\rightharpoonup}{m}}\right)}^{T}{H}_{{d}_{1n}}^{\stackrel{\rightharpoonup}{m}}+\tau {\Gamma}^{T}\Gamma \right)}^{-1}{\left({H}_{{d}_{1n}}^{\stackrel{\rightharpoonup}{m}}\right)}^{T}{I}_{s,{\Omega}_{k}}^{\stackrel{\rightharpoonup}{m},{d}_{1n}}$$

## 3. Blind light field reconstruction

#### 3.1 Backward image formation modeling and blind volumetric information derivation

To blindly derive ${d}_{1n}$ for a correct reconstruction, the backward image formation process is analyzed to estimate ${d}_{1n}$ from the spatial correspondence among the reconstructions generated at a series of depths using multiple microlens images.

Substituting Eq. (10) into Eq. (20), and generalizing $(x,y)$ as $\left({x}_{{d}_{1n}}^{{m}_{x}},{y}_{{d}_{1n}}^{{m}_{y}}\right)$, a pixel under microlens $\stackrel{\rightharpoonup}{m}=({m}_{x},{m}_{y})$ corresponding to a point light source $\left({x}_{0{d}_{1n},{\Omega}_{k}}^{{m}_{x}},{y}_{0{d}_{1n},{\Omega}_{k}}^{{m}_{y}}\right)$ in ${\Omega}_{k}$ whose intensity is an element of ${I}_{0{d}_{1n},{\Omega}_{k}}^{\stackrel{\rightharpoonup}{m}}$, we can express the relationship between $\left({x}_{{d}_{1n}}^{{m}_{x}},{y}_{{d}_{1n}}^{{m}_{y}}\right)$ and $\left({x}_{0{d}_{1n},{\Omega}_{k}}^{{m}_{x}},{y}_{0{d}_{1n},{\Omega}_{k}}^{{m}_{y}}\right)$ as a function of ${d}_{1n}$. Using the horizontal direction as an instance, we obtain the forward projection of the object point onto its pixel, Eq. (24), and the corresponding inverse projection of the pixel back into the object space at an assumed distance, Eq. (25).

Since ${d}_{1n}$ is unknown during recovery, a ${{d}^{\prime}}_{1n}$ different from ${d}_{1n}$ may be assigned to Eq. (25). Meanwhile, ${x}_{{d}_{1n}}^{{m}_{x}}$ in Eq. (24) is constant, being fixed by ${x}_{0{d}_{1n},{\Omega}_{k}}^{{m}_{x}}$ and ${d}_{1n}$. Thus, substituting Eq. (24) into Eq. (25) gives the spatial distance, Eq. (26), between the inverse projection at ${{d}^{\prime}}_{1n}$ and that at the real distance ${d}_{1n}$. As ${{d}^{\prime}}_{1n}$ moves toward or away from ${d}_{1n}$, the reconstructed ${x}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{m}_{x}}$ gradually moves closer to or farther from the correct position ${x}_{0{d}_{1n},{\Omega}_{k}}^{{m}_{x}}$. The distance between ${x}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{m}_{x}}$ and ${x}_{0{d}_{1n},{\Omega}_{k}}^{{m}_{x}}$ changes linearly, and it equals zero only when ${{d}^{\prime}}_{1n}$ equals ${d}_{1n}$.

If $\left({x}_{0{d}_{1n},{\Omega}_{k}}^{{m}_{x}},{y}_{0{d}_{1n},{\Omega}_{k}}^{{m}_{y}}\right)$ also contributes to the pixels under other microlenses, like the red point source in Fig. 1, it can be reconstructed from the pixels of different microlenses. The distance, Eq. (27), between ${x}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{m}_{x1}}$ and ${x}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{m}_{x2}}$, recovered from the pixels under microlenses ${m}_{x1}$ and ${m}_{x2}$, respectively, using a ${{d}^{\prime}}_{1n}$ different from ${d}_{1n}$, behaves in the same way. As ${{d}^{\prime}}_{1n}$ deviates from ${d}_{1n}$, the reconstructed object points ${x}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{m}_{x1}}$ and ${x}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{m}_{x2}}$ move spatially apart from each other, which is visually displayed as a ghosting effect in the reconstructed light field. The effect is eliminated only when ${{d}^{\prime}}_{1n}$ equals ${d}_{1n}$, i.e., when the reconstructed object points spatially coincide as the real object point. The derivation is clearly general to all the point light sources belonging to ${\Omega}_{k}$ at depth ${d}_{1n}$, and it is equally valid in the vertical direction. So, blindly deriving ${d}_{1n}$ can be solved by detecting whether the object points reconstructed from the pixels under different microlenses are spatially coincident.

Since $\left({x}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{m}_{x1}},{y}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{m}_{y1}}\right)$ and $\left({x}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{m}_{x2}},{y}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{m}_{y2}}\right)$ index the entries of the reconstructed pixel intensities in the images ${I}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{1}}$ and ${I}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{2}}$, we propose to detect whether $\left({x}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{m}_{x1}},{y}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{m}_{y1}}\right)$ is spatially coincident with $\left({x}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{m}_{x2}},{y}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{m}_{y2}}\right)$ by evaluating the similarity between the corresponding reconstructed intensity images. When ${{d}^{\prime}}_{1n}$ differs from ${d}_{1n}$, the intensity of the point located at $\left({x}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{m}_{x1}},{y}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{m}_{y1}}\right)$ in ${I}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{1}}$ differs from that of the collocated point in ${I}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{2}}$. So, the pixel-wise intensity difference between ${I}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{1}}$ and ${I}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{2}}$ decreases as ${{d}^{\prime}}_{1n}$ approaches ${d}_{1n}$, and it reaches the minimum when ${I}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{1}}$ is exactly the same as ${I}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{2}}$, i.e., when the two reconstructed points are spatially coincident. Generalizing the process to the horizontal and vertical directions, ${d}_{1n}$ can be blindly derived by:

$${d}_{1n}=\underset{{{d}^{\prime}}_{1n}}{\arg \min }\;Dis\left({I}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{1}},{I}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{2}}\right)$$

where $Dis(A,B)$ is a function evaluating the spatial similarity between two images $A$ and $B$. Several methods exist in the image processing area for evaluating the spatial similarity between two images; in this paper, we use the Euclidean distance between ${I}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{1}}$ and ${I}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{2}}$, considering its low complexity and sufficient accuracy in identification.

So, for the point sources in ${\Omega}_{k}$ at depth ${d}_{1n}$, we segment their image under microlens ${\stackrel{\rightharpoonup}{m}}_{1}$ as ${I}_{s,{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{1},{d}_{1n}}$ in Eq. (23), and use a series of ${H}_{{{d}^{\prime}}_{1n}}^{{\stackrel{\rightharpoonup}{m}}_{1}}$ at ${{d}^{\prime}}_{1n}$ to reconstruct a series of ${I}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{1}}$ by:

$${I}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{1}}={\left({\left({H}_{{{d}^{\prime}}_{1n}}^{{\stackrel{\rightharpoonup}{m}}_{1}}\right)}^{T}{H}_{{{d}^{\prime}}_{1n}}^{{\stackrel{\rightharpoonup}{m}}_{1}}+\tau {\Gamma}^{T}\Gamma \right)}^{-1}{\left({H}_{{{d}^{\prime}}_{1n}}^{{\stackrel{\rightharpoonup}{m}}_{1}}\right)}^{T}{I}_{s,{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{1},{d}_{1n}}$$

The same is done under microlens ${\stackrel{\rightharpoonup}{m}}_{2}$, and by evaluating Eq. (28) over the series, ${d}_{1n}$ is derived. Then, ${I}_{0{d}_{1n}}$ can be directly reconstructed by adding together the reconstructions of each object under each microlens at ${d}_{1n}$, ${I}_{0{d}_{1n},{\Omega}_{k}}^{\stackrel{\rightharpoonup}{m}}$, according to Eq. (23), since they have already been reconstructed while deriving ${d}_{1n}$. Although, theoretically, the images under all the microlenses are needed to complete the process for each object, it can be further simplified by using a limited number of images, based on the discussion above that the rays from a specific object point only contribute to a limited number of pixels on the sensor. In our implementation, we use the two most complete images of the object from two microlenses, which greatly reduces the computational complexity while preserving the reconstruction quality.
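The blind depth derivation can be sketched as a grid search. The example below uses a toy one-dimensional model in which the second microlens view shifts with depth; the shift model, matrix sizes, and true depth of 65mm are assumptions for illustration, not the paper's transmission matrices:

```python
import numpy as np

def reconstruct(Hm, Is, tau=1e-3):
    """Tikhonov reconstruction in the spirit of Eq. (29), Gamma = identity."""
    A = Hm.T @ Hm + tau * np.eye(Hm.shape[1])
    return np.linalg.solve(A, Hm.T @ Is)

def blind_depth(Is1, Is2, H1_of_depth, H2_of_depth, depths):
    """Grid search of Eq. (28): reconstruct the two segmented views at every
    candidate depth; keep the depth whose reconstructions are closest
    in Euclidean distance (the Dis(A, B) of the paper)."""
    dists = []
    for d in depths:
        I1 = reconstruct(H1_of_depth(d), Is1)
        I2 = reconstruct(H2_of_depth(d), Is2)
        dists.append(np.linalg.norm(I1 - I2))
    return depths[int(np.argmin(dists))]

n = 8
I0 = np.array([1.0, 0, 0, 2.0, 0, 0, 0, 0])       # toy 1-D object
s = lambda d: int(d) - 62                          # assumed disparity model
H1 = lambda d: np.eye(n)                           # view 1: depth-independent
H2 = lambda d: np.roll(np.eye(n), s(d), axis=0)    # view 2: shifts with depth
Is1 = H1(65) @ I0
Is2 = H2(65) @ I0                                  # sensor data at true depth 65
print(blind_depth(Is1, Is2, H1, H2, list(range(62, 72))))
```

Only at the true depth do the two back-projections coincide, so the Euclidean distance reaches its minimum there, mirroring the behaviour reported in Tables 1–3.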

_{n}#### 3.2 Light field repropagation

For the real scenario in which several imaging targets are located at different depths, i.e. different ${d}_{1n}$, the above processing can be applied iteratively to recover each ${I}_{0{d}_{1n}}$. Since ${I}_{0{d}_{1n}}$ only contains the light field intensity of the targets, i.e. the ${\Omega}_{k}$s, located at depth ${d}_{1n}$, light field repropagation is required to obtain the additional light field on a plane ${d}_{1m}$ that is generated by the light sources (imaging targets) at other depths. Using the same light propagation as exploited in deriving the imaging response of plenoptic camera 2.0 [14], the light field at point $({x}^{\prime},{y}^{\prime})$ on ${d}_{1m}$ that is generated by the light propagated from the light source $({x}_{0},{y}_{0})$ in ${\Omega}_{k}$ on ${d}_{1n}$ equals:

$$U({x}^{\prime},{y}^{\prime})=\frac{U({x}_{0},{y}_{0})}{j\lambda ({d}_{1n}-{d}_{1m})}\exp \left\{\frac{jk}{2({d}_{1n}-{d}_{1m})}\left[{({x}^{\prime}-{x}_{0})}^{2}+{({y}^{\prime}-{y}_{0})}^{2}\right]\right\}$$

where $U({x}_{0},{y}_{0})$ is the light field of $({x}_{0},{y}_{0})$ at ${d}_{1n}$, which can be defined by the type of light source. Finally, the light field intensity at ${d}_{1m}$ equals:

$${I}_{{d}_{1m}}({x}^{\prime},{y}^{\prime})={I}_{0{d}_{1m}}({x}^{\prime},{y}^{\prime})+\sum_{n\ne m}\sum_{k}\sum_{({x}_{0},{y}_{0})\in {\Omega}_{k}}{\left|U({x}^{\prime},{y}^{\prime})\right|}^{2}$$
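A minimal sketch of the repropagation kernel for a single point source, assuming an illustrative wavelength (the paper does not specify $\lambda$ numerically) and the Fresnel point-source response:

```python
import numpy as np

wavelength = 532e-6              # mm; illustrative green wavelength (assumption)
k = 2 * np.pi / wavelength       # wave number

def repropagate_point(U0, x0, y0, xp, yp, dz):
    """Fresnel point-source kernel: complex field at (xp, yp) on a plane
    dz away from the source (x0, y0) whose field is U0."""
    r2 = (xp - x0) ** 2 + (yp - y0) ** 2
    return U0 / (1j * wavelength * dz) * np.exp(1j * k / (2 * dz) * r2)

# field that a source at (0, 0) on d_1n = 65mm casts on the slice d_1m = 67mm
xp, yp = np.meshgrid(np.linspace(-0.5, 0.5, 65), np.linspace(-0.5, 0.5, 65))
U = repropagate_point(1.0, 0.0, 0.0, xp, yp, dz=67.0 - 65.0)
intensity = np.abs(U) ** 2       # contribution added to the slice at d_1m
```

Summing such intensity contributions over all sources at the other depths gives the extra terms on each light field slice, which is why every slice in Fig. 9 shows traces of all three objects.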

## 4. Experiments and results

The effectiveness of the proposed blind light field reconstruction method is demonstrated by testing on simulated sensor data. The plenoptic camera 2.0 system is simulated according to [14]; it consists of a main lens with *f*_{1} = 40mm and a 4mm radius, and a 3 × 3 microlens array with *f*_{2} = 4mm and a 160$\mu m$ radius for each microlens. The focal plane of the whole system is set at 65mm before the main lens. *L* and *l* equal 122.49mm and 5.104mm, respectively. Three objects, “P,” “S,” and “F,” are placed at *d*_{1}* _{n}* = 65mm, 67mm, and 69mm, respectively, as shown in Fig. 2(a), and the simulated sensor data is shown in Fig. 2(b).

To extract the imaging results for the same object under a microlens from the sensor data, ${I}_{s,{\Omega}_{k}}^{\stackrel{\rightharpoonup}{m},{d}_{1n}}$ in Eq. (23), several image segmentation methods, like graph-cut [20], can be exploited to distinguish the objects’ responses on the sensor. During the experiments, we use the connected components analysis in [21] to label the 8-connected components in the image and segment out each region as the image of an object. The segmented regions are outlined in red in Fig. 3. Using the segmented images of “P” as instances, the regions outlined in red are magnified on the right in Fig. 3. According to Eq. (23), the segmented images of “P” under microlenses (1,1), (1,2), and (2,1) can be denoted by ${I}_{s,{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{1},{d}_{1n}}$, ${I}_{s,{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{2},{d}_{1n}}$, and ${I}_{s,{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{3},{d}_{1n}}$, respectively.
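The 8-connected labelling step can be sketched without relying on a specific library implementation; the code below is a stand-in for the connected components analysis of [21], applied to a toy binary sensor image with two disjoint responses:

```python
import numpy as np
from collections import deque

def label_8connected(binary):
    """Label the 8-connected foreground components of a binary image,
    mirroring the segmentation of object responses on the sensor."""
    labels = np.zeros(binary.shape, dtype=int)
    current = 0
    H, W = binary.shape
    for i in range(H):
        for j in range(W):
            if binary[i, j] and labels[i, j] == 0:
                current += 1                     # start a new component
                queue = deque([(i, j)])
                labels[i, j] = current
                while queue:                     # flood-fill its 8-neighbours
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < H and 0 <= nx < W
                                    and binary[ny, nx] and labels[ny, nx] == 0):
                                labels[ny, nx] = current
                                queue.append((ny, nx))
    return labels, current

img = np.zeros((10, 10), dtype=bool)
img[1:3, 1:3] = True      # toy response of one object under a microlens
img[6:9, 5:8] = True      # toy response of another
labels, num = label_8connected(img)
print(num)
```

Each labelled region is then cropped out as one ${I}_{s,{\Omega}_{k}}^{\stackrel{\rightharpoonup}{m},{d}_{1n}}$ for the reconstruction.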

#### 4.1 Blind depth derivation verification

First, the correctness of Eq. (28), which derives the depth by evaluating the similarity between the reconstructed images, is verified by executing Eq. (29) for ${I}_{s,{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{1},{d}_{1n}}$, ${I}_{s,{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{2},{d}_{1n}}$, and ${I}_{s,{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{3},{d}_{1n}}$ at a series of ${{d}^{\prime}}_{1n}$ and comparing the results of Eq. (28) with the real depth information. ${{d}^{\prime}}_{1n}$ from 62mm to 71mm with a 1mm interval is used, and the reconstructed ${I}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{1}}$, ${I}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{2}}$, and ${I}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{3}}$ are shown in Figs. 4(b) to 4(k), respectively.

We use the Euclidean distance as the function $Dis(A,B)$ in Eq. (28) because of its simplicity and sufficient accuracy; a smaller value corresponds to a smaller distance and higher similarity. The results for each pair of reconstructed images at each ${{d}^{\prime}}_{1n}$ are listed in Table 1. It can be found that as ${{d}^{\prime}}_{1n}$ increases from 62mm to the real distance 65mm, the value of *Dis*(.) decreases, which corresponds to the reconstructed images spatially moving closer to each other. The effect is consistent with that shown in Fig. 4 and with the derivation in Eq. (27). Inversely, as ${{d}^{\prime}}_{1n}$ increases from the real distance 65mm to 71mm, *Dis*(.) increases, which corresponds to the reconstructed images spatially moving apart from each other. *Dis*(.) always reaches the minimum at 65mm, the real distance at which “P” is placed, for all the pairs. This indicates that, according to Eq. (28), *d*_{1}* _{n}* = 65mm is obtained no matter which pair of images is input.

Similar processes are performed on the images of “S” and “F.” Since any pair of images under two microlenses can derive the real distance, we only show the results of “S” and “F” using the two most complete images of each object from two microlenses. Reconstructing “S” uses the segmented images under microlenses (2,2) and (3,2) as ${I}_{s,{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{1},{d}_{1n}}$ and ${I}_{s,{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{2},{d}_{1n}}$, respectively. Reconstructing “F” uses the segmented images under microlenses (2,3) and (3,3) as ${I}_{s,{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{1},{d}_{1n}}$ and ${I}_{s,{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{2},{d}_{1n}}$, respectively. The reconstructed intensity images are shown in Figs. 5(b) to 5(k). To make the spatial disparity of the reconstructed results clear, we use red to represent ${I}_{s,{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{1},{d}_{1n}}$ and its reconstructed ${I}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{1}}$, and green to highlight ${I}_{s,{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{2},{d}_{1n}}$ and its reconstructed ${I}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{2}}$. The similarities measured by *Dis*(.) between ${I}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{1}}$ and ${I}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{2}}$ are listed in Table 2. From Fig. 5, it can be found that the spatial disparity between ${I}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{1}}$ and ${I}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{2}}$ reaches the minimum at 67mm, the real depth of “S,” for object “S,” and at 69mm, the real depth of “F,” for object “F.” Combining the disparity calculation results in Tables 1 and 2, it is demonstrated that the proposed blind volumetric information derivation method is effective and accurate.

#### 4.2 Blind depth derivation verification for noisy imaging results

To further verify that the proposed depth derivation method, i.e. Eq. (28), also works on noisy imaging results, Gaussian noise is added to the imaging result in Fig. 2(b). The noisy imaging result, shown in Fig. 6, whose peak signal-to-noise ratio (PSNR) is only 25dB, presents strong noise distortion relative to the noise-free result in Fig. 2(b).

As in the implementation for the noise-free sensor data, we use the segmented images under microlenses (1,1) and (1,2) for object “P,” those under microlenses (2,2) and (3,2) for object “S,” and those under microlenses (2,3) and (3,3) for object “F.” Treating them as the image responses ${I}_{s,{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{1},{d}_{1n}}$ and ${I}_{s,{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{2},{d}_{1n}}$, the reconstructed ${I}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{1}}$ and ${I}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{\text{2}}}$ as ${{d}^{\prime}}_{1n}$ varies from 62mm to 71mm with a 1mm interval are shown in Figs. 7(b) to 7(k), respectively.

It can be found that the spatial disparities between the reconstructed object points are similar to the noise-free case. ${I}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{1}}$ is spatially coincident with ${I}_{0{{d}^{\prime}}_{1n},{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{\text{2}}}$ at 65mm, the real depth of “P,” for object “P.” Also, the spatial disparity reaches the minimum at 67mm and 69mm, the real depths of “S” and “F,” for objects “S” and “F,” respectively. Still using the Euclidean distance as the function $Dis(A,B)$ in Eq. (28), the spatial similarities measured by *Dis*(.) are listed in Table 3. Although the strong noise decreases the differences in spatial similarity, the reconstruction model proposed in Eq. (29) weakens the noise influence through its smoothness regularization. Thus, we can still obtain the correct object distances from Table 3, which demonstrates the robustness of the proposed method.

#### 4.3 Light field reconstruction results

Using the distances blindly derived above, the reconstructed discrete intensity information at the specific distances and the recovered volumetric information are shown in Fig. 8.

Comparing Fig. 8 with the original object information in Fig. 2(a), the reconstructed volumetric information in Fig. 8 embodies both the depth and the actual size of the objects, i.e., real-space information. Further applying the light field repropagation in Eq. (30), the light field intensity at depths 65mm, 67mm, and 69mm is generated using Eq. (31) and shown in Fig. 9. As shown in the figure, the light field propagates in all directions, so on each light field slice we can observe intensity contributions generated by the light sources of objects “P,” “S,” and “F.” This effect is consistent with the theoretical understanding of the light field.
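Eqs. (30) and (31) are not reproduced here; as a stand-in illustration of the repropagation step, a standard angular-spectrum propagator for a sampled complex field is sketched below. Its normalization and sampling conventions may differ from the paper's formulation:

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, dx, dz):
    """Propagate a complex scalar field by distance dz with the
    angular-spectrum method: FFT to spatial frequencies, apply the
    free-space transfer function exp(i*kz*dz), and inverse FFT."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=dx)
    fy = np.fft.fftfreq(ny, d=dx)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2.0 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    H = np.exp(1j * kz * dz) * (arg > 0)  # evanescent components suppressed
    return np.fft.ifft2(np.fft.fft2(field) * H)
```

Because the transfer function is unimodular for propagating components, the total power of the field is conserved as it spreads to all directions, matching the behavior observed across the slices of Fig. 9.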

#### 4.4 Light field reconstruction for a bigger object with more microlenses

To further verify the generality of the proposed method, reconstruction results are provided for a much bigger imaging target captured by a plenoptic camera 2.0 with a 7 × 7 microlens array.

The system parameters are consistent with those in the above experiments. The imaging target “A,” shown in Fig. 10(a), is placed at 66mm. Its physical size is much larger than that of “P,” “S,” or “F” used before, so it cannot be fully imaged by a single microlens. Thus, as shown in the simulated sensor data in Fig. 10(b), the image response under each microlens covers only a part of “A.”

Since the image responses under different microlenses correspond to different parts of the object, as shown in Fig. 10(b), we use three pairs of image responses to recover the light field of the whole object. As shown in Fig. 11, the first pair takes the image responses under microlenses (2, 2) and (2, 3) as ${I}_{s,{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{1},{d}_{1n}}$ and ${I}_{s,{\Omega}_{k}}^{{\stackrel{\rightharpoonup}{m}}_{2},{d}_{1n}}$, respectively; the second pair takes those under microlenses (5, 2) and (5, 3); and the third pair takes those under microlenses (4, 4) and (4, 5).

Using the distance derived by Eq. (28), the information reconstructed from the first, second, and third pairs of image responses is shown in Figs. 12(a)-12(c), respectively. Since all the derived object distances are 66mm, the recovered light field intensity at 66mm is generated by Eq. (23), i.e., by adding the three reconstructed light fields together. The recovered light field intensity, shown in Fig. 12(d), presents the information of object “A.” Compared with the original imaging target in Fig. 10(a), the recovered “A” has exactly the same physical size. The completeness of the recovered “A” can be further improved by reconstructing the image responses under more microlenses. This demonstrates that the proposed approach also works for a plenoptic camera 2.0 with more microlenses and bigger imaging targets.
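Because each pair of image responses recovers only part of “A” at the common derived depth, the combination step in the spirit of Eq. (23) amounts to summing the localized reconstructions on a shared grid. A minimal, illustrative sketch (not the paper's implementation; it assumes the partial reconstructions are already registered to the same coordinate frame):

```python
import numpy as np

def combine_localized(reconstructions):
    """Sum localized reconstructions at a common depth into one
    light-field intensity, as in the Eq. (23)-style summation."""
    total = np.zeros_like(reconstructions[0])
    for r in reconstructions:
        total += r  # each partial result contributes its region of the object
    return total
```

With disjoint supports, as for the three parts of “A,” the summation simply tiles the partial reconstructions into the full object.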

## 5. Conclusion

In this paper, we proposed a blind light field reconstruction method based on inverse image formation approximation and blind volumetric information derivation. The inverse image formation is approximated as a summation of localized reconstructions based on image formation analysis. Blind volumetric information derivation is proposed based on backward image formation modeling to exploit the correspondence among the deconvolved results. The light field is then blindly reconstructed via the proposed inverse image formation approximation and wave propagation. Experimental results demonstrated the correctness and effectiveness of the proposed method in blindly recovering light field intensity with continuous volumetric data. Since changes in the internal parameters do not affect the mathematical formalism of the derivations and the image formation analysis provided in the paper, the proposed method generalizes to different optical parameters of the plenoptic camera 2.0.

To further optimize the proposed algorithm, we are investigating more automatic segmentation methods to extract the imaging results even when depth-dependent imaging distortion exists. Recovering real objects under heterogeneous optical configurations is also being modeled.

## Funding

National Natural Science Foundation of China (NSFC) (61771275); Shenzhen Project, China (JCYJ20170817162658573).

## References and links

**1. **E. H. Adelson and J. Y. A. Wang, “Single lens stereo with a plenoptic camera,” IEEE Trans. Pattern Anal. Mach. Intell. **14**(2), 99–106 (1992). [CrossRef]

**2. **R. Ng, M. Levoy, M. Bredif, G. Duval, M. Horowitz, and P. Hanrahan, “Light field photography with a hand-held plenoptic camera,” Technical Report, Stanford University (2005).

**3. **R. Ng, “Digital light field photography,” Ph.D. thesis, Stanford University (2006).

**4. **N. Antipa, S. Necula, R. Ng, and L. Waller, “Single-shot diffuser-encoded light field imaging,” in *2016 IEEE International Conference on Computational Photography* (*ICCP*), Evanston, IL, pp. 1–11 (2016).

**5. **Y. Zhang, Z. Li, W. Yang, P. Yu, H. Lin, and J. Yu, “The light field 3D scanner,” in *2017 IEEE International Conference on Computational Photography* (*ICCP*), Stanford, CA, pp. 1–9 (2017).

**6. **S. Shroff and K. Berkner, “High resolution image reconstruction for plenoptic imaging systems using system response,” in Imaging and Applied Optics Technical Papers, OSA Technical Digest (online) (Optical Society of America, 2012), paper CM2B.2. [CrossRef]

**7. **S. Shroff and K. Berkner, “Plenoptic system response and image formation,” in Imaging and Applied Optics, OSA Technical Digest (online) (Optical Society of America, 2013), paper JW3B.1.

**8. **S. Shroff and K. Berkner, “Wave analysis of a plenoptic system and its applications,” Proc. SPIE **8667**, 86671L (2013). [CrossRef]

**9. **L. Liu, X. Jin, and Q. Dai, “Image formation analysis and light field information reconstruction for plenoptic camera 2.0,” in Pacific-Rim Conference on Multimedia (PCM), Harbin, China, Sept. 28–29, 2017. [CrossRef]

**10. **C. Guo, H. Li, I. Muniraj, B. Schroeder, J. Sheridan, and S. Jia, “Volumetric light-field encryption at the microscopic scale,” in Frontiers in Optics 2017, OSA Technical Digest (online) (Optical Society of America, 2017), paper JTu2A.94.

**11. **M. Broxton, L. Grosenick, S. Yang, N. Cohen, A. Andalman, K. Deisseroth, and M. Levoy, “Wave optics theory and 3-D deconvolution for the light field microscope,” Opt. Express **21**(21), 25418–25439 (2013). [CrossRef] [PubMed]

**12. **E. Y. Lam, “Computational photography with plenoptic camera and light field capture: tutorial,” J. Opt. Soc. Am. A **32**(11), 2021–2032 (2015). [CrossRef] [PubMed]

**13. **A. Lumsdaine and T. Georgiev, “The focused plenoptic camera,” in Proceedings of IEEE International Conference on Computational Photography (ICCP, 2009), pp. 1–8.

**14. **T. Georgiev and A. Lumsdaine, “Focused plenoptic camera and rendering,” J. Electron. Imaging **19**(2), 1–28 (2010).

**15. **X. Jin, L. Liu, Y. Chen, and Q. Dai, “Point spread function and depth-invariant focal sweep point spread function for plenoptic camera 2.0,” Opt. Express **25**(9), 9947–9962 (2017). [CrossRef] [PubMed]

**16. **T. Georgiev and A. Lumsdaine, “Superresolution with plenoptic 2.0 cameras,” in *Frontiers in Optics 2009/Laser Science XXV/Fall 2009*, OSA Technical Digest (CD) (Optical Society of America, 2009), paper STuA6.

**17. **T. E. Bishop and P. Favaro, “The light field camera: extended depth of field, aliasing, and superresolution,” IEEE Trans. Pattern Anal. Mach. Intell. **34**(5), 972–986 (2012). [CrossRef] [PubMed]

**18. **C. C. Paige and M. A. Saunders, “LSQR: an algorithm for sparse linear equations and sparse least squares,” ACM Trans. Math. Softw. **8**(1), 43–71 (1982). [CrossRef]

**19. **D. C. L. Fong and M. Saunders, “LSMR: an iterative algorithm for sparse least-squares problems,” SIAM J. Sci. Comput. **33**(5), 2950–2971 (2011). [CrossRef]

**20. **Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Trans. Pattern Anal. Mach. Intell. **23**(11), 1222–1239 (2001). [CrossRef]

**21. **R. M. Haralick and L. G. Shapiro, *Computer and Robot Vision* (Addison-Wesley Longman Publishing Co., 1992), pp. 28–48, vol. I.