
Single shot three-dimensional imaging using an engineered point spread function


Abstract

A system approach to acquire a three-dimensional object distribution is presented using a compact and cost efficient camera system with an engineered point spread function. The corresponding monocular setup incorporates a phase-only computer-generated hologram in combination with a conventional imaging objective in order to optically encode the axial information within a single two-dimensional image. The object’s depth map is calculated using a novel approach based on the power cepstrum of the image. The in-plane RGB image information is restored with an extended depth of focus by applying an adapted Wiener filter. The presented approach is tested experimentally by estimating the three-dimensional distribution of an extended passively illuminated scene.

© 2016 Optical Society of America

1. Introduction

The ability to acquire depth information in a single shot in addition to the conventional two-dimensional image of an object scene is of increased interest in modern applications for consumer electronics, bio-medical imaging, machine vision and automotive engineering. Depending on the particular application, optical system solutions rely on active or passive illumination. The former approach incorporates a tailored, artificial light source in addition to an image acquisition module to extract the depth information of an object. Existing technologies include structured light [1], time-of-flight (Lidar) [2] as well as interferometry [3]. Passive illumination methods purely rely on ambient light and thus generally benefit from reduced energy consumption and system complexity, as well as robustness with respect to stray light. Most common solutions are based on multi-aperture approaches, i.e. stereo setups. The major disadvantage of these setups is the necessity for multiple optical systems and image sensors that result in increased costs, higher complexity and the need for an elaborate calibration [4]. In contrast, conventional single aperture approaches based on depth from focus (DFF) or depth from defocus (DFD) extract depth information by analyzing the axially dependent image blur or by searching for the in-focus state of the imaging system, respectively [5]. These configurations provide less complexity, but commonly suffer from low axial precision or require multiple acquisitions [6]. An approach that enables combining the advantages of monocular and stereo systems is based on the integration of a diffraction grating in front of a single imaging configuration [7, 8]. But the utilization of higher diffraction orders in order to generate a stereo pair results in a significant spectral dependence of the image disparity. Accordingly, the method requires a quasi-monochromatic illumination or prior knowledge on the object spectrum in order to retrieve well-defined depth information. In the past decade, plenoptic cameras have been of increased interest due to their rather simple, cost efficient setup. However, the inherent loss in lateral object resolution due to the optical demagnification by the microlens array represents a severe drawback [9, 10].

An alternative method for acquiring three-dimensional object information utilizes temporally or spatially engineered point spread functions (PSFs). Temporal PSF engineering techniques exploit a tailored focus sweep to generate a depth dependent PSF distribution with an extended depth of focus, which requires complex and costly opto-mechanical components such as piezo-electric actuators or deformable lenses [11, 12]. Various spatially tailored PSFs have been proposed in order to enhance the depth discrimination capabilities of depth-from-defocus systems. In [13], adapted aperture masks are utilized to extract depth information, but severely reduce the light efficiency of an optical system. In order to overcome this constraint, complex segmented optical elements within the pupil can be employed to achieve an extended depth of focus, but only provide low depth discrimination [14]. Moreover, the respective PSF engineering approaches commonly require extensive computational effort due to the incorporated iterative error minimization methods [12–14].

A novel PSF engineering approach has been demonstrated by Piestun and coworkers, which utilizes a rotating double helix PSF [15, 16]. The corresponding system has been applied successfully in the area of microscopy, demonstrating an extended depth of focus and a high depth resolution for 3D single-molecule localization [17, 18]. Moreover, the general feasibility for broadband, passive cameras has been verified [19]. However, the applicability to commercial camera systems, e.g. in the area of consumer electronics or machine vision, is strongly limited. The necessity of multiple image acquisitions in order to retrieve the axial and lateral object information represents a major drawback of the system and restricts its application to (near) static object scenes. Additional minor drawbacks of these systems include the complex and costly setup, as well as the low light efficiency due to the incorporated spatial light modulator, which requires polarization filtering. A similar system based on four rotating PSF peaks has been developed by Niihara et al. [20], which enables single shot depth acquisition. However, in addition to the costly numerical reconstruction approach, the respective pupil elements are not optimized for an extended rotation range, which significantly limits the retrievable depth range.

Here, we present a closed system approach based on the combination of a compact cost efficient optical setup and customized image processing that enables obtaining three-dimensional, broadband (RGB) object information from a single image. In particular, we show how the image’s power cepstrum can be used to retrieve the axially dependent PSF parameters, which encode the object’s depth information, with low computational effort. Based on the obtained parameters, the lateral scene can be reconstructed by a tailored Wiener filter, which, in contrast to the filter proposed in [16, 19], does not require an additional reference frame.

Initially, the concept of the proposed image acquisition approach is presented and a simplified imaging model to describe the hybrid optical system is established. The work flow of the applied image processing steps, including the depth map retrieval and the object reconstruction, is subsequently described. In the final section, experimental results of a developed demonstration system are presented, which verify the applied system approach.

2. System approach

2.1. Imaging setup

The general image acquisition setup is schematically shown in Fig. 1. A passively illuminated, three-dimensional object is imaged on a conventional image sensor. The hybrid optical system is based on a conventional camera objective in combination with a computer-generated hologram (CGH). The thin holographic element is located directly at the objective’s aperture stop position in order to ensure a field independent transmission. The particular design of the CGH is based on the approach presented in [15]. An initial estimate is obtained analytically based on a tailored superposition of Gauss-Laguerre modes with respective indices (2,2), (4,6), (6,10), (8,14) and (10,18). Subsequently, a phase-only element is retrieved by further iterative optimization. The element modifies the phase of the transmitted light, which results in characteristic spiral exit pupil phase distributions that are exemplarily shown in Fig. 2(a) for an in- and out-of-focus object point. The corresponding double-helix shaped PSF distribution features a depth dependent rotation with an extended depth of focus. When an extended object distribution is imaged, the depth dependence is encoded within the recorded two-dimensional image. By decoding this raw image, both the depth map and the lateral object information can be extracted.

Fig. 1 Schematic layout of the proposed image acquisition setup. A 3D object distribution is imaged by a conventional camera objective with an implemented glass substrate comprising the CGH surface profile. The lateral and axial object information is optically encoded within the raw image due to an engineered PSF and can be recovered by tailored image processing.

Fig. 2 The phase distribution in the exit pupil plane of the hybrid optical system (a), the corresponding Modulation Transfer Function (MTF) (b), as well as the MTF of a conventional optical system (c) are plotted for exemplary in- and out-of-focus object distances z1 and z2, respectively. Note that the CGH is slightly oversized with respect to the actual pupil size, which is indicated by the dashed circle in (a). The spatial frequencies of the displayed MTFs are normalized according to the optical cut-off frequency given by the wavelength λ and the system’s F-number. The engineered MTFs, shown in (b), exhibit a characteristic modulation with an axially dependent period 1/p(z) and orientation angle θ.

In contrast to the spatial light modulator used in [19], the presented setup incorporates a thin glass substrate with a structured surface profile, which provides a more compact and robust system solution. The glass element can be used over a broader temperature range and without the need for a polarization filter, which would otherwise decrease the light efficiency. The profile is generated in two steps utilizing cost efficient, state-of-the-art wafer-level technology that enables the processing of multiple elements in a single run. Initially, a master sample is fabricated inside a photo resist layer using a novel grayscale, LED writing lithography system [21]. In particular, the utilized system provides a high accuracy, characterized by a lateral resolution below 1 μm, a low wave front error of manufactured CGHs, and a highly dynamic dosage control. In comparison to the system applied in [22], the increased lateral processing area (11×) and the improved positioning accuracy (2×) enable highly parallelized, more cost efficient manufacturing of the CGH master samples. Using reactive-ion etching or mask imprinting technology, the obtained profile is subsequently transferred onto the targeted substrate, which is diced in order to obtain the final elements. Ultimately, they are directly implemented inside a commercial camera objective. Note that the optical parameters (e.g. focal length, F-number) can be tailored to particular application needs. The optical setup is similar to the coded aperture configuration proposed in [13], which incorporates an adapted aperture mask. However, the system’s light efficiency is significantly increased due to the utilization of a phase-only element. In addition, the more confined double-helix PSF distribution inherently provides a higher lateral resolution.

2.2. Image acquisition

The proposed setup is modeled as an incoherent imaging system, described by [19]

$$i = \{i_{kl}\} = \int_{-\infty}^{+\infty} o(z) * h(z)\,\mathrm{d}z + n, \tag{1}$$
where i is the discretely sampled coded image distribution, o(z) is the object’s discrete surface brightness, h(z) is the engineered point spread function and n describes an additive noise term. Note that * denotes the discrete lateral convolution with the laterally shift-invariant and axially shift-variant PSF. In the following, the indices (k, l) denote the pixel indexing within the discretely sampled, two-dimensional distributions. According to the design of the CGH, the axial dependence of the PSF can be described by a combination of a rotation and a lateral scaling of the double peak separation. At the same time, the engineered PSF inherently extends the system’s depth of focus by minimizing the spreading of the individual peaks within the axial range of interest. If we assume that the two peaks of the double-helix PSF are well confined with negligible side-lobes over the entire axial range of interest, the PSF can be approximated by
$$h(z) \approx h_0 * \delta^{+}(z) + h_0 * \delta^{-}(z), \tag{2}$$
where h0 represents the nominal, shift-invariant distribution of a single PSF peak. The delta distributions δ±(z) can be expressed by
$$\delta^{\pm}_{kl}(z) = \delta\left[k \pm p(z)\cos(\theta(z)),\; l \pm p(z)\sin(\theta(z))\right], \tag{3}$$
with a peak separation p(z) and an azimuth orientation angle θ(z) that linearly depend on z. Accordingly, Eq. (1) can be rewritten as
$$i = \int_{-\infty}^{+\infty} \underbrace{o(z) * h_0}_{o_0(z)} * \left[\delta^{+}(p(z),\theta(z)) + \delta^{-}(p(z),\theta(z))\right] \mathrm{d}z + n. \tag{4}$$
The image is thus a superposition of two representations of the blurred object distribution o0, which are shifted according to their axial position. Note that o0 is analogous to the image formed by a conventional imaging system with an extended depth of focus.
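For illustration, a minimal numerical sketch of the encoding in Eq. (4) for a single object plane is given below, written in Python/NumPy. The Gaussian blur standing in for a single PSF peak h0, the function name, and all parameter values are assumptions for demonstration, not part of the actual system design.

import numpy as np
from scipy.ndimage import gaussian_filter, shift

def encode_single_plane(obj, p, theta_deg, sigma=1.5, noise_std=0.01):
    """Idealized twin-image encoding of one object plane, cf. Eq. (4)."""
    theta = np.deg2rad(theta_deg)
    o0 = gaussian_filter(obj, sigma)              # o_0 = o * h_0 (blurred object)
    dk, dl = p * np.cos(theta), p * np.sin(theta)
    # two shifted replicas of the blurred object (delta pair of Eq. (3))
    twin = shift(o0, (dk, dl), order=1) + shift(o0, (-dk, -dl), order=1)
    return twin + noise_std * np.random.randn(*obj.shape)  # additive noise n

# Example: a test scene at a depth where the PSF is rotated by 30 degrees
scene = np.random.rand(256, 256)
raw = encode_single_plane(scene, p=7.0, theta_deg=30.0)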

2.3. Image processing

The work flow of the proposed image processing procedure, based on the previously described image acquisition approach, is schematically shown in Fig. 3. First, the depth map of the encoded image is retrieved as described in the following section. In the second step, the object distribution is reconstructed by applying the decoding approach explained in the subsequent section.

 figure: Fig. 3

Fig. 3 Schematic work flow of the proposed image acquisition and processing approach, which retrieves the depth information encoded in (pkl, θkl) and reconstructs the object distribution O′kl from a single subimage Ikl.

Download Full Size | PDF

2.3.1. Depth map retrieval

The key to retrieving the depth distribution of the object from the raw image i is to determine the lateral distribution of the rotation angle θkl of the twin images. This is done by analyzing the object features in an M × N pixel neighborhood of each image location (k, l), which is valid under the assumption that the neighborhood corresponds to a part of the object distribution located at the same distance zkl. Thus, a sliding window function

$$w_{mn} = \begin{cases} 1 & \text{if } |m| \le M/2 \text{ and } |n| \le N/2, \\ 0 & \text{else,} \end{cases} \tag{5}$$
is applied to the raw image i, which results in the subimage distribution Ikl defined by
$$I^{mn}_{kl} = i_{k+m,\,l+n}\, w_{mn}. \tag{6}$$

In order to reduce the numerical effort of the depth map retrieval, the subimage distribution may be sampled at a reduced rate given by the window size divided by a sampling factor q, which is typically on the order of 2 to 4. According to Eq. (4), each windowed subimage distribution Ikl is given by the convolution

$$I_{kl} = O_{kl} * H_0 * \left[\delta^{+}(p_{kl},\theta_{kl}) + \delta^{-}(p_{kl},\theta_{kl})\right] + N_{kl}, \tag{7}$$
where Okl and Nkl denote the windowed subobject and noise distributions, respectively. The windowed nominal distribution of a single PSF peak is described by H0. In our approach, we apply the cepstrum concept to extract the corresponding, discretely sampled PSF parameters θkl and pkl. The concept originates from pitch detection in human speech [25, 26] and can also be utilized to detect motion blur or stereo correspondence in imaging applications [23, 24], which in fact represent similar image processing problems. The power cepstrum distribution of Ikl is defined by
$$C_{kl} = \mathcal{C}\{I_{kl}\} := \mathcal{F}^{-1}\left\{\log\left(\left|\mathcal{F}\{I_{kl}\}\right|^2\right)\right\}. \tag{8}$$
Accordingly, inserting Eq. (7) in Eq. (8) leads to
$$C_{kl} = \mathcal{F}^{-1}\left\{\log\left(\left|\mathcal{F}\{I_{0,kl} + N_{kl}\}\right|^2\right)\right\} \tag{9}$$
$$= \mathcal{F}^{-1}\left\{\log\left(\left|\mathcal{F}\{I_{0,kl}\}\right|^2\right) + \log\left(\left|1 + \frac{\mathcal{F}\{N_{kl}\}}{\mathcal{F}\{I_{0,kl}\}}\right|^2\right)\right\} \tag{10}$$
$$= \mathcal{C}\{I_{0,kl}\} + \mathcal{F}^{-1}\left\{\log\left(\left|1 + \frac{\mathcal{F}\{N_{kl}\}}{\mathcal{F}\{I_{0,kl}\}}\right|^2\right)\right\}, \tag{11}$$
where I0,kl denotes the subimage without noise. The obtained cepstrum is thus a superposition of the cepstrum of the encoded object distribution and a second contribution that depends on the spatial frequency content of the noise as well as the encoded object. The key property of the cepstrum calculation is that it maps a convolution into an addition. Thus, the first term in Eq. (11) can be written as
$$\mathcal{C}\{I_{0,kl}\} = \mathcal{C}\{O_{kl} * H_0\} + \mathcal{C}\left\{\delta^{+}(p_{kl},\theta_{kl}) + \delta^{-}(p_{kl},\theta_{kl})\right\}, \tag{12}$$
which separates the cepstrum of the blurred object distribution from the cepstrum of the shift term described by the delta distributions. According to [27], the latter leads to a symmetrical set of impulses within the cepstrum domain. The impulses are located along a line with a respective angle θkl and a separation 2pkl and thus directly provide the engineered PSF parameters.
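Numerically, the power cepstrum of Eq. (8) amounts to two FFTs and a logarithm. The following NumPy sketch centers the result so that the impulse pair appears symmetrically around the origin; the small constant eps is an added numerical safeguard, not part of Eq. (8).

import numpy as np

def power_cepstrum(I_kl, eps=1e-12):
    """Power cepstrum C = F^-1{ log |F{I}|^2 }, cf. Eq. (8)."""
    spectrum = np.abs(np.fft.fft2(I_kl)) ** 2
    # eps guards against log(0); fftshift centers the impulse pair
    return np.fft.fftshift(np.fft.ifft2(np.log(spectrum + eps)).real)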

The main challenge of the proposed approach is to accurately identify these impulse peaks within each subimage’s cepstrum Ckl. The first main limitation arises from the spatial frequency content of the object distribution. Fig. 2(b) shows the Modulation Transfer Function (MTF) of the proposed hybrid imaging system for two representative object distances z1 and z2. It can be seen that the introduced phase element leads to a modulation of the conventional MTF shown in Fig. 2(c), which is characterized by a period 1/p and an orientation θ. This modulation ultimately leads to the set of impulses within the cepstrum domain. Note that in comparison to performing an autocorrelation, the cepstrum analysis provides increased contrast of the impulse peak in case of a weak modulation. In fact, the separation between the source signal (blurred object distribution) and the carrier (double delta distribution) is improved due to the logarithmic enhancement of the modulation in the spatial frequency domain. However, the modulation turns invisible in case of a lack of sufficiently small spatial object features, and the set of impulses in the cepstrum domain vanishes completely. Accordingly, the object’s spatial frequency spectrum must span an area beyond the first modulation minimum at 1/(V · p), considering the magnification V of the optical system. In other words, the object scene must contain spatial features that are comparable to or smaller than the double helix PSF extension V · p in object space. In addition, the peak identification can be ambiguous in case of periodic object features, which lead to an equivalent modulation of the image spectrum. The corresponding, additional peaks in the cepstrum domain may corrupt the peak identification and result in false depth information. The second major influence is given by the noise level. If the object’s spatial frequency content in the range of interest is insufficient, peaks that originate from the noise contributions (second term in Eq. (11)) dominate the cepstrum and the impulse identification becomes unreliable [24]. Accordingly, the window size needs to be increased at the expense of lateral depth resolution in order to include more object features. It should be noted that both limitations are (in a slightly modified manner) inherent to all passive optical systems.

In fact, the size of the considered neighborhood is crucial for a robust depth estimation of each object point. However, making a reasoned choice for M and N is difficult because the noise in the cepstrum depends on the spatial frequency content in each window according to the second term in Eq. (11). The window size, and with it the degree of noise averaging, must therefore be set according to the particular object that is imaged.
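A minimal sketch of the sliding-window extraction of Eqs. (5) and (6), including the reduced evaluation rate set by the sampling factor q, is given below. The corner-anchored slicing (rather than the centered window of Eq. (5)) and the default values, which mirror the experimental choices of Section 3.2, are illustrative simplifications.

import numpy as np

def extract_subimages(i_raw, M=256, N=256, q=4):
    """Yield the window origin (k, l) and the M x N subimage I_kl."""
    step_k, step_l = M // q, N // q               # reduced sampling rate
    for k in range(0, i_raw.shape[0] - M + 1, step_k):
        for l in range(0, i_raw.shape[1] - N + 1, step_l):
            yield (k, l), i_raw[k:k + M, l:l + N]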

In order to increase the reliability of the peak identification, a two-dimensional Hann window is initially applied to the individual subimages

$$I'^{mn}_{kl} := I^{mn}_{kl}\,\frac{1}{4}\left[1 - \cos\left(\frac{2\pi m}{M-1}\right)\right]\left[1 - \cos\left(\frac{2\pi n}{N-1}\right)\right], \tag{13}$$
before the cepstrum calculation, in order to mitigate the influence of edge effects on the discrete Fourier transformation. In addition, a priori knowledge of the peak parameters θ and p is used. First, it is assumed that the axial extension of the object distribution is limited to a total PSF rotation range of 180 degrees, which ensures a unique relationship between θ and z. Under this condition, the detection range of the impulses in the cepstrum domain can be truncated according to
$$C'^{mn}_{kl} = \begin{cases} C^{mn}_{kl} & \text{if } p_{\min} \le \sqrt{m^2 + n^2}/2 \le p_{\max}, \\ 0 & \text{else,} \end{cases} \tag{14}$$
by applying a minimum and maximum double peak separation (pmin, pmax). These should typically be on the order of 0.8 to 0.9 and 1.1 to 1.2 times the peak separation at the nominal (in-focus) object distance, respectively, which can be extracted from the optical system design. Second, the truncated cepstrum C′kl is convolved with a Gaussian kernel of size s to mitigate the impact of noise on the peak detection. The kernel width s should be selected in the range of 1–2 times the diffraction limited PSF peak size, which determines the minimum size of features in the cepstrum domain that do not originate from noise. In practice, s as well as (pmin, pmax) may be obtained by experimentally analyzing the PSF peak width σ and separation p(z), respectively.

Finally, the pixel location (mmax, nmax) of the maximum in each convolved cepstrum C″kl is identified and the cepstrum values in an s pixel wide neighborhood are extracted. The weighted position of the peak within this subset of C″kl is calculated using a standard center of gravity detection algorithm, and the rotation parameters θkl and pkl are extracted for each subimage. In fact, the identification of a single peak in the cepstrum is sufficient due to the symmetry of Ckl. The angle θkl is finally related to the object distance zkl based on a look-up table of the calibrated relationship z(θ). We emphasize that the described peak identification approach focuses on a high computational efficiency. More advanced methods, e.g. based on maximum likelihood estimators, can provide a higher accuracy and robustness, but require computationally expensive iterative algorithms.
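The complete estimation chain described above (Hann windowing, cepstrum calculation, annular truncation, Gaussian smoothing and center-of-gravity refinement) can be sketched as follows. Boundary handling is omitted, the peak is assumed to lie away from the truncation border, and all default values are illustrative assumptions rather than prescribed settings.

import numpy as np
from scipy.ndimage import gaussian_filter

def estimate_psf_parameters(I_kl, p_min, p_max, s=2.0):
    """Estimate (p_kl, theta_kl) from one subimage, cf. Eqs. (8), (13), (14)."""
    M, N = I_kl.shape
    hann = np.outer(np.hanning(M), np.hanning(N))            # Eq. (13)
    spectrum = np.abs(np.fft.fft2(I_kl * hann)) ** 2
    C = np.fft.fftshift(np.fft.ifft2(np.log(spectrum + 1e-12)).real)
    m = np.arange(M) - M // 2                                # centered indices
    n = np.arange(N) - N // 2
    mm, nn = np.meshgrid(m, n, indexing="ij")
    r = np.hypot(mm, nn) / 2.0                               # candidate separation p
    C[(r < p_min) | (r > p_max)] = 0.0                       # Eq. (14)
    C = gaussian_filter(C, s)                                # noise mitigation
    m_max, n_max = np.unravel_index(np.argmax(C), C.shape)
    w = int(np.ceil(s))                                      # s-pixel neighborhood
    patch = C[m_max - w:m_max + w + 1, n_max - w:n_max + w + 1]
    dm, dn = np.mgrid[-w:w + 1, -w:w + 1]
    mc = m_max - M // 2 + (patch * dm).sum() / patch.sum()   # center of gravity
    nc = n_max - N // 2 + (patch * dn).sum() / patch.sum()
    p_kl = 0.5 * np.hypot(mc, nc)                            # impulses separated by 2p
    theta_kl = np.arctan2(nc, mc)
    return p_kl, theta_kl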

2.3.2. Object reconstruction

In order to reconstruct the original object information from a single acquisition, the twin images in i need to be merged. This can be done by means of a deconvolution operation using the double helix PSF. The shape of the PSF can, however, be distorted in comparison to the original design due to geometrical and chromatic aberrations, as well as mechanical system tolerances. A direct deconvolution may thus result in severe artifacts depending on these shape deviations. In general, it is possible to determine the exact PSF distribution experimentally. However, this may require measuring the two-dimensional PSF shape within the entire three-dimensional region of interest due to the lateral and axial dependency of potential PSF distortions, e.g. in case of significant off-axis aberrations. In addition to the extensive calibration efforts, either a comprehensive look-up table or complex interpolation schemes based on analytic or numerical approximations need to be incorporated. Alternatively, blind deconvolution algorithms can be applied, which are, however, numerically demanding due to the necessity for iterative optimization procedures.

In order to facilitate a fast and reliable image decoding, the proposed object reconstruction focuses on removing the twin image within i and partially recovering sharp object features. We retrieve the windowed subobject distributions O′kl with a linear Wiener-type (deconvolution) filter. The Fourier transform of Eq. (7) can be expressed as

$$\hat{I}'_{kl} = \hat{O}'_{kl}\,\hat{H}_0\,\hat{D}_{kl} + \hat{N}'_{kl}, \tag{15}$$
where Î′kl and Ô′kl denote the Fourier transforms of the Hann windowed distributions I′kl and O′kl, respectively. The Fourier transform D̂kl of the delta distributions is
$$\hat{D}^{mn}_{kl} = \cos\left[2\pi p_{kl}\left(m\cos(\theta_{kl}) + n\sin(\theta_{kl})\right)\right], \tag{16}$$
which corresponds to the axially dependent MTF modulation illustrated in Fig. 2(b). The parameters θkl and pkl are already obtained during the depth map retrieval. The Fourier transform Ĥ0 of a single PSF peak with neglected side-lobes is approximated by a Gaussian function
$$\hat{H}^{mn}_0 = \frac{1}{2\pi\hat{\sigma}^2}\exp\left(-\frac{m^2 + n^2}{2\hat{\sigma}^2}\right), \tag{17}$$
with a width σ̂. The sharpened object spectrum Ô′kl is reconstructed by a Wiener filter
$$\hat{O}'_{kl} = \hat{I}'_{kl}\left[\frac{\hat{H}^{*}_0\,\hat{D}^{*}_{kl}}{\left|\hat{H}_0\hat{D}_{kl}\right|^2 + \mathrm{SNR}^{-1}_{kl}}\right], \tag{18}$$
where SNRkl is the signal-to-noise ratio of each subwindow. In addition to limiting the amplification of noise, SNRkl is essential in order to compensate for zero values of D̂kl within the denominator of Eq. (18). On the one hand, D̂kl removes the modulation of the spectrum, which eliminates the twin image in i. On the other hand, Ĥ0 recovers high spatial frequency contributions. Therefore, a respective Gaussian width σ̂ > max{N, M}/(2πσ) should be selected according to the PSF peak width σ in order to avoid ringing artifacts. Note that a proper removal of the twin images necessitates an estimation of the local PSF parameters θkl and pkl that is accurate on the order of the pixel size of the image sensor. A false estimation, e.g. due to a high noise level or due to an oversized sliding window that spans a significant depth range, can result in severe artifacts within the reconstructed object distribution. Contrarily, estimation errors based on a lack of small object features in certain object regions only lead to minor reconstruction artifacts due to the absence of high spatial frequencies.
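A sketch of this filtering step for a single Hann-windowed subimage follows. The discretization of the modulation term of Eq. (16) on the DFT frequency grid and the peak-normalization of the Gaussian of Eq. (17) are simplifying assumptions; the default values σ̂ = 18 and SNR = 33 mirror the experimental choices reported in Section 3.3.

import numpy as np

def wiener_decode(I_win, p_kl, theta_kl, sigma_hat=18.0, snr=33.0):
    """Twin-image removal and sharpening for one subwindow, cf. Eq. (18)."""
    M, N = I_win.shape
    fm = np.fft.fftfreq(M)                     # frequencies in cycles/pixel, FFT layout
    fn = np.fft.fftfreq(N)
    fmm, fnn = np.meshgrid(fm, fn, indexing="ij")
    # Eq. (16): spectral modulation caused by the twin-image delta pair
    D = np.cos(2 * np.pi * p_kl * (fmm * np.cos(theta_kl) + fnn * np.sin(theta_kl)))
    # Eq. (17): Gaussian model of a single PSF peak spectrum (peak-normalized,
    # since the constant prefactor only scales the output)
    H0 = np.exp(-((fmm * M) ** 2 + (fnn * N) ** 2) / (2 * sigma_hat ** 2))
    I_hat = np.fft.fft2(I_win)
    # Eq. (18); H0 and D are real, so the conjugates equal the terms themselves
    O_hat = I_hat * H0 * D / (np.abs(H0 * D) ** 2 + 1.0 / snr)
    return np.fft.ifft2(O_hat).real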

Finally, an inverse Fourier transformation of Eq. (18) leads to the recovered distribution O′kl. Adding these windowed subobjects according to

$$o_{kl} = \sum_{m,n} O'^{\,mn}_{k-m,\,l-n} \tag{19}$$
provides the reconstructed object distribution o. The Hann window, which is maintained within each subobject, leads to a smooth overlap of the individual O′kl, which mitigates stitching artifacts within o even in case of a small sampling factor q.
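A sketch of this overlap-add step, assuming the decoding yields pairs of window origin (k, l) and decoded subobject O′kl with the Hann taper still applied:

import numpy as np

def overlap_add(subobjects, shape):
    """Accumulate Hann-tapered subobjects into the full image, cf. Eq. (19)."""
    o = np.zeros(shape)
    for (k, l), O_kl in subobjects:
        M, N = O_kl.shape
        o[k:k + M, l:l + N] += O_kl        # tapers blend adjacent windows smoothly
    return o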

It should be pointed out that the object reconstruction significantly benefits from the extended depth of focus of the hybrid optical system. As can be seen in Fig. 2(b), the presented out-of-focus MTF is generally increased (i.e. for spatial frequencies |ξ|, |η| > 0.1/(λF#)) in comparison to the conventional MTF in Fig. 2(c). Hence, it enables an improved reconstruction of these frequencies, which results in an enhanced image resolution for object areas that are significantly out-of-focus. In addition, we emphasize that the Fourier transform ℱ{I′kl} = Î′kl, which is required to determine Ô′kl in Eq. (18), is already calculated during the prior cepstrum analysis. The total numerical effort for the depth estimation in combination with the object retrieval is thus mainly determined by only three Fourier transformations. Hence, it provides significantly reduced numerical costs in comparison to the regularized iterative error minimization methods used in [12–14, 20]. It facilitates a fast approach that achieves 1–2 fps for a megapixel image using our current software implementation in MATLAB on a conventional desktop PC. The frame rate can be increased further by employing state-of-the-art hardware and using a dedicated computation on a GPU, which can potentially allow for a real-time implementation on the order of 10–20 fps.

3. Proof-of-principle experiment

3.1. Setup implementation

A demonstration system according to the proposed imaging setup shown in Fig. 1 is realized in order to verify the presented approach. For demonstration purposes, the developed photo resist master, which is obtained in the first step of the CGH fabrication process, is directly utilized without the subsequent transfer of the surface profile onto the final substrate. The surface profile of the corresponding element, which is measured using a white light interferometer, is shown in Fig. 4(a). Note that its lateral extension is slightly oversized with respect to the aperture stop diameter of 10 mm (indicated by a white circle in Fig. 4(a)) in order to accommodate alignment tolerances. The maximum profile height of 885 ± 10 nm complies with the maximum required phase shift of 2π, considering a design wavelength of 550 nm and the photo resist refractive index of 1.62. The major difference in comparison to the phase element manufacturing in [22] lies in the applied exposure scheme. In contrast to a single shot exposure, the element shown in Fig. 4(a) is manufactured by optimized lateral stitching of multiple substructure exposures in order to achieve a diameter of 12 mm, which is more than four times larger. The advanced fabrication thus enables versatile lateral scaling of the designed CGHs in order to match the apertures of commercially available objective lenses. Additional minor advantages include an improved surface smoothness with minimum imperfections, which results in reduced straylight, as well as an enhanced height discretization of 10 bit in comparison to 30 levels in [22]. The phase element is placed at the aperture location of a compact optical demonstration setup, which consists of an achromatic doublet pair. In particular, we utilize two conventional achromats from Thorlabs with focal lengths of 100 mm (AC254-100-A) and 250 mm (AC254-250-A), respectively. The diffraction limited optical system is optimized for a nominal object distance of 1 m and features a focal length of 83 mm and an F-number of 8.4. The correction of axial color aberrations is essential in order to minimize the spectral dependence of the rotation angle θ, which limits the axial resolution. However, it should be pointed out that a tailored spectral dependence can also provide an additional degree of freedom that can potentially increase the reliability of the depth measurement. A 1/2.3-inch CMOS image sensor (Aptina MT9F002) with a total pixel count of 4384 × 3290 (14 MP) and a standard Bayer pattern for RGB imaging is placed at the nominal image position. The pixel size of 1.4 μm × 1.4 μm corresponds to a Nyquist frequency of 357 lp/mm; considering the optical cut-off frequency of 216 lp/mm of the nominal system and the reduced per-channel sampling rate of the Bayer pattern, this leads to a minor undersampling of the image, but provides sufficiently high sampling of the engineered image distribution. The final system covers a lateral object field extension of 75 × 53 mm² at the nominal distance.

Fig. 4 (a) Measured surface profile of the realized CGH. The dashed circle indicates the aperture size of 10 mm within the optical setup. (b) Measured relationship between object distance z and rotation angle θ for three different wavelengths. The insets display the shape of the PSF at 540 nm for the corresponding object distance.

3.2. Depth estimation

First, the relationship between the rotation angle θ and the object distance z is calibrated by successively imaging three LED point sources with peak irradiances at 465 nm, 540 nm and 625 nm, respectively. The noise is reduced by averaging over 10 image acquisitions. Note that the calibration can be limited to on-axis points due to the diffraction limited performance of the demonstration system over the entire three-dimensional field of interest. The corresponding PSF distribution, which is illustrated in the two insets of Fig. 4(b), clearly shows two distinct peaks (separated by approximately 20 μm), which can be analyzed in order to obtain the axial dependence of θ(z) shown in Fig. 4(b). It can be seen that a linear relationship is maintained over a large range of approximately 170°. The effective depth range that can be utilized is 160 mm. Beyond this range, the rotation rate begins to decrease drastically and the distorted shape of the PSF prohibits reliable depth estimation. In comparison to conventional multi-aperture approaches that utilize multiple optical systems, no further calibration routines, such as the determination of the relative positions of the subsystems, need to be performed.

An extended, three-dimensional scene is imaged at the nominal object distance of 1 m using the calibrated system. The setup includes multiple objects located at different distances within the calibrated range between 960 mm and 1060 mm. The scene is illuminated by a conventional, broad-band halogen desk lamp without any spectral or polarization filtering. The left part of Fig. 5(a) shows the imaged nominal object scene, which is initially obtained without the CGH inside the optical system. The enlarged image sections on the right side of Fig. 5(a) exemplarily highlight two distinct object parts that are located at an in- and out-of-focus location. After the CGH is implemented into the system, the encoded image shown in Fig. 5(b) is obtained. As can be seen in the two enlarged image sections, the engineered PSF results in twin images of the captured features, which are laterally shifted in a direction according to their axial position. The depth map of the object scene is obtained by applying the proposed cepstrum approach to the captured image. The sampling of the depth map is selected as a compromise between maximizing the lateral resolution on the one hand and ensuring sufficient spatial object features to increase the signal-to-noise ratio for the peak identification in the cepstrum domain on the other hand. In particular, a window size of M = N = 256 and a sampling factor q = 4 are applied based on an empirical selection. After the determination of the angle distribution θkl and a subsequent evaluation of the corresponding distance zkl using the linear calibration fit (Fig. 4(b)), the final depth map is calculated after applying a 3 × 3 pixel median filter [28] in order to reduce outliers. The resulting depth distribution shown in Fig. 6 covers the entire field of view over a depth range of approximately 100 mm. It clearly exhibits distinct objects of the captured scene and provides a spatially resolved visualization of their axial position.
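A sketch of this final step, mapping the estimated angles to distances through a linear calibration fit and applying the 3 × 3 median filter, is given below; the fit coefficients are placeholders, not the calibrated values of Fig. 4(b).

import numpy as np
from scipy.ndimage import median_filter

def depth_map_from_angles(theta_map_deg, z0_mm=1000.0, slope_mm_per_deg=-1.0):
    """Convert an angle map to a depth map via a linear fit z(theta)."""
    z_map = z0_mm + slope_mm_per_deg * theta_map_deg   # linear calibration fit
    return median_filter(z_map, size=3)                # 3 x 3 outlier reduction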

Fig. 5 (a) Nominal image distribution of the three-dimensional object scene, captured without the CGH. The two insets on the right side exemplarily highlight an in- and out-of-focus part of the object scene, respectively. (b) Raw, encoded image distribution of the scene using the CGH. The in- and out-of-focus insets exhibit the blurred twin image with a lateral shift according to the distance of the respective object part. (c) Decoded image. The exemplary text features, displayed in both insets, can clearly be identified after the removal of the twin image.

Fig. 6 Retrieved depth map of the imaged, three-dimensional scene.

3.3. Image decoding

Finally, the angle distribution θkl, obtained during the depth retrieval, is combined with the information on the peak separation pkl to reconstruct the object distribution. The deconvolved image distribution, shown in Fig. 5(c), is obtained after applying the proposed filter function according to Eq. (18) using a width σ̂ = 18, which effectively avoids ringing artifacts. For demonstration purposes, a simplified, constant signal-to-noise ratio of 33 is applied for all sub-windows, which provides a compromise between minimum noise amplification and an effective twin image removal. The image shows the uncoded RGB color information of the object and is only subject to minor reconstruction artifacts. A comparison between the decoded and the uncoded object distributions in Fig. 5 demonstrates the successful removal of the twin image and an increased image contrast. A residual background, particularly in the direction of the double-helix orientation, remains after the deconvolution due to the elongated side-lobes of the PSF, which are not accounted for in the approximated PSF in Eq. (2) and the corresponding filter in Eq. (18). These side-lobes are a residue of the CGH design approach as well as of fabrication tolerances, and they can potentially be reduced by further optimizing both the design and the fabrication method. Alternatively, the experimental PSF can directly be used in the deconvolution approach, which necessitates a comprehensive three-dimensional characterization of the PSF distribution in order to minimize reconstruction artifacts. Furthermore, a comparison between the highlighted object parts of the nominal and the decoded image in Fig. 5 demonstrates the extended depth of focus property of the proposed hybrid system. Whereas the features of the out-of-focus part (highlighted in blue) of the nominal image are significantly blurred in comparison to the in-focus part (highlighted in red), the deconvolved image provides a comparable resolution throughout the entire axial range of interest. In fact, the out-of-focus part in Fig. 5(c) features an increased contrast in comparison to the same part in Fig. 5(a).

4. Conclusion

A system approach comprising a passive optical setup combined with a tailored image processing concept is presented, which enables the acquisition of three-dimensional object information using a monocular camera system. The method is based on integrating a computer-generated hologram, fabricated on a thin glass substrate, into a conventional camera setup, which facilitates a compact, robust and cost efficient system with an extended depth of focus. Moreover, the optical setup does not require additional wavelength or polarization filters, which enables a light efficient image acquisition that maintains the RGB color information of the object. An efficient image processing approach has been developed that analyzes the cepstrum distribution of the image and incorporates a Wiener filter in order to provide a fast calculation of the axial and lateral object distribution based on a single image. Because it avoids the extensive iterative optimization procedures of common image deconvolution algorithms, the system potentially allows for three-dimensional video acquisition.

An experimental system has been implemented, demonstrating the capabilities of the proposed system approach. The depth map as well as the lateral (RGB) information of an extended scene has been obtained based on a single acquisition using a compact, light efficient optical system with an engineered point spread function.

In addition to the qualitative demonstration presented here, future work will include a quantitative assessment of the system’s imaging performance. In particular, we aim to address scaling laws of the axial and lateral resolution limits, which potentially allows for a system optimization according to a particular application and enables a proper comparison to other three-dimensional imaging approaches, e.g. based on stereo or plenoptic configurations.

Acknowledgments

The authors would like to thank Marko Stumpf for manufacturing the CGH and Lucas van Vliet for a critical reading of the manuscript. This work was performed in the frame of the Photonics Research Germany funding program by the German Federal Ministry of Education and Research under contract 13N13667.

References and links

1. J. Salvi, J. Pagès, and J. Batlle, “Pattern codification strategies in structured light systems,” Pattern Recogn. 37(4), 827–849 (2004). [CrossRef]  

2. M. Amann, T. Bosch, M. Lescure, R. Myllyla, and M. Rioux, “Laser ranging: a critical review of usual techniques for distance measurement,” Opt. Eng. 40(1), 10–19 (2001).

3. D. Huang, E. Swanson, and C. Lin, “Optical coherence tomography,” Science 254(5035), 1178–1181 (1991). [CrossRef]   [PubMed]  

4. M. Z. Brown, D. Burschka, and G. D. Hager, “Advances in computational stereo,” IEEE Trans. Pattern Anal. Mach. Intell. 25(8), 993–1008 (2003). [CrossRef]  

5. Y. Schechner and N. Kiryati, “Depth from defocus vs. stereo: How different really are they?” Int. J. Comput. Vision 39(2), 141–162 (2000). [CrossRef]  

6. M. Subbarao and G. Surya, “Depth from defocus: a spatial domain approach,” Int. J. Comput. Vision 13(3), 271–294 (1994). [CrossRef]  

7. R. Horisaki and J. Tanida, “Multi-channel data acquisition using multiplexed imaging with spatial encoding,” Opt. Express 18(22), 429–432 (2010). [CrossRef]  

8. R. Horisaki and J. Tanida, “Preconditioning for multiplexed imaging with spatially coded PSFs,” Opt. Express 19(13), 573–583 (2011). [CrossRef]  

9. T. Georgiev, K. C. Zheng, B. Curless, D. Salesin, S. Nayar, and C. Intwala, “Spatio-angular resolution tradeoffs in integral photography,” in Proceedings of the Eurographics Symposium on Rendering (2006), pp. 263–272.

10. A. Lumsdaine and T. Georgiev, “The focused plenoptic camera,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2009), pp. 1–8.

11. D. Miau, O. Cossairt, and S. Nayar, “Focal sweep videography with deformable optics,” in Proceedings of IEEE International Conference on Computational Photography (IEEE, 2013), pp. 1–8. [CrossRef]  

12. P. Llull, X. Yuan, L. Carin, and D. Brady, “Image translation for single-shot focal tomography,” Optica 2, 822–825, (2015). [CrossRef]  

13. A. Levin, R. Fergus, F. Durand, and W. T. Freeman, “Image and depth from a conventional camera with a coded aperture,” ACM Transactions on Graphics 26(3), 70 (2007). [CrossRef]  

14. A. Levin, S. Hasinoff, and P. Green, “4D frequency analysis of computational cameras for depth of field extension,” ACM Transactions on Graphics 28(3), 97 (2009).

15. S. R. P. Pavani and R. Piestun, “High-efficiency rotating point spread functions,” Opt. Express 16(5), 3484–3489 (2008). [CrossRef]   [PubMed]  

16. A. Greengard, Y. Y. Schechner, and R. Piestun, “Depth from diffracted rotation,” Opt. Lett. 31(2), 181–183 (2006). [CrossRef]   [PubMed]  

17. S. Quirin, S. R. P. Pavani, and R. Piestun, “Optimal 3D single-molecule localization for superresolution microscopy with aberrations and engineered point spread functions,” Proc. Natl. Acad. Sci. 109(3), 675–679 (2012). [CrossRef]   [PubMed]  

18. S. R. P. Pavani, M. A. Thompson, J. S. Biteen, S. J. Lord, N. Liu, R. J. Twieg, R. Piestun, and W. E. Moerner, “Three-dimensional, single-molecule fluorescence imaging beyond the diffraction limit by using a double-helix point spread function,” Proc. Natl. Acad. Sci. 106(9), 2995–2999 (2009). [CrossRef]   [PubMed]  

19. S. Quirin and R. Piestun, “Depth estimation and image recovery using broadband, incoherent illumination with engineered point spread functions [Invited],” Appl. Opt. 52(1), 367–376 (2013). [CrossRef]  

20. T. Niihara, R. Horisaki, and M. Kiyono, “Diffraction-limited depth-from-defocus imaging with a pixel-limited camera using pupil phase modulation and compressive sensing,” Appl. Phys. Express 8, 012501 (2015). [CrossRef]  

21. H.-C. Eckstein, M. Stumpf, P. Schleicher, S. Kleinle, A. Matthes, U. D. Zeitner, and A. Bräuer, “Direct write grayscale lithography for arbitrary shaped micro-optical surfaces,” presented at the 20th Microoptics Conference, Fukuoka, Japan, 25–28 Oct. 2015.

22. G. Grover, S. Quirin, C. Fiedler, and R. Piestun, “Photon efficient double-helix PSF microscopy with application to 3D photo-activation localization imaging,” Biomed. Opt. Express 2(11), 3010–3020 (2011). [CrossRef]  

23. M. Cannon, “Blind deconvolution of spatially invariant image blurs with phase,” IEEE Trans. Acoust. Speech Signal Process. 24, 230–235 (1976).

24. P. W. Smith and N. Nandhakumar, “An improved power cepstrum based stereo correspondence method for textured scenes,” IEEE Trans. Pattern Anal. Mach. Intell. 18(3), 338–348 (1996). [CrossRef]  

25. A. M. Noll, “Short-time spectrum and ‘cepstrum’ techniques for vocal-pitch detection,” J. Acoust. Soc. Am. 36(2), 296–302 (1964). [CrossRef]  

26. A. M. Noll, “Cepstrum pitch determination,” J. Acoust. Soc. Am. 41(2), 293–309 (1967). [CrossRef]   [PubMed]  

27. R. Rom, “On the cepstrum of two-dimensional functions (Corresp.),” IEEE Trans. Inf. Theory 21(2), 214–217 (1975). [CrossRef]  

28. W. K. Pratt, Digital Image Processing (John Wiley & Sons, 2007). [CrossRef]  
