Optica Publishing Group

Fast lightweight framework for time-of-flight super-resolution based on block compressed sensing

Open Access

Abstract

Compressive time-of-flight (ToF) imaging for super-resolution (SR) has tremendous development potential owing to its cost-effectiveness and simplicity. However, existing compressive ToF methods are difficult to apply in practical situations because of their low efficiency and high data storage requirements. In this paper, we propose a fast and lightweight compressive ToF framework for SR. The block compressed sensing method, which shows distinct characteristics of high efficiency and low implementation cost, is introduced into the SR image acquisition and data transmission processes. Based on this framework, we establish a prototype system and verify it experimentally. Compared with existing compressive ToF systems, both the reconstruction time and data storage requirements are significantly decreased. We believe that this study provides a development direction for compressive ToF imaging and effective guidance for researchers realizing highly efficient and lightweight SR image reconstruction.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Time-of-flight (ToF) cameras have gained wide attention in several fields, such as robot navigation, object recognition, face identification, and gesture recognition [1–3]. Such cameras utilize active infrared illumination and a focal plane array sensor to achieve fast and reliable depth image acquisition [4]. Compared with other three-dimensional acquisition techniques, such as structured light [5] and stereo vision [6], ToF cameras are efficient, low cost, and compact [7].

Numerous attempts have been made to achieve super-resolution (SR) of ToF cameras, which can be categorized into three groups: SR algorithms, hybrid methods, and compressive imaging for SR. SR algorithms recover SR depth images from one or more low-resolution (LR) images [8,9]. Sparse coding, Bayesian strategies, and deep learning [10,11] are commonly used methods for recovery. However, a purely software-based method does not improve the physical spatial resolution. Hybrid methods use additional information (e.g., an SR intensity image) to recover the SR image. The additional information contains more detailed spatial information that helps improve the ToF spatial resolution. However, it is difficult to exactly align the additional image with the depth image, which leads to artifacts.

Recently, compressive imaging for SR has become a research hotspot. It leverages a spatial light modulator (SLM) and compressed sensing (CS) to recover an SR depth image from a series of LR images. Typical SLMs include digital mirror devices (DMD), coded apertures (CAs), and liquid crystals on silicon (LCoS). Ahmed et al. [12] utilized DMD to realize spatiotemporally modulated scene illumination. The system achieved a resolution of 64 × 64 pixels using a single photodetector. Achuta et al. [13] used a coded aperture to develop a compressive ToF system. The experiments verified that the method could reduce the number of sensors by 85% without quality reduction. Li et al. [14] utilized DMD to establish a CS–ToF system. The spatial resolution was increased by approximately 4× using optical multiplexing and phasor representation. As a simple and cost-effective solution, compressive imaging for SR provides a promising method to acquire SR depth images for ToF cameras.

However, the introduction of compressive imaging to SR presents new challenges. First, the spatial resolution is improved by sacrificing temporal resolution, owing to the high complexity of the computational reconstruction. In general, the temporal resolution of a ToF camera can reach 30 fps, whereas in a compressive ToF system, the frame rate drops below 1 fps (in [14], the wall time required to reconstruct one SR depth image is more than 30 s). Second, compressive imaging significantly increases the data volume, which increases the pressure on data transmission. For instance, to obtain an SR depth image with a spatial resolution improvement of 4× (2× in the x direction and 2× in the y direction), one must acquire at least four LR images. This means that four times the data volume must be read out and transferred, which imposes tremendous pressure on the ToF camera and further reduces the system efficiency. Consequently, compressive ToF systems for SR are difficult to apply in practical situations.

It is desirable to improve the efficiency and overcome the “data overload” of a compressive ToF system. Therefore, several fast algorithms have been applied [15–18]. In recent years, the block compressed sensing (BCS) of images has gained prominence. In BCS, the original signal is divided into several small blocks and each block is sampled independently, leading to a low memory requirement and low encoding complexity.

In this study, BCS was used to overcome the limitations of existing compressive ToF imaging for SR. To improve the efficiency, we introduced BCS into the SR image acquisition process of the system. To overcome the “data overload” problem, we modified the data transmission process of the ToF camera and introduced BCS into the process.

The main contributions of this study are:

  • (1) To improve the efficiency of the existing compressive ToF system, we introduced the BCS method into the SR image-acquisition process. In detail, we modified the measurement matrix of the BCS and designed DMD masks to make the BCS compatible with the SR image acquisition process. Consequently, the computing costs were significantly reduced, which means that the efficiency was distinctly improved.
  • (2) To overcome the “data overload” of the existing compressive ToF system, we similarly utilized the BCS in the data transmission process. Specifically, we partitioned the ToF sensor read-out into several blocks and designed the read-out mode using a linear feedback shift register (LFSR) to make the BCS compatible with the data transmission process. As a result, the pressure on data transmission is greatly reduced, which means the “data overload” problem is satisfactorily resolved.
  • (3) Based on the modified SR image acquisition and data transmission processes, we established the BCS-ToF system, and the applicability of the system was verified experimentally. The efficient and lightweight approach was verified to have significant improvements over approaches in prior studies.

The remainder of this paper is organized as follows. Section 2 introduces the principles of the ToF camera. Section 3 describes the proposed BCS-ToF system in detail. Section 4 presents experiments and discussion. Finally, Section 5 concludes the paper.

2. Fundamentals of ToF cameras

ToF cameras typically comprise emitting, receiving, and processing units, as shown in Fig. 1. The emitting unit mainly comprises a modulator, a driver, and several illuminators. Vertical-cavity surface-emitting lasers (VCSELs) are often used as illuminators. The receiving unit is composed of a lens, a ToF sensor, and several analog-to-digital converters (ADCs). Fabricated with special components, the ToF sensor can acquire depth information. The processing unit is a digital signal processing (DSP) controller that corrects and transforms the data.

Fig. 1. Typical components of a ToF camera

In the demodulation process, the distance is measured indirectly as the round-trip phase delay between a modulated light source and a synchronized array of gated pixels. Assuming that the emitted light signal is $E(t )= {A_0}\cos (\omega t)$, the received signal can be represented as $R(t )= k{A_0}\cos (\omega t + \varDelta \varphi ) + {B_0}$, where $\omega$ is the modulation frequency, ${A_0}$ is the amplitude of the emitted signal, $\varDelta \varphi$ is the phase shift between the emitted and received signals, ${B_0}$ is the noise signal generated during the transmission of the light, and k is the signal attenuation coefficient. In this process, the four-step phase-shift method is commonly used to demodulate the depth information. Two different capacitors with two phase windows are set under each pixel of the ToF sensor. Four samplings, termed differential correlation samplings (DCSs), are captured at four different phases (0°, 90°, 180°, and 270°) by the capacitors. The distance is given by

$$D({x,y} )= \frac{c}{2} \times \frac{1}{{2\pi \omega }} \times \operatorname{atan2} ({DCS3({x,y} )- DCS1({x,y} ),\;DCS2({x,y} )- DCS0({x,y} )} )$$
where c represents the speed of light.
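The four-step demodulation above can be sketched as follows; this is a minimal illustration, and the 20 MHz modulation frequency is an assumed example rather than a value taken from the paper:

```python
import numpy as np

C_LIGHT = 299_792_458.0  # speed of light, m/s

def tof_distance(dcs0, dcs1, dcs2, dcs3, f_mod=20e6):
    """Four-step phase-shift demodulation of the distance D(x, y).

    dcs0..dcs3 are the differential correlation samples captured at
    0, 90, 180 and 270 degrees; f_mod is the modulation frequency in Hz.
    """
    # atan2 of the two sample differences gives the round-trip phase shift
    phase = np.arctan2(dcs3 - dcs1, dcs2 - dcs0)
    phase = np.mod(phase, 2.0 * np.pi)  # map to [0, 2*pi)
    # convert round-trip phase to one-way distance
    return (C_LIGHT / 2.0) * phase / (2.0 * np.pi * f_mod)
```

At 20 MHz the unambiguous range would be $c/(2f) \approx 7.5$ m; targets beyond that alias back into the measurable interval.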

3. Methods

In this section, we first introduce the architecture of the proposed BCS-ToF system. Then, the BCS method and our modifications to it are illustrated. Finally, to explain the application of the BCS method to our system, the modifications of the SR image acquisition process as well as the data transmission process are introduced in detail.

3.1 Architecture of the proposed system

To achieve SR depth image acquisition, we first established a compressive ToF system for SR, based on DMD. A diagram of the compressive ToF system is shown in Fig. 2. The system is presented as a two-arm system that includes an imaging arm and a relay dispersion arm. After being illuminated by the VCSEL, the object is projected onto the high-resolution DMD in the imaging arm. After being reflected and modulated by the DMD, the object is then re-imaged onto the low-resolution ToF sensor in the relay dispersion arm. Finally, three-dimensional information is acquired and calculated by the ToF sensor.

Fig. 2. Overview of the compressive ToF system

In a compressive ToF system for SR, the DMD is the key component for realizing sparse sampling and achieving super-resolution. As a type of SLM, DMD can modulate light into a specific form. It consists of an array of micromirrors, each with individually controllable tilt. When turned on, the mirrors are tilted about their diagonal hinge at an angle of +12°, as shown in case (a) in Fig. 2. In this case, the radiation of the object is reflected into the relay lens and is captured by the ToF sensor. Otherwise, as shown in case (b) in Fig. 2, radiation is reflected into the air and wasted.

To clearly illustrate our work on the compressive ToF system for SR, we define the processes ①, ②, and ③ in Fig. 2 as the “SR image acquisition process”. Meanwhile, we define the process ④ as the “data transmission process”.

The framework of the BCS-ToF system is illustrated in Fig. 3. It contains two parts: the SR image acquisition process and the data transmission process. We introduce BCS into the two processes simultaneously to realize high efficiency and low data requirements. Smoothed projected Landweber (SPL) reconstruction, which provides a good trade-off between computational complexity and reconstruction quality, was chosen as the reconstruction algorithm. However, the standard SPL does not match well with our system. Therefore, we revised the BCS-SPL method, as illustrated in Section 3.2. The detailed implementation of the revised BCS-SPL in the system is presented in Section 3.3.

Fig. 3. Framework of the BCS-ToF system

3.2 Revised BCS-SPL method

3.2.1 System model for BCS

Suppose that a real-valued signal x of length N must be recovered from M samples, where $M \ll N$. The problem can be transformed into a mathematical model:

$$y = {\rm{\varPhi }}x$$
where y represents an $M \times 1$ sampled vector, and ${\rm{\varPhi }}$ is a measurement matrix of size $M \times N$. In general, recovering $x \in {\Re ^N}$ from its corresponding y is impossible because the number of unknowns is much larger than the number of observations (the problem is ill-posed). However, exact recovery is possible if x is sufficiently sparse in a known transform domain ${\rm{\varPsi }}$. In other words, the transform-domain signal $f = {\rm{\varPsi }}x$ can be approximated using only d nonzero entries. It has been proven that if ${\rm{\varPhi }}$ and ${\rm{\varPsi }}$ are incoherent, x can be recovered from the measurements through nonlinear optimization. This constitutes the basis of CS.

To achieve fast and precise sensing, BCS is chosen to design the measurement matrix in both the SR image acquisition process and the data transmission process, specifically, the design of the DMD mask and the read-out fashion of ADCs.

In BCS, the original signal is partitioned into $B \times B$ small blocks and each block is sampled independently using the same measurement operator [18]. For every block, the measurement equation becomes

$${y_j} = {{\rm{\varPhi }}_B}{x_j}$$
where ${x_j}$ represents block j of the original signal and ${{\rm{\varPhi }}_B}$ is an ${M_B} \times {B^2}$ orthonormal measurement matrix with ${M_B} = \left[ {\frac{M}{N}{B^2}} \right]$. The global measurement matrix is then a block diagonal:
$$\varPhi = \left[ {\begin{array}{cccc} {{\varPhi_B}}&0& \cdots &0\\ 0&{{\varPhi_B}}& \cdots &0 \\ \vdots & \vdots & \ddots & \vdots \\ 0& \cdots &0&{{\varPhi_B}} \end{array}} \right]$$
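A quick numerical sketch (block and measurement sizes chosen arbitrarily for illustration) confirms that sampling each block with the shared operator ${\varPhi_B}$ is equivalent to applying the global block-diagonal matrix:

```python
import numpy as np

def block_measure(x, phi_B, B):
    """Sample a signal block by block: y_j = phi_B @ x_j for every block j."""
    blocks = x.reshape(-1, B * B)   # one row per B*B block
    return blocks @ phi_B.T

rng = np.random.default_rng(0)
B, M_B, n_blocks = 4, 6, 3
phi_B = rng.standard_normal((M_B, B * B))
x = rng.standard_normal(n_blocks * B * B)

y_blockwise = block_measure(x, phi_B, B)
# equivalent global block-diagonal measurement matrix
phi_global = np.kron(np.eye(n_blocks), phi_B)
y_global = (phi_global @ x).reshape(n_blocks, M_B)
assert np.allclose(y_blockwise, y_global)
```

The blockwise form never materializes the large global matrix, which is the source of BCS's low memory requirement.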

3.2.2 Revised SPL reconstruction

Although various reconstruction algorithms based on BCS have been thoroughly investigated [19–22], the SPL algorithm, which provides a good trade-off between computational complexity and reconstruction quality, was chosen for reconstruction; it is presented in Table 1.

Table 1. The revised SPL algorithm

However, SPL does not match well with our system. The standard SPL requires an orthonormal measurement matrix, whereas in our system the Hadamard matrix is modified: every “-1” entry is changed to “0”. Consequently, the measurement matrix loses its orthonormality. Therefore, we revised the SPL algorithm to make it more suitable for our system. Specifically, we add the operator ${({{\rm{\varPhi }} \cdot {{\rm{\varPhi }}^T}} )^{ - 1}}$ to the calculation of $x_j^{i + 1}$. We chose the dual-tree discrete wavelet transform (DDWT) for the transform domain ${\rm{\varPsi }}$.
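The role of the added operator can be sketched as a corrected projection step. This is a simplified view of a single iteration; the full revised SPL also applies Wiener filtering and thresholding in the DDWT domain, which are omitted here:

```python
import numpy as np

def revised_projection(x_hat, y, phi):
    """Project the current estimate onto the set {x : phi @ x = y}.

    Because a 0/1 Hadamard-derived measurement matrix is no longer
    orthonormal, phi.T alone does not invert the measurement; the
    (phi @ phi.T)^{-1} factor restores a proper pseudo-inverse projection.
    """
    gram_inv = np.linalg.inv(phi @ phi.T)
    return x_hat + phi.T @ gram_inv @ (y - phi @ x_hat)
```

After this step the estimate is exactly consistent with the measurements, i.e. `phi @ x_new` reproduces `y`.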

3.3 Detailed implementation

3.3.1 SR image acquisition process

In the SR image acquisition process, the DMD establishes the basis of CS, which means that the measurement matrix on the DMD determines the effect of the measurement. The Hadamard transform matrix, known to be well-conditioned and suitable for SR measuring [23,24], was used as the measurement matrix for two reasons. First, the Hadamard matrix is full-rank, which means that the compressed samples can be used most effectively through the DMD mask [25]. Second, when the DMD is working with a Hadamard matrix, at least half of the mirrors are open. Compared to utilizing random matrices, the DMD reflects more light, leading to an improvement in the signal-to-noise ratio (SNR).

The DMD mask-generating scheme based on the Hadamard matrix is shown in Fig. 4, and can be simply described as:

  • (1) Generate a Hadamard matrix of order $n^2$ (where n is the number of micromirrors per side corresponding to an individual pixel of the ToF camera).
  • (2) Revise the Hadamard matrix. Change each “-1” in the matrix to “0” because the DMD only has two states: “open” and “close”. Then, replace the first row with the complement of the second row to reduce the pixel crosstalk.
  • (3) Separate the row vectors of the revised Hadamard matrix. Fill the small mask block using the separated vectors in column-major order.
  • (4) Fill the DMD mask with the small mask blocks.
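The four steps above can be sketched as follows. The Sylvester construction is used so the example is self-contained, n = 4 matches the 4×4 micromirror blocks used in the experiments, and reading the first-row replacement as the logical complement of the second row is our interpretation of step (2):

```python
import numpy as np

def sylvester_hadamard(order):
    """Hadamard matrix of a power-of-two order via the Sylvester construction."""
    H = np.array([[1]])
    while H.shape[0] < order:
        H = np.block([[H, H], [H, -H]])
    return H

def dmd_mask_blocks(n=4):
    """Steps (1)-(3): n^2 small mask blocks, one per Hadamard row."""
    H = sylvester_hadamard(n * n)          # step (1): order n^2
    H[H == -1] = 0                         # step (2): mirrors are binary open/close
    H[0] = 1 - H[1]                        # step (2): first row <- complement of second
    # step (3): fill each n x n block from one row, in column-major order
    return [row.reshape(n, n, order="F") for row in H]

# step (4): tile one block over the array to form a full DMD mask
# (512 x 512 micromirrors covering a 128 x 128 sensor region, as in Section 4.1)
blocks = dmd_mask_blocks(4)
mask = np.tile(blocks[0], (128, 128))
```

One mask per Hadamard row gives the $n^2 = 16$ measurements from which each SR pixel block is reconstructed.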

Fig. 4. DMD mask generating scheme. The white squares represent the value 1 while black squares represent 0.

3.3.2 Data transmission process

After modifying the SR image acquisition process, BCS is introduced into the data transmission process to overcome the data overload caused by SR image acquisition. Note that the physical integrated circuit (IC) designs of modern complementary metal oxide semiconductors (CMOS) do not lend themselves to CS processing; therefore, it is necessary to modify the internal structure of the ToF camera. In [26], the LFSR was applied to the ADC reading process to establish the acquisition model. In this manner, the outputs of the ADC were rearranged in a pseudo-random fashion, which is required for the CS. Subsequently, in [27], the method was extended to a ToF camera. Multiplexers were utilized to combine pixel read-out to realize compressive read-out. Following this idea, we modified the architecture of our ToF camera to utilize BCS in the data transmission process, as shown in Fig. 5.

Fig. 5. Modification of the data transmission process

In a traditional CMOS ToF sensor design, data in one row are read out directly, pixel by pixel. In the modified structure, the read-out order is rearranged: linear combinations of neighboring pixels are read and transmitted using the multiplexer. This achieves the sparse sampling required by BCS.

Owing to the limitations of the read-out circuit [26], the Hadamard matrix generated in Section 3.3.1 is hard to apply to the transmission process. Instead, the block partial circulant matrix (BPCM), which has been proven capable of recovering sparse signals [28], is chosen as the measurement matrix. Compared with a random matrix, the BPCM has the advantages of being hardware-friendly and having low computational complexity [29]. The generation and application of the BPCM in the transmission process can be illustrated as follows:

  • 1. Generate a pseudo-random sequence containing 0 and 1 with a length of M×N using the linear feedback shift register (LFSR).
  • 2. Cyclically shift the generated sequence to form the BPCM. Extract a vector with a length of N and write it into the shift register.
  • 3. Conduct transmission and compression in parallel. Specifically, the sequence is pushed to the output unit of every pixel in the block to control the output signal. In this step, “0” means “no output” and “1” means “output”.
  • 4. Read out the compressed voltage output with the multiplexer. The output is then converted to a digital value using an analog-to-digital converter (ADC).
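The steps above can be sketched in condensed form as follows; the 4-bit register, seed, and tap positions are illustrative choices rather than the hardware configuration of the sensor, and the sketch builds the circulant rows from a single length-N vector rather than the full M×N sequence:

```python
import numpy as np

def lfsr_bits(length, seed=0b1001, taps=(3, 0)):
    """Step 1: pseudo-random 0/1 sequence from a 4-bit Fibonacci LFSR.

    `taps` are the bit positions XOR-ed to produce the feedback bit.
    """
    state, bits = seed, []
    for _ in range(length):
        bits.append(state & 1)
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1
        state = (state >> 1) | (fb << 3)   # shift in the feedback bit
    return np.array(bits)

def bpcm(m, n, seq=None):
    """Step 2: block partial circulant matrix, m cyclic shifts of a length-n row."""
    row = lfsr_bits(n) if seq is None else seq
    return np.array([np.roll(row, k) for k in range(m)])

phi = bpcm(4, 16)
# steps 3-4: a "1" lets the pixel drive the shared output line, a "0" means no
# output; the multiplexed sums y = phi @ x are then digitized by the ADC.
```

Each row of `phi` corresponds to one compressed read-out of a 16-pixel block, so `m = 4` here would give a transmission CR of 0.25.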

4. Results and discussion

4.1 BCS-ToF experimental system

The BCS-ToF system prototype assembled in our laboratory is illustrated in Fig. 6. The system is composed of a 4-VCSEL board (as shown in Fig. 6(a)) attached to a 12.5 mm C-Mount lens (MVL12WA, Navitar), a DMD (DLP LightCrafter 4500, Texas Instruments), a VIS-NIR doublet relay lens (MAP10100100-B, Thorlabs), and a ToF sensor. A total internal reflection (TIR) prism was aligned carefully to ensure that the infrared light received from the object could be properly beamed to the DMD and then reflected to the ToF sensor. The read-out circuit of the ToF sensor was redesigned to achieve compressive transmission. Unlike the modification of the SR image acquisition process, the modification of the data transmission process is implemented in the IC surrounding the ToF sensor, fabricated on the receiving board, as shown in Fig. 6(b).

Fig. 6. BCS-ToF system prototype

The original resolutions of the DMD and ToF camera were 1140×912 and 320×240, respectively. Considering the effects of lens aberration, we only utilized the central regions of the DMD (512×512 micromirrors) and ToF sensor (128×128 pixels) to conduct the experiments. Therefore, every block of 4×4 micromirrors of the DMD maps to one pixel of the ToF sensor, bringing a 16× resolution improvement in the ideal case.

Owing to multiple sources of errors, the system must be calibrated. We first calibrated the ToF camera based on the method described in our previous work [30,31]. Two types of errors are suppressed: depth-accuracy-related errors (ambient light, temperature, and reflectivity) and flat-field-related errors (integration time, fixed-pattern noise, column ADC variation, and row address variation). Second, we calibrated the BCS-ToF system to ensure exact mapping between the DMD blocks and ToF sensor pixels [32]. We carefully switched on the blocks of the DMD (each block corresponds to an individual pixel of the ToF sensor) one by one and simultaneously recorded the ToF camera output. The calibration matrix C was then acquired and used to calibrate the system; C is a sparse matrix that encodes the exact mapping between the ToF sensor pixels and the DMD blocks.
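The block-by-block calibration can be sketched as follows; `capture` is a hypothetical stand-in for the hardware routine that switches on a single DMD block and returns the flattened sensor frame, and the 10% threshold is an assumed noise cutoff, not a value from the paper:

```python
import numpy as np

def calibrate(capture, n_blocks, n_pixels, threshold=0.1):
    """Build the sparse calibration matrix C mapping DMD blocks to ToF pixels.

    Column j of C holds the sensor response to DMD block j alone;
    responses below `threshold` times the frame maximum are zeroed,
    which keeps C sparse.
    """
    C = np.zeros((n_pixels, n_blocks))
    for j in range(n_blocks):
        response = np.asarray(capture(j), dtype=float).ravel()
        C[:, j] = np.where(response > threshold * response.max(), response, 0.0)
    return C
```

With perfect one-to-one optics, C reduces to a (scaled) permutation matrix; in practice the off-diagonal entries record crosstalk between neighboring blocks.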

4.2 Depth image reconstruction

In SR reconstruction, the quality of the recovered image is the most important indicator. Therefore, we first verified the quality of the reconstruction. In the proposed system, compressed sensing is applied to two processes. Because the two processes present distinct characteristics, we divide the quality verification into two parts, illustrated in Sections 4.2.1 and 4.2.2, respectively. Second, when combining the two processes to constitute the entire system, optimizing the compression ratio (CR) combination to achieve the best performance of the BCS-ToF system becomes critical. Thus, we chose depth accuracy, image quality, and amount of data as the evaluation criteria to evaluate the performance of different CR combinations in Section 4.2.3.

4.2.1 SR image acquisition process

To verify the resolution improvement of the system, USAF 1951 and Siemens star resolution charts were utilized in the experiments. We made two modifications to the charts to make them more visible in the depth image: we removed some of the stripes and added reference plates behind the charts, with a distance of 10 cm between the chart and the plate, as shown in Fig. 7(a) and (f). LR images and reconstructed depth images with different CRs were acquired. The CR is defined as $\alpha = m/n$, where m is the number of images used and n is the total number of images. To state this clearly, the CR in the SR image acquisition process is denoted as CRA, whereas in the data transmission process, it is denoted as CRT. In this section, because only the resolution achievable in the acquisition process is verified, the ADCs maintain the sequential read-out mode in the transmission process.

Fig. 7. Image super-resolution reconstruction of two resolution targets. (a) Visible image of the USAF 1951 resolution chart. (b) LR depth image. (c)-(e) Reconstructed depth images with CRA of 0.125, 0.375 and 0.875. (f)-(j) The corresponding images of the Siemens star resolution chart.

Figure 7(b) and (g) show the LR images that were captured when all the DMD mirrors were switched “on”. In the LR images, most of the details are lost: the horizontal and vertical bars are difficult to distinguish in the box in Fig. 7(b), and the edges of the star are blurry in the box in Fig. 7(g). These details were effectively recovered in the reconstructed depth images with CRA of 0.375 and 0.875, as shown in Fig. 7(d), (e), (i) and (j). When CRA is too low, e.g., 0.125 in Fig. 7(c) and (h), the recovery becomes worse because of insufficient recovery information. Thus, to obtain a high-quality super-resolution depth image, a compression ratio of at least 0.375 is necessary.

To verify its effectiveness for a real scene, two groups of 3D objects were chosen. The first group is a set of plaster castings, including a statue of Alexander, a cube, and a ball, as shown in Fig. 8(a). The second group was a set of ornaments, including a pot of flowers, a pot of grass, and a doll.

Fig. 8. Image super-resolution reconstruction of two 3D object groups. (a) Plaster casting group. (b) LR depth image. (c)-(e) Reconstructed depth images with CRA of 0.125, 0.375 and 0.875. (f)-(j) Corresponding images of the ornaments group.

Figure 8 shows the results of the image super-resolution reconstruction. In the LR images, as shown in Fig. 8(b) and (g), the features of the objects are indistinct. For instance, the nose and ear of the Alexander statue are difficult to distinguish (green box in Fig. 8(b)) and the edges of the leaves are vague (green box in Fig. 8(g)). For the reconstruction with a compression ratio of 0.375, as shown in Fig. 8(d) and (i), the features were recovered to a higher quality. The results for a compression ratio of 0.125, as shown in Fig. 8(c) and (h), show distinct block artifacts, caused by too few recovery images being used. The results with a compression ratio of 0.875, as shown in Fig. 8(e) and (j), show no significant observable improvement in quality compared with a compression ratio of 0.375. Considering the tradeoff between performance and efficiency, a CRA of 0.375 is sufficient for practical applications.

4.2.2 Data transmission process

After investigating super-resolution depth image acquisition, the BCS-SPL was utilized in the transmission process to solve the “data overload” caused by the acquisition. In this section, the ADCs were set to compressive read-out mode to realize compressive transmission. Meanwhile, all DMD mirrors were switched “on” to capture the original images. In other words, CS was not applied in the acquisition process. We conducted experiments similar to those in Section 4.2.1, with the 3D objects. The results are shown in Fig. 9.

Fig. 9. Results of compressed transmission. (a)-(e) Reconstructed images of the plaster casting group with CRT of 0.0625, 0.125, 0.25, 0.5 and 1, respectively. (f)-(j) Reconstructed images of the ornaments group with CRT of 0.0625, 0.125, 0.25, 0.5 and 1, respectively.

Figure 9(a)–(e) are the reconstructed images of the plaster casting group. When CRT is 0.0625, as shown in Fig. 9(a), the objects are difficult to distinguish, which means most of the depth values are completely wrong. This phenomenon significantly decreases with a compression ratio of 0.125, as shown in Fig. 9(b). However, the edges of the objects are still unclear. When CRT reaches 0.25, as shown in Fig. 9(c), the sharpness of the edges as well as the features of the objects are clear and easy to distinguish. Although the image quality is further improved under higher CRTs, as shown in Fig. 9(d) and (e), a compression ratio of 0.25 is sufficient for practical applications. This means there was a 75% data reduction during the transmission process. In other words, the “data overload” induced by SR image acquisition was significantly suppressed. Results for the ornaments (Fig. 9(f)–(j)) show the same trend.

4.2.3 Optimize the CR combination

So far, the applicability of the acquisition and transmission processes has been verified. In practical use, the two processes are combined. Thus, choosing CRA and CRT (i.e., optimizing the CR combination) to achieve the best performance of the BCS-ToF system is critical. In this section, the performance of different CR combinations is evaluated from three aspects: depth accuracy, image quality, and data amount.

Depth accuracy determines the applicability of the ToF system for distance measurement. The extra error caused by the reconstruction process significantly affects depth accuracy. Thus, a series of experiments was conducted to explore the depth accuracy of CR combinations. First, we set up a reflecting plate (with a reflectivity of 80%) at different distances, as shown in Fig. 10(a). In the captured depth images, the central region of the plate was chosen as the region of interest (ROI) and used to acquire the mean value. At each distance, the original images and the reconstructed images with different CR combinations were captured. Reconstruction errors were calculated using $\delta = mea{n_{Reconstruction}} - mea{n_{Origin}}$. Generally, the ToF camera system error is less than 10 mm for a 5 m distance range [33]. If the error introduced by reconstruction is less than 10% (1 mm at 5 m) of the system error, it was considered acceptable in this study.

Fig. 10. Reconstruction results with distances. (a) Verification setup. (b) Variation of reconstruction error with distances under different CR combinations.

Figure 10(b) demonstrates the variation of the reconstruction error with distance. An important observation is that not all CR combinations are acceptable. For instance, the combination CRA = CRT = 0.375 brings a 2.441 mm error at a distance of 5 m. To find the acceptable range of CR combinations, we extracted all the reconstruction results at 5 m.

Figure 11 shows the reconstruction errors for different CR combinations. Figure 11(a) illustrates the variation of the reconstruction error with different CR combinations at a distance of 5 m. The combinations with a reconstruction error of less than 1 mm are highlighted inside the red box. To find the best combination, we take the PSNR as well as the overall compression ratio CRALL (defined as CRALL = CRA × CRT) as the evaluation criteria. Eight points are extracted and labeled P1–P8, as shown in Fig. 11(b). The PSNR and CRALL are then calculated, as shown in Fig. 11(c). P2 (CRA = 0.4375, CRT = 0.8125) has the highest PSNR (36.10), which means that it has the best reconstruction effect. P4 (CRA = 0.5625, CRT = 0.5) has the lowest CRALL (0.2464), which means that it uses the least amount of data. Comparing P2 and P4, the PSNR of P2 (36.10) is only slightly higher than that of P4 (35.88), while the CRALL of P2 (0.3555) is nearly 1.5 times that of P4 (0.2464). Thus, we choose P4 (CRA = 0.5625, CRT = 0.5) as the practical CR combination.

Fig. 11. Reconstruction error for different CR combinations. (a) Variation of reconstruction error with different CR combinations. (b) Points chosen from (a). (c) Comparison of the chosen points.

Finally, to verify the transverse accuracy of the system, another set of experiments was conducted using a piecewise planar object. The object contained five pieces with various geometric shapes, each spaced apart by 0.05 m. The object was set at 2 m, which means the five pieces were located at 1.75, 1.8, 1.85, 1.9, and 1.95 m, as shown in Fig. 12(a)–(c).

Fig. 12. Experiments with the piecewise planar object. (a)–(c) Illustration of the object. (d) The reconstructed depth image in 3-D mode. (e) The transverse distribution along line A-A’. (f) Variations of the absolute errors along line A1-A1’. (g) The transverse distribution in box B. (h) Variations of the errors in box B.

Figure 12(d) shows the reconstructed depth image in 3-D mode. For a quantitative analysis, the depth values were extracted. Figure 12(e) illustrates the transverse distribution along line A-A’. The reconstruction results with the CR combination of CRA = 0.5625, CRT = 0.5 (marked as CRCOMB1) and the LR image are plotted. For comparison, two other combinations, CRA = CRT = 0.25 (marked as CRCOMB2) and CRA = CRT = 0.75 (marked as CRCOMB3), are also plotted. We extracted 100 values along line A-A’ and calculated the absolute errors. As shown in Fig. 12(f), the maximum absolute error of CRCOMB1 is 0.56 mm, which is acceptable in the ToF system. Figure 12(g) illustrates the transverse distribution in box B. Similarly, we extracted the values in the box and calculated the errors, as shown in Fig. 12(h). The errors fluctuate within a range of 0.46 mm, which is lower than 1 mm. In conclusion, we verified the depth accuracy of the chosen CR combination (CRA = 0.5625, CRT = 0.5) in two aspects: the longitudinal error and the transverse error. The results show that the chosen CR combination is acceptable for the BCS-ToF system in practical applications.

The final reconstruction results under the CR combination of CRA = 0.5625 and CRT = 0.5 as well as the original images are shown in Fig. 13. Compared to the original images, the resolution of the reconstruction results was distinctly improved, the details were clearly recovered, and the amount of data was effectively reduced.


Fig. 13. Final reconstruction results. (a) Original image of the plaster casting group. (b) Reconstruction result of the plaster casting group. (c) Original image of the ornaments group. (d) Reconstruction result of the ornaments group.


4.3 Discussion

In Section 4.2, we found that the CR combination of CRA = 0.5625 and CRT = 0.5 provides the best performance for the proposed BCS-ToF system. To quantitatively illustrate the effectiveness of the system, we conducted a contrastive analysis. Several reconstruction methods were chosen for their robustness and speed, including the NESTA algorithm [34], gradient projection for sparse reconstruction (GPSR) [15], and sparsity adaptive matching pursuit (SAMP) [16]. The compression ratio combinations were set to be identical. All the reconstructions were conducted on a computer with an Intel i7-9700F CPU with a clock speed of 3.0 GHz. The object was a reflecting board, and the distance was set to 5 m. The peak signal-to-noise ratio (PSNR), root-mean-square error (RMSE), reconstruction error, and computing time were selected as the evaluation criteria. Among these, PSNR and RMSE are typically used to evaluate the quality of an image [35]. For the depth image, they are defined as

$$RMSE = \sqrt{\frac{1}{MN}\sum\limits_{i = 1}^{M}\sum\limits_{j = 1}^{N}{[{D_R}(i,j) - {D_M}(i,j)]}^2}$$
$$PSNR = 20{\log _{10}}\left( {\frac{{\max \{{{D_M}} \}}}{{RMSE({{D_R},{D_M}} )}}} \right)$$
where ${D_R}$ is the reference depth data, ${D_M}$ is the measured depth data, and M and N are the numbers of pixels in a column and a row, respectively. The comparison results are presented in Table 2.
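The two metrics above translate directly into code. A minimal sketch (function names are ours; the peak value follows the paper's convention of using the maximum of the measured depth map):

```python
import numpy as np

def depth_rmse(d_ref, d_meas):
    """Root-mean-square error between reference and measured depth maps."""
    d_ref = np.asarray(d_ref, dtype=float)
    d_meas = np.asarray(d_meas, dtype=float)
    return np.sqrt(np.mean((d_ref - d_meas) ** 2))

def depth_psnr(d_ref, d_meas):
    """PSNR of a depth map, with max(d_meas) as the peak signal value."""
    return 20.0 * np.log10(np.max(d_meas) / depth_rmse(d_ref, d_meas))
```

For example, a uniform 1 mm bias on a 2 m plane gives an RMSE of exactly 1 mm and a correspondingly high PSNR.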


Table 2. Comparison of revised BCS-SPL with other algorithms

Among the four algorithms, SAMP and GPSR yielded reconstruction errors of 1.51 mm and 1.34 mm, respectively, which were not acceptable. Although NESTA achieved a slightly higher PSNR (by 0.37 dB) and a slightly lower RMSE (by 0.26 mm), its computing time (135.4 s) was much higher than that of the revised BCS-SPL. Meanwhile, the revised BCS-SPL exhibited the best efficiency among all methods. Thus, the revised BCS-SPL is suitable for achieving high-quality, high-speed reconstruction in the BCS-ToF system.

In contrast with the existing compressive ToF system, the proposed framework significantly improves efficiency. Several experiments were conducted for comparison. During the experiments, the usable area of the ToF sensor and the resolution improvement factor were the same as those in [14] (186 × 200 usable area, 4× resolution improvement). The results are shown in Table 3. The system significantly reduced the reconstruction time from 30–120 s to 3.4 s. Owing to factors such as synchronization and the data read-out process, the total wall time is inevitably higher than the reconstruction time. In this study, we re-designed the data read-out mode to achieve parallel transmission, as described in Section 3.3.2. Additionally, we utilized a controller to ensure synchronization between the projection of the DMD and the capture of the ToF camera. As a result, the total wall time was approximately 5 s.


Table 3. Comparison of the proposed framework with the existing method

Subsequently, to illustrate the reduction in data requirements, a calculation of the data volume is conducted. We assume that the data compression ratio in the acquisition process is cA = s × m/g, where m is the number of reconstructed images, s = 128 × 128 = 16384 pixels (P) is the number of pixels used in the ToF sensor, and g = 512 × 512 = 262144 P is the usable DMD spatial resolution. Nine images were used to achieve high-quality reconstruction, which means m = 9. Thus, cA = 16384 × 9 / 262144 = 56.25%. In the transmission process, the compression ratio cT is 50%, as illustrated in Section 4.2.2. Thus, the overall compression ratio is call = cA·cT = 0.5625 × 0.5 = 28.125%. This indicates that approximately one quarter of the data is sufficient to reconstruct a high-quality 16× super-resolution image. Note that a larger block size (e.g., 8 × 8 or larger) can utilize more pixels on the DMD for reconstruction, resulting in a higher compression ratio. This will be explored in future research, as it is outside the scope of this study. Similarly, for a fair comparison, we performed the calculation as conducted in [14]. Theoretically, the maximum reconstruction resolution is limited by the physical resolution of the DMD, which means gtheo = 1140 × 912 = 1039680 P. Thus, cA-theo = 16384 × 9 / 1039680 = 14.18%, and call-theo = cA-theo·cT = 0.1418 × 0.5 = 7.09%. The amount of data required to reconstruct a high-quality image is therefore less than one third of that in [14]. In addition, the ToF camera has a maximum framerate of f = 150 fps. When using m = 9 images to recover one high-quality super-resolution image, the system's theoretical video frame rate is fSys = f/m = 16.67 fps. Note that although the transmission process further compresses the data, it does not affect fSys.
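The data-budget arithmetic above can be checked with a few lines; the variable names are ours, and the constants are taken directly from the text:

```python
# Data-budget check for the BCS-ToF system (values from the text).
s = 128 * 128          # ToF sensor pixels used
g = 512 * 512          # usable DMD spatial resolution
m = 9                  # images per reconstruction
c_A = s * m / g        # acquisition compression ratio -> 0.5625
c_T = 0.5              # transmission compression ratio (Section 4.2.2)
c_all = c_A * c_T      # overall compression ratio -> 0.28125

f = 150                # ToF camera maximum framerate (fps)
f_sys = f / m          # theoretical video frame rate -> ~16.67 fps

# Theoretical limit using the full DMD resolution, as in [14].
g_theo = 1140 * 912
c_A_theo = s * m / g_theo          # -> ~0.1418
c_all_theo = c_A_theo * c_T        # -> ~0.0709
```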

Several limitations owing to the specifics of the compressive imaging system should be discussed. (1) Pixel crosstalk. This predominantly occurs for two reasons: the inexact alignment between the DMD and the ToF sensor [36], and lens distortion. Although we used the calibration matrix to correct it, the phenomenon remains non-negligible, especially when the block size is larger than 8 × 8. Improved optimization of the measurement matrix or of the lens design may help solve this problem. (2) Light waste. During the SR image acquisition process, part of the infrared light is reflected into the air and wasted because the DMD is a binary micromirror array. In future research, we aim to design measurement matrices with greater light utilization.

5. Conclusion

Existing compressive ToF systems for achieving SR suffer from low efficiency and high data storage requirements. In this paper, we propose a fast and lightweight compressive ToF framework for SR. The BCS method, which offers higher efficiency and lower implementation cost than previous methods, is introduced into the SR image acquisition and data transmission processes.

To verify the applicability of the proposed BCS-ToF framework, a BCS-ToF system was established. Several experiments were conducted to verify its resolution improvement, image quality, and depth accuracy. The results show that a CR combination of CRA = 0.5625 and CRT = 0.5 achieves the best performance, yielding a 16× resolution improvement, high image quality, and a reconstruction depth error of less than 1 mm. Additionally, we conducted a comparison with previous methods. The revised BCS-SPL method significantly improves efficiency compared with several common methods (from 65.7 to 7.8 s) while reducing the reconstruction error (from 0.94 to 0.66 mm). Compared to existing compressive ToF systems, the proposed BCS-ToF system achieves a distinct improvement in efficiency (reconstruction time reduced from 30–120 s to 3.4 s) and reduces the data storage requirement to approximately one third of that previously required.

We believe that this study provides a potential direction for compressive ToF imaging and effective guidance for researchers realizing highly efficient and lightweight SR image reconstruction. Future work will focus on improving the feasibility of the proposed solution: for instance, more efficient measurement matrices and reconstruction methods will be explored, the system structure will be optimized, and deep-learning-based compressed sensing methods will be investigated.

Funding

National Defense Basic Scientific Research Program of China (JCKY2019203B035).

Disclosures

The authors declare no conflicts of interest.

Data availability

The data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. S. Wheaton, A. Bonakdar, I. H. Nia, C. L. Tan, and H. Mohseni, “Open architecture time of flight 3D SWIR camera operating at 150 MHz modulation frequency,” Opt. Express 25(16), 19291–19297 (2017). [CrossRef]  

2. F. Heide, L. Xiao, A. Kolb, M. B. Hullin, and W. Heidrich, “Imaging in scattering media using correlation image sensors and sparse convolutional coding,” Opt. Express 22(21), 26338–26350 (2014). [CrossRef]  

3. E. L. Francois, A. Griffiths, J. Mckendry, C. Haochang, and M. Strain, “Combining Time of Flight and Photometric Stereo Imaging for 3D Reconstruction of Discontinuous Scenes,” Opt. Lett. 46(15), 3612–3615 (2021). [CrossRef]  

4. R. Holger, “Experimental and Theoretical Investigation of Correlating TOF-Camera Systems,” University of Heidelberg, Germany (2007).

5. C. Guan, L. Hassebrook, and D. Lau, “Composite structured light pattern for three-dimensional video,” Opt. Express 11(5), 406–417 (2003). [CrossRef]  

6. M. Hasler, T. Haist, and W. Osten, “Stereo vision in spatial-light-modulator-based microscopy,” Opt. Lett. 37(12), 2238–2240 (2012). [CrossRef]  

7. R. Whyte, L. Streeter, M. J. Cree, and A. A. Dorrington, “Application of lidar techniques to time-of-flight range imaging,” Appl. Opt. 54(33), 9654–9664 (2015). [CrossRef]  

8. Y. Konno, M. Tanaka, M. Okutomi, Y. Yanagawa, and M. Kawade, “Depth map upsampling by self-guided residual interpolation,” in IEEE 23rd International Conference on Pattern Recognition (ICPR), (IEEE, Cancun, Mexico, 2016), pp. 1394–1399.

9. J. Xie, R.S. Feris, and M.T. Sun, “Edge guided single depth image super resolution,” in IEEE International Conference on Image Processing, (IEEE, Paris, France, 2014), pp. 3773–3777.

10. C. Dong, C. C. Loy, K. He, and X. Tang, “Image Super-Resolution Using Deep Convolutional Networks,” IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 295–307 (2015).

11. F. Li, P. Ruiz, O. Cossairt, and A.K. Katsaggelos, “Multi-frame Super-resolution for Time-of-flight Imaging,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (IEEE, Brighton, UK, 2019), pp. 2327–2331.

12. A. Kirmani, C. Andrea, F. N. C. Wong, and V. K. Goyal, “Exploiting sparsity in time-of-flight range acquisition using a single time-resolved sensor,” Opt. Express 19(22), 21485–21507 (2011). [CrossRef]  

13. A. Kadambi and P.T. Boufounos, “Coded aperture compressive 3-D LIDAR,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (IEEE, South Brisbane, Australia, 2015), pp. 1166–1170.

14. F. Li, H. Chen, A. Pediredla, C. Yeh, K. He, A. Veeraraghavan, and O. Cossairt, “CS-ToF: High-resolution compressive time-of-flight imaging,” Opt. Express 25(25), 31096–31110 (2017). [CrossRef]  

15. M. Figueiredo, R. D. Nowak, and S. J. Wright, “Gradient Projection for Sparse Reconstruction: Application to Compressed Sensing and Other Inverse Problems,” IEEE J. Sel. Top. Signal Process. 1(4), 586–597 (2007). [CrossRef]  

16. T. T. Do, L. Gan, N. Nguyen, and T. D. Tran, “Sparsity adaptive matching pursuit algorithm for practical compressed sensing,” in 2008 42nd Asilomar Conference on Signals, Systems and Computers, (IEEE, Pacific Grove, CA, USA, 2008), pp. 581–587.

17. J. Haupt and R. Nowak, “Signal reconstruction from noisy random projections,” IEEE Trans. Inf. Theory 52(9), 4036–4048 (2006). [CrossRef]  

18. L. Gan, “Block Compressed Sensing of Natural Images,” in 2007 15th International Conference on Digital Signal Processing, (IEEE, Cardiff, UK, 2007), pp. 403–406.

19. A. Akbari, M. Trocan, and B. Granado, “Residual based Compressed Sensing Recovery using Sparse Representations over a Trained Dictionary,” in 11th International ITG Conference on Systems, Communications and Coding, (IEEE, Hamburg, Germany, 2017), pp. 1–6.

20. M. Trocan, T. Maugey, J.E. Fowler, B. Pesquetpopescu, and T. Paristech, “Disparity-compensated compressed-sensing reconstruction for multiview images,” in IEEE International Conference on Multimedia and Expo, (IEEE, Singapore, 2010), pp. 1225–1229.

21. J.E. Fowler, S. Mun, and E.W. Tramel, “Multiscale block compressed sensing with smoothed projected Landweber reconstruction,” in 19th European Signal Processing Conference, (IEEE, Barcelona, Spain, 2011), pp. 564–568.

22. A. Akbari and M. Trocan, “Sparse recovery-based error concealment for multiview images,” in International Workshop on Computational Intelligence for Multimedia Understanding (IWCIM), (IEEE, Prague, Czech Republic, 2015), pp. 1–5.

23. C. Mingbo, H. Xinxin, X. Yang, X. Huaming, and W. Yihui, “Micro two-dimensional slit-array for super resolution beyond pixel Nyquist limits in grating spectrometer,” Opt. Express 29(9), 13669–13680 (2021). [CrossRef]  

24. Z. Yao, C. Qian, S. Xiubao, and G. Hang, “Super Resolution Imaging Based on a Dynamic Single Pixel Camera,” IEEE Photonics J. 9(2), 1–11 (2017). [CrossRef]  

25. S. L. Shishkin, “Fast and Robust Compressive Sensing Method Using Mixed Hadamard Sensing Matrix,” IEEE J. Emerg. Sel. Topics Circuits Syst. 2(3), 353–361 (2012). [CrossRef]  

26. L. Jacques, P. Vandergheynst, A. Bibet, V. Majidzadeh, A. Schmid, and Y. Leblebici, “CMOS compressed imaging by Random Convolution,” in International Conference on Acoustics, Speech and Signal Processing (ICASSP), (IEEE, Taipei, Taiwan, 2009), pp. 1113–1116.

27. S. Antholzer, C. Wolf, M. Sandbichler, M. Dielacher, and M. Haltmeier, “Compressive Time-of-Flight 3D Imaging Using Block-Structured Sensing Matrices,” Inverse Probl. 35(4), 045004 (2019). [CrossRef]  

28. H. Rauhut, J. Romberg, and J. A. Tropp, “Restricted isometries for partial random circulant matrices,” Appl. Comput. Harmon. Anal. 32(2), 242–254 (2012). [CrossRef]  

29. D. Lee, T. Sasaki, T. Yamada, K. Akabane, and K. Uehara, “Spectrum Sensing for Networked System Using 1-Bit Compressed Sensing with Partial Random Circulant Measurement Matrices,” in IEEE 75th Vehicular Technology Conference (VTC Spring), (IEEE, Yokohama, Japan, 2012), pp. 1–5.

30. X. Q. Wang, P. Song, and W. Y. Zhang, “An Improved Calibration Method for Photonic Mixer Device Solid-State Array Lidars Based on Electrical Analog Delay,” Sensors 20(24), 7329 (2020). [CrossRef]  

31. X. Q. Wang, P. Song, W. Y. Zhang, Y. J. Bai, and Z. L. Zheng, “A systematic non-uniformity correction method for correlation-based ToF imaging,” Opt. Express 30(2), 1907–1924 (2022). [CrossRef]  

32. A. Mahalanobis, R. Shilling, R. Murphy, and R. Muise, “Recent results of medium wave infrared compressed sensing,” Appl. Opt. 53(34), 8060 (2014). [CrossRef]  

33. https://thinklucid.com/product/helios-time-of-flight-imx556/

34. S. Becker, J. Bobin, and E. J. Candès, “NESTA: A Fast and Accurate First-Order Method for Sparse Recovery,” SIAM J. Imaging Sci. 4(1), 1–39 (2011). [CrossRef]  

35. A. M. Eskicioglu and P. S. Fisher, “Image quality measures and their performance,” IEEE Trans. Commun. 43(12), 2959–2965 (1995). [CrossRef]  

36. J. P. Dumas, M. A. Lodhi, W. U. Bajwa, and M. C. Pierce, “Computational imaging with a highly parallel image-plane-coded architecture: Challenges and solutions,” Opt. Express 24(6), 6145–6155 (2016). [CrossRef]  
