
ProDebNet: projector deblurring using a convolutional neural network


Abstract

Projection blur can occur in practical use cases that have non-planar and/or multi-projection display surfaces with various scattering characteristics because the surface often causes defocus and subsurface scattering. To address this issue, we propose ProDebNet, an end-to-end real-time projection deblurring network that synthesizes a projection image to minimize projection blur. The proposed method generates a projection image without explicitly estimating any geometry or scattering characteristics of the projection screen, which makes real-time processing possible. In addition, ProDebNet does not require real captured images for training data; we design a “pseudo-projected” synthetic dataset that is well-generalized to real-world blur data. Experimental results demonstrate that the proposed ProDebNet compensates for two dominant types of projection blur, i.e., defocus blur and subsurface blur, significantly faster than the baseline method, even in a real-projection scene.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Digital image projection, a technique by which virtual information is projected onto a real environment, has many potential applications, such as augmenting a physical environment [1–3], diminishing information from a real environment [4,5], and projecting an interactive display onto a real environment [6]. For the best performance, all these applications require the projected result to be in focus. Efficient computation of the generated projection images and the ability to project onto non-planar or multiple surfaces are also important factors for many practical applications. This paper presents ProDebNet, a method to generate projection images that minimize projection blur in real time via an end-to-end convolutional neural network (CNN).

In most cases, projection screens are not optimal; they are rarely planar, not always a single surface, and often have significant scattering characteristics. Non-planar screens can cause a non-uniform spatial blur in the projected result, primarily due to defocus blur. Subsurface scattering also makes projection images less sharp.

Aiming to minimize blurring in the projected result, many studies have investigated ways to computationally manipulate the projection image [7–10]. The majority of these approaches utilize the fact that projection blur can be modeled by a convolution between the original projection image and the point spread function (PSF) [11]. Blur minimization using this model requires PSF estimation and solving the inverse problem using the estimated PSF. As shown in Table 1, all conventional methods impose constraints on these two procedures that reduce real-time capability and/or image quality. Some approaches use coded projection patterns to estimate the PSF [7–9], which is not feasible for real-time applications. Other methods use the Wiener filter to solve the inverse problem [8,10]. However, due to the limited dynamic range of projectors, the Wiener filter generally causes ringing artifacts in the projection image, which degrade image quality. To avoid such degradation, Zhang and Nayar solve a constrained optimization problem for every single pixel [7], which is computationally expensive.


Table 1. Comparison of compensating image generation algorithms for projection blur

Differing from these conventional methods, this paper adopts a data-driven approach with an end-to-end CNN. We train ProDebNet to generate a projection image that compensates for projection blur while representing the desired information. In recent years, CNN-based image enhancement has achieved remarkable success, e.g., super-resolution [12–14], deblurring [15,16], inpainting [17], and denoising [18]. CNN-based per-pixel projector radiometric compensation methods have also been proposed [19–21]. However, to the best of our knowledge, ProDebNet is the first CNN-based image generation method to minimize projection blur. Given a single projected result and a target image, ProDebNet estimates a projection image that minimizes projection blur (Fig. 1). The advantage of the proposed approach is its feasibility for real-time applications. We do not explicitly estimate the PSF for each pixel; therefore, coded projection is not required.


Fig. 1. Projection blur compensation via ProDebNet with the simulated projection scene that has non-uniform distance from the virtual projector. Given (a) a target image and (b) an initial projected result of the target image, ProDebNet generates (c) a projection image that minimizes projection blur that occurred in (d) a projected result (see Section 2.3 for synthetic data generation condition).


In addition, we design a data synthesis and augmentation strategy with a simulated projection environment. Effective data-driven network training relies on the availability of a large-scale dataset. In projection contexts, generating large-scale datasets relies on real devices and is therefore often impractical and time-consuming. In addition, such datasets often lack a variety of scenes. Rather than using real captured data, we simulate blurred projected results on various projection surfaces and use these simulated results to train ProDebNet.

The primary contributions of this study can be summarized as follows.

  • To the best of our knowledge, we are the first to propose an end-to-end projection image generation pipeline that minimizes projection blur.
  • We present ProDebNet, a projection image generation network that compensates for projection blur. Since ProDebNet does not explicitly estimate the PSF to generate a projection image, it enables fast projection image generation without coded pattern projection.
  • We describe a method for generating pseudo (i.e., synthetic) blurred/deblurred projected results. We propose several critical processes for augmenting the training data to help the trained network generalize better to real-world blur data.
  • We conduct extensive experiments with both synthetic and real captured data and show that the proposed method outperforms the baseline method. We also show that ProDebNet compensates for both defocus blur and subsurface scattering blur.

2. Methodology

There are two key ideas behind ProDebNet. First, toward real-time and high-quality projection image generation that minimizes projection blur without any coded pattern projection, we propose an end-to-end CNN-based network pipeline (Section 2.2). Second, to make the proposed model applicable to real-projection scenes, we synthesize “pseudo-projected” scenes as training data that replicate the visual characteristics of real projection (Section 2.3).

2.1 Projection image generation using explicitly estimated PSF

Before we explain our CNN-based projection image generation, this section first introduces the general problem setting of projection image generation. Then, we explain the conventional way of projection image generation that minimizes projection blur.

Given a target image of the projected scene ${I_{trg}}$, the goal of projector deblurring is to compute an optimal projection image ${I^{*}_{pro}}$ that minimizes projection blur and yields a projected result ${I^{*}_{res}}$ that is as similar as possible to ${I_{trg}}$. Without any blur compensation strategy, $I_{res}$ includes projection blur in many cases. This projection blur can be modeled using a PSF as follows:

$${I^{*}_{res}} = f \otimes {I^{*}_{pro}} + n,$$
where $\otimes$ is the convolutional operator, $f$ represents the PSF, and $n$ represents projection noise. Here, if the PSF is known or estimated, we can inversely solve Eq. (1) to generate ${I^{*}_{pro}}$ as follows:
$${I^{*}_{pro}} = f^{-1}\otimes({I_{trg}}-{n}).$$
This is the computational model of projection image generation in conventional projector deblurring techniques.
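As a concrete illustration of this conventional pipeline, the following is a minimal sketch of Wiener-filter pre-conditioning in the spirit of Eq. (2). The function name, the noise-to-signal parameter `nsr`, and the final clipping step are illustrative assumptions, not the implementation used in [8,10].

```python
import numpy as np

def wiener_precondition(I_trg: np.ndarray, psf: np.ndarray, nsr: float = 0.01) -> np.ndarray:
    """Pre-condition the target image: I_pro ~ f^{-1} (x) (I_trg - n), Wiener-regularized."""
    # Pad the PSF to the image size and center it at the origin for FFT-based convolution.
    f = np.zeros_like(I_trg, dtype=np.float64)
    f[:psf.shape[0], :psf.shape[1]] = psf
    f = np.roll(f, (-(psf.shape[0] // 2), -(psf.shape[1] // 2)), axis=(0, 1))

    F = np.fft.fft2(f)
    G = np.fft.fft2(I_trg.astype(np.float64))
    # Wiener inverse filter: conj(F) / (|F|^2 + NSR).
    H = np.conj(F) / (np.abs(F) ** 2 + nsr)
    I_pro = np.real(np.fft.ifft2(H * G))
    # Clip to the projector's displayable range; this clipping is one source of
    # the ringing artifacts discussed in the Introduction.
    return np.clip(I_pro, 0.0, 1.0)
```

In practice, the PSF varies spatially over a non-planar surface, so this filter must be estimated and applied per region, which is one of the constraints the proposed learning-based approach avoids.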

2.2 CNN-based projection image generation network

In this section, we introduce ProDebNet. Similar to the conventional approaches described in the previous section, our goal is to generate an optimal projection image ${I^{*}_{pro}}$. However, rather than solving Eq. (2), which uses an explicitly estimated PSF, the proposed method takes a learning-based approach. Since explicit PSF estimation requires coded pattern projection in addition to displaying the desired image, conventional approaches limit the user experience. ProDebNet avoids this step by leveraging a CNN-based pipeline that implicitly estimates the PSF.

As shown in Fig. 2, given a target image ${I_{trg}}$ (Fig. 2(a)) and an observed initial projected result $I_{res}$ (Fig. 2(b)), ProDebNet outputs a projection image $\hat {I}_{pro}$ (Fig. 2(d)). We do not measure any geometric information directly. Instead, we observe $I_{res}$ to determine the amount of initial projection blur that occurs without any blur minimization strategy, which implicitly reveals the geometry of the projection surface. This implicit geometry understanding can be considered to include estimating the surface geometry and the PSF, i.e., how much each image region is blurred. Note that, owing to this data-driven approach, ProDebNet has the potential to minimize both defocus blur and subsurface scattering blur, the two main causes of projection blur. Controlling these two factors is possible because both types of blur can be represented mathematically by the same model.


Fig. 2. Proposed ProDebNet architecture. The network consists of two passes: (top) a sub-pass that extracts target image features and feeds them to the main pass to help reconstruct a sharper projection image, and (bottom) the main pass that generates a projection image that reduces projection blur. Both generators share the same network architecture, which has 13 layers with consecutive convolutional layers with small convolution kernels of $3\times 3$ pixels.


Inspired by the success of U-Net [22], ProDebNet comprises a contracting path to capture context and a symmetric expanding path that enables the network to keep localized features. There are two image generators in the network, as shown in Fig. 2. The bottom one generates a projection image $\hat {I}_{pro}$ (Fig. 2(d)), which is our output. The top one reconstructs a target image $\hat {I}_{trg}$ (Fig. 2(c)). This generator is introduced to pass the localized features of ${I_{trg}}$ to each layer of the $\hat {I}_{pro}$ generator, helping it generate a sharper projection image. The generators have the same network architecture, i.e., 13 layers with consecutive convolutional layers with small convolution kernels of $3\times 3$ pixels. The resolution of each input and reconstructed image is $256\times 256$ pixels.
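As a rough illustration of one generator branch, the following PyTorch sketch follows the description above (a U-Net-style contracting/expanding structure with $3\times 3$ kernels on $256\times 256$ inputs, 11 convolutional plus 2 transposed-convolutional layers, roughly matching the reported 13 layers). The channel widths, normalization, downsampling depth, and output activation are our assumptions, and the cross-branch feature injection from the sub-pass into the main pass is omitted for brevity.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # One 3x3 convolutional layer with normalization and ReLU (assumed details).
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class GeneratorBranch(nn.Module):
    def __init__(self, in_ch=1, base=32):
        super().__init__()
        # Contracting path (captures context).
        self.enc1 = nn.Sequential(conv_block(in_ch, base), conv_block(base, base))
        self.enc2 = nn.Sequential(conv_block(base, 2 * base), conv_block(2 * base, 2 * base))
        self.enc3 = nn.Sequential(conv_block(2 * base, 4 * base), conv_block(4 * base, 4 * base))
        self.pool = nn.MaxPool2d(2)
        # Expanding path (recovers localized features via skip connections).
        self.up2 = nn.ConvTranspose2d(4 * base, 2 * base, kernel_size=2, stride=2)
        self.dec2 = nn.Sequential(conv_block(4 * base, 2 * base), conv_block(2 * base, 2 * base))
        self.up1 = nn.ConvTranspose2d(2 * base, base, kernel_size=2, stride=2)
        self.dec1 = nn.Sequential(conv_block(2 * base, base), conv_block(base, base))
        self.out = nn.Conv2d(base, 1, kernel_size=3, padding=1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(e3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return torch.sigmoid(self.out(d1))
```

A hypothetical main pass would take the target image and the initial projected result concatenated along the channel dimension (e.g., `GeneratorBranch(in_ch=2)`) and additionally receive the sub-pass features at each layer, as described above.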

With the variable $\theta$ that contains all trainable parameters, the training objective for ProDebNet takes the following form:

$$\mathcal{L}(\theta) = \mathcal{L}_{A}(\theta) + \lambda \mathcal{L}_{SSIM}(\theta),$$
where $\mathcal {L}$ denotes the loss function, and $\mathcal {L}_{A}$ and $\mathcal {L}_{SSIM}$ are sub-loss functions. $\mathcal {L}_{A}$ measures the differences between the target image ${I_{trg}}$ (Fig. 2(a)) and the projected result $\hat {I}_{res}$ (Fig. 2(e)). $\hat {I}_{res}$ is generated from the network output $\hat {I}_{pro}$; we explain the detailed procedure in Section 2.3. Note that generating $\hat {I}_{res}$ is required only for training. $\mathcal {L}_{SSIM}$ computes the structural similarity (SSIM) [23] loss between the input and reconstructed target image (Figs. 2(a) and (c)).

As with other image generation networks [19,24], the choice of $\mathcal {L}_{A}$ defines the properties of the trained network. For image generation, popular choices include SSIM [23], the $\mathcal {\ell }_{1}$ norm of the differences [25], and generative adversarial network (GAN) content loss functions [26]. We test and analyze these candidates to find an optimal $\mathcal {L}_{A}$ (Section 3.2).
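To make the objective in Eq. (3) concrete, the following is a minimal sketch assuming the choice $\mathcal {L}_{A} = \mathcal {L}_{SSIM}$ that Section 3.2 arrives at. The `ssim` helper is a simplified global-statistics variant used only for illustration (the paper uses the windowed SSIM of [23]), and the weight `lam` is a placeholder since the value of $\lambda$ is not reported.

```python
import torch

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Simplified SSIM using global image statistics (no sliding Gaussian window).
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )

def prodebnet_loss(I_trg, I_res_hat, I_trg_hat, lam=1.0):
    # L_A: dissimilarity between the target image and the (pseudo-)projected
    # result of the generated projection image.
    loss_a = 1.0 - ssim(I_res_hat, I_trg)
    # L_SSIM: dissimilarity between the input and the target image reconstructed
    # by the sub-pass generator.
    loss_ssim = 1.0 - ssim(I_trg_hat, I_trg)
    return loss_a + lam * loss_ssim
```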

2.3 Training data generation: pseudo-projected image synthesis

This section proposes a procedure for synthesizing pseudo-projected images that are used to train ProDebNet. Using pseudo-projected results has several advantages. First, the size of the dataset is bounded only by computation as it does not require manual projection scene settings. Second, diversity in the dataset can be increased by varying scene parameters, such as the shape or depth of the screen.

To mimic the actual projected result, we apply three processes: (1) convolution with Gaussian filters with different standard deviations, (2) intensity attenuation based on inverse-square law of projection light, and (3) adding non-uniform spatial noise (Fig. 3).


Fig. 3. Pseudo-projection design. Given (a) a target image, we generate (b) a target image filtered with a Gaussian distribution defined by the reference map $\mathcal {R}_{g}$ to mimic projection blur. Then, we decrease the intensity of that image depending on the reference map $\mathcal {R}_{i}$ and add Gaussian noise to generate (c) the pseudo-projected result.


First, to simulate projection onto any non-planar surface, the input image ${I_{trg}}$ (Fig. 3(a)) is spatially non-uniformly blurred by convolving it with $\mathcal {G}(\mathcal {R}_{g})$, a Gaussian distribution $\mathcal {G}(\cdot )$ whose standard deviation is given by the reference map $\mathcal {R}_{g}$. The reference map $\mathcal {R}_{g}$ represents the standard deviations of the Gaussian distribution based on the simulated pixel-wise distance from the focal plane of the virtual projector, defining how much the target image should be blurred. Given a depth map $\mathcal {D}$ with the value range $[0, 255]$, $\mathcal {R}_{g}$ is expressed as follows:

$$\mathcal{R}_{g}(i,j) = \lambda_{g}|\mathcal{D}(i,j) - h|,$$
where $h$ takes a random value between 0 and 255 and $\lambda _{g}$ is a constant that adjusts the dynamic range of $\mathcal {R}_{g}$. The random value $h$, which represents the focal distance of the virtual projector, is introduced to generate diverse depth-based reference maps. Second, to represent light attenuation depending on the distance from the virtual projector, we model another reference map $\mathcal {R}_{i}$ from the depth map $\mathcal {D}$ as follows:
$${\mathcal{R}_{i}(i,j) = \lambda_{i} (\mathcal{D}(i,j)/255)+ \alpha,}$$
where $\lambda _{i}$ is a constant that adjusts the dynamic range of $\mathcal {R}_{i}$, and $\alpha$ is an offset that prevents the attenuation factor from becoming zero. We model $\mathcal {R}_{i}$ from $\mathcal {D}$ to represent signal attenuation: it takes larger values when the corresponding depth map has larger (deeper) values. Finally, we add non-uniform spatial noise $N_{i,j}$, modeled as Gaussian noise, to mimic a real projected scene. The generation of the pseudo-projected result ${I_{res}}$ from a target image ${I_{trg}}$ via the pseudo-projection function $P(\cdot )$ is then expressed as follows:
$$ {I_{res}}(i,j)= P({I_{trg}}(i,j)) $$
$$=\frac{1}{{\mathcal{R}_{i}(i,j)}^{2}}\{\mathcal{G}(\mathcal{R}_{g}(i,j))\otimes {I_{trg}}(i,j)\}+ N_{i,j} $$
In this paper, the value range of $\mathcal {R}_{i}(i,j)$ is $[1.00, 1.15]$, and the value range of $\mathcal {R}_{g}(i,j)$ is $[0.0, 3.0]$.
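The following sketch illustrates the pseudo-projection function $P(\cdot )$ of Eq. (6) under stated assumptions: the spatially varying Gaussian blur is approximated by blurring at a few discrete sigma levels and selecting, per pixel, the nearest level, and the constants $\lambda _{g}$, $\lambda _{i}$, $\alpha$, and the noise standard deviation are placeholders chosen only so that the $\mathcal {R}_{g}$ and $\mathcal {R}_{i}$ ranges roughly match those reported above.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pseudo_project(I_trg, D, h, lam_g=3.0 / 255, lam_i=0.15, alpha=1.0,
                   noise_std=0.01, n_levels=8):
    """Pseudo-projection P(.) of Eq. (6): spatially varying blur, attenuation, noise."""
    img = I_trg.astype(np.float64)
    D = D.astype(np.float64)
    R_g = lam_g * np.abs(D - h)            # Eq. (4): per-pixel blur strength, range ~[0, 3]
    R_i = lam_i * (D / 255.0) + alpha      # Eq. (5): per-pixel attenuation, range ~[1.00, 1.15]

    # Approximate the spatially varying blur G(R_g) (x) I_trg by blurring at a few
    # discrete sigma levels and picking, per pixel, the level closest to R_g(i, j).
    sigmas = np.linspace(R_g.min(), R_g.max(), n_levels)
    stack = np.stack([gaussian_filter(img, s) if s > 0 else img for s in sigmas])
    idx = np.abs(R_g[None, ...] - sigmas[:, None, None]).argmin(axis=0)
    blurred = np.take_along_axis(stack, idx[None, ...], axis=0)[0]

    # Inverse-square intensity attenuation plus non-uniform Gaussian noise N_{i,j}.
    noise = np.random.normal(0.0, noise_std, size=img.shape)
    return blurred / (R_i ** 2) + noise
```

In training, this same function can be applied to the network output $\hat {I}_{pro}$ to obtain the pseudo-projected result $\hat {I}_{res}$ needed by $\mathcal {L}_{A}$, as described in Section 2.2.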

3. Quantitative evaluation with synthetic dataset

3.1 Experimental settings

Dataset. To train the network, we prepared 150M images consisting of 100K target images augmented with 1500 different pseudo-projections defined by depth images. For the test dataset, we use 1000 images (100 target images augmented with 10 pseudo-projections). We randomly selected depth images from the NYU Depth Dataset V2 [27] and followed Eq. (6) to generate the pseudo-projected results.

Implementation details. We used Adam [28] with a learning rate of 1e-3 and momentum parameters $\beta _{1} = 0.9$, $\beta _{2}=0.999$ to optimize the network. Network weights were initialized following a previous study [29]. To train the proposed network, we used a workstation with an 8-core CPU (i7-7820X 3.60 GHz), a single GPU (NVIDIA TITAN Xp), and 64 GB of RAM. Training ProDebNet took 66 minutes. The two baseline methods [7,8] were implemented in C++; the proposed method was implemented in Python. For a fair comparison, all methods were tested in the same CPU-based computational environment (CPU: i7-5960X 3.00 GHz; RAM: 32 GB).
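For reference, a minimal sketch of the reported optimization setup (Adam with a learning rate of 1e-3, $\beta _{1}=0.9$, $\beta _{2}=0.999$, and He initialization [29]); `GeneratorBranch` refers to the illustrative sketch in Section 2.2 and is an assumption, not the authors' released implementation.

```python
import torch
import torch.nn as nn

def init_weights(m):
    # He initialization for convolutional layers, following [29].
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.kaiming_normal_(m.weight, nonlinearity='relu')
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model = GeneratorBranch()          # assumed: the generator sketch from Section 2.2
model.apply(init_weights)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
```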

3.2 Loss functions

As described in Eq. (3), our loss function uses $\mathcal {L}_{A}$, which measures the differences between a target image and the projected result. In this section, we consider the optimal loss function to form $\mathcal {L}_{A}$ since the design of $\mathcal {L}_{A}$ defines the properties of the trained network.

Following two previous studies [19,24], we tested six designs for $\mathcal {L}_{A}$, i.e., (1) $\mathcal {L}_{\ell 1}$, (2) $\mathcal {L}_{SSIM}$, (3) $\mathcal {L}_{\ell 1} + \mathcal {L}_{SSIM}$, (4) $\mathcal {L}_{\ell 1} + \mathcal {L}_{GAN}$, (5) $\mathcal {L}_{SSIM} + \mathcal {L}_{GAN}$, and (6) $\mathcal {L}_{\ell 1} + \mathcal {L}_{SSIM} + \mathcal {L}_{GAN}$. Here, $\mathcal {L}_{\ell 1}$, $\mathcal {L}_{SSIM}$, and $\mathcal {L}_{GAN}$ are loss functions that use the $\mathcal {\ell }_{1}$ norm of the differences [25], SSIM [23], and a GAN content loss function [26], respectively. For the target image, we used ${I_{trg}}$ with the intensity of each pixel reduced uniformly so that it fit within the intensity range of the projected result.

Quantitative and qualitative results are shown in Table 2 and Fig. 4, respectively. As shown in Table 2, the model trained with $\mathcal {L}_{A} = \mathcal {L}_{SSIM}$ achieves the best scores for both PSNR and SSIM. Consistent with the quantitative results, $\mathcal {L}_{SSIM}$ reconstructs the images most similar to the target images among the tested loss functions. Given these quantitative and qualitative results, we use $\mathcal {L}_{A} = \mathcal {L}_{SSIM}$ in the following experiments.


Fig. 4. Qualitative comparison of models trained with different loss functions. The first and the second columns represent the target image and the initial projected result without any blur compensation strategy, respectively. The third to eighth columns show the projected results with generated projection images via ProDebNet trained with different loss functions. As can be seen, images generated using the SSIM loss are the most similar to the target images.



Table 2. Quantitative results with different loss functions

3.3 Comparison with baseline methods

This section compares ProDebNet to a conventional Wiener filter-based technique [8] and a state-of-the-art projection image generation method proposed by Zhang and Nayar [7] as baselines. We evaluated all the methods in terms of computational efficiency and the quality of the projected result. Here, note that since Zhang and Nayar’s method solves the constrained problem iteratively, computational cost and performance are highly dependent on the number of iterations. Following Zhang and Nayar’s paper [7], we used 10 iterations.

Qualitative results are shown in Fig. 5. Without any projection blur minimization strategy, the projected results become quite blurry, as shown in Fig. 5(b). As shown in Figs. 5(c)-(e), the baseline and the proposed methods achieved less blurry projected results. Note particularly the structural line of the pyramid in the top row and the texture of the stone statue in the bottom row.


Fig. 5. Qualitative comparison of proposed method and baseline methods. (a) Target image, (b) initial projected result, (c) projected result with Wiener filter-based baseline method [8], (d) projected result with the state-of-the-art method [7], (e) projected result with the proposed method.


Table 3 provides quantitative results in terms of PSNR, SSIM, and calculation time for projection image generation. Because the state-of-the-art method [7] directly minimizes the intensity differences between the target and the projection image, it provides a projected result whose PSNR is close to the theoretical upper limit. The PSNR of ProDebNet is lower than but comparable to this upper limit, while it is higher than that of the Wiener filter-based method [8]. Regarding SSIM, ProDebNet provides the same SSIM value as the state-of-the-art method, which indicates that the perceptual quality of projection blur compensation is the same for both methods. The SSIM of the Wiener filter-based method is significantly lower than that of the other methods.


Table 3. Quantitative comparison of PSNR, SSIM, and calculation time for the baseline and proposed methods. Note that the results are on synthetic data and all methods were run on the same CPU. Also note that the calculation times of the baseline methods do not include the time required for the computer-vision-based PSF estimation process.

As shown in Table 3, the calculation time required for ProDebNet to generate a projection image is significantly less than that of the state-of-the-art method, i.e., ProDebNet is 300 times faster. On the other hand, the Wiener filter-based method is 5 times faster than the proposed method. Note that the calculation times of the baseline methods do not include the PSF estimation process, which is additionally required in these conventional techniques. Note again that all the calculation times shown in Table 3 were measured on the same CPU.

4. Experimental evaluation with captured images in physical setup

To further showcase the applicability of the proposed method, we also test ProDebNet on real captured data. We empirically compare the performance of ProDebNet with the two baseline methods [7,8] for scenes that include defocus and subsurface scattering blurs. Quantitative evaluations using PSNR and SSIM are not accurate in this experiment because the captured projected results need to be rectified and misalignment in the rectification is inevitable. Therefore, the projected image qualities are evaluated qualitatively. Specifically, the purpose of this experiment is to judge whether ProDebNet can compensate for projection blurs with performance comparable to the baseline methods without requiring dot pattern projection and explicit PSF estimation.

4.1 Experimental setting

Dataset. The dataset used in the previous section was used to train the network. For test data, we used real captured data; we collected target images from the DIV2K dataset validation data [30] and captured actual projected results with the generated projection image.

Experimental environment. We used an ACER DLP projector PSV0808 (resolution: $856 \times 600$ pixels) and a Canon EOS Kiss X4 camera DS126271 (resolution: $2080 \times 1552$ pixels) for projection and capture, respectively. The baseline methods were implemented in C++; the proposed method was implemented in Python. The baseline methods were tested in the same CPU-based computational environment (CPU: i7-5960X 3.00 GHz; RAM: 32 GB). To train the network and generate projection images with ProDebNet, we used an 8-core CPU (i7-7820X 3.60 GHz), a single GPU (NVIDIA TITAN Xp), and 64 GB of RAM.

Projection surfaces. We prepared two different screen materials so that defocus blur and subsurface scattering blur occurred independently. For the defocus blur, we used a projection screen (FirstScreen FSf-130n), which has very low subsurface scattering. We placed the surface diagonal to the projector's optical axis to create spatially varying PSFs (Fig. 6(a)). For the subsurface scattering blur, we prepared a flat screen made of a material (polylactic acid) with strong subsurface scattering. The projector faces the screen frontally so that only subsurface scattering blur occurs (Fig. 6(b)).


Fig. 6. Experimental environments for compensation with (a) defocus blur and (b) subsurface scattering blur. (a) Projection surface is diagonal to the projector such that the surface has non-uniform distance from the projector. (b) Projection surface with strong scattering faces forward toward the projector.


4.2 Results

Figure 7 shows the projected results. The first and second rows show the projection results under defocus blur, and the third and fourth rows show the results under subsurface scattering blur. The spatially varying PSFs were estimated by projecting a 17$\times$17 dot pattern in the baseline methods before computing the projection images.


Fig. 7. Results on an actual projection environment with defocus blur (first and second rows), and subsurface scattering blur (third and fourth rows). (a) Target image, (b) initial projected result, (c) projected result with Wiener filter-based baseline method [8], (d) projected result with the state-of-the-art method [7], (e) projected result with the proposed method.


Due to the defocus and subsurface scattering blurs, the projected results without any compensation method were strongly blurred (Fig. 7(b)). The projected results with the Wiener filter-based method [8] and the state-of-the-art method [7] compensated for the blurs and significantly improved the image quality (Figs. 7(c) and (d)). Figure 7(e) shows that ProDebNet also computed projection images that compensated for the blurs. In this experiment, ProDebNet took 15 ms to generate each projection image and thus achieved real-time compensation (i.e., faster than 60 FPS). The blur compensation performance regarding image quality was, at least subjectively, comparable among the baseline and proposed methods.

5. Discussion and limitation

This paper presents the first end-to-end projection blur compensation technique. The simulation experiment (Section 3) shows that the proposed technique compensates for projection blur with perceptual image quality as high as that of the state-of-the-art technique [7] while significantly reducing the calculation time. Compared to the other baseline method, based on the Wiener filter [8], our network displays projected results with better image quality. Although our network was slightly slower than the Wiener filter-based method, the calculation time in Table 3 does not include the time required to estimate the spatially varying PSFs from a captured dot pattern. This PSF estimation consists of estimating Gaussian function parameters for each dot and spatially interpolating the estimated parameters. Therefore, the actual calculation times might be almost the same for the Wiener filter-based and proposed techniques. In the experiment with the physical setup (Section 4), we confirmed that the proposed method successfully compensates for both defocus and subsurface scattering blurs. We also found that ProDebNet works in real time (i.e., faster than 60 FPS) on a GPU. Therefore, we believe the proposed technique is useful for projector deblurring in terms of both image quality and processing time. In addition, compared to the conventional techniques, ProDebNet does not require unnatural dot pattern projections, which can significantly improve the user experience.

As a first attempt to generate projection images that compensate for projection blur with a data-driven approach, we acknowledge that, as with other existing projection image generation approaches, many open problems remain that could further improve the blur compensation performance of the proposed method.

The first limitation is that the proposed network relies on an initial projected result to generate a projection image for every input target image, even though it does not require any coded projections. In some real-world applications, the physical setup is fixed, and thus the spatially varying PSFs do not change over time. Therefore, once initial projected results are captured for the first few input target images, they should no longer be required for the following target images. Compared to such “static” applications, in “dynamic” projection mapping scenarios where the physical setup is not fixed, the spatially varying PSFs change dynamically, and initial projected results remain continuously necessary throughout the application. We believe our method is useful in such dynamic situations.

The second limitation is that the current network is designed to work only for single-channel (i.e., grayscale) images. Our technique might work for 3 channels (i.e., RGB) by feeding each channel separately into the network to deblur full-color projected results. On the other hand, such a simple extension might fail because there is no guarantee that the original color balance is maintained in the projected result due to the channel-independent operation. Another option would be to separate the luminance and chrominance components of the target image, feed the luminance component (a grayscale image) into the current network, and combine the processed luminance component with the original chrominance component to generate the projection image. This might work because the human visual system has much higher luminance acuity than chrominance acuity [31]. In both cases, further investigation is required, which is beyond the scope of this paper, whose focus is a first attempt to generate blur-compensating projection images using a CNN. Extending this concept to full-color images is an interesting future direction.
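As a purely hypothetical sketch of the luminance/chrominance option discussed above, only the luminance channel would be deblurred by the grayscale network and then recombined with the target's original chrominance; `deblur_net` stands for a trained ProDebNet-style model, and this extension was not evaluated in the paper.

```python
import cv2
import numpy as np

def deblur_color(target_rgb, projected_rgb, deblur_net):
    # Convert both images to a luminance/chrominance representation (YCrCb).
    trg_ycc = cv2.cvtColor(target_rgb, cv2.COLOR_RGB2YCrCb)
    res_ycc = cv2.cvtColor(projected_rgb, cv2.COLOR_RGB2YCrCb)
    # Feed only the luminance (Y) channels to the grayscale network.
    y_pro = deblur_net(trg_ycc[..., 0], res_ycc[..., 0])
    # Recombine the processed luminance with the target's original chrominance.
    out = trg_ycc.copy()
    out[..., 0] = np.clip(y_pro, 0, 255).astype(np.uint8)
    return cv2.cvtColor(out, cv2.COLOR_YCrCb2RGB)
```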

6. Conclusion and future work

In this paper, we proposed ProDebNet, an end-to-end fast projection image generation network that minimizes non-uniform spatial projection blur. We showed that the pseudo-projection image generation we designed successfully trains ProDebNet without requiring real captured data. Even though ProDebNet is trained only with such synthetic data, we empirically demonstrated that the generated projection images generalize effectively to a real-projection environment. We believe our contributions mark a significant step toward real-time dynamic projection mapping without any coded pattern projection.

In this study, we did not consider geometric correction because the proposed method uses a coaxial projector-camera system and therefore does not require projector-camera calibration. In addition, since ProDebNet primarily focuses on projection blur compensation, which is the most critical degradation of the projected result, we did not consider other degradation factors (e.g., ambient light, specular reflection, and lens distortion), similar to other existing methods. However, we are aware that extending the proposed method to consider these factors would increase its applicability to real-world projection environments. In the future, we intend to extend the proposed approach to handle such degradation factors by introducing them into our pseudo-projected result synthesis. Other important future work includes conducting a thorough quantitative comparison between existing deep learning models and ProDebNet to show the necessity of its architecture design for projector deblurring, as well as investigating the efficacy of introducing a pre-trained auto-encoder into our sub-pass network.

Funding

Japan Science and Technology Agency (JPMJPR19J2); Japan Society for the Promotion of Science (JP17H04691).

Disclosures

The authors declare no conflicts of interest.

References

1. A. Bermano, P. Brüschweiler, A. Grundhöfer, D. Iwai, B. Bickel, and M. Gross, “Augmenting physical avatars using projector-based illumination,” ACM Trans. Graph. 32(6), 1–10 (2013). [CrossRef]  

2. P. Punpongsanon, D. Iwai, and K. Sato, “Projection-based visualization of tangential deformation of nonrigid surface by deformation estimation using infrared texture,” Virtual Reality 19(1), 45–56 (2015). [CrossRef]  

3. G. Narita, Y. Watanabe, and M. Ishikawa, “Dynamic projection mapping onto deforming non-rigid surface using deformable dot cluster marker,” IEEE Trans. Vis. Comput. Graph. 23(3), 1235–1248 (2017). [CrossRef]  

4. M. Inami, N. Kawakami, and S. Tachi, “Optical camouflage using retro-reflective projection technology,” in Proceedings of Second IEEE and ACM International Symposium on Mixed and Augmented Reality, (IEEE, 2003), pp. 348–349.

5. T. Yoshida, K. Jo, K. Minamizawa, H. Nii, N. Kawakami, and S. Tachi, “Transparent cockpit: Visual assistance system for vehicle using retro-reflective projection technology,” in Proceedings of IEEE Virtual Reality Conference, (IEEE, 2008), pp. 185–188.

6. C. Harrison, H. Benko, and A. D. Wilson, “Omnitouch: wearable multitouch interaction everywhere,” in Proceedings of ACM symposium on User interface software and technology, (ACM, 2011), pp. 441–450.

7. L. Zhang and S. Nayar, “Projection defocus analysis for scene capture and image display,” ACM Trans. Graph. 25(3), 907–915 (2006). [CrossRef]  

8. M. S. Brown, P. Song, and T.-J. Cham, “Image pre-conditioning for out-of-focus projector blur,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2006), pp. 1956–1963.

9. M. Grosse, G. Wetzstein, A. Grundhöfer, and O. Bimber, “Coded aperture projection,” ACM Trans. Graph. 29(3), 1–12 (2010). [CrossRef]  

10. Y. Oyamada and H. Saito, “Focal pre-correction of projected image for deblurring screen image,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2007), pp. 1–8.

11. P. Favaro and S. Soatto, “A geometric approach to shape from defocus,” IEEE Trans. Pattern Anal. Mach. Intell. 27(3), 406–417 (2005). [CrossRef]  

12. C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 295–307 (2016). [CrossRef]  

13. C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, “Photo-realistic single image super-resolution using a generative adversarial network,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2017), pp. 4681–4690.

14. B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee, “Enhanced deep residual networks for single image super-resolution,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition Workshops, (IEEE, 2017), pp. 136–144.

15. O. Kupyn, V. Budzan, M. Mykhailych, D. Mishkin, and J. Matas, “Deblurgan: Blind motion deblurring using conditional adversarial networks,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2018), pp. 8183–8192.

16. S. Nah, T. Hyun Kim, and K. Mu Lee, “Deep multi-scale convolutional neural network for dynamic scene deblurring,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (2017), pp. 3883–3891.

17. S. Iizuka, E. Simo-Serra, and H. Ishikawa, “Globally and Locally Consistent Image Completion,” ACM Trans. Graph. 36(4), 1–14 (2017). [CrossRef]  

18. H. Zhang, V. Sindagi, and V. M. Patel, “Image de-raining using a conditional generative adversarial network,” IEEE Trans. Circuits Syst. Video Technol. (2019, to be published).

19. B. Huang and H. Ling, “End-to-end projector photometric compensation,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2019), pp. 6810–6819.

20. B. Huang and H. Ling, “Compennet++: End-to-end full projector compensation,” in Proceedings of IEEE International Conference on Computer Vision, (IEEE, 2019), pp. 7165–7174.

21. B. Huang and H. Ling, “Deltra: Deep light transport for projector-camera systems,” https://arxiv.org/abs/2003.03040.

22. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Proceedings of International Conference on Medical Image Computing and Computer-Assisted Intervention, (Springer, 2015), pp. 234–241.

23. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. on Image Process. 13(4), 600–612 (2004). [CrossRef]  

24. P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2017), pp. 5967–5976.

25. K. Hammernik, F. Knoll, D. K. Sodickson, and T. Pock, “L2 or not l2: impact of loss function design for deep learning mri reconstruction,” in Proceedings of International Society of Magnetic Resonance in Medicine, (Wiley Online Library, 2017), p. 0687.

26. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Proceedings of Advances in Neural Information Processing Systems, (Curran Associates, Inc., 2014), pp. 2672–2680.

27. N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor segmentation and support inference from RGBD images,” in Proceedings of European Conference on Computer Vision, (Springer, 2012), pp. 746–760.

28. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” https://arxiv.org/abs/1412.6980.

29. K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification,” in Proceedings of IEEE International Conference on Computer Vision, (IEEE, 2015), pp. 1026–1034.

30. E. Agustsson and R. Timofte, “Ntire 2017 challenge on single image super-resolution: Dataset and study,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition Workshops, (IEEE, 2017), pp. 1122–1131.

31. M. S. Tooms, Colour Reproduction in Electronic Imaging Systems (John Wiley & Sons, Ltd., 2016).
