
DH-GAN: a physics-driven untrained generative adversarial network for holographic imaging

Open Access

Abstract

Digital holography is a 3D imaging technique in which a laser beam with a plane wavefront is emitted toward an object and the intensity of the diffracted wave, called a hologram, is measured. The object's 3D shape can be obtained by numerically analyzing the captured holograms and recovering the incurred phase. Recently, deep learning (DL) methods have been used for more accurate holographic processing. However, most supervised methods require large datasets to train the model, which are rarely available in most DH applications due to the scarcity of samples or privacy concerns. A few one-shot DL-based recovery methods exist with no reliance on large datasets of paired images. Still, most of these methods neglect the underlying physics laws that govern wave propagation. They offer black-box operation, which is not explainable, generalizable, or transferable to other samples and applications. In this work, we propose a new DL architecture based on generative adversarial networks that uses a discriminative network to realize a semantic measure of reconstruction quality while using a generative network as a function approximator to model the inverse of hologram formation. We impose smoothness on the background part of the recovered image using a progressive masking module powered by simulated annealing to enhance the reconstruction quality. The proposed method exhibits high transferability to similar samples, which facilitates its fast deployment in time-sensitive applications without the need to retrain the network from scratch. The results show a considerable improvement over competitor methods in reconstruction quality (about 5 dB PSNR gain) and robustness to noise (about 50% reduction in the PSNR decay rate as noise increases).

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Digital holography (DH) is a commonly used technique for extracting the 3D shape of microscopic objects, something not feasible with regular cameras. This powerful technique is used in various applications, including micro-particle measurement [1,2], biology [3], encryption [4], and visual identification tags [5]. The core idea behind DH is that a laser beam with a plane wavefront experiences diffraction and phase shift when it encounters a microscopic object. The interfering wave intensity, also called a hologram, is captured by a charge-coupled device (CCD) sensor array. The goal of DH is to reconstruct the object's 3D shape by processing the captured holograms [6,7]. In short, if $\mathbf {O}(x,y)$ and $\mathbf {H}(x,y)$ are the object wave and the captured hologram, the goal is to recover $\mathbf {O}(x,y)$ (or equivalently the 3D facet of the sample) from $\mathbf {H}(x,y)$, which involves twin-image removal.

Compared to off-axis holography, digital inline holography (DIH) offers a much simpler hologram recording process, emitting only one beam through the object and processing the diffracted wave. However, it requires more complex numerical methods for phase recovery to deconvolve the spatially overlapping zero-order and cross-correlated holographic terms. Some methods rely on taking multiple images at different positions to enhance the phase recovery performance [8]. In this work, we use transparent inline holography (Fig. 1) with single-shot imaging and numerical reconstruction for its more straightforward design and potential for developing low-cost, compact, and portable readers appropriate for Internet of Things (IoT) and supply chain applications [9], especially for dendritic tags, our custom-designed visual identifiers [10].

Fig. 1. The typical in-line digital holography setups.

1.1 Related work on conventional DH phase recovery

Recently, a physics-driven compressive sensing (CS) based method has been proposed to solve the twin image problem using single-shot imaging [11]. Specifically, the authors observed that the real object wave $\mathbf {R}\mathbf {R}^*\mathbf {O}=\mathbf {O}$ has sharp edges, while the twin virtual image $\mathbf {R}\mathbf {R}\mathbf {O}^*$ is diffused when mapped to their sparse representation. Here, $\mathbf {O}$ and $\mathbf {R}$ represent the object and reference waves, and $*$ is the complex conjugate operator. The total variation (TV) loss is applied to the complex-valued object wave to impose sparsity. Moreover, a two-step iterative shrinkage/thresholding (TwIST) algorithm is used to optimize the objective function $\hat {\mathbf {U}}=\arg \min _{\mathbf {U}}\left \{\frac {1}{2}\|\mathbf {H}-T_f(\mathbf {U})\|_{2}^{2}+\tau \|\mathbf {U}\|_{t v}\right \}$, where $T_f$ is the forward propagator, $\|\cdot \|_2$ is the $\ell_2$ norm, $\|\cdot \|_{t v}$ is the total variation norm, and $\tau$ is a tuning parameter. This method is more efficient than the iterative methods and hence is used as a benchmark in some recent papers [5,12], including our comparative results in this paper. However, it suffers from a few technical issues. For example, imposing explicit sparsity constraints can cause edge distortion. Moreover, the results are sensitive to the choice of $\tau$.

1.2 Related work on deep learning-based DH

Recently, deep learning (DL) methods have been used for computational holography due to their superior performance in many visual computing and image processing tasks [13,14]. In contrast to the conventional phase recovery algorithms that mainly rely on theoretical knowledge and phase propagation models, supervised DL methods often use large-scale datasets for training a black-box model to solve the inverse problem numerically. Therefore, prior knowledge about the propagation model and the system parameters is not necessary to construct DL networks [15].

For example, the authors of [16–22] used different implementations of convolutional neural networks (CNN), taking advantage of the CNN's capability of developing multi-frequency and multi-scale feature maps. They usually customize the network or apply proper regularization terms to reconstruct the object wave from the captured hologram. For instance, [20] proposed a Y-like network with two output heads that can reconstruct intensity and phase information simultaneously. Digital holographic reconstruction is extended to multi-sectional objects in [21]. More recently, spatial Fourier transform modules have been utilized in addition to convolutional layers to better handle spatial information [22]. A generative adversarial network (GAN) is proposed in [23] to generate bright-field microscopy images at different depths, free of artifacts and noise, from the captured hologram. The GAN network learns the statistical distribution of the training samples. Although their one-shot inference was fast, the training time was fairly long, using about 6,000 image pairs (30,000 pairs after data augmentation).

Supervised DL methods, including the aforementioned ones, offer superior performance in phase recovery. Nevertheless, they usually suffer from the obvious drawback of reliance on relatively large datasets for training purposes. For instance, the models in [16,17] require about 10,000 training pairs. This requirement becomes problematic since such huge DH datasets rarely exist for different sample types. Even if such datasets exist, the training time can be prohibitively long for time-sensitive applications. For instance, the training time with a typical GPU (GeForce GTX 1080) is about 14.5 hours for the model proposed in [18]. Since the training process is not transferable and must be repeated for different setups and sample types, such a long training phase is not practically desirable. In some other applications, such as authenticating objects using nano-scaled 3D visual tags, data sharing can be prohibited for security reasons [10].

To address the scarcity of paired DH samples, some recent works utilize unpaired data (unmatched holograms and samples) to train their networks [24]. Specifically, a cycle-generative adversarial network (CycleGAN) is employed in [24] to reconstruct the object wave from the hologram by training the model with holograms (denoted as domain $\mathcal {X}$) and unmatched objects (denoted as domain $\mathcal {Y}$). Particularly, two generators are used to learn the functions $\mathcal {X}\rightarrow \mathcal {Y}$ and $\mathcal {Y}\rightarrow \mathcal {X}$. A consistency loss is used to enforce the training progress $\mathcal {X}\rightarrow \mathcal {Y}\rightarrow \hat {\mathcal {X}}\approx \mathcal {X}$. A similar method based on CycleGAN, called PhaseGAN, is proposed in [25], which also uses unpaired data for training. The near-field Fresnel propagator [26] is employed as part of their framework. Although these methods do not require matched object-hologram samples, they still need large datasets of unmatched hologram samples in the training phase.

Considering the difficulties of developing large DH datasets, some attempts have been made recently to create unsupervised learning frameworks [5,27,28]. Most of these frameworks utilize CNN architectures as their backbones since they can capture sufficient low-level image features to reproduce uncorrupted and realistic image parts [29]. Often, a loss function is employed to minimize the distance between the captured hologram and the artificial hologram obtained by forward-propagating the recovered object wave. For example, our previous work [5] uses an hourglass encoder-decoder structure to reconstruct the object wave from DIH holograms. Inspired by the deep decoder concept proposed in [27], the reconstruction algorithm in [28] abandoned the encoder part and used only a decoder with a fixed random tensor as its input. Some classical regularization methods, such as total variation (TV) loss and weight decay, are applied to partially solve the noisy and incomplete signal problem. PhysenNet used a U-net architecture [30] to retrieve the phase information [31]. Most recently, an untrained CNN-based network has been employed in dual-wavelength DIH, which benefits from the CNN's capability of image reconstruction and denoising and the dual-wavelength setup's capability of phase unwrapping [12].

1.3 Summary of our contributions

Despite their innovative design and reconstruction efficiency, most of these methods suffer from critical shortcomings. First, these untrained networks often use a loss function based on the mean-squared error (MSE), $L2$-norm, or similar distance measures between the captured hologram and the reproduced hologram. This class of loss functions is not capable of measuring structural similarities [32] and is not fully consistent with human perception. The perceptual loss, proposed in [33], offers a reasonable solution to this issue by using a pre-trained feature extraction backbone to measure the loss. Inspired by this work in developing a semantic similarity measure, we propose an untrained and physics-driven learning framework based on the GAN architecture for one-shot DH reconstruction. In our method, the discriminator network contributes a learnable penalty term to evaluate the similarity between the reproduced and the captured holograms. As we will discuss later in Section 3.1, the role of the generator network in our framework is that of a function approximator that models the inverse of the hologram generation process (i.e., mapping the hologram to the complex-valued object wave), as opposed to general GANs, where the generator network learns the data distribution to create new samples from noise.

Another drawback of most aforementioned DL-based methods is their lack of interpretability and disregard for physics knowledge. Therefore, there are always two risks: (i) over-fitting and (ii) severe performance degradation under minor changes to sample characteristics and test conditions. We address these issues in two different ways. First, we incorporate forward and backward propagation into our model, following some recent works [5,28,31]. Secondly, we implement a new spatial attention module using an adaptive masking process to split the object pattern into foreground and background regions and impose smoothness on the image background. The background mask update is driven by the reconstructed object wave quality and regulated by simulated annealing (SA) optimization, starting with more aggressive updates and settling into more conservative changes as the network converges. Imposing a smoothness constraint on the gradually evolving background area makes our method fundamentally different from some iterative methods that enforce physics-driven hologram formation equations on the support region (i.e., the foreground) [34,35] or the entire image [36].

We show that our framework is generic and independent of the choice of the generator network. In particular, we tested our framework with two recently developed generators, the fine-tuned version of DeepDIH [5] and the deep compressed object decoder (DCOD) [28]. We also show that adding a super-resolution layer to the utilized auto-encoder (AE) improves the quality of the phase recovery.

This paper is organized as follows. Section 2 reviews the hologram formation process and recasts it as a nonlinear inverse problem. Section 3 elaborates on the details of the proposed DL method for phase recovery, highlighting its key features and differences from similar methods. Experimental results for simulated holograms, publicly available samples, and our dendrite samples are presented in Section 4 followed by concluding remarks in Section 5.

2. Problem formulation

The goal of this work is to design an unsupervised physics-driven DL network to reconstruct the 3D surface of microscopic objects, especially dendrites, micro-scaled security tags used to protect supply chains against cloning and counterfeit attacks (see Section 4.4 for details of dendrites).

The incident wave passing through a thin transparent object can be characterized as a complex-valued wave

$$\mathbf{O}(x,y;z=0)= \mathbf{R}(x,y;z=0)t(x,y),$$
where $\mathbf {R}(x,y;z=0)$ is the reference wave (i.e., the incident wave if the object is not present) and $t(x,y)=A(x,y)\text {exp}({j}\phi (x,y))$ is the incurred perturbation term caused by the object. $t(x,y)$ includes attenuation $A(x,y)$ and phase shift $\phi (x,y)$ [37]. After performing forward-propagation described by the angular spectrum method at distance $z=d$, $\mathbf {O}(x,y;z=d)$ is formed as follows
$$\begin{aligned} \mathbf{O}(x,y;z=d) &= \mathbf{p}(\lambda,z=d)\circledast \mathbf{O}(x,y;z=0) \\ & = \mathcal{F}^{{-}1}\{\mathbf{P}(\lambda,z=d)\cdot\mathcal{F}\{\mathbf{O}(x,y;z=0)\}\}, \end{aligned}$$
where $\lambda$ represents the wavelength and $\circledast$ is the convolution operator. $\mathcal {F}\{\cdot \}$ and $\mathcal {F}^{-1}\{\cdot \}$ denote the direct and inverse Fourier transforms, respectively. Here, $\mathbf {P}(\lambda,z)= \mathcal {F}\{\mathbf {p}(x,y,z)\}$ is the transfer function, defined as
$$\mathbf{P}(\lambda,z) = \exp \left(\frac{2 \pi j z}{\lambda} \sqrt{1-\left(\lambda f_{x}\right)^{2}-\left(\lambda f_{y}\right)^{2}}\right),$$
where $f_{x}$ and $f_{y}$ denote the spatial frequencies. The formed hologram in the detector plane is
$$\mathbf{H}(x,y;\lambda,z) = |\mathbf{p}(\lambda,z=d)\circledast (\mathbf{O}(x,y;z=0)+\mathbf{R}(x,y;z=0))|^2.$$
Our ultimate goal is to recover the object-related perturbation $t(x,y)$, or equivalently the complex-valued object wave $\mathbf {O}(x,y)$, from the captured hologram $\mathbf {H}(x,y)$ in a way that is consistent with Eqs. (1)–(4).
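
For concreteness, the angular spectrum propagation of Eqs. (2)–(3) can be written in a few lines of NumPy. The following is a minimal sketch (the function and variable names are illustrative, not taken from our released code); it suppresses evanescent components, for which the square root in Eq. (3) becomes imaginary, and back-propagates when $z<0$.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, z, dx):
    """Propagate a complex field over distance z via Eqs. (2)-(3).
    `dx` is the pixel pitch; all lengths share the same units."""
    h, w = field.shape
    fx = np.fft.fftfreq(w, d=dx)                  # spatial frequencies f_x
    fy = np.fft.fftfreq(h, d=dx)                  # spatial frequencies f_y
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    # Transfer function P(lambda, z) of Eq. (3); evanescent waves are dropped.
    P = np.exp(2j * np.pi * z / wavelength * np.sqrt(np.maximum(arg, 0.0)))
    P[arg < 0] = 0.0
    return np.fft.ifft2(np.fft.fft2(field) * P)
```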

3. Proposed method

The essence of our method relies on using a GAN-based architecture with several key modifications. More specifically, consider the chain $\mathbf {O} \stackrel {F(\cdot )}{\longrightarrow } \mathbf {H_0} \stackrel {P_z(\cdot )}{\longrightarrow } \mathbf {H} \stackrel {P_z^{-1}(\cdot )}{\longrightarrow } \mathbf {H_0} \stackrel {G_W(\cdot )}{\longrightarrow }\mathbf { \hat {O}} \stackrel {\tilde {F}(\cdot )}{\longrightarrow } \mathbf {\hat {H}_0} \stackrel {P_z(\cdot )}{\longrightarrow } \mathbf {\hat {H}}$ (Fig. 2), where $\mathbf {O} \in \mathbb {R}^{ h \times w \times 2}$ is the inaccessible and unknown complex-valued object wave with height $h$ and width $w$, $\mathbf {H_0} \in \mathbb {R}^{h \times w \times 1}$ is the produced hologram in the object plane, and $\mathbf {H} \in \mathbb {R}^{h \times w \times 1}$ is the hologram in the sensor plane. Similarly, $\mathbf {\hat {O}}$, $\mathbf {\hat {H}}_0$, and $\mathbf {\hat {H}}$ are the reconstructed versions of the object wave, the hologram in the object plane, and the hologram in the sensor plane. It is noteworthy that a classic phase unwrapping algorithm based on the fast Fourier transform [38] is applied to the phase of $\mathbf {\hat {O}}$. Forward and backward angular spectrum propagation (ASP) according to Eqs. (2) and (3) are represented by $P_z(\cdot )$ and $P_z^{-1}(\cdot )$. Likewise, $F(\cdot ): \mathbb {R}^{h \times w \times 2} \mapsto \mathbb {R}^{h \times w \times 1}$ represents the hologram formation according to Eqs. (1)–(4). Our goal is to develop a generator network $G_W(\cdot ): \mathbb {R}^{h \times w \times 1} \mapsto \mathbb {R}^{h \times w \times 2}$ that models the inverse of the hologram formation process to reproduce an object wave $\mathbf {\hat {O}}$ as close as possible to $\mathbf {O}$ under some distance measure $d(\mathbf {\hat {O}},\mathbf {O})$. However, we cannot quantify $d(\mathbf {\hat {O}},\mathbf {O})$ since $\mathbf {O}$ is inaccessible. To address this issue, and noting that the hologram formation process $F(\cdot )$ is known, we apply the same process to the reconstructed wave $\mathbf {\hat {O}}$ to obtain a corresponding reproduced hologram $\mathbf {\hat {H}}=P_z\big (\tilde {F}[G_W(P_z^{-1}(\mathbf {H}))]\big )$. Then, we use the surrogate distance $d(\mathbf {\hat {H}},\mathbf {H})$ instead of $d(\mathbf {\hat {O}},\mathbf {O})$ to assess the reconstruction quality. Finally, note that we use $\tilde {F}(\cdot )$ for numerical hologram formation to account for minor differences from the real hologram formation process, such as mismatch in the parameters $\lambda$ and $z$ and the adoption of some idealistic assumptions (e.g., a plane wavefront).
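
Building on the propagation sketch above, the surrogate chain $\mathbf {\hat {H}}=P_z\big (\tilde {F}[G_W(P_z^{-1}(\mathbf {H}))]\big )$ can be expressed roughly as follows. This is a sketch under our own naming (`generator` stands for $G_W$); it folds $\tilde {F}$ and $P_z$ into a single propagation of the interfering field followed by an intensity measurement, consistent with Eq. (4).

```python
def reproduce_hologram(H, generator, wavelength, z, dx):
    # P_z^{-1}: back-propagate the captured hologram to the object plane.
    H0 = angular_spectrum_propagate(H.astype(complex), wavelength, -z, dx)
    # G_W: map the back-propagated field to a complex object-wave estimate.
    O_hat = generator(H0)
    # F~ then P_z: interfere O_hat with a unit plane reference, propagate to
    # the sensor plane, and record the intensity (the reproduced hologram).
    field = angular_spectrum_propagate(O_hat + 1.0, wavelength, z, dx)
    return np.abs(field) ** 2
```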

Fig. 2. The overall block diagram of the hologram formation along with the proposed DL architecture for phase recovery.

3.1 Optimization through loss function

Figures 2 and 3 present the details of the proposed DL architecture for DIH phase recovery. The loss used for the generator network $G_W(\cdot )$ includes the following components.

Fig. 3. The overall framework of our untrained GAN-based network, which consists of an AE-based generator network $G$, a discriminator network $D$, and an SA-based adaptive masking module.

  • One term is the MSE distance between the reproduced and the captured holograms, $d_1(\mathbf {\hat {H}},\mathbf {H})=MSE(\mathbf {\hat {H}},\mathbf {H})$, used to directly train the AE-based generator, following the physics-driven methods [5,28,31].
  • Noting the limitations of the MSE and $\ell_2$ norm, we also use a discriminator network $D_W(\cdot ): \mathbb {R}^{h \times w \times 2} \mapsto \mathbb {R}^{1}$ to produce a learnable penalty term by maximizing the confusion between the reproduced and captured holograms, which can be an indicator of the hologram quality. Let $D_W(\mathbf {H})$ and $D_W(\mathbf {\hat {H}})$ denote the probabilities of the captured and reproduced holograms being real. Then, we must maximize the first term and minimize the second term when training the discriminator to distinguish between the real and numerically-regenerated holograms. However, we maximize the second term when training the generator to make the reproduced hologram as close as possible to the captured hologram to fool the discriminator. This is equivalent to the conventional GAN formulation
    $$\mathcal{L} =\min_{G_W}\max_{D_W}\mathbb{E}_{x\sim P_{data}}\log[D_W(x)]+\mathbb{E}_{z\sim p_z}\log[1-D_W(G_W(z))],$$
    with a few modifications.
  • Finally, to incorporate our prior knowledge, we use a new term that imposes smoothness on the image background. This embraces the fact that in most real scenarios, the samples are supported by a transparent glass slide, meaning that the background of the reconstructed object should present no phase shift. In other words, $t(x,y)=A(x,y)e^{j\phi (x,y)}=1 \Rightarrow \mathbf {O}(x,y)=\mathbf {R}(x,y)$ based on Eq. (1), which means zero phase shift in the object wave for all pixels outside the object boundary, $(x,y)\notin \mathcal {S}$. This approach is inspired by physics-informed neural networks (PINN) [39], which use boundary conditions to solve partial differential equations (PDEs). Our approach to detecting the image background is discussed in Section 3.2.
To summarize, the proposed network aims to solve the following optimization problem:
$$\begin{aligned} \mathcal{L}=\min _{G_W} \max _{D_W}& \log [D_W(\mathbf{H})]+ \log [1-D_W(\mathbf{\hat{H}})]\\ +&\lambda_1 \mathcal{L}_{Auto}(\mathbf{\hat{H}})+\lambda_2 \mathcal{L_{B}}(G_W(\mathbf{H})), \end{aligned}$$
where $\mathbf {\hat {H}}=P_z\big (\tilde {F}[G_W(P_z^{-1}(\mathbf {H}))]\big )$ denotes the reproduced hologram, whose value depends on the generator $G_W$. The first two terms represent the GAN framework loss, with the ultimate goal of making the generator $G_W(\cdot )$ as close as possible to the inverse of hologram formation $F^{-1}(\cdot )$ through iterative training of the generator and discriminator networks. We have used an auto-encoder architecture for the generator following our previous work [5], whose loss function is represented by $\mathcal {L}_{Auto}$. Likewise, $\mathcal {L_{B}}$ represents the background loss term for points outside the object mask, $p \notin \mathcal {S}$, with $\lambda _1$ and $\lambda _2$ being tuning parameters.

In the training phase, the losses of $G_W$ and $D_W$ are optimized sequentially,

$$\begin{aligned}\mathcal{L}_{G_w} &= \min_G \log [1-D_W(\mathbf{\hat{H}})]+\lambda_1 \mathcal{L}_{Auto}(\mathbf{\hat{H}})+\lambda_2 \mathcal{L_{B}}(G_W(\mathbf{H})) \\ \mathcal{L}_{D_w} &= \max_D\log [D_W(\mathbf{H})]+ \log [1-D_W(\mathbf{\hat{H}})]. \end{aligned}$$
To avoid lazy training of the generator and achieve larger gradient variations, especially in the early training steps, we solve the following equivalent optimization problem
$$\mathcal{L}_{G_w} = \min_G -\log [D_W(\mathbf{\hat{H}})]+\lambda_1 \mathcal{L}_{Auto}(\mathbf{\hat{H}})+\lambda_2 \mathcal{L_{B}}(G_W(\mathbf{H})).$$
Since this network has only one fixed input and target, the GAN structure aims to map the input to a reproduced domain as close as possible to the target, even without the $\mathcal {L}_{Auto}$ and $\mathcal {L}_{B}$ terms. Adding these terms enhances the reconstruction quality by enforcing our prior knowledge. Moreover, since the discriminator $D_W$ extracts deep features via its multiple convolutional layers, its similarity evaluation is intuitively more meaningful than the MSE or $L2$ loss. Thus, the network learns a more robust translation from the digital hologram to the object wave.
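
In PyTorch terms, one alternating update of Eqs. (7)–(8) might look like the following sketch. The helpers `forward_model` (which computes the pair $(\mathbf {\hat {O}},\mathbf {\hat {H}})$ with differentiable operations) and `background_tv` (sketched after Eq. (12) below) are illustrative names of ours, and the $\lambda_1$, $\lambda_2$ defaults are placeholders rather than tuned settings.

```python
import torch

def generator_step(G, D, H, forward_model, mask, opt_G, lam1=1.0, lam2=0.1):
    """One G_W update per Eq. (8): non-saturating GAN term plus the
    hologram-consistency term of Eq. (9) and the background TV term."""
    opt_G.zero_grad()
    O_hat, H_hat = forward_model(G, H)            # reconstructed wave/hologram
    eps = 1e-8                                    # guards against log(0)
    loss = (-torch.log(D(H_hat) + eps).mean()
            + lam1 * torch.mean((H - H_hat) ** 2)     # L_Auto, Eq. (9)
            + lam2 * background_tv(O_hat, mask))      # L_B, Eq. (12)
    loss.backward()
    opt_G.step()

def discriminator_step(G, D, H, forward_model, opt_D):
    """One D_W update per Eq. (7): push D(H) up and D(H_hat) down."""
    opt_D.zero_grad()
    with torch.no_grad():
        _, H_hat = forward_model(G, H)
    eps = 1e-8
    loss = -(torch.log(D(H) + eps) + torch.log(1.0 - D(H_hat) + eps)).mean()
    loss.backward()
    opt_D.step()
```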

The auto-encoder loss term $\mathcal {L}_{Auto}(\mathbf {\hat {H}})$ in Eq. (6) is used to directly minimize the gap between the captured hologram and the numerically reconstructed hologram, independently of the utilized discriminator.

$$\mathcal{L}_{Auto}(\mathbf{\hat{H}})=d_{MSE}(\mathbf{H},\mathbf{\hat{H}})=\frac{1}{h \times w }\|\mathbf{H}-\mathbf{\hat{H}}\|_{2}^{2}$$
where the captured and reconstructed holograms ($\mathbf {H}$, $\mathbf {\hat {H}}$) correspond to the AE input and output after proper propagation.

Finally, we use the total variation (TV) loss to enforce smoothness on the image background, i.e., the pixels $p=(x,y) \notin \mathcal {S}$ outside the region of interest (ROI), the image foreground. This incorporates our prior knowledge of zero phase shift for background pixels beyond the ROI and improves the reconstruction quality. The TV loss for a complex-valued 2D signal $z$ is

$$\mathcal{L}_{B}(z)= \int_{(x,y) \in \Omega_B} \big(|\nabla\Re(z)|+|\nabla\Im(z)|\big)\, \mathrm{d}x\,\mathrm{d}y,$$
where $\Omega _B$ denotes the background region, and $\Re (z)$ and $\Im (z)$ denote the real and imaginary parts of $z$, respectively. In our case, $z$ is taken from $\tilde {F}(G_W(P_z^{-1}(\mathbf {H})))$ and $\Omega _B= \{(x,y)\,|\,1\leq x \leq w,\ 1\leq y \leq h,\ (x,y) \notin \mathcal {S}\}$.

For discrete signals, we use the approximation $|\nabla _x\Re (z)|=|\Re (z)_{x+1, y}-\Re (z)_{x, y}|$. Noting $|\nabla \Re (z)|=\big(|\nabla _x\Re (z)|^2+|\nabla _y\Re (z)|^2\big )^{1/2}$, Eq. (10) converts to

$$\begin{aligned} \mathcal{L}_{B}(z) =\frac{1}{|\Omega_B|}\sum_{x,y\in \Omega_B} &\big(\left|\Re(z)_{x+1, y}-\Re(z)_{x, y}\right|^2+\left|\Re(z)_{x, y+1}-\Re(z)_{x, y}\right|^2\big)^{1/2}\\ &+ \big(\left|\Im(z)_{x+1, y}-\Im(z)_{x, y}\right|^2+\left|\Im(z)_{x, y+1}-\Im(z)_{x, y}\right|^2\big)^{1/2}, \end{aligned}$$
where $|\Omega _B|$ is the cardinality (the number of points) of set $\Omega _B$. For simplicity, we skip the square root operation, and use the following version, which is computationally faster.
$$\begin{aligned} \mathcal{L}_{B} =\frac{1}{|\Omega_B|}\sum_{x,y\in \Omega_B} &\left|\Re(z)_{x+1, y}-\Re(z)_{x, y}\right|^2+\left|\Re(z)_{x, y+1}-\Re(z)_{x, y}\right|^2\\ &+ \left|\Im(z)_{x+1, y}-\Im(z)_{x, y}\right|^2+\left|\Im(z)_{x, y+1}-\Im(z)_{x, y}\right|^2. \end{aligned}$$
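A masked PyTorch implementation of Eq. (12) could look like the sketch below (a helper of our own, with illustrative names); it averages the squared forward differences of the real and imaginary parts over background pixel pairs only.

```python
import torch

def background_tv(z, mask):
    """Background TV loss of Eq. (12). `z` is the complex object-plane
    field (h x w); `mask` is 1 on background pixels (outside S) and 0
    on the foreground, so only background differences contribute."""
    def tv(u):
        dx = (u[:, 1:] - u[:, :-1]) ** 2          # horizontal differences
        dy = (u[1:, :] - u[:-1, :]) ** 2          # vertical differences
        mx = mask[:, 1:] * mask[:, :-1]           # both pixels in background
        my = mask[1:, :] * mask[:-1, :]
        return (dx * mx).sum() + (dy * my).sum()
    num_bg = mask.sum().clamp(min=1.0)            # |Omega_B|
    return (tv(z.real) + tv(z.imag)) / num_bg
```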
The details of the adaptive masking used to define the ROI are discussed below.

3.2 Adaptive masking by K-means and simulated annealing

The background loss $\mathcal {L}_{B}$ in Eq. (6) operates on the background area of the output image, as shown in Fig. 4. The background area is determined by a binary mask $M^{(t)}$, where $t$ is a discrete index denoting the mask update time point. To this end, a binary mask $\hat {M}^{(t)}$ is developed by applying K-means segmentation (with $K$=2) to $|\mathbf {\hat {O}}_0|$, the amplitude of the reconstructed object wave at $z=0$. We consider the resulting mask a "proposal mask", which may or may not be accepted. Rejection means that we use the previously formed mask $M^{(t-1)}$ to calculate the background loss. To avoid instability of the results and unnecessary mask updates, we use a mechanism that tends to make more frequent (aggressive) updates at the beginning and less frequent (conservative) updates once the algorithm converges to reasonably good results. A natural way of implementing such a mechanism is the simulated annealing (SA) algorithm, in which the decline of the variation rate is controlled by temperature cooling.

Fig. 4. The block-diagram of the adaptive segmentation to create background loss. The operator $\otimes$ denotes element-wise multiplication, indicating that all operations are only applied on the background area. The mask update process is explained in Section 3.2.

The SA algorithm is initialized by temperature $T_0$ for time $t=0$. We also set the first mask $M^{(0)}=[1]_{h \times w}$, assuming no foreground is detected yet.

To update the mask at time $t=1,2,3,\dots$, we compare the MSE distance between the reproduced hologram $\mathbf {\hat {H}}$ and the captured hologram $\mathbf {H}$ on the background areas determined first by the previous mask $M^{(t-1)}$ and then by the current mask proposal $\hat {M}^{(t)}$. Mathematically, we compute $\delta _{t-1} = d_\text {MSE}(\mathbf {H},\mathbf {\hat {H}}; M^{(t-1)})$ and $\hat {\delta }_t = d_\text {MSE}(\mathbf {H},\mathbf {\hat {H}}; \hat {M}^{(t)})$. The inequality $\hat {\delta }_t< \delta _{t-1}$ means that the consistency between the captured and reconstructed holograms improves by using the current mask proposal, so we accept the proposal and update the mask as $M^{(t)}=\hat {M}^{(t)}$. Otherwise, we lower the temperature as $T_t=T_{t-1}/\log (1+t)$ and then update the mask with probability $e^{-(\hat {\delta }_t- \delta _{t-1}) / T_t}$. This means that as time passes, the update probability declines. The summary of Algorithm 1 is presented below.

Algorithm 1. Adaptive Background Masking

The confirmed binary mask $M^{(t)}$ is used to determine the background area at time point $t$ for the loss term $\mathcal {L}_{B}$ in Eq. (6), noting that the background area is flat and bears constant attenuation and phase shift. This provides additional leverage for the optimization problem to converge faster. This improvement is confirmed by our results in Section 4.3 (for instance, see Fig. 8 and Table 3).
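
For clarity, one mask-update step of Algorithm 1 can be sketched as follows. This is a minimal illustration under our own naming; in particular, the heuristic of taking the larger K-means cluster as the flat background is an assumption for this sketch, not the exact implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def masked_mse(H, H_hat, M):
    """MSE between holograms restricted to the background mask M."""
    return float((M * (H - H_hat) ** 2).sum() / max(M.sum(), 1.0))

def update_mask(M_prev, O_hat_amp, H, H_hat, t, T):
    """One step of Algorithm 1: propose a mask by K-means (K=2) on the
    reconstructed amplitude |O_hat| at z=0, accept it if the background
    hologram MSE improves, otherwise accept with an SA probability."""
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(
        O_hat_amp.reshape(-1, 1)).reshape(O_hat_amp.shape)
    # Heuristic (assumption): the larger cluster is the flat background.
    bg = 0 if (labels == 0).sum() >= (labels == 1).sum() else 1
    M_hat = (labels == bg).astype(float)
    d_prev = masked_mse(H, H_hat, M_prev)         # delta_{t-1}
    d_hat = masked_mse(H, H_hat, M_hat)           # hat{delta}_t
    if d_hat < d_prev:                            # proposal improves consistency
        return M_hat, T
    T = T / np.log(1 + t)                         # temperature cooling
    if np.random.rand() < np.exp(-(d_hat - d_prev) / T):
        return M_hat, T                           # occasional SA acceptance
    return M_prev, T
```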

3.3 Network architecture

The network consists of a generator $G$ and a discriminator $D$ (Fig. 2). Although the proposed framework is general and any typical generative network and binary classifier can be used for $G$ and $D$, here we provide the details of the utilized networks for the sake of completeness. We use a modified version of the auto-encoder (AE) in [5] as our generator network (Fig. 3). The AE network consists of 8 convolutional layers in the encoder part and 8 in the decoder part. Max pooling and transposed convolution operators are used to perform downsampling and upsampling, respectively. One key modification we made is adding 2 more convolutional layers and 1 more transposed convolutional layer to enable super-resolution, which brings further improvement at a reasonably low computation cost.
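
To make the hourglass structure concrete, a skeletal PyTorch generator in the spirit of Table 1 is sketched below. The depth and channel counts here are illustrative rather than the exact architecture of Table 1; the final transposed convolution plays the role of the super-resolution layers marked with * in the table.

```python
import torch.nn as nn

def conv_block(c_in, c_out):
    """3x3 convolution + batch normalization + ReLU."""
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class HourglassGenerator(nn.Module):
    """Skeletal hourglass AE: max-pooling encoder, transposed-conv
    decoder, and a two-channel (Re/Im) output for the object wave."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            conv_block(1, 32), nn.MaxPool2d(2),        # downsample to h/2
            conv_block(32, 64), nn.MaxPool2d(2),       # downsample to h/4
            conv_block(64, 128))
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 2, stride=2),  # upsample to h/2
            conv_block(64, 64),
            nn.ConvTranspose2d(64, 32, 2, stride=2),   # upsample to h
            conv_block(32, 32),
            nn.ConvTranspose2d(32, 32, 2, stride=2),   # extra: super-resolution, 2h
            conv_block(32, 32),
            nn.Conv2d(32, 2, 3, padding=1))            # Re/Im of O_hat

    def forward(self, x):
        return self.decoder(self.encoder(x))
```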

The discriminator network $D$ uses an architecture similar to the encoder part of the AE-based generator $G$. It consists of 8 convolutional layers, a global pooling layer, and a dense layer. It outputs a single value that represents the evaluation score. Batch normalization [40] is used for both $G$ and $D$ to stabilize the training progress. The architectural details of $G$ and $D$ are given in Tables 1 and 2. To show the generalizability of the architecture, we also used DCOD [28] as an alternative generator network in our experiments.

Table 1. The architectural details of generator $G$. It utilizes an hourglass autoencoder structure. $K_1$ and $K_2$ denote the kernel size and $C_{in}$ and $C_{out}$ denote the input channel and the output channel, respectively. Layers with * are used for super-resolution.

Table 2. The architectural details of discriminator $D$. It outputs the similarity of the input and the target.

The training strategy is shown in Fig. 5. The generator and discriminator are trained sequentially. However, to avoid the early convergence of the generator, we train the generator only once, then train the discriminator for 5 consecutive iterations. Note that the early convergence of the generator is not desirable, since any mediocre generator can produce artificial results that fool a discriminator that has not yet reached its optimal operation. Therefore, we let the discriminator converge first and reach its best performance, then train the generator accordingly to produce accurate object waves from the captured holograms. The aforementioned mask update by the SA-based algorithm is performed after updating the generator. This does not occur after every update, but rather once after every $k$ updates of the generator, as shown by the red intervals in Fig. 5.

Fig. 5. The training strategy. The masking update is activated once every $k=100$ intervals (shown by red). Each interval includes one iteration of the generator update (brown) followed by five iterations of the discriminator update (blue). If masking update is active (in red intervals), it is performed between the generator training and discriminator training (yellow).
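
Putting the pieces together, the schedule of Fig. 5 might be orchestrated as in the sketch below, reusing the `generator_step`, `discriminator_step`, and `update_mask` helpers sketched earlier (all names and default values are illustrative).

```python
import torch

def train(G, D, H, opt_G, opt_D, forward_model, intervals=3000,
          d_steps=5, mask_every=100, T0=1.0):
    """Training schedule of Fig. 5: per interval, one generator update
    followed by d_steps discriminator updates; the SA mask update of
    Algorithm 1 is activated once every `mask_every` intervals."""
    M = torch.ones_like(H)                        # M^(0): all background
    T = T0                                        # initial SA temperature
    for it in range(1, intervals + 1):
        generator_step(G, D, H, forward_model, M, opt_G)
        if it % mask_every == 0:                  # the red intervals in Fig. 5
            with torch.no_grad():
                O_hat, H_hat = forward_model(G, H)
            M_np, T = update_mask(M.cpu().numpy(), O_hat.abs().cpu().numpy(),
                                  H.cpu().numpy(), H_hat.cpu().numpy(),
                                  it // mask_every, T)
            M = torch.as_tensor(M_np, dtype=H.dtype, device=H.device)
        for _ in range(d_steps):
            discriminator_step(G, D, H, forward_model, opt_D)
```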

4. Experiment

In this section, we verify the performance of the proposed algorithm using simulated holograms, publicly available samples, and our dendritic tags.

4.1 Experiment setup

Our experiment setup is shown in Fig. 6. The laser module CPS532-C2 is used to generate a single-wavelength (532 ${nm}$) laser beam with a round cross section of 3.5 ${mm}$ diameter. The laser module provides 0.9 ${mW}$ from a typical USB port. The USB-based powering facilitates taking clear holograms in normal conditions. We use an A55050U digital camera, which employs a $1/2.8''$ Complementary Metal-Oxide Semiconductor (CMOS) sensor with 2.0 $\mu{m}$ $\times$ 2.0 $\mu{m}$ pixel size. This sensor provides 22 frames per second (fps) at a resolution of 5 megapixels (2560 $\times$ 1920 pixels), which gives a 5120 $\mu{m}$ $\times$ 3840 $\mu{m}$ field of view (FOV). A rolling shutter and variable exposure time also provide convenience for fast and accurate imaging. Note that this architecture can be made compact by substantially lowering the distances for portable readers.

Fig. 6. The experimental setup utilized for in-line holography: (a) two lenses used to enlarge the beam cross section; (b) sample test.

As shown in Fig. 6(a), two convex lenses with focal lengths of $f_1=$ 25 ${mm}$ and $f_2=$ 150 ${mm}$ are applied to expand the laser beam so that it fully covers the dendrite samples. The lenses are located at a distance $f_1+f_2=175 \,{mm}$ from one another, so their focal points coincide to retain the plane wavefront. The magnifying power of this system is $MP = \frac {f_2}{f_1} = \frac {150}{25} = {6}$, which enlarges the laser beam's cross-section diameter from 3.5 ${mm}$ to 21 ${mm}$. We use a viewing card at a distance of 20 ${ft}$ to verify that the magnified beam is properly collimated.

In Fig. 6(b), a sample slide is placed on the sample holder; the laser beam passes through the sample, and the hologram propagates onto the sensor plane. The captured image is displayed on the computer in real time and is fed to the proposed DL-based recovery algorithm. With an exposure time of 28 $\mu {s}$, the hologram is captured in clear and bright conditions.

The DL framework is developed in a Python environment using the PyTorch package and the Adam optimizer. Training is performed using two Windows 10 machines with NVIDIA RTX 2070 graphics cards.

4.2 Dendrite samples

In addition to simulated and public holograms, we use dendrite samples in our experiments. Dendrites are visual identifiers formed by growing tree-shaped metallic fractal patterns through inducing a regulated voltage on electrolyte solutions with different propensities [41]. These tags can be efficiently produced in large volumes on multiple substrate materials (e.g., mica, synthetic paper, etc.) with different granularity and density [42]. A dendrite sample is shown in Fig. 7. Dendrites have specific features such as extremely high entropy due to their inherent randomness, self-similarity, and unclonability owing to their 3D facets and non-resolution granularity. These features make this patented technology an appropriate choice for security solutions, including identification tags, visual authentication, random generators, and producing physical unclonable functions (PUFs) with robust security keys [10,43].

Fig. 7. A dendritic pattern grown on a synthetic paper and soaked with a liquid electrolyte.

We have previously shown dendrites' utility as 2D authentication identifiers [10,44,45], but exploiting information-rich features from dendrites to achieve unclonability requires specific technologies such as digital holography, as presented in our previous work [5].

4.3 Test with simulated holograms

First, we compare the performance of our method against the most powerful untrained methods, where the sample hologram is sufficient to recover the phase with no need for a training dataset. It is noteworthy that, in general, there exist two main classes of untrained neural networks: one with an encoder-decoder architecture, mainly based on deep autoencoders (e.g., DeepDIH [5]), and another class with only the decoder part, the so-called deep decoder (e.g., DCOD [28]).

In this experiment, we compare our model with two untrained DL methods (DeepDIH and DCOD) as well as a CS-based method proposed in [11] using USAF target samples. In our framework, we use the fine-tuned version of DeepDIH as the generator network, but we also perform ablation analysis by replacing it with the DCOD.

The results in Fig. 8 and Table 3 demonstrate the superiority of our proposed method. Particularly, the PSNR of our method ranges from 25.7 dB to 29 dB, depending on the choice of the generator and activating/deactivating the adaptive masking module, which is significantly higher than the CS method (PSNR 14.6 dB), DeepDIH (PSNR 19.7 dB), and DCOD (PSNR 20.1 dB). A similar observation is made in Fig. 8, especially in the quality of the reconstructed object phase. The main justification for this huge improvement is that untrained methods with deep autoencoders and no proper regularization terms can easily be trapped into overfitting the noise, especially if over-parameterized [27].

Fig. 8. The comparison of different methods, including (a) DeepDIH [5], (b) DCOD [28], (c) proposed method using DCOD as generator, (d) proposed method with modified DeepDIH as generator, and (e) same as (d) with adaptive masking module. First, second, and third rows represent the reconstructed amplitude, phase, and amplitude of select zone, respectively.

Table 3. The comparison of different methods, including compressive sensing (CS) method [11], DeepDIH [5], DCOD [28], proposed method with DCOD as generator, and proposed method with modified DeepDIH as generator without and with adaptive masking module.

Although the DCOD method uses fewer parameters to alleviate the overfitting issue, it does not employ complete knowledge about the hologram formation process and uses random input. In contrast, our method uses the back-propagated holograms as the generator input, meaning that the generator network training starts from a reasonably good starting point and converges to a better optimum.

Another drawback of the competitor methods is using the MSE loss, which does not adequately capture the image reconstruction quality and may guide the network to converge wrongly. This issue is solved in our method by leveraging the underlying physics law and using a learnable distance measure through the discriminator network.

Finally, we observe a significant improvement from the utilized adaptive masking module, which improves the reconstruction quality from a PSNR of 26.3 dB to as high as 29 dB. This highlights the advantage of incorporating physical knowledge into the reconstruction process by adding more constraints to the network weights through the background loss.

Figure 9 provides a closer look at the benefits of using the adaptive masking module and applying the background loss to the USAF target. For better visibility, we compare three selected parts of the reconstructed amplitude (middle) and the side view of the reconstructed object surface. It is clearly seen that imposing the background loss smooths out the background part of the image and improves the reconstruction quality without causing edge distortion.

Fig. 9. The comparison of the reconstructed object wave from captured hologram using the proposed model without imposing background loss (top row) and with background loss (bottom row). Left (a),(d): amplitude; Middle (b),(e): zoom-in details of amplitude; Right (c),(f): side view of one row of the object blades' surface.

We present the runtime of different approaches using a Windows machine with an Intel Core i7-8700K CPU and an RTX 2070 GPU in Table 4. We observe that our method with adaptive masking needs about 30 minutes for training the GAN and 6 minutes for the masking updates. This time is relatively long but is still reasonable for non-time-sensitive applications. To alleviate the computational cost, we use transfer learning, as discussed in Section 4.6. With this accelerated network, the reconstruction time reduces to about 4 minutes, comparable to DeepDIH [5].

Table 4. The runtime of different methods, including CS [11], DeepDIH [5], DCOD [28], and our method with and without masking. We use $500\times 500$ images with 5000 iterations for all methods, while 500 iterations are sufficient to produce high-quality results using transfer learning.

4.4 Test with real samples

To prove the applicability of our model in real-world scenarios, we have tested different types of samples, including S1: Zea Stem, S2: Onion Epidermis, and S3: Stomata-Vicia Faba Leaf (Fig. 10). The average cell sample size is 2 ${mm}$ $\times$ 2 ${mm}$, equivalent to 1000 $\times$ 1000 pixels in the sensor field. All samples have been placed at a distance of 5.5 ${mm}$ (the closest possible) to the CMOS sensor to avoid unnecessary diffraction of the object waves [28]. The parameters of the framework are set accordingly: the pixel size (2 $\mu{m}$), the wavelength (0.532 $\mu{m}$), and the sample-to-sensor distance (5,500 $\mu{m}$). We compare our method against the aforementioned methods in Fig. 11, which shows that our method recovers a higher quality texture while maintaining a clean background.
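
As a usage note, these settings map directly onto the inputs of the propagation sketch of Section 2 (the dictionary form and key names below are ours):

```python
# Parameters for the real-sample experiments (values from the text).
params = {
    "pixel_size_um": 2.0,      # CMOS pixel pitch dx
    "wavelength_um": 0.532,    # 532 nm laser
    "z_um": 5500.0,            # sample-to-sensor distance
}
```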

Fig. 10. The reconstruction of the three real samples (S1: Zea Stem, S2: Onion Epidermis, S3: Stomata-Vicia Faba Leaf). (a) Captured hologram; (b) reconstructed amplitude; (c) reconstructed phase; (d) zoomed-in part.

Fig. 11. The comparison between different methods on the Onion Epidermis sample in terms of the reconstructed phase.

We also used the same setup to capture holographic readings of dendrite samples (Fig. 12). The results are presented after convergence, which occurs after 2,000 epochs. The results in Figs. 10 and 12 demonstrate the end-to-end performance of the proposed GAN-based phase recovery when applied to real holograms captured by our DIH setup.

Fig. 12. The reconstruction process of a dendrite sample. (a) A typical mica-substrate dendrite sample; (b) captured hologram of select part; (c) reconstructed amplitude; (d) reconstructed phase; (e) 3D view of the reconstructed object surface.

4.5 Robustness to noise

Like regular images, holographic readings can be noisy due to illumination conditions, lens imperfections, sensor noise, and other imaging artifacts. We examine the impact of noise to ensure that reasonable noise levels do not substantially degrade the reconstruction quality. To this end, we add additive white Gaussian noise (AWGN) of different levels (standard deviation: $\sigma =5$, $\sigma =10$, and $\sigma =15$) to the simulated holograms. The results for cell and dendrite samples are respectively presented in Figs. 13 and 14, and summarized in Table 5.
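
The contamination step itself is straightforward; a sketch (with our own function name, assuming 8-bit hologram intensities) is:

```python
import numpy as np

def add_awgn(hologram, sigma, seed=0):
    """Add zero-mean Gaussian noise with the stated standard deviation
    (sigma = 5, 10, or 15 in our tests) and clip to the 8-bit range."""
    rng = np.random.default_rng(seed)
    noisy = hologram.astype(float) + rng.normal(0.0, sigma, hologram.shape)
    return np.clip(noisy, 0.0, 255.0)
```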

Fig. 13. Reconstructed amplitude of a cell sample by different approaches. The first row is the simulated hologram under different noise levels $\sigma =0, 5, 10, 15$.

Fig. 14. Reconstructed phase of the dendrite sample with their 3D plot. The first row is the captured hologram with artificially added noise with standard deviations $\sigma =0, \sigma =5, \sigma =10$, and $\sigma =15$, respectively.

Table 5. Comparison of different methods in reconstructing the phase from holograms contaminated with different noise levels.

The results in Figs. 13 and 14 show that the phase recovery of our algorithm is fairly robust against noise levels up to $\sigma =10\sim 15$ and significantly improves upon similar frameworks such as DeepDIH and DCOD. Similar results are provided in Table 5, which shows better performance for our method in both the SSIM and PSNR metrics. For instance, the SSIM of DeepDIH, DCOD, ours using DCOD as generator, and ours using DeepDIH as generator for a dendrite sample under noise level $\sigma =15$ is respectively 0.453, 0.494, 0.708, and 0.763. This means that our method, when using DCOD as the generator, increases the performance of DCOD from SSIM=0.494 to SSIM=0.708 ($43\%$ improvement). The same applies to our method when using DeepDIH as the generator ($68\%$ improvement). When increasing the noise level up to $\sigma =10$, the performance decay of our method is smaller than that of the DeepDIH and DCOD methods.

For instance, for the cell sample, DeepDIH shows around 3 dB decay for each $\Delta \sigma =5$ increase in the noise level, while ours shows only around 2 dB decay. This represents a 50% improvement in the rate of PSNR decay versus noise increase. For the dendrite sample, from $\sigma =5$ to $\sigma =10$, the SSIM of DeepDIH decreases by about 0.2, while that of ours decreases by only about 0.06, which is $70\%$ smaller. We conservatively state that the reconstruction quality is acceptable for noise levels up to $\sigma =10$, which incurs only around 4 dB decay in PSNR and around 0.2 SSIM loss.

The results overall confirm the robustness of the proposed model for noisy images. Part of this robustness is inherited from the intrinsic noise removal capability of the AE used as the generator in our framework. Also, imposing the TV loss on the background section of the hologram removes high-frequency noise from the image.

4.6 One-shot training and transfer learning

A key challenge of DL-based phase recovery methods compared to conventional numerical methods is their limited generalizability and transferability to other experiments due to DL methods' unexplainability and black-box nature. This matter can be problematic in real-time applications since the time-consuming training phase should be repeated for every new sample. The proposed method, like some other untrained methods, partially alleviates this issue by incorporating the underlying physics laws. Originally, our model takes 3,000-5,000 iterations ($\sim$30 minutes) to reconstruct a hologram with random initialization. With one-shot training, we expect the model trained for the first hologram reconstruction to be directly usable for all other new holograms captured under the same recording conditions. With transfer learning, the model can be initialized by the weights obtained from the reconstruction of the previous hologram and then fine-tunes itself to the new hologram, which takes fewer iterations.
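
A transfer-learning run (scenario III below) could be sketched as follows, reusing the `train` helper from Section 3.3; the checkpoint handling and learning rate here are illustrative assumptions.

```python
import torch

def finetune(G, D, ckpt_path, H_new, forward_model, iters=500, lr=1e-4):
    """Initialize G and D from a model trained on a previous sample
    (e.g., S1), then fine-tune on the new hologram for ~500 iterations."""
    ckpt = torch.load(ckpt_path)
    G.load_state_dict(ckpt["G"])              # weights trained on sample S1
    D.load_state_dict(ckpt["D"])
    opt_G = torch.optim.Adam(G.parameters(), lr=lr)
    opt_D = torch.optim.Adam(D.parameters(), lr=lr)
    train(G, D, H_new, opt_G, opt_D, forward_model, intervals=iters)
    return G
```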

To investigate the transferability of our method, we develop an experiment with the following three testing scenarios on simulated holograms of 4 randomly selected neuro samples taken from the CCBD dataset [46].

  • I. "One-shot Train" model: The hologram of sample S1 is used to train the DH-GAN model and reconstruct S1 amplitude and phase as usual with 3,000 iterations. This generator part of the model is used to reconstruct the amplitude and phase of the holograms of samples S2-S4 (one model for all).
  • II. "Retrain:500" model: the network is initialized with random weights, then the reconstruction is performed independently for each sample using 500 iterations (four different models, one for each sample).
  • III. "One-shot Train+500" model: we use the model trained for sample S1 to initialize the network for other samples S2-S4, then fine-tune the model to perform reconstruction with extra 500 iterations for each sample separately.
The results of this experiment are shown in Fig. 15 and Table 6. The results in Fig. 15(a,b) show an excellent reconstruction quality for DH-GAN with 3,000 iterations. However, for fast deployment, one may not be able to afford 3,000 training iterations for every new sample. In this case, one potential solution would be using the trained network for other samples (e,i,m). The results are quite impressive (PSNR in the 19 dB to 20.2 dB range) and can be acceptable for many applications. Indeed, this outperforms independent networks trained for new samples using only 500 training iterations and random initialization (d,h,l), which achieve a PSNR in the 12.8 dB to 14.3 dB range. An intermediate solution would be transfer learning, namely using the network trained for S1 as initialization for the other networks and performing 500 training iterations for the new samples (f,j,n), which offers the best results (PSNR in the 25 dB to 30 dB range).

Fig. 15. The transferability of the DH-GAN. (a) Simulated hologram of sample S1; (b) reconstructed phase of sample S1 using the fully trained model. Left side: each row represents a sample (S2,S3,S4); the first column represents the captured hologram, and the next three columns represent the results of the three testing scenarios.

Table 6. The performance (in PSNR) of three transfer learning scenarios, presented in Fig. 15.

To further investigate the transferability of the developed framework, we perform a test using three sample types: 1) MNIST handwritten digits, 2) CCBD, and 3) USAF target. We choose four samples of each type and train an independent network with fixed initialization for each sample using 3,000 iterations. The weights are collected once per 100 iterations and considered as a data point.

Figure 16 visualizes the resulting network weights in the 2D domain using principal component analysis (PCA). The observation is quite interesting: the network weights corresponding to each sample type are aligned in the same direction, and different sample types are somewhat separated into disjoint clusters. However, this is not universal, and in some cases the network trained for one sample type (like blue) can also be used for a different type (like red) if the parameters are close enough in some compact space. Therefore, we should exercise caution when using transfer learning.
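
The visualization itself reduces to stacking flattened weight snapshots and projecting them onto two principal components; a sketch (with our own helper name) is:

```python
import numpy as np
from sklearn.decomposition import PCA

def project_weight_trajectories(weight_snapshots):
    """Flatten each network-weight snapshot (one per 100 iterations, as
    in Fig. 16) into a vector and project onto the first two principal
    components for 2D plotting, colored by sample type."""
    X = np.stack([np.concatenate([w.ravel() for w in snap])
                  for snap in weight_snapshots])
    return PCA(n_components=2).fit_transform(X)   # (n_snapshots, 2) coords
```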

Fig. 16. The 2D visualization of the network weights using PCA. Each data point represents the vector of network weights after 100 iterations. Different colors represent different sample types, including MNIST handwritten digits (green), CCBD (blue), and USAF target (red). The weights trained for similar patterns cluster radially with the same orientation.

Transferability is more challenging for real samples because of varying recording conditions. Here, we evaluate different methods on the real samples S3 (Stomata-Vicia Faba Leaf) and S2 (Onion Epidermis) shown in Fig. 10. All models are fully trained using S3 and then fine-tuned for S2 with 500 iterations. The reconstructed phase is shown in Fig. 17. Compared with retraining with 500 iterations starting from random initialization (shown in the first row of Fig. 17), all untrained networks demonstrate reasonable transferability, although the two samples S2 and S3 are morphologically different (substantially different textures). Our proposed method exhibits a significant improvement over the DCOD method in recovering details. Compared with DeepDIH, our proposed method shows richer depth contrast. We also observe that our framework can boost the performance of DCOD, which can be attributed to the robustness to noisy reading conditions introduced by our design.

Fig. 17. Phase reconstruction by transfer learning for different methods. All methods are fully trained on sample S3 (Stomata-Vicia Faba Leaf), then fine-tuned on sample S2 (Onion Epidermis) in Fig. 10.

5. Conclusion

In this paper, we implemented a GAN-based framework to recover the 3D surface of micro-scaled objects from holographic readings. Our method offers several novel features that yield phase retrieval quality far beyond the current practice.

First, we utilized an AE-based generator network as a function approximator (to map real-valued holograms into complex-valued object waves), in contrast to regular supervised GAN networks, where the generator acts as a density estimator of data samples. Secondly, we implemented a progressive masking method powered by simulated annealing that extracts image foregrounds (e.g., fractal patterns in dendrite samples). This feature facilitates imposing smoothness through the TV loss on background areas, which further improves the reconstruction and noise removal quality.

The proposed method outperforms both conventional and DL-based methods designed for phase recovery from one-shot imaging under similar conditions. Our method achieves a 10 dB gain in PSNR over the CS-based method [11] and about a 5 dB gain over the most recent untrained deep learning methods such as DeepDIH [5] and DCOD [28]. An additional 3 dB gain is observed from activating the adaptive masking module. Moreover, our model is sufficiently robust against noise and tolerates AWGN up to $\sigma =10$. It shows only about 0.4 dB decay per unit noise variance increase, lower than similar methods. Our method elevates DL-based digital holography to higher levels with a modest computational overhead. Furthermore, we explored transfer learning to enable fast utilization of the proposed method in time-constrained applications. Our experiments show that using a model trained for a similar sample can offer reasonable reconstruction quality. Using transfer learning, by borrowing network weights trained for a similar sample and performing an additional 500 iterations for the new sample, brings a considerable gain of about 12 dB compared to independent training with 500 iterations. This observation suggests that the developed model is highly transferable between samples of the same type, but transferability across different sample types needs further investigation.

Funding

U.S. Department of Agriculture (2020-67017-33078).

Acknowledgments

The authors would like to thank Dr. Bruce Gao for his comments on developing the test setup and experiment scenarios.

Disclosures

The authors have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. J. K. Wallace, S. Rider, E. Serabyn, J. Kühn, K. Liewer, J. Deming, G. Showalter, C. Lindensmith, and J. Nadeau, “Robust, compact implementation of an off-axis digital holographic microscope,” Opt. Express 23(13), 17367–17378 (2015). [CrossRef]  

2. N. Patel, S. Rawat, M. Joglekar, V. Chhaniwal, S. K. Dubey, T. O’Connor, B. Javidi, and A. Anand, “Compact and low-cost instrument for digital holographic microscopy of immobilized micro-particles,” Opt. Lasers Eng. 137, 106397 (2021). [CrossRef]  

3. W. Xu, M. Jericho, I. Meinertzhagen, and H. Kreuzer, “Digital in-line holography for biological applications,” Proc. Natl. Acad. Sci. 98(20), 11301–11305 (2001). [CrossRef]  

4. A. Alfalou and C. Brosseau, “Optical image compression and encryption methods,” Adv. Opt. Photonics 1(3), 589–636 (2009). [CrossRef]  

5. H. Li, X. Chen, Z. Chi, C. Mann, and A. Razi, “Deep dih: single-shot digital in-line holography reconstruction by deep learning,” IEEE Access 8, 202648–202659 (2020). [CrossRef]  

6. M. K. Kim, “Principles and techniques of digital holographic microscopy,” SPIE Rev. 1(1), 018005 (2010). [CrossRef]  

7. C. J. Mann, L. Yu, C.-M. Lo, and M. K. Kim, “High-resolution quantitative phase-contrast microscopy by digital holography,” Opt. Express 13(22), 8693–8698 (2005). [CrossRef]  

8. G. Koren, F. Polack, and D. Joyeux, “Iterative algorithms for twin-image elimination in in-line holography using finite-support constraints,” J. Opt. Soc. Am. A 10(3), 423–433 (1993). [CrossRef]  

9. N. Bari, G. Mani, and S. Berkovich, “Internet of things as a methodological concept,” in Fourth International Conference on Computing for Geospatial Research and Application (IEEE, 2013), pp. 48–55.

10. Z. Chi, A. Valehi, H. Peng, M. Kozicki, and A. Razi, “Consistency penalized graph matching for image-based identification of dendritic patterns,” IEEE Access 8, 118623–118637 (2020). [CrossRef]  

11. W. Zhang, L. Cao, D. J. Brady, H. Zhang, J. Cang, H. Zhang, and G. Jin, “Twin-image-free holography: a compressive sensing approach,” Phys. Rev. Lett. 121(9), 093902 (2018). [CrossRef]  

12. C. Bai, T. Peng, J. Min, R. Li, Y. Zhou, and B. Yao, “Dual-wavelength in-line digital holography with untrained deep neural networks,” Photonics Res. 9(12), 2501–2510 (2021). [CrossRef]  

13. G. Situ, “Deep holography,” Light: Adv. Manuf. 3(2), 1 (2022). [CrossRef]  

14. T. Shimobaba, D. Blinder, T. Birnbaum, I. Hoshi, H. Shiomi, P. Schelkens, and T. Ito, “Deep-learning computational holography: A review,” Front. Photonics 3, 8 (2022). [CrossRef]  

15. T. Zeng, Y. Zhu, and E. Y. Lam, “Deep learning for digital holography: a review,” Opt. Express 29(24), 40572–40593 (2021). [CrossRef]  

16. H. Wang, M. Lyu, and G. Situ, “eHoloNet: a learning-based end-to-end approach for in-line digital holographic reconstruction,” Opt. Express 26(18), 22603–22614 (2018). [CrossRef]  

17. R. Horisaki, R. Takagi, and J. Tanida, “Deep-learning-generated holography,” Appl. Opt. 57(14), 3859–3863 (2018). [CrossRef]  

18. Y. Rivenson, Y. Zhang, H. Günaydın, D. Teng, and A. Ozcan, “Phase recovery and holographic image reconstruction using deep learning in neural networks,” Light: Sci. Appl. 7(2), 17141 (2017). [CrossRef]  

19. Y. Zhang, H. Wang, and M. Shan, “Deep-learning-enhanced digital holographic autofocus imaging,” in Proceedings of the 2020 4th International Conference on Digital Signal Processing (2020), pp. 56–60.

20. K. Wang, J. Dou, Q. Kemao, J. Di, and J. Zhao, “Y-net: a one-to-two deep learning framework for digital holographic reconstruction,” Opt. Lett. 44(19), 4765–4768 (2019). [CrossRef]  

21. Z. Ren, Z. Xu, and E. Y. Lam, “End-to-end deep learning framework for digital holographic reconstruction,” Adv. Photonics 1(1), 016004 (2019). [CrossRef]  

22. H. Chen, L. Huang, T. Liu, and A. Ozcan, “Fourier imager network (FIN): a deep neural network for hologram reconstruction with superior external generalization,” Light: Sci. Appl. 11(1), 254 (2022). [CrossRef]  

23. Y. Wu, Y. Luo, G. Chaudhari, Y. Rivenson, A. Calis, K. de Haan, and A. Ozcan, “Bright-field holography: cross-modality deep learning enables snapshot 3d imaging with bright-field contrast using a single hologram,” Light: Sci. Appl. 8(1), 25–27 (2019). [CrossRef]  

24. D. Yin, Z. Gu, Y. Zhang, F. Gu, S. Nie, J. Ma, and C. Yuan, “Digital holographic reconstruction based on deep learning framework with unpaired data,” IEEE Photonics J. 12(2), 1–12 (2020). [CrossRef]  

25. Y. Zhang, M. A. Noack, P. Vagovic, K. Fezzaa, F. Garcia-Moreno, T. Ritschel, and P. Villanueva-Perez, “PhaseGAN: a deep-learning phase-retrieval approach for unpaired datasets,” Opt. Express 29(13), 19593–19604 (2021). [CrossRef]  

26. F. A. Jenkins and H. E. White, “Fundamentals of optics,” Indian J. Phys. 25, 265–266 (1957).

27. R. Heckel and P. Hand, “Deep decoder: Concise image representations from untrained non-convolutional networks,” arXiv, arXiv:1810.03982 (2018). [CrossRef]  

28. F. Niknam, H. Qazvini, and H. Latifi, “Holographic optical field recovery using a regularized untrained deep decoder network,” Sci. Rep. 11(1), 10903–10913 (2021). [CrossRef]  

29. D. Ulyanov, A. Vedaldi, and V. Lempitsky, “Deep image prior,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018), pp. 9446–9454.

30. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (Springer, 2015), pp. 234–241.

31. F. Wang, Y. Bian, H. Wang, M. Lyu, G. Pedrini, W. Osten, G. Barbastathis, and G. Situ, “Phase imaging with an untrained neural network,” Light: Sci. Appl. 9(1), 77 (2020). [CrossRef]  

32. G. Palubinskas, “Image similarity/distance measures: what is really behind mse and ssim?” Int. J. Image Data Fusion 8(1), 32–53 (2017). [CrossRef]  

33. J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in European Conference on Computer Vision (Springer, 2016), pp. 694–711.

34. R. W. Gerchberg, “A practical algorithm for the determination of phase from image and diffraction plane pictures,” Optik 35(2), 237–246 (1972).

35. Z. Zalevsky, D. Mendlovic, and R. G. Dorsch, “Gerchberg–Saxton algorithm applied in the fractional Fourier or the Fresnel domain,” Opt. Lett. 21(12), 842–844 (1996). [CrossRef]  

36. T. Latychevskaia, “Iterative phase retrieval for digital holography: tutorial,” J. Opt. Soc. Am. A 36(12), D31–D40 (2019). [CrossRef]  

37. T. Latychevskaia and H.-W. Fink, “Practical algorithms for simulation and reconstruction of digital in-line holograms,” Appl. Opt. 54(9), 2424–2434 (2015). [CrossRef]  

38. M. A. Schofield and Y. Zhu, “Fast phase unwrapping algorithm for interferometric applications,” Opt. Lett. 28(14), 1194–1196 (2003). [CrossRef]  

39. S. Cai, Z. Mao, Z. Wang, M. Yin, and G. E. Karniadakis, “Physics-informed neural networks (PINNs) for fluid mechanics: a review,” Acta Mech. Sinica, 1–12 (2022).

40. S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International Conference on Machine Learning (PMLR, 2015), pp. 448–456.

41. M. N. Kozicki, “Dendritic structures and tags,” (2021). US Patent 11,170,190.

42. M. N. Kozicki, “Dendritic tags,” (2022). US Patent App. 17/311,154.

43. A. Razi and Z. Chi, “Methods and systems for generating unclonable optical tags,” (2022). US Patent App. 17/505,547.

44. A. Valehi, A. Razi, B. Cambou, W. Yu, and M. Kozicki, “A graph matching algorithm for user authentication in data networks using image-based physical unclonable functions,” in Computing Conference (IEEE, 2017), pp. 863–870.

45. H. Wang, X. Chen, and A. Razi, “Fast key points detection and matching for tree-structured images,” arXiv, arXiv:2211.03242 (2022). [CrossRef]  

46. “CIL project: P1170”.


Figures (17)

Fig. 1. The typical in-line digital holography setup.
Fig. 2. The overall block diagram of hologram formation along with the proposed DL architecture for phase recovery.
Fig. 3. The overall framework of our untrained GAN-based network, which consists of an AE-based generator network $G$, a discriminator network $D$, and an SA-based adaptive masking module.
Fig. 4. The block diagram of the adaptive segmentation used to create the background loss. The operator $\otimes$ denotes element-wise multiplication, indicating that all operations are applied only to the background area. The mask update process is explained in Section 3.2.
Fig. 5. The training strategy. The masking update is activated once every $k=100$ intervals (shown in red). Each interval includes one iteration of the generator update (brown) followed by five iterations of the discriminator update (blue). If the masking update is active (red intervals), it is performed between the generator training and the discriminator training (yellow).
Fig. 6. The experimental setup for in-line holography: (a) two lenses used to enlarge the beam cross-section; (b) the sample under test.
Fig. 7. A dendritic pattern grown on synthetic paper and soaked with a liquid electrolyte.
Fig. 8. The comparison of different methods: (a) DeepDIH [5]; (b) DCOD [28]; (c) the proposed method using DCOD as the generator; (d) the proposed method with modified DeepDIH as the generator; (e) same as (d) with the adaptive masking module. The first, second, and third rows show the reconstructed amplitude, the phase, and the amplitude of a selected zone, respectively.
Fig. 9. The comparison of the object wave reconstructed from the captured hologram using the proposed model without the background loss (top row) and with the background loss (bottom row). Left (a),(d): amplitude; middle (b),(e): zoomed-in details of the amplitude; right (c),(f): side view of one row of the object blades' surface.
Fig. 10. The reconstruction of the three real samples (S1: Zea Stem, S2: Onion Epidermis, S3: Stomata, Vicia Faba Leaf). (a) Captured hologram; (b) reconstructed amplitude; (c) reconstructed phase; (d) zoomed-in part.
Fig. 11. The comparison between different methods on the Onion Epidermis sample in terms of the reconstructed phase.
Fig. 12. The reconstruction process of a dendrite sample. (a) A typical mica-substrate dendrite sample; (b) captured hologram of a selected part; (c) reconstructed amplitude; (d) reconstructed phase; (e) 3D view of the reconstructed object surface.
Fig. 13. Reconstructed amplitude of a cell sample by different approaches. The first row shows the simulated holograms under different noise levels ($\sigma = 0, 5, 10, 15$).
Fig. 14. Reconstructed phase of the dendrite sample with the corresponding 3D plots. The first row shows the captured hologram with artificially added noise of standard deviation $\sigma = 0$, $\sigma = 5$, $\sigma = 10$, and $\sigma = 15$, respectively.
Fig. 15. The transferability of DH-GAN. (a) Simulated hologram of sample S1; (b) reconstructed phase of sample S1 using the fully trained model. Left side: each row represents a sample (S2, S3, S4); the first column shows the captured hologram, and the next three columns show the results of the three testing scenarios.
Fig. 16. The 2D visualization of the network weights using PCA. Each data point represents the vector of network weights after 100 iterations. Different colors represent different sample types, including MNIST handwritten digits (green), CCBD (blue), and USAF Target (red). The weights trained for similar patterns cluster radially with the same orientation.
Fig. 17. Phase reconstruction by transfer learning for different methods. All methods are fully trained on sample S3 (Stomata, Vicia Faba Leaf) and then fine-tuned on sample S2 (Onion Epidermis) from Fig. 10.

Tables (7)

Algorithm 1. Adaptive Background Masking.
Table 1. The architectural details of the generator $G$, which uses an hourglass autoencoder structure. $K_1$ and $K_2$ denote the kernel sizes, and $C_{in}$ and $C_{out}$ denote the input and output channels, respectively. Layers marked with * are used for super-resolution.
Table 2. The architectural details of the discriminator $D$, which outputs the similarity of the input and the target.
Table 3. The comparison of different methods, including the compressive sensing (CS) method [11], DeepDIH [5], DCOD [28], the proposed method with DCOD as the generator, and the proposed method with modified DeepDIH as the generator, without and with the adaptive masking module.
Table 4. The runtime of different methods, including CS [11], DeepDIH [5], DCOD [28], and our method with and without masking. We use $500 \times 500$ images with 5000 iterations for all methods, while 500 iterations are sufficient to produce high-quality results using transfer learning.
Table 5. Comparison of different methods in reconstructing phase contaminated with different noise levels.
Table 6. The performance (in PSNR) of the three transfer learning scenarios presented in Fig. 15.

Equations (12)

$\mathbf{O}(x,y;z=0) = \mathbf{R}(x,y;z=0)\, t(x,y),$
$\mathbf{O}(x,y;z=d) = p(\lambda,z=d) * \mathbf{O}(x,y;z=0) = \mathcal{F}^{-1}\{ P(\lambda,z=d)\, \mathcal{F}\{ \mathbf{O}(x,y;z=0) \} \},$
$P(\lambda,z) = \exp\!\left( \frac{2\pi j z}{\lambda} \sqrt{1-(\lambda f_x)^2-(\lambda f_y)^2} \right),$
$\mathbf{H}(x,y;\lambda,z) = \left| p(\lambda,z=d) * \left( \mathbf{O}(x,y;z=0) + \mathbf{R}(x,y;z=0) \right) \right|^2.$
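For concreteness, a minimal NumPy sketch of this angular spectrum propagation and hologram formation is given below; the wavelength, distance, and pixel pitch are example values only, and the uniform object and reference waves are placeholders.

```python
import numpy as np

def propagate(field, wavelength, z, dx):
    """Angular spectrum propagation of a complex field over distance z."""
    n, m = field.shape
    fx = np.fft.fftfreq(m, d=dx)                      # spatial frequencies f_x
    fy = np.fft.fftfreq(n, d=dx)                      # spatial frequencies f_y
    FX, FY = np.meshgrid(fx, fy)
    arg = (1 - (wavelength * FX)**2 - (wavelength * FY)**2).astype(complex)
    P = np.exp(2j * np.pi * z / wavelength * np.sqrt(arg))  # transfer function P
    return np.fft.ifft2(P * np.fft.fft2(field))

# Example parameters (placeholders, not the values used in our experiments).
wavelength, d, dx = 532e-9, 1e-3, 2e-6
O = np.ones((512, 512), dtype=complex)                # object wave at z = 0
R = np.ones_like(O)                                   # plane reference wave
H = np.abs(propagate(O + R, wavelength, d, dx))**2    # intensity hologram
```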
$\mathcal{L} = \min_{G_W} \max_{D_W} \; \mathbb{E}_{x \sim P_{data}} \log[D_W(x)] + \mathbb{E}_{z \sim p_z} \log[1 - D_W(G_W(z))],$
$\mathcal{L} = \min_{G_W} \max_{D_W} \; \log[D_W(\mathbf{H})] + \log[1 - D_W(\hat{\mathbf{H}})] + \lambda_1 \mathcal{L}_{Auto}(\hat{\mathbf{H}}) + \lambda_2 \mathcal{L}_B(G_W(\mathbf{H})),$
$\mathcal{L}_{G_W} = \min_{G} \; \log[1 - D_W(\hat{\mathbf{H}})] + \lambda_1 \mathcal{L}_{Auto}(\hat{\mathbf{H}}) + \lambda_2 \mathcal{L}_B(G_W(\mathbf{H})), \qquad \mathcal{L}_{D_W} = \max_{D} \; \log[D_W(\mathbf{H})] + \log[1 - D_W(\hat{\mathbf{H}})].$
$\mathcal{L}_{G_W} = \min_{G} \; -\log[D_W(\hat{\mathbf{H}})] + \lambda_1 \mathcal{L}_{Auto}(\hat{\mathbf{H}}) + \lambda_2 \mathcal{L}_B(G_W(\mathbf{H})).$
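As an illustration, a minimal PyTorch sketch of these alternating updates is shown below, using the non-saturating generator objective in the last equation; `G`, `D`, `forward_propagate`, `background_loss`, and `hologram` are hypothetical placeholders, and `lambda1`, `lambda2` are example weights standing in for $\lambda_1$ and $\lambda_2$.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of the alternating G/D objectives (assumed PyTorch setup).
lambda1, lambda2, eps = 1.0, 0.1, 1e-8        # example weights only

recovered = G(hologram)                       # G_W(H): estimated object wave
H_hat = forward_propagate(recovered)          # re-synthesized hologram H^

# Generator loss: non-saturating adversarial term plus the two penalties.
loss_G = (-torch.log(D(H_hat) + eps).mean()
          + lambda1 * F.mse_loss(H_hat, hologram)       # L_Auto
          + lambda2 * background_loss(recovered))       # L_B

# Discriminator loss: maximize log D(H) + log(1 - D(H^)), implemented as
# minimizing the negated sum; detach H^ so only D is updated here.
loss_D = -(torch.log(D(hologram) + eps).mean()
           + torch.log(1 - D(H_hat.detach()) + eps).mean())
```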
$\mathcal{L}_{Auto}(\hat{\mathbf{H}}) = d_{MSE}(\mathbf{H}, \hat{\mathbf{H}}) = \frac{1}{h \times w} \| \mathbf{H} - \hat{\mathbf{H}} \|_2^2,$
$\mathcal{L}_B(z) = \int_{\Omega_B} \left( |\nabla \Re(z)| + |\nabla \Im(z)| \right) dx\, dy,$
$\mathcal{L}_B(z) = \frac{1}{|\Omega_B|} \sum_{x,y \in \Omega_B} \left( |\Re(z)_{x+1,y} - \Re(z)_{x,y}|^2 + |\Re(z)_{x,y+1} - \Re(z)_{x,y}|^2 \right)^{1/2} + \left( |\Im(z)_{x+1,y} - \Im(z)_{x,y}|^2 + |\Im(z)_{x,y+1} - \Im(z)_{x,y}|^2 \right)^{1/2},$
$\mathcal{L}_B = \frac{1}{|\Omega_B|} \sum_{x,y \in \Omega_B} |\Re(z)_{x+1,y} - \Re(z)_{x,y}|^2 + |\Re(z)_{x,y+1} - \Re(z)_{x,y}|^2 + |\Im(z)_{x+1,y} - \Im(z)_{x,y}|^2 + |\Im(z)_{x,y+1} - \Im(z)_{x,y}|^2.$
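A minimal PyTorch sketch of the last (quadratic) form of the background loss is given below, assuming `field` is the complex recovered object wave and `mask` is a binary mask that equals 1 on the background region $\Omega_B$; both names are illustrative.

```python
import torch

def background_loss(field, mask):
    """Quadratic smoothness penalty over the background region Omega_B."""
    loss = 0.0
    for comp in (field.real, field.imag):            # Re(z) and Im(z)
        dv = (comp[1:, :] - comp[:-1, :]) ** 2       # vertical differences
        dh = (comp[:, 1:] - comp[:, :-1]) ** 2       # horizontal differences
        # Keep only differences anchored at background pixels.
        loss = loss + (dv * mask[:-1, :]).sum() + (dh * mask[:, :-1]).sum()
    return loss / mask.sum().clamp(min=1)            # average over |Omega_B|
```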