Diffraction-Net: a robust single-shot holography for multi-distance lensless imaging

Haixin Luo; Jie Xu; Jie Xu; Jie Xu; Liyun Zhong; Xiaoxu Lu; Jindong Tian; Jindong Tian; Jindong Tian

doi:10.1364/OE.472658

1. Introduction

Benefitting from the large field of view, simple structure and high resolution, lensless imaging has been widely adopted in quantitative phase imaging, biological tissue microscopy, particle monitoring, etc. in recent years [1–4]. In lensless imaging, illumination light firstly irradiates on an object and diffracts/scatters, and then forms image on a photoelectric sensor directly. The image contains the optical information of complex-amplitude on the object surface, including amplitude and phase distributions, which indicate transmissivity and retardation of the object to light respectively [5,6]. Hence, lensless imaging is also known as lensless holography [7,8]. Whereas, since photoelectric sensor can only directly record light intensity instead of phase information, how to make full use of the light intensity to recover the light phase is the key to reconstruct the complex-amplitude on the object surface [9,10]. To address this issue, the most commonly used methods are to capture multiple images, at different diffraction distances or using multi-wavelength illuminations [11–13]. Then, based on these multiple diffraction images associated with the light transfer function, the phase information can be retrieved computationally [14,15].

The retrieval approaches always fall into two categories, the iterative approaches and the non-iterative approaches. The transport of intensity equations (TIE) proposed by Teague is a famous non-iterative approach. It connects the light intensity with phase information on three different planes perpendicular to the optical axis, making it possible to directly recover the phase information from the intensity information [14]. Zuo et al. improved the method by combining TIE and digital refocusing algorithm to realize high-resolution phase reconstruction, using RGB-LED array to obtain diffraction images corresponding to three wavelengths at different illumination angles [16]. For the iterative category, a representative is Gerchberg-Saxton (GS) algorithm proposed by Gerchberg et al. [15]. Osten et al. associated GS algorithm with multiple diffraction images collected at different planes to recover the phase map [13]. To improve robustness, Liu et al. changed this kind of serial transmission iteration approach and adopted the iterative complex-amplitude retrieval method [17–19].

In general, these phase retrieval approaches need to acquire multiple images, at different diffraction distances or using multi-wavelength illuminations. The later method achieves the acquisition without moving samples or sensors, but the system is often complex and expensive. Besides, the number of obtained images depends on the number of wavelengths. The multi-distance method has no these limitations. A one-dimensional displacement stage is required to obtain the images at different diffraction distances, so some inevitably errors, such as shifts and rotations, would affect the final reconstruction accuracy. Recently, Huang et al. proposed a precise registration algorithm in multi-distance lensless imaging, which reduces the errors caused by the sample displacement during the moving process [20]. However, in some dynamic applications, such as observing live cells or organisms in nutrient solution, the shape and position of the sample may change over a short period of time, due to the influence of life activity and fluid movement. This dynamic change, integrating with the moving errors of the displacement stage, invalidates the phase recovery method based on multi-distance lensless imaging.

In order to realize phase retrieval from a single-shot diffraction image, deep learning is an approach with high potentials. Sinha et al. firstly proposed the deep neural network (DNN) to directly recover the phase information from a single image by using an end-to-end approach without complex phase solution process [21]. It paves a new way for the phase recovery in lensless imaging. For the twin-image problem commonly appearing in digital holographic microscope, Ozcan et al. used deep learning approach to remove the twin-image and recover the complex-amplitude of the cells [22]. These end-to-end network approaches have simple structures, but suffering from the unsatisfied interpretability and generalization ability. A promising trend of optimization is to introduce physical prior knowledge constraints or model transfer learning into neural network. In these processes, neural networks generate intermediate results to reduce the differences between the target domain and the source domain, thus improve the interpretability and generalization ability of deep learning [23–27].

Aiming to achieve single-shot complex-amplitude reconstruction with good generalization ability, inspired by previous works mentioned above, we construct a diffraction network (Diff-Net) system which combines the generative adversarial networks with the iterative complex-amplitude retrieval algorithm. This process is a hybrid-driven reconstruction approach including both physical models and deep learning. In the Diff-Net, the diffraction image acquired at the first position is used as the input sample, and the images captured at other distances are used as the training targets. After the training, only the image at the first position is needed for input and the other images are generated from the network. The Diff-Net system improves the accuracy by avoiding practical errors caused by optical path inclination and sample movement. Hence, the Diff-Net system provides a robust single-shot approach to recover complex-amplitude of dynamic samples. To evaluate the effectivity and stability of the Diff-Net system, simulations are carried out and samples with significantly different morphologies are used for experimental verification.

2. Methods

2.1 Diff-Net system

The Diff-Net system is based on a multi-distance lensless system as shown in Fig. 1(a). The illumination is a collimated laser beam. The sample with amplitude and phase distributions to be measured locates in the optical path and modulates the incident laser. Then, the modulated light directly forms image on a camera sensor, in the way of in-line holography. In the construction of the Diff-Net, we move the camera in optical axis to capture multi-distance images as shown in Fig. 1(a). Specifically, the first image is used as the input sample and the others are used as the training target of the Diff-Net. By capturing a large number of images for each distance, we construct a dataset for Diff-Net which indicates the transfer law of diffraction light.

Fig. 1. Schematic of the Diff-Net system, where (a) is the optical layout of the multi-distance lensless imaging for constructing a dataset for Diff-Net, (b) is the complex-amplitude retrieval process based on single-shot input associated with the Diff-Net.

Download Full Size | PDF

To recover the sample’s complex-amplitude (or termed hologram including amplitude and phase distributions), multiple images at different distances are required. Whereas, once the Diff-Net has been set up, the multi-distance images can be generated from one of them, thus we can practically use only one image to retrieve complex-amplitude of the sample based on the iterative method as shown in Fig. 1(b).

In the iterative complex-amplitude retrieval algorithm, for the m^th iteration, we assume that the complex-amplitude on the object surface is A_m. Then, the A_m propagates along different (serial number of n with n = 1, 2, …N) distances z_n to generate multiple images, with complex-amplitude D_mn as [28]:

(1)$${D_{mn}} = {F^{ - 1}}({F({{A_m}} )H({{f_\textrm{x}},{f_\textrm{y}},{z_n}} )} ),n = 1,2,\ldots ,N, $$

where F and F⁻¹ represent Fourier transform and inverse operation respectively, f_x and f_y are frequencies in x and y directions, z_n is the distance between the sample and the n^th sensor as shown in Fig. 1(a). In Eq. (1), H is the transfer function of light in free space and expressed as:

(2)$$H({{f_\textrm{x}},{f_\textrm{y}},{z_n}} )= \left\{ {\begin{array}{cl} {\exp \left( {\frac{{\textrm{i}2\mathrm{\pi }{z_n}}}{\mathrm{\lambda }}\sqrt {1 - {{({\mathrm{\lambda }{f_\textrm{x}}} )}^2} - {{({\mathrm{\lambda }{f_\textrm{y}}} )}^2}} } \right),}&{{{({\mathrm{\lambda }{f_\textrm{x}}} )}^2} + {{({\mathrm{\lambda }{f_\textrm{y}}} )}^2} < 1}\\ {0,}&{{{({\mathrm{\lambda }{f_\textrm{x}}} )}^2} + {{({\mathrm{\lambda }{f_\textrm{y}}} )}^2} \ge 1} \end{array}} \right., $$

where λ is the wave wavelength, i is the imaginary unit. The D_mn in Eq. (1) is the predicted complex-amplitude that propagates from object, while it’s amplitude can be obtained from the image captured by camera sensor. Assuming that the intensity obtained from the image is I_n, the D_mn is updated to [29]:

(3)$$D{\mathrm{^{\prime}}_{mn}} = \sqrt {{I_n}} \frac{{{D_{mn}}}}{{|{{D_{mn}}} |}} = \sqrt {{I_n}} \exp ({\textrm{i}{\theta_{mn}}} ), $$

where θ_mn is the phase of original D_mn before update. Then, assuming that the updated D’_mn propagates backward to the object, we can obtain a new complex-amplitude A_mn on object surface with the help of transfer function H again, which is expressed as:

(4)$${A_{mn}} = {F^{ - 1}}({F({D{\mathrm{^{\prime}}_{mn}}} )H({{f_\textrm{x}},{f_\textrm{y}}, - {z_n}} )} ). $$

Noting that A_mn is obtained from different (n) distances, we calculate the average value of the A_mn and use it as the new complex-amplitude for the next (m + 1)^th iteration:

(5)$${A_{m + 1}} = \frac{1}{N}\sum\limits_{n = 1}^N {{A_{mn}}}. $$

After numerous iterations, the A_m_+ 1 converge to the truth of complex-amplitude on the object surface (termed ground truth). It should be noted that, the sensor should be as close as possible to the object surface while meet the Nyquist sampling theorem to improve the numerical aperture (NA) value thus the resolution of the system [30]. For a system where the sensor is very close to sample, its resolution reaches to the diffraction limit, and the minimum spatial resolution equals to the pixel size [31]. The actual resolution of our system is 1.85µm. Besides, multiple diffraction images at different distances are required via one-dimensional movement of the camera sensor along optical axis [13].

According to the Ref. [22], the SSIM index of N = 6 is only 1.5% smaller than that of N = 8. In order to improve the computation speed, in this paper, we choose N = 6 as total number of the diffraction images, with equivalent distance from each other. Accordingly, the Diff-Net is set to be one input channel and five output channels. The input channel is the image acquired at the first position that locates nearest to the object and the output channels are the other images obtained at farther positions. After the training, only a single diffraction image is required to generate the other diffraction images at different positions through the Diff-Net. Finally, the sample’s hologram which is composed of amplitude and phase information is reconstructed via the iterative algorithm based on Eqs. (1) – (5) illustrated above. The number of iterations we use is 40, where the results converge to stable status. It takes about 2.851s to reconstruct a typical field of view with 256 × 256 pixels. The program is run on MATLAB R2021a, and the running platform is a computer with Intel Xeon Gold 6230R CPU @ 2.10 GHz.

2.2 Diff-Net framework

Our Diff-Net adopts the framework of conditional generative adversarial network (cGAN), which consists of two sub-networks, the generator and the discriminator [23,32,33]. As Fig. 2 shows, the architecture of generator is a U-net whose encoder is constructed with 8 convolution blocks and the decoder is constructed with 8 transpose blocks. Each convolution block of the encoder extracts features from their input and downsamples them into multi-channel feature maps as the input to the next block. In the decoder, each transpose block upsamples the feature maps from the output of the previous block and the corresponding convolution block, which are used as the input of the next block [33]. In general, the encoder extracts feature information of the input diffraction image, then the decoder reconstructs them into the diffraction image I_GH with the same size as the original image at the other positions. The predicted image I_GH and the ground truth diffraction image I_GTH are concatenated as the input of the discriminator. This process aims to judge whether the generated image is closer to the real image through the pairwise features of pixel-to-pixel than through whole features, thus to improve the discriminant ability [34].

Fig. 2. Simplified diagram of the Diff-Net. (a) is the architecture of the generator which is constructed with 8 convolution blocks (Conv (3 × 3, stride = 2)-Layer Norm-Leaky ReLU) and 8 transpose blocks (Deconv (3 × 3, stride = 2)-Leaky ReLU). The skip connection concatenates the feature maps from the down-sampling path with the corresponding feature maps from the up-sampling path. The 5 channel outputs of the generator are the diffraction images at five different positions. (b) is the architecture of the discriminator which is constructed with 7 Conv (3 × 3, stride = 1)-LayerNorm-LeakyReLU blocks. The subscript in each block shows the number of channels of each feature map.

Download Full Size | PDF

For the encoder, the convolution kernel in each convolution block has a size of 3 × 3 with the stride of 2, which is denoted as the Conv (3 × 3, stride = 2). Then, the layer normalization operation (Layer Norm) is performed after the Conv (3 × 3, stride = 2) to limit the data within the range from 0 to 1 to prevent gradient disappearance or explosion, thus improve the stability of the network [35]. At last, a Leaky ReLU function with parameter of 0.2 is used to make the network perform nonlinear calculation.

For the decoder, an up-sampling path with 8 transpose blocks is adopted. The size of the transpose kernel in each transpose block is 3 × 3 with the stride of 2, which is denoted as Deconv (3 × 3, stride = 2) [36]. A Leaky ReLU function with the parameter of 0.2 is also introduced after the Deconv (3 × 3, stride = 2). Then, the feature maps output from the transpose block is concatenated with the diffraction image feature maps output from the corresponding convolution block in a skip connection way. At last, the concatenation feature maps are input to the next transpose block until the maps recover the original image.

The discriminator consists of 7 blocks, within which a series of operations are performed, including convolution with stride 1 (Conv (3 × 3, stride = 1)), layer normalization and nonlinear activation using the Leaky ReLU function. The role of the discriminator is to distinguish the fake images generated by the generator as much as possible, while the generator outputs more realistic images to confuse the discriminator. Such a generative adversarial mechanism can be described as an optimization problem and expressed as:

(6)$${G_{wb}} = {\mathop {\arg \min}\nolimits_{G}}{{\textrm{max}}_D}\log {D_{GTH}} + \log (1 - {D_{GH}}),$$

where G_wb is the weight and bias parameters of the generator, D_GH and D_GTH are two possibilities generated by the discriminator and used to determine whether the image is generated or the ground truth. The generator tries to minimize this problem, while the discriminator tries to maximize it instead. When the two optimizations are balanced, the diffraction image generated by the generator is similar enough to the ground truth to confuse the discriminator.

2.3 Loss function

Loss functions Loss_G of the generator and Loss_D of the discriminator are used to optimize the weight and bias parameters of the Diff-Net. They are calculated from the hologram I_GH generated by generator and the corresponding ground truth I_GTH, expressed as [37]:

(7)$$\left\{ \begin{array}{l} Los{s_G} = \varepsilon ({1 - SSIM({{I_{GTH}},{I_{GH}}} )+ RMSE({{I_{GTH}},{I_{GH}}} )} )+ BCE({{D_{GH}},1} )\\ Los{s_D} = BCE({{D_{GH}},0} )+ BCE({{D_{GTH}},1} )\end{array} \right.$$

in which SSIM and RMSE are the abbreviations of structural similarity and root mean squared error. They are two indexes to evaluate image quality. The value equals to 1 for truth image, while approaches to 0 for generated image. The BCE is the abbreviation of binary cross entropy loss function and used to determine closeness of two probabilities. The SSIM is given as:

(8)$$SSIM({{I_{GTH}},{I_{GH}}} )= \frac{{({2{\mu_{GTH}}{\mu_{GH}} + {C_1}} )({2{\sigma_{GTH,GH}} + {C_2}} )}}{{({\mu_{GTH}^2 + \mu_{GH}^2 + {C_1}} )({\sigma_{GTH}^2 + \sigma_{GH}^2 + {C_2}} )}}, $$

where µ and σ are mean value and variance of the gray map corresponding to the denoted subscript. The σ_GTH_,GH is the covariance of the two maps. The C₁ and C₂ are two constants. The SSIM in Eq. (7) evaluates the structural similarity of two images by using the brightness, contrast and structure of the images. In Eq. (6), the weight of SSIM and RMSE is controlled by the parameter ε. A larger ε indicates better generation effect for complex texture, thus more complex texture information can be recovered [38]. In this paper, the parameter ε is set to 100, which is conducive for complex texture recovery. The BCE is given as:

(9)$$BCE({{p_1},{p_2}} )={-} {p_2}\log ({{p_1}} )- ({1 - {p_2}} )\log ({1 - {p_1}} ), $$

in which p₁ and p₂ are two predicted probability values. More detailed definition of these evaluation indexes is given in Refs. [37–40].

3. Simulation analysis

In order to verify the validity of the Diff-Net, we simulate the hologram recovery by utilizing the Diff-Net associated with multi-distance iteration algorithm. We take notice of several possible deviations in multi-distance lensless imaging, such as sample tilt, moving incline, oblique incidence and axial moving error. These errors all have negative influence on the hologram retrieval and can be summarized into one error: the fact that the directions of incident illumination and camera’s movement are unparallel [41,42]. Hence, we mainly focus on two error cases, the oblique incidence and the incline movement, as shown in Fig. 3. The sample used in the simulation is a pure phase object with amplitude of 1. The phase distribution P is superposition of several (total number of Q) randomly located two-dimensional Gaussian distributions [23]:

(10)$$P({x,y} )\textrm{ = }\sum\limits_{l = 1}^Q {{a_l}} {\textrm{N}_l}({{\mu_\textrm{x}},{\mu_\textrm{y}},\sigma_\textrm{x}^2,\sigma_\textrm{y}^2} ),l = 1,2,3\ldots ,Q, $$

where N_l indicates an individual two-dimensional Gaussian distribution with amplitude of a_l, l is the serial number, µ_x and µ_y are the center position of the individual Gaussian distribution, σ_x and σ_y are the corresponding shape parameters in x and y directions respectively. Ground truth of the phase distribution is shown in the left of Fig. 3 with Q = 6. The image size is 1.85µm/pixel × 256pixels = 473.6µm.

Fig. 3. The main errors affecting phase retrieval in multi-distance lensless imaging and the corresponding retrieved phase maps with iteration only (up row) and Diff-Net based iteration (bottom row). The left shows the ground truth on object surface; (a) is the case of ideal condition; (b) is the case where the incident light is tilted with an angle of 1°; (c) is the case where the moving direction of the sample is tilted with an angle of 1°.

Download Full Size | PDF

To build the datasets, 440 different samples were generated by adjusting the parameters in Eq. (10). For each sample, we calculated the diffraction images on several planes with different distances to the sample. Specifically, 6 images with the distance of 900µm + 50µm × n (n = 1, 2, …6) were obtained. The first image (n = 1) was used as the input sample of the Diff-Net, and the other 5 images were used as the training targets. A total of 440 pairs of the diffraction image datasets were made, in which 40 pairs were used for validation. Adam optimizer was used to optimize the Diff-Net [43]. The batch size was set to 10, the epoch was set to 1000, and the strategy of varying learning rate was adopted. Specifically, the learning rate of the first 500 epochs was 0.0002, and the learning rate of the last 500 epochs was reduced to 1% of 0.0002. The network is trained on the computer with Intel Xeon Gold 6230R CPU @ 2.10 GHz, NVIDIA Quadro RTX 5000 16 G using Python3.6, Tensorflow1.8 framework. For the image size of 256 × 256 pixels which is a typical field of view, the training time is about 18 hours and the inference time is about 0.26s.

With the method mentioned above, we obtained the Diff-Net which connects the first diffraction images with the other diffraction images at different distances. Thus, we generated the other diffraction images from the first diffraction image via the Diff-Net, then used the iterative complex-amplitude retrieval algorithm to reconstruct the hologram of the sample, as shown in Fig. 3. The three cases depicted in Fig. 3 was considered and analyzed. A tilt angle of 1° was set for both cases of oblique incidence and incline movement. For the oblique incidence case, the first image will be overall offset because the tilted incident light adds a phase factor to the sample. However, for the incline movement case, the first diffraction image is the same with that in ideal condition, since the errors are generated at the other 5 diffraction images. Hence, by utilizing the Diff-Net, the incline movement has no influence on the retrieved hologram, since only first diffraction image is used in fact. Thus, for the ideal condition and the case of incline movement, the recovered phase maps based on Diff-Net are the same in fact, as shown in the bottom of Fig. 3(a) and (c).

In terms of phase recovery effect, it can be seen in Fig. 3 that the phase profile of the object is clear without obvious distortion. Even in the case of oblique incidence, the phase of the object can be clearly retrieved, and the result is close to the ground truth. In order to quantitatively evaluate the recovery effect, maximum cross-correlation value (Max CCV) is used. Since the position of the diffraction image is overall offset in the oblique incidence case, the typical evaluation indexes, such as SSIM, PSNR and RMSE lost their significance. Instead, the Max CCV finds and compares the areas with highest similarity between two images, which means Max CCV is an effective evaluation parameter for overall offset images. Hence, we take Max CCV as the general index that evaluate the deviation between the recovered phase map and the ground truth for all cases as displayed in Table 1. The results in Table 1 are the mean values obtained from the 40 validation samples.

Table 1. Quantitative evaluation of the phase recovery results under the three cases corresponding to Fig. 3.

View Table | View all tables in this article

In Table 1, we can see that that the Max CCV has slight difference for the two methods in ideal condition. Whereas, for the oblique incidence case, the method that based on Diff-Net has an obvious (about 18.5%) larger Max CCV than the method of iteration only, which indicates better recovery effect. Besides, for the incline movement case, the Max CCV via Diff-Net method is 0.950, which is only 0.011 (about 1.1%) lower than traditional iteration only method under ideal condition.

All the evaluation indexes above demonstrate that the phase recovery method based on the Diff-Net is not only valid, but also possesses qualified ability to correct the tilt errors commonly appears in multi-distance lensless imaging. It means that the Diff-Net based method is robust for nonideal cases with various errors. Besides, once the Diff-Net has been constructed, only a single diffraction image is needed for holographic retrieval.

4. Experimental results and discussions

4.1 Experiment setup

The experimental lensless imaging system is constructed as shown in Fig. 4(a). The light source is a He-Ne laser with wavelength of 632.8 nm (Thorlabs HNL100LB). The initial laser beam is expanded by a spatial filtering system (Thorlabs KT3110/M) into a plane wave. The planar illumination laser irradiates on the sample and directly forms image on the camera sensor (LUCID PHX122S-MNL with pixel size of 1.85µm and image size of 4024 × 3036 pixels). The distance between the sample and the camera sensor is precisely controlled by a precise displacement stage (Thorlabs LTS150/M). We used luffa pollen (Xinxiang, Qizhi Biology) as experiment sample because this kind of biological particles has great morphological change, which is helpful to verify the reliability of our algorithm. The pollen samples are prepared by applying commercial pollens to a glass slide. Besides, as a common allergen, pollen plays an important role in air environment. A total of 6 diffraction images at different positions were collected, with the moving distance of 50µm. Since the camera sensor has a glass protection plate, it is impossible for the sample to be completely attached to the camera sensor. Through focusing strategy [10], the first diffraction distance was set to be 1300µm. This distance is different from that used in simulation, because the surface of the camera sensor is covered with a protective glass with thickness of about 1 mm. This difference would not affect our experimental verification, since the distance of lensless imaging is usually set in the range from 100µm to 2 mm [44].

Fig. 4. The lensless imaging system (a) and the captured diffraction image of luffa pollen particles (b), in which the images denoted by 1 and 2 are cropped small images with the size of 256 × 256 pixels for datasets construction of the Diff-Net.

Download Full Size | PDF

As shown in Fig. 4(b), the field of view of the lensless imaging system is so large that numerous luffa pollen particles are contained in one frame with the size of 4024 × 3036 pixels. Hence, we cut the frame into multiple small frames as the datasets for the training of network. Each one of the small frames is 256 × 256 pixels. Specifically, for each one frame obtained at the 6 different distances, 110 small fames were acquired from the large frame. Among them, 100 frames were augmented to 400 frames by rotating 0°, 90°, 180° and 270°, which were used as training set, and the remaining 10 frames were used for validation. Adam optimizer was used to optimize the network, and the batch size was set to 10 for each training, with a total of 1000 epochs trained [43]. The strategy of variable learning rate was adopted. The learning rate of the first 500 epochs was 0.0002, and that of the last 500 epochs was reduced to 1% of the initial learning rate. This network is trained on the computer with Intel Xeon Gold 6230R CPU @ 2.10 GHz, NVIDIA Quadro RTX 5000 16 G using Python3.6, Tensorflow1.8 framework. For the image size of 256 × 256 pixels which is a typical field of view, the training time is about 18 hours and the inference time is about 0.26s. The training results and related codes is available on Github [45].

4.2 Effectivity of the Diff-Net

Since the ground truth of the luffa pollen is unknown, we used the reconstructed hologram retrieved from 6 actual diffraction images via iteration method as the ground truth of complex-amplitude. The reconstruction results obtained by our Diff-Net based iterative method are compared with those obtained from the complex-amplitude solver network method (CAS-Net) [46], the de-twined term network method (DTT-Net) [22], and the direct back propagation method (DBP) [8].

For the CAS-Net and DTT-Net, although the datasets, network layers, learning rate and training platform used by them are the same as those used by Diff-Net, the Diff-net does not need to preprocess the acquired diffraction images. It directly takes the first diffraction image as the network input, and other diffraction images as the corresponding targets. Whereas, the CAS-Net needs to calculate the amplitude and phase of the object iteratively according to multiple diffraction images, and take them as the target of network training. The DTT-Net also needs to calculate the amplitude and phase of the sample disturbed by twin-image as the input sample. The DBP is a traditional method to solve complex-amplitude of the sample by direct back propagation. All these four methods only need a single raw image to be captured, so their acquisition time is the same (about 0.1s). For a typical field of view of 256 × 256 pixels, the inference time of the Diff-Net to generate 5 other diffraction images is about 0.26s. The inference time of other learning-based methods mentioned above to reconstruct complex-amplitude is about 0.1s. Comparing to the traditional multi-distance method which need multiple images to be captured physically, the significant advantage of the methods mentioned above is that only a single-shot is required, thus these methods have the same temporal resolution in offline dynamic measurement, and their temporal resolution depends on the frame rate and exposure time of the single-shot image.

In order to intuitively observe the recovery effect of the four methods, we randomly select an untrained sample of the luffa pollen and input its diffraction image into the networks for recovery. Results of the four methods are given in Fig. 5, including the recovered amplitude results shown from Fig. 5(a) to (d) and the recovered phase results shown from Fig. 5(e) to (h). All the results are compared with the ground truth (GT). It can be seen in Fig. 5(a) to (d) that, the details and contours of the amplitude are recovered well for all the methods except for CAS-Net. At the same time, we also noticed that the contrast of the recovered phase via DTT-Net is poor, and the recovered amplitude via DBP is greatly disturbed by twin-image. The amplitude recovered by the Diff-Net is close to the ground truth. For the phase recovery results, as shown in Fig. 5(e) to (h), the phase maps retrieved via the four methods have the same effects. Overall speaking, the details of the recovered hologram via Diff-Net are better than CAS-Net, the contrast is better than DTT-Net, and the interference of twin-image is suppressed better than DBP.

Fig. 5. Recovery effects of four methods based on Diff-Net, CAS-Net, DTT-Net and DBP, where (a)-(d) are the recovered amplitude maps of the four methods respectively, and (e)-(f) are the recovered phase maps of the four methods respectively. The ground truth (GT) and the absolute errors between the ground truth and the reconstruction results are given for comparison. The evaluation indexes SSIM (S), RMSE (R) and PSNR (P) are marked on the upper right of the reconstruction maps.

Download Full Size | PDF

Quantitative indexes including SSIM, RMSE and PSNR are used to compare the recovery effects of the four methods. The SSIM is used to evaluate the structural similarity between the recovered image and the ground truth, with the value ranging from 0 to 1. Larger SSIM means the two images are more similar. The RMSE is used to evaluate the similarity of low-frequency signal between the recovered image and the ground truth. Smaller RMSE means better recovery. The PSNR is the abbreviation of peak signal-to-noise ratio which is used to evaluate the information loss of the recovered image [39]. Larger PSNR means better signal-to-noise ratio of the recovered image. We randomly selected 10 untrained diffraction images of luffa pollen, and used them to recover amplitude and phase by the four methods respectively. Then, the recovered values and the ground truth values were import into the calculations of SSIM, RMSE and PSNR, and averaged by 10. The final calculation results are given in Table 2.

Table 2. SSIM, RMSE and PSNR of amplitude and phase recovery effect by four methods

View Table | View all tables in this article

Since the amplitude and phase have difference ranges (amplitude ranges from 0 to 255 while phase ranges from 0 to 2π), the RMSE of them is quite different, but it does not affect the evaluation ability of the RMSE for the four methods. For the recovered amplitude, compared with the results from the other three methods, the SSIM of Diff-Net shows the largest value with 6.1% ∼ 14.5% larger than the other methods, the PSNR of Diff-Net shows the highest value with 10.3% ∼ 20.4% higher than the other methods, and the RMSE of Diff-Net shows lowest value with 27.2% ∼ 39.1% lower the other methods. For the recovered phase, the results of Diff-Net show the same effect as the recovered amplitude: the SSIM is more than 12.6% larger, the RMSE is more than 30.4% lower, and PSNR is more than 28.7% higher than that of the other methods. Overall speaking, the amplitude and phase recovery performance of the proposed Diff-Net is evidently better than the other three methods in detail retaining, contrast holding and twin-image suppression.

To further prove the effectivity of the Diff-Net, standard polymethyl methacrylate (PMMA) spherical particles (refractive index of 1.5, particle size of 10µm, produced by KEMAI, Dongguan) were tested for verification. The particles are immersed and dispersed in artificial cedar oil with refractive index of 1.51. Other experimental parameters are the same with Fig. 5. Results are shown in Fig. 6, which also consists of the four methods for comparison. Considering that PMMA particles are nearly transparent and the amplitude has little difference, only the phase recovery results are shown. The ground truth of the phase is calculated from the relative index of refraction between the PMMA particles and the oil. The phase curves right side in the Fig. 6 are the sectional phase distribution along the red lines in the phase maps, and each phase curve is the mean value obtained from 10 particles. The GT curve is also mean value and it is calculated from traditional iteration algorithm based on experimental multi-distance diffraction images.

Fig. 6. Recovered phase of spherical PMMA particles, consists of the four methods for comparison. Diffraction image and phase maps are located in the left side, and the sectional phase curves along the red lines in the phase maps are given in the right side. Each phase curve is the mean value obtained from 10 particles. Other parameters are the same with those in Fig. 5.

Download Full Size | PDF

Figure 6 shows that the CAS-Net fails to predict the phase of a sample, which may be caused by the fact that the size of PMMA particles is smaller than the training set samples (the pollen particles are about 30µm while the PMMA particles are 10µm). The results of the other three methods are similar, but overall speaking, the curve of Diff-Net is closer to GT, which demonstrate that it is effective and performs well.

4.3 Generalization ability of the Diff-Net

On the discussions of generalization ability of the Diff-Net, we selected four pollen samples with significantly different morphologies for verification. Under the same experimental conditions, the diffraction images of peach blossom pollen, pine pollen, lily pollen and wheat pollen are captured respectively. Results are shown in Figs. 7 and 8, corresponding to amplitude recovery and phase recovery. Data processing is the same with that shown in Fig. 5, with the same constructed Diff-Net. Results of the other three methods are shown correspondingly for comparison.

Fig. 7. Comparison of the phase retrieval effects of four single-shot methods for four pollen samples with significantly different morphologies, in which (a) is pine pollen, (b) is lily pollen, (c) is peach pollen and (d) is wheat pollen. First column images are the diffraction images acquired at a distance of 1300µm away from the camera. The evaluation indexes SSIM (S), RMSE (R) and PSNR (P) are marked on the upper right of the reconstruction maps. Other parameters are the same with those in Fig. 5.

Download Full Size | PDF

Fig. 8. Comparison of the amplitude retrieval effects, with same parameters of Fig. 5.

Download Full Size | PDF

The amplitude and phase recovered by our Diff-Net based iterative method on the four kinds of pollen particles are closer to the ground true than those recovered by the other methods. Among them, the reconstruction quality of CAS-Net is poor, and it only recovers the rough outline instead of details of the amplitude and phase, but it suppresses twin-image. DTT-Net recovers the amplitude details well, but it shows relatively poor performance in phase recovery. The contrast of the amplitude and phase recovered by DTT-Net is low, and the ability of twin-image suppression is poor. The amplitude and phase recovered by DBP are rich in detail, but are seriously disturbed by twin-image. Overall speaking, the generalization ability of the amplitude and phase reconstruction method based on Diff-Net is better than those of the CAS-Net and the DTT-Net. In addition, Diff-Net suppresses the twin-image better than DBP.

Quantitatively, we calculate the evaluation indexes of SSIM, RMSE and PSNR for 10 untrained diffraction images of each type of pollen mentioned in Figs. 7 and 8, and the mean values are presents in Table3. It can be seen that, for lily pollens, peach pollens and wheat pollens, the evaluation indexes of the Diff-Net method are significantly better than other three methods. Specifically, we can calculate from Table 3 that the PSNR of the Diff-Net is 8.6% ∼ 36.8% (on the amplitude) and 3.5% ∼ 82.4% (on the phase) larger than the other methods respectively. The SSIM of the Diff-Net is 0.2% ∼ 25.8% (on the amplitude) and 0.7% ∼ 57.3% (on the phase) larger than the other methods. These two indexes indicate that the Diff-Net based method performs obviously better than the other methods in noise suppression and sample information reproduction. For the RMSE of the Diff-Net based method on the phase recovery of pine pollen, although it is slightly higher than CAS-Net method, it is still 19.8% ∼ 21.7% smaller than the other two methods.

Table 3. SSIM, RMSE and PSNR of amplitude and phase recovery of pine pollen, lily pollen, peach blossom pollen and wheat pollen by four methods

View Table | View all tables in this article

To sum up, the Diff-Net based recovery results are overall optimal in terms of SSIM, RMSE and PSNR. It indicates that the combination of the physical model and deep learning to solve the amplitude and phase has better robustness, which overcomes the shortcomings of the insufficient generalization ability of the traditional deep learning method.

5. Conclusions

In lensless imaging, deep learning is a promising approach for the purpose of realizing single-shot phase retrieval with easy experimental setup. In this paper, we propose a diffraction network (Diff-Net) which connects the multiple diffraction images captured at different distances. Here, only one single diffraction image is required for holography by utilizing the Diff-Net based iterative complex-amplitude retrieval method. The competitive edge for adopting only one image is to ensure the retrieval process to be robust with no practical errors occurring between multiple images. We have experimentally tested four kinds of pollen particles with significantly different morphologies. The retrieval results of the Diff-Net based recovery results obviously outperformed the other methods, which proves that the Diff-Net obtains qualified generalization ability. We further believe that the Diff-Net system will be a significant single-shot holographic approach for reconstructing dynamic samples under various practical conditions.

Funding

National Natural Science Foundation of China (61727814, 62005175, 62075140).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are available in Ref. [45].

References

1. Y. Rivenson, Y. Wu, H. Wang, Y. Zhang, A. Feizi, and A. Ozcan, “Sparsity-based multi-height phase recovery in holographic microscopy,” Sci. Rep. 6(1), 37862 (2016). [CrossRef]

2. Y. Zhang, Y. Shin, K. Sung, S. Yang, H. Chen, H. Wang, D. Teng, Y. Rivenson, R. P. Kulkarni, and A. Ozcan, “3D imaging of optically cleared tissue using a simplified CLARITY method and on-chip microscopy,” Sci. Adv. 3(8), e1700553 (2017). [CrossRef]

3. X. Cui, L. M. Lee, X. Heng, W. Zhong, P. W. Sternberg, D. Psaltis, and C. Yang, “Lensless high-resolution on-chip optofluidic microscopes for Caenorhabditis elegans and cell imaging,” Proc. Natl. Acad. Sci. USA105(31), 10670–10675 (2008). [CrossRef]

4. Y. Wu, A. Shiledar, Y. Li, J. Wong, S. Feng, X. Chen, C. Chen, K. Jin, S. Janamian, Z. Yang, Z. S. Ballard, Z. Gorocs, A. Feizi, and A. Ozcan, “Air quality monitoring using mobile microscopy and machine learning,” Light: Sci. Appl. 6(9), e17046 (2017). [CrossRef]

5. D. Gabor, “A new microscopic principle,” Nature 161(4098), 777–778 (1948). [CrossRef]

6. A. Greenbaum and A. Ozcan, “Maskless imaging of dense samples using pixel super-resolution based multi-height lensfree on-chip microscopy,” Opt. Express 20(3), 3129–3143 (2012). [CrossRef]

7. S. Isikman, S. Seo, I. Sencan, A. Erlinger, and A. Ozcan, “Lensfree cell holography on a chip: From holographic cell signatures to microscopic reconstruction,” 2009 IEEE LEOS Annual Meeting Conference Proceedings, 2009: 404–405 (2009).

8. Y. Zhang, G. Pedrini, W. Osten, and H. J. Tiziani, “Whole optical wave field reconstruction from double or multi in-line holograms by phase retrieval algorithm,” Opt. Express 11(24), 3234–3241 (2003). [CrossRef]

9. C. Zuo, Q. Chen, J. Sun, and A. Asundi, “Non-interferometric phase retrieval and quantitative phase microscopy based on transport of intensity equation: A review,” Chinese Journal of Lasers 43(6), 0609002 (2016). [CrossRef]

10. Z. Liu, C. Guo, and J. Tan, “Lensfree computational imaging based on multi-distance phase retrieval,” Infrared Laser. Eng. 47(10), 1002002 (2018). [CrossRef]

11. D. W. E. Noom, K. S. E. Eikema, and S. Witte, “Lensless phase contrast microscopy based on multiwavelength Fresnel diffraction,” Opt. Lett. 39(2), 193–196 (2014). [CrossRef]

12. P. Bao, F. Zhang, G. Pedrini, and W. Osten, “Phase retrieval using multiple illumination wavelengths,” Opt. Lett. 33(4), 309–311 (2008). [CrossRef]

13. G. Pedrini, W. Osten, and Y. Zhang, “Wave-front reconstruction from a sequence of interferograms recorded at different planes,” Opt. Lett. 30(8), 833–835 (2005). [CrossRef]

14. M. R. Teague, “Deterministic phase retrieval: a greeńs function solution,” J. Opt. Soc. Am. 73(11), 1434–1441 (1983). [CrossRef]

15. G. Yang, B. Dong, B. Gu, J. Zhuang, and O. K. Ersoy, “Gerchberg–Saxton and Yang–Gu algorithms for phase retrieval in a nonunitary transform system: a comparison,” Appl. Opt. 33(2), 209–218 (1994). [CrossRef]

16. C. Zuo, J. Sun, J. Zhang, Y. Hu, and Q. Chen, “Lensless phase microscopy and diffraction tomography with multi-angle and multi-wavelength illuminations using a LED matrix,” Opt. Express 23(11), 14314–14328 (2015). [CrossRef]

17. X. Wen, X. Zhou, Y. Li, Y. Ji, K. Zhou, S. Liu, D. Jia, W. Liu, D. Chi, and Z. Liu, “High-performance lensless diffraction imaging from diverse holograms by three-dimensional scanning,” Opt. Lett. 47(14), 3423–3426 (2022). [CrossRef]

18. Z. Liu, C. Guo, J. Tan, Q. Wu, L. Pan, and S. Liu, “Iterative phase-amplitude retrieval with multiple intensity images at output plane of gyrator transforms,” J. Opt. 17(2), 6 (2015). [CrossRef]

19. C. Guo, C. Shen, Q. Li, J. Tan, S. Liu, X. Kan, and Z. Liu, “A fast-converging iterative method based on weighted feedback for multi-distance phase retrieval,” Sci. Rep. 8(1), 10 (2018). [CrossRef]

20. Y. Huang, M. Zhu, L. Ma, and W. Zhang, “Accurate and fast registration algorithm for multi-height lensless in-line on-chip holographic microscopy,” Opt. Commun. 526, 128898 (2023). [CrossRef]

21. A. Sinha, J. Lee, S. Li, and G. Barbastathis, “Lensless computational imaging through deep learning,” Optica 4(9), 1117–1125 (2017). [CrossRef]

22. Y. Rivenson, Y. Zhang, H. Gnaydin, D. Teng, and A. Ozcan, “Phase recovery and holographic image reconstruction using deep learning in neural networks,” Light: Sci. Appl. 7(2), 17141 (2018). [CrossRef]

23. Q. Zhang, S. Lu, J. Li, D. Li, X. Lu, L. Zhong, and J. Tian, “Phase-shifting interferometry from single frame in-line interferogram using deep learning phase-shifting technology,” Opt. Commun. 498, 127226 (2021). [CrossRef]

24. F. Wang, Y. Bian, H. Wang, M. Lyu, G. Pedrini, W. Osten, G. Barbastathis, and G. Situ, “Phase imaging with an untrained neural network,” Light: Sci. Appl. 9(1), 7 (2020). [CrossRef]

25. Y. Zhang, M. A. Noack, P. Vagovic, K. Fezzaa, F. Garcia-Moreno, T. Ritschel, and P. Villanueva-Perez, “PhaseGAN: a deep-learning phase-retrieval approach for unpaired datasets,” Opt. Express 29(13), 19593–19604 (2021). [CrossRef]

26. J. Chen, Q. Zhang, X. Lu, L. Zhong, and J. Tian, “Quantitative phase imaging based on model transfer learning,” Opt. Express 30(10), 16115–16133 (2022). [CrossRef]

27. D. Yang, J. Zhang, Y. Tao, W. Lv, S. Lu, H. Chen, W. Xu, and Y. Shi, “Dynamic coherent diffractive imaging with a physics-driven untrained learning method,” Opt. Express 29(20), 31426–31442 (2021). [CrossRef]

28. J. R. Fienup, “Phase retrieval algorithms: a comparison,” Appl. Opt. 21(15), 2758–2769 (1982). [CrossRef]

29. A. Anand, V. K. Chhaniwal, P. Almoro, G. Pedrini, and W. Osten, “Shape and deformation measurements of 3D objects using volume speckle field and phase retrieval,” Opt. Lett. 34(10), 1522–1524 (2009). [CrossRef]

30. L. Xu, X. Peng, J. Miao, and A. K. Asundi, “Studies of digital microscopic holography with applications to microstructure testing,” Appl. Optics 40(28), 5046–5051 (2001). [CrossRef]

31. W. Bishara, T. Su, A. F. Coskun, and A. Ozcan, “Lensfree on-chip microscopy over a wide field-of-view using pixel super-resolution,” Opt. Express 18(11), 11181–11191 (2010). [CrossRef]

32. T. Nguyen, Y. Xue, Y. Li, L. Tian, and G. Nehmetallah, “Deep learning approach for Fourier ptychography microscopy,” Opt. Express 26(20), 26470–26484 (2018). [CrossRef]

33. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks,” Commun. ACM 63(11), 139–144 (2020). [CrossRef]

34. P. Isola, J. Zhu, A. A. Efros, and T. Zhou, “Image-to-Image Translation with Conditional Adversarial Networks,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1125–1134 (2017).

35. F. Agostinelli, M. Hoffman, P. Sadowski, and P. Baldi, “Learning activation functions to improve deep neural networks,” arXiv preprint arXiv:1412.6830, (2014).

36. V. Dumoulin and F. Visin, “A guide to convolution arithmetic for deep learning,” arXiv preprint arXiv:1603.07285, (2016).

37. Q. Zhang, J. Chen, J. Li, E. Bo, H. Jiang, X. Lu, L. Zhong, and J. Tian, “Deep learning-based single-shot structured illumination microscopy,” Opt. Lasers Eng. 155, 107066 (2022). [CrossRef]

38. Q. Xu, S. Zhong, K. Chen, and C. Zhang, “Optimized Selection Method of Cycle-consistent Loss Coefficient of CycleGAN in Image Generation with Different Texture Complexity,” Comput. Sci. 46(1), 100–106 (2019). [CrossRef]

39. A. Hore and D. Ziou, “Image quality metrics: PSNR vs. SSIM,” 2010 20th international conference on pattern recognition, 2366–2369 (2010).

40. T. Chai and R. R. Draxler, “Root mean square error (RMSE) or mean absolute error (MAE)? - Arguments against avoiding RMSE in the literature,” Geosci. Model Dev. 7(3), 1247–1250 (2014). [CrossRef]

41. W. Li, Q. Zhang, J. Li, L. Zhong, X. Lu, and J. Tian, “A simplified method of interferograms matching in the dual-channel phase-shifting interferometry,” Opt. Commun. 452, 457–462 (2019). [CrossRef]

42. C. Guo, Q. Li, J. Tan, S. Liu, and Z. Liu, “A method of solving tilt illumination for multiple distance phase retrieval,” Opt. Lasers Eng. 106, 17–23 (2018). [CrossRef]

43. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980 (2014).

44. A. Greenbaum, W. Luo, T. Su, Z. Gorocs, L. Xue, S. O. Isikman, A. F. Coskun, O. Mudanyali, and A. Ozcan, “Imaging without lenses: achievements and remaining challenges of wide-field on-chip microscopy,” Nat. Methods 9(9), 889–895 (2012). [CrossRef]

45. https://github.com/ipc-deeplearning/Diff-Net

46. K. Wang, J. Dou, Q. Kemao, J. Di, and J. Zhao, “Y-Net: a one-to-two deep learning framework for digital holographic reconstruction,” Opt. Lett. 44(19), 4765–4768 (2019). [CrossRef]

	Ideal condition		Oblique incidence		Incline movement
	Iteration only	Diff-Net +Iteration	Iteration only	Diff-Net +Iteration	Iteration only	Diff-Net +Iteration
Max CCV	0.961	0.950	0. 747	0.885	0.811	0.950

		Diff-Net	CAS-Net	DTT-Net	DBP
Amplitude	SSIM	0.592	0.543	0.558	0.517
	RMSE (gray scale)	15.842	25.997	21.775	25.735
	PSNR (dB)	24.318	20.744	22.040	20.193
Phase	SSIM	0.762	0.677	0.573	0.563
	RMSE (rad)	0.454	0.959	0.652	1.249
	PSNR (dB)	26.206	17.442	20.367	15.757

		Pine Pollen
		Diff-Net	CAS-Net	DTT-Net	DBP
Amplitude	SSIM	0.570	0.482	0.529	0.508
	RMSE (gray scale)	18.836	25.952	25.401	24.082
	PSNR (dB)	23.082	20.287	20.657	21.257
Phase	SSIM	0.569	0.551	0.422	0.520
	RMSE (rad)	0.624	0.571	0.797	0.778
	PSNR (dB)	22.254	21.508	18.384	19.159
		Lily Pollen
		Diff-Net	CAS-Net	DTT-Net	DBP
Amplitude	SSIM	0.542	0.486	0.516	0.541
	RMSE (gray scale)	18.806	26.381	26.604	24.111
	PSNR (dB)	22.855	20.063	20.090	20.832
Phase	SSIM	0.763	0.699	0.502	0.704
	RMSE (rad)	0.603	0.820	0.799	1.163
	PSNR (dB)	23.333	18.819	18.576	15.979
		Peach Blossom Pollen
		Diff-Net	CAS-Net	DTT-Net	DBP
Amplitude	SSIM	0.601	0.500	0.543	0.492
	RMSE (gray scale)	19.477	36.365	36.797	34.896
	PSNR (dB)	22.485	17.551	17.386	17.604
Phase	SSIM	0.826	0.721	0.577	0.751
	RMSE (rad)	0.406	1.206	0.919	1.030
	PSNR (dB)	26.509	14.534	16.883	15.902
		Wheat Pollen
		Diff-Net	CAS-Net	DTT-Net	DBP
Amplitude	SSIM	0.502	0.399	0.443	0.438
	RMSE (gray scale)	24.101	50.871	39.509	36.964
	PSNR (dB)	20.696	15.133	16.616	17.377
Phase	SSIM	0.763	0.673	0.485	0.758
	RMSE (rad)	0.473	0.869	1.032	1.025
	PSNR (dB)	23.198	17.741	16.256	15.973

	Ideal condition		Oblique incidence		Incline movement
	Iteration only	Diff-Net +Iteration	Iteration only	Diff-Net +Iteration	Iteration only	Diff-Net +Iteration
Max CCV	0.961	0.950	0. 747	0.885	0.811	0.950

		Diff-Net	CAS-Net	DTT-Net	DBP
Amplitude	SSIM	0.592	0.543	0.558	0.517
	RMSE (gray scale)	15.842	25.997	21.775	25.735
	PSNR (dB)	24.318	20.744	22.040	20.193
Phase	SSIM	0.762	0.677	0.573	0.563
	RMSE (rad)	0.454	0.959	0.652	1.249
	PSNR (dB)	26.206	17.442	20.367	15.757

Diffraction-Net: a robust single-shot holography for multi-distance lensless imaging

Abstract

Corrections

1. Introduction

2. Methods

2.1 Diff-Net system

2.2 Diff-Net framework

2.3 Loss function

3. Simulation analysis

4. Experimental results and discussions

4.1 Experiment setup

4.2 Effectivity of the Diff-Net

4.3 Generalization ability of the Diff-Net

5. Conclusions

Funding

Disclosures

Data availability

References

Data availability

Cited By

Figures (8)

Tables (3)

Equations (10)

Optics Express