Dynamic coherent diffractive imaging with a physics-driven untrained learning method

Dongyu Yang; Dongyu Yang; Dongyu Yang; Junhao Zhang; Junhao Zhang; Junhao Zhang; Ye Tao; Ye Tao; Wenjin Lv; Wenjin Lv; Shun Lu; Hao Chen; Wenhui Xu; Wenhui Xu; Yishi Shi; Yishi Shi

doi:10.1364/OE.433507

1. Introduction

Coherent diffraction imaging (CDI) is a powerful imaging technique that is simple in concept and easy to implement. It collects intensities of the diffraction wave and then uses iterative feedback algorithms to reconstruct an image, namely, to solve the inverse problem. Since its first experimental demonstration in 1999 [1], CDI has gained rapid development, and a variety of methods have been proposed, including plane-wave CDI, Bragg CDI, reflection CDI, Fresnel CDI, sparsity CDI, and ptychography [2–6]. These CDI methods have been widely used in astronomy, crystallography, biomedical imaging, optical imaging and more [7–12]. Notwithstanding its considerable development and widespread implementation, current CDI techniques still suffer from several shortcomings when investigating in a dynamic process. Firstly, the complex field reconstruction problem. Conventional iterative methods, such as error-reduction (ER) [13] and hybrid input-output (HIO) [13] or difference map (DM) [14], are accompanied by complementary problems, plaguing with the uniqueness of the solutions and the stagnation of the algorithm [15–18]. The second problem is the time-consuming data acquisition, it has been demonstrated that the ambiguities in the solution space can be reduced by introducing more constraints, a good case in point is ptychography [10,19,20], which can faithfully reconstruct a complex object from multiple partially overlapped diffraction patterns. However, the data collection is time-consuming, as usually tens to hundreds of diffraction patterns are recorded. Fortunately, some recently developed methods, such as single shot ptychography (SSP) [21] and coherent modulation imaging (CMI) [22] have addressed the two problems, thus have great prospects for the study of dynamic objects. Finally, all these methods mentioned above are in an iterative manner. Their algorithms usually require hundreds to thousands of iterations to converge to a solution with high confidence. This would be extremely time-consuming when there is a large amount of recorded data need to be processed.

Recently, research on deep learning (DL) has opened a new gateway for solving the inverse problem in computational imaging [23,24]. The DL-based methods have been used in holographic image reconstruction, computational ghost imaging, optical tomography, and phase retrieval [25–41]. Specifical to the dynamic imaging problem in CDI, a state-of-art demonstration is CDI NN [42], which can invert a diffraction pattern to an image within a few milliseconds of computation time. Most of these DL-based methods employ a supervised training strategy, thus require a large amount of labeled data and can take hours for training. Nevertheless, in practice application, because it is hard to obtain enough ground truth for networking training when imaging things never seen before, one can find the generalization ability of the trained network is limited, as the network has generalization ability only within a small neighborhood around the trained samples. To overcome this limitation, Wang introduced an untrained learning method termed PhysenNet [38], which can be used without training beforehand and can effectively recover a phase object from a single diffraction pattern. The concept of PhysenNet is combining a conventional network with a real-world physical imaging model, that is also suitable to solve many other inverse problems in computational imaging. We note that this revolutionary concept has not been used in the reconstruction of a complex field from one single diffraction measurement, overcome which will be of great help to the study of dynamic process.

In this study, we introduce a physics-driven untrained learning method termed Deep CDI, which can reconstruct the complex field from one single measurement and has the possibility to achieve the fast reconstruction of a dynamic process. The complex field reconstruction is achieved by combining a conventional artificial neural network such as U-Net [43] with a real-world physical imaging model. To the best of our knowledge, we are the first to demonstrate that the support region constraint, which is widely used in conventional iterative algorithms [6,13,20,22,44,45], can be utilized for loss calculation. The loss calculated from support constraint and free propagation constraint is summed up to optimize the network’s weights and bias, resulting in a feasible mapping between the diffraction pattern and the corresponding complex field. As a result, the only input to the network is one single diffraction pattern, instead of a large amount of labeled data. We validate the feasibility of our method with numerical simulations and optical experiments. More importantly, the limited but useful generalization ability of neural networks brings us more flexibility when dealing with a dynamic process with thousands of data-points. Instead of reconstructing a dynamic process frame by frame, we demonstrate a two-step reconstruction strategy that use a fraction of the recorded diffraction patterns to train the network. Once trained, the network can predict the whole process with an average reconstruction speed that is much faster than conventional iterative algorithms.

2. Method

2.1 Formulation of the inverse problem

An optical thin object can be described by a complex transmission function $U(x,y)$, where

(1)$$U(x, y)=A(x, y) e^{i \varphi(x, y)},$$

with $A(x,y)$ being a real modulus between 0 and 1 representing the absorption of the object, and $\varphi (x,y)$ denoting the phase response. With the illumination of a plane wave, the near-field diffraction of $U\left (x,y\right )$ over distance z can be modeled by an angular spectrum method:

(2)$$U_{z}(x, y)=\mathcal{F}^{{-}1}\left\{\mathcal{F}\{U(x, y)\} \exp \left\{{-}i \pi \lambda z\left(u^{2}+v^{2}\right)\right\}\right\},$$

where $\mathcal {F}$ denotes the 2D Fourier transform and $\mathcal {F}^{-1}$ its inverse, $u$ and $v$ are the spatial frequencies in the $x$ and $y$ directions, respectively. The CCD camera record only the diffraction intensity $I_z(x,y)$, which is

(3)$$I_z(x,y)=\left| U_z(x,y) \right|^{2}.$$

We can write a mapping function $H(\cdot )$ to express the above process, relating the complex amplitude object $U$ to its diffraction intensities $I_z$, and the inverse is then denoted as $H^{-1}(\cdot )$:

(4) $$I_z(x,y)=H(U(x,y)),$$

(5)$$U(x,y)=H^{{-}1}(I_z(x,y)).$$

This equation formulates a highly ill-posed problem [16,24], in which only one intensity measurement is used to derive the complex-valued object function. The solution of the equation can be further formulated as a minimization problem:

(6)$$\widehat{U}(x,y)=argmin_uD(H(U),I)+\Psi(U),$$

here $D(H(U),I)$ is the fitness term, $D(,)$ is a distance operator, and $\mathrm {\Psi }(U)$ is the prior knowledge term. A typical DL-based method to solve this problem is the End-to-End approach, which attempts to learn a mapping function f from a large amount of labeled data $\left (U_k,I_k\right ),\ k=1,\ldots ,K$ that denoted as

(7)$$\widehat{f}=argmin_fD(f(I_k),U_k),$$

where $f(\cdot )$ is the output of the neural network.

To save the need from experimental collecting the large amount of training data, Wang [38] introduced a method with an untrained neural network, PhysenNet. They combined U-net, a conventional artificial neural network, with a real-world physical imaging model imposing the free propagation constraint, to solve the phase retrieval problem. However, in terms of the complex field reconstruction problem, we must seek additional constraints.

Among the methods based on iterative algorithms, an efficient constraint is the so-called support region constraint where the wavefield is known to have non-zero amplitude within a finite region. A recent feasible demonstration is CMI [22]. Besides the intensity constraint, CMI utilizes a support region constraint along with a modulator to derive the complex field, in which the update function at the support plane is

(8)$$U_j=U'_{j-1}\cdot S+\beta (U'_{j-1}-U_{j-1})\times(1_{N\times N}-S),$$

(9)$$S(x,y)= \begin{cases} 1, & x,y\in R \\0, & x,y\in R' \end{cases},$$

where $U_{j-1}^{\prime }$ is the previous estimate; $U_{j-1}$ is the estimate after optimizing by the measured data in the $(j-1)$th iteration; $\beta$ is a constant coefficient. $S$ is the support plane.

To implement the support constraint in the DL-based method, we formulate the complex field retrieval as

(10)$$\widehat{f}=argmin_f\{D(f(I_k),U_k)+D(U\_k(1-S),0_{N\times N})\}.$$

The added second term of the above equation is the fitness term of the support constraint, with which the area outside the support can converge to zero. When the optimization is complete, the complex field at the support plane can be reconstructed using the trained mapping function:

(11)$$\widehat{U}=\widehat{A}e^{i\widehat{\varphi}}=\widehat{f}(I_z).$$

2.2 Deep CDI principle

The forward imaging geometry is shown in Fig. 1(a). The support plane is formed using a small pinhole, which is pressed against the sample. A collimated laser beam transmits through the sample, and then propagate to reach the detector plane where the diffraction patterns are captured. Constraints are applied in the two distinct planes, as illustrated in Fig. 1(b). The recorded diffraction pattern is first fed into the Deep CDI network (Step 1) to get a predicted complex field (Step 2), then the loss is calculated according to the physical constraints (Step 3), which is detailed in the next section, at last, the network’s weights are updated (Step 4). Steps 1-4 are repeated as a training cycle.

Fig. 1. Schematic of the experimental set-up and Deep CDI phase retrieval algorithm. (a) A collimated laser beam passed through a 1mm diameter pinhole and then passed through the biological sample which was pressed against the pinhole. Further downstream, a time series of diffraction patterns of the sample were collected by a camera. (b) Flowchart of the Deep CDI phase retrieval algorithm. Step 1: The collected diffraction pattern is input to the Deep CDI network. Step 2: After forward propagation, the network predicts reconstructed amplitude and phase images. Step 3: The collected diffraction pattern and reconstructed results are used to calculate the loss value by physical constraints including support constraint and free propagation. Step 4: The loss value is used as a feedback for the network backpropagation.

Download Full Size | PDF

2.3 Experiment setup

The optical setup for the Deep CDI is depicted in Fig. 2(a). The imaging system is constructed by cage mounts with a vertical layout, which is stable and facilitates the observation of biological samples in liquid media. Using the tension of water, the living biological sample is sandwiched between two cover slides with a thin steel sheet, as shown in Fig. 2(b) (top view). The thickness of the thin steel sheet is $\leqslant 0.05mm$, the outer diameter is 30mm, and the inner diameter is 8mm. All diffraction patterns (8bit) are acquired by the CMOS camera (Basler, ace2, a2A4504-18umPRO) with a 50-microsecond exposure at 60 FPS.

Fig. 2. Optical setup for Deep CDI. (a) a horizontal laser beam emitted from a semiconductor laser source at a wavelength of 639 nm (COHERENT, OBIS 637LX) first passed through a mirror mounted on a right-angle cage mount. And then, we used a spatial filter (THORLABS, KT310) with a focusing lens (THORLABS, C230TMD-A, $f=4.51mm$), a pinhole (THORLABS, P15D, $15\mu m$) and a collimating lens (THORLABS, LA1608-A, $f=75mm$) to collimate the laser beam. The support region pinhole and sample were passed through by the collimated laser beam. The diffraction pattern images were captured by a Basler ace2 camera with $512\times 512$ pixels of size $2.74\times 2.74 \mu m^{2}$, 60 FPS and 8bit. (b) A living biological sample in liquid medium (top view).

Download Full Size | PDF

2.4 Design of the network architecture

In this study, we implement Deep CDI with the U-net architecture, as shown in Fig. 3(a). The U-net architecture is consisting of Double convolution layer, Single convolution layer, Transposed convolution, Max pooling, Concatenation and Skip connection (Please see Supplement 1, Fig. S1 for more details). The input of the network is the diffraction pattern, which is a single-channel grayscale image of $1\times N\times N$ pixels. The output layer of the network consists of two channels ($2\times N\times N$ pixels), one for amplitude and the other for phase, which are combined into the predicted complex field. Then, the predicted complex field is used for loss calculating. The error metric we use is the Mean Square Error (MSE). The first loss function is calculated at the support plane, as shown in Fig. 3(b) top row, which can be described as:

(12)$$loss1=\Psi\{\hat{A}(x, y) \times\left(1_{N \times N}-S\right), 0_{N \times N})\},$$

where $\Psi (,)$ is mean square error (MSE); $\widehat {A}(x,y)$ is the amplitude of the predicted complex field from the network output; $S$ is the support region; $1_{N\times N}$ and $0_{N\times N}$ are the N-by-N matrix of ones and zeros, respectively. The second loss function is calculated at the detector plane, as shown in Fig. 3(b) bottom row. By propagating the predicted complex field to detector plane, we can calculate the MSE between the predicted diffraction intensity and the measured one:

(13)$$loss2=\Psi\{H(\widehat{A}(x,y)e^{i\widehat{\varphi}(x,y)}),I_z(x,y)\},$$

where $\hat {A}\left (x,y\right )$ and $\hat {\varphi }(x,y)$ are the network outputs, denoting amplitude and phase of the predicted complex field, respectively; $I_z\left (x,y\right )$ is the measured diffraction pattern. Note that the propagator is determined by the simulation/experimental setup. In this article, we demonstrate our method in the near-field or far near-field geometry, in which the angular spectrum propagation is valid [46]. Finally, the loss function for the whole network is the sum of loss1 and loss2.

Fig. 3. Neural network architecture and physical driven loss functions. (a). Detailed schematic of the U-Net used for reconstructing a complex valued object from a single diffraction pattern. (b). Two physical driven loss functions are used to calculating a loss from the U-Net output. The support constraint on the U-Net output amplitude channel is shown in the top row. $S$ is the support constraint. Meanwhile, the free propagation constraint is used with the two-channel of the U-Net output, both amplitude and phase, which is shown in the bottom row. $H(\cdot )$ is the process of propagation and record. $I'_z(x, y)$ is the diffraction intensity of the complex amplitude object $U_0(x, y)$ at distance z. The symbol $\oplus$ refers to add.

Download Full Size | PDF

2.5 Implementation details

The Deep CDI network is implemented using Python version 3.7.9, Pytorch framework version 1.7.1 and CUDA version 11.2. We use the Adam optimizer with a learning rate of 0.005. All the models are trained with the same fixed seed on a PC with E5-2630 @ 2.20GHz CPU and NVIDIA GTX 2080Ti GPU. For the one single diffraction pattern reconstruction training, only one GPU is used for training. The size of the input image (diffraction pattern transformed from 8bit to $[0, 1]$) is 512$\times$512 pixels and the output images (reconstructed amplitude and phase) are 2$\times$512$\times$512 pixels. In the dynamic experiment training, there are three GPUs for parallel training. The training set is composed of 60 randomly selected one frame every second from a time series of 3600 diffraction patterns. The Deep CDI model is trained with a batch size of 6 and other 6 images are used for validation. After training, the wrapped phase image ($\varphi _{wrapped}(x,y$)) is obtained from the network output amplitude image ($\hat {A}\left (x,y\right )$) and phase image ($\hat {\varphi }(x,y$)) by the arctangent function:

(14)$$\varphi_{wrapped}(x,y)=angle(\widehat{A}(x,y)e^{(i\widehat{\varphi}(x,y)}),$$

where $\varphi _{wrapped}(x,y)\in (-\pi ,\ \pi ]$; $angle(\cdot )$ is a Numpy function returning the counterclockwise angle from the positive real axis on the complex plane. Final phase images presented in this paper are deliberately masked out with its central region of 364 pixels in diameter.

3. Results and discussion

3.1 Numerical simulations

In our simulations, firstly, we verified the effectiveness of the Deep CDI with different kinds of images. The first two images are from the Faces-LFW [47] dataset and the second are images of HeLa cells, where in both condition the amplitude and the phase is set to [0,1]. The simulation wavelength is 639 nm and the CCD camera (consisting of 512$\times$512 pixels, pixel size is $2.74\times 2.74\ {um}^{2}$) is placed $z=20mm$ downstream the simulated sample. The diameter of the support is 372 pixels. The two simulated diffraction patterns are shown in Fig. 4(a) and (l).

Fig. 4. Comparison of different methods to reconstruct amplitude and phase from simulation data. (a-k) is example 1 which uses two images from the FLW dataset. The range of the phase image is set to $[0, 1]$. (l-t) is the example 2 which uses images of HeLa cells as its amplitude and phase. (a, l) Diffraction patterns of complex valued objects at distance $z=20mm$ are used as the input for all method to reconstruct amplitude and phase images.

Download Full Size | PDF

With the simulated diffraction patterns, the Deep CDI is trained 10000 epochs to output the reconstruction results. For comparison, we processed the same diffraction pattern with the ER algorithm [13] and the conventional end-to-end learning method. The ER algorithm is a classical phase retrieval algorithm that utilizes a support constraint and iterates between the object domain and the Fourier domain. To ensure the convergence, the ER algorithm is run over 5000 iterations. The end-to-end method is trained with the Faces-LFW dataset (3,000 human face images as the amplitude labels and 3,000 other human face images as the phase labels). The same network structure (Fig. 3(a)) is used in the end-to-end method except for the loss function which is directly calculated the MSE loss between the network outputs and labels. We use the structural similarity [48] (SSIM) index values between the ground truth and the reconstructed results from different methods for quantitative evaluation, as given in Table 1. Simulation results show that the Deep CDI can successfully reconstruct amplitude and phase (Figs. 4(d), (e), (o), (p)) from one diffraction pattern. For Example 1, compared with other methods, Deep CDI gives a better result with fine details, such as the woman’s hair. The results of ER algorithm suffer from the twin-image problem [39] that appears in both the amplitude and phase channels (Figs. 4(f), (i), (q), and (r)), due to the centrosymmetric object support [17]. As one might expect, in Figs. 4(j), (k), (s), and (t), the end-to-end method performs better on data that are similar to the training set while fails to deal with different data [38]. These results indicate that the Deep CDI is adaptable for different types of samples, and there is no stagnation problem faced with the ER algorithm. However, it should be noted that the network trained by one diffraction measurement cannot be used to predict output from another different measurement, which is analyzed in detail in Supplement 1.

Table 1. The SSIM index values of reconstructed results from different methods.

View Table | View all tables in this article

Secondly, the effect of the diffraction distances $z$ on the reconstruction quality is analyzed through numerical simulation. Three different distances, $z=20mm, 80mm$ and $160mm$, are chosen for the analysis while other parameters unchanged. The images’ amplitude is set to $[0, 1]$ and the phase is set to $[0, \pi ]$. The results presented in Fig. 5 indicate our method can successfully reconstruct the amplitude and phase from the diffraction pattern in all cases. For quantitative evaluation, we calculate the SSIM between the reconstructed images and the ground truth. For the reconstructed amplitudes, the SSIM indexes are all higher than 0.9996 at distance $z=20mm, 80mm$ and $160mm$. While the SSIM indexes for the corresponding reconstructed phase are 0.9989, 0.9987 and 0.9988, respectively. Fig. 6(a) illustrates the MSE loss with an increasing number of epochs. At the three different distances, the loss curves share a similar trend. The MSE values are dropped suddenly from $1\times {10}^{-1}$ to $2\times {10}^{-2}$ at the first 10 epochs, and then decrease slowly to $1\times {10}^{-3}$ in 500 epochs. After 10,000 epochs, the MSE value fluctuated in a range of ${10}^{-4}$ to ${10}^{-6}$, as shown in the enlarged part in Fig. 6(a). It indicates that the Deep CDI works well at these diffraction distances.

Fig. 5. Comparison of network reconstruction results for the effect of distances by using simulation data. (a, b) Amplitude and phase ground truth. The phase range is set to $[0, \pi ]$. (c, f, i) Diffraction patterns at different distances, $z = 20mm, 80mm, 160mm$, as the network input. (d, e, g, h, j, k) The reconstructed amplitude and phase images of the network from different distances.

Download Full Size | PDF

Fig. 6. (a) The MSE loss curves for samples at different distances. (b) Network outputs during the optimization process at distance $z = 20mm$. Reconstructed amplitudes (the first row) and phases (the last row) on different epochs at distance $z=20mm$.

Download Full Size | PDF

The network outputs during optimization process at distance $z=20mm$ are illustrated in Fig. 6(b). In the optimization process, the network has shown rapid convergence within the support constraints ($S = 0$), as the support mask is well retrieved only after tens of epochs. Within the support region ($S = 1$), the error metric is reduced more slowly and steadily. A good estimation of the complex-valued object can be found after 10,000 epochs.

Finally, as the effectiveness of Deep CDI relies on the support constraint, we numerically analyzed the effect of the support region diameter $R$ on the reconstructed quality. Four kinds of support conditions are simulated, $R =\infty$ (without support), $R = 1.2 mm$, $R = 1.0 mm$, and $R = 0.8 mm$, while keeping other parameters unchanged. The corresponding pixels are calculated by R/delta, where delta is the simulated pixel sizes. The simulation results after 10,000 epochs are shown in Fig. 7. Figures 7 (d) and (e) are the reconstruction results without the support constraint, which can be seen as direct use of Wang’s method [38] to complex field reconstruction. The SSIM index values associated with the reconstruction amplitude (Fig. 7(d)) and phase (Fig. 7(e)) with respect to ground-truth in Figs. 7(a) and (b) are 0.8116 and 0.5801, respectively. From Figs. 7(g) and (h), (j) and (k) and (m) and (n), one can clearly see that the reconstruction quality is improved by using the support constraint. Besides, as the support diameter decrease, the quality of the reconstruction amplitude and phase is increased further. The quantitative performance evaluation of SSIM index values indicates the same trend, which is listed in Table 2. Significantly, the SSIM index values of $R=1.0mm$ and $0.8mm$ are higher and close. Therefore, we choose $R = 1.0 mm$ in our optical experiments because it has a larger field of view than when $R = 0.8 mm$ and higher reconstruction quality than when $R = 1.2 mm$.

Fig. 7. Comparison of network reconstruction results for the effect of support region diameter ($R$). (a, b) Amplitude and phase ground truth. The phase range is set to [0, 1]. (c, f, i, l) Diffraction patterns with different support diameters at distance $z = 20mm$. (d, e, g, h, j, k) The corresponding reconstructed amplitude and phase images of the network under different support diameters.

Download Full Size | PDF

Table 2. The SSIM index values of reconstruction results with different support diameters.

View Table | View all tables in this article

3.2 Complex field reconstruction from one single diffraction pattern

As a proof-of-principle experiment, we present an optical bench result of a biological sample. A 1 mm pinhole is placed against to a small intestine section. The wavelength of light is 639 nm. The CMOS camera with $2.74\times 2.74\ {um}^{2}$ pixel size is placed 20.98 mm (calibrated using the sharpness-statistics-based auto-focusing algorithm [49]) downstream of the sample. The diffraction patterns with 512$\times$512 pixels are captured at three different positions of the sample (as shown in Fig. 8(a)). The diameter of real support is then 364 pixels in these cases, while we impose a loose support of 376 pixels diameter. To validate the Deep CDI results, we also present reconstruction results of off-axis holography [50] using the same camera to record the hologram of the same region.

Fig. 8. Static experimental results via Deep CDI. (a) The image of the small intestine section (top right) taken by a conventional microscope with a $\times 40$ objective lens. (b1, c1, d1) Reconstructed amplitude images by the Deep CDI in the corresponding position in a. (b2, c2, d2) Reconstructed phase images by the Deep CDI in the corresponding position in a. (d3, d4) The amplitude and phase images are reconstructed by off-axis holography for validating. (e) The MSE loss curves of the three static experimental data during the training process.

Download Full Size | PDF

To reconstruct the complex field, each diffraction pattern is input separately into a Deep CDI network with the same hyperparameter. Figures 8(b)-(d) show the Deep CDI reconstruction results of the intestine section after 10,000 epochs. And the recorded diffraction patterns of corresponding position are shown in Supplement 1, Fig. S2. The MSE curves during the training process are plotted in Fig. 8(e), which share a similar trend with our numerical results. One can see from the curves that all the MSE values are converged to $1.0\times {10}^{-4}$ after about 10,000 epochs. Off-axis reconstruction results at (d1) position are shown in Figs. 8(d3) and (d4). The overall reconstruction results are in good agreement between the two methods. The experiment results demonstrate that Deep CDI can reconstruct the complex field through only one intensity measurement without any labeled data, which is essential to deep learning for practice applications as the data limits are eliminated.

3.3 Dynamic process reconstruction with a biological sample

As the Deep CDI can recover the quantitative amplitude and phase with a single acquisition, it is suitable for studying dynamic specimens. In this section, investigations of a live rotifer are carried out to demonstrate the ability of a trained Deep CDI model for faithful and fast reconstructions. The experimental configuration is kept unchanged with the static experiment, while the distance between the sample and the camera is calibrated to be $20.87mm$, for a detailed description of sample preparation the reader is referred to Supplement 1. We record a $60 s$ dynamic process with a speed of 60 FPS, thus a total of 3600 diffraction patterns are recorded. It should be noted that reconstructing the dynamic process frame by frame would be extremely time-consuming, as each frame requires about 10 minutes to reconstruct and the whole process requires about $3600\times \frac {10}{60}=600$ hours to reconstruct. Therefore, in order to achieve fast reconstruction, we demonstrate a two-step reconstruction strategy. In the first step, we randomly select one frame every second, and thus a total of 60 diffraction patterns are used as the training set to train the Deep CDI network, while 6 randomly selected diffraction patterns are to compose the validation set. Noting the training step is basically the same as that with a single diffraction measurement, except for the number of input data. The network is trained after 1000 epochs costing about 30 minutes of training. Then, in the second step, the total 3600 diffraction patterns are inputted into the trained model to predict the whole biological dynamic process. Part of the raw data and reconstruction results are shown in Fig. 9. Figure 9(a) shows the recorded diffraction pattern of the live rotifer. The corresponding reconstructed amplitude and phase are shown in Figs. 9(b) and (c), respectively. The reconstruction results of the whole dynamic process can be found in Visualization 1. Specifically, one can clearly see from Fig. 9(c), that wrapped phase are successfully reconstructed. This result indicates that the Deep CDI works well for phase modulation ranges larger than $2\pi$. The MSE loss curves of the training set and validation set are shown in Fig. 10(a), which gradually converge as the learning progresses. In addition, we recorded the prediction time of the phase and amplitude images for each frame, as shown in Fig. 10(b). The average time for the trained model to predict one frame is 0.0044s (228 FPS), which is much less than 0.0126s (the camera’s acquisition frame rate 60 FPS in our experiment). The computing time of the different reconstruction strategies for the dynamic process are detailed in Table 3. As a result, using our strategy, the total time used for reconstructing the dynamic process (including training and predicting) from the 3600 diffraction patterns is only about 30 minutes ( 30mins for training and 0.26min for prediction), which is approximately 1/1200 of reconstructing frame by frame and 1/20 of the ER algorithm.

Fig. 9. Dynamic experimental results. (a) Diffraction patterns captured at the certain time (top right) with 60 frames per second at distance $z = 20.87 mm$. (b and c) are predicted amplitude and wrapped phase images, respectively, by a trained Deep CDI network from the diffraction pattern at the certain time.

Download Full Size | PDF

Fig. 10. (a) MSE loss curves in Deep CDI training using a fraction of the dynamic diffraction data. (b) The predicted time of each frame by trained Deep CDI network. The red dot dash line indicates the camera’s frame rate interval of 60 Hz.

Download Full Size | PDF

Table 3. Comparison of different methods to reconstruct the dynamic process from a time series of 3600 diffraction patterns.

View Table | View all tables in this article

4. Conclusion

In conclusion, Deep CDI overcomes three main problems addressed in this article.

First is a longstanding challenge in CDI, that is, how to reconstruct the quantitative complex amplitude information of a light field from a single diffraction measurement. Conventional iterative algorithms are accompanied by complementary problems, plaguing with the uniqueness of the solutions and the stagnation of the algorithm. Deep CDI solves this problem by combining a conventional artificial neural network (U-net) with a real-world physical imaging model. The combination is established by designing loss functions based on real-world physical constraints. To our knowledge, we propose for the first time that the support region constraint is used as the loss function in an untrained neural network. The loss calculated from support constraint and free propagation constraint is summed up to optimize the network’s weights during network fitting, which finally allows the complex field to be reconstructed with a single diffraction measurement.

Second, the pure end-to-end approach usually requires a large amount of labeled data to train a network [25,35]. However, labeled data in a clinic or biomedical applications are limited, unknown, or insufficient. Most of those labeled data can be approximated by scanning at a high dose, over a long time, and using sophisticated hardware, but these are either costly or totally infeasible [37]. Deep CDI, instead, employs an untrained training strategy, requires only the diffraction measurement itself to be inputted. Therefore, Deep CDI might bring a new DL-based solution for image reconstruction in data-limited situations.

The last problem is the time-consuming reconstruction. Theoretically, all the single-frame reconstruction methods can be used for dynamic process reconstruction. However, conventional iterative methods have to process the data frame by frame, resulting in a high cost of time. We introduce a two-step reconstruction strategy based on Deep CDI for dynamic processes with a large amount of data to be processed. Because the recorded diffraction patterns share similar features in a certain dynamic process, we can use a fraction of the recorded data to train the network for a short time. Once trained, the network can predict the whole process with an average reconstruction speed that is much faster than conventional iterative algorithms.

Furthermore, Deep CDI also has the advantage of simple experimental implementation. In our experiment, we designed a concise and cost-effective structure only with a thin steel pinhole attached to the sample after the illuminating beam. Compared to other single-shot optical experiments, such as CMI [22] and Off-axis holography [49], there is no modulator or reference beam in our experiment light path.

In essence, we have developed Deep CDI, a physics-driven untrained learning method for complex field reconstruction from a single diffraction measurement. We validate the proposed method using both numerical simulations and optical experiments on static and dynamic biological samples. The numerical results indicate that Deep CDI is robust to different types of samples; the effects of the diffraction distance z and the support region radius R on reconstruction quality are also analyzed. The static experiment results show that Deep CDI can reconstruct a complex-valued object from one diffraction pattern with fast convergence. In addition, fast reconstruction of complex amplitude for a biological dynamic process is presented, demonstrating the superiority of Deep CDI.

However, the ability of Deep CDI that uses a fraction of the measurements to train the network and predict all frames, is mainly because that the diffraction measurements from one certain dynamic process have high similarity and the training data is randomly selected a uniform distribution. When dealing with a totally different dataset, the network needs to be re-trained or trained from scratch.

Although the Deep CDI is demonstrated for biological samples using visible light, the approach is in principle applicable to a broad range of wavelengths and radiations, such as x-ray and high-energy electrons [6]. Possible applications of our method include the complex field reconstruction and imaging of sparsely varying dynamic processes [51], such as growing neurons [52] and crystal formation [53], and so on. With further development, we expect this general Deep CDI method can be used to image a wide range of dynamical phenomena with high spatial and temporal resolution.

Funding

Youth Innovation Promotion Association of the Chinese Academy of Sciences (2017489); Youth Supported by the University of Chinese Academy of Sciences; Fusion Foundation of Research and Education of CAS; Natural Science Foundation of Hebei Province (F2018402285); National Natural Science Foundation of China (61575197).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. J. Miao, P. Charalambous, J. Kirz, and D. Sayre, “Extending the methodology of X-ray crystallography to allow imaging of micrometre-sized non-crystalline specimens,” Nature 400(6742), 342–344 (1999). [CrossRef]

2. H. N. Chapman and K. A. Nugent, “Coherent lensless X-ray imaging,” Nat. Photonics 4(12), 833–839 (2010). [CrossRef]

3. J. M. Rodenburg and H. M. L. Faulkner, “A phase retrieval algorithm for shifting illumination,” Appl. Phys. Lett. 85(20), 4795–4797 (2004). [CrossRef]

4. I. K. Robinson, I. A. Vartanyants, G. J. Williams, M. A. Pfeifer, and J. A. Pitney, “Reconstruction of the Shapes of Gold Nanocrystals Using Coherent X-Ray Diffraction,” Phys. Rev. Lett. 87(19), 195505 (2001). [CrossRef]

5. G. J. Williams, H. M. Quiney, B. B. Dhal, C. Q. Tran, K. A. Nugent, A. G. Peele, D. Paterson, and M. D. de Jonge, “Fresnel Coherent Diffractive Imaging,” Phys. Rev. Lett. 97(2), 025506 (2006). [CrossRef]

6. B. Abbey, K. A. Nugent, G. J. Williams, J. N. Clark, A. G. Peele, M. A. Pfeifer, M. de Jonge, and I. McNulty, “Keyhole coherent diffractive imaging,” Nat. Phys. 4(5), 394–398 (2008). [CrossRef]

7. B. H. Dean, D. L. Aronstein, J. S. Smith, R. Shiri, and D. S. Acton, Phase retrieval algorithm for jwst flight and testbed telescope, in Space telescopes and instrumentation I: optical, infrared, and millimeter, vol. 6265 (International Society for Optics and Photonics, 2006), p. 626511.

8. Y. Park, C. Depeursinge, and G. Popescu, “Quantitative phase imaging in biomedicine,” Nat. Photonics 12(10), 578–589 (2018). [CrossRef]

9. T. Kimura, Y. Joti, A. Shibuya, C. Song, S. Kim, K. Tono, M. Yabashi, M. Tamakoshi, T. Moriya, T. Oshima, T. Ishikawa, Y. Bessho, and Y. Nishino, “Imaging live cell in micro-liquid enclosure by X-ray laser diffraction,” Nat. Commun. 5(1), 3052 (2014). [CrossRef]

10. P. Thibault, M. Dierolf, A. Menzel, O. Bunk, C. David, and F. Pfeiffer, “High-Resolution Scanning X-ray Diffraction Microscopy,” Science 321(5887), 379–382 (2008). [CrossRef]

11. G. Zheng, R. Horstmeyer, and C. Yang, “Wide-field, high-resolution Fourier ptychographic microscopy,” Nat. Photonics 7(9), 739–745 (2013). [CrossRef]

12. J. N. Clark, C. T. Putkunz, E. K. Curwood, D. J. Vine, R. Scholten, I. McNulty, K. A. Nugent, and A. G. Peele, “Dynamic sample imaging in coherent diffractive imaging,” Opt. Lett. 36(11), 1954 (2011). [CrossRef]

13. J. R. Fienup, “Phase retrieval algorithms: a comparison,” Appl. Opt. 21(15), 2758 (1982). [CrossRef]

14. V. Elser, “Phase retrieval by iterated projections,” J. Opt. Soc. Am. A 20(1), 40 (2003). [CrossRef]

15. J. R. Fienup and C. C. Wackerman, “Phase-retrieval stagnation problems and solutions,” J. Opt. Soc. Am. A 3(11), 1897 (1986). [CrossRef]

16. Y. Shechtman, Y. C. Eldar, O. Cohen, H. N. Chapman, J. Miao, and M. Segev, “Phase Retrieval with Application to Optical Imaging: A contemporary overview,” IEEE Signal Process. Mag. 32(3), 87–109 (2015). [CrossRef]

17. M. Guizar-Sicairos and J. R. Fienup, “Understanding the twin-image problem in phase retrieval,” J. Opt. Soc. Am. A 29(11), 2367 (2012). [CrossRef]

18. S. Marchesini, “Invited Article: A unified evaluation of iterative projection algorithms for phase retrieval,” Rev. Sci. Instrum. 78(1), 011301 (2007). [CrossRef]

19. J. M. Rodenburg, A. C. Hurst, A. G. Cullis, B. R. Dobson, F. Pfeiffer, O. Bunk, C. David, K. Jefimovs, and I. Johnson, “Hard-X-Ray Lensless Imaging of Extended Objects,” Phys. Rev. Lett. 98(3), 034801 (2007). [CrossRef]

20. A. Maiden, D. Johnson, and L. Peng, “Further improvements to the ptychographical iterative engine,” Optica 4(7), 736 (2017). [CrossRef]

21. P. Sidorenko and O. Cohen, “Single-shot ptychography,” Optica 3(1), 9 (2016). [CrossRef]

22. F. Zhang, B. Chen, G. R. Morrison, J. Vila-Comamala, M. Guizar-Sicairos, and I. K. Robinson, “Phase retrieval by coherent modulation imaging,” Nat. Commun. 7(1), 13367 (2016). [CrossRef]

23. G. Barbastathis, A. Ozcan, and G. Situ, “On the use of deep learning for computational imaging,” Optica 6(8), 921 (2019). [CrossRef]

24. M. T. McCann, K. H. Jin, and M. Unser, “Convolutional Neural Networks for Inverse Problems in Imaging: A Review,” IEEE Signal Process. Mag. 34(6), 85–95 (2017). [CrossRef]

25. A. Sinha, J. Lee, S. Li, and G. Barbastathis, “Lensless computational imaging through deep learning,” Optica 4(9), 1117 (2017). [CrossRef]

26. M. Lyu, W. Wang, H. Wang, H. Wang, G. Li, N. Chen, and G. Situ, “Deep-learning-based ghost imaging,” Sci. Rep. 7(1), 17865 (2017). [CrossRef]

27. T. Liu, K. de Haan, Y. Rivenson, Z. Wei, X. Zeng, Y. Zhang, and A. Ozcan, “Deep learning-based super-resolution in coherent imaging systems,” Sci. Rep. 9(1), 3926 (2019). [CrossRef]

28. C. Bai, M. Zhou, J. Min, S. Dang, X. Yu, P. Zhang, T. Peng, and B. Yao, “Robust contrast-transfer-function phase retrieval via flexible deep learning networks,” Opt. Lett. 44(21), 5141 (2019). [CrossRef]

29. J. Zhang, T. Xu, Z. Shen, Y. Qiao, and Y. Zhang, “Fourier ptychographic microscopy reconstruction with multiscale deep residual network,” Opt. Express 27(6), 8612 (2019). [CrossRef]

30. I. Kang, F. Zhang, and G. Barbastathis, “Phase extraction neural network (PhENN) with coherent modulation imaging (CMI) for phase retrieval at low photon counts,” Opt. Express 28(15), 21578 (2020). [CrossRef]

31. Y. Wu, Y. Rivenson, Y. Zhang, Z. Wei, H. Günaydin, X. Lin, and A. Ozcan, “Extended depth-of-field in holographic imaging using deep-learning-based autofocusing and phase recovery,” Optica 5(6), 704 (2018). [CrossRef]

32. Y. Rivenson, Z. Göröcs, H. Günaydin, Y. Zhang, H. Wang, and A. Ozcan, “Deep learning microscopy,” Optica 4(11), 1437 (2017). [CrossRef]

33. E. Nehme, L. E. Weiss, T. Michaeli, and Y. Shechtman, “Deep-STORM: super-resolution single-molecule microscopy by deep learning,” Optica 5(4), 458 (2018). [CrossRef]

34. Y. Li, Y. Xue, and L. Tian, “Deep speckle correlation: a deep learning approach toward scalable imaging through scattering media,” Optica 5(10), 1181 (2018). [CrossRef]

35. S. Li, M. Deng, J. Lee, A. Sinha, and G. Barbastathis, “Imaging through glass diffusers using densely connected convolutional networks,” Optica 5(7), 803 (2018). [CrossRef]

36. H. Wang, Y. Rivenson, Y. Jin, Z. Wei, R. Gao, H. Günaydin, L. A. Bentolila, C. Kural, and A. Ozcan, “Deep learning enables cross-modality super-resolution in fluorescence microscopy,” Nat. Methods 16(1), 103–110 (2019). [CrossRef]

37. G. Wang, J. C. Ye, and B. De Man, “Deep learning for tomographic image reconstruction,” Nat Mach Intell 2(12), 737–748 (2020). [CrossRef]

38. F. Wang, Y. Bian, H. Wang, M. Lyu, G. Pedrini, W. Osten, G. Barbastathis, and G. Situ, “Phase imaging with an untrained neural network,” Light Sci Appl 9(1), 77 (2020). [CrossRef]

39. Y. Rivenson, Y. Zhang, H. Günaydin, D. Teng, and A. Ozcan, “Phase recovery and holographic image reconstruction using deep learning in neural networks,” Light Sci Appl 7(2), 17141 (2018). [CrossRef]

40. Y. Rivenson, Y. Wu, and A. Ozcan, “Deep learning in holography and coherent imaging,” Light Sci Appl 8(1), 85 (2019). [CrossRef]

41. O. Wengrowicz, O. Peleg, T. Zahavy, B. Loevsky, and O. Cohen, “Deep neural networks in single-shot ptychography,” Opt. Express 28(12), 17511 (2020). [CrossRef]

42. M. J. Cherukara, Y. S. G. Nashed, and R. J. Harder, “Real-time coherent diffraction inversion using deep generative networks,” Sci. Rep. 8(1), 16520 (2018). [CrossRef]

43. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation”, arXiv:1505.04597 [cs] (2015). ArXiv: 1505.04597.

44. A. M. Maiden and J. M. Rodenburg, “An improved ptychographical phase retrieval algorithm for diffractive imaging,” Ultramicroscopy 109(10), 1256–1262 (2009). [CrossRef]

45. J. R. Fienup, T. R. Crimmins, and W. Holsztynski, “Reconstruction of the support of an object from the support of its autocorrelation,” J. Opt. Soc. Am. 72(5), 610 (1982). [CrossRef]

46. T. Latychevskaia and H.-W. Fink, “Practical algorithms for simulation and reconstruction of digital in-line holograms,” Appl. Opt. 54(9), 2424–2434 (2015). [CrossRef]

47. G. B. Huang, M. Mattar, T. Berg, and E. Learned-Miller, “Labeled faces in the wild: A database forstudying face recognition in unconstrained environments,” in Workshop on faces in’Real-Life’Images: detection, alignment, and recognition, (2008).

48. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE transactions on image processing 13(4), 600–612 (2004). [CrossRef]

49. R. Ma, D. Yang, T. Yu, T. Li, X. Sun, Y. Zhu, N. Yang, H. Wang, and Y. Shi, “Sharpness-statistics-based auto-focusing algorithm for optical ptychography,” Optics and Lasers in Engineering 128, 106053 (2020). [CrossRef]

50. J. T. Sheridan, R. K. Kostuk, A. F. Gil, Y. Wang, W. Lu, H. Zhong, Y. Tomita, C. Neipp, J. Francés, S. Gallego, I. Pascual, V. Marinova, S.-H. Lin, K.-Y. Hsu, F. Bruder, S. Hansen, C. Manecke, R. Meisenheimer, C. Rewitz, T. Rölle, S. Odinokov, O. Matoba, M. Kumar, X. Quan, Y. Awatsuji, P. W. Wachulak, A. V. Gorelaya, A. A. Sevryugin, E. V. Shalymov, V. Yu Venediktov, R. Chmelik, M. A. Ferrara, G. Coppola, A. Márquez, A. Beléndez, W. Yang, R. Yuste, A. Bianco, A. Zanutta, C. Falldorf, J. J. Healy, X. Fan, B. M. Hennelly, I. Zhurminsky, M. Schnieper, R. Ferrini, S. Fricke, G. Situ, H. Wang, A. S. Abdurashitov, V. V. Tuchin, N. V. Petrov, T. Nomura, D. R. Morim, and K. Saravanamuttu, “Roadmap on holography,” J. Opt. 22(12), 123002 (2020). [CrossRef]

51. Y. Shechtman, Y. C. Eldar, O. Cohen, and M. Segev, “Efficient coherent diffractive imaging for sparsely varying objects,” Opt. Express 21(5), 6327–6338 (2013). [CrossRef]

52. M. Maletic-Savatic, “Rapid Dendritic Morphogenesis in CA1 Hippocampal Dendrites Induced by Synaptic Activity,” Science 283(5409), 1923–1927 (1999). [CrossRef]

53. M. A. Lauterbach, C. K. Ullal, V. Westphal, and S. W. Hell, “Dynamic Imaging of Colloidal-Crystal Nanostructures at 200 Frames per Second,” Langmuir 26(18), 14400–14404 (2010). [CrossRef]

Name	Description
Supplement 1	Supplemental Document
Visualization 1	The collected diffraction patterns, predicted amplitude and wrapped phase images are presented in the video with 60 FPS.

		Deep CDI	ER	End-to-end
Example 1	amplitude	0.9942	0.9057	0.9042
Example 1	phase	0.9786	0.8248	0.8382
Example 2	amplitude	0.9997	0.9223	0.8830
Example 2	phase	0.9987	0.8295	0.8348

	R= $\infty$	R=1.2mm	R=1.0mm	R=0.8mm
amplitude	0.8116	0.9394	0.9942	0.9940
phase	0.5801	0.8654	0.9786	0.9801

Method	Epochs	Training time (min)	Average one frame (s)	Total time (min)
ER	100	/	$\sim$ 10	$\sim$ 600
Deep CDI (frame by frame)	10000	$\sim$ 10	$\sim$ 600	$\sim$ 36000
Deep CDI (our strategy)	1000	$\sim$ 30	$\sim$ 0.004	$\sim$ 30

		Deep CDI	ER	End-to-end
Example 1	amplitude	0.9942	0.9057	0.9042
Example 1	phase	0.9786	0.8248	0.8382
Example 2	amplitude	0.9997	0.9223	0.8830
Example 2	phase	0.9987	0.8295	0.8348

Dynamic coherent diffractive imaging with a physics-driven untrained learning method

Abstract

1. Introduction

2. Method

2.1 Formulation of the inverse problem

2.2 Deep CDI principle

2.3 Experiment setup

2.4 Design of the network architecture

2.5 Implementation details

3. Results and discussion

3.1 Numerical simulations

3.2 Complex field reconstruction from one single diffraction pattern

3.3 Dynamic process reconstruction with a biological sample

4. Conclusion

Funding

Disclosures

Data availability

Supplemental document

References

Supplementary Material (2)

Data availability

Cited By

Figures (10)

Tables (3)

Equations (14)

Optics Express