Displacement-agnostic coherent imaging through scatter with an interpretable deep neural network

Abstract

Coherent imaging through scatter is a challenging task. Both model-based and data-driven approaches have been explored to solve the inverse scattering problem. In our previous work, we showed that a deep learning approach can make high-quality and highly generalizable predictions through unseen diffusers. Here, we propose a new deep neural network model that is agnostic to a broader class of perturbations, including scatterer change, displacements, and system defocus up to 10$\times$ depth of field. In addition, we develop a new analysis framework for interpreting the mechanism of our deep learning model and visualizing its generalizability, based on an unsupervised dimension reduction technique. We show that our model can unmix the scattering-specific information, extract the object-specific information, and achieve generalization under different scattering conditions. Our work paves the way to a robust and interpretable deep learning approach to imaging through scattering media.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Imaging through scatter remains one of the most challenging tasks in computational imaging. The difficulty stems from the scattering process scrambling the object’s spatial information, a process that can be described by a complex-valued system matrix. As a result, computational retrieval of the object requires solving an ill-posed inverse problem based on the speckle measurements and a careful characterization of the random media [1,2]. Despite these challenges, many effective techniques have been demonstrated for various applications, such as wavefront shaping [3], deep tissue imaging [4], and dynamic biological imaging [5].

In general, the coherent scattering process can be characterized by a linear transmission matrix (TM) [1,6]. This coherent TM establishes a one-to-one relation between the input and the output wavefronts. However, since the scattering is in general linearly shift-variant (LSV) [1,3,6], the complete characterization of the TM is often time-consuming due to the large size of the matrix [1]. Computational imaging techniques based on inverting the TM are susceptible to calibration errors, which may come from medium change and other perturbations to the system [7–10]. One useful simplification utilizing the memory effect [11,12] approximates the system to be linearly shift-invariant (LSI) [13,14]. Under this approximation, an invariant speckle intensity pattern only translates when shifting/rotating the incident beam over a small distance/angle [11,12]. This implies that under the incoherent imaging condition, the output speckle intensity is the convolution between the object’s intensity distribution and the medium’s speckle intensity point spread function (PSF) [13,15]. Based on this principle, 2D imaging through scatter can be achieved in a single shot by either utilizing the pre-calibrated PSF [16] or solving a phase retrieval problem based on the autocorrelation of the output intensity [17].
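
To make the LSI picture concrete, the following Python sketch (a minimal illustration, not the measurement model of any specific experiment) convolves an object's intensity with a surrogate speckle intensity PSF; the array sizes and the PSF construction are assumptions for illustration only.

```python
# Minimal sketch of the incoherent LSI forward model under the memory
# effect: camera intensity = object intensity (*) speckle intensity PSF.
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)

obj = np.zeros((128, 128))
obj[40:90, 60:65] = 1.0  # a simple bar-shaped "object" (illustrative)

# Surrogate speckle intensity PSF: squared magnitude of a smoothed
# complex random field (an assumption, not a measured PSF).
field = fftconvolve(
    rng.standard_normal((128, 128)) + 1j * rng.standard_normal((128, 128)),
    np.ones((3, 3)) / 9.0, mode="same")
psf = np.abs(field) ** 2
psf /= psf.sum()

speckle = fftconvolve(obj, psf, mode="same")  # simulated LSI measurement
```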

Recently, deep learning (DL) has proven to be a powerful technique for solving highly ill-posed computational imaging problems [18]. In particular, deep neural network (DNN) models have been proposed to replace the standard TM that relates the output speckle patterns and the input objects, and have shown superior performance over traditional methods [19–22]. Most importantly, DNN models have been shown to be resilient against various perturbations and instabilities of the scattering media. For example, DNN models for coherent imaging through multimode fibers can make robust predictions under temperature and mechanical instabilities and wavelength drift [21,23–25]. In our previous work, we showed that a DNN model for coherent imaging through scatter trained on a few thin diffusers can make high-quality predictions through unseen diffusers [26], indicating the model’s robustness against medium perturbations. Specifically, during that study, we changed the diffuser placed between the object and the imaging optics while keeping the diffuser’s location and the imaging system unchanged. In general, many other factors can perturb the scattering medium and thus affect the imaging performance.

In this work, we further consider the effect of axial displacements of the scatterer itself and the imaging optics. We demonstrate a more robust DNN model for coherent imaging through scatter that is agnostic to a broader class of perturbations. Generally speaking, axial displacements of both the scatterer and the imaging optics reduce the correlation of the speckle intensity measurements [27]. Recently, the 3D memory effect has been used to expand the imaging range and achieve extended depth of field (DOF) in imaging through scatter [28–32]. Coherent imaging under defocus is further complicated by diffraction effects. In Wu et al.’s work [33], a 10$\times$ DOF improvement is achieved by incorporating defocused measurements in the training process for in-line holographic imaging in free space. In general, to train a robust DNN model against a broad class of perturbations, a diverse training dataset is needed to provide sufficient statistical information about the underlying process. As a result, we design our DNN training by incorporating measurements taken while changing the scatterer, axially displacing the scatterer, and defocusing the optics. Specifically, our training data include speckle patterns captured from four different diffusers at each training position; the training positions include one diffuser position at 5$\times$ DOF displacement and two camera positions at $\pm 5\times$ DOF defocus (Fig. 1(a)). Most importantly, we show that the trained DNN can make high-quality predictions beyond the training range, across 10$\times$ DOF, through a previously unseen diffuser.

Fig. 1. Overview of our deep learning approach to achieve generalization for coherent imaging through scatter. (a) Our imaging model and data acquisition approach to obtain a diverse dataset. The dataset includes scatterer changes, and sensor and scatterer displacements over 10$\times$ DOF. Speckle patterns from training diffusers at training positions are used to train the DNN. Others are used as testing data. (b) We implemented a DNN structure including a transformation module to achieve generalizability.

To achieve robustness to displacement and better generalizability, we propose a DNN model using a hybrid network structure to better model the shift-variant (SV) property of the imaging problem (Fig. 1(b)). Our network is built on the modified U-net structure in our previous work [26]. To improve the network’s expressivity for modeling the SV properties, we add two fully connected layers in the bottleneck of the network, as denoted by the transformation module in Fig. 1(b). This module takes input from the encoded features and transforms the 2D information to a 1D vector, which is then fed into the decoder to reconstruct the 2D object. The 1D vector is denoted the ‘latent code’, which is commonly interpreted as the low-dimensional representation of the high-dimensional information encoded through the DNN’s encoder path [34]. The operations on the latent code in the transformation module enlarge the effective receptive field of the DNN model, which in turn accounts for the shift variance of the system.

To interpret the working principle of our DNN model and better understand its generalization capability, we further develop a new analysis framework based on an unsupervised dimension reduction technique, UMAP [35]. In particular, our analysis provides several new insights into both the information contained in the raw speckle data and the training and prediction processes. First, we show that directly decomposing the raw speckle intensity data onto a nonlinear manifold using the unsupervised technique already reveals scatterer- and displacement-specific information without the need for any prior information or supervised learning. Second, by further analyzing the latent code in our DNN model, we show that the model reveals object-specific information and disentangles scatterer and displacement information using the encoder path. The generalization of our DNN model is analyzed in two steps. First, we set up the learned latent manifold using the learned latent codes extracted by feeding only the training data to the trained network. Next, the predicted latent codes from the unseen speckle patterns under different scattering conditions are extracted and projected onto the learned manifold. We show that the predicted latent codes match well with the learned latent codes, which indicates that the DNN model can indeed generalize well to the unseen scattering cases. Finally, we further ‘dissect’ the network and elucidate the distinct functionalities of the encoder-decoder path and skip connections for performing the underlying imaging task. Our analysis shows that the encoder-decoder path is responsible for the generalization to unseen scatterers, whereas the skip connections mainly contribute to improving the reconstruction quality for seen scatterers.

2. Methods

2.1 Experimental setup

The imaging setup is shown in Fig. 2(a). A spatial light modulator (SLM) (Holoeye NIR-011, pixel size 8 $\mu$m) was coherently illuminated by a collimated beam from a HeNe laser (632 nm, Thorlabs HNL210L). We used the SLM as a programmable amplitude-only object by placing two orthogonally oriented polarizers before and after it. The SLM was relayed onto the camera (Thorlabs Quantalux, pixel size 5.04 $\mu$m) by a 4F system. Two lenses with focal lengths of 200 mm (L1) and 125 mm (L2) were used to provide a 0.625 magnification. A 14 mm iris was placed at the pupil plane of the 4F system to control the speckle size. A diffuser was placed between the SLM and L1. We placed five diffusers (Thorlabs N-BK7 Ground Glass Diffuser, 220 Grit DG10-220) on a filter wheel (Thorlabs FW1A) in order to capture data through different diffusers. Both the camera and the filter wheel were attached to linear motion stages that can be moved axially. The initial positions for the camera, $Z_{C10}$, and the diffuser, $Z_{D0}$, were set at the back focal plane of L2 and 100 mm in front of L1, respectively. By controlling the motion stages, we captured speckle patterns from multiple combinations of diffuser/camera displacements and through five different diffusers. The interval between neighboring displacement positions for the diffuser and the camera is 1$\times$ DOF, set by the corresponding speckle sizes.

Fig. 2. Experimental setup and data characterization. (a) The experimental setup for coherent imaging through scatter. The SLM is used as the amplitude-only object. Both the diffuser and the camera are placed on motion stages to control the axial displacement. (b) The 3D speckle size is characterized by calculating the speckle intensity stack’s autocorrelation. (c) The acquired speckle data cover combinations of five diffusers, displacing the diffuser up to 10$\times$ DOF, and defocusing the camera up to $\pm$10$\times$ DOF. (d) The raw speckle intensity distributions $P(I/\bar {I})$ approximately follow the same probability density distribution regardless of the input object, diffuser, camera position $Z_{Ci}$, and diffuser position $Z_{Di}$. The two orthogonal axes (labeled with $Z_{Di}$ and $Z_{Ci}$) represent the diffuser and camera displacement positions, respectively.

2.2 System characterization

The system was characterized by measuring the 3D speckle size [27], as summarized in Fig. 2(b). We first captured a speckle intensity stack by moving the camera from 2 mm before $Z_{C10}$ to 2 mm after $Z_{C10}$ with a step size of 0.02 mm while fixing the diffuser at its initial position $Z_{D0}$. We then measured the 3D speckle size by calculating the 3D autocorrelation of the speckle intensity stack [27], as shown in Fig. 2(b). The theoretical lateral speckle size in a free-space propagation geometry is defined by the full width at half maximum (FWHM) of the normalized autocorrelation function along the lateral direction and is $\lambda \frac {z}{D}$, where $\lambda$ is the wavelength of the coherent source, $z$ is the distance between the scattering surface and the observation region, and $D$ is the diameter of the scattering spot. The axial speckle size is defined by the FWHM of the normalized correlation function along the axial direction and is $7.1\lambda {\left (\frac {z}{D}\right )}^{2}$, which also defines the system’s DOF [27]. Accordingly, the theoretical axial speckle size of our system at the sensor plane is 0.38 mm; the experimentally measured axial speckle size is 0.41 mm, which matches well with the theory. The theoretical lateral speckle size at the camera side is 5.56 $\mu$m, while the measured lateral speckle size is 10.08 $\mu$m; the measured value is larger than the theoretical one due to under-sampling by the camera. The axial speckle size at the diffuser plane is enlarged to 1.04 mm due to the system magnification. We used the respective axial speckle sizes to define the DOF in the object and image spaces.
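
As a back-of-the-envelope check of these formulas, the short sketch below evaluates $\lambda z/D$ and $7.1\lambda (z/D)^{2}$ with values suggested by the setup description; the propagation distance $z$ is an assumption (taken as the focal length of L2), so the numbers are indicative rather than calibrated.

```python
# Speckle-size formulas from the text, evaluated with assumed geometry.
wavelength = 632e-9  # HeNe wavelength, m
D = 14e-3            # iris (scattering spot) diameter, m
z = 0.125            # assumed propagation distance (focal length of L2), m

lateral = wavelength * z / D             # lateral speckle size, ~5.6 um
axial = 7.1 * wavelength * (z / D) ** 2  # axial speckle size (~DOF), ~0.36 mm

print(f"lateral speckle size: {lateral * 1e6:.2f} um")
print(f"axial speckle size:   {axial * 1e3:.2f} mm")
```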

2.3 Data acquisition

The data used for training and testing are summarized in Fig. 2(c) and Table 1. Our training data consist of 4200 image pairs, each consisting of the input object and the measured speckle pattern. The input objects contained 400 MNIST handwritten digits [36], 350 of which were used as the training objects. The speckle patterns for training were taken through four different diffusers ($D_1,D_2,D_3,D_4$), one diffuser position ($Z_{D5}$), and two camera positions ($Z_{C5}$, $Z_{C15}$). Specifically, the training diffuser position $Z_{D5}$ was 5$\times$ DOF from the initial position; the two training camera positions $Z_{C5}$ and $Z_{C15}$ were $\pm$5$\times$ DOF away from the initial position. The testing data consisted of 50000 speckle patterns taken under two types of conditions. In the first type, we tested our network using speckle patterns through the four seen diffusers (i.e., those used for taking the training data), but at 10 unseen diffuser positions ($Z_{D0},\ldots ,Z_{D4},Z_{D6},\ldots ,Z_{D10}$) and 19 unseen camera positions ($Z_{C0},\ldots ,Z_{C4},Z_{C6},\ldots ,Z_{C14},Z_{C16},\ldots ,Z_{C20}$), using both seen and unseen objects. In the second type, we applied our network to speckle patterns from the unseen diffuser (i.e., never used during network training) at all diffuser and camera positions, again using both seen and unseen objects. When taking the displacement measurements, the diffuser was moved with a 1 mm interval from the initial position $Z_{D0}$ towards L1, and data were taken at 11 different positions ($Z_{D0}$ to $Z_{D10}$) with the camera fixed at $Z_{C10}$. Next, the camera was moved with a 0.5 mm interval across 21 positions from $Z_{C0}$ to $Z_{C20}$ with the diffuser fixed at $Z_{D0}$. Among the 20 displaced camera positions, 10 ($Z_{C0}$ to $Z_{C9}$) were toward L2 and the other 10 were away from L2. The total displacement ranges of the diffuser and the camera cover 10$\times$ DOF and 20$\times$ DOF, respectively. We note that the testing displacement range is far beyond the training displacement range.

Table 1. Data acquired for training (in black) and testing (in red)

2.4 Speckle intensity distribution

We studied the statistical distribution of the measured speckle intensity data, as summarized in Fig. 2(d). For fully developed speckles, it is well known that their intensities follow the negative exponential distribution [27]. For $N$ incoherently summed independent speckles, the probability density function (PDF) is a Gamma density function [27]:

$$P(I)=\frac{N^{N}I^{N-1}}{\Gamma(N)\bar{I}^{N}}\exp\left(-N\frac{I}{\bar{I}}\right).$$
In practice, the PDF was estimated from the normalized intensity histogram of experimentally measured speckle patterns. As shown in Fig. 2(d), the estimated PDFs of the speckle patterns captured from different objects, through different diffusers, and/or at different displacement positions can all be fitted to the same PDF. $N$ was estimated to be 1.8, which is consistent with our under-sampled imaging condition. This highlights that all the scattering-specific and object-specific information is encoded in the higher-order statistics and is hard to extract using standard statistical fitting techniques.
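
As a minimal sketch of this fit, $N$ can be estimated from the first two moments of the measured intensities: for the Gamma density in Eq. (1), the variance is $\bar{I}^2/N$, so $N = \bar{I}^2/\mathrm{var}(I)$. The surrogate data below stand in for a measured speckle image.

```python
# Moment-based estimate of N in Eq. (1), illustrated on surrogate data.
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(1)
# Surrogate intensities drawn from a Gamma density with N = 1.8, mean 1.
speckle = rng.gamma(shape=1.8, scale=1.0 / 1.8, size=100_000)

I_bar = speckle.mean()
N_hat = I_bar**2 / speckle.var()  # moment estimate of N

# Fitted PDF P(I) on a grid, for comparison against the histogram.
I = np.linspace(1e-3, 5 * I_bar, 200)
pdf = gamma.pdf(I, a=N_hat, scale=I_bar / N_hat)
print(f"estimated N = {N_hat:.2f}")  # ~1.8
```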

2.5 Data preprocessing

The SLM inputs and the measurements were collected in pairs to generate the dataset. The central 512$\times$512 SLM pixels were used as the object; the corresponding central 512$\times$512 camera pixels were used as the input for our DNN. The objects displayed on the SLM were 8-bit grayscale images from the MNIST handwritten digit dataset. Due to computation and memory limitations, all input and output images were down-sampled from 512$\times$512 pixels to 128$\times$128 pixels by taking the average within each 4$\times$4 neighborhood of pixels (i.e., pixel binning). For both training and testing, intensity outliers were removed by histogram clipping. The speckle images were then normalized between 0 and 1 by dividing each image by its maximum. Although we display grayscale images on the SLM, our DNN was designed to make binary predictions.
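
A minimal sketch of this preprocessing pipeline is given below; the clipping percentile is an assumption, as the text does not specify the histogram clipping threshold.

```python
# 4x4 pixel binning, outlier clipping, and max normalization.
import numpy as np

def preprocess(speckle_512, clip_percentile=99.9):
    # 4x4 binning: average each 4x4 neighborhood, 512x512 -> 128x128.
    binned = speckle_512.reshape(128, 4, 128, 4).mean(axis=(1, 3))
    # Remove intensity outliers by clipping the histogram tail
    # (percentile value is an assumed choice).
    clipped = np.minimum(binned, np.percentile(binned, clip_percentile))
    # Normalize to [0, 1] by the image maximum.
    return clipped / clipped.max()
```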

2.6 Network implementation

We built the DNN shown in Fig. 1(b) to learn a statistical model relating the speckle patterns and the unscattered objects. The overall structure of the proposed DNN follows the U-net architecture, with the modifications of replacing the convolutional (conv) layers with dense blocks [26] and adding fully connected layers at the “bottleneck” to perform the latent code transformation. The input to the DNN is a preprocessed 128$\times$128 speckle intensity. The input then goes through the “encoder path”, which yields a stack of 4$\times$4 latent codes. The latent code includes case-specific information that encodes the displacement and diffuser parameters. Next, the latent code is flattened to a 1D vector, which is input to two fully connected layers and then reshaped to 2D. Together, this composes the latent code transformation module. This module enables transforming the case-specific information to meaningful object-specific features. These operations on the latent code also enlarge the effective receptive field of the DNN model, which facilitates modeling the shift-variance effect of the imaging process. The decoder reverses the process, recombining the information into feature maps with gradually increasing lateral detail. Skip connections are used to transfer additional information from the encoder to the decoder without going through the bottleneck. The final output is a binary object prediction. We use the binary cross-entropy [37] as the loss function. Additional details of our DNN model and the benefit of the transformation module are discussed in Supplementary Materials.
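
The following Keras sketch illustrates the latent code transformation module described above: flatten the encoder output to a 1D latent code, pass it through two fully connected layers, and reshape it back to 2D for the decoder. The tensor shapes and layer widths are illustrative assumptions, and the dense-block encoder/decoder is omitted for brevity.

```python
# Hedged sketch of the transformation module at the U-net bottleneck.
import tensorflow as tf
from tensorflow.keras import layers

def transformation_module(encoded):
    # encoded: (batch, h, w, c) feature stack from the encoder bottleneck.
    h, w, c = encoded.shape[1], encoded.shape[2], encoded.shape[3]
    code = layers.Flatten()(encoded)  # 2D features -> 1D latent code
    code = layers.Dense(h * w * c, activation="relu")(code)  # fully connected 1
    code = layers.Dense(h * w * c, activation="relu")(code)  # fully connected 2
    return layers.Reshape((h, w, c))(code)  # 1D -> 2D for the decoder

# Training configuration as stated in the text:
# model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
#               loss="binary_crossentropy")
```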

The DNN training was performed on the BU Shared Computing Cluster with one GPU (NVIDIA Tesla P100) using Keras/TensorFlow. Each network was trained for 100 epochs with the ADAM optimizer at a learning rate of $10^{-4}$, taking up to three hours. Once the DNN was trained, each prediction was made in 0.0156 s.

3. Results

We demonstrated the robustness of our network against diffuser change as well as diffuser and camera displacements in two types of experiments. All the experimental results were obtained using the same single network trained with the four diffusers at three different training positions.

3.1 Results on axial displacements through seen diffusers

We first tested our network using the speckle patterns from the same four training diffusers at different unseen positions, as shown in Fig. 3. The testing objects consisted of both seen digits used during training and unseen digits. The testing displacement positions for both the diffuser and the camera were up to 10$\times$ DOF, as shown in Fig. 3(a). The speckle patterns appear notably different when the diffuser or the camera is displaced over 1$\times$ DOF in the axial direction. Our DNN demonstrated the ability to make high-quality predictions at previously unseen positions across 10$\times$ DOF. Representative examples of the speckle and prediction pairs for both seen and unseen objects are shown in Fig. 3(b). We first show the results on diffuser displacements with the camera placed at its initial position $Z_{C10}$. On the left panel of Fig. 3(b), the speckle patterns and the prediction results are shown for diffuser positions at 1$\times$, 3$\times$, 7$\times$, and 10$\times$ DOF, respectively. Next, we present the testing results for camera displacements with the seen diffusers placed at $Z_{D0}$ on the right panel of Fig. 3(b). We show the speckle patterns and the network predictions when the camera was displaced by -9$\times$, -7$\times$, 2$\times$, and 10$\times$ DOF relative to $Z_{C10}$.

Fig. 3. Data acquisition and results on coherent imaging through seen diffusers at unseen displacement positions. (a) A summary of the training and testing dataset. The training data are captured through four different training diffusers at three training positions. All the rest are testing positions. (b) Representative testing results at different diffuser displacement positions (Left panel) and camera displacement positions (Right panel) for both seen objects (Rows 1 and 2) and unseen objects (Rows 3 and 4).

3.2 Results on imaging through unseen diffusers across different displacements

In the second experiment, we further tested our network using the speckle patterns obtained with the unseen diffuser across a range of displacement positions, as shown in Fig. 4(a). We tested our network on both seen and unseen objects from the MNIST digit dataset. As summarized in Fig. 4(b), the left panel shows the prediction results with the diffuser displaced at 1$\times$, 3$\times$, and 7$\times$ DOF, respectively. The right panel shows the prediction results with the camera displaced at -9$\times$, 0$\times$, and 7$\times$ DOF, respectively. By using an unseen diffuser, the problem becomes more challenging as the network needs to overcome both scatterer and position perturbations. As a result, the performance degrades compared to the seen diffuser case. Still, the main structures of the objects are accurately recovered across the range of tested displacement positions.

Fig. 4. Data acquisition and results on coherent imaging through unseen diffusers at unseen displacement positions. (a) A summary of the training and testing dataset. Training data are the same as in the first experiment. Testing data are from the unseen diffuser at different positions. (b) Testing results through the unseen diffuser at different diffuser displacement positions (Left panel) and camera displacement positions (Right panel) for both seen objects (Rows 1 and 2) and unseen objects (Rows 3 and 4).

3.3 Quantitative evaluation

Next, we quantify the prediction performance using the Pearson correlation coefficient (PCC), as summarized in Fig. 5. Our experimental results show consistent predictions on both seen and unseen objects for both seen and unseen diffusers. Accordingly, the two curves plotted in Fig. 5 were calculated by accumulating the statistics from all the objects (seen and unseen) on all four seen diffusers and the single unseen diffuser, respectively.
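
For reference, the PCC between a prediction and its ground truth can be computed as below; this is the metric behind the curves in Fig. 5.

```python
# Pearson correlation coefficient between prediction and ground truth.
import numpy as np

def pcc(pred, truth):
    return np.corrcoef(pred.ravel(), truth.ravel())[0, 1]
```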

Fig. 5. Quantitative performance evaluation of DNN prediction results. At each position, 1600 testing images are evaluated for the seen diffuser case and 400 testing images for the unseen diffuser case. Each cross (seen diffuser) or circle (unseen diffuser) marker represents the mean PCC of the predictions at that position. Each error bar quantifies the standard deviation of the prediction results at that position. The training positions are marked by the grey boxes.

The mean PCC at each position over all the predictions is marked by the cross markers for the seen diffusers and the circle markers for the unseen diffuser. Each error bar quantifies the standard deviation of the prediction results at that position. We observed that, for the seen diffusers, the network performs best at the trained positions. As the displacement increases, the PCC gradually decreases. The variations of the predictions, quantified by the standard deviation (std), also increase with the displacement distance. For the unseen diffuser, the overall performance drops slightly and is in quantitative agreement with that reported in our previous work [26]. Notably, the prediction results are less dependent on the positions: the mean and std of the PCCs remain consistent over the entire displacement range. Overall, our DNN showed the ability to make high-quality predictions against perturbations from camera and diffuser displacements. The degradation of the predictions as a function of displacement is gradual, as seen in Figs. 3(b) and 4(b). This shows the robustness of our DNN model under these physical perturbations.

4. Analysis

Next, we investigated the correlations across speckle patterns imaged under different imaging conditions and further developed a framework to interpret the mechanism by which our DNN model generalizes over different scattering conditions. To do so, we used the state-of-the-art unsupervised dimension reduction technique, UMAP [35]. UMAP models the entire dataset as a single nonlinear manifold by learning the underlying topological structure contained in the high-dimensional data. In its simplest form, UMAP considers each data point (e.g., an image) as a single vector on the learned manifold and models the entire dataset as a 2D (nonlinear) representation. We apply this technique to analyze both the raw speckle patterns and the DNN model’s latent codes, and propose a procedure to visualize the training and prediction process.

4.1 Raw speckle patterns contain scattering-specific information

We analyzed the input data and the corresponding measured speckle patterns to discover the intrinsic correlations, as shown in Fig. 6(a). First, the UMAP-learned manifold of the input object dataset is visualized as a 2D map. For better visualization, we randomly selected 4000 images from the same MNIST dataset as our input data. Once a large dataset without labels is fed into UMAP, the algorithm adaptively outputs an unsupervised transformation mapping between the high-dimensional dataset and a low-dimensional representation [35]. Consider the entire dataset as an $N\times 512\times 512$ matrix, where $N$ denotes the number of images. To perform the dimension reduction analysis, we first preprocessed the data by reshaping the matrix into an $N\times 512^2$ matrix. Next, we computed the input manifold as a 2D representation using UMAP. As clearly shown in Fig. 6(a)(i), this dataset naturally (i.e., without any supervision/labels for UMAP) clusters into 10 groups corresponding to the underlying 10 digit classes, each of which is marked by a different color. Here, each point (i.e., a vector) on this 2D map represents an input object image. This visualization shows that the raw input objects intrinsically contain object-specific information in their image structure, as expected.
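
A minimal sketch of this step using the umap-learn package is shown below; `images` stands for the $N\times 512\times 512$ stack described above.

```python
# Unsupervised 2D embedding of raw images with UMAP (no labels used).
import numpy as np
import umap

def embed_2d(images):
    X = images.reshape(len(images), -1)   # N x 512^2 data matrix
    reducer = umap.UMAP(n_components=2)   # unsupervised dimension reduction
    embedding = reducer.fit_transform(X)  # (N, 2) representation
    return embedding, reducer
```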

Fig. 6. Data and latent space analysis based on dimension reduction. (a) UMAP-based visualization of the input data and speckle measurements. (i) The manifold of the input data shows 10 clusters matching the underlying 10 digits. (ii) The manifold of the speckle patterns forms clusters based on the underlying scattering conditions, in which the indices of the diffuser and the position are marked by D# and P#, respectively. (b) Network analysis during training and making predictions. (i) The training input manifold computed from the speckles captured under 12 scattering conditions correspondingly forms 12 distinct clusters. (ii-iv) The testing data under different imaging conditions are projected onto the same training input manifold. (v) The learned latent manifold is computed from the learned latent codes of the training data. (vi-viii) The corresponding predicted latent codes for testing data under different imaging conditions are projected onto the learned latent manifold.

Next, we visualized the UMAP-learned manifold from 9600 speckle patterns taken under 24 different scattering conditions, including four diffusers, each with three diffuser positions and three camera positions. Importantly, we observe that the learned manifold for the speckle patterns is clustered into 24 distinct groups according to the underlying scattering condition, while the object-specific information has been scrambled by the scattering process, as shown in Fig. 6(a)(ii). We label each scattering condition by (D#, P#), where D1-D4 indexes the four different diffusers and P1-P6 indexes the six different positions, corresponding to $Z_{C5}$, $Z_{C15}$, $Z_{D5}$, $Z_{C1}$, $Z_{D3}$, and $Z_{D9}$. We observe that speckles measured from the same diffuser at different positions do not form an apparent “super cluster”. These results show that the speckle patterns captured under the same scattering condition contain intrinsic correlations; the speckle patterns become more decorrelated as the scattering condition changes. These observations match well with our previous study based on the classical PCC analysis [26]. Here, by using a more advanced dimension reduction technique, we show that the raw speckle patterns contain scattering-specific information that can be revealed without the need for any supervised learning procedure.

4.2 Interpreting the mechanism of the DNN model’s generalization

Next, we develop a novel procedure to interpret the working principle of our DNN model and its generalization capability to different scattering conditions. Our main idea is to analyze the training and prediction processes and quantify the underlying information content using UMAP. To do so, we take the following two-step process. First, we set up the learned latent space of the trained DNN model, which is used as the global “coordinate system” to quantify the information content in both the training and the prediction. Specifically, we fed each training example into the trained network and extracted the corresponding latent code. We then used all the latent codes from the entire training dataset to set up the learned latent space using UMAP. Second, we projected the latent codes extracted from the testing data under different conditions onto the learned latent space (the coordinate system) and visualized the discrepancy between the learned and the predicted latent codes in order to assess the DNN model’s generalization capability. In the following, we first discuss our analysis results on unseen objects under the same training scattering conditions and show that our DNN model can reveal object-specific information and disentangle scattering-specific information using the encoder path. Next, we discuss the testing results under different scattering conditions and demonstrate our DNN’s generalization capability to different scattering cases. Finally, we dissect our DNN model and discuss the distinct functionalities provided by the encoder-decoder path and the skip connections for solving this inverse scattering problem.
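
A hedged sketch of this two-step procedure is given below; `encoder` is assumed to be a sub-model of the trained DNN that ends at the latent code (the names and data handling are illustrative).

```python
# Two-step latent manifold analysis with a frozen UMAP transform.
import umap

def latent_manifold_analysis(encoder, train_speckles, test_speckles):
    # Step 1: learned latent manifold from the training data only.
    train_codes = encoder.predict(train_speckles)
    train_codes = train_codes.reshape(len(train_codes), -1)
    mapper = umap.UMAP(n_components=2).fit(train_codes)
    learned_2d = mapper.embedding_  # learned latent manifold

    # Step 2: project predicted latent codes from testing data onto the
    # same manifold (same UMAP transformation, no refitting).
    test_codes = encoder.predict(test_speckles)
    test_codes = test_codes.reshape(len(test_codes), -1)
    predicted_2d = mapper.transform(test_codes)
    return learned_2d, predicted_2d
```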

4.2.1 DNN model reveals object-specific information

We first visualized the speckle patterns used for training, which include 12 different scattering conditions (i.e., four diffusers at one diffuser position and two camera positions), by UMAP in the 2D map in Fig. 6(b)(i), which is termed the training input manifold. As expected from our previous analysis, 12 distinct clusters are formed matching the underlying scattering conditions. Next, we fed all the training speckle patterns to our trained DNN and extracted the learned latent codes, which were then used to compute the learned latent manifold by UMAP. In Fig. 6(b)(v), the learned latent space is visualized as a 2D map. Importantly, it contains 10 clusters based on the corresponding digit labels instead of the scattering conditions. This shows that the trained network learns to distill object-specific features and “unmix” the scattering effects. Next, we projected the testing speckle patterns, captured from unseen objects under the same 12 scattering conditions, onto the existing training input space under the same (nonlinear) UMAP transformation. As shown in Fig. 6(b)(ii), the projection aligns well with the existing training input manifold and the corresponding clusters, which further indicates that speckles captured under a given scattering condition are correlated regardless of the input objects. Next, we fed all the testing speckle patterns to the trained DNN and extracted the predicted latent codes. Finally, we projected the predicted latent codes onto the previously learned latent manifold, as shown in Fig. 6(b)(vi). The predicted latent code clusters align well with those from the training data. This result indicates that our DNN can reveal object-specific information from unseen speckle patterns under the same scattering conditions.

4.2.2 Interpreting the DNN model’s generalizability to different scattering conditions

We discuss the analysis results from different scattering conditions in two cases. In the first case, we analyzed the testing data from different displacements through the same four training diffusers, including four unseen diffuser positions and four unseen camera positions. After projecting the input speckle patterns onto the previously established training input manifold, the clusters no longer align with the existing input manifold, as shown in Fig. 6(b)(iii). In the second case, we analyzed speckle patterns captured from the unseen diffuser with four diffuser positions and four camera positions. As shown in Fig. 6(b)(iv), the projected clusters differ significantly from the training input manifold. Both cases show again that speckles captured from different scattering conditions are decorrelated. Although the manifold learned by UMAP is nonlinear, Fig. 6(b)(iii) and Fig. 6(b)(iv) can be intuitively interpreted as follows: given the “coordinate system” set up by the training speckles, the testing speckles can no longer be represented by any single cluster (“axis”). This is because speckle patterns from different cases exhibit different features, so their corresponding 2D representations are far apart. Specifically, combined with our previous analysis, the underlying scattering condition dictates the unique features in the speckle patterns and differentiates them from other cases. This elucidates the challenge for deep learning to generalize over different scattering conditions.

The next step is to project the predicted latent codes extracted from the testing data onto the learned latent manifold under the same UMAP transformation. As shown in Fig. 6(b)(vii) and Fig. 6(b)(viii), for both cases under different scattering conditions, the predicted latent-code clusters align well with those of the training data. The good alignment between the predicted and learned latent manifolds illustrates our DNN model’s ability to generalize in terms of diffuser displacements, camera displacements, and change of diffusers. Artifacts, including mixing across different clusters, are also observed as compared to the original learned latent manifold. These artifacts become more obvious for the unseen diffuser case (Fig. 6(b)(viii)). Fundamentally, this is because the learned encoder is trained on the training data distribution, which may not sufficiently capture the testing data distribution.

4.3 Encoder-decoder path and the skip connections provide distinct functionalities

While the concept of a latent code was originally established for the auto-encoder network (i.e., an encoder-decoder network without skip connections) [34], we adapt the same concept to analyze our modified U-net. To justify our approach, we compared different network structures using the weights directly loaded from our trained network. Notably, we found that the encoder-decoder path and the skip connections provide distinct functionalities for inverting the coherent speckles.

We first blocked the skip connections of the trained network to effectively construct a plain encoder-decoder network; the procedure and details are shown in Supplementary Materials. We performed predictions using the network with blocked skip connections on the same testing data. The corresponding predictions showed only minor blurring in the reconstructions. We further quantitatively evaluated the predictions from the plain encoder-decoder network using the same testing data as in Fig. 5. A slight performance drop was observed for the seen diffuser cases, with reduced mean PCC and increased std. Second, we blocked the information flow through the encoder-decoder path while keeping the skip-connection paths. The corresponding predictions showed severe degradation. Visualizations and quantitative results can be found in Supplementary Materials.
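
The gating below is a conceptual sketch of how such an ablation can be implemented (an assumed implementation; see Supplementary Materials for the authors' exact procedure): the skip tensor is multiplied by zero before concatenation, blocking the skip path while reusing the trained weights unchanged.

```python
# Conceptual skip-connection gate for the ablation study.
from tensorflow.keras import layers

def gated_skip(skip_tensor, decoder_tensor, use_skip=True):
    gate = 1.0 if use_skip else 0.0  # 0.0 blocks the skip connection
    gated = layers.Lambda(lambda t: t * gate)(skip_tensor)
    return layers.Concatenate()([gated, decoder_tensor])
```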

This study shows that the information used for reconstruction is primarily extracted from the encoder-decoder path. In particular, the encoder-decoder path plays the key role in generalizing to new scattering conditions. Adding the skip connections helps restore high-resolution features and slightly improves the prediction results for the seen diffuser case. Because of this, we can confidently extend the latent code concept to analyze the information learned by our network without missing much of the information flowing through the skip connections. Empirically, we also found that adding the skip connections helps prevent overfitting. More results and implementation details can be found in Supplementary Materials.

5. Conclusion

We presented a new DL framework for coherent imaging through scatter that pushes the robustness against scatterer displacement and imaging optics defocus. We developed a new analysis framework for revealing the information contained in the speckle, interpreting the mechanism, and visualizing the generalizability of the DNN model. We established the importance of the encoder-decoder path for the DNN model’s generalization to untrained scatterers, as well as how the skip connections improve the predictions for seen scattering conditions while helping prevent overfitting.

We demonstrated that our DNN model is agnostic to scatterer changes, scatterer displacement, and camera defocus for coherent imaging through scatter. By improving the data acquisition strategy and the network structure, the generalization capability may be further improved, which will be explored in our future work. These promising results show that DL can robustly solve challenging inverse problems of coherent imaging through complex media under various perturbations.

Our analysis framework shows that the speckle patterns intrinsically carry scatterer/displacement information. After training, the encoder can unmix the scattering and distill the object-specific information. Our analysis framework allows us to visualize the DNN’s generalizability under different imaging conditions. This provides a powerful tool to explore the underlying correlations within the data and to interpret the learning mechanism of the DNN model. The caveat of using a nonlinear dimension reduction process, like UMAP, is that the traditional distance measures for linear spaces, such as the Euclidean norm, can no longer be used [35]. This poses challenges in quantifying the discrepancy between the learned and the predicted distributions as well as in providing a unique inverse transform from the latent space to the input [35], both of which will be investigated in our future work.

Here, to simplify the problem, we have focused on establishing an understanding of the mechanism of network generalization against scatterer and displacement changes while using a simple dataset containing only 10 classes of handwritten digits. This approach has the benefit of facilitating direct visualization of the learned latent space as a small number of visually distinguishable clusters using the dimension reduction technique. However, the simplicity of the object dataset inevitably limits the network’s generalizability for predicting more complex structures, which requires increased diversity of the training objects, as shown in our previous work [26]. In general, it has recently been shown that the network’s generalizability against object variations can be improved by increasing the information capacity, i.e., entropy, of the training data [38]. In our work, the information content in the input data is visualized by the dimension reduction technique. Importantly, our results show that for systems involving complex transformations, such as scattering, directly measuring the information content in the raw input data may not provide sufficient information for generalization. Instead, we develop a latent code analysis framework for understanding this challenging computational imaging problem. We envision that a comprehensive analysis framework, built by analyzing the information content of both the network input and the latent space, may provide additional insights into improving and interpreting generalization against both system perturbations and object variations, which will be considered in our future work.

Our DNN performs a deterministic inversion and makes a “point estimate” of the underlying object. For solving complex inverse problems, it has been shown that quantifying the uncertainty using a probabilistic perspective of deep learning can further enhance the reliability and interpretability of the predictions. For example, a variational auto-encoder framework has been developed to visualize the prediction variations from inverting the speckle data [39]. A full Bayesian neural network framework has been developed to quantify the uncertainties due to both model and data variations for solving a phase retrieval problem [40]. Incorporating such uncertainty quantification techniques may be particularly useful to fully understand the “deep” correlations contained in the coherent speckle data, which will be explored in our future work.

Funding

National Science Foundation (1813848).

Acknowledgements

We thank George Barbastathis and Mo Deng for insightful discussions, and the Boston University Shared Computing Cluster for providing the computational resources.

Disclosures

The authors declare that there are no conflicts of interest related to this article.

See Supplement 1 for supporting content.

References

1. M. Kim, W. Choi, Y. Choi, C. Yoon, and W. Choi, “Transmission matrix of a scattering medium and its applications in biophotonics,” Opt. Express 23(10), 12648–12668 (2015). [CrossRef]  

2. S. Popoff, G. Lerosey, M. Fink, A. C. Boccara, and S. Gigan, “Image transmission through an opaque material,” Nat. Commun. 1(1), 81 (2010). [CrossRef]  

3. I. M. Vellekoop and A. P. Mosk, “Focusing coherent light through opaque strongly scattering media,” Opt. Lett. 32(16), 2309–2311 (2007). [CrossRef]  

4. S. Ohayon, A. Caravaca-Aguirre, R. Piestun, and J. J. DiCarlo, “Minimally invasive multimode optical fiber microendoscope for deep brain fluorescence imaging,” Biomed. Opt. Express 9(4), 1492–1509 (2018). [CrossRef]  

5. A. K. Dunn, H. Bolay, M. A. Moskowitz, and D. A. Boas, “Dynamic Imaging of Cerebral Blood Flow Using Laser Speckle,” J. Cereb. Blood Flow Metab. 21(3), 195–201 (2001). [CrossRef]  

6. S. M. Popoff, G. Lerosey, R. Carminati, M. Fink, A. C. Boccara, and S. Gigan, “Measuring the Transmission Matrix in Optics: An Approach to the Study and Control of Light Propagation in Disordered Media,” Phys. Rev. Lett. 104(10), 100601 (2010). [CrossRef]  

7. T. R. Hillman, T. Yamauchi, W. Choi, R. R. Dasari, M. S. Feld, Y. Park, and Z. Yaqoob, “Digital optical phase conjugation for delivering two-dimensional images through turbid media,” Sci. Rep. 3(1), 1909 (2013). [CrossRef]  

8. M. Jang, H. Ruan, I. M. Vellekoop, B. Judkewitz, E. Chung, and C. Yang, “Relation between speckle decorrelation and optical phase conjugation (OPC)-based turbidity suppression through dynamic scattering media: A study on in vivo mouse skin,” Biomed. Opt. Express 6(1), 72–85 (2015). [CrossRef]  

9. Y. Liu, P. Lai, C. Ma, X. Xu, A. A. Grabar, and L. V. Wang, “Optical focusing deep inside dynamic scattering media with near-infrared time-reversed ultrasonically encoded (TRUE) light,” Nat. Commun. 6(1), 5904 (2015). [CrossRef]  

10. M. M. Qureshi, J. Brake, H.-J. Jeon, H. Ruan, Y. Liu, A. M. Safi, T. J. Eom, C. Yang, and E. Chung, “In vivo study of optical speckle decorrelation time across depths in the mouse brain,” Biomed. Opt. Express 8(11), 4855–4864 (2017). [CrossRef]  

11. S. Feng, C. Kane, P. A. Lee, and A. D. Stone, “Correlations and Fluctuations of Coherent Wave Transmission through Disordered Media,” Phys. Rev. Lett. 61(7), 834–837 (1988). [CrossRef]  

12. I. Freund, M. Rosenbluh, and S. Feng, “Memory Effects in Propagation of Optical Waves through Disordered Media,” Phys. Rev. Lett. 61(20), 2328–2331 (1988). [CrossRef]  

13. I. Freund, “Looking through walls and around corners,” Phys. A 168(1), 49–65 (1990). [CrossRef]  

14. S. Schott, J. Bertolotti, J.-F. Léger, L. Bourdieu, and S. Gigan, “Characterization of the angular memory effect of scattered light in biological tissues,” Opt. Express 23(10), 13505–13516 (2015). [CrossRef]  

15. J. Bertolotti, E. G. van Putten, C. Blum, A. Lagendijk, W. L. Vos, and A. P. Mosk, “Non-invasive imaging through opaque scattering layers,” Nature 491(7423), 232–234 (2012). [CrossRef]  

16. E. Edrei and G. Scarcelli, “Memory-effect based deconvolution microscopy for super-resolution imaging through scattering media,” Sci. Rep. 6(1), 33558 (2016). [CrossRef]  

17. O. Katz, P. Heidmann, M. Fink, and S. Gigan, “Non-invasive single-shot imaging through scattering layers and around corners via speckle correlations,” Nat. Photonics 8(10), 784–790 (2014). [CrossRef]  

18. G. Barbastathis, A. Ozcan, and G. Situ, “On the use of deep learning for computational imaging,” Optica 6(8), 921–943 (2019). [CrossRef]  

19. S. Li, M. Deng, J. Lee, A. Sinha, and G. Barbastathis, “Imaging through glass diffusers using densely connected convolutional networks,” Optica 5(7), 803–813 (2018). [CrossRef]  

20. M. Lyu, H. Wang, G. Li, S. Zheng, and G. Situ, “Learning-based lensless imaging through optically thick scattering media,” Adv. Photonics 1(03), 1 (2019). [CrossRef]  

21. B. Rahmani, D. Loterie, G. Konstantinou, D. Psaltis, and C. Moser, “Multimode optical fiber transmission with a deep learning network,” Light: Sci. Appl. 7(1), 69 (2018). [CrossRef]  

22. A. Turpin, I. Vishniakou, and J. D. Seelig, “Light scattering control in transmission and reflection with neural networks,” Opt. Express 26(23), 30911–30929 (2018). [CrossRef]  

23. N. Borhani, E. Kakkava, C. Moser, and D. Psaltis, “Learning to see through multimode fibers,” Optica 5(8), 960–966 (2018). [CrossRef]  

24. P. Fan, T. Zhao, and L. Su, “Deep learning the high variability and randomness inside multimode fibers,” Opt. Express 27(15), 20241–20258 (2019). [CrossRef]  

25. E. Kakkava, N. Borhani, B. Rahmani, U. Teğin, C. Moser, and D. Psaltis, “Deep learning-based image classification through a multimode fiber in the presence of wavelength drift,” Appl. Sci. 10(11), 3816 (2020). [CrossRef]  

26. Y. Li, Y. Xue, and L. Tian, “Deep speckle correlation: A deep learning approach toward scalable imaging through scattering media,” Optica 5(10), 1181–1190 (2018). [CrossRef]  

27. J. W. Goodman, Speckle Phenomena in Optics: Theory and Applications (Roberts and Company Publishers, 2007).

28. X. Xie, H. Zhuang, H. He, X. Xu, H. Liang, Y. Liu, and J. Zhou, “Extended depth-resolved imaging through a thin scattering medium with PSF manipulation,” Sci. Rep. 8(1), 4585 (2018). [CrossRef]  

29. M. Liao, D. Lu, G. Pedrini, W. Osten, G. Situ, W. He, and X. Peng, “Extending the depth-of-field of imaging systems with a scattering diffuser,” Sci. Rep. 9(1), 7165 (2019). [CrossRef]  

30. R. Horisaki, Y. Okamoto, and J. Tanida, “Single-shot noninvasive three-dimensional imaging through scattering media,” Opt. Lett. 44(16), 4032–4035 (2019). [CrossRef]  

31. A. Labeyrie, “Attainment of diffraction limited resolution in large telescopes by fourier analysing speckle patterns in star images,” Astron. Astrophys. 6, 85–87 (1970).

32. A. K. Singh, D. N. Naik, G. Pedrini, M. Takeda, and W. Osten, “Exploiting scattering media for exploring 3d objects,” Light: Sci. Appl. 6(2), e16219 (2017). [CrossRef]  

33. Y. Wu, Y. Rivenson, Y. Zhang, Z. Wei, H. Günaydin, X. Lin, and A. Ozcan, “Extended depth-of-field in holographic imaging using deep-learning-based autofocusing and phase recovery,” Optica 5(6), 704–710 (2018). [CrossRef]  

34. J. Masci, U. Meier, D. Cireşan, and J. Schmidhuber, “Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction,” in Artificial Neural Networks and Machine Learning – ICANN 2011, T. Honkela, W. Duch, M. Girolami, and S. Kaski, eds. (Springer, Berlin, Heidelberg, 2011), Lecture Notes in Computer Science, pp. 52–59.

35. L. McInnes, J. Healy, and J. Melville, “UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction,” arXiv:1802.03426 [cs, stat] (2018).

36. Y. LeCun, C. Cortes, and C. Burges, “MNIST handwritten digit database,” http://yann.lecun.com/exdb/mnist/.

37. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention, (Springer, 2015), pp. 234–241.

38. M. Deng, S. Li, Z. Zhang, I. Kang, N. X. Fang, and G. Barbastathis, “On the interplay between physical and content priors in deep learning for computational imaging,” Opt. Express 28(16), 24152–24170 (2020). [CrossRef]  

39. F. Tonolini, J. Radford, A. Turpin, D. Faccio, and R. Murray-Smith, “Variational Inference for Computational Imaging Inverse Problems,” arXiv:1904.06264 [cs, stat] (2020).

40. Y. Xue, S. Cheng, Y. Li, and L. Tian, “Reliable deep-learning-based phase imaging with uncertainty quantification,” Optica 6(5), 618–629 (2019). [CrossRef]  
