
cGAN-assisted imaging through stationary scattering media

Open Access

Abstract

Analyzing images taken through scattering media is challenging, owing to speckle decorrelations from perturbations in the media. For in-line imaging modalities, which are appealing because they are compact, require no moving parts, and are robust, negating the effects of such scattering becomes particularly challenging. Here we explore the use of conditional generative adversarial networks (cGANs) to mitigate the effects of the additional scatterers in in-line geometries, including digital holographic microscopy. Using light scattering simulations and experiments on objects of interest with and without additional scatterers, we find that cGANs can be quickly trained with minuscule datasets and can also efficiently learn the one-to-one statistical mapping between the cross-domain input-output image pairs. Importantly, the output images are faithful enough to enable quantitative feature extraction. We also show that with rapid training using only 20 image pairs, it is possible to negate this undesired scattering to accurately localize diffraction-limited impulses with high spatial accuracy, therefore transforming a shift variant system to a linear shift invariant (LSI) system.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Laser speckles are a nuisance when imaging through disordered media such as a diffuser or biological tissue. While methods such as cleaning and slicing can be used during sample preparation to minimize the adverse scattering from unwanted scatterers, these methods are invasive and can physically damage the sample [1,2]. It is thus desirable to invert or negate the effect of this extra scattering by non-invasive or computational means, for a myriad of applications such as deep tissue imaging [3,4], imaging through fog [5–7], underwater imaging [8,9], imaging through multi-mode fibers [10], and more.

To avoid the issue of unwanted scattering, a common requirement for in-line coherent imaging modalities such as holographic microscopy is for the sample solution to be sufficiently dilute. If this condition is not met, then the fringes in the object holograms might be strongly affected by the fringes from the multiple secondary scatterers (present along with the primary object), making particle localization and inferring object properties such as refractive index and particle size impossible.

The main bottleneck for an analytical approach in analyzing the speckle/diffused patterns lies in accurately modelling and characterizing the whole scattering process. This characterization becomes challenging owing to the sheer number of variables involved and the associated degrees of freedom present within the scatterer and scattering media.

Nonetheless, optical scattering can still be regarded as a deterministic process [11], and a series of strategies have been developed to negate the effect of scattering. Iterative wavefront shaping [12,13], the transmission matrix approach [14–16], speckle correlation [17–19], and the use of guide stars for point spread function (PSF) estimation [20–22] are all useful but come with their own limitations. In the iterative wavefront shaping method, the involvement of localized reporters and the number of trials needed to reach the final optimized wavefront can both present challenges. The transmission matrix (TM) approach requires a spatial light modulator, and hence complicated opto-acoustic configurations, and has a limited depth of field [14]. Speckle correlation methods are less invasive than the aforementioned two methods: the object field is reconstructed from the autocorrelation of the field recorded at the sensor, using Fienup-type iterative phase retrieval algorithms. The main drawback of this approach is the limited angular range due to the memory effect [17]. PSF estimation based on a guide star and blind deconvolution using algorithms such as Richardson–Lucy is also limited by the small angular range of reconstruction [20–22].

In this manuscript we treat the scattering as an image-to-image translation problem in order to use recent advancements in machine and deep learning. Unlike the speckle correlation method, our proposed method is not constrained by a small angular range, nor does it require constant scanning of the incident laser beam using galvanic mirrors [23]. This is because it is an image-to-image translation method, meaning it translates the whole measured (unknown) speckle pattern directly into a full field-of-view reconstruction. Data-driven approaches have gained traction in solving a range of inverse problems in optics, including phase unwrapping [24], image retrieval [25], defect detection [26], coherent imaging [27–30], and scalable imaging through diffusive media [31–34]. Li and coworkers [11] proposed a highly scalable deep learning framework that encapsulates a range of statistical variations in the trained model, enabling robust reconstructions despite speckle decorrelations. The convolutional neural network (CNN) architecture used was fairly simple, being essentially an autoencoder. More recently, Lai and coworkers used a YGAN architecture trained on thousands of image pairs to retrieve pairs of objects from speckle patterns [35].

Here, we use conditional generative adversarial networks, or cGANs [36], to analyze images from in-line coherent imaging setups. Several variations of GANs [37], such as SRGAN, LAPGAN, and DCGAN, have been used previously for various image translation tasks. In the present work we specifically chose a cGAN over other GANs because the conditioning makes it the only GAN architecture that allows us to generate object-class-specific reconstructions from unknown speckle patterns. Also, the individual generated images/reconstructions not only match their class labels, but also capture the full diversity of the training dataset.

Our main contribution is to demonstrate the feasibility of cGANs for accurate signal retrieval, impulse localization, and object retrieval from unknown speckle patterns, either generated computationally (simulation) or acquired experimentally for various scattering scenarios, whilst keeping the training dataset sizes minimal. Importantly, the recovered signal is faithful enough to the original image for further analysis, leading to excellent quantitative feature extraction of moving objects in in-line imaging setups. Second, our first- and second-order speckle intensity autocorrelation analysis shows that the speckle patterns formed in the sensor plane for a given object class are unique to that object class. Moreover, we show that the second-order normalized intensity autocorrelation is independent of the physical properties of the scattering layer. Third, our scattering simulations (based on scalar diffraction theory) show that interferometric phase-conjugation-based object retrieval in diffusive media is prone to uncorrelated phase changes over time, rendering the earlier estimated/measured phases useless for phase conjugation. Object retrieval then becomes impossible, warranting data-driven alternatives such as cGANs. Finally, we believe this is the first experimental demonstration of imaging through stationary scattering media without using any quasi-planar devices such as spatial light modulators (SLMs), instead using real objects and diffusers.

2. Network architecture

The principle and workflow of the cGAN network architecture we employ for several image translation tasks are illustrated in Fig. 1. Figure 1(a) shows the complete training of a typical cGAN: paired image data are used to train the cGAN, which consists of a generator and a discriminator block. In this architecture, the generator G is tasked with producing a synthetic translation of the training data in a conditional sense. The discriminator D is tasked with classifying whether the inputted image pair is real or fake. Both generator G and discriminator D compete to optimise their own objectives, and are hence termed adversaries.


Fig. 1. Principle and working of a cGAN network architecture for several image translation tasks. (a) Complete cGAN architecture: a paired training dataset is used to train the network, which consists of a generator $G$ and a discriminator $D$ block. The generator outputs a synthetic translation for a given input $x$ and a random noise vector $z$. The discriminator $D$ is tasked to classify between real and fake image pairs. Alongside the $L_\mathrm {cGAN}$ loss, the $L_1$ loss is also considered, so that the synthetically generated images remain close to the ground truth. (b) Simulation workflow from training to inference/testing: after training, a standalone model can then be used to perform image translation tasks to generate $G(x,z)$ from non-discernible images $x$, resulting in signal retrieval, impulse localization, or object retrieval.


Alongside the conditional GAN loss $L_\mathrm {cGAN}$, the $L_1$ loss is computed to maintain the similarity between the generated and ground truth images. A standalone model is then obtained after the completion of training (see Supplementary Methods in Supplement 1). Figure 1(b) shows the complete workflow from training to inference: after model training, a standalone Pix2Pix cGAN model is obtained, which can then be used to translate images from one domain to another. Thereafter, several crucial parameters and error metrics can be extracted from these generated images. For each of the three image translation tasks shown in Fig. 1(b), the Pix2Pix cGAN network was trained independently, yielding three separate standalone models. These three models were then used to invert or negate the effects of scattering for the aforementioned image translation tasks.
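For reference, the objective being optimized follows the standard Pix2Pix formulation of Isola et al. [36]; the relative weighting $\lambda$ and the training hyperparameters used in this work are those given in Supplement 1, so the expressions below are a generic statement of the loss rather than our exact settings:

$$L_\mathrm{cGAN}(G,D) = \mathbb{E}_{x,y}\!\left[\log D(x,y)\right] + \mathbb{E}_{x,z}\!\left[\log\!\left(1 - D\!\left(x, G(x,z)\right)\right)\right],$$

$$L_{1}(G) = \mathbb{E}_{x,y,z}\!\left[\lVert y - G(x,z) \rVert_{1}\right], \qquad G^{\ast} = \arg\min_{G}\,\max_{D}\; L_\mathrm{cGAN}(G,D) + \lambda\, L_{1}(G).$$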

3. Phase conjugation robustness

3.1 Mathematical formulation of first and second order speckle intensity autocorrelation functions

Before proceeding with the cGAN, we investigated the speckle field and intensity autocorrelation for a scenario involving scattering from a thin diffuser layer. For further details please see Section 1 in Supplement 1. In brief, we analyzed the second-order speckle statistics of the electric field $A(x^{'})$ at the observation plane and found that the field autocorrelation function $C_{A}(x_1^{'}, x_2^{'}) \equiv \langle A^{\ast }(x_1^{'}) A(x_2^{'}) \rangle$ [38] ($\ast$ denoting the complex conjugate) between any two coordinate points $(x_1^{'}, x_2^{'})$ in the field $A(x^{'})$ is proportional to $|f(x)|^2$. In other words, the field in the observation plane and the field in the object plane are correlated despite randomization. We also showed that the first- and second-order speckle intensity autocorrelation functions are related and are independent of the physical properties of the scattering layer.
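For orientation, the relation referred to above can be summarized by the standard Siegert relation, which holds for fully developed speckle with circular Gaussian field statistics; the derivation specific to our geometry is given in Supplement 1, so the following is only a generic sketch in the notation above:

$$g^{(2)}(x_1^{'}, x_2^{'}) \equiv \frac{\langle I(x_1^{'})\, I(x_2^{'}) \rangle}{\langle I(x_1^{'}) \rangle \langle I(x_2^{'}) \rangle} = 1 + \left| g^{(1)}(x_1^{'}, x_2^{'}) \right|^{2}, \qquad g^{(1)}(x_1^{'}, x_2^{'}) = \frac{C_{A}(x_1^{'}, x_2^{'})}{\sqrt{\langle |A(x_1^{'})|^{2} \rangle \langle |A(x_2^{'})|^{2} \rangle}}.$$

Because $g^{(1)}$ is set by the object field alone (through $C_{A}$), the normalized second-order intensity autocorrelation $g^{(2)}$ carries no dependence on the physical properties of the diffuser, consistent with the statement above.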

Given this correlation, we then performed scattering simulations based on scalar diffraction theory to determine the robustness of simple phase conjugation (see Supplement 1 and Figures S2–S7). In brief, we used the Angular Spectrum Propagation (ASP) method to simulate forward and backward scattering from an object in the presence of a stationary scattering layer such as a random phase mask (e.g., a diffuser). In the simulation, we assumed that the diffuser phase can be estimated using interferometry and later used for phase conjugation to retrieve the object.
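A minimal sketch of the angular-spectrum propagation step is shown below; the grid size, pixel pitch, propagation distances, and the uniform phase statistics of the diffuser are illustrative assumptions rather than the values used in Supplement 1.

```python
# Scalar ASP through a thin random phase mask (illustrative sketch only).
import numpy as np

def asp_propagate(field, wavelength, dx, z):
    """Propagate a complex 2D field a distance z with the angular spectrum method."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)                     # spatial frequencies (1/m)
    FX, FY = np.meshgrid(fx, fx)
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    kz = 2 * np.pi / wavelength * np.sqrt(np.maximum(arg, 0.0))
    H = np.exp(1j * kz * z) * (arg > 0)              # drop evanescent components
    return np.fft.ifft2(np.fft.fft2(field) * H)

rng = np.random.default_rng(0)
n, dx, wavelength = 512, 1e-6, 632.8e-9              # 512 px, 1 µm pitch, HeNe line
obj = np.zeros((n, n), dtype=complex)
obj[n // 2, n // 2] = 1.0                            # point-like object f(x)
diffuser = np.exp(1j * 2 * np.pi * rng.random((n, n)))   # thin random phase mask

field_at_diffuser = asp_propagate(obj, wavelength, dx, z=5e-3)
field_at_sensor = asp_propagate(field_at_diffuser * diffuser, wavelength, dx, z=20e-3)
speckle_intensity = np.abs(field_at_sensor) ** 2     # speckle pattern at the sensor
```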

However, we found that interferometric phase estimation methods are sensitive to noise; translating the diffuser by a few micrometers renders the earlier phase estimation useless for phase conjugation. This poses a significant challenge to using the interferometric scheme for object retrieval and alternative methods are thus warranted.

In the following sections, we test our cGAN-based approach on systems where the main scatterer of interest is in motion, either from diffusion or from manual translation.

4. Microscopic object retrieval

4.1 Object retrieval from behind a single scattering layer

Digital holographic microscopy (DHM) is one key imaging technology that uses coherent light and is hence severely affected by additional scatterers. Though DHM has been used for precision tracking to great success [39], dilute systems are typically used because the interference fringes of neighboring objects limit what information can be recovered about the object of interest.

We test our approach on holograms of a primary scatterer (P) positioned near a layer of secondary scatterers (S), as shown in Fig. 2(a, g). The S layers consist of 1-$\mu$m-diameter spheres of refractive index $n_S$ = 1.59, typical of polystyrene latex microspheres. The primary scatterer P is a 2-$\mu$m-diameter sphere of refractive index $n_P$ = 1.42, approximating that of biological material. The surrounding medium’s refractive index is $n_{med}$ = 1.33, representing aqueous buffers or water. The wavelength of the light source is $\lambda$ = 632.8 nm.
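For illustration, a single clean hologram of P with the optical parameters above could be simulated with HoloPy's documented Sphere/calc_holo interface [39]; the detector size, pixel pitch, and particle position below are assumptions, not the values used for our datasets.

```python
# Sketch of simulating an in-line hologram of the primary scatterer P with HoloPy [39].
import holopy as hp
from holopy.scattering import Sphere, calc_holo

detector = hp.detector_grid(shape=256, spacing=0.1)   # 256 x 256 px, 0.1 µm/px (assumed)
P = Sphere(n=1.42, r=1.0, center=(5.0, 5.0, 8.0))     # 2-µm-diameter primary scatterer

holo_P = calc_holo(detector, P, medium_index=1.33,
                   illum_wavelen=0.6328,               # µm (632.8 nm)
                   illum_polarization=(1, 0))
# Adding the secondary layer S amounts to passing a collection of 1-µm-diameter,
# n = 1.59 spheres together with P (see Supplementary Methods in Supplement 1).
```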


Fig. 2. Spatial parameter extraction by fitting a Lorenz-Mie scattering model to the cGAN-generated and noisy holograms for a mobile primary scatterer ($P$) in two different scattering scenarios. (a) A primary scatterer $P$ diffuses freely in two dimensions in front of a stationary scattering layer $S$. (b) The complete path of $P$ (150 steps) within bounds, i.e., 3 $\leq$ ($x_p$, $y_p$) $\leq$ 7 and $z_p$ = 8. (c) The ground truth $y$ = $H_\mathrm {P}$, cGAN-generated $G(x,z)$ = $H_\mathrm {cGAN}$, and the complex hologram $x$ = $H_\mathrm {P+S}$. (d) Fit results for localizing $P$ in $x = H_\mathrm {P+S}$ and $G(x,z)$ = $H_\mathrm {cGAN}$, versus ground truth values. (e) The root mean squared error (RMSE) for localizing $P$ in the axial direction for the raw, ground truth and cGAN generated holograms. (f) Error metrics $\chi ^2$ and $R^2$ reveal excellent recovery of hologram features by the trained model. (g) A primary scatterer $P$ diffuses freely in three dimensions between two stationary scattering layers $S$ and $S'$. (h) The complete 3D path of $P$ (2000 steps) within bounds, i.e., 3 $\leq$ ($x_p$, $y_p$) $\leq$ 8 and 12 $\leq$ $z_p$ $\leq$ 15. (i) The ground truth $y$ = $H_\mathrm {P}$, cGAN-generated $G(x,z)$ = $H_\mathrm {cGAN}$, and the complex hologram $x$ = $H_\mathrm {P+S+S'}$. (j) Fit results for localizing $P$ in $x = H_\mathrm {P+S+S'}$ and $G(x,z)$ = $H_\mathrm {cGAN}$, versus ground truth values. (k) The root mean squared error (RMSE) for localizing $P$ in all directions for the raw, ground truth and cGAN generated holograms. (l) Error metrics $\chi ^2$ and $R^2$ show excellent recovery of hologram features by the trained model. For further details please see Visualizations 1–6.


To train the model, we generate image pairs with and without the scattering layer. For P obscured by a single S layer, we simulated a total of 150 holograms for a mobile P and stationary S, $x = H_{P+S}$ (see Fig. 2(b) and Supplementary Methods in Supplement 1). We also simulated the corresponding holograms of P by itself, $y = H_P$. The position coordinates for P were chosen to mimic Brownian motion using a Wiener process, a continuous-time stochastic process with stationary independent real increments; the trajectory is shown in Supplement 1 (Supplementary Methods) and Visualization 1. After training on 100 image pairs $x = H_{P+S}$ and $y = H_P$, we selected a model (Figure S8) to generate S-free holograms (see Supplementary Methods in Supplement 1 and Visualization 2), which we tested against the remaining 50 holograms.
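A minimal way to generate such a trajectory is sketched below; the step count and bounds follow Fig. 2(b), while the step scale and random seed are arbitrary assumptions.

```python
# 2D Wiener-process trajectory for P (z_p fixed), matching the bounds in Fig. 2(b).
import numpy as np

rng = np.random.default_rng(42)
steps = rng.normal(scale=0.1, size=(150, 2))             # stationary independent increments
xy = np.clip(5.0 + np.cumsum(steps, axis=0), 3.0, 7.0)   # confine 3 <= (x_p, y_p) <= 7
trajectory = np.column_stack([xy, np.full(150, 8.0)])    # z_p = 8 for every step
```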

As shown in Fig. 2(c), the qualitative agreement between the cGAN-generated holograms $G(x, z) = H_{cGAN}$ and the ground truth holograms $y = H_P$ appeared excellent, so we proceeded to quantitatively verify the performance of our model. We fitted a Lorenz-Mie scattering model (see Supplement 1) to each of the holograms $H_{P+S}$ and $H_{cGAN}$ using the package HoloPy [39–41]. The fitted results for $H_{P+S}$ and $H_{cGAN}$ were compared to the ground truth values used for simulation (Fig. 2(d)). While the $x_p$ and $y_p$ coordinates for both $x = H_{P+S}$ and $G(x, z) = H_{cGAN}$ could be recovered, the fit results for the axial position $z_p$ of the scatterer were vastly better for $H_{cGAN}$. The root-mean-square error (RMSE) for localizing P reveals excellent performance in all three dimensions for $G(x, z) = H_{cGAN}$ (Fig. 2(e)).
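The fits in this work are performed with HoloPy's own fitting machinery [39–41]; as an illustration of what such a localization fit involves, a generic least-squares wrapper around the forward model might look like the following sketch (the fixed refractive index and radius, assumed pixel pitch, initial guess, and omission of the hologram scaling factor are all simplifications).

```python
# Generic least-squares localization of P from a (cGAN-generated) hologram.
import numpy as np
from scipy.optimize import least_squares
import holopy as hp
from holopy.scattering import Sphere, calc_holo

detector = hp.detector_grid(shape=256, spacing=0.1)   # pixel pitch assumed (µm)

def residuals(params, data):
    x, y, z = params
    model = calc_holo(detector, Sphere(n=1.42, r=1.0, center=(x, y, z)),
                      medium_index=1.33, illum_wavelen=0.6328,
                      illum_polarization=(1, 0))
    return np.asarray(model).ravel() - data.ravel()

# h_cgan: 256 x 256 array holding H_cGAN (placeholder name)
# fit = least_squares(residuals, x0=(5.0, 5.0, 8.0), args=(h_cgan,))
# x_p, y_p, z_p = fit.x
```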

The high quality of fit for $H_{cGAN}$ corresponds to excellent recovery of the hologram’s qualitative features. $\chi ^2$ (the sum of the pixel-by-pixel squared differences between the best-fit hologram and either $x$ or $G(x, z)$) and $R^2$ were both vastly improved by the cGAN model translation (Fig. 2(f)). Similarly promising results were obtained for a scenario where the axial positions of P and S are reversed and the axial position of P was allowed to change (see Section 2.5 in Supplement 1, Visualization 3, Visualization 4, and Figures S9 and S10). For complete training and testing parameters see Table S1.
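The fit-quality metrics can be computed directly from the images; the snippet below uses the $\chi^2$ definition given above and one common definition of $R^2$ (the coefficient of determination), so it is a sketch of the metrics rather than a reproduction of our exact post-processing code.

```python
# Fit-quality metrics between a best-fit hologram and the image being fitted.
import numpy as np

def fit_metrics(best_fit, image):
    resid = best_fit - image
    chi2 = float(np.sum(resid ** 2))                          # pixel-wise squared differences
    r2 = 1.0 - chi2 / float(np.sum((image - image.mean()) ** 2))
    return chi2, r2
```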

4.2 Object retrieval from between two scattering layers

Holograms in the presence of a single scattering layer S still showed recognizable features of the primary scatterer P. We were thus compelled to explore a more adverse scenario: the addition of another scattering layer S' renders P barely visible (Fig. 2(g)), and furthermore, we allow $P$ to diffuse in 3D (Fig. 2(h)). Additionally, the radii $r_i$ of the spheres in S and S' were randomly sampled from a discrete uniform distribution $r_i \sim U[0.1,\; 0.5]$. This geometry and scenario corresponds to many real-world experimental schemes described in other studies [16–18,20,21]. Specifically, our simulation scheme closely resembles the experiment described in [16]: a blood-filled translucent silicone tube buried between two pieces of ex vivo chicken-breast tissue.

Allowing $P$ to explore 3D space also requires the training set to span 3D space. To mitigate the concomitant increase in training time, we postulated that it may be possible to train the cGAN network on a series of holograms of $P$ diffusing with discrete integer steps, despite the testing dataset being for $P$ diffusing with sub-integer steps. For a system with a single S we found that this approach worked effectively (Figures S9 and S10), so we then applied it to the system with both S and S'. The training dataset consisted of a total of 144 simulated hologram image pairs $x = H_\mathrm {P+S+S'}$ and $y = H_\mathrm {P}$ (Fig. 2(i)).
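The discrete-training / sub-integer-testing strategy can be sketched as follows. One way to tile the bounds stated in Fig. 2(h) with 144 integer positions is a 6 x 6 x 4 lattice; the actual lattice and step statistics are given in the Supplementary Methods, so the specifics below are assumptions for illustration only.

```python
# Discrete integer-step training positions vs. a sub-integer-step test walk.
import numpy as np
from itertools import product

# Training: P visits integer lattice points inside 3 <= x_p, y_p <= 8, 12 <= z_p <= 15.
train_positions = list(product(range(3, 9), range(3, 9), range(12, 16)))
assert len(train_positions) == 144                       # 6 x 6 x 4 lattice

# Testing: a 3D random walk with sub-integer steps, clipped to the same bounds.
rng = np.random.default_rng(7)
steps = rng.normal(scale=0.05, size=(2000, 3))
test_walk = np.clip(np.array([5.5, 5.5, 13.5]) + np.cumsum(steps, axis=0),
                    [3, 3, 12], [8, 8, 15])
```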

We generated the testing dataset separately using a 3D random walk with sub-integer step sizes (see Fig. 2(h) and supplementary Visualization 5). Even though the network was trained on discrete data (Figure S11), it accurately inferred the real testing dataset with particle localization in all three dimensions being very close to the ground truth (see Fig. 2(j)). Notably, the RMSE for particle localization was less than 0.2 $\mu$m in all three dimensions (Fig. 2(k)), whereas the particle could not be localized with the fitting routine for unprocessed holograms $x = H_\mathrm {P+S+S'}$.

For the dual scattering layer case, the overall quality of fit for $H_\mathrm {cGAN}$ was excellent (Fig. 2(l)), just as it was for the single scattering layer case. The $\chi ^2$ and $\textit {R}^2$ values were vastly improved by the cGAN model translation (see Figure S11 and Visualization 6). For complete training and testing parameters see Table S1. The best-fit results (Fig. 2(d-f) and Fig. 2(j-l)) verify that the fitting routine was unable to converge successfully for the raw holograms, as corroborated by the errors in the $x$, $y$, and $z$ coordinate fits. Image clean-up is therefore necessary to extract any quantitative data.

5. Experimental object localization and retrieval

5.1 Accurate object localization in a spatially shift variant system

Given the promising results thus far, we sought to experimentally verify whether the same approach could be used to negate the effects of a scattering layer on a point spread function (PSF). Ordinarily, imaging a point source such as a sub-micron fluorescent bead or a diffraction-limited spot can be used to determine the PSF of an incoherent system or the coherent point spread function (c-PSF) of a coherent imaging system [22]. For an ideal optical system with a very high space-bandwidth product, the PSF is linear and shift invariant. However, these two assumptions do not hold for imaging systems with aberrations imposed by undesirable scattering. As an extreme example, imaging point sources behind a stationary scattering layer can lead to the formation of complex patterns such as laser speckles when using coherent light sources. This scattering makes the c-PSF (a speckle pattern) non-linear and spatially variant, which further limits the imaging field of view owing to memory effects [17,22].

We used a Mach-Zehnder setup as shown in Fig. 3(a) to acquire c-PSFs of a point source with and without a scattering layer. Unlike a typical Mach-Zehnder setup, which is used to interfere two beams, here the setup served to keep the two arms stable during imaging. Specifically, a diffraction-limited spot was generated by a short-focal-length lens (Thorlabs, C171TMD-B) placed after the fiber-collimated laser source (Thorlabs, PAF2-A7B), then imaged by a low-NA lens (Nikon Plan Fluor, 4X/0.13). The amplitude of the optical field was split using a beam splitter (BS1). A ground glass diffuser (Edmund Optics, 220 grit) was placed between BS1 (Thorlabs, CCM1-BS013) and a mirror M2 (Thorlabs, BB03-E02) to produce a speckled PSF that is non-linear and shift variant; this constitutes the first arm, arm1. The beam in the other arm, arm2, carried a clear and magnified image of the point source or impulse. The main advantage of this geometry is that it keeps the optical path lengths in the two arms matched, allowing us to record on a camera (Basler, acA1920-155um) the speckled image from arm1 and the clear image of the object from arm2 simply by alternately blocking the arms (Fig. 3(b)). For further discussion of the propagation, see Supplement 1.


Fig. 3. Experimental setup for data acquisition of two different input types in the object plane. (a) A low NA lens L produces a diffraction limited spot in the image plane. Likewise, a USAF test target can be placed in the object plane. An objective lens (4x) L1 is then used to image the object plane. Beam splitter BS1 splits the amplitude of the diverging spherical beam into two arms arm1 and arm2. Mirrors M1 and M2 direct the beam to another beamsplitter BS2. A sensor is then placed after BS2 to record the image pairs. (b) By sequentially blocking either arm1 or arm2, image pairs $x$ and $y$ can be recorded for both the impulse/spot and the USAF target.


Twenty-five translated impulses were collected along the rasterization path shown in Fig. 4(a) (red dotted line). For every impulse, a speckle pattern was also recorded (see Fig. 3(b), top, and Fig. 4(b)). Thereafter, the paired dataset was split into training and testing sets (see Fig. 4(a)). Training on this minuscule paired dataset (orange in Fig. 4(a)) took only $\sim$10 minutes in a Google Colaboratory environment (Figure S12). The five test speckle patterns (see white bounding box in Fig. 4(a)) that were not used in training were then input individually into the standalone model obtained after training. The model generated reconstructed images, a composite of which is shown in Fig. 4(b).


Fig. 4. Object retrieval and localization from unknown speckle patterns. (a) A composite image of the 25 translated impulses. The impulses within orange bounding boxes were used for training the cGAN network and the diagonal impulses within the white bounding box were used for testing. The red dotted line shows the rasterization path while recording these impulses. (b) The 5 test speckle patterns for the translated impulses in the object plane (c.f. white bounding box in (a)), and their corresponding cGAN-generated images $G(x,z)$. (c) The generated images $G(x,z)$ were compared with the ground truth images $y$. Low RMSE (units of pixels) in $X$ and $Y$ coordinates show excellent recovery of the impulses by the model. All the individual images are 256 x 256 pixels and the white scale bar represents 50 pixels. (d) For the USAF target object recovery, six class types were trained along with their speckle images. (e) Comparison between the cGAN-generated $G(x,z)$ and ground truth test images $y$ demonstrate excellent object recovery from the speckle patterns $x$. The colours in (a-b) and (e) represent the normalized intensity.


By comparing the centroids of the original and cGAN-generated impulses, we confirmed that the impulse recovery was quantitatively excellent. The root mean square error (RMSE) between the original and cGAN-generated spot coordinates is 1.0864 pixels in $X$ and 1.0687 pixels in $Y$ (see Fig. 4(c)), compared to a spot diameter of 15 pixels (FWHM 10 pixels). We have therefore shown that it is possible to localize diffraction-limited objects in a spatially shift-variant optical system with high spatial accuracy, despite the c-PSF presenting as random noise. Complete training and testing parameters are tabulated in Table S1.
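The centroid comparison itself is straightforward; the sketch below shows one way to compute the per-axis RMSE between ground-truth and generated impulses, where `y_imgs` and `g_imgs` are placeholder names for the five pairs of 256 x 256 intensity images.

```python
# Centroid-based localization error between ground-truth and cGAN-generated impulses.
import numpy as np
from scipy.ndimage import center_of_mass

def centroid_rmse(y_imgs, g_imgs):
    c_true = np.array([center_of_mass(im) for im in y_imgs])   # (row, col) per image
    c_gen = np.array([center_of_mass(im) for im in g_imgs])
    return np.sqrt(np.mean((c_true - c_gen) ** 2, axis=0))     # RMSE in (Y, X), pixels
```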

5.2 Imaging through stationary scattering media

Having established that our trained model can localize point sources with great accuracy, we then attempted to image objects through a scattering layer. The specimen chosen was a USAF target (Edmund Optics, 36-275), placed in the object plane of the objective lens L1 in place of the lens from the previous experiment. To collect data, we translated the USAF structures of interest (group 4) in the objective’s field of view, then alternately blocked arm1 to collect the clean object images or arm2 to record the corresponding speckle images (see Fig. 3(b), bottom). A total of six classes were trained (Fig. 4(d), top), with the whole dataset consisting of 90 image pairs, or 15 image pairs per class for six classes.

Importantly, each image pair had a slightly different position of the USAF target relative to the diffuser. The USAF target was mounted on a three-axis stage and was translated in the $X$-$Y$ plane only. Notably, each translation was unique; this was achieved by translating the target randomly (by a few microns) in the $X$-$Y$ plane within the sensor’s field of view. Two-thirds of the data per class were used for training (see Fig. 4(d), Supplement 1, and Figure S13); the remaining one-third was used for testing.

Despite the testing set containing target positions (relative to the diffuser) on which the model was not trained, we found that the object recovery was excellent. Figure 4(e) compares ground truth test images and generated images, when the recorded test speckle pattern was input into the best model. The average time to reconstruct from a speckle pattern of 256 x 256 pixels was $\sim$1.43 seconds. Complete training and testing parameters are tabulated in Table S1.

We believe this is the first experimental demonstration of signal retrieval of a translated object from behind a diffuser without using quasi-planar devices such as spatial light modulators (SLMs), which have limited spatial resolution and limited spectral acceptance in the IR. Moreover, the patterns displayed on SLMs are mere approximations of complex (amplitude + phase) objects; it is important to test any approach on real objects and fields rather than on patterns displayed on SLMs.

6. Discussion and conclusions

Our results show that cGAN-based models can be efficiently trained to generate faithful representations $G(x,z) \sim y$ (Figure S1). The size of the training set did not exceed 144 image pairs for this work (see Table S1 for a summary of model details). Nonetheless, the object reconstructions were not only qualitatively comparable to the ground truth, but were also accurate enough to enable quantitative post-processing. In a previous study [24] we found that cGANs could be employed for phase unwrapping while keeping the training dataset sizes minimal for various object classes, demonstrating another scenario for which cGANs can be rapidly trained.

By increasing the complexity of our scattering scenario in a step-wise fashion, we identified an additional strategy for keeping the training set to a minimum. When the object of interest P was obscured by two scattering layers S and S', we formed our training image pairs by simulating discrete positions of P with integer step size, despite the testing dataset having real sub-integer step sizes from diffusion. Surprisingly, the testing dataset was fitted very well despite the network being trained on integer steps, demonstrating that with cGANs it is possible to obtain accurate object reconstructions with small training datasets. This strategy is computationally less expensive as it lessens the need to train with a large dataset, whilst retaining high reconstruction accuracy and low RMSE of extracted parameters.

We were also surprised that the cGAN could be used to transform a spatially shift variant optical system into a shift invariant system for accurate object localization. As a result, it became possible to localize diffraction-limited objects such as impulses in a diffusive medium (Fig. 4(a-c)). The implication of these results is that this concept can be expanded to image any class of objects with pixel-level accuracy. Indeed, we found that even when translating the object of interest, a USAF target, to an untrained location, the model could still generate faithful object retrievals.

The robustness of cGANs with diffused and speckled data warrants further discussion. The cGAN network architecture employs convolutional neural networks in the generator and discriminator blocks. Based on our results we hypothesize that cGANs are efficient at learning the input-output mapping, even when the input pixel representation is randomized. This randomized input is further down-sampled to a lower dimensional latent space (non-discernible by humans) in the generator. We found that in contrast to other methods, there is no strict requirement for a direct visual correspondence between the input and output image pairs while training these networks. This condition is well suited for purposes where the objects of interest may be in motion or translated. To further add to this proposition, Kim and coworkers [42] showed that GANs such as DiscoGANs can be trained to learn cross domain relations efficiently. Similarly Wolf and coworkers [43] showed that it is possible to perform cross domain image generation with GAN training in an unsupervised fashion i.e. without explicitly forming the input-output pairs.

Lastly, our analytical results for the speckle intensity autocorrelation show that the speckles formed in the sensor plane for a specific object class (say, structure $5$ in a USAF test target) in the input plane are unique to that particular object class. Based on this, we deduce that forming image-pair datasets for training a cGAN is quite effective for producing a reliable trained model, owing to this unique characteristic property of speckles.

The main limitation of our approach is that in some scenarios GANs are difficult to train and slow to converge owing to their adversarial nature, requiring constant parameter tweaking to keep them stable. Convergence should be checked throughout training by monitoring the discriminator and generator losses. Mode collapse is also possible, resulting in less diversity in the generated images and again requiring parameter tweaking.

Future work will involve studying strongly scattering dynamic systems using cGANs and other pathways for real-time object retrieval. Another area of interest would be real-time PSF modeling and rapid field-of-view reconstructions for dynamic systems.

In summary, cGANs can compensate for the presence of a scattering layer and enable quantitative feature extraction. cGANs can also be rapidly trained, opening opportunities to compensate for scattering in increasingly complex optical systems.

Funding

Australian Research Council (DE210100291).

Acknowledgments

Siddharth Rawat thanks UNSW Sydney for scholarship support. Anna Wang is a recipient of the UNSW Scientia Fellowship and Australian Research Council Discovery Early Career Award (DE210100291). We would also like to thank Luke Marshall and Christopher Lee for 3D printing the optical hardware for experiments.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content including methods and additional figures, and Visualizations 1–6.

References

1. T. Tian, Z. Yang, and X. Li, “Tissue clearing technique: Recent progress and biomedical applications,” J. Anat. 238(2), 489–507 (2021). [CrossRef]  

2. E. C. Costa, D. N. Silva, A. F. Moreira, and I. J. Correia, “Optical clearing methods: An overview of the techniques used for the imaging of 3d spheroids,” Biotechnol. Bioeng. 116(10), 2742–2763 (2019). [CrossRef]  

3. J. A. Izatt, M. R. Hee, G. M. Owen, E. A. Swanson, and J. G. Fujimoto, “Optical coherence microscopy in scattering media,” Opt. Lett. 19(8), 590–592 (1994). [CrossRef]  

4. W. Denk, J. H. Strickler, and W. W. Webb, “Two-photon laser scanning fluorescence microscopy,” Science 248(4951), 73–76 (1990). [CrossRef]  

5. G. Satat, M. Tancik, and R. Raskar, “Towards photography through realistic fog,” in 2018 IEEE International Conference on Computational Photography (ICCP), (IEEE, 2018), pp. 1–10.

6. D. Psaltis and I. N. Papadopoulos, “The fog clears,” Nature 491(7423), 197–198 (2012). [CrossRef]  

7. D. Kijima, T. Kushida, H. Kitajima, K. Tanaka, H. Kubo, T. Funatomi, and Y. Mukaigawa, “Time-of-flight imaging in fog using multiple time-gated exposures,” Opt. Express 29(5), 6453–6467 (2021). [CrossRef]  

8. D. M. Kocak, F. R. Dalgleish, F. M. Caimi, and Y. Y. Schechner, “A focus on recent developments and trends in underwater imaging,” Mar. Technol. Soc. J. 42(1), 52–67 (2008). [CrossRef]  

9. W. W. Hou, “A simple underwater imaging model,” Opt. Lett. 34(17), 2688–2690 (2009). [CrossRef]  

10. N. Borhani, E. Kakkava, C. Moser, and D. Psaltis, “Learning to see through multimode fibers,” Optica 5(8), 960–966 (2018). [CrossRef]  

11. Y. Li, Y. Xue, and L. Tian, “Deep speckle correlation: a deep learning approach toward scalable imaging through scattering media,” Optica 5(10), 1181–1190 (2018). [CrossRef]  

12. J. Kubby, S. Gigan, and M. Cui, Wavefront shaping for biomedical imaging (Cambridge University Press, 2019).

13. R. Horstmeyer, H. Ruan, and C. Yang, “Guidestar-assisted wavefront-shaping methods for focusing light into biological tissue,” Nat. Photonics 9(9), 563–571 (2015). [CrossRef]  

14. A. Boniface, J. Dong, and S. Gigan, “Non-invasive focusing and imaging in scattering media with a fluorescence-based transmission matrix,” Nat. Commun. 11(1), 6154 (2020). [CrossRef]  

15. S. M. Popoff, G. Lerosey, R. Carminati, M. Fink, A. C. Boccara, and S. Gigan, “Measuring the transmission matrix in optics: an approach to the study and control of light propagation in disordered media,” Phys. Rev. Lett. 104(10), 100601 (2010). [CrossRef]  

16. C. Ma, X. Xu, Y. Liu, and L. V. Wang, “Time-reversed adapted-perturbation (trap) optical focusing onto dynamic objects inside scattering media,” Nat. Photonics 8(12), 931–936 (2014). [CrossRef]  

17. O. Katz, P. Heidmann, M. Fink, and S. Gigan, “Non-invasive single-shot imaging through scattering layers and around corners via speckle correlations,” Nat. Photonics 8(10), 784–790 (2014). [CrossRef]  

18. H. Ruan, Y. Liu, J. Xu, Y. Huang, and C. Yang, “Fluorescence imaging through dynamic scattering media with speckle-encoded ultrasound-modulated light correlation,” Nat. Photonics 14(8), 511–516 (2020). [CrossRef]  

19. M. Alterman, C. Bar, I. Gkioulekas, and A. Levin, “Imaging with local speckle intensity correlations: theory and practice,” ACM Trans. Graph. 40(3), 1–22 (2021). [CrossRef]  

20. E. Edrei and G. Scarcelli, “Memory-effect based deconvolution microscopy for super-resolution imaging through scattering media,” Sci. Rep. 6(1), 33558 (2016). [CrossRef]  

21. J. Bertolotti, “Unravelling the tangle,” Nat. Phys. 11(8), 622–623 (2015). [CrossRef]  

22. J. Schneider and C. M. Aegerter, “Guide star based deconvolution for imaging behind turbid media,” J. Eur. Opt. Soc.-Rapid Publ. 14(1), 21 (2018). [CrossRef]  

23. J. Bertolotti, E. G. Van Putten, C. Blum, A. Lagendijk, W. L. Vos, and A. P. Mosk, “Non-invasive imaging through opaque scattering layers,” Nature 491(7423), 232–234 (2012). [CrossRef]  

24. S. Rawat and A. Wang, “Accurate and practical feature extraction from noisy holograms,” Appl. Opt. 60(16), 4639–4646 (2021). [CrossRef]  

25. C. Yan, B. Gong, Y. Wei, and Y. Gao, “Deep multi-view enhancement hashing for image retrieval,” IEEE Trans. Pattern Anal. Mach. Intell. 43(4), 1445–1451 (2020). [CrossRef]  

26. K.-C. C. Chien and H.-Y. Tu, “Complex defect inspection for transparent substrate by combining digital holography with machine learning,” J. Opt. 21(8), 085701 (2019). [CrossRef]  

27. Y. Rivenson, Y. Wu, and A. Ozcan, “Deep learning in holography and coherent imaging,” Light: Sci. Appl. 8(1), 85 (2019). [CrossRef]  

28. Y. Zhang, T. Liu, M. Singh, E. Çetintaş, Y. Luo, Y. Rivenson, K. V. Larin, and A. Ozcan, “Neural network-based image reconstruction in swept-source optical coherence tomography using undersampled spectral data,” Light: Sci. Appl. 10(1), 155 (2021). [CrossRef]  

29. Y. Zhang, H. C. Koydemir, M. M. Shimogawa, S. Yalcin, A. Guziak, T. Liu, I. Oguz, Y. Huang, B. Bai, Y. Luo, Z. Wei, H. Wang, V. Bianco, B. Zhang, R. Nadkarni, K. Hill, and A. Ozcan, “Motility-based label-free detection of parasites in bodily fluids using holographic speckle analysis and deep learning,” Light: Sci. Appl. 7(1), 108 (2018). [CrossRef]  

30. S. Yoon, M. Kim, M. Jang, Y. Choi, W. Choi, S. Kang, and W. Choi, “Deep optical imaging within complex scattering media,” Nat. Rev. Phys. 2(3), 141–158 (2020). [CrossRef]  

31. S. Zhu, E. Guo, J. Gu, L. Bai, and J. Han, “Imaging through unknown scattering media based on physics-informed learning,” Photonics Res. 9(5), B210–B219 (2021). [CrossRef]  

32. W. Tahir, H. Wang, and L. Tian, “Adaptive 3d descattering with a dynamic synthesis network,” Light: Sci. Appl. 11(1), 42 (2022). [CrossRef]  

33. U. Teğin, M. Yıldırım, İ. Oğuz, C. Moser, and D. Psaltis, “Scalable optical learning operator,” Nat. Comput. Sci. 1(8), 542–549 (2021). [CrossRef]  

34. K. Skarsoulis, E. Kakkava, and D. Psaltis, “Predicting optical transmission through complex scattering media from reflection patterns with deep neural networks,” Opt. Commun. 492, 126968 (2021). [CrossRef]  

35. X. Lai, Q. Li, Z. Chen, X. Shao, and J. Pu, “Reconstructing images of two adjacent objects passing through scattering medium via deep learning,” Opt. Express 29(26), 43280 (2021). [CrossRef]  

36. P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, (2017), pp. 1125–1134.

37. J. Langr and V. Bok, GANs in Action: Deep Learning with Generative Adversarial Networks (Manning Publications, 2019).

38. I. M. Vellekoop, “Controlling the propagation of light in disordered scattering media,” arXiv preprint arXiv:0807.1087 (2008).

39. S. Barkley, T. G. Dimiduk, J. Fung, D. M. Kaz, V. N. Manoharan, R. McGorty, R. W. Perry, and A. Wang, “Holographic microscopy with python and holopy,” Comput. Sci. Eng. 22(5), 72–82 (2019). [CrossRef]  

40. A. Wang, R. F. Garmann, and V. N. Manoharan, “Tracking e. coli runs and tumbles with scattering solutions and digital holographic microscopy,” Opt. Express 24(21), 23719–23725 (2016). [CrossRef]  

41. A. Wang, T. G. Dimiduk, J. Fung, S. Razavi, I. Kretzschmar, K. Chaudhary, and V. N. Manoharan, “Using the discrete dipole approximation and holographic microscopy to measure rotational dynamics of non-spherical colloidal particles,” J. Quant. Spectrosc. Radiat. Transfer 146, 499–509 (2014). [CrossRef]  

42. T. Kim, M. Cha, H. Kim, J. K. Lee, and J. Kim, “Learning to discover cross-domain relations with generative adversarial networks,” in International Conference on Machine Learning, (PMLR, 2017), pp. 1857–1865.

43. Y. Taigman, A. Polyak, and L. Wolf, “Unsupervised cross-domain image generation,” arXiv preprint arXiv:1611.02200 (2016).

Supplementary Material (7)

Name | Description
Supplement 1       Supplement 1: Further calculations and model details
Visualization 1       This video shows the trajectory of P diffusing in front of a scattering layer S. This video file is 50 frames long and is generated at 5 fps.
Visualization 2       This video shows a frame-by-frame comparison between the ground truth holograms vs cGAN generated and noisy holograms for the scattering scenario consisting of P diffusing in front of a scattering layer. This video file is 50 frames long and is gener
Visualization 3       This video shows the trajectory of P diffusing behind a scattering layer. This video file is 500 frames long and is generated at 30 fps.
Visualization 4       This video shows a frame-by-frame comparison between the ground truth holograms vs cGAN generated and noisy holograms for the scattering scenario consisting of P diffusing behind a scattering layer. This video file is 500 frames long and is generated
Visualization 5       This video shows the trajectory of P diffusing between two scattering layers. This video file is 2000 frames long and is generated at 30 fps.
Visualization 6       This video shows a frame-by-frame comparison between the ground truth holograms vs cGAN generated and noisy holograms for the scattering scenario consisting of P diffusing between two scattering layers. This video file is 2000 frames long and is gene



