Three-dimensional-generator U-net for dual-resonant scanning multiphoton microscopy image inpainting and denoising

Abstract

A dual-resonant scanning multiphoton (DRSM) microscope incorporating a tunable acoustic gradient index of refraction lens and a resonant mirror is developed for rapid volumetric bioimaging. It is shown that the microscope achieves a volumetric imaging rate of up to 31.25 volumes per second (vps) for a scanning volume of up to 200 × 200 × 100 µm³ with 256 × 256 × 128 voxels. However, the volumetric images have a severely negative signal-to-noise ratio (SNR) as a result of the large number of missing voxels in a large scanning volume and the presence of Lissajous patterning residuals. Thus, a modified three-dimensional (3D)-generator U-Net model trained using simulated microbead images is proposed and used to inpaint and denoise the images. The performance of the 3D U-Net model for bioimaging applications is enhanced by training the model with high-SNR in-vitro drosophila brain images captured using a conventional point-scanning multiphoton microscope. The trained model produces clear in-vitro drosophila brain images at a rate of 31.25 vps with an SNR improvement of approximately 20 dB over the original images obtained by the DRSM microscope. The training convergence time of the modified U-Net model is just half that of a general 3D U-Net model. The model thus has significant potential for 3D in-vivo bioimaging via transfer learning. With the assistance of transfer learning, the model can be extended to the restoration of in-vivo drosophila brain images with high image quality and a short training time.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Multiphoton microscopy is a powerful technique for in-vivo brain imaging [1]. Many high-speed volumetric methods have been integrated with multiphoton microscopy to achieve dynamic signal imaging. However, the image quality of such methods is degraded as a result of insufficient accumulation time and a lack of photons. Thus, finding a photon budget that provides the optimal tradeoff between imaging speed and accumulation time, while remaining biologically compatible, is a major challenge. Adaptive excitation provides an effective approach for high-speed brain function imaging by illuminating only the region of interest (ROI) [2]. Moreover, when acquiring the full image, previous studies have shown that the imaging speed can be improved by replacing the galvanometer used in conventional multiphoton microscopes with a resonant mirror (RM) or tunable acoustic gradient (TAG) lens, thereby achieving a line scan rate of 16 kHz to 1 MHz and an imaging speed of over 10 volumes per second (vps) [3,4]. Besides scanners, the imaging speed can also be increased through the use of spatial-separation and time-delay optical designs. For example, reverberation microscopy enables simultaneous multiplane imaging at video rates [5]. Similarly, free-space angular-chirp-enhanced delay technology makes possible imaging rates of up to 3,000 frames per second (fps) [6]. The present study develops a rapid dual-resonant scanning multiphoton (DRSM) microscope incorporating both an RM and a TAG lens, which facilitate simultaneous high-speed scanning in the x- and z-axis directions. The proposed system achieves a high imaging rate of 8,000 fps and 31.25 vps. However, the DRSM imaging rate is obtained at the expense of a low number of photons per voxel and a large number of Lissajous patterning residuals, and hence the volumetric images have a severely negative signal-to-noise ratio (SNR). In addition, up to 70% of the voxels are missing for a large scanning volume such as 200 × 200 × 100 µm³. These imaging results are consistent with those of previous studies, which showed that in-vivo multiphoton imaging systems typically suffer from optical aberrations and scattering effects when applied to thick biological tissue [7]. Accordingly, inpainting and denoising are important strategies for image restoration. One of the first digital image inpainting algorithms was proposed in 2000 [8]. A pioneering image denoising and inpainting method based on deep neural networks was proposed in 2012 [9]. In the past few years, various other deep learning approaches for image inpainting have also been proposed [10].

Deep learning methods, in which multilayer artificial neural networks are trained using a large number of representative samples, provide a powerful technique for solving many nonlinear mathematical and physical problems. The literature contains several proposals for performing image inpainting and denoising in microscopy using classical deep learning architectures such as U-Net [11], the generative adversarial network (GAN) [12], and the residual channel attention network (RCAN) [13]. Content-aware image restoration based on deep learning has also been proposed as a means of solving the trade-off problem between the imaging speed and the spatial resolution [14]. Furthermore, deep learning has been used to implement three-dimensional (3D) virtual refocusing in fluorescence microscopy in order to increase the depth of field without the need for axial scanning [15]. In other words, this method achieves a high imaging speed and predicts unscanned layers. The authors in [16] used a 3D RCAN model to improve the spatial resolution and visual quality of fluorescence microscopy image volumes. NVIDIA proposed a method for filling irregular holes using partial convolutions [17]. DeepFill achieves free-form image inpainting using gated convolutions [18]. Advances in GAN technology have improved the image inpainting quality of natural images, including the combined use of Pix2Pix [19] and conditional GANs [20]. Several authors have employed GANs to enhance the detail and texture of super-resolution images [21,22]. In recent years, many scholars have combined image inpainting and microscopy. For example, the authors in [23] used a two-stage multiscale GAN to generate full scanning transmission electron micrographs from various partial scans, including spiral and jittered grid-like scans. In [24], a transfer learning strategy was employed to improve the image quality and recover the imaging loss in in-vivo two-photon fiberscopy. In [25], the authors used ResNet to enhance the imaging speed in single-molecule localization microscopy. However, most of these recent applications focus on two-dimensional (2D) images. Deep learning architectures inevitably incur a massive computation cost when applied to 3D in-vivo bioimaging. Furthermore, to the best of the current authors’ knowledge, existing rapid TAG scanning microscopy systems do not generate large-scale, high-quality images, but are instead intended mainly for the observation of locally dynamic signals. By contrast, the 3D-generator U-Net architecture proposed in the present study is targeted specifically at the inpainting and denoising of 3D in-vivo images with a large scanning volume and image size and a rapid volumetric imaging rate.

Building upon these studies, the present work develops a modified and effective 3D-generator U-Net architecture for inpainting and denoising the volumetric images obtained by the proposed DRSM microscope. The results show that the convergence time of the proposed U-Net model in training is around half that of a general 3D U-Net model. The U-Net model is pretrained using a dataset consisting of simulated fluorescent microbead images containing missing voxels, background noise, and Lissajous scanning patterning residuals. The experimental results confirm that the axial distortion and poor spatial resolution of the DRSM images can be effectively restored by the trained 3D U-Net model. The mushroom bodies (MBs) of the drosophila brain are commonly utilized to demonstrate functional bioimaging [26,27]. However, even though the spatial geometry of the MBs is similar from one fly to the next, generating a large dataset of representative MB images using a simulation approach is complex and time consuming. Thus, in the present study, the 3D U-Net model is further trained using an in-vitro drosophila brain image dataset. The trained model again shows the ability to restore the DRSM images and produce clear neural images. However, in-vivo imaging remains a problem since the posture of the MBs may change over time. Drawing on the short computation time of the modified 3D-generator U-Net architecture, a transfer learning method based on the in-vitro image dataset is employed to further train the model to restore the quality of the in-vivo images. In this way, in-vivo drosophila brain imaging with deep restoration not only retains the temporal resolution advantage, but also maintains the image quality; the SNR and the structural similarity index measure (SSIM) both show clear improvements. Overall, the results show that the deep-restoration rapid DRSM microscope not only achieves a volumetric imaging rate of up to 31.25 vps for 200 × 200 × 100 µm³ with 256 × 256 × 128 voxels, but also achieves an image quality comparable to that of the gold-standard point-scanning multiphoton microscopy technique.

2. System and methodology

2.1 Overall system setup and rapid volumetric imaging

Figure 1(a) illustrates the basic structure of the DRSM microscope. The excitation light source was provided by a pulsed Ti:sapphire laser (Tsunami, Spectra-Physics, USA) with a repetition rate of 80 MHz and a wavelength of 920 nm. The power of the laser light was adjusted using a half-wave plate and polarizer. The laser light was scanned in the z- and x-axis directions using a TAG lens (TAG Lens 2.0, TAG Optics Inc., USA) and an RM (CRS series, Cambridge, USA) set to frequencies of around 456 kHz and 8 kHz, respectively. Y-axis scanning was performed by a galvo mirror (GM) (6215H, Cambridge, USA). The laser light was focused into the sample using a water immersion objective (Plan-Apochromat 40x/1.0 NA, Carl Zeiss, Germany). The focusing depth was controlled using an objective scanner (OS, PD72Z4CAA, Physik Instrumente, UK) with a maximum travel range of 400 µm. To prevent photobleaching, an electronically-controlled shutter was installed before the scanning system. The aforementioned components were integrated in an upright optical microscope (Axio Imager 2, Carl Zeiss, Germany) and the sample position was controlled by a motorized stage (HEP4AXIM/B ProScan, Prior Scientific, UK) with a three-axis encoder. The fluorescent signals emitted from the sample were passed through a dichroic mirror (FF670-Di01, Semrock, USA), filter, and lens, and were collected by photomultiplier tubes (PMTs) (H7422-40, Hamamatsu, Japan), thereby enabling analog detection. The scanning and signal acquisition processes were synchronized using self-written LabVIEW software implemented on an FPGA board. The combined use of the TAG lens and RM makes possible a frame rate of 8,000 fps in the x-z plane, while the GM steps through 256 x-z frames along the y-axis per volume. Therefore, the theoretical volumetric imaging rate is equal to 31.25 vps (i.e., 8,000 / 256). In other words, the volume image acquisition time is around 0.032 seconds. With an 80 MHz repetition rate, the number of effective pulses per volume was only around 30% of the total number of voxels (i.e., 80 MHz / 31.25 vps / (256 × 256 × 128 voxels) ≈ 0.3 pulses per voxel) for a large image volume. In the rapid volumetric experiments, imaging was typically performed at 200 × 200 × 100 µm³ with 256 × 256 × 128 voxels. In addition, the pixel size was set to ∼0.78 µm per pixel to match the spatial resolution requirement. For the ground truth image experiments performed using a conventional point-scanning multiphoton microscope, the typical frame acquisition time was around 3.15 seconds. The training and image restoration scheme was implemented using a 3D U-Net model, as shown in Fig. 1(b). Image pre-processing was performed using MATLAB (MathWorks, USA).
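
The imaging-rate and photon-budget figures quoted above follow directly from the scan parameters. The short MATLAB calculation below reproduces them; it is an illustrative sketch, and the variable names are not taken from the original acquisition code.

    % Worked calculation of the DRSM imaging rate and per-voxel pulse budget
    % (illustrative sketch; variable names are hypothetical).
    frameRate = 8000;              % x-z frame rate set by the RM and TAG lens [fps]
    numLayers = 256;               % y-axis frames scanned by the GM per volume
    repRate   = 80e6;              % laser repetition rate [Hz]
    numVoxels = 256 * 256 * 128;   % voxels per volume

    volRate        = frameRate / numLayers;   % 31.25 volumes per second
    volTime        = 1 / volRate;             % ~0.032 s per volume
    pulsesPerVol   = repRate / volRate;       % 2.56e6 laser pulses per volume
    pulsesPerVoxel = pulsesPerVol / numVoxels % ~0.31, i.e., ~30% of the voxel count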

Fig. 1. (a) Overall system setup of proposed rapid dual-resonant scanning multiphoton microscope. (b) Training and restoration scheme implemented using 3D U-Net model.

2.2 3D-generator U-Net for volume image inpainting and denoising

Figure 2(a) shows the structure of the 3D generator proposed in the present study for restoring the volumetric images obtained by the DRSM microscope. As shown, the U-Net structure consists of convolution layers, batch normalization, rectified linear units (ReLUs), max pooling, concatenation, and transposed convolution. In general, 3D models with a large patch size consume significant GPU memory resources. Accordingly, various modifications were made to the U-Net architecture and parameters to ensure that the model fit within 24 GB of GPU memory. Due to this hardware limitation, the mini-batch size was set to 1. In particular, all of the 2D convolution layers were replaced by 3D convolution layers, where each convolution layer had a kernel size of 3 × 3 × 3 and a stride of 1. In addition, the number of filters on the encoder side was increased from 32 to 256, while that on the decoder side was decreased from 512 to 64. The number of filters in the final convolution layer was 1. The max pooling kernel size was set to 2 × 2 × 2 and downsampling was performed using a factor of 2. Finally, the transposed convolution size was set to 2 × 2 × 2 and upsampling was conducted using a factor of 2 (i.e., the kernel size was 2 × 2 × 2 with a stride of 2).
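
The layer configuration described above can be expressed directly in the MATLAB Deep Learning Toolbox used for the implementation. The condensed sketch below shows one encoder stage, a bottleneck, and one decoder stage with a skip connection; the layer names and reduced filter counts are illustrative assumptions rather than the authors' exact network graph.

    % Condensed 3D-generator U-Net sketch (one encoder/decoder stage only;
    % layer names and filter counts are illustrative, not the full network).
    lgraph = layerGraph([
        image3dInputLayer([256 256 128 1], 'Name','in', 'Normalization','none')
        convolution3dLayer(3, 32, 'Padding','same', 'Name','enc1_conv')   % 3x3x3 kernel, stride 1
        batchNormalizationLayer('Name','enc1_bn')
        reluLayer('Name','enc1_relu')
        maxPooling3dLayer(2, 'Stride',2, 'Name','pool1')                  % downsample by 2
        convolution3dLayer(3, 64, 'Padding','same', 'Name','bneck_conv')
        batchNormalizationLayer('Name','bneck_bn')
        reluLayer('Name','bneck_relu')
        transposedConv3dLayer(2, 32, 'Stride',2, 'Name','up1')            % upsample by 2
        concatenationLayer(4, 2, 'Name','skip1')                          % channel-wise skip connection
        convolution3dLayer(3, 32, 'Padding','same', 'Name','dec1_conv')
        batchNormalizationLayer('Name','dec1_bn')
        reluLayer('Name','dec1_relu')
        convolution3dLayer(1, 1, 'Name','out_conv')                       % single-filter output layer
        regressionLayer('Name','mse_loss')                                % mean-squared-error-type loss
    ]);
    % Route the encoder feature map to the second input of the concatenation layer.
    lgraph = connectLayers(lgraph, 'enc1_relu', 'skip1/in2');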

Fig. 2. (a) 3D-generator U-Net architecture for volume image restoration. (b) 3D discriminating network for 3D GAN.

Lissajous patterning residuals in rapid volumetric images are directional. Thus, augmented images with any orientation other than that of the original images degrade the training results. Consequently, neither rotation nor flip operations were applied in the data augmentation process. Instead, the simulated microbead images were augmented by randomly degrading them with missing voxels, additive Poisson-distributed background noise (applied using the imnoise function in MATLAB), and Lissajous scanning patterning residuals. The training model was implemented in the 3D U-Net structure using the MATLAB Deep Learning Toolbox on a single NVIDIA GeForce RTX 3090 GPU. The U-Net model was trained using the conventional mean-squared error loss function. In a preliminary stage of this study, a general 3D U-Net network was found to take twice as long to converge in training as the modified model (i.e., training times of 47.52 hours and 21.6 hours, respectively, given an image size of 256 × 256 × 80 voxels, a dataset of 111 volumes, and an epoch size of 100). However, both networks showed a similar convergence trend under the same loss function. For comparison purposes, a 3D GAN was also implemented with Pix2Pix [19] as the discriminating network and the proposed 3D U-Net model as the generative network. Figure 2(b) shows the structure of the 3D discriminating network used in the present study. The 3D GAN was trained using the loss function of Pix2Pix [19]. For both models, the ranges of the rapid volumetric images and ground truth images were linearly rescaled to [0,1]. Furthermore, 7.5% of the whole volume images were randomly chosen as validation data. All of the experiments used the Adam optimizer with a learning rate of 0.0002. With the exception of the transfer learning process, the number of training epochs was set to 100. The quality of the restored volumetric images was evaluated by computing the 3D peak-SNR (PSNR) and structural similarity index measure (SSIM) using the built-in MATLAB functions.
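
As a concrete illustration of the training and evaluation settings listed above, the sketch below shows the corresponding trainingOptions call and the built-in PSNR/SSIM computation; the datastore and variable names (trainDs, restoredVol, gtVol, lgraph) are placeholders, and the actual training scripts of this study are not reproduced here.

    % Training configuration matching the settings described above
    % (Adam optimizer, learning rate 2e-4, 100 epochs, mini-batch size 1).
    opts = trainingOptions('adam', ...
        'InitialLearnRate',     2e-4, ...
        'MaxEpochs',            100, ...
        'MiniBatchSize',        1, ...          % limited by the 24 GB of GPU memory
        'Shuffle',              'every-epoch', ...
        'ExecutionEnvironment', 'gpu');
    % trainDs pairs each degraded DRSM volume with its ground truth volume,
    % both linearly rescaled to [0,1]; lgraph is the 3D-generator U-Net graph.
    % net = trainNetwork(trainDs, lgraph, opts);

    % Evaluation of a restored volume against the ground truth (both in [0,1]):
    % p = psnr(restoredVol, gtVol);   % 3D peak-SNR (Image Processing Toolbox)
    % s = ssim(restoredVol, gtVol);   % 3D SSIM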

3. Experimental results and discussions

3.1 Optimization of 3D U-Net model using microbead-based simulation training model

The 3D U-Net model was trained using a dataset consisting of 80 synthetic microbead images designed to simulate the volumetric images captured by the proposed DRSM microscope. The ground truth images were generated by convolving the simulated microbeads with the diffraction-limited point-spread function (PSF) of a point-scanning two-photon microscope. In accordance with the water immersion objective used in the experimental platform, the simulated lateral and axial resolutions were set to 0.39 and 1.53 µm, respectively. Furthermore, to simulate the actual microbead images obtained from the DRSM microscope, system Poisson noise and multiple Lissajous patterns were added to the PSF-convolved bead images. Typical Lissajous patterns were measured directly from the DRSM microscope under various settings of the TAG lens frequency and the phase delay between the TAG lens and the RM.
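
The simulated training pairs described above can be generated along the following lines. This is a simplified sketch under stated assumptions: a Gaussian approximation of the diffraction-limited PSF with 0.39/1.53 µm lateral/axial widths, an illustrative 0.78-µm voxel size, and random masks standing in for the measured Lissajous patterning residuals and missing voxels.

    % Sketch of simulated microbead training-pair generation (illustrative only).
    volSize = [256 256 128];
    voxel   = [0.78 0.78 0.78];                   % µm per voxel (assumed)

    % Ground truth: random bead centers convolved with an approximate Gaussian PSF.
    beads = zeros(volSize);
    beads(randperm(numel(beads), 50)) = 1;        % 50 random bead centers
    sigma = [0.39 0.39 1.53] ./ voxel / 2.355;    % FWHM -> standard deviation, in voxels
    gt    = rescale(imgaussfilt3(beads, sigma));  % Gaussian approximation of the PSF, in [0,1]

    % Degraded input: Poisson noise, missing voxels, and a patterning-residual mask
    % (here random; the study used Lissajous patterns measured from the microscope).
    noisy    = im2double(imnoise(im2uint16(gt), 'poisson'));
    missing  = rand(volSize) < 0.7;               % ~70% of voxels unsampled
    residual = rand(volSize) < 0.02;              % placeholder for Lissajous residuals
    degraded = noisy;
    degraded(missing)  = 0;
    degraded(residual) = 1;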

In the testing stage, the trained 3D U-Net model was utilized to enhance the image quality of 10-µm fluorescent beads (F-8836, Thermo Fisher Scientific, USA) fixed in agarose gel on a glass slide and captured by the DRSM microscope at a frame rate of 8,000 fps in the x-z plane and a volumetric imaging rate of 30 vps. As shown in Fig. 3(a), the original (i.e., non-restored) x-y cross-section image of the 10-µm fluorescent beads not only contained obvious patterning residuals, but also had a large quantity of noise and many missing pixels. Figure 3(b) shows the typical restoration result obtained by the trained 3D U-Net model. For ease of comparison, the single 10-µm fluorescent beads shown within the two red dashed squares in Figs. 3(a) and 3(b), respectively, were selected for further analysis. Figures 3(c) and 3(d) show the x-z cross-section images of the selected beads before and after 3D U-Net inpainting and denoising, respectively. A detailed inspection of the two images shows that the 3D U-Net not only inpaints the missing pixels and denoises the image, but also improves the axial resolution. Figures 3(e) and 3(f) show the kx-ky and kx-kz spectra of the beads in Figs. 3(a) and 3(c), respectively. Many high-frequency components related to the noise and patterning residuals are observed in both figures. Figures 3(g) and 3(h) show the corresponding spectra of the beads in the restored images (i.e., Figs. 3(b) and 3(d), respectively). The results confirm that the 3D U-Net model improves both the SNR and the spatial resolution of the image. Moreover, the model enables dynamic volumetric imaging (not just dynamic signal tracking) to be performed with a significant SNR improvement. The microbead structure is simple and regular, and hence the dynamic volumetric image of the microbead is well restored. However, for irregular structures such as the drosophila brain in Secs. 3.2 and 3.3, the restoration process is significantly more complex.
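
The spatial spectra shown in Figs. 3(e)–3(h) can be reproduced from the volumes themselves with a 3D Fourier transform; a minimal MATLAB sketch (assuming a volume variable vol ordered as x-y-z) is given below.

    % Sketch: central kx-ky and kx-kz spectral planes of a volume `vol` (x-y-z order assumed).
    F    = fftshift(fftn(vol));                        % centered 3D spatial spectrum
    kxky = log1p(abs(F(:, :, ceil(end/2))));           % central kx-ky plane (log scale)
    kxkz = log1p(abs(squeeze(F(:, ceil(end/2), :))));  % central kx-kz plane (log scale)
    figure; imagesc(kxky); axis image; colormap hot;
    figure; imagesc(kxkz); axis image; colormap hot;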

Fig. 3. x-y cross-section images of 10-µm fluorescent beads captured by rapid DRSM microscope: (a) before and (b) after 3D U-Net restoration. The ability of the 3D U-Net model to restore dynamic 31.25-vps volume images was evaluated by driving a 10-µm fluorescent bead fixed on a glass slide in the x-y direction using a motorized stage (see Visualization 1 for the resulting 4D image). (c) and (d) x-z cross-section images of selected beads in red dashed squares of Figs. 3(a) and 3(b), respectively. Scale bar is 10 µm. (e) and (f) Spatial spectra corresponding to unrestored images (Figs. 3(a) and 3(c)). (g) and (h) Spatial spectra corresponding to restored images (Figs. 3(b) and 3(d)).

3.2 In-vitro drosophila brain image improvement

The applicability of the 3D U-Net model to in-vitro bioimage restoration was investigated using the MB structure of the drosophila brain (OK-107). Although the spatial geometries of different MBs are similar, generating a training dataset consisting of a large number of detailed MB images using a simulation technique is complex and time-consuming. Thus, in contrast to the training process described in Sec. 3.1, volumetric images of the MB structure were acquired directly using a gold-standard point-scanning multiphoton microscopy technique to serve as the ground truth. Figure 4(a) shows the volumetric image obtained by stacking the 128 DRSM images acquired at 0.78-µm intervals over the depth range of 0∼100 µm with a volumetric imaging rate of 31.25 vps, where each layer is rendered in a different color. Figure 4(b) shows the corresponding ground truth volumetric image, obtained by stacking the 128 images captured by the point-scanning multiphoton microscope with a volumetric imaging rate of 0.002 vps (i.e., an acquisition time of 500 seconds). Figure 4(c) shows the restored volumetric image obtained by the 3D U-Net, while Fig. 4(d) shows the restored volumetric image obtained by the 3D GAN. In both cases, the volumetric images correspond to a depth range of 0∼100 µm and an imaging region of 200 × 200 × 100 µm³. Furthermore, both models were trained using 490 DRSM images with corresponding ground truth images and validated using 40 DRSM images (i.e., 7.5% of the whole volume images). The DRSM images and ground truth images were acquired using the same microscope body, and hence image registration was not required prior to network training. The training times for the 490 images were around 141 hours for the 3D U-Net model and 396 hours for the 3D GAN model. By contrast, the validation time for each image with a size of 256 × 256 × 128 voxels was just 0.8 seconds. Thus, even though the training time is excessive, the validation time appears to be acceptable for in-vivo imaging applications.

Fig. 4. (a) Rapid DRSM volumetric image of in-vitro drosophila brain captured with an imaging rate of 31.25 vps, and (b) ground truth volumetric image acquired by the point-scanning multiphoton microscope with a volumetric imaging rate of 0.002 vps. For both images, the imaging region has a size of 200 × 200 × 100 µm³ with 256 × 256 × 128 voxels, and hence the layer-to-layer distance is 0.78 µm. Thus, each volumetric image stacks 128 x-y images, where each layer (depth) is projected with a different color. (c) and (d) Restored volumetric images obtained using the 3D U-Net model and the GAN model, respectively. The images also comprise 128 layers uniformly distributed over a depth of 0∼100 µm. The scale bar indicates a distance of 20 µm. Visualization 2 shows the rendering process of Figs. 4(a)–4(d) from left to right, respectively. (e) and (f) SSIM and PSNR values, respectively, of the U-Net and GAN models over a depth of 20∼80 µm. (g) PSNR improvements with standard deviations of 40 original, 40 U-Net restored, and 40 GAN restored images, respectively.

The experimental results presented in Figs. 4(c) and 4(d) show that both models overcome the negative SNR, missing pixel, and Lissajous pattern residual problems in the original volumetric image (Fig. 4(a)). In particular, the two models not only preserve the volumetric imaging rate, but also achieve an imaging quality close to that of the ground truth image (Fig. 4(b)). Comparing the restored images obtained by the two models, it is seen that the GAN model successfully restores even the neural cells in the MB structure (as shown by the white arrow in Fig. 4(d)). Compared with the image produced by the U-Net model, the GAN image highlights the appearance of the neurons more clearly; in other words, the GAN model attempts to generate the image of the neurons. Figures 4(e) and 4(f) show the computed SSIM and PSNR values, respectively, of the various images over the depth range of 20∼80 µm. Note that the red lines in the two figures show the SSIM and PSNR values of Fig. 4(a) relative to Fig. 4(b), while the green lines show those of Fig. 4(c) relative to Fig. 4(b) for the 3D U-Net model, and the blue lines show those of Fig. 4(d) relative to Fig. 4(b) for the GAN model. As shown in Fig. 4(e), the two models improve the SSIM from less than 0.05 (red line) to 0.8∼0.95 (green and blue lines). Similarly, the models improve the PSNR of the original volumetric image from 17∼24 dB to around 36∼45 dB. Figure 4(g) presents a further statistical analysis of the 40 images restored by the 3D U-Net and GAN models. The average PSNR improvements with standard deviations for the U-Net and GAN models were found to be around 18.1 ± 2.7 dB (i.e., 22.4 to 40.5 dB) and 17.4 ± 2.4 dB (i.e., 22.4 to 39.8 dB), respectively. Thus, the overall PSNR improvement of the U-Net model is slightly higher than that of the GAN model. A close inspection of the restored images reveals the presence of some artifacts, particularly in the image restored using the GAN.

As mentioned in Sec. 3.1, the microbead structure is regular and predictable. However, the MB has an irregular structure. Furthermore, the upper and lower halves of the MB volumetric image have different characteristics. Therefore, the lower part is difficult to restore based on information from the upper part.

3.3 In-vivo drosophila brain image restoration via transfer learning

The practical feasibility of the 3D U-Net restoration model was further investigated by restoring in-vivo drosophila images with the assistance of a transfer learning approach. As described in Sec. 3.2, the U-Net model was originally trained using 490 in-vitro images. However, compared to in-vitro images, which are stationary, in-vivo images typically show posture changes and distortion over time. Figures 5(a) and 5(b) show two in-vivo volumetric images of the drosophila brain captured using the DRSM microscope with a volumetric imaging rate of 31.25 vps and the point-scanning microscope with a volumetric imaging time of 500 seconds, respectively. Note that the imaging region is 200 × 200 × 100 µm³ in both cases and the layer-to-layer distance is 0.78 µm. As in Sec. 3.2, the images comprise 128 x-y images acquired over a depth range of 0∼100 µm. Figures 5(c) and 5(d) show the restored volumetric images obtained from the U-Net model trained on the original 490-image in-vitro dataset and a 93-image in-vivo dataset, respectively. The PSNR of the original volumetric image in Fig. 5(a) is 22.9 dB. The restored volumetric image in Fig. 5(c) has an obviously poor quality, with an SNR improvement of just 2 dB over the original image. The restored volumetric image shown in Fig. 5(d), obtained using the U-Net model trained on a small number (i.e., 93) of in-vivo images, shows a better quality than the image in Fig. 5(c) and has a PSNR of 27.8 dB. Thus, the results confirm that the in-vivo volumetric imaging performance of the proposed DRSM microscope can be improved by utilizing the 3D U-Net model trained on in-vivo images.

Fig. 5. In-vivo drosophila brain volumetric images: (a) rapid DRSM image obtained with a volumetric imaging rate of 31.25 vps, (b) ground truth image obtained with a volumetric imaging time of 500 seconds, (c) restored image obtained using only the 400 in-vitro dataset model (i.e., without transfer learning), (d) restored image obtained using the 100 in-vivo dataset model (i.e., without transfer learning), (e) restored image obtained using transfer learning (TL) from the 400 in-vitro dataset pretrained model, (f) restored image obtained using TL from the 100 in-vivo dataset pretrained model, and (g) restored image obtained using a model trained only on 50 in-vivo targeting drosophila images (i.e., a new model). Note that the scale bar has a length of 20 µm. Visualization 3 shows the rendering process of Figs. 5(a), 5(b), 5(d), and 5(f) from left to right, respectively. (h) Intensity profiles for the marked regions in Figs. 5(b) and 5(e)–5(g).

In theory, the restoration performance of the U-Net model can be further enhanced by increasing the size of the training dataset. However, this increases the time and expense of the training process. Moreover, in-vivo images are usually time-varying and time-sensitive. Thus, compiling a robust training dataset of a sufficient size poses a significant challenge. Accordingly, in the present study, the performance of the U-Net model was improved through the use of a transfer learning approach. Figures 5(e) and 5(f) show the restored volumetric images obtained when 10 in-vivo targeting drosophila images were used to further train the pretrained models based on the 490-image in-vitro dataset and the 93-image in-vivo dataset, respectively. Note that the targeting drosophila images were obtained from the same fly used for further testing. Compared to Figs. 5(c) and 5(d), in which the volumetric images were restored without transfer learning, the images in Figs. 5(e) and 5(f) have a clearly improved quality. The corresponding PSNR values are found to be 39.7 dB and 40.2 dB, respectively. Moreover, the training time for the 10 in-vivo targeting drosophila images is only around 5,800 seconds for 50 epochs. Thus, the transfer learning method achieves both a rapid convergence time and a high image quality.
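
In implementation terms, the transfer learning step above amounts to resuming training of a pretrained generator on the small in-vivo dataset for a reduced number of epochs. A minimal sketch is given below; the file and variable names (pretrained_invitro_unet.mat, tlDs) are hypothetical placeholders rather than the authors' actual scripts.

    % Sketch of the transfer learning step: fine-tune a pretrained 3D U-Net
    % generator on a small set of in-vivo target volumes (placeholder names).
    load('pretrained_invitro_unet.mat', 'net');     % hypothetical file holding the pretrained model
    lgraphTL = layerGraph(net);                     % reuse the pretrained weights as initialization

    optsTL = trainingOptions('adam', ...
        'InitialLearnRate',     2e-4, ...
        'MaxEpochs',            50, ...             % 50 epochs (~5,800 s for 10 target volumes)
        'MiniBatchSize',        1, ...
        'ExecutionEnvironment', 'gpu');

    % tlDs pairs the 10 in-vivo targeting DRSM volumes with their ground truth volumes.
    % netTL = trainNetwork(tlDs, lgraphTL, optsTL);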

Figure 5(g) shows the restored volumetric image obtained by a new 3D U-Net model trained using only 50 in-vivo targeting drosophila images. The PSNR of the reconstructed image has an acceptable value of 33.7 dB. However, the training time is around 5 times longer than that for the 10 in-vivo targeting drosophila images described above. In other words, the transfer learning process based on 10 in-vivo targeting drosophila images not only improves the quality of the reconstructed images, but also yields a significant reduction in the training time. The SSIM values of the images in Fig. 5(a) and Figs. 5(c)–5(g) are 0.6, 0.91, 0.92, 0.98, 0.98, and 0.96, respectively. To further quantify the quality of the restored volumetric images, Fig. 5(h) shows the intensity profiles corresponding to the regions indicated by the white arrows in Figs. 5(b) and 5(e)–5(g) at a detection depth of 68 µm. Note that the red, blue, green, and cyan lines correspond to the intensity profiles in Figs. 5(b), 5(e), 5(f), and 5(g), respectively. The intensity profiles of Figs. 5(e) and 5(f) are very similar to that of the ground truth image in Fig. 5(b). However, of the two profiles, that in Fig. 5(f) is closer to the ground truth intensity profile. The PSNR and SSIM values for Figs. 5(e) and 5(f) are obviously improved compared to those of the original volumetric image. Furthermore, although the restored volumetric images fail to retain some of the details in the original image, the image morphology is still clear. A short training time is also important in implementing transfer learning approaches. Overall, the proposed modified 3D-generator U-Net architecture not only achieves a high imaging quality, but also has a low computation time, and therefore provides a simple yet highly efficient method for practical in-vivo bioimaging applications. Thus, the experimental results demonstrate that the transfer learning-based 3D U-Net model provides a rapid and effective approach for the restoration of in-vivo dynamic bioimages in the future.

4. Conclusions

A modified 3D U-Net model has been trained using simulated fluorescent microbead images containing the Lissajous patterning residuals, missing voxels, and background noise characteristic of the volumetric images captured using the proposed DRSM microscope. It has been shown that the convergence time of the modified U-Net model in training is around half that of a general 3D U-Net model. The experimental results have shown that the trained 3D U-Net model not only inpaints and denoises the original fluorescent microbead images, but also improves the axial resolution. Furthermore, for in-vitro drosophila brain images, the 3D U-Net model not only maintains the 256 × 256 × 128 voxel and 31.25 vps imaging capability of the DRSM microscope, but also achieves a restored image quality close to that of the ground truth images acquired using a gold-standard point-scanning multiphoton microscopy technique. In general, in-vivo bioimaging poses a significant challenge to multiphoton microscopy since the posture of the imaged object tends to vary over time. Accordingly, the present study has utilized a transfer learning approach based on an in-vivo pretrained model to improve the restoration performance for in-vivo images. The experimental results have shown that the transfer learning process not only reduces the convergence time of the training process, but also improves the restored image quality, although some differences from the ground truth remain for the irregular MB structure. Despite having at most one pulse per voxel and more than 70% missing voxels for a scanning volume of up to 200 × 200 × 100 µm³ with 256 × 256 × 128 voxels, in-vivo drosophila brain imaging using the DRSM microscope with deep volume image restoration not only retains the temporal resolution advantage, but also provides sufficient image quality.

Funding

Ministry of Education; Higher Education Sprout Project of the National Yang Ming Chiao Tung University and Ministry of Education (MOE) in Taiwan; National Science and Technology Council (110-2221-E-A49-009, 110-2221-E-A49-059-MY3); Veterans General Hospitals and University System of Taiwan Joint Research Program (VGHUST111-G3-2-3).

Disclosures

The authors declare no conflicts of interest.

Data availability

The data and results presented in this paper are not publicly available currently, but are available from the authors upon reasonable request.

References

1. N. G. Horton, K. Wang, D. Kobat, C. G. Clark, F. W. Wise, C. B. Schaffer, and C. Xu, “In vivo three-photon microscopy of subcortical structures within an intact mouse brain,” Nat. Photonics 7(3), 205–209 (2013).

2. B. Li, C. Wu, M. Wang, K. Charan, and C. Xu, “An adaptive excitation source for high-speed multiphoton microscopy,” Nat. Methods 17(2), 163–166 (2020).

3. L. Kong, J. Tang, J. P. Little, Y. Yu, T. Lämmermann, C. P. Lin, R. N. Germain, and M. Cui, “Continuous volumetric imaging via an optical phase-locked ultrasound lens,” Nat. Methods 12(8), 759–762 (2015).

4. S. Han, W. Yang, and R. Yuste, “Two-color volumetric imaging of neuronal activity of cortical columns,” Cell Rep. 27(7), 2229–2240.e4 (2019).

5. D. R. Beaulieu, I. G. Davison, K. Kılıç, T. G. Bifano, and J. Mertz, “Simultaneous multiplane imaging with reverberation two-photon microscopy,” Nat. Methods 17(3), 283–286 (2020).

6. J. Wu, Y. Liang, S. Chen, C.-L. Hsu, M. Chavarha, S. W. Evans, D. Shi, M. Z. Lin, K. K. Tsia, and N. Ji, “Kilohertz two-photon fluorescence microscopy imaging of neural activity in vivo,” Nat. Methods 17(3), 287–290 (2020).

7. C. Rodríguez, A. Chen, J. A. Rivera, M. A. Mohr, Y. Liang, R. G. Natan, W. Sun, D. E. Milkie, T. G. Bifano, X. Chen, and N. Ji, “An adaptive optics module for deep tissue multiphoton imaging in vivo,” Nat. Methods 18(10), 1259–1264 (2021).

8. M. Bertalmio, G. Sapiro, V. Caselles, and C. Ballester, “Image inpainting,” SIGGRAPH ’00, 417–422 (2000).

9. J. Xie, L. Xu, and E. Chen, “Image denoising and inpainting with deep neural networks,” Advances in Neural Information Processing Systems 25 (NIPS 2012), 1–9 (2012).

10. J. Jam, C. Kendrick, K. Walker, V. Drouard, J. G.-S. Hsu, and M. H. Yap, “A comprehensive review of past and present image inpainting methods,” Comput. Vis. Image Und. 203, 103147 (2021).

11. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” MICCAI, 234–241 (2015).

12. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Commun. ACM 63(11), 139–144 (2020).

13. Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. Fu, “Image super-resolution using very deep residual channel attention networks,” Proc. Euro. Conf. Comp. Vis., 286–301 (2018).

14. M. Weigert, U. Schmidt, T. Boothe, A. Müller, A. Dibrov, A. Jain, B. Wilhelm, D. Schmidt, C. Broaddus, S. Culley, M. Rocha-Martins, F. Segovia-Miranda, C. Norden, R. Henriques, M. Zerial, M. Solimena, J. Rink, P. Tomancak, L. Royer, F. Jug, and E. W. Myers, “Content-aware image restoration: pushing the limits of fluorescence microscopy,” Nat. Methods 15(12), 1090–1097 (2018).

15. Y. Wu, Y. Rivenson, H. Wang, Y. Luo, E. Ben-David, L. A. Bentolila, C. Pritz, and A. Ozcan, “Three-dimensional virtual refocusing of fluorescence microscopy images using deep learning,” Nat. Methods 16(12), 1323–1331 (2019).

16. J. Chen, H. Sasaki, H. Lai, Y. Su, J. Liu, Y. Wu, A. Zhovmer, C. A. Combs, I. Rey-Suarez, H.-Y. Chang, C. C. Huang, X. Li, M. Guo, S. Nizambad, A. Upadhyaya, S.-J. J. Lee, L. A. G. Lucas, and H. Shroff, “Three-dimensional residual channel attention networks denoise and sharpen fluorescence microscopy image volumes,” Nat. Methods 18(6), 678–687 (2021).

17. G. Liu, F. A. Reda, K. J. Shih, T.-C. Wang, A. Tao, and B. Catanzaro, “Image inpainting for irregular holes using partial convolutions,” Proc. Euro. Conf. Comp. Vis., 85–100 (2018).

18. J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. Huang, “Free-form image inpainting with gated convolution,” Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 4471–4480 (2019).

19. P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 1125–1134 (2017).

20. T.-C. Wang, M.-Y. Liu, J.-Y. Zhu, A. Tao, J. Kautz, and B. Catanzaro, “High-resolution image synthesis and semantic manipulation with conditional GANs,” Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 8798–8807 (2018).

21. C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, “Photo-realistic single image super-resolution using a generative adversarial network,” Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 4681–4690 (2017).

22. X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C. C. Loy, “ESRGAN: Enhanced super-resolution generative adversarial networks,” Proc. Euro. Conf. Comp. Vis., 63–79 (2018).

23. J. M. Ede and R. Beanland, “Partial scanning transmission electron microscopy with deep learning,” Sci. Rep. 10(1), 8332 (2020).

24. H. Guan, D. Li, H.-c. Park, A. Li, Y. Yue, Y. A. Gau, M.-J. Li, D. E. Bergles, H. Lu, and X. Li, “Deep-learning two-photon fiberscopy for video-rate brain imaging in freely-behaving mice,” Nat. Commun. 13(1), 1534 (2022).

25. Z. Zhou, W. Kuang, Z. Wang, and Z.-L. Huang, “ResNet-based image inpainting method for enhancing the imaging speed of single molecule localization microscopy,” Opt. Express 30(18), 31766–31784 (2022).

26. K.-J. Hsu, Y.-Y. Lin, A.-S. Chiang, and S.-W. Chu, “Optical properties of adult Drosophila brains in one-, two-, and three-photon microscopy,” Biomed. Opt. Express 10(4), 1627–1637 (2019).

27. K.-J. Hsu, Y.-Y. Lin, Y.-Y. Lin, K. Su, K.-L. Feng, S.-C. Wu, Y.-C. Lin, A.-S. Chiang, and S.-W. Chu, “Millisecond two-photon optical ribbon imaging for small-animal functional connectome study,” Opt. Lett. 44(13), 3190–3193 (2019).

Supplementary Material (3)

Visualization 1: The resulting 4D image for a 10-µm fluorescent bead
Visualization 2: The rendering process of Figs. 4(a)–4(d)
Visualization 3: The rendering process of Figs. 5(a), 5(b), 5(d), and 5(f)
