
Deep learning-based high-speed, large-field, and high-resolution multiphoton imaging

Open Access

Abstract

Multiphoton microscopy is a formidable tool for the pathological analysis of tumors. The physical limitations of imaging systems and the low efficiencies inherent in nonlinear processes have prevented the simultaneous achievement of high imaging speed and high resolution. We demonstrate a self-alignment dual-attention-guided residual-in-residual generative adversarial network trained with various multiphoton images. The network enhances image contrast and spatial resolution, suppresses noise and scanning fringe artifacts, and eliminates the mutual exclusion between field of view, image quality, and imaging speed. The network may be integrated into commercial microscopes for large-scale, high-resolution, and low-photobleaching studies of tumor environments.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Owing to the rapid development of computer science and technology in recent years, coupled with exponential growth in processing power and the emergence of readily available and voluminous datasets, deep learning has had a prominent impact in many fields of research. Various deep learning networks have been proposed, ranging from the early convolutional neural networks (CNNs) used in medical image registration and recognition to the recent and promising generative adversarial networks (GANs), which have enabled major advances in biomedical imaging [1–6]. For example, deep learning has been shown to improve the accuracy of chondrocyte viability measurement in nonlinear optical microscopy, making it possible to automatically segment and classify chondrocytes [7]. Deep learning can also enhance the resolution of multi-modal nonlinear optical images [5,8,9], revealing microscopic biological details with greater precision. Furthermore, digital staining can be realized using deep learning, which generates virtual histological staining from quantitative phase images; this avoids physical histological staining of the sample, thereby reducing costs and saving time [10]. Image reconstruction is one of the most valuable applications of advanced deep learning techniques, constituting a methodology that transcends the limitations of traditional microscopes and enables image enhancement without requiring a change of fundamental system equipment [11–13].

The techniques of multiphoton microscopy have enabled the visualization of multiple types of cells and their associated extracellular matrices and constitute powerful tools for the pathological analysis of tumors. At present, multiphoton microscopy (MPM) possesses the advantages of deep penetration, high resolution, and low photobleaching, facilitating the visualization of comprehensive microstructure and abundant biochemical phenomena [14–18]. Accurate and comprehensive multi-dimensional photophysical information can be obtained with MPM, thereby expanding the research horizons of many aspects of biological research such as physiology, neurobiology, and tissue engineering [19,20]. However, the imaging quality and resolution of all multiphoton imaging technologies are universally limited by the physical constraints of optical diffraction and sensor technology. The field of view (FOV) of a traditional MPM system conflicts with its imaging speed, making it difficult to visualize large-field, high-speed cell distributions and interactions [21]. This limitation can adversely affect analyses of tumor invasion and other phenomena. A considerable challenge for MPM is to perform large-field imaging of tumors, skin, or other tissues with resolution at cellular scales [22]. Typical methods for increasing the FOV of MPM include adopting specially designed scanning modes, increasing the diameter of the objective lenses, and increasing the number of objective lenses [22–25]. At present, large-FOV imaging is readily obtained by scanning a series of adjacent FOVs in rapid succession with an advanced microscope equipped with a high-speed precision mechanical scanner such as a galvo-resonant (GR) scanner. However, compared with slower, non-resonant scanning, images generated in this way are more vulnerable to shot noise, insufficient resolution, and scanning artifacts, which detract from the quality of multiphoton imaging. Galvo-resonant acquisition significantly decreases pixel dwell times and necessitates higher excitation powers or frame averaging to achieve image quality comparable to slower, non-resonant scanning. The reduced pixel dwell time lowers the number of photons collected from the desired signals; visualizing these weak signals requires a large photomultiplier tube (PMT) gain, which in turn raises the noise level [9,26]. The number of pixels collected in GR scanning mode is also smaller than in dual-axis galvo scanning mode, which can lower the effective resolution when the pixel size equals or exceeds the optical resolution. In addition, images captured with the GR scanner exhibit strong scanning fringe artifacts (SFA, see Fig. 5) resulting from the coupling of fast GR scanning and leakage of ambient light [27]. Further studies are required to enhance the resolution of multiphoton images acquired under different equipment-related conditions. Moreover, an appropriate deep learning network is required to avoid replacement of existing equipment with expensive alternative imaging systems in the pursuit of image enhancement.

To obtain multi-modal information over a large FOV while ensuring high-quality and high-speed imaging, we propose a self-alignment dual-attention-guided residual-in-residual generative adversarial network (SADA-GAN). The network is trained using datasets of registered image pairs (see subsection 3.4) consisting of two-photon excited fluorescence (TPEF) and second harmonic generation (SHG) channels. A galvo-resonant scanning multiphoton image of 896 × 3200 pixels collected in 2 min 20 s can be reconstructed into a high-resolution image of 3584 × 12800 pixels within 2 min, whereas the corresponding dual-axis galvo scanning target image requires 13 min 6 s to acquire. Upgrading a large image of 1.59 × 1.59 mm² from 10240 × 10240 pixels (20× objective lens) to 40960 × 40960 pixels (network output) requires an average of 5 min 20 s, instead of 48 min for images of the same quality acquired with a 60× objective lens. Furthermore, adverse noise that affects semantic information extraction is significantly suppressed, while optical resolution and quality indices are significantly improved. SADA-GAN thus frees MPM from the mutual exclusion between imaging speed, field size, and spatial resolution.

2. Principle

2.1 Label-free and large-field multiphoton imaging

Multiphoton microscopy has demonstrated its high effectiveness for analyzing tumor cell types and tumor micro-environments over the past decade. In our study, flavin adenine dinucleotide (FAD) was monitored using TPEF to obtain redox-ratio information, and SHG was used to observe the distribution of collagen fibers to obtain cancer-invasion-associated information. High-resolution multiphoton microscopic images can display more accurate and diverse physiological details, but at the cost of longer acquisition times. The transformation of autofluorescence harmonic images characterized by low resolution and high noise into counterparts characterized by high resolution and high quality (low noise) is therefore highly valuable in a research context.

We used a commercially available multiphoton microscopy system to obtain autofluorescence harmonic images. The system comprised an 8-kHz galvo-resonant scanning system and a dual-axis galvo scanning system (Fig. 1(a)). The excitation light was switched between the two scanning systems using two beam splitters (BSs) with motorized shutters. The fast resonant scanning mode was operated through the galvo-resonant scanning system to obtain the input images, functioning at 30 fps with a frame time of 0.13 s for 1024 × 1024 pixels. The high-quality images regarded as ground truth (GT) were generated by the dual-axis galvo scanning system in slow galvo scanning mode at 0.48 fps with a frame time of 8 s for 4096 × 4096 pixels. To accelerate the acquisition, the pixel number of the input images was a quarter of that of the ground truth images. We tuned the excitation wavelength of the femtosecond laser (pulse length ∼100 fs) to 1140 nm to simultaneously detect non-centrosymmetry information from the SHG signals of collagen and functional information from the FAD signals. The laser beam passed through the scanning mirrors, scan lens, and tube lens before reaching the back focal plane of the 0.75-NA microscope objective. The autofluorescence and SHG signals were collected and separated by a combination of a dichroic mirror (DM) and filters (long-pass 685 nm and band-pass 641/75 nm for the TPEF of FAD; long-pass 593 nm and band-pass 570/10 nm for SHG). To enable efficient excitation, a group delay dispersion (GDD) of 8000 fs² was applied to pre-compress the pulse before it passed through the dispersive optical elements. The power on the sample was less than 50 mW.


Fig. 1. Multiphoton microscopy. a Commercial multiphoton microscope with a fast galvo-resonant scanning system and slow dual-axis galvo scanning system. b Composition of low-quality input and high-quality target images (SHG and TPEF superimposed channels) in datasets. SL: scan lens; TL: tube lens; DM: dichroic mirror; CL: collect lens; OB: objective.


Twenty frozen slices, each with a thickness of 5 µm, were obtained on a freezing microtome for multiphoton imaging. We used a high-NA objective lens (MRD71600, 60×, 1.40 NA, Nikon) to capture high-spatial-resolution images. Images with relatively low resolution were obtained using a low-NA objective lens (MRD70200, 0.75 NA, Nikon). Our training dataset was composed of the images collected in the three modalities shown in Fig. 1(b). The low-quality input images consisted of ten large images of 10240 × 10240 pixels each (obtained using a 20× objective lens), thirty-eight galvo-resonant scanning images of 2176 × 2176 pixels each, and twenty images of 1024 × 1024 pixels each captured with a 3.7 mW low-power excitation. The high-quality target images consisted of ten large images of 30720 × 30720 pixels each (obtained using a 60× objective lens), thirty-eight dual-axis galvo scanning images of 8704 × 8704 pixels each, and twenty images of 4096 × 4096 pixels each captured with a 14.3 mW high-power excitation. The testing dataset consisted of two pairs of large images (20× input versus 60× GT), six pairs of images (galvo-resonant versus dual-axis galvo scanning), and four pairs of images (low-power versus high-power excitation), whose pixel numbers correspond to those of the input and GT of the training dataset, respectively. Training and testing had no data overlap, i.e., the test images shown in this article were blindly generated by the deep network. Apart from the input images obtained with galvo-resonant scanning and those captured with the 20× objective lens, all data were acquired by dual-axis galvo scanning using the 60× objective lens.

2.2 Deep learning network architecture

The proposed deep learning network architecture can effectively achieve a combination of rapid imaging, large area coverage, and high image quality, transforming previously fuzzy, inferior images into clear, superior ones. Before training, we applied a preliminary image warp (IW) process to the low-quality input and high-quality GT images [28] to correct the misalignment caused by the non-collinearity of the two scanning systems and the movement of the samples (Fig. 2(a)). Pairs of preregistered images were thereby generated as the training dataset.


Fig. 2. Deep learning network architecture. a Overall network architecture, including the IW, SAPCD, and RRDAB modules, convolution layers, the skip connection, the up-sampling operation, and the discriminator. b Image registration framework. Image warping is performed with the ORB feature extraction method to construct the training dataset; the self-alignment pyramid, cascading, and deformable convolutions (SAPCD) are embedded in the generator. c RRDAB reconstruction network. Cascaded RRDAB modules are used for image reconstruction, with densely connected DABs for feature communication; the attention blocks consist of two SE modules. DConv: deformable convolution; L: level; LR: low-resolution; HR: high-resolution; DAB: dense attention block; IW: image warp; FA: feature alignment; GT: ground truth.


Deformable convolution learns irregular sampling positions for the convolution kernel, with the aim of obtaining a more expressive feature representation. For multi-frame image sequences, this approach enables pixel calibration at the feature level by convolving at sampling points with learned position offsets, without explicit motion estimation such as optical flow. We used the modulated deformable module [29] as the single-image pixel calibration module. For example, given a convolutional kernel with K sampling locations, with the weight and pre-specified offset for the k-th location denoted by ${w_k}$ and ${p_k}$, respectively, the aligned features at each position $p$ (deformable convolution) are expressed as

$$x^{a}(p) = \sum_{k = 1}^{K} w_k \cdot x(p + p_k + \Delta p_k) \cdot \Delta m_k,$$
where $\Delta p_k$ and $\Delta m_k$ denote the learnable location offset and the modulation scalar (in the range [0,1]) at the k-th location, respectively, and $x(p)$ represents the input feature map at location $p$. Bilinear interpolation was used to calculate $x(p + p_k + \Delta p_k)$ because $(p + p_k + \Delta p_k)$ is usually fractional. A separate convolution layer acted on the same input feature maps $x$ with a total of 3K output channels, of which 2K channels were associated with $\Delta p_k$ and K channels with $\Delta m_k$, the latter followed by a sigmoid layer to constrain the range to [0,1].
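As an illustration, the modulated deformable convolution of Eq. (1) can be sketched with torchvision's deform_conv2d; the module below (names and hyperparameters are ours, not the authors' code) mirrors the 3K-channel offset/mask branch described above and assumes torchvision ≥ 0.9.

```python
# Minimal sketch of Eq. (1): a modulated deformable convolution with a companion
# conv layer predicting 2K offsets and K sigmoid-activated modulation scalars.
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d


class ModulatedDeformConv(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        k = kernel_size * kernel_size                       # K sampling locations
        self.pad = kernel_size // 2
        self.weight = nn.Parameter(
            torch.randn(channels, channels, kernel_size, kernel_size) * 0.01)
        self.bias = nn.Parameter(torch.zeros(channels))
        # one ordinary conv predicts 3K channels from the same input features
        self.offset_mask = nn.Conv2d(channels, 3 * k, kernel_size, padding=self.pad)

    def forward(self, x):
        om = self.offset_mask(x)
        k3 = om.shape[1]
        offset, mask = torch.split(om, [2 * k3 // 3, k3 // 3], dim=1)
        mask = torch.sigmoid(mask)                          # delta m_k in [0, 1]
        # bilinear sampling at the fractional positions is handled internally
        return deform_conv2d(x, offset, self.weight, self.bias,
                             padding=self.pad, mask=mask)


feat = torch.randn(1, 64, 128, 128)                         # one 128 x 128 feature tile
aligned = ModulatedDeformConv(64)(feat)                     # x^a in Eq. (1)
```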

Inspired by the video restoration framework with enhanced deformable convolutions (EDVR) [30], the deep Laplacian pyramid super-resolution network (LapSRN) [31], and the texture transformer network for image super-resolution (TTSR) [32], we introduced a self-alignment pyramid, cascading, and deformable convolution (SAPCD) framework. It is based on alignment and feature extraction, with a focus on delicately registering the input and GT datasets (Fig. 2(b)). First, we performed feature extraction at level 1 ($L_1$). Second, we obtained the features of level $L_i$ ($i = 2, 3, \ldots$) by down-sampling those of level $L_{i-1}$ with a strided convolution of stride 2. At level $L_i$, we predicted the offsets (orange boxes in Fig. 2(b)) using the features of level $L_i$ and the 2× up-sampled offsets from level $L_{i+1}$. The aligned features were similarly predicted from the deformable-convolution results and the up-sampled aligned features of level $L_{i+1}$. The predicted values were calculated as follows:

$$\Delta P^{i} = f\big(x, (\Delta P^{i+1})^{\uparrow 2}\big);$$
$$(x^{a})^{i} = g\big(DConv(x^{i}, \Delta P^{i}), ((x^{a})^{i+1})^{\uparrow 2}\big),$$
where
$$\Delta P^{l} = \{\Delta p^{l}\},$$
and $f$ and $g$ are different convolutions for different prediction tasks. The notation $(\cdot)^{\uparrow s}$ denotes bilinear interpolation upscaling by a factor $s$, and $DConv$ is the deformable convolution expressed by Eq. (1).
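For concreteness, the pyramid/cascading alignment of Eqs. (2) and (3) might be organized as in the following sketch, which assumes a three-level pyramid and an offset-only deformable convolution (torchvision's DeformConv2d); module names, channel counts, and the omission of the modulation masks are simplifications of ours, not details taken from the paper.

```python
# Coarse-to-fine alignment: offsets at each level are refined from the 2x up-sampled
# offsets of the coarser level (Eq. (2)), and the aligned features are fused with the
# up-sampled aligned features of the coarser level (Eq. (3)).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import DeformConv2d


class SAPCDAlign(nn.Module):
    def __init__(self, c=64, levels=3, k=3):
        super().__init__()
        self.levels = levels
        self.down = nn.ModuleList(nn.Conv2d(c, c, 3, stride=2, padding=1)
                                  for _ in range(levels - 1))        # L_i from L_(i-1)
        # f in Eq. (2): predict offsets from features (+ upsampled coarser offsets)
        self.f = nn.ModuleList(nn.Conv2d(c + (0 if i == levels - 1 else 2 * k * k),
                                         2 * k * k, 3, padding=1) for i in range(levels))
        self.dconv = nn.ModuleList(DeformConv2d(c, c, k, padding=k // 2)
                                   for _ in range(levels))
        # g in Eq. (3): fuse deformable result with upsampled coarser aligned feature
        self.g = nn.ModuleList(nn.Conv2d(2 * c, c, 3, padding=1)
                               for _ in range(levels - 1))

    def forward(self, x):                                            # x: L1 features
        feats = [x]
        for d in self.down:                                          # build the pyramid
            feats.append(d(feats[-1]))
        offset, aligned = None, None
        for i in reversed(range(self.levels)):                       # coarse -> fine
            inp = feats[i] if offset is None else torch.cat(
                [feats[i], F.interpolate(offset, scale_factor=2, mode="bilinear")], 1)
            offset = self.f[i](inp)                                  # Delta P^i, Eq. (2)
            warped = self.dconv[i](feats[i], offset)
            aligned = warped if aligned is None else self.g[i](torch.cat(
                [warped, F.interpolate(aligned, scale_factor=2, mode="bilinear")], 1))
        return aligned                                               # (x^a)^1, Eq. (3)


out = SAPCDAlign()(torch.randn(1, 64, 128, 128))
```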

Inspired by enhanced super-resolution generative adversarial networks (ESRGAN) [33], we introduced a residual-in-residual dense attention block (RRDAB). It is based on the residual-in-residual dense block (RRDB) module, which uses dense blocks to reconstruct the features after the SAPCD alignment (Fig. 2(c)). With reference to squeeze-and-excitation (SE) networks and the convolutional block attention module (CBAM) [34,35], SE blocks are placed adjacent to the dense connections to learn channel and spatial attention. The first SE block adaptively assigns weights to different receptive fields, learning which feature maps to emphasize; the second SE block adaptively assigns weights to different regions, learning which regions of the feature map to focus on. SADA-GAN with SE blocks can thus strengthen the dense connections that concatenate different receptive fields of the input into channels. To describe the network structure concretely, let ${F_i}$ and ${F_{i - 1}}$ denote the output and input of the RRDAB, respectively, and let X and Y denote the input and output of the RDAB, respectively:

$$\left\{ \begin{array}{ll} \text{RDAB:} & Y = X + DAB(X) \times \beta \\ \text{RRDAB:} & F_i = F_{i-1} + RDAB\big(RDAB\big(RDAB(F_{i-1})\big)\big) \times \beta, \end{array} \right.$$
where the quantity $\beta $ is used for residual scaling in Fig. 2(c).

Furthermore, the output feature of RRDAB can be given by Eq. (6),

$${x^{out}} = RRDA{B_d}({RRDA{B_{d - 1}}({ \ldots RRDA{B_2}({RRDA{B_1}({{x^a}} )} )} )} ),$$
where ${x^a}$ is the aligned feature from SAPCD, ${x^{out}}$ is the output feature, and d is the RRDAB cascading number. Continuing with the architecture, the high-resolution image ${I^{HR}}$ generated by the network generator is given by the following equation,
$${I^{HR}} = conv({LReLU({conv({{{({{x^{fea1}} + Att({conv({{x^{out}}} )} )} )}^{ \uparrow S}}} )} )} ),$$
where S is the 4× up-sampling scale, ${x^{fea1}}$ is the feature map extracted by the first convolution layer in SAPCD, and the channel numbers of ${x^{fea1}}$ and ${x^{out}}$ are equal. In addition, the dense attention block (DAB), which combines a multi-level residual network with dense connections as depicted in Fig. 2(c), can be represented by the following formula,
$$DAB:\quad l_i = \left\{ \begin{array}{ll} LReLU\big(Conv([l_0, l_1, \ldots, l_{i-1}])\big), & i = 1, 2, \ldots, m-2\\ Att\big(LReLU\big(Conv([l_0, l_1, \ldots, l_{i-1}])\big)\big), & i = m-1\\ Att\big(Conv([l_0, l_1, \ldots, l_{i-1}])\big), & i = m, \end{array} \right.$$
where ${l_i}$ is the $i$-th layer in the m-layer DAB, and ${l_0}$ and ${l_m}$ are the input and output layers, respectively. $[{l_0},{l_1}, \ldots ,{l_{i - 1}}]$ represents the concatenation of the predicted feature maps of layers 0, 1, …, $i$−1, and LReLU is an abbreviation for leaky ReLU.

As for the attention operation in the DAB block,

$$\left\{ \begin{array}{ll} Att: & fea'' = SE\big(SE(fea)\big) \\ SE: & fea' = \sigma\big(FC(AvgPool(fea))\big) \\ & FC\big(AvgPool(fea)\big) = FC_1\big(ReLU(FC_0(fea_{avg}))\big), \end{array} \right.$$
where $fea$, $fe{a_{avg}}$, $fea^{\prime}$, and $fea^{\prime\prime}$ denote the input feature map, feature map after global average pooling, output feature map after the first SE block, and output feature map after the second SE block, respectively. The quantity $fea \in {R^{C \times h \times w}}$, where C, h, and w are the channel, height, and width of $fea$, respectively, whereas $FC$ is the fully connected layer. $F{C_0} \in {R^{C/r \times C}}$ and $F{C_1} \in {R^{C \times C/r}}$, where r is the reduction ratio. The symbol $\sigma $ represents the sigmoid function.
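The following sketch restates Eqs. (5), (8), and (9) in PyTorch: an SE channel-attention module, a DAB whose last two layers apply the stacked SE attention, and the RDAB/RRDAB residual wrappers with scaling $\beta$. The layer count (m = 5), the growth channels, and $\beta$ = 0.2 are illustrative assumptions rather than values reported in the paper.

```python
# Attention-guided residual-in-residual dense blocks (sketch of Eqs. (5), (8), (9)).
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    def __init__(self, c, r=16):
        super().__init__()
        self.fc = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                nn.Linear(c, c // r), nn.ReLU(inplace=True),
                                nn.Linear(c // r, c), nn.Sigmoid())

    def forward(self, fea):          # Eq. (9): fea' = sigma(FC1(ReLU(FC0(fea_avg)))) * fea
        return fea * self.fc(fea).view(fea.size(0), -1, 1, 1)


class DAB(nn.Module):
    """Dense attention block, Eq. (8): dense connections, attention on the last two layers."""
    def __init__(self, c=64, g=32):
        super().__init__()
        self.convs = nn.ModuleList(nn.Conv2d(c + i * g, g if i < 4 else c, 3, padding=1)
                                   for i in range(5))
        self.lrelu = nn.LeakyReLU(0.2, inplace=True)
        self.att_mid = nn.Sequential(SEBlock(g), SEBlock(g))   # Att = SE(SE(.)), Eq. (9)
        self.att_out = nn.Sequential(SEBlock(c), SEBlock(c))

    def forward(self, x):
        feats = [x]
        for i, conv in enumerate(self.convs):
            out = conv(torch.cat(feats, dim=1))   # Conv([l_0, l_1, ..., l_{i-1}])
            if i < 3:                             # layers 1 .. m-2
                out = self.lrelu(out)
            elif i == 3:                          # layer m-1: Att(LReLU(Conv(.)))
                out = self.att_mid(self.lrelu(out))
            else:                                 # layer m: Att(Conv(.))
                return self.att_out(out)
            feats.append(out)


class RDAB(nn.Module):
    def __init__(self, c=64, beta=0.2):
        super().__init__()
        self.dab, self.beta = DAB(c), beta

    def forward(self, x):                         # Eq. (5): Y = X + DAB(X) * beta
        return x + self.dab(x) * self.beta


class RRDAB(nn.Module):
    def __init__(self, c=64, beta=0.2):
        super().__init__()
        self.body = nn.Sequential(RDAB(c, beta), RDAB(c, beta), RDAB(c, beta))
        self.beta = beta

    def forward(self, x):                         # Eq. (5): F_i = F_{i-1} + RDAB^3(F_{i-1}) * beta
        return x + self.body(x) * self.beta


y = RRDAB()(torch.randn(1, 64, 128, 128))         # same shape in and out
```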

A perceptual loss based on high-level features extracted from a pretrained VGG19 network was used to improve convergence speed and ensure better reconstruction of edges and details. Instead of comparing pixel differences, the perceptual loss compares high-level image information extracted from the GT and the output, ensuring similarity of content and global structure between them. The high perceptual loss in our network is expressed as follows:

$$L_{hi\text{-}percep} = \frac{1}{W_{5,4} H_{5,4}} \sum_{h = 1}^{H_{5,4}} \sum_{w = 1}^{W_{5,4}} \big(\varphi_{5,4}(I_{h,w}^{GT}) - \varphi_{5,4}(I_{h,w}^{SR})\big)^2,$$
where ${\varphi _{i,j}}$ is the $j$-th feature map layer at the block before the $i$-th max-pooling and ${\varphi _{5,4}}$ is chosen as the perceptual loss layer. The quantities ${W_{5,4}}$ and ${H_{5,4}}$ are the width and height of the ${\varphi _{5,4}}$ feature map in VGG, respectively. The quantity ${I^{SR}}$ is the high-resolution result generated by the network, whereas ${I^{GT}}$ is the intensity information of the GT.
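A minimal sketch of this $\varphi_{5,4}$ perceptual loss, assuming torchvision ≥ 0.13, is shown below; slicing the VGG19 features at conv5_4 (`features[:35]`, before the fifth max-pooling) and replicating single-channel multiphoton images to three channels are our assumptions, not details from the paper.

```python
# Sketch of Eq. (10): mean-squared difference between VGG19 phi_{5,4} feature maps
# of the generated (SR) and ground-truth (GT) images, with the extractor frozen.
import torch
import torch.nn as nn
from torchvision.models import vgg19


class PerceptualLoss(nn.Module):
    def __init__(self):
        super().__init__()
        # keep layers up to conv5_4 (before the fifth max-pooling)
        self.features = vgg19(weights="IMAGENET1K_V1").features[:35].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)

    def forward(self, sr, gt):
        if sr.shape[1] == 1:                      # replicate single-channel images
            sr, gt = sr.repeat(1, 3, 1, 1), gt.repeat(1, 3, 1, 1)
        return torch.mean((self.features(gt) - self.features(sr)) ** 2)


loss = PerceptualLoss()(torch.rand(1, 1, 128, 128), torch.rand(1, 1, 128, 128))
```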

The total loss function for the generator is

$$I_{Gen}^{SR} = {L_{content}} + {\lambda _1}{L_{hi - percep}} + {\lambda _2}{L_{GA{N_{gen}}}}.$$

The loss function of the discriminator is

$$I_{GAN_{dis}}^{SR} = -\log Dis_{\Theta_D}(I^{GT}) + \log\big(Dis_{\Theta_D}(Gen_{\Theta_G}(I^{LR}))\big),$$
where Gen and Dis are abbreviations for the generator and discriminator, respectively, and ${\mathrm{\Theta }_G}$ and ${\mathrm{\Theta }_D}$ are the network parameters of Gen and Dis, respectively. The quantities ${\lambda _1}$ and ${\lambda _2}$ are the weighting coefficients of the different losses. The loss $I_{GAN_{dis}}^{SR}$ trains the discriminator to classify the generated high-resolution result as false and the GT as true. After adversarial training, the two networks converge, allowing the generator to predict high-resolution images that reside on the natural-image manifold and visually approach the GT.

The generative loss that encourages the generator to make the high-resolution results natural (and thereby deceive the discriminator) is expressed as

$$L_{GAN_{gen}} = -\log Dis_{\Theta_D}\big(Gen_{\Theta_G}(I^{SR})\big).$$

The L1 loss is expressed as

$$L_{content} = \frac{1}{WH}\sum_{h = 1}^{H} \sum_{w = 1}^{W} \big|I_{h,w}^{GT} - I_{h,w}^{SR}\big|,$$
where ${L_{content}}$ ensures pixel-level consistency while avoiding the over-smoothing observed in some high-resolution methods that use the mean square error (MSE).
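Combining Eqs. (11), (13), and (14) with the weights reported in subsection 2.4 ($\lambda_1$ = 0.1, $\lambda_2$ = 0.05), the generator objective might be assembled as in the following sketch; the sigmoid on the raw discriminator logit is our assumption, and the relativistic average formulation mentioned in the discussion below would further replace the logit by its difference from the average real logit.

```python
# Sketch of the generator objective: L1 content loss (Eq. (14)) + lambda1 * perceptual
# loss (Eq. (10)) + lambda2 * adversarial loss (Eq. (13)).
import torch


def generator_loss(sr, gt, disc_logit_sr, perceptual_loss, lam1=0.1, lam2=0.05):
    """sr, gt: generated and ground-truth tensors; disc_logit_sr: Dis(Gen(I_LR))."""
    l_content = torch.mean(torch.abs(gt - sr))                        # Eq. (14)
    l_percep = perceptual_loss(sr, gt)                                # Eq. (10)
    l_gan = -torch.log(torch.sigmoid(disc_logit_sr) + 1e-8).mean()    # Eq. (13)
    return l_content + lam1 * l_percep + lam2 * l_gan                 # Eq. (11)
```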

The proposed SADA-GAN network aims to further close the undeniable gap between MPM images generated by deep-learning networks and ground-truth images [7,9,12]. SADA-GAN introduces a higher-capacity block named RRDAB, which uses residual scaling to facilitate the training of a very deep network. A relativistic average GAN is used as the discriminator, which learns to evaluate whether the currently generated MPM image is more realistic than its real counterpart. Two SE channel-attention modules embedded in the generator further guide the reconstruction process: one directs the network to select the proper feature maps, while the other indicates the feature barycenter. They help identify the crucial features and feature regions, avoiding over-smoothing and improving high-resolution details. The attention-guided dense connections, assisted by the proper loss functions, prevent the high-resolution reconstruction from generating deceptive artifacts. Moreover, the VGG features in the perceptual loss direct attention to the textures and edges of the generated MPM images for perceptual satisfaction.

2.3 Sample preparation

Skin tissues were collected from patients at The Sixth Affiliated Hospital of Shenzhen University & Huazhong University of Science and Technology Union Shenzhen Hospital, with approval for biomedical research involving humans from the associated Scientific Research Ethics Committees. All patients with diagnosed skin tumors were approached for recruitment. Physicians recruited patients and obtained their study consent. Experienced gynecological oncologists conducted histological identification and classification according to the FIGO classification standards. Tissue samples were surgically removed, snap-frozen in liquid nitrogen, and stored at −80 °C until being cut into 5-µm sections for unstained applications using a freezing microtome (CM1850, Leica, Germany). The frozen tissue sections were simply covered with a coverslip, imaged by multiphoton microscopy, and then preserved by formalin (Anatech) fixation and paraffin embedding.

2.4 Training and testing details

An NVIDIA GTX 1080 Ti GPU (11 GB memory) was used to train the network on the PyTorch framework for 400,000 iterations. Because of video memory limitations, the large input images obtained by the multiphoton system were divided into numerous 128 × 128 tiles to reduce the memory demand and speed up network training and inference. We used two GTX 1080 Ti graphics cards to maximize our resources, each with a batch size of one corresponding to one of the nonlinear modalities (TPEF or SHG channel), to accelerate the training process. The test images shown in this paper are independent of the training dataset. During training, the generator and discriminator were alternately updated until the generated images converged to a plateau. The weight of the high perceptual loss in the loss function was ${\lambda _1} = 0.1$, and the weight of the GAN loss was ${\lambda _2} = 0.05$. The high perceptual loss based on the pretrained VGG19 was used for feature extraction to sharpen the textures and edges of the generated images, while pixel-wise fidelity was maintained and over-smoothing avoided.
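As an illustration of the tiling step (the function names and the absence of tile overlap are our simplifications), a large image can be split into 128 × 128 tiles and re-assembled as follows:

```python
# Split a large multiphoton image into 128 x 128 tiles for the network, then stitch
# the tiles back in row-major order. Overlap handling and normalization are omitted.
import numpy as np

def to_tiles(img: np.ndarray, tile: int = 128) -> np.ndarray:
    """Split an (H, W) image whose sides are multiples of `tile` into (N, tile, tile)."""
    h, w = img.shape
    return (img.reshape(h // tile, tile, w // tile, tile)
               .swapaxes(1, 2)
               .reshape(-1, tile, tile))

def from_tiles(tiles: np.ndarray, h: int, w: int) -> np.ndarray:
    """Stitch (N, t, t) tiles back into an (h, w) image (row-major order)."""
    t = tiles.shape[1]
    return (tiles.reshape(h // t, w // t, t, t)
                 .swapaxes(1, 2)
                 .reshape(h, w))

big = np.random.rand(1024, 1024).astype(np.float32)
tiles = to_tiles(big)                      # 64 tiles of 128 x 128
restored = from_tiles(tiles, 1024, 1024)
assert np.allclose(big, restored)
```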

3. Results and discussion

3.1 Deep learning enables high-speed multiphoton imaging

Commercial multiphoton imaging systems have two scanning modes. One mode obtains images at a rapid rate through an 8-kHz galvo-resonant scanning system; however, these images are characterized by large noise levels and low resolution. The other mode obtains images with high SNR and resolution through a dual-axis galvo scanning system; however, acquiring images of relatively high quality inevitably requires longer scanning times. The excitation light is switched between the two scanning systems by two beam splitters. SADA-GAN combined with a multiphoton microscopic imaging system effectively resolves the incompatibility between imaging speed and image quality. The input image for deep learning is obtained in the rapid resonance scanning mode, corresponding to a scanning speed of 7.7 fps for a 1024 × 1024 pixel image, with a pixel dwell time of 0.5 µs. The resonant scanner oscillates only at its specified frequency, lacking position and speed control, which maximizes the image acquisition rate. We obtained high-quality target images of 1024 × 1024 pixels using the dual-axis galvo slow scanning system at 0.9 fps with a pixel dwell time of 2.2 µs. Because a dual-axis galvo scanner uses galvanometer scanning on both the x- and y-axes, the longer pixel dwell time reduces the noise in the target images and enhances the fine texture details. Hyper-selectors can switch the excitation between the galvo-resonant and dual-axis galvo scanning systems arbitrarily. To speed up the acquisition, the number of pixels of the input image was selected as one quarter of the number of pixels of the GT image [17].

Multiphoton imaging can reveal many mesoscopic biochemical processes, such as cell metabolism, characterized by the TPEF signals of FAD, which are closely related to tumor growth and metastasis [17]. The SHG signal of collagen fiber, a non-centrosymmetric biological structure, can provide rich information about cancer invasion. Twenty-five frozen basal cell carcinoma slices, each 5 µm thick, were obtained using a freezing microtome. We first operated the galvo-resonant and dual-axis galvo scanning systems to obtain multiple large-FOV multiphoton images of these slices for constructing the training and testing datasets. Figures 3(a)-(c) depict the registered input images of 896 × 3200 pixels, the output images, and the GT images of 3584 × 12800 pixels, respectively. The large-field images are superpositions of the TPEF (magenta) and SHG (cyan) channels.


Fig. 3. Deep learning-enhanced multiphoton images with different scanners. a-c Input, result, and target GT large images, respectively. ROI1: superimposed channel; ROI2: TPEF channel; ROI3: SHG channel. d Intensity profiles along the solid line in ROI3. e PSNR of input and output images for n = 10 large images; error bars indicate the mean standard deviation (SD). f Comparison of network inference and GT acquisition time. The values of each scale bar are marked in a.


On the whole, semantic information such as the pathological features of basal cell carcinomas is well preserved. The clumped cancer cells and the collagen fibers (indicated by SHG) surrounding them are distinct in the output, whereas they are ambiguous in the input image. The noise of the input image caused by the fast galvo-resonant scanning system was greatly suppressed, and the details of the skin epidermis, dermis, sebaceous glands, and sweat glands could be clearly distinguished in the output images. In Fig. 3, three regions of interest (ROIs) in each large image are magnified, showing the superimposed, TPEF, and SHG channels, respectively. ROI1 displays the dermal tissue on the edge of hair follicles, where reticular tissue and banded collagen are clearly reconstructed, and most of the shot noise is effectively screened out. This type of noise is an additive noise with a Poisson distribution, caused by the random current generated by the thermal activity of the electronic components, and is independent of the signal intensity [36]. ROI2 displays the edge of a sweat gland duct. Information-carrying signals and noise were not well differentiated in the input image, causing difficulty in extracting the semantic information; in the reconstruction, the tissue features were well preserved and these undesirable distortions were significantly reduced. ROI3 displays the collagen fibers with orthogonal profiles, which separate two clusters of basal cell carcinoma. Figure 3(d) displays the intensity profiles of the input, output ('result'), and GT images. The middle peak of the curve is the SHG signal from collagen fibers; the input noise on either side was effectively eliminated in the result. To further evaluate the improvement in image resolution produced by the network, we extracted the intensity values across the signal and performed a Gaussian fit to measure the full width at half maximum (FWHM) of the respective images. The FWHM of the input, result, and GT images after fitting were 1266 nm, 930 nm, and 750 nm, respectively; the spatial resolution of the result was thus improved by 336 nm. The improved resolution is conducive to extracting more precise morphological details in subsequent histological analysis, such as the collagen texture and arrangement and the tumor boundary.
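The FWHM estimate described above might be reproduced as in the following sketch; the synthetic profile and its micron scale are placeholders for an intensity profile sampled along the solid line in ROI3.

```python
# Fit a Gaussian to an intensity profile across a collagen fiber and report the
# full width at half maximum (FWHM = 2*sqrt(2*ln 2)*sigma).
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, amp, mu, sigma, offset):
    return amp * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) + offset

x = np.linspace(0, 5.0, 100)                     # position along the profile, in microns
profile = gaussian(x, 1.0, 2.5, 0.4, 0.05) + np.random.normal(0, 0.02, x.size)

popt, _ = curve_fit(gaussian, x, profile, p0=[1.0, 2.5, 0.5, 0.0])
fwhm_um = 2 * np.sqrt(2 * np.log(2)) * abs(popt[2])
print(f"FWHM = {fwhm_um * 1000:.0f} nm")
```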

Notably, the average time required for a single large-field input of the two nonlinear optical modalities obtained by the galvo-resonant scanning system was 2 min 20 s, whereas the corresponding average acquisition time of the high-quality GT images from the dual-axis galvo scanning system was 13 min 6 s. In contrast, the time required for the network to generate results, including reading into memory, computing, and stitching the images, was less than 2 min. We used MATLAB to calculate a full-reference quality metric, the peak signal-to-noise ratio (PSNR), for the input and result images. The PSNR is a commonly used parameter for evaluating the improvement of microscope image quality; it is expressed in decibels from the ratio of the peak signal value to the mean squared error relative to the reference, and a higher PSNR indicates image quality closer to the GT. In our data, the average PSNR of the reconstructed large-field images was 6 dB higher than that of the input. Our combined results emphatically demonstrate the capability of SADA-GAN to suppress distortions such as noise and blurring, enhancing the galvo-resonant scanning images to the high-quality level of dual-axis galvo scanning at a low time cost.
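A numpy equivalent of this full-reference PSNR (assuming images normalized to [0, 1]) is sketched below.

```python
# PSNR in decibels: 20*log10(peak) - 10*log10(MSE) between a test image and the GT.
import numpy as np

def psnr(img: np.ndarray, ref: np.ndarray, peak: float = 1.0) -> float:
    mse = np.mean((img.astype(np.float64) - ref.astype(np.float64)) ** 2)
    return 20 * np.log10(peak) - 10 * np.log10(mse)

rng = np.random.default_rng(0)
gt = rng.random((256, 256))
noisy = np.clip(gt + rng.normal(0, 0.05, gt.shape), 0, 1)
print(psnr(noisy, gt))                           # roughly 26 dB for 5% additive noise
```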

3.2 Deep learning enables large-field and high-resolution multiphoton imaging

Large-field, high-quality imaging is of vital importance for the visualization and accurate analysis of large-field cell distributions and interactions. Although an objective lens with a higher numerical aperture (NA) yields higher image resolution, it concomitantly requires a larger number of image acquisitions and a longer stitching time; conversely, a low-NA objective lens sacrifices image resolution to gain higher acquisition efficiency over a large FOV. This trade-off is readily resolved by deep learning. We used 20× and 60× objective lenses to perform large-area imaging on ten normal frozen skin slices. We obtained images of 10240 × 10240 pixels by stitching together fifteen images of 4096 × 4096 pixels acquired with the 20× objective lens in 5 min, corresponding to the low-quality domain. We applied a 50% overlap between blocks to ensure accurate stitching into the large image. Because the field of view of the 20× objective lens was three times that of the 60× objective lens, the 60× objective lens was used to obtain images of 30720 × 30720 pixels, corresponding to the high-quality domain over the same fields. To match the fourfold up-sampling of the network, we enlarged the acquired 60× images to 40960 × 40960 pixels by bicubic interpolation before adding them to the training dataset. Because of the inevitable slight rotation and translation of the slices during switching between objective lenses, the IW module was particularly important for the quality of our results (see subsection 3.4).
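For reference, the bicubic enlargement by a factor of 4/3 (30720 → 40960 pixels) amounts to a single interpolation call; the tile size below is a stand-in for the full-size image.

```python
# Bicubic enlargement of a 60x GT tile by 4/3 so that it matches the network's
# 4x up-sampling of the 20x input (30720 x 30720 -> 40960 x 40960 at full scale).
import torch
import torch.nn.functional as F

gt_60x = torch.rand(1, 1, 768, 768)                 # placeholder for a 60x GT tile
gt_4x = F.interpolate(gt_60x, scale_factor=4 / 3, mode="bicubic", align_corners=False)
print(gt_4x.shape)                                  # torch.Size([1, 1, 1024, 1024])
```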

Figures 4(a)-(c) illustrate the network's optimization of resolution and noise. As shown in ROI1 (top row), the network clearly resolved the collagen filaments and significantly reduced the background noise. As shown in ROI2 (top row), minute signal points were detected at the edge of the slice; the FWHM of the intensity curve was measured to characterize the improvement in resolution, which amounted to 230 nm (Fig. 4(h)). Figures 4(d)-(e) illustrate the captured multiphoton images and reconstruction results of another skin slice. We selected two TPEF single-channel ROIs to demonstrate the effectiveness of the network in generating semantic information. ROI3 on the left displays microtubules at the sebaceous glands. Although the TPEF signals captured by the low-magnification objective lens at this position were blurred and noisy, the network emphasized the edges of the reticular connective tissue and suppressed the noise of the input. Most of the microtubules were clearly distinguished; however, a few blurred dense areas remained where noise was mistakenly regarded as signal by the network (an example is marked by the yellow dotted circle in ROI4 in Fig. 4(e)). The suspected overexposure in the image is actually the superposition of the complementary colors of TPEF (magenta) and SHG (cyan). We accurately identified the boundary between the stratum spinosum and the corium in the image captured by the 60× objective lens; this feature was difficult to perceive in the fuzzy input image. After learning, the network clearly augmented this boundary information, overcoming the difficulty associated with using fuzzy, low-resolution images in a diagnosis. It required an average of only 5 min 20 s to upgrade a large image of 1.59 × 1.59 mm² from 10240 × 10240 pixels to 40960 × 40960 pixels, instead of 48 min for images of the same quality acquired with a 60× objective lens (Fig. 4(g)). We also evaluated the ten groups of large input and output images using two no-reference image quality scores: the natural image quality evaluator (NIQE) and the perception-based image quality evaluator (PIQE). On average, these two scores improved by 73% and 37%, respectively. These results underscore the substantial advances associated with our proposed deep learning network: the images reconstructed by SADA-GAN closely resembled the GT in perceived quality, resolution, contrast, and structure.


Fig. 4. Deep learning enables large-field and high-resolution multiphoton imaging. a-f SHG- and TPEF-superimposed channels of input, result, and target GT large images. g Comparison of network inference and GT acquisition time for n = 15 large images. h Curve fitting of intensity profiles along the solid lines in the ROIs of a-c. i NIQE and PIQE of input and output images for n = 200 image tiles; error bars show the mean standard deviation (SD). GT: ground truth.


3.3 Deep learning enables low photobleaching imaging

To optimize the SNR of an image, the excitation power is usually increased to ensure a sufficiently strong sample signal; however, this measure also increases the possibility of damaging the sample. Taking skin samples as an example, melanin (which has a low melting point) is enriched in hair follicles. Because two-photon excitation concentrates an extreme amount of energy at the focal point, the hair follicles of the sample are readily burned and damaged. Therefore, relatively low-power excitation is a necessary measure when analyzing skin. However, acquiring images at low power introduces a certain degree of noise because of the thermal dark current of the detector, consistent with the noise behavior mentioned earlier. SADA-GAN can improve the SNR of low-power images and help avoid sample damage. We acquired images of five sliced samples of Paget's disease at the appropriate power (8 mW) and confirmed the location to be imaged using bright illumination; subsequently, we gradually reduced the laser power to lower the light intensity of the image, taking care not to lose the signal, and performed low-quality imaging at the determined position. The excitation power of the low-quality input measured by the power meter was 3.7 mW. Before moving the samples to the next position, we increased the excitation power to acquire images with high SNR and high contrast at the corresponding positions. Finally, thirty pairs of input and GT images of 1024 × 1024 pixels each, including TPEF and SHG images, were obtained and added to the datasets.

Figures 5(a)-(c) illustrate the SHG, TPEF, and superimposed channels at three different positions. The low-power images of all channels are accompanied by obvious noise. Figures 5(d)-(f) display ROIs of different sizes selected from Figs. 5(a)-(c), depicting the elastic collagen fibers at the boundary of the tumor, the fat vesicles separated into globules by a thin layer of loose connective tissue, and the scanning stripes in the background. Figures 5(g) and 5(h) depict the intensity profiles along the solid lines in Figs. 5(d) and 5(e). The areas associated with weak input signals are accompanied by featureless background noise, which is unfavorable to the extraction of histological and pathological information. The network converted the low-power, low-fluorescence-level images into images with fluorescence levels close to those of the high-power acquisitions (Figs. 5(g) and 5(h)); the signal of the low-quality input is raised to a level close to the GT, so that light damage to the sample can be avoided. The PSNR across 30 images increased by 7.1 dB on average. The network intelligently differentiated between noise and signal in the images, notwithstanding that the two are often confused; it filtered out the noise and specifically enhanced the signal, resulting in excellent clarity and contrast approaching the GT data. Figure 5(i) displays the ROIs of the background area in Fig. 5(f) and the corresponding area of the GT, and Fig. 5(j) displays the fast Fourier transform (FFT) diagrams corresponding to the three images. We observed that the images acquired at low power not only introduce noise but also amplify the transverse scanning fringe artifacts (SFA) caused by the disharmonious oscillation of the scanning galvanometer. The SFA can be expressed (approximately) as periodic sinusoidal signals along the y-axis direction; the Fourier transform of a sinusoidal signal consists of two sharp pulses symmetric about the origin, as depicted in the first plot of Fig. 5(j). The disappearance of the symmetric point signals on the y-axis in the second plot confirms that the network also screens out the periodically distributed SFA in the input image. The FFT result of the output background is similar to that of the GT. The cross-shaped signal visible in the center is the Fourier transform of a few scattered points in the background.
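The Fourier-domain check in Fig. 5(j) can be illustrated with a synthetic example: a background patch carrying a horizontal sinusoidal fringe (our stand-in for the SFA) produces two spectral peaks symmetric about the DC component, which disappear once the fringe is removed.

```python
# A background patch with a period-16 fringe along y shows two symmetric peaks at
# ky = +/-16 in its 2D FFT; removing the fringe leaves only the DC term and noise.
import numpy as np

y = np.arange(256)[:, None]
background = 0.02 * np.random.rand(256, 256)
fringed = background + 0.05 * np.sin(2 * np.pi * y / 16)       # stripes along y

spec = np.fft.fftshift(np.abs(np.fft.fft2(fringed)))
peak_rows = np.argsort(spec[:, 128])[-3:]                      # DC plus the two fringe peaks
print(sorted(peak_rows))                                       # [112, 128, 144]
```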


Fig. 5. Deep learning enables low photobleaching imaging. a-c Low-power excitation input and network’s output of SHG (cyan), TPEF (magenta), and superimposed channels. d-f ROIs of different sizes in a-c. g, h Intensity profiles along the solid line in d, e. i, j ROIs of background areas in f and the corresponding FFT result. N: noise; S: signal; SFA: scanning fringe artifacts; FFT: fast Fourier transform.


3.4 Registration study

Because of the non-collinearity of the two scanning systems, there was a small amount of rotation and translation between the image pairs acquired by the rapid galvo-resonant scanner and the slow dual-axis galvo scanner. Similarly, because the slice samples had to be removed when switching objective lenses, the input and GT images acquired with the 20× and 60× objective lenses were inevitably misaligned, with the deviation in pixel position varying from 0 to 200 pixels (Fig. 6(a)). If such raw data were used directly to train the network without pixel-level registration, the SAPCD module could not be employed successfully; it would then be challenging to extract spatial and channel features, rendering the reconstruction of ideal images practically impossible.


Fig. 6. Evaluation of IW module on image reconstruction. a Schematic of image misalignment and correction (cyan channel: SHG; magenta channel: TPEF). b Results of registered training, non-registered training, and comparison of RRDAB and RCAN.


We adopted a typical feature extraction and warping method: the oriented FAST and rotated BRIEF (ORB) procedure [28]. The ORB algorithm substantially reduces the spatial mismatch of coupled pixels and forms aligned training datasets. We embedded the IW module based on the ORB algorithm in front of the SAPCD module. To verify the necessity of the IW module, we trained SADA-GAN with the raw datasets and with the registered datasets generated by the IW module. The outputs are displayed in Fig. 6(b), together with the output of RCAN-GAN trained on registered data as a peer comparison. The images generated by SADA-GAN without the IW module are very nebulous, with most of the specific features becoming indistinguishable in the SHG channel, coupled with the disfigurement of important texture details caused by false-negative judgments. In the TPEF channel, deleterious ghosting is observed at the signal edges, and mistaken inferences are generated in the background area (elliptical dashed line) by false-positive judgements. It is noteworthy that the result tiles are also inconsistent, resulting in stitching artifacts. All these flaws arise from the substantial pixel disparity, which causes insufficient overlap between the ROIs of the input images and those of the GT images and hinders the convergence of the training process. Compared with the RCAN-GAN results, which appear out of focus, the results of SADA-GAN appear more natural and distinct. Benefitting from the adaptive convolution of SAPCD and the high-level perceptual loss, the RRDAB significantly increases the receptive field to capture the pixel deviation. Finally, the SE attention modules introduced after the convolution layers effectively extract and learn the correct spatial and channel features, preventing obvious errors in the deep network and facilitating satisfactory and authentic results. The structural similarity index (SSIM) is displayed at the bottom left of each image. SSIM comprehensively evaluates luminance, contrast, and image structure; here it was calculated between ten large test inputs (and each network result) and the corresponding GT to evaluate the improvement in the registration study. The SSIM of the test results (128 image tiles) of SADA-GAN trained with registered data is on average 13% higher than that of the input, and is closest to the GT data.
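An ORB-based image warp of the kind used in the IW module might look like the following OpenCV sketch: detect and match ORB keypoints, estimate a homography with RANSAC, and warp one image onto the other's grid. The feature count, match truncation, and RANSAC threshold are illustrative values of ours, not the authors' settings.

```python
# ORB keypoint matching followed by homography estimation and perspective warping,
# applied to a pair of single-channel uint8 images (e.g. GT warped onto the input).
import cv2
import numpy as np

def register_pair(moving: np.ndarray, fixed: np.ndarray) -> np.ndarray:
    """Warp `moving` onto the coordinate frame of `fixed`."""
    orb = cv2.ORB_create(nfeatures=5000)
    kp1, des1 = orb.detectAndCompute(moving, None)
    kp2, des2 = orb.detectAndCompute(fixed, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:500]

    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)    # robust to mismatches
    return cv2.warpPerspective(moving, H, fixed.shape[::-1])

# usage: aligned_gt = register_pair(gt_uint8, input_uint8)
```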

4. Conclusion

We have demonstrated a deep learning-based method to enhance label-free multiphoton imaging. It improved the low-quality images acquired by fast galvo-resonant scanning to the quality level of the corresponding low-noise images, with an average 6 dB improvement in PSNR. Additionally, it improved the resolution of images obtained with low-NA objective lenses by approximately 200 nm while avoiding the considerable time penalty of acquiring the same field at high resolution with a high-NA objective lens. Furthermore, our method suppressed the noise of low-power excited images, with an average PSNR improvement of 7.1 dB, and removed scanning fringe artifacts while avoiding photodamage. In addition, we solved the problem of image rotation and translation caused by the non-collinearity of the two scanning systems and the switching of objective lenses; without correction, the misalignment of pixel positions prevents fine textures from being distinguished and produces defocus-like blur. Our proposed network was trained using preregistered datasets, composed of TPEF and SHG images, to extract more accurate spatial features and obtain excellent inference ability. To summarize, our network successfully resolved the conundrum of mutual exclusion between imaging speed, field size, and spatial resolution in multiphoton imaging systems, and it demonstrates attractive potential for translating this technique into routine clinical applications, particularly for large-field and quantitative studies of metastatic colonization.

Funding

National Key Research and Development Program of China (2021YFF0502900); National Natural Science Foundation of China (61835009, 61935012, 62127819, 62175163, 62225505); Shenzhen Talent Innovation Project (RCJC20210706091949022); Shenzhen Key Projects (JCYJ20200109105404067); Shenzhen International Cooperation Project (GJHZ20190822095420249).

Acknowledgments

We thank the National Key R&D Program of China (2021YFF0502900), National Natural Science Foundation of China (62225505/61935012/62175163/61835009/62127819), Shenzhen Talent Innovation Project (RCJC20210706091949022), Shenzhen Key Projects (JCYJ20200109105404067), and Shenzhen International Cooperation Project (GJHZ20190822095420249) for their financial support.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. H. Wang, Y. Rivenson, Y. Jin, Z. Wei, R. Gao, H. Günaydın, L. A. Bentolila, C. Kural, and A. Ozcan, “Deep learning enables cross-modality super-resolution in fluorescence microscopy,” Nat. Methods 16(1), 103–110 (2019). [CrossRef]  

2. C. Belthangady and L. A. Royer, “Applications, promises, and pitfalls of deep learning for fluorescence image reconstruction,” Nat. Methods 16(12), 1215–1225 (2019). [CrossRef]  

3. E. Nehme, L. E. Weiss, T. Michaeli, and Y. Shechtman, “Deep-STORM: super-resolution single-molecule microscopy by deep learning,” Optica 5(4), 458–464 (2018). [CrossRef]  

4. T. C. Hollon, B. Pandian, A. R. Adapa, E. Urias, A. V. Save, S. S. S. Khalsa, D. G. Eichberg, R. S. D’Amico, Z. U. Farooq, S. Lewis, P. D. Petridis, T. Marie, A. H. Shah, H. J. L. Garton, C. O. Maher, J. A. Heth, E. L. McKean, S. E. Sullivan, S. L. Hervey-Jumper, P. G. Patil, B. G. Thompson, O. Sagher, G. M. McKhann II, R. J. Komotar, M. E. Ivan, M. Snuderl, M. L. Otten, T. D. Johnson, M. B. Sisti, J. N. Bruce, K. M. Muraszko, J. Trautman, C. W. Freudiger, P. Canoll, H. Lee, S. Camelo-Piragua, and D. A. Orringer, “Near real-time intraoperative brain tumor diagnosis using stimulated Raman histology and deep neural networks,” Nat. Med. 26(1), 52–58 (2020). [CrossRef]  

5. P. Abdolghader, A. Ridsdale, T. Grammatikopoulos, and G. Resch, “Unsupervised hyperspectral stimulated Raman microscopy image enhancement: denoising and segmentation via one-shot deep learning,” Opt. Express 29(21), 34205–34219 (2021). [CrossRef]  

6. M. J. Huttunen, A. Hassan, C. W. McCloskey, S. Fasih, J. Upham, B. C. Vanderhyden, R. W. Boyd, and S. Murugkar, “Automated classification of multiphoton microscopy images of ovarian tissue using deep learning,” J. Biomed. Opt. 23(06), 1 (2018). [CrossRef]  

7. X. Chen, Y. Li, N. Wyman, Z. Zhang, H. Fan, M. Le, S. Gannon, C. Rose, Z. Zhang, J. Mercuri, H. Yao, B. Gao, S. Woolf, T. Pécot, and T. Ye, “Deep learning provides high accuracy in automated chondrocyte viability assessment in articular cartilage using nonlinear optical microscopy,” Biomed. Opt. Express 12(5), 2759–2772 (2021). [CrossRef]  

8. D. Xiao, Z. Zang, W. Xie, N. Sapermsap, Y. Chen, and D. D. U. Li, “Spatial resolution improved fluorescence lifetime imaging via deep learning,” Opt. Express 30(7), 11479–11494 (2022). [CrossRef]  

9. B. Shen, S. Liu, Y. Li, Y. Pan, Y. Lu, R. Hu, J. Qu, and L. Liu, “Deep learning autofluorescence-harmonic microscopy,” Light: Sci. Appl. 11(1), 76 (2022). [CrossRef]  

10. Y. Rivenson, T. R. Liu, Z. S. Wei, Y. Zhang, and A. Ozcan, “PhaseStain: the digital staining of label-free quantitative phase microscopy images using deep learning,” Light: Sci. Appl. 8(1), 23 (2019). [CrossRef]  

11. C. Qiao, D. Li, Y. T. Guo, C. Liu, T. Jiang, Q. Dai, and D. Li, “Evaluation and development of deep neural networks for image super-resolution in optical microscopy,” Nat. Methods 18(2), 194–202 (2021). [CrossRef]  

12. H. Zhang, C. Y. Fang, X. Xie, Y. Yang, W. Mei, D. Jin, and P. Fei, “High-throughput, high-resolution deep learning microscopy based on registration-free generative adversarial network,” Biomed. Opt. Express 10(3), 1044–1063 (2019). [CrossRef]  

13. W. Ouyang, A. Aristov, M. Lelek, X. Hao, and C. Zimmer, “Deep learning massively accelerates super-resolution localization microscopy,” Nat. Biotechnol. 36(5), 460–468 (2018). [CrossRef]  

14. S. Li, Y. Li, R. Yi, L. Liu, and J. Qu, “Coherent Anti-Stokes Raman scattering microscopy and its applications,” Front. Phys. 8, (2020).

15. Y. Li, B. Shen, G. Zou, S. Wang, J. Qu, R. Hu, and L. Liu, “Fast denoising and lossless spectrum extraction in stimulated Raman scattering microscopy,” J. Biophotonics 14(8), e202100080 (2021). [CrossRef]  

16. S. You, E. J. Chaney, H. Tu, Y. Sun, S. Sinha, and S. A. Boppart, “Label-free deep profiling of the tumor microenvironment,” Cancer Res. 81(9), 2534–2544 (2021). [CrossRef]  

17. B. Shen, J. Yan, S. Wang, F. Zhou, Y. Zhao, R. Hu, J. Qu, and L. Liu, “Label-free whole-colony imaging and metabolic analysis of metastatic pancreatic cancer by an autoregulating flexible optical system,” Theranostics 10(4), 1849–1860 (2020). [CrossRef]  

18. J. Yan, Y. Zhao, F. Lin, J. Qu, Q. Liu, Y. Pan, and L. Liu, “Monitoring the extracellular matrix remodeling of high-grade serous ovarian cancer with nonlinear optical microscopy,” J. Biophotonics 14(6), e202000498 (2021). [CrossRef]  

19. S. You, H. Tu, E. J. Chaney, Y. Sun, Y. Zhao, A. J. Bower, Y. -Z. Liu, M. Marjanovic, S. Sinha, Y. Pu, and S. A. Boppart, “Intravital imaging by simultaneous label-free autofluorescence-multiharmonic microscopy,” Nat. Commun. 9(1), 2125 (2018). [CrossRef]  

20. K. Tilbury and P. J. Campagnola, “Applications of second-harmonic generation imaging microscopy in ovarian and breast cancer,” Perspect. Med. Chem. 7, PMC.S13214 (2015). [CrossRef]  

21. P. Kunze, L. Kreiss, V. Novosadová, A. V. Roehe, S. Steinmann, J. Prochazka, C. I. Geppert, A. Hartmann, S. Schürmann, O. Friedrich, and R. Schneider-Stock, “Multiphoton microscopy reveals DAPK1-dependent extracellular matrix remodeling in a chorioallantoic membrane (CAM) model,” Cancers 14(10), 2364 (2022). [CrossRef]  

22. J. Lecoq, J. Savall, D. Vucinic, B. F. Grewe, H. Kim, T. Z. Li, L. J. Kitch, and M. J. Schnitzer, “Visualizing mammalian brain area interactions by dual-axis two-photon calcium imaging,” Nat. Neurosci. 17(12), 1825–1829 (2014). [CrossRef]  

23. N. Ji, J. Freeman, and S. L. Smith, “Technologies for imaging neural activity in large volumes,” Nat. Neurosci. 19(9), 1154–1164 (2016). [CrossRef]  

24. J. N. Stirman, I. T. Smith, M. W. Kudenov, and S. L. Smith, “Wide field-of-view, multi-region, two-photon imaging of neuronal activity in the mammalian brain,” Nat. Biotechnol. 34(8), 857–862 (2016). [CrossRef]  

25. N. J. Sofroniew, D. Flickinger, J. King, and K. Svoboda, “A large field of view two-photon mesoscope with subcellular resolution for in vivo imaging,” eLife 5, e14472 (2016). [CrossRef]  

26. A. Zhou, S. A. Engelmann, S. A. Mihelic, A. Tomar, A. M. Hassan, and A. K. Dunn, “Evaluation of resonant scanning as a high-speed imaging technique for two-photon imaging of cortical vasculature,” Biomed. Opt. Express 13(3), 1374–1385 (2022). [CrossRef]  

27. D. R. Sandison, D. W. Piston, R. M. Williams, and W. W. Webb, “Quantitative comparison of background rejection, signal-to-noise ratio, and resolution in confocal and full-field laser scanning microscopes,” Appl. Opt. 34(19), 3576–3588 (1995). [CrossRef]  

28. E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “ORB: an efficient alternative to SIFT or SURF,” in 2011 IEEE International Conference on Computer Vision (ICCV) (2011), pp. 2564–2571.

29. X. Zhu, H. Hu, S. Lin, and J. Dai, “Deformable ConvNets v2: more deformable, better results,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019), pp. 9300–9308.

30. X. T. Wang, K. C. K. Chan, K. Yu, C. Dong, and C. C. Loy, “EDVR: video restoration with enhanced deformable convolutional networks,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2019), pp. 1954–1963.

31. W. -S. Lai, J. -B. Huang, N. Ahuja, and M. -H. Yang, “Deep Laplacian pyramid networks for fast and accurate super-resolution,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), pp. 5835–5843.

32. F. Yang, H. Yang, J. Fu, H. Li, and B. Guo, “Learning texture transformer network for image super-resolution,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020), pp. 5790–5799.

33. X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C. C. Loy, “ESRGAN: enhanced super-resolution generative adversarial networks,” in Computer Vision – ECCV 2018 Workshops, L. Leal-Taixé and S. Roth, eds. (Springer International Publishing, 2019), pp. 63–79.

34. J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2018), pp. 7132–7141.

35. S. Woo, J. Park, J. -Y. Lee, and I. S Kweon, “CBAM: convolutional block attention module,” in Computer Vision - ECCV 2018, Pt VII, V. Ferrari, M. Hebert, C. Sminchisescu, and Y. Weiss, eds. (2018), pp. 3–19.

36. Z. Zhang, Y. Wang, R. Piestun, and Z. -L. Huang, “Characterizing and correcting camera noise in back-illuminated sCMOS cameras,” Opt. Express 29(5), 6668–6690 (2021). [CrossRef]  
