
Laser diode photoacoustic point source detection: machine learning-based denoising and reconstruction


Abstract

A new development in photoacoustic (PA) imaging has been the use of compact, portable, and low-cost laser diodes (LDs), but LD-based PA imaging suffers from the low signal intensity recorded by conventional transducers. A common method to improve signal strength is temporal averaging, which reduces the frame rate and increases the laser exposure of patients. To tackle this problem, we propose a deep learning method that denoises point-source PA radio-frequency (RF) data before beamforming using very few frames, even a single one. We also present a deep learning method that automatically reconstructs point sources from noisy pre-beamformed data. Finally, we employ a strategy of combined denoising and reconstruction, which can supplement the reconstruction algorithm for very low signal-to-noise ratio inputs.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Photoacoustic imaging (PAI) is a novel and fast-emerging biomedical imaging technique [1–5]. Short pulses of non-ionizing, infrared light are used to excite ultrasound (US) waves in biological tissue through the photoacoustic (PA) effect [6]. These generated US waves are collected using US transducers. This allows for ultrasonic resolution with contrast induced by light absorption in tissue [5]. PAI has shown great potential in tissue characterization [7,8], needle tracking [9,10], and cancer detection [5,11–13].

Compact, portable, inexpensive, high-frequency, but low-power laser diodes (LDs) have recently emerged as an alternative source for PAI [14,15]. LDs would enable clinical applications that are difficult to envisage with the bulky, high-power Nd:YAG lasers currently in use. However, those lasers provide a superior signal due to their higher power and, consequently, a much better signal-to-noise ratio (SNR) than LD-based PAI.

Here, we explore LD-based PA spot imaging, where a single optical absorber is illuminated [16], with conventional ultrasound transducers that are widely used for B-mode imaging. Applications of PA spot imaging include needle tip localization during surgical intervention [9,10,17], biopsies [18,19], in vivo visualization of prostate brachytherapy seeds [20], localization of transcranial targets [21] and critical structures in the brain [22], nerve imaging through contrast agents [23], registration using PA markers [24], and real-time position verification during surgical intervention [23,25,26]. In these applications, US alone is insufficient as a modality. Although spot imaging is a simple application of PAI, it is important for many procedures and is the first step toward enabling low-power LD-based PAI.

Because of the low SNR in PAI, the sources of noise become particularly relevant. In several studies [27,28], PA noise is modelled as Gaussian white noise. Wu et al. [29] described two system noise models, one dependent on the laser and one originating from the US imaging system. Similarly, Manwar et al. [30] classified the noise sources as electronic noise, system thermal noise, and measurement noise from scattered and attenuated signals.

To improve the SNR in PAI, temporal averaging is usually applied [30,31]. This, however, lowers the imaging frame rate and increases the laser dose. Recently, a deep learning (DL) approach was taken by our group to mitigate this problem for PAI with a high-power Nd:YAG laser system [32], by training a Pix2Pix conditional generative adversarial network (cGAN) [33]. This method aimed to denoise the pre-beamformed radio-frequency (RF) data, as well as to remove sensor-specific artifacts, by training a model that translates noisy RF data to its corresponding temporally averaged RF data. A GAN consists of two main structures: the generator and the discriminator. The generator is trained to generate a denoised output from a noisy input, whereas the discriminator is trained to distinguish that generated output from the reference dataset. Multiple previous studies have looked into PA denoising [34–36]. However, low-energy LD-based PAI and/or image reconstruction from noisy data using neural networks were not addressed in these papers.

DL has also been applied to PAI reconstruction algorithms [37–50]. The Delay-and-Sum (DAS) algorithm is widely acknowledged as an easy and simple reconstruction algorithm, but suffers from low image contrast and reconstruction artifacts [51]. In [39–41,47], different deep neural networks (DNNs) have been employed for PA tomography reconstruction. In [39–41,49,50], the models were trained on simulated PA signals and validated on both simulated PA signals and in vivo experimental data. In [39–42], both the raw RF data and the DAS-reconstructed image are used as model inputs. Waibel et al. [42] showed that using only raw RF data as the input was insufficient for good image reconstruction. These models, however, have not been applied to point source reconstruction, for which different DL-based approaches have been proposed. In [43,44], point sources are recovered by estimating numerical coordinates using convolutional neural networks (CNNs). In [39–46,48–50], all models are trained on simulated rather than experimental data, which can be attributed to the limited availability of PAI data due to its limited clinical application. In [47], a DL model was trained on limited experimental data, but using a high-power Nd:YAG laser in a PA tomography study.

In this study, we aimed to denoise pre-beamformed data from an LD-based PA system in a way that mimics the behaviour of multi-frame temporal averaging while using few frames, or even a single frame. Single-frame LD-based imaging was previously demonstrated in [52], but with a much higher energy per pulse. The option of employing a DNN for combined denoising and image reconstruction in LD-based PAI was also explored. Furthermore, as combining DL-based models trained for specific tasks has proven useful [53,54], we combine the denoising model and the reconstruction model. We will show that denoising and reconstruction of point sources are possible using very few frames, and sometimes a single frame, of RF data. Our study brings novelty in several ways:

  • We use limited experimental data from an LD-based PA system used for spot imaging to train our DL-based models, rather than a large simulated dataset. This ensures that the models are trained and evaluated on real data with a realistic SNR and noise distribution, including transducer artifacts and/or interference of signals.
  • We apply our DL models to low-power LD-based PAI, rather than PAI with high-power lasers. Given the previously described benefits of LDs over high-power lasers, LDs may come close to matching the performance of high-power lasers through DL-based methods for spot imaging applications.
  • We employ a combined denoising-reconstruction strategy, which can improve on the reconstruction algorithm for very low-SNR inputs.

The remainder of this paper is structured as follows: Section 2.1 describes the data acquisition process and the different datasets constructed from the acquired data. Section 2.2 describes the denoising algorithms, which were evaluated on these data using the metrics outlined in Section 2.3. Section 2.4 describes the algorithm used for automated reconstruction. Section 3 presents our findings, which are discussed in Section 4.

2. Methods

In the following sections, scalars are denoted by lowercase symbols and data matrices by bold uppercase symbols. A single sample from a data matrix is denoted by the corresponding non-bold symbol with a superscript. We begin by describing the data acquisition process and the steps taken to construct the datasets for training and evaluation. Next, the denoising algorithms and their training procedure are described, followed by the methods used to evaluate those algorithms. Finally, we discuss the methods used for automatic reconstruction using DL.

2.1 RF dataset for training

2.1.1 Data acquisition

In our study, the PA signal was induced by a pulsed laser diode (QPhotonics QSP-915-20, MI, USA) operating at a wavelength of 915 nm with a power of 20 W, driven by a pulsed laser driver (PicoLAS LDP-V 50-100 V3.3, Germany) to generate 100 ns laser pulses with a rise time of 30 ns at an energy of 1.4 µJ per pulse. Acoustic waves were generated using a black plastic cap around the laser diode output.

These acoustic waves were measured using a 128-element linear transducer (Ultrasonix L14-5) with a width of 38 mm, a center frequency of 7.5 MHz, and a bandwidth of 5-14 MHz; an Ultrasonix SonixDAQ module was used to record the RF data at a sampling frequency of 40 MHz. The experimental setup for these acquisitions is visualized in Fig. 1. The entire setup was immersed in water to facilitate acoustic coupling. Acquisitions were made at different axial distances between the LD output and the transducer. With the LD in-plane relative to the transducer, axial distances of 3, 4, 6, and 7 cm were used; with the LD out-of-plane, distances of 3 and 4 cm were used. For each distance, the LD was moved in the lateral direction to capture 20 random lateral point sources. Sixteen of these 20 acquisitions were assigned to training, with the remaining 4 assigned to intermediate model validation and optimization of the model hyperparameters on unseen data. This should have resulted in 96 training and 24 validation datasets; however, after removing some duplicated training datasets, 84 unique datasets remained in the training set.

Fig. 1. Experimental setup for acquisition of training and validation data with the LD PA source. The whole setup is immersed in water to facilitate acoustic coupling.

2.1.2 Reference dataset

A single frame of RF data is described by an $(n_t,n_e)$ matrix $\boldsymbol {X_\text {1f}}$, where $n_e$ is the number of elements and $n_t$ is the number of recorded time points. For every dataset, $n_f$ frames are recorded. To obtain a larger variety of signal strengths and noise patterns, 3- and 5-frame averages were also calculated, denoted by $\boldsymbol {X_\text {3f}}$ and $\boldsymbol {X_\text {5f}}$, respectively. Moreover, temporal averaging was applied over all $n_f$ frames to create a reference dataset $\boldsymbol {X_\text {r}}$ with the same shape $(n_t,n_e)$. A 1-D median filter of size 5 was applied in the temporal direction of the reference RF data to resolve some artifacts introduced by the SonixDAQ module. In this study, $n_e = 128$ elements, $n_t = 2000$ time points, and $n_f = 260$ frames.
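
As an illustration of this preprocessing step, the following NumPy sketch builds the reference matrix and the k-frame averages from a stack of recorded frames. The array layout (frames stacked along the first axis), the use of `scipy.signal.medfilt`, and the choice of non-overlapping groups for the 3- and 5-frame averages are assumptions, not details taken from the paper's code; only the averaging and the size-5 temporal median filter follow the description above.

```python
import numpy as np
from scipy.signal import medfilt

def build_reference(frames: np.ndarray) -> np.ndarray:
    """Temporal average over all n_f frames, followed by a 1-D median
    filter of size 5 along the time axis of each transducer element.

    frames: raw RF recordings of shape (n_f, n_t, n_e).
    Returns the reference matrix X_r of shape (n_t, n_e).
    """
    x_r = frames.mean(axis=0)
    for e in range(x_r.shape[1]):
        x_r[:, e] = medfilt(x_r[:, e], kernel_size=5)
    return x_r

def k_frame_averages(frames: np.ndarray, k: int) -> np.ndarray:
    """Non-overlapping k-frame averages (k = 3 or 5 above), shape (n_f // k, n_t, n_e)."""
    n_f = frames.shape[0] - frames.shape[0] % k
    return frames[:n_f].reshape(-1, k, *frames.shape[1:]).mean(axis=1)
```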

Despite the temporal averaging and median filtering, image artifacts remained. In the training dataset of 84 images, we segmented the background manually. For every pixel segmented as ‘background’, its value was replaced by the median value of all background pixels in the image. This segmented reference image is denoted by $\boldsymbol {X_\text {s}}$ and has the same size $(n_t,n_e)$.

To generate reference images for training the reconstruction model, the DAS reconstruction was calculated from $\boldsymbol {X_{\text {r}}}$, yielding reconstructed image datasets $\boldsymbol {X_{\text {Dr}}}$. The DAS algorithm was set up so that these reconstructions had the same size as the RF data, i.e., $(n_t,n_e)$. However, this DAS reconstruction is not very specific and contains some artifacts. To get as close to a ‘true reconstruction’ as possible, and because the training dataset was small enough to allow it, the location of the point source was selected manually in $\boldsymbol {X_{\text {Dr}}}$. The selected point was then isolated by applying a 2-D Gaussian mask around it on the reference DAS reconstruction, with $\sigma _{\text {Gauss}} = 0.3$ mm, equivalent to the measured size of the point source. This results in a single point source on an empty background, denoted here by $\boldsymbol {X_{\text {De}}}$, of the same size. The different data subsets are visualized in Fig. 2.
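
A minimal sketch of how the manual reference reconstruction $\boldsymbol{X_{\text{De}}}$ could be produced is shown below, assuming the point was picked as a (row, column) index in the DAS image and that the axial and lateral pixel spacings are known; the function name and spacing parameters are illustrative, not taken from the paper's code.

```python
import numpy as np

def isolate_point_source(das_ref, row, col, dz_mm, dx_mm, sigma_mm=0.3):
    """Multiply the reference DAS image (shape (n_t, n_e)) by a 2-D Gaussian
    centred on the manually selected pixel (row, col), leaving a single spot
    on an empty background (X_De). sigma_mm = 0.3 mm as in the text.

    dz_mm, dx_mm: axial and lateral pixel spacing in mm (setup-specific).
    """
    n_t, n_e = das_ref.shape
    z = (np.arange(n_t)[:, None] - row) * dz_mm   # axial offset from the point [mm]
    x = (np.arange(n_e)[None, :] - col) * dx_mm   # lateral offset from the point [mm]
    mask = np.exp(-(z ** 2 + x ** 2) / (2.0 * sigma_mm ** 2))
    return das_ref * mask
```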

Fig. 2. Samples from different data types. From left to right: single frame RF, reference RF, segmented reference RF, DAS from a single frame, DAS from $n_f$ frames, and the manual point source reconstruction.

2.1.3 Dataset for model training

When training a DL-based model, limited data (84 images in our case) often causes the model to generalize poorly to unseen data. Overfitting is a particular problem for larger models and GANs [55]. Therefore, we present different methods to increase the number of training samples.

First, $n_f$ frames per spot were collected. With $n_f = 260$ frames acquired per dataset in our example, it is theoretically possible to use all of them. However, because single frames from the same experiment share similar systematic noise patterns, and in order to capture the random noise while limiting memory usage, only 5% of all available frames were used. The same procedure was applied to $\boldsymbol {X_{\text {3f}}}$ and $\boldsymbol {X_{\text {5f}}}$, each also using 5% of the available frames. These three extended datasets were combined into dataset $\boldsymbol {X_{\text {F}}}$, yielding a total of 1596 RF samples in our example:

$$\left[\boldsymbol{X_\text{F}}\right]_{1596 \times 2000 \times 128} = \left[ \left[\boldsymbol{X}_\text{1f}\right]_{1092 \times 2000 \times 128} \left[\boldsymbol{X}_\text{3f}\right]_{336 \times 2000 \times 128} \left[\boldsymbol{X}_\text{5f}\right]_{168 \times 2000 \times 128} \right]$$

Next, instead of using the full RF matrices of shape $(n_t,n_e)$, patches of $(n_e, n_e)$ pixels were extracted by selecting patches containing signal and, separately, patches containing noise. This was done to obtain a balanced dataset, with an approximately equal number of noise and signal patches. Given the limited amount of experimental data, a ‘moving window’ method was used to place the signal at different positions and depths within the patches: starting from the manually selected patch location, the patch is moved up and down along the time axis, as illustrated in Fig. 3 (a sketch of this procedure is given below). For each training dataset, 21 signal patches and 21 noise patches were selected with the moving-window method by taking 10 steps up and 10 steps down from the selected patch, further increasing the training dataset size by a factor of 42. From here on, datasets containing patches of shape $(n_e,n_e)$ are denoted by the symbol $\boldsymbol {Z}$. A subscript indicates whether the patches originate from a single frame (F), the reference RF data (r), segmented RF data (s), or the exact, manual reconstruction (De).
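
The moving-window selection could be implemented as in the sketch below; the step size along the time axis is not stated in the text and is therefore a placeholder parameter, and the patch is assumed to span all 128 elements laterally.

```python
import numpy as np

def moving_window_patches(rf, top_row, patch=128, n_steps=10, step=8):
    """Extract 2 * n_steps + 1 = 21 square patches around a manually
    selected row (top of the signal parabola) by shifting the window
    along the time axis.

    rf:   RF matrix of shape (n_t, n_e), with n_e = 128 so each patch
          spans all elements laterally.
    step: shift between consecutive windows (illustrative value).
    """
    patches = []
    for k in range(-n_steps, n_steps + 1):
        r0 = top_row + k * step
        r0 = max(0, min(r0, rf.shape[0] - patch))  # keep the window inside the frame
        patches.append(rf[r0:r0 + patch, :patch])
    return np.stack(patches)                        # shape (21, 128, 128)
```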

Fig. 3. Illustration of the moving-window patch selection method. (a) Manually select the top of the typical signal parabola; (b) a patch is defined around the selected point and is then moved up by a predefined step size for a predefined number of steps; (c) from the selected point, this process is repeated, moving down instead.

2.2 Denoising algorithms

2.2.1 Pix2Pix GAN

The Pix2Pix GAN model was employed, as it was previously shown to perform well for denoising PA data acquired with high-power lasers [32]. The Pix2Pix model is trained to learn a mapping that converts an input image to an output image. The training objective of the Pix2Pix model can be described as:

$$L_\text{GAN} = \mathbb{E}_{\boldsymbol{Z_\text{F}},\boldsymbol{Z_\text{r}}} \left[\log D\left(\boldsymbol{Z_\text{F}},\boldsymbol{Z_\text{r}} \right) \right] + \mathbb{E}_{\boldsymbol{Z_\text{F}},\boldsymbol{W}} \left[\log\left(1-D\left(\boldsymbol{Z_\text{F}},G\left( \boldsymbol{Z_\text{F}},\boldsymbol{W} \right) \right) \right) \right]$$
$$L_\text{L1} = \mathbb{E}_{\boldsymbol{Z_\text{F}},\boldsymbol{Z_\text{r}},\boldsymbol{W}} \left[ \lVert{\boldsymbol{Z_\text{r}} - G\left(\boldsymbol{Z_\text{F}},\boldsymbol{W} \right)}\rVert_1 \right]$$
$$G^* = \arg \min_G \max_D \left\{ L_\text{GAN} + \lambda_1 L_\text{L1} \right\}$$

Here, $L_\text {GAN}$ is the GAN loss, $L_\text {L1}$ is the L1 distance norm, $\mathbb {E}$ is the expectation value, and $\boldsymbol {W}$ is a random noise vector with the same shape as $\boldsymbol {Z}$, i.e., $(n_e, n_e)$. The parameter $\lambda _1$ controls the strength of the L1 regularization [33]. When the model is trained with segmented data as the target, $\boldsymbol {Z_\text {r}}$ is replaced by $\boldsymbol {Z_\text {s}}$ in Eqs. (2) and (3).
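
For concreteness, a PyTorch sketch of the objective in Eqs. (2)-(4) is given below. The `generator` and `discriminator` stand in for the U-Net and PatchGAN networks described in the following paragraphs, and, as in the original Pix2Pix formulation, the noise term $\boldsymbol{W}$ is assumed to enter through dropout inside the generator rather than as an explicit input; this is a sketch under those assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

LAMBDA_L1 = 1e3   # lambda_1 in Eq. (4), as used in this study

def generator_loss(generator, discriminator, z_f, z_r):
    """GAN + L1 generator loss of Eqs. (2)-(4).

    z_f: noisy input patches, z_r: reference (or segmented) target patches,
    both of shape (batch, 1, 128, 128).
    """
    fake = generator(z_f)
    d_fake = discriminator(torch.cat([z_f, fake], dim=1))
    adv = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    l1 = F.l1_loss(fake, z_r)
    return adv + LAMBDA_L1 * l1, fake

def discriminator_loss(discriminator, z_f, z_r, fake):
    """Binary cross-entropy loss for the conditional PatchGAN discriminator."""
    d_real = discriminator(torch.cat([z_f, z_r], dim=1))
    d_fake = discriminator(torch.cat([z_f, fake.detach()], dim=1))
    return (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
            + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
```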

The model architectures of both the generator and the discriminator are shown in Fig. 4. In the Pix2Pix model, the generator is an adaptation of the U-Net architecture [56]. Compared to many previous studies [32,57,58], this U-Net is much smaller in terms of the number of model parameters, using only 458,785 parameters versus 10,471,425 parameters in the U-Net model used in [32]. A smaller model has an advantage in computation time and reduces the risk of overfitting.

Fig. 4. (a) Discriminator of the Pix2Pix cGAN model. (b) Generator of the Pix2Pix cGAN model.

The discriminator uses a simple CNN layout named PatchGAN [59], which consists of five hidden layers. The generator is given a noisy input and generates a prediction of the denoised RF data. This prediction, along with the noisy and reference frames, is then used as input to the discriminator, which outputs a binary prediction: a ‘0’ implies a ‘fake’ image (created by the generator), whereas a ‘1’ implies that the discriminator considers the data to be ‘real’ reference data.

The generator and discriminator were optimized over 5000 epochs using the Adam optimizer, with an initial learning rate of $\alpha = 2\cdot 10^{-4}$, L1 regularization parameter $\lambda _\text {1}=10^3$, a batch size of $n_\text {batch} = 32$, and minibatches of 1024 patches per epoch. These values were obtained through a small hyperparameter search. The datasets containing signal and noise patches from single frames and from 3- and 5-frame averages were used as input to the model. Corresponding patches generated from $\boldsymbol {X_\text {r}}$ and $\boldsymbol {X_\text {s}}$ were used as the target output of the model in two separate experiments. Training was done on an NVIDIA Tesla V100 GPU.
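
A minimal training loop matching these settings might look as follows; it reuses the loss helpers sketched above, and the data loader, device handling, and Adam momentum parameters are assumptions not specified in the text.

```python
import torch

def train(generator, discriminator, loader, epochs=5000, lr=2e-4, device="cuda"):
    """Alternating Adam updates of generator and discriminator.

    `loader` is assumed to yield (z_f, z_target) batches of 32 patches,
    with 32 minibatches (1024 patches) drawn per epoch.
    """
    opt_g = torch.optim.Adam(generator.parameters(), lr=lr)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=lr)
    for _ in range(epochs):
        for z_f, z_r in loader:
            z_f, z_r = z_f.to(device), z_r.to(device)
            # generator update
            opt_g.zero_grad()
            g_loss, fake = generator_loss(generator, discriminator, z_f, z_r)
            g_loss.backward()
            opt_g.step()
            # discriminator update
            opt_d.zero_grad()
            discriminator_loss(discriminator, z_f, z_r, fake).backward()
            opt_d.step()
```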

2.2.2 Pix2Pix-residual GAN

The Pix2Pix-Residual GAN is closely related to the Pix2Pix GAN; however, the ‘default’ U-Net generator was replaced here by a more advanced U-Net-like architecture that contains more residual connections between hidden layers. This model was introduced for image denoising by Gurrola-Ramos et al. [60] but has, to our knowledge, not yet been used in a GAN approach. For the exact architecture, we refer to [60]. The residual U-Net is a much larger model in terms of model parameters due to the residual connections. Here, the residual U-Net generator had a total of 3,938,553 model parameters, as the number of filters in the first layer was reduced from $f=128$ to $f=8$, still doubling in each subsequent encoder layer. The PatchGAN discriminator from the original Pix2Pix model was retained in this GAN.

The generator and discriminator were optimized over 5000 epochs using the Adam optimizer, with an initial learning rate of $\alpha = 2\cdot 10^{-4}$, L1 regularization parameter $\lambda _\text {1}=10^3$, a batch size of $n_\text {batch} = 32$, and minibatches of 1024 patches per epoch. Datasets containing patches extracted from $\boldsymbol {X_\text {F}}$ were used as input to the model and, again, corresponding patches originating from $\boldsymbol {X_\text {r}}$ and $\boldsymbol {X_\text {s}}$ were used as the target output in two separately trained models.

2.3 Evaluation of denoising algorithm

For evaluation of our methods, new RF data was collected. The experimental setup is similar to the one used for acquisition of the training data (an overview can be found in Fig. 5). The entire setup was immersed in water to facilitate acoustic coupling. Measurements were made with the LD output at random axial distances, with ex vivo tissue, a chicken breast sample, placed between the transducer and the LD output. The chicken breast sample was added to introduce attenuation of the travelling US waves, resulting in a signal at the transducer that is closer to a real-world situation. The LD was moved in the lateral direction, collecting 10 random lateral point sources and yielding a total of 20 datasets. The same transducer, laser diode, and laser driver as in the training data acquisition were used here. Again, acoustic waves were generated with a black plastic cap at the laser diode output.

Fig. 5. Experimental setup for acquisition of test data with the LD PA source. The whole setup is immersed in water to facilitate acoustic coupling.

To investigate the proposed approach in a more realistic surgical scenario, we mimicked prostate nerve sensing [61] with a different transducer and laser. The experimental setup is shown in Fig. 6, in which an ex vivo beef sample stained with indocyanine green (ICG) dye (TCI, Tokyo, Japan) was used. The tissue was supported by pieces of ultrasound gel pad (Parker Laboratories, Fairfield, NJ, USA) to obtain a thickness of approximately 4 cm, similar to prostate imaging. A laser diode with a wavelength of 785 nm (QPhotonics QSP-785-4, MI, USA) was used, generating 100 ns pulses with a rise time of $\sim$30 ns at 0.4 µJ per pulse. The laser diode output was held by a clip, illuminating the surface of the ex vivo tissue at random locations. An Ultrasonix biplane transrectal ultrasound (TRUS) prostate brachytherapy probe with a width of 55 mm, an array of 128 elements, and a bandwidth of 5-9 MHz was placed underneath the ex vivo tissue and ultrasound pads to detect the photoacoustic signal.

Fig. 6. Experimental setup for acquisition of test data from a beef tissue sample with ICG contrast agent.

Predictions from the denoising algorithms were evaluated by visual inspection and by three metrics. Visual inspection can prove useful when specific patterns or artifacts occur in the model predictions. As all models were trained on square patches, the evaluation data is also split into patches and reassembled into a single image after model prediction. Patches were extracted with overlap between consecutive patches so that the whole image is covered; in our case, after extracting a patch, the next patch is located 64 pixels further along the time axis.
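
A sketch of this patch-based inference is given below; overlapping predictions are blended by simple averaging, which is an assumption, as the text does not state how the overlap is resolved.

```python
import numpy as np

def denoise_full_frame(rf, predict, patch=128, stride=64):
    """Apply a patch-based model to a full RF frame of shape (n_t, n_e).

    `predict` maps a (patch, patch) array to its processed version, e.g. a
    wrapper around a trained generator. Patches are taken every `stride`
    rows along the time axis and overlapping outputs are averaged.
    """
    n_t, n_e = rf.shape
    out = np.zeros_like(rf, dtype=float)
    weight = np.zeros_like(rf, dtype=float)
    starts = list(range(0, n_t - patch + 1, stride))
    if starts[-1] != n_t - patch:             # cover the last rows as well
        starts.append(n_t - patch)
    for r0 in starts:
        out[r0:r0 + patch, :patch] += predict(rf[r0:r0 + patch, :patch])
        weight[r0:r0 + patch, :patch] += 1.0
    return out / np.maximum(weight, 1.0)
```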

Here, model predictions will be denoted by $\boldsymbol {\hat {X}}$. The metrics selected for evaluation of the model predictions are Mean Squared Error (MSE; Eq. (5)) and Structural Similarity Index Measure (SSIM; Eq. (6)) [62]:

$$\mathrm{MSE} = \frac{1}{n_i} \sum^{n_i}_{i=1} \left({x^i_{\text{r}}} - \hat{x}^i \right)^2$$
$$\mathrm{SSIM} = \frac{(2 \mu_{\hat{x}} \mu_{\text{r}} + c_1) (2 \sigma_{{x}_\text{r} {\hat{x}}} + c_2)}{ (\mu_{\hat{x}}^2 + \mu_{\text{r}}^2 + c_1) (\sigma_{\hat{x}}^2 + \sigma_{\text{r}}^2 + c_2) }$$
$$\sigma_{{x}_\text{r},{\hat{x}}} = \frac{1}{n_i-1} \sum^{n_i}_{i=1} ({x^i_{\text{r}}}-\mu_{\text{r}}) ({\hat{x}^i}-\mu_{\hat{x}})$$
with $x_\text {r}^i$ and $\hat {x}^i$ being single pixels in $\boldsymbol {X_\text {r}}$ and $\hat {\boldsymbol {X}}$, respectively, $n_i$ the number of pixels in an image, $\mu$ the mean signal intensity, $\sigma$ the standard deviation of the signal, and $\sigma _{{x}_\text {r},\hat {x}}$ the covariance of $\boldsymbol {X_\text {r}}$ and $\hat {\boldsymbol {X}}$. $c_1$ and $c_2$ are small constants that avoid ill-defined values and are set to $c_1 = 0.01$ and $c_2 = 0.03$, matching the values defined in [62].

Furthermore, the quality of the input image is described by the SNR, where the signal and noise are computed from the envelope data:

$$\mathrm{SNR_{dB}} = 20 \log_{10} \left( \frac{\text{mean(signal)}}{\text{std(noise)}} \right)$$
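
The metrics of Eqs. (5)-(8) can be computed directly, as in the sketch below; the Hilbert-transform envelope along the time axis and the caller-supplied signal/noise masks are assumptions, since the exact envelope computation and region selection are setup-specific.

```python
import numpy as np
from scipy.signal import hilbert

def mse(x_ref, x_hat):
    """Eq. (5): mean squared error over all pixels."""
    return np.mean((x_ref - x_hat) ** 2)

def ssim(x_ref, x_hat, c1=0.01, c2=0.03):
    """Eqs. (6)-(7): global SSIM computed over the whole image."""
    mu_r, mu_h = x_ref.mean(), x_hat.mean()
    var_r, var_h = x_ref.var(ddof=1), x_hat.var(ddof=1)
    cov = ((x_ref - mu_r) * (x_hat - mu_h)).sum() / (x_ref.size - 1)
    return ((2 * mu_h * mu_r + c1) * (2 * cov + c2)) / \
           ((mu_h ** 2 + mu_r ** 2 + c1) * (var_h + var_r + c2))

def snr_db(rf, signal_mask, noise_mask):
    """Eq. (8): SNR of the envelope data, given boolean masks that mark the
    signal and noise regions of the (n_t, n_e) frame."""
    env = np.abs(hilbert(rf, axis=0))          # envelope along the time axis
    return 20 * np.log10(env[signal_mask].mean() / env[noise_mask].std())
```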

In addition to these metrics, the inference time was measured when predicting on a modern laptop, using either the CPU (Intel i7-11800H) or the GPU (NVIDIA RTX 3060 Mobile), to assess whether real-time application of these models would be viable.
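
One way the per-frame inference time could be measured is sketched below; the dummy input (one batch of overlapping patches covering a single frame), the warm-up pass, and the CUDA synchronisation are implementation details assumed here rather than taken from the paper.

```python
import time
import torch

def frames_per_second(model, device="cuda", n_frames=50, n_t=2000, patch=128, stride=64):
    """Rough full-frame inference rate: one batch of patches per frame."""
    model = model.to(device).eval()
    n_patches = (n_t - patch) // stride + 2     # overlapping patches covering the frame
    dummy = torch.randn(n_patches, 1, patch, patch, device=device)
    with torch.no_grad():
        model(dummy)                            # warm-up pass
        if device == "cuda":
            torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(n_frames):
            model(dummy)
        if device == "cuda":
            torch.cuda.synchronize()
    return n_frames / (time.perf_counter() - t0)
```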

2.4 Reconstruction algorithm

Different DL methods have been applied for PA reconstruction [37–46]. Here, we propose a direct method for reconstructing point sources from noisy RF data. For this, we used the Pix2Pix model with the same hyperparameters as described in Section 2.2.1. However, instead of using reference or segmented images, the target output was changed to the manually derived reconstruction. This adapts the loss functions to:

$$L_\text{GAN} = \mathbb{E}_{\boldsymbol{Z_\text{F}},\boldsymbol{Z_\text{De}}} \left[\log D\left(\boldsymbol{Z_\text{F}},\boldsymbol{Z_\text{De}} \right) \right] + \mathbb{E}_{\boldsymbol{Z_\text{F}},\boldsymbol{W}} \left[\log\left(1-D\left(\boldsymbol{Z_\text{F}},G\left( \boldsymbol{Z_\text{F}},\boldsymbol{W} \right) \right) \right) \right]$$
$$L_\text{L1} = \mathbb{E}_{\boldsymbol{Z_\text{F}},\boldsymbol{Z_\text{De}},\boldsymbol{W}} \left[ \lVert{\boldsymbol{Z_\text{De}} - G\left(\boldsymbol{Z_\text{F}},\boldsymbol{W} \right)}\rVert_1 \right]$$
$$G^* = \arg \min_G \max_D \left\{ L_\text{GAN} + \lambda_1 L_\text{L1} \right\}$$

Here, the generator and discriminator were optimized over 5000 epochs using the Adam optimizer, with an initial learning rate of $\alpha = 2\cdot 10^{-4}$, L1 regularization parameter $\lambda _\text {1}=10^3$, a batch size of $n_\text {batch} = 32$, and minibatches of 1024 patches per epoch.

We also explored the possibility of combined denoising and reconstruction, which we refer to here as Dual-GAN. A single denoising GAN was selected based on its performance and was given single frames as input. The output of this denoising GAN was then given as input to the reconstruction GAN, yielding the Dual-GAN output.
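
Conceptually, the Dual-GAN is just the composition of the two trained generators; a minimal sketch is given below, where `denoiser` and `reconstructor` are assumed to be full-frame wrappers around the trained models (e.g. via the patch-based inference sketched earlier).

```python
def dual_gan(single_frame_rf, denoiser, reconstructor):
    """Combined denoising and reconstruction (Dual-GAN): the output of the
    denoising GAN is fed directly into the reconstruction GAN."""
    denoised = denoiser(single_frame_rf)   # Pix2Pix-Residual denoising generator
    return reconstructor(denoised)         # Pix2Pix reconstruction generator
```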

The reconstruction algorithm was evaluated by comparing the center locations in the reference DAS image to both predictions (direct reconstruction and Dual-GAN). Furthermore, the full width at half maximum (FWHM) was calculated and compared with that of the reference DAS reconstruction.
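
A sketch of this evaluation is given below: the point source is located as the intensity maximum and the FWHM is taken from the axial and lateral profiles through that maximum, with linear interpolation of the half-maximum crossings. The use of the argmax as the centre estimate and the interpolation scheme are assumptions, not details from the paper.

```python
import numpy as np

def fwhm(profile, spacing_mm):
    """Full width at half maximum of a 1-D, single-peaked profile, with
    linear interpolation of the two half-maximum crossings."""
    half = profile.max() / 2.0
    above = np.where(profile >= half)[0]
    i, j = int(above[0]), int(above[-1])
    left, right = float(i), float(j)
    if i > 0:
        left = i - (profile[i] - half) / (profile[i] - profile[i - 1])
    if j < len(profile) - 1:
        right = j + (profile[j] - half) / (profile[j] - profile[j + 1])
    return (right - left) * spacing_mm

def centre_and_fwhm(recon, dz_mm, dx_mm):
    """Peak location (row, col) and axial / lateral FWHM of a reconstruction."""
    row, col = np.unravel_index(np.argmax(recon), recon.shape)
    return (row, col), fwhm(recon[:, col], dz_mm), fwhm(recon[row, :], dx_mm)
```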

3. Results

In this section, we present our findings. First, we describe the results of the denoising algorithms; next, we show the results of the reconstruction algorithm. All results shown here are based on samples from the test set, i.e., data previously unseen by any of the models.

3.1 Denoising algorithm

In Table 1, three metrics are shown for the different denoising methods. All metric calculations use samples from the test set, as they were previously unseen by the models, with the reference RF data serving as the ground truth. For the cases where the models were trained on segmented RF data as the target output, the metric values may not be entirely accurate, since the ‘true reference’ for these models should have an empty background. However, since segmented RF data was only available for the training data, the reference RF data ($\boldsymbol {X_\text {r}}$) is used in these calculations.

Table 1. MSE and SSIM metrics for different denoising methods compared to the values for noisy data, as well as processing times on CPU and GPU.

In Fig. 7, some predicted samples for different models and reference training data are shown, along with the corresponding input frame and actual reference RF data. Figure 7(a) shows samples of the Pix2Pix model trained on reference data, and Fig. 7(b) shows samples of the Pix2Pix model trained on segmented data. Similarly, in Figs. 7(c) and 7(d), results for the Pix2Pix-Residual model are shown when using reference data and segmented data as target outputs, respectively. Again, it should be noted that the “Reference image” in the previous figures is always the reference RF data ($\boldsymbol {X_\text {r}}$), since no segmented data is available for the test set. In the Supplemental document, more examples can be found in Figures S1-S4.

Fig. 7. Results of the denoising algorithms.

3.2 Reconstruction algorithm

In Fig. 8, results of the reconstruction model are shown: one sample shows a successful reconstruction from a single frame and one sample shows a successful Dual-GAN reconstruction. More examples can be found in Figures S5 and S6 of the Supplemental document. Of the 20 samples in the test set, 14 were predicted successfully using only the reconstruction model. For the remaining six samples, the Dual-GAN managed to predict the reconstruction correctly.

Fig. 8. Results from the reconstruction algorithm, with and without prior denoising.

For each image set, there are seven columns, showing from left to right: the single frame used as input to the model; the prediction of the Pix2Pix-Residual denoising model trained on segmented data; the DAS reconstruction calculated from a single frame; the DAS reconstruction from the reference RF data; the reconstruction predicted by the Pix2Pix reconstruction model; the reconstruction from the Dual-GAN model; and an overlay of the DAS reconstruction from the reference image and the Dual-GAN prediction, to verify localization accuracy. The Pix2Pix-Residual model trained on segmented data was used in the Dual-GAN approach here, based on its sharp parabola shape and few artifacts, which is essential for a good reconstruction.

To establish the localization error of the reconstructions, an overlay was made between the DAS reconstruction from the reference image and the Dual-GAN prediction. In 16 of 20 cases, the Dual-GAN prediction fit exactly within the DAS reconstruction. In 3 cases, the alignment was slightly off but still close to the center of the DAS reconstruction, with a maximum error of 0.2 mm axially and 1 mm laterally. The final sample from our test set was not successfully resolved, yielding a 95% success rate.

The FWHM of the point sources was also calculated and compared to the reference DAS image. In the axial direction, the FWHM of $\boldsymbol {X_\text {Dr}}$ was found to be 0.67 $\pm$ 0.15 mm, which reduced to 0.33 $\pm$ 0.10 mm and 0.32 $\pm$ 0.10 mm for successful direct reconstruction and Dual-GAN, respectively. In the lateral direction, the reference DAS reconstruction had a FWHM of 2.27 $\pm$ 1.67 mm, which reduced to 1.79 $\pm$ 0.66 mm and 1.92 $\pm$ 0.63 mm for successful direct reconstruction and Dual-GAN, respectively. The Gaussian mask applied to the reference DAS reconstruction for training had an axial FWHM of 0.32 mm and a lateral FWHM of 2.54 mm; this explains the difference between the axial and lateral resolutions.

Results of our algorithms on the beef sample stained with ICG contrast agent are shown in Fig. 9. Instead of using a single frame, which had SNRs as low as −2 dB (too low for a good prediction), we used the temporal average of 10 frames as input to the model. In this data, in contrast to the previously used data acquired in water, a reflection artifact is visible in the DAS reconstruction. More results can be found in Figure S7 of the Supplemental document.

Fig. 9. Successful prediction from the Dual-GAN model using 10 averaged frames of the beef-ICG dataset.

4. Discussion

Table 1 shows the metrics of the different denoising methods. Averaging does not appear to improve the SSIM and MSE metrics, probably due to the horizontal noise pattern (see Fig. 2) present in the RF data, whereas the SNR does improve significantly. This is in line with expectations from previous papers [30,31]. From Figure S8 in the Supplemental document, the noise in our data appears normally distributed, which could indicate Gaussian white noise, again as expected from the literature [27,28].

For our training data, we used a ‘moving window’ method to obtain a large number of training samples. While this may reduce noise independence between training samples, the effect is insignificant, since for each window of RF data multiple frames were acquired, each with a different noise pattern (260 available frames means a theoretical 260 different noise patterns per RF window). This resulted in a large number of training datasets with many different noise patterns on which the model can train and generalize. In a real scenario, the curvature of the wavefront would also change with depth; however, the effect of this on our model is very limited. The task of our models was to (a) denoise the image, for which the curvature should remain the same in the output, and (b) determine the point source location, for which the model localizes the spot at the tip of the wavefront. For example, we also evaluated the model with the transrectal ultrasound transducer (55 mm detection surface), which produces a completely different PA curvature than the clinical linear transducer (38 mm detection surface), and the model worked well (see Fig. 9 and Figure S7). We will investigate extracting localization information from the curvature for multiple sources in future studies.

In our study, we compare our results with DAS beamforming, a conventional localization approach widely used in PA studies [16,51,63]. While deep learning methods have been applied to other localization methods, such as ultrasound localization microscopy (ULM) [64–67], these studies are often structured differently: usually, enhancement is performed on beamformed data rather than reconstruction from RF channel data [64,65,67], or the methods are largely based on synthetic data [64–66], whereas we used experimental PA data for training.

Every model, with the exception of Pix2Pix-Residual trained with reference RF data as the target, improved the SSIM compared to the noisy data. Furthermore, both Pix2Pix models improved the MSE, with the Pix2Pix-Residual trained on segmented data following closely behind. In terms of time performance, the Pix2Pix models achieve frame rates of 25-32 Hz on GPU and 20-22 Hz on CPU, whereas the Pix2Pix-Residual models fall clearly behind with 13-16 Hz on GPU and around 1-2 Hz on CPU. Based on these metrics, the Pix2Pix models generally outperform the Pix2Pix-Residual models, as well as the averaging techniques. However, as noted before, the metrics for the models trained on segmented RF data were evaluated against the reference RF data as ground truth, which may make these metrics appear worse even when the actual performance is better. For this reason, we also carried out a visual evaluation.

In Fig. 7(a), the prediction of one sample, randomly chosen from the test data, using the Pix2Pix model trained on reference RF data is shown, along with the corresponding input and reference RF signal. It is apparent that the signal strength has increased compared to the input data. Interestingly, the model appears to have learned to slightly improve the signal strength even compared to the reference data. However, the background is somewhat distorted, especially around the characteristic parabola. One possible cause is that the background of the reference patches is not completely empty; since our dataset is based purely on experimental data, the only reference we have is based on averaging many frames. This does not perfectly remove all noise and still leaves some artifacts in the ‘background’ patches.

To tackle this problem, and given the viability of our method with little available data, the Pix2Pix model was retrained with segmented RF data as the target. In other words, the model was trained to map background patches to an empty background, instead of the average used previously. The predictions of this newly trained model on the test set are shown in Fig. 7(b). Here, the background is predicted much better. The contrast of the signal parabola is similar to that in Fig. 7(a), but the background looks much more homogeneous.

The results for the Pix2Pix-Residual model trained with reference data as the target are shown in Fig. 7(c). The model is still able to improve the signal strength, but the background looks much more distorted compared to the results in Fig. 7(a). However, the characteristic parabolas look sharper and show fewer artifacts than those in Fig. 7(a). To tackle the background problem, the Pix2Pix-Residual model was also trained with segmented data as the target. The results of this model are shown in Fig. 7(d). Here, the characteristic parabola is predicted very well, with no artifacts and a better prediction of the secondary waves, which were predicted only blurrily or not at all by the Pix2Pix model.

Looking at all denoising results, it is clear that all models (except Pix2Pix-Residual with reference RF data as the target) managed to improve the input data in terms of contrast and signal strength. The Pix2Pix model generally outperformed the Pix2Pix-Residual models in metrics and time performance; however, the Pix2Pix-Residual model shows good performance when the RF signals are evaluated visually, especially with regard to the secondary waves in the image and the clarity of the parabola. The quality of the input data remains an important factor in the predictions: when the input signal is too weak, no prediction or only a partial prediction is made. A good example of such a partial prediction is visible in Figure S2a in the Supplemental document, where the input data does not show a clear parabola line on the left side and, in turn, the model is unable to predict that half of the parabola.

In Fig. 8(a), a successful prediction from the reconstruction model is shown. The direct reconstruction does well here and is able to estimate a point at the centre location of the DAS reconstruction; Dual-GAN shows a similar result. However, in 6 out of 20 test samples, no point source could be predicted from a single frame of RF data using the direct reconstruction method. This appears to happen when the SNR of the input image falls below 0 dB. In these cases, we applied the Dual-GAN method, in which the Pix2Pix-Residual model was employed to denoise the input first, and this denoised output was then used to predict the point source. In all six failure cases, this method allowed accurate prediction of the point source; an example is shown in Fig. 8(b). One drawback of Dual-GAN is the accumulation of errors: when the denoising model predicts the parabola at a different position, this error propagates into the reconstruction model, and the total error becomes the sum of the errors of the denoising and reconstruction models. Another possible drawback of Dual-GAN is that the reconstruction model was trained with low-SNR inputs, whereas the denoising model produces high-SNR outputs by design. Usually, the reconstruction model handles this well, as can be seen in the examples in Figure S6 in the Supplemental document; however, it is possible that the artifact in Fig. 8(b) was caused by this mismatch in input data.

Finally, we validated our method on a different dataset, in which a beef sample stained with ICG contrast agent was used for attenuation. Furthermore, a laser diode with lower energy and a US transducer with a different width (55 mm compared to 38 mm previously) were used, compared to the experiments used for training our models. We showed that our method remains viable on this dataset, despite the differences in imaging configuration and acquisition parameters.

5. Conclusion

In this study, we aimed to utilize DL-based models for pre-beamformed denoising and reconstruction of PA point source images. To this end, in addition to the conventional averaging method, two DL-based models were employed and compared: the Pix2Pix model and an adaptation of it named Pix2Pix-Residual. These models were trained on two target datasets to compare performance with different background patterns. Denoising with the Pix2Pix model showed very promising results, as did the Pix2Pix-Residual model when trained on target data from which the background noise was manually removed. Automated reconstruction using Pix2Pix has also proven possible, although in some cases the signal intensity was too low to predict the location of the point source. In these situations, a point source can still be predicted by combining the reconstruction algorithm sequentially with the denoising algorithm.

Funding

Charles Laszlo Chair in Biomedical Engineering.

Acknowledgments

The authors acknowledge funding from the Charles Laszlo Chair in Biomedical Engineering held by Professor Salcudean.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. A. B. E. Attia, G. Balasundaram, M. Moothanchery, U. S. Dinish, R. Bi, V. Ntziachristos, and M. Olivo, “A review of clinical photoacoustic imaging: Current and future trends,” Photoacoustics 16, 100144 (2019). [CrossRef]  

2. P. Beard, “Biomedical photoacoustic imaging,” Interface focus 1(4), 602–631 (2011). [CrossRef]  

3. S. Jeon, J. Kim, D. Lee, J. W. Baik, and C. Kim, “Review on practical photoacoustic microscopy,” (2019).

4. I. Steinberg, D. M. Huland, O. Vermesh, H. E. Frostig, W. S. Tummers, and S. S. Gambhir, “Photoacoustic clinical imaging,” (2019).

5. M. Xu and L. V. Wang, “Photoacoustic imaging in biomedicine,” Rev. Sci. Instrum. 77(4), 041101 (2006). [CrossRef]  

6. I. G. Calasso, W. Craig, and G. J. Diebold, “Photoacoustic Point Source,” Phys. Rev. Lett. 86(16), 3550–3553 (2001). [CrossRef]  

7. Y. Yang, X. Li, T. Wang, P. D. Kumavor, A. Aguirre, K. K. Shung, Q. Zhou, M. Sanders, M. Brewer, and Q. Zhu, “Integrated optical coherence tomography, ultrasound and photoacoustic imaging for ovarian tissue characterization,” Biomed. Opt. Express 2(9), 2551 (2011). [CrossRef]  

8. F. Gao, X. Feng, Y. Zheng, and C.-D. Ohl, “Photoacoustic resonance spectroscopy for biological tissue characterization,” J. Biomed. Opt. 19(6), 067006 (2014). [CrossRef]  

9. J. Su, A. Karpiouk, B. Wang, and S. Emelianov, “Photoacoustic imaging of clinical metal needles in tissue,” J. Biomed. Opt. 15(2), 021309 (2010). [CrossRef]  

10. M. A. Lediju Bell and J. Shubert, “Photoacoustic-based visual servoing of a needle tip,” Sci. Rep. 8(1), 15519 (2018). [CrossRef]  

11. V. Dogra, B. Chinni, K. Valluru, J. Joseph, A. Ghazi, J. Yao, K. Evans, E. Messing, and N. Rao, “Multispectral photoacoustic imaging of prostate cancer: Preliminary ex-vivo results,” J. Clin. Imaging Sci. 3, 41 (2013). [CrossRef]  

12. M. Ishihara, M. Shinchi, A. Horiguchi, H. Shinmoto, H. Tsuda, K. Irisawa, T. Wada, and T. Asano, “Possibility of transrectal photoacoustic imaging-guided biopsy for detection of prostate cancer,” in Photons Plus Ultrasound: Imaging and Sensing 2017, vol. 10064 (SPIE, 2017), p. 10064.

13. M. A. Lediju Bell, X. Guo, D. Y. Song, and E. M. Boctor, “Transurethral light delivery for prostate photoacoustic imaging,” J. Biomed. Opt. 20(3), 036002 (2015). [CrossRef]  

14. R. G. Kolkman, W. Steenbergen, and T. G. Van Leeuwen, “In vivo photoacoustic imaging of blood vessels with a pulsed laser diode,” Lasers Med. Sci. 21(3), 134–139 (2006). [CrossRef]  

15. P. K. Upputuri and M. Pramanik, “Fast photoacoustic imaging systems using pulsed laser diodes: a review,” (2018).

16. H. Moradi, Y. Wu, S. E. Salcudean, and E. Boctor, “A photoacoustic image reconstruction method for point source recovery,” in Photons Plus Ultrasound: Imaging and Sensing 2021, vol. 11642 (SPIE, 2021), p. 61.

17. D. Piras, C. Grijsen, P. Schütte, W. Steenbergen, and S. Manohar, “Photoacoustic needle: minimally invasive guidance to biopsy,” J. Biomed. Opt. 18(7), 070502 (2013). [CrossRef]  

18. H. Wang, S. Liu, T. Wang, C. Zhang, T. Feng, and C. Tian, “Three-dimensional interventional photoacoustic imaging for biopsy needle guidance with a linear array transducer,” J. Biophotonics 12(12), e201900212 (2019). [CrossRef]  

19. H. Kim and J. H. Chang, “Multimodal photoacoustic imaging as a tool for sentinel lymph node identification and biopsy guidance,” (2018).

20. M. A. Lediju Bell, N. P. Kuo, D. Y. Song, J. U. Kang, and E. M. Boctor, “In vivo visualization of prostate brachytherapy seeds with photoacoustic imaging,” J. Biomed. Opt. 19(12), 126011 (2014). [CrossRef]  

21. M. A. L. Bell, A. K. Ostrowski, K. Li, P. Kazanzides, and E. M. Boctor, “Localization of transcranial targets for photoacoustic-guided endonasal surgeries,” Photoacoustics 3(2), 78–87 (2015). [CrossRef]  

22. B. Eddins and M. A. L. Bell, “Design of a multifiber light delivery system for photoacoustic-guided surgery,” J. Biomed. Opt. 22(04), 1–041011 (2017). [CrossRef]  

23. H. Moradi, E. M. Boctor, and S. E. Salcudean, “Robot-assisted image guidance for prostate nerve-sparing surgery,” IEEE International Ultrasonics Symposium, IUS 2020-September (2020).

24. A. Cheng, J. U. Kang, R. H. Taylor, and E. M. Boctor, “Direct three-dimensional ultrasound-to-video registration using photoacoustic markers,” J. Biomed. Opt. 18(6), 066013 (2013). [CrossRef]  

25. H. Song, H. Moradi, B. Jiang, K. Xu, Y. Wu, R. H. Taylor, A. Deguet, J. U. Kang, S. E. Salcudean, and E. M. Boctor, “Real-time intraoperative surgical guidance system in the da Vinci surgical robot based on transrectal ultrasound/photoacoustic imaging with photoacoustic markers: an ex vivo demonstration,” IEEE Robotics and Automation Letters (2022).

26. N. Gandhi, M. Allard, S. Kim, P. Kazanzides, and M. A. Lediju Bell, “Photoacoustic-based approach to surgical guidance performed with and without a da vinci robot,” J. Biomed. Opt. 22(12), 121606 (2017). [CrossRef]  

27. B. Stephanian, M. T. Graham, H. Hou, and M. A. L. Bell, “Additive noise models for photoacoustic spatial coherence theory,” Biomed. Opt. Express 9(11), 5566–5582 (2018). [CrossRef]  

28. J. Sun, B. Zhang, Q. Feng, H. He, Y. Ding, and Q. Liu, “Photoacoustic wavefront shaping with high signal to noise ratio for light focusing through scattering media,” Sci. Rep. 9(1), 4328 (2019). [CrossRef]  

29. Y. Wu, J. Kang, W. G. Lesniak, A. Lisok, H. K. Zhang, R. H. Taylor, M. G. Pomper, and E. M. Boctor, “System-level optimization in spectroscopic photoacoustic imaging of prostate cancer,” Photoacoustics 27, 100378 (2022). [CrossRef]  

30. R. Manwar, M. Hosseinzadeh, A. Hariri, K. Kratkiewicz, S. Noei, and M. R. N. Avanaki, “Photoacoustic signal enhancement: towards utilization of low energy laser diodes in real-time photoacoustic imaging,” Sensors 18(10), 3498 (2018). [CrossRef]  

31. M. Zhou, H. Xia, H. Zhong, J. Zhang, and F. Gao, “A noise reduction method for photoacoustic imaging in vivo based on emd and conditional mutual information,” IEEE Photonics J. 11(1), 1–10 (2019). [CrossRef]  

32. A. Refaee, C. J. Kelly, H. Moradi, and S. E. Salcudean, “Denoising of pre-beamformed photoacoustic data using generative adversarial networks,” Biomed. Opt. Express 12(10), 6184 (2021). [CrossRef]  

33. P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” CVPR (2017).

34. E. M. A. Anas, H. K. Zhang, J. Kang, and E. M. Boctor, “Towards a fast and safe led-based photoacoustic imaging using deep convolutional neural network,” in Medical Image Computing and Computer Assisted Intervention–MICCAI 2018: 21st International Conference, Granada, Spain, September 16-20, 2018, Proceedings, Part IV 11, (Springer, 2018), pp. 159–167.

35. C. Dehner, I. Olefir, K. B. Chowdhury, D. Jüstel, and V. Ntziachristos, “Deep-learning-based electrical noise removal enables high spectral optoacoustic contrast in deep tissue,” IEEE Trans. Med. Imaging 41(11), 3182–3193 (2022). [CrossRef]  

36. M. K. A. Singh, K. Sivasubramanian, N. Sato, F. Ichihashi, Y. Sankai, and L. Xing, “Deep learning-enhanced led-based photoacoustic imaging,” in Photons Plus Ultrasound: Imaging and Sensing 2020, vol. 11240 (SPIE, 2020), pp. 161–166.

37. M. W. Kim, G. S. Jeng, I. Pelivanov, and M. O’Donnell, “Deep-Learning Image Reconstruction for Real-Time Photoacoustic System,” IEEE Trans. Med. Imaging 39(11), 3379–3390 (2020). [CrossRef]  

38. A. Hauptmann and B. T. Cox, “Deep learning in photoacoustic tomography: current approaches and future directions,” J. Biomed. Opt. 25(11), 112903 (2020). [CrossRef]  

39. H. Lan, K. Zhou, C. Yang, J. Cheng, J. Liu, S. Gao, and F. Gao, “Ki-GAN: Knowledge Infusion Generative Adversarial Network for Photoacoustic Image Reconstruction In Vivo,” in Lecture Notes in Computer Science, vol. 11764 LNCS (Springer Science and Business Media Deutschland GmbH, 2019), pp. 273–281.

40. H. Lan, D. Jiang, C. Yang, F. Gao, and F. Gao, “Y-Net: Hybrid deep learning image reconstruction for photoacoustic tomography in vivo,” Photoacoustics 20, 100197 (2020). [CrossRef]  

41. M. Guo, H. Lan, C. Yang, J. Liu, and F. Gao, “AS-Net: Fast Photoacoustic Reconstruction with Multi-Feature Fusion from Sparse Data,” IEEE Trans. Comput. Imaging 8, 215–223 (2022). [CrossRef]  

42. D. Waibel, J. Gröhl, F. Isensee, T. Kirchner, K. Maier-Hein, and L. Maier-Hein, “Reconstruction of initial pressure from limited view photoacoustic images using deep learning,” in Photons Plus Ultrasound: Imaging and Sensing 2018, vol. 10494, A. A. Oraevsky and L. V. Wang, eds. (SPIE, 2018), p. 98.

43. A. Reiter and M. A. Lediju Bell, “A machine learning approach to identifying point source locations in photoacoustic data,” in Photons Plus Ultrasound: Imaging and Sensing 2017, vol. 10064 (SPIE, 2017), p. 100643J.

44. K. Johnstonbaugh, S. Agrawal, D. A. Durairaj, C. Fadden, A. Dangi, S. P. K. Karri, and S. R. Kothapalli, “A Deep Learning Approach to Photoacoustic Wavefront Localization in Deep-Tissue Medium,” IEEE Trans. Ultrason., Ferroelect., Freq. Contr. 67(12), 2649–2659 (2020). [CrossRef]  

45. K. Johnstonbaugh, S. Agrawal, D. A. Durairaj, M. Homewood, S. P. Krisna Karri, and S.-R. Kothapalli, “Novel deep learning architecture for optical fluence dependent photoacoustic target localization,” in Photons Plus Ultrasound: Imaging and Sensing 2019, vol. 10878 (SPIE, 2019), p. 55.

46. D. Allman, A. Reiter, and M. A. Bell, “Photoacoustic Source Detection and Reflection Artifact Removal Enabled by Deep Learning,” IEEE Trans. Med. Imaging 37(6), 1464–1477 (2018). [CrossRef]  

47. N. Davoudi, X. L. Deán-Ben, and D. Razansky, “Deep learning optoacoustic tomography with sparse data,” Nat. Mach. Intell. 1(10), 453–460 (2019). [CrossRef]  

48. S. Antholzer, M. Haltmeier, and J. Schwab, “Deep learning for photoacoustic tomography from sparse data,” Inverse Probl. Sci. Eng. 27(7), 987–1005 (2019). [CrossRef]  

49. N. Awasthi, G. Jain, S. K. Kalva, M. Pramanik, and P. K. Yalavarthy, “Deep neural network-based sinogram super-resolution and bandwidth enhancement for limited-data photoacoustic tomography,” IEEE Trans. Ultrason., Ferroelect., Freq. Contr. 67(12), 2660–2673 (2020). [CrossRef]  

50. F. K. Joseph, A. Arora, P. Kancharla, M. K. A. Singh, W. Steenbergen, and S. S. Channappayya, “Generative adversarial network-based photoacoustic image reconstruction from bandlimited and limited-view data,” in Photons Plus Ultrasound: Imaging and Sensing 2021, vol. 11642 (SPIE, 2021), pp. 208–213.

51. S. Jeon, E. Y. Park, W. Choi, R. Managuli, K.-j. Lee, and C. Kim, “Real-time delay-multiply-and-sum beamforming with coherence factor for in vivo clinical photoacoustic imaging of humans,” Photoacoustics 15, 100136 (2019). [CrossRef]  

52. Ç. Özsoy, A. Cossettini, A. Özbek, S. Vostrikov, P. Hager, X. L. Deán-Ben, L. Benini, and D. Razansky, “Lightspeed: A Compact, High-Speed Optical-Link-Based 3D Optoacoustic Imager,” IEEE Trans. Med. Imaging 40(8), 2023–2029 (2021). [CrossRef]  

53. J. van Boxtel, V. Vousten, J. Pluim, and N. M. Rad, “Hybrid Deep Neural Network for Brachial Plexus Nerve Segmentation in Ultrasound Images,” in 2021 29th European Signal Processing Conference (EUSIPCO), (2021), pp. 1246–1250.

54. Y. Xu, K. Yan, J. Kim, X. Wang, C. Li, L. Su, S. Yu, X. Xu, and D. D. Feng, “Dual-stage deep learning framework for pigment epithelium detachment segmentation in polypoidal choroidal vasculopathy,” Biomed. Opt. Express 8(9), 4061–4076 (2017). [CrossRef]  

55. X. Ying, “An Overview of Overfitting and its Solutions,” J. Phys.: Conf. Ser. 1168, 022022 (2019). [CrossRef]  

56. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” CoRR abs/1505.0, 234–241 (2015).

57. J. Chi, C. Wu, X. Yu, P. Ji, and H. Chu, “Single low-dose ct image denoising using a generative adversarial network with modified u-net generator and multi-level discriminator,” IEEE Access 8, 133470–133487 (2020). [CrossRef]  

58. Y. Han and J. C. Ye, “Framing U-Net via Deep Convolutional Framelets: Application to Sparse-View CT,” IEEE Trans. Med. Imaging 37(6), 1418–1429 (2018). [CrossRef]  

59. P. Isola, J. Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, vol. 2017-Janua (2017), pp. 5967–5976.

60. J. Gurrola-Ramos, O. Dalmau, and T. E. Alarcon, “A Residual Dense U-Net Neural Network for Image Denoising,” IEEE Access 9, 31742–31754 (2021). [CrossRef]  

61. J. Kang, H. N. D Le, S. Karakus, A. P. Malla, M. M. Harraz, J. U. Kang, A. L. Burnett, and E. M. Boctor, “Real-time, functional intra-operative localization of rat cavernous nerve network using near-infrared cyanine voltage-sensitive dye imaging,” Sci. Rep. 10(1), 1–10 (2020). [CrossRef]  

62. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image Quality Assessment: From Error Visibility to Structural Similarity,” IEEE Trans. on Image Process. 13(4), 600–612 (2004). [CrossRef]  

63. X. Ma, C. Peng, J. Yuan, Q. Cheng, G. Xu, X. Wang, and P. L. Carson, “Multiple delay and sum with enveloping beamforming algorithm for photoacoustic imaging,” IEEE Trans. Med. Imaging 39(6), 1812–1821 (2020). [CrossRef]  

64. R. J. van Sloun, O. Solomon, M. Bruce, Z. Z. Khaing, H. Wijkstra, Y. C. Eldar, and M. Mischi, “Super-resolution ultrasound localization microscopy through deep learning,” IEEE Trans. Med. Imaging 40(3), 829–839 (2021). [CrossRef]  

65. X. Liu, T. Zhou, M. Lu, Y. Yang, Q. He, and J. Luo, “Deep learning for ultrasound localization microscopy,” IEEE Trans. Med. Imaging 39(10), 3064–3078 (2020). [CrossRef]  

66. J. Youn, B. Luijten, M. Schou, M. B. Stuart, Y. C. Eldar, R. J. van Sloun, and J. A. Jensen, “Model-based deep learning on ultrasound channel data for fast ultrasound localization microscopy,” in 2021 IEEE International Ultrasonics Symposium (IUS), (IEEE, 2021), pp. 1–4.

67. U.-W. Lok, C. Huang, P. Gong, S. Tang, L. Yang, W. Zhang, Y. Kim, P. Korfiatis, D. J. Blezek, F. Lucien, R. Zheng, J. D. Trzasko, and S. Chen, “Fast super-resolution ultrasound microvessel imaging using spatiotemporal data with deep fully convolutional neural network,” Phys. Med. Biol. 66(7), 075005 (2021). [CrossRef]  

Supplementary Material (1)

Supplement 1: Supplemental document with additional figures.
