PA-NeRF, a neural radiance field model for 3D photoacoustic tomography reconstruction from limited Bscan data

Yun Zou; Yixiao Lin; Quing Zhu; Quing Zhu

doi:10.1364/BOE.511807

1. Introduction

Photoacoustic tomography (PAT) is a valuable functional imaging technique that offers optical contrast and ultrasound (US) spatial resolution [1]. PAT has many advantages over traditional imaging modalities as X-ray, US, CT, and MRI because of its ability to image functional parameters, such as blood oxygenation saturation and perfusion [2]. It is particularly useful for imaging soft tissues, such as the breast, ovary, thyroid, and skin, in cancer detection and diagnosis [3].

Our group previously utilized a real-time co-registered PAT-US system with a curved transvaginal transducer to measure PA signals and diagnose ovarian cancer [4–6]. Researches have shown that ovarian tissue morphology and function serve as an important biomarker in classifying malignancy [7–9]. However, 2D US and PAT images provide only a cross-sectional view of the lesion. Additionally, because they are limited by the imaging window of the transvaginal probe, US and PAT images can reveal only a portion of large ovarian lesions.

3D images of ovarian lesions can improve diagnosis by providing physicians with more complete morphology and function information [10]. Several methods have been explored to reconstruct 3D volumes in PAT. Typically, either a 2D transducer is needed to collect 3D data [11–15], or a 1D transducer is scanned to perform 3D reconstruction [16–18]. However, these methods all require expensive US arrays or time-consuming data acquisition, and their limited imaging windows lose target information needed for clinical applications.

Deep learning has made significant advances in computer vision and in biomedical imaging. Many deep learning methods have been explored for image reconstruction, image post-processing, image segmentation, and lesion classification [19–22]. As for deep learning in PAT, researchers are exploring various deep learning architectures to improve image resolution, reduce noise and image artifacts, achieve better reconstruction imaging quality, reconstruct images from limited view data, and recover images from sparsely sampled data [23–27].

A neural radiance field, commonly abbreviated as NeRF, is a groundbreaking technique in computer graphics and computer vision. Originally introduced in 2021 [28], this method has redefined the process of generating three-dimensional reconstructions from two-dimensional image data.

At its core, a NeRF synthesizes photorealistic 3D visualizations by forecasting both the structural and aesthetic elements of a scene through a continuous 2D function called a radiance field. Instead of relying on traditional 3D models or point clouds, NeRF leverages deep learning to predict how light interacts with surfaces at every point in space. This nuanced technique can capture intricate details, light reflections, and shadows, generating renderings that are almost indistinguishable from real-life scenes.

Beyond graphics and visual content generation, the transformative power of NeRF is making notable strides. In biomedical imaging, recent studies [29,30] have showcased the potential of NeRF in applications such as 3D CT imaging. The ability of NeRF to filter out noise, rectify artifacts, and amplify resolution has made CT scans more discernible. The increased clarity and contrast pave the way for precise diagnostic evaluations and comprehensive treatment blueprints. These benefits are particularly pronounced in 3D synthesis tasks that require meticulous attention to detail, such as tumor identification and organ boundary demarcation [29–32].

Moreover, the versatility of NeRF extends to ultrasonography as well. In traditional US imaging, capturing high-resolution 3D visuals of biological entities remains a challenge. However, with NeRF, the generation of detailed 3D reconstructions has become more feasible. A notable contribution was made by Wysocki et al., who integrated physics-inspired rendering to make US image synthesis more realistic, emphasizing direction-dependent changes in the scene [32]. Another pivotal study, by Li et al., demonstrated the utility of NeRF in 3D US imaging of the spine. Their technique yielded superior quality images that accentuated bone structures, minimized noise, and ultimately led to a more accurate assessment of spinal curves. Impressively, their approach outperformed traditional methods, resulting in reduced error margins in spine curve estimations [30]. Inspired by NeRF in 3D rendering, we developed a deep learning model for PAT volumetric reconstruction from sparse Bscan data, called PA-NeRF. PA-NeRF is different from a typical NeRF model in that it reconstructs a 3D volume of optical absorption profiles based on both PAT images and US images. Additionally, a forward loss to improve the PA-NeRF model’s performance is implemented. To demonstrate the superiority of the PA-NeRF model over other machine learning models, we conducted phantom experiments and applied the PA-NeRF model to patient data.

2. Methodology

We employed a curved array transducer to capture cross-sectional Bscan data of the target and applied the delay-and-sum technique to reconstruct 2D Bscan images. Since our PAT system was designed for in-vivo imaging of ovarian lesions, the most feasible way to scan the entire lesion area was to rotate the probe to acquire the necessary volumetric data, as depicted in Fig. 1.

Fig. 1. Scanning schematic diagram. The transducer is fixed on top of the target and rotates around the center. ξ represents the transducer rotation angle.

Download Full Size | PDF

In this study, we implemented five different 3D reconstruction methods for comparison. The first method used direct interpolation (DI) from Bscans collected at different angles inside the volume of interest. DI was computationally the most efficient method, and it produced reasonable reconstructions from dense angular sampling. The second method was Bscan-wise universal back projection (BBP), adapted from universal back projection (UBP) to better suit the imaging conditions of this study. Because BBP was an analytical method for 3D reconstruction, it produced more accurate results with sparser angular sampling than DI. The third method was learning-based: a UNet was trained to predict volumetric PAT directly from a series of Bscans from a small number of angles. The UNet proved particularly powerful at generating images under sparse sensing conditions. The fourth method, called TransUnet, implements the transformer block into the Unet model. It combines the local features from Unet and the global features from Transformers. Lastly, the best method was a NeRF-based machine learning model, which used a series of Bscans, specifically Bscan beams, collected at various angles to predict new Bscan beams at unscanned angles through a learned coordinate transform. NeRF could synthesize new views of the same target with a much simpler model than UNet and was more robust against overfitting or biased training data [28–30].

2.1 PA-NeRF

In this comprehensive study, we have leveraged the capabilities of the NeRF technique to enhance the reconstruction process of PAT. This enhanced model, termed PA-NeRF, offers a refined approach to PAT reconstruction by harnessing the power of deep learning. Traditionally, reconstructing a detailed PAT image from scattered scan lines has posed significant challenges, particularly when these lines originated from varied positions and orientations. However, by integrating the NeRF model, PA-NeRF is trained to comprehend these irregularities. By utilizing beamformed data from scan lines whose positions and orientations are already known, PA-NeRF learns the necessary coordinate transformations. This acquired knowledge empowers it to predict how beamformed data might appear if it were derived from scan lines in previously unscanned positions and orientations.

The PA-NeRF architecture is constructed based on a supervised, fully connected neural network framework. Its input contains positional information of each scan line, which includes the spatial coordinates (x, y, z) on the detection surface and the scan angle (ξ), which indicates the transducer rotation in Bscans, as depicted in Fig. 1. Additionally, an object identifier (α) is incorporated as a marker to denote the specific object under reconstruction. This identifier is essential for the model, as it is used to calculate the appropriate coordinate transformation for each object.

The network comprises three hidden layers, each equipped with 256 neurons. For optimization, PA-NeRF employs the Adam optimizer, which enhances its efficiency in reaching an optimal learning rate.

In terms of output, PA-NeRF generates two distinct categories. The first category is a 128-dimensional representation of the absorption coefficient beamline, while the second is a similar-dimensional representation of the initial pressure beamline. The model's primary function is to predict beamformed data corresponding to the input scan line. In addition, it also learns the absorption coefficient profile of the scan line, which is pivotal in accurately reconstructing PAT images by capturing subtle variations in the acoustic properties of the scanned tissue. Finally, the PA-NeRF model will output a Bscan PAT image and a US image from different rotation angles.

To generate a holistic Bscan, the model takes in the positional data of every individual scan line that constitutes the Bscan. The outputs for each of these lines are then carefully concatenated, ensuring alignment with their respective scan line positions. This procedure ensures the generation of a seamless and comprehensive Bscan. As an advanced capability, the PA-NeRF model can also produce volumetric reconstructions by making predictions for Bscans around a complete 360° rotation and subsequently amalgamating these predictions. Bscan images are not overlapped, they are cross-sections around the center of a volume. Figure 2 depicts this intricate process and its outstanding results.

Fig. 2. PA-NeRF model structure. F is the operator to map (x, y, z, ξ, α) to initial pressure and absorption coefficient and $\mathrm{\theta }$ represent weights of PA-NeRF model.

Download Full Size | PDF

Fine-tuning the PA-NeRF model involves several steps. Initially, the model is provided with a new object identifier (α) along with the known transducer positions and angles. The ground truth in this context comprises the available ultrasound (US) images and photoacoustic tomography (PAT) images. The goal is to reconstruct the 3D volume. To achieve this, the model generates PAT images from a variety of rotation angles and transducer positions. In the final step, these individual slices are synthesized to form a comprehensive 3D volume.

In addition to the mean square error (MSE) loss for absorption coefficient images and Bscan data, a forward loss is implemented to improve the model’s performance:

(1)$$\begin{array}{{c}} {Loss = \frac{1}{n}\left( {\sum {{({{\mu_a} - \overline {{\mu_a}} } )}^2} + \sum {{({{\textrm{p}_0} - \overline {{p_0}} } )}^2} + \gamma \sum {{({{\mu_a} \cdot \phi - {p_0}} )}^2}} \right).} \end{array}$$

Here ${\mu _a}\textrm{and}\overline {{\mu _a}} $ are the absorption coefficients of the model output and ground truth, $\gamma $ is a constant coefficient of 0.1, $\textrm{and}\phi $ represents the fluence map. In our PAT-US system, four optical fibers deliver light to the tissue. Similarly, four-point sources generate a homogeneous field in simulations. ${p_0}\textrm{and}\overline {{p_0}} $ are the beamformed initial pressure scanlines of the model output and ground truth, respectively. From the principle of PAT, we know that this loss should be 0 if we reconstruct perfect images.

Compared to other supervised models, PA-NeRF exhibits greater generalizability. This is attributed to its fine-tuning process. When aiming to reconstruct a new object from a set of Bscans, we initially fine-tune the model using the available Bscans. Subsequently, by inputting all transducer positions, we can acquire Bscans from every orientation. This comprehensive collection of Bscans then facilitates the reconstruction of the 3D volume.

2.2 Direct interpolation

Direct Interpolation uses 3D interpolation to build the entire 3D volume by combining several cross-section Bscan images. In photoacoustic imaging, the absorption of each short laser pulse produces an initial pressure rise, ${p_0}$, within the tissue. ${p_0}$ is proportional to the Grüneisen parameter $(\mathrm{\Gamma } )$, the target absorption coefficient $({{\mu_a}} )$, and the light fluence $({\phi \textrm{}(r} ))$ [1,33–35]:

(2)$$\begin{array}{{c}} {{p_0} = \Gamma \cdot {\mu _a} \cdot \phi.} \end{array}$$

To reconstruct the target’s optical properties from photoacoustic signals, delay-and-sum beamforming is typically used, which takes into account the time-of-flight of the ultrasound signal and the location of the point of reflection to create an image with good spatial resolution. Once several Bscan images have been reconstructed from multiple cross-sections, they are positioned in a 3D volume based on their coordinates, and missing voxels between each successive Bscan image are interpolated to generate the full 3D volume.

2.3 Bscan-based back projection (BBP)

BBP is modified from the universal back projection algorithm (UBP), which is an analytical reconstruction method for obtaining volumetric PAT results. UBP can be applied to different scanning geometries and has been used in various clinical applications [36–39]. Compared to other reconstruction methods, such as filtered back projection and iterative model-based reconstruction, UBP strikes a good balance between reconstruction quality, computation cost, and robustness to noise and artifacts. The algorithm projects the measurement from each detector element back to the entire imaged volume in a spherically propagating wave, following the equation below [40]:

(3)$$\begin{array}{{c}} {{p_0}(r )= \; \frac{1}{{{\mathrm{\Omega }_0}}}\mathop \int \nolimits_{{S_0}}\left( {2p({{r_0},t} )- 2t\frac{{\partial p({{r_0},t} )}}{{\partial t}}} \right) \times \frac{{cos{\theta _0}}}{{{{|{r - {r_0}} |}^2}}}d{S_0}} \end{array}$$

where ${r_0}$ and r are the detector and voxel positions, respectively; ${\theta _0}$ is the angle between $\vec{r}$ and $\overrightarrow {{r_0}} $; and the integral integrates over the entire detection surface.

To prepare for reconstruction, we discretized the volume into voxels with dimensions of 0.4 mm (x) x 0.4 mm (y) x 0.3 mm (z), where the axial dimension (z) was chosen to be on the same scale as and slightly greater than the ultrasonic pulse length. In a clinical setting, slight probe motion other than pure rotation between scans cannot be avoided, so the raw data from different Bscans cannot be added coherently. For this reason, unlike in UBP, in BBP we back projected the Bscan data separately to the imaged volume and summed the resultant envelope data of Bscans from all angles to obtain the final reconstruction, as shown in Fig. 3. We refer to this modified procedure as Bscan-wise universal back projection (BBP). In simulations with no motion artifacts between Bscans, we compared the BBP reconstruction with the Hilbert transform of conventional UBP reconstruction and found comparable results.

Fig. 3. Reconstruction diagrams for UBP and BBP. In UBP, the algorithm projects each Bscan data set to the entire imaged volume and add them coherently. In BBP, the algorithm projects Bscan envelop data to the volume and add them to form 3-D images.

Download Full Size | PDF

2.4 Convolutional neural network

For comparison, we have implemented a 3D Unet model. It includes two parts, encoder network, and decoder network. The encoder network employs convolutional layers to extract relevant features from the input image while down sampling the image. The decoder network then uses up sampling layers to reconstruct the image, while preserving the high-resolution features using skip connections between the encoder and decoder networks.

To reconstruct the 3D PAT volume from Bscan images, the 3D CNN Unet model input was a sparse 3D matrix representing the volume of interest. The input Bscan images were inserted into this volume at their respective scan angles in the form of 2D slices, while the space between Bscan images was otherwise not filled. This input design allowed for a flexible number of input Bscan images while keeping the model input size constant. In addition, it was compatible with the UNet architecture and convenient for the subsequent 3D rendering because the volume of interest to be reconstructed was directly used as the model input. To mitigate the high computational cost associated with 3D convolution layers, we implemented three downsampling and upsampling layers in the network structure.

2.5 TransUnet

In our implementation, we adapted the TransUnet model, originally proposed by JN. Chen [41], integrating the comprehensive feature extraction capabilities of both the transformer and U-Net architectures. This model distinctively tokenizes image patches derived from CNN feature maps, which the transformer then processes to incorporate global contexts. These globally informative features are subsequently upsampled and integrated with high-resolution CNN feature maps, ensuring enhanced precision in localization. Our modification particularly adjusted the input and output channels to align with those used in the 3D U-Net model, as shown in Fig. 4. This adaptation was aimed at refining the model's application to a broader range of reconstruction tasks, where TransUnet has already demonstrated its superiority over existing methods.

Fig. 4. Overview of the framework. (a) schematic of the Transformer layer; (b) architecture of modified TransUNet [41].

Download Full Size | PDF

2.6 Generating simulation and experimental data for model training

2.6.1 Simulation data

To generate simulation data from digital phantoms, we used a three-step strategy. First, a volume with the shape of the target letter was defined in a homogeneous background. The optical fluence inside each digital phantom was calculated using Monte Carlo simulation [42]. Second, we computed the initial pressure value after laser pulse excitation at each pixel of the phantom by multiplying the Grüneisen parameter at that pixel by its absorption coefficient and the optical fluence. Third, from this initial pressure map, the raw photoacoustic signal at each transducer element was generated using the k-wave toolbox [43]. The transducer configuration matched that of our co-registered US and PAT system. We simulated 5 letters —W, A, S, H, and U — and for each letter, we generated Bscans at 18 different angles by rotating the US probe. A total of 1350 (3 × 5 × 5 × 18) Bscan data were simulated, as shown in Table 1.

Table 1. Simulation Data

View Table | View all tables in this article

2.6.2 3D Phantom data

Two 3D letter phantoms were made from 4% agar. To achieve a ∼4 cm^-1 reduced scattering coefficient, we added 0.4% intralipid, and to achieve a ∼ 1 cm^-1 absorption coefficient, we added 0.04% v/v of ink. After being refrigerated for 24 hours, the phantoms were placed in an intralipid solution with calibrated absorption and reduced scattering coefficients of 0.02 and 6 cm^-1, respectively, and measured from their PAT/US signals.

2.7 Real-time US-PAT system

All phantom and clinical data were acquired with an in-house, real-time co-registered PAT and US system [5]. The co-registered PAT system has three parts, 1) a tunable Ti:sapphire laser (700 nm to 900 nm) pumped by a Q-switched Nd: YAG laser (Symphotics TII, LS-2134), which generates 10 ns laser pulses at a 15 Hz repetition rate; 2) an optical fiber-based light delivery system [44]; and 3) a commercial US system (EC-12R, Alpinion Medical Systems, Republic of Korea) [45].

A phantom experiment validated the four different methods for 3D PAT reconstruction (DI, BBP, 3D CNN Unet, and PA-NeRF). A transvaginal probe was placed on top of a water tank, and a rotation stage held the probe. The agar-based phantom was placed on the bottom of the tank filled with 1% intralipid solution, as shown in Fig. 5.

Fig. 5. Experimental configuration and phantom ‘U’.

Download Full Size | PDF

2.8 Image quality assessment

2.8.1 SSIM

To quantify the 3D reconstructions from different methods, we used the structural similarity index measure (SSIM) to compare the reconstructions with the ground truth. The SSIM is calculated by Eq. (4):

(4)$$\begin{array}{{c}} {SSIM({x,y} )= l{{({x,y} )}^\alpha } \cdot c{{({x,y} )}^\beta } \cdot s{{({x,y} )}^\gamma }\; .} \end{array}$$

Here, x and y are respectively the reconstructed and the ground truth PA Bscans, l is their luminance, c is their contrast, and s represents the structure. $\alpha ,\beta ,\textrm{and}\gamma $ are coefficients which have a value less than or equal to 1.

2.8.2 PSNR

We also used the peak signal to noise ratio (PSNR) to quantify the quality of the reconstructed images and the ground truth. The calculation of the PSNR is shown in Eq. (5):

(5)$$\begin{array}{{c}} {PSNR({x,y} )= 10 \cdot {{\log }_{10}}\left( {\frac{{MA{X^2}}}{{MSE({x,y} )}}} \right).} \end{array}$$

Here, $x\textrm{and}y$ represent the reconstructed and ground truth images, $\textrm{and}MAX$ is the maximum possible pixel value of the image (for gray scale images, $MAX$ is 255). $MSE({x,y} )$ is the mean square error of two images.

2.8.3 RMSE

In addition to the SSIM metric, we employed the root mean squared error (RMSE) to gauge the accuracy of the 3D reconstructions. The RMSE is formulated as

(6)$$\begin{array}{{c}} {RMSE({x,y} )= {{\left( {\frac{1}{{mnp}}\mathop \sum \limits_1^m \mathop \sum \limits_1^n \mathop \sum \limits_1^p {{({x({i,j,k} )- y({i,j,k} )} )}^2}} \right)}^{\frac{1}{2}}}.} \end{array}$$

3. Results

3.1 Simulation results

Figure 6 shows the simulation of the letter ‘U’ and the corresponding 3D reconstruction results from each of the four methods. The visualization of the 3D volume is presented using four images: (1) a top-down projection image, which shows the maximum projection of the reconstructed volume from the top-down view; and (2-4) three different reconstructed cross-sectional images that show the details of each reconstruction. In this particular case, nine Bscans were used for rendering the 3D volume, with the transducer rotation angles ranging from 0 to 180 degrees with a step of 20 degrees. All the reconstruction images are normalized to maximum amplitude.

Fig. 6. a. Comparative 3D reconstructions of a simulated “U” shape using nine Bscans. Displayed from left to right are the ground truth and the results from direct interpolation, back projection via Bscan, the CNN-Unet model, the TransUnet model, and the PA-NeRF model. The five rows illustrate, respectively, the volume profile, top-down maximum projection, and cross-sectional reconstructions at 30, 90, and 150 degrees. Figure 6(b). SSIM values for each reconstruction method benchmarked against the ground truth. Figure 6(c). PSNR values for each reconstruction approach in comparison with the ground truth.

Download Full Size | PDF

With nine Bscans, the direct interpolation method shows significant artifacts and distortions in the reconstructed volume, which are more obvious in the top-down projection image. The BBP method also shows some artifacts, but they are less prominent than with the DI method. The Unet and TransUnet generates similar output. The TransUnet model reconstructs smoother volume compared with Unet. The PA-NeRF method produce smooth and accurate reconstruction results too, although there are some slight differences between their cross-sections and the ground truth. The top-down projection image of the PA-NeRF method is also twisted due to the shift in each cross section. The use of nine Bscans covers all directional cross-sections, and the Unet model could learn the interpolation process easily from sparse data. As for PA-NeRF model, it learns the function of transducer elements and corresponding beamline data. Therefore, the simulation results from nine Bscans looks a little better for Unet than PA-NeRF.

From the SSIM and PSNR metrics in Fig. 6(b), 6c, and from the RMSE values in Table 2, the Unet method and TransUnet model still outperform the other methods, followed by PA-NeRF. Overall, these results demonstrate the superiority of deep learning-based methods in PAT image reconstruction, especially in scenarios with limited data.

Table 2. RMSE for simulation results

View Table | View all tables in this article

Next, we tried to use fewer Bscans, which provide less data for 3D reconstruction. As shown in Fig. 7, we used five Bscans, with rotation angles of 0, 40, 80, 120, and 160 degrees.

Fig. 7. a. 3D reconstruction results for a simulated letter ‘U’, from five Bscans. Columns and rows show the same information as in Fig. 6, as do plots b and c.

Download Full Size | PDF

It is interesting to see that even with fewer Bscans, the Unet method, TrasUnet model, and PA-NeRF method can still achieve good reconstruction quality, while the direct interpolation method and BBP method fail to reconstruct the correct shape. This demonstrates the effectiveness of deep learning methods in handling the data insufficiency problem in PAT. Additionally, the results suggest that the Unet method, TransUnet model, and PA-NeRF method are more robust to the data variability caused by the rotation angles of the transducer array. The higher PSNR values of the PA-NeRF method also indicate that the model can better preserve the details of the reconstruction. Overall, the results demonstrate the potential of using deep learning methods for 3D PAT reconstruction with reduced data acquisition.

Because the ultimate goal of this research is to recover volume from fewer Bscans, we tested the reconstruction results with only two Bscans, using 0 and 90 degree Bscan data.

Based on the results shown in Fig. 8, the direct interpolation method only extends the two available Bscans along their positions, resulting in a poor reconstruction. The BBP method can reconstruct some parts of the letter “U”, but not the entire shape. With only two Bscans, the Unet model, and TransUnet model do not perform as well as with five or nine Bscans, because there is limited data for interpolation. On the other hand, the PA-NeRF model is still able to reconstruct the letter “U” shape, demonstrating its ability to learn the relationship between transducer element positions and the corresponding cross sections. However, the quality of the reconstruction is not as good as with five or nine Bscans, as indicated by the SSIM and PSNR values. Judging from the RMSE values in Table 2., the PA-NeRF model is much better than Unet and TransUnet. We can see that when there are enough Bscans to reconstruct the 3D volume, Unet, TransUnet and PA-NeRF have similar performance, but when we have very limited Bscans, PA-NeRF can still maintain a reasonable error where Unet falters. The PA-NeRF model does not interpolate between Bscans, but predicts the Bscans from transducer positions.

Fig. 8. a. 3D reconstruction results for a simulated letter ‘U’, from two Bscans. Columns and rows show the same information as in Fig. 6, as do plots b and c.

Download Full Size | PDF

3.2 Phantom results

To demonstrate the performance of the PA-NeRF model with real data, we conducted phantom studies using the letters U and S. Based on the simulation results, the Unet model and PA-NeRF model were selected to reconstruct the 3D volumes.

As shown in Fig. 9, the top-down projections of U and S have similar reconstructed shapes. However, from the volume profile and top-down projection images, the Unet model and TransUnet model can learn an interpolation to reconstruct a volume, therefore, there are blank regions between two Bscans. PA-NeRF also achieves a better volume reconstruction, as demonstrated by the SSIM values.

Fig. 9. Reconstruction comparisons using five Bscans for phantom letters “U”‘ (a, b, bordered in red) and “S” (c, d, bordered in blue). In the absence of a ground truth, US images were utilized as the benchmark in these phantom studies. For each letter, from left to right, the columns sequentially present the US image-based ground truth, and the reconstruction results from the Unet model, the TransUnet model, and the PA-NeRF method. From top to bottom, the rows for each letter display the volume profile, top-down projection, and cross-sectional reconstructions at angles of 30, 90, and 150 degrees. Figure 9 b, d. Corresponding SSIM values, calculated by comparing Bscan and US image reconstructions to their respective ground truths.

Download Full Size | PDF

Next, we challenged the Unet model and PA-NeRF model with a harder problem, reconstructing a volume from only two Bscans. The results are shown in Fig. 10.

Fig. 10. a, c Reconstructed images of phantom letters U and S, from only two Bscans. Columns and rows show the same information as in Fig. 9, as do plots b and c.

Download Full Size | PDF

Here, based on 0 and 90 degree Bscans, the Unet model barely reconstructs the shapes of the U and S. Where the Unet model apparently interpolates between given Bscans to form a 3D volume. Similar reconstruction from TransUnet, it can’t reconstruct with only 2 Bscans. The PA-NeRF model tries to learn the relationship between the transducer element’s positions and its corresponding Bscan beams. Hence, the PA-NeRF model is more generalizable and can better reconstruct the letter shape.

As can be seen in Table 3., unlike the the RMSE of simulation results, PA-NeRF performed better than Unet and TransUnet model, even with five Bscans possibly because the Unet model and TransUnet model are limited to the training data. However, the PA-NeRF model is finetuned with the phantom data and the learning relationship between the transducer elements and corresponding beamline is more generalizable than the interpolation.

Table 3. RMSE for simulation results

View Table | View all tables in this article

3.3 Clinical examples

For ovarian cancer diagnosis, imaging multiple views of lesions is impractically time-consuming. Physicians often take only longitudinal and transverse views, and our PA-NeRF model can reconstruct a 3D volume from the data of just two Bscans. For homogeneous cystic lesions, we assume that either one of the two Bscan images can represent both the longitudinal and transverse orientations. Under this assumption, although we captured PAT data from only one cross-section, this data can be representative of the lesion in two directions.

Below, we describe the reconstruction of two patients’ ovarian lesions. One lesion is reconstructed from longitudinal and transverse Bscan data, but to reconstruct the other, a homogeneous lesion, we used the data from only one Bscan.

The second patient has a relatively homogeneous lesion, so the longitudinal and transverse scans are quite similar and we can use just one longitudinal image to represent both orientations. The PA-NeRF model constructs a good volume profile, as shown in the co-registered US-PAT images, the central area is the cyst, with dense PA signals.

The longitudinal view in Fig. 11 reveals two distinct cysts, and the transverse view (rotating 90 degrees) shows the two cysts merged into one big lesion. These details are successfully reconstructed in the PA-NeRF output. Even though, the PA signals in the input images are sparse and appear in only one cyst area, the co-registered US and PAT images show dense PA signals in the shape of the lesion. Figure 12 presents a prediction that accurately captures the spherical shape of the target, consistent with US images. It depicts both longitudinal and transverse views, revealing the inner signal of the dark area, which is likely indicative of the lesion area. The PA-NeRF model effectively unveils the 3D volume shape and reconstructs the PA signal from a homogenous B-scan, offering valuable insights into the lesion's structural and functional characteristics.

Fig. 11. Volume reconstruction from a 76-year-old woman with simple cysts (the lesion is 3.1 cm in diameter). Left shows input images obtained from Bscan data (blue boxes mark the region of interest), with the corresponding PA-NeRF output volume in the middle. The volume shows in three different slices: front, top, and profile respectively. Right presents two views of the volume, and their US co-registered PAT images.

Download Full Size | PDF

Fig. 12. Volume reconstruction from a 57-year-old woman with a benign mucinous cystadenoma, 3 cm in diameter. Left shows Bscan images obtained from input Bscan data (blue boxes mark the region of interest), with the corresponding PA-NeRF output volume in the middle. The volume shows in three different slices: front, top, and profile respectively. Right presents two different views of the volume, with their US co-registered PAT images.

Download Full Size | PDF

3.4 Ablation study

In this section, we describe comprehensive ablation studies to identify the essential features contributing to the effectiveness of our PA-NeRF model. Our primary configuration, formulated in Eq. (1), utilizes Mean Squared Error (MSE) loss combined with forward loss as the primary loss function. Additionally, we experimented with substituting forward loss with Total Variation (TV) loss, as shown in Table 4. However, our findings indicate that TV loss did not enhance the model's performance, suggesting that noise and artifacts are not significant hindrances in reconstructing the 3D volume.

Table 4. Ablation over TV loss and forward loss

View Table | View all tables in this article

Moreover, we explored variations in the coefficient applied to the forward loss, testing values of 0 (effectively omitting forward loss), 0.01, 0.1, and 1. Our results, as presented in Table 5., demonstrate that a coefficient of 0.1 offers an optimal balance. This balance appears to be crucial for the model's success, highlighting the importance of fine-tuning loss function parameters in achieving desired outcomes.

Table 5. Ablation over coefficient of forward loss

View Table | View all tables in this article

4. Discussion

In this study, we implemented five different methods for 3D reconstruction from sparse Bscans, namely DI, BBP, Unet, TransUnet, and PA-NeRF, with the last being the first attempt to use the NeRF model in PAT reconstruction. The DI method is easy to implement and efficient in terms of time, but its reconstruction volume is highly limited by the quality of the Bscan images, and the reconstruction quality degrades with fewer Bscans. The BBP method produces smoother reconstruction, but with insufficient data, the back projection may result in a non-realistic output. With a sufficient number of Bscans, the Unet model usefully reconstructs a high-quality volume. However, because it learns only by interpolation, with only a few Bscans, the reconstruction may be poor. Furthermore, the Unet model is limited by the dataset, and if trained with simulation data, its output is limited to the simulation data. The TransUnet model is similar to the Unet model, which is limited to the dataset and can only interpolate the Bscans. Moreover, the transformers in the TransUnet model need a huge dataset for training, and our study does not have enough data to train the model to its best performance.

In contrast, the PA-NeRF model learns the relationship between a transducer element’s position and its corresponding beamformed data, yielding a more realistic reconstruction and enhanced generality, especially with a small number of Bscans. Moreover, we introduced a forward loss to improve the reconstruction quality. Both qualitative and quantitative results from simulation and phantom data show that the PA-NeRF method can reconstruct a high-quality volume.

We have tested the PA-NeRF model’s performance on simulation data and phantom targets, as well as in clinical experiments. Compared to the other three methods, the PA-NeRF model provides superior images of a lesion’s volume and profile, and it requires less input data. Further, quantitative results (SSIM, RMSE, and PSNR) validate the superiority of the PA-NeRF model. In clinical experiments, we found that the PA-NeRF model recovered the 3D structure of ovarian lesions quite well because the enhanced PA signal had a shape similar to that of the target, which is a huge improvement. For in vivo studies, PA-NeRF outperformed 3D Unet and TransUnet because it could learn the relationship between the transducer and the Bscan image. 3D Unet is comparatively more constrained by the limited view afforded by a single Bscan image or two Bscan images. PA-NeRF’s learning capability enables it to reconstruct the volume more effectively, even with a limited number of Bscans.

The PA-NeRF model, while offering substantial potential, has shown limitations in preliminary testing. Its efficacy has been gauged on a constrained dataset comprising a select set of phantoms, which raises questions about its applicability and generalizability to real-world medical scenarios. The model's inherent computational intensity could be a potential bottleneck, especially when processing high-resolution data or when it is expected to deliver in real-time clinical situations. A significant dependency on the quality and consistency of Bscans means that any artifacts or noise in the input could adversely impact the quality of the model's outputs. Another concern arises from potential patient movements during imaging sessions, introducing inconsistencies and challenges that the model may not be robust against. While we have touched upon its capability for qualitative 3D reconstruction, its prowess in delivering accurate quantitative measurements remains a grey area due to data limitations. Practical challenges, such as acquiring multi-angle Bscans and seamlessly integrating PA-NeRF into the current clinical imaging infrastructure, underscore the need for continuing improvements.

Funding

National Cancer Institute (RO1CA237664, R01CA228047).

Acknowledgements

This work was supported by two National Institutes of Health grants (R01 CA237664 and R01 CA228047). We thank the entire GYN oncology group at the Washington University School of Medicine and at Barnes Jewish Hospital for helping identify patients, and the Radiology team for helping with US scans. We thank Professor James Ballard for reviewing and editing the manuscript.

Disclosures

The authors declare that there are no conflicts of interest.

Data availability

The code and simulation/phantom data are available on GitHub [46].

References

1. M. Xu and L. V. Wang, “Photoacoustic imaging in biomedicine,” Rev. Sci. Instrum. 77(4), 041101 (2006). [CrossRef]

2. L. V. Wang, “Prospects of photoacoustic tomography,” Med. Phys. 35(12), 5758–5767 (2008). [CrossRef]

3. J. Yao and L. V. Wang, “Photoacoustic tomography: fundamentals, advances and prospects,” Contrast Media Mol. Imaging 6(5), 332–345 (2011). [CrossRef]

4. H.S. Salehi, H. Li, A. Merkulov, et al., “Coregistered photoacoustic and ultrasound imaging and classification of ovarian cancer: ex vivo and in vivo studies,” J. Biomed. Opt. 21(4), 046006 (2016). [CrossRef]

5. S. Nandy, A. Mostafa, I. S. Hagemann, et al., “Evaluation of ovarian cancer: initial application of coregistered photoacoustic tomography and US,” Radiology 289(3), 740–747 (2018). [CrossRef]

6. E. Amidi, G. Yang, K. M. S. Uddin, et al., “Role of blood oxygenation saturation in ovarian cancer diagnosis using multi-spectral photoacoustic tomography,” J Biophotonics. 14(4), e202000368 (2021). [CrossRef]

7. K. R. Cho, “Ovarian cancer update: lessons from morphology, molecules, and mice,” Arch. Pathol. Lab. Med. 133(11), 1775–1781 (2009). [CrossRef]

8. P.D. DePriest, D. Shenson, A. Fried, et al., “A morphology index based on sonographic findings in ovarian cancer,” Gynecol. Oncol. 51(1), 7–11 (1993). [CrossRef]

9. R. J. Kurman and I. M. Shih, “Pathogenesis of ovarian cancer. Lessons from morphology and molecular biology and their clinical implications,” Int. J. Gynecol. Path. 27(2), 151 (2008). [CrossRef]

10. D. Bernardi, P. Macaskill, M. Pellegrini, et al., “Breast cancer screening with tomosynthesis (3D mammography) with acquired or synthetic 2D mammography compared with 2D mammography alone (STORM-2): a population-based prospective study,” The Lancet Oncology 17(8), 1105–1113 (2016). [CrossRef]

11. E. Z. Zhang, J. Laufer, and P. Beard, “Three-dimensional photoacoustic imaging of vascular anatomy in small animals using an optical detection system,” Proc. SPIE 6437, 64370S (2007). [CrossRef]

12. H. F. Zhang, K. Maslov, G. Stoica, et al., “Functional photoacoustic microscopy for high-resolution and noninvasive in vivo imaging,” Nat. Biotechnol. 24(7), 848–851 (2006). [CrossRef]

13. K. H. Song, G. Stoica, and L. V. Wang, “In vivo three-dimensional photoacoustic tomography of a whole mouse head,” Opt. Lett. 31(16), 2453–2455 (2006). [CrossRef]

14. C. G. A. Hoelen, F. F. M. de Mul, R. Pongers, et al., “Three-dimensional photoacoustic imaging of blood vessels in tissue,” Opt. Lett. 23(8), 648–650 (1998). [CrossRef]

15. J.-T. Oh, M.-L. Li, H. F. Zhang, et al., “Three-dimensional imaging of skin melanoma in vivo by dual-wavelength photoacoustic microscopy,” J. Biomed. Opt. 11(3), 034032 (2006). [CrossRef]

16. S. Manohar, S. E. Vaartjes, J. G. C. van Hespen, et al., “Region-of-interest breast images with the Twente Photoacoustic Mammoscope (PAM),” Proc. SPIE 6437, 643702 (2007). [CrossRef]

17. S. Vaithilingam, I. O. Wygant, P. S. Kuo, et al., “Capacitive micromachined ultrasonic transducers (CMUTs) for photoacoustic imaging,” Proc. SPIE 6086, 608603 (2006). [CrossRef]

18. P. Guo, J. Gamelin, S. Yan, et al., “Co-registered 3-D ultrasound and photoacoustic imaging using a 1.75D 1280-channel ultrasound system,” Proc. SPIE 6437, 643713 (2007). [CrossRef]

19. A. Maier, C. Syben, T. Lasser, et al., “A gentle introduction to deep learning in medical image processing,” Zeitschrift für Medizinische Physik 29(2), 86–101 (2019). [CrossRef]

20. I. R. I. Haque and J. Neubert, “Deep learning approaches to biomedical image segmentation,” Informatics in Medicine Unlocked 18, 100297 (2020). [CrossRef]

21. M.H. Hesamian, W. Jia, X. He, et al., “Deep learning techniques for medical image segmentation: achievements and challenges,” Journal of digital imaging 32(4), 582–596 (2019). [CrossRef]

22. C. Bock, M. Moor, C. R. Jutzeler, et al., “Machine learning for biomedical time series classification: from shapelets to deep learning,” Artificial Neural Networks 2190, 33–71 (2021). [CrossRef]

23. A. Hauptmann and B. Cox, “Deep learning in photoacoustic tomography: current approaches and future directions,” J. Biomed. Opt. 25(11), 112903 (2020). [CrossRef]

24. S. Antholzer, M. Haltmeier, and J. Schwab, “Deep learning for photoacoustic tomography from sparse data,” Inverse Probl. Sci. Eng. 27(7), 987–1005 (2019). [CrossRef]

25. S. Guan, A. A. Khan, S. Sikdar, et al., “Limited-view and sparse photoacoustic tomography for neuroimaging with deep learning,” Sci. Rep. 10(1), 8510 (2020). [CrossRef]

26. P. Rajendran and M. Pramanik, “Deep learning approach to improve tangential resolution in photoacoustic tomography,” Biomed. Opt. Express 11(12), 7311–7323 (2020). [CrossRef]

27. H. Zhang, H. Li, N. Nyayapathi, et al., “A new deep learning network for mitigating limited-view and under-sampling artifacts in ring-shaped photoacoustic tomography,” Computerized Medical Imaging and Graphics 84, 101720 (2020). [CrossRef]

28. B. Mildenhall, P.P. Srinivasan, M. Tancik, et al., “NeRF: Representing scenes as neural radiance fields for view synthesis,” Commun. ACM 65(1), 99–106 (2021). [CrossRef]

29. A. Corona-Figueroa, J. Frawley, S. Bond-Taylor, et al., “Mednerf: Medical neural radiance fields for reconstructing 3d-aware ct-projections from a single x-ray,” 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE, (2022).

30. H. Li, H. Chen, W. Jing, et al., “3D Ultrasound Spine Imaging with application of Neural Radiance Field Method,” in 2021 IEEE International Ultrasonics Symposium (IUS), (2021).

31. D. Rückert, Y. Wang, R. Li, et al., “NeAT: Neural Adaptive Tomography,” arXivarXiv:2202.02171v1, (2022). [CrossRef]

32. M. Wysocki, M.F. Azampour, C. Eilers, et al., “Ultra-NeRF: Neural Radiance Fields for Ultrasound Imaging,” arXiv, arXiv:2301.10520 (2023). [CrossRef]

33. B. T. Cox, J. G. Laufer, P. C. Beard, et al., “Quantitative spectroscopic photoacoustic imaging: a review,” J. Biomed. Opt. 17(6), 061202 (2012). [CrossRef]

34. M. Luriakose and M. A. Borden, “Microbubbles and Nanodrops for photoacoustic tomography,” Current Opinion in Colloid & Interface Science 55, 101464 (2021). [CrossRef]

35. V. P. Nguyen and Y. M. Paulus, “Photoacoustic ophthalmoscopy: Principle, application, and future directions,” Journal of imaging 4(12), 149 (2018). [CrossRef]

36. L. Lin, P. Hu, X. Tong, et al., “High-speed three-dimensional photoacoustic computed tomography for preclinical research and clinical translation,” Nat Commun. 12(1), 882 (2021). [CrossRef]

37. Y. Duan, Z. Cheng, T. Qiu, et al., “Spherical-matching hyperbolic-array photoacoustic computed tomography,” J Biophotonics. 14(6), e202100023 (2021). [CrossRef]

38. S. Na and L. V. Wang, “Photoacoustic computed tomography for functional human brain imaging [invited],” Biomed. Opt. Express 12(7), 4056–4083 (2021). [CrossRef]

39. B. T. Cox, S. R. Arridge, and P. C. Beard, “Photoacoustic tomography with a limited-aperture planar sensor and a reverberant cavity,” Inverse Problems. 23(6), S95–S112 (2007). [CrossRef]

40. M. Xu and L. V. Wang, “Universal back-projection algorithm for photoacoustic computed tomography,” Phys. Rev. E 71(1), 016706 (2005). [CrossRef]

41. J. Chen, Y. Lu, Q. Yu, et al., “Transunet: Transformers make strong encoders for medical image segmentation,” arXiv, arXiv:2102.04306 (2021). [CrossRef]

42. S. Yan and Q. Fang, “Hybrid mesh and voxel based Monte Carlo algorithm for accurate and efficient photon transport modeling in complex bio-tissues,” Biomed. Opt. Express 11(11), 6262–6270 (2020). [CrossRef]

43. B. E. Treeby and B. T. Cox, “k-wave: Matlab toolbox for the simulation and reconstruction of photoacoustic wave fields,” J. Biomed. Opt. 15(2), 021314 (2010). [CrossRef]

44. H. S. Salehi, T. Wang, P. D. Kumavor, et al., “Design of miniaturized illumination for transvaginal coregistered photoacoustic and ultrasound imaging,” Biomed. Opt. Express 5(9), 3074–3079 (2014). [CrossRef]

45. G. Yang, E. Amidi, W. Champman, et al., “Co-registered photoacoustic and ultrasound real-time imaging of colorectal cancer: ex-vivo studies,” Photons Plus Ultrasound: Imaging and Sensing 2019, Vol. 10878, International Society for Optics and Photonics, 2019.

45. Y. Zhou, Y. Lin, Q. Zhou, et al., “Code,” Optical ultrasound imaging, 2024, https://github.com/OpticalUltrasoundImaging/PA-NeRF.

PARAMETER, UNIT	VALUES	COUNT
TARGET DIAMETER, CM	2, 2.5, 3	3
TARGET DEPTH, CM	2, 2.5, 3, 3.5, 4	5
LETTERS	‘W’, ‘A’, ‘S’, ‘H’, ‘U’	5
ROTATION ANGLES, °	0, 10, 20, …, 170	18

NUM OF BSCAN	DI	BBP	UNET	TRANSUNET	PA-NERF
9	0.099	0.113	0.019	0.020	0.024
5	0.118	0.129	0.022	0.024	0.025
2	0.183	0.176	0.084	0.094	0.031

PHANTOM	NUM OF BSCAN	UNET	TRANSUNET	PA-NERF
‘U’	5	0.128	0.117	0.109
‘U’	2	0.241	0.263	0.155
‘S’	5	0.227	0.231	0.126
‘S’	2	0.310	0.319	0.144

SSIM	5 BSCAN	2BSCAN
RMSE ONLY	0.757 ± 0.094	0.751 ± 0.074
RMSE + TV	0.775 ± 0.102	0.791 ± 0.084
RMSE + FORWARD	0.880 ± 0.027	0.875 ± 0.039

COEFFICIENT	5 BSCAN	2BSCAN
0	0.757 ± 0.094	0.751 ± 0.074
0.01	0.775 ± 0.102	0.791 ± 0.084
0.1	0.880 ± 0.027	0.875 ± 0.039
1	0.790 ± 0.067	0.766 ± 0.052

PA-NeRF, a neural radiance field model for 3D photoacoustic tomography reconstruction from limited Bscan data

Abstract

1. Introduction

2. Methodology

2.1 PA-NeRF

2.2 Direct interpolation

2.3 Bscan-based back projection (BBP)

2.4 Convolutional neural network

2.5 TransUnet

2.6 Generating simulation and experimental data for model training

2.6.1 Simulation data

2.6.2 3D Phantom data

2.7 Real-time US-PAT system

2.8 Image quality assessment

2.8.1 SSIM

2.8.2 PSNR

2.8.3 RMSE

3. Results

3.1 Simulation results

3.2 Phantom results

3.3 Clinical examples

3.4 Ablation study

4. Discussion

Funding

Acknowledgements

Disclosures

Data availability

References

Data availability

Cited By

Figures (12)

Tables (5)

Equations (6)

Biomedical Optics Express