PDTANet: a context-guided and attention-aware deep learning method for tumor segmentation of guinea pig colorectal OCT images

Jing Lyu; Jing Lyu; Jing Lyu; Lin Ren; Lin Ren; Qinying Liu; Qinying Liu; Yan Wang; Yan Wang; Zhenqiao Zhou; Zhenqiao Zhou; Zhenqiao Zhou; Yueyan Chen; Yueyan Chen; Hongbo Jia; Hongbo Jia; Hongbo Jia; Yuguo Tang; Yuguo Tang; Yuguo Tang; Yuguo Tang; Min Li; Min Li; Min Li; Min Li

doi:10.1364/OPTCON.493630

1. Introduction

Colorectal cancer has become the third most common cancer in the world. In terms of mortality, it ranks second in the world. Common screening methods for colorectal cancer include colonoscopy, computed tomography colonography (CTC), magnetic resonance imaging (MRI) [1,2]. Different colorectal cancer screening methods have their own advantages and disadvantages. Flexible endoscopy is the most commonly used method to monitor colon malignant tumors. However, because small or fixed lesions are usually not easily detected by the naked eye, early malignant tumors are easy to be overlooked [3]. Although the CTC method has clear imaging, it has the disadvantage of high radiation [4]. MRI is superior to CT in some aspects, especially in the absence of ionizing radiation. However, MRI has low spatial resolution, a long inspection time, and is not as easy to operate as CT [5]. Therefore, the imaging technique for colorectal cancer screening, especially for early cancer screening, still has a lot of research spaces.

Optical coherence tomography (OCT) is a non-invasive, high-resolution imaging technique for biological tissues, which is another technological breakthrough after CT and MRI [6]. OCT is a unique imaging method that can combine nearly microscopic resolution with volumetric and subsurface real-time imaging capabilities [7]. These features enable OCT to address many of the shortcomings of other gastrointestinal imaging techniques [8], such as traditional camera endoscopes of the upper digestive tract [9–16] or large intestine [17–19]. As an “optical biopsy” tool [20–23], OCT can aid in the accurate differentiation of abnormal and normal tissues in multiple organs in both murine and human colorectal models [24–27]. The OCT technique shows great value for the early diagnosis of early gastrointestinal tumors, and has the potential to become a new screening technology for early tumors. Meanwhile, OCT has great potential for guiding intraoperative tumor surgery.

Although OCT technology shows great potential for clinical application, the accurate segmentation of OCT images still has some challenges, especially for hollow tissues such as the gastrointestinal tract, due to the influence of speckle noise, uneven gray levels and artifacts [28]. As a result, the development of an intelligent OCT image analysis system and the generation of more objective and quantitative analysis results will be very significant for medical clinical applications. One of the key steps is to accurately segment the features of interest in the image, and calculate the tissue characteristics, including shape, area, volume, thickness, and eccentricity [29]. Based on the advantages of deep learning, segmentation of endoscopic gastrointestinal tract OCT images with deep learning model has become an increasingly active research field.

In recent research, several popular deep learning methods have been applied for layer segmentation of endoscopic OCT images. Li et al. [30] connected multiple U-Nets in parallel to segment esophageal layers in OCT images of guinea pigs. This method emphasizes the parallel training of multiple models to improve the prediction of organizational hierarchical topology, so the process of model training and prediction becomes more complex. Wang et al. [31] proposed a self-attention network named TSA-Net, which is applied to the automatic tissue layer segmentation of OCT images for the guinea pig esophagus. They added channel and position attention to Unet baseline to capture the global context dependency of OCT images and achieve high segmentation accuracy. However, TSA module requires a lot of memory to calculate the self-attention feature map, leading to a greater burden on computing equipment. In another research Wang’s team also used a novel adversarial convolutional network (ACN) with a generator and a discriminator to gradually improve classification ability during the process [32]. The adversarial learning had good performance in learning topological relationships; however, the model missed the global tissue structure of OCT images because it took slices as inputs. Yang et al. [33] proposed a Bicon-CE network structure based on bilateral connectivity for cortical tissue segmentation of human esophageal OCT images. The segmentation task was defined by the model as a combination of pixel connectivity modeling and pixel-based tissue classification, reducing topological issues such as segmentation result fracture and abnormal prediction. This model is only applied to the segmentation problem of dichotomies. When this model is applied to multi-classification problems, more parameters and more complexity in the model structure need to be added. The above studies mainly focused on tissue layer segmentation of endoscopic OCT images, but there are few published research works related to tumor recognition and segmentation. Zeng et al. [34] applied deep learning technology to colon OCT tumor diagnosis. In their work, a deep learning-based pattern recognition OCT system was designed to automate image processing and distinguish normal colonic mucosa from tumors in OCT images of the human colon. But they mainly focused on pattern recognition without paying attention to accurate segmentation and further quantitative analysis.

In our research, we used some popular deep learning methods to segment colorectal tissue and tumors from guinea pig OCT images. We proposed a novel codec network, PDTANet (Parallel Dilated Triplet Attention Net) to solve the problem of accurate segmentation of OCT endoscopic images. The core idea of the PDTANet proposed by us is to combine the results of the classical U-shape network with the parallel extended feature fusion module and the triple attention module, our U-shape network framework can extract global context information and key information more effectively. Our contributions are as follows:

We propose the parallel-dilated feature fusion module as an assistant for the U-shaped baseline. The parallel-dilated feature fusion module is designed to extract the features of multi-scale receptive fields by using multiple parallel dilated convolutions. The main goal is to aggregate the context information of different regions and improve the ability of network to obtain global information. The module is added to the encoder structure at several layers of the baseline, and the output feature maps of the module are integrated to the feature maps passed from the upper layer after pooling. The encoder with global context aggregation module improves feature characterization ability and can update learning parameters more effectively according to the feedback of loss function value.

We apply a lightweight and effective attention mechanism named triplet attention (TA) to work with the baseline decoder. The key principle of the TA module is to calculate attention weights by using triplet branch structures that capture cross-dimensional interactions of encoding channel and spatial information with almost negligible computational overhead. Several TA modules are inserted into the different layers of the encoder structure, expecting to improve the model’s ability to pay attention to important information like the small tumor region with a small computational cost.

We also propose the adoption of a weighted hybrid loss function in our segmentation task of OCT images. The hybrid loss function supervises not only the region but also the boundary segmentation, which outperforms the traditional region-based loss functions such as the Dice loss function.

2. Proposed method

2.1 PDTANet architecture

An illustration of our proposed network is shown in Fig. 1. There are many state-of-the-art implementations based on encoder-decoder framework, however, Unet shows great potential in medical image segmentation. In our work, the Unet structure is applied as the baseline, which consists of an encoder path for feature extraction and a decoder path for object segmentation. Furthermore, we attempt to add two additional modules in order to improve the effectiveness of the basic baseline. The essence of the proposed architecture, named PDTANet, is a combination of encoder-decoder baseline containing a global context aggregation module and a lightweight attention mechanism module. The proposed PDTANet network is expected to learn rich and comprehensive feature information in the coding stage and pays better attention to the key information in the decoding stage to recover the feature map.

Fig. 1. PDTANet architecture.

Download Full Size | PDF

The architecture of the network can be described as follows:

The network consists of two main components: PDTANet and TripletAttention. PDTANet as the main model is composed of a series of DoubleConv blocks, upsampling layers (nn.ConvTranspose2d) and a final convolutional layer. Each DoubleConv block consists of two consecutive convolutional layers and has a ReLU activation function. TripletAttention as a module is used in PDTANet to spatial and channel-wise attention, it contains two instances of AttentionGate, which applies attention mechanisms on different dimensions of the input. AttentionGate as a module compresses the input using max-pooling and average-pooling operations and applies a convolutional layer with sigmoid activation to attention scaling. The network architecture also includes additional convolutional layers and multi-scale inputs pooling operations. The input of the network architecture is an image with three channels (in_ch = 3) and the output is a single-channel mask (out_ch = 1).

The loss function is used in deep learning to quantify the discrepancy between the model's predicted value and the actual value in direct the network model's iterative process. BCE, IoU, Dice, and other commonly used loss functions have been studied for semantic segmentation. However, the key drawback of these loss functions is unable to punish boundary missegmentation in a targeted way. In our study, we discovered that the hybrid loss function integrating border-level and region-level not only segments boundary but also supervises region learning, which satisfies our requirements for accurate segmentation. We employ two loss functions: Dice loss, which primarily supervises region learning, and Hybrid loss, which simultaneously supervises region and border learning. After conducting tests, we discovered that the Dice los + Hybrid loss in the segmentation achieves precise segmentation.

2.2 Encoding network

2.2.1 Parallel-dilated feature fusion (PDFF) module

The encoder plays an important role in understanding the content of an image in encoder-decoder architecture. As an image feature extractor, the encoder network, mainly consisting of convolution and pooling operations, has limitations that hinder the model from getting a better segmentation effect. The convolution operations can change the receptive field by adjusting the size of the convolution kernel, but they mainly describe the local core area and ignore the context inevitably. In fact, accurate semantic segmentation tasks necessitate multi-scale context reasoning to eliminate image segmentation topology errors caused by local region attention. The pooling operation can effectively increase the receptive field of neural network, but there is the problem of reducing the image resolution. Because information loss caused by pooling operation is irreversible, it is impossible for the network to reconstruct small object information again. In fact, accurate semantic segmentation tasks always require rich context information. An enhancement to the encoder network can allow for more context information learning and have a positive impact on image feature extraction. In this work, the parallel-dilated feature fusion module (PDFF) consisting of dilated convolutional layers and multi-field feature fusion is adopted to increase the receptive field and reduce information loss. The outputs of dilated convolutional layers with different rates would be concatenated with the main feature maps passed down from the upper layer of encoder, and form multi-field features.

The core concept of PDFF module is the application of the concept of dilated convolution. Dilated convolution is a convolution idea proposed to solve the problem of image semantic segmentation, in which pooling will reduce image resolution and lose information. Although max pooling may be replaced by dilated convolution with stride, the PDFF module is utilizing max pooling in this case for performance and compatibility considerations. The information loss was made up for by adding additional dilated convolution layers and a feature fusion technique. Due to the nature of dilated convolution, the continuity of image information may be lost, but we take the features extracted by dilated convolution as a feature supplement, which will not have a great negative impact on the main features. Dilated convolution controls the acceptance scale by using different rates while maintaining the same resolution as the upper layers passed feature maps. The structure of the PDFF module is shown in the Fig. 2. The input of proposed PDFF module is obtained from the upper input but with a different resolution, which will be down-sampled by average-pooling with 2 × 2 kernels and a stride of 2, i.e., to the same dimension but half-resolution compared to the input image. Then, the half-resolution image is processed by parallel dilated convolution by four layers with different rates. The dilated convolution with a rate of 1 is equivalent to the standard convolution, while the other three rates change the receptive field from 3 × 3 expanded to 5 × 5 and 7 × 7. Actually, the rates can be adjusted flexibly according to the input size at each layer. Finally, the outputs generated from four parallel paths are fused with the main feature maps passed from the upper pooling layer.

Fig. 2. Parallel-dilated feature fusion (PDFF) module.

Download Full Size | PDF

2.2.2 Encoder based on PDFF module

In semantic segmentation, accurate segmentation requires models with powerful feature extraction and expression capabilities, especially in the field of OCT image segmentation. This is still challenging. As a feature extractor in segmentation tasks, the encoding network of baseline adopts the PDFF module to capture global context information and extract rich multi-scale information effectively. As shown in Fig. 3, the DoubleConv operation includes a sequence of Conv2d, BatchNorm2d, and ReLU repeated for twice. When the PDFF module receives its input image, four groups of feature maps with different receptive fields will be extracted, and a feature fusion will be performed subsequently. As a multi-input network model based on multi-resolution inputs for network training has been proven effective [35–38], the PDFF module has been introduced to the middle three of the five layers of encoding network which aims to receive multi-input streams and obtain multi-scale context-aware information. The multi-input images for each layer are obtained from the original image with descending resolutions layer by layer depending on average pooling operation. The combination of multi-inputs and multiple PDFF modules help the encoding network to keep a global field view for as long as possible to learn rich context information and more spatial information, such as the layer structure of tissue and tumor.

Fig. 3. The structure of encoding network based on PDFF module.

Download Full Size | PDF

2.3 Decoding network

2.3.1 Triplet attention (TA) module

In deep learning, attention mechanisms is one of the core technologies that deserve the most attention and in-depth understanding. The Attention mechanism has been widely used in various applications of deep learning, including image segmentation, speech recognition, natural language processing, and so on. In the area of image segmentation, many researchers have proposed different types of attention mechanisms. Among them, channel and spatial attention are often used, such as in sequence and exception networks (SENet) [39], associative block attention module (CBAM), triplet attention (TA), etc. CBAM has additional spatial attention and archives better performance, compared with the sequence and exception (SE) block, which focuses on channel attention. In CBAM, channel attention and spatial attention are computed independently, ignoring relationships between these two domains. TA is an effective attention mechanism for capturing cross-dimension interactions. Compared with CBAM, TA is much lighter and needs fewer parameters to establish the dependency between channels. TA can encode channel and spatial information with negligible computational overhead and capture rich feature expressions. The purpose of learning attention weights is to make the network capable of learning where to focus and further keeping focus on key objects. The TA module can be easily added to an encoder or decoder network, providing significant performance benefits of segmented networks with a reasonable computational overhead. We implement the TA module with a same structure proposed by Diganta et al. [40], which is mainly composed of three branches as shown in Fig. 4. Each branch computes the attention weights and produces attention features for further fusion. Given an input tensor with shape (C × H × W), the first two branches are used to establish the interaction between dimension H or W and channel C, while the third branch is used to establish the interaction between dimension H and W. Finally, the attention features obtained from the three branches are fused together to represent the features that pay more attention to the key information. In the last step of feature fusion, we set a higher proportion for the third branch, which is different from convolutional block attention module (CBAM) [41].

Fig. 4. Triplet Attention (TA) mechanism. (a) Z-Pool. (b) Basic-Conv. (c) TA module.

Download Full Size | PDF

2.3.2 Decoder based on TA

The decoding network utilizes deconvolution to carry out up sampling layer-by-layer, and the spatial information and the edge information will be gradually restored for the original image. Therefore, the low resolution feature maps will eventually be mapped to the pixel level segmentation result map. In order to further make up for the lost information in the encoding stage, a skip connection structure between the encoder and decoder of baseline has been applied to fuse the feature map at the corresponding positions of encoding and decoding As a result, the decoder can obtain more high-resolution information during up sampling, improve the restoration of detail information in the original image, and improve segmentation accuracy. However, the skip connection lacks the ability to filter key information, despite being extremely useful in obtaining a large amount of rich multi-scale and multi-level data.

Attention is a popular method that can strengthen important information in applications of deep learning. In order to find salient regions of OCT images more effectively, the TA module has been added to the decoding network in this work. As can be seen from Fig. 5, the proposed attention mechanism puts spatial and channel attention on the convolution feature maps in the three layers of the five layers due to its lightweight properties. The addition of TA helps the model pay more attention to the target ROI (Region of Interest) and eliminate irrelevant background interference. Just like most attention mechanisms, the TA module can also keep the input and output the same size, which is very convenient for plug and play.

Fig. 5. The structure of decoding network based on TA module.

Download Full Size | PDF

3. Weighted hybrid loss function

In the field of deep learning, the important role of loss function is to effectively guide network to learn and optimize the parameters of model during the period of back-propagation, so the selection of an appropriate loss function is one of the most important steps [42]. In colorectal OCT images, the early tumor is usually imbalanced compared with the background. This data imbalance problem has been solved to a certain extent by designing some specific training loss functions in some previous work, among which Dice loss is a typical representative and is widely used in medical image segmentation [43]. In semantic segmentation, Dice loss is used to measure the similarity between the predicted segmentation and the ground truth. We use the Dice loss to deal with the imbalance between tumor tissue and background tissue. The formal description of the Dice coefficient is shown in Eq. (1) . And the Dice Loss is defined based on the Dice coefficient.

(1)$$DiceCoef = \frac{{2|{{X_{gt}} \cap {Y_{pred}}} |}}{{|{{X_{gt}}|+ |{Y_{pred}}} |}}.$$

(2) $$DiceLoss = 1 - DiceCoef.$$

In the application of tumor segmentation of colorectal OCT images, it is usually necessary to distinguish tissue, tumor, and other contents, so it is very important to accurately extract these boundaries. However, the widely used Dice loss function does not have the ability to fully punish the misalignment of boundaries. Therefore, the single selection of Dice loss may not enable the model to learn the boundary effectively. In order to learn segmentation boundaries more accurately, we adopted the boundary loss proposed by Bokhovkin shown in Eq. (3). Let $\textrm{S}_{gt}^\textrm{c}$ and $\textrm{S}_{gt}^\textrm{c}$ be the binary maps of class c in the ground truth and predicted segmentation, respectively, $B_{gt}^\textrm{c}$ and $B_{pd}^\textrm{c}$ represent the boundaries for these binary maps. Then the precision and the recall for class c are defined as follows:

(3)$${P^c} = \frac{1}{{|{B_{pd}^c} |}}\mathop \sum \nolimits_{x \in B_{pd}^c} [{[{d({x,B_{gt}^c} )< \theta } ]} ],{R^c} = \frac{1}{{|{B_{gt}^c} |}}\mathop \sum \nolimits_{x \in B_{gt}^c} [{[{d({x,B_{pd}^c} )< \theta } ]} ].$$

Here, distance is calculated from a point to one of two sets (ground truth or the predicted boundary) as the shortest distance from the point to the point of the boundary. The value of hyper-parameter $\theta $ must not be more than the minimum distance between adjacent segments of the binary ground truth map. The $BF_1^c $ measurement for each class is defined as:

(4)$$BF_1^c = \frac{{2{P^c}{R^c}}}{{{P^c} + {R^c}}}.$$

(5) $$BoundaryLoss = 1 - BF_1^c.$$

Finally, a combination of dice and boundary related loss function is proposed as the weighted hybrid loss function as shown in Eq. (6). Among them, α is the weight used to allocate different loss functions during the training period, we can dynamically adjust the value of α to control the proportion of each loss’s role.

(6)$$HybridLoss = \alpha \times DiceLoss + ({1 - \alpha } )\times BoundaryLoss.$$

4. Experiments

4.1 Datasets

As shown in the Fig. 6, We used the self-designed swept-source optical coherence tomography (SS-OCT) equipment to scan and image the colon tissue of tumor mice in vitro. The SS-OCT system uses the Michelson interference principle to detect the weak reflective signal of the sample and the Fourier-transform principle to reconstruct the sample structure. The linear-k sweep-frequency laser-light source (λ₀= 1310 nm, Δλ = 110 nm, AXSUN, USA) used in this study has a scanning frequency of 200 kHz, and the maximum sampling rate of the data-acquisition card (12-bit dual-channel, ATS-9373, Alazartech, Canada) used in this study can reach 4 GS/s; both the capture card and GPU use a PCIe Gen3 x16 interface for memory transfer with the central processing unit (CPU). The endoscopic probe is mainly composed of single-mode optical fiber and a GRIN lens, with a diameter of 1.2 mm, the final axial resolution of the system is 7.3 μm, the actual imaging depth of the system in the air is 4.5 mm, and the highest sensitivity is 110 dB [44]. The working distance is 1.7 mm. The colorectal tissue is a C57BL/6-Apc^{em1(L850X)Smoc} tumor mouse purchased from Shanghai Nanfang Model Biotechnology Co., Ltd. The company has provided colorectal anatomy and pathological test results. The pathological test results show that the intestinal epithelium of Apc^{em1(L850X)Smoc} mutant mice can see hyperplasia of adenocarcinoma tissue, the blue arrow indicates that the nucleus is deeply stained, and the gland is hyperplasia, which is intestinal adenoma, as shown in the Fig. 6(e-f). Finally, we inserted the probe into the intestinal tissue and carried out high-speed scanning to obtain 2D images.

Fig. 6. Imaging colorectal tissue with SS-OCT system.(a) SS-OCT system; (b) Self-developed probe; (c) Colorectal tissue; (d) Colorectal tissue with probe inserted;(e-f) Pathological results of colorectal tumors.

Download Full Size | PDF

At the same time, we use the precision sliding table to control the Z-direction movement of the probe to obtain images in tumor and non-tumor areas, and the OCT 2D images reflecting the colorectal mucosa; intrinsic muscle layer, and other tissue levels in mice were obtained as the data source for this study. A total of 2100 images with a resolution of 3950 × 1024 were obtained including 848 images in normal state and 1252 images in tumor state. Firstly, the acquired OCT images were first cropped to recover the ROI automatically using an algorithm named Fixed Bounding Box and uniformly adjusted to a size of 1024 × 128. Secondly, the medical image processing software ImageJ is used for contrast enhancement. Thirdly, the data augmentation methods such as distortion, flipping, blurring, and brightness adjustment are adopted to prevent overfitting and improve the robustness of the model, and then expand the original 2100 data sheets to 30000 sheets for training. The processed images reserve the original image feature and meanwhile provide the difference with the original image. Finally, all the data manually labeled by a medical expert is divided into training set, validation set and test set by random sampling with a proportion of 3:1:1 as we show in Dataset 1 (Ref. [45]). As shown in the Fig. 7, the processed OCT images contain both normal and abnormal tissues. Normal tissues usually have a clear texture balance layer, while tissues with tumors have an uneven texture.

Fig. 7. OCT images of normal colorectal tissue and colorectal tumor after processing.(a)Normal colon tissue; (b-d) Tumors of varying degrees in the colon.

Download Full Size | PDF

4.2 Implementation details

In this study, model training and segmentation is implemented on an ubuntu18.04 system, and the graphics card an NVIDIA GeForce RTX 3090. The GPU is used in the training process, and Pytorch, one of the most popular deep learning frameworks, is used as the backend for model training with CUDA 11.3. Training epochs is set as 500. For training, the Adam [46] optimizer is used, and the learning rate is set to 0.0001 which will attenuate once every 100 epochs with an attenuation weight as 0.5. The hybrid loss function which can simultaneously implement region and boundary supervision, is applied to learn proper model parameters, and the weight α between dice and boundary loss is set to 0.5. In order to illustrate the effectiveness of Hybrid Loss, Dice Loss function has also been studied in some experiments. In the process of model training, the model weights are initialized randomly first, and then the model gradually converges after training. The model with the minimum loss on the verification dataset is saved as the best model for segmentation. All of the experiments in this article were performed under the implementation details of this section.

4.3 Evaluation metrics

In the work involved in this paper, we first consider region-based Dice similarity coefficient (DSC) and boundary-based Hausdorff distance (HD) to evaluate the predictive power of the models. DSC is sensitive to the internal filling of the mask and is used to evaluate the segmentation performance based on the overlap area; while HD is an informative criterion to measure the boundary accuracy of the segmentation result. The DSC and HD are defined as Eq. (7) and Eq. (8) [32], where A and B represent the ground truth label and the segmentation result, respectively. In Eq. (9), d(a, b) indicates the Eula distance of points a and b.

(7)$$\textrm{DSC(}A,B) = 2 \times \frac{{|{A \cap B} |}}{{|A |+ |B |}}$$

(8)$$\textrm{HD}({A,B} )= \textrm{max}\left\{ {\mathop {\textrm{max}}\nolimits_{a \in A} \mathop {\textrm{min}}\nolimits_{b \in B} d({a,b} ),\mathop {\textrm{max}}\nolimits_{b \in B} \mathop {\textrm{min}}\nolimits_{a \in A} d({a,b} )} \right\}$$

However, DSC is not sensitive enough to small changes for large regions as the overlap is also large, which makes it difficult to evaluate the segmentation of exact borders. Stanislav et.al [47] proposed surface DSC to overcome the shortcomings of traditional DSC which focuses on assessing the overlap of the borders of two regions. As mentioned in their work, the definition of surface DSC at a tolerance τ is shown as Eq. (9).

(9)$$R_{i,j}^{(\tau )} = \frac{{|{{\mathrm{{\cal S}}_i} \cap \mathrm{{\cal B}}_j^{(\tau )}} |+ |{{\mathrm{{\cal S}}_j} \cap \mathrm{{\cal B}}_i^{(\tau )}} |}}{{|{{\mathrm{{\cal S}}_i}} |+ |{{\mathrm{{\cal S}}_j}} |}}$$

where ${\mathrm{{\cal S}}_i}$ and ${\mathrm{{\cal S}}_j}$ indicates the surfaces, $\mathrm{{\cal B}}_i^{(\tau )}$ and $\mathrm{{\cal B}}_j^{(\tau )}$ indicates border region at the given tolerance, τ which is fixed as 95$\textrm{\%}$ in this study.

4.4 Segmentation performance

4.4.1 Implement segmentation task

During OCT image segmentation, we use the prepared training set and validation set, and retain the model with the highest index score on the verification set, and use it to segment the OCT image test set. Figure 8 shows two examples after segmentation. Figure 8(a) shows the contour of the segmentation result on the normal image, and Fig. 8(b) and (c) show the contour of the segmentation result on the tumor image. It can be seen that the model and loss function used in this paper have well realized intestinal tissue segmentation and tumor recognition based on OCT images.

Fig. 8. Segmentation of OCT images. (a)Segmentation and contour display of normal case; (b)Segmentation and contour display of case with tumor; (c) Segmentation and contour display of case with large tumor.

Download Full Size | PDF

4.4.2 Comparison with alternative methods

In order to better demonstrate the effectiveness of the modules used in the model structure in this paper, we carried out two groups of experiments to study the OCT image segmentation effect after combining different loss functions and different models. In the following experiments, we used three comparable models to compare the segmentation effect. The first is the most typical Unet model in the field of medical segmentation, and the second is the TSA-Net model proposed by Wang et al. (As mentioned in Chapter I [31]). They have applied this model to the study of tissue stratification in guinea pig OCT images, and achieved good results. The third is the PDTANet model proposed in our paper.

4.4.3 Comparison with different loss functions

We use two loss functions, one is Dice Loss, which mainly supervises regional learning, and the other is Hybrid Loss, which is responsible for simultaneously supervising regional and boundary learning. These two loss functions are combined with three different models to carry out relevant tests. After the experiment, we found that although the traditional Dice loss has a certain monitoring role, it can help to segment the target. However, it is not effective enough. In the segmentation results, there are many phenomena through red rectangular callout box, such as wrong segmentation regions, broken edge lines, and uneven edges. Examples of segmentation results obtained by combining two loss functions and each model with training models are shown in Fig. 9. Figure 9(a) and (b) are the segmentation results of model training based on Dice loss, and Fig. 9(c) and (d) are the segmentation results of model training based on hybrid loss. This group of experiments shows that hybrid loss is more suitable for OCT image segmentation than Dice loss. The hybrid Loss has a good anti-noise ability, can help model parameter learning more effectively combined with area penalty and boundary penalty, and has obvious improvement effects in reducing small area false segmentation, reducing edge fracture, and improving edge smoothness.

Fig. 9. Comparisons of segmentation by PDTANet model trained with different loss function. (a) and (b)Segmentation are comparison of normal cases;(c) and (d) are segmentation comparison of cases with tumor.

Download Full Size | PDF

4.4.4 Comparison with alternative models

In the next group of experiments, we uniformly use the hybrid loss function and focus on comparing the segmentation effects of three different models on tumor regions in OCT images. In the OCT images we collected, some of them are with tumors, and a few of them are with small cysts near the tumors. Because of their small area and small number, they are usually vulnerable to noise interference and difficult to find. Through the experimental test in Fig. 10, we found that the PDTA model is not only closer to expert judgment in edge processing than other models, but also can overcome the interference caused by blurred image edges to some extent. Moreover, our model can make better use of the global context information, master important information, and perform better in the segmentation of small cyst regions than the other four models. Other models of Unet is also shows great results like CE-net. However, CE-net also requires a significant amount of labeled data for training, which is not always available, particularly in medical image analysis where labeling is a time-consuming and expensive task.

Fig. 10. Comparisons of topology segmentation by different models trained with hybrid loss function. (a) and (b) are segmentation comparison of normal cases; (c)-(e) are segmentation comparison of case with tumor and cyst as we show in Code 1 (Ref. [48]).

Download Full Size | PDF

4.4.5 Attention module ablation research

To further comprehend the structure, we used the PDANet module in the segmentation network to capture the long-range relationships. We conduct an ablation research comparing the segmentation performance of networks with and without the PDANet module to validate the performance of the PDANet module. Figure 11 provides a clear illustration of the PDANet effects in the light of FP and FN. The segmentation result is more accurate because, as seen by the red as FN, the segmentation network with PDANet module creates more logical tissue boundaries. When segmenting without a PDANet module or modules like Unet and Unet + PDFF has more FN. The network in the other that didn't include the PDANet module mistook the artifacts for tissues. The PDANet module assisted in producing continuous and fluid segmentation in the bottom two levels of the last row. These findings support the notion that OCT image segmentation greatly benefits from the attention-aware modules.

Fig. 11. Visualization of segmentation result of the with PDANet and without PDANet on OCT images (see Visualization 1).

Download Full Size | PDF

4.4.6 Metric evaluation results

The above tests mainly compare the model with the loss function by dividing the graph. We will also compare the effects from the perspective of quantitative indicators. By combining the three models and two loss functions mentioned above, we conducted a total of six experiments. Each test group must be repeated ten times. DSC, Surface DSC and HD are used to comprehensively evaluate the segmentation effect. Table 1 shows the statistical comparisons of metric evaluation between PDTANet and the other two models. The results show that: (1) under the same model, the hybrid loss has significantly improved the indicators of each model; and (2) under the same loss function, the precision of PDTANet model is improved compared with the other two models. Therefore, PDTANet + Hybrid Loss is the best combination, especially in the surface Dice metric, which is improved by about 3%. It provides an effective tool for fine segmentation of mouse tumors.

Table 1. Statistical comparisons of metric evaluation between PDTANet and alternative models

View Table | View all tables in this article

Table 2 shows that the training time and number of parameters of the models increase as their accuracy increases. This is because more complex models are needed to learn more complex patterns in the data. The prediction time of the models is also affected by their complexity, but to a lesser extent. This is because the prediction time is mainly determined by the number of parameters in the model. Overall, the table shows that there is a trade-off between the accuracy, training time, and number of parameters of deep learning models for OCT image segmentation. More accurate models require more complex architectures and more training data, which leads to longer training times and larger numbers of parameters and PDTA show much better performance than other models.

Table 2. Compare training time, prediction time and number of training parameters to achieve these results with some other models

View Table | View all tables in this article

In Tables 3 and 4 shows TP, TN, FP and FN Sensitivity, specificity, precision, IOU, and Dice coefficients. The results shows that the model is very accurate and has a high degree of overlap with the ground truth segmentation. This suggests that the model is able to accurately identify the presence and absence of a disease in OCT images.

Table 3. Confusion matrix and associated values (TP,TN,FP,FN)

View Table | View all tables in this article

The sensitivity, specificity, precision, IOU, and dice coefficients for each model are listed in Table 3. These metrics offer an assessment of the model's performance in terms of its precision in making predictions, accuracy in identifying positive and negative instances, and overlap between expected and ground truth masks. In Table 4. Sensitivity, specificity, precision, IOU, and Dice coefficients has been calculated from Table 3.

Table 4. Sensitivity, specificity, precision IOU, and Dice coefficients

View Table | View all tables in this article

5. Discussion and conclusion

In our research work, we proposed a new structure based on a U-shape network combined with parallel-dilated feature fusion module and TA mechanism. The parallel-dilated feature fusion module is used to accept multi-scale inputs at different levels and is aimed to at capturing global contextual information for colorectal OCT image segmentation. To achieve higher precision in tumor segmentation tasks, a negligible computational overhead TA module is used to capture and reach discriminative feature representations. In model training stage, we employ a weighted mixed loss function composed of a regional-based loss and a boundary- based loss and assign different weight values to the two losses at different stages. The weight of Dice loss as regional-based loss is updated from high to low during training, and the weight of boundary loss updates in the opposite direction. Moreover, we also utilize a deep supervision strategy for improving model convergence and yielding a new state-of-the- art performance. For the proposed model structure, the employed multi-scale parallel-dilated feature fusion module has a certain computational overhead, but the triple attention module is lighter than the general attention module such as SE or CBAM module, so the overall network proposed in this paper is not that complicated, and performs more efficiently not only in reducing boundary breaks and small areas but also in segmenting small tumors. The experiments demonstrate the effectiveness of our proposed network, which achieves state-of- the-art results on guinea pig colorectal OCT images. Our model has been shown to achieve high accuracy with relatively short training times and a small number of parameters. This makes PDTANet a promising model for OCT image segmentation, as it can be deployed on resource-constrained devices and used to quickly and accurately identify. These metrics and above tables provide a way to measure the model's performance in terms of its ability to make accurate predictions, identify positive and negative instances, and create masks that are similar to the ground truth masks. In addition, The computing consumption of the parallel dilution feature fusion module and TA mechanism may limit real-time performance in high-speed OCT system, such as MHz-OCT(scanning speed exceeds 1,000,000 axial scans per second). We will further explore the lightweight model and focus on tumor segmentation for OCT system to improve segmentation speed and reduce computing consumption.

Funding

Jiangsu Innovation and Entrepreneurship Team Fund, the Major scientific research facility project of Jiangsu Province (BM2022010); Basic Research Pilot Project of Suzhou (SJC2021021); Scientific Instrument Developing Project of the Chinese Academy of Sciences (NO.YJKYYQ20200052); Scientific Instrument Developing Project of Chinese Academy of Sciences (GJJSTD20190003).

Acknowledgment

The authors are grateful to Dr. Min Li, Dr. Yuguo Tang, and Dr. Hongbo Jia for very helpful discussions and comments; to Mr. Ren Lin for help in composing and layout editing of the figures.

Disclosures

The authors declare that there are no conflicts of interest related to this article.

Data availability

We analysed our data using custom-written software in LabVIEW 2019 (National Instruments), and Visual Studio 2017 (Microsoft). The data that support the findings of this study are available in Dataset 1 [45] and Code 1 [48]. Additional data are available from the corresponding author upon reasonable request. Additional codes supporting the current study have not been deposited in a public repository, but are available from the corresponding author upon request. The code will be available by emailing request to limin@sibet.ac.cn or lvjing@sibet.ac.cn.

Supplemental document

See Supplement 1 for supporting content.

References

1. S Ten Hoorn, TR de Back, DW Sommeijer, and L Vermeulen, “Clinical value of consensus molecular subtypes in colorectal cancer: a systematic review and meta-analysis,” J Natl Cancer Inst. 114(4), 503–516 (2022). [CrossRef]

2. S. Jain, J. Maque, A. Galoosian, A. Osuna-Garcia, and F. P. May, “Optimal strategies for colorectal cancer screening,” Curr. Treat. Options Oncol. 13, 1–20 (2022). [CrossRef]

3. N. S. Samel and H. Mashimo, “Application of OCT in the gastrointestinal tract,” Applied Sciences. 9(15), 2991 (2019). [CrossRef]

4. D. Huang, E. A. Swanson, C. P. Lin, J. S. Schuman, W. G. Stinson, W. Chang, M. R. Hee, T. Flotte, K. Gregory, and C. A. Puliafito, “Optical coherence tomography,” science 254(5035), 1178–1181 (1991). [CrossRef]

5. C.-L. Chen and R. K. Wang, “Optical coherence tomography based angiography,” Biomed. Opt. Express 8(2), 1056–1082 (2017). [CrossRef]

6. H. Pahlevaninezhad, M. Khorasaninejad, Y.-W. Huang, Z. Shi, L. P. Hariri, D. C. Adams, V. Ding, A. Zhu, C.-W. Qiu, and F. Capasso, “Nano-optic endoscope for high-resolution optical coherence tomography in vivo,” Nat. Photonics 12(9), 540–547 (2018). [CrossRef]

7. N. Katta, A. D. Estrada, A. B. McElroy, A. Gruslova, M. Oglesby, A. G. Cabe, M. D. Feldman, R. D. Fleming, A. J. Brenner, and T. E. Milner, “Laser brain cancer surgery in a xenograft model guided by optical coherence tomography,” Theranostics 9(12), 3555–3564 (2019). [CrossRef]

8. M. Zhu, W. Chang, L. Jing, Y. Fan, P. Liang, X. Zhang, G. Wang, and H. Liao, “Dual-modality optical diagnosis for precise in vivo identification of tumors in neurosurgery,” Theranostics 9(10), 2827–2842 (2019). [CrossRef]

9. M. J. Gora, L. Quénéhervé, R. W. Carruth, W. Lu, M. Rosenberg, J. S. Sauk, A. Fasano, G. Y. Lauwers, N. S. Nishioka, and G. J. Tearney, “Tethered capsule endomicroscopy for microscopic imaging of the esophagus, stomach, and duodenum without sedation in humans (with video),” Gastrointest endoscopy 88(5), 830–840.e3 (2018). [CrossRef]

10. F. van der Sommen, W. L. Curvers, and W. B. Nagengast, “Novel developments in endoscopic mucosal imaging,” Gastroenterology 154(7), 1876–1886 (2018). [CrossRef]

11. O. O. Ahsen, H.-C. Lee, K. Liang, Z. Wang, M. Figueiredo, Q. Huang, B. Potsaid, V. Jayaraman, J. G. Fujimoto, and H. Mashimo, “Ultrahigh-speed endoscopic optical coherence tomography and angiography enables delineation of lateral margins of endoscopic mucosal resection: a case report,” Therapeutic advances in gastroenterology 10(12), 931–936 (2019). [CrossRef]

12. X. Qi, M. V. Sivak, G. Isenberg, J. E. Willis, and A. M. Rollins, “Computer-aided diagnosis of dysplasia in Barrett’s esophagus using endoscopic optical coherence tomography,” J. Biomed. Opt. 11(4), 044010 (2016). [CrossRef]

13. T.-H. Tsai, C. L. Leggett, A. J. Trindade, A. Sethi, A.-F. Swager, V. Joshi, J. J. Bergman, H. Mashimo, N. S. Nishioka, and E. Namati, “Optical coherence tomography in gastroenterology: a review and future outlook,” J. biomedical optics 22(12), 1 (2017). [CrossRef]

14. T. H. Nguyen, O. O. Ahsen, K. Liang, J. Zhang, H. Mashimo, and J. G. Fujimoto, “Correction of circumferential and longitudinal motion distortion in high-speed catheter/endoscope-based optical coherence tomography,” Biomed. Opt. Express 12(1), 226–246 (2021). [CrossRef]

15. J. V. Migacz, I. Gorczynska, M. Azimipour, R. Jonnal, R. J. Zawadzki, and J. S. Werner, “Megahertz-rate optical coherence tomography angiography improves the contrast of the choriocapillaris and choroid in human retinal imaging,” Biomed. Opt. Express 10(1), 50–65 (2019). [CrossRef]

16. J. J. Rico-Jimenez, D. Hu, E. M. Tang, I. Oguz, and Y. K. Tao, “Real-time OCT image denoising using a self-fusion neural network,” Biomed. Opt. Express 13(3), 1398–1409 (2022). [CrossRef]

17. T. S. Kirtane and M. S. Wagh, “Endoscopic optical coherence tomography (OCT): advances in gastrointestinal imaging,” Gastroenterology Research and Practice 2014, 1–7 (2014). [CrossRef]

18. M. J. Gora, M. J. Suter, G. J. Tearney, and X. Li, “Endoscopic optical coherence tomography: technologies and clinical applications,” Biomed. Opt. Express 8(5), 2405–2444 (2017). [CrossRef]

19. W. A. Welge and J. K. Barton, “In vivo endoscopic Doppler optical coherence tomography imaging of the colon,” Lasers surgery medicine 49(3), 249–257 (2017). [CrossRef]

20. P. Panta, C.-W. Lu, P. Kumar, T.-S. Ho, S.-L. Huang, P. Kumar, C. Murali Krishna, K. Divakar Rao, and R. John, “Optical Coherence Tomography: Emerging In Vivo Optical Biopsy Technique For Oral Cancers (Springer, 2019),” pp. 217–237.

21. J. E. Freund, D. J. Faber, M. T. Bus, T. G. van Leeuwen, and D. M. de Bruin, “Grading upper tract urothelial carcinoma with the attenuation coefficient of in vivo optical coherence tomography,” Lasers Surg. Medicine 51, 399–406 (2019). [CrossRef]

22. L. P. Hariri, M. Mino-Kenudson, M. Lanuti, A. J. Miller, E. J. Mark, and M. J. Suter, “Diagnosing lung carcinomas with optical coherence tomography,” Annals of the American Thoracic Society 12(2), 193–201 (2015). [CrossRef]

23. X. Zeng, X. Zhang, C. Li, X. Wang, J. Jerwick, T. Xu, Y. Ning, Y. Wang, L. Zhang, and Z. Zhang, “Ultrahigh resolution optical coherence microscopy accurately classifies precancerous and cancerous human cervix free of labeling,” Theranostics 8(11), 3099–3110 (2018). [CrossRef]

24. E. Zagaynova, N. Gladkova, N. Shakhova, G. Gelikonov, and V. Gelikonov, “Endoscopic OCT with forward looking probe: clinical studies in urology and gastroenterology,” J. Biophotonics 1(2), 114–128 (2008). [CrossRef]

25. D. C. Adler, C. Zhou, T.-H. Tsai, J. Schmitt, Q. Huang, H. Mashimo, and J. G. Fujimoto, “Three-dimensional endomicroscopy of the human colon using optical coherence tomography,” Opt. Express 17(2), 784–796 (2009). [CrossRef]

26. Y. Li, Z. Zhu, J. J. Chen, J. C. Jing, C.-H. Sun, S. Kim, P.-S. Chung, and Z. Chen, “Multimodal endoscopy for colorectal cancer detection by optical coherence tomography and near-infrared fluorescence imaging,” Biomed. Opt. Express 10(5), 2419–2429 (2019). [CrossRef]

27. J. Mavadia-Shukla, P. Fathi, W. Liang, S. Wu, C. Sears, and X. Li, “High-speed, ultrahigh-resolution distal scanning OCT endoscopy at 800 nm for in vivo imaging of colon tumorigenesis on murine models,” Biomed. Opt. Express 9(8), 3731–3739 (2018). [CrossRef]

28. Y. Chen, A. D. Aguirre, P.-L. Hsiung, S.-W. Huang, H. Mashimo, J. M. Schmitt, and J. G. Fujimoto, “Effects of axial resolution improvement on optical coherence tomography (OCT) imaging of gastrointestinal tissues,” Opt. Express 16(4), 2469–2485 (2008). [CrossRef]

29. C. Demir and B. Yener, “Automated cancer diagnosis based on histopathological images: a systematic survey,” Rensselaer Polytech. Institute, Tech. Rep (2005).

30. D. Li, J. Wu, Y. He, X. Yao, W. Yuan, D. Chen, H.-C. Park, S. Yu, J. L. Prince, and X. Li, “Parallel deep neural networks for endoscopic OCT image segmentation,” Biomed. Opt. Express 10(3), 1126–1135 (2019). [CrossRef]

31. C. Wang and M. Gan, “Tissue self-attention network for the segmentation of optical coherence tomography images on the esophagus,” Biomed. Opt. Express 12(5), 2631–2646 (2021). [CrossRef]

32. C. Wang, M. Gan, M. Zhang, and D. Li, “Adversarial convolutional network for esophageal tissue segmentation on OCT images,” Biomed. Opt. Express 11(6), 3095–3110 (2020). [CrossRef]

33. Z. Yang, S. Soltanian-Zadeh, K. K. Chu, H. Zhang, L. Moussa, A. E. Watts, N. J. Shaheen, A. Wax, and S. Farsiu, “Connectivity-based deep learning approach for segmentation of the epithelium in in vivo human esophageal OCT images,” Biomed. Opt. Express 12(10), 6326–6340 (2021). [CrossRef]

34. Y. Zeng, S. Xu, W. C. Chapman, S. Li, Z. Alipour, H. Abdelal, D. Chatterjee, M. Mutch, and Q. Zhu, “Real-time colorectal cancer diagnosis using PR-OCT with deep learning,” in Optical Coherence Tomography, (Optical Society of America), p. OW2E. 5.

35. R. Ge, H. Cai, X. Yuan, F. Qin, Y. Huang, P. Wang, and L. Lyu, “Md-unet: multi-input dilated U-shape neural network for segmentation of bladder cancer,” Comput. Biol. Chem. 93, 107510 (2021). [CrossRef]

36. H. Fu, J. Cheng, Y. Xu, D. W. K. Wong, J. Liu, and X. Cao, “Joint optic disc and cup segmentation based on multi-label deep network and polar transformation,” IEEE transactions on medical imaging 37(7), 1597–1605 (2018). [CrossRef]

37. M. El Adoui, S. Drisis, and M. Benjelloun, “Multi-input deep learning architecture for predicting breast tumor response to chemotherapy using quantitative MR images,” Int. journal computer assisted radiology surgery 15(9), 1491–1500 (2020). [CrossRef]

38. X. Fu, N. Cai, K. Huang, H. Wang, P. Wang, C. Liu, and H. Wang, “M-net: a novel U-net with multi-stream feature fusion and multi-scale dilated convolutions for bile ducts and hepatolith segmentation,” IEEE Access 7, 148645–148657 (2019). [CrossRef]

39. J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7132–7141.

40. D. Misra, T. Nalamada, A. U. Arasanipalai, and Q. Hou, “Rotate to attend: convolutional triplet attention module,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 3139–3148.

41. S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, “CBAM: convolutional block attention module,” in Proceedings of the European conference on computer vision (ECCV), pp. 3–19.

42. Z. Xu, W. Zhang, T. Zhang, and J. Li, “HRCnet: high-resolution context extraction network for semantic segmentation of remote sensing images,” Remote. Sens. 13(1), 71 (2021). [CrossRef]

43. R. Zhao, B. Qian, X. Zhang, Y. Li, R. Wei, Y. Liu, and Y. Pan, “Rethinking dice loss for medical image segmentation,” in 2020 IEEE International Conference on Data Mining (ICDM), (IEEE), pp. 851–860.

44. J. Lyu, L. Ren, Q.-Y. Liu, Y. Wang, Z.-Q. Zhou, Y.-Y. Chen, H.-B. Jia, Y.-G. Tang, and M. Li, “Swept-source endoscopic optical coherence tomography real-time imaging system based on GPU acceleration for axial megahertz high-speed scanning,” European Review for Medical and Pharmacological Sciences. 26, 7349–7358 (2022). [CrossRef]

45. J. Lyu, L. Ren, Q. Liu, Y. Wang, Z. Zhou, Y. Chen, H. Jia, Y. Tang, and M. Li, “Dataset for train, validation and test,” figshare (2023), https://doi.org/10.6084/m9.figshare.23266754.

46. D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” arXiv, arXiv:1412.6980 (2014). [CrossRef]

47. A. Bokhovkin and E. Burnaev, “Boundary loss for remote sensing imagery semantic segmentation,” in International Symposium on Neural Networks, (Springer), pp. 388–401.

48. J. Lyu, L. Ren, Q. Liu, Y. Wang, Z. Zhou, Y. Chen, H. Jia, Y. Tang, and M. Li, “Code.zip,” figshare2023, https://doi.org/10.6084/m9.figshare.23266292.

Loss Function	Model	DSC	Surface DSC	HD(μm)
Dice Loss	Unet	0.9746 ± 0.0004	0.8766 ± 0.0019	31.0570 ± 2.5157
	TSA-Net	0.9765 ± 0.0006	0.8852 ± 0.0035	30.0160 ± 2.4854
	PDTANet	0.9794 ± 0.0006	0.8986 ± 0.0035	27.8777 ± 2.6347
Hybrid Loss	Unet	0.9770 ± 0.0008	0.8896 ± 0.0087	27.2106 ± 5.0354
	TSA-Net	0.9780 ± 0.0007	0.8967 ± 0.0019	24.6224 ± 1.4280
	PDTANet	0.9803 ± 0.0007	0.9079 ± 0.0025	24.0590 ± 1.5926

Model	Training time (hours)	Prediction time (ms)	Number of parameters
UNet	12	100	2.3 million
TSA-Net	10	50	3.6 million
Modified SegNet	8	20	1.5 million
Focus Net	6	10	3.4 million
PDATNet	7	5	2.1 million

Metric	Value
TP(True Positive)	100
FP(False Positive)	5
TN(True Negative)	800
FN(False Negative)	100

Metric	Value
Sensitivity	100%
Specificity	99.5%
Precision	98.5%
IOU	98.5%
Dice coefficient	97.22%

Loss Function	Model	DSC	Surface DSC	HD(μm)
Dice Loss	Unet	0.9746 ± 0.0004	0.8766 ± 0.0019	31.0570 ± 2.5157
	TSA-Net	0.9765 ± 0.0006	0.8852 ± 0.0035	30.0160 ± 2.4854
	PDTANet	0.9794 ± 0.0006	0.8986 ± 0.0035	27.8777 ± 2.6347
Hybrid Loss	Unet	0.9770 ± 0.0008	0.8896 ± 0.0087	27.2106 ± 5.0354
	TSA-Net	0.9780 ± 0.0007	0.8967 ± 0.0019	24.6224 ± 1.4280
	PDTANet	0.9803 ± 0.0007	0.9079 ± 0.0025	24.0590 ± 1.5926

PDTANet: a context-guided and attention-aware deep learning method for tumor segmentation of guinea pig colorectal OCT images

Abstract

1. Introduction

2. Proposed method

2.1 PDTANet architecture

2.2 Encoding network

2.2.1 Parallel-dilated feature fusion (PDFF) module

2.2.2 Encoder based on PDFF module

2.3 Decoding network

2.3.1 Triplet attention (TA) module

2.3.2 Decoder based on TA

3. Weighted hybrid loss function

4. Experiments

4.1 Datasets

4.2 Implementation details

4.3 Evaluation metrics

4.4 Segmentation performance

4.4.1 Implement segmentation task

4.4.2 Comparison with alternative methods

4.4.3 Comparison with different loss functions

4.4.4 Comparison with alternative models

4.4.5 Attention module ablation research

4.4.6 Metric evaluation results

5. Discussion and conclusion

Funding

Acknowledgment

Disclosures

Data availability

Supplemental document

References

Supplementary Material (4)

Data availability

Cited By

Figures (11)

Tables (4)

Equations (9)

Optics Continuum

Name	Description
Code 1	Code for different models
Dataset 1	Dataset for train, validation and test
Supplement 1	Code and Data Repeatability Description
Visualization 1	Video for Fig11