Feasibility of the soft attention-based models for automatic segmentation of OCT kidney images

Abstract

Clinically, optical coherence tomography (OCT) has been utilized to obtain images of the kidney’s proximal convoluted tubules (PCTs), which can be used to quantify morphometric parameters such as tubular density and diameter. Such parameters are useful for evaluating the status of a donor kidney for transplant. Quantifying PCTs from OCT images by human readers is a time-consuming and tedious process. Although conventional deep learning models such as convolutional neural networks (CNNs) have achieved great success in the automatic segmentation of kidney OCT images, gaps remain in segmentation accuracy and reliability. Attention-based deep learning models have an advantage over regular CNNs in that they are designed to focus on the relevant parts of an image and extract features from those regions. This paper develops an Attention-based UNET model for automatic image analysis, pattern recognition, and segmentation of kidney OCT images. We evaluated five methods, Residual-Attention-UNET, Attention-UNET, standard UNET, Residual UNET, and a fully convolutional neural network, using 14403 OCT images from 169 transplant kidneys for training and testing. Our results show that Residual-Attention-UNET outperformed the other four methods, achieving the highest values on all six metrics: dice score (0.81 ± 0.01), intersection over union (IOU, 0.83 ± 0.02), specificity (0.84 ± 0.02), recall (0.82 ± 0.03), precision (0.81 ± 0.01), and accuracy (0.98 ± 0.08). Our results also show that the performance of Residual-Attention-UNET is equivalent to human manual segmentation (dice score = 0.84 ± 0.05). Residual-Attention-UNET and Attention-UNET also demonstrated good performance when trained on a small dataset (3456 images), whereas the performance of the other three methods dropped dramatically. In conclusion, our results suggest that soft Attention-based models, and specifically Residual-Attention-UNET, are powerful and reliable methods for tubule lumen identification and segmentation and can support fast and accurate clinical evaluation of transplant kidney viability.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Optical coherence tomography (OCT) is a non-invasive method for constructing high-resolution optical cross-section images of the superficial kidney cortex in situ and in real time [1–5]. By utilizing the information obtained from OCT images, it is possible to analyze in which cases a donor kidney has a higher likelihood of a successful transplant. It was shown that OCT is not only able to provide histopathological information on human donor kidneys harvested for transplantation, but also has the potential to predict the post-transplant function of these donor kidneys non-invasively [1,2,6]. In kidney transplantation, it is important to predict the transplanted organ’s survival (also known as graft failure risk), especially for more at-risk kidneys [7]. There is a critical need to enhance prognostic measures and to explore new ways of gaining insight into the viability of these kidneys. Computer-aided diagnosis (CAD) has been extensively used to automatically analyze the region of interest (ROI) in OCT images [8–10]. Medical image segmentation through deep learning has shown advantages over conventional machine learning algorithms, including higher accuracy and reliability, more efficient GPU-based computation, and lower power consumption [11–14]. These improvements are crucial for medical image segmentation, especially if the final goal is to use the predicted results as a criterion to accept or reject a donor kidney for transplant, or, in the ophthalmic and laser surgery field, for real-time surgical guidance [15].

In 2009, Li et al. [16] developed an image processing tool for automatic selection of individual ROIs and quantification of the size of hollow structures in the kidney, including renal tubules, glomeruli, and vessels. However, the proposed method could not automatically differentiate between kidney structures. In 2014, tubule size and shape, along with density and uniformity, were measured in 29 patients pre- and post-transplantation [17]. Aside from the small number of patients and the inability to reconstruct 3D images to quantify blood flow due to motion artifacts, the tubule data were extracted manually, which is far too slow when advising a surgeon on the time-sensitive decision to accept or reject a kidney for transplant. In 2017, an automated CAD system based on a convolutional neural network (CNN) was presented to automatically detect and quantify tubular diameter and hypertrophic tubule population from OCT images of rats [2].

Supervised learning algorithms such as random forests and support vector machines require substantial computational power and are not sufficiently reliable or accurate when image patterns of varying shapes and sizes must be extracted [18]. A decision tree model was recently developed in MATLAB by Konkel et al. [6] to automatically select tubule lumens in OCT kidney images, with the tubule lumens segmented manually by four different experts. Although the manual segmentation showed a 0.835 average similarity coefficient with the ground truth, redundant features might cause the model to overfit when large imaging datasets need to be evaluated [6]. CNNs have dramatically improved the performance of segmentation tasks by taking advantage of fast inference and automatically extracting deep features during training. Fully convolutional networks (FCNs) and UNET, two common CNN architectures, have been broadly applied to medical image segmentation tasks such as cardiac MR segmentation and cancerous lung nodule detection [19]. However, these models use multi-stage cascaded CNNs to extract both local and global deep features from target structures of dissimilar shapes and sizes. Extracting low-level features within the cascade leads to excessive use of model parameters and power resources [14,20].

Low-contrast, blurry images caused by speckle noise during OCT scans make it difficult for segmentation to accurately identify the ROI’s edges [21]. To address this problem and enhance the contrast of medical images (e.g., retinal OCT images), multiple researchers have shown that contrast-limited adaptive histogram equalization (CLAHE) can enhance the contrast of grayscale images and yield better edge detection [22–24]. CLAHE, an advanced version of adaptive histogram equalization, divides an image into multiple patches and equalizes the histogram of each patch. In contrast to the many works on retinal OCT image enhancement, to the best of our knowledge there is no work to date assessing the performance of CLAHE on kidney OCT images.
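
As an illustration of the per-patch idea, the OpenCV snippet below applies both global histogram equalization and CLAHE to a grayscale image; this is a minimal sketch, and the file name, clip limit, and tile grid are assumed values rather than the settings used in this study.

import cv2

# Load a B-scan as 8-bit grayscale (file name is illustrative).
img = cv2.imread("kidney_bscan.png", cv2.IMREAD_GRAYSCALE)

# Global histogram equalization: one transfer function for the whole image.
global_eq = cv2.equalizeHist(img)

# CLAHE: per-tile equalization on an (assumed) 8x8 grid, with each tile's
# histogram clipped at an (assumed) limit of 2.0 to avoid amplifying
# speckle noise before the tiles are blended.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
local_eq = clahe.apply(img)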

Oktay et al. [14] applied Attention-UNET to pancreas segmentation from CT scan images, where it outperformed standard UNET (dice score of 0.84 versus 0.81) on both large and small image datasets. Attention-UNET has also been applied to the en face view of fundus images [25]: Guo et al. [25] recently proposed a spatial Attention-UNET for retinal vessel segmentation and showed that even with a small number of training images (20 in one dataset), Attention-UNET can provide an accuracy of up to 0.97 and a true positive rate (sensitivity) of up to 0.85, better than other CNNs. Moreover, it has already been shown that, for the same tubule lumen segmentation problem, a soft-Attention-UNET fed with CLAHE contrast-enhanced images can provide higher segmentation performance than other CNNs (UNET, Res-UNET, FCN) [26]. Inspired by the successful outcomes of Attention-UNET for image segmentation, and in particular for tubule lumen detection in OCT kidney images, this paper proposes a soft Attention-based Residual-UNET model to improve the performance metrics of tubule lumen segmentation on kidney images collected by OCT. Soft attention gating retains spatial information from the lowest layers of the network and can assign different weights to the desired image regions. The extracted coefficients can improve the segmentation results by reducing training time, using less computational power, and producing better results than conventional CNNs. In this study, we investigated the feasibility of Residual-Attention-UNET for kidney tubule segmentation. From the segmented images, we quantified morphometric parameters such as average tubular density and average diameter, which can be used to predict the post-transplant function of the kidney. Our previous work [6] showed that in expanded criteria donor (ECD) kidneys (or marginal kidneys), increased tubular lumen diameter could predict delayed graft function (DGF) prior to implantation.

The majority of pixels in the kidney images are black; excessive enhancement of these pixels could distort the image’s overall visibility and cause false tubule lumen detections. This problem has also been reported by other researchers [22–24,26,27], suggesting that the accuracy of machine learning recognition is closely tied to image quality. Therefore, further pre-processing steps are required. By combining CLAHE and OTSU thresholding, each kidney image was divided into smaller subsections, the gray levels in each region were normalized (flattened), and an optimal threshold was set automatically to detect edges. Thus, we applied this upgraded CLAHE algorithm to improve the contrast and enhance the edges in the kidney images. We compared the performance of Residual-Attention-UNET against other CNN methods as well as manual segmentation, using 14403 OCT images from 169 transplant kidneys for training and testing.

2. System setup, image preprocessing, and model structure

The OCT setup for data collection was illustrated in [6]. 169 donor kidneys were imaged with a 1325 nm center wavelength spectral-domain OCT imaging system (Telesto-II, Thorlabs Inc.) at an incident power of 2.5 mW. The Telesto OCT system was equipped with a hand-held probe with a 36 mm focal-length objective (LSM03, Thorlabs Inc.), providing a lateral resolution of 13 µm and an axial resolution of 5.5 µm in air. Scans were captured at a rate of 28 kHz with a sensitivity of 103 dB. A-scans were averaged by 2 and no B-scan averaging was applied. B-scan settings were optimized to minimize file storage size while providing a sufficient field of view (FOV) and resolution for analysis. Parameters included a FOV of 4.9 mm along the x-axis and 1.9 mm along the z-axis (after adjusting for a refractive index of 1.3) at a scale of approximately 2.73 µm/pixel in each dimension. The general block diagram covering processing and training is shown in Fig. 1. The original dataset contains 14403 images with an image size of 1571×539 pixels (Fig. 2(a)). The image processing steps include: 1) CLAHE contrast enhancement with OTSU thresholding to set an optimum threshold and improve image quality (Fig. 2(b)), 2) Canny edge detection and dilation to detect the edges and remove small outlier pixels (Fig. 2(c) and 2(d)), and 3) contour detection to join all the continuous points along a boundary with the same intensity and extract the ROI (Fig. 2(e)). The images were then cropped to 1536×256 pixels as shown in Fig. 2(f). To reduce the effects of data imbalance on the results, the following procedure was applied to the processed images: each image was split along its width into six non-overlapping patches of 256×256 pixels (86418 patches in total). Among these 86418 patches, 84870 contained tubule lumens (positive pixels) and 1548 contained no tubule lumen (negative pixels). Data augmentation (affine transformations, axial flips, random crops) was performed to reduce the proportion of negative patches (those not containing the ROI). Afterward, these images were divided into three groups: 80% for training (69138 images), 15% for validation (12960 images), and 5% for testing (4320 images). Figure 1 summarizes the detailed procedure.
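
For readers who wish to reproduce the preprocessing, the sketch below strings these steps together with OpenCV. Only the Canny thresholds (125/375, from Fig. 2) and the output geometry (1536×256 crop, six 256×256 patches) come from the text; the CLAHE settings, dilation kernel, and crop placement are assumptions.

import cv2
import numpy as np

def preprocess_bscan(img):
    # 1) CLAHE contrast enhancement; Otsu picks a global threshold
    #    automatically (clip limit and tile grid are assumed values).
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(img)
    otsu_t, _ = cv2.threshold(enhanced, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # 2) Canny edge detection with the paper's lower/upper gray-level
    #    thresholds (125, 375), then dilation to remove small outliers.
    edges = cv2.Canny(enhanced, 125, 375)
    edges = cv2.dilate(edges, np.ones((3, 3), np.uint8))

    # 3) Contour detection joins continuous boundary points of equal
    #    intensity; the largest contour bounds the tissue ROI.
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))

    # Crop to 1536x256 around the ROI (offsets assumed), then split the
    # width into six non-overlapping 256x256 patches.
    roi = enhanced[y:y + 256, :1536]
    return [roi[:, i * 256:(i + 1) * 256] for i in range(6)]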

Fig. 1. Block diagram for automatic kidney tubules segmentation. The ground truth dataset was created manually in ImageJ by 4 trained raters.

Fig. 2. Image preprocessing steps on kidney images. (a) Original OCT image of the donor kidney, (b) CLAHE with OTSU thresholding to enhance the contrast. (c) Canny edge detection with lower and upper gray-level thresholds of 125 and 375, respectively, (d) Dilated image to remove outlier pixels, (e) Contour detection to detect the borders, (f) The enhanced-cropped image. All units are in pixel index with 2.73 µm/pixel.

The proposed Residual-Attention-UNET was developed from scratch as shown in Fig. 3. To avoid memory exhaustion while extracting multiscale features as deeply as possible, images of 256×256 pixels were fed into the network in (image height, image width, channel) format, and the channel dimension was then moved to the first axis. Each image was normalized before passing through the residual blocks.
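
A minimal sketch of this input preparation follows; the [0, 1] scaling is an assumed normalization scheme (the paper does not state one), and the random patch is a stand-in for real pixel data.

import numpy as np

# A 256x256 grayscale patch; random values stand in for real pixel data.
patch = np.random.randint(0, 256, (256, 256), dtype=np.uint8)

# Normalize (scaling to [0, 1] is an assumption) and add batch and
# channel axes: (batch, height, width, channel).
x = patch.astype("float32") / 255.0
x = x[np.newaxis, :, :, np.newaxis]

# Move the channel axis first, as described in the text:
# (batch, channel, height, width). Keras convolution layers would then
# need data_format="channels_first".
x = np.transpose(x, (0, 3, 1, 2))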

Fig. 3. Block diagram of the proposed Residual Attention-UNET segmentation model. The bridge (block 4) has the same structure as the up-sampling blocks.

Each residual block contains a convolutional layer with a 3×3 filter, batch normalization, and a rectified linear unit (ReLU) activation layer. The downsampling blocks use a stride of 2 to halve the spatial dimensions at each stage. The architectures of the blocks used in the network are shown in Fig. 3. The input feature map x passes through the residual block, producing F(x). In parallel, x passes through a 1×1 convolutional layer with stride (2,2) to match the dimensions of F(x), so that the two can be added at the block’s output. The spatial information (x + F(x)) is transferred via a skip connection to the corresponding upsampling block. This skip connection also serves as one input to the attention gate; the other input (the gating signal) is taken from the next-lowest layer of the network, before the concatenation operation, so that only relevant activations are merged.
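
The Keras sketch below shows one possible realization of these two building blocks under the description above. It assumes channels-last tensors and a single convolution per residual branch; the filter counts and the gating projection size are illustrative, not the paper’s exact configuration. The attention gate follows the additive soft-attention form of Oktay et al. [14].

from tensorflow.keras import layers

def residual_downsampling_block(x, filters):
    # F(x): 3x3 convolution (stride 2 halves the spatial dimensions),
    # batch normalization, and ReLU, as described in the text.
    fx = layers.Conv2D(filters, 3, strides=2, padding="same")(x)
    fx = layers.BatchNormalization()(fx)
    fx = layers.ReLU()(fx)
    # Shortcut: 1x1 convolution with stride (2,2) so x matches F(x).
    shortcut = layers.Conv2D(filters, 1, strides=2, padding="same")(x)
    return layers.Add()([shortcut, fx])  # x + F(x), fed to the skip path

def attention_gate(skip, gating, inter_channels):
    # Project the skip connection and the gating signal (taken from the
    # next-lowest layer) to a common shape, add them, and squash into
    # per-pixel soft attention coefficients.
    theta = layers.Conv2D(inter_channels, 1, strides=2)(skip)
    phi = layers.Conv2D(inter_channels, 1)(gating)
    alpha = layers.Conv2D(1, 1, activation="sigmoid")(
        layers.ReLU()(layers.Add()([theta, phi])))
    alpha = layers.UpSampling2D()(alpha)  # back to the skip's resolution
    return layers.Multiply()([skip, alpha])

In the upsampling path, the gated skip connection would then be concatenated with the upsampled decoder features, as in Fig. 3.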

The Residual-Attention-UNET was created with hyperparameters of 23 convolutional layers, 768 neurons, a kernel size of 3×3, 50 epochs, a batch size of 32, and a learning rate of 1e-4. The Sørensen-Dice loss discussed in [27] was used to train all models, as it has been shown to be a powerful loss function for semantic segmentation of imbalanced datasets. To assess segmentation performance, the Dice coefficient and intersection over union (IOU) were used in addition to accuracy, precision, recall, and specificity. IOU and the Dice coefficient are defined as:

$$IOU(X,Y) = \frac{X \cap Y}{X \cup Y}$$
$$Dice(X,Y) = \frac{2 \times (X \cap Y)}{X + Y}$$

where X denotes the predicted tubule lumen segmentation and Y represents the ground truth (actual) segmented tubule lumen regions.
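
In TensorFlow (the framework used in this work), these metrics and the corresponding dice loss can be written as below. This is a sketch; the smoothing constant is an assumption added to avoid division by zero and is not part of the definitions above.

import tensorflow as tf

def dice_coef(y_true, y_pred, smooth=1e-6):
    # Sørensen-Dice coefficient over flattened binary masks; `smooth`
    # is an assumed constant that avoids division by zero.
    y_true_f = tf.reshape(y_true, [-1])
    y_pred_f = tf.reshape(y_pred, [-1])
    intersection = tf.reduce_sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (
        tf.reduce_sum(y_true_f) + tf.reduce_sum(y_pred_f) + smooth)

def iou(y_true, y_pred, smooth=1e-6):
    # Intersection over union for binary masks.
    y_true_f = tf.reshape(y_true, [-1])
    y_pred_f = tf.reshape(y_pred, [-1])
    intersection = tf.reduce_sum(y_true_f * y_pred_f)
    union = tf.reduce_sum(y_true_f) + tf.reduce_sum(y_pred_f) - intersection
    return (intersection + smooth) / (union + smooth)

def dice_loss(y_true, y_pred):
    # The training objective used for all five models: 1 - Dice(X, Y).
    return 1.0 - dice_coef(y_true, y_pred)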

In this study, two Attention-based models, Residual-Attention-UNET and Attention-UNET, and three non-Attention models, UNET, Res-UNET, and FCN, were trained for comparison (Fig. 4). All models were built in Python 3.8 and TensorFlow 2.6 and trained on an RTX 3090 GPU with 24 GB of memory under CUDA 11.4. Each enhanced image and its associated ground truth were divided into six patches and then fed into each of the five deep CNNs separately. The dice loss (1 − Dice(X, Y)) of the predicted image was calculated with respect to the corresponding ground truth, and the weights were updated until the minimum dice loss was achieved. The soft attention gating mechanism is compatible with the standard backpropagation algorithm used to update the weights in the Attention-based models [14].
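
A minimal sketch of this training configuration follows, reusing dice_loss, dice_coef, and iou from the previous sketch. The one-layer stand-in model and the random tensors are placeholders for the Fig. 3 network and the real patch data; only the optimizer, learning rate, batch size, epoch count, and loss come from the text.

import numpy as np
import tensorflow as tf

# One-layer stand-in for the Fig. 3 network; real inputs are 256x256
# grayscale patches.
inputs = tf.keras.Input((256, 256, 1))
outputs = tf.keras.layers.Conv2D(1, 1, activation="sigmoid")(inputs)
model = tf.keras.Model(inputs, outputs)

# Random tensors stand in for the 69138 training patches and their masks.
x = np.random.rand(8, 256, 256, 1).astype("float32")
y = (np.random.rand(8, 256, 256, 1) > 0.5).astype("float32")

# Configuration from the text: ADAM optimizer, learning rate 1e-4,
# batch size 32, 50 epochs, dice loss.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss=dice_loss,
              metrics=[dice_coef, iou, "accuracy"])
model.fit(x, y, validation_split=0.15, batch_size=32, epochs=50)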

Fig. 4. Training process schematic. FN: False Negative, FP: False Positive, Overlay: Intersection between the prediction and the ground truth. All units are in pixel index with 2.73 µm/pixel.

3. Results

Table 1 shows the comparative performance of the 5 trained models before and after applying CLAHE. For a fair comparison, all models were trained using the same hyperparameters (23 convolutional layers, 768 neurons, kernel size of 3×3, 50 epochs, batch size of 32, and learning rate of 1e-4) on the GPU with the ADAM optimizer, batch normalization, and standard data-augmentation techniques. The training results show that the Attention-based models outperformed all 3 non-Attention-based networks both before and after image processing with CLAHE. The dice score, IOU, precision, recall, and specificity for Residual-Attention-UNET (mean ± SD) were 0.81 ± 0.01, 0.83 ± 0.02, 0.81 ± 0.01, 0.82 ± 0.03, and 0.84 ± 0.02, respectively, the highest among all 5 models. Attention-UNET and Residual-Attention-UNET also needed only 21 and 26 epochs, respectively, to reach maximum performance, meaning the Attention-based models achieve better segmentation results on CLAHE-processed images in a relatively shorter time than the other models. Figure 5 shows the prediction results on the test set. Higher dice and IOU scores correspond to fewer false positives (FP) and false negatives (FN) predicted by the models. Table 1 and Fig. 5 show that FCN cannot provide acceptable results for this semantic segmentation problem, suggesting that it is unable to capture the complexity of the data.

Table 1. Segmentation results for 5 trained models before and after CLAHE

Fig. 5. Tubule lumen prediction on a random representative test image and its associated ground truth (a, b): Residual Attention-UNET prediction and its overlay with the ground truth (c, d), Attention-UNET prediction and its overlay with the ground truth (e, f), Res-UNET prediction and its overlay with the ground truth (g, h), UNET prediction and its overlay with the ground truth (i, j), and FCN prediction and its overlay with the ground truth (k, l). The red, green, and yellow arrows show FN, FP, and overlay predictions, respectively. Residual Attention-UNET predicted the tubule lumen with the highest similarity to the ground truth (mean dice = 0.81). All units are in pixel index with 2.73 µm/pixel.

To assess model performance on a smaller training set, the same models were trained with fewer training images (9 subjects = 720 B-scans × 6 patches/B-scan = 4320 image patches). Of these, we used 3456 images (80%) for training and 864 images (20%) for validation. Table 2 shows the segmentation results on the small training set. As expected, segmentation performance dropped for all models with the smaller training set, but the drop was smaller for the proposed model than for the others. In this setting, the Residual Attention-UNET achieved 0.93 ± 0.11 accuracy, 0.78 ± 0.01 dice score, 0.78 ± 0.03 IOU, 0.77 ± 0.12 precision, 0.79 ± 0.10 recall, and 0.81 ± 0.08 specificity.

Table 2. Segmentation performance on the smaller dataset: 3456 images (80%) for training and 864 images (20%) for validation

Figure 6 shows the training results for all 5 models. Residual-Attention-UNET achieved the maximum dice score of 0.8179 after 25 epochs (Fig. 6(a)) and, correspondingly, the lowest dice loss of 0.1821 at that epoch (Fig. 6(c)). The boxplot of dice scores for all 5 developed models is shown in Fig. 7. As shown in Fig. 7, there is no significant difference (p > 0.001) among the dice scores of Residual-Attention-UNET, Attention-UNET, and Res-UNET, while the mean dice score for Residual-Attention-UNET was significantly higher (p < 0.001) than those of the regular UNET and FCN. Moreover, as training progressed, the Attention-based models learned to focus on target structures, which reduced the number of epochs needed for convergence (Residual-Attention-UNET started to converge after 4 epochs). Taking the dice score of the manual segmentation algorithm discussed in [6] as the gold standard (0.835 ± 0.05), Residual-Attention-UNET segments the tubule lumen with the highest similarity (dice = 0.81 ± 0.01). The proposed Residual-Attention-UNET model also outperformed the UNET-based segmentation model discussed in [20], with a dice score of 0.81 ± 0.01 versus 0.7.

Fig. 6. Learning curves of all the trained models with CLAHE: (a) Training Dice Coefficient, (b) Validation Dice Coefficient, (c) Sørensen-Dice loss profile. The maximum dice coefficient for the proposed Residual-Attention-UNET was 0.8179, with a dice loss of 0.1821 at epoch 25.

Fig. 7. Performance of the developed models against manual segmentation [6] and the UNET model developed in [20]. All models were developed on the same dataset of 14403 kidney images with a size of 1571×539 pixels.

4. Discussion and conclusion

This paper investigated the feasibility of Residual-Attention-UNET for segmenting tubule lumen regions in kidney OCT images. Although a few published papers use other deep learning models for OCT kidney segmentation [2,20], to the best of our knowledge this is the first work to utilize an Attention-based mechanism, in particular Residual-Attention-UNET, for automatic segmentation of OCT kidney images. Five deep learning models were developed to segment tubule lumen regions in the kidney images. As the data in Table 2 show, Attention-based mechanisms improve the performance metrics relative to the non-Attention-based models, reducing training time (∼2×), improving IOU (∼2×) and dice scores (∼1.5×), and helping the model find the relevant patterns based on the ground truth images, thereby enabling faster convergence. Residual-Attention-UNET achieved the best dice score and IOU (0.82 and 0.85), the values closest to the manual segmentation results (dice score = 0.835). The per-image prediction time on the test set (4320 images) for Residual Attention-UNET, Attention-UNET, Res-UNET, UNET, and FCN was 0.002 s, 0.008 s, 0.0015 s, 0.0150 s, and 0.1 s, respectively. This suggests that Residual Attention-UNET can even be used for real-time prediction with high performance.
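
Per-image prediction time of the kind reported above can be estimated as sketched below; `model` and `x_test` are placeholders for a trained network and the 4320-patch test set, and the warm-up call is an assumed step to exclude one-time graph tracing from the measurement.

import time

model.predict(x_test[:1])                      # warm-up (graph tracing)
start = time.perf_counter()
model.predict(x_test, batch_size=32)
per_image = (time.perf_counter() - start) / len(x_test)
print(f"mean prediction time: {per_image * 1000:.2f} ms per image")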

Furthermore, this paper showed that the Attention-based models (Attention-UNET and Residual-Attention-UNET) can achieve promising results for lumen segmentation because they tend to focus on target structures and suppress feature maps of irrelevant regions, significantly reducing training time on large imaging data. Data imbalance (a high ratio of negative pixels in non-ROI regions) made it difficult to obtain accurate and reliable results when quantifying kidney patterns and, in many cases, led to model overfitting or prolonged training. Although data augmentation and patching techniques mitigated the negative effects of data imbalance on the prediction results, the cumulative ROI (tubule lumen) area was still smaller than the non-ROI area in each patch. Future work could test the proposed model on OCT images of other tissues (e.g., retinal OCT) to assess its capability for ROI segmentation (e.g., retinal layers) and disease classification.

Funding

UMass Interdisciplinary Faculty Research Award; National Center for Advancing Translational Sciences; National Institutes of Health (UL1TR001453).

Acknowledgments

M. Moradi thanks the support from the UMass Dean’s First Year Fellowship.

Disclosures

YC: Prebeo, LLC (I, P).

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request. Codes underlying the results presented in this paper are available in Ref. [28].

References

1. P. M. Andrews, Y. Chen, M. L. Onozato, S.-W. Huang, D. C. Adler, R. A. Huber, J. Jiang, S. E. Barry, A. E. Cable, and J. G. Fujimoto, “High-resolution optical coherence tomography imaging of the living kidney,” Lab. Invest. 88(4), 441–449 (2008). [CrossRef]  

2. B. Wang, H.-W. Wang, H. Guo, E. Anderson, Q. Tang, T. T. Wu, R. Falola, T. Smith, P. M. Andrews, and Y. Chen, “Optical coherence tomography and computer-aided diagnosis of a murine model of chronic kidney disease,” J. Biomed. Opt. 22(12), 1 (2017). [CrossRef]  

3. Y. Chen, P. M. Andrews, A. D. Aguirre, J. M. Schmitt, and J. G. Fujimoto, “High-resolution three-dimensional optical coherence tomography imaging of kidney microanatomy ex vivo,” J. Biomed. Opt. 12(3), 034008 (2007). [CrossRef]  

4. Y. Fang, W. Gong, J. Li, W. Li, J. Tan, S. Xie, and Z. Huang, “Toward image quality assessment in optical coherence tomography (OCT) of rat kidney,” Photodiagn. Photodyn. Ther. 32, 101983 (2020). [CrossRef]  

5. C. Wang, P. Calle, N. B. T. Ton, Z. Zhang, F. Yan, A. M. Donaldson, N. A. Bradley, Z. Yu, K.-m. Fung, and C. Pan, “Deep-learning-aided forward optical coherence tomography endoscope for percutaneous nephrostomy guidance,” Biomed. Opt. Express 12(4), 2404–2418 (2021). [CrossRef]  

6. B. Konkel, C. Lavin, T. T. Wu, E. Anderson, A. Iwamoto, H. Rashid, B. Gaitian, J. Boone, M. Cooper, P. Abrams, A. Gilbert, Q. Tang, M. Levi, J. G. Fujimoto, P. Andrews, and Y. Chen, “Fully automated analysis of OCT imaging of human kidneys for prediction of post-transplant function,” Biomed. Opt. Express 10(4), 1794–1821 (2019). [CrossRef]  

7. G. Ligabue, F. Pollastri, F. Fontana, M. Leonelli, L. Furci, S. Giovanella, G. Alfano, G. Cappelli, F. Testa, and F. Bolelli, “Evaluation of the classification accuracy of the kidney biopsy direct immunofluorescence through convolutional neural networks,” Clin. J. Am. Soc. Nephrol. 15(10), 1445–1454 (2020). [CrossRef]  

8. X. Qi, Y. Pan, M. V. Sivak, J. E. Willis, G. Isenberg, and A. M. Rollins, “Image analysis for classification of dysplasia in Barrett’s esophagus using endoscopic optical coherence tomography,” Biomed. Opt. Express 1(3), 825–847 (2010). [CrossRef]  

9. X. Qi, M. V. Sivak Jr, G. Isenberg, J. Willis, and A. M. Rollins, “Computer-aided diagnosis of dysplasia in Barrett's esophagus using endoscopic optical coherence tomography,” J. Biomed. Opt. 11(4), 044010 (2006). [CrossRef]  

10. W. Kang, X. Qi, N. J. Tresser, M. Kareta, J. L. Belinson, and A. M. Rollins, “Diagnostic efficacy of computer extracted image features in optical coherence tomography of the precancerous cervix,” Med. Phys. 38(1), 107–113 (2010). [CrossRef]  

11. M. H. Hesamian, W. Jia, X. He, and P. Kennedy, “Deep learning techniques for medical image segmentation: achievements and challenges,” J Digit Imaging 32(4), 582–596 (2019). [CrossRef]  

12. R. Brehar, D.-A. Mitrea, F. Vancea, T. Marita, S. Nedevschi, M. Lupsor-Platon, M. Rotaru, and R. I. Badea, “Comparison of deep-learning and conventional machine-learning methods for the automatic recognition of the hepatocellular carcinoma areas from ultrasound images,” Sensors 20(11), 3085 (2020). [CrossRef]  

13. S. Devunooru, A. Alsadoon, P. Chandana, and A. Beg, “Deep learning neural networks for medical image segmentation of brain tumours for diagnosis: a recent review and taxonomy,” J Ambient Intell Human Comput 12(1), 455–483 (2021). [CrossRef]  

14. O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S. McDonagh, N. Y. Hammerla, and B. Kainz, “Attention U-Net: learning where to look for the pancreas,” arXiv preprint arXiv:1804.03999 (2018).

15. M. Sommersperger, J. Weiss, M. A. Nasseri, P. Gehlbach, I. Iordachita, and N. Navab, “Real-time tool to layer distance estimation for robotic subretinal injection using intraoperative 4D OCT,” Biomed. Opt. Express 12(2), 1085–1104 (2021). [CrossRef]  

16. Q. Li, M. L. Onozato, P. M. Andrews, C.-W. Chen, A. Paek, R. Naphas, S. Yuan, J. Jiang, A. Cable, and Y. Chen, “Automated quantification of microstructural dimensions of the human kidney using optical coherence tomography (OCT),” Opt. Express 17(18), 16000–16016 (2009). [CrossRef]  

17. P. M. Andrews, H.-W. Wang, J. Wierwille, W. Gong, J. Verbesey, M. Cooper, and Y. Chen, “Optical coherence tomography of the living human kidney,” J. Innov. Opt. Health Sci. 07(02), 1350064 (2014). [CrossRef]  

18. I. R. I. Haque and J. Neubert, “Deep learning approaches to biomedical image segmentation,” Informatics in Medicine Unlocked 18, 100297 (2020). [CrossRef]  

19. F. Liao, M. Liang, Z. Li, X. Hu, and S. Song, “Evaluate the malignancy of pulmonary nodules using the 3-D deep leaky noisy-or network,” IEEE Trans. Neural Netw. Learning Syst. 30(11), 3484–3495 (2019). [CrossRef]  

20. X. Qin, B. Wang, D. Boegner, B. Gaitan, Y. Zheng, X. Du, and Y. Chen, “Indoor localization of hand-held OCT probe using visual odometry and real-time segmentation using deep learning,” IEEE Trans. Biomed. Eng. (2021), in press.

21. Q. Yan, B. Chen, Y. Hu, J. Cheng, Y. Gong, J. Yang, J. Liu, and Y. Zhao, “Speckle reduction of OCT via super resolution reconstruction and its application on retinal layer segmentation,” Artificial intelligence in medicine 106, 101871 (2020). [CrossRef]  

22. A. Rajani, P. Kora, K. R. Madhavi, and J. Avanija, “Quality improvement of retinal optical coherence tomography,” in 2021 2nd International Conference for Emerging Technology (INCET) (IEEE, 2021), pp. 1–5.

23. S. S. M. Sheet, T.-S. Tan, M. As’ari, W. H. W. Hitam, and J. S. Sia, “Retinal disease identification using upgraded CLAHE filter and transfer convolution neural network,” ICT Express (2021).

24. A. Tayal, J. Gupta, A. Solanki, K. Bisht, A. Nayyar, and M. Masud, “DL-CNN-based approach with image processing techniques for diagnosis of retinal diseases,” Multimedia Systems (2021), pp. 1–22.

25. C. Guo, M. Szemenyei, Y. Yi, W. Wang, B. Chen, and C. Fan, “SA-UNet: spatial attention U-Net for retinal vessel segmentation,” in 2020 25th International Conference on Pattern Recognition (ICPR) (IEEE, 2021).

26. M. Moradi, X. Du, and Y. Chen, “Soft attention-based U-NET for automatic segmentation of OCT kidney images,” in SPIE Photonics West Conference (2022).

27. H. R. Roth, L. Lu, N. Lay, A. P. Harrison, A. Farag, A. Sohn, and R. M. Summers, “Spatial aggregation of holistically-nested convolutional neural networks for automated pancreas localization and segmentation,” Med. Image Anal. 45, 94–107 (2018). [CrossRef]  

28. M. Moradi, “Kidney_segmentation_Residual_Attention_UNET,” Github, 2018, https://github.com/Mousamoradi/Kidney_segmentation_Residual_Attention_UNET.
