
Deep learning segmentation of the tear fluid reservoir under the sclera lens in optical coherence tomography images

Open Access

Abstract

The tear fluid reservoir (TFR) under the sclera lens is a unique characteristic providing optical neutralization of any aberrations from corneal irregularities. Anterior segment optical coherence tomography (AS-OCT) has become an important imaging modality for sclera lens fitting and visual rehabilitation therapy in both optometry and ophthalmology. Herein, we aimed to investigate whether deep learning can be used to segment the TFR from healthy and keratoconus eyes, with irregular corneal surfaces, in OCT images. Using AS-OCT, a dataset of 31850 images from 52 healthy and 46 keratoconus eyes, during sclera lens wear, was obtained and labeled with our previously developed algorithm of semi-automatic segmentation. A custom-improved U-shape network architecture with a full-range multi-scale feature-enhanced module (FMFE-Unet) was designed and trained. A hybrid loss function was designed to focus training on the TFR, to tackle the class imbalance problem. The experiments on our database showed an IoU, precision, specificity, and recall of 0.9426, 0.9678, 0.9965, and 0.9731, respectively. Furthermore, FMFE-Unet was found to outperform the other two state-of-the-art methods and ablation models, suggesting its strength in segmenting the TFR under the sclera lens depicted on OCT images. The application of deep learning for TFR segmentation in OCT images provides a powerful tool to assess changes in the dynamic tear film under the sclera lens, improving the efficiency and accuracy of lens fitting, and thus supporting the promotion of sclera lenses in clinical practice.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Reduced visual acuity is common in irregular astigmatism resulting from severe keratoconus and in certain patients with a history of pellucid marginal degeneration, ectasia after laser in situ keratomileusis (LASIK), or ocular disease. Visual rehabilitation of these patients with irregular astigmatism is challenging because conventional methods of optical correction often do not provide a satisfactory quality of vision. Recently, sclera lenses (ScL) have emerged as a treatment option to restore vision, reduce symptoms, and improve the quality of life for patients with irregular astigmatism suffering from complex corneal disease. The vault between the ScL and the cornea, called the tear fluid reservoir (TFR) (Fig. 1(A)), is a very important indicator and a unique characteristic for achieving a successful lens-to-cornea fitting relationship, which involves regular optical surface reshaping during lens wear.


Fig. 1. Example of the tear fluid reservoir under a sclera lens in an OCT image. (Top) Schematic representation of the TFR under the sclera lens, (Left) typical OCT image showing the TFR between the sclera lens and the cornea, and (Right) manual annotations, by the Matlab algorithm, of the given OCT image. ScL: sclera lens; TFR: tear fluid reservoir; CO: cornea. The black arrow denotes that the TFR changes dynamically during the ScL settling time.


Anterior segment optical coherence tomography (AS-OCT) has been widely used for the evaluation of dynamic TFR changes, with high speed and high resolution [1] (Fig. 1(B)). Accurate TFR segmentation based on OCT images is the first important step for dynamically generating tear film maps, acquiring the location of the thinnest tear layer, and quantitatively assessing midday fogging [1–4], all of which play important roles in sclera lens fitting and design [2,5,6]. When evaluating the quantitative parameters related to the TFR on OCT images, manual segmentation is time-consuming and subjective; therefore, it is extremely desirable to develop automatic segmentation methods.

Recently, deep learning, especially convolutional neural networks (CNN), has made great advances in biomedical image processing, such as image segmentation, image de-noising [7–10], and image reconstruction [11–18]. Machine and deep learning methods have recently been proposed for segmenting pathologies and structures in individual retinal and corneal OCT scans [19–23]. For example, Zhou et al. used optical attenuation coefficients for depth-resolved visualization and automated quantification of hyperreflective foci on OCT scans [22]. Mukherjee et al. used a deep convolutional regression network for 3D segmentation of retinal layers on OCT for patients suffering from age-related macular degeneration [21]. Wang et al. used deep learning to automatically segment retinal layer boundaries and regions in OCT images [24]. However, OCT images of the TFR have complex characteristics that make it difficult to directly apply these existing methods.

There are two challenges for the segmentation of the TFR in OCT images. (1) The TFR presents with different morphological features, with arbitrary shapes, sizes, and inhomogeneous intensities, due to the midday fogging that appears during lens wear. This phenomenon is even more pronounced in the eyes of patients with severely irregular ocular surfaces, as illustrated in the representative patches of Fig. 2(A)–2(C). This creates a significant challenge for existing DL segmentation algorithms, including those successfully implemented for the corneal layers. (2) Most existing methods tend to be very sensitive to noise or pathological changes, and often rely on the well-contrasted boundaries of uniform layer structures. The TFR has low contrast in OCT images, especially in cases with severe midday fogging, and may be barely visible, particularly at some peripheral locations, as illustrated in the representative patch of Fig. 2(D).


Fig. 2. Typical OCT images with the TFR under the sclera lens. The left column denotes the raw images with zoomed-in local images. The right column denotes the ground truth images with the manually labeled TFR. (A) Typical OCT image showing a thick TFR, (B) typical OCT image showing a thin TFR, (C) typical OCT image showing the asymmetric and uneven distribution of the TFR in a keratoconus eye with an irregular corneal surface, and (D) typical OCT image showing cloudy fogging in the TFR.


In this manuscript, we propose a custom-improved deep learning method based on the typical Unet architecture, namely FMFE-Unet, for the segmentation of the TFR under the sclera lens in OCT images. We compared the outputs of the proposed network against the ground truth segmented with our previously developed semi-automatic algorithm [25]. Several neural networks and ablation models were tested for the segmentation task, and the training and final performance of each model were analyzed.

2. Materials and methods

2.1 Dataset

The dataset was acquired using a commercial SS-OCT system (Casia 2, Tomey Corporation, Nagoya, Japan). The central wavelength is 1310 nm and the axial resolution is 10 µm. The scanning speed is 50,000 A-scans per second; the system has been described in detail in previous papers [26,27]. The dataset contained 52 healthy eyes with regular corneal surfaces and 46 keratoconus eyes with irregular corneal surfaces; images were recorded immediately after lens insertion, and thereafter at 30 min, 1 h, 2 h, and 4 h of lens wear. Each 3D-OCT acquisition consisted of 18 B-scan images centered on the corneal apex; each B-scan size is 796 × 1160 (H × W) pixels, corresponding to an actual dimension of 13 mm × 16 mm (H × W), leading to a total of 31850 images. The study protocol was approved by the Ethics Committee of the Wenzhou Medical University (WMU) Eye Hospital and was performed in accordance with the Declaration of Helsinki. The study participants provided written informed consent before participating in the study.

To segment the tear fluid reservoir and generate the labeled images, we used a Matlab R2019 (The MathWorks Inc., Natick, MA, USA) algorithm, based on the graph theory algorithm that we have previously described [26], to semi-automatically segment the boundaries of the TFR. After that, two experienced graders (Dr. S Ce & Dr. HL Chen) checked the boundaries of the segmented TFR under the sclera lens for each image and corrected them manually if necessary. Finally, a labeled image containing two classes, TFR and background, was generated, as shown in Fig. 1(C).

2.2 Network architecture

The basic architecture of the proposed FMFE-Unet is shown in Fig. 3. In the FMFE module, d denotes the dilation rate and n refers to the number of ResBlocks. Since the Unet structure has shown promising performance in medical image segmentation [28,29], our proposed deep learning method also adopted the U-shape structure as the backbone of the network. In this study, a full-range multi-scale feature-enhanced (FMFE) module across the encoding path was proposed to extract the multi-scale features of the TFR from the OCT images. Then, we replaced the cross-entropy loss with a hybrid loss function to tackle the problem of class imbalance and obtain the final segmentation via pixel classification, which differs from the typical Unet structure [29].


Fig. 3. The basic architecture of the FMFE-Unet network.


2.2.1 FMFE module

In a typical Unet structure, multi-scale features can be detected; however, a lack of global context leads to inconsistencies, and erroneous predictions can arise because different receptive fields produce different discriminative characteristics. In the encoder, the deep-level sub-network carries more semantic information but less detailed spatial information because of the down-sampling pooling and strided convolution operations. In contrast, the low-level sub-network has more spatial information but poor semantic consistency due to its small receptive field. Some efforts have been made to preserve more detailed features at the different scales [30]. The most direct method is to apply an image pyramid to the encoder to extract features from each scale of the input [24]; however, this approach frequently does not scale well for deeper CNN blocks due to limited GPU memory, and it requires more computation time. According to previous work [31], dilated convolution is able to achieve a larger receptive field without increasing the number of kernel parameters. Inspired by this concept, we developed the FMFE module across the encoding path, which exploits multi-scale layer shapes over the entirety of the raw input image, with exponentially expanding receptive fields, while aggregating spatial information (Fig. 4).


Fig. 4. Full-range multi-scale feature-enhanced (FMFE) module consisting of a contextual perception (CP) unit and a ResBlock.


A basic FMFE module includes a contextual perception (CP) unit and a residual block (ResBlock) to increase the performance of dense prediction (Fig. 4). As illustrated in Fig. 4, a CP unit consists of two cascading dilated 3 × 3 convolutions with the same dilation rate d [31,32], and the ResBlock includes r convolutional blocks stacked in a cascading manner. Each of these blocks has two 3 × 3 convolutions, with a 1 × 1 convolution layer in the shortcut path to match the input and output dimensions. All the convolution operations are followed by batch normalization [33]. In the FMFE module, the CP unit abstracts features from the input feature maps, and these are concatenated with the ResBlock output features to generate new feature maps. These feature maps are then passed to the lower level to offer more discriminative details and to enhance the gradient flow into the feature maps of the lower stages of the encoder layers. We used five layers of encoder and decoder; the dilation rate d was set to 7, 5, 3, 2, and 1 from the highest to the lowest layer, while the corresponding r values were 3, 4, 6, 8, and 3. The design applies a reverse spatial pyramid pooling strategy [34] to aggregate context at several ranges, as illustrated in Fig. 4; a minimal sketch of how such a module could be composed is given below.
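For clarity, the following is a minimal Keras/TensorFlow sketch of how a CP unit, a ResBlock, and their combination into an FMFE module could be composed as described above; the filter counts, activation functions, and function names are illustrative assumptions rather than the exact configuration used in FMFE-Unet.

```python
# Sketch of a CP unit (two cascaded dilated 3x3 convolutions), a ResBlock
# (two 3x3 convolutions with a 1x1 shortcut), and their combination into an
# FMFE module. Filter counts and ReLU activations are assumptions.
from tensorflow.keras import layers


def cp_unit(x, filters, dilation_rate):
    """Contextual perception unit: two cascaded dilated 3x3 convolutions."""
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same",
                          dilation_rate=dilation_rate)(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
    return x


def res_block(x, filters):
    """Residual block: two 3x3 convolutions plus a 1x1 shortcut convolution."""
    shortcut = layers.Conv2D(filters, 1, padding="same")(x)  # match dimensions
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    return layers.Activation("relu")(layers.Add()([y, shortcut]))


def fmfe_module(x, filters, dilation_rate, n_res_blocks):
    """FMFE module: CP-unit features concatenated with stacked ResBlock features."""
    cp = cp_unit(x, filters, dilation_rate)
    y = x
    for _ in range(n_res_blocks):
        y = res_block(y, filters)
    return layers.Concatenate()([cp, y])


# Example: the first (highest) encoder stage with d = 7 and r = 3.
inputs = layers.Input(shape=(512, 512, 1))
stage1 = fmfe_module(inputs, filters=32, dilation_rate=7, n_res_blocks=3)
```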

In the FMFE module, the image-level features are exploited for global context information, where the dilation rate d also controls how sparsely the input image is sampled. Given an input image P with C channels, for each receptive field position r and a dilated filter $\Omega$, the context unit output feature map f is given by:

$$f = \sum\limits_i {p[{i + d \cdot r}]\,\Omega [r ]} $$

The final layer of the encoder applies 1 × 1 convolutions and produces output feature maps $({\textrm{H} \times \textrm{W} \times \textrm{C}})$ of size $512 \times 512 \times 1$. Finally, the multi-range context information is obtained as $F = \sum\nolimits_{l = 1}^5 {{f_l}} $.

2.2.2 Loss function

The TFR makes up a minuscule proportion of the pixels in the OCT images, thereby leading to an uneven distribution of the target and background. Dice loss is a region-related loss, which means that the loss at the current pixel depends not only on the predicted value of that pixel but also on the values of the other pixels; it is therefore better suited to predicting the positive samples. In the training process, the dice loss places greater emphasis on mining the target region. However, in the case of small targets, the training loss is prone to instability. Focal loss is an extension of cross-entropy loss designed to address the imbalance problem: it increases the weights assigned to misclassified samples and reduces the weights of correctly classified ones. Hence, in this study, we used a hybrid loss, consisting of contributions from both the dice loss [35] and the focal loss [36], calculated by:

$$\mathcal{L} = {\mathcal{L}_{\textrm{Focal}}} + \lambda {\mathcal{L}_{\textrm{Dice}}}$$
where $\lambda$ represents the weight of the dice loss term. In this study, we set $\lambda = 0.6$ (the influence of different loss functions is described in the online supplemental eTable 2). ${\mathcal{L}_{\textrm{Dice}}}$ is the dice loss, defined as follows:
$${\mathcal{L}_{\textrm{Dice}}} = 1 - \frac{{2pg + \varepsilon }}{{p + g + \varepsilon }}$$
where p is the predicted value, g is the ground truth, and $\varepsilon$ is a smoothing term that stabilizes the gradient and prevents division by zero. ${\mathcal{L}_{\textrm{Focal}}}$ is the focal loss, defined as follows:
$${\mathcal{L}_{\textrm{Focal}}} = \left\{ {\begin{array}{c} { - \alpha {{({1 - {P_t}})}^2}\log ({{P_t}}),\;g = 1}\\ { - ({1 - \alpha }){{({{P_t}})}^2}\log ({1 - {P_t}}),\;\textrm{otherwise}} \end{array}} \right.$$
where ${P_t}$ is the model's estimated probability, $\alpha$ is used to adjust the weighting of the easy samples, ${({1 - {P_t}})^2}$ is the dynamic scaling factor used to adjust the weighting of the hard samples, and $g = \{{ - 1,1} \}$.
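The following is a minimal TensorFlow sketch of this hybrid loss; only $\lambda = 0.6$ and the functional forms above are taken from the text, while the values of $\alpha$, the smoothing term $\varepsilon$, and the reduction over pixels are illustrative assumptions.

```python
# Sketch of the hybrid loss L = L_Focal + lambda * L_Dice described above.
# alpha and the epsilon values are illustrative assumptions; only lambda = 0.6
# is taken from the text.
import tensorflow as tf


def dice_loss(y_true, y_pred, eps=1e-6):
    """Soft dice loss over the TFR (foreground) class."""
    intersection = tf.reduce_sum(y_true * y_pred)
    return 1.0 - (2.0 * intersection + eps) / (
        tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + eps)


def focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss with dynamic scaling factor (1 - p_t)^gamma."""
    y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
    p_t = tf.where(tf.equal(y_true, 1.0), y_pred, 1.0 - y_pred)
    alpha_t = tf.where(tf.equal(y_true, 1.0), alpha, 1.0 - alpha)
    return tf.reduce_mean(-alpha_t * tf.pow(1.0 - p_t, gamma) * tf.math.log(p_t))


def hybrid_loss(y_true, y_pred, lam=0.6):
    """Hybrid loss: focal loss plus lambda-weighted dice loss."""
    return focal_loss(y_true, y_pred) + lam * dice_loss(y_true, y_pred)


# Usage with a Keras model (illustrative):
# model.compile(optimizer="adam", loss=hybrid_loss)
```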

2.3 Evaluation metrics

We validated the performance of the FMFE-Unet using the independent testing dataset. The performances of all the networks involved were evaluated using the intersection over union (IoU), precision, recall, specificity, and F1-score. The metrics can be calculated using the following equations:

$$IoU = \frac{{Pi{x_{seg}} \cap Pi{x_{gt}}}}{{Pi{x_{seg}} \cup Pi{x_{gt}}}}$$
$$Precision = \frac{{TP}}{{TP + FP}}$$
$$Recall = \frac{{TP}}{{TP + FN}}$$
$$Specificity = \frac{{TN}}{{FP + TN}}$$
$$F1\textrm{-}score = 2 \times \frac{{Precision \times Recall}}{{Precision + Recall}}$$
where $Pi{x_{seg}}$ is the pixel of predicted segmentation and $Pi{x_{gt}}$ is the ground truth. TP, TN, FP, and FN denote true positive, true negative, false positive, and false negative, respectively.
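As an illustration, these pixel-wise metrics can be computed from binary prediction and ground-truth masks as in the following NumPy sketch; the function name and mask conventions are assumptions made here for clarity.

```python
# Illustrative computation of IoU, precision, recall, specificity, and
# F1-score from a binary segmentation mask and a ground-truth mask (0/1).
import numpy as np


def segmentation_metrics(pred, gt):
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)      # true positives
    tn = np.sum(~pred & ~gt)    # true negatives
    fp = np.sum(pred & ~gt)     # false positives
    fn = np.sum(~pred & gt)     # false negatives

    iou = tp / (tp + fp + fn)   # |seg ∩ gt| / |seg ∪ gt|
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (fp + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return dict(IoU=iou, Precision=precision, Recall=recall,
                Specificity=specificity, F1=f1)
```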

In addition, the performance of FMFE-Unet was evaluated using the pixel-value difference (PVD) [37] of the distance between the segmented TFR and the ground truth. We compared each pixel in the segmentation map to the ground truth by calculating the coordinate difference along the vertical axis, denoted ${d_y}$; the pixel failure ratios ${f_r}$ (ranging from 0 to 3) are defined as follows:

$${f_r} = \left\{ {\begin{array}{c} {0\quad ({0 \le {d_y} \le 2} )}\\ {1\quad ({2 \le {d_y} \le 5} )}\\ {2\quad ({6 \le {d_y} \le 8} )}\\ {3\quad ({{d_y} \ge 8} )} \end{array}} \right.$$
while $PV{D_r}$ is described as follows:
$$PV{D_r} = \frac{{{f_r}}}{{\sum {f_r}}}$$
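The following NumPy sketch illustrates one plausible implementation of the PVD metric, assuming that ${d_y}$ is the column-wise vertical offset (in pixels) between the segmented and ground-truth boundaries and that the failure-ratio bins above are read as non-overlapping intervals; these assumptions are not stated explicitly in the text.

```python
# Sketch of the pixel-value difference (PVD) metric, assuming d_y is the
# column-wise vertical offset between segmented and ground-truth boundaries.
import numpy as np


def pvd(seg_boundary_rows, gt_boundary_rows):
    """seg_boundary_rows, gt_boundary_rows: per-column boundary row indices."""
    d_y = np.abs(np.asarray(seg_boundary_rows) - np.asarray(gt_boundary_rows))
    # Failure-ratio bins (one non-overlapping reading of the definition above).
    f_r = np.select([d_y <= 2, d_y <= 5, d_y <= 8], [0, 1, 2], default=3)
    # PVD_r: fraction of columns falling into each failure-ratio bin.
    return {r: np.mean(f_r == r) for r in range(4)}
```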

2.4 Setup

All the networks were implemented using Python 3.6 and the Keras library version 2.4 with the TensorFlow backend. The experiments were performed on the Windows 10 operating system using an Intel Core i9-11900K 3.5 GHz CPU, an NVIDIA GeForce GTX TITAN GPU, and 128 GB of RAM. The OCT images and masks were resized to $512 \times 512$ to facilitate the network design. To evaluate the network, a total of 1837 images were randomly selected from all groups in the dataset and split into three subsets: 75% for training, 15% for validation, and 10% for testing.
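For illustration, the resizing and 75/15/10 split could be implemented as in the sketch below; the use of OpenCV and scikit-learn, the file-loading convention, and the random seed are assumptions, since only Python, Keras, and TensorFlow are specified above.

```python
# Sketch of the preprocessing and split described above: resize images and
# masks to 512x512 and divide them into 75% training / 15% validation /
# 10% test subsets. Library choices, paths, and the seed are illustrative.
import numpy as np
import cv2
from sklearn.model_selection import train_test_split


def load_and_resize(paths, size=(512, 512), interpolation=cv2.INTER_AREA):
    # Use interpolation=cv2.INTER_NEAREST for label masks to keep them binary.
    return np.stack([cv2.resize(cv2.imread(p, cv2.IMREAD_GRAYSCALE), size,
                                interpolation=interpolation)
                     for p in paths])


def split_dataset(images, masks, seed=42):
    # First split off 25% (validation + test), then divide it into 15% / 10%.
    x_train, x_rest, y_train, y_rest = train_test_split(
        images, masks, test_size=0.25, random_state=seed)
    x_val, x_test, y_val, y_test = train_test_split(
        x_rest, y_rest, test_size=0.4, random_state=seed)  # 0.4 * 25% = 10%
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)
```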

3. Experimental results

3.1 Overall performance and performance comparison

Tables 1 and 2 display the performance differences between Unet, DeepLabv3+, and our proposed FMFE-Unet in identifying the enclosed TFR under the sclera lens depicted in the collected OCT images. The results achieved by the developed network for the IoU, precision, specificity, and recall were 0.9426, 0.9678, 0.9965, and 0.9731, respectively, for the segmentation of the TFR. These values suggest that the developed network exhibits promising potential to extract the TFR under the sclera lens from OCT images, compared to the manual annotations. Figure 5 visually displays the performance of the proposed FMFE-Unet on typical OCT images, from a healthy eye and a keratoconus eye.


Fig. 5. Illustration of the performance of the FMFE-Unet, which automatically segmented the tear liquid layer in OCT images with high accuracy. Segmentations of the tear liquid layer under the sclera lens in healthy and keratoconus corneas using the Matlab algorithm and our proposed network are shown. The keratoconus eyes usually exhibit a tear liquid layer with non-uniform shape and thickness. KC: keratoconus.



Table 1. Performance of each tested method compared to manual delineation on the collected dataset.


Table 2. Pixel-value difference (PVD, %) of the distance between the segmented TFR and the ground truth for each tested method, on the collected dataset.

The performance comparison of the segmentation results shows that FMFE-Unet exhibited slightly better performance than Unet and DeepLabv3+ in terms of precision, recall, IoU, and F1-score. Specifically, FMFE-Unet achieved the highest $\textrm{PV}{\textrm{D}_0}$ of 87.59% and the lowest $\textrm{PV}{\textrm{D}_3}$ of 0.15%, compared to the other two methods. The examples in Fig. 6 show the computerized results from the developed algorithm, Unet, and DeepLabv3+ in identifying the enclosed TFR depicted in the OCT images.


Fig. 6. Performance differences of the three segmentation networks involved in identifying the tear liquid layers from four typical OCT images. The first column is the raw OCT images. The next three columns are the segmentation results obtained using Unet, DeepLabv3+, and FMFE-Unet, respectively. The last column is the manual annotations, by the Matlab algorithm, of the given OCT images. Scale bar = 1 mm.


3.2 Ablation analysis

Compared to the basic Unet, the design of the FMFE module, which consists of a CP unit and a ResBlock, is the main improvement in FMFE-Unet. To analyze the effects of the FMFE module design, we conducted experiments on three different architectures with the same Unet backbone, namely CP-Unet, ResBlock-Unet, and FMFE-Unet. Tables 3 and 4 show the performance results of all the ablation models evaluated on the collected OCT images. Figure 7 visualizes the segmentation results of the different ablation models on several typical OCT images. The results show that the model with the FMFE module had a higher accuracy than the Unet with only the CP unit or only the ResBlock.


Fig. 7. Performance differences of our ablation models in identifying the tear liquid layers from four typical OCT images. The first column is the raw OCT images. The next three columns are the segmentation results obtained by CP-Unet, ResBlock-Unet, and FMFE-Unet, respectively. Scale bar = 1 mm.



Table 3. Performance of each ablation model compared to manual delineation on the collected dataset.


Table 4. Pixel-value difference (PVD, %) of the distance between the segmented TFR and the ground truth for each ablation model, on the collected dataset.

Table 5 demonstrates the impact of the parameter λ of the hybrid loss function on the performance of the developed deep learning method. The parameter was set to 0, 0.2, 0.4, 0.6, and 0.8, and a dice-loss-only configuration was also tested in the same segmentation experiments. The experimental results showed that the developed network achieved the best segmentation performance when λ = 0.6, compared to the other values.


Table 5. Dice coefficients (%) of the developed network with the dice loss function and different $\lambda$ in the hybrid loss function.

4. Discussion

Our results show that the proposed deep learning method is able to obtain accurate delineation of the enclosed TFR under the sclera lens in OCT images. Based on these results, doctors can obtain the properties of the TFR, including the distribution, thickness, area, thinnest location, and the intensity of the midday fogging. These properties play an important role in sclera lens fitting and visual therapy. To the best of our knowledge, this is the first time that segmentation of the TFR under the sclera lens in OCT images has been performed using deep learning.

A full-range multi-scale feature-enhanced module was developed in the proposed network. It aims to preserve more detailed features at different scales during down-sampling across the encoding path. To highlight the advantages of the developed method in segmentation, we compared its performance with two other available networks, Unet and DeepLabv3+, on our database, for the following reasons. The classical Unet has a popular encoder-decoder architecture for biomedical image segmentation, which is also the backbone of our approach. DeepLabv3+ is a state-of-the-art method proposed by Google, which has an Atrous Spatial Pyramid Pooling (ASPP) module strategy similar to that of our framework.

The experimental results show that the proposed method is superior to the other two segmentation models, because it obtains more accurate object shapes and better edge continuity. For DeepLabv3+, the dilation rates of the ASPP module are 6, 12, and 18. The benefit of ASPP with a large dilation rate is that it enlarges the field of view (FOV) of the filters to incorporate larger context without increasing the number of parameters. However, TFR segmentation is a small-target segmentation task with long edges and sparse pixels. When identifying continuous edges, similarities between neighboring structures may be exaggerated by convolutional neural networks. This problem is especially serious in OCT image segmentation due to the similar physical characteristics of the various corneal tissues, whose layers usually overlap with each other at the edges. A large dilation rate increases the FOV of the spatial channel-wise convolution too much, which may not improve the segmentation [31]. In our experiments, we set the dilation rates to 7, 5, 3, and 2, which provided a suitable FOV to capture the continuous edge of the TFR, thus improving the segmentation performance. Additionally, compared to Unet, DeepLabv3+ tends to start with high training and validation losses and to converge slowly. We set the number of epochs to 300 in our experiments, which is enough for Unet to train on the dataset, but DeepLabv3+ needs more epochs to converge. This may be another reason why the proposed network performs better than DeepLabv3+.

In the full-range multi-scale feature-enhanced module, a CP unit and a ResBlock were embedded in the encoding path of the typical Unet network. The experimental results showed that FMFE-Unet outperformed the Unet with only a CP unit or only a ResBlock, which indicates the effectiveness of our module design. The comparison between our model and the module ablations shows that, with the help of the full-range multi-scale feature-enhanced module, our model obtains richer multi-scale feature information in the encoder. This improvement may be attributed to the CP unit, which enlarges the receptive fields of the features, thus enabling the proposed Unet to capture more global context information. In addition, as the network deepens, the detailed features contained in the feature maps are reduced layer by layer; in FMFE-Unet, the ResBlocks were therefore added to ensure that the network retains more image information. This further demonstrates that all the improvements effectively enhanced the final segmentation performance.

In the training of a CNN-based network, the role of the cost function is to assess the agreement between the output of the segmentation network and the ground truth. In our application, class imbalance is the main problem in training because the TFR makes up a minuscule proportion of the pixels in the OCT images, thereby leading to an uneven distribution of the target and the background. The hybrid loss function was designed and optimized to ensure that the segmented regions agree with their ground truth as closely as possible. Our experimental results indicate that omitting the focal loss or using an excessively large λ (i.e., λ > 0.6) adversely affects backpropagation and easily makes training unstable, ultimately degrading the segmentation performance. When λ = 0.6, the hybrid loss function is most effective at alleviating the class imbalance problem caused by the TFR containing far fewer pixels than the image background.

This study has a few limitations. One is the lack of an external dataset, since this is the first application of deep learning methods to the segmentation of the TFR in OCT images. However, we performed random rotation, as well as horizontal and vertical flipping, on each OCT image, and the augmented images were subsequently included only in the training phase. These augmentation operations can improve the generalization ability of the network. Another limitation of our study is that only a TFR mask was generated as an output of the current network. Further studies should focus on the extraction of spatial details from the object edges to provide the boundary line.

5. Conclusion

In summary, we proposed FMFE-Unet, a custom-modified Unet network, for the segmentation of the TFR in OCT images with high accuracy. Motivated by dilated convolution and residual deep learning, we developed a full-range multi-scale feature-enhanced module to advance semantic information with less loss of detailed features across the encoding path. In further research, we plan to extend the proposed network to support the simultaneous extraction of multiple layers and the delineation of their boundaries in 3D images. Overall, our proposed deep learning method can be used for sclera lens fitting and the visual rehabilitation of patients in clinics.

Funding

National Natural Science Foundation of China (82171016).

Disclosures

The authors declare no conflicts of interest regarding the publication of this paper.

Data Availability

Data are available upon reasonable request.

References

1. S. J. Vincent, D. Alonso-Caneiro, and M. J. Collins, “The temporal dynamics of miniscleral contact lenses: central corneal clearance and centration,” Cont Lens Anterior Eye 41(2), 162–168 (2018). [CrossRef]  

2. S. J. Vincent, D. Alonso-Caneiro, and M. J. Collins, “Miniscleral lens wear influences corneal curvature and optics,” Ophthalmic Physiol Opt 36(2), 100–111 (2016). [CrossRef]  

3. L. P. Kowalski, M. J. Collins, and S. J. Vincent, “Scleral lens centration: The influence of centre thickness, scleral topography, and apical clearance,” Cont Lens Anterior Eye 43(6), 562–567 (2020). [CrossRef]  

4. S. J. Vincent and D. Fadel, “Optical considerations for scleral contact lenses: a review,” Cont Lens Anterior Eye 42(6), 598–613 (2019). [CrossRef]  

5. L. A. Hall, G. Young, J. S. Wolffsohn, and C. Riley, “The influence of corneoscleral topography on soft contact lens fit,” Invest Ophthalmol Vis Sci 52(9), 6801–6806 (2011). [CrossRef]  

6. R. E. Norman, J. G. Flanagan, I. A. Sigal, S. M. Rausch, I. Tertinegg, and C. R. Ethier, “Finite element modeling of the human sclera: influence on optic nerve head biomechanics and connections with glaucoma,” Exp Eye Res 93(1), 4–12 (2011). [CrossRef]  

7. A. Abbasi, A. Monadjemi, L. Fang, H. Rabbani, and Y. Zhang, “Three-dimensional optical coherence tomography image denoising through multi-input fully-convolutional networks,” Comput Biol Med 108, 1–8 (2019). [CrossRef]  

8. A. Li, C. Du, N. D. Volkow, and Y. Pan, “A deep-learning-based approach for noise reduction in high-speed optical coherence Doppler tomography,” J Biophotonics 13(10), e202000084 (2020). [CrossRef]  

9. M. Mehdizadeh, C. MacNish, D. Xiao, D. Alonso-Caneiro, J. Kugelman, and M. Bennamoun, “Deep feature loss to denoise OCT images using deep neural networks,” J. Biomed. Opt. 26(4), 046003 (2021). [CrossRef]  

10. Q. Xie, Z. Ma, L. Zhu, F. Fan, X. Meng, X. Gao, and J. Zhu, “Multi-task generative adversarial network for retinal optical coherence tomography image denoising,” Phys. Med. Biol. 68(4), 045002 (2023). [CrossRef]  

11. K. Nagib, B. Mezgebo, N. Fernando, B. Kordi, and S. S. Sherif, “Generalized image reconstruction in optical coherence tomography using redundant and non-uniformly-spaced samples,” Sensors 21(21), 7057 (2021). [CrossRef]  

12. C. Tsoumpas, J. S. Jørgensen, C. Kolbitsch, and K. Thielemans, “Synergistic tomographic image reconstruction: part 1,” Philos. Trans. R. Soc., A 379, 20200189 (2021). [CrossRef]  

13. Z. Chu, L. Wang, X. Zhou, Y. Shi, Y. Cheng, R. Laiginhas, H. Zhou, M. Shen, Q. Zhang, L. de Sisternes, A. Y. Lee, G. Gregori, P. J. Rosenfeld, and R. K. Wang, “Automatic geographic atrophy segmentation using optical attenuation in OCT scans with deep learning,” Biomed. Opt. Express 13(3), 1328–1343 (2022). [CrossRef]  

14. S. K. Devalla, T. H. Pham, S. K. Panda, L. Zhang, G. Subramanian, A. Swaminathan, C. Z. Yun, M. Rajan, S. Mohan, R. Krishnadas, V. Senthil, J. M. S. De Leon, T. A. Tun, C. Y. Cheng, L. Schmetterer, S. Perera, T. Aung, A. H. Thiery, and M. J. A. Girard, “Towards label-free 3D segmentation of optical coherence tomography images of the optic nerve head using deep learning,” Biomed. Opt. Express 11(11), 6356–6378 (2020). [CrossRef]  

15. M. Gao, Y. Guo, T. T. Hormel, J. Sun, T. S. Hwang, and Y. Jia, “Reconstruction of high-resolution 6×6-mm OCT angiograms using deep learning,” Biomed. Opt. Express 11(7), 3585–3600 (2020). [CrossRef]  

16. X. Liu, L. Bi, Y. Xu, D. Feng, J. Kim, and X. Xu, “Robust deep learning method for choroidal vessel segmentation on swept source optical coherence tomography images,” Biomed. Opt. Express 10(4), 1601–1612 (2019). [CrossRef]  

17. J. Loo, L. Fang, D. Cunefare, G. J. Jaffe, and S. Farsiu, “Deep longitudinal transfer learning-based automatic segmentation of photoreceptor ellipsoid zone defects on optical coherence tomography images of macular telangiectasia type 2,” Biomed. Opt. Express 9(6), 2681–2698 (2018). [CrossRef]  

18. V. A. D. Santos, L. Schmetterer, H. Stegmann, M. Pfister, A. Messner, G. Schmidinger, G. Garhofer, and R. M. Werkmeister, “CorneaNet: fast segmentation of cornea OCT scans of healthy and keratoconic eyes using deep learning,” Biomed. Opt. Express 10(2), 622–641 (2019). [CrossRef]  

19. T. M. Aslam, H. R. Zaki, S. Mahmood, Z. C. Ali, N. A. Ahmad, M. R. Thorell, and K. Balaskas, “Use of a neural net to model the impact of optical coherence tomography abnormalities on vision in age-related macular degeneration,” Am. J. Ophthalmol. 185, 94–100 (2018). [CrossRef]  

20. A. Elsawy and M. Abdel-Mottaleb, “PIPE-Net: A pyramidal-input-parallel-encoding network for the segmentation of corneal layer interfaces in OCT images,” Comput Biol Med 147, 105595 (2022). [CrossRef]  

21. S. Mukherjee, T. De Silva, P. Grisso, H. Wiley, D. L. K. Tiarnan, A. T. Thavikulwat, E. Chew, and C. Cukras, “Retinal layer segmentation in optical coherence tomography (OCT) using a 3D deep-convolutional regression network for patients with age-related macular degeneration,” Biomed. Opt. Express 13(6), 3195–3210 (2022). [CrossRef]  

22. H. Zhou, J. Liu, R. Laiginhas, Q. Zhang, Y. Cheng, Y. Zhang, Y. Shi, M. Shen, G. Gregori, P. J. Rosenfeld, and R. K. Wang, “Depth-resolved visualization and automated quantification of hyperreflective foci on OCT scans using optical attenuation coefficients,” Biomed. Opt. Express 13(8), 4175–4189 (2022). [CrossRef]  

23. S. Mukherjee, T. De Silva, G. Jayakar, P. Grisso, H. Wiley, T. Keenan, A. Thavikulwat, E. Chew, and C. Cukras, Retinal Layer Segmentation for Age-related Macular Degeneration Patients with 3D-UNet, SPIE Medical Imaging (SPIE, 2022), Vol. 12033.

24. B. Wang, W. Wei, S. Qiu, S. Wang, D. Li, and H. He, “Boundary Aware U-Net for retinal layers segmentation in optical coherence tomography images,” IEEE J. Biomed. Health Inform. 25(8), 3029–3040 (2021). [CrossRef]  

25. S. Huang, M. Shen, D. Zhu, Q. Chen, C. Shi, Z. Chen, and F. Lu, “In vivo imaging of retinal hemodynamics with OCT angiography and Doppler OCT,” Biomed. Opt. Express 7(2), 663–676 (2016). [CrossRef]  

26. M. Shen, Z. Xu, C. Yang, L. Leng, J. Liu, Q. Chen, J. Wang, and F. Lu, “Agreement of corneal epithelial profiles produced by automated segmentation of SD-OCT images having different optical resolutions,” Eye Contact Lens 40(2), 99–105 (2014). [CrossRef]  

27. Z. Xu, S. Chen, C. Yang, S. Huang, M. Shen, and Y. Wang, “Reliability of entire corneal thickness mapping in normal post-laser in situ keratomileusis and keratoconus eyes using long scan depth spectral domain optical coherence tomography,” Ophthalmic. Res. 59(3), 115–125 (2018). [CrossRef]  

28. S. K. Devalla, P. K. Renukanand, B. K. Sreedhar, G. Subramanian, L. Zhang, S. Perera, J. M. Mari, K. S. Chin, T. A. Tun, N. G. Strouthidis, T. Aung, A. H. Thiery, and M. J. A. Girard, “DRUNET: a dilated-residual U-Net deep learning network to segment optic nerve head tissues in optical coherence tomography images,” Biomed. Opt. Express 9(7), 3244–3265 (2018). [CrossRef]  

29. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image Computing and Computer-assisted Intervention, (Springer, 2015), 234–241.

30. Q. Jin, Z. Meng, T. D. Pham, Q. Chen, L. Wei, and R. J. Su, “DUNet: A deformable network for retinal vessel segmentation,” Knowledge-Based Syst. 178, 149–162 (2019). [CrossRef]  

31. F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” arXiv, arXiv.1511.07122 (2015). [CrossRef]  

32. O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, and M. J. Bernstein, “Imagenet large scale visual recognition challenge,” Int. J. Comput. Vis. 115, 211–252 (2015). [CrossRef]  

33. S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International conference on machine learning, (PMLR, 2015), 448–456.

34. K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1904–1916 (2015). [CrossRef]  

35. X. Li, X. Sun, Y. Meng, J. Liang, F. Wu, and J. Li, “Dice loss for data-imbalanced NLP tasks,” arXiv, arXiv:1911.02855 (2019).

36. T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, “Focal loss for dense object detection,” in IEEE international conference on computer vision, (2017), 2980–2988.

37. D.-C. Wu and W.-H. Tsai, “A steganographic method for images by pixel-value differencing,” Pattern Recognition Lett. 24, 1613–1626 (2003). [CrossRef]  
