
MPB-CNN: a multi-scale parallel branch CNN for choroidal neovascularization segmentation in SD-OCT images


Abstract

Choroidal neovascularization (CNV) generally appears in the advanced stage of age-related macular degeneration (AMD) and has more complex pathological characteristics than most other retinal diseases. Accurate CNV segmentation in SD-OCT images is therefore a challenging problem, and so far few studies have approached it from a machine learning and computer vision perspective. In this paper, a multi-scale parallel branch CNN (MPB-CNN) is proposed for CNV segmentation. First, three parallel branch networks are used for multi-scale feature extraction. In each branch, standard convolution is replaced with atrous convolution, whose sparse kernels allow wider and more powerful multi-scale information to be extracted. To further improve the segmentation results, intra-branch connections are introduced to preserve signal transmission and inter-branch connections are introduced to exchange multi-scale information. Then, feature maps from different branches are cascaded with low-level features computed by several convolutional layers, and the combined feature is fed through three stacked convolutional layers to produce the final prediction map. In addition, extra branch supervision is applied at the end of each branch to guarantee the discrimination of the feature representations from each branch and benefit the network optimization. Finally, a gradient constraint is added to the loss function to preserve the boundary of the CNV lesion. A patient-independent cross validation performed on 202 cubes from 12 patients yielded a mean Dice coefficient of 0.757 and a mean overlap ratio of 60.8%. Experimental results indicate that the proposed MPB-CNN can provide reliable segmentations of CNV in SD-OCT images.

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Age-related macular degeneration (AMD) is the leading cause of permanent vision loss in older adults. The advanced, neovascular form of AMD is characterized by the presence of choroidal neovascularization (CNV): pathologic new vessels from the choroid that grow into the avascular outer retina through breaks in Bruch’s membrane (BM) [1–4]. CNV can lead to a great diversity of complications, such as subretinal hemorrhage, fluid exudation, lipid deposition, detachment of the retinal pigment epithelium from the choroid, fibrotic scars, and combinations of these findings [5–7]. By segmenting the CNV lesion accurately, clinical ophthalmologists are able to acquire its properties, including area, volume, width, height, optical density value, etc. Using these properties, they can precisely estimate the expansion and contraction of the CNV lesion, design more effective treatment regimens, and further predict the evolution of the lesion. Thus, accurate CNV segmentation can assist ophthalmologists in diagnosis and treatment, and it is becoming increasingly important in the field of ophthalmic medical image processing.

Fluorescein angiography (FA) and indocyanine green angiography (ICGA) are important diagnostic tools used to detect and evaluate CNV in clinical practice. However, both FA and ICGA require intravenous dye injection, which may result in nausea and, rarely, anaphylaxis [1,3,8]. Besides, FA and ICGA capture only a single fundus image and thus ignore the internal structure of CNV. Optical coherence tomography (OCT) is a noninvasive, depth-resolved, volumetric imaging technique that is commonly used to visualize retinal morphology. Spectral-domain optical coherence tomography (SD-OCT) has many advantages over traditional OCT technologies, such as high resolution, true 3D volumetric imaging of the retina, and the manifestation of more comprehensive anatomical structures [9–11]. Optical coherence tomography angiography (OCTA) is a functional extension of OCT that allows for the visualization of vasculature by assessing variation in the OCT signal over time [4,9].

Fig. 1. CNV pathologic analysis results. The area surrounded by the red line is the CNV lesion, which is annotated by more than one ophthalmologist. Yellow arrows indicate CNV characteristics and blue arrows indicate other factors that influence CNV segmentation.

In recent years, the quantitative and qualitative analyses of CNV lesions in SD-OCT images have relied on manual segmentation by ophthalmologists. Additionally, some effective methods have been used to segment CNV lesions in OCTA. Liu et al. [2] developed the split-spectrum amplitude-decorrelation angiography (SSADA) algorithm to distinguish blood flow from static tissue by detecting reflectance amplitude decorrelation over consecutive cross-sectional B-scans at the same location. Gao et al. [4] developed a saliency-based detection algorithm, which can automatically separate the CNV membrane from noise/artifacts on en face angiograms. Abdelmoula et al. [3] presented a framework for segmenting CNV lesions based on parametric modeling of the intensity variation in fundus FA. Additionally, some graph-based methods have also been proposed for CNV segmentation in angiographic images, such as combined edge detection and simple thresholding strategies [12], active contour models [13], and the level-set segmentation algorithm [14].

With the rise of deep learning in recent years, semantic segmentation of natural images has made tremendous progress. In the up-to-date PASCAL VOC Challenge, the highest mean segmentation accuracy has reached 89%, achieved by DeepLabV3+ [15], which combines an encoder-decoder structure with atrous convolution and spatial pyramid pooling (SPP). In addition, weak supervision was introduced to deep learning by Luo et al. [16] in order to reduce the difficulty of obtaining training samples, reaching an accuracy of 86.8%. Fu et al. [17] stacked multiple shallow deconvolutional networks one by one to integrate contextual information and thus guaranteed the fine recovery of localization information, also achieving an accuracy of 86.8%. IDW-CNN, proposed by Wang et al. [18], trains an independent network for each category and each object interaction, and then utilizes the results from the subnets to refine the final segmentation map, acquiring a segmentation accuracy of 86.3%.

Inspired by the success of deep learning in the field of image processing and pattern recognition, more and more medical image researchers have attempted to take advantage of deep learning to solve relevant problems. Yang et al. [19] proposed densely-connected volumetric ConvNets to automatically segment 3D cardiac MR images; the architecture introduced dense connections into a 3D fully convolutional network with down- and up-sampling components. DeepMedic was designed by Kamnitsas et al. [20] to segment brain tumors; it is an 11-layer-deep, multi-scale 3D CNN extended with residual connections. To break through the limitation of small sample sizes, Ronneberger et al. [21] used a U-shaped network structure for microscopy cell image segmentation. Moreover, deep learning has also been widely used for image processing in SD-OCT, including layer segmentation, pigment epithelial detachment (PED) segmentation, fluid/edema segmentation, choroid segmentation, and retinal vessel segmentation [22–34]. However, few studies have focused on CNV segmentation in SD-OCT images. In 2017, Zhu et al. [35,36] predicted the growth of CNV with a reaction-diffusion model, which was tested on a dataset of 84 longitudinal OCT images and reached a mean Dice of 76.40%.

Because the size of CNV varies greatly, multi-scale features can represent the CNV lesion more robustly. In this paper, a multi-scale parallel branch CNN (MPB-CNN) is proposed by extending a CNN with three branches to extract multi-scale image features. Inspired by [15,37,38], we replaced standard convolution with atrous convolution. Additionally, in order to preserve signal transmission and benefit the network optimization, intra- and inter-branch connections are introduced. Extra supervision at the end of each branch further guarantees a better optimization of each branch. Finally, an effective gradient constraint is added to the loss function to preserve the boundary of the CNV lesion. Experimental results indicate that our model can provide reliable segmentations of CNV in SD-OCT images.

2. Method

2.1 Problem statement

In terms of semantic segmentation in retinal SD-OCT images, the target is to assign each pixel a particular label l, where $l \in \{{1,2, \ldots ,K} \}$ and K is the number of categories. Specifically, as shown in Fig. 3(a), the CNV segmentation task is treated as a classification problem with 4 classes. Because CNV appears inside the retina, it is natural to first divide the whole B-scan into three regions with large appearance differences: the vitreous, the retina, and a region including the sclera and choroid. The CNV lesion is then treated as an extra class. Intuitively, more elaborate classes lead to smaller intra-class differences, but make it more difficult and expensive to obtain per-pixel labelmaps for training. Finally, a 4-class classification problem is conducted. Specifically, the first category is the vitreous body located above the retina, the second category is the retinal region excluding CNV, the third category is the region under the BM layer including the sclera and choroid, and the last category is the CNV lesion region. Consequently, the proposed model can simultaneously segment the ILM, BM and CNV regions, as shown in Fig. 3(a).

Fig. 2. The architecture of the MPB-CNN network. All three branches share the same network structure, but use convolutional kernels of different scales to capture multi-scale information.

2.2 CNV pathologic analysis

Figure 1 shows the explicit pathologic characteristics of CNV, which may include the following situations: (1) the pixel intensity within CNV is non-uniform (b, d, e); (2) the upper boundary of CNV may be blurred or invisible (b, g, j); (3) the lower boundary of CNV may vanish (c, e, i); (4) the illumination between A-scans may be non-uniform (i); (5) other retinal diseases may influence segmentation, such as cystoid edema (f, g), hyper-reflective foci (f, g) and neurosensory retinal detachment (NRD) (b, e, g); (6) the size of CNV differs greatly (a, h); (7) the intensity of CNV differs greatly (f, i).

Fig. 3. SD-OCT volumetric images. (a) is an SD-OCT cube, which contains $1024 \times 512 \times 128$ voxels with a corresponding trim size of $2\,{\rm{mm}} \times 6\,{\rm{mm}} \times 6\,{\rm{mm}}$ in (b). In (a), green, red and yellow lines represent the internal limiting membrane (ILM), the boundary of CNV, and Bruch’s membrane (BM), respectively. The four numbers in (a) represent the four category labels used in the experiments.

Compared with other retinal lesions, such as NRD and pigment epithelial detachment (PED), CNV shows more complex pathologic characteristics. The boundaries of most CNV lesions are blurred or invisible, and some retinal abnormalities (incrassate RPE layer, exudation, NRD, PED and so on) have appearances similar to CNV. Therefore, manual segmentation of CNV relies heavily on expert experience. For traditional machine learning, it is very difficult to design suitable hand-crafted features that represent CNV robustly and then distinguish CNV from non-CNV. Deep learning, however, has the capability to automatically learn discriminating features for classification, and thus avoids the limitations of hand-crafted features.

2.3 Network architecture

The whole network architecture of MPB-CNN is illustrated in Fig. 2; the network handles only 2D B-scans of SD-OCT images. Although the information captured by an SD-OCT device is in the form of 3D volume data, displacement between consecutive slices occurs easily due to misoperation during acquisition. To avoid having to align consecutive slices, we segmented each 2D B-scan directly. In MPB-CNN, three branches are arranged in parallel, and the input SD-OCT B-scan is sent to each of them. All branches share the same network structure, including an encoder part and a decoder part. Three atrous convolutional kernels with different rates are used to encode the input image. In the encoder part, dense connections facilitate signal delivery. During up-sampling, borrowing information from previous layers may improve the final segmentation results. In addition, an extra path stacking four convolutional layers is constructed to extract the low-level features of the input image. Finally, the low-level features and the three maps obtained by the different branches are cascaded together as the input of the main network to obtain the final segmentation results.

Baseline: The fully convolutional network (FCN) [39] is selected as the base architecture of each branch. FCN is a popular frame structure for image semantic segmentation; Fig. 4 presents its architecture. The FCN contains two parts: the encoder part and the decoder part. The encoder part employs standard convolution to extract deep features, and the number of channels usually increases gradually; in this paper, we fixed the number of channels to 64 in each convolutional layer. To compare the effectiveness of the intra-branch connection and the atrous convolution, Dense-FCN (DFCN) was generated by adding intra-branch connections to the FCN architecture, and Dense-AFCN was further generated by replacing standard convolution with atrous convolution in DFCN (Fig. 4). FCN, DFCN and Dense-AFCN were respectively used as the architectures of the three branches to validate the segmentation results of MPB-CNN. The convolution kernel size is $3 \times 3$. The decoder part bilinearly up-samples the features to the size of the original image to obtain the final segmentation map.

Fig. 4. The architecture of each branch.

Atrous convolution: The kernel of the atrous convolution [15,37,38], also called hole convolution, is sparse compared with that of the standard convolution. Atrous convolution is based on the assumption that closely connected pixels have similar semantic information, so standard convolution leads to massive redundancy. It is a powerful tool that can capture multi-scale image information by explicitly controlling the resolution of the features computed by deep convolutional neural networks and adjusting the field-of-view of the convolutional kernel [15]. In particular, in the case of 2D images, for each location ${\boldsymbol i}$ on the output feature map ${\boldsymbol y}$ and a convolutional kernel ${\boldsymbol w}$, atrous convolution is applied over the input feature map ${\boldsymbol x}$ as follows:

$${\boldsymbol y}[{\boldsymbol i} ]= \mathop \sum \nolimits_{\boldsymbol k} {\boldsymbol x}[{{\boldsymbol i} + r \cdot {\boldsymbol k}} ]{\boldsymbol w}[{\boldsymbol k} ]$$
where the atrous rate r determines the stride with which the input feature map is sampled, and ${\boldsymbol k}$ indexes the positions within the convolutional kernel. Note that when $r = 1$, atrous convolution reduces to standard convolution. The field-of-view of the kernel is adaptively modified by changing the rate r. In addition, atrous convolution saves more memory than standard convolution due to the sparsity of its kernels. Figure 5 presents the differences between standard convolution and atrous convolution in a more intuitive way.
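To make Eq. (1) concrete, the following minimal NumPy sketch (ours, not the authors' code; all names are illustrative) implements a naive single-channel 2D atrous convolution:

```python
import numpy as np

def atrous_conv2d(x, w, rate=1):
    """Naive 2D atrous convolution, Eq. (1): y[i] = sum_k x[i + r*k] w[k],
    zero-padded so the output keeps the input size.
    x: (H, W) image, w: (kh, kw) kernel with odd kh, kw."""
    kh, kw = w.shape
    # Effective field of view of the dilated kernel.
    eff_h, eff_w = rate * (kh - 1) + 1, rate * (kw - 1) + 1
    xp = np.pad(x, ((eff_h // 2,) * 2, (eff_w // 2,) * 2))
    y = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            for a in range(kh):
                for b in range(kw):
                    y[i, j] += xp[i + rate * a, j + rate * b] * w[a, b]
    return y

# With rate=1 this reduces to standard convolution; with rate=4 a 3x3 kernel
# covers a 9x9 field of view while keeping only 9 non-zero taps.
y = atrous_conv2d(np.random.rand(32, 32), np.ones((3, 3)) / 9.0, rate=4)
```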

Fig. 5. Comparison between standard convolution and atrous convolution.

Intra- and inter-branch connections: Researchers have found that the performance of neural networks may decrease as layers are added, because the gradient tends to vanish when signals are propagated to deeper layers. To counter the vanishing gradient during training, ResNet [40,41] introduced the skip connection, which integrates features from previous layers to turn the network into its residual counterpart. Skip connections augment information propagation, reduce the number of parameters and benefit network optimization. A skip connection can be used directly when the input and the output have the same dimension:

$${\boldsymbol y} = f({\boldsymbol x} )\oplus {\boldsymbol x}$$
where ${\boldsymbol x}$ and ${\boldsymbol y}$ are respectively the input and output of the layers considered, $f({\boldsymbol x} )$ represents a series of deep-learning operations on ${\boldsymbol x}$, and $\oplus $ represents the element-wise sum.
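As a minimal illustration of Eq. (2) (a sketch, assuming $f$ preserves the input shape; the 64-channel width matches the fixed channel number used in this paper):

```python
import tensorflow as tf

def skip_block(x):
    """Residual skip connection, Eq. (2): y = f(x) + x, where f is a single
    3x3 convolution whose output shape matches x (64 input channels assumed)."""
    fx = tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu')(x)
    return fx + x  # element-wise sum
```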

To further improve the information flow within the network, the dense connection pushes the idea of skip connections to the extreme by connecting each layer to all of its subsequent layers. The dense connection enables all layers to receive a direct supervision signal. More importantly, such a policy encourages the reuse of features among all the connected layers. Supposing the output ${{\boldsymbol y}_i}$ of the i-th layer in a plain connection is expressed as follows:

$${{\boldsymbol y}_i} = {f_i}({{{\boldsymbol x}_{i - 1}}} )= {f_i}({{f_{i - 1}}({ \ldots {f_1}({{{\boldsymbol x}_0}} )\ldots } )} )$$
then the output ${{ \tilde{\boldsymbol y}}_i}$ of the i-th layer with dense connection could be written as follows:
$${{ \tilde{\boldsymbol y}}_i} = {{\boldsymbol y}_i} \oplus \ldots \oplus {{\boldsymbol y}_2} \oplus {{\boldsymbol y}_1} = {f_i}({{f_{i - 1}}({ \ldots {f_1}({{{\boldsymbol x}_0}} )\ldots } )} )\oplus \ldots \oplus {f_2}({{f_1}({{{\boldsymbol x}_0}} )} )\oplus {f_1}({{{\boldsymbol x}_0}} )$$
We found that the curse of gradient vanishing is much more serious for SD-OCT images than for natural color images, because the image quality of SD-OCT images is often worse. Specifically, SD-OCT images have three characteristics: first, an SD-OCT B-scan is a single-channel image, which contains less information than a color natural image; second, the similarity among local regions in SD-OCT images is high; third, SD-OCT images contain severe speckle noise. Based on the above discussion, we introduced the concept of dense connection into each branch, namely the intra-branch connection, as the green arrows of Dense-AFCN in Fig. 4 show. Unlike in ResNet, the size of the feature maps in MPB-CNN decreases as signals propagate, so performing the element-wise sum directly is infeasible. To deal with this inconsistency of size, a proper average-pooling was used to control the size of the feature maps. Finally, the intra-branch connection can be written as follows:
$$\begin{aligned}{{\tilde{\boldsymbol y}}_i} &= {{\boldsymbol P}_i}[{{\boldsymbol y}_i}] \oplus \ldots \oplus {{\boldsymbol P}_2}[{{\boldsymbol y}_2}] \oplus {{\boldsymbol P}_1}[{{\boldsymbol y}_1}]\\ &= {{\boldsymbol P}_i}[{f_i}({f_{i - 1}}({ \ldots {f_1}({{{\boldsymbol x}_0}})\ldots }))] \oplus \ldots \oplus {{\boldsymbol P}_2}[{f_2}({f_1}({{{\boldsymbol x}_0}}))] \oplus {{\boldsymbol P}_1}[{f_1}({{{\boldsymbol x}_0}})]\end{aligned}$$
where ${{\boldsymbol P}_i}[\cdot ]$ represents the corresponding average-pooling, which forces the feature maps being summed element-wise to have the same size.
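A minimal sketch of Eq. (5) (our reconstruction; it assumes square feature maps whose sizes are integer multiples of the current size, and equal channel counts, as in this paper's fixed 64-channel design):

```python
import tensorflow as tf

def intra_branch_sum(feature_maps):
    """feature_maps: [y_1, ..., y_i], earliest first, all with the same channel
    count. Returns P_i[y_i] + ... + P_1[y_1] as in Eq. (5); P is average-pooling
    sized so every map matches the spatial size of the latest map y_i."""
    target = feature_maps[-1].shape[1]   # spatial size of y_i (P_i is identity)
    out = feature_maps[-1]
    for y in feature_maps[:-1]:
        k = y.shape[1] // target         # pooling window and stride
        out = out + tf.keras.layers.AveragePooling2D(pool_size=k, strides=k)(y)
    return out
```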

To further improve the segmentation results, an inter-branch connection was also added. During up-sampling, each branch can borrow information from the adjacent branch; we consider that information at different scales is helpful for each individual branch, as the yellow arrows in Fig. 2 show.

Branch supervision and loss function: Deeper and more complex networks are generally more difficult to train. Similarly, stacking multiple randomly initialized subnets in parallel leads to additional optimization difficulties. As discussed above, intra- and inter-branch connections assist the training process. To further alleviate this problem, we added extra branch supervision at the end of each branch with pixel-wise ground truth to help optimize each branch. The pixel-wise cross-entropy loss was applied to all predictions in the network. For the i-th branch, the pixel-wise cross-entropy loss ${E_i}$ was computed as follows:

$${E_i} = \mathop \sum \nolimits_{{\boldsymbol x} \in \Omega } \omega ({\boldsymbol x} )\log {p_l}({\boldsymbol x} )$$
where ${p_l}({\boldsymbol x} )$ provides the estimated probability of pixel ${\boldsymbol x}$ belonging to class l, and $\omega ({\boldsymbol x} )$ is the weight associated with pixel ${\boldsymbol x}$.

We denote the loss of the final (main) prediction by E, and the three extra branch supervision losses by ${E_1}$, ${E_2}$ and ${E_3}$. For convenience of optimization, the four losses were combined as follows:

$$L(\Theta )= E + \alpha {E_1} + \beta {E_2} + \gamma {E_3}$$
where $\alpha $, $\beta $ and $\gamma $ control the balance between the final loss function and the branch loss functions, and $\Theta $ denotes the network parameters.

In order to preserve the boundary of the CNV lesion, we added a gradient constraint to the cross-entropy loss $L(\Theta )$. The sum of the boundary gradients of the segmentation results can be written as follows:

$$G = \sum_{{\boldsymbol n} \in \aleph} g({\boldsymbol n}) = \sum_{{\boldsymbol n} \in \aleph} \sqrt{{(g_x^{\boldsymbol n})}^2 + {(g_y^{\boldsymbol n})}^2}$$
where $\aleph $ denotes the set of boundary pixels, and $g({\boldsymbol n} )$ denotes the gradient computing operator. $g_x^{\boldsymbol n}$ and $g_y^{\boldsymbol n}$ denote the gradients of pixel n along the x direction and the y direction, respectively. The optimization target can be rewritten as follows:
$$\min_{\Theta} L(\Theta) \quad \mathrm{s.t.} \quad \max G$$
Finally, we rewrote the loss function for the final optimization as follows:
$$Loss = L(\Theta) + e^{-\mu G} = E + \alpha E_1 + \beta E_2 + \gamma E_3 + e^{-\mu G}$$
where $\mu $ controls the proportion of G.
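The overall objective of Eq. (10) can be sketched as follows; this is our reconstruction rather than the authors' code, and the boundary mask and tensor layouts are assumptions:

```python
import tensorflow as tf

def total_loss(main_logits, branch_logits, labels, boundary_mask,
               alpha=0.5, beta=0.5, gamma=0.5, mu=0.001):
    """Eq. (10): Loss = E + alpha*E1 + beta*E2 + gamma*E3 + exp(-mu*G).
    main_logits, branch_logits[i]: [N,H,W,4]; labels: [N,H,W] class indices;
    boundary_mask: [N,H,W,1], 1 at boundary pixels (the set in Eq. (8))."""
    ce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    E = ce(labels, main_logits)                          # main supervision
    E1, E2, E3 = [ce(labels, b) for b in branch_logits]  # branch supervision
    # G: sum of gradient magnitudes of the prediction at boundary pixels.
    probs = tf.nn.softmax(main_logits)
    dy, dx = tf.image.image_gradients(probs)
    G = tf.reduce_sum(tf.sqrt(dx**2 + dy**2 + 1e-12) * boundary_mask)
    # exp(-mu*G) shrinks as G grows, so minimizing the total loss implicitly
    # maximizes the boundary gradients, matching Eq. (9).
    return E + alpha*E1 + beta*E2 + gamma*E3 + tf.exp(-mu * G)
```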

In the testing phase, we used only the output of the main network as the final prediction. Under the global optimization premise, the local parameters of each branch were also well optimized thanks to the extra branch supervision, which guarantees sufficient discrimination and balance among the branches.

2.4 Implementation details

Each input image was sent to the three branches, yielding three features at different scales. For M-FCN and M-DFCN, the standard convolution kernel sizes in the three branches were set to $3 \times 3$, $5\times 5$ and $9\times 9$, respectively. For MPB-CNN, after replacing the standard convolution with the atrous convolution, the atrous rates in the three branches were set to 1, 2 and 4, respectively. The outputs of the three branches were then cascaded and fed into three stacked convolutional layers in the backbone to predict the segmentation label map. The convolutional layers in each branch were composed of a $3\times 3$ atrous convolution and a ReLU, followed by dropout with a probability of 0.85. Unlike the changing channel number of FCN, the channel number of the convolutional layers was fixed to 64. In the encoder part of each branch, after several convolutional layers, max-pooling with a $2\times 2$ window and a stride of 2 was performed. To counter the vanishing gradient and benefit optimization, intra-branch connections were added to all max-pooling layers. Each intra-branch connection consisted of an average-pooling operation with an adaptive window and stride, and an element-wise sum; the average-pooling forces the two feature maps being summed to have the same size. The decoder part included three up-sampling operations: the first two were conducted by a $4\times 4$ deconvolution with a stride of 2, and the final one recovered the feature map to the size of the original image. During up-sampling, the decoder part borrowed information from the encoder part and the adjacent branch to improve the segmentation results. In the loss function, $\alpha $, $\beta $ and $\gamma $ were all set to 0.5, and μ was set to 0.001.
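A sketch of one encoder stage of a branch under the settings above (the number of convolutions per stage, and reading the quoted 0.85 as a keep probability in the TensorFlow 1.x convention, are our assumptions):

```python
import tensorflow as tf

def branch_encoder_stage(x, atrous_rate, n_convs=2):
    """One encoder stage: 3x3 atrous convolutions with 64 channels and ReLU,
    each followed by dropout (keep probability 0.85, i.e. drop rate 0.15),
    then 2x2 max-pooling with stride 2. atrous_rate is 1, 2 or 4 per branch."""
    for _ in range(n_convs):
        x = tf.keras.layers.Conv2D(64, 3, padding='same',
                                   dilation_rate=atrous_rate,
                                   activation='relu')(x)
        x = tf.keras.layers.Dropout(0.15)(x)
    return tf.keras.layers.MaxPooling2D(pool_size=2, strides=2)(x)
```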

The proposed MPB-CNN was implemented in TensorFlow with Python 3.5. We optimized it using the gradient descent algorithm with a batch size of 20 and a learning rate of 0.0001. Data augmentation, namely horizontal flipping, was applied during training. The number of iterations was set to 50000. 90% of the training samples were used for training, and the remainder for validation. The most important constants, including the constants in the loss function, the learning rate, the kernel sizes and the number of iterations, were selected by hyperparameter tuning. The remaining parameters, including the dropout probability, channel number and batch size, were selected based on experience. Table 1 summarizes the whole network parameters.

Table 1. Parameter setting for our network

3. Experiment

The experiments were performed on hardware with an Intel Xeon CPU, one 11 GB NVIDIA GeForce GTX 1080 GPU and 64 GB RAM, using Python 3.5 and TensorFlow. Considering the demand for massive amounts of data in deep learning, the experiments were designed in two ways: patient independence and patient dependence. Specifically, in the patient-independence experiment, samples from one patient were used for testing and samples from all the other patients were used for training and validation. An N-fold cross validation was performed until the samples from each patient had been tested. In other words, all cubes from 11 patients were used to train the proposed network and all cubes from the remaining patient were used for testing, so no testing sample ever appeared in the training set; moreover, obvious differences exist among different patients. In the patient-dependence experiment, all cubes were pooled and shuffled, with 85% used for training and validation and 15% for testing. To verify the improvement of MPB-CNN, FCN, DFCN, M-FCN and M-DFCN were used for comparison. Compared with the standard FCN, DFCN adds dense connections in the encoder part and borrows information from the encoder during decoding (Fig. 4). M-FCN and M-DFCN use FCN and DFCN, respectively, as the architecture of each multi-scale branch of MPB-CNN.

3.1 Data processing

Data collection: The proposed MPB-CNN was validated on a dataset containing 202 cubes from 12 patients diagnosed with CNV. The patients were aged between 50 and 81 and the ratio of men to women was 3:1. All were treated with a 3+PRN regimen. All cubes from the same patient were captured at different times with an SD-OCT device (Carl Zeiss Meditec, Inc., Dublin, CA) at Jiangsu Province Hospital. As shown in Fig. 3, each SD-OCT volume, also known as a cube, contains $1024\times 512\times 128$ voxels with a corresponding trim size of $2\,{\rm{mm}} \times 6\,{\rm{mm}}\times 6\,{\rm{mm}}$. In a cube, each azimuthal slice, with a size of $1024\times 512$, is known as a B-scan.

Data screening: Although deep learning achieves surprising performance in semantic segmentation and classification, it is still a data-driven supervised framework, so it is difficult and impractical for such networks to predict cases that were never learned. All the SD-OCT volume data were captured in a real clinical setting, which may lead to the following situations: (1) the CNV pathological characteristics of some patients are highly specific, and thus lack sufficient samples for network training and testing; (2) the real capturing process may introduce severely nonuniform illumination and blurring. To deal with these situations and improve the prediction model, we performed data screening. Specifically, we classified all cubes into categories based on differences in the patterns of CNV manifestation, and then removed the aforesaid unqualified samples from the dataset by visual inspection. To guarantee cube integrity, removal was performed at the cube level, meaning an entire cube was removed rather than individual B-scans. Finally, 202 cubes from 12 patients were selected for the final experiments.

Annotation: The CNV ground truth was annotated mainly by one expert and repeatedly confirmed by multiple experts to guarantee the precision of the final annotations. The ILM and BM were segmented by the automated graph-based surface segmentation algorithm described by Niu et al. [42].

Data augmentation: To train an acceptable deep learning model, enough training samples are necessary. In SD-OCT retinal images, the ocular structures from top to bottom are the vitreous, retina, choroid and sclera; this arrangement is not up-and-down symmetric, so vertical flipping was not considered for augmentation. In our experiment, we chose the left-right flipping of each B-scan, which completely preserves the eye structure, to conduct the data augmentation.
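A minimal sketch of this augmentation (function and variable names are illustrative):

```python
import numpy as np

def augment(bscan, label):
    """Left-right flip only: the vertical order vitreous/retina/choroid/sclera
    is not up-down symmetric, so vertical flips would break the anatomy."""
    if np.random.rand() < 0.5:
        bscan, label = bscan[:, ::-1], label[:, ::-1]
    return bscan, label
```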

3.2 Evaluation criteria

To quantitatively evaluate the accuracy of the segmentation results, we compared them with manual segmentations of CNV drawn by ophthalmologists. For quality control, a third party graded the manual raters' practice segmentations, and feedback was given to improve their accuracy before the raters performed the segmentations used in the study.

Six metrics were used to assess the differences in the volume and area of the CNV segmentation results between the ground truth and the proposed method: Dice coefficient (Dice), correlation coefficient (CC), overlap ratio (Overlap), overestimated ratio (Overest), underestimated ratio (Undest) and pixel accuracy (Pixel Acc). The pixel accuracy was computed over the whole B-scan (including background), while the Dice and overlap ratio were computed only over the segmented regions (excluding background).
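Since the paper does not spell out the metric formulas, the sketch below uses standard definitions for binary CNV masks; interpreting the overestimated and underestimated ratios as false-positive and false-negative areas normalized by the ground-truth area is a common convention and an assumption here:

```python
import numpy as np

def cnv_region_metrics(pred, gt):
    """pred, gt: binary CNV masks of one B-scan (or a whole cube)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum())
    overlap = inter / np.logical_or(pred, gt).sum()        # Jaccard index
    overest = np.logical_and(pred, ~gt).sum() / gt.sum()   # overestimated ratio
    undest = np.logical_and(~pred, gt).sum() / gt.sum()    # underestimated ratio
    return dice, overlap, overest, undest
```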

3.3 Quantitative evaluation

Table 2 shows the overall agreement between the automated and manual segmentations on all B-scans under patient independence and patient dependence. On the whole, the results of the patient-dependence experiment are greatly improved compared with those of the patient-independence experiment, because the network can learn from training samples that closely resemble the testing samples. Among all the network architectures in Table 2, MPB-CNN achieved the highest mean overlap ratio (77.8%) and Dice value (0.875). Besides, its overestimated and underestimated ratios were both the lowest.

Table 2. Performance comparison of different networks on CNV dataset

In the patient-independence experiment, the overlap ratio of MPB-CNN reached 60.8%, which remains the highest among all the networks. Compared with the patient-dependence experiment, the patient-independence experiment has greater practical significance for clinical ophthalmologists, because each OCT retinoscopy is independent in actual situations. To verify the effectiveness of intra-branch connections, we compared FCN with DFCN and M-FCN with M-DFCN, and found that adding intra-branch connections boosted the overlap ratio by nearly 3%. The improvement between the two groups (FCN vs. M-FCN, DFCN vs. M-DFCN) indicates that the multi-scale operation captures more discriminative features to distinguish CNV lesions from non-CNV regions. Another interesting comparison, between MPB-CNN and M-DFCN, demonstrates that atrous convolution is more powerful than standard convolution. Figure 6 presents a comparison of the overlap ratios for all patients. In conclusion, MPB-CNN shows a general improvement over all the other neural network architectures in the ablation experiments.

Fig. 6. Quantitative comparison of the overlap ratios of FCN, DFCN, M-FCN, M-DFCN and MPB-CNN.

Our network also introduced the concept of branch supervision; Table 3 validates its effectiveness. Branch supervision guarantees local optimization of each branch. The extra term added to the loss function, namely the gradient constraint, preserves the boundary of the CNV lesion and thereby produces better visual results. Table 4 shows the improvement obtained by adding the gradient constraint. Together, these results show that both the branch supervision and the gradient constraint improve the segmentation results to some extent.

Table 3. Effect validation of the branch supervision on CNV dataset

Table 4. Effect validation of the gradient constraint on CNV dataset

3.4 Qualitative analysis and discussion

Because choroidal neovascularization (CNV) generally appears in the advanced stage of age-related macular degeneration, its complex pathological characteristics pose a huge challenge for accurate segmentation, and until now there have been few studies on CNV segmentation in SD-OCT B-scans. Figure 7 shows the visual comparison between MPB-CNN and the other network architectures. To compare the differences between the automated segmentation results and the ground truth more intuitively, we directly overlaid the boundaries of the target regions on the original images. We can see that the incrassate RPE layer and exudation have a dominant appearance similar to CNV, which often misleads the networks; this is why the overestimated ratio is generally higher than the underestimated ratio in Table 2. Additionally, the results indicate that the boundaries of the CNV lesions segmented by MPB-CNN are preserved better than those from the other networks, thanks to the boundary gradient constraint in the custom loss function. The gradient constraint makes the boundary of the CNV lesion shrink to a good fit and avoids overspread. Besides, the multi-scale learning enhances the discrimination between the CNV lesion and other similar interferences. Figure 7 also shows 3D segmentation results for one cube, which compares the networks more intuitively. The CNV lesion segmented by MPB-CNN keeps a smoother boundary and a more natural transition, and meanwhile avoids wrong segmentations of small regions and severe spread.

Fig. 7. The comparison of 2D and 3D segmentation results with visual effects among FCN, DFCN, M-FCN, M-DFCN and MPB-CNN. The red line is the boundary of the CNV lesion from the ground truth; the green line is the boundary of the CNV lesion from the automated segmentation.

Figure 8 shows the final visualization results of our method. MPB-CNN has sufficient capacity to segment most of the CNV lesions and agrees closely with the manual ground truth; meanwhile, the ILM and BM are segmented precisely. Although effective layer segmentation can be performed on normal retinas, layer segmentation in the presence of severe retinal lesions remains difficult; MPB-CNN segments the ILM and BM commendably even in cases with severe CNV lesions. However, some failed segmentations still occurred. The last row of Fig. 8 shows a failure mode: although greatly improved by MPB-CNN, the incrassate RPE layer, exudation and foci still disturb a small number of segmentation results. It is difficult even for experts to distinguish CNV from an incrassate RPE layer and exudation. In addition, severe vanishing of the lower boundary of CNV may cause mismatches between the automated and manual segmentations, and a CNV lesion whose height is even lower than the RPE layer may be misjudged as RPE. These complex situations remain to be solved in future work.

Fig. 8. The final visualization results of the proposed method. The last row shows a failure mode. The red line is the boundary of the CNV lesion from the ground truth, the green line is the boundary of the CNV lesion from the automated segmentation, the yellow line is the ILM segmented automatically by deep learning, and the blue line is the BM segmented automatically by deep learning.

4. Conclusion

In this paper, we proposed a novel neural network architecture, the multi-scale parallel branch CNN (MPB-CNN), for complex CNV segmentation in SD-OCT images. MPB-CNN combines low-level features with multi-scale features to guarantee their diversity and abundance. Considering the limited information and high local similarity of each single-channel SD-OCT B-scan, intra- and inter-branch connections were added to avoid gradient vanishing. In MPB-CNN, standard convolution was replaced with atrous convolution, which proved superior for the extraction of multi-scale features. To benefit the optimization of each branch, the concept of branch supervision was introduced. Finally, a gradient constraint was added to the loss function to preserve the boundary of the CNV lesion. Experiments were performed in two parts, one with patient independence and one with patient dependence, with comparisons to FCN, DFCN, M-FCN and M-DFCN. The results indicate that MPB-CNN has sufficient capacity to segment CNV lesions robustly. In future work, we expect to further improve the segmentation accuracy and to consider weak supervision to reduce the number of required labeled samples.

Funding

National Natural Science Foundation of China (NSFC) (61671242, 61473310); Suzhou Industrial Innovation Project (SS201759); Key Research and Development Program of the Jiangsu Science and Technology Department (BE2018131).

References

1. Y. Jia, S. T. Bailey, D. J. Wilson, and O. Tan, “Quantitative Optical Coherence Tomography Angiography of Choroidal Neovascularization in Age-Related Macular Degeneration,” Ophthalmology 121(7), 1435–1444 (2014). [CrossRef]  

2. L. Liu, S. S. Gao, S. T. Bailey, and D. Huang, “Automated choroidal neovascularization detection algorithm for optical coherence tomography angiography,” Biomed. Opt. Express 6(9), 3564–3576 (2015). [CrossRef]  

3. W. M. Abdelmoula, S. M. Shah, and A. S. Fahmy, “Segmentation of Choroidal Neovascularization in Fundus Fluorescein Angiograms,” IEEE Trans. Biomed. Eng. 60(5), 1439–1445 (2013). [CrossRef]  

4. S. S. Gao, L. Liu, and S. T. Bailey, “Quantification of choroidal neovascularization vessel length using optical coherence tomography angiography,” J. Biomed. Opt. 21(7), 076010 (2016). [CrossRef]  

5. R. S. Sulaiman, J. Quigley, X. Qi, M. N. O’Hare, M. B. Grant, and M. E. Boulton, “A simple optical coherence tomography quantification method for choroidal neovascularization,” J. Ocul. Pharmacol. Ther. 31(8), 447–454 (2015). [CrossRef]  

6. U. Schmidt-Erfurth, K. Kriechbaum, and A. Oldag, “Three-dimensional angiography of classic and occult lesion types in choroidal neovascularization,” Invest. Ophthalmol. Visual Sci. 48(4), 1751–1760 (2007). [CrossRef]  

7. C. L. Tsai, Y. L. Yang, S. J. Chen, K. S. Lin, C. H. Chan, and W. Y. Lin, “Automatic Characterization of Classic Choroidal Neovascularization using AdaBoost for Supervised Learning,” Invest. Ophthalmol. Visual Sci. 52(5), 2767–2774 (2011). [CrossRef]  

8. T. Fukuchi, K. Takahashi, M. Uyama, and M. Matsumura, “Comparative Study of Experimental Choroidal Neovascularization by Optical Coherence Tomography and Histopathology,” Jpn. J. Ophthalmol. 45(3), 252–258 (2001). [CrossRef]  

9. T. E. de Carlo, M. A. Bonini Filho, A. T. Chin, M. Adhi, and D. Ferrara, “Spectral-Domain Optical Coherence Tomography Angiography of Choroidal Neovascularization,” Ophthalmology 122(6), 1228–1238 (2015). [CrossRef]  

10. E. Bruyère, V. Caillaux, S. Y. Cohen, and D. Martiano, “Spectral-Domain Optical Coherence Tomography of Subretinal Hyperreflective Exudation in Myopic Choroidal Neovascularization,” Am. J. Ophthalmol. 160(4), 749–758 (2015). [CrossRef]  

11. C. Framme, G. Panagakis, and R. Birngruber, “Effects on choroidal neovascularization after anti-VEGF Upload using intravitreal ranibizumab, as determined by spectral domain-optical coherence tomography,” Invest. Ophthalmol. Visual Sci. 51(3), 1671–1676 (2010). [CrossRef]  

12. E. Brankin, P. McCullagh, N. Black, W. Patton, and A. Muldrew, “The optimisation of thresholding techniques for the identification of choroidal neovascular membranes in exudative age-related macular degeneration,” In Proc. 19th IEEE Symp. Comput.-Based Med. Syst, 430–435, (2006).

13. E. Brankin, P. McCullagh, W. Patton, A. Muldrew, and N. Black, “Identification of choroidal neovascularisation on fluorescein angiograms using gradient vector flow active contours,” In Proc. Int. Mach. Vis. Imag. Process. Conf, 165–169, (2008).

14. S. Takerkart, R. Fenouil, J. Piovano, A. Reynaud, L. Hoffart, F. Chavane, T. Papadopoulo, J. Conrath, and G. S. Masson, “A quantification framework for post-lesion neovascularization in retinal angiography,” In Proc. IEEE Int. Symp. Biomed. Imag, 1457–1460, (2008).

15. L. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2018).

16. P. Luo, G. Wang, L. Lin, and X. Wang, “Deep Dual Learning for Semantic Image Segmentation,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017).

17. J. Fu, J. Liu, Y. Wang, and H. Lu, “Stacked Deconvolutional Network for Semantic Segmentation,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017).

18. G. Wang, P. Luo, L. Lin, and X. Wang, “Learning Object Interactions and Descriptions for Semantic Image Segmentation,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017).

19. X. Yang, H. Chen, and J. Qin, “Automatic 3D Cardiovascular MR Segmentation with Densely-Connected Volumetric ConvNets,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017).

20. K. Kamnitsas, E. Ferrante, S. Parisot, and C. Ledig, “DeepMedic for Brain Tumor Segmentation,” International Conference on Medical Image Computing and Computer Assisted Intervention, (2016).

21. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” International Conference on Medical Image Computing and Computer Assisted Intervention, (2015).

22. A. Dasgupta and S. Singh, “A fully convolutional neural network based structured prediction approach towards the retinal vessel segmentation,” IEEE International Symposium on Biomedical Imaging (ISBI), (2017).

23. T. Schlegl, S. M. Waldstein, and H. Bogunovic, “Fully Automated Detection and Quantification of Macular Fluid in OCT Using Deep Learning,” Ophthalmology 125(4), 549–558 (2018). [CrossRef]  

24. L. Fang, D. Cunefare, C. Wang, R. H. Guymer, S. Li, and S. Farsiu, “Automatic segmentation of nine retinal layer boundaries in OCT images of non-exudative AMD patients using deep learning and graph search,” Biomed. Opt. Express 8(5), 2732–2744 (2017). [CrossRef]  

25. C. S. Lee, A. J. Tyring, N. P. Deruyter, Y. Wu, A. Rokem, and A. Y. Lee, “Deep-learning based, automated segmentation of macular edema in optical coherence tomography,” Biomed. Opt. Express 8(7), 3440–3448 (2017). [CrossRef]  

26. Y. Xu, K. Yan, J. Kim, X. Wang, C. Li, L. Su, S. Yu, X. Xu, and D. D. Feng, “Dual-stage deep learning framework for pigment epithelium detachment segmentation in polypoidal choroidal vasculopathy,” Biomed. Opt. Express 8(9), 4061–4076 (2017). [CrossRef]  

27. A. G. Roy, S. Conjeti, S. P. K. Karri, D. Sheet, A. Katouzian, C. Wachinger, and N. Navab, “ReLayNet: retinal layer and fluid segmentation of macular optical coherence tomography using fully convolutional networks,” Biomed. Opt. Express 8(8), 3627–3642 (2017). [CrossRef]  

28. R. Rasti, H. Rabbani, A. Mehridehnavi, and F. Hajizadeh, “Macular OCT Classification Using a Multi-Scale Convolutional Neural Network Ensemble,” IEEE Trans. Med. Imaging 37(4), 1024–1034 (2018). [CrossRef]  

29. M. Chen, J. Wang, I. Oguz, B. L. VanderBeek, and J. C. Gee, “Automated segmentation of the choroid in edi-oct images with retinal pathology using convolution neural networks,” in Fetal, Infant and Ophthalmic Medical Image Analysis, 177–184, (2017).

30. A. Shah et al., “Multiple surface segmentation using convolution neural nets: application to retinal layer segmentation in OCT images,” Biomed. Opt. Express 9(9), 4509–4526 (2018). [CrossRef]  

31. X. Sui, Y. Zheng, B. Wei, H. Bi, J. Wu, X. Pan, Y. Yin, and S. Zhang, “Choroid segmentation from optical coherence tomography with graph edge weights learned from deep convolutional neural networks,” J. Neurocomput. 237, 332–341 (2017). [CrossRef]  

32. F. Venhuizen, B. Ginneken, B. Liefers, M. Grinsven, S. Fauser, C. Hoyng, T. Theelen, and C. Sanchez, “Robust total retina thickness segmentation in optical coherence tomography images using convolutional neural networks,” Biomed. Opt. Express 8(7), 3292–3316 (2017). [CrossRef]  

33. A. Shah, M. D. Abramoff, and X. Wu, “Simultaneous multiple surface segmentation using deep learning,” in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, 3–11, (2017).

34. O. Cicek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, “3d u-net: learning dense volumetric segmentation from sparse annotation,” in International Conference on Medical Image Computing and Computer- Assisted Intervention, 424–432, (2016).

35. S. Zhu, X. Chen, F. Shi, D. Xiang, and W. Zhu, “3D choroid neovascularization growth prediction based on reaction-diffusion model,” Proceedings of the SPIE, (2016).

36. S. Zhu, F. Shi, D. Xiang, W. Zhu, H. Chen, and X. Chen, “Choroid Neovascularization Growth Prediction With Treatment Based on Reaction-Diffusion Model in 3-D OCT Images,” IEEE J. Biomed. Health Inform. 21(6), 1667–1674 (2017). [CrossRef]  

37. L. Chen, G. Papandreou, F. Schroff, and H. Adam, “Rethinking Atrous Convolution for Semantic Image Segmentation,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017).

38. L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs,” IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2018). [CrossRef]  

39. E. Shelhamer, J. Long, and T. Darrell, “Fully Convolutional Networks for Semantic Segmentation,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2015).

40. K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016).

41. Y. Tai, J. Yang, and X. Liu, “Image Super-Resolution via Deep Recursive Residual Network,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017).

42. S. Niu, Q. Chen, L. de Sisternes, and D. L. Rubin, “Automated retinal layers segmentation and quantitative evaluation in SD-OCT images,” Comput. Biol. Med. 54, 116–128 (2014). [CrossRef]  
