
MAFE-Net: retinal vessel segmentation based on a multiple attention-guided fusion mechanism and ensemble learning network


Abstract

The precise and automatic recognition of retinal vessels is of utmost importance in the prevention, diagnosis and assessment of certain eye diseases, yet intricate factors such as uneven and indistinct curvilinear shapes, unpredictable pathological deformations, and non-uniform contrast make this detection task highly challenging. Therefore, we propose a unique and practical approach based on a multiple attention-guided fusion mechanism and ensemble learning network (MAFE-Net) for retinal vessel segmentation. In conventional UNet-based models, long-distance dependencies are not explicitly modeled, which may cause partial scene information loss. To compensate for this deficiency, an attention-guided fusion module is used to extract various blood vessel features from retinal images. In the skip connection part, a unique spatial attention module is applied to remove redundant and irrelevant information; this structure helps to better integrate low-level and high-level features. The final step involves a DropOut layer that randomly removes some neurons to prevent overfitting and improve generalization. Moreover, an ensemble learning framework is designed to detect retinal vessels by combining different deep learning models. To demonstrate the effectiveness of the proposed model, experiments were conducted on the public STARE, DRIVE, and CHASEDB1 datasets, achieving F1 scores of 0.842, 0.825, and 0.814, and accuracy values of 0.975, 0.969, and 0.975, respectively. Compared with eight state-of-the-art models, the designed model produces satisfactory results both visually and quantitatively.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Segmentation of blood vessels in retinal images plays a unique and crucial role in the initial prevention, diagnosis, and evaluation of ocular diseases, as such diseases typically cause changes in retinal vascular morphology [1,2]. Although retinal images are commonly used for the diagnosis and treatment of certain illnesses, manual segmentation of blood vessels is an extraordinarily challenging task due to low contrast, complex curvilinear structures, and irregular illumination [3,4], as shown in Fig. 1. In particular, this subjectivity may lead to inconsistent results among different doctors during the segmentation stage, which may hinder clinical diagnosis [5]. As a result, various deep learning frameworks can be used to automatically and non-invasively capture micro-vessels and furnish rich vascular features from retinal images, effectively assisting clinicians in diagnosing and treating various eye diseases [6,7]. However, no unifying model has yet emerged for retinal vessel segmentation.

Fig. 1. Fundus vascular images. The first row contains four original images, and the second row displays the corresponding areas inside the white boxes. The third row provides the corresponding ground truths for the images in the first row. Similarly, the fourth row offers the corresponding ground truths for the local regions in the second row.

In order to assist clinicians in diagnosing and treating ocular diseases, many distinct algorithms have been presented to acquire a wide variety of features for retinal vessel segmentation. These segmentation approaches can be divided into three categories. The first category manually designs feature extraction layers to highlight curvilinear structures, such as the Hessian matrix [8], second-order image derivatives [9], stick-based filters [10,11], and dynamic evolution models [12]. The second category uses deep learning methods to segment retinal vessels, such as the UNet [13], CENet [14], NAUNet [15], ConvUNext [16], CS2Net [17], SA-UNet [18], and GDF-Net [19] networks. The third category combines deep learning and manually designed feature layers to improve the segmentation accuracy of retinal images, such as D-GaussianNet [20], LIOT [21], and the combination of an ODoS filter and an IterNet network [22]. Although these methods have achieved good results, it is still difficult to detect weak retinal vessels [1].

The traditional strategy is to extract features based on the unique shapes and structures of retinal vessels. Based on this theory, Lesage et al. [8] proposed the Hessian matrix to precisely segment curvilinear structures, but it may lead to incomplete and broken blood vessels. Using a different approach, Li et al. [23] presented a hardware-oriented approach to enhance fundus vascular images, but its segmentation efficiency was poor for thin blood vessels. In general, manually designed feature layers can only extract a few features from retinal images and cannot accurately detect blood vessels.

In order to make up for the defect that traditional methods cannot accurately extract fundus information, many deep learning models have been designed to segment blood vessels. A milestone segmentation network is the FCN [24], but it may lead to serious information loss and was quickly superseded. Its improved version, UNet [13], introduces a skip connection operation to reduce information loss, but the information fusion in the skip connection part has its limitations: consecutive pooling operations or convolution striding reduce the feature resolution, making it difficult to learn increasingly abstract feature representations. To remedy this shortcoming, Gu et al. [14] designed CENet, which uses dilated convolution to extract features of the target objects and introduces DAC and ASPP modules to reduce information loss. However, dilated convolution can produce gridding effects, making it difficult to segment small blood vessels. Similarly, Han et al. [16] presented a unique ConvUNext architecture, which has significant advantages in expanding receptive fields and removing irrelevant information from retinal images. Unfortunately, these methods [14,16] have very high computational requirements and cannot meet the demand for fast segmentation. To reduce time consumption, Yang et al. [15] designed a lightweight NAUNet; by adding a channel attention mechanism to a lightweight UNet model, it can segment blood vessels quickly. Using the same strategy, Guo et al. [18] proposed the SA-UNet architecture to filter out irrelevant information, but it cannot accurately segment weak blood vessels. Recently, inspired by the self-attention mechanism [25,26], Mou et al. [17] introduced CS2-Net to capture global information of fundus images. However, these deep learning algorithms [14-18] mainly pursue powerful structures and frameworks while ignoring algorithmic complexity.

Recently, many deep learning models have been combined with manually designed feature layers to segment retinal vessels. For example, Alvarado-Carrillo et al. [20] applied a combination of a Gaussian filter and adaptive parameters to segment retinal images. Although good results were achieved, it consumed too much memory and time. Using a different approach, Shi et al. [21] designed a LIOT scheme to detect blood vessels. However, it may result in information loss during the image transformation stage. Recently, Peng et al. [22] improved this model [21] by combining an ODoS filter with an IterNet network, achieving good results in retinal vessel detection. Nevertheless, these methods require parameter tuning for effective feature extraction; if the parameters are poorly chosen, the desired effect may not be achieved.

However, most retinal vessel segmentation approaches are improvements on the UNet model, which may result in the loss of some scene information. To cope with this challenge, the self-attention mechanism [25,26] has been gradually integrated into various deep learning frameworks for image segmentation. Unlike earlier frameworks, the essence of the self-attention mechanism is to extract global information and reduce the loss of image information. Unfortunately, constructing the self-attention mechanism is time-consuming. Hence, we integrate the self-attention mechanism with convolutional neural networks to develop a simple, practical, and efficient algorithm for retinal vessel detection. Furthermore, an ensemble learning model can effectively integrate the advantages of multiple deep learning models, so as to segment retinal vessels more accurately. Therefore, an ensemble learning network is designed to improve retinal vessel segmentation.

In this study, we present a unique and practical framework based on multiple attention-guided fusion mechanism and ensemble learning network (MAFE-Net) for blood vessel segmentation. Firstly, the attention fusion module is applied to reduce scene information loss. Secondly, a spatial attention module is employed to effectively combine low-level and high-level features for redundant and irrelevant information suppression. Thirdly, a DropOut layer is employed to prevent overfitting of the presented framework. Finally, the ensemble learning framework is designed to improve retinal vessel segmentation. Our work can be summarized as follows:

  • 1. A lightweight neural network comprising four different encoders and decoders is applied to improve the segmentation performance. Furthermore, Batch Normalization (BN) and DropOut modules are incorporated into the convolution operations, preventing the improved MAFE-Net from overfitting.
  • 2. To address the limitations of convolution, a self-attention fusion module is employed to aggregate spatial-channel attention units in parallel. This facilitates the extraction of global information from retinal images, thereby compensating for the aforementioned defects.
  • 3. The skip connection component incorporates a spatial attention module to enhance the model’s ability to acquire pertinent features by filtering out extraneous information and assigning greater importance to relevant information.
  • 4. To further improve the segmentation performance, an ensemble learning strategy is used to combine multiple deep learning models for retinal vessel segmentation.
  • 5. An efficient curvilinear structure detection method is developed through the joint application of the ensemble learning framework, attention fusion module, and DropOut layer.

2. Related works

2.1 Attention mechanism-based methods

Recently, various attention mechanisms have been presented for image segmentation. The main reason is that attention mechanism-based models can learn more effective parameters and extract more effective dynamic features, so that their segmentation performance and robustness are improved. Based on this theory, researchers have presented many unique algorithms that segment blood vessels by incorporating various channel and spatial attention mechanisms into different deep learning frameworks, thereby improving segmentation efficiency and enhancing model flexibility and robustness. For instance, Li et al. [27] presented a novel FANet based on dual-direction attention and soft attention to capture global contextual information and multi-scale features for retinal image segmentation. Taking a different approach, Alvarado-Carrillo et al. [28] designed a width attention mechanism to emphasize dynamic changes in blood vessel width. Unfortunately, these models [27,28] exhibit a high level of complexity. To overcome this disadvantage, Li et al. [29] designed a lightweight attention mechanism to capture global information and reduce time consumption. Recently, Mou et al. [17] presented CS2-Net based on spatial-channel attention for integrating local and global features. Similarly, Li et al. [30] proposed a global self-attention mechanism for retinal vessel segmentation. However, these models [17,30] share the drawback of high computational resource consumption.

2.2 Ensemble learning

Ensemble learning, as a fundamental approach, aims to leverage the strengths of multiple models in order to achieve desirable generalization performance [31]. The amalgamation of predictions from various individual models has proven to be an efficacious technique for improving model performance [32]. Consequently, ensemble learning has gained significant traction in the medical domain, finding extensive applications in the detection of lung and colon cancer [33], heart disease [34], COVID-19 [35], thyroid nodules [36], and retinal vessel segmentation [37]. In terms of retinal image segmentation, Fraz et al. [38] used an ensemble system to segment blood vessels, but the segmentation accuracy could not meet clinical requirements. To improve the segmentation accuracy, Wang et al. [39] adopted the winner-takes-all classifier to acquire the best classification performance. Taking a different approach, Du et al. [40] presented an ensemble strategy to fuse different deep learning models for retinal vessel segmentation and achieved satisfactory results. Although the ensemble strategy may increase the computational complexity, it can greatly improve the segmentation results.

3. Materials and method

3.1 Datasets

Three publicly available datasets were used to demonstrate the effectiveness of MAFE-Net: DRIVE [41], STARE [42], and CHASEDB1 [38]. Table 1 shows the specific details of these datasets.

Table 1. The specific details of the publicly available datasets.

3.2 Method

3.2.1 Network architecture

In this work, a multiple attention-guided fusion network (MAF-Net) is presented for retinal vessel segmentation. As shown in Fig. 2(a), an attention fusion module (AFM) is introduced to extract global information, while a spatial attention module (SAM) is applied to remove redundant information. For clarity, all details are presented in Table 2. Furthermore, an ensemble learning network, MAFE-Net, is designed to integrate four different deep learning models (UNet [13], SA-UNet [18], CS2-Net [17], and MAF-Net) for retinal vessel segmentation, as shown in Fig. 2(b). The difference between MAF-Net and MAFE-Net is that MAFE-Net integrates the advantages of multiple models, which allows for better segmentation of fundus images.

Fig. 2. The presented deep learning network. (a) The MAF-Net structure. (b) The MAFE-Net structure. Specifically, the UNet, SA-UNet, CS2-Net, and MAF-Net are integrated into a new MAFE-Net.

Table 2. The parameter details of the MAF-Net.

3.2.2 Attention fusion module

The dual-attention mechanism, initially designed by Fu et al. [43], enhances contextual information to compensate for scene information lost during downsampling. Building upon the findings of previous studies [17,18,43], the MAF-Net incorporates a self-attention fusion module [17].

It is widely acknowledged that relying only on local features obtained from neural networks may lead to classification errors, primarily because such features cannot effectively model fundus images at a global level. To address this issue, a dual-attention mechanism is introduced. This mechanism captures and incorporates the global information present in retinal images, as depicted in Fig. 3. By operating in parallel, it mitigates potential interference between the two attention branches and facilitates the comprehensive extraction of intricate details of retinal vessels. The spatial attention module models long-range correlations and extracts global information from fundus images, whereas the channel attention mechanism focuses on weighting information, assigning higher weights to channels that contain valuable data and reducing the weights of channels with less pertinent information.

Fig. 3. The attention fusion module (AFM). The spatial attention mechanism and the channel attention mechanism are combined to enhance contextual information and compensate for scene information loss.

The issue of global feature extraction can be effectively addressed by the spatial attention mechanism [17,18,43], which improves the model's ability to learn the underlying global features. To improve the segmentation of retinal vessels along both the horizontal and vertical axes, the standard convolution operations are replaced with horizontal and vertical convolutions, respectively. Specifically, to accurately capture the blood vessels in retinal images, we decompose the fundus image along the horizontal and vertical directions and use 1$\times$3 and 3$\times$1 convolutions in place of the traditional 3$\times$3 convolution, so that more information about the vessel edges can be obtained. As depicted in Fig. 4, this mechanism takes input features $\mathrm {F} \in R^{C\times H\times W}$ (where N=H$\times$W) and linearly maps them to generate three matrices: Qy, Kx, and $\mathrm {V} \in R^{C\times H\times W}$. The variables Qy and Kx encode the vertical and horizontal directional characteristics, respectively. The variable C indicates the number of feature channels, while H and W respectively represent the height and width dimensions. Consequently, the spatial attention map is obtained through a softmax function.

$$S(x,y) = \frac{\exp (Q_y^T \cdot K_x)}{\sum\limits_{x'=1}^{N} \exp (Q_y^T \cdot K_{x'})}$$
where S(x,y) denotes the relationship between the xth position and the yth position. The spatial attention map effectively captures the vascular structures across various spatial regions, with higher similarity leading to larger values in the feature map. Additionally, another feature V is acquired by employing a 1$\times$1 convolutional layer and reshaping it accordingly. Consequently, the mathematical representation is:
$$\mathrm{SA} = \mathrm{reshape}\left ( S \otimes V \right ) \oplus \mathrm{F}$$

Using the spatial attention mechanism, blood vessel segmentation could be improved by extracting global features from retinal images.
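To make the computation above concrete, the following is a minimal PyTorch sketch of the spatial attention branch defined by Eqs. (1)-(2). The class name, the channel-reduction ratio, and the exact placement of the 1×3/3×1 projections are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn


class SpatialAttentionBranch(nn.Module):
    """Spatial attention branch of the AFM (Eqs. (1)-(2)), sketched in PyTorch."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        inter = max(channels // reduction, 1)
        # 1x3 and 3x1 convolutions replace the usual 3x3 kernel to emphasize
        # vertical (Q_y) and horizontal (K_x) vessel structure, respectively.
        self.query = nn.Conv2d(channels, inter, kernel_size=(1, 3), padding=(0, 1))
        self.key = nn.Conv2d(channels, inter, kernel_size=(3, 1), padding=(1, 0))
        self.value = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        n = h * w                                   # N = H x W
        q = self.query(x).view(b, -1, n)            # B x C' x N
        k = self.key(x).view(b, -1, n)              # B x C' x N
        v = self.value(x).view(b, c, n)             # B x C  x N
        # Eq. (1): softmax over the similarity between every pair of positions.
        attn = torch.softmax(torch.bmm(q.transpose(1, 2), k), dim=-1)  # B x N x N
        # Eq. (2): SA = reshape(S * V) + F (residual connection to the input).
        out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        return out + x
```

As a quick shape check, `SpatialAttentionBranch(64)(torch.randn(1, 64, 48, 48))` returns a tensor of the same size as its input, so the branch can be dropped into an encoder without changing the surrounding layers.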

Fig. 4. The structure of the spatial attention mechanism.

The classification of retinal image channels into distinct categories can contribute to the identification of interconnected internal semantic characteristics, thereby extracting pertinent information from various channels to enhance the semantic expression capability. The channel attention mechanism is illustrated in Fig. 5. It multiplies the input $\mathrm {F_{x} } \in R^{C\times H\times W}$ with its transpose $\mathrm {F_{y}^{T} } \in R^{C\times H\times W}$ to produce the channel attention feature map. The specific formula is presented below:

$$C(x,y) = \frac{\exp (F_{x} \cdot F_{y}^{T})}{\sum\limits_{x'=1}^{C} \exp (F_{x'} \cdot F_{y}^{T})}$$

Here, C(x,y) denotes the relationship between the xth channel and the yth channel. Through the computation of the feature map, channels exhibiting high similarity are enhanced, while channels with low similarity are suppressed. Subsequently, the Softmax activation function is employed to distinguish between background and vascular structures in retinal images. Consequently, the mathematical representation is:

$$\mathrm{CA} = \mathrm{reshape}\left ( C \otimes F \right ) \oplus \mathrm{F}$$

The incorporation of the channel attention mechanism augments the distinction among various channels, thereby enhancing the overall efficacy of the model. Consequently, the dual-attention mechanism can be expressed as follows.

$$\mathrm{DA} = \mathrm{SA} \oplus \mathrm{CA} \oplus \mathrm{F}$$

The utilization of the dual-attention mechanism exhibits commendable efficacy in enhancing feature representation, thereby facilitating the acquisition of more accurate segmentation outcomes.
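A corresponding sketch of the channel attention branch (Eqs. (3)-(4)) and the parallel fusion of Eq. (5) is given below, reusing the SpatialAttentionBranch from the previous listing. The class names and the omission of any learnable scaling weights are simplifying assumptions.

```python
class ChannelAttentionBranch(nn.Module):
    """Channel attention branch of the AFM (Eqs. (3)-(4))."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        f = x.view(b, c, -1)                                            # B x C x N
        # Eq. (3): softmax over channel-to-channel similarity.
        attn = torch.softmax(torch.bmm(f, f.transpose(1, 2)), dim=-1)   # B x C x C
        # Eq. (4): CA = reshape(C * F) + F (residual connection to the input).
        out = torch.bmm(attn, f).view(b, c, h, w)
        return out + x


class AttentionFusionModule(nn.Module):
    """Parallel fusion of the two branches, following Eq. (5): DA = SA + CA + F."""

    def __init__(self, channels: int):
        super().__init__()
        self.spatial = SpatialAttentionBranch(channels)
        self.channel = ChannelAttentionBranch()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.spatial(x) + self.channel(x) + x
```

Running the two branches in parallel, as Eq. (5) prescribes, keeps them independent: neither attention map is conditioned on the other's output, which matches the interference argument made above.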

Fig. 5. The structure of the channel attention mechanism.

3.2.3 Spatial attention module

The UNet model addresses the information loss caused by downsampling by integrating image information from both the encoder and decoder sides. Nevertheless, this strategy blends low-level and high-level features, resulting in limited efficacy. In fact, this amalgamation may introduce superfluous and unrelated noise into the high-level features, consequently leading to inadequate detection, particularly for blood vessel detection [44]. Inspired by previously published models [15,18,44], the spatial attention module [18] is incorporated into the skip connection component to eliminate superfluous information. The underlying approach is visually depicted in Fig. 6. Given input features $\mathrm {F } \in R^{H\times W\times C}$, two features $\mathrm {F_{maxpool} } \in R^{H\times W\times 1}$ and $\mathrm {F_{avgpool} } \in R^{H\times W\times 1}$ are generated by max pooling and average pooling along the channel dimension. Following this, the spatial attention feature map is generated through convolution and Sigmoid operations. The output features $\mathrm {F_{s} } \in R^{H\times W\times C}$ are obtained by the element-wise product of the attention feature map and the input features. This operation improves the segmentation accuracy of retinal images. The mathematical representation is:

$$\begin{aligned}F s & =F \cdot \sigma\left(f^{7 \times 7}([\operatorname{MaxPool}(F) ; \operatorname{AvgPool}(F)])\right)\\ & =F \cdot \sigma\left(f^{7 \times 7}\left[\mathrm{F}_{\text{maxpool }} ; \mathrm{F}_{\text{avgpool }}\right]\right) \end{aligned}$$
where ${f}^{7\times 7} \left ( \cdot \right )$ stands for a convolution operation with a 7$\times$7 kernel, and $\sigma \left ( \cdot \right )$ denotes the Sigmoid function.
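The following minimal sketch shows how Eq. (6) can be realized in PyTorch for the skip-connection features. The class name is ours, and the channel-first layout (B x C x H x W) is an implementation convention rather than the H x W x C notation used above.

```python
class SkipSpatialAttention(nn.Module):
    """Spatial attention module (SAM) of Eq. (6) applied to skip-connection features."""

    def __init__(self):
        super().__init__()
        # The two pooled maps (max and average over channels) are merged by a 7x7 convolution.
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        max_pool, _ = torch.max(x, dim=1, keepdim=True)   # F_maxpool: B x 1 x H x W
        avg_pool = torch.mean(x, dim=1, keepdim=True)     # F_avgpool: B x 1 x H x W
        gate = torch.sigmoid(self.conv(torch.cat([max_pool, avg_pool], dim=1)))
        return x * gate                                    # F_s = F . sigma(f^{7x7}([...]))
```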

Fig. 6. The structure of the spatial attention module (SAM).

3.2.4 DropOut module

Although a data augmentation operation is applied at the outset, the presented framework may still overfit when there are too few retinal images for training [45]. To address this issue, the DropOut module is introduced to encourage the network to acquire more resilient and efficient features [19]. Unlike the conventional convolution module, the incorporation of DropOut and BN (Batch Normalization) into the convolution layers, as depicted in Fig. 7, accelerates the neural network's convergence while mitigating model overfitting.

Fig. 7. DropOut module. The diagram shows the incorporation of DropOut and BN (Batch Normalization) modules into the convolution layers.

Although the data augmentation operation increases the number of images, it is still inadequate for neural networks. Therefore, the DropOut module is applied in the presented model to avoid overfitting. In other words, DropOut randomly deactivates some neurons during training. These operations reduce the dependence of the neural network on specific features, enabling it to learn more robust and universal features, and thereby preventing overfitting [46].
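A possible form of the convolution block in Fig. 7 is sketched below. The kernel size, the layer ordering, and the interpretation of the DropOut parameter 0.9 (Table 6) as a keep probability are assumptions; the paper does not spell these out.

```python
class ConvBNDrop(nn.Module):
    """Convolution block with Batch Normalization and DropOut, as in Fig. 7 (sketch)."""

    def __init__(self, in_ch: int, out_ch: int, keep_prob: float = 0.9):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.Dropout2d(p=1.0 - keep_prob),   # assumed: 0.9 in Table 6 is a keep probability
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)
```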

3.2.5 Ensemble learning

To further improve the segmentation performance, an ensemble learning network [33,47] named MAFE-Net is designed to combine multiple different lightweight deep learning models (UNet, SA-UNet, CS2-Net and MAF-Net) for retinal vessel detection, as presented in Fig. 8.

Fig. 8. Overview of the deep learning framework. (a) UNet structure; (b) SA-UNet structure; (c) CS2-Net structure; (d) MAF-Net structure; (e) The ensemble learning network of our proposed MAFE-Net, which integrates the above four deep learning models.

As the first architecture, we apply a lightweight UNet model to learn contextual information from retinal vessel images and obtain predicted probabilities. The skip connection part is used to reduce spatial information loss, as presented in Fig. 8(a).

The second architecture is the lightweight SA-UNet model presented by Guo et al. [18]. As shown in Fig. 8(b), a spatial attention module in the skip connection part removes redundant information and assigns more weight to relevant information. Moreover, a DropOut layer is used to prevent the SA-UNet network from overfitting.

Different from the previously introduced UNet and SA-UNet models, Mou et al. introduced the CS2-Net model [17], which includes two types of attention modules to further integrate local and global features for improved retinal vessel segmentation. Unlike the original CS2-Net, we adopt a lightweight CS2-Net for model integration, as shown in Fig. 8(c).

The MAF-Net model, illustrated in Fig. 8(d), represents the fourth implemented architecture. This model incorporates the attention fusion module to extract diverse blood vessel features. Additionally, a spatial attention module is integrated into the skip connection to eliminate redundant information, while the DropOut layer is utilized to randomly discard certain neurons. These modifications can promote the MAF-Net to learn more effective features, resulting in improved retinal vessel segmentation.

To construct the ensemble learning framework, convolutional layers are applied to integrate the predicted probabilities from the above deep learning models into the final segmentation result. Finally, the Sigmoid activation function is applied to acquire the segmentation results of retinal vessels; the details are demonstrated in Fig. 8(e).
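A minimal sketch of this fusion head is shown below: the probability maps predicted by the four base models are stacked along the channel axis, merged by convolution, and passed through a Sigmoid. The number and size of the fusion convolutions are assumptions.

```python
class EnsembleHead(nn.Module):
    """Fusion head of the MAFE-Net ensemble (Fig. 8(e)), sketched in PyTorch."""

    def __init__(self, num_models: int = 4):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(num_models, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(16, 1, kernel_size=1),
        )

    def forward(self, probs):
        # probs: per-model B x 1 x H x W probability maps (UNet, SA-UNet, CS2-Net, MAF-Net).
        stacked = torch.cat(probs, dim=1)        # B x num_models x H x W
        return torch.sigmoid(self.fuse(stacked))
```

Learning the fusion weights with convolutions, rather than simply averaging the four probability maps, lets the ensemble favour whichever base model is more reliable in a given spatial region.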

4. Experiments

4.1 Implementation details

There are some publicly available retinal image datasets on the internet, but each dataset only includes a few dozen fundus images. To avoid overfitting of the specific neural network, we introduce a data augmentation strategy. Specifically, random rotation, Gaussian noise, and color jittering are applied to the retinal fundus images to increase the number of pictures. In addition, all the algorithms use the same number of epochs, batch size, and learning rate, with 100 epochs and a batch size of 8. The details are described in Table 3.
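A minimal sketch of such an augmentation pipeline is given below using torchvision; the specific parameter values (rotation range, jitter strength, noise level) are illustrative assumptions, and in practice the same geometric transform must also be applied to the ground-truth mask.

```python
import torch
from torchvision import transforms


class AddGaussianNoise:
    """Adds zero-mean Gaussian noise to a tensor image with values in [0, 1]."""

    def __init__(self, std: float = 0.02):
        self.std = std

    def __call__(self, img: torch.Tensor) -> torch.Tensor:
        return (img + torch.randn_like(img) * self.std).clamp(0.0, 1.0)


# Random rotation, color jittering, and Gaussian noise applied to a PIL fundus image.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=180),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.ToTensor(),
    AddGaussianNoise(std=0.02),
])
```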

Table 3. The specific information of DRIVE, STARE, and CHASEDB1.

4.2 Evaluation metrics

To illustrate the segmentation efficacy of the designed MAF-Net and MAFE-Net, conventional evaluation metrics such as accuracy (Acc), F1-score, sensitivity (SE), and specificity (SP) are employed to compare these methods with numerous state-of-the-art approaches. These metrics are defined as follows.

$$\mathrm{Acc} = \frac{\mathrm{TP}+\mathrm{TN} }{\mathrm{TP} +\mathrm{FP} +\mathrm{TN}+\mathrm{FN} }$$
$$\mathrm{SE} = \frac{\mathrm{TP} }{\mathrm{TP} +\mathrm{FN} }$$
$$\mathrm{SP} = \frac{\mathrm{TN} }{\mathrm{TN} +\mathrm{FP} }$$
$$\mathrm{F_{1} } = \frac{2\times \mathrm{TP} }{2\times \mathrm{TP} +\mathrm{FP} +\mathrm{FN} }$$

Among these metrics, TP denotes the number of vessel pixels correctly identified as vessels, FP denotes the number of background pixels erroneously identified as vessels, TN denotes the number of background pixels correctly identified as background, and FN denotes the number of vessel pixels incorrectly identified as background.
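For reference, the metrics of Eqs. (7)-(10) can be computed from a binarized prediction and its ground truth as in the short NumPy sketch below (the function name is ours).

```python
import numpy as np


def vessel_metrics(pred: np.ndarray, gt: np.ndarray):
    """Pixel-wise Acc, SE, SP, and F1 (Eqs. (7)-(10)) from binary vessel masks."""
    pred = pred.astype(bool).ravel()
    gt = gt.astype(bool).ravel()
    tp = np.sum(pred & gt)          # vessel pixels correctly detected
    tn = np.sum(~pred & ~gt)        # background pixels correctly rejected
    fp = np.sum(pred & ~gt)         # background pixels labeled as vessel
    fn = np.sum(~pred & gt)         # vessel pixels missed
    acc = (tp + tn) / (tp + fp + tn + fn)
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return acc, se, sp, f1
```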

4.3 Visual inspection

To illustrate the effectiveness of the presented MAF-Net and MAFE-Net in weak object segmentation, our models were validated on three datasets. Figure 9(a) and (b) show the original retinal images and their corresponding labels. Figure 9(c-l) show the retinal vessel segmentation results produced by the UNet [13], ResUNet, Attention-UNet, R2UNet, UNet++ [48], SA-UNet [18], CS2Net [17], LIOT [21], MAF-Net, and MAFE-Net models, respectively. The areas marked with red boxes contain partially weak objects. Compared with eight state-of-the-art models, the presented framework exhibits excellent performance in weak vessel segmentation; in other words, the presented model achieves the maximum SE value.

Fig. 9. Experimental results with different state-of-the-art models on DRIVE, STARE, and CHASEDB1 datasets.

4.4 Quantitative evaluation

To better illustrate the validity of the presented models, we applied commonly used quantitative metrics (F1, Acc, SE, and SP) for validation on the widely and publicly accessible STARE, DRIVE, and CHASEDB1 datasets. As described in Table 4, the presented MAFE-Net model exhibited the highest F1 and Acc values when compared with eight state-of-the-art methods. Here, the highlighted UNet, SA-UNet, and CS2Net networks are integrated into a new deep learning framework named USC-Net. Conversely, the proposed MAF-Net model demonstrated the highest SE values. These experimental results demonstrate that the presented MAF-Net and MAFE-Net models segment retinal vessels extremely well.

Table 4. Quantitative evaluation with different methods.

4.5 Ablation study

4.5.1 Influence of each module on the presented framework

Ablation studies were performed to evaluate the effectiveness of the various modules in the MAF-Net. Specifically, we examined the contributions of the spatial attention module (SAM), the DropOut and BN layers (DB), and the attention fusion module (AFM). The results, as presented in Table 5, confirm the significance of each module in enhancing the algorithm's performance on the STARE and DRIVE datasets. Notably, the removal of any module resulted in a decrease in both F1 and Acc values for the entire model. This observation supports our claim that each module contributes to improved accuracy in fundus vascular image segmentation, albeit with a marginal increase in computational requirements.

Table 5. The influence of each module on the presented framework.

4.5.2 Effect of parameters in the DropOut layer on the presented framework

The inclusion of the DropOut layer within the MAF-Net framework serves to mitigate overfitting. However, it is important to note that the parameters associated with the DropOut layer can also influence the performance of neural networks. To demonstrate the influence of the parameters, we performed experiments on the STARE, DRIVE and CHASEDB1 datasets, utilizing diverse parameter values for validation purposes. The obtained results, as depicted in Table 6, reveal that setting the DropOut parameters to 0.5, 0.7, and 0.9 respectively yielded varying experimental outcomes. Notably, when the DropOut parameter was set to 0.9, both F1 and Acc achieved their maximum values.

Table 6. The influence of parameters in the DropOut layer on the presented framework.

4.5.3 Effect of the model parameters on the presented framework

To show that our proposed model is lightweight, we conducted extensive comparisons with other methods in terms of model parameters. As shown in Table 7, the parameter counts of the presented MAF-Net and MAFE-Net are 0.51M and 5.88M, respectively.

Table 7. The model parameters with different methods.

4.5.4 Validation on a combination of the three datasets

To further demonstrate the effectiveness of the algorithm, we combined the three datasets to form a new dataset, in which 50 images were used for training and 38 images for testing. As shown in Table 8, the presented method achieves higher F1 and Acc values than many compared methods.

Table 8. The indicators with different methods.

5. Discussion

The study introduces a novel approach for segmenting retinal images by employing multiple attention mechanisms and an ensemble learning network. This approach has proven effective even in scenarios where vessels are thin, weak, and inhomogeneous. The framework offers several advantages and distinct characteristics. Firstly, it enhances the UNet model by incorporating the DropOut module and BN (Batch Normalization) module within the convolutional neural network. These modifications not only improve the performance of the MAFE-Net but also maintain segmentation accuracy. Secondly, in order to address the constraints posed by convolution and attain comprehensive modeling of retinal images, this study introduces a dual-attention mechanism to extract global information. This mechanism enables effective interaction between local and global information, thereby enhancing the simultaneous extraction of information from retinal images and ultimately improving the accuracy of segmenting minute vessels. Thirdly, a spatial attention module is applied to eliminate extraneous information from the encoder components, thereby ensuring the effective fusion of decoder features and enhancing the accuracy of retinal image segmentation. Additionally, an ensemble learning framework is devised to enhance blood vessel segmentation performance by integrating multiple distinct deep learning models. The precise segmentation of blood vessels in fundus images can significantly assist medical professionals in the diagnosis and treatment of various retinal diseases.

The presented framework was validated on three publicly available datasets. Compared with eight state-of-the-art models, visual examination and quantitative assessment demonstrate the exceptional performance of the presented framework in accurately segmenting thin, weak, and inhomogeneous blood vessels. The main reasons are as follows: (1) The presented MAF-Net and MAFE-Net incorporate a dual attention mechanism that effectively extracts retinal vessel information from both the spatial and channel domains. (2) The spatial attention module implemented in this framework eliminates irrelevant information originating from the encoder side, thereby preventing its transmission to the decoder side. (3) The reduction in the number of channels within the algorithm enhances its running speed without compromising the segmentation accuracy of retinal images.

However, the proposed method may result in fragmented and incomplete segmentation of small blood vessels due to the uneven contrast and substantial variations in lighting observed in retinal images. This can be observed in Fig. 10, where a comparison with the ground truths reveals that the proposed method may lead to an incomplete representation of certain small blood vessels. Furthermore, the emphasis of the MAFE-Net lies in the development of deep architectures, neglecting the capture of shape features pertaining to retinal vessels. Despite these limitations, the presented deep learning framework demonstrates efficacy in detecting delicate structures.

Fig. 10. The breakage and incompleteness of the segmented small blood vessels. The first row contains four original images, and the second row contains the corresponding ground truths. The third row shows the corresponding ground truths within the green box. The fourth row shows the corresponding segmentation results of the MAFE-Net. The fifth row showcases the outcomes within the green box.

6. Conclusion

The purpose of this study is to effectively segment retinal vessels, with a specific focus on the difficulties associated with segmenting vessels that are weak, thin, and inhomogeneous. To mitigate the issue of overfitting in the UNet model, DropOut and Batch Normalization techniques are introduced. Furthermore, considering that convolutional neural networks cannot fully capture the complex information in retinal vessel images, this research integrates spatial-channel attention modules to concurrently extract comprehensive knowledge from retinal images. Unfortunately, this approach may still pass irrelevant information from the encoder side to the decoder side; to alleviate this problem, the spatial attention module is introduced. Moreover, to further improve blood vessel segmentation performance, an ensemble learning framework is designed that combines multiple deep learning models to detect retinal vessels. The presented methodology is assessed using the publicly accessible DRIVE, STARE, and CHASEDB1 datasets. The experimental findings indicate that the presented MAF-Net and MAFE-Net frameworks exhibit commendable performance in retinal vessel segmentation compared with several contemporary approaches. However, it is worth noting that although our algorithm has successfully segmented certain weak vessels, there may still be instances of vessel breakage. In the future, we will attempt to build all models on a shared encoder with different branches (UNet, SA-UNet, CS2Net, MAF-Net) to construct a new deep learning framework for fundus image segmentation, so as to effectively reduce the issue of vessel breakage.

7. Author contribution statement

Pengpeng Luan: Conceptualization, Data curation, Methodology, Software, Validation, Visualization, Writing original draft. Yuanyuan Peng, Yingjie Tang, Pengpeng Luan, Zixu Zhang and Hongbin Tu: Conceptualization, Funding acquisition, Writing - review & editing.

Funding

Natural Science Foundation of Jiangxi Province (20212BAB202007, 20202BAB212004, 20224BAB202024); National Natural Science Foundation of China (62265007); Jiangxi Provincial Graduate Student Innovation Foundation (YC2023-S514).

Disclosures

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data underlying the results presented in this paper are available in Ref. [38,41,42]. These STARE, DRIVE and CHASEDB1 datasets are publicly available online for free download.

References

1. M. Badar, M. Haris, and A. Fatima, “Application of deep learning for retinal image analysis: A review,” Comput. Sci. Rev. 35, 100203 (2020). [CrossRef]  

2. H. Tong, Z. Fang, Z. Wei, et al., “Sat-net: a side attention network for retinal image segmentation,” Appl. Intell. 51(7), 5146–5156 (2021). [CrossRef]  

3. X. Bian, G. Wang, Y. Wu, et al., “Tci-unet: transformer-cnn interactive module for medical image segmentation,” Biomed. Opt. Express 14(11), 5904–5920 (2023). [CrossRef]  

4. Y.-f. Zhu, X. Xu, X.-d Zhang, et al., “Ccs-unet: a cross-channel spatial attention model for accurate retinal vessel segmentation,” Biomed. Opt. Express 14(9), 4739–4758 (2023). [CrossRef]  

5. Z. Deng, Y. Cai, L. Chen, et al., “Rformer: Transformer-based generative adversarial network for real fundus image restoration on a new clinical benchmark,” IEEE J. Biomed. Health Inform. 26(9), 4645–4655 (2022). [CrossRef]  

6. K. S. Kumar and N. P. Singh, “Analysis of retinal blood vessel segmentation techniques: a systematic survey,” Multimed. Tools Appl. 82(5), 7679–7733 (2023). [CrossRef]  

7. J. Song, X. Chen, Q. Zhu, et al., “Global and local feature reconstruction for medical image segmentation,” IEEE Trans. Med. Imaging 41(9), 2273–2284 (2022). [CrossRef]  

8. D. Lesage, E. D. Angelini, I. Bloch, et al., “A review of 3d vessel lumen segmentation techniques: Models, features and extraction schemes,” Med. Image Anal. 13(6), 819–845 (2009). [CrossRef]  

9. M. Ashikuzzaman, A. Sadeghi-Naini, A. Samani, et al., “Combining first- and second-order continuity constraints in ultrasound elastography,” IEEE Trans. Ultrason., Ferroelect., Freq. Contr. 68(7), 2407–2418 (2021). [CrossRef]  

10. Y. Peng and C. Xiao, “An oriented derivative of stick filter and post-processing segmentation algorithms for pulmonary fissure detection in ct images,” Biomed. Signal Process. Control. 43, 278–288 (2018). [CrossRef]  

11. H. Zhao, B. C. Stoel, M. Staring, et al., “A framework for pulmonary fissure segmentation in 3d ct images using a directional derivative of plate filter,” Signal Process. 173, 107602 (2020). [CrossRef]  

12. D. D. Sheka, O. Pylypovskyi, V. O. M. Volkov, et al., “Fundamentals of curvilinear ferromagnetism: Statics and dynamics of geometrically curved wires and narrow ribbons,” Small 18(12), 1 (2022). [CrossRef]  

13. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi, eds. (Springer International Publishing, Cham, 2015), pp. 234–241.

14. Z. Gu, J. Cheng, H. Fu, et al., “Ce-net: Context encoder network for 2d medical image segmentation,” IEEE Trans. Med. Imaging 38(10), 2281–2292 (2019). [CrossRef]  

15. D. Yang, H. Zhao, K. Yu, et al., “Naunet: lightweight retinal vessel segmentation network with nested connections and efficient attention,” Multimed. Tools Appl. 82(16), 25357–25379 (2023). [CrossRef]  

16. Z. Han, M. Jian, and G.-G. Wang, “Convunext: An efficient convolution neural network for medical image segmentation,” Knowledge-Based Syst. 253, 109512 (2022). [CrossRef]  

17. L. Mou, Y. Zhao, H. Fu, et al., “Cs2-net: Deep learning segmentation of curvilinear structures in medical imaging,” Med. Image Anal. 67, 101874 (2021). [CrossRef]  

18. C. Guo, M. Szemenyei, Y. Yi, et al., “Sa-unet: Spatial attention u-net for retinal vessel segmentation,” in 2020 25th International Conference on Pattern Recognition (ICPR), (2021), pp. 1236–1242.

19. J. Li, G. Gao, L. Yang, et al., “Gdf-net: A multi-task symmetrical network for retinal vessel segmentation,” Biomed. Signal Process. Control. 81, 104426 (2023). [CrossRef]  

20. D. E. Alvarado-Carrillo, E. Ovalle-Magallanes, and O. S. Dalmau-Cedeño, “D-gaussiannet: Adaptive distorted gaussian matched filter with convolutional neural network for retinal vessel segmentation,” in Geometry and Vision, M. Nguyen, W. Q. Yan, and H. Ho, eds. (Springer International Publishing, Cham, 2021), pp. 378–392.

21. T. Shi, N. Boutry, Y. Xu, et al., “Local intensity order transformation for robust curvilinear object segmentation,” IEEE Trans. on Image Process. 31, 2557–2569 (2022). [CrossRef]  

22. Y. Peng, L. Pan, P. Luan, et al., “Curvilinear object segmentation in medical images based on odos filter and deep learning network,” Appl. Intell. (2023).

23. Z. Li, L. Ma, X. Long, et al., “Hardware-oriented algorithm for high-speed laser centerline extraction based on hessian matrix,” IEEE Trans. Instrum. Meas. 70, 1–14 (2021). [CrossRef]  

24. E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks for semantic segmentation,” IEEE Trans. Pattern Anal. Mach. Intell. 39(4), 640–651 (2017). [CrossRef]  

25. Y. Yang, H. Zhang, X. Wu, et al., “Mstfdn: Multi-scale transformer fusion dehazing network,” Appl. Intell. 53, 5951–5962 (2022). [CrossRef]  

26. F. Shamshad, S. Khan, S. W. Zamir, et al., “Transformers in medical imaging: A survey,” Med. Image Anal. 88, 102802 (2023). [CrossRef]  

27. K. Li, X. Qi, Y. Luo, et al., “Accurate retinal vessel segmentation in color fundus images via fully attention-based networks,” IEEE J. Biomed. Health Inform. 25(6), 2071–2081 (2021). [CrossRef]  

28. D. E. Alvarado-Carrillo and O. S. Dalmau-Cedeno, “Width attention based convolutional neural network for retinal vessel segmentation,” Expert Syst. with Appl. 209, 118313 (2022). [CrossRef]  

29. X. Li, Y. Jiang, M. Li, et al., “Lightweight attention convolutional neural network for retinal vessel image segmentation,” IEEE Trans. Ind. Inf. 17(3), 1958–1967 (2021). [CrossRef]  

30. Y. Li, Y. Zhang, J.-Y. Liu, et al., “Global transformer and dual local attention network via deep-shallow hierarchical feature fusion for retinal vessel segmentation,” IEEE Trans. Cybern. 53(9), 5826–5839 (2023). [CrossRef]  

31. J. Kazmaier and J. H. van Vuuren, “The power of ensemble learning in sentiment analysis,” Expert Syst. with Appl. 187, 115819 (2022). [CrossRef]  

32. M. A. Ganaie, M. Hu, A. K. Malik, et al., “Ensemble deep learning: A review,” Eng. Appl. Artif. Intell. 115, 105151 (2022). [CrossRef]  

33. M. A. Talukder, M. M. Islam, M. A. Uddin, et al., “Machine learning-based lung and colon cancer detection using deep feature extraction and ensemble learning,” Expert Syst. with Appl. 205, 117695 (2022). [CrossRef]  

34. R. Arnaout, L. Curran, Y. Zhao, et al., “An ensemble of neural networks provides expert-level prenatal detection of complex congenital heart disease,” Nat. Med. 27(5), 882–891 (2021). [CrossRef]  

35. S. Tang, C. Wang, J. Nie, et al., “Edl-covid: Ensemble deep learning for covid-19 case detection from chest x-ray images,” IEEE Trans. Ind. Inf. 17(9), 6539–6549 (2021). [CrossRef]  

36. Y. Chen, D. Li, X. Zhang, et al., “Computer aided diagnosis of thyroid nodules based on the devised small-datasets multi-view ensemble learning,” Med. Image Anal. 67, 101819 (2021). [CrossRef]  

37. D. Jia and X. Zhuang, “Learning-based algorithms for vessel tracking: A review,” Comput. Med. Imaging Graph. 89, 101840 (2021). [CrossRef]  

38. M. M. Fraz, P. Remagnino, A. Hoppe, et al., “An ensemble classification-based approach applied to retinal blood vessel segmentation,” IEEE Trans. Biomed. Eng. 59(9), 2538–2548 (2012). [CrossRef]  

39. S. Wang, Y. Yin, G. Cao, et al., “Hierarchical retinal blood vessel segmentation based on feature and ensemble learning,” Neurocomputing 149, 708–717 (2015). [CrossRef]  

40. L. Du, H. Liu, L. Zhang, et al., “Deep ensemble learning for accurate retinal vessel segmentation,” Comput. Biol. Med. 158, 106829 (2023). [CrossRef]  

41. J. Staal, M. Abramoff, M. Niemeijer, et al., “Ridge-based vessel segmentation in color images of the retina,” IEEE Trans. Med. Imaging 23(4), 501–509 (2004). [CrossRef]  

42. A. Hoover, V. Kouznetsova, and M. Goldbaum, “Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response,” IEEE Trans. Med. Imaging 19(3), 203–210 (2000). [CrossRef]  

43. J. Fu, J. Liu, H. Tian, et al., “Dual attention network for scene segmentation,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2019), pp. 3141–3149.

44. M. Mubashar, H. Ali, C. Gronlund, et al., “R2u++: a multiscale recurrent residual u-net with dense skip connections for medical image segmentation,” Neural Comput. Appl. 34(20), 17723–17739 (2022). [CrossRef]  

45. T. Xue and P. Ma, “Tc-net: transformer combined with cnn for image denoising,” Appl. Intell. (2022).

46. Z. Liu, J. Guo, K.-Y. Lam, et al., “Efficient dropout-resilient aggregation for privacy-preserving machine learning,” IEEE Trans. on Inf. Forensics Secur. 18, 1839–1854 (2023). [CrossRef]  

47. K. S. Pradhan, P. Chawla, and R. Tiwari, “Hrdel: High ranking deep ensemble learning-based lung cancer diagnosis model,” Expert Syst. with Appl. 213, 118956 (2023). [CrossRef]  

48. Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, et al., “UNet++: Redesigning skip connections to exploit multiscale features in image segmentation,” IEEE Trans. Med. Imaging 39(6), 1856–1867 (2020). [CrossRef]  
