Deep learning network with differentiable dynamic programming for retina OCT surface segmentation

Hui Xie; Weiyu Xu; Ya Xing Wang; Xiaodong Wu

doi:10.1364/BOE.492670

1. Introduction

Highly accurate surface segmentation for retina optical coherence tomography (OCT) is a clinical necessity in many diagnostic and treatment tasks of ophthalmic diseases. In retina OCT imaging, the frequent presence of weak image boundaries complicated by image artifacts often leads to undesirable boundary spikes with many automated segmentation methods [1–4]. However, experienced ophthalmologists can well delineate retinal surfaces from OCT scans in those difficult scenarios while taking advantage of their global shape information and mutual interaction, such as relative surface positions and surface distance ranges. This indicates that when insufficient image-derived information is available for defining some retinal surfaces, such surface insufficiency can be remedied by making use of surface shape and context priors in the segmentation methods [5]. Fazekas et al. recently developed a method for segmenting Bruch’s membrane in retinal OCT using anatomical priors and uncertainty quantification [6].

Many retina OCT segmentation methods have been proposed in past years. Garvin et al. first introduced the graph-based optimal surface segmentation method [7] for surface delineation in retinal OCT [1], which was further developed by incorporating various a priori knowledge reflecting anatomic and imaging information [2,3,5]. Other known OCT surface segmentation approaches include level set [4,8–11], probabilistic global shape model [12], random forest classifier [13,14], and dynamic programming [15–20]. Each of these methods has its own strength. They all share a common drawback which is their dependence on handcrafted features.

Armed with superior data representation learning capacity, deep learning methods are emerging as powerful alternatives to traditional segmentation algorithms for many medical image segmentation tasks [21,22]. Fully convolutional networks (FCNs) [23,24], Convolutional neural networks (CNNs) [25], and U-Net [26–31] have been utilized for retinal layer segmentation in OCT images. Recent works have also used the layer context and shapes from the inherent 3D structure of the retina in the deep convolution network for retinal layer segmentation [32–34]. Due to the scarcity of training data in medical imaging, it is yet nontrivial for deep learning networks to implicitly learn global structures of the target surfaces [35,36]. Thus, the retinal layer topology cannot be guaranteed with those methods, and neither the continuity nor smoothness of the retinal surfaces can be ensured [25]. To address those limitations, the graph-based method and dynamic programming were used as post-processing for the deep learning models to enforce surface monotonicity and smoothness [37,38]. In this scheme, feature learning is, in fact, disconnected from the downstream optimization; the learned features thus may not be truly appropriated for the graph model. Other approaches to facilitate capturing more global structures include atrous convolution and atrous spatial pyramid pooling proposed in DeepLab [35]. He et al. further extended the deep regression idea [25] with fully differentiable soft-argmax operations to generate surface positions followed by ReLU operations to guarantee the surface order in their fully convolutional regression network (FCRN) [29]. The hybrid 2D-3D CNN [30] using B-scan alignment was proposed to obtain continuous 3D retinal layer surfaces from OCT. The IPM optimization method [31] effectively integrates the deep learning feature learning with the IPM optimization to enforce mutual interaction between surfaces, but the IPM optimization runs on each A-scan. All these methods [29–31] achieved highly accurate segmentation of retinal surfaces from OCT. However, their performance is prone to be affected by image outliers with bad quality or artifacts with the limited size of training data [29], as they lack the capability of explicitly learning surface smoothness structure.

This study proposes to unify the powerful feature learning capability of deep learning with a constrained DDP module in a single deep neural network for end-to-end training to achieve globally optimal segmentation while explicitly enforcing surface smoothness. In the proposed segmentation framework, a U-Net with additional image gradient channels [31] is leveraged as the backbone for learning parameterized surface costs. The retinal surface inference by minimizing the total surface cost while satisfying surface smoothness constraints is realized by a DDP module for a globally optimal solution. The differentiability of the DDP module enables efficient backward propagation of gradients for end-to-end learning. To the best of our knowledge, this is the first work to apply differentiable dynamic programming for surface segmentation in medical images. Experiments on retina spectral-domain OCT data sets demonstrated improved surface segmentation accuracy.

2. Method

2.1 Problem Formulation

Let $\mathcal {I}(X, Z)$ of size $X{\times }Z$ be a given 2D B-scan of an OCT image. For each $x = 0, 1, \ldots, X-1$, the pixel subset $\{\mathcal {I}(x, z) | 0 \le z < Z\}$ forms a column parallel to the $\mathbf {z}$-axis, denoted by Col$(x)$, which corresponds to an A-scan of the OCT image. Given an integer $N > 0$, our goal is to seek $N$ retinal surfaces, each of which $S_i$ ($i = 0, 1, \ldots, N-1$) intersects every column Col$(x)$ at exactly one location $z^{(i)}_x$, that is, $\mathcal {I}(x, z^{(i)}_x) \in S_i$. To find an optimal surface $S_i$, each pixel $\mathcal {I}(x, z)$ is associated with an on-surface cost $c_i(x,z)$, which is related to the likelihood of $\mathcal {I}(x,z)$ on $S_i$. Each retinal surface expresses a certain degree of smoothness, which specifies the maximum allowed change in the $\mathbf {z}$-dimension of a feasible surface along each unit distance change in the $\mathbf {x}$-dimension. More specifically, with given smoothness parameters $\Delta ^{(i)}_x > 0$ for $S_i$, if $\mathcal {I}(x, z'), \mathcal {I}(x-1, z'') \in S_i$, then $|z' - z''| \leq \Delta ^{(i)}_x$. The optimization objective of our surface segmentation problem is to maximize the total on-surface cost $\mathbb {E}(\mathcal {S})$ of all pixels on the $N$ sought surfaces $\mathcal {S} = \{S_0, S_1, \ldots, S_{N-1}\}$, with

(1)$$\begin{aligned} \mathcal{S}^* & =\operatorname*{argmax}_{\mathcal{S} = \{S_0, S_1,\ldots, S_{N-1}\}} \mathbb{E}(\mathcal{S}) = \sum_{i=0}^{N-1}\sum_{\mathcal{I}\;(x,z) \in S_i} c_i(x,z)\\ & \text{s.t.}\quad |z_x^{(i)} - z_{x+1}^{(i)} | \le \Delta_x^{(i)}, \quad \text{for}\; i = 0, 1,\ldots, N-1; x=0, 1,\ldots, X-2. \end{aligned}$$

2.2 Network architecture

The proposed surface segmentation network is based on a U-Net architecture [39], as illustrated in Fig. 1, which consists of seven convolution layers. This U-Net acts as a feature-extracting module for the surface segmentation head. We started with 24 feature maps in the first convolution layer. In each down-sampling layer, a conv2d module followed by a $2 \times 2$ max-pooling doubles the feature maps, and then a cascade of three same conv2d modules with a residual connection is used. The up-sampling layers use a symmetric structure as the down-sampling layers but with bi-linear up-sample modules.

Fig. 1. The U-Net-based network architecture with additional seven gradient channels as input. The number of channels for each layer is indicated with $c$. The DDP module solves the optimization problem for segmentation and outputs optimal smooth retinal surfaces $\mathcal {S}$ for the $L_1$ loss.

Download Full Size | PDF

As in the IPM segmentation method [31], we use image gradient information as additional input channels to enrich image input information while reducing the learning burden of the network, as image gradients are prominent features to discriminate image boundaries. Our proposed network used seven gradient channels including gradient scales along the orientations of $0^{\circ }$ ($\mathbf {x}$-dimension), $45^{\circ }$, $90^{\circ }$, and $135^{\circ }$, normalized gradient directions in the $0^{\circ }$-$90^{\circ }$ and $45^{\circ }$-$135^{\circ }$ coordinate systems, and the gradient magnitude in the $0^{\circ }$-$90^{\circ }$ coordinate system, all of which are directly computed from raw images. All these gradient channels with the raw image are then concatenated into eight channels as the input to the U-Net framework.

Following the U-Net is a simple segmentation head, which consists of a $1 \times 1$ conv2d followed by a conv2d module. The segmentation head outputs a $N \times X \times Z$ logit. Note that $N$ is the number of target surfaces and the size of the input B-scan is $X$x$Z$. A softmax over the $\mathbf {z}$-dimension of the $N \times X \times Z$ logits gets the predicted probabilities $p^{(i)}_{x,z} \in [0,1]$, where $i \in [0,N)$, $z \in [0,Z)$, and $x \in [0,X)$. Each $p^{(i)}_{x,z}$ indicates the probability of the pixel $\mathcal {I}(x,z)$ on Surface $S_i$. The initial estimate $\mu _x^{(i)}$ of the surface location of $S_i$ on Column Col$(x)$ can then be computed [29,31], with $\mu _x^{(i)} = \sum _{z=0}^{z=Z-1} z\cdot p^{(i)}_{x,z}$. The on-surface cost $c_i(x, z)$ for each pixel $\mathcal {I}(x,z)$ on Col$(x)$ is parameterized, as $c_i(x, z) = -(z - \mu _x^{(i)})^2$.

2.3 DDP module with smoothness constraints

The DDP module solves the optimization problem in Eqn. (1). The smoothness constraints $\Delta ^{(i)}_x$ between two adjacent columns Col$(x)$ and Col$(x-1)$ for the target surface $S_i$ can be learned from the training data. In our experiments, $\Delta ^{(i)}_x$ is simply set to be $\alpha > 0$ plus the maximum surface position change of $S_i$ between Col$(x)$ and Col$(x+1)$ in the whole training set, where $\alpha$ is a security buffer as the maximum surface position change in the training set does not represent the surface position change in the test set. Let $\tau ^{(i)}_{x, z}$ denote the maximum total on-surface cost for $S_i$ starting from Col$(0)$ while ending at the pixel $\mathcal {I}(x,z)$. Based on the dynamic programming technique, we have

(2)$$\tau^{(i)}_{x, z} = \begin{cases} c_i(0, z) &\text{if}\;x=0 \\ c_i(x, z) +\max_{z-\Delta^{(i)}_x \le z' \le z+\Delta^{(i)}_x}\{\tau^{(i)}_{x-1, z'}\} &\text{else}\;x \in [1,X),\end{cases}$$

where $z \in [0, Z)$, and $\Delta ^{(i)}_x > 0$. The maximum of $\{\tau ^{(i)}_{X-1, 0}, \tau ^{(i)}_{X-1, 1}, \ldots, \tau ^{(i)}_{X-1, Z-1}\}$ gives the total on-surface cost of the optimal surface $S^*_i$.

As the $\max$ operator above in Eq. (2) is not differentiable, we use a differentiable operator of LogSumExp [40], as follows, to approximate the $\max$ operator:

(3)$$\phi^{(i)}_{x-1,z}(\tau) = \frac{1}{t} \log \sum_{z' = z-\Delta^{(i)}_x}^{z+\Delta^{(i)}_x} \exp(t\cdot \tau^{(i)}_{x-1, z'}), \quad t > 0.$$

This $\phi ^{(i)}_{x-1,z}(\tau )$ has an elegant property that it is bounded in a narrow band of the real value $m = \max _{z-\Delta ^{(i)}_x \le z' \le z+\Delta ^{(i)}_x}\{\tau ^{(i)}_{x-1, z'}\}$, as follows:

(4)$$m \le \phi^{(i)}_{x-1,z}(\tau) \le m + \frac{\log (2\Delta^{(i)}_x +1)}{t}$$

In practice, we can choose a proper $t$ such that $\frac {\log (2\Delta x +1)}{t} \le \epsilon$, to make the approximation error of $\phi ^{(i)}_{x-1,z}(\tau )$ to be less than arbitrary small $\epsilon >0$. Using the approximation of $\phi ^{(i)}_{x-1,z}(\tau )$, the DP recursive formula Eqn. (2) can be written, as follows.

(5)$$\tau^{(i)}_{x, z} = \begin{cases} c_i(0, z) & \text{if x=0} \\ c_i(x, z) +\frac{1}{t} \log \sum_{z' = z-\Delta^{(i)}_x}^{z+\Delta^{(i)}_x} \exp(t\cdot \tau^{(i)}_{x-1, z'} ) & \text{else}\;x \in [1,X), \end{cases}$$

which is differentiable everywhere. The difference between our method and Mensch et al.’s DDP [40] is that we consider a constrained DDP model.

In the backtracking stage of DP to obtain optimal $S_i^*$, we need to know the surface location $z^{(i)}_x$ of $S_i^*$ on each column Col$(x)$. Based on Danskin’s Theorem [41], the gradient of $\phi ^{(i)}_{x-1,z}(\tau )$ attains $\textrm{arg min} _{z-\Delta ^{(i)}_x\le z' \le z+\Delta ^{(i)}_x}\{\tau ^{(i)}_{x-1, z'}\}$, as follows:

(6)$$\begin{aligned} \mathop{\textrm{arg min}}\limits_{z-\Delta^{(i)}_x\le z' \le z+\Delta^{(i)}_x}\{\tau^{(i)}_{x-1, z'}\}& = \frac{\partial \phi^{(i)}_{x-1,z}(\tau)}{\partial \tau} \\ &= \frac{ \exp(t\cdot \tau^{(i)}_{x-1, z'})}{\sum_{z^{\prime\prime} = z-\Delta^{(i)}_x}^{z+\Delta^{(i)}_x} \exp(t\cdot \tau^{(i)}_{x-1, z^{\prime\prime}})}, \quad \text{for}\; z' \in [z-\Delta^{(i)}_x, z+\Delta^{(i)}_x]. \end{aligned}$$

Here Eqn. (6) indicates that the softmax function is a smooth approximation of the argmax function. Thus, during backtracking, the optimal surface location $z^{(i)}_{x-1}$ of $S_i^*$ on Col$(x-1)$ can be computed, as follows.

(7)$$\begin{aligned} z^{(i)}_{x-1} = \frac{\sum_{z' = z-\Delta^{(i)}_x}^{z+\Delta^{(i)}_x} (z' \cdot \exp(t\cdot \tau^{(i)}_{x-1, z'}))}{\sum_{z' = z-\Delta^{(i)}_x}^{z+\Delta^{(i)}_x} \exp(t\cdot \tau^{(i)}_{x-1, z'})}. \end{aligned}$$

In the inference stage, we thus obtain the set of optimal surfaces $\mathcal {S}^* = \{S_0^*, S_1^*, \ldots, S_{N-1}^*\}$. While during the training stage, $\mathcal {S}^*$ is used to define the loss of the network for backward propagation.

2.4 Loss functions

We use multiple-surface cross-entropy loss $L_{mCE}$ and the L1 loss $L_1$ to train our proposed network. Let $g^{(i)}_{x,z} \in \{0,1\}$ denote the ground truth probability of pixel $\mathcal {I}(x, z)$ on $S_i$. Then, $L_{mCE}$ can be computed, with

(8)$$L_{mCE} = \frac{-\sum_{i=0}^{N-1} \sum_{z=0}^{Z-1}\sum_{x=0}^{X-1}\{g^{(i)}_{x,z}\ln(p^{(i)}_{x,z})+ (1-g^{(i)}_{x,z})\ln(1-p^{(i)}_{x,z})\}}{NZX}.$$

Let $s^{(i)}_{x}$ be the ground truth surface location of $S_i$ on Column Col$(x)$. The L1 loss can be computed, as follows

(9)$$L_{1} = \frac{1}{NX}\sum_{i=0}^{N-1}\sum_{x=0}^{X-1} \|z^{(i)}_{x} -s^{(i)}_{x} \|.$$

The total loss $L$ for this surface segmentation network is $L = L_{mCE} + L_{1}$. To improve the training efficiency, we first pre-train the proposed segmentation network without the DDP module and then add it for further fine-tuning of the network. In the pre-training, $z^{(i)}_{x} = \mu ^{(i)}_{x}$ in Eqn. (9) is used.

3. Experiments

The proposed method was validated on two public data sets – Duke AMD (age-related macular degeneration) data set [42] and JHU MS (multiple sclerosis) data set [43]. PyTorch version 1.81 on Ubuntu Linux 20.04 was used for the implementation of the proposed method. Ablation experiments were also conducted to evaluate the contribution of the DDP module in the proposed network, in which we investigated the performance of the proposed segmentation model by removing the DDP module to understand its contribution to the overall model. Experiments used 100 epochs for pre-training without the DDP module and then added the DDP module for further training.

3.1 Duke AMD OCT data

The Duke AMD data set [42] consists of 384 SD-OCT volumes (115 normal and 269 AMD subjects). Each original OCT volume is of size $100 \times 512 \times 1000$. The manual tracings are only available around the fovea. So each volume was cropped to form $51$ B-scans of size $512 \times 361$ around the fovea as our input data. The $\mathbf {z}$-axial resolution of A-scans is 3.24 $\mu m$/pixel. The manual tracings on each scan include three surfaces: Inner Limiting Membrane (ILM), inner Retinal Pigment Epithelium-Drusen Complex (RPEDC), and the Outer aspect of the Bruch Membrane (OBM) from top to bottom. In our experiments, we randomly divided, based on patient IDs, all 384 samples into training (187 AMD + 79 normal), validation (41 AMD + 18 normal), and test (41 AMD + 18 normal) sets. In this experiment, we used 128 channels in the segmentation head, a batch size of 8, and an Adam optimizer with an initial learning rate of 0.1 without weight decay. In the fine tune stage, training with the DDP module needed 2.9 seconds per batch.

The mean absolute surface distance (MASD) errors for the proposed and the compared methods as well as the ablation study are shown in Table 1. Our DDP method demonstrated improved segmentation accuracy for all MASD measurements over the compared methods. The ablation experiment and $p\textrm {-values}$ analysis showed that the DDP module significantly improved the segmentation accuracy for all three surfaces in the AMD subjects; while for the normal subjects, the DDP module significantly improved segmentation performance on both ILM and inner RPEDC and achieved comparable results on OBM. For the Hausdorff distance (HD), our method outperformed both FCRN [29] and Hybrid2D3D [30] (Table 2), achieving much smaller HD errors, especially for two challenging surfaces of inner RPEDC and OBM. The DDP module significantly reduced HD errors for both ILM and inner RPEDC. Sample segmentation of three AMD cases is illustrated in Fig. 2, and an extreme disease case is illustrated in Fig. 3.

Fig. 2. Segmentation samples of three AMD cases in the Duke AMD test set. The red arrows indicate segmentation errors. Our proposed DDP method alleviated the errors in weak boundary regions. The orange arrows show that our DDP results are also not perfect, which shows an over-smoothness effect on the segmented surfaces.

Download Full Size | PDF

Fig. 3. Segmentation sample of an extreme AMD disease case with drusen and collapsed ILM in the Duke AMD test set. The red arrows indicate segmentation errors. Our proposed DDP method smooths the drusen region.

Download Full Size | PDF

Table 1. MASD $\pm$ standard deviation evaluated on Duke AMD test set and the ablation study.

View Table | View all tables in this article

Table 2. The HD errors evaluated on Duke AMD test set. Bold fonts indicate the best in its column.

View Table | View all tables in this article

3.2 JHU MS OCT Data

The public JHU MS dataset [43] includes 35 human retina scans acquired on a Heidelberg Spectralis SD-OCT system, of which 14 are healthy controls (HC) and 21 have a diagnosis of multiple sclerosis (MS). Each volume has 9 surfaces, and 49 B-scans each with the size of 128$\times$1024 after cropping out the center part by a Matlab script [43]. The $\mathbf {z}$-axial resolution of A-scans is 3.87 $\mu$m/pixel.

In this experiment, the proposed segmentation model was trained on the last six HC and last nine MS subjects according to the order of their patient IDs and tested on the other 20 subjects, which is the same experimental configuration as in He et al.’s FCRN [29]. The Gaussian, salt & pepper noise, and random flipping on the $\mathbf {x}$-direction were used for data augmentation. The experiment used a segmentation head of 64 channels, batch size 4, an Adam optimizer with an initial learning rate of 0.01 without weight decay, and a reducing learning rate on a plateau scheduler with patience 20 and factor 0.5. In the fine tune training stage, DDP needed 13 seconds per batch.

The MASD errors for the FCRN [29], IPM [31], and proposed methods are shown in Table 3, which also shows the ablation study results. The visual examples are illustrated in Fig. 4. The proposed method achieved an overall MASD error of $2.75\pm 0.94{\mu }$m averaged over all nine surfaces, outperforming both compared methods. For individual surfaces, our method achieved higher segmentation accuracy on 8 surfaces compared to FCRN and on 6 surfaces compared to IPM. The DDP module significantly improved the segmentation accuracy with respect to the MASD metric for all nine surfaces. As to the HD metric, the DDP module achieved lower HD errors for all segmented surfaces and lower 95% HD errors on six out of nine surfaces (Table 4).

Fig. 4. Segmentation samples of an MS case in the JHU MS test set using different methods. The red arrows indicate segmentation errors. Our proposed DDP method alleviated the errors in weak boundary regions, even though it is still not perfect.

Download Full Size | PDF

Table 3. MASD $\pm$ standard deviation evaluated on JHU MS test set and the ablation study.

View Table | View all tables in this article

Table 4. The ablation study on the HD errors evaluated on the JHU MS test set.

View Table | View all tables in this article

3.3 DDP module on noise images

Speckle noise, caused by multiple forward and backward scattering of light waves, is the main quality degrading factor in OCT images [45]. The presence of speckle noise often obscures subtle but important morphological details, thus it is detrimental to clinical diagnosis.

With the presence of speckle noise, our DDP method achieves better robustness overcoming the segmentation quality degradation. To test the effect of the DDP module, we added extra speckle noise on the test set of JHU MS data set [43], and then compared their segmentation accuracy of the two methods of Ours and OursWithoutDDP, as shown in Table 5. With speckle noise which blurs subtle layer boundaries, both methods got decreased segmentation accuracy, but with the proposed DDP module our method achieved much better segmentation accuracy (Fig. 5).

Fig. 5. Segmentation samples of an MS case in the JHU MS test set with extra speckle noise using different methods. Our proposed DDP method explicitly brings smoothness in serious noise images.

Download Full Size | PDF

Table 5. MASD (mean absolute surface distance, $\mu m$) $\pm$ standard deviation errors evaluated on the JHU MS test set with extra added speckle noise

View Table | View all tables in this article

We conducted a second experiment which showed that image contrast adjustment has no explicit effect on segmentation accuracy. Image contrast is defined as the maximum intensity difference between pixels in the image. The adjustment of image contrast is to apply a consistent scale-down factor on the intensity of each pixel. Our network uses intensity normalization with intensity mean and standard deviation before feeding the image into the network, which enforced the image contrast into the range of [-3, 3] in a normalized value. The image contrast adjustment is offset by the intensity normalization process. Therefore, image contrast adjusting (except image contrast = 0) does not affect the segmentation accuracy.

4. Discussion

Recent deep learning methods [29–31,46] for retinal OCT surface segmentation significantly improved segmentation accuracy to the sub-voxel error level. The proposed method further improves the state-of-the-art performance by integrating a DDP module into the deep learning framework. Especially, for the noisy retinal OCT scans, the proposed DDP method shows much higher segmentation accuracy compared to the baseline U-Net model (OursWithDDP), as shown in Table 5. Visual improvement on segmentation samples is demonstrated in Fig. 5.

Our smoothness constraint parameters support different smoothness constraints at different locations. The smoothness constraint parameter $\Delta ^{(i)}_x$ indicates an element value at Col($x$) for the $i$-th surface, and all smoothness constraint parameters form a constraint matrix of size $N \times (X-1)$, where $N$ is the number of surfaces, and $X$ is the number of columns in a B-scan. This smoothness constraint matrix is flexible to handle the different smoothness for different surfaces at different column locations. For example, the ILM and OBM surfaces are smoother than the InnerRPEDC surface, and the middle region of the InnerRPEDC is bumpier than the side regions due to the presence of drusen in the AMD disease cases. However, in our experiments on both datasets, we did not utilize the image-specific constraints for surface smoothness. In the future, we plan to develop a deep learning module to learn the surface smoothness matrix to further improve segmentation accuracy.

The limitation of this DDP method is its computation overhead illustrated in Table 6. The training time of our DDP model was increased by about 3-4 times compared to the model without using DDP. The reason is that the Bellman iterations in the DDP module can not allow parallel along the $x$-dimension, which is essentially sequential with one column after another in B-scans.

Table 6. Computation overhead in FPS (frame per second) for different data sets

View Table | View all tables in this article

Even though the DDP method is slower than the normal U-Net method (OursWithoutDDP), our proposed DDP method is much more efficient in both model training and reference than the IPM method [31]. The IPM method was designed to enforce the surface ordering and surface distance constraints for multi-surface segmentation without enforcing surface smoothness, while the proposed DDP method can explicitly enforce surface smoothness over the whole B-scans. The IPM method is much more computationally intensive than the DDP method as it needs to perform expensive matrix inverse operations. It is not feasible to train an IPM segmentation model using a large-size dataset like the Duke AMD dataset, which consists of $384\times 51$ B-scans. We thus did not perform the comparison to the IPM method on the Duke AMD dataset.

5. Conclusion

In this paper, a novel deep learning framework for OCT surface segmentation is proposed, which unifies a constrained DDP optimization with a deep learning network for end-to-end learning. It effectively integrates the feedback from the downstream model optimization for segmentation into the forefront feature learning by the deep learning network, explicitly enforcing smoothness for all segmented surfaces.

The proposed method was validated in two public OCT data sets and outperformed the compared state-of-the-art methods. The further improvement includes using deep learning to learn surface smoothness constraints for the DDP module. This proposed method has the potential to be adapted for other structured surface segmentation problems in medical imaging.

Funding

National Institute of Mental Health (1U54HL165442); National Science Foundation (CCF-1733742, ECCS-2000425, ECCS-2133205).

Acknowledgments

This research was supported, in part, by the NSF grants CCF-1733742, ECCS-2000425 and ECCS-2133205, and the NIH grant 1U54HL165442.

Disclosures

The authors declare no conflicts of interest.

Data availability

The proposed method was validated on two public data sets – Duke AMD (age-related macular degeneration) data set [42] and JHU MS (multiple sclerosis) data set [43].

References

1. M. K. Garvin, M. D. Abràmoff, R. Kardon, S. R. Russell, X. Wu, and M. Sonka, “Intraretinal layer segmentation of macular optical coherence tomography images using optimal 3-D graph search,” IEEE Trans. Med. Imaging 27(10), 1495–1505 (2008). [CrossRef]

2. A. Shah, M. D. Abámoff, and X. Wu, “Optimal surface segmentation with convex priors in irregularly sampled space,” Med. Image Anal. 54, 63–75 (2019). [CrossRef]

3. P. A. Dufour, L. Ceklic, H. Abdillahi, S. Schroder, S. De Dzanet, U. Wolf-Schnurrbusch, and J. Kowal, “Graph-based multi-surface segmentation of OCT data using trained hard and soft constraints,” IEEE Trans. Med. Imaging 32(3), 531–543 (2013). [CrossRef]

4. A. Carass, A. Lang, M. Hauser, P. A. Calabresi, H. S. Ying, and J. L. Prince, “Multiple-object geometric deformable model for segmentation of macular OCT,” Biomed. Opt. Express 5(4), 1062–1074 (2014). [CrossRef]

5. Q. Song, J. Bai, M. K. Garvin, M. Sonka, J. M. Buatti, and X. Wu, “Optimal multiple surface segmentation with shape and context priors,” IEEE Trans. Med. Imaging 32(2), 376–386 (2013). [CrossRef]

6. B. Fazekas, D. Lachinov, G. Aresta, J. Mai, U. Schmidt-Erfurth, and H. Bogunovic, “Segmentation of bruch’s membrane in retinal oct with amd using anatomical priors and uncertainty quantification,” IEEE J. Biomed. Health Inform. 27(1), 41–52 (2023). [CrossRef]

7. K. Li, X. Wu, D. Z. Chen, and M. Sonka, “Optimal surface segmentation in volumetric images-a graph-theoretic approach,” IEEE Trans. Pattern Anal. Mach. Intell. 28(1), 119–134 (2006). [CrossRef]

8. K. Gawlik, F. Hausser, F. Paul, A. U. Brandt, and E. M. Kadas, “Active contour method for ILM segmentation in ONH volume scans in retinal OCT,” Biomed. Opt. Express 9(12), 6497–6518 (2018). [CrossRef]

9. Y. Liu, A. Carass, Y. He, B. J. Antony, A. Filippatou, S. Saidha, S. D. Solomon, P. A. Calabresi, and J. L. Prince, “Layer boundary evolution method for macular OCT layer segmentation,” Biomed. Opt. Express 10(3), 1064–1080 (2019). [CrossRef]

10. J. Novosel, G. Thepass, H. G. Lemij, J. F. de Boer, K. A. Vermeer, and L. J. van Vliet, “Loosely coupled level sets for simultaneous 3D retinal layer segmentation in optical coherence tomography,” Med. Image Anal. 26(1), 146–158 (2015). [CrossRef]

11. J. Novosel, K. A. Vermeer, J. H. De Jong, Z. Wang, and L. J. Van Vliet, “Joint segmentation of retinal layers and focal lesions in 3-D OCT data of topologically disrupted retinas,” IEEE Trans. Med. Imaging 36(6), 1276–1286 (2017). [CrossRef]

12. F. Rathke, S. Schmidt, and C. Schnörr, “Probabilistic intra-retinal layer segmentation in 3-D OCT images using global shape regularization,” Med. Image Anal. 18(5), 781–794 (2014). [CrossRef]

13. A. Lang, A. Carass, M. Hauser, E. S. Sotirchos, P. A. Calabresi, H. S. Ying, and J. L. Prince, “Retinal layer segmentation of macular OCT images using boundary classification,” Biomed. Opt. Express 4(7), 1133–1152 (2013). [CrossRef]

14. D. Xiang, G. Chen, F. Shi, W. Zhu, Q. Liu, S. Yuan, and X. Chen, “Automatic retinal layer segmentation of OCT images with central serous retinopathy,” IEEE J. Biomed. Health Inform. 23(1), 283–295 (2019). [CrossRef]

15. S. J. Chiu, X. T. Li, P. Nicholas, C. A. Toth, J. A. Izatt, and S. Farsiu, “Automatic segmentation of seven retinal layers in SDOCT images congruent with expert manual segmentation,” Opt. Express 18(18), 19413–19428 (2010). [CrossRef]

16. Q. Yang, C. A. Reisman, Z. Wang, Y. Fukuma, M. Hangai, N. Yoshimura, A. Tomidokoro, M. Araie, A. S. Raza, D. C. Hood, and K. Chan, “Automated layer segmentation of macular OCT images using dual-scale gradient information,” Opt. Express 18(20), 21293–21307 (2010). [CrossRef]

17. B. Keller, D. Cunefare, D. S. Grewal, T. H. Mahmoud, J. A. Izatt, and S. Farsiu, “Length-adaptive graph search for automatic segmentation of pathological features in optical coherence tomography images,” J. Biomed. Opt. 21(7), 076015 (2016). [CrossRef]

18. F. Rathke, M. Desana, and C. Schnörr, “Locally adaptive probabilistic models for global segmentation of pathological oct scans,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, (Springer, 2017), pp. 177–184.

19. P. P. Srinivasan, S. J. Heflin, J. A. Izatt, V. Y. Arshavsky, and S. Farsiu, “Automatic segmentation of up to ten layer boundaries in sd-oct images of the mouse retina with and without missing layers due to pathology,” Biomed. Opt. Express 5(2), 348–365 (2014). [CrossRef]

20. J. Tian, B. Varga, G. M. Somfai, W.-H. Lee, W. E. Smiddy, and D. Cabrera DeBuc, “Real-time automatic segmentation of optical coherence tomography volume data of the macular region,” PLoS One 10(8), e0133908 (2015). [CrossRef]

21. G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. Van Der Laak, B. Van Ginneken, and C. I. Sánchez, “A survey on deep learning in medical image analysis,” Med. Image Anal. 42, 60–88 (2017). [CrossRef]

22. D. Shen, G. Wu, and H.-I. Suk, “Deep learning in medical image analysis,” Annu. Rev. Biomed. Eng. 19(1), 221–248 (2017). [CrossRef]

23. T. Schlegl, S. M. Waldstein, H. Bogunovic, F. Endstraßer, A. Sadeghipour, A.-M. Philip, D. Podkowinski, B. S. Gerendas, G. Langs, and U. Schmidt-Erfurth, “Fully automated detection and quantification of macular fluid in OCT using deep learning,” Ophthalmology 125(4), 549–558 (2018). [CrossRef]

24. S. Masood, R. Fang, P. Li, H. Li, B. Sheng, A. Mathavan, X. Wang, P. Yang, Q. Wu, J. Qin, and W. Jia, “Automatic choroid layer segmentation from optical coherence tomography images using deep learning,” Sci. Rep. 9(1), 1–18 (2019). [CrossRef]

25. A. Shah, L. Zhou, M. D. Abrámoff, and X. Wu, “Multiple surface segmentation using convolution neural nets: application to retinal layer segmentation in OCT images,” Biomed. Opt. Express 9(9), 4509–4526 (2018). [CrossRef]

26. A. G. Roy, S. Conjeti, S. P. K. Karri, D. Sheet, A. Katouzian, C. Wachinger, and N. Navab, “ReLayNet: retinal layer and fluid segmentation of macular optical coherence tomography using fully convolutional networks,” Biomed. Opt. Express 8(8), 3627–3642 (2017). [CrossRef]

27. V. A. Dos Santos, L. Schmetterer, H. Stegmann, M. Pfister, A. Messner, G. Schmidinger, G. Garhofer, and R. M. Werkmeister, “CorneaNet: fast segmentation of cornea OCT scans of healthy and keratoconic eyes using deep learning,” Biomed. Opt. Express 10(2), 622–641 (2019). [CrossRef]

28. C. S. Lee, A. J. Tyring, N. P. Deruyter, Y. Wu, A. Rokem, and A. Y. Lee, “Deep-learning based, automated segmentation of macular edema in optical coherence tomography,” Biomed. Opt. Express 8(7), 3440–3448 (2017). [CrossRef]

29. Y. He, A. Carass, Y. Liu, B. M. Jedynak, S. D. Solomon, S. Saidha, P. A. Calabresi, and J. L. Prince, “Structured layer surface segmentation for retina OCT using fully convolutional regression networks,” Med. Image Anal. 68, 101856 (2021). [CrossRef]

30. H. Liu, D. Wei, D. Lu, Y. Li, K. Ma, L. Wang, and Y. Zheng, “Simultaneous alignment and surface regression using hybrid 2D-3D networks for 3D coherent layer segmentation of retina OCT images,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, (Springer, 2021), pp. 108–118.

31. H. Xie, Z. Pan, L. Zhou, F. A. Zaman, D. Z. Chen, J. B. Jonas, W. Xu, Y. X. Wang, and X. Wu, “Globally optimal OCT surface segmentation using a constrained IPM optimization,” Opt. Express 30(2), 2453–2471 (2022). [CrossRef]

32. Y. He, A. Carass, Y. Liu, P. A. Calabresi, S. Saidha, and J. L. Prince, “Longitudinal deep network for consistent oct layer segmentation,” Biomed. Opt. Express 14(5), 1874–1893 (2023). [CrossRef]

33. S. Mukherjee, T. D. Silva, P. Grisso, H. Wiley, D. L. K. Tiarnan, A. T. Thavikulwat, E. Chew, and C. Cukras, “Retinal layer segmentation in optical coherence tomography (oct) using a 3d deep-convolutional regression network for patients with age-related macular degeneration,” Biomed. Opt. Express 13(6), 3195–3210 (2022). [CrossRef]

34. S. Mukherjee, T. De Silva, G. Jayakar, P. Grisso, H. Wiley, T. Keenan, A. Thavikulwat, E. Chew, and C. Cukras, “Retinal layer segmentation for age-related macular degeneration patients with 3D-UNet,” in Medical Imaging 2022: Computer-Aided Diagnosis, vol. 12033 of Society of Photo-Optical Instrumentation Engineers (SPIE) Conference SeriesK. Drukker, K. M. Iftekharuddin, H. Lu, M. A. Mazurowski, C. Muramatsu, and R. K. Samala, eds. (2022), p. 120333J.

35. L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs,” IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2018). [CrossRef]

36. J. Cai, L. Lu, Z. Zhang, F. Xing, L. Yang, and Q. Yin, “Pancreas segmentation in mri using graph-based decision fusion on convolutional neural networks,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016, S. Ourselin, L. Joskowicz, M. R. Sabuncu, G. Unal, and W. Wells, eds. (Springer International Publishing, Cham, 2016), pp. 442–450.

37. L. Fang, D. Cunefare, C. Wang, R. H. Guymer, S. Li, and S. Farsiu, “Automatic segmentation of nine retinal layer boundaries in OCT images of non-exudative AMD patients using deep learning and graph search,” Biomed. Opt. Express 8(5), 2732–2744 (2017). [CrossRef]

38. J. Kugelman, D. Alonso-Caneiro, S. A. Read, S. J. Vincent, and M. J. Collins, “Automatic segmentation of OCT retinal boundaries using recurrent neural networks and graph search,” Biomed. Opt. Express 9(11), 5759–5777 (2018). [CrossRef]

39. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention, (Springer, 2015), pp. 234–241.

40. A. Mensch and M. Blondel, “Differentiable dynamic programming for structured prediction and attention,” in International Conference on Machine Learning, (PMLR, 2018), pp. 3462–3471.

41. D. P. Bertsekas, “Control of uncertain systems with a set-membership description of the uncertainty,” Ph.D. thesis, Massachusetts Institute of Technology (1971).

42. S. Farsiu, S. J. Chiu, R. V. O’Connell, et al., “Quantitative classification of eyes with and without intermediate age-related macular degeneration using optical coherence tomography,” Ophthalmology 121(1), 162–172 (2014). [CrossRef]

43. Y. He, A. Carass, S. D. Solomon, S. Saidha, P. A. Calabresi, and J. L. Prince, “Retinal layer parcellation of optical coherence tomography images: Data resource for multiple sclerosis and healthy controls,” Data brief 22, 601–604 (2019). [CrossRef]

44. A. Shah, M. D. Abramoff, and X. Wu, “Simultaneous multiple surface segmentation using deep learning,” in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, (Springer, 2017), pp. 3–11.

45. Y. Ma, X. Chen, W. Zhu, X. Cheng, D. Xiang, and F. Shi, “Speckle noise reduction in optical coherence tomography images based on edge-sensitive cgan,” Biomed. Opt. Express 9(11), 5129–5146 (2018). [CrossRef]

46. Y. He, A. Carass, Y. Liu, B. M. Jedynak, S. D. Solomon, S. Saidha, P. A. Calabresi, and J. L. Prince, “Fully convolutional boundary regression for retina OCT segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, (Springer, 2019), pp. 120–128.

		Normal Group				AMD Group
Methods	TotalMean	GroupMean	ILM	InnerRPEDC	OBM	GroupMean	ILM	InnerRPEDC	OBM
Method comparison: MASD (mean absolute surface distance) $\pm$ standard deviation ( $μ m$ )^a
G-OSC [5]	–	–	3.85 $\pm$ 0.16	4.56 $\pm$ 0.35	–	–	4.43 $\pm$ 0.71	9.33 $\pm$ 1.74	–
CNN-S [44]	–	–	2.88 $\pm$ 0.22	4.14 $\pm$ 0.32	–	–	3.43 $\pm$ 0.35	5.92 $\pm$ 0.84	–
CNN-S-2 [25]	–	4.07 $\pm$ 0.55	3.36 $\pm$ 0.23	3.84 $\pm$ 0.58	4.97 $\pm$ 1.01	5.20 $\pm$ 1.58	3.71 $\pm$ 0.77	6.07 $\pm$ 1.84	5.58 $\pm$ 1.80
FCRN [30]^b	2.78 $\pm$ 3.31	–	1.24 $\pm$ 0.51	2.06 $\pm$ 1.51	2.28 $\pm$ 0.36	–	1.73 $\pm$ 2.50	3.09 $\pm$ 2.09	4.94 $\pm$ 5.35
Hybrid2D3D [30]	2.71 $\pm$ 2.25	–	1.26 $\pm$ 0.47	2.10 $\pm$ 1.36	2.40 $\pm$ 0.39	–	1.76 $\pm$ 2.39	3.04 $\pm$ 1.79	4.43 $\pm$ 2.68
Ours	1.88 $\pm$ 1.96	1.41 $\pm$ 0.89	0.55 $\pm$ 0.41	1.57 $\pm$ 0.90	2.10 $\pm$ 0.45	2.08 $\pm$ 2.25	1.38 $\pm$ 3.17	1.80 $\pm$ 1.10	3.06 $\pm$ 1.61
Ablation Experiment: MASD $\pm$ standard deviation ( $μ m$ )
OursWithoutDDP	1.90 $\pm$ 2.10	1.43 $\pm$ 0.93	0.56 $\pm$ 0.24	1.64 $\pm$ 1.06	2.10 $\pm$ 0.46	2.10 $\pm$ 2.42	1.47 $\pm$ 3.65	1.83 $\pm$ 1.04	3.00 $\pm$ 1.45
Ablation $p -values$ analysis for the MASD errors
Ours vs. OursWithoutDDP	1.43e-08	1.06e-13	3.23e-02	3.82e-24	9.78e-01	6.81e-04	9.23e-26	1.00e-03	2.24e-23

	Average 95% HD			Average HD
Methods	ILM	InnerRPEDC	OBM	ILM	InnerRPEDC	OBM
FCRN $(μ m)$ [30]^a	4.88	9.73	11.32	7.44	16.19	14.73
Hydrid2D3D $(μ m)$ [30]^a	4.78	8.99	9.91	7.41	14.78	13.43
OursWithoutDDP $(μ m)$	4.80	6.59	6.87	6.00	11.13	9.18
Ours $(μ m)$	4.74	6.53	6.99	5.91	10.98	9.25

Methods	Overall	ILM	RNFL-GCL	IPL-INL	INL-OPL	OPL-ONL	ELM	IS-OS	OS-RPE	BM
Method comparison: MASD (mean absolute surface distance)^a $\pm$ standard deviation ( $μ m$ ) errors
FCRN [29]	2.83 $\pm$ 1.48	2.41 $\pm$ 0.81	2.96 $\pm$ 1.70	2.87 $\pm$ 1.69	3.19 $\pm$ 1.49	2.72 $\pm$ 1.70	2.65 $\pm$ 1.14	2.01 $\pm$ 0.88	3.55 $\pm$ 1.73	3.10 $\pm$ 2.21
IPM [31]	2.78 $\pm$ 0.85	2.32 $\pm$ 0.27	3.07 $\pm$ 0.68	2.86 $\pm$ 0.33	3.24 $\pm$ 0.60	2.73 $\pm$ 0.57	2.63 $\pm$ 0.51	1.97 $\pm$ 0.57	3.35 $\pm$ 0.83	2.88 $\pm$ 1.63
Ours	2.75 $\pm$ 0.94	2.35 $\pm$ 0.38	2.89 $\pm$ 0.69	2.83 $\pm$ 0.41	3.03 $\pm$ 0.48	2.74 $\pm$ 0.62	2.57 $\pm$ 0.68	2.02 $\pm$ 0.80	3.31 $\pm$ 0.77	2.99 $\pm$ 1.99
Ablation Experiment: MASD $\pm$ standard deviation ( $μ m$ ) errors
OursWithoutDDP	2.77 $\pm$ 0.96	2.33 $\pm$ 0.30	2.91 $\pm$ 0.61	2.79 $\pm$ 0.39	3.12 $\pm$ 0.58	2.69 $\pm$ 0.61	2.62 $\pm$ 0.79	2.05 $\pm$ 0.98	3.45 $\pm$ 0.92	2.94 $\pm$ 1.87
Ablation $p -values$ analysis for the MASD errors
Ours vs. OursWithoutDDP	2.30e-56	4.94e-05	5.20e-03	1.59e-27	1.47e-111	2.33e-30	1.95e-74	1.13e-41	7.21e-264	1.65e-34

Methods	ILM	RNFL-GCL	IPL-INL	INL-OPL	OPL-ONL	ELM	IS-OS	OS-RPE	BM
Average 95% HD
OursWithoutDDP $(μ m)$	5.42	8.38	7.24	7.63	7.45	5.99	4.66	7.55	5.92
Ours $(μ m)$	5.44	8.11	7.31	7.5	7.59	5.85	4.47	7.50	5.90
DDP’s improvement	−0.37%	3.33%	−0.96%	1.73%	−1.84%	2.39%	4.25%	0.67%	0.34%
Average HD
OursWithoutDDP $(μ m)$	8.32	13.96	11.62	12.03	11.99	8.83	7.85	10.52	8.38
Ours $(μ m)$	8.18	13.12	11.33	10.97	11.22	8.25	6.75	10.36	8.2
DDP’s improvement	1.71%	6.40%	2.56%	9.66%	6.86%	7.03%	16.30%	1.54%	2.20%

Methods	Overall	ILM	RNFL-GCL	IPL-INL	INL-OPL	OPL-ONL	ELM	IS-OS	OS-RPE	BM
OursWithoutDDP	7.83 $\pm$ 3.06	12.25 $\pm$ 2.65	10.06 $\pm$ 1.39	8.47 $\pm$ 2.03	9.00 $\pm$ 2.01	9.10 $\pm$ 2.76	5.98 $\pm$ 2.04	4.95 $\pm$ 0.78	5.72 $\pm$ 1.57	4.98 $\pm$ 1.17
Ours	4.05 $\pm$ 1.09	3.66 $\pm$ 0.50	4.89 $\pm$ 1.03	4.81 $\pm$ 0.59	4.96 $\pm$ 0.51	4.31 $\pm$ 0.69	3.37 $\pm$ 0.73	2.74 $\pm$ 0.41	4.35 $\pm$ 1.01	3.38 $\pm$ 1.37

Deep learning network with differentiable dynamic programming for retina OCT surface segmentation

Abstract

1. Introduction

2. Method

2.1 Problem Formulation

2.2 Network architecture

2.3 DDP module with smoothness constraints

2.4 Loss functions

3. Experiments

3.1 Duke AMD OCT data

3.2 JHU MS OCT Data

3.3 DDP module on noise images

4. Discussion

5. Conclusion

Funding

Acknowledgments

Disclosures

Data availability

References

Data availability

Cited By

Figures (5)

Tables (6)

Equations (9)

Biomedical Optics Express

	OursWithouthDP		Ours
	Duke AMD data	JHU MS data	Duke AMD data	JHU MS data
Training	11.43	1.46	2.76	0.31
Test	20.90	8.73	12.04	1.94