Longitudinal deep network for consistent OCT layer segmentation

Yufan He; Aaron Carass; Yihao Liu; Peter A. Calabresi; Shiv Saidha; Jerry L. Prince

doi:10.1364/BOE.487518

1. Introduction

Multiple sclerosis (MS) is one of the most common demyelinating diseases, in which the immune system attacks the protective sheath of the nerve fibers. It causes physical and cognitive disability, affecting millions of people globally. There is no cure for MS but disease modifying treatments are available to reduce disease activity. Magnetic resonance (MR) images (MRIs) and clinical testing are used to diagnose MS. The white matter lesions in the cerebrum, brain stem, and cervical spine that are associated with MS can be seen and measured from MRI. Optical coherence tomography (OCT) is a non-invasive imaging modality widely applied to retinal imaging. The thinning of the retinal nerve fiber layer (RNFL) and ganglion cell/inner plexiform layer (GCL+IPL) measured from OCT images has been found to be correlated with MS progression [1,2]. Thus, retinal layer thicknesses are becoming an important biomarker for studying MS.

Eight retinal layers are commonly considered in observing retinal OCT images (see Fig. 1). Since manually segmenting each layer is time consuming, automated retinal layer segmentation methods have been well explored [3–28]. Among existing methods, deep learning based algorithms [18,19,21,22,24–29] are the most popular for their speed and accuracy. Most of the deep learning OCT segmentation methods [18,21,22,24–28] are $2$D networks that independently segment each slice (B-Scan) from a $3$D OCT volume. A typical Spectralis OCT image is composed of $49$ B-Scans of size $496\times 1024$ pixels, where each B-scan resolution is $3.9\times 5.5$ micron, but the distance between B-scans is about 118 microns. Due to the high resolution of B-Scans and the large distance between them, it is reasonable to use $2$D deep networks for their efficiency and low memory consumption.

Fig. 1. Eight retinal layers: the retinal nerve fiber layer (RNFL); the ganglion cell layer combined with the inner plexiform layer (GCL+IPL); the inner nuclear layer (INL); the outer plexiform layer (OPL); the outer nuclear layer (ONL), the inner segment (IS); the outer segment (OS); and the retinal pigment epithelium (RPE). Surfaces between these layers are identified by hyphenating their acronyms, except the inner limiting membrane (ILM), the external limiting membrane (ELM), and the Bruch’s Membrane (BM). The fovea is also labeled in the figure.

Download Full Size | PDF

The retinal thinning associated with MS is temporally slow in the absence of acute optic neuritis attacks. Ratchford et al. [2] found that the GCL+IPL thickness thinning rate is about $0.32$ $\mu$m/year. However, for typical clinical OCT devices the pixel depth resolution is $2$ to $4$ $\mu$m. Although this retinal thinning can be observed in studies of large MS cohorts, monitoring the changes for individual subjects is extremely challenging. As shown in Fig. 2, the variation in eye positioning, noise, and highly diffuse layer boundaries can cause large difference in thickness measurements even for repeat scans acquired in a short period of time. A $2$D deep network does not use the information from either adjacent B-Scans or longitudinal scans; thus, the final segmentation may lack $3$D spatial and longitudinal consistency. As a result, the true longitudinal changes caused by MS is likely to be concealed by the variability of the measurements. To reveal the mild thickness changes, we developed a deep learning based segmentation pipeline with sub-pixel accuracy that improves:

– $3$D spatial segmentation consistency between adjacent $2$D B-Scans,
– $4$D longitudinal segmentation consistency between intra-subject longitudinal scans.

Most deep learning OCT segmentation methods are 2D networks that independently segment each slice (B-Scan) from a 3D OCT volume; there are also methods that directly use 3D convolution for segmenting 3D OCT volumes [30,31]. Although those 3D methods reported performance improvements, applying a 3D convolution with isotropic kernels to 3D OCT volumes with anisotropic resolution raises concerns. Specifically, in addition to the widely known high computation costs, the features extracted from the 3D kernel may suffer from poor generalizability and can easily lead to overfitting because the same anatomy structure appears differently within a B-scan and across B-scans [32]. This problem cannot be solved by interpolating OCT volumes to isotropic resolution because this will cause artifacts, as shown in Fig. 3. Moreover, 3D networks are computationally expensive and can cause GPU memory problems, especially for high-resolution scans like OCT (a typical Spectralis scan size is $496\times 1024\times 49$ pixels and a Cirrus scan size is $1024\times 512\times 128$ pixels). To utilize 3D information while avoiding 3D convolutions, we propose a 2D network to extract intra-slice features and a convolutional long-short-term-memory (LSTM) [33] network to leverage inter-slice information. This architecture has been explored for 3D medical image segmentation [32,34,35] with demonstrated efficacy, though has never been used for retinal OCT applications.

For 4D longitudinal consistency, a joint segmentation algorithm using registered longitudinal scans is usually performed. For example, in neuroimaging, longitudinal MRI scans are registered to a baseline scan or a follow-up scan and each voxel in the aligned space has longitudinal intensity observations which can be used to suppress noise and enable smooth longitudinal measurements [36,37]. However, for 3D OCT volumes, a voxel may not have a corresponding longitudinal observation. As shown in Fig. 4, the B-Scans of the two longitudinal scans (visits) of the same subject are positioned differently on the retina (red and green lines) and do not have many corresponding A-scans. In recent years, several methods developed longitudinal segmentations. Lang et al. [10] and Oguz et al. [14] proposed 4D graph methods to connect 3D graphs for retinal surfaces of different visits. However, the joint energy minimization used in a 4D graph cut cannot be adapted to feed forward deep networks where the output segmentation is a direct mapping from the input. Gao et al. [38] used a 4D fully convolutional LSTM network for longitudinal 3D brain MR segmentation without registration. However, using unregistered 3D scans as inputs to a 4D network contradicts the intention of using a convolutional layer because the features are extracted from distinct regions in each 3D scan. Additionally, the GPU memory requirement for such a network is impractical. Loo et al. [39] fine-tuned their network for each subject using the manually corrected segmentation from a previous time point. Manual correction and fine-tuning are time consuming tasks that become impractical for large cohorts. Rivail et al. [40] trained a network to predict the time intervals between two longitudinal scans and learned a better representation, but they only showed that the learned representations were good for disease severity classification. Li et al. [41] trained the registration and segmentation networks together for longitudinal segmentation; however, their 3D deformable registration of brain images cannot be applied to OCT images [42,43]. In summary, none of the existing methods provide a good solution to achieve consistent longitudinal segmentation of OCT scans with manageable GPU memory consumption.

Fig. 2. The layer boundary ambiguity in repeat scans from two subjects, with the repeat scans only one week apart.

Download Full Size | PDF

Fig. 3. A typical Spectralis 3D scan is interpolated to have approximately isotropic resolution. Left: $i$-th B-Scan. Right: $(i+1)$-th B-Scan. Middle: interpolated image with 10x upsample rate between B-Scans $i$ and $i+1$.

Download Full Size | PDF

Fig. 4. B-Scan locations of two longitudinal scans (red and green) from the same subject.

Download Full Size | PDF

To improve the 3D segmentation consistency and address the challenges in 4D longitudinal segmentation, we propose a longitudinal network for extracting features from a stack of 2D OCT B-scans and a novel longitudinal fusion network to incorporate longitudinal priors. We realize that the smooth and continuous retinal surfaces across the whole retina can be easily interpolated without introducing artifacts. Our proposed longitudinal network uses an iterative surface registration and segmentation algorithm that registers and warps the segmented retinal surfaces from a previous time point to the current timepoint to provide a longitudinal prior without introducing artifacts. To reduce the GPU memory costs of joint segmentation of all longitudinal scans, we use a scheme inspired by hidden Markov models that can segment longitudinal scans sequentially.

2. Methods

2.1 Pseudo-3D feature extraction with convolutional LSTM

For a 3D OCT scan $x_t$ from visit $t$ with $N$ slices $x^1_t, \ldots, x^N_t$, we separate the B-Scans into $\lceil N/k \rceil$ stacks, where $k$ is the number of adjacent B-Scans in a stack ($k=3$ in our setting). We use a 2D encoder-decoder architecture to extract features from a stack of adjacent 2D B-Scans. Each 2D B-scan is first processed by a convolutional encoder independently. Inter-slice information across the stack of B-Scans are fused using a bidirectional convolutional LSTM (Bi-Conv-LSTM). The feature maps from the Bi-Conv-LSTM and the encoder are concatenated and sent to the 2D decoder. In other words, we replace the convolutional block that is usually used at the bottleneck of a residual U-Net with a Bi-Conv-LSTM. Bi-Conv-LSTM allows us to leverage information from adjacent B-Scans without using 3D convolutions. This design is similar to the previous use of convolutional LSTM to capture high-level inter-slice information that will guide the segmentation of the current B-scan [34,35]. We did not use convolutional LSTM at up-sampled feature maps as in [44,45] because we expect that level of information to not be as helpful because of the large distance between B-scans. The detailed structure of the network, denoted as LSTM-UNet, is shown in Fig. 5. For each stack of $k$ B-scans of size $H\times W$, LSTM-UNet outputs $k$ feature maps of size $C\times H\times W$, where $C$ is the number of channels in each feature map. Therefore, $k$ separate B-scans produces $k$ representations.

Fig. 5. The detailed architecture of LSTM-UNet.

Download Full Size | PDF

In our LSTM-UNet, we modify the convolutional LSTM [33], by replacing the input, output, and forget gate with residual convolutional blocks (ResBlock). For the $i^{\text {th}}$ B-Scan $x^{(i)}$ in the stack, we use two ResBlocks in each gate for the hidden state $h^{(i)}$ and feature output from the encoder. The resultant cell state is denoted as $c^{(i)}$. For each new stack, the hidden state $h^{(0)}$ and cell state $c^{(0)}$ are initialized with zero tensors that have the same spatial resolution as the input. Since the B-Scan sequence in a 3D OCT volume can be considered as symmetric, instead of using a conventional bidirectional convolutional LSTM, we use a single LSTM to model both the forward and backward directions. In the forward direction, a stack of features from $k$ B-Scans will generate $k$ hidden states $h^{(1)},\ldots,h^{(k)}$. In the backward direction, we re-initialize the hidden state and cell state, and use the reversed sequence of features to get $h^{(k')},\ldots,h^{(1')}$. The final output of the Bi-Conv-LSTM for the $i^{\text {th}}$ B-Scan is the average of $h^{(i)}$ and $h^{(i')}$. For an input 3D OCT scan $x_t$ with $N$ slices, $N$ feature maps $f_t^1,\ldots,f_t^N$, each of size $C\times H\times W$ are produced.

2.2 Longitudinal prior fusion with surface registration

For 4D longitudinal consistency, our goal is to leverage longitudinal information for segmentation. However, a direct joint segmentation of the longitudinal scans faces two challenges: 1) artifacts and errors arise from interpolating highly anisotropic 3D images prevents meaningful registration and 2) large GPU memory usage for a 4D joint segmentation network with all longitudinal 3D scans as inputs. To obtain spatially aligned longitudinal information without image intensity interpolation, we use the segmented surfaces as the longitudinal prior. Since the segmented surfaces are smooth and continuous across B-Scans, the surface interpolation is more reliable than interpolating intensity values. To solve the GPU memory issue, we avoid segmenting all the past images at the same time and propose a hidden Markov model based on retinal surfaces.

2.2.1 Hidden Markov model

In this section, we present a longitudinal deep network motivated by the hidden Markov model (HMM). We note that our network is analogous to the HMM but, strictly speaking, not a Bayesian deep network. Given longitudinal scans $x_0, x_1, \ldots, x_t$ from the same subject, we want to obtain the surface segmentation $s_t$ for $x_t$ using all the previous scans, which from a probabilistic view, is equivalent to estimating $\alpha _t(s_t) = p(s_t, x_0, \ldots, x_t)$. Considering the difficulties in the inference and adaptation to a deep learning setting, we model the longitudinal segmentation as a HMM. As shown in Fig. 6, the retinal layer surfaces $s_0, \ldots, s_t$ are the hidden states and the OCT images $x_0,\ldots, x_t$ are the observations. Each observation $x_i$ is dependent only on its hidden state $s_i$. Since we are focused on MS patients and the retinal layers are changing slowly in a short period of time, we assume $s_t$ and $s_{t-1}$ can be modeled by some transition probability $p(s_t|s_{t-1})$. We note that this first-order Markov chain where $s_t$ only depends on $s_{t-1}$ is a coarse model since the treatment or other environmental changes can affect the retinal layer rate of change. To estimate $\alpha$, we have the Forward Algorithm as follows,

(1)$$\begin{aligned} \alpha_t(s_t) & = p(s_t, x_0, \ldots, x_t) = \sum_{s_{t-1}} p(s_t, s_{t-1}, x_0, \ldots, x_t),\\ & =\sum_{s_{t-1}}p(x_t|s_t,s_{t-1}, x_0, \ldots, x_{t-1}) p(s_t|s_{t-1}, x_0, \ldots, x_{t-1}) p(s_{t-1}, x_0, \ldots, x_{t-1}),\\ & =p(x_t|s_t)\sum_{s_{t-1}}p(s_t|s_{t-1})\alpha_{t-1}(s_{t-1}). \end{aligned}$$

In a HMM, $p(x_t|s_t)$ and $p(s_t|s_{t-1})$ are given by the model’s emission distributions and transition probabilities, and $\alpha _t(s_t)$ can be easily computed by the forward algorithm. However, in deep learning, where we learn a direct mapping from the input to the output, how can we formulate our deep network pipeline for this HMM? In our problem, for every $t$, $s_t$ is a segmentation and the corresponding ground truth $s^*_t$ is known during training. Therefore, we can view $\alpha _t(s_t)$ as a delta function $\alpha _t(s_t) = \delta (s_t-s^*_t)$ instead of a distribution. This allows us to rewrite Eq. (1) as:

(2)$$\begin{aligned} \alpha_t(s_t) &= p(x_t|s_t)\sum_{s_{t-1}}p(s_t|s_{t-1})\delta(s_{t-1}-s^*_{t-1}),\\ &= p(x_t|s_t)p(s_t|s^*_{t-1}) = p(x_t|s_t,s^*_{t-1})p(s_t|s^*_{t-1}),\\ &= p(x_t,s_t|s^*_{t-1}) = p(s_t|x_t,s^*_{t-1})p(x_t|s^*_{t-1}) \propto p(s_t|x_t,s^*_{t-1}). \end{aligned}$$

A segmentation $s_t$ that minimizes the difference between $\alpha _t(s_t)$ and $\delta (s_t-s^*_t)$ is what we desire (a discussion on the Viterbi algorithm is presented in Sec. 2.5). Equation (2) motivates us to train a feed forward network $\Phi (\cdot ;\theta )$ with parameters $\theta$ to explicitly model $p(s_t|x_t,s^*_{t-1})$. During training, $\Phi (\cdot ;\theta )$ takes $x_t$ and $s^*_{t-1}$ as inputs and uses $s^*_t$ to optimize $\theta$ as

(3)$$\theta^* = \mathop{\textrm{arg}\,\textrm{min}}\limits_{\theta} \mathcal{L} \left( p( \Phi(x_t,s^*_{t-1};\theta)|x_t,s^*_{t-1}), \delta( \Phi(x_t,s^*_{t-1}; \theta) - s^*_t ) \right),$$

where $\mathcal {L}$ is the loss function that captures the difference between $\alpha _t(s_t)$ and $\delta (s_t-s^*_t)$.

Fig. 6. Hidden Markov model for longitudinal segmentation.

Download Full Size | PDF

The longitudinal segmentation problem is now simplified to a per-visit segmentation problem. Specifically, for a subject scanned longitudinally with image $x_t$ at visit $t$, we segment $x_t$ as $s_t = \Phi (x_t, s_{t-1})$, with the knowledge of $s_{t-1}$ from the previous visit $x_{t-1}$ (since the ground truth $s^*_{t-1}$ is not available during inference stage).

The mapping $\Phi$ is a fully convolutional network (FCN), which preserves the spatial relationship between its input and output. Therefore, when $x_t$ and $s_{t-1}$ are not spatially aligned, the training is extremely difficult due to the large position variation. Thus, we register $s_{t-1}$ to match $x_t$ spatially. The details of the registration step are introduced in Sec. 2.4. We denote the warped segmentation at visit $t-1$ as $s^r_{t-1}$, and now we have $s_t = \Phi (x_t, s^r_{t-1})$. $s_{t-1}$ and $s^*_{t-1}$ in Eqs. (1), (2), and (3) are now replaced by $s^r_{t-1}$ and $s^{r*}_{t-1}$, respectively. We use $s_{t-1}$ interchangeably with $s^r_{t-1}$ in equations of distributions in the following sections.

2.3 Longitudinal network $\Phi$

For the longitudinal OCT segmentation task, $\Phi$ is the combination of LSTM-UNet in Sec. 2.1 and a longitudinal fusion network (Long-Fuse-Net) shown in Fig. 7(b). During registration, we represent the surface segmentation $s_{t-1}$ with a matrix of size $9\times W \times N$ where each element is the sub-pixel position of a surface (nine surfaces in total) at an A-Scan location. After registration, $s^r_{t-1}$ is converted to layer masks $m^r_{t-1}$ of size $H\times W \times N$ and concatenated with the feature maps from the LSTM-UNet, as shown in Fig. 7. The concatenated features for each B-Scan are sent to a shared 2D fusion block independently, and the output is added to the features from LSTM-UNet. This step uses the idea of residual connection and the idea of LSTM gates. The fusion block takes the longitudinal prior and features from LSTM-UNet as input, and decides what new information should be added to those features. If the longitudinal prior is not informative due to registration failure, large longitudinal retinal changes, or segmentation errors in $s_{t-1}$, we expect the fusion block to weight the longitudinal prior less and depend more on the image features.

Fig. 7. The network $\Phi$ for longitudinal OCT retinal layer segmentation. (a) LSTM-UNet: Extract features and fuse inter B-Scans information using bidirectional convolutional LSTM (Bi-Conv-LSTM). (b) Long-Fuse: Fuse features with longitudinal prior and generate final surface output and mask output. $\Phi$ consists of both LSTM-UNet in (a) and Long-Fuse in (b) and is trained end-to-end. Network blocks with the same name within the dotted box share weights. The fusion blocks, Conv-s and Conv-m, are fully convolutional with stacks of resblocks, as shown in Fig. 5.

Download Full Size | PDF

The output from the fusion block is sent to a surface head (Conv-s) and a mask head (Conv-m) for final topology correct surface segmentation $s_t$ and pixel-wise labeling layer mask. The Conv-s is a convolution block with column-wise softmax, soft-argmax surface regression, and a topology guarantee module [27,28], and it outputs the surface segmentation of size $9\times W\times N$. Conv-m is a convolution block with channel-wise softmax, and it outputs layer probability maps of size $H\times W \times 9 \times N$ (8 layers and 1 background class). The fusion blocks, Conv-s and Conv-m, are fully convolutional with stacks of resblocks, as shown in Fig. 5.

2.4 Iterative surface registration and segmentation

To obtain a spatially aligned $s^r_{t-1}$ with $x_t$, we need OCT image registration. To avoid introducing artifacts during interpolating the anisotropic OCT image, we modify an existing OCT registration method from [10] to focus on surface alignment. Specifically, given $x_{t-1}$, $s_{t-1}$, and $x_t$, we perform the registration via the following steps.

Total Retinal Thickness Map A initial segmentation $s_t=\Phi (x_t,0)$ of size $9\times W\times N$ is first obtained, where we set the longitudinal prior as masks with all zeros since $s_{t-1}$ is not registered to $x_t$. We obtain a total retinal thickness map $\mathbf {T}$ of size $W\times N$. It is the position difference between the top and bottom layer surface from $s_t$, and is normalized to $[0,1]$ by dividing the maximum thickness value.

Fundus Image Generation Since retinal blood vessels project strong shadows over the RPE layer, we sum up the OCT image pixels from the bottom surface (Bruch’s Membrane) to 80 $\mu$m above, divide by the mean value, and scale to $[0,1]$. To get a smoother fundus image, $\mathbf {T}$ is added to the vessel projection image. Examples of the fundus images are shown in Fig. 8.

Fundus and Axial Alignment We first estimate the fovea as the thinnest point of the retinal thickness. $\mathbf {F}_{t}$ and $\mathbf {F}_{t-1}$ are initially rigidly registered by aligning the fovea, and then affinely registered by minimizing the Matts Mutual Information [46] using an evolutionary algorithm [47]. An example of the warped $\mathbf {F}_{t-1}$ is shown in Fig. 8. The resultant transformation is used to warp $s_{t-1}$ to align with $s_t$ in the fundus plane. However, the depth (A-Scan direction) is not aligned. Since the IS-OS surface is distinct, we shift all the surfaces in $s_{t-1}$ (surface distances unchanged) such that the IS-OS surface aligns with the IS-OS surface in $s_t$. The overall transformation is denoted as $\mathcal {T}$. An example of the final fundus and axially aligned $s^r_{t-1}$ is shown in Fig. 9.

Iterative Registration and Segmentation $s_t = \Phi (x_t, 0)$ without any prior is a coarse segmentation and therefore the registration can also be less accurate. We use the coarse registered $s^r_{t-1}$ for a better $s_t=\Phi (x_t,s^r_{t-1})$ and iterate over those steps until a max iteration is achieved. The overall method is shown in Algorithm 1.

Algorithm 1. Iterative segmentation and registration

View Table | View all tables in this article

Fig. 8. Generated fundus image $\mathbf {F}_{t-1}$ from $x_{t - 1}, s_{t - 1}$ (left), $\mathbf {F}_{t}$ from $x_t, s_t$ (middle), and the warped $\mathbf {F}_{t-1}$ (right).

Download Full Size | PDF

Fig. 9. A B-scan overlaid with $s_t$ (red) and registered segmentation $s_{t-1}^r$ (green).

Download Full Size | PDF

2.5 Forward and backward segmentation chain

In Sec. 2.2.1, we seek a segmentation $s_t$ that maximizes $\alpha _t(s_t) = p(s_t, x_1,\ldots,x_t)$ using the past observations and a forward algorithm which is a classical HMM inference algorithm that computes the marginals $p(s_t, x_1,\ldots,x_T)$ over the entire sequence $x_1,\ldots,x_T$ for any timepoint $t$. Given the sequence of observations, we can also try to use all observations—including future ones—and maximize $p(s_t, x_1,\ldots,x_T) \propto p(s_t|x_1,\ldots,x_t)p(x_{t+1},\ldots,x_T|s_t)$ to segment $x_t$. The forward pass provides $p(s_t|x_1,\ldots,x_t)$ and the backward pass provides $p(x_{t+1},\ldots,x_T|s_t)$ as follows

(4)$$\begin{aligned} \beta_t(s_t) &= p(x_{t+1},\ldots,x_T|s_t) = \sum_{s_{t+1}}p(s_{t+1}|s_t)p(x_{t+1},\ldots,x_T|s_t,s_{t+1}),\\ &= \sum_{s_{t+1}}p(s_{t+1}|s_t)p(x_{t+1}|s_{t+1},s_t)p(x_{t+2},\ldots,x_T|s_{t+1},s_t,x_{t+1}),\\ &= \sum_{s_{t+1}}p(s_{t+1}|s_t)p(x_{t+1}|s_{t+1})p(x_{t+2},\ldots,x_T|s_{t+1}),\\ &= \sum_{s_{t+1}}p(s_{t+1}|s_t)p(x_{t+1}|s_{t+1})\beta_{t+1}(s_{t+1}). \end{aligned}$$

We note that Viterbi’s algorithm is used to find optimal hidden state sequences $s_1, \ldots, s_T$ by maximizing $p(s_1, \ldots, s_T, x_1, \ldots, x_T)$. However, our hidden state is in a high-dimensional, non-discrete segmentation space and we do not have a distribution in our current setting (as we do not use Bayesian networks), and thus we cannot directly apply the forward-backward pass defined in an HMM probabilistic model or Viterbi’s algorithm for the optimal segmentation at all time points. In this section, we propose a different forward-backward algorithm.

Forward pass Our network $\Phi$ is trained to output $s_t$ that maximizes $\alpha _t(s_t)=p(s_t|x_t,s^*_{t-1})$. However, we do not have $s^*_{t-1}$ during deployment, so we start with $s_1=\Phi (x_1,0)$. The forward pass segments $x_1,\ldots,x_T$ sequentially as shown in Fig. 10. $s_2$ is obtained from $\Phi (x_2,s^r_1)$ (after iterative registration) and this continues for all time points using successive application of $\Phi$, as $s_t=\Phi (x_t,\Phi (x_{t-1},\ldots,\Phi (x_1,0))$.

Backward pass The information from the future observations $\beta _t$ and past observations $\alpha _t$ are combined using $\eta _t = \beta _t*\alpha _t$. Following the assumption used in Sec. 2.2.1, we can assume the ground truth hidden state is known and can have $\beta$ as delta function,

(5)$$\beta_t(s_t)=\delta(s_t-s^*_t)= p(s^*_{t+1}|s_t)p(x_{t+1}|s^*_{t+1})\propto p(s^*_{t+1}|s_t)$$

Fig. 10. Forward longitudinal Markov segmentation chain and backward chain. The blue dotted box is the iterative registration and segmentation diagram.

Download Full Size | PDF

The product of $\alpha _t$ and $\beta _t$, denoted as $\eta _t$, provides a better estimate by using all available longitudinal scans:

(6)$$\eta_t(s_t) = \alpha_t(s_t)\beta_t(s_t) \propto p(s_t|x_t,s^*_{t-1})p(s^*_{t+1}|s_t).$$

Ideally, our network $\Phi (\cdot ;\theta )$ should take three inputs of $s^*_{t+1}$, $s^*_{t-1}$, and $x_t$, and output $s_t$ that maximizes $\eta _t$. However, this requires $s_{t+1}$ when segmenting $x_t$ which is not feasible as we segment the data sequentially. We solve this by jumping out of the probabilistic perspective and approach the problem from a deep learning perspective. Our network $s_t=\Phi (x_t,s_{t-1})$ takes $s_{t-1}$ as a longitudinal prior. It can be considered as the attention that tells $\Phi$ where to focus. Meanwhile, the intensity features from $x_t$ are combined with prior non-intensity features to be more robust. The $\Phi$ is trained to balance the current observation with the prior information automatically. Since $s_{t-1}$ is not identical to $s_t$, the training will force $\Phi$ to rely more on intensity features from $x_t$ when $s_{t-1}$ shows a large difference from $s_t$ (notice $\Phi$ is an FCN that preserves its input and output spatial relationships). If the intensity features are not enough for a correct prediction, the network will rely more on prior $s_{t-1}$. As long as the training data contains enough variations of $s_{t-1}$ and $x_t$, the network will learn the trade-off automatically. An example is shown in Fig. 9; the network learns to output the correct segmentation with both good and bad priors.

We assume that $s_{t+1}$ should not change much from $s_t$, which means that $s_t$ should not differ much from $s_{t+1}$. $s_t$ can be used as a prior for $s_{t+1}$ and thus $s_{t+1}$ should also be valid for $s_t=\Phi (x_t,s_{t+1})$, which enables the backward pass. The problem with the forward pass is that the $s_1$ segmentation without any prior may contain large errors, which may be propagated to the later scans. So we use $s_T=\Phi (x_T,0)$ as a starting point for the backward pass, which uses $s_t=\Phi (x_t,s_{t+1})$. Together, these steps can be written as

(7)$$\begin{aligned} s^{f}_t&=\Phi(x_t,\Phi(x_{t-1},\ldots,\Phi(x_1,0)),\quad t=1,\ldots,T, \quad\text{Forward Pass}\\ s^{b}_t&=\Phi(x_t,\Phi(x_{t+1},\ldots,\Phi(x_{T},s^f_T)),\quad t=1,\ldots,T-1, \quad\text{Backward Pass}\\ s_t &= \begin{cases} s^{b}_t, & t=1,\ldots t-1\\ s^{f}_t, & t=T \end{cases}, \quad\text{Final Results,} \end{aligned}$$

as illustrated in Fig. 10. Errors propagated in the forward pass are diminished by later observations. Starting from the last visit’s forward pass, we use a backward pass to get the final results.

3. Experiments

Training Data 3D macular OCT scans from Zeiss Cirrus and Heidelberg Spectralis scanners were used in our experiments. 2,037 Cirrus scans from 176 subjects and 347 Spectralis scans from 70 subjects were used for training. 27 Spectralis scans from 6 subjects and 90 Cirrus scans from 12 subjects were used for training validation. The results were evaluated using pseudo-ground truth from Aura [6]. It is a state-of-the-art method compared to other publicly available tools [48,49]. We have used Aura for segmenting over ten thousand clinical OCT scans and have achieved many clinical findings [50–54]. Obtaining manual delineation of retinal layers on large datasets (over thousands of images) is time consuming; thus the use of Aura as a pseudo-ground truth for training the deep network is reasonable and practical. To ensure the quality of our pseudo-ground truth, the AURA results were manually reviewed and no dramatic failures were observed. A discussion about the pseudo-ground truth training is in Sec. 4.

Training Details The number of samples from the two scanners are very different so we use a weighted sampler such that the probability of sampling from each scanner is the same. We randomly sampled a 3D scan $x_t$, its registered prior $s^{r*}_{t-1}$ and segmentation $s^*_{t}$ from this scanner-balanced sampler and sliced the 3D scan into stacks of B-Scans as described in Sec. 2.1. In each iteration, a single stack of adjacent B-Scans was resampled to a fixed within-B-Scan resolution of 5$\times$15 $\mu$m and augmented with random horizontal flipping, random horizontal scaling, random Gaussian noise, and random gamma correction. Also, two core corruption schemes were used: 1) random prior removal with probability $0.5$ and 2) random B-Scan removal. Random prior removal forces $\Phi$ to generate good segmentations with and without priors and random B-Scan removal forces the network to use inter-B-Scan information. For random B-Scan removal, we randomly set a B-Scan to zero (a stack with all zero B-Scans is not allowed to avoid gradient problems), thus the network has to use the Bi-Conv-LSTM to predict the correct segmentation for a missing B-Scan using its neighboring B-Scans. We trained the network parameters using the loss described in [27]. The loss $\mathcal {L}$ is a combination of pixel-wise labeling loss (Dice + Cross Entropy), surface position classification loss (Cross Entropy), and surface position loss (smooth $L_1$ loss). These three losses are linearly combined with weights 1, 1, and 10 respectively during training. We use the Adam optimizer with learning rate $1\times 10^{-4}$ and weight decay $1\times 10^{-4}$. We trained our network for 10 epochs and validated the model every 200 mini-batches on the validation dataset. The best average network performance with and without a prior (intentionally removed) were used as an early stopping criteria. For iterative registration and segmentation, we set the maximum iteration to one to save inference time.

3.1 Consistency on repeated scans

We first evaluate the segmentation consistency on repeated scans. To do this, we used ten healthy subjects where each subject was scanned twice within a week for both eyes (twenty pairs of repeated scans but one pair is excluded due to file damage).The retinal thickness of the repeated scans should not change within this short period of time for healthy subjects. We compare our method Deep-Backward (final forward backward results) with baselines (1) Deep-Single, where each scan is segmented using its image only $s_t=\Phi (s_t,0)$, (2) Deep-Forward, which uses only the forward pass results, and (3) Aura [6]. The repeated scan retinal layer thickness differences (averaged over the $5\times 5$mm rectangular area) are shown in Fig. 11 and Table 1. Overall, it is observed that the our methods reduces the segmentation variations for repeated scans, and the segmentation with longitudinal priors can further improve the consistency. The performance differences between the proposed Deep-Backward and the two variations Deep-Single and Deep-Forward demonstrate the effectiveness of each component and amount to an ablation study. In particular, the performance difference between Deep-Backward and Deep-Forward demonstrates the effectiveness of the proposed Forward and Backward Segmentation Chain; the performance difference between Deep-Backward and Deep-Single demonstrates the effectiveness of the proposed longitudinal network in incorporation of longitudinal information.

Fig. 11. Boxplots of layer thickness differences ($\mu$m) (averaged over 5$\times$5 mm macular rectangular region) on repeated scans. "Overall" is the mean of the absolute value of all layer differences.

Download Full Size | PDF

Table 1. Mean (standard deviation) in $\mu$m for retinal layer thickness differences on repeated scans. “Overall” is the mean of the absolute value of each column. ‘*’ denotes Wilcoxon ranksum test significance ($p < 0.05$) by comparing ‘Deep-Backward’ with ‘Aura’.

View Table | View all tables in this article

3.2 Longitudinal segmentation stability

For longitudinal changes in MS subjects, we do not know the true change caused by the disease, so we measure the thickness measurement stability. Specifically, for each MS subject with retinal thickness $T_1, T_2, \ldots, T_t$ measured from $t$ visits $x_1, x_2, \ldots, x_t$ at age $a_1, a_2, \ldots, a_t$, we fit the retinal thickness to the age with a straight line using least squares method

(8)$$\left[T_1 \;\; T_2 \;\; \cdots \;\; T_t \right]^{\top} = b \left[a_1 \;\; a_2 \;\; \cdots \;\; a_t\right]^{\top} + c$$

and the residual fitting error can be used as a measurement for segmentation instability. The root mean square fitting error (RMSE) is defined as $\sqrt {\sum _{i=1}^t(T_i - b*a_i - c)^2 / t}$. Testing was done on 75 Spectralis scans from 13 MS subjects. The ages of the subjects are between 18 and 100 and each MS subject has at least four scans. The RMSE results of Deep-Single, Deep-Forward, Deep-Backward, and Aura are shown in Fig. 12 and Table 2, and it is observed that Deep-Backward achieves the best stability in measuring longitudinal changes for MS. The retinal thickness change trajectories of INL—see Fig. 13(a)—shows that Deep-Backward achieves better stability in outlying cases. This is also observed in the retinal thickness change trajectories of RNFL and GCL+IPL (See Figs. S1 and S2 in Supplement 1).

Fig. 12. The boxplot of root mean square fitting error ($\mu$m) of longitudinal retinal layer thickness. "Overall" is the mean of all the layers’ RMSE.

Download Full Size | PDF

Fig. 13. (a) Trajectory of INL thickness changes for each MS subject. (b) The boxplot of mean absolute surface distance (MAD) compared with manual delineation on 20 Spectralis scans.

Download Full Size | PDF

Table 2. Mean and standard deviation ($\mu$m) for RMSE of longitudinal retinal layer thickness.

View Table | View all tables in this article

3.3 Segmentation accuracy

The previous two sections measure the stability of the measurement, which is not a standard measurement for segmentation evaluation. In this section, we compare with manual delineation on independent OCT scans (longitudinal datasets with manual delineation are not available at this time). Twenty 3D Spectralis OCT scans with manual surface delineations [55] are used for evaluation. We used the Aura results $s^a$ as an input prior for the network $\Phi (x,s^a)$, which we denote as Deep-Forward*. We compare Deep-Forward* with Deep-Single, which does not use any prior information. The results are in Fig. 13(b) and Table 3. We observe that the Deep-Single result can have segmentation outliers and the results are slightly worse than Aura. In contrast, the Deep-Forward* achieved much improved results by incorporating the prior information. We point out that the manual delineation is also biased and may not be very accurate due to image and rater uncertainty, but it will not contain large segmentation errors like results from automated algorithms. Aura is trained from another small dataset different than these 20 scans, and there are large differences between these two sets of manual delineations. The segmentation bias in Aura is considered acceptable based on our past experience in MS research, which is why we trained our network with Aura results as ground truth. Figure 14 shows the MAD results using Aura results as validation ground truth. By providing the Aura results as prior, the Deep-Forward* does not output the same results as Aura, but the results are improved with fewer outliers. This shows the efficacy of the prior provided to the network.

Fig. 14. Boxplot of MAD compared with Aura results on 20 Spectralis scans.

Download Full Size | PDF

Table 3. Mean absolute surface distance (MAD) and standard deviation ($\mu$m) compared with manual delineation.

View Table | View all tables in this article

3.4 Qualitative results

Longitudinal Prior We find our longitudinal prior to be very effective at situations where the retinal layer information are missing in some B-Scans. Figure 15 shows an example where a B-Scan is missing (black). The segmentation result based on adjacent B-Scans features from Bi-Conv-LSTM is not very accurate since the input feature spatial resolution is eight times lower than the original image. Without an inter-B-Scan fusing module, the results from a pure 2D UNet are completely useless and have to be excluded from the layer thickness statistics. On the other hand, if a longitudinal prior is provided, the segmentation on a blank B-Scan will largely rely on the longitudinal prior and will not have a large effect on the thickness statistical analysis.

Forward Backward Here we show the segmentation changes in the forward and backward pass. As shown in the top figure in Fig. 16, the segmentation using the prior (Deep-Forward) improved over Deep-Single in many regions (e.g., right zoom in window) but an incorrect prior may also lead to regions with worse performance (e.g., left window). Although errors in the early visits will be propagated to future times, those errors will be gradually corrected with the new observations. The backward prior shown in the bottom figure of Fig. 16 is much better, and the final segmentation (Deep-Backward) can successfully leverage the current image and backward prior. Some extreme cases on datasets with manual delineations are provided in Fig. S3 and S4 available in the Supplement 1.

4. Discussion

In this paper, we proposed a pipeline for longitudinal OCT retinal layer segmentation. The inter-slice features are fused by a Bi-Conv-LSTM to improve segmentation consistency between adjacent B-Scans. The segmented surfaces from a previous visit are registered as a longitudinal prior for the current segmentation. We borrowed the concept of a hidden Markov model and proposed a forward backward pass to iteratively segment longitudinal scans. For the experiments, we trained our model using the Aura [6] results as training ground truth because manual delineation on large longitudinal datasets is currently unavailable. More importantly, we observed the manual delineation consistency is worse than the results from Aura tools and there are large variation in manual delineations and large rater differences. This highlights the challenge of longitudinal consistency in OCT segmentation as well as its importance. Training our model on the Aura results may cause our model to learn segmentation biases from Aura. However, the Aura tool has been used for analysing tens of thousands of clinical scans over the past five years and proved its accuracy and robustness in scientific research. In our longitudinal studies of multiple sclerosis, we care more about the longitudinal changes and consistency, and having the same bias as Aura can improve the statistical consistency of our model with previous clinical findings from Aura.

Fig. 15. Top: The segmentation of blank B-Scan (slice 13) using adjacent slices without prior (Deep-Single). Bot: Segmentation results with registered prior (Deep-Forward) overlaid on original B-Scan (slice 13).

Download Full Size | PDF

Fig. 16. The forward backward pass. (a) The registered forward prior, segmentation without prior (Deep-Single) and segmentation using the prior (Deep Forward) in the forward pass. (b) The registered backward prior from a future scan and the segmentation using the prior in the backward pass.

Download Full Size | PDF

Our network is trained to balance information from 1) the intensity information from the current B-scan, 2) the prior information from adjacent B-scans learned by LSTM-UNet, and 3) the information from longitudinal priors. If our training data contains examples where adjacent B-scans have sudden movements (jumps), our network will be able to learn to ignore the prior information from adjacent B-scans. As we show in Fig. 15, when the intensity information is completely missing from a B-scan, our model is able to accurately segment the adjacent B-scan, which suggests that the features from the blacked out B-scan are not considered in segmenting its adjacent B-scan. However, if those jumps are not observed during training, it will be difficult for our model to automatically adjust the balance between the information from the current B-scan and the prior information from adjacent B-scans and longitudinal scans.

Longitudinal consistency is not considered in our training ground truth since Aura segments each OCT scan independently. From a deep learning training perspective, then how does our network improve longitudinal consistency if the ground truth is not longitudinally consistent? For OCT images, the segmentation variation caused by inherent data uncertainty from resolution limitation, noise, and artifacts cannot be solved by more training data. We want the segmentation to be accurate in regions with clear boundaries and low noise and artifacts while regions with high data uncertainty should be segmented with a longitudinal prior for improved consistency. So why can the deep network improve the consistency in high uncertainty regions if it is trying to output Aura results as training ground truth? Aura results on unclear regions will largely depend on the graph-based methods, which are less reliant on the intensity features than our FCN. Therefore, the correspondence between image features and Aura results is harder for our FCN to learn. Deep networks will learn the easy and general features at first to quickly minimize the training loss. The longitudinal prior, which is close to the ground truth segmentation from Aura, is much easier to use to quickly reduce the loss and give our model the potential to achieve a balance between using current intensity features and longitudinal priors. However, the deep network may finally over-fit to Aura results and show no improvement on consistency. Currently, we show our longitudinal consistency improvements over Aura through experimentation. Improving the deep network results from its training ground truth is an active research field called learning from noisy labels [56,57], and is a future direction to further improve and formalize our method. Besides potentially improved longitudinal consistency, the longitudinal priors also serve as attention maps and can increase the robustness against intensity variations.

Funding

National Institute of Neurological Disorders and Stroke (R01-NS082347); National Eye Institute (R01-EY032284); National Eye Institute (R01-EY024655).

Acknowledgment

This work was supported by the NIH under NEI grant R01-EY024655 (PI: J.L. Prince), NEI grant R01-EY032284 (PI: J.L. Prince), and NINDS grant R01-NS082347 (PI: P.A. Calabresi).

Disclosures

The authors declare that there are no conflicts of interest related to this article.

Data availability

Data underlying the results presented in Section 3.1 and 3.2 are not publicly available at this time. Data underlying the results presented in Section 3.3 are available at [55].

Supplemental document

See Supplement 1 for supporting content.

References

1. L. S. Talman, E. R. Bisker, D. J. Sackel, et al., “Longitudinal study of vision and retinal nerve fiber layer thickness in multiple sclerosis,” Ann. Neurol. 67(6), 749–760 (2010). [CrossRef]

2. J. N. Ratchford, S. Saidha, E. S. Sotirchos, J. A. Oh, M. A. Seigo, C. Eckstein, M. K. Durbin, J. D. Oakley, S. A. Meyer, A. Conger, T. C. Frohman, S. D. Newsome, L. J. Balcer, E. M. Frohman, and P. A. Calabresi, “Active MS is associated with accelerated retinal ganglion cell/inner plexiform layer thinning,” Neurology 80(1), 47–54 (2013). [CrossRef]

3. M. K. Garvin, M. D. Abràmoff, X. Wu, S. R. Russell, T. L. Burns, and M. Sonka, “Automated 3-D intraretinal layer segmentation of macular spectral-domain optical coherence tomography images,” IEEE Trans. Med. Imag. 28(9), 1436–1447 (2009). [CrossRef]

4. Q. Yang, C. A. Reisman, Z. Wang, Y. Fukuma, M. Hangai, N. Yoshimura, A. Tomidokoro, M. Araie, A. S. Raza, D. C. Hood, and K. Chan, “Automated layer segmentation of macular OCT images using dual-scale gradient information,” Opt. Express 18(20), 21293–21307 (2010). [CrossRef]

5. S. J. Chiu, X. T. Li, P. Nicholas, C. A. Toth, J. A. Izatt, and S. Farsiu, “Automatic segmentation of seven retinal layers in SDOCT images congruent with expert manual segmentation,” Opt. Express 18(18), 19413–19428 (2010). [CrossRef]

6. A. Lang, A. Carass, M. Hauser, E. S. Sotirchos, P. A. Calabresi, H. S. Ying, and J. L. Prince, “Retinal layer segmentation of macular OCT images using boundary classification,” Biomed. Opt. Express 4(7), 1133–1152 (2013). [CrossRef]

7. A. Carass, A. Lang, M. Hauser, P. A. Calabresi, H. S. Ying, and J. L. Prince, “Multiple-object geometric deformable model for segmentation of macular OCT,” Biomed. Opt. Express 5(4), 1062–1074 (2014). [CrossRef]

8. A. Lang, A. Carass, P. A. Calabresi, H. S. Ying, and J. L. Prince, “An adaptive grid for graph-based segmentation in macular cube OCT,” Proc. SPIE 9034, 903402 (2014). [CrossRef]

9. B. J. Antony, M. S. Miri, M. D. Abràmoff, Y. H. Kwon, and M. K. Garvin, “Automated 3D segmentation of multiple surfaces with a shared hole: segmentation of the neural canal opening in SD-OCT volumes,” in 17th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2014), vol. 8673 of Lecture Notes in Computer Science (Springer Berlin Heidelberg, 2014), pp. 739–746.

10. A. Lang, A. Carass, O. Al-Louzi, P. Bhargava, H. S. Ying, P. A. Calabresi, and J. L. Prince, “Longitudinal graph-based segmentation of macular OCT using fundus alignment,” Proc. SPIE 9413, 94130M (2015). [CrossRef]

11. J. Novosel, G. Thepass, H. G. Lemij, J. F. de Boer, K. A. Vermeer, and L. J. van Vliet, “Loosely coupled level sets for simultaneous 3D retinal layer segmentation in optical coherence tomography,” Med. Image Anal. 26(1), 146–158 (2015). [CrossRef]

12. I. Oguz, L. Zhang, M. D. Abràmoff, and M. Sonka, “Optimal retinal cyst segmentation from OCT images,” Proc. SPIE 9784, 97841E (2016). [CrossRef]

13. B. J. Antony, A. Lang, E. K. Swingle, O. Al-Louzi, A. Carass, S. D. Solomon, P. A. Calabresi, S. Saidha, and J. L. Prince, “Simultaneous Segmentation of Retinal Surfaces and Microcystic Macular Edema in SDOCT Volumes,” Proc. SPIE 9784, 97841C (2016). [CrossRef]

14. I. Oguz, M. D. Abramoff, L. Zhang, K. Lee, E. Z. Zhang, and M. Sonka, “4D Graph-Based Segmentation for Reproducible and Sensitive Choroid Quantification From Longitudinal OCT Scans,” Invest. Ophthalmol. Vis. Sci. 57(9), OCT621 (2016). [CrossRef]

15. A. Lang, A. Carass, O. Al-Louzi, P. Bhargava, S. D. Solomon, P. A. Calabresi, and J. L. Prince, “Combined registration and motion correction of longitudinal retinal OCT data,” Proc. SPIE 9784, 97840X (2016). [CrossRef]

16. A. Lang, A. Carass, A. K. Bittner, H. S. Ying, and J. L. Prince, “Improving graph-based OCT segmentation for severe pathology in Retinitis Pigmentosa patients,” Proc. SPIE 10137, 101371M (2017). [CrossRef]

17. J. Novosel, K. A. Vermeer, J. H. de Jong, Z. Wang, and L. J. van Vliet, “Joint Segmentation of Retinal Layers and Focal Lesions in 3-D OCT Data of Topologically Disrupted Retinas,” IEEE Trans. Med. Imag. 36(6), 1276–1286 (2017). [CrossRef]

18. Y. He, A. Carass, Y. Yun, C. Zhao, B. M. Jedynak, S. D. Solomon, S. Saidha, P. A. Calabresi, and J. L. Prince, “Towards Topological Correct Segmentation of Macular OCT from Cascaded FCNs,” in Fetal, Infant and Ophthalmic Medical Image Analysis, (Springer, 2017), pp. 202–209.

19. L. Fang, D. Cunefare, C. Wang, R. H. Guymer, S. Li, and S. Farsiu, “Automatic segmentation of nine retinal layer boundaries in OCT images of non-exudative AMD patients using deep learning and graph search,” Biomed. Opt. Express 8(5), 2732–2744 (2017). [CrossRef]

20. Y. Liu, A. Carass, S. D. Solomon, S. Saidha, P. A. Calabresi, and J. L. Prince, “Multi-layer fast level set segmentation for macular OCT,” in 15th International Symposium on Biomedical Imaging (ISBI 2018), (2018), pp. 1445–1448.

21. J. Kugelman, D. Alonso-Caneiro, S. A. Read, S. J. Vincent, and M. J. Collins, “Automatic segmentation of OCT retinal boundaries using recurrent neural networks and graph search,” Biomed. Opt. Express 9(11), 5759–5777 (2018). [CrossRef]

22. J. Wang, Z. Wang, F. Li, G. Qu, Y. Qiao, H. Lv, and X. Zhang, “Joint retina segmentation and classification for early glaucoma diagnosis,” Biomed. Opt. Express 10(5), 2639–2656 (2019). [CrossRef]

23. Y. Liu, A. Carass, Y. He, B. J. Antony, A. Filippatou, S. Saidha, S. D. Solomon, P. A. Calabresi, and J. L. Prince, “Layer boundary evolution method for macular OCT layer segmentation,” Biomed. Opt. Express 10(3), 1064–1080 (2019). [CrossRef]

24. M. Pekala, N. Joshi, T. A. Liu, N. M. Bressler, D. C. DeBuc, and P. Burlina, “Deep learning based retinal OCT segmentation,” Comput. Biol. Med. 114, 103445 (2019). [CrossRef]

25. Y. He, A. Carass, Y. Liu, B. M. Jedynak, S. D. Solomon, S. Saidha, P. A. Calabresi, and J. L. Prince, “Deep learning based topology guaranteed surface and MME segmentation of multiple sclerosis subjects from retinal OCT,” Biomed. Opt. Express 10(10), 5042–5058 (2019). [CrossRef]

26. Y. He, A. Carass, B. M. Jedynak, S. D. Solomon, S. Saidha, P. A. Calabresi, and J. L. Prince, “Topology guaranteed segmentation of the human retina from OCT using convolutional neural networks,” arXiv, arXiv:1803.05120 (2018). [CrossRef]

27. Y. He, A. Carass, Y. Liu, B. M. Jedynak, S. D. Solomon, S. Saidha, P. A. Calabresi, and J. L. Prince, “Fully convolutional boundary regression for retina OCT segmentation,” in 22nd International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2019), (Springer, 2019), pp. 120–128.

28. Y. He, A. Carass, Y. Liu, B. M. Jedynak, S. D. Solomon, S. Saidha, P. A. Calabresi, and J. L. Prince, “Structured layer surface segmentation for retina OCT using fully convolutional regression networks,” Med. Image Anal. 68, 101856 (2021). [CrossRef]

29. D. Li, J. Wu, Y. He, X. Yao, W. Yuan, D. Chen, H.-C. Park, S. Yu, J. L. Prince, and X. Li, “Parallel deep neural networks for endoscopic OCT image segmentation,” Biomed. Opt. Express 10(3), 1126–1135 (2019). [CrossRef]

30. S. Mukherjee, T. De Silva, P. Grisso, H. Wiley, D. K. Tiarnan, A. T. Thavikulwat, E. Chew, and C. Cukras, “Retinal layer segmentation in optical coherence tomography (oct) using a 3d deep-convolutional regression network for patients with age-related macular degeneration,” Biomed. Opt. Express 13(6), 3195–3210 (2022). [CrossRef]

31. M.-X. Li, S.-Q. Yu, W. Zhang, H. Zhou, X. Xu, T.-W. Qian, and Y.-J. Wan, “Segmentation of retinal fluid based on deep learning: application of three-dimensional fully convolutional neural networks in optical coherence tomography images,” Int. J. Ophthalmol. 12(6), 1012 (2019). [CrossRef]

32. J. Chen, L. Yang, Y. Zhang, M. Alber, and D. Z. Chen, “Combining fully convolutional and recurrent neural networks for 3D biomedical image segmentation,” in Proceedings of the 30th International Conference on Neural Information Processing Systems, (2016), pp. 3044–3052.

33. X. Shi, Z. Chen, H. Wang, D. Y. Yeung, W. K. Wong, and W. Woo, “Convolutional lstm network: A machine learning approach for precipitation nowcasting,” Adv. Neural Inf. Process. Syst. (2015).

34. A. A. Novikov, D. Major, M. Wimmer, D. Lenis, and K. Bühler, “Deep sequential segmentation of organs in volumetric medical scans,” IEEE Trans. Med. Imag. (2018).

35. K.-L. Tseng, Y.-L. Lin, W. Hsu, and C.-Y. Huang, “Joint sequence learning and cross-modality convolution for 3D biomedical segmentation,” in The IEEE Conference on Computer Vision and Pattern Recognition, (2017), pp. 6393–6400.

36. S. Roy, A. Carass, J. Pacheco, M. Bilgel, S. M. Resnick, J. L. Prince, and D. L. Pham, “Temporal filtering of longitudinal brain magnetic resonance images for consistent segmentation,” NeuroImage: Clin. 11, 264–275 (2016). [CrossRef]

37. Z. Xue, D. Shen, and C. Davatzikos, “Classic: consistent longitudinal alignment and segmentation for serial image computing,” NeuroImage 30(2), 388–399 (2006). [CrossRef]

38. Y. Gao, J. M. Phillips, Y. Zheng, R. Min, P. T. Fletcher, and G. Gerig, “Fully convolutional structured LSTM networks for joint 4D medical image segmentation,” in 15th International Symposium on Biomedical Imaging (ISBI 2018), (IEEE, 2018), pp. 1104–1108.

39. J. Loo, L. Fang, D. Cunefare, G. J. Jaffe, and S. Farsiu, “Deep longitudinal transfer learning-based automatic segmentation of photoreceptor ellipsoid zone defects on optical coherence tomography images of macular telangiectasia type 2,” Biomed. Opt. Express 9(6), 2681–2698 (2018). [CrossRef]

40. A. Rivail, U. Schmidt-Erfurth, W.-D. Vogl, S. M. Waldstein, S. Riedl, C. Grechenig, Z. Wu, and H. Bogunovic, “Modeling disease progression in retinal OCTs with longitudinal self-supervised learning,” in International Workshop on Predictive Intelligence in Medicine, (Springer, 2019), pp. 44–52.

41. B. Li, W. J. Niessen, S. Klein, M. de Groot, M. A. Ikram, M. W. Vernooij, and E. E. Bron, “A hybrid deep learning framework for integrated segmentation and registration: evaluation on longitudinal white matter tract changes,” in 22nd International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2019), (Springer, 2019), pp. 645–653.

42. M. Chen, A. Lang, E. Sotirchos, H. S. Ying, P. A. Calabresi, J. L. Prince, and A. Carass, “Deformable registration of macular OCT using A-mode scan similarity,” in 10th International Symposium on Biomedical Imaging (ISBI 2013), (2013), pp. 476–479.

43. M. Chen, A. Lang, H. S. Ying, P. A. Calabresi, J. L. Prince, and A. Carass, “Analysis of Macular OCT images using deformable registration,” Biomed. Opt. Express 5(7), 2196 (2014). [CrossRef]

44. R. Azad, M. Asadi-Aghbolaghi, M. Fathy, and S. Escalera, “Bi-directional convlstm u-net with densley connected convolutions,” in Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops , (2019), pp. 0–0.

45. F. Xu, H. Ma, J. Sun, R. Wu, X. Liu, and Y. Kong, “Lstm multi-modal UNet for brain tumor segmentation,” in 2019 IEEE 4th International Conference on Image, Vision and Computing (ICIVC), (IEEE, 2019), pp. 236–240.

46. T. Gaens, F. Maes, D. Vandermeulen, and P. Suetens, “Non-rigid multimodal image registration using mutual information,” in 1th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 1998), (Springer, 1998), pp. 1099–1106.

47. M. Styner, C. Brechbuhler, G. Szckely, and G. Gerig, “Parametric estimate of intensity inhomogeneities applied to mri,” IEEE Trans. Med. Imaging 19(3), 153–165 (2000). [CrossRef]

48. P. Bhargava, A. Lang, O. Al-Louzi, A. Carass, J. L. Prince, P. A. Calabresi, and S. Saidha, “Applying an open-source segmentation algorithm to different OCT devices in Multiple Sclerosis patients and healthy controls: Implications for clinical trials,” Mult. Scler. Int. 2015, 1–10 (2015). [CrossRef]

49. J. Tian, B. Varga, E. Tatrai, P. Fanni, G. M. Somfai, W. E. Smiddy, and D. Cabrera Debuc, “Performance evaluation of automated segmentation software on optical coherence tomography volume data,” J. Biophotonics 9(5), 478–489 (2016). [CrossRef]

50. J. Button, O. Al-Louzi, A. Lang, P. Bhargava, S. D. Newsome, T. Frohman, L. J. Balcer, E. M. Frohman, J. Prince, P. A. Calabresi, and S. Saidha, “Disease-modifying therapies modulate retinal atrophy in multiple sclerosis: a retrospective study,” Neurology 88(6), 525–532 (2017). [CrossRef]

51. N. Gonzalez Caldito, B. Antony, Y. He, A. Lang, J. Nguyen, A. Rothman, E. Ogbuokiri, A. Avornu, L. Balcer, E. Frohman, T. C. Frohman, P. Bhargava, J. Prince, P. A. Calabresi, and S. Saidha, “Analysis of agreement of retinal-layer thickness measures derived from the segmentation of horizontal and vertical Spectralis OCT macular scans,” Curr. Eye Res. 43(3), 415–423 (2018). [CrossRef]

52. A. G. Filippatou, E. S. Vasileiou, Y. He, K. C. Fitzgerald, G. Kalaitzidis, J. Lambe, M. A. Mealy, M. Levy, Y. Liu, J. L. Prince, E. M. Mowry, S. Saidha, P. A. Calabresi, and E. S. Sotirchos, “Evidence of subclinical quantitative retinal layer abnormalities in AQP4-IgG seropositive NMOSD,” Multiple Scler. J. 27(11), 1738–1748 (2021). [CrossRef]

53. A. G. Filippatou, E. S. Vasileiou, Y. He, K. C. Fitzgerald, G. Kalaitzidis, J. Lambe, M. A. Mealy, M. Levy, Y. Liu, J. L. Prince, E. M. Mowry, S. Saidha, P. A. Calabresi, and E. S. Sotirchos, “Optic Neuritis–Independent Retinal Atrophy in Neuromyelitis Optica Spectrum Disorder,” J. Neuro-Ophthalmology (2021).

54. R. Zusman, A. Carass, Y. He, J. Prince, H. S. Ying, G. Dagnelie, A. Vinnett, and A. K. Bittner, “Longitudinal reductions in thickness of foveal inner retinal layers in relation to progressive vision loss in retinitis pigmentosa patients,” Invest. Ophthalmol. Vis. Sci. 59, 50 (2018).

55. Y. He, A. Carass, S. D. Solomon, S. Saidha, P. A. Calabresi, and J. L. Prince, “Retinal layer parcellation of optical coherence tomography images: Data resource for Multiple Sclerosis and Healthy Controls,” Data in Brief 22, 601–604 (2019). [CrossRef]

56. H. Song, M. Kim, D. Park, Y. Shin, and J.-G. Lee, “Learning from noisy labels with deep neural networks: A survey,” IEEE Trans. Med. Imag. pp. 1–19 (2022).

57. D. Karimi, H. Dou, S. K. Warfield, and A. Gholipour, “Deep learning with noisy labels: Exploring techniques and remedies in medical image analysis,” Med. Image Anal. 65, 101759 (2020). [CrossRef]

	Aura	Deep-Single	Deep-Forward	Deep-Backward
RNFL-Macula	0.439(1.293)	0.383(1.339)	0.157(1.184)	0.264(1.064)*
GC+IPL-Macula	0.054(0.797)	0.136(0.929)	0.232(0.801)	0.043(0.73)*
INL-Macula	-0.283(0.623)	-0.281(0.381)	-0.306(0.333)	-0.169(0.272)
OPL-Macula	-0.106(0.489)	-0.082(0.179)	-0.027(0.198)	-0.038(0.152)
ONL-Macula	0.106(0.617)	-0.155(0.421)	-0.151(0.346)	-0.125(0.326)
IS-Macula	0.065(0.329)	-0.008(0.263)	-0.104(0.269)	-0.026(0.182)
OS-Macula	-0.052(0.362)	-0.401(0.567)	-0.063(0.497)	-0.211(0.348)*
RPE-Macula	-0.627(0.92)	-0.491(0.725)	-0.705(0.573)	-0.14(0.398)*
Overall	0.552(0.212)	0.516(0.193)	0.457(0.16)	0.349(0.148)*

	Aura	Deep-Single	Deep-Forward	Deep-Backward
RNFL-Macula	1.218(1.568)	1.285(1.388)	0.96(0.825)	0.859(0.561)
GC+IPL-Macula	0.575(0.286)	0.798(0.542)	0.588(0.371)	0.534(0.254)
INL-Macula	0.641(0.909)	0.513(0.558)	0.404(0.34)	0.35(0.204)
OPL-Macula	0.239(0.129)	0.183(0.101)	0.149(0.083)	0.141(0.079)
ONL-Macula	0.557(0.172)	0.587(0.224)	0.473(0.179)	0.448(0.18)
IS-Macula	0.153(0.067)	0.18(0.082)	0.145(0.057)	0.155(0.051)
OS-Macula	0.376(0.197)	0.357(0.111)	0.26(0.133)	0.235(0.102)
RPE-Macula	0.461(0.14)	0.487(0.167)	0.374(0.137)	0.361(0.122)
Overall	0.527(0.357)	0.549(0.337)	0.419(0.217)	0.385(0.148)

	ILM	RNFL-GCL	IPL-INL	INP-OPL	OPL-ONL
Aura	2.492(0.39)	4.273(0.807)	4.402(0.654)	5.887(1.069)	5.497(1.707)
Deep-Single	2.541(0.525)	4.445(1.295)	4.292(1.151)	6.176(1.393)	5.406(1.703)
Deep-Forward*	2.517(0.534)	3.973(0.788)	3.89(0.618)	6.126(1.166)	5.002(1.471)
	ELM	IS-OS	OS-RPE	BM	Overall
Aura	2.675(0.548)	3.06(0.769)	4.374(0.988)	5.751(1.717)	4.268(0.531)
Deep-Single	2.824(0.84)	3.174(0.801)	4.804(1.13)	5.363(1.484)	4.336(0.748)
Deep-Forward*	2.645(0.729)	2.464(0.291)	4.928(1.288)	5.233(1.444)	4.087(0.428)

	Aura	Deep-Single	Deep-Forward	Deep-Backward
RNFL-Macula	0.439(1.293)	0.383(1.339)	0.157(1.184)	0.264(1.064)*
GC+IPL-Macula	0.054(0.797)	0.136(0.929)	0.232(0.801)	0.043(0.73)*
INL-Macula	-0.283(0.623)	-0.281(0.381)	-0.306(0.333)	-0.169(0.272)
OPL-Macula	-0.106(0.489)	-0.082(0.179)	-0.027(0.198)	-0.038(0.152)
ONL-Macula	0.106(0.617)	-0.155(0.421)	-0.151(0.346)	-0.125(0.326)
IS-Macula	0.065(0.329)	-0.008(0.263)	-0.104(0.269)	-0.026(0.182)
OS-Macula	-0.052(0.362)	-0.401(0.567)	-0.063(0.497)	-0.211(0.348)*
RPE-Macula	-0.627(0.92)	-0.491(0.725)	-0.705(0.573)	-0.14(0.398)*
Overall	0.552(0.212)	0.516(0.193)	0.457(0.16)	0.349(0.148)*

	Aura	Deep-Single	Deep-Forward	Deep-Backward
RNFL-Macula	1.218(1.568)	1.285(1.388)	0.96(0.825)	0.859(0.561)
GC+IPL-Macula	0.575(0.286)	0.798(0.542)	0.588(0.371)	0.534(0.254)
INL-Macula	0.641(0.909)	0.513(0.558)	0.404(0.34)	0.35(0.204)
OPL-Macula	0.239(0.129)	0.183(0.101)	0.149(0.083)	0.141(0.079)
ONL-Macula	0.557(0.172)	0.587(0.224)	0.473(0.179)	0.448(0.18)
IS-Macula	0.153(0.067)	0.18(0.082)	0.145(0.057)	0.155(0.051)
OS-Macula	0.376(0.197)	0.357(0.111)	0.26(0.133)	0.235(0.102)
RPE-Macula	0.461(0.14)	0.487(0.167)	0.374(0.137)	0.361(0.122)
Overall	0.527(0.357)	0.549(0.337)	0.419(0.217)	0.385(0.148)

Longitudinal deep network for consistent OCT layer segmentation

Abstract

1. Introduction

2. Methods

2.1 Pseudo-3D feature extraction with convolutional LSTM

2.2 Longitudinal prior fusion with surface registration

2.2.1 Hidden Markov model

2.3 Longitudinal network $\Phi$

2.4 Iterative surface registration and segmentation

2.5 Forward and backward segmentation chain

3. Experiments

3.1 Consistency on repeated scans

3.2 Longitudinal segmentation stability

3.3 Segmentation accuracy

3.4 Qualitative results

4. Discussion

Funding

Acknowledgment

Disclosures

Data availability

Supplemental document

References

Supplementary Material (1)

Data availability

Cited By

Figures (16)

Tables (4)

Equations (8)

Biomedical Optics Express