Ambient noise-based weakly supervised manhole localization methods over deployed fiber networks

Alexander Bukharin; Alexander Bukharin; Shaobo Han; Yuheng Chen; Ming-Fang Huang; Yue-Kai Huang; Yao Xie; Ting Wang

doi:10.1364/OE.484083

1. Introduction

Optical fiber sensing technology has become increasingly ubiquitous in many applications using dedicated or existing fibers, such as pipeline surveillance [1–3], railway crack [4] or intrusion detection [5], seismic monitoring [6,7], tunnel steel loop structure monitoring [8], traffic and road condition monitoring [9,10], and cable safety protection [11,12]. As the backbone of the 5G and beyond networks, a large portion of cables is buried under roads with slack fibers preserved in manholes. To fix faults in the fiber, a mapping between the fiber length and geographic location is needed. Optical time-domain reflectometry (OTDR) is a common method used by field technicians to locate fiber faults. However, OTDR often fails since complicated fiber cable deployment paths and fiber slacks reserved along the route can make the OTDR distance much longer than the geographic distance from the central office (CO) to the fault location with a typical mismatch of $15 \sim 20 \%$. Distributed fiber optic sensing (DFOS) has recently been proposed as a non-destructive solution to conduct underground cable mapping [13]. With DFOS, manholes can be used as landmarks for underground cable mapping, preventing error propagation due to complicated cable deployment paths. Once the landmark positions (with known GPS coordinates) are pinpointed on the cable, the GPS coordinates of cable segments in-between these landmarks can be derived through linear interpolation. In order for this technique to work, we must be able to localize the manholes on the route. Beyond localizing manholes to use as landmarks, localizing manholes is important for carriers and operators to enhance the efficiency of network operations as well.

Currently, manhole localization through fiber sensing technology requires external vibrations such as hammer strikes on manhole lids [14]. This procedure is labor-intensive, time-consuming, and impractical for large-scale deployment. In this work, we propose a new approach for manhole localization based on ambient traffic signals over deployed cables sensed by the DFOS. The proposed method is based on deep learning with selective sampling and a learning-based attention module for supervision enhancement. This attention mechanism [15] provides interpretable information on the selection of meaningful data across time periods with different traffic densities. Therefore, hand-labeling across time is not needed for machine learning training. To the best of our knowledge, this is the first time that manholes are localized automatically using only ambient signals from DFOS. Experiments on multiple operational deployed fiber networks verify the effectiveness of the proposed method.

1.1 Related work

There is an increasing interest in applying machine learning methods to automatically process fiber sensing data and therefore reduce manual efforts [16]. In [14], a long short-term memory (LSTM) based neural network is proposed for a similar manhole localization problem based on distributed vibration sensing (DVS) technique. In [17], a bilinear neural network model is presented for fine-grained pole localization based on distributed acoustic sensing (DAS) waveforms. Both approaches use active excitations such as hammer strikes on manhole lids or utility poles. In contrast, our approach extracts manhole location information based only on ambient excitations. Beyond learning from data collected through designed experiments and manual labeling, it is pivotal to build machine learning models that can also learn meaningful information from ambient data. Ambient data is free once the cable is installed and potentially unlimited, yet careful procedures must be conducted to learn from such noisy data.

To train a machine learning model for fiber sensing applications, usually, a large amount of training data needs to be acquired and labeled. Different from natural images with many choices of data augmentation procedures, sensing trace data allows very few legitimate operations to enlarge the size of the training dataset. To address this limitation, Shiloh et al. [18] proposed to use a conditional Generative Adversarial Network (C-GAN) to refine simulated data to have features of real field experimental data, which showed improved performance in the classification of vehicles and footsteps. Our approach further relaxes the labeling constraint by alleviating the burden of obtaining accurate annotation of events along the time axis.

The rest of this paper is organized as follows. In Section 2, we go over the DFOS system design and experimental setup, and signal characteristics. Section 3 describes the proposed method. Section 4 shows real data performance analysis of the proposed method using data collected from field trials, and Section 5 briefly concludes our study.

2. System and experimental setup

The DFOS system used in this study was designed based on a coherent-detection Rayleigh phase OTDR [19]. Using coherent detection, the DFOS system recovers the full polarization and phase information of the Rayleigh backscatter from the field fiber (Fiber under test, FUT). A typical coherent-receiver-based DFOS system is shown in Fig. 1, where the sensing laser is also used as the local oscillator (LO) to down-convert the backscatter signal to the electric baseband. Inside the receiver (Data Acquisition & DSP), the Rayleigh backscatter field $\mathbf {h}(t)$ can then be reconstructed from the outputs of the optical hybrid and balanced photodiodes, which corresponds to the in-phase (I) and quadrature (Q) components of the two polarizations:

(1)$$\mathbf{y}(t) = e^{j\phi_{T_{x}}(t)}[x(t)\otimes\mathbf{h}(t)] + \mathbf{n}(t),$$

where $\mathbf {y}(t)$ is the coherent receiver output and $x(t)$ is the modulated interrogation signal, which in this case is an optical pulse with configurable width for adjusting the spatial resolution and reach of the DAS system, and $\mathbf {n}(t)$ is the noise term which consisted of mostly optical amplifier noise and quantization noise of the ADC. The optical pulse repetition (frame) rate of $R_{p}=1/T_{p}$ is also adjusted to be longer than the round-trip propagation delay of the FUT, and $\phi _{T_{x}}(t)$ is the phase noise of the sensing laser. The four baseband electrical signals are sampled by analog-to-digital converters at the sampling rate of $R_{s}=1/T_{s}$ before digital signal processing (DSP) is implemented in the field-programmable gate array (FPGA).

Fig. 1. Architecture for coherent-detection Rayleigh OTDR.

Download Full Size | PDF

After initial digital filtering, the two digitized complex signals (at two polarizations) are parallelized into frames to obtain $\mathbf {y}[n;m]\triangleq \mathbf {y}(nT_{s}+mT_{p})$, which is a $2\times 1$ complex-valued vector of the backscatter electric field at each distance index $n$ and time index $m$. With the parallelized data arrangement, differential phase beating of Rayleigh scatters at different locations can be computed between pairs of polarizations $i$ and $j$ in DSP:

(2)$$\xi_{ij}[n, m]=y_{i}[n, m]y_{j}^{*}[n-\Delta_{g}, m],\quad i,j\in\{1, 2\},$$

where $\Delta _{g}$ is the delay value that emulates a differential gauge length of $z_{g}=\Delta _g (c/(2n_{\textrm {eff}}))T_{s}$, which is adjustable in the DSP to provide tuning of spatial resolution and sensitivity, with $c/(2n_{\textrm {eff}})$ being the effective speed of light in fiber. To mitigate Rayleigh fading due to polarization, the DFOS system applies a rotated vector sums method [20] to combine four separate beating pairs, where the DC phase of the vectors is rotated and aligned after low pass filtering. Compared to obtaining one beating pair with only one polarization, which could produce large errors if either of the terms $y_{i}[n,m]$ or $y_{i}[n-\Delta _g,m]$ goes into a fade, using both polarizations produces four separate beating pairs which can greatly reduce the chance of polarization fading after they are combined. The unwrapped phase of the summed differential product $\theta [n,m]=\angle \xi [n,m]$ is an estimate of the strain $\epsilon [n,m]$ at time $mT_{p}$ and fiber location $d_{n}=(n-\Delta _{g}?2)\cdot (c?(2n_{\textrm {eff}}))\cdot T_{s}$. As $\delta \theta \propto \delta \epsilon$, the output of the DFOS allows reconstruction of the vibration via tracking of time-varying changes in $\delta \theta$ [21,22].

Figure 1 displays the architecture of the DFOS system. The system employs a native ADC sampling rate of $250$ Msps, corresponding to a spatial resolution limit of $0.4$ m. Due to FPGA resource limitation, the output DFOS phase results were furthered down-sampled $4$ times to the spatial resolution of $1.6$ m. The optical pulse was created using an acousto-optic modulator (AOM) with $40$-ns width. The AOM will create a frequency shift $f_{\textrm {AOM}}$ in $x(t)$ so the modulated interrogation signal becomes $x(t)=e^{j2\pi f_{\textrm {AOM}}t}\cdot x_{\textrm {base}}(t)$, where $x_{\textrm {base}}(t)$ is the original optical pulse without frequency shift. When performing differential phase beating using equation (2), the AOM frequency shift will introduce a constant coefficient of $e^{j2\pi f_{\textrm {AOM}}\cdot (\Delta _{g}T_{s})}$ in $\zeta _{ij}[n,m]$, which is a DC phase term (same for all locations and time frames). As DAS measures dynamic strain changes instead of static strain, the phase introduced by the extra coefficient will be effectively removed by digital bandpass filtering in the DSP. The frame rate of DFOS was set at $2000$ Hz to monitor multiple routes with the whole length from 15 km to $\sim 25$ km. To obtain the waterfall plot for image analysis by the machine learning algorithm, the vibration intensity was calculated by first band-passing each location channel with cutoff frequencies at $30$ Hz and $200$ Hz. The frequency range was selected during initial calibration testing such that high vibration SNR for vehicular traffics can be achieved. The filtered signal was then squared and accumulated over samples to obtain an update rate of $\sim 10$ Hz for each location.

The experimental setup is shown in Fig. 2. The sensing system was connected to a deployed single mode fiber (SMF) for ambient data collection. The sensing system measures the vibration strength along the cable every $120$ milliseconds and creates a vibration array. We refer to this array as a "waterfall". The horizontal axis represents the location along the fiber and the vertical axis time. Within the time-location plane, we use the color map to represent the intensity of the local vibration. This vibration array can thus be viewed as an image, where the x-axis and y-axis correspond to the sensing distance and detection period.

Fig. 2. Experimental setup: manhole localization through ambient traffic.

Download Full Size | PDF

An exemplary waterfall plot is displayed in Fig. 3. In this $40$-second waterfall plot, the pixel color represents the intensity of signals caused by vibrations near the cable. Warmer colors on the waterfall represent higher intensity vibrations. In this way, we can view the historical vibrational map along the optic fiber route. Any static vibration next to the route will cause a vertical stripe on the waterfall plot (e.g., the middle part in Fig. 3). On the contrary, an object moving along the route, for example, a vehicle, causes diagonal traces (Fig. 4(a), a non-manhole location). Any slack cable coil stored in the manhole/handhole on the route will be excited simultaneously whenever a vehicle passes by. It, therefore, results in a horizontal stripe with a width equal to the length of the coil, creating two disconnected diagonal trajectories (Fig. 4(b), a manhole location). If a vehicle runs across a manhole lid, it creates a much larger impact on the local area compared to normal driving. It will excite the fiber section much further away from the point of the interaction, also creating a horizontal strip superpose on the diagonal trace. The extension of the stripe, however, is approximately symmetric with the center on the diagonal trajectory. And the stripe is in fact tilting slightly upward due to the finite propagation speed of the impact surface wave on the ground (see Fig. 4(c) for examples from multiple vehicles). These phenomena allow humans and machines to identify manholes and slack fibers from ambient sensing data.

Fig. 3. Exemplary waterfall plot: diagonal lines capture vehicles traveling along the road.

Download Full Size | PDF

Fig. 4. Exemplary waterfall patterns at (a) non-manhole, (b) manhole locations via traffic trajectory, (c) traffic excited stripes at manhole locations. (d) traffic excited stripes at non-manhole locations. Figures (e)$\sim$(h) further illustrate some hard cases in classifying manhole and non-manhole.

Download Full Size | PDF

Discriminative features useful for manhole localization are mainly created by vehicles interacting with roads and manholes. Due to different factors including vehicle types, driving speeds, fiber coil lengths, and sensing distances, the resulting vibration signatures are non-identical. Therefore it is hard to handcraft rules to detect these features. Instead, we propose a supervised learning approach, in which once a model is trained with labels at a sufficient amount of manholes, the rest of the manholes (along the same or different routes) can be automatically detected.

Picking discriminative frames that can be used to train the model, from hours of sensing data and thousands of candidate locations is both a challenging and tedious task for humans. For one reason, the discriminative pattern does not always appear (see Fig. 4(g) and (h)). Moreover, vehicles passing road cracks or potholes may generate similar-looking patterns (see Fig. 4(d)). To confidently say which locations are manholes and which are not would highly depend on how experienced the technician is. In the next, we present a weakly-supervised machine learning-based approach, which is able to automatically select discriminative patterns from ambient traffic data and localize manholes.

3. Methodology

In order to identify manhole locations, the sensing data is viewed as an image, and spatial-temporal features relevant to manholes are extracted with a convolutional neural network (ConvNet) model [23]. The biggest challenge with training a ConvNet for manhole localization is the weak informativeness of ambient data. Most portions of the ambient data contain little information due to the low density of traffic. Manually annotating the time periods during which discriminative vibration events occur is very tedious and time-consuming. To deal with this problem, we propose two data selection strategies: a heuristic top-K strategy and a learned data selection strategy. We find that the top-K operator is more effective when the number of inference samples is large. On the other hand, the learned selection strategy is less accurate when the number of inference samples is large but can yield better performance when little inference data is available. In practice, one can choose between the two data selection strategies based on how much inference data is available. We emphasize that the performance depends not only on the model architecture but also crucially on how the datasets are composed.

3.1 Data preparation

We divide the waterfall plot into small image patches of size $H \times W$. At each location on the cable, a classification label of “manhole" or “non-manhole" is assigned. Optionally, one can also include a third class indicating aerial cable locations. A classification model is trained to predict which segments of the cable lie in a manhole.

Label assignment In conventional fully-supervised image classification, the objective is to directly classify each image. In our problem, the objective is to classify at the location level, which contains multiple images collected at the same location. It is desirable to have labels for every image patch, but only coarse-grained labels at the location level are available.

The generalization performance of the model depends crucially on the label assignment strategy, that is, how the location label is assigned to patches collected at the same location. We consider training the classifier either at the instance-level or at the group-level [24], the corresponding label assignment strategies are illustrated in Fig. 5. In the instance-level approach, the location label is assigned directly to each image patch from that location. The model predicts labels for each image patch. In the group-level approach, the location label is assigned at the group level in an indirect way to a group of image patches without telling which patches are discriminative. The model learns to assign importance to each image patch.

Fig. 5. Different labeling assignment strategies of our model. The boxes indicate which set of patches the location label is assigned to, with green indicating manhole and red indicating non-manhole.

Download Full Size | PDF

Our approach belongs to the weakly-supervised learning paradigm [25], which does not require expensive temporal annotation. To be more specific, the weak supervision is inexact. Group-level labeling allows the model provides greater flexibility and modeling power, which is particularly useful in low-data scenarios in the test time. However, the greater flexibility also makes the group-level data harder to train on, as the model may overfit the data or ignore some patches during training.

Data selection Besides annotation, the performance of supervised learning is heavily influenced by the way the training dataset is composed. A key problem with using all collected image patches is that many images may contain no relevant information about manhole presence. If the road is not busy, the cable will only pick up weak background vibrations and informative patterns will be very sparse on the waterfall. With these non-informative examples, it is very difficult to differentiate images from manhole or non-manhole locations. Moreover, brutely assigning a label of “manhole" or “non-manhole" to these non-informative images may confuse the model and hinder performance. To enhance the quality of our dataset, we propose two strategies to select the most informative examples at each location for our model to be trained and evaluated on.

One simple strategy is a heuristic data selection strategy based on the total intensity of patches, referred to as the top-K sampling scheme. We observe that the most informative images contain higher levels of vibration, usually corresponding to a car passing over the cable segment. To extract these images, we select the top-K images with the highest total vibration level at each location. In Section 4.2, this scheme is compared against a baseline random sampling scheme. This strategy works quite well when the number of images to select from is large, but deteriorates when the number of samples is small. This is due to the fact that when the number of available samples at a location is small, the top-K samples may contain little useful information.

3.2 Training: network architecture

ConvNets have achieved great successes in a variety of computer vision tasks ranging from image classification to object detection [26,27]. In [28] and [10], ConvNets are used for threat classification and event detection based on phase-OTDR data. In [29], the denoising CNN (DnCNN) model is used to enhance the signal-noise ratio of Raman OTDR traces viewed as 2D images. In this study, we use a relatively shallow ConvNet with four convolutional layers followed by two fully connected layers for classification. In particular, the convolutional layer works as follows. For every input-output channel pair,

(3)$$z_{:, :, s} = \sum_{r=1}^{C_{in}}g_{:, :, r} * f_{r, s, :, :},$$

where $g \in \mathbb {R}^{H\times W\times C_{\textrm {in}}}$ denotes $C_{\textrm {in}}$ channels of input of size $H\times W$, $z \in \mathbb {R}^{H'\times W'\times C_{\textrm {out}}}$ denotes $C_{\textrm {out}}$ channels of output of size $H\times W$, and $f \in \mathbb {R}^{ C_{\textrm {in}}\times C_{\textrm {out}}\times k \times k}$ denotes the convolutional filter. Each output channel feature map is obtained by sliding the $k\times k$ kernel $f_{r,s,:,:}$ over the input channel feature map $g _{:,:, r}$ by the 2D convolution operator $*$. For the first layer, the input is the waterfall patches $x\in \mathbb {R}^{H\times W}$ with $C_{\textrm {in}}=1$. In addition, the $\ell$-th fully connected layers work as follows,

(4)$$z_{\ell+1} = \Phi_{\ell}^{T}z_{\ell} + b_{\ell},$$

where $\{\Phi _{\ell }, b_{\ell }\}$ are the parameters. We use ReLU as the nonlinear activation function for the model layers except for the last one uses softmax. A detailed description of the model design and the size of input and outputs can be seen in Table 1.

Table 1. Convolutional neural network architecture details.

View Table | View all tables in this article

To train the model, we minimize the cross-entropy loss of our training set with the Adam optimization algorithm [30]. Several regularization and normalization techniques are adopted, including dropout, batch normalization, and weight decay [31]. We find these regularization techniques to be necessary as some routes contain high levels of label noise. In some cases, the labeled “non-manhole" location contains some manholes that are not identified.

Deep multiple instance learning As an alternative to the top-K strategy, we propose to learn which images are important with the attention-based multiple instance learning (MIL) framework [15]. Using an attention-based MIL framework gives several benefits. First, it allows the model to make predictions on a group level by deciding which images are most important for each location. This allows more complex patterns to be learned at each location. Second, the learned attention scores provide a level of model interpretability, allowing cable operators to see which images were important for the model’s decision.

The MIL model works as follows. For every image $x_k \in \mathbb {R}^D$ at a location, we extract a vector embedding $h_k \in \mathbb {R}^M$ with the ConvNet, and then calculate an attention score for each image as

(5)$$a_k = \frac{\textrm{exp}\{w^T \textrm{tanh}(V h_k^T)\}}{\sum_{k=1}^K \textrm{exp}\{w^T \textrm{tanh}(V h_k^T)\}},$$

where $w \in \mathbb {R}^{L \times 1}$ and $V \in \mathbb {R}^{L \times M}$ are learned parameters. We can then combine the vector embeddings according to their attention scores to get a final vector representation for each location. This attention mechanism can be thought of as a learned data selection mechanism embedded into the model.

We can visualize the MIL attention scores assigned to different images at a location. An example can be seen in Fig. 6. The MIL mechanism clearly picks out the most informative images by assigning them a high attention score, demonstrating the usefulness of the learned attention mechanism.

Fig. 6. Attention scores from the MIL model. The images with the highest attention scores are outlined in red.

Download Full Size | PDF

The attention score can be used to quantitatively assess the weak informativeness contained in each patch. Looking across longer periods of time, one may identify temporal trends such as correlations to rush hours and select the best time period to collect data.

3.3 Inference: from classification to localization

Once a classification model is trained on the group or patch level, we can output manhole predictions at the location level via a post-processing procedure, which further boosts the performance, especially when generalizing to a new roadway with distribution shifts. The post-processing procedure includes three steps:

1. Averaging: at every location along the cable, we sample multiple patches (baseline ConvNet) or construct multiple groups of image patches (MIL). Each set of data is fed into the classification model, and the probability of that location can be obtained by averaging the binary classification decisions across different patches or groups.
2. Adaptive threshold: each location is initially assigned a binary label based on an adaptive threshold that can be tuned for each roadway. The threshold can be dynamically adjusted such that the total number of detected manholes reaches a reasonable density (e.g. one every $500$ meters) or matches the number expected by the network operators. In many cases, such information is available. If not, the threshold is set to the median prediction score.
3. Manhole qualification: since we use a sliding window strategy along the cable, a series of neighboring locations around any manhole will be classified as manholes and assigned label “1". We further set the rule that: segments of the cable are predicted as manhole locations only if they have over $C$ consecutive predictions of “1" as manhole locations. In the experiments, $C$ is set to $15$ based on prior knowledge about the minimum length of slack fiber coils inside each manhole.

Experimental results in Section 4 show that the proposed post-processing procedure can effectively combine information across multiple time periods and provide good localization performance.

In summary, the complete processing pipeline for the ConvNet model with top-K sampling can be seen in Fig. 7, showing how informative images are selected with top-K sampling before being fed into the ConvNet model. The pipeline for the MIL-based model can be seen in Fig. 8, in which bag embeddings are extracted by the ConvNet encoder model before being classified by the MIL model.

Fig. 7. Diagram for the top-K sampling-based approach.

Download Full Size | PDF

Fig. 8. Diagram for the multiple instance learning (MIL)-based approach.

Download Full Size | PDF

4. Real data performance analysis

To demonstrate the effectiveness of our proposed framework, we conduct extensive experiments on three existing deployed optic fiber networks. These experiments show that our model can provide accurate predictions of manhole locations. In addition, we conduct several ablation studies that highlight the importance of each component of our model and analyze the performance of our approach in different settings.

4.1 Main results

We evaluate the performance of our framework on three routes, with lengths ranging from $15$ km to $25$ km. A detailed description of the datasets size and other statistics can be found in Table 2. Figure 9 shows some examples of manholes on the road (or close to the sidewalk).

Fig. 9. Exemplary manholes on the road or close to the roadside.

Download Full Size | PDF

Table 2. Dataset information about the $3$ cable routes.

View Table | View all tables in this article

We evaluate the performance of our framework on two evaluation tasks: (i) the intermediate manhole classification task and (ii) the final manhole localization task. The key difference between these two tasks is that in manhole classification we predict whether an image contains a manhole pattern, while in manhole localization we specify the location of a manhole on the cable route. Although manhole localization is our final goal, evaluating classification performance allows us to verify that our model can distinguish between the manhole and non-manhole locations, and manhole localization is built upon manhole classification.

We use the following evaluation metric to quantify the model performance on manhole classification and localization:

• ACC: accuracy, the ratio of samples get correctly classified (both manhole and non-manhole locations)
• AUC: the area under the Receiver Operator Characteristic (ROC) curve
• Precision (P): the fraction of the predicted manhole samples/locations that are truly from manhole locations
• Recall (R): the fraction of the true manhole samples/locations that are successfully identified
• F1 score: the harmonic mean between precision and recall, ${2PR}/{(P+R)}$

Classification To evaluate the classification performance of our model, we create a dataset on each route with the top-K data selection strategy as described in Section 3. This dataset contains a balanced amount of data from the manhole and non-manhole locations. Note that this dataset is not created using a sliding window approach, which would result in an imbalanced dataset. Instead, manhole and non-manhole locations are picked with equal probability. This dataset is randomly split into three smaller datasets: a train dataset containing $60\%$ of the data, a validation dataset containing $20\%$ of the data, and a test dataset containing $20\%$ of the data. These datasets contain no overlapping locations, so there is no information leakage. Several models with different hyper-parameters are trained and the one with the best performance on the validation dataset is picked. The hyperparameters tuned include the number of epochs the model is trained for, the batch size during training, the learning rate, and the weight decay parameter. The test performance of the ConvNet classifier on each road is evaluated under (i) instance-level labeling (Table 3), and (ii) group-level labeling assignment (Table 4). For these experiments we evaluate the performance of our framework over five different train/validation/test splits. The number in brackets refers to the standard deviation of the result across different splits, while the number not in brackets refers to the average result across different splits. Note that when group-level (location) label assignment is used, an additional MIL layer (with bag size equal to $10$) is applied on top of the ConvNet.

Table 3. Classification results with instance-level label assignment

View Table | View all tables in this article

Table 4. Classification results with group-level label assignment

View Table | View all tables in this article

From Tables 3 and 4 we can see that both labeling strategies are able to achieve a high level of accuracy on all three roadways, indicating that the base convolutional architecture can successfully extract features relevant to manhole detection. In addition, while the group-level strategy can achieve a high F1 score on every roadway, its training accuracy and AUC score are slightly lower than that of instance-level assignments. This implies that although the group-level strategy can separate manhole and non-manhole locations with high probability, the classification probabilities provided by the model are not well calibrated. This is possibly due to the fact that the MIL layer can more easily overfit the data. For the localization results, we use a model trained and evaluated with the instance-level label assignment, which has more stable performance. We provide a more detailed analysis of the trade-off between the two strategies in Section 4.2.

To evaluate the generalization performance of our method on classification tasks, we evaluate the models performance on roadways it was not trained on. To expose the model with more data variability during training, we train and validate a model on two roadways, and test it on the third one. In particular, we train on a randomly selected 80% of the locations from the two training roadways and then validate on the remaining 20% of locations from the training roadways. The test result on the new unseen roadway can be seen in Table 5 for instance-level labeling and Table 6 for group-level labeling.

Table 5. Generalization results with instance-level label assignment.

View Table | View all tables in this article

Table 6. Generalization results with group-level label assignment.

View Table | View all tables in this article

As we can see from Tables 5 and 6, the generalization performance drops on the new, unseen test route. The performance drop indicates potential domain shifts between routes, which could be due to different pavement materials used on roads and cable-buried depths. Despite this, both instance-level and group-level classifiers can provide reasonably good accuracy as an intermediate step, which can be used for localization in the next.

Localization We consider two settings to evaluate our manhole localization framework: same roadway generalization and different roadway generalization. For the same roadway generalization, we train and validate on the first $80\%$ of the data and test on another $20\%$. For different roadway generalizations, we train and validate on two roadways and do inference on the third one. Similar to the classification setting, we train on a randomly selected $80\%$ of the locations from the two training roadways and then validate on the remaining $20\%$ of locations from the training roadways. To generate manhole predictions, we follow the procedure described in section 3.3 and use a ConvNet model with top-K sampling and instance-level label assignment. Across these experiments, the sliding window paradigm and the proposed post-processing methods (including the adaptive score threshold and the minimum manhole length cutoff) are used. In practice, we find them very effective in alleviating domain shifts across different routes and further improving localization performance. In evaluation, we consider a manhole prediction to be correct if it overlaps with a real manhole and incorrect otherwise. We can then measure the precision, recall, and F1 scores of our model, which can be seen in Tables 7 and 8.

Table 7. Same roadway generalization

View Table | View all tables in this article

Table 8. Different roadway generalization (localization)

View Table | View all tables in this article

From Table 7, we can see that our framework is able to achieve a high F1 score on all routes. Our framework is able to achieve an F1-score of over $0.88$ on every route, indicating our framework can successfully localize manholes with only ambient data. On some roadways, we notice that the precision is slightly lower than recall. This indicates that while our framework can identify almost all manholes on the road, it may predict some non-manhole locations as manholes. We hypothesize that this happens because many locations on the road may emit manhole-like patterns, such as locations containing potholes or bridge junctions.

From Table 8 we can see that our framework is even able to generalize across roadways, despite an expected drop in performance due to a large covariate shift. This generalization ability is very useful because it means that when a cable is installed under a new road, the cable operator can simply apply our framework without any extra work. Despite the limitations of training data in terms of the number of routes and manholes, it can detect over $70\%$ of the manholes on the new route. We note that there are a few manholes close to the sidewalk that are harder to detect. An interesting line of future work would be to utilize tools from the domain adaptation literature to improve generalization performance and close the gap between different roadway generalizations and the same roadway generalization [32–34].

Beyond the F1 score, we can also visually inspect our framework’s manhole predictions. In Fig. 10 we plot the manhole predictions for both the same road predictions and different road predictions. These figures further confirm that our framework is able to generate accurate manhole predictions on both the same road it was trained on and new unseen roadways. Note that segments that only contain a partial manhole are not included in the training data set, it is interesting to see how the model handles it in the test time.

Fig. 10. Spatial visualization of our model’s predictions. Orange dots represent the predicted probability of a manhole. Blue shading represents the ground-truth manhole locations. Red lines represent the edges of the predicted manholes.

Download Full Size | PDF

4.2 Ablation experiments

In this subsection, we investigate additional factors of our framework and make several recommendations from the practitioner’s perspective, including the influence of different dataset preparation strategies (sampling strategy and label assignment) and different amounts of data in both training and inference. The baseline method is ConvNet trained with instance-level label assignment, as used in many strongly supervised learning applications [10,26–28].

Data preparation strategy First we study the effect of data preparation strategies on performance. We consider two data selection heuristics: random data selection at each location and top-K data selection at each location. We also examine the two label assignment schemes: the basic instance-level scheme and the group-level scheme. Recall that when the group-level scheme is used, a MIL layer is appended onto the ConvNet model to select the most important images. We use location-level classification as our task in order to make the difference between methods clear. We present results from these models averaged across all three roadways in Table 9.

Table 9. Ablation experiments averaged across all $3$ roadways.

View Table | View all tables in this article

Note that for each roadway we run each experiment over $5$ random seeds, and then present the averaged result (number not in brackets) over all $15$ experiments for each method. The number in brackets refers to the standard deviation of the result across different runs.

From Table 9 we can see that with top-K sampling, both the instance-level and group-level strategies can accurately separate the manhole and non-manhole locations. This indicates that top-K sampling is an effective way to select useful images. With random sampling, performance drastically drops for the instance-level strategy and only mildly for the group-level labeling strategy. These results indicate that a group-level labeling strategy with a MIL layer can effectively select useful images and significantly outperform the instance-level strategy on randomly selected data. The gains in the F1 score from data selection, and both data selection and MIL are shown to be statistically significant based on a two-sample t-test.

Amounts of training and validation data Secondly, we evaluate the performance of our framework with varying amounts of training and validation data. To vary the amount of training data, we randomly select images from each location and select the top half of the images by intensity. From Fig. 9(a), we can see that the instance-level labeling strategy achieves steady performance across all amounts of training data, while the performance of the group-level labeling strategy model drops $\sim 9\%$ with the lowest amounts of training data ($8$ hours). This performance drop implies that the MIL mechanism is more difficult to learn than the instance-level model and may require more training data.

Amounts of inference data We also investigate how each variant performs with varying amounts of inference data. For each amount of data, we select $500$ points either using the top-K strategy or randomly. The results from this experiment on roadway three can be seen in Fig. 11(b). As the number of inference points decreases, the performance of the models trained with top-K sampling decreases drastically. This is because the top-K strategy will become less effective when it has fewer points to select from. On the other hand, the models trained with random data selection have a steady level of performance across varying amounts of data. In fact, the model trained with randomly selected data and group-level label assignment can achieve superior performance than models trained with top-K data selection with $5$ times fewer inference images. Therefore the group-level labeling strategy with random data selection would be more useful to a cable operator who needs predictions quickly (within one day), while the instance-level labeling strategy with top-K data selection will be more useful to a cable operator who can wait for inference results (at least $5$ days). Altogether, Figs. 11(a) and 11(b) show how our framework is able to take advantage of having abundant ambient data for both training and inference.

Fig. 11. Model performance with different amounts of data over 3 random initializations. The shade region denotes mean $\pm$ standard deviation.

Download Full Size | PDF

4.3 Visual explanation of model performance

To verify that our machine learning model can recognize important waterfall patterns at an image level, we visualize which regions of the waterfall images are important for prediction with Grad-CAM [35]. In short, Grad-CAM uses the gradient of a target class prediction (i.e. manhole or non-manhole) with respect to the final layer convolutional layer to produce a heat map. In this heat map, regions with a higher gradient can be interpreted as having higher importance for the model’s prediction. These heat maps are displayed next to the original image in Fig. 12, where we can see that the model assigns the greatest importance to vehicle traces and other vibration sources. This indicates that the model is learning physically coherent ways to distinguish between the manhole and non-manhole locations.

Fig. 12. Waterfall plots and the corresponding gradient plots at the manhole and non-manhole locations. Lighter regions on the gradient plots represent higher gradient scores.

Download Full Size | PDF

5. Conclusion and future work

In this paper, we propose a DFOS system and machine learning method to automatically localize manholes, which forms a key step in the cable mapping process. This framework utilizes weakly supervised learning methods to predict manhole locations based on ambient data captured from the route. To deal with the non-informative ambient data, we investigate data selection and label assignment strategies and verify their effectiveness extensively in a variety of settings, including data efficiency and generalizability to different routes. We also find that the post-processing step is very helpful in bridging the gap between classification and localization and combining results from multiple predictions.

Besides the weak informativeness of ambient data, the other two practical challenges are (i) the label noise from human annotation and (ii) covariate shift of DFOS data between routes. These challenges remain to be addressed in future work. The proposed framework can be used to predict potholes and road surface defects as well, which can be investigated as future DFOS applications with ambient data.

Disclosures

The authors declare no conflict of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. F. Tanimola and D. Hill, “Distributed fibre optic sensors for pipeline protection,” J. Nat. Gas Sci. Eng. 1(4-5), 134–143 (2009). [CrossRef]

2. J. Tejedor, H. F. Martins, D. Piote, J. Macias-Guarasa, J. Pastor-Graells, S. Martin-Lopez, P. C. Guillén, F. De Smet, W. Postvoll, and M. González-Herráez, “Toward prevention of pipeline integrity threats using a smart fiber-optic surveillance system,” J. Lightwave Technol. 34(19), 4445–4453 (2016). [CrossRef]

3. Z. Peng, J. Jian, H. Wen, A. Gribok, M. Wang, H. Liu, S. Huang, Z.-H. Mao, and K. P. Chen, “Distributed fiber sensor and machine learning data analytics for pipeline protection against extrinsic intrusions and intrinsic corrosions,” Opt. Express 28(19), 27277–27292 (2020). [CrossRef]

4. C. Fan, F. Ai, Y. Liu, Z. Xu, G. Wu, W. Zhang, C. Liu, Z. Yan, D. Liu, and Q. Sun, “Rail crack detection by analyzing the acoustic transmission process based on fiber distributed acoustic sensor,” in 2019 Optical Fiber Communications Conference and Exhibition (OFC), (IEEE, 2019), pp. 1–3.

5. Z. Li, J. Zhang, M. Wang, Y. Zhong, and F. Peng, “Fiber distributed acoustic sensing using convolutional long short-term memory network: a field test on high-speed railway intrusion detection,” Opt. Express 28(3), 2925–2938 (2020). [CrossRef]

6. M. R. Fernández-Ruiz, M. A. Soto, E. F. Williams, S. Martin-Lopez, Z. Zhan, M. Gonzalez-Herraez, and H. F. Martins, “Distributed acoustic sensing for seismic activity monitoring,” APL Photonics 5(3), 030901 (2020). [CrossRef]

7. P. Jousset, G. Currenti, B. Schwarz, A. Chalari, F. Tilmann, T. Reinsch, L. Zuccarello, E. Privitera, and C. M. Krawczyk, “Fibre optic distributed acoustic sensing of volcanic events,” Nat. Commun. 13(1), 1753 (2022). [CrossRef]

8. D. Hu, B. Tian, H. Li, C. Fan, T. Liu, T. He, Y. Liu, Z. Yan, and Q. Sun, “Intelligent structure monitoring for tunnel steel loop based on distributed acoustic sensing,” in CLEO: Applications and Technology, (Optical Society of America, 2021), pp. ATh1S–4.

9. M.-F. Huang, M. Salemi, Y. Chen, J. Zhao, T. J. Xia, G. A. Wellbrock, Y.-K. Huang, G. Milione, E. Ip, P. Ji, T. Wang, and A. Yoshiaki, “First field trial of distributed fiber optical sensing and high-speed communication over an operational telecom network,” J. Lightwave Technol. 38(1), 75–81 (2020). [CrossRef]

10. T. Li, Y. Chen, M.-F. Huang, S. Han, and T. Wang, “Vehicle run-off-road event automatic detection by fiber sensing technology,” in 2021 Optical Fiber Communications Conference and Exhibition (OFC), (IEEE, 2021), pp. 1–3.

11. T. J. Xia, G. A. Wellbrock, M.-F. Huang, S. Han, Y. Chen, M. Salemi, P. N. Ji, T. Wang, and Y. Aono, “Field trial of abnormal activity detection and threat level assessment with fiber optic sensing for telecom infrastructure protection,” in 2021 Optical Fiber Communications Conference and Exhibition (OFC), (IEEE, 2021), pp. 1–3.

12. S. Ozharar, Y. Ding, Y. Tian, Y. Yoda, J. M. Moore, T. Wang, and Y. Aono, “Detection and localization of stationary weights hanging on aerial telecommunication field fibers using distributed acoustic sensing,” Opt. Express 29(26), 42855–42862 (2021). [CrossRef]

13. T. J. Xia, G. A. Wellbrock, M.-F. Huang, M. Salemi, Y. Chen, T. Wang, and Y. Aono, “First proof that geographic location on deployed fiber cable can be determined by using OTDR distance based on distributed fiber optical sensing technology,” in Optical Fiber Communication Conference, (Optical Society of America, 2020), pp. Th3A–5.

14. M. Wada, Y. Maeda, H. Shimabara, and T. Aihara, “Manhole locating technique using distributed vibration sensing and machine learning,” in Optical Fiber Communication Conference, (Optica Publishing Group, 2021), pp. Tu1G–3.

15. M. Ilse, J. Tomczak, and M. Welling, “Attention-based deep multiple instance learning,” in International Conference on Machine Learning, (PMLR, 2018), pp. 2127–2136.

16. P. Westbrook, “Big data on the horizon from a new generation of distributed optical fiber sensors,” APL Photonics 5(2), 020401 (2020). [CrossRef]

17. Y. Lu, Y. Tian, S. Han, E. Cosatto, S. Ozharar, and Y. Ding, “Automatic fine-grained localization of utility pole landmarks on distributed acoustic sensing traces based on bilinear resNets,” in ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (IEEE, 2021), pp. 4675–4679.

18. L. Shiloh, A. Eyal, and R. Giryes, “Efficient processing of distributed acoustic sensing data using a deep learning approach,” J. Lightwave Technol. 37(18), 4755–4762 (2019). [CrossRef]

19. E. Ip, F. Ravet, H. Martins, M.-F. Huang, T. Okamoto, S. Han, C. Narisetty, J. Fang, Y.-K. Huang, M. Salemi, E. Rochat, F. Briffod, A. Goy, M. R. Fernánez-Ruiz, and M. González-Herráez, “Using global existing fiber networks for environmental sensing,” Proc. IEEE 110(11), 1853–1888 (2022). [CrossRef]

20. D. Chen, Q. Liu, and Z. He, “Phase-detection distributed fiber-optic vibration sensor without fading-noise based on time-gated digital OFDR,” Opt. Express 25(7), 8315–8325 (2017). [CrossRef]

21. Z. Pan, K. Liang, Q. Ye, H. Cai, R. Qu, and Z. Fang, “Phase-sensitive OTDR system based on digital coherent detection,” in 2011 Asia Communications and Photonics Conference and Exhibition (ACP), (IEEE, 2011), pp. 1–6.

22. Z. Wang, L. Zhang, S. Wang, N. Xue, F. Peng, M. Fan, W. Sun, X. Qian, J. Rao, and Y. Rao, “Coherent ϕ-OTDR based on I/Q demodulation and homodyne detection,” Opt. Express 24(2), 853–858 (2016). [CrossRef]

23. J. Denker, W. Gardner, H. Graf, D. Henderson, R. Howard, W. Hubbard, L. D. Jackel, H. Baird, and I. Guyon, “Neural network recognizer for hand-written zip code digits,” Advances in Neural Information Processing Systems 1 (1988).

24. J. Amores, “Multiple instance classification: Review, taxonomy and comparative study,” Artif. Intelligence 201, 81–105 (2013). [CrossRef]

25. Z.-H. Zhou, “A brief introduction to weakly supervised learning,” Natl. Sci. Rev. 5(1), 44–53 (2018). [CrossRef]

26. F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and< 0.5 MB model size,” arXiv preprint arXiv:1602.07360 (2016). [CrossRef]

27. Y. Tian, G. Yang, Z. Wang, H. Wang, E. Li, and Z. Liang, “Apple detection during different growth stages in orchards using the improved YOLO-v3 model,” Comput. Electron. Agric. 157, 417–426 (2019). [CrossRef]

28. M. Aktas, T. Akgun, M. U. Demircin, and D. Buyukaydin, “Deep learning based multi-threat classification for phase-otdr fiber optic distributed acoustic sensing applications,” in Fiber Optic Sensors and Applications XIV, vol. 10208 (SPIE, 2017), pp. 75–92.

29. H. Wu, C. Zhao, R. Liao, Y. Chang, and M. Tang, “Performance enhancement of ROTDR using deep convolutional neural networks,” in Optical Fiber Sensors, (Optica Publishing Group, 2018), p. TuE16.

30. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980 (2014). [CrossRef]

31. N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research 15, 1929–1958 (2014).

32. Y. Ganin and V. Lempitsky, “Unsupervised domain adaptation by backpropagation,” in International Conference on Machine Learning, (PMLR, 2015), pp. 1180–1189.

33. B. Sun and K. Saenko, “Deep CORAL: Correlation alignment for deep domain adaptation,” in European Conference on Computer Vision, (Springer, 2016), pp. 443–450.

34. D. Wang, E. Shelhamer, S. Liu, B. Olshausen, and T. Darrell, “Tent: Fully test-time adaptation by entropy minimization,” in International Conference on Learning Representations, (2020).

35. R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual explanations from deep networks via gradient-based localization,” in Proceedings of the IEEE International Conference on Computer Vision, (2017), pp. 618–626.

Layer	Layer Type	Feature Maps	Size	Kernel Size	Stride	Activation
Input	Image	1	50 x 50	-	-	-
1	Convolution	10	52 x 52	3x3	1	ReLU
2	Max Pooling	10	26 x 26	2x2	2	ReLU
3	Convolution	10	28 x 28	3x3	1	ReLU
4	Max Pooling	10	14 x 14	2x2	2	ReLU
5	Convolution	5	16 x 16	3x3	1	ReLU
6	Max Pooling	5	8 x 8	2x2	2	ReLU
7	Convolution	5	10 x 10	3x3	1	ReLU
8	Max Pooling	5	5 x 5	2x2	2	ReLU
9	Fully Connected	-	45	-	-	ReLU
10	Fully Connected	-	2	-	-	Softmax

Route	ACC	F1	AUC
1	0.79 (0.04)	0.84 (0.04)	0.87 (0.04)
2	0.79 (0.04)	0.77 (0.04)	0.80 (0.04)
3	0.96 (0.06)	0.95 (0.09)	1.0 (0.0)

Route	ACC	F1	AUC
1	0.77 (0.04)	0.82 (0.04)	0.86 (0.04)
2	0.69 (0.03)	0.79 (0.04)	0.74 (0.04)
3	0.96 (0.05)	0.96 (0.05)	0.99 (0.05)

Test Route	ACC	F1	AUC
1	0.72 (0.03)	0.71 (0.02)	0.74 (0.07)
2	0.74 (0.16)	0.86 (0.21)	0.78 (0.09)
3	0.90 (0.04)	0.89 (0.04)	0.98 (0.02)

Test Route	ACC	F1	AUC
1	0.71 (0.09)	0.77 (0.08)	0.75 (0.12)
2	0.64 (0.13)	0.69 (0.17)	0.66 (0.10)
3	0.75 (0.10)	0.74 (0.09)	0.83 (0.11)

Ambient noise-based weakly supervised manhole localization methods over deployed fiber networks

Abstract

1. Introduction

1.1 Related work

2. System and experimental setup

3. Methodology

3.1 Data preparation

3.2 Training: network architecture

3.3 Inference: from classification to localization

4. Real data performance analysis

4.1 Main results

4.2 Ablation experiments

4.3 Visual explanation of model performance

5. Conclusion and future work

Disclosures

Data availability

References

Data availability

Cited By

Figures (12)

Tables (9)

Equations (5)

Optics Express

Method	Label Assignment	Sampling	Test ACC	Test F1
Baseline	Instance Level	random	0.78 (0.07)	0.77 (0.07)
With data selection	Instance Level	top-K	0.85 (0.07)	0.86 (0.06)
With MIL	Group Level	random	0.81 (0.05)	0.81 (0.05)
With both data selection and MIL	Group Level	top-K	0.84 (0.08)	0.85 (0.07)

Route	Length (km)	Duration (hrs)	Number of manholes	Number of patches
1	20	120	76	30,000,000
2	17	20	46	5,000,000
3	16	75	40	14,000,000

Route	Length (km)	Duration (hrs)	Number of manholes	Number of patches
1	20	120	76	30,000,000
2	17	20	46	5,000,000
3	16	75	40	14,000,000