Constellation-based identification of linear and nonlinear OSNR using machine learning: a study of link-agnostic performance

Hyung Joon Cho; Daniel Lippiatt; Varghese A. Thomas; Siddharth Varughese; Steven Searcy; Thomas Richter; Sorin Tibuleac; Stephen E. Ralph

doi:10.1364/OE.443585

1. Introduction

The optical signal-to-noise ratio (OSNR) is a primary optical link performance metric. Monitoring the OSNR enables network operators to validate the expected performance from the network planning stage, to monitor the health of the WDM system, permitting preemptive maintenance, and to optimize the link capacity and/or margin, particularly when changes are made to the link. The traditional OSNR, obtained by comparing the signal power to the adjacent out-of-band noise power using an optical spectrum analyzer (OSA), is the OSNR_ASE, and it suffers from multiple challenges. First, the tight channel spacing in dense wavelength division multiplexed (DWDM) systems completely obscures the noise floor, Fig. 1(a). Second, the OSNR_ASE mostly captures the linear ASE noise and does not accurately capture the nonlinear interference contribution, since nonlinear interference is predominantly found in-band, Fig. 1(b) [1]. Finally, OSNR_ASE measurement using an OSA often involves changes to the link, for example, removing adjacent channels. The generalized OSNR (GOSNR) ideally captures only optical impairments, including linear noise and nonlinear interference, and is the primary metric of interest since it reflects the BER more accurately. In the low nonlinearity regime, GOSNR is close to the OSNR_ASE. For all other scenarios, GOSNR is lower than the OSNR_ASE [2].

Fig. 1. Conceptual optical spectra demonstrating the challenges of measuring OSNR_ASE using OSA in a deployed environment. (a) DWDM spectrum showing inaccessible noise floor due to side channel interference; (b) a spectrum showing the difference between in-band noise and out-of-band noise floor.

The GOSNR can be estimated for a specific link by identifying the back-to-back (B2B) OSNR_ASE needed to achieve the same BER [3,4]. This presumes the transceiver impairments are the same for B2B and after fiber transport. In deployed links, identifying the separate contributions of linear noise and nonlinear interference also requires knowledge of the OSNR_ASE of the deployed link, which in turn generally requires adjacent channels to be turned off or the transmission of specific symbols to estimate the noise floor [5].

We previously demonstrated accurate OSNR_ASE estimation using machine learning (ML) techniques employing only I-Q constellation images [6,7] without separately discerning linear and nonlinear interference. ML can potentially identify unique Gaussian and non-Gaussian signatures of ASE and nonlinear interference, respectively. Nonlinear interference originates from three types of interactions via fiber nonlinearity: signal-signal, signal-ASE and ASE-ASE [8]. Such interactions can be mitigated through digital back propagation techniques or maximum a posteriori joint detection of center and adjacent channels [9]. Signal-signal interactions are the most likely to produce distinct non-Gaussian signatures and are likely to be the dominant interaction. However, it is noted that in a deployed system the adjacent channels may originate at different points in the network and may have accumulated significant dispersion prior to multiplexing, in which case they may appear noise-like. We use the term nonlinear noise to denote the total of all nonlinear interferences in our system and denote the associated signal to nonlinear noise ratio as OSNR_NL. Estimating the GOSNR from constellation density matrices using machine learning requires the OSNR_ASE of the deployed link only in the ML training phase. Nonlinear interference from fiber nonlinearity is unlike amplified spontaneous emission (ASE) noise because distortions associated with nonlinear interference noise are not necessarily circularly symmetric Gaussian [10]. This is especially true for monitoring nonlinearities within metro networks with the associated lower dispersion. These networks have generated significant interest, as can be seen from the use of predistortion and probabilistic shaping for nonlinear mitigation within these systems [11,12]. The recovered I-Q constellations should therefore provide unique signatures of ASE and nonlinear noise. In fact, our results do demonstrate that a convolutional neural network (CNN) is able to discern linear and nonlinear noise in these metro links. It remains a further challenge to examine possibilities for long-haul links, where the large accumulated dispersion tends to result in noise that is mostly “circular”, as suggested by the success of the Gaussian Noise model and other works.

Recently, it has been demonstrated that a deep neural network (DNN) ML technique based on the amplitude histogram can accurately estimate OSNR [13–15]. However, these works were typically performed at a fixed nonlinear noise power without separately identifying the relative contributions of nonlinear and ASE noise. Previous efforts have demonstrated the ability to estimate the nonlinear noise and ASE noise separately using the amplitude noise correlation characteristics and a known sequence of symbols [16,17]. However, this technique requires synchronous sampling at high sample rates and hence the availability of high-speed data from some point in the receiver DSP chain. On the other hand, constellations can be acquired asynchronously and at lower rates, dramatically easing implementation. Most importantly, none of the previous efforts have demonstrated complete or partial universality (within a regime of interest) of their models, where this refers to the ability to train on one link configuration and successfully estimate noise on a different link configuration.

In this work, we implement a multi-tasking CNN that maps constellation density features to ASE and nonlinear noise characteristics simultaneously. Given the complex nature of nonlinear noise and its dependence on various parameters such as modulation format, launch power, copropagating channels, fiber characteristics etc., it is advantageous to quantify this metric using a data driven approach such as machine learning. The use of constellations instead of waveforms enhances the practicality of the method. We begin by assessing the performance of our ML model on linear OSNR_ASE, nonlinear OSNR_NL, and GOSNR when the model is trained and tested on data obtained from the same link.

We also consider the scenario of reduced training at a few optical launch powers spanning the data set and then predicting the noise metrics at intermediate optical powers not used in training. We then assess the universality of our model within the regime of metro networks by cross-training with data from metro links comprised of different fiber types. Linear OSNR_ASE, nonlinear OSNR_NL and GOSNR estimation are demonstrated with ≤ 0.5 dB mean absolute error (MAE) for experimental dual-polarized 32-GBaud 16-QAM waveforms. We also verified our ML approach using data obtained from QPSK experiments, where the trends are similar to 16-QAM. Here, we focus on the 16-QAM results. It should be noted that error refers to imperfect estimation and that the training, validation as well as testing data has the typical acceptable uncertainty associated with any experimental effort.

2. Methodology

A CNN is a variant of a DNN and has proven to be highly effective in various image processing applications such as brain image analysis in medical science [18,19]. An image is a matrix of pixel values, and a CNN finds useful relations between input images and output labels through application of several kernels (filter masks) that slide across the width and the height of the pixel matrix and assign learnable parameters such as weights and biases to various parts of the image creating distinctive feature maps composed of smaller matrices. These extracted features are passed to a network of nodes with nonlinear activation functions that create a nonlinear mapping between the network’s inputs and outputs.

The symbol distributions (2D distribution around each symbol centroid) within a constellation image depend on the ASE noise and nonlinear noise with the asymmetric distortions associated with nonlinear noise being pronounced in higher modulation formats [20,21]. We anticipate that a CNN can map the constellation density features simultaneously onto the quantity of linear ASE noise and nonlinear noise, Fig. 2. Subsequently, we can also estimate the GOSNR [3,4]:

$$\frac{1}{\mathrm{OSNR}_{TOT}} = \frac{1}{\mathrm{OSNR}_{M}} + \left( \frac{1}{\mathrm{OSNR}_{ASE}} + \frac{1}{\mathrm{OSNR}_{NL}} \right) = \frac{1}{\mathrm{OSNR}_{M}} + \frac{1}{\mathrm{GOSNR}} \qquad (1)$$

where OSNR_ASE and OSNR_NL are the linear ASE and nonlinear contributions to the total OSNR_TOT, and OSNR_M captures the transceiver penalty. We note that the GOSNR refers to the optical signal quality and is separate from the transceiver penalty. We show in the next section that our transceiver penalty is small and can be neglected for the range of OSNR_ASE considered here. The OSNR_ASE estimates the linear in-band ASE noise by measuring the local out-of-band ASE noise. Our CNN consists of three parts: Input Layer, Shared Layers, and Job-Specific Layers generating the OSNR_ASE and OSNR_NL estimates. The input layer simply pre-processes the input data. Here the I-Q constellation symbols are sorted into 2D bins, resulting in a 2D-histogram matrix capturing the constellation density. Each element of this matrix represents the number of symbols falling in one bin of the I-Q constellation image.
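As a worked illustration of Eq. (1), the following minimal sketch combines OSNR_ASE and OSNR_NL values given in dB into a GOSNR, and optionally folds in a transceiver term. The function names are hypothetical and chosen only for this example; the only assumption beyond Eq. (1) is that all quantities are converted from dB to linear units before the reciprocals are summed.

```python
import math


def db_to_linear(value_db):
    """Convert a power ratio from dB to linear units."""
    return 10.0 ** (value_db / 10.0)


def linear_to_db(value):
    """Convert a linear power ratio to dB."""
    return 10.0 * math.log10(value)


def gosnr_db(osnr_ase_db, osnr_nl_db):
    """GOSNR from Eq. (1): 1/GOSNR = 1/OSNR_ASE + 1/OSNR_NL (linear units)."""
    inverse = 1.0 / db_to_linear(osnr_ase_db) + 1.0 / db_to_linear(osnr_nl_db)
    return linear_to_db(1.0 / inverse)


def osnr_total_db(gosnr_value_db, osnr_m_db):
    """Total OSNR from Eq. (1): 1/OSNR_TOT = 1/OSNR_M + 1/GOSNR."""
    inverse = 1.0 / db_to_linear(osnr_m_db) + 1.0 / db_to_linear(gosnr_value_db)
    return linear_to_db(1.0 / inverse)


# Example operating point: 23 dB OSNR_ASE and 24.4 dB OSNR_NL.
example_gosnr = gosnr_db(23.0, 24.4)
```

Because reciprocals add, the GOSNR is always lower than the smaller of the two contributions, and two equal contributions give a GOSNR about 3 dB below either one.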

Fig. 2. CNN architecture for simultaneous OSNR_ASE and OSNR_NL estimation. X- and Y-polarized constellations are treated as two separate channels instead of the three RGB channels in the standard CNN application [22]. Two convolutional layers and two corresponding pooling layers were implemented in the Shared Layers, and one extra convolutional layer with one corresponding pooling layer was implemented in each branch of the Job-Specific Layers. For the first layer, the density matrix is zero-padded to enable capture of information in the periphery of the constellations. Each convolutional layer is followed by a batch normalization layer to mitigate overfitting [23] and a nonlinear rectified linear unit to extract features (not shown). Pooling reduces the matrix size and is accomplished by averaging over a 2 × 2 matrix that is scanned with a 2 × 2 step size (i.e., stride). Kernels are initialized using the Glorot method to prevent vanishing or exploding gradients during the training process [24,25]. Dropout layers with a factor of 0.1 are implemented to further prevent overfitting. The outputs are estimated by CNN models trained with regression fitting. Insets show 25 selected feature maps in the (i) first convolutional layer, (ii) second convolutional layer, (iii) third convolutional layer in the top branch, and (iv) third convolutional layer in the bottom branch.

The Shared Layers consist of two convolutional layers and two corresponding pooling layers for generating, via 2D convolutions with kernels, several feature maps containing low to mid-level spatial features of the constellation density matrix. Figure 2(i) shows 25 selected feature maps generated by the kernels inside the first convolutional layer. The pooling layers downsample the feature maps, while preserving features and reducing computational complexity. The widely used leaky rectified linear units (ReLUs) are the nonlinear function immediately following each convolutional layer. The conventional ReLU is a piecewise linear function that outputs the input if it is positive, and outputs zero if it is negative [26]. A leaky ReLU responds with a small slope (e.g., 0.01) for inputs in the negative region, and is known to prevent the death of neurons with negative inputs during the training. These neurons would not have played any role had a conventional ReLU been used [27]. We used average pooling to mitigate overfitting. The second convolutional layer extracts higher level spatial features.
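The two element-wise operations described above, the leaky ReLU and 2 × 2 average pooling with a stride of 2, can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the authors' implementation: the 0.01 slope is the example value quoted in the text, and dropping a trailing odd row or column during pooling is an assumption made here for simplicity.

```python
def leaky_relu(x, negative_slope=0.01):
    """Pass positive inputs unchanged; scale negative inputs by a small slope."""
    return x if x > 0 else negative_slope * x


def average_pool_2x2(feature_map):
    """Average non-overlapping 2 x 2 blocks (stride 2) of a 2D list.

    Assumption: a trailing odd row or column is dropped.
    """
    rows = len(feature_map) // 2
    cols = len(feature_map[0]) // 2
    pooled = []
    for r in range(rows):
        pooled_row = []
        for c in range(cols):
            block_sum = (
                feature_map[2 * r][2 * c]
                + feature_map[2 * r][2 * c + 1]
                + feature_map[2 * r + 1][2 * c]
                + feature_map[2 * r + 1][2 * c + 1]
            )
            pooled_row.append(block_sum / 4.0)
        pooled.append(pooled_row)
    return pooled


def activate(feature_map, negative_slope=0.01):
    """Apply the leaky ReLU to every element of a 2D feature map."""
    return [[leaky_relu(v, negative_slope) for v in row] for row in feature_map]
```

With a conventional ReLU the negative entries would become exactly zero, whereas here they keep a small, nonzero value that still allows a gradient to flow during training.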

The Job-Specific Layers have two branches, each of which has one additional convolutional layer to extract features specific to linear or nonlinear noise contributions. The feature maps after the 3A and 3B convolutional layers in the top OSNR_ASE and bottom OSNR_NL branches, respectively, are largely abstracted by the machine learning itself, Fig. 2(iii)-(iv). However, we note that the activations are more spread out across the feature maps after convolutional layer 3A, as the ASE noise is uniformly spread around all constellation points, while the activations are more sparsely concentrated in the feature maps after convolutional layer 3B, likely because they represent the asymmetrical distortions around the constellation points. Insets (iii) and (iv) show 25 out of the 256 feature maps after the convolutional layers in the top and bottom branches, respectively. The global average pooling layer at the end computes the overall mean of the final feature maps and these are vectorized by fully-connected (FC) layers in each branch. We doubled the number of kernels in each convolutional layer following the first convolutional layer, as there are more combinations of features to capture. We used appropriately small kernel sizes in the convolutional layers to ensure accurate capture of the density variations and used OSNR_NL and OSNR_ASE in dB as target variables to avoid unstable and large gradient updates arising from the wide dynamic ranges of OSNR_ASE and OSNR_NL [7].

The specific CNN structure was heuristically determined: a 30 × 30 bin grid using 30,000 symbols, and CNN hyperparameters such as the number of convolutional layers (3) and the number of kernels (64/128/256), were chosen for best performance with minimal computational complexity, Fig. 3. We explored the root-mean-squared training loss (difference between target and estimated value) for different bin sizes and numbers of symbols, in OSNR_ASE and OSNR_NL estimation, Fig. 3(a) and 3(b) respectively. In this optimization phase, we measured data consisting of 104 combinations of ASE and nonlinear noises from our homogeneous link described in Section 3. We also investigated the performance when the number of kernels is gradually increased from a combination of 16 (layer #1) /32 (layer #2) /64 (layer #3) to 256/512/1024 by increasing the kernels in all convolutional layers by the same factor each time, Fig. 3(c) and 3(d). We increased the number of convolution and pooling layers from 1 to 4 and chose 3 since beyond that there is no significant performance improvement. The training loss of OSNR_NL was higher than that of OSNR_ASE, in part due to the fewer and sparser OSNR_NL target labels.

Fig. 3. RMSE training loss performance dependence on bin resolution and number of symbols in the constellation density matrix for (a) OSNR_ASE; (b) OSNR_NL. Training loss performance for (c) OSNR_ASE and (d) OSNR_NL versus the ratio of number of kernels, where 1 indicates 16/32/64 kernels in the first, second, and third convolutional layers and increasing ratio refers to increasing the kernels in all layers by the same factor.

3. Experiment setup

3.1 DWDM setup

We examine two distinct 3-span links operating in C-band in which the center channel suffers a dispersion of either ∼4,600 ps/nm or ∼3,500 ps/nm, Fig. 4(a). For links with modest dispersion, the nonlinearities are expected to exhibit a non-Gaussian signature. In our DWDM system, adjacent channels are emulated by filtered ASE noise having the same power and spectral density as the center channel [28]. Unlike the EDFA ASE noise that is added after each span, adjacent channels emulated by filtered ASE noise are only added once at the beginning of the link. Prior to detection, the optical signal is filtered with an optical bandpass filter (3.5 order Super-Gaussian, BW = 37.5 GHz). We note that using modulated signals as adjacent channels as well as using additional adjacent channels will make the non-Gaussian signature of the nonlinear products more readily identifiable. The adjacent channels were multiplexed with a tight spacing of 37.5 GHz, Fig. 4(b). The center channel’s OSNR_ASE was measured by temporarily increasing the channel spacing from 37.5 GHz to 112.5 GHz, Fig. 4(c). We did not remove the adjacent channels so that the power in the spans remains constant. The OSNR_ASE was varied by ASE noise loading after fiber propagation but before the optical bandpass filter. We used standard PRBS-15 electrical signals to drive the QAM transmitter. The received signal is processed using standard offline digital signal processing including chromatic dispersion compensation, sampling timing recovery, polarization demultiplexing, frequency offset compensation, carrier phase recovery, and channel equalization [29].

Fig. 4. (a) Experimental DWDM link configured to have DP 32-GBaud QAM signals copropagating through three spans of 90 km SSMF with two side channels emulated by amplified and filtered noise; (b) optical spectrum of the tightly spaced transmitted signal; (c) optical spectrum with temporarily increased channel spacing for OSNR measurement.

First, the 16-QAM signals were transmitted across a homogeneous link of 3 spans, each of which is 90 km of SSMF. In the inhomogeneous link, the last span was replaced by non-zero dispersion shifted fiber (NZDSF). The attenuation, dispersion parameter and nonlinear coefficient of the SSMF are approximately 0.19 dB/km, 17.0 ps/nm-km and 1.17 W⁻¹km⁻¹, respectively, while they are 0.21 dB/km, 4.5 ps/nm-km, and 2.03 W⁻¹km⁻¹ for the NZDSF [30].

3.2 OSNR_NL extraction for training

Figure 5(a) depicts the conventional BER vs. launch power, where OSNR_ASE is allowed to freely vary with launch power, producing the traditional “inverted bell-shaped” response with the bottom of the inverted bell representing the optimum launch power for minimum BER. The clearly defined minimum reveals that any transceiver penalty does not produce a BER floor in the OSNR_ASE range of interest. Transceiver limitations may be significant for OSNR_ASE higher than those considered here [31]. In Fig. 5(b), we increased the launch power and held the OSNR_ASE constant at 23 dB by adjusting the noise loading to emphasize that the increasing BER at higher optical power is due to fiber nonlinearity. Furthermore, the NZDSF span results in an increased nonlinearity for the inhomogeneous link, resulting in higher BER for the same optical power and OSNR_ASE, Fig. 5(b). We note that the different links exhibit slightly different limiting BERs at low launch power [32].

Fig. 5. BER vs. Launch power (a) variable OSNR_ASE which corresponds to the conventional “inverted bell-shaped” curve used for determining the optimum launch power and (b) OSNR_ASE constant at 23 dB by adjusting the noise loading to emphasize that the increasing BER at higher optical power is occurring due to fiber nonlinearity.

The GOSNR includes both the ASE and nonlinear noise and determines the Q-factor $20\log_{10}\left(\sqrt{2}\,\mathrm{erfc}^{-1}(2\,\mathrm{BER})\right)$ and BER for a given link, while OSNR_ASE only captures the in-band ASE noise using the out-of-band noise floor. To determine the GOSNR and OSNR_NL used for CNN training, we compare the Q-factor vs. OSNR_ASE performance for the B2B and fiber link cases, Fig. 6(a). We then identify the GOSNR of the fiber link as the OSNR_ASE required to achieve the same BER in the B2B case. Graphically this corresponds to “tracing back” from the link performance plot to the B2B plot along a fixed BER [3,4], Fig. 6(a). The fiber link’s OSNR_NL used for CNN training is computed via Eq. (1) using the measured OSNR_ASE and the GOSNR identified from this “tracing back” method. Figure 6(a) shows the Q-factors of the homogeneous link, while its measured OSNR_NL is shown in Fig. 6(b). The stronger saturation in the Q-factor plots at higher OSNR_ASE for the non-B2B cases occurs primarily due to the fiber nonlinearity becoming increasingly significant as the ASE contribution reduces. As expected, higher launch powers result in lower Q-factors for all OSNR_ASE. Also as expected [33], there is a very small variation in measured OSNR_NL for constant launch power, despite the OSNR_ASE varying substantially. This enables a CNN, which was trained using the average OSNR_NL, to generate robust estimates. The average OSNR_NL decreases with launch power due to the increase in nonlinear effects, as expected. The average OSNR_NL for the inhomogeneous link was consistently lower than for the corresponding homogeneous link, Fig. 6(c), again due to the higher nonlinear coefficient of the NZDSF.
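The traceback procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the B2B curve is assumed to be a list of (OSNR_ASE, Q-factor) points in dB with Q increasing monotonically, linear interpolation between points is a choice made here, the transceiver term OSNR_M of Eq. (1) is neglected, and all function names are hypothetical. The Q-factor uses the identity that √2·erfc⁻¹(2·BER) equals the standard normal quantile at 1 − BER.

```python
import math
from statistics import NormalDist


def q_factor_db(ber):
    """Q-factor in dB: 20*log10(sqrt(2)*erfcinv(2*BER)), valid for BER < 0.5."""
    # sqrt(2)*erfcinv(2*BER) is the standard normal quantile at 1 - BER,
    # which equals minus the quantile at BER.
    return 20.0 * math.log10(-NormalDist().inv_cdf(ber))


def traceback_gosnr_db(q_link_db, b2b_curve):
    """Find the B2B OSNR_ASE that gives the same Q-factor as the fiber link.

    b2b_curve: (osnr_ase_db, q_db) pairs sorted by increasing OSNR_ASE.
    """
    for (osnr0, q0), (osnr1, q1) in zip(b2b_curve, b2b_curve[1:]):
        if q0 <= q_link_db <= q1:
            if q1 == q0:
                return osnr0
            return osnr0 + (q_link_db - q0) * (osnr1 - osnr0) / (q1 - q0)
    raise ValueError("link Q-factor lies outside the B2B curve")


def osnr_nl_db(osnr_ase_db, gosnr_db):
    """Solve Eq. (1) for OSNR_NL, neglecting the transceiver term OSNR_M."""
    inverse_nl = 10.0 ** (-gosnr_db / 10.0) - 10.0 ** (-osnr_ase_db / 10.0)
    if inverse_nl <= 0.0:
        raise ValueError("GOSNR must be lower than OSNR_ASE")
    return -10.0 * math.log10(inverse_nl)
```

For a link measurement at a given OSNR_ASE, one would compute the Q-factor from the measured BER, trace it back onto the B2B curve to obtain the GOSNR, and then pass both values to `osnr_nl_db` to obtain the training label.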

Fig. 6. (a) Q-factor of the 3-span homogeneous link and depiction of the traceback method that identifies the GOSNR and OSNR penalty due to nonlinearities; (b) OSNR_NL at increasing launch power for the 3-span homogeneous link; (c) average OSNR_NL in the inhomogeneous link and the homogeneous link.

The data was captured with the OSNR_ASE increased in steps of 0.5 dB, from 17 dB to 23 dB. Optical launch power was increased in steps of 1 dBm from 1 dBm to 8 dBm in the homogeneous link, and from 0 dBm to 7 dBm in the inhomogeneous link. Two hundred forty waveforms were captured for each unique combination of OSNR_ASE and launch power. The CNN training/validating/testing split was 160/40/40 waveforms per configuration.
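The acquisition grid described above can be enumerated to check the bookkeeping. The sketch below is illustrative only; the constant and function names are hypothetical, while the step sizes, ranges, and split counts are taken from the text.

```python
from itertools import product


def osnr_ase_grid_db(start=17.0, stop=23.0, step=0.5):
    """OSNR_ASE set points from start to stop (inclusive) in dB."""
    count = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(count)]


# Launch powers in dBm for the two links (1 dB steps).
HOMOGENEOUS_LAUNCH_DBM = list(range(1, 9))    # 1 dBm to 8 dBm
INHOMOGENEOUS_LAUNCH_DBM = list(range(0, 8))  # 0 dBm to 7 dBm

# Waveforms per (OSNR_ASE, launch power) configuration.
SPLIT = {"train": 160, "validation": 40, "test": 40}


def configurations(launch_powers_dbm):
    """All (OSNR_ASE in dB, launch power in dBm) combinations for one link."""
    return list(product(osnr_ase_grid_db(), launch_powers_dbm))


homogeneous_configs = configurations(HOMOGENEOUS_LAUNCH_DBM)
test_waveforms_per_link = len(homogeneous_configs) * SPLIT["test"]
```

With 13 OSNR_ASE values and 8 launch powers, each link has 104 configurations, and 40 test waveforms per configuration give 4,160 test waveforms per link.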

4. ML estimation results

4.1 ML performance for training and testing on the same link

We first assess the capability of our model when trained and tested on the same link [6,7,13–17]. Testing was done for all 104 combinations of 13 measured OSNR_ASE and 8 launch powers. For each combination, estimation of OSNR_ASE, OSNR_NL and GOSNR was done for 40 waveforms and the mean estimate was computed. Hence, a total of 4,160 waveforms were used for testing. However, only five launch powers spanning the low to high OSNR_NL regions in the two links are shown for clarity.

The mean CNN-estimated vs. the measured OSNR_ASE revealed good performance, Fig. 7. Over all 4,160 waveforms tested, the maximum estimation error was 0.75 dB and the MAE was 0.11 dB in the homogeneous link, while the maximum estimation error was 1.51 dB and the MAE was 0.21 dB in the inhomogeneous link. Additionally, we have shown the root mean square error (RMSE) for an example launch power in the insets. Generally, the estimation errors were slightly higher in the inhomogeneous case at higher OSNR_ASE. The higher nonlinear noise in the inhomogeneous case may mask changes in the constellation density at high OSNR_ASE. However, estimation errors ≥ 1 dB were seen in ≤ 1% of the entire testing data set.
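The error statistics quoted above (MAE, maximum error, RMSE, and the fraction of estimates at or above 1 dB error) can be computed as in the following sketch. The function name and the dictionary keys are hypothetical, and the inputs are assumed to be equal-length lists of estimated and measured values in dB.

```python
import math


def error_summary(estimated_db, measured_db, threshold_db=1.0):
    """Summarize absolute estimation errors between two lists of dB values."""
    errors = [abs(est - meas) for est, meas in zip(estimated_db, measured_db)]
    count = len(errors)
    return {
        # Mean absolute error in dB.
        "mae": sum(errors) / count,
        # Worst-case absolute error in dB.
        "max": max(errors),
        # Root mean square error in dB.
        "rmse": math.sqrt(sum(err * err for err in errors) / count),
        # Fraction of estimates with error at or above the threshold.
        "fraction_at_or_above": sum(err >= threshold_db for err in errors) / count,
    }
```

Reporting the maximum error and the fraction above a threshold alongside the MAE shows whether a low average hides a small number of large outliers.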

Fig. 7. CNN performance in estimating OSNR_ASE when trained and tested on same link (a) homogeneous link; (b) inhomogeneous link. Markers represent mean CNN-estimated OSNR_ASE. Error bars indicate worst case estimate over 40 waveforms. Insets depict the RMSE for a single launch power.

OSNR_NL is well estimated for all OSNR_ASE for both links, Fig. 8. The solid lines represent the average measured OSNR_NL used as target labels for CNN training, where averaging was done over all OSNR_ASE at a given launch power, while the markers represent the mean estimates over the 40 waveforms at a given launch power and OSNR_ASE. The MAE was computed between the average measured OSNR_NL and the CNN-estimated OSNR_NL over the entire 4,160 test waveforms. The OSNR_NL estimation has relatively higher estimation errors than OSNR_ASE. The inhomogeneous link exhibits a higher sensitivity to launch power due to the higher average nonlinearity. On the other hand, the homogeneous link had a slightly higher dispersion. Dispersion causes multiple effects depending on the amount of dispersion and nonlinearity. At modest dispersion, the conversion of nonlinear phase distortions to amplitude distortions is discernible. This likely caused the maximum estimation error of OSNR_NL to be slightly lower in the homogeneous link. However, in long-haul links, the larger accumulated dispersion tends to circularize the distortions and degrades the estimation accuracy. The MAE was ≤ 0.5 dB in both links and the maximum estimation error was ≤ 2 dB and ≤ 2.5 dB for the homogeneous and the inhomogeneous links, respectively. However, estimation errors higher than 1 dB occurred in ≤ 5% of the testing data sets.

Fig. 8. CNN-estimated OSNR_NL when trained and tested on same link (a) homogeneous link; (b) inhomogeneous link. Colored solid lines represent the OSNR_NL from the traceback method. Colored markers represent the CNN-estimated OSNR_NL.

The CNN-estimated GOSNR, computed using the CNN estimates and Eq. (1), matches the GOSNR determined via the measured Q and the traceback analysis, Fig. 9. The MAE and the maximum error were < 0.2 dB and < 0.7 dB for both links, demonstrating accurate estimation. The errors in the estimated GOSNR are relatively lower than OSNR_ASE because GOSNR is estimated from features of the total noise, thereby circumventing the need to separate the linear ASE and nonlinear noise. The estimation error histograms for the homogeneous link, Fig. 10, reveal that the OSNR_NL error tends to be larger than the OSNR_ASE error. Figure 10(a) also reveals a slight tendency to have larger OSNR_ASE error at higher launch powers, consistent with Fig. 7. The error histograms for OSNR_NL, Fig. 10(b), reveal a larger error but no simple trend. These behaviors are representative of all cases.

Fig. 9. CNN-estimated 16-QAM GOSNR when trained and tested on same link (a) homogeneous link; (b) inhomogeneous link. Dotted lines represent the GOSNR from traceback method. Insets depict the RMSE for a single launch power.

Fig. 10. Error histograms of homogeneous link for (a) OSNR_ASE; (b) OSNR_NL. Colored solid lines represent the errors for the different launch powers.

We trained our CNN using only data with moderate to high nonlinearity and tested it in the low nonlinearity regions, Fig. 11. The threshold OSNR_NL is the OSNR_NL at and beyond which the data has moderate nonlinearity. Each marker represents the average of 40 CNN-estimated OSNR_NL at that OSNR_ASE and launch power. Most of the estimated OSNR_NL in the low nonlinearity region were close to or just above the measured OSNR_NL threshold used in training, which represents a qualitatively accurate estimation. It is important to note that quantitative assessment outside of the machine learning training range cannot be performed. This test showed that the CNN had extracted useful features that can characterize the nonlinear noise in the constellations.

Fig. 11. CNN-estimated 16-QAM OSNR_NL when trained on moderate to high nonlinear regions and tested on ASE-dominated low nonlinearity data for (a) homogeneous; (b) inhomogeneous links. Colored solid lines represent the OSNR_NL from the traceback method. NR: nonlinear region; NTR: nonlinear threshold regions.

Lastly, our I-Q constellation densities were formed from 30,000 contiguous data symbols which requires high sampling rate and high bandwidth hardware in the receiver DSP, thereby increasing the implementation cost. However, the constellation density matrices can be formed using down-sampled data with equal integrity. We created new constellation images by concatenating six 5,000 symbol segments randomly selected from the waveforms at a given optical power and OSNR_ASE. We ensured that the same segment of symbols is not used in multiple constellation images. We used 5,000 contiguous symbols in a segment to ensure each possible symbol and symbol transitions were equally represented. We trained and tested our CNN on this data for both link cases. The accuracy of estimation was equivalent to using 30,000 contiguous symbols, thus demonstrating that our CNN performs well with drastically down-sampled acquisition of symbols.
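The down-sampled acquisition described above can be sketched as follows: a waveform is cut into 5,000-symbol segments, six randomly chosen segments are concatenated per constellation image without reusing any segment, and the resulting symbols are binned into a 30 × 30 density matrix. This is an illustrative sketch, not the authors' implementation. Cutting the waveform into aligned, non-overlapping slots, the ±1.5 binning extent for a normalized constellation, and discarding symbols outside that extent are all assumptions made here, and the function names are hypothetical.

```python
import math
import random


def density_matrix(symbols, bins=30, extent=1.5):
    """Count complex symbols in a bins x bins grid over [-extent, extent)^2."""
    width = 2.0 * extent / bins
    matrix = [[0] * bins for _ in range(bins)]
    for symbol in symbols:
        col = math.floor((symbol.real + extent) / width)
        row = math.floor((symbol.imag + extent) / width)
        # Assumption: symbols outside the grid are discarded.
        if 0 <= row < bins and 0 <= col < bins:
            matrix[row][col] += 1
    return matrix


def build_symbol_sets(waveform, seg_len=5000, segs_per_image=6, seed=0):
    """Group randomly ordered, non-overlapping segments into symbol sets.

    Each set concatenates segs_per_image contiguous segments, and no segment
    appears in more than one set.
    """
    n_slots = len(waveform) // seg_len
    slots = list(range(n_slots))
    random.Random(seed).shuffle(slots)
    symbol_sets = []
    for i in range(0, n_slots - segs_per_image + 1, segs_per_image):
        symbols = []
        for k in slots[i:i + segs_per_image]:
            symbols.extend(waveform[k * seg_len:(k + 1) * seg_len])
        symbol_sets.append(symbols)
    return symbol_sets
```

Each symbol set can be passed to `density_matrix` once per polarization to form the two-channel CNN input.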

4.2 ML performance with reduced training set

We tested using a reduced training set and found the CNN to be robust and sufficiently general to accurately predict for cases not considered in the training phase but spanned by the training set. This provides two advantages: first, it increases the degree of universality of the model within the regime of metro networks, and second, the speed of ML training is increased by training on a smaller data set.

For the homogeneous link, we trained using the data at 3 dBm, 6 dBm, and 8 dBm and tested on the data at 4 dBm, 5 dBm, and 7 dBm, that is, only on data with launch power within the bounds of the training. For the inhomogeneous link scenario, we trained the CNN using the data at 2 dBm, 5 dBm, and 7 dBm and tested on the data at 3 dBm, 4 dBm, and 6 dBm. We included the OSNR_NL extremities in the training. The mean estimated OSNR_ASE, OSNR_NL, and GOSNR, shown in Fig. 12, Fig. 13, and Fig. 14, respectively, demonstrate that the estimation errors are comparable to or only marginally higher than when training using all eight launch powers. These results demonstrate an important capability to estimate performance at intermediate launch powers not used in training, in addition to performing well at launch powers used for training.
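The reduced-training split described above can be expressed as a simple selection rule: train on a chosen subset of launch powers and test only on the remaining launch powers that lie strictly inside the range spanned by the training subset. The sketch below is illustrative; the record layout (a dictionary with a `launch_dbm` key) and the function name are hypothetical.

```python
def split_by_launch_power(records, train_powers_dbm):
    """Split records into training and interpolation-test sets by launch power.

    records: dictionaries that each carry a "launch_dbm" entry.
    Test records are those whose launch power was not used for training but
    lies strictly between the lowest and highest training launch powers.
    """
    train_set = set(train_powers_dbm)
    low, high = min(train_set), max(train_set)
    train = [r for r in records if r["launch_dbm"] in train_set]
    test = [
        r for r in records
        if r["launch_dbm"] not in train_set and low < r["launch_dbm"] < high
    ]
    return train, test


# Launch powers from the text: homogeneous link, 1 dBm to 8 dBm.
homogeneous_records = [{"launch_dbm": p} for p in range(1, 9)]
train, test = split_by_launch_power(homogeneous_records, [3, 6, 8])
tested_powers = sorted(r["launch_dbm"] for r in test)
```

For the homogeneous link this selects 4 dBm, 5 dBm, and 7 dBm for testing, and launch powers below the lowest training power are left out of both sets.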

Fig. 12. CNN-estimated OSNR_ASE when trained on subset of launch powers and tested on the other launch powers in same link. (a) homogeneous link; (b) inhomogeneous link. Insets depict the RMSE for a single launch power.

Fig. 13. CNN-estimated OSNR_NL of 16-QAM when trained on a subset of the launch power range and tested on the remaining launch powers in the same link. (a) homogeneous link; (b) inhomogeneous link. Colored solid lines represent the OSNR_NL from the traceback method. Colored markers represent the CNN-estimated OSNR_NL.

Fig. 14. CNN-estimated GOSNR of 16-QAM when trained on a subset of the launch power range and tested on the remaining launch powers in the same link. (a) homogeneous link; (b) inhomogeneous link. Dotted lines with markers represent the GOSNR from traceback method. Insets depict the RMSE for a single launch power.

4.3 Assessing the limits of universality: ML performance for cross-training

Although the application of ML in optical performance monitoring (OPM) is an ongoing research area, most of the reports [6,7,13–15] use an ML model that is trained and tested on the same link or set of links [16,17]. However, it is significantly more useful if one can attain a high degree of universality in which the ML model is trained on one link yet is applicable to any other link or links within a class. This goal is made more challenging because a particular GOSNR and BER can be associated with different symbol distributions (2D distribution around each symbol centroid) within a constellation image. Here, we assess the performance and identify sources of error when we cross-train a CNN on a metro class link that is different from the testing link. We conducted the cross-training test for two new scenarios: training on the homogeneous link and testing on the inhomogeneous link and vice versa.

For the first cross-training scenario we trained on the homogeneous link and tested on the inhomogeneous link, Fig. 15(a). Since the inhomogeneous link exhibits OSNR_NL outside the range observed from the homogeneous link, we only test on the inhomogeneous link with launch powers of 0 dBm to 5 dBm. Here, Fig. 15(a), the OSNR_ASE estimates at low OSNR_ASE are comparable to the case when training and testing were done using the same link. However, we observe a systematic underestimation that increased with OSNR_ASE and with launch power. This deviation is likely caused by the different appearance of nonlinear features in the constellations of the two links. Essentially, the CNN associates the increasing nonlinear features of the inhomogeneous link with ASE and hence the OSNR_ASE estimate is lower than measured. The reverse behavior is observed in the opposite case, when the CNN is trained on the inhomogeneous link and tested on the homogeneous link, Fig. 15(b). In Fig. 15(b), we did not train on 6 dBm and 7 dBm data, as a similar OSNR_NL range during training and testing ensured better OSNR_ASE estimation than when the training had a wider OSNR_NL range. These results demonstrate that the nonlinear features appear differently for different link configurations.

Fig. 15. CNN-estimated OSNR_ASE when cross training with no target label normalization (a) trained on homogeneous link and tested on inhomogeneous link, (b) trained on inhomogeneous link and tested on homogeneous link.

Figure 16 shows the constellations of the inhomogeneous and homogeneous links at the same 23 dB OSNR_ASE and 24.4 dB OSNR_NL. The observed distributions of received symbols differ. This results from the different nonlinear coefficients and dispersion of the two links and from their distributed interplay, which produce different constellation distortions [34]. These distortions become more visible where nonlinear noise is significant and not obscured by the ASE noise, namely at high OSNR_ASE and at high operating powers.
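One simple way to compare the symbol distributions around a centroid numerically is through their standardized third and fourth moments (skewness and excess kurtosis) along the in-phase and quadrature axes. The sketch below is only an illustration of that calculation; the helper names and the toy samples are assumptions, and the paper reports that such a study did not reveal an easily identifiable difference between the linear- and nonlinear-dominated cases.

```python
from math import sqrt


def standardized_moments(samples):
    """Return (skewness, excess kurtosis) of a list of real samples."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    if variance == 0:
        raise ValueError("samples have zero variance")
    std = sqrt(variance)
    skewness = sum(((x - mean) / std) ** 3 for x in samples) / n
    excess_kurtosis = sum(((x - mean) / std) ** 4 for x in samples) / n - 3.0
    return skewness, excess_kurtosis


def cluster_moments(symbols, centroid):
    """Moments of the noise cloud around one constellation point.

    symbols: complex received samples assigned to this centroid.
    centroid: complex ideal constellation point.
    """
    noise = [s - centroid for s in symbols]
    return {
        "I": standardized_moments([z.real for z in noise]),
        "Q": standardized_moments([z.imag for z in noise]),
    }


# Toy samples for illustration only (not measured data).
centroid = 1 + 1j
symbols = [0 + 0j, 1 + 1j, 2 + 2j]
moments = cluster_moments(symbols, centroid)
print(moments)
```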

Fig. 16. Constellations of 16-QAM at 23 dB OSNR_ASE and 24.4 dB OSNR_NL in (a) homogeneous link; (b) inhomogeneous link demonstrating that different links exhibit different constellation distributions when operated at the same OSNR_ASE and OSNR_NL.

The OSNR_NL estimated when the CNN was trained on the homogeneous link and tested on the inhomogeneous link using all launch powers is shown in Fig. 17(a), and the reverse scenario in Fig. 17(b); we observe overestimated and underestimated OSNR_NL, respectively. This is consistent with the observations of Figs. 15(a) and 15(b). Although the use of multiple optical powers enabled the CNN to learn the overall dependence of nonlinearity on optical power, the specific features of the nonlinear signature depend on the link details, including the dispersion and fiber nonlinearity of each span and the relative location of each fiber span. Without this information, ML may not be able to precisely capture the dependence of OSNR_NL on launch power for an arbitrary link on which it was not trained. To test this, we demonstrate a simple ad hoc normalization of the link nonlinearities, although wide universality may require that more link metrics be made available to the CNN. This is further motivated by the observed systematic offset of the OSNR_NL estimations. Specifically, we scale the OSNR_NL of the untrained link by a single parameter: the ratio of the effective nonlinear coefficients of the two links. An effective nonlinear coefficient can be estimated from a single measurement or analytically [35,36]. Both methods lead to the determination that the ratio of the effective fiber nonlinearity between the two links is approximately 1.5.

Fig. 17. CNN-estimated OSNR_NL when cross training with no target label normalization (a) trained on homogeneous link and tested on inhomogeneous link; (b) trained on inhomogeneous link and tested on homogeneous link. Colored solid lines represent the OSNR_NL from the traceback method. Colored markers represent the CNN-estimated OSNR_NL.

We estimate an effective nonlinear coefficient for the inhomogeneous link as:

$$a_{inhomogeneous} = \frac{a_{SSMF} \cdot L_{SSMF} + a_{NZDSF} \cdot L_{NZDSF}}{L_{SSMF} + L_{NZDSF}} \tag{2}$$

where ${a_{SSMF}}$ and ${a_{NZDSF}}$ are the nonlinear coefficients of the SSMF and NZDSF, which have lengths of ${L_{SSMF}}$ and ${L_{NZDSF}}$, respectively.
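Equation (2) is a length-weighted mean of the per-fiber nonlinear coefficients. A minimal sketch of the calculation follows; the function name and all numerical values are placeholders chosen for illustration and are not the parameters of the experimental links (for which the text reports a ratio of approximately 1.5).

```python
def effective_nonlinear_coefficient(segments):
    """Length-weighted mean of nonlinear coefficients, as in Eq. (2).

    segments: iterable of (nonlinear_coefficient, length) pairs. Any
    consistent units may be used because only a weighted mean is taken.
    """
    segments = list(segments)
    total_length = sum(length for _, length in segments)
    if total_length <= 0:
        raise ValueError("total fiber length must be positive")
    return sum(a * length for a, length in segments) / total_length


# Placeholder values, NOT the experimental link parameters.
a_ssmf, l_ssmf = 1.0, 120.0
a_nzdsf, l_nzdsf = 1.6, 80.0

a_inhomogeneous = effective_nonlinear_coefficient(
    [(a_ssmf, l_ssmf), (a_nzdsf, l_nzdsf)]
)

# Ratio against a hypothetical single-fiber reference link (assumption).
a_reference = a_ssmf
ratio = a_inhomogeneous / a_reference
print(a_inhomogeneous, ratio)
```

Passing a list of segments rather than two fixed fiber types lets the same helper cover links with more than two fiber types, although Eq. (2) is stated only for the SSMF/NZDSF case.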

The OSNR_NL values estimated using this scaled approach are shown in Fig. 18 for the two cross-training scenarios. All the launch powers were used during training, as the OSNR_NL ranges for the two links match after normalization. The systematic over- and underestimations are nearly eliminated. The estimated OSNR_NL values in the moderately nonlinear region have errors comparable to those obtained when training and testing are done on the same link. The errors are somewhat higher in the high-nonlinearity region, most likely due to the estimation limits of the ML model, which was trained on sparse OSNR_NL data with a large step size in this region. There is also likely an estimation error due to the difference in the chromatic dispersion map of each link. The difference in the complex distributed interplay between chromatic dispersion and nonlinearity in the two links may not be completely captured by this simple normalization. However, these results suggest that a universal model, at least within a class of links such as metro links with moderate accumulated dispersion and only a few spans, may be possible with additional link information added as input to the CNN.
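The label scaling can be sketched as follows. The text does not specify whether the ratio is applied with a particular exponent or in which direction, so the sketch assumes that the ratio multiplies the linear OSNR_NL (equivalently, adds a fixed offset in dB) and exposes the exponent as a parameter; the function name `scale_osnr_nl_db` and these choices are assumptions, not the authors' stated procedure.

```python
from math import log10


def scale_osnr_nl_db(osnr_nl_db, coefficient_ratio, exponent=1.0):
    """Scale an OSNR_NL label given in dB by coefficient_ratio**exponent.

    Assumption: the scaling acts on the linear OSNR_NL, so in dB it is a
    constant offset. A negative exponent divides instead of multiplying.
    """
    if coefficient_ratio <= 0:
        raise ValueError("coefficient_ratio must be positive")
    return osnr_nl_db + 10.0 * exponent * log10(coefficient_ratio)


# Ratio of about 1.5 as reported in the text; 24.4 dB is an example label.
scaled = scale_osnr_nl_db(24.4, 1.5)
restored = scale_osnr_nl_db(scaled, 1.5, exponent=-1.0)
print(round(scaled - 24.4, 2), round(restored, 2))
```

Under this assumption, a ratio of 1.5 corresponds to an offset of about 1.76 dB, and applying the inverse exponent recovers the original label.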

Fig. 18. CNN-estimated OSNR_NL using normalized target labels when (a) trained on homogeneous link and tested on inhomogeneous link; (b) trained on inhomogeneous link and tested on homogeneous link. The nonlinear coefficient of the tested link is available. Colored solid lines represent the OSNR_NL from the traceback method. Colored markers represent the CNN-estimated OSNR_NL.

Lastly, we demonstrate that the GOSNR is still accurately estimated when the ML model was trained on the homogeneous link and tested on the inhomogeneous link, and in the reverse cross-training case, Fig. 19, although the individual contributions to the GOSNR may not be precisely estimated. We also compared these GOSNR estimates with those obtained when the ML model was trained directly with GOSNR as the target variable and found the estimation errors to be similar. Thus, universality of GOSNR estimation within the regime of metro networks is more readily accomplished than identification of the individual noise contributions using a CNN. Interestingly, a study of the skew and kurtosis did not reveal any high-level, easily identifiable difference between the scenarios where linear and nonlinear noise dominate.
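As a worked illustration of how the two estimates combine, the sketch below assumes the commonly used relation 1/GOSNR = 1/OSNR_ASE + 1/OSNR_NL in linear units, which treats the ASE and nonlinear contributions as additive noise powers. The helper names are illustrative. Because the inverse terms add, an overestimate of one contribution can partly offset an underestimate of the other.

```python
from math import log10


def db_to_linear(value_db):
    """Convert a power ratio from dB to linear units."""
    return 10.0 ** (value_db / 10.0)


def linear_to_db(value):
    """Convert a linear power ratio to dB."""
    return 10.0 * log10(value)


def gosnr_db(osnr_ase_db, osnr_nl_db):
    """Combine OSNR_ASE and OSNR_NL (both in dB) into GOSNR in dB.

    Assumes 1/GOSNR = 1/OSNR_ASE + 1/OSNR_NL in linear units.
    """
    inverse = 1.0 / db_to_linear(osnr_ase_db) + 1.0 / db_to_linear(osnr_nl_db)
    return linear_to_db(1.0 / inverse)


# Operating point of Fig. 16: 23 dB OSNR_ASE and 24.4 dB OSNR_NL.
combined = gosnr_db(23.0, 24.4)
print(round(combined, 2))
```

For the Fig. 16 operating point this gives a GOSNR of about 20.6 dB, below both individual values, as expected when two noise contributions of similar size add.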

Fig. 19. CNN-estimated GOSNR when cross training with no target label normalization (a) trained on homogeneous link and tested on inhomogeneous link; (b) trained on inhomogeneous link and tested on homogeneous link. Dotted lines represent the GOSNR using the traceback method. Insets depict the RMSE for a single launch power.

5. Conclusion

We have demonstrated that ML can be used to accurately estimate the OSNR_ASE and OSNR_NL when training and testing are done on the same link. We implemented a multi-tasking CNN that maps the constellation density features simultaneously onto the ASE noise and nonlinear noise levels, thereby indirectly estimating GOSNR. We also demonstrated that accurate estimation of these metrics is possible even at intermediate optical powers that were not used during training, which enables real-time tracking of these metrics in the event of random or systematic variation of operating power or gain in the network over time. Training on a reduced data set was shown not to significantly increase the estimation error.

Estimation of the OSNR_ASE and OSNR_NL was demonstrated with < 0.5 dB MAE for 32-GBaud 16-QAM DWDM links. We demonstrated the robustness of our approach by using links composed of two different fiber types. The GOSNR can in turn be estimated to within 0.5 dB MAE using the estimated OSNR_ASE and OSNR_NL. We also showed that the performance was robust when training was done using noncontiguous, asynchronously captured data, which significantly reduces hardware requirements.

Finally, we investigated the possibility of creating a universal machine learning estimation tool by considering the scenario of training on one link and testing on a link composed of different fiber types. We identified a simple scaling approach based on an effective link nonlinearity that enables good cross-trained estimation. The results suggest that including additional link metrics as inputs to the ML model may enable a fully universal ML method, at least within a class of links. All methods presented are readily deployed without the need for additional hardware or the transmission of special symbols or patterns.

Funding

Georgia Electronic Design Center; L3Harris Technologies, Inc.; ADVA Optical Networking, Inc.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. R.-J. Essiambre, G. Kramer, P. J. Winzer, G. J. Foschini, and B. Goebel, “Capacity limits of optical fiber networks,” J. Lightwave Technol. 28(4), 662–701 (2010).

2. A. D. Shiner, M. E. Mousa-Pasandi, M. Qiu, M. A. Reimer, E. Y. Park, M. Hubbard, Q. Zhuge, F. J. Vaquero Caballero, and M. O’Sullivan, “Neural network training for OSNR estimation from prototype to product,” in Proc. Optical Fiber Communication Conference and Exhibition, paper M4E.2 (2020).

3. E. R. Hartling, A. Pilipetskii, D. Evans, E. Mateo, M. Salsi, P. Pecci, and P. Mehta, “Design, acceptance and capacity of subsea open cables,” J. Lightwave Technol. 39(3), 742–756 (2021).

4. L. Berg, “Demystifying Transceiver and Line Characterization Metrics,” in Proc. Optical Fiber Communication Conference and Exhibition, Tutorial N1 (2019).

5. C. Rasmussen and M. Aydinlik, “Optical signal-to-noise ratio monitoring and measurement in optical communications systems,” U.S. patent 20150365165A1 (2017).

6. H.J. Cho, D. Lippiatt, S. Varughese, and S.E. Ralph, “Convolutional neural networks for optical performance monitoring,” in Proc. IEEE Avionics and Vehicle Fiber-Optics and Photonics Conf., paper WD2 (2019).

7. H. J. Cho, S. Varughese, D. Lippiatt, R. DeSalvo, S. Tibuleac, and S. E. Ralph, “Optical performance monitoring using digital coherent receivers and convolutional neural networks,” Opt. Express 28(21), 32087–32104 (2020).

8. R. Dar and P. J. Winzer, “Nonlinear interference mitigation: methods and potential gain,” J. Lightwave Technol. 35(4), 1 (2017).

9. J. Pan, C. Liu, T. Detwiler, A. J. Stark, Y. Hsueh, and S. E. Ralph, “Inter-channel crosstalk cancellation for Nyquist-WDM superchannel applications,” J. Lightwave Technol. 30(24), 3993–3999 (2012).

10. F. Caballero, D. Ives, Q. Zhuge, M. O’Sullivan, and S. J. Savory, “Joint estimation of linear and non-linear signal-to-noise ratio based on neural networks,” in Proc. Optical Fiber Communication Conference and Exhibition, paper M2F.4 (2018).

11. M. Mussolin, D. Rafique, J. Mårtensson, M. Forzati, J. K. Fischer, L. Molle, M. Nölle, C. Schubert, and A. D. Ellis, “Polarization multiplexed 224 Gb/s 16QAM transmission employing digital back-propagation,” in Proc. European Conference and Exhibition on Optical Communication, paper We.8.B.6 (2011).

12. T. T. Nguyen, T. Zhang, E. Giacoumidis, A. A. I. Ali, M. Tan, P. Harper, L. P. Barry, and A. D. Ellis, “Coupled transceiver-fiber nonlinearity compensation based on machine Learning for probabilistic shaping system,” J. Lightwave Technol. 39(2), 388–399 (2021).

13. F. N. Khan, K. Zhong, X. Zhou, W. H. Al-Arashi, C. Yu, C. Lu, and A. P. T. Lau, “Joint OSNR monitoring and modulation format identification in digital coherent receivers using deep neural networks,” Opt. Express 25(15), 17767–17776 (2017).

14. L. Xia, J. Zhang, S. Hu, M. Zhu, Y. Song, and K. Qiu, “Transfer learning assisted deep neural network for OSNR estimation,” Opt. Express 27(14), 19398–19406 (2019).

15. Z. Wan, Z. Yu, L. Shu, Y. Zhao, H. Zhang, and K. Xu, “Intelligent optical performance monitor using multi-task learning based artificial neural network,” Opt. Express 27(8), 11281–11291 (2019).

16. A. S. Kashi, Q. Zhuge, and J. C. Cartledge, “Fiber nonlinear noise-to-signal ratio monitoring using artificial neural networks,” in Proc. European Conference and Exhibition on Optical Communication, paper M.2.F.2 (2017).

17. K. Zhang, Y. Fan, T. Ye, Z. Tao, S. Oda, T. Tanimura, Y. Akiyama, and T. Hoshida, “Fiber nonlinear noise-to-signal ratio estimation by machine learning,” in Proc. Optical Fiber Communication Conference and Exhibition, paper Th2A.45 (2019).

18. R. C. Gonzalez, “Deep convolutional neural networks,” IEEE Signal Process. Mag. 35(6), 79–87 (2018).

19. W. Liu, C. Qin, K. Gao, H. Li, Z. Qin, and Y. Cao, “Research on medical data feature extraction and intelligent recognition technology based on convolutional neural network,” IEEE Access 7, 150157–150167 (2019).

20. D. Zibar, O. Winther, N. Franceschi, R. Borkowski, A. Caballero, V. Arlunno, M. N. Schmidt, N. G. Gonzales, B. Mao, Y. Ye, K. J. Larsen, and I. T. Monroy, “Nonlinear impairment compensation using expectation maximization for dispersion managed and unmanaged PDM 16-QAM transmission,” Opt. Express 20(26), B181–B196 (2012).

21. R. Dar, M. Feder, A. Mecozzi, and M. Shtaif, “Properties of nonlinear noise in long, dispersion-uncompensated fiber links,” Opt. Express 21(22), 25685–25699 (2013).

22. J. Jiang, X. Feng, F. Liu, Y. Xu, and H. Huang, “Multi-spectral RGB-NIR image classification using double-channel CNN,” IEEE Access 7, 20607–20613 (2019).

23. S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proc. International Conference on Machine Learning, 37, pp. 448–456 (2015).

24. X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proc. International Conference on Artificial Intelligence and Statistics, 9, pp. 249–256 (2010).

25. Y. Bengio, A. Courville, and P. Vincent, “Representation learning: a review and new perspectives,” IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798–1828 (2013).

26. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521(7553), 436–444 (2015).

27. J. Feng and S. Lu, “Performance analysis of various activation functions in artificial neural networks,” J. Phys.: Conf. Ser. 1237(2), 1–6 (2019).

28. T. Richter, J. Pan, and S. Tibuleac, “Comparison of WDM bandwidth loading using individual transponders, shaped, and flat ASE noise,” in Proc. Optical Fiber Communication Conference and Exhibition, paper W1B.2 (2018).

29. S. Varughese, J. Langston, V. A. Thomas, S. Tibuleac, and S. E. Ralph, “Frequency dependent ENoB requirements for M-QAM optical links: an analysis using an improved digital to analog converter model,” J. Lightwave Technol. 36(18), 4082–4089 (2018).

30. A. J. Stark, Y.-T. Hsueh, S. Searcy, T. Detwiler, C. Liu, M. Filer, S. Tibuleac, G. K. Chang, and S. E. Ralph, “Scaling 112 Gb/s optical networks with the nonlinear threshold metric,” J. Lightwave Technol. 30(9), 1291–1298 (2012).

31. R. Hui, C. Laperle, D. Charlton, and M. O’Sullivan, “Estimating System OSNR With a Digital Coherent Transceiver,” IEEE Photonics Technol. Lett. 33(14), 743–746 (2021).

32. B. Zhu, H. Zhang, P. I. Borel, T. Geisler, R. Jensen, M. Stegmaier, B. Palsdottir, D. W. Peckham, R. Lingle, D. Vaidya, M. F. Yan, P. W. Wisk, and D. J. DiGiovanni, “200 km Repeater length transmission of real-time processed 21.2Tb/s (106×200Gb/s) over 1200 km fibre,” in Proc. European Conference and Exhibition on Optical Communication, pp. 1–3 (2019).

33. Z. Wang, A. Yang, P. Guo, and P. He, “OSNR and nonlinear noise power estimation for optical fiber communication systems using LSTM based deep learning technique,” Opt. Express 26(16), 21346–21357 (2018).

34. P. Minzioni, V. Pusino, I. Cristiani, L. Marazzi, M. Martinelli, and V. Degiorgio, “Study of the Gordon-Mollenauer effect and of the optical-phase-conjugation compensation method in phase-modulated optical communication systems,” IEEE Photonics J. 2(3), 284–291 (2010).

35. G. Bosco, A. Carena, R. Cigliutti, V. Curri, P. Poggiolini, and F. Forghieri, “Performance prediction for WDM PM-QPSK transmission over uncompensated links,” in Proc. Optical Fiber Communication Conference and Exhibition, paper OThO7 (2011).

36. E. Torrengo, R. Cigliutti, G. Bosco, A. Carena, V. Curri, P. Poggiolini, A. Nespola, D. Zeolla, and F. Forghieri, “Experimental validation of an analytical model for nonlinear propagation in uncompensated optical links,” Opt. Express 19(26), B790–B798 (2011).
