
Optical performance monitoring using lifelong learning with confrontational knowledge distillation in 7-core fiber for elastic optical networks

Open Access

Abstract

We propose a novel optical performance monitoring (OPM) scheme, including modulation format recognition (MFR) and optical signal-to-noise ratio (OSNR) estimation, for 7-core fiber in elastic optical networks (EONs) by using specific Stokes sectional images of the received signals. MFR and OSNR estimation in all channels can be performed by a single lightweight neural network via lifelong learning. In addition, the proposed scheme saves computational resources for real implementation through confrontational knowledge distillation, making it easy to deploy the proposed neural network at the receiving end and intermediate nodes. Five modulation formats, including BPSK, QPSK, 8PSK, 8QAM, and 16QAM, were recognized by the proposed scheme within an OSNR range of 10–30 dB over 2 km of weakly coupled 7-core fiber. Experimental results show that 100% recognition accuracy for all five modulation formats can be achieved, while the RMSE of the OSNR estimation is below 0.1 dB. Compared with conventional neural network architectures, the proposed neural network achieves better performance, with a runtime of merely 20.2 ms, saving computational resources in the optical network.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

The growing demand for 5G, the Internet of Things, cloud services, and virtual reality will require higher capacity in future optical networks [1]. Over the past few years, optical transmission systems have enabled the rapid growth of data traffic in the network backbone. The drive for greater bandwidth and lower cost has led to the rapid technological evolution of optical communications. Next-generation optical communication systems are envisioned to be more elastic and flexible, adaptively adjusting the modulation format, symbol rate, signal power, and signal bandwidth according to the network resources and channel conditions [2,3]. As the spectral efficiency (SE) of single-mode fiber (SMF) based optical transmission systems approaches the Shannon capacity limit, spatial division multiplexing (SDM) technology has been proposed as an alternative solution to meet the increasing demand for heavy data traffic [4]. Under such circumstances, optical performance monitoring (OPM) technology is a significant tool for facilitating the management of future SDM-based elastic optical networks (EONs). OPM measures the state of the physical layer by obtaining transmission link information, such as the optical signal-to-noise ratio (OSNR) of the transmission link or the modulation format of the received signals, aiming to provide accurate information on the current state of each transmission channel. The monitored parameters are then fed back to the controller layer of the optical network and used as inputs to optical network planning tools for dynamic network optimization [5]. The intention of OPM is to monitor signal quality and predict failures at low cost and with low disturbance.
However, overcomplicated OPM schemes for every transmission link in SDM-EONs generally incur huge computational costs and memory consumption and are difficult to embed into resource-constrained devices, such as intermediate nodes or receiving ends. It has been shown that reducing the computational resources required for OPM is crucial. To ensure cost-effective operation, the next-generation optical network should intelligently monitor the physical state of each channel at the least possible computational cost.

Various OPM methods have been proposed to monitor the physical state of the transmission channels. OPM can be integrated into the intermediate nodes as a standalone device or as a set of algorithms running in devices such as the optical transceivers [5]. It is not difficult to implement OPM in resourceful coherent transceivers, but it would be expensive to migrate the same OPM schemes to the intermediate nodes. For OPM in intermediate nodes, the tasks of jointly estimating OSNR and recognizing the modulation format are considerably more challenging than in optical transceivers, especially under the constraint of cost efficiency [5]. Conventional data-aided approaches to OPM may require high-speed analog-to-digital converters (ADCs) or sacrifice SE owing to the data overhead of training sequences [6–8]. Furthermore, some monitoring technologies deliver good performance only at high computational complexity [9–11]. Aside from these traditional OPM schemes, machine learning (ML) algorithms offer powerful tools to monitor the performance of optical networks [12] and have shown strong impact on modulation format identification in SDM-based optical networks [13]. ML methods are well known for performing exceptionally well when the dataset of the problem is fairly big. Nevertheless, monitoring the signals of all the channels in SDM-EONs with limited computational resources is intractable via conventional ML methods. Often, OPM is cost-limited, so one can only employ a simple resource-limited device and acquire partial signal features to monitor different channel parameters such as OSNR and modulation format [1]. Moreover, conventional machine learning algorithms are no longer applicable to multiplex transmission channels in a dynamic environment at low computational cost, because most of them merely focus on the current monitoring task and lack adaptability to multiple transmission channels.
Once a neural network is deployed in an intermediate node or receiving end, the monitoring scheme is no longer scalable. There is clearly a gap between deploying one neural network that monitors all transmission channels and managing the computational resources of the receiving end or intermediate node reasonably.

To realize monitoring the signals of all channels in SDM-EONs with low computational resources, we adopt lifelong learning to perform the OPM tasks of all channels with one neural network. In addition, adversarial learning-assisted knowledge distillation is proposed to simplify the architecture of the neural network for OPM tasks. Furthermore, the extracted features of the received signals in the proposed OPM scheme are two specific Stokes sectional images of the Poincaré sphere, avoiding the influence of polarization mixing, carrier phase noise, and frequency offset. Traditional deep learning-based OPM methods need a separate neural network model for each task in different channels with different transmission conditions, which incurs a heavy computational burden. Hence, lifelong learning is applied to learn the multiple OPM tasks of all channels sequentially rather than treating all channel data as one training set, which means the neural network is capable of expanding to more OPM tasks [14]. When faced with a new monitoring task, the neural network, with the help of lifelong learning, uses the relevant knowledge gained in past OPM tasks to help the current one, and the neural network is well trained after all the monitoring tasks. In this way, one well-trained neural network that can execute the OPM tasks of all the channels is obtained. Besides, to reduce resource consumption while maintaining excellent performance as far as possible, the neural network trained through lifelong learning is used as the teacher model to transfer informative knowledge into a lightweight and unskilled student network. Notably, during this process, adversarial learning-assisted knowledge distillation is utilized to reduce the computational resources and promote the performance of the student model.
Due to the confrontational architecture, the lightweight neural network after knowledge distillation achieves better performance on OPM tasks than the conventional distillation scheme. This investigation gives a hint toward estimating OSNR and recognizing the modulation format of all the channels with one lightweight neural network.

2. Principle

2.1 Generation of specific Stokes sectional images

In the DSP configuration of a digital coherent receiver, the modulation-format-independent algorithms, including chromatic dispersion compensation and timing recovery, are utilized first to compensate for the linear transmission impairments. Then, the processed signals are mapped into Stokes space to generate Stokes sectional images for the following monitoring tasks. The generated specific sectional images in Stokes space serve as the input data of the neural network; thus, the modulation format recognition (MFR) part of the proposed scheme is insensitive to polarization mixing, carrier phase noise, and frequency offset [15]. The processed signals are mapped into Stokes space via formula (1),

$$S = \left( \begin{array}{c} {S_0}\\ {S_1}\\ {S_2}\\ {S_3} \end{array} \right) = \left( \begin{array}{c} {e_x}e_x^\ast{+} {e_y}e_y^\ast \\ {e_x}e_x^\ast{-} {e_y}e_y^\ast \\ e_x^\ast {e_y} + {e_x}e_y^\ast \\ - je_x^\ast {e_y} + j{e_x}e_y^\ast \end{array} \right) = \left( \begin{array}{c} a_x^2 + a_y^2\\ a_x^2 - a_y^2\\ 2{a_x}{a_y}\cos \theta \\ 2{a_x}{a_y}\sin \theta \end{array} \right)$$
where ex and ey are the PDM complex signals after the aforementioned algorithms. The superscript “*” denotes conjugation. ax and ay are the amplitudes of the complex signals, θ is the phase difference between ex and ey, S0 is the total power, and S1, S2, and S3 denote the Stokes parameters [15]. After that, power normalization is carried out to normalize the four-dimensional Stokes vector S according to formula (2):
$$S_1^{\prime} = \frac{{{S_1}}}{{{S_0}}},S_2^{\prime} = \frac{{{S_2}}}{{{S_0}}},S_3^{\prime} = \frac{{{S_3}}}{{{S_0}}}$$

As depicted in Table 1, the constellation points in the 2-D constellation diagrams of different modulation formats render different clustering distributions in Stokes space, which demonstrates that the Stokes sectional images capture the information of different modulation formats. Moreover, the m-PSK signals have only one distribution plane, the S2-S3 plane, while the m-QAM signals have multiple distribution planes because m-QAM signals contain not only phase information but also amplitude information [16]. One of the sectional images is a longitudinal S1-S2 sectional image reflecting the signal amplitude information, while the other, a transverse S2-S3 sectional image, shows the phase information of the optical signals. After Stokes mapping and power normalization, we intercept two specific sectional planes of the Poincaré sphere with the corresponding OSNR label as the input data of the neural network. A sample of the signal can be transformed into an image with 64 × 64 pixels. The region of each cross-section is divided into a sub-region grid of 64 rows and 64 columns, and the number of clustering points in each sub-region is counted. Then, the count in each sub-region is normalized by dividing by the maximum count over all sub-regions. Finally, we take the normalized value of each sub-region as the gray value of the corresponding pixel in a 64 × 64 image to obtain the feature images. The two images are combined, so the dimension of the input data is 64 × 128.
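As a hedged illustration, the Stokes mapping of Eqs. (1)–(2) and the gridding procedure described above can be sketched in Python; the function names, the histogram ranges, and the synthetic input are our own choices, and the preceding DSP chain is not reproduced:

```python
import numpy as np

def stokes_map(ex, ey):
    """Map PDM complex samples (e_x, e_y) to power-normalized Stokes
    parameters S1', S2', S3' following Eqs. (1) and (2)."""
    s0 = np.abs(ex) ** 2 + np.abs(ey) ** 2
    s1 = (np.abs(ex) ** 2 - np.abs(ey) ** 2) / s0
    s2 = 2.0 * np.real(np.conj(ex) * ey) / s0
    s3 = 2.0 * np.imag(np.conj(ex) * ey) / s0
    return s1, s2, s3

def sectional_image(u, v, bins=64):
    """Count clustering points on a 64 x 64 sub-region grid of one sectional
    plane and normalize by the maximum count, yielding gray values in [0, 1]."""
    hist, _, _ = np.histogram2d(u, v, bins=bins, range=[[-1, 1], [-1, 1]])
    return hist / max(hist.max(), 1.0)

def feature_image(ex, ey):
    """Combine the S1-S2 (amplitude) and S2-S3 (phase) images side by side,
    giving the 64 x 128 input described in the text."""
    s1, s2, s3 = stokes_map(ex, ey)
    return np.concatenate([sectional_image(s1, s2),
                           sectional_image(s2, s3)], axis=1)
```

Since the normalized Stokes vector lies on the unit Poincaré sphere, S1'² + S2'² + S3'² = 1 holds for every sample, which is a convenient sanity check on the mapping.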


Table 1. Stokes sectional images of different modulation format signals

2.2 Lifelong learning for all OPM tasks in SDM-EONs

Multifarious machine learning methods have been applied to solve OPM tasks for SDM-based optical networks, such as transfer learning (TL) [17] and artificial neural networks (ANNs) [18]. Nevertheless, TL merely focuses on the tasks in the target domain, and after knowledge transfer the neural network cannot show the same performance on the source domain as before. Currently, a majority of neural network models rely on labeled training samples. Once the condition of a channel changes, or the monitor switches to the OPM task of a new channel, the network parameters need to be retrained on the whole dataset to adapt to the change in data distribution, so the performance on previously learned tasks is reduced by learning the new OPM task [14]. The traditional approach for such tasks is to fine-tune the base neural network with training data for the new tasks. However, this methodology notoriously suffers from catastrophic forgetting [19]. The phenomenon usually leads to a severe performance drop on old tasks, and the alternative, training from scratch on the combination of old and new data, may require substantial computation and storage resources for the old datasets.

In order to accomplish the OPM tasks of 7-core fiber with limited computational cost, lifelong learning technology is used in the proposed scheme during the process of training the neural network to address the catastrophic forgetting problem. For the OPM field, lifelong learning is a prospective methodology to handle multiple monitoring tasks in SDM-EONs. Lifelong learning is a technique that uses only one neural network structure and trains it on different tasks to make the network competent for all the tasks. With the advantage of lifelong learning, the proposed OPM scheme can realize incremental learning and show excellent scalability in OPM tasks of SDM-EONs. In other words, the neural network, like the human brain, can continuously learn the OPM tasks of the new channel without forgetting the OPM task of the previous channels, solving the problem of catastrophic forgetting in machine learning [14].

Generally speaking, a deep neural network model is composed of multilayer linear projections and nonlinear elements. To optimize performance, the neural network needs to adjust the weights and biases of the linear elements during the learning process. After training on a task, the weights and biases of the neural network are fixed. When faced with a new task, the neural network only modifies the weights that have little influence on the previous task and does not modify the weights that strongly influence it; this is called Elastic Weight Consolidation (EWC) [14]. Lifelong learning can be realized through EWC, which protects knowledge of previous tasks during new learning by selectively decreasing the plasticity of weights. Notably, the network parameters are tempered by a prior distribution, which is the posterior distribution of the parameters given the data from the previous tasks. This enables fast learning rates for parameters that are poorly constrained by the previous tasks and slow learning rates for those that are crucial. Low computational complexity can be obtained by using a crude Laplace approximation to the true posterior distribution of the parameters, because the run time of EWC is linear in both the number of parameters and the number of training examples [14]. Given this approximation, the function ${\textrm{L}^{\prime}}(\theta )$ that we minimize in EWC is

$${\textrm{L}^{\prime}}(\theta ) = L(\theta ) + \lambda \sum\limits_i {{b_i}{{({\theta _i} - \theta _i^b)}^2}}$$
where $L(\theta )$ is the loss for the current task only, $\lambda$ sets how important the old tasks are compared with the new one, $i$ labels each parameter, $b_i$ quantifies the importance of the $i$-th parameter to the previous tasks, and $\theta _i^b$ is its value after training on those tasks.
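As a minimal sketch of Eq. (3) in PyTorch (assuming the per-parameter importance $b_i$ has already been estimated, e.g. from a diagonal Fisher approximation as in [14]; the function name and argument layout are our own):

```python
import torch

def ewc_loss(task_loss, params, anchor_params, importance, lam):
    """Eq. (3): add the EWC quadratic penalty to the current-task loss.

    importance[i] plays the role of b_i (how much parameter i mattered for
    the previous OPM tasks) and anchor_params[i] is theta_i^b, the value
    learned on those tasks."""
    penalty = torch.zeros(())
    for p, p_b, b in zip(params, anchor_params, importance):
        penalty = penalty + (b * (p - p_b) ** 2).sum()
    return task_loss + lam * penalty
```

Because the penalty is quadratic, gradients on parameters with large $b_i$ are pulled strongly back toward their old values, while parameters with $b_i \approx 0$ remain free to adapt to the new channel.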

To prove the feasibility of the EWC method, we carried out test experiments on the data of channels 1 to 7 of the 7-core fiber. All the samples in the dataset were collected by the oscilloscope one after another. The Stokes sectional images of each channel with the corresponding OSNR labels were set as the training dataset to first obtain a neural network that can complete the OPM tasks on channel 1. Then, we trained the neural network via plain stochastic gradient descent (SGD), L2 regularization, and EWC, respectively. When the neural network fine-tunes its parameters according to the sectional images of a new channel, the significance of the parameters is determined by the likelihood loss at the end of each training stage. A parameter can be regarded as a more important regularizing parameter if it shows less discrepancy on the old OPM tasks of the previous channels [19]. As demonstrated in Fig. 1, the neural network trained with the EWC method has better performance on all the channels, while the neural network trained on this sequence of tasks with SGD or L2 regularization incurs catastrophic forgetting. As the number of tasks increases, the neural network trained with SGD can hardly maintain high performance on old tasks while retaining the ability to learn new ones. Namely, the neural networks trained with SGD or L2 regularization failed to learn a new OPM task without forgetting the OPM tasks of the previous channels. The experimental results demonstrate that, during fine-tuning on the datasets of new OPM tasks in a new channel, EWC can adaptively constrain each parameter of the new neural network model not to deviate much from its counterpart in the previous model, according to its importance weight for previous OPM tasks.


Fig. 1. Training curves across OPM tasks in all channels, comparing SGD, L2 regularization, and EWC in 7-core fiber.


2.3 Adversarial learning assisted knowledge distillation

In addition to using EWC to save computational resources and protect previously acquired knowledge, we also propose confrontational knowledge distillation, a novel framework with a cooperative dual-model architecture consisting of a generator and a discriminator. Knowledge distillation (KD) is a high-efficiency technology for transferring knowledge from a sophisticated teacher network to a lightweight student network, aiming to reduce the computational cost of prediction without sacrificing prediction performance [20]. Furthermore, the computational resources can be reduced further with the proposed antagonistic cooperation models introduced by adversarial learning [21]. As an effective teaching strategy, adversarial learning is embedded into the knowledge distillation architecture, in which not only do the teacher model and generator provide knowledge, but the discriminator also gives constructive feedback to the student upon its outputs to improve its learning performance.

There are two main steps in the adversarial learning-assisted knowledge distillation stage. The first step is the offline training of the heavyweight neural network model, which is called the teacher model. In this work, the teacher model is the foregoing neural network trained with lifelong learning, which has a larger structure than the student model due to the need to handle multiple channel monitoring tasks. The Stokes sectional images with OSNR labels of the received signals in each channel are collected for the training of the teacher model. The trained teacher model is capable of recognizing the modulation format and estimating the OSNR values, given sufficient training samples, in preparation for the knowledge transfer to the student model. After the offline training, the knowledge of the teacher model is extracted in the form of logits, guiding the training of the student model during the distillation procedure [22]. A serviceable student model of considerable accuracy can be obtained under the guidance of the teacher model. Supervisory signals from the teacher model help the student model imitate the behavior of the teacher model. As a result, the performance of the student model relies on the teacher model to a great extent.

Since the feature learning ability of the student model is less powerful than that of the teacher model, the knowledge that the student model learns in a conventional way cannot be identical to the knowledge provided by the teacher model. Therefore, a novel framework of confrontational knowledge distillation is proposed to improve the performance of the student model. In the second step, feature-based knowledge is transferred from the well-trained teacher model to the lightweight student model through offline distillation, while adversarial learning-assisted knowledge distillation is appended, as shown in Fig. 2. Beyond the conventional knowledge distillation method, the idea of adversarial learning is introduced into the teacher-student architecture. The proposed OPM scheme adds a generator to the teacher-student structure to augment the dataset for the student model, while a discriminator is applied to evaluate the outputs of the neural networks. The generator, guided by the teacher model, produces training data for the student model to improve knowledge distillation performance, which encourages the outputs of the student model to come closer to the true data distribution. Meanwhile, the discriminator distinguishes the outputs of the student model from those of the teacher model, so as to narrow the distribution difference between the outputs of the two neural networks. More specifically, because of adversarial learning, the student model can learn more about the OPM tasks beyond the knowledge distilled from the teacher model. The lightweight student model can achieve effective performance by capturing the teacher’s knowledge and using distillation strategies with adversarial learning.
Beyond its original use in modeling natural images, the structure of adversarial learning is successfully adapted to improve the accuracy of estimation and recognition in knowledge distillation, typically by extending the discriminator to determine the specific output of an example instead of merely determining whether it is real or fake. Compared with traditional knowledge distillation, not only is the knowledge transferred from the teacher model learned, but the student model also gains a better understanding of the true data distribution.
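A hedged sketch of one confrontational training step follows; the model definitions, the choice of MSE as the imitation loss, and the optimizer handling are our assumptions for illustration, not the paper’s exact implementation (the generator branch is omitted for brevity):

```python
import torch
import torch.nn.functional as F

def confrontational_step(teacher, student, discriminator, x, d_opt, s_opt):
    """One step of adversarial knowledge distillation: the discriminator
    learns to separate teacher logits from student logits, while the student
    is updated both to imitate the teacher and to fool the discriminator."""
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)

    # Discriminator step: teacher outputs labeled 1 ("real"), student 0 ("fake").
    d_real = discriminator(t_logits)
    d_fake = discriminator(s_logits.detach())
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Student step: match the teacher's logits and make its own outputs
    # indistinguishable from the teacher's to the discriminator.
    adv = F.binary_cross_entropy_with_logits(discriminator(s_logits),
                                             torch.ones_like(d_fake))
    s_loss = F.mse_loss(s_logits, t_logits) + adv
    s_opt.zero_grad()
    s_loss.backward()
    s_opt.step()
    return d_loss.item(), s_loss.item()
```

Alternating these two updates drives the distribution of student outputs toward that of the teacher, which is the narrowing of the output-distribution gap described above.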


Fig. 2. The teacher-student architecture of the adversarial learning-based knowledge distillation.


We take ResNet34 as the backbone of the teacher model; above the bottleneck layer, a new FC layer is created as the head, and a new hidden layer is added as the neck to adapt to the new output. The student model we selected is MobileNet V3-small, which has twelve bottleneck layers, one standard convolution layer, and two pointwise convolution layers. Both the generator and the discriminator have six neural layers, with 256 neurons in the input layer of generator G and six neurons in the output layer of discriminator D. In generator G, the hidden layers have 288, 256, 64, and 8 neurons, respectively. In discriminator D, the hidden layers have 256, 256, 64, and 8 neurons, respectively. The activation function of the generator and discriminator is Leaky ReLU, and both are trained with the Adam optimizer under a controlled learning rate. The learning rates of the two neural networks are both 0.0001, and warm-up technology is used in the training process.

The knowledge distilled by the teacher model is known as soft targets, which represent the probabilities that the input belongs to each category and can be estimated by a softmax function as:

$$\textrm{P}({Z_i},T) = \frac{{\textrm{exp} ({Z_i}/T)}}{{\sum\nolimits_j {\textrm{exp} ({Z_j}/T)} }}$$
where Zi is the logit for the i-th class, and the temperature factor T in (4) is used to control the importance of each soft target [21]. Soft targets are produced by raising the temperature of the softmax function on the output of the teacher model, with the aim of implicitly restoring the similarities between the data [23]. The informative dark knowledge of the heavyweight teacher model is incorporated in the soft targets [21]. By increasing the temperature factor T, the inputs to the final softmax, known as logits, convey richer information than traditional one-hot labels. Guided by this softened knowledge, the student model can pay more attention to extra supervision, for example, the probability correlation between different modulation formats. In addition, the adversarial structure can reduce the training cost of the student network at the receiving ends by augmenting the dataset. The generalization ability of the student model can be further improved when a Nash equilibrium is achieved. Dark knowledge learned by the complex and bulky teacher model is transferred to the lightweight and flexible student model, so as to enhance the generalization ability of the student model and complete the optical performance monitoring task of the network nodes. By doing this, a lightweight student model can be deployed in optical network nodes or at the receiving end to realize OPM tasks. The MFR task can be regarded as a classification problem, and the OSNR estimation task can be considered a regression problem. The two specific Stokes sectional images are fed into the student model for feature extraction, which provides information about the modulation format and the OSNR of the transmission link. Consequently, the student model after knowledge distillation is able to handle the extracted features of the received signals in all cores and then produce the predicted OPM results.
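Eq. (4) and a resulting distillation objective can be sketched as follows; the KL-plus-cross-entropy blend, the default T, and the weight alpha follow common KD practice [20] and are our assumptions rather than the paper’s stated settings:

```python
import torch
import torch.nn.functional as F

def soft_targets(logits, T):
    """Eq. (4): temperature-softened class probabilities P(Z_i, T)."""
    return F.softmax(logits / T, dim=-1)

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend the KL divergence to the teacher's soft targets with the
    ordinary cross-entropy on hard labels; the T*T factor keeps the soft
    gradient magnitude comparable as T grows."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    soft_targets(teacher_logits, T),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

Raising T flattens the teacher’s distribution, exposing the small probabilities assigned to the wrong classes, which is exactly the inter-class similarity information the text calls dark knowledge.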

3. Experimental setup

To demonstrate the feasibility and practicality of the proposed scheme, we conducted an experiment in a coherent system based on a 2 km weakly coupled 7-core fiber, as shown in Fig. 3. At the transmitter side, four-channel 12.5 GSa/s signals generated by an arbitrary waveform generator (AWG, Tektronix AWG70002A) were fed into the IQ modulator. The wavelength of the light source used in the transmitter was 1550 nm at 14.5 dBm. An erbium-doped fiber amplifier (EDFA) was used to amplify the modulated signals, and a variable optical attenuator (VOA) adjusted the OSNR of the transmission system. Then, the amplified signals were divided into seven parts by a power splitter, and each part was connected to a different delay line. A fan-in device coupled the signals from seven standard single-mode fibers into the 7-core fiber. The length of the weakly coupled 7-core fiber was 2 km, and the seven core channels were spatially demultiplexed into single-mode fibers by a fan-out device. At the receiver side, an optical band-pass filter (OBPF) was placed in front of the receiver to reject the out-of-band ASE noise. Subsequently, the signals were detected by a coherent receiver and sampled by a mixed-signal oscilloscope (Tektronix MSO73304DX) with a sampling rate of 50 GSa/s. Finally, the samples were processed by offline DSP, including the proposed MFR and OSNR estimation scheme, and the MFR and OSNR estimation results were given by the trained neural network. The parameters of the neural networks were trained offline on the PyTorch platform using an NVIDIA GeForce RTX 2060. The experimental dataset contains 70000 samples, which are randomly divided into 70%, 20%, and 10% for training, validation, and testing, respectively. Each core contributes 10000 sets of data. K-fold cross-validation was applied with k set to 4, and the hyperparameters for neural network training were selected accordingly.
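The 70/20/10 split described above can be illustrated with a small index-only sketch; the sample loading, the k-fold loop, and the random seed are omitted or assumed:

```python
import numpy as np

def split_indices(n_samples=70000, seed=0):
    """Randomly divide sample indices into 70% training, 20% validation,
    and 10% test sets, as in the experiment (seed is an assumption)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(0.7 * n_samples)
    n_val = int(0.2 * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

Shuffling before slicing ensures each subset mixes samples from all seven cores and OSNR settings rather than following the collection order.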


Fig. 3. Experimental setup (AWG: arbitrary waveform generator; EDFA: erbium-doped fiber amplifier; VOA: variable optical attenuator; OBPF: optical band-pass filter; LO: local oscillator; OC: optical coupler; MSO: mixed-signal oscilloscope; MFR: modulation format recognition; OSNR: optical signal-to-noise ratio).


4. Results and discussion

Since the OPM tasks are based on the Stokes sectional images, several classical neural networks with outstanding performance in the field of image processing, including VGG16, GoogleNet, MobileNet V1, and MobileNet V2, were selected to demonstrate the comparative advantage of the proposed scheme. The same OPM tasks were carried out via these networks on the test dataset collected from core 1. In this paper, the computing resources required by a neural network are quantified by its computing time and its number of parameters. In terms of actual deployment, there is a positive correlation between computing time and the number of neural network parameters: large-scale neural networks have more parameters to optimize, so they need more computing time and resources to complete training. As depicted in Fig. 4, the computing time of the proposed scheme was only 20.3 ms, while that of VGG16, GoogleNet, MobileNet V1, and MobileNet V2 was 257 ms, 156 ms, 112 ms, and 78 ms, respectively. Furthermore, in comparison with the classical neural networks, the proposed scheme has the smallest number of neural network parameters. Although the student model has fewer parameters and requires less computing time, its accuracy is maintained at an ideal level.


Fig. 4. The comparison between VGG16, GoogleNet, MobileNet V1, MobileNet V3, and the proposed scheme under the same dataset.


To verify the performance of the proposed scheme for OPM in SDM-EONs, we tested the MFR and OSNR estimation capabilities of the proposed scheme on the test dataset. First, the modulation format of the transmitted signals was fixed as 16QAM, and the OSNR of each channel was 20 dB. The relevant recognition results of the proposed scheme are shown in Fig. 5, which illustrates the recognition performance of the proposed OPM scheme in all channels. From Fig. 5, it can be observed that the proposed scheme achieves a 100% success rate for the MFR task in each core. Due to the impact of manufacturing technology and inter-core crosstalk, the transmission status of each channel is different. Crosstalk may be caused by coupling between channels; when it occurs, the most intuitive effect is that the features in the Stokes sectional images become faint or even obscured. However, all the samples could be recognized as the number of epochs increased. Then we fixed the transmission channel as core 1 with an OSNR of 20 dB; the related recognition results, also derived from the data collected in core 1, are shown in Fig. 6. Figure 6 illustrates that, although the higher-order modulation formats require more epochs to achieve a 100% recognition success rate, all five modulation formats can be recognized correctly within 100 epochs.


Fig. 5. The recognition accuracy at different training epochs in each channel.



Fig. 6. The recognition accuracy at different training epochs for five modulation formats.


Similarly, the OSNR estimation capability of the proposed scheme was tested on the test dataset to demonstrate the superiority of the proposed method. In addition, the OSNR values in the experiment were set from 10 dB to 30 dB by the EDFA and VOA. The modulation format of the signals in Fig. 7 and Fig. 8 was 16QAM. Figure 7 shows the actual and predicted OSNR curves of the channel. The predicted values are basically in line with the actual values, and the few errors are within the practically acceptable range. Moreover, the RMSE of OSNR estimation versus required training epochs for the student model falls below 0.1 dB within 30 training epochs, whereas the teacher model needs around 90 training epochs to achieve the same performance, as shown in Fig. 8. The results in Fig. 7 and Fig. 8 are based on the data from all the cores. According to the experimental results, the student model shows an estimation error comparable to that of the teacher model. With adversarial learning-based knowledge distillation, the student model obtains stable modulation format recognition accuracy and a small OSNR estimation error.

Fig. 7. The results of OSNR estimation: actual OSNR vs. estimated OSNR.

Fig. 8. The comparison of OSNR estimation performance between student model and teacher model.

5. Conclusion

This paper proposed a novel optical performance monitoring scheme for SDM-EONs. Lifelong learning was utilized to accomplish the OPM tasks of all channels with a single neural network, instead of using a separate neural network model for each channel. Additionally, adversarial learning was introduced into the knowledge distillation architecture to further save computational resources. To verify the feasibility of the proposed OPM scheme, coherent transmission over 2 km of 7-core fiber was experimentally demonstrated, with BPSK, QPSK, 8PSK, 8QAM, and 16QAM considered for modulation format recognition. Compared with conventional neural networks, the proposed scheme has a lightweight structure that needs fewer computational resources to achieve high OPM performance. Experimental results indicate that a 100% modulation format recognition success rate can be achieved and that the RMSE of the OSNR estimation is below 0.1 dB. With such attractive OPM performance, the proposed scheme is a competitive candidate for next-generation SDM-EONs.

Funding

National Key Research and Development Program of China (No. 2021YFB2800903); National Natural Science Foundation of China (No. 61727817, 61935005, 62171227, U2001601, 62035018, 61875248, 61835005, 61935011, 61720106015, 61975084); Jiangsu team of innovation and entrepreneurship; The Startup Foundation for Introducing Talent of NUIST.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. Z. Dong, F. N. Khan, Q. Sui, K. Zhong, C. Lu, and A. P. T. Lau, “Optical performance monitoring: a review of current and future technologies,” J. Lightwave Technol. 34(2), 525–543 (2016). [CrossRef]  

2. Z. Zhu, W. Lu, L. Zhang, and N. Ansari, “Dynamic Service Provisioning in Elastic Optical Networks with Hybrid Single-/Multi-Path Routing,” J. Lightwave Technol. 31(1), 15–22 (2013). [CrossRef]  

3. L. Gong and Z. Zhu, “Virtual Optical Network Embedding (VONE) over Elastic Optical Networks,” J. Lightwave Technol. 32(3), 450–460 (2014). [CrossRef]  

4. W. Klaus, J. Sakaguchi, B. J. Puttnam, Y. Awaji, and N. Wada, “Optical technologies for space division multiplexing,” in Proc. 13th Workshop Inf. Opt., Neuchatel, 2014, pp. 1–3.

5. D. Wang, H. Jiang, G. Liang, Q. Zhan, Y. Mo, Q. Sui, and Z. Li, “Optical Performance Monitoring of Multiple Parameters in Future Optical Networks,” J. Lightwave Technol. 39(12), 3792–3800 (2021). [CrossRef]  

6. C. Do, A. V. Tran, C. Zhu, D. Hewitt, and E. Skafidas, “Data-aided OSNR estimation for QPSK and 16-QAM coherent optical system,” IEEE Photonics J. 5(5), 6601609 (2013). [CrossRef]

7. C. C. Do, C. Zhu, and A. V. Tran, “Data-aided OSNR estimation using low-bandwidth coherent receivers,” IEEE Photonics Technol. Lett. 26(13), 1291–1294 (2014). [CrossRef]  

8. F. Wu, P. Guo, A. Yang, and Y. Qiao, “Chromatic dispersion estimation based on CAZAC sequence for optical fiber communication systems,” IEEE Access 7, 139388–139393 (2019). [CrossRef]

9. D. Tang, X. Wang, L. Zhuang, P. Guo, A. Yang, and Y. Qiao, “Delay-tap sampling-based chromatic dispersion estimation method with ultra-low sampling rate for optical fiber communication systems,” IEEE Access 8, 101004–101013 (2020). [CrossRef]

10. J. Mata, I. Miguel, R. Duran, N. Merayo, S. Singh, A. Jukan, and M. Chamania, “Artificial intelligence (AI) methods in optical networks: A comprehensive survey,” Opt. Switching Netw. 28, 43–57 (2018). [CrossRef]  

11. F. N. Khan, Q. Fan, C. Lu, and A. P. T. Lau, “An optical communication’s perspective on machine learning and its applications,” J. Lightwave Technol. 37(2), 493–516 (2019). [CrossRef]  

12. W. Saif, A. Ragheb, T. Alshawi, and S. Alshebeili, “Optical performance Monitoring in Mode Division Multiplexed Optical Networks,” J. Lightwave Technol. 39(2), 491–504 (2021). [CrossRef]  

13. W. Saif, A. Ragheb, H. Seleem, T. Alshawi, and S. Alshebeili, “Modulation Format Identification in Mode Division Multiplexed Optical Networks,” IEEE Access 7, 156207–156216 (2019). [CrossRef]  

14. J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, D. Hassabis, C. Clopath, D. Kumaran, and R. Hadsell, “Overcoming catastrophic forgetting in neural networks,” Proc. Natl. Acad. Sci. U.S.A. 114(13), 3521–3526 (2017). [CrossRef]  

15. B. Szafraniec, B. Nebendahl, and T. Marshall, “Polarization demultiplexing in Stokes space,” Opt. Express 18(17), 17928–17939 (2010). [CrossRef]  

16. X. Zhu, B. Liu, X. Zhu, J. Ren, Y. Mao, S. Han, S. Chen, M. Li, F. Tian, Z. Guo, and Y. Chen, “Multiple Stokes sectional plane image based modulation format recognition with a generative adversarial network,” Opt. Express 29(20), 31836–31848 (2021). [CrossRef]  

17. L. Xia, J. Zhang, S. Hu, M. Zhu, Y. Song, and K. Qiu, “Transfer learning assisted deep neural network for OSNR estimation,” Opt. Express 27(14), 19398–19406 (2019). [CrossRef]  

18. Q. Zhang, H. Zhou, M. Liu, J. Chen, and J. Zhang, “A simple artificial neural network based joint modulation format identification and OSNR monitoring algorithm for elastic optical networks,” IEEE ACP. Su2A, 1–3 (2018). [CrossRef]  

19. L. Liu, Z. Kuang, J. Xue, W. Yang, and W. Zhang, “IncDet: In Defense of Elastic Weight Consolidation for Incremental Object Detection,” IEEE Trans. Neural Netw. Learning Syst. 32(6), 2306–2319 (2021). [CrossRef]  

20. C. Tan and J. Liu, “Online knowledge distillation with elastic peer,” Inf. Sci. 583, 1–13 (2022). [CrossRef]  

21. J. Gou, B. Yu, S. J. Maybank, and D. Tao, “Knowledge Distillation: A Survey,” Int. J. Comput. Vis. 129(6), 1789–1819 (2021). [CrossRef]

22. M. Kang and S. Kang, “Data-free knowledge distillation in neural networks for regression,” Expert Systems with Applications 175, 114813 (2021). [CrossRef]  

23. M. Tzelepi, N. Passalis, and A. Tefas, “Online Subclass Knowledge Distillation,” Expert Systems with Applications 181, 115132 (2021). [CrossRef]  


Figures (8)

Fig. 1. Training curves across OPM tasks in all channels using SGD, L2, and EWC in 7-core fiber.
Fig. 2. The teacher-student architecture of the adversarial learning-based knowledge distillation.
Fig. 3. Experimental setup (AWG: arbitrary waveform generator; EDFA: erbium-doped fiber amplifier; VOA: variable optical attenuator; OBPF: optical bandpass filter; LO: local oscillator; OC: optical coupler; MSO: mixed signal oscilloscope; MFR: modulation format recognition; OSNR: optical signal-to-noise ratio).
Fig. 4. The comparison between VGG16, GoogLeNet, MobileNet V1, MobileNet V3, and the proposed scheme under the same dataset.
Fig. 5. The recognition accuracy at different training epochs in each channel.
Fig. 6. The recognition accuracy at different training epochs for five modulation formats.
Fig. 7. The results of OSNR estimation: actual OSNR vs. estimated OSNR.
Fig. 8. The comparison of OSNR estimation performance between student model and teacher model.

Tables (1)

Table 1. Stokes sectional images of different modulation format signals

Equations (4)


$$\mathbf{S}=\begin{pmatrix} S_0 \\ S_1 \\ S_2 \\ S_3 \end{pmatrix}=\begin{pmatrix} e_x e_x^{*}+e_y e_y^{*} \\ e_x e_x^{*}-e_y e_y^{*} \\ e_x e_y^{*}+e_x^{*} e_y \\ -j e_x e_y^{*}+j e_x^{*} e_y \end{pmatrix}=\begin{pmatrix} a_x^{2}+a_y^{2} \\ a_x^{2}-a_y^{2} \\ 2 a_x a_y \cos\theta \\ 2 a_x a_y \sin\theta \end{pmatrix} \tag{1}$$
$$\bar{S}_1=\frac{S_1}{S_0},\qquad \bar{S}_2=\frac{S_2}{S_0},\qquad \bar{S}_3=\frac{S_3}{S_0} \tag{2}$$
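The Stokes-vector construction above and its normalization onto the Poincaré sphere can be computed directly from the complex field samples of the two polarizations. A minimal NumPy sketch (the function name `stokes` is illustrative):

```python
import numpy as np

def stokes(ex, ey):
    """Stokes parameters from complex field samples e_x, e_y."""
    ex, ey = np.asarray(ex), np.asarray(ey)
    s0 = np.abs(ex) ** 2 + np.abs(ey) ** 2
    s1 = np.abs(ex) ** 2 - np.abs(ey) ** 2
    s2 = 2.0 * np.real(ex * np.conj(ey))   # e_x e_y* + e_x* e_y
    s3 = 2.0 * np.imag(ex * np.conj(ey))   # -j e_x e_y* + j e_x* e_y
    return s0, s1, s2, s3

# Normalizing by S0 maps fully polarized samples onto the unit Poincare sphere.
ex = np.array([1 + 0j, 1 + 0j])
ey = np.array([0 + 1j, 1 + 0j])  # circular and 45-degree linear polarization
s0, s1, s2, s3 = stokes(ex, ey)
n1, n2, n3 = s1 / s0, s2 / s0, s3 / s0
```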
$$L(\theta)=L_b(\theta)+\lambda\sum_{i} b_i \left(\theta_i-\theta_i^{b}\right)^{2} \tag{3}$$
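The elastic-weight-consolidation-style loss above augments the current task's loss with a quadratic penalty that anchors important parameters near the values learned on earlier channels. A minimal NumPy sketch of the penalty term, assuming `importance` plays the role of the per-parameter weights (all names are illustrative):

```python
import numpy as np

def ewc_penalty(theta, theta_prev, importance, lam):
    """Quadratic anchoring penalty: lam * sum_i b_i * (theta_i - theta_i_prev)^2."""
    theta = np.asarray(theta, dtype=float)
    theta_prev = np.asarray(theta_prev, dtype=float)
    importance = np.asarray(importance, dtype=float)
    return float(lam * np.sum(importance * (theta - theta_prev) ** 2))

# A parameter with zero importance may drift freely; important ones are penalized.
p = ewc_penalty(theta=[1.0, 2.0], theta_prev=[0.0, 2.5],
                importance=[1.0, 0.0], lam=0.5)
```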
$$P(Z_i, T)=\frac{\exp(Z_i/T)}{\sum_{j}\exp(Z_j/T)} \tag{4}$$
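The temperature-softened softmax above converts the teacher's logits into soft targets for the student; raising T flattens the distribution so the student can learn from the relative similarities between classes. A minimal, numerically stable sketch (the function name `soft_targets` is illustrative):

```python
import numpy as np

def soft_targets(logits, T):
    """Temperature-softened softmax: exp(Z_i/T) / sum_j exp(Z_j/T)."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Higher temperature flattens the distribution over modulation-format classes.
sharp = soft_targets([4.0, 1.0, 0.0], T=1.0)
soft = soft_targets([4.0, 1.0, 0.0], T=4.0)
```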