
Fast adaptation of multi-task meta-learning for optical performance monitoring


Abstract

An algorithm is proposed for few-shot learning (FSL) that jointly performs modulation format identification (MFI) and optical signal-to-noise ratio (OSNR) estimation. The constellation diagrams of six widely used modulation formats over a wide range of OSNR (10-40 dB) are obtained by a dual-polarization (DP) coherent detection system at 32 GBaud. We introduce an auxiliary task into model-agnostic meta-learning (MAML), which makes the gradients of the meta tasks descend faster toward the optimal target. Ablation experiments covering multi-task model-agnostic meta-learning (MT-MAML), single-task model-agnostic meta-learning (ST-MAML) and adaptive multi-task learning (AMTL) are executed to train on a data set with only 20 examples per class. First, we discuss the impact of the number of shots and of the gradient descent steps on the support set for the meta-learning based schemes to determine the best hyperparameters, and conclude that the proposed method best captures the similarity between new and previous knowledge at 4 shots and 1 step. Without fine-tuning, the model initially achieves the lowest error, ∼0.37 dB. Then, we simulate the two other schemes (AMTL and ST-MAML); the numerical results show that the mean square errors (MSE) are ∼0.6 dB, ∼0.3 dB and ∼0.18 dB, respectively, so the proposed method adapts faster to the main task. For low-order modulation formats, the proposed method reduces the error almost to 0. Meanwhile, we examine the degree of deviation between prediction and target and find that the deviation is mainly concentrated in the high OSNR range of 25-40 dB. Specifically, we investigate the variation curve of the adaptive weights during pretraining and conclude that after 30 epochs, the model's attention is almost entirely focused on estimating OSNR. In addition, we study the generalization ability of the model by varying the transmission distance; this excellent generalization is also experimentally verified. The proposed method will greatly reduce the cost of repetitively collecting data and the training resources required for fine-tuning models when OPM devices need to be deployed at massive nodes in dynamic optical networks.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Recently, the development of optical communication networks has been characterized by high speed, heterogeneity and dynamism. Thus, optical performance monitoring (OPM) modules need to be quickly generated and deployed. Since the choice of compensation algorithms in digital signal processing (DSP), which relies on modulation format information, determines the final performance of optical fiber transmission, the capability to autonomously identify the modulation formats of received signals without any prior information from the transmitters has become an essential component of OPM [1,2]. In addition, owing to its close relation with automatic fault diagnosis and its positive correlation with the bit error rate (BER), the optical signal-to-noise ratio (OSNR) is a primary quantity to be measured in the implementation of OPM [3,4].

Traditional OPM techniques are not suitable for dynamic optical networks because they require additional hardware support [5]. With the advance of machine learning (ML), schemes based on support vector machines (SVM) [5,6], random forests (RF) [7], k-nearest neighbors (KNN) [8], and artificial neural networks (ANN) [9–14] have begun to attract widespread attention, but such machine learning algorithms (MLAs) lack feature extraction capabilities. The recent boom in deep learning (DL) has pushed research in this area to a high level of performance, benefiting from large numbers of supervised samples. Based on multi-task learning (MTL), Wang et al. combined modulation format identification (MFI) and OSNR estimation using convolutional neural networks (CNN) [15–17]. Recently, the same team proposed and verified an RF-based ensemble learning scheme [18] jointing bit-rate/MF identification and OSNR estimation, from the perspectives of simulation and experiment respectively. Chen and Zhang et al. proposed to accelerate the convergence of deep neural networks (DNN) [4,19] by introducing transfer learning. Han et al. [20] demonstrated a simultaneous MFI and OSNR monitoring scheme based on optoelectronic reservoir computing (RC) and the signal's amplitude histograms (AHs). In addition, Zhao et al. used a novel method based on a binarized neural network [1] to reduce model memory. However, in a dynamic system the link state changes constantly, and regenerating the model requires a large amount of data every time. Besides, the performance degradation on new tasks reflects the weak generalization ability of the original model. Therefore, a technology with strong generalization ability and low requirements on data set scale is urgently desired. Indeed, one of the most important ingredients of successful DL-based applications is their generalization ability across different scenes or conditions. Gradually, more works on improving the generalization of models have been reported. Data augmentation-assisted methods have been used to improve generalization on small-scale data sets; for DNNs [21–23] dominated by linear layers with little computation, sample expansion is feasible and effective. Meanwhile, meta-learning has been successfully applied to optical networks to improve the performance of small-sample monitoring. Cheng et al. proposed a meta-learning technique for OSNR monitoring of directly detected 16QAM signals; on a small data set, the root mean square error (RMSE) and mean absolute error (MAE) are 0.87 dB and 0.57 dB, respectively [24]. The few-shot learning (FSL) model proposed by Zhang et al. performs well for OSNR estimation and MFI over the common OSNR range of multiple modulation formats, but it cannot recognize all classes at the same time with a 2 dB interval [25]. In fact, under the 5-way-k-shot configuration, the basenet can only recognize 5 values at a time. The problems of new-class expansion and large error tolerance make classification-based techniques unsuitable for practical OPM.

In this paper, we propose a technology jointing MFI and OSNR estimation based on multi-task model-agnostic meta-learning (MT-MAML) for fast adaptation to dynamic systems. Features of OSNR and modulation formats (MFs) (32 GBaud dual-polarization (DP) QPSK, 8QAM, 16QAM, 32QAM, 64QAM and 128QAM) are extracted from constellation diagrams and used for meta-learning. The simulation results show that the proposed method can achieve a 100% recognition rate for MFI and ∼0.18 dB MSE for OSNR monitoring in the test task, outperforming single-task model-agnostic meta-learning (ST-MAML) and adaptive multi-task learning (AMTL). Our proposed model also generalizes well across transmission distances. In addition, we find that the proposed method estimates the OSNR of low-order MFs particularly well. We show that the participation of an ancillary task enables the meta-learner to train a basenet with better generalization and faster adaptation to the main task, which is of great significance for OPM equipment that must rapidly deploy multiple monitoring tasks in the future.

2. Operating principle

The goal of meta-learning is to train a model on a variety of learning tasks, such that it can solve new learning tasks with only a small number of training examples in a limited number of steps [26]. The emergence of meta-learning, represented by MAML and Reptile, has addressed the poor universality of traditional CNNs, enabling the meta-learner to learn the way people do. We hope to introduce the training idea of meta-learning to improve the adaptation and generalization ability of OPM equipment on new tasks.

The motivation for the algorithm proposed in this paper comes from Ref. [25], in which 5-way-5-shot is adopted as the data set configuration, which makes the monitoring performance of the FSL model over the full OSNR range questionable. In fact, this method, which randomly extracts a few categories for discrimination, can only achieve “sparse” classification within a rich data distribution; that is, the high accuracy comes from categories that are easy to distinguish in the evaluation. When we try the “dense” 16-way-k-shot configuration, the generalization of the meta-learner on the query sets of the test tasks is not satisfactory. Moreover, in a practical system OSNR is a continuous value, giving us an additional argument for treating OSNR estimation as a regression problem.

Due to the limited amount of data, the error of meta-learning based schemes on small sample sets is unsatisfactory, which significantly affects later processing. To address this problem, the key idea is to introduce related tasks to improve the performance of the network trained by MAML. Fortunately, it has been shown that introducing tasks with mutual affinity can improve the performance of related tasks by extracting and sharing useful information among them [27].

Assume that there is a certain task affinity between the auxiliary task and the main task; that is, when information from the auxiliary task is added, a vector is added to the directional derivative that determines the convergence speed of the main task, so that the gradient is deflected closer to the optimal target while simultaneously achieving better generalization. As shown in Fig. 1, multi-task meta-learning makes one gradient descent step for each meta task consisting of the main task and the ancillary task (only one step is depicted in the figure), and then selects an update direction suitable for all meta tasks, so it can adapt faster to new tasks. It can be inferred that temporary parameters that take multiple gradient descent results into account acquire the ability to approach the optimal target quickly within the given sample space. We hope that the meta-learner can not only identify the MFs but also continuously predict the OSNR value of each image. We borrow the concept of an episode from meta-learning, in which one training task is performed. Different from previous methods, the auxiliary task and the main task in one episode are performed simultaneously; that is to say, one training task includes both the main task and the auxiliary task. Formally, we consider a model represented by a parameterized function $f_{\theta}$ with randomly initialized parameters $\theta$. We take MFI as the ancillary task and OSNR monitoring as the main task, which together form a meta task. Small task batches randomly sampled from the training distribution $P(\Gamma )$ support the fitting operation in one episode, and each batch is composed of samples with different MFs and OSNRs. To avoid conceptual confusion, we define the data used for training tasks as training data and the data participating in the test task as testing data. After assigning tasks as units from the whole sample space, each task ${\Gamma _i}$ is randomly divided into a support set ${D_i}$ and a query set ${D^{\prime}_i}$ in a ratio of $k:q$. Randomly dividing the sample space and task distribution avoids category imbalance and ensures persuasive results. The same principle of sampling and partitioning data also applies to the fine-tuning stage, where the query set does not participate in parameter updates, which is one of the contributions to reducing training costs and sample usage.
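
As a concrete illustration of this sampling scheme, the following sketch (our own simplification in Python, not the authors' released code; the batch size and the k:q ratio are placeholders) draws one meta task from a pool of labeled constellation images and splits it into a support set and a query set:

```python
import numpy as np

def sample_episode(images, mf_labels, osnr_labels, batch_size=24, k_ratio=0.2, rng=None):
    """Sample one meta task with mixed MFs/OSNRs and split it k:q into support/query."""
    rng = rng or np.random.default_rng()
    idx = rng.choice(len(images), size=batch_size, replace=False)
    n_support = max(1, int(batch_size * k_ratio))
    support, query = idx[:n_support], idx[n_support:]
    return ((images[support], mf_labels[support], osnr_labels[support]),
            (images[query], mf_labels[query], osnr_labels[query]))
```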

For each meta task, a single inner-loop gradient descent step on the support set yields the temporary parameters
$${\theta ^{\prime}_i} = \theta - \alpha {\nabla _\theta }{L_{{D_i} \sim {\Gamma _i}}}({f_\theta })$$


The step size α is fixed at 0.1. Note that the parameters calculated by Eq. (1) are not directly used to update the θ of the basenet; they are only used to calculate the total loss on the query sets after a few gradient descent steps. The two loss functions used for supervised classification and regression are cross-entropy and mean absolute error (MAE), for MFI and OSNR estimation, respectively. The loss function of the MFI task on the entire ${\Gamma _i}$ can be expressed as,

$${L_\Gamma }_{_i \sim P(\Gamma )}{({f_{{{\theta ^{\prime}_i}}}})_1} ={-} \sum\limits_{{x^{(j)}},y_1^{(j)} \sim {\Gamma _i}} {\sum\limits_{k = 1}^C {{y_{1,k}}^{(j)}} } \log {f_{{{\theta ^{\prime}_i}}}}{({x^{(j)}})_{1,k}}$$
where ${x^{(j)}}$ is a combination of shared features and $y_1^{(j)} \in \{ 0,1,2,3,4,5\}$ is the label defined for the ancillary task. ${f_{{{\theta ^{\prime}_i}}}}{({x^{(j)}})_{1,k}}$ is one of the predicted values from the output units representing the various MFs in the fully-connected (FC) layer. Similarly, for OSNR estimation, the MAE loss on one task batch takes the form:
$${L_\Gamma }_{_i \sim P(\Gamma )}{({f_{{{\theta ^{\prime}_i}}}})_2} = \sum\limits_{{x^{(j)}},y_1^{(j)} \sim {\Gamma _i}} {|{{f_{{{\theta^{\prime}}_i}}}{{({x^{(j)}})}_2} - {y_2}^{(j)}} |}$$
where $y_2^{(j)} \in \{ 0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f\}$ is the label defined for the main task (one hexadecimal index per OSNR value in a format's 16-value range) and ${f_{{{\theta ^{\prime}_i}}}}{({x^{(j)}})_2}$ is the predicted OSNR value. It should be noted that losses need to be calculated both on the support set for the gradient descent updates and on the query set for meta-optimization, such that the meta-objective is as follows:
$$\mathop {\arg \min }\limits_\theta {L_{{{D^{\prime}_i}}}}_{ {\sim} {\Gamma _i}}({f_{{{\theta ^{\prime}_i}}}}) = \frac{1}{C}[\sum\limits_{{{D^{\prime}_i}} \sim {\Gamma _i}} {{L_{{{D^{\prime}_i}}}}_{ {\sim} {\Gamma _i}}{{({\lambda _1}{f_{\theta - \alpha {\nabla _\theta }{L_{{D_i} \sim {\Gamma _i}}}{{({f_\theta })}_1}}})}_1}} + \sum\limits_{{{D^{\prime}_i}} \sim {\Gamma _i}} {{L_{{{D^{\prime}_i}}}}_{ {\sim} {\Gamma _i}}{{({\lambda _2}{f_{\theta - \alpha {\nabla _\theta }{L_{{D_i} \sim {\Gamma _i}}}{{({f_\theta })}_2}}})}_2}} ]$$

The direction of the gradient update is determined by the last descent step in the last task batch; meta-optimization across C batches is performed by Adam, such that the layer parameters θ are updated as follows:

$$\theta = \theta - \beta {\nabla _\theta }{L_{{{D^{\prime}_i}} \sim {\Gamma _i}}}({f_{{{\theta ^{\prime}_i}}}})$$

The step size β is initially fixed at 0.1. In contrast, the task weights in the objective optimization function are a set of learnable parameters, and as the corresponding gradients change, the balance between the two tasks is adaptively adjusted. The vector of task weights $\varphi = [{\lambda _1},{\lambda _2}]$ is cyclically updated by

$$\varphi \leftarrow \varphi - \beta {\nabla _\varphi }{L_{{{D^{\prime}_i}} \sim {\Gamma _i}}}({f_{{{\theta ^{\prime}_i}}}})$$

At this point, one epoch is finished. The full pretraining algorithm, in the general case, is outlined in Algorithm 1. When it comes to fine-tuning, the query data are only used to forecast results, so steps 8 and 10 with respect to ${D_i}$ are withdrawn from the training algorithm when adapting parameters.
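
To make the procedure concrete, the following PyTorch sketch shows one pretraining epoch under our reading of the equations above. It is a simplified illustration, not the authors' code: the functional-call style, the softmax normalization of the task-weight vector, the assumption that the basenet's forward pass returns the pair (MFI logits, OSNR prediction), and all variable names are ours.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def mtmaml_epoch(model, task_batches, log_weights, meta_opt, alpha=0.1):
    """One MT-MAML pretraining epoch: each task gets its own inner-loop step
    (Eq. 1), then theta and the task weights are meta-updated jointly by Adam."""
    theta = dict(model.named_parameters())
    params = list(theta.values())

    def inner_step(loss):
        # One gradient descent step on the support set; parameters untouched
        # by this task's loss (its head gradient is None) keep their values.
        grads = torch.autograd.grad(loss, params, create_graph=True, allow_unused=True)
        return {k: w if g is None else w - alpha * g
                for (k, w), g in zip(theta.items(), grads)}

    meta_loss = 0.0
    for (xs, ys_mf, ys_osnr), (xq, yq_mf, yq_osnr) in task_batches:
        mf_s, osnr_s = functional_call(model, theta, (xs,))
        theta_1 = inner_step(F.cross_entropy(mf_s, ys_mf))            # MFI task
        theta_2 = inner_step(F.l1_loss(osnr_s.squeeze(-1), ys_osnr))  # OSNR task
        lam = torch.softmax(log_weights, dim=0)   # assumed normalization of [λ1, λ2]
        mf_q, _ = functional_call(model, theta_1, (xq,))
        _, osnr_q = functional_call(model, theta_2, (xq,))
        meta_loss = meta_loss + lam[0] * F.cross_entropy(mf_q, yq_mf) \
                              + lam[1] * F.l1_loss(osnr_q.squeeze(-1), yq_osnr)

    meta_loss = meta_loss / len(task_batches)     # average over the C task batches
    meta_opt.zero_grad()
    meta_loss.backward()                          # grads for theta and log_weights
    meta_opt.step()
    return float(meta_loss)
```

In this sketch the Adam meta-optimizer would be built over both the shared parameters and the weight vector, e.g. meta_opt = torch.optim.Adam(list(model.parameters()) + [log_weights], lr=0.1), so that the θ update and the φ update happen in a single step.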

Fig. 1. The diagram of the multi-task meta-learning algorithm.

Unlike previous methods that require selecting appropriate weight ratios for MTL [28], we contribute a more intelligent approach: the evolving gradients of the sub-tasks guide the learning of the corresponding task weights, allowing the model to adaptively change the degree of gradient descent for the main task and the auxiliary task at different stages of training. The algorithm with adaptive task weights can flexibly determine the update amplitude of the task weights through the objective function, removing the burden of manually tuning them. Our proposed method achieves both scaling and directional correction of the main-task gradient in back propagation; theoretically, this adaptive multi-task gradient descent can accelerate the transfer of knowledge to new scenarios in dynamic and complex optical networks.

3. Simulation setup and results

3.1 Simulation setup

The simulation environment designed for joint MFI and OSNR monitoring is shown in Fig. 2. VPI Design Suite 9.8 is used to simulate the optical communication system, and Python 3.7 is employed to import the data points collected from the coherent receiver, generate constellation maps, and build the basenet for the adaptive MAML algorithm. We generate 32 GBaud QPSK, 8QAM, 16QAM, 32QAM, 64QAM and 128QAM signals by modulating an optical carrier provided by an external cavity laser (ECL). The center wavelength of the ECL is 1550 nm and its linewidth is 100 kHz. The I/Q modulators receive multi-level electrical signals generated from a pseudo-random binary sequence (PRBS) for the different MFs. The I and Q tributaries are modulated simultaneously and then combined through polarization multiplexing, implemented with a polarization beam splitter (PBS), optical delay lines and a polarization beam combiner (PBC). The first erbium-doped fiber amplifier (EDFA) is used as a relay amplifier to boost the system power. The amplified modulated signals are transmitted over a 200 km standard single-mode fiber (SSMF) link. Subsequently, the six kinds of signals are respectively adjusted to the required OSNR ranges (10∼25 dB, 10∼25 dB, 15∼30 dB, 20∼35 dB, 20∼35 dB, 25∼40 dB) using a variable optical attenuator (VOA) with a step size of 1 dB. The OSNR is measured by an optical spectrum analyzer (OSA). The electrical signal after photoelectric conversion is sampled by a digital oscilloscope (DO) at 50 GSa/s.

Fig. 2. Simulation environment for joint MFI and OSNR monitoring.

Fig. 3. Six constellation grayscale maps randomly sampled from data sets with various OSNR values.

The received signals are processed by DSP comprising I/Q imbalance compensation, dispersion compensation, and constant modulus algorithm (CMA) equalization. After CMA equalization, each constellation diagram is mapped from 10000 pairs (${I_x}$, ${Q_x}$) or (${I_y}$, ${Q_y}$) into a coordinate system. For each OSNR, 40 grayscale maps are generated, constructing the data sets used for the training and test tasks. Figure 3 shows the constellation diagrams of six modulation formats under different OSNRs. 20 × 6 × 16 = 1920 samples participate in pretraining the meta-learner; in the 1-shot case, the remaining samples form 6 batches of test tasks, including 1 × 6 × 16 = 96 samples used for fine-tuning and 19 × 6 × 16 = 1824 used for prediction to evaluate the quick adaptability of the basenet. Thus, only one image is required per OSNR to obtain good generalization when the basenet is migrated to new scenarios. With MFI and OSNR monitoring provided by adaptive MT-MAML, the remaining compensation algorithms that need modulation format information, including multiple modulus algorithm (MMA)-based equalization, carrier phase recovery, symbol detection and decision, can be performed, and the OSNR information is used for fault & link health detection to assist management in a cognitive optical network (CON).
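
As a reference for how such grayscale maps can be produced, the sketch below bins the equalized (I, Q) pairs into a 2-D histogram image; the 64 × 64 resolution, the coordinate span and the normalization are our assumptions, since the paper does not state the image size:

```python
import numpy as np

def constellation_image(i_samples, q_samples, bins=64, span=1.5):
    """Map (I, Q) pairs after CMA equalization to a normalized grayscale image."""
    hist, _, _ = np.histogram2d(i_samples, q_samples, bins=bins,
                                range=[[-span, span], [-span, span]])
    return hist / hist.max()   # pixel intensity encodes symbol density
```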

To demonstrate the competitiveness of the algorithm in this paper, we simulate the other schemes, ST-MAML and AMTL, in the same environment. The basenet structure of AMTL and MT-MAML is listed in Table 1; the total number of parameters is just 4255. Although 6 neurons are added to the FC layer of the CNN for MFI in the multi-task schemes, they share the same backbone as ST-MAML, extracting dimensionally consistent features both in pretraining and in fine-tuning on the same data sets.

Table 1. The structure of the basenets
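
Since Table 1 itself is not reproduced in this text, the following PyTorch sketch only illustrates the general shape of such a dual-head basenet (shared convolutional backbone, a 6-way FC output for MFI and a single output unit for OSNR regression); every layer size here is hypothetical and does not match the 4255-parameter network of Table 1:

```python
import torch
import torch.nn as nn

class BaseNet(nn.Module):
    """Illustrative dual-head basenet: shared conv backbone, MFI + OSNR heads."""
    def __init__(self, n_formats=6):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 4, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(4, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())   # -> 8*4*4 = 128 features
        self.mfi_head = nn.Linear(128, n_formats)    # ancillary task: classification
        self.osnr_head = nn.Linear(128, 1)           # main task: regression

    def forward(self, x):
        h = self.backbone(x)
        return self.mfi_head(h), self.osnr_head(h)
```

A forward pass returns the pair (MFI logits, OSNR value), matching the two-task structure assumed in the training sketch of Section 2.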

3.2 Effects of support set scale and multi-step gradient descent

Since the number of samples in the support set affects the OSNR estimation result and determines the training cost of dynamic fine-tuning, as mentioned in Section 2, it is necessary to adjust the support set size so that a balance between complexity and performance is achieved, which is crucial for small-sample learning. When the number of iterations in pretraining is set to 1400, the variation of the joint loss in multi-task optimization with respect to epoch is shown in Fig. 4. Clearly, when the support set contains 4 shots, the red curve at the bottom achieves the fastest convergence and a better local optimum.

Fig. 4. Multi-task loss curves in pretraining for support sets with different scales.

Similarly, the error curves in Fig. 5 support the conclusion that 4 shots is the best data set configuration. These curves reveal that only 10/12/14/17 shots achieve smaller MSE and RMSE than 4 shots; the reduction in error is small, but the increase in the number of samples used for fine-tuning is significant. Overall, the error curves exhibit wave-like horizontal stability. Note that the error curves in Fig. 5 start from the 0th iteration, meaning that the initial generalization of the model on new data is examined without fine-tuning. Surprisingly, on unknown data, 4 shots achieve ∼0.24 dB MSE and ∼0.49 dB RMSE, far superior to ∼0.66 dB and ∼0.81 dB at 15 shots. The errors obtained after fine-tuning are ∼0.18 dB and ∼0.42 dB, respectively. If such excellent generalization can be experimentally verified in various scenarios, the method proposed in this paper will greatly reduce the cost of repetitively collecting data and the training resources required for fine-tuning models when OPM devices need to be deployed at massive nodes in dynamic optical networks.

Fig. 5. Error curves on the query set used for testing in fine-tuning, for support sets with different scales: (a) MSE, (b) RMSE.

We further investigate the number of gradient descent steps that meets the requirements of algorithm complexity and error tolerance using Fig. 6. As the number of gradient descent steps increases, the pretraining process converges faster and faster; specifically, the steepness of the curves in Fig. 6(a) is positively correlated with the number of steps, clearly consistent with the inference in Section 2. However, this does not mean that the same results are obtained on the query set used for testing in fine-tuning. Although the errors are only at a moderate level when the gradient descent step is 1, the algorithm complexity during pretraining and fine-tuning is lower, and we again find surprising performance: without fine-tuning, relying solely on the knowledge already learned, the model initially achieves the lowest errors (0.37 dB and 0.61 dB). In other words, we conclude that the proposed method best captures the similarity between new and previous knowledge at 4 shots and 1 step. In fact, compared to the baselines, this generalization ability improves overall. A sketch of this fine-tuning and prediction stage is given below.
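
The fine-tuning stage referenced above can be sketched as follows (again a simplification under our assumptions, with the same functional-call convention as the pretraining sketch; only the support set updates the parameters, and the query set is used purely for prediction):

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def finetune_and_predict(model, support, xq, alpha=0.1, steps=1):
    """Adapt the pretrained basenet with k-shot support data for a few steps,
    then predict on the query images without any further updates."""
    xs, ys_mf, ys_osnr = support
    theta = {k: v.detach().clone().requires_grad_(True)
             for k, v in model.named_parameters()}
    for _ in range(steps):                       # e.g. steps=1 at 4 shots
        mf_logits, osnr = functional_call(model, theta, (xs,))
        loss = F.cross_entropy(mf_logits, ys_mf) + F.l1_loss(osnr.squeeze(-1), ys_osnr)
        grads = torch.autograd.grad(loss, list(theta.values()))
        theta = {k: (v - alpha * g).detach().requires_grad_(True)
                 for (k, v), g in zip(theta.items(), grads)}
    with torch.no_grad():                        # query set: prediction only
        mf_logits, osnr = functional_call(model, theta, (xq,))
        return mf_logits.argmax(dim=1), osnr.squeeze(-1)
```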

Fig. 6. (a) Multi-task loss curves in pretraining; (b) MSE curves of OSNR estimation; (c) RMSE curves of OSNR estimation, with several gradient descent steps.

3.3 Results and discussions

Second, we evaluate baselines built on the same backbone designed in Table 1 on the main task, using the data set gathered in Section 3.1: AMTL, a common benchmark (MTL) equipped with adaptive task weights, and ST-MAML, a single-task meta-regression scheme (Finn et al., 2017 [26]).

Table 2 summarizes the MFI accuracy for the six modulation formats in the simulation. Table 3 shows the results after fine-tuning, reporting the MSE, RMSE, and the mean and standard deviation of the error between target and prediction; the proposed method shows strong performance compared with the baselines across all error metrics, achieving the minimum value for every type of error and performing stably.

Table 2. MFI accuracies for various modulation formats during the predicting stage in the simulation.

Table 3. Results on baselines and ours: main task is the OSNR regression problem.

Next, we analyze the performance of the three benchmarks in detail. Figure 7(a) shows the loss curves of the main task achieved by the three schemes within 2000 epochs during pretraining, while Figs. 7(b) and (c) respectively show the MSE and RMSE curves obtained by the three basenets within 200 epochs during fine-tuning. Remarkably, MT-MAML beats the other two schemes in both convergence speed and minimum error. Its loss curve on the training task is steeper and more stable than that of ST-MAML, as revealed by Fig. 7(a); therefore, MT-MAML adapts faster. With respect to MSE, the error of the meta-learner aided by the auxiliary task is smaller than that of ST-MAML and AMTL by ∼0.12 dB and ∼0.42 dB, respectively; thus the MAML-based methods outperform the baseline (AMTL) on small sample sets and perform stably. It is worth noting that in the test task, AMTL needs to train on 20 images per OSNR, while MT-MAML only trains on 4 images during fine-tuning.

Fig. 7. Performance comparison of the three schemes: (a) loss during pretraining vs. epoch, (b) MSE vs. epoch, (c) RMSE vs. epoch.

Furthermore, we investigate the influence of different modulation formats on the main task in Fig. 8. With our method, the OSNR estimation errors for QPSK, 8QAM and 16QAM are only 0.03 dB, 0.07 dB and 0.07 dB, respectively; the estimation error is mainly concentrated on the high-order modulation formats. Compared with the baselines, as the order of the modulation format increases, the reduction in OSNR estimation MSE becomes increasingly significant. It is reasonable to expect that our method also suits future application scenarios in which high-order modulation formats are transmitted and monitored.

Fig. 8. The OSNR estimation MSE of the three schemes under each modulation format.

Importantly, the degree of deviation between the predictions and the true values needs to be revealed, since it directly affects the reliability of the proposed algorithm in systems with strict requirements on real-time performance and bit error rate. Table 3 provides the average prediction error and standard deviation of the OSNR over the whole data set. More specifically, the average predicted value and standard deviation at each OSNR are shown in Fig. 9; the OSNR range with a large deviation from the true value is 25-40 dB, as indicated in Fig. 9(b), which is an enlarged view of the grey rectangular area in Fig. 9(a). The model exhibits the maximum standard deviation (0.65 dB) at 38 dB and the minimum (0.08 dB) at 12 dB.
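
For completeness, the per-OSNR statistics plotted in Fig. 9 can be computed with a few lines (a minimal sketch; the array names are ours):

```python
import numpy as np

def per_osnr_stats(y_true, y_pred):
    """Mean predicted value and standard deviation at each true OSNR."""
    return {float(osnr): (y_pred[y_true == osnr].mean(), y_pred[y_true == osnr].std())
            for osnr in np.unique(y_true)}
```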

Fig. 9. (a) The predicted mean value and standard deviation for each OSNR; (b) local amplification of the high OSNR range (25-40 dB).

We show only the first 100 epochs when investigating the task weights during pretraining in Fig. 10(a); apparently, the curves representing ${\lambda _1}$ (the task weight of MFI) and ${\lambda _2}$ (the task weight of OSNR estimation) are almost symmetrical. Initially, the normalized task weights are both 0.5, and ${\lambda _1}$ is slightly higher than ${\lambda _2}$ from epoch 1 to epoch 3. From the fourth epoch, the weight of OSNR estimation exceeds that of MFI, and the model begins to pay more attention to the correlation between OSNR and the input images. Supposing the number of modulation formats and the transmission speed increase, MFI may become more complex, even difficult; a method that artificially selects the weight ratio can then no longer adapt to the different requirements of optical network nodes, since researchers cannot know in advance how to allocate appropriate task weights during training. In contrast, automatically determining the weights of different tasks by a learnable algorithm is an intelligent approach that lets the model know when to stop attending to simple tasks, and to a certain extent it can avoid negative transfer. As at the stationary point of the curve, by the 30th epoch ${\lambda _1}$ has been reduced almost to 0.

Fig. 10. (a) Adaptive task weights vs. the number of iterations during training; (b) generalization of the ablation algorithms under different transmission distances.

Significantly, we investigate the generalization of the basenets in new scenarios by varying the transmission distance from 200 km to 1000 km in intervals of 200 km and collecting constellation diagrams (20 × 6 × 16 = 1920) to construct the test task. We set the support set size to 4 shots and the gradient descent steps to 1. Figure 10(b) shows the OSNR estimation MSE as a function of distance, evaluated for AMTL, ST-MAML and MT-MAML. When the transmission distance exceeds 400 km, the performance of AMTL exceeds that of ST-MAML. It is worth mentioning that in the long-distance system, MT-MAML shows a clear superiority in monitoring OSNR.

4. Experimental setup and results

4.1 Experimental setup

To verify the feasibility of the algorithm, a DP optical communication system transmitting QPSK/8QAM/16QAM/32QAM/64QAM/128QAM at 32 GBaud is experimentally built. The setup is shown in Fig. 11. The external cavity laser generates a carrier with a central wavelength of 1549.5 nm and a linewidth of 100 kHz. The PBS divides the light into two orthogonally polarized beams. An arbitrary waveform generator (AWG) drives the I/Q modulator and modulates the electrical signal onto the optical carrier. EDFA 1 keeps the output power at 0 dBm. The transmission link (1000 km) is composed of 10 spans; each span contains 100 km of SSMF and an EDFA, compensating for the optical loss after each 100 km of transmission. EDFA 2 and the VOA bring the signal's OSNR within the target range. At the receiver, the 90° hybrid mixes the modulated optical signal with the local oscillator. Then, the photoelectric converter transforms the optical signal into an electrical signal. We sample the received signal with a digital oscilloscope at a sampling rate of 50 GSa/s.

Fig. 11. Experimental schematic diagram of the coherent optical transmission system.

Finally, offline DSP is implemented to process the digital signals. Constellation diagrams of the same quantity as in Section 3.1 are assembled in the experimental environment. The data set size and model hyperparameters are set to the optimal choices evaluated in the simulation.

4.2 Experimental results and discussions

The MFI results are shown in Table 4; the MFI accuracy for all six modulation formats reaches 100%. Figure 12 shows the experimental OSNR monitoring results in the 1000 km transmission system. On the experimental data set collected in Section 4.1, we fine-tune and test the three models pretrained during the simulation. The MSE achieved by MT-MAML is ∼0.71 dB over the whole test task, and for each modulation format the performance far surpasses the comparison schemes, reflecting remarkable generalization ability and adaptability. Given the trend toward intelligent and elastic networks, this is extremely important for OPM equipment that needs to adapt to new scenarios and new tasks.

Fig. 12. OSNR estimation MSE for various modulation formats in a 1000 km transmission system.

Table 4. MFI accuracies for various modulation formats during the predicting stage in the experiment.

5. Conclusion

In this work, an algorithm for multi-task meta-learning jointing MFI and OSNR estimation is proposed. Six widely used modulation formats are reliably recognized, and a wide range of OSNR (10-40 dB) can be predicted at 32 GBaud. The simulation results show that MT-MAML achieves 100% MFI accuracy and an MSE of just ∼0.18 dB for OSNR estimation; remarkably, the estimation errors for QPSK, 8QAM and 16QAM are only ∼0.03 dB, ∼0.07 dB and ∼0.07 dB, respectively. Furthermore, the numerical results indicate that the MSE of OSNR estimation worsens from 0.18 dB to 0.61 dB when the system distance increases from 200 km to 1000 km. Moreover, the MSE achieved by MT-MAML over the whole test task in the experimental environment is ∼0.71 dB, and for each modulation format the performance far surpasses the comparison schemes (ST-MAML and AMTL). During fine-tuning, the method needs only 4 images per OSNR value to obtain remarkable generalization ability and adaptability. Given the trend toward intelligent and elastic networks, this is extremely important for OPM equipment that needs to adapt to new scenarios and new tasks.

Funding

The Basic Ability Enhancement Program for Middle and Young-aged Teachers, Education Department of Guangxi Zhuang Autonomous Region (2023KY0070).

Disclosures

The authors declare no conflict of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. Y. Zhao, Z. Yu, Z. Wan, S. Hu, L. Shu, J. Zhang, and K. Xu, “Low complexity OSNR monitoring and modulation format identification based on binarized neural networks,” J. Lightwave Technol. 38(6), 1314–1322 (2020). [CrossRef]  

2. Z. Dong, F. N. Khan, Q. Sui, K. Zhong, C. Lu, and A. P. T. Lau, “Optical performance monitoring: A review of current and future technologies,” J. Lightwave Technol. 34(2), 525–543 (2016). [CrossRef]  

3. Z. Pan, C. Yu, and A. E. Willner, “Optical performance monitoring for the next generation optical communication networks,” Opt. Fiber Technol. 16(1), 20–45 (2010). [CrossRef]  

4. J. Zhang, Y. Li, S. Hu, W. Zhang, Z. Wan, Z. Yu, and K. Qiu, “Joint Modulation Format Identification and OSNR Monitoring Using Cascaded Neural Network With Transfer Learning,” IEEE Photonics J. 13(6), 1–6 (2021). [CrossRef]  

5. J. Thrane, J. Wass, M. Piels, J. C. M. Diniz, R. Jones, and D. Zibar, “Machine learning techniques for optical performance monitoring from directly detected PDM-QAM signals,” J. Lightwave Technol. 35(4), 868–875 (2017). [CrossRef]  

6. H. Zhou, M. Tang, X. Chen, Z. Feng, Q. Wu, S. Fu, and D. Liu, “Fractal dimension aided modulation formats identification based on support vector machines,” in European Conference on Optical Communication (ECOC, 2017), pp. 1–3.

7. X. Chen, L. Wang, T. Yang, and J. Du, “Low-complexity and nonlinearity-tolerant modulation format identification using random forest,” IEEE Photon. Technol. Lett. 31(11), 853–856 (2019). [CrossRef]  

8. Y. Zhao, C. Shi, D. Wang, Z. Cai, Z. Li, H. Han, Y. Cui, and B. Luo, “Nonlinearity mitigation using a machine learning detector based on k-nearest neighbors,” IEEE Photonics Technol. Lett. 28(19), 2102–2105 (2016). [CrossRef]  

9. S. Li, J. Zhou, Z. Huang, and X. Sun, “Modulation Format Identification Based on an Improved RBF Neural Network Trained With Asynchronous Amplitude Histogram,” IEEE Access 8, 59524–59532 (2020). [CrossRef]  

10. M. A. Jalil, J. Ayad, and H. J. Abdulkareem, “Modulation Scheme Identification Based on Artificial Neural Network Algorithms for Optical Communication System,” J. ICT Res. Appl. 14(1), 69 (2020). [CrossRef]  

11. X. Wu, J. Jargon, L. Christen, and A. E. Willner, “Training of neural networks to perform optical performance monitoring of a combination of accumulated signal nonlinearity, CD, PMD, and OSNR,” in LEOS 2008-21st Annual Meeting of the IEEE Lasers and Electro-Optics Society, 543–544 (2008).

12. F. N. Khan, Y. Zhou, A. P. T. Lau, and C. Lu, “Modulation format identification in heterogeneous fiber-optic networks using artificial neural networks,” Opt. Express 20(11), 12422–12431 (2012). [CrossRef]  

13. F. N. Khan, T. S. R. Shen, Y. Zhou, A. P. T. Lau, and C. Lu, “Optical performance monitoring using artificial neural networks trained with empirical moments of asynchronously sampled signal amplitudes,” IEEE Photonics Technol. Lett. 24(12), 982–984 (2012). [CrossRef]  

14. F. N. Khan, K. Zhong, W. H. Al-Arashi, C. Yu, C. Lu, and A. P. T. Lau, “Modulation format identification in coherent receivers using deep machine learning,” IEEE Photonics Technol. Lett. 28(17), 1886–1889 (2016). [CrossRef]  

15. D. Wang, M. Zhang, Z. Li, J. Li, M. Fu, Y. Cui, and X. Chen, “Modulation format recognition and OSNR estimation using CNN-based deep learning,” IEEE Photonics Technol. Lett. 29(19), 1667–1670 (2017). [CrossRef]  

16. D. Wang, M. Wang, M. Zhang, H. Yang, J. Li, J. Li, and X. Chen, “Cost-effective and data size–adaptive OPM at intermediated node using convolutional neural network-based image processor,” Opt. Express 27(7), 9403–9419 (2019). [CrossRef]  

17. D. Wang, M. Zhang, J. Li, Z. Li, J. Li, C. Song, and X. Chen, “Intelligent constellation diagram analyzer using convolutional neural network-based deep learning,” Opt. Express 25(15), 17150–17166 (2017). [CrossRef]  

18. J. Chai, X. Chen, Y. Zhao, et al., “Joint symbol rate-modulation format identification and OSNR estimation using random forest based ensemble learning for intermediate nodes,” IEEE Photonics J. 13(6), 1–6 (2021). [CrossRef]  

19. Y. Cheng, W. Zhang, S. Fu, M. Tang, and D. Liu, “Transfer learning simplified multi-task deep neural network for PDM-64QAM optical performance monitoring,” Opt. Express 28(5), 7607–7617 (2020). [CrossRef]  

20. M. Han, M. Wang, Y. Fan, et al., “Simultaneous modulation format identification and OSNR monitoring based on optoelectronic reservoir computing,” Opt. Express 30(26), 47515–47527 (2022). [CrossRef]  

21. W. Zhao, Y. Cheng, M. Xiang, M. Tang, Y. Qin, and S. Fu, “Nonlinear SNR estimation based on the data augmentation-assisted DNN with a small-scale dataset,” Opt. Express 30(22), 39725–39735 (2022). [CrossRef]  

22. W. Zhao, Z. Yang, M. Xiang, M. Tang, Y. Qin, and S. Fu, “Accurate OSNR monitoring based on data-augmentation-assisted DNN with a small-scale dataset,” Opt. Lett. 47(1), 130–133 (2022). [CrossRef]  

23. Q. Zheng, P. Zhao, Y. Li, H. Wang, and Y. Yang, “Spectrum interference-based two-level data augmentation method in deep learning for automatic modulation classification,” Neural Comput & Applic 33(13), 7723–7745 (2021). [CrossRef]  

24. Y. Cheng, Z. Yang, Z. Yan, D. Liu, S. Fu, and Y. Qin, “Meta-learning-enabled accurate OSNR monitoring of directly detected QAM signals with one-shot training,” Opt. Lett. 47(9), 2218–2221 (2022). [CrossRef]  

25. H. Zhang, D. Zhang, and Y. L. Xue, “Constellation Diagram Analyzer Based on Few Shot Learning,” in Asia Communications and Photonics Conference (OSOA, 2021), pp. T4A-263.

26. C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” in International conference on machine learning (PMLR, 2017), pp. 1126–1135.

27. X. Fan, Y. Xie, F. Ren, Y. Zhang, X. Huang, W. Chen, and T. Zhangsun, “Joint optical performance monitoring and modulation format/bit-rate identification by CNN-based multi-task learning,” IEEE Photonics J. 10(5), 1–12 (2018). [CrossRef]  

28. X. Fan, L. Wang, F. Ren, et al., “Feature fusion-based multi-task ConvNet for simultaneous optical performance monitoring and bit-rate/modulation format identification,” IEEE Access 7, 126709–126719 (2019). [CrossRef]  
