Geronimo Bergk, Behnam Shariati, Pooyan Safari, and Johannes K. Fischer, "ML-assisted QoT estimation: a dataset collection and data visualization for dataset quality evaluation," J. Opt. Commun. Netw. 14, 43-55 (2022)
Machine learning (ML)-assisted solutions for quality of transmission (QoT) estimation or classification have received significant attention in recent years. However, because large and well-structured datasets are unavailable, individual research groups need to create and use their own datasets to validate their proposed solutions. The reported results, obtained using different datasets, are therefore difficult to reproduce and hard to compare. Beyond this limitation, the lack of a common technique that different research groups can follow to explain their datasets makes it even harder to validate the developed ML-assisted solutions across different papers. In this work, we present a publicly available dataset collection to open the problem of data-driven QoT estimation to the ML community. The dataset collection allows various solutions presented by different research groups to be compared. Furthermore, we present techniques to visualize and evaluate datasets for QoT estimation. The presented visualizations can also deliver deep insight into the error analysis of ML models. We apply these new methods to evaluate an artificial neural network on different datasets. The results show the relevance of the presented visualizations for comparing different approaches and different datasets. The proposed methods enable the comparison and validation of different ML-based solutions and published datasets.
| Unit | Description |
|---|---|
| | Total number of active channels along the lightpath |
| 1 | Maximum BER of interfering lightpaths |
| 1 | Minimum BER of interfering lightpaths |
| 1 | Average BER of interfering lightpaths |
| 1 | Minimum cardinality of the modulation format (left) |
| 1 | Maximum cardinality of the modulation format (left) |
| 1 | Minimum cardinality of the modulation format (right) |
| 1 | Maximum cardinality of the modulation format (right) |
| Gb/s | Minimum lightpath line rate (left) |
| Gb/s | Maximum lightpath line rate (left) |
| Gb/s | Minimum lightpath line rate (right) |
| Gb/s | Maximum lightpath line rate (right) |
| 1 | Minimum BER (left) |
| 1 | Maximum BER (left) |
| 1 | Minimum BER (right) |
| 1 | Maximum BER (right) |
Table 5.
Number of Samples and Class Balance of the Datasets in the QoT Dataset Collection

| Dataset | Total Samples | Positive Class | Negative Class |
|---|---|---|---|
| 01 | 1,524,755 | 1,155,635 | 369,120 |
| 02 | 1,280,573 | 1,187,330 | 93,243 |
| 03 | 1,321,452 | 949,682 | 371,770 |
| 04 | 1,348,293 | 1,263,121 | 85,172 |
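The class balance implied by the sample counts in Table 5 can be worked out directly. The following is a minimal sketch rather than code from the paper: the counts are copied from the table, while the names `TABLE_5` and `class_balance` are hypothetical and introduced only for illustration.

```python
# Sample counts copied from Table 5: (total, positive, negative) per dataset.
# TABLE_5 and class_balance are illustrative names, not part of the paper.
TABLE_5 = {
    "01": (1_524_755, 1_155_635, 369_120),
    "02": (1_280_573, 1_187_330, 93_243),
    "03": (1_321_452, 949_682, 371_770),
    "04": (1_348_293, 1_263_121, 85_172),
}


def class_balance(total, positive, negative):
    """Return (positive share, negative share, majority-to-minority ratio)."""
    # Sanity check: the two class counts must account for every sample.
    if positive + negative != total:
        raise ValueError("class counts do not add up to the total")
    majority, minority = max(positive, negative), min(positive, negative)
    return positive / total, negative / total, majority / minority


if __name__ == "__main__":
    for name, counts in TABLE_5.items():
        pos, neg, ratio = class_balance(*counts)
        print(f"dataset {name}: positive {pos:.1%}, "
              f"negative {neg:.1%}, ratio {ratio:.1f}:1")
```

Under these counts, datasets 02 and 04 have a positive-class share above 90%, which is one reason plain accuracy on the full datasets should be read with care.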
Table 6.
Accuracy Scores [%] on the Training, Validation, and Test Set for Each of the Four QoT Datasets

| Dataset | Training | Validation | Test |
|---|---|---|---|
| 01 | 98.35 | 97.56 | 98.44 |
| 02 | 99.53 | 99.44 | 99.48 |
| 03 | 99.32 | 99.17 | 99.03 |
| 04 | 99.36 | 99.22 | 99.14 |
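Table 7.
Accuracy Scores the Trained Models Achieve on the Class-Balanced Subsample of Each Other Dataset

Rows: model trained on the training set of the given dataset. Columns: test accuracy [%] on the given dataset.

| Trained on Dataset | Test on 01 | Test on 02 | Test on 03 | Test on 04 |
|---|---|---|---|---|
| 01 | - | 99.05 | 93.21 | 99.4 |
| 02 | 59.51 | - | 66.56 | 70.32 |
| 03 | 55.41 | 57.39 | - | 57.47 |
| 04 | 73.3 | 99.12 | 57.59 | - |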
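One way to read the cross-dataset accuracies in Table 7 is against the 50% accuracy that a constant predictor reaches on a class-balanced two-class sample. The sketch below summarizes each trained model by its mean accuracy on the other datasets and by its worst-case margin over that chance level. The values are copied from the table; the names `TABLE_7`, `CHANCE_LEVEL`, and `transfer_summary` are hypothetical, and this summary is an illustration, not a metric defined in the paper.

```python
# Cross-dataset test accuracies [%] copied from Table 7.
# Outer key: dataset the model was trained on; inner key: dataset tested on.
TABLE_7 = {
    "01": {"02": 99.05, "03": 93.21, "04": 99.40},
    "02": {"01": 59.51, "03": 66.56, "04": 70.32},
    "03": {"01": 55.41, "02": 57.39, "04": 57.47},
    "04": {"01": 73.30, "02": 99.12, "03": 57.59},
}

# Accuracy [%] of a constant predictor on a class-balanced binary sample.
CHANCE_LEVEL = 50.0


def transfer_summary(row):
    """Return (mean accuracy, worst-case margin over chance) for one model."""
    values = list(row.values())
    mean_accuracy = sum(values) / len(values)
    worst_margin = min(values) - CHANCE_LEVEL
    return mean_accuracy, worst_margin


if __name__ == "__main__":
    for trained_on, row in TABLE_7.items():
        mean_accuracy, worst_margin = transfer_summary(row)
        print(f"model {trained_on}: mean {mean_accuracy:.2f}%, "
              f"worst case {worst_margin:+.2f} points over chance")
```

With these numbers, the model trained on dataset 01 keeps a mean accuracy above 97% on the other datasets, while the model trained on dataset 03 stays within about 8 points of chance on all of them.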
Table 8.
Limits of the Signed OSNR Error as Well as the 0% and 5% OSNR Error Tolerance Margins of the ANNs Trained on the Four QoT Datasets, Evaluated on Each Respective Complete Dataset Considering All Samples

| Model Trained on Dataset | OSNR Error Limits [dB] | OSNR Error Margin [dB], 0% | OSNR Error Margin [dB], 5% |
|---|---|---|---|
| 01 | | 1.16 | 0.04 |
| 02 | | 0.17 | 0.00 |
| 03 | | 0.41 | 0.04 |
| 04 | | 0.47 | −0.01 |
Table 1.
Key Information about Network Simulation Scenarios of Studies Recently Published on ML-based QoT Estimation
Note: $B_{\Delta}$ is the channel spacing. ${L_s}$ is the span length. ${N_{{\rm paths}}}$ is the number of shortest paths considered between the endpoints of a connectivity service. ${N_{{\rm MF}}}$ is the number of considered modulation formats (MFs). SA is the spectrum allocation. ${R_{\rm s}}$ is the symbol rate.