Initialized autoencoders to find constellations robust to residual laser phase noise

Amir Omidi; Xun Guan; Ming Zeng; Leslie A. Rusch

doi:10.1364/OPTCON.480399

1. Introduction

Telecommunications is the foundation of our information society. Ubiquitous connectivity requires systems with high spectral and energy efficiency. Constellation shaping is a promising solution to achieve higher throughput and lower power consumption. Existing shaping methods fall into two categories: geometric and probabilistic. We focus on geometric shaping.

Geometric constellation shaping optimizes the locations of the constellation points in the complex plane to achieve higher spectral and energy efficiency [1]. For additive white Gaussian noise (AWGN), the optimal constellation maximizes the Euclidean distance among neighboring points for unit power. The constellation also requires a bit-to-symbol mapping that minimizes the Hamming distance between the bit patterns assigned to neighboring points. This optimization is challenging, since it involves both non-linear constellation shaping and bit-to-symbol mapping. Indeed, the two parts are often coupled, forming a mixed-integer non-linear optimization, which is often non-deterministic polynomial-time hard (NP-hard) [2].

To handle this challenging problem, different approaches have been proposed, including manual choice of constellation points [3–5], and heuristic and meta-heuristic optimization algorithms [6–10]. The performance of these conventional methods is often limited, and they lack the flexibility to be applied to non-AWGN channels. To overcome these drawbacks, end-to-end deep learning with an autoencoder (AE) has been exploited to optimize geometric constellations [1,11–14]. In an AE, the input is first mapped to a compressed form (called the code) and then reconstructed at the output. Geometric constellations from AEs have achieved higher shaping gain than those from conventional methods for AWGN channels [1,11] and nonlinear fiber optic channels [12,13,15,16].

Phase noise from lower-cost lasers with large linewidth (LW) makes phase recovery difficult [17,18]. While several methods exist for phase recovery [19,20], phase noise (PN) is never completely eliminated, and residual PN is especially detrimental to high-order modulation formats [10]. Residual phase noise from pilot-assisted phase recovery was studied in [21] for AE-based geometric constellations. Blind phase recovery was used in [14,22], with robustness to variations in signal-to-noise ratio (SNR) and laser linewidth addressed in [23].

Stochastic gradient approaches have been used for constellation shaping with pilot-assisted phase recovery receivers [7,10]. We compare the performance of our solution to constellations found with both stochastic gradient and AE approaches, whereas [14,16,22] limit their comparisons to standard square QAM.

We consider the residual phase noise when using pilot-assisted recovery. We show that initializing the neural networks of the encoder and decoder leads to greater gain in AE constellations for 16QAM and 64QAM. Initialization of a neural network was used in [24] for constellation design under phase noise induced by nonlinear fiber propagation (not laser phase noise). That initialization was not only used for a different channel, but was also restricted to the transmitter side. We show via simulation and experiment that the proposed approach always outperforms existing geometric constellation solutions under the various PN levels considered. The greater the residual PN, the greater the improvement.

We use an ad hoc approach that derives bit-to-symbol mappings from our constellations found with initialization, and we examine the gap between the bit error rate (BER) achieved with these ad hoc mappings and the ideal BER. For this reason we include BER results, whereas [14,16,22] consider only the symbol error rate (SER) and/or mutual information (MI).

The AE approach to constellation design yields, as a byproduct, an optimized detector, i.e., the AE decoder. Performance is typically evaluated with the classic maximum likelihood detector that neglects phase noise (also referred to as the mismatched Gaussian receiver). We quantify performance with both the AE detector and the maximum likelihood AWGN detector. For the blind phase search study in [14], the 64QAM AE decoder was briefly examined for MI, and little gain was observed in simulation. In our case, we see gain in simulation, as well as experimentally (0.8 dB) for 16QAM.

Preliminary results of this paper were previously reported in [1]. The results in [1] were limited to 16QAM, while in this paper we address 64QAM as well. Consideration of 64QAM is important, as larger constellations are much more sensitive to phase errors. Only in this paper do we examine the bit-to-symbol mapping and contrast AE decoder detection with maximum likelihood detection. To the best of our knowledge, our previous paper [21] is the first work that validates the performance of AE-based constellation shaping through rigorous experiments. In this manuscript we have added bit error rate calculations to the experimental results.

This paper is organized as follows. Section 2 presents the communication system model and phase estimation method. Section 3 presents our proposed AE initialization technique, including the constellations chosen for that initialization. Section 4 presents the simulated performance in terms of SER, BER and generalized mutual information (GMI). We separately address the performance of a sub-optimal version of the maximum likelihood (ML) receiver and of the AE decoder. Section 5 presents the experimental validation of our work, while Section 6 concludes the paper.

2. Principal concepts

2.1 Communication system model

The coherent communication system model with pilot-based carrier phase estimation (CPE) is presented in Fig. 1. Symbols to be transmitted are structured in frames that include pilot symbols, which are known at the receiver and are used to estimate the carrier phase. We consider a 4.5% pilot rate, where the pilots are chosen from quadrature phase shift keying (QPSK) symbols and inserted after every 32 data symbols.

Fig. 1. Baseband equivalent, sampled channel model for a system with PN and additive white Gaussian noise.

We consider a system corrupted by both AWGN and a drifting carrier phase. Let $x_k \in \boldsymbol{X} = \{s_1, s_2, \ldots, s_m, \ldots, s_M\}$ be the $k^{th}$ transmitted symbol, selected from an $M$-ary complex alphabet. The received signal at time slot $k$, impaired by PN and AWGN, can be written as

$$r_k = x_k e^{j\theta_k} + n_k, \tag{1}$$

where $\theta _k$ is the sampled phase noise and $n_k$ is AWGN with zero mean and variance $\sigma _{N0}^2$. The laser PN is modelled by a Wiener process (Gaussian random walk) [25] per

$$\Theta(t) = \Theta(0) + \int_{0}^{t} W(\tau)\,d\tau, \tag{2}$$

where $\Theta(0)$ is uniformly distributed in $[-\pi, \pi)$ and $W(t)$ is a real, zero-mean white Gaussian process. Laser PN is thus a Markov process, and the sampled phase $\theta_k$ follows

$$\theta_k = \theta_{k-1} + \Delta\theta_k, \tag{3}$$

where $\Delta\theta_k$ is a zero-mean Gaussian random variable with variance $\sigma_{PN}^2$ [17]. The variance is related to the laser LW $\Delta\nu$ and the symbol rate via

$$\sigma^2_{PN} = 2\pi \Delta\nu T_s, \tag{4}$$

where $T_s$ is the symbol duration [25].
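
As a minimal illustration of Eqs. (1)–(4), the Python sketch below computes the per-symbol phase-noise variance, draws a Wiener phase trajectory, and passes unit-power symbols through the PN plus AWGN channel. It uses only the standard library; the function names (`pn_variance`, `wiener_phase`, `pn_awgn_channel`) are hypothetical, and it assumes the SNR is referenced to unit symbol power, as in Section 2.2.

```python
import cmath
import math
import random


def pn_variance(linewidth_hz, symbol_rate_baud):
    """Per-symbol phase-increment variance of Eq. (4): 2*pi*dv*Ts, in rad^2."""
    return 2 * math.pi * linewidth_hz / symbol_rate_baud


def wiener_phase(n_symbols, sigma2_pn, rng=random):
    """Sampled laser phase of Eq. (3); theta_0 is drawn uniformly in [-pi, pi]."""
    theta = rng.uniform(-math.pi, math.pi)
    phases = []
    for _ in range(n_symbols):
        phases.append(theta)
        theta += rng.gauss(0.0, math.sqrt(sigma2_pn))  # Gaussian increment
    return phases


def pn_awgn_channel(symbols, snr_db, sigma2_pn, rng=random):
    """Eq. (1): r_k = x_k * exp(j*theta_k) + n_k, for unit-power symbols x_k."""
    n0 = 10 ** (-snr_db / 10)   # total AWGN variance, referenced to unit power
    std = math.sqrt(n0 / 2)     # standard deviation per quadrature
    thetas = wiener_phase(len(symbols), sigma2_pn, rng)
    received = [x * cmath.exp(1j * th) + complex(rng.gauss(0, std), rng.gauss(0, std))
                for x, th in zip(symbols, thetas)]
    return received, thetas


# Net linewidth of 4 MHz at 60 Gbaud (the main case in Section 4).
print(pn_variance(4e6, 60e9))  # about 4.19e-4 rad^2
```

For the 4 MHz, 60 Gbaud case this gives roughly $4.2\times10^{-4}$ rad$^2$ of phase variance per symbol.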

We examine CPE with pilot transmission. As mentioned, pilot symbols are inserted into the transmitted stream, and frames are structured at the transmitter. At the receiver, the received pilots are extracted from the data stream and compared with the known pilot symbols to determine the phase rotation at each pilot. Linear interpolation then estimates the phase rotation for the symbols received between each pair of pilots. A detailed explanation of the phase recovery algorithm is provided in Appendix A. The channel after CPE is generally known in the literature as a partially coherent AWGN (PC-AWGN) channel [26–28].
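
The pilot-aided estimator described above can be sketched as follows. This is an illustrative version, not necessarily the exact algorithm of Appendix A: it assumes pilots at both ends of each block, estimates the phase at each pilot as the angle of the received pilot multiplied by the conjugate of the known pilot, removes $2\pi$ jumps between consecutive estimates (an assumed unwrapping step), and interpolates linearly in between. `pilot_cpe` is a hypothetical name.

```python
import cmath
import math


def pilot_cpe(received, pilot_positions, pilot_symbols):
    """Estimate phase at pilots, interpolate linearly, and derotate.

    pilot_positions must be sorted and span the block (first and last
    indices are pilots), so every data symbol lies between two pilots.
    """
    # Phase of each received pilot relative to its known QPSK value.
    est = [cmath.phase(received[p] * s.conjugate())
           for p, s in zip(pilot_positions, pilot_symbols)]
    # Assumed step: remove 2*pi jumps between consecutive pilot estimates.
    for i in range(1, len(est)):
        while est[i] - est[i - 1] > math.pi:
            est[i] -= 2 * math.pi
        while est[i] - est[i - 1] < -math.pi:
            est[i] += 2 * math.pi
    # Linear interpolation of the phase between each pair of pilots.
    phase = [0.0] * len(received)
    for (p0, e0), (p1, e1) in zip(zip(pilot_positions, est),
                                  zip(pilot_positions[1:], est[1:])):
        for k in range(p0, p1 + 1):
            phase[k] = e0 + (e1 - e0) * (k - p0) / (p1 - p0)
    derotated = [r * cmath.exp(-1j * ph) for r, ph in zip(received, phase)]
    return derotated, phase
```

Linear interpolation is exact when the phase drifts linearly between pilots; with a Wiener phase, the residual error between pilots is what the constellation design must tolerate.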

2.2 End-to-end learning system for constellation optimization

Figure 2 presents the overall architecture of the end-to-end communication system for constellation optimization in the presence of laser PN and AWGN. The initializer presets the AE to generate a constellation with known good properties.

Fig. 2. Architecture of end-to-end learning system for constellation optimization with the AE block at the top and the initializer of the encoder and decoder at the bottom.

As introduced in [29], an AE is an unsupervised deep learning technique that consists of two components, namely the encoder and decoder. The encoder maps the input to a compressed form called the code, while the decoder maps the code to the output, i.e., a reconstruction of the input. In our application, the input is one of the $M$ symbols to be transmitted, the code is the I/Q coordinates assigned to that symbol, and the output is the detected received signal, i.e., the estimated symbol. We use a feed forward neural network (FFNN) for the encoder and decoder: the PC-AWGN channel includes only laser phase noise and additive Gaussian noise, with no bandwidth restriction that might motivate a recurrent neural network. Moreover, an FFNN has lower complexity and is much easier to implement than a recurrent neural network. The communication system takes the output of the encoder and applies framing/pilot insertion, channel effects, and CPE.

Data symbols are represented in one-hot form, i.e., a vector defined by

$$v_{i,j} = \begin{cases} 1 & \textrm{if } i = j, \\ 0 & \textrm{otherwise,} \end{cases} \qquad j = 0, \ldots, M-1, \tag{5}$$

where $v_{i,j}$ denotes the $j^{th}$ entry of the vector $\vec{v_i}$ (the one-hot form of the $i^{th}$ symbol) and $M$ is the number of symbols. Once the one-hot inputs are mapped to I/Q coordinates by the encoder, we normalize the constellation to unit average power. The channel introduces AWGN referenced to that unit power.
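
A short sketch of the one-hot input of Eq. (5) and the unit-power normalization applied to the encoder output. The helper names are hypothetical, and square 16QAM is used only as a familiar test case.

```python
import math


def one_hot(i, M):
    """Eq. (5): length-M vector with a 1 at position i and 0 elsewhere."""
    return [1.0 if j == i else 0.0 for j in range(M)]


def normalize_power(points):
    """Scale complex I/Q points so that the average symbol energy is 1."""
    avg_power = sum(abs(z) ** 2 for z in points) / len(points)
    return [z / math.sqrt(avg_power) for z in points]


# Unnormalized square 16QAM with levels {-3, -1, 1, 3} has average power 10.
qam16 = [complex(a, b) for a in (-3, -1, 1, 3) for b in (-3, -1, 1, 3)]
unit = normalize_power(qam16)
```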

The TX framing adds pilot symbols as described previously. The channel adds PN and AWGN to the signal. In the RX carrier phase estimation, the phase rotation of the received signal is estimated using the pilot-assisted CPE described in Appendix A. The phase-recovered data in I/Q form are fed into the decoder layers. The output of the decoder is a vector of real-valued (positive and negative) scores, which can be transformed into a one-hot vector, i.e., a symbol decision.

We convert the output scores into symbol probabilities using the following softmax function

$$S(r_i) = \frac{e^{r_i}}{\sum_{j=0}^{M-1} e^{r_j}}, \tag{6}$$

where $r_i$ is the $i^{\textrm{th}}$ entry of the scores vector $\vec {r}$.

We use the cross entropy loss function (also called the cost function) defined as

$$l(\vec{v}, \vec{r}) = -\sum_{i=0}^{M-1} v_i \log S(r_i), \tag{7}$$

where $\vec{v}$ is the input one-hot vector. As shown in [30], minimizing the binary cross entropy loss is equivalent to maximizing the GMI. Following the same logic, minimizing the (non-binary) cross entropy loss is equivalent to maximizing the MI. When we examined the binary cross entropy loss, the constellations we obtained were of poor quality, even with initialization. Accordingly, we use one-hot symbols for optimization.
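
Eqs. (6) and (7) can be evaluated as below. This sketch (hypothetical helper names) computes the log-softmax with the usual max-subtraction trick, which leaves $S(r_i)$ unchanged but avoids overflow for large scores.

```python
import math


def log_softmax(scores):
    """log S(r_i) from Eq. (6), computed stably by subtracting max(scores)."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(r - m) for r in scores))
    return [r - log_z for r in scores]


def softmax(scores):
    """Eq. (6): symbol probabilities from decoder scores."""
    return [math.exp(v) for v in log_softmax(scores)]


def cross_entropy(one_hot_vec, scores):
    """Eq. (7): -sum_i v_i log S(r_i), using the natural logarithm."""
    return -sum(v * ls for v, ls in zip(one_hot_vec, log_softmax(scores)))


scores = [2.0, -1.0, 0.5, 0.0]
print(softmax(scores))                      # probabilities summing to 1
print(cross_entropy([1, 0, 0, 0], scores))  # small loss: symbol 0 scores highest
```

For equiprobable symbols, $\log_2 M$ minus the average loss (converted to bits by dividing by $\ln 2$) is a lower bound on the MI, which is one way to see why minimizing this loss pushes the MI up.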

Random data are used in all stages of this work, including training of the AE model and generation of both simulation and experimental results. We use full-batch gradient descent during training (i.e., each epoch uses a single batch containing all randomly generated samples). Training is not limited to a specific number of epochs. At each training step, we compute the average movement of the normalized constellation points relative to the previous step; training continues as long as this movement exceeds a manually set threshold ($10^{-4}$ here). We implement the AE in Python, using PyTorch as our deep learning framework. Training hyper-parameters of the AE are presented in Table 1.

Table 1. Hyper-parameters of the AE

2.3 Bit-to-symbol mapping

In communication systems, there are two main error criteria, the symbol error rate (SER) and the bit error rate (BER). The two are related by the following inequality

$$\frac{SER}{BER} \leq \log_2 M. \tag{8}$$

A Gray code for the bit-to-symbol mapping is one that approaches the BER lower bound [31] at high SNR, where symbol errors occur almost exclusively between nearest neighbors that differ in a single bit. In this ideal case the ratio SER/BER equals the number of bits per symbol. We focus on optimizing the SER by adopting one-hot symbols at the AE input. We do not optimize the bit-to-symbol mapping, but we do examine easily implemented ad hoc mappings and their achievable BER.
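
To make the SER/BER bound concrete: at high SNR nearly all symbol errors land on a nearest neighbor, so if every nearest-neighbor pair differs in exactly one bit, each symbol error costs one bit error and SER/BER approaches $\log_2 M$. The sketch below (hypothetical helpers) builds square QAM with a binary-reflected Gray label on each axis, one common way to obtain a Gray mapping for SQAM, and checks the average number of bit flips between nearest neighbors.

```python
import math
from itertools import combinations


def gray(n):
    """Binary-reflected Gray code of the integer n."""
    return n ^ (n >> 1)


def square_qam_gray(M):
    """Square M-QAM (M = 4, 16, 64, ...) as {integer label: complex point}."""
    side = math.isqrt(M)
    bits_per_axis = side.bit_length() - 1
    levels = [2 * i - (side - 1) for i in range(side)]
    return {(gray(i) << bits_per_axis) | gray(q): complex(levels[i], levels[q])
            for i in range(side) for q in range(side)}


def mean_neighbour_bit_flips(const):
    """Average Hamming distance between labels of minimum-distance pairs."""
    pairs = list(combinations(const.items(), 2))
    d_min = min(abs(a[1] - b[1]) for a, b in pairs)
    flips = [bin(a[0] ^ b[0]).count("1")
             for a, b in pairs if abs(a[1] - b[1]) < d_min + 1e-9]
    return sum(flips) / len(flips)


for M in (16, 64):
    # prints: 16 1.0 4.0, then 64 1.0 6.0
    print(M, mean_neighbour_bit_flips(square_qam_gray(M)), math.log2(M))
```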

Generally, there exist two strategies for optimizing the bit-mapping. One is to first optimize the SER performance and then optimize the bit-mapping for the obtained constellation. The other is to use binary sequences at the input rather than one-hot vectors [13,32]. As discussed in [1], the shaping quality degrades with this approach, hence we turn to our initialization to address this challenge.

We adopt an initialization constellation with an established mapping, a Gray code if available. We preserve this bit-to-symbol mapping in our final constellations, although it will not be the optimal mapping for the final, optimized constellation. We will show via simulation that this approach is near-optimal, with a very small gap between the achieved BER and the ideal BER given by equality in (8).

3. Initialization of AE

The constellation shaping problem is non-convex, hence performance varies greatly with the initial state of the training model. Neural network models typically use distinct weight initializations for different types of NN layers. With random weight initialization and random training data, the final result is not unique. When we animate the movement of constellation points during each training process, we observe variations in the resulting constellations, even with similar hyper-parameters. We also investigated more tailored blind initialization approaches (e.g., Xavier or Kaiming weight initialization), but the results did not improve much.

Deterministic initialization of the weights is more promising. Starting from a sub-optimal solution, we can streamline the search during training and reach a better optimum. We adopt an initialization of the AE weights that mimics an existing sub-optimal constellation with good performance against residual phase noise.

We create a feed forward neural network with the same structure as the AE encoder and train the network to produce I/Q coordinates of the adopted initialization constellation. Likewise, we create a second feed forward neural network with the same structure as the AE decoder and train this network to produce one-hot vectors following an ideal, noise-free channel. The weights found during this process are used to initialize the AE encoder and decoder which will next be trained on a PN and AWGN channel.
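
The encoder pre-training step can be sketched as a regression of the encoder outputs onto the initialization constellation. To keep the example checkable, this sketch makes a strong simplifying assumption: the encoder is a single bias-free linear layer, so a one-hot input simply selects one row of the weight matrix, whereas the actual encoder is the FFNN described in Section 2.2. The decoder would be pre-trained analogously, with the cross entropy loss on noise-free points. `pretrain_linear_encoder` and the 8PSK targets are hypothetical.

```python
import cmath
import math
import random


def pretrain_linear_encoder(targets, lr=0.5, epochs=2000, seed=0):
    """Full-batch gradient descent on the MSE between encoder outputs and targets.

    Hypothetical simplification: the encoder is a single bias-free linear
    layer W (M rows x 2 columns), so one-hot input i selects row W[i].
    """
    rng = random.Random(seed)
    M = len(targets)
    W = [[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(M)]  # random start
    for _ in range(epochs):
        for i, t in enumerate(targets):
            # Gradient of (1/M) * sum_i |W[i] - t_i|^2 touches only row i.
            W[i][0] -= lr * 2 * (W[i][0] - t.real) / M
            W[i][1] -= lr * 2 * (W[i][1] - t.imag) / M
    return W


targets = [cmath.exp(2j * math.pi * m / 8) for m in range(8)]
W = pretrain_linear_encoder(targets)
print(max(abs(complex(*row) - t) for row, t in zip(W, targets)))  # close to 0
```

With the real multi-layer encoder the same loop applies, but the fit is no longer exact row-by-row, which is why the pre-trained weights are a starting point rather than a closed-form assignment.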

To find the best initialization constellation, we tested several constellations. Square QAM was the obvious choice, as it has the simplest structure. The hexagonal QAM constellation has an optimal minimum distance. The amplitude phase shift keying (APSK) constellation has an interesting circular symmetry and can be parameterized by its number of rings (amplitude levels) and the number of points on each ring (e.g., 16APSK of 2/6/8, 8/8, 4/8/4, etc.). There have also been constellations developed specifically for phase noise channels [10]. Among the constellations we tested, APSK provided the best performance. When initializing with the constellations in [10], performance was slightly worse than with APSK. The other constellations, e.g., square QAM and irregular hexagonal QAM (I-HQAM), showed about an 80% decrease in performance. The shared circular geometry of APSK constellations and of received noisy, phase-rotated symbols could be the reason why this format is effective for initialization. The 16-ary and 64-ary APSK constellations [33] used as AE initializers are presented in Fig. 3(a) and 3(b), respectively. We also indicate the Gray bit-to-symbol mapping used with these constellations; the decimal equivalent of the assigned bit sequence is noted adjacent to each symbol.
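
A unit-power APSK constellation can be generated from its ring sizes and radii as sketched below. The 4/12 ring split, the radius ratio of 2.7 and the ring phase offsets are illustrative assumptions, not the parameters of the 16APSK and 64APSK constellations from [33] used in this work; `apsk` is a hypothetical helper.

```python
import cmath
import math


def apsk(ring_sizes, ring_radii, ring_offsets=None):
    """Unit-power APSK: ring_sizes[r] points equally spaced on radius ring_radii[r]."""
    if ring_offsets is None:
        ring_offsets = [0.0] * len(ring_sizes)
    points = [r * cmath.exp(1j * (2 * math.pi * m / n + off))
              for n, r, off in zip(ring_sizes, ring_radii, ring_offsets)
              for m in range(n)]
    scale = math.sqrt(sum(abs(p) ** 2 for p in points) / len(points))
    return [p / scale for p in points]


# Hypothetical 4/12 16APSK with outer/inner radius ratio 2.7 (illustrative values).
const16 = apsk([4, 12], [1.0, 2.7], [math.pi / 4, math.pi / 12])
```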

Fig. 3. APSK constellations for AE initialization with the Gray mapping for (a) 16APSK, and (b) 64APSK.

Not only does initialization improve the geometric gain by lowering the SER, it can also improve BER by providing a reasonable bit-to-symbol mapping, i.e., a quasi-Gray mapping. There is no definitive method in the literature to find a Gray mapping for non-symmetric constellations such as those generated with AEs. Some techniques, such as the binary switching algorithm (BSA) [34], give a quasi-Gray mapping. In this paper we show that if we initialize the model with a Gray-mapped constellation, the final constellation will also have a quasi-Gray mapping.

4. Simulation results

We simulate joint optimization of geometric constellations and detection with an initialized AE in the presence of AWGN and laser PN. We examine 16QAM and 64QAM and sweep the AWGN level. We assume 60 Gbaud operation and a 4 MHz linewidth in the body of the paper; results for two other phase noise levels are included in Appendix B.

To illustrate the advantage of initialization, we first train with random weight initialization and refer to the resulting constellation as AE. Typical AE constellations for 16QAM and 64QAM are given in Fig. 4(a) and 4(b), respectively. The constellation we find by initializing with the APSK constellations in Fig. 3 is referred to as I-AE. Typical I-AE constellations for 16QAM and 64QAM are given in Fig. 4(c) and 4(d), respectively. All constellations are obtained for a 4 MHz LW at the highest SNR examined (i.e., 18 dB for 16QAM and 24 dB for 64QAM). The constellations have limited symmetry, and the outer symbols tend to be spaced farther apart.

Fig. 4. For LW 4 MHz, AE constellations at (a) 16QAM, 18 dB SNR, and (b) 64QAM, $24$ dB SNR; and I-AE constellations at (c) 16QAM, 18 dB SNR, and (d) 64QAM, $24$ dB SNR.

We also compare performance against square QAM (SQAM) and two PN-robust constellations found via different gradient descent approaches. The constellation from [10] (which we refer to as SG1) optimized GMI for a specific SNR and PN level. The constellation from [7] (which we refer to as SG2) optimized SER for a specific SNR and a fixed phase offset. For easy comparison with previously published results [10], we adopt the PN parameters in Table 2 and a $4.5\%$ pilot rate. The phase noise variance is calculated assuming 60 Gbaud operation at three net laser LWs (accounting for both transmitter and receiver lasers).

Table 2. PN levels considered for the $4.5\%$ pilot rate

We compare performance using several metrics: SER, BER, and GMI. In subsection 4.1, we adopt the classic maximum likelihood receiver: the noisy symbol is compared to every reference symbol in turn, and the closest reference symbol is selected as the output decision. This ML receiver is optimal for an AWGN channel, but not for the PN channel. Both AE and non-AE constellations are compared with this receiver, which is the receiver typically used in [7,10,23]. In subsection 4.2, we compare the performance of the two AE solutions using the decoder found during AE constellation design: the noisy I/Q coordinates at the receiver are fed to the decoder, which produces a symbol decision.
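
The ML(AWGN) decision rule amounts to a minimum-Euclidean-distance search, sketched below with hypothetical helper names. It ignores phase noise entirely, which is why it is suboptimal on the PN channel.

```python
def ml_awgn_detect(r, constellation):
    """Index of the reference point closest to r in Euclidean distance."""
    return min(range(len(constellation)), key=lambda m: abs(r - constellation[m]))


def symbol_error_rate(tx_idx, received, constellation):
    """Fraction of received samples whose ML(AWGN) decision differs from tx."""
    errors = sum(ml_awgn_detect(r, constellation) != t
                 for t, r in zip(tx_idx, received))
    return errors / len(tx_idx)


qpsk = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
print(ml_awgn_detect(0.9 + 1.2j, qpsk))  # 0
```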

4.1 Initialized AE performance with classic ML(AWGN) detection

We estimate performance via Monte Carlo simulations for 16-ary and 64-ary SQAM, SG1 and our AE constellations with (I-AE) and without (AE) initialization. No 64QAM was found in [7], so we only include SG2 for the case of 16QAM. We use the standard ML detector for an AWGN channel, that is, the symbol at the closest Euclidean distance is used as a symbol decision. We refer to this as ML(AWGN) detection.

The SER performance is presented in Fig. 5(a). For 64QAM, the AE constellations outperform the other constellations. Comparing two AE generated constellations, the initialized I-AE constellation outperforms AE at all SNRs, and the gain goes up to 0.8 dB. For 16QAM, the two AE constellations achieve performance close to SG2.

Fig. 5. Performance of 16QAM and 64QAM at 4 MHz, 60 Gbaud, with ML(AWGN) detection for (a) SER, (c) BER, (b,d) GMI. Annotations in (c) indicate average ratio SER/BER across SNRs.

The bit-to-symbol mapping determines if gains in SER transfer to BER. An ideal Gray mapping exists [33] for SQAM. The SG1 constellations have a mapping found using binary label switching and it is nearly ideal. No mapping was provided for the 2/6/8 constellation or the AE constellation without initialization; we used ad hoc mappings found manually.

For initialized AE constellations, we use the APSK Gray mapping even after the symbol points have been shifted by the AE operation. Figure 3 indicates a decimal equivalent of the bit sequence assigned to the adjacent symbol for 16-ary and 64-ary APSK. After geometric shaping it is no longer a Gray mapping.

We present the BER performance in Fig. 5(c). The annotations next to each curve show the ratio SER/BER averaged over SNRs, which quantifies the quality of the bit-to-symbol mapping: the higher the ratio, the better the mapping. For an ideal bit-to-symbol mapping, SER/BER would equal the number of bits per symbol, that is, 4 for 16QAM and 6 for 64QAM. We see similar performance trends as for SER. The exception is the non-initialized AE, which suffers severe performance degradation due to the lack of an effective mapping. Both SQAM constellations have nearly ideal mappings (4 and 6), and SG1 is close to ideal, with values of 3.75 and 5.73 for 16QAM and 64QAM, respectively. Nonetheless, the best performance remains with I-AE, although the gain is somewhat reduced.
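
The SER/BER annotation can be computed from symbol decisions and an integer bit labeling as in the sketch below (hypothetical helper; the labels play the role of the decimal equivalents of the assigned bit sequences, as in Fig. 3). The ratio is left undefined (NaN) when no bit errors occur.

```python
import math


def ser_ber(tx_idx, rx_idx, labels, bits_per_symbol):
    """SER, BER and SER/BER for decided symbol indices and an integer labeling."""
    n = len(tx_idx)
    sym_err = sum(t != r for t, r in zip(tx_idx, rx_idx))
    bit_err = sum(bin(labels[t] ^ labels[r]).count("1")
                  for t, r in zip(tx_idx, rx_idx))
    ser = sym_err / n
    ber = bit_err / (n * bits_per_symbol)
    ratio = ser / ber if bit_err else math.nan
    return ser, ber, ratio


# QPSK with Gray labels: a neighbour error costs one bit, so SER/BER = log2(4).
print(ser_ber([0, 1, 2, 3], [0, 1, 2, 2], [0, 1, 3, 2], 2))  # (0.25, 0.125, 2.0)
```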

The ratio of SER to BER is plotted against SNR in Fig. 6. Although the square and SG1 constellations have higher BER than I-AE, their mappings are excellent, with ratios close to the upper bound. The better mapping of the initialized AE contributes to its greater advantage in BER over the uninitialized AE constellation. The I-AE has room for further improvement if a Gray code could be found.

Fig. 6. SER/BER vs. SNR with ML(AWGN) detector for various constellations; upper bound is $\log _2M$.

As shown in Fig. 5(b) and 5(d), the final metric we examine is GMI. The GMI for 16QAM and 64QAM is bounded by 4 and 6 bits/symbol, respectively. The GMI increases with SNR, closely approaching these bounds at high SNR. As with BER in Fig. 5(c), I-AE achieves the highest GMI for both 16QAM and 64QAM, which again validates its effectiveness. Without proper initialization, the performance of AE can be even lower than that of SQAM at relatively low SNR.
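
The text does not specify how GMI is estimated; one standard Monte Carlo estimator, assumed here, computes bit-wise log-likelihood ratios $\lambda$ with a Gaussian (AWGN-matched) metric and subtracts the average of $\log_2(1+e^{\pm\lambda})$ over transmitted bits from $\log_2 M$. The sketch below uses hypothetical names; `n0` is the complex noise variance assumed by the metric.

```python
import math


def _logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def _log2_1p_exp(s):
    """log2(1 + e^s), computed without overflow."""
    return (max(s, 0.0) + math.log1p(math.exp(-abs(s)))) / math.log(2)


def gmi_awgn_metric(tx_idx, received, constellation, labels, n0):
    """Monte Carlo GMI (bits/symbol) with bit LLRs from a Gaussian metric.

    labels[m] is the integer bit label of constellation[m]; n0 is the
    complex noise variance assumed by the (mismatched) AWGN metric.
    """
    m_bits = int(round(math.log2(len(constellation))))
    loss = 0.0
    for t, y in zip(tx_idx, received):
        metric = [-abs(y - x) ** 2 / n0 for x in constellation]
        for i in range(m_bits):
            ones = [q for q, lab in zip(metric, labels) if (lab >> i) & 1]
            zeros = [q for q, lab in zip(metric, labels) if not (lab >> i) & 1]
            llr = _logsumexp(ones) - _logsumexp(zeros)  # log P(b=1)/P(b=0)
            bit = (labels[t] >> i) & 1
            loss += _log2_1p_exp(-llr if bit else llr)
    return m_bits - loss / len(tx_idx)
```

Because the metric ignores phase noise, this estimates an achievable rate for the ML(AWGN)-style bit-wise receiver; it is bounded above by $\log_2 M$, consistent with the 4 and 6 bits/symbol limits above.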

We present the 4 MHz LW results in the main body of the paper. In general, larger LW means greater residual PN and therefore more potential gain. The I-AE capitalizes on this potential. In Appendix B, we include GMI results for LWs of 1 MHz and 100 kHz.

4.2 Performance with enhanced detection

For the results in the previous section, we used the classic ML(AWGN) detector based on Euclidean distance, focusing on the performance improvement from constellation optimization alone. However, the decoder produced during AE constellation design can also be used as a detector. Another enhanced detector is the maximum likelihood detector for the partially coherent phase noise channel, ML(PC-AWGN), described in [28].

4.2.1 Initialized vs. uninitialized constellations

We first examine constellations designed with machine learning, with and without initialization. We simulate the 4 MHz LW, 60 Gbaud case for 16QAM and 64QAM, and report SER, BER, and GMI in Fig. 7. Results with the AE decoder have filled markers, while for easy comparison the unfilled markers repeat the ML(AWGN) detector results from Fig. 5. The square markers indicate performance with our initialization technique, while diamond markers are for the uninitialized approach.

Fig. 7. Performance of 16QAM and 64QAM at 4 MHz, 60 Gbaud, with AE detection for (a) SER, (c) BER, (b,d) GMI. Annotations in (c) are SER/BER averaged across SNRs.

There is clear improvement in SER in Fig. 7(a) for 64QAM (about 0.8 dB) when using the AE decoder as a detector, for both AE and I-AE (initialized). That is, the filled markers are always to the left of the unfilled markers. For 16QAM there is visible SER improvement, although much less than for 64QAM. In all cases, the initialized approach yields the best performance, with similar margin. That is, the decoder works well for detection for constellations found with or without initialization. However, initialization leads to constellations with the best performance.

The BER results in Fig. 7(c) show the importance of the symbol mapping above all. We see greater differentiation at 16QAM. The AE (uninitialized) approach gets no advantage with the better decoder as the bit-to-symbol mapping is too poor. For 64QAM, the improvement between ML and AE decoders is present for both AE and I-AE. However, the initialized approach pulls away with better performance.

Finally, from GMI in Fig. 7(b) and 7(d) we see a clear progression in performance for both 16QAM and 64QAM. The initialized constellation outperforms the AE, no matter which detector is used. The AE detector is always superior to the ML detector. The combination of initialization and the AE decoder yields a substantial gain between the uninitialized constellation with the standard receiver (unfilled diamonds) and the initialized, end-to-end solution using the AE decoder (filled squares). Comparing diamond and square markers, it is clear that the effect of initialization is more prominent than that of the decoder (filled vs. unfilled). Lastly, the GMI of all schemes approaches the upper bound as the SNR grows, which indicates that the phase noise is well compensated.

4.2.2 Autoencoder vs. ML(PC-AWGN) approaches

The stochastic gradient algorithm used to find the SG1 constellations with an ML(AWGN) detector was updated in [27] to produce SG1* constellations, which were optimized for use with ML(PC-AWGN) detection. We implemented the ML(PC-AWGN) decoder described in [28]. The SG1* constellations found in [27] were downloaded from the file repository for that publication. In Fig. 8, at each SNR point we used the SG1* and AE constellations and the decoder optimized for that SNR. The ML(PC-AWGN) detector used the known SNR and residual phase noise variance.

Fig. 8. BER performance for 16QAM (dashed) and 64QAM (solid) for: the SG1 constellation (circles), the PC-AWGN channel (triangles), and the I-AE constellation (squares). Markers are empty when using ML(AWGN).

We can see in Fig. 8 that, for 64QAM, the AE outperforms the SG1* constellation (found with the PC-AWGN maximum likelihood equation) when the SNR does not exceed 24 dB. For 16QAM, the AE shows significant improvement at all SNR values examined. Fig. 2 of [28] shows that the gain of ML(PC-AWGN) over ML(AWGN) varies with both the phase noise level (alpha) and the modulation order. The training in the AE seems more robust to variations in modulation order.

4.2.3 Complexity considerations

Implementing a shaped constellation requires little additional complexity relative to standard constellations. Once the encoder is trained, the symbols are sent once through the encoder and the constellation points are recorded. All shaped constellations (NN-based or otherwise) can be implemented with a simple look-up table.

The decoder has greater complexity than an ML(AWGN) detector using Euclidean distance. The decoder neural network maps the two I/Q components to 64 hidden values and then to $M$ outputs (modulation orders of 16 and 64 were examined), i.e., $2 \rightarrow 64 \rightarrow M$. It requires $2\times 64 + 64 \times M$ multiplications. A SiLU function is applied to the 64 hidden values and a softmax function to the $M$ outputs to generate the detection result. The ML(PC-AWGN) detector also has greater complexity than ML(AWGN), as discussed in [27].
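
The decoder forward pass and the multiplication count above can be sketched as follows. The weights here are random placeholders rather than trained values, the helper names are hypothetical, and biases and activation costs are excluded from the count, matching the $2\times 64 + 64\times M$ figure.

```python
import math
import random


def silu(x):
    """SiLU(x) = x * sigmoid(x), written to avoid overflow in exp."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def decoder_forward(r, W1, b1, W2, b2):
    """2 -> H -> M decoder: SiLU hidden layer, then softmax over M scores."""
    x = (r.real, r.imag)
    hidden = [silu(w[0] * x[0] + w[1] * x[1] + b) for w, b in zip(W1, b1)]
    scores = [sum(wj * hj for wj, hj in zip(w, hidden)) + b for w, b in zip(W2, b2)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decoder_multiplications(hidden=64, M=16):
    """Weight multiplications per received symbol (biases/activations excluded)."""
    return 2 * hidden + hidden * M


rng = random.Random(0)
H, M = 64, 16
W1 = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(H)]
b1 = [0.0] * H
W2 = [[rng.gauss(0, 0.1) for _ in range(H)] for _ in range(M)]
b2 = [0.0] * M
probs = decoder_forward(0.3 - 0.7j, W1, b1, W2, b2)
decision = max(range(M), key=probs.__getitem__)
print(decoder_multiplications(64, 16), decoder_multiplications(64, 64))  # 1152 4224
```

Because softmax is monotonic, a hard decision could take the argmax of the scores directly and skip the $M$ exponentials; the softmax is only needed when soft probabilities are wanted.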

5. Experimental evaluation of the results

We validated our 16QAM results experimentally using the setup in Fig. 9. For experimental convenience we implement the PN level of 1 MHz LW at 60 GBaud with a 100 kHz LW at 6 GBaud. With this baud rate we avoid inter-symbol interference (ISI) and can focus on residual PN.
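
For reference, the equivalence follows directly from Eq. (4) with $T_s = 1/R_s$, where $R_s$ is the symbol rate:

$$\sigma^2_{PN} = 2\pi\,\frac{100\ \textrm{kHz}}{6\ \textrm{GBaud}} = 2\pi\,\frac{1\ \textrm{MHz}}{60\ \textrm{GBaud}} \approx 1.05\times 10^{-4}\ \textrm{rad}^2.$$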

Fig. 9. Experimental setup, acronyms explained in the text.

We generate the I and Q signals in MATLAB with the appropriate constellation. We send these signals at 6 GBaud to a digital-to-analog converter (DAC, Fujitsu Oola, 3 dB bandwidth of 16 GHz). The DAC has 84 Gsamples/s, yielding a non-integer number of samples per symbol. We modulate the transmitter laser diode (CoBrite DX1, 100 kHz LW) with an I/Q modulator (SHF 86213D, 3 dB bandwidth of 25 GHz).

AWGN loading and OSNR sweeping are realized by a variable optical attenuator (VOA) and erbium-doped fiber amplifier (EDFA) pair. A polarization controller fixes the polarization of the signal injected into the coherent receiver, as polarization diversity is not employed in our experiments. We tap off 10% of the signal power for OSNR monitoring with an optical spectrum analyzer (OSA, Ando AQ6317B). The remaining 90% enters a coherent receiver (Picometrix, 3 dB bandwidth of 21 GHz). The local oscillator (Teraxion PureSpectrum) used for coherent detection has a narrow linewidth of 5 kHz, contributing negligible PN compared to the signal laser.

The electrical signals are sampled by a real-time oscilloscope (RTO, Keysight Infiniium 90000 series, 80 GSa/s, 3 dB bandwidth of 33 GHz) and sent to MATLAB for offline processing. The offline digital signal processing includes clock recovery, carrier frequency synchronization, and time synchronization. We use pilot-based carrier phase estimation (CPE) with a $4.5\%$ pilot rate. Finally, over 2 million symbols are used for error counting.

We examine three different constellations experimentally: conventional square QAM, SG1 [10], and I-AE 16QAM. While in simulations constellations are found for each SNR, we use one constellation (found in simulation for 16 dB SNR) for all OSNR in our experiment.

We use maximum-likelihood ML(AWGN) symbol decision based on the minimum Euclidean distance for square and SG1 constellations. We examine AE decoder detection for the I-AE 16QAM.

Since the experimental channel differs from the simulated one, the AE decoder is retrained with experimental data. We use a data set at 30 dB OSNR to retrain the AE decoder, and then use this decoder for all OSNRs. We use the same AE structure, but this time with the encoder weights fixed, i.e., no encoder update is made through backpropagation. Random data are used at the input of the AE. The encoder output is transferred to the experimental channel, and the captured received signal is sent to the decoder offline.

We give our experimental SER results in Fig. 10(a). With conventional SQAM, we see an SER floor at $2\times 10^{-3}$. The main cause of this error floor is residual phase noise (RPN), visible in the constellations provided as insets: the SQAM constellation diagram shows significant distortion from RPN at the outer constellation points. The SG1 16QAM and AE 16QAM constellations have significantly lower SER floors. The SG1 16QAM reduces the required optical signal-to-noise ratio (OSNR) at an SER of $2\times 10^{-3}$ by 3.5 dB, although an SER floor is still present at $1\times 10^{-3}$.

Fig. 10. Experimental evaluation of (a) SER and (b) BER. Annotations in (b) are SER/BER averaged over OSNRs.

Our AE 16QAM further reduces the required OSNR by 1 dB at $2\times 10^{-3}$, even without retraining the AE decoder. An extra 0.8 dB OSNR reduction is achievable with the retrained AE decoder at the same SER level. The inset of the AE constellation diagram verifies the benefit: under residual PN, the outer-ring constellation points overlap discernibly less than in the constellation diagram of square 16QAM.

In Fig. 10(b) we present BER results when applying the bit-to-symbol mapping adapted from the APSK initialization. The BER curves follow the same trends as the SER curves, with the same ranking from best to worst. The BER improvement comes close to the Gray-mapping factor of four ($\log_2 16$). The conventional square 16QAM shows a BER floor at around $5\times 10^{-4}$ for OSNRs above 28 dB. SG1 capitalizes on its very high SER/BER ratio, so its BER performance is much closer to that of the I-AE than was the case for SER. Nonetheless, the I-AE has a lower floor, especially with the decoder trained on experimental data.
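The factor of four is $\log_2 16$: with a Gray mapping, nearest neighbours differ in a single bit, so a typical symbol error costs one of four bits and BER is roughly SER/4. A short Python check of that property for a Gray-labelled square 16QAM follows; it is a sketch, and the labelling convention and helper names are ours rather than the APSK-derived mapping used above.

```python
def gray(n: int) -> int:
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def gray_16qam():
    """Map each 4-bit label to a square-16QAM point (unnormalized levels -3, -1, 1, 3).

    The two most significant bits Gray-code the in-phase level and the two least
    significant bits Gray-code the quadrature level.
    """
    levels = (-3, -1, 1, 3)
    return {
        (gray(i) << 2) | gray(q): complex(levels[i], levels[q])
        for i in range(4)
        for q in range(4)
    }


def neighbour_bit_flips(points, spacing=2.0):
    """Hamming distance between the labels of every pair of nearest neighbours."""
    flips = []
    for a, pa in points.items():
        for b, pb in points.items():
            if a < b and abs(abs(pa - pb) - spacing) < 1e-9:
                flips.append(bin(a ^ b).count("1"))
    return flips


flips = neighbour_bit_flips(gray_16qam())
print(len(flips), set(flips))  # 24 {1}
```

All 24 nearest-neighbour pairs differ in exactly one bit. For instance, dividing the square-QAM SER floor of $2\times 10^{-3}$ by four gives $5\times 10^{-4}$, in line with the BER floor quoted above for square 16QAM.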

We note that the I-AE solution is robust: it performs well at OSNRs far from the design point used to find the constellation. Our I-AE outperforms conventional square QAM and SG1 at every OSNR swept. Because of the error floors, we observe larger performance gains in our experiments than in simulations.

6. Conclusion

We exploited machine learning for constellation shaping and detection. We proposed an AE-based end-to-end learning system for PN channels with AWGN that exploits AE initialization; our scheme initializes the constellation to APSK.

We presented extensive simulation results for initialized and non-initialized AEs, with received data detected by ML and AE detectors. Our solution outperforms previously proposed shaped constellations (whether found without machine learning or with neural networks) in all scenarios considered, in terms of SER, BER, and GMI. Using an AE detector further increases the gain obtained with initialization. The performance improvement grows with the modulation order and the PN intensity. We confirmed our method experimentally for 16QAM in terms of SER and BER.

Appendix A: Pilot-based phase recovery

Details of pilot-based phase recovery are given here. As mentioned before, a $4.5\%$ pilot rate is employed in this work. At the transmitter, we insert a QPSK pilot symbol after every 32 data symbols; pilots are placed at positions 0, 33, 66, and so on. The pilot symbols follow a random pattern. Every 67 symbols contain 3 pilots, so the overhead is $100 \times 3/67 \approx 4.5\%$.
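The pilot positions and the overhead figure can be checked with a few lines of Python. This sketch assumes each 64-symbol data block is framed by its own three pilots, which is the accounting behind the 3/67 figure; the helper names are ours.

```python
def frame_layout(data_per_segment: int = 32, segments: int = 2):
    """'P'/'D' markers for one block: pilot, 32 data, pilot, 32 data, pilot."""
    layout = []
    for _ in range(segments):
        layout.append("P")
        layout.extend(["D"] * data_per_segment)
    layout.append("P")
    return layout


def pilot_overhead_percent(layout):
    """Percentage of symbols in the layout that are pilots."""
    return 100 * layout.count("P") / len(layout)


layout = frame_layout()
print(len(layout), [k for k, s in enumerate(layout) if s == "P"])  # 67 [0, 33, 66]
print(f"{pilot_overhead_percent(layout):.2f} %")  # 4.48 %
```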

The phase estimation process is shown in Fig. 11. We extract the received pilot symbols ($r_p$) from the received symbols ($r$) and estimate the phase rotation of the pilots as

$$\begin{aligned} & \Delta \theta_p = \angle r_p - \angle x_p, \\ & \hat{\theta}_p = \angle r_p - \Delta \theta_p \end{aligned} \tag{9}$$

where $x_p$ denotes the transmitted pilots. We apply linear interpolation to estimate the Wiener phase for each block of 64 data symbols using three pilots (preceding $\hat{\theta}_{p_i}$, mid-block $\hat{\theta}_{p_{i+1}}$, and following $\hat{\theta}_{p_{i+2}}$), i.e.,

$$\hat{\theta} = f(\hat{\theta}_{p_i}, \hat{\theta}_{p_{i+1}}, \hat{\theta}_{p_{i+2}}, N), \tag{10}$$

where $f(\cdot)$ is the linear interpolation function over $N=64$ points. Finally, the received symbols are derotated by the estimated phase to generate the phase-recovered samples for detection.
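The steps above can be sketched in Python as follows. This is a non-authoritative illustration: it interpolates the pilot phase rotations $\Delta\theta_p$ of Eq. (9) linearly between consecutive pilots and derotates every symbol. The phase unwrapping between pilots, the frame layout (pilots at indices 0, 33, and 66, first and last symbols being pilots), and all function names are our assumptions.

```python
import cmath
import math
import random


def pilot_phase_rotations(received, pilot_idx, pilots):
    """Delta theta_p = angle(r_p) - angle(x_p) at each pilot, unwrapped along the frame."""
    phases = []
    for k, x in zip(pilot_idx, pilots):
        phi = cmath.phase(received[k] * x.conjugate())  # wrapped to [-pi, pi]
        if phases:  # remove 2*pi jumps relative to the previous pilot
            phi += 2 * math.pi * round((phases[-1] - phi) / (2 * math.pi))
        phases.append(phi)
    return phases


def interpolate_phase(pilot_idx, phases, length):
    """Piecewise-linear phase estimate for every symbol between consecutive pilots.

    Assumes the first and last symbols of the frame are pilots.
    """
    est = [0.0] * length
    for j in range(len(pilot_idx) - 1):
        k0, k1 = pilot_idx[j], pilot_idx[j + 1]
        for n in range(k0, k1 + 1):
            est[n] = phases[j] + (phases[j + 1] - phases[j]) * (n - k0) / (k1 - k0)
    return est


def pilot_phase_recovery(received, pilot_idx, pilots):
    """Derotate every received symbol by the interpolated phase estimate."""
    phases = pilot_phase_rotations(received, pilot_idx, pilots)
    theta = interpolate_phase(pilot_idx, phases, len(received))
    return [r * cmath.exp(-1j * t) for r, t in zip(received, theta)]


if __name__ == "__main__":
    rng = random.Random(0)
    levels = (-3, -1, 1, 3)
    data = [complex(rng.choice(levels), rng.choice(levels)) / math.sqrt(10) for _ in range(64)]
    pilots = [cmath.exp(1j * (math.pi / 4 + math.pi / 2 * rng.randrange(4))) for _ in range(3)]
    tx = [pilots[0]] + data[:32] + [pilots[1]] + data[32:] + [pilots[2]]
    # Wiener phase noise (assumed variance 2*pi*LW/Rs, 4 MHz at 15 Gbaud) plus a 1 rad offset.
    sigma = math.sqrt(2 * math.pi * 4e6 / 15e9)
    theta, rx = 1.0, []
    for s in tx:
        theta += rng.gauss(0.0, sigma)
        rx.append(s * cmath.exp(1j * theta))
    out = pilot_phase_recovery(rx, [0, 33, 66], pilots)
    rpn = [cmath.phase(o * t.conjugate()) for o, t in zip(out, tx)]
    print(f"RMS residual phase: {math.sqrt(sum(e * e for e in rpn) / len(rpn)):.3f} rad")
```

A linear interpolant tracks any phase that varies linearly between pilots exactly, but the random wander of the Wiener phase between pilots survives as residual phase noise, even without AWGN in this toy setup.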

Fig. 11. Phase recovery algorithm based on $4.5\%$ pilot rate.

To illustrate the impact of PN and CPE, Fig. 12 compares the received symbols before and after phase recovery for square 16QAM. The SNR estimated from the received PN constellation is about 17 dB, the LW is 4 MHz, and the symbol rate is 15 Gbaud. Figure 12(a) shows the received symbols, rotated by a Wiener process, before CPE. Figures 12(b) and 12(c) show the constellations after carrier phase recovery with $4.5\%$ and $100\%$ pilot rates, respectively. Clearly, CPE is essential and effective. Because of AWGN, residual PN remains even with a $100\%$ pilot rate; note the non-circular clusters around the outer symbols.
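The Wiener rotation in Fig. 12(a) can be reproduced qualitatively with a random walk whose per-symbol increment is Gaussian. This is a minimal sketch that assumes the standard Wiener model with increment variance $2\pi \Delta\nu T$ ($\Delta\nu$ the LW, $T$ the symbol period); the function names are ours.

```python
import cmath
import math
import random


def wiener_phase(n: int, variance: float, rng: random.Random):
    """n samples of a Wiener phase: theta[k] = theta[k-1] + N(0, variance), theta[0] = 0."""
    sigma = math.sqrt(variance)
    theta = [0.0]
    for _ in range(n - 1):
        theta.append(theta[-1] + rng.gauss(0.0, sigma))
    return theta


def apply_phase_noise(symbols, theta):
    """Rotate each symbol by its phase sample: r[k] = s[k] * exp(j * theta[k])."""
    return [s * cmath.exp(1j * t) for s, t in zip(symbols, theta)]


if __name__ == "__main__":
    rng = random.Random(0)
    variance = 2 * math.pi * 4e6 / 15e9  # assumed 2*pi*LW*T for LW = 4 MHz at 15 Gbaud
    levels = (-3, -1, 1, 3)
    tx = [complex(rng.choice(levels), rng.choice(levels)) / math.sqrt(10) for _ in range(1000)]
    rx = apply_phase_noise(tx, wiener_phase(len(tx), variance, rng))
    print(f"phase drift at the last symbol: {cmath.phase(rx[-1] * tx[-1].conjugate()):.2f} rad")
```

Without CPE, the standard deviation of the accumulated rotation grows as $\sqrt{k\sigma^2}$, about 1.3 rad after 1000 symbols for these parameters, which is why the uncompensated constellation smears into rings.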

Fig. 12. Received symbols at 17 dB SNR for PN from a 4 MHz LW and 15 Gbaud transmission (a) before phase recovery, (b) after carrier phase recovery with a $4.5\%$ pilot rate, and (c) after the best possible phase recovery ($100\%$ pilot rate).

Appendix B: GMI performance and SER/BER ratio of 1 MHz and 100 kHz LWs

B.1 GMI performance of 1 MHz and 100 kHz LWs

The LWs considered here are narrower than the 4 MHz LW studied in the main body of the paper, and thus correspond to costlier lasers with less phase noise. For the maximum-likelihood receiver, GMI performance is presented in Fig. 13(a) for 1 MHz LW and in Fig. 13(b) for 100 kHz LW. We note that the SG1 paper [10] did not consider LW = 100 kHz (Fig. 13(b)). We next turn to the AE decoder receiver, with GMI performance for 1 MHz LW in Fig. 13(c) and for 100 kHz LW in Fig. 13(d).
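As a point of reference for the GMI metric used in Fig. 13, a Monte-Carlo GMI estimator for bit-wise detection can be sketched as below. It assumes a memoryless AWGN channel without phase noise, a Gray-labelled square 16QAM, and exact Gaussian bit metrics, so it does not reproduce the PN results in Fig. 13; all names are ours.

```python
import math
import random


def gray(n: int) -> int:
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def gray_16qam():
    """Unit-power square 16QAM points and their Gray bit labels."""
    levels = (-3, -1, 1, 3)
    points, labels = [], []
    for i in range(4):
        for q in range(4):
            points.append(complex(levels[i], levels[q]) / math.sqrt(10))
            labels.append((gray(i) << 2) | gray(q))
    return points, labels


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def gmi_awgn(points, labels, snr_db, num_symbols, rng):
    """Monte-Carlo GMI estimate in bits/symbol for a labelled constellation over AWGN.

    GMI = m - E[ sum_i log2( sum_x q(y|x) / sum_{x: b_i(x) = b_i} q(y|x) ) ],
    with q(y|x) proportional to exp(-|y - x|^2 / N0) and Es = 1.
    """
    m = int(math.log2(len(points)))
    n0 = 10 ** (-snr_db / 10)
    std = math.sqrt(n0 / 2)
    penalty = 0.0
    for _ in range(num_symbols):
        k = rng.randrange(len(points))
        y = points[k] + complex(rng.gauss(0.0, std), rng.gauss(0.0, std))
        metrics = [-abs(y - x) ** 2 / n0 for x in points]  # log q(y|x) up to a constant
        total = log_sum_exp(metrics)
        for i in range(m):
            bit = (labels[k] >> i) & 1
            same = [mt for mt, lab in zip(metrics, labels) if ((lab >> i) & 1) == bit]
            penalty += (total - log_sum_exp(same)) / math.log(2)
    return m - penalty / num_symbols


if __name__ == "__main__":
    pts, labs = gray_16qam()
    for snr in (5.0, 10.0, 15.0):
        gmi = gmi_awgn(pts, labs, snr, 3000, random.Random(0))
        print(f"{snr:4.1f} dB: GMI ~ {gmi:.2f} bit/symbol")
```

A PN-aware or AE receiver would replace the Gaussian metric `q(y|x)` by its own soft outputs; the estimator structure stays the same.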

Fig. 13. GMI performance achieved at 60 Gbaud for 16QAM (dashed) and 64QAM (solid) with the ML detector for LWs of (a) 1 MHz and (b) 100 kHz, and with the AE detector for LWs of (c) 1 MHz and (d) 100 kHz.

As expected, narrower LWs lead to better performance for all constellations, since the PN is reduced. Nonetheless, the trend is similar to that for the 4 MHz LW. We note that in [14] the phase noise level examined was equivalent to 200 kHz at our rate of 60 Gbaud. They examined only mutual information, and with blind phase search rather than pilot-assisted phase recovery. Their 64QAM constellation without initialization saw only a small advantage from the AE decoder, and only at low SNR. With pilot-assisted phase recovery, the AE decoder continues to give a modest improvement even without initialization and even at high SNR. With initialization, we see a greater advantage when adopting the AE decoder.
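The equivalence quoted for [14] rests on the Wiener model, in which the per-symbol phase-noise variance depends only on the product of linewidth and symbol period. A small sketch, assuming $\sigma_{PN}^2 = 2\pi \Delta\nu / R_s$ with $R_s$ the symbol rate (helper names are ours); rounded, it reproduces the 60 Gbaud variances listed in the phase-noise table (0.0004, 0.0001, 0.00001).

```python
import math


def pn_variance(linewidth_hz: float, symbol_rate_baud: float) -> float:
    """Per-symbol Wiener phase-noise variance, assumed to be 2*pi*LW/Rs."""
    return 2 * math.pi * linewidth_hz / symbol_rate_baud


def equivalent_linewidth(linewidth_hz: float, rate_from_baud: float, rate_to_baud: float) -> float:
    """Linewidth that gives the same per-symbol variance at another symbol rate."""
    return linewidth_hz * rate_to_baud / rate_from_baud


for lw in (4e6, 1e6, 200e3, 100e3):
    print(f"{lw / 1e6:g} MHz at 60 Gbaud: variance {pn_variance(lw, 60e9):.1e}")
print(f"{equivalent_linewidth(4e6, 15e9, 60e9) / 1e6:g} MHz")  # 16 MHz
```

By the same scaling, the 4 MHz laser at the 15 Gbaud experimental rate produces the per-symbol phase noise of a 16 MHz laser at 60 Gbaud.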

B.2 SER/BER ratio of experimental results

The ratio of SER to BER with the AE detector is shown in Fig. 14, with simulation results in Fig. 14(a) and experimental results in Fig. 14(b). Again, the bit-to-symbol mapping of the initialized AE is of higher quality than that of the non-initialized AE. Although the bit-to-symbol mapping quality of the AE results is lower than that of SG2 and SQAM, the initialized AE combined with the AE detector still gives the best BER performance.
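The upper bound $\log_2 M$ in Fig. 14 follows from counting: $\mathrm{SER}/\mathrm{BER} = \log_2 M \cdot N_\text{sym}/N_\text{bit}$, where $N_\text{sym}$ and $N_\text{bit}$ are the symbol and bit error counts, and every symbol error costs at least one bit error, with equality when each symbol error flips exactly one bit. A small helper illustrating this, assuming integer bit labels per constellation point (the function name is ours):

```python
import math


def ser_ber_ratio(sent, detected, labels):
    """SER/BER from transmitted and detected symbol indices and each point's bit label."""
    m = int(math.log2(len(labels)))
    n = len(sent)
    sym_err = sum(s != d for s, d in zip(sent, detected))
    bit_err = sum(bin(labels[s] ^ labels[d]).count("1") for s, d in zip(sent, detected))
    if bit_err == 0:
        raise ValueError("no errors observed: the ratio is undefined")
    return (sym_err / n) / (bit_err / (n * m))


labels = list(range(16))  # natural binary labels, for illustration only
print(ser_ber_ratio([0, 0, 0, 0], [1, 0, 0, 0], labels))  # 4.0: one bit per symbol error
print(ser_ber_ratio([0, 0], [3, 0], labels))  # 2.0: two bits per symbol error
```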

Fig. 14. $SER/BER$ ratio versus SNR for the constellations received using the AE detector in (a) 16QAM and 64QAM simulations and (b) the 16QAM experiment evaluation; upper bound is $\log _2M$.

Disclosures

The authors declare no conflicts of interest.

Data Availability

Data underlying the results presented in this paper are randomly generated in Python using a random seed and thus are not included.

References

1. A. Omidi, M. Zeng, J. Lin, and L. A. Rusch, “Geometric constellation shaping using initialized autoencoders,” in 2021 IEEE International Black Sea Conference on Communications and Networking (BlackSeaCom), (2021), pp. 1–5.

2. J. Lee and S. Leyffer, Mixed Integer Nonlinear Programming, The IMA Volumes in Mathematics and its Applications (Springer New York, 2011).

3. P. K. Singya, P. Shaik, N. Kumar, V. Bhatia, and M.-S. Alouini, “A survey on higher-order QAM constellations: Technical challenges, recent advances, and future trends,” IEEE Open J. Commun. Soc. 2, 617–655 (2021).

4. S. Park and M. Byeon, “Error performances of 64-ary triangular quadrature amplitude modulation in AWGN channel,” in 2007 IEEE 65th Vehicular Technology Conference - VTC2007-Spring, (2007), pp. 1752–1755.

5. W. Webb, “QAM: the modulation scheme for future mobile radio communications?” Electron. Commun. Eng. J. 4(4), 167–176 (1992).

6. Z. Ugray, L. Lasdon, J. Plummer, F. Glover, J. Kelly, and R. Martí, “Scatter search and local NLP solvers: A multistart framework for global optimization,” INFORMS Journal on Computing 19(3), 328–340 (2007).

7. T. Pfau, X. Liu, and S. Chandrasekhar, “Optimization of 16-ary quadrature amplitude modulation constellations for phase noise impaired channels,” in 2011 37th European Conference and Exhibition on Optical Communication, (2011), pp. 1–3.

8. Z. Qu and I. B. Djordjevic, “On the probabilistic shaping and geometric shaping in optical communication systems,” IEEE Access 7, 21454–21464 (2019).

9. F. Kayhan and G. Montorsi, “Constellation design for memoryless phase noise channels,” IEEE Trans. Wireless Commun. 13(5), 2874–2883 (2014).

10. H. Dzieciol, G. Liga, E. Sillekens, P. Bayvel, and D. Lavery, “Geometric shaping of 2-D constellations in the presence of laser phase noise,” J. Lightwave Technol. 39(2), 481–490 (2021).

11. Q. Huang, M. Jiang, and C. Zhao, “Learning to design constellation for AWGN channel using auto-encoders,” in 2019 IEEE International Workshop on Signal Processing Systems (SiPS), (2019), pp. 154–159.

12. R. T. Jones, T. A. Eriksson, M. P. Yankov, and D. Zibar, “Deep learning of geometric constellation shaping including fiber nonlinearities,” in 2018 European Conference on Optical Communication (ECOC), (2018), pp. 1–3.

13. R. T. Jones, M. P. Yankov, and D. Zibar, “End-to-end learning for GMI optimized geometric constellation shape,” in 45th European Conference on Optical Communication (ECOC 2019), (2019), pp. 1–4.

14. O. Jovanovic, M. P. Yankov, F. Da Ros, and D. Zibar, “Gradient-free training of autoencoders for non-differentiable communication channels,” J. Lightwave Technol. 39(20), 6381–6391 (2021).

15. M. Schaedler, S. Calabrò, F. Pittalà, G. Böcherer, M. Kuschnerov, C. Bluemm, and S. Pachnicke, “Neural network assisted geometric shaping for 800 Gbit/s and 1 Tbit/s optical transmission,” in 2020 Optical Fiber Communications Conference and Exhibition (OFC), (2020), pp. 1–3.

16. A. Cohen and S. Derevyanko, “Generative adversarial network and end-to-end learning for optical fiber communication systems limited by the nonlinear phase noise,” in 2021 IEEE International Conference on Microwaves, Antennas, Communications and Electronic Systems (COMCAS), (2021), pp. 241–246.

17. M. G. Taylor, “Phase estimation methods for optical coherent detection using digital signal processing,” J. Lightwave Technol. 27(7), 901–914 (2009).

18. M. Parniak, I. Galinskiy, T. Zwettler, and E. S. Polzik, “High-frequency broadband laser phase noise cancellation using a delay line,” Opt. Express 29(5), 6935–6946 (2021).

19. M. Magarini, L. Barletta, A. Spalvieri, F. Vacondio, T. Pfau, M. Pepe, M. Bertolini, and G. Gavioli, “Pilot-symbols-aided carrier-phase recovery for 100-G PM-QPSK digital coherent receivers,” IEEE Photonics Technol. Lett. 24(9), 739–741 (2012).

20. M. Xiang, S. Fu, L. Deng, M. Tang, P. Shum, and D. Liu, “Low-complexity feed-forward carrier phase estimation for M-ary QAM based on phase search acceleration by quadratic approximation,” Opt. Express 23(15), 19142–19153 (2015).

21. X. Guan, A. Omidi, M. Zeng, and L. A. Rusch, “Experimental demonstration of a constellation shaped via deep learning and robust to residual-phase-noise,” in Conference on Lasers and Electro-Optics, (Optica Publishing Group, 2022), p. SW4E.2.

22. A. Rode, B. Geiger, and L. Schmalen, “Geometric constellation shaping for phase-noise channels using a differentiable blind phase search,” in Optical Fiber Communication Conference (OFC) 2022, (2022), p. Th2A.32.

23. O. Jovanovic, M. Yankov, F. Da Ros, and D. Zibar, “End-to-end learning of a constellation shape robust to variations in SNR and laser linewidth,” in Proceedings of European Conference on Optical Communications 2021, (2021).

24. K. Gümüs, A. Alvarado, B. Chen, C. Häger, and E. Agrell, “End-to-end learning of geometrical shaping maximizing generalized mutual information,” in Optical Fiber Communications Conference and Exhibition (OFC), (2020), pp. 1–3.

25. H. Ghozlan and G. Kramer, “Models and information rates for Wiener phase noise channels,” IEEE Trans. Inf. Theory 63(4), 2376–2393 (2017).

26. G. J. Foschini, R. D. Gitlin, and S. B. Weinstein, “On the selection of a two-dimensional signal constellation in the presence of phase jitter and Gaussian noise,” The Bell Syst. Tech. J. 52(6), 927–965 (1973).

27. H. Dzieciol, E. Sillekens, G. Liga, P. Bayvel, R. Killey, and D. Lavery, “The partially-coherent AWGN channel: Transceiver strategies for low-complexity fibre links,” J. Lightwave Technol. 39(17), 5423–5431 (2021).

28. M. Sales-Llopis and S. J. Savory, “Approximating the partially coherent additive white Gaussian noise channel in polar coordinates,” IEEE Photonics Technol. Lett. 31(11), 833–836 (2019).

29. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning Internal Representations by Error Propagation (MIT Press, Cambridge, MA, USA, 1986), pp. 318–362.

30. T. Matsumine, T. Koike-Akino, and Y. Wang, “Deep learning-based constellation optimization for physical network coding in two-way relay networks,” in Proc. IEEE ICC, (2019), pp. 1–6.

31. P. Massoud Salehi and J. Proakis, Digital Communications (McGraw-Hill Education, 2007).

32. R.-J. Essiambre, R. Ryf, M. Kodialam, et al., “Increased reach of long-haul transmission using a constant-power 4D format designed using neural networks,” in 2020 European Conference on Optical Communications (ECOC), (2020), pp. 1–4.

33. Q. Wang, Q. Xie, Z. Wang, S. Chen, and L. Hanzo, “A universal low-complexity symbol-to-bit soft demapper,” IEEE Trans. Veh. Technol. 63(1), 119–130 (2014).

34. F. Schreckenbach, N. Gortz, J. Hagenauer, and G. Bauch, “Optimization of symbol mappings for bit-interleaved coded modulation with iterative decoding,” IEEE Commun. Lett. 7(12), 593–595 (2003).

Hyper-parameter	Value
Hidden layers and hidden units	(64, 2, 2, 64)
Learning rate	0.005
Activation function	SiLU
Optimizer	RMSprop
Batch size	64×16 (16QAM), 64×64 (64QAM)
Loss function	CrossEntropyLoss

Linewidth (LW)	Phase noise variance ($\sigma_{PN}^{2}$)
4 MHz	0.0004
1 MHz	0.0001
100 kHz	0.00001


Optics Continuum