
Optimizations and investigations for transfer learning of iteratively pruned neural network equalizers for data center networking

Open Access

Abstract

In this work, for the first time to the best of our knowledge, we introduce the iterative pruning technique into the transfer learning (TL) of neural network equalizers (NNEs) deployed in optical links with different lengths. To save time during NNE training, TL migrates the NNE parameters already trained on the source link to the newly-routed link (the target link), which has been shown to outperform training from a random initialization. Based on simulations, we show that the iterative pruning technique can further enhance the convergence speed during TL between the source and target links. Moreover, we quantitatively investigate the marginal effects of the pruning threshold and pruning span on the convergence performance in various transmission-distance scenarios. In addition, we observe a trade-off between the performance stability and the complexity of the NNE, which should be balanced by choosing an appropriate equalizer scale.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Driven by the continuous growth of data consumption from Internet applications such as video streaming, virtual/augmented reality (VR/AR), and cloud computing, the demand for high-capacity optical communications keeps increasing. For optical signal reception, coherent detection is mainly used in long-haul optical transmission owing to its high sensitivity in optical field detection, while direct detection is preferable in short-reach scenarios thanks to its simplicity, low cost, and low power consumption [1]. To improve the spectral efficiency of optical intensity modulation and direct detection (IMDD) systems, advanced modulation formats have to be adopted, such as four-level pulse amplitude modulation (PAM-4), discrete multi-tone (DMT), and carrier-less amplitude and phase modulation (CAP) [2–4]. Among them, PAM-4 is the most straightforward format, improving spectral efficiency through the use of more amplitude levels [5]. However, the main challenge of PAM-modulated optical IMDD systems is the loss of optical phase information after square-law detection, which results in a type of nonlinear distortion when square-law detection interplays with inter-symbol interference (ISI) [6]. Therefore, various receiver-side equalizers have been proposed to mitigate this nonlinear distortion, such as the decision feedback equalizer (DFE), the Volterra equalizer, maximum-likelihood sequence estimation (MLSE), and neural network equalizers (NNEs) [7–11].

Among them, NNEs outperform the others in dealing with strong nonlinear distortions because they can construct the inverse transfer function by training the neuron connections across multiple layers [12–17]. Because NNEs have more complex connections and nonlinear activation functions, they achieve improved performance at the cost of higher complexity. Moreover, NNEs normally require large numbers of known training symbols and epochs, making the initialization of equalizer parameters time-consuming. In practical optical networking, e.g., switching between links with different transmission distances, the NNE parameters trained on the previous link are not suitable for the newly-routed link. If the NNE were instead retrained from a random initialization, the long training time and redundant overhead would limit the potential of NNEs in practical optical communications. To solve this problem, transfer learning (TL) of NNEs across different optical links has been proposed: when the optical network switches to a new link (the target link), one can reuse the NNE parameters trained on the source link instead of training from a random initialization.

For optical coherent communications with transmission distances over 50 km, TL has been employed for equalizing fiber nonlinearity and can reduce the number of training epochs; the related studies are summarized in Table 1 [18–21]. In addition, TL has also been demonstrated on optical IMDD systems employing NNEs with different architectures at transmission distances below 50 km [22,23]. In general, TL-aided NNEs converge faster, so TL can speed up equalizer initialization for newly-switched links within intra data center networks.


Table 1. Current research on TL-aided equalizers for optical communications

In this paper, we propose for the first time an iterative pruning technique [24] to accelerate the TL of NNEs among optical links with different lengths. Through simulations of an optical IMDD Nyquist-shaped PAM-4 system built in VPItransmissionMaker, we show that the proposed TL scheme further accelerates convergence and reduces complexity without sacrificing NNE performance. Moreover, guidance on optimizing the pruning threshold and pruning span is provided to strike a balance between performance stability and complexity.

2. Operational principle

2.1 Basics of recurrent neural network equalizer

To address the fading nulls caused by fiber dispersion and square-law detection, we employ an autoregressive recurrent neural network equalizer (RNNE) with decision-feedback inputs [17]. For complexity reasons, an RNNE with only one hidden layer is used in this work; its structure is shown in Fig. 1.

Fig. 1. Structure of the autoregressive recurrent neural network equalizer.

In the input layer, the past, current, and following samples are fed as input neurons to mitigate inter-symbol nonlinear distortion. The input neurons also include the decided samples. The output layer predicts the equalized sample after sufficient training iterations. The transfer function of the RNNE is given by

$$y = \boldsymbol{W}_2 f(\boldsymbol{W}_1 [\boldsymbol{x}, \hat{\boldsymbol{y}}] + \boldsymbol{b}_1) + \boldsymbol{b}_2.$$

Vector $\boldsymbol{x}$ represents the feedforward neurons, $y$ the output of the RNNE, and $\hat{\boldsymbol{y}}$ the decision-feedback neurons. Let the number of neurons in the $i$th layer be $n^i$. $\boldsymbol{W}_k$ and $\boldsymbol{b}_k$ are the weight matrix ($n^{k+1} \times n^k$) and bias vector ($n^{k+1} \times 1$), respectively, where $k = 1, 2$. The activation function is denoted by $f(\cdot)$. The training procedure of the RNNE consists of prediction and back propagation (BP). Each time the RNNE predicts a sample, one BP pass updates the RNNE parameters toward the minimum by stochastic gradient descent according to $error = (y - \hat{y})^2$.
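For concreteness, the structure above can be sketched in PyTorch (the framework used for the experiments in Section 3). This is a minimal illustration under our own assumptions, not the authors' code: the tanh activation and the class and argument names are ours.

```python
import torch
import torch.nn as nn

class RNNE(nn.Module):
    """Single-hidden-layer autoregressive equalizer implementing Eq. (1):
    y = W2 f(W1 [x, y_hat] + b1) + b2."""
    def __init__(self, n_ff: int, n_fd: int, n_hidden: int):
        super().__init__()
        self.layer1 = nn.Linear(n_ff + n_fd, n_hidden)  # W1 (n^2 x n^1), b1
        self.layer2 = nn.Linear(n_hidden, 1)            # W2 (n^3 x n^2), b2
        self.act = nn.Tanh()                            # activation f(.) (assumed)

    def forward(self, x: torch.Tensor, y_fb: torch.Tensor) -> torch.Tensor:
        # x: feedforward taps (past, current, and following samples)
        # y_fb: previously decided symbols fed back as inputs
        z = torch.cat([x, y_fb], dim=-1)                # [x, y_hat]
        return self.layer2(self.act(self.layer1(z)))
```

Instantiated with the sizes chosen later in Section 3 (`RNNE(n_ff=27, n_fd=10, n_hidden=26)`), this network corresponds to $n^1 = 37$, $n^2 = 26$, $n^3 = 1$.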

2.2 Principle of iterative pruning TL

The classical TL strategy transfers the RNNE parameters trained on the source link to the target link to accelerate training initialization. We further simplify the RNNE structure by identifying redundant weights and pruning them iteratively during the TL process, which further speeds up the training of the RNNE, as shown in Fig. 2.

Fig. 2. Transfer learning with iterative pruning of the RNNE.

To realize iterative pruning, we first sort the connections of the RNNE trained on the source link in descending order of the absolute values of their weights, and then reset those below a predefined threshold T to 0. The pruning is repeated every G BP passes. If T is set too large, the RNNE is over-pruned and suffers performance degradation; if T is very small, the computational complexity can hardly be reduced. Consequently, T needs to be optimized for iterative pruning TL. Training then proceeds on the target link, and the network finally converges to a sparsely-connected RNNE. When G is infinite, the scheme is equivalent to the one-shot pruning algorithm [25]. The computational complexity $C_{PLEX}$ of the RNNE can be characterized by the full connection number:

$$C_{PLEX} = n^2 \times n^1 + n^2 + n^3 \times n^2 + n^3.$$

For the pruned RNNE, $C_{PLEX}$ equals the number of residual connections.
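As an illustration, the magnitude-based pruning step can be sketched as follows, reusing the `RNNE` sketch from Section 2.1; `prune_step` and its bookkeeping are our illustrative names, not code from the original work. Calling it once after migration corresponds to the one-shot algorithm [25]; calling it every G BP passes gives the iterative scheme.

```python
import torch

@torch.no_grad()
def prune_step(model: RNNE, threshold: float) -> int:
    """Reset every connection whose weight magnitude falls below T to 0,
    and return the number of surviving (residual) connections."""
    residual = 0
    for p in (model.layer1.weight, model.layer2.weight):
        mask = (p.abs() >= threshold).to(p.dtype)
        p.mul_(mask)                      # pruned weights become exactly 0
        residual += int(mask.sum().item())
    return residual                       # biases are left unpruned in this sketch
```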

3. Results and discussions

To evaluate the proposed iterative pruning TL scheme, we built a simulation platform in VPItransmissionMaker based on the parameters of commercial devices. At the transmitter, a digital Nyquist-shaped PAM-4 signal is generated with a roll-off factor of 0.1. An 8-bit digital-to-analog converter (DAC) operates at a baud rate of 56 Gbaud. Driven with an appropriate bias current, an electro-absorption modulator (EAM) modulates a laser at 1550-nm wavelength. SMF links with different lengths are then used for transmission to emulate routing within an intra data center network. At the receiver, the received optical power is tuned to 0 dBm by a variable optical attenuator (VOA), and a photodetector (PD) with a responsivity of 0.8 A/W detects the optical signals. Offline digital signal processing (DSP) comprises resampling, TL of the RNNE, and bit error rate (BER) counting. To the best of our knowledge, the longest reach reported to date for C-band PAM-4 optical IMDD links at 112 Gbps and above is 50 km [26]. We therefore investigate TL scenarios with link distances of 30 km, 25 km, 20 km, 15 km, and 10 km for C-band 112-Gbps-and-above optical IMDD applications. TL among 112-Gbps links with different distances is performed by the sparsely-connected RNNE, as shown in Fig. 3. We use the PyTorch deep learning framework to examine the TL performance of the pruned RNNE.

Fig. 3. Validation setup of the iterative pruning TL among 112-Gbps PAM-4 IMDD links.

We use the autoregressive RNNE to mitigate the nonlinear impairments in the system; it operates as a migration equalizer. First, RNNEs are trained on the source links (with distances of 10 km, 15 km, 25 km, and 30 km, respectively), and their parameters are transferred to initialize the RNNE on the target link with a transmission distance of 20 km. A training set of 30 K symbols is used. To avoid performance degradation due to insufficient training, we run more than 30 epochs to ensure convergence of the TL-aided equalizers. In the proposed TL scheme, the RNNE is iteratively pruned according to the predefined threshold T during retraining and finally converges to a sparsely-connected RNNE. To balance complexity and BER, an appropriate RNNE size needs to be determined by carefully choosing the pruning threshold T and span G.
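To make the procedure concrete, the retraining loop can be sketched as below, reusing the `RNNE` and `prune_step` sketches from Section 2. The dummy batches, the commented checkpoint file, and the learning rate are placeholders of ours; G = 1000 and T = 0.0059 match the "iterative pruning 2" setting reported later in this section.

```python
import torch

# Dummy target-link batches for illustration only (real data comes from the
# 20-km link simulation); each batch: feedforward taps, feedback symbols, label.
target_link_batches = [(torch.randn(64, 27), torch.randn(64, 10), torch.randn(64, 1))
                       for _ in range(3000)]

rnne = RNNE(n_ff=27, n_fd=10, n_hidden=26)
# TL: initialize from the source-link equalizer instead of random weights, e.g.
# rnne.load_state_dict(torch.load("rnne_source_30km.pt"))  # hypothetical checkpoint

optimizer = torch.optim.SGD(rnne.parameters(), lr=1e-3)  # illustrative learning rate
loss_fn = torch.nn.MSELoss()                             # error = (y - y_hat)^2

G, T = 1000, 0.0059  # pruning span and threshold ("iterative pruning 2" setting)
for step, (x, y_fb, target) in enumerate(target_link_batches, start=1):
    optimizer.zero_grad()
    loss_fn(rnne(x, y_fb), target).backward()
    optimizer.step()
    if step % G == 0:                 # prune every G back-propagation passes
        prune_step(rnne, T)
```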

The input layer of the RNNE consists of $n_{\mathrm{ff}}^1$ feedforward neurons and $n_{\mathrm{fd}}^1$ feedback neurons. For 112-Gbps 20-km PAM-4 signals with a received optical power (ROP) of 0 dBm, the impact of $n_{\mathrm{ff}}^1$ and $n^2$ on the BER is shown in Fig. 4(a).

Fig. 4. BERs of 112-Gb/s PAM-4 signaling over 20-km fiber (a) with different $n^2$ and $n_{\mathrm{ff}}^1$, and (b) with different $n_{\mathrm{fd}}^1$.

When $n_{\mathrm{ff}}^1$ exceeds 27 and $n^2$ exceeds 25, the BER improvement begins to saturate. Consequently, 27 feedforward neurons and 25 hidden-layer neurons are sufficient to mitigate the signal distortion with acceptable complexity; more neurons may cause instability and overfitting during training. However, for 20-km transmission with severe power fading, an NNE with only feedforward neurons cannot compensate sufficiently. Figure 4(b) shows the impact of the feedback neurons on the BER of the RNNE. When $n_{\mathrm{fd}}^1$ reaches 8 or more, the BER falls below the 7% overhead (OH) forward error correction (FEC) threshold. To balance complexity and BER performance, we choose an RNNE with $n^1 = 37$ ($n_{\mathrm{ff}}^1 = 27$, $n_{\mathrm{fd}}^1 = 10$), $n^2 = 26$, and $n^3 = 1$.
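Plugging these sizes into Eq. (2) gives the full-connection complexity of the unpruned RNNE; a quick check (sizes from the text, arithmetic ours):

```python
n1, n2, n3 = 37, 26, 1                 # chosen layer sizes
c_plex = n2 * n1 + n2 + n3 * n2 + n3   # Eq. (2)
print(c_plex)                          # 962 + 26 + 26 + 1 = 1015 connections
```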

Figure 5 depicts the BER of the RNNE at each epoch for different TL scenarios. For the traditional RNNE, the parameters are initialized randomly and trained on the target link. For TL, the RNNE parameters are migrated from the source link and then retrained on the target link instead of being randomly initialized. As a general trend in Fig. 5, the TL-assisted RNNE converges faster than the traditional RNNE. With the help of iterative pruning, convergence is accelerated further: when migrating from the 10-km, 15-km, 25-km, and 30-km links, the convergence speed on the 20-km link increases by 23.81%, 14.28%, 9.52%, and 19.05%, respectively. This indicates that a farther migration benefits more from iterative pruning. However, because the RNNE still retains information from the source link, incorrect pruning can occur in the first epochs, temporarily degrading performance. As training continues, the BER converges.

Fig. 5. BER vs. epoch curves and equalized eye diagrams for the 20-km target link migrated from source links with distances of (a) 10 km, (b) 15 km, (c) 25 km, (d) 30 km.

To show the effectiveness of iterative pruning, the residual connections at each epoch are shown in Fig. 6. Without TL (blue rectangles), the RNNE keeps pruning iteratively until it converges on the target link. With TL, pruning in the first several epochs is more moderate, owing to the difference between the source and target links. In addition, the number of remaining connections with TL is slightly higher than without TL. The number of residual connections after pruning is less than half of that without pruning, indicating that the traditional RNNE contains a large number of redundant connections.

Fig. 6. Number of residual connections during pruning under different migration conditions.

For pruned TL, two parameters need to be optimized: the pruning threshold T and the pruning span G. An overly large T causes performance loss, while an overly small T makes the improvement in convergence speed inconspicuous. For the case of the 30-km source link and the 20-km target link, Fig. 7 shows the relationship between the convergence epoch number and T. We use five different converged RNNE parameter sets as initializations of the equalizer on the target link. A large G gives the NNE enough time between two pruning operations to adapt to the target link, so the removed connections are of little benefit to the target link. A small G means the equalizer starts pruning before it has learned enough about the target link, resulting in incorrect pruning.

Fig. 7. Minimum convergence epoch number under different pruning thresholds T, with the pruning span G set to (a) 10, (b) 100, (c) 1000, (d) 10000.

When G is infinite, the scheme is equivalent to one-shot pruning: the equalizer is pruned once and then migrated to the target link. As shown in Fig. 8, both the one-shot and the iterative pruning algorithms improve the convergence speed of the equalizer, but one-shot pruning is more stable.

Fig. 8. Minimum convergence epoch number under different pruning thresholds T in one-shot pruning.

Figure 9(a) shows the relationship between the number of residual neuron connections and the pruning threshold after the equalizer converges, under different G. All the pruned TL results in the figure converge faster than TL without iterative pruning. Although the one-shot pruning algorithm is stable, the number of connections remaining when it is deployed in the link is much higher than with iterative pruning. With iterative pruning, no matter how G is set there is always a corresponding T that makes the proposed scheme effective; the number of residual connections can then be halved after convergence, so the complexity is only about half that of traditional TL. T also affects the complexity of the equalizer deployed in the link: the smaller T, the higher the complexity. Figure 9(b) shows the BER curves of TL assisted by different pruning settings: one-shot pruning with T = 0.0064, iterative pruning 1 with [G, T] = [100, 0.0019], iterative pruning 2 with [G, T] = [1000, 0.0059], and iterative pruning 3 with [G, T] = [10000, 0.023]. In all cases, the fluctuation of the converged BER is within one order of magnitude, which again confirms that the pruning algorithm benefits TL.

Fig. 9. (a) Residual neuron connections vs. threshold; (b) BER vs. epochs under different settings.

4. Conclusion

In this work, we propose a neural network equalization scheme for data center networks in which a pruning algorithm assists transfer learning. The scheme further shortens the training time and reduces the deployment complexity of the NNE compared with traditional TL. Using the proposed scheme, migration performance is significantly improved by pruning neuron connections, and the convergence speed when migrating from the 10-km, 15-km, 25-km, and 30-km links is increased by 23.81%, 14.28%, 9.52%, and 19.05%, respectively. We believe the proposed low-complexity scheme is promising for optically switched data center networks that need to dynamically reconfigure optical interconnections.

Funding

National Key Research and Development Program of China (2018YFB1801701); National Natural Science Foundation of China (62105273).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. A. Ghiasi, “Large data centers interconnect bottlenecks,” Opt. Express 23(3), 2085–2090 (2015).

2. C. Yang, R. Hu, M. Luo, Q. Yang, C. Li, H. Li, and S. Yu, “IM/DD-Based 112-Gb/s/λ PAM-4 Transmission Using 18-Gbps DML,” IEEE Photonics J. 8(3), 1–7 (2016).

3. G. N. Liu, L. Zhang, T. Zuo, and Q. Zhang, “IM/DD Transmission Techniques for Emerging 5G Fronthaul, DCI, and Metro Applications,” J. Lightwave Technol. 36(2), 560–567 (2018).

4. L. Sun, J. Du, and Z. He, “Multiband Three-Dimensional Carrierless Amplitude Phase Modulation for Short Reach Optical Communications,” J. Lightwave Technol. 34(13), 3103–3109 (2016).

5. K. Zhong, X. Zhou, T. Gui, L. Tao, Y. Gao, W. Chen, J. Man, L. Zeng, A. P. T. Lau, and C. Lu, “Experimental study of PAM-4, CAP-16, and DMT for 100 Gb/s short reach optical transmission systems,” Opt. Express 23(2), 1176–1189 (2015).

6. K. Zhong, X. Zhou, J. Huo, C. Yu, C. Lu, and A. P. T. Lau, “Digital Signal Processing for Short-Reach Optical Communications: A Review of Current Technologies and Future Trends,” J. Lightwave Technol. 36(2), 377–400 (2018).

7. J. Zhang, X. Wu, L. Sun, J. Liu, A. P. T. Lau, C. Guo, S. Yu, and C. Lu, “C-band 120-Gb/s PAM-4 transmission over 50-km SSMF with improved weighted decision-feedback equalizer,” Opt. Express 29(25), 41622–41633 (2021).

8. N. Stojanovic, F. Karinou, Z. Qiang, and C. Prodaniuc, “Volterra and Wiener Equalizers for Short-Reach 100G PAM-4 Applications,” J. Lightwave Technol. 35(21), 4583–4594 (2017).

9. Y. Yu, Y. Che, T. Bo, D. Kim, and H. Kim, “Reduced-state MLSE for an IM/DD system using PAM modulation,” Opt. Express 28(26), 38505–38515 (2020).

10. L. Sun, J. Xiao, Y. Cai, G. Shen, G. N. Liu, and C. Lu, “Complex-valued decision feedback equalizer for optical IMDD signals with adaptive manipulations in time and amplitude domains,” Opt. Lett. 47(17), 4391–4394 (2022).

11. L. Yi, T. Liao, L. Huang, L. Xue, P. Li, and W. Hu, “Machine Learning for 100 Gb/s/λ Passive Optical Network,” J. Lightwave Technol. 37(6), 1621–1630 (2019).

12. C. Y. Chuang, C. C. Wei, T. C. Lin, K. L. Chi, L. C. Liu, J. W. Chen, Y. K. Shi, and J. Chen, “Employing Deep Neural Network for High Speed 4-PAM Optical Interconnect,” in European Conference on Optical Communication (2017), pp. 1–3.

13. J. Zhang, L. Yan, L. Jiang, A. Yi, Y. Pan, W. Pan, and B. Luo, “Convolutional Neural Network Equalizer for Short-reach Optical Communication Systems,” in Asia Communications and Photonics Conference and International Conference on Information Photonics and Optical Communications (2020), paper T4A.79.

14. Q. Zhou, F. Zhang, and C. Yang, “AdaNN: Adaptive Neural Network-Based Equalizer via Online Semi-Supervised Learning,” J. Lightwave Technol. 38(16), 4315–4324 (2020).

15. C. Ye, D. Zhang, X. Hu, X. Huang, H. Feng, and K. Zhang, “Recurrent Neural Network (RNN) Based End-to-End Nonlinear Management for Symmetrical 50Gbps NRZ PON with 29dB+ Loss Budget,” in European Conference on Optical Communication (2018), pp. 1–3.

16. X. Dai, X. Li, M. Luo, Q. You, and S. Yu, “LSTM networks enabled nonlinear equalization in 50-Gb/s PAM-4 transmission links,” Appl. Opt. 58(22), 6079–6084 (2019).

17. Z. Xu, C. Sun, T. Ji, S. Dong, X. Zhou, and W. Shieh, “Investigation on the Impact of Additional Connections to Feedforward Neural Networks for Equalization in PAM4 Short Reach Direct Detection Links,” in Asia Communications and Photonics Conference and International Conference on Information Photonics and Optical Communications (2020), paper S4l.1.

18. J. Zhang, L. Xia, M. Zhu, S. Hu, B. Xu, and K. Qiu, “Fast remodeling for nonlinear distortion mitigation based on transfer learning,” Opt. Lett. 44(17), 4243–4246 (2019).

19. W. Zhang, T. Jin, T. Xu, J. Zhang, and K. Qiu, “Nonlinear Mitigation with TL-NN-NLC in Coherent Optical Fiber Communications,” in Asia Communications and Photonics Conference/International Conference on Information Photonics and Optical Communications (2020), paper M4A.321.

20. X. Ren, Y. Wang, and C. Li, “Transfer Learning Aided Optical Nonlinear Equalization Based Feature Engineering Neural Network,” in Asia Communications and Photonics Conference (2021), paper T4A.104.

21. P. J. Freire, D. Abode, J. E. Prilepsky, N. Costa, B. Spinnler, A. Napoli, and S. K. Turitsyn, “Transfer Learning for Neural Networks-Based Equalizers in Coherent Optical Systems,” J. Lightwave Technol. 39(21), 6733–6745 (2021).

22. Z. Xu, C. Sun, T. Ji, J. H. Manton, and W. Shieh, “Feedforward and Recurrent Neural Network-Based Transfer Learning for Nonlinear Equalization in Short-Reach Optical Links,” J. Lightwave Technol. 39(2), 475–480 (2021).

23. X. Liu, J. Li, Z. Fan, and J. Zhao, “Transfer-Learning Based Convolutional Neural Network for Short-Reach Optical Interconnects,” in Asia Communications and Photonics Conference (2021), paper T4A.79.

24. L. Ge, W. Zhang, C. Liang, and Z. He, “Compressed Neural Network Equalization Based on Iterative Pruning Algorithm for 112-Gbps VCSEL-Enabled Optical Interconnects,” J. Lightwave Technol. 38(6), 1323–1329 (2020).

25. G. S. Yadav, C. Y. Chuang, K. M. Feng, J. Chen, and C. Y. Chen, “Computation efficient sparse DNN nonlinear equalization for IM/DD 112 Gbps PAM4 inter-data center optical interconnects,” Opt. Lett. 46(9), 1999–2002 (2021).

26. X. Wu, J. Zhang, G. Zhou, A. P. T. Lau, and C. Lu, “C-Band 112-Gb/s PAM-4 Transmission over 50-km SSMF Using Absolute-Term Based Nonlinear FFE-DFE,” in Asia Communications and Photonics Conference (2021), paper M4I.3.
