Adaptive digital back propagation exploiting adjoint-based optimization for fiber-optic communications

Mahmoud M. T. Maghrabi; Mahmoud M. T. Maghrabi; Mohamed H. Bakr; Shiva Kumar

doi:10.1364/OE.453640

1. Introduction

Optical fibers play a vital role in modern telecommunication systems and networks [1]. An optical fiber link imposes linear and nonlinear distortions on the propagating light-wave signal due to the inherent dispersive nature and nonlinear behavior of the fiber [2,3]. Although linear impairments can be effectively compensated in coherent optical fiber systems using linear equalizers, the mitigation of nonlinear distortions is considered a major challenge [4]. These nonlinear distortions impede the higher data rate transmission over longer distances. Developing an efficient and computationally inexpensive digital signal processing (DSP) technique to perfectly compensate for the nonlinear fiber impairments is therefore essential and of preeminent importance.

Various DSP-based compensation techniques have been proposed in literature for mitigating the distortions imposed by optical fiber communication systems. These techniques include least mean square equalizers [5], Volterra-series-based equalizers [6], neural network-based equalizers [7], and digital back propagation (DBP) [8–14]. The DBP is one of the most attractive candidates that can principally be used to effectively eliminate the deterministic impairments in optical fiber systems with coherent detection. The basic idea of the DBP is to transmit the received distorted signal through a virtual fiber whose loss, dispersion, and nonlinear parameters are the same in magnitude as that of the actual transmission fiber, but with opposite signs such that the deterministic distortion effects of the transmission link can be fully inverted [9,10]. However, the compensation of fiber impairments using the DBP requires prior knowledge of the physical fiber-optic channel parameters. Such information can only be available at the receiver in the case of a point-to-point transmission scenario. Precise information of the physical fiber channel parametrization may not be available in the case of a dynamic optical network. For instance, transmitted carriers in elastic optical networks (EONs) may propagate through several different routes until reaching the same receiver point [15]. These routes are determined according to the dynamic network configurations. Hence, it becomes challenging and very difficult to determine the fiber channel parameters in such scenarios [16]. Moreover, the physical parameters of optical fiber links and in-line erbium-doped fiber amplifiers (EDFAs) are subject to slight variations, from their nominal values, due to environmental changes.

Many approaches have been proposed for estimating the DBP parameters without knowing the launch power and channel parameters. A semi-blind DBP method is introduced in [17], where the dispersion and nonlinear parameters of the DBP are tuned according to information derived from previously detected data after being recovered using the forward error correction (FEC). However, such approach requires significant delay and hence, its use is limited for high-speed transmission systems. To overcome this delay, adaptive methods that can self-determine the fiber nonlinearity coefficient utilizing gradient-based algorithms are proposed in [16,18,19]. Starting from randomly guessed nonlinear coefficients, the DBP is trained using the known transmitted data to minimize the error vector magnitude [16], the phase-noise variance [18], or the Godard’s error function [19]. More advanced adaptive DBP approaches, based on the assisted filter DBP [11] and the perturbation theory [20], are proposed in [21,22]. These advanced approaches are capable of estimating not only the nonlinear compensation parameters, but also the dispersion compensation parameter and other parameters, e.g. the linear filter bandwidth in case of the assisted filter DBP.

However, in all of these adaptive DBP methods [16,18,19,21,22], the classical finite-difference (FD) approach is utilized to determine the sensitivity information required by the optimizer, during the DBP parameters estimation. The computational cost for obtaining the sensitivity information using FD methods is very expensive $- $ it scales linearly with the number of design parameters. Such overhead together with the expensive computational cost of a single run of the DBP system make the adaptive DBP training cost prohibitive and impractical, especially when a large number of design parameters needs to be estimated. This intensive computational cost can though be significantly reduced through evaluating the sensitivities using an adjoint sensitivity analysis (ASA) approach [23].

The ASA method estimates the sensitivities of the desired objective function with respect to all design parameters using at most one extra system simulation. ASA approaches significantly reduces the computational time cost required by FD approaches, at the expense of using extra memory storage [23]. Various ASA algorithms have been developed in literature based on Maxwell’s equations or the wave equation for high frequency structures [24–28]. An ASA technique is also proposed to estimate the sensitivities of low frequency power applications, e.g., switched reluctance motors [29]. Recently, we proposed an ASA method based on the nonlinear Schrödinger equation (NLSE) to evaluate the sensitivities of a repeaterless single fiber span optical communication system [30,31]. However, the use of this method is limited to forward propagation problems with single fiber span, since it cannot account for the effects of in-line amplifiers. In this paper, we extend this approach to the sensitivity analysis of the general multi-span DBP model. The extended ASA includes both the effect of the inverse nonlinear Schrödinger equation (INLSE) and the in-line amplifier inverse effect. Based on this extended sensitivity analysis approach, we propose an adaptive DBP (A-DBP) scheme, with fast adaption process, to blindly compensate for the linear and nonlinear distortions of optical fiber transmission channels. An adjoint-based optimization (ABO) algorithm, based on the sequential quadratic programming (SQP) method, is proposed to train and optimize the parameters of the A-DBP. All sensitivity calculations required by the ABO algorithm are estimated using the extended ASA approach, which significantly accelerates the A-DBP training process. Regardless of the number of A-DBP design parameters, full gradient information of the A-DBP training problem with respect to all parameters is obtained using only one extra adjoint DBP system simulation. The signal and system parameters required for the DBP such as launch power, fiber lengths, dispersion, loss and nonlinear coefficients of all the fiber spans can be estimated using the proposed approach.

The robustness and efficiency of the proposed A-DBP algorithm is investigated through applying it to mitigate the distortions induced in 4-span and 20-span fiber-optic communication systems. Thanks to the proposed ABO algorithm, the A-DBP could rapidly be trained achieving the optimum compensation performance that is obtained using a conventional DBP with the correct parameters of the channel. Moreover, to reduce the computational complexity of the compensation process, we demonstrate the compensation of the 4-span optical communication scenario using a coarse-mesh A-DBP model with less number of virtual spans.

The remainder of the article is organized as follows: Section 2 provides a brief description of the multi-span DBP mathematical model. In Section 3, we derive the general adjoint system simulation corresponding to the original system simulation of the DBP. A full description of the modified split-step Fourier scheme (SSFS) algorithm required for solving the derived adjoint simulation is also given. Section 3 also provides a complete computational complexity analysis of the modified SSFS algorithm. A detailed description of the proposed ABO algorithm is given in Section 4. In Section 5, we investigate the efficiency of the proposed A-DBP scheme through applying the ABO algorithm to train the DBP for blindly mitigating the inherent distortions in typical 4-span and 20-span fiber-optic communication systems. We also demonstrate the performance of the ABO algorithm training as compared to the finite-difference-based optimization algorithms. Moreover, in Section 5, the proposed ABO algorithm is applied to train coarse-mesh A-DBP models with fewer spans, in order to reduce the computational cost of the compensation of the 4-span optical communication scenario. Finally, conclusions of the work are drawn in Section 6.

2. Multi-span DBP mathematical model

The signal evolution within a DBP span (i.e. virtual fiber + loss element) is described by the inverse nonlinear Schrödinger equation (INLSE) coupled with the inverse gain effect, given as follows [9,10]:

(1)$$\frac{{\partial v}}{{\partial Z}} - [{{{\hat{D}}_b} + {{\hat{N}}_b}} ]v + {\hat{G}_b}v = {v_{in}}\delta (Z ), $$

where $v({Z,t} )= {q_b}/\sqrt {{P_0}} $ is a normalized signal of the complex equalized field ${q_b}({Z,t} )$, ${P_0}$ is the peak power of the transmitted pulse, $Z = {L_{tot}} - z$ is the backward direction, z is the propagation distance in the fiber-optic link, and ${L_{tot}}$ is the total transmission distance of the fiber-optic link. The linear, nonlinear, and attenuation operators of the DBP (${\hat{D}_b}$, ${\hat{N}_b}$, and ${\hat{G}_b}$) for the $j$^th physical span are given as [9,10]:

(2)$${\hat{D}_b}(t )={-} \frac{{{\beta _{3,j}}}}{6}\frac{{{\partial ^3}}}{{\partial {t^3}}} + \frac{{i{\beta _{2,j}}}}{2}\frac{{{\partial ^2}}}{{\partial {t^2}}} + \frac{{{\alpha _j}}}{2}, $$

(3)$${\hat{N}_b}({Z,t} )={-} i{\gamma _j}{P_0}{|{v({Z,t} )} |^2}, $$

(4)$${\hat{G}_b}(Z )= \left\{ {\begin{array}{lc} {g_j},& \textrm{at}\; Z = \mathop \sum \limits_{l = j + 1}^M {L_l}\\ {0},&\textrm{elsewhere} \end{array}} \right., $$

where ${\beta _{3,j}}$ and ${\beta _{2,j}}$ are the third- and second-order dispersion coefficients of the $j$^th physical span, $j = \; M,\; M - 1,\; \ldots ,\; 2,\; 1$, and $M\; $ is the number of spans. The parameters ${\alpha _j}$ and ${\gamma _j}$ are the loss and nonlinear coefficient of the $j$^th span. ${L_j}$ and ${G_j} = \exp ({{g_j}} )$ are the fiber length and the amplifier gain of the $j$^th span. Here, we use $i = \sqrt { - 1} $. Note that in Eq. (1), we assume zero boundary condition, i.e., $v({0,t} )= 0$, and the field excitation is given by the right-hand side term ${v_{in}}\delta (Z )$, where ${v_{in}} = u({z = {L_{tot}},t} )$, and $u({z,t} )$ is the field in the fiber-optic link. Expressing the complex field $v({Z,t} )$ in terms of its real and imaginary parts ($v = {v_{re}} + i{v_{im}}$), and separating real and imaginary terms, Eq. (1) becomes

(5)$$\begin{array}{c}\left[ {\begin{array}{@{}cc@{}} {{\tilde{\beta }_3}}&0\\ 0&{{{\tilde{\beta }}_3}} \end{array}} \right]\left[ {\begin{array}{@{}c@{}} {\frac{{{\partial^3}{v_{re}}}}{{\partial {t^3}}}}\\ {\frac{{{\partial^3}{v_{im}}}}{{\partial {t^3}}}} \end{array}} \right] + \left[ {\begin{array}{@{}cc@{}} 0&{{{\tilde{\beta }}_2}}\\ { - {{\tilde{\beta }}_2}}&0 \end{array}} \right]\left[ {\begin{array}{@{}c@{}} {\frac{{{\partial^2}{v_{re}}}}{{\partial {t^2}}}}\\ {\frac{{{\partial^2}{v_{im}}}}{{\partial {t^2}}}} \end{array}} \right] + \left[ {\begin{array}{@{}cc@{}} {\partial /\partial Z}&0\\ 0&{\partial /\partial Z} \end{array}} \right]\left[ {\begin{array}{@{}c@{}} {{v_{re}}}\\ {{v_{im}}} \end{array}} \right] + \left[ {\begin{array}{@{}cc@{}} {\tilde{\alpha }}&0\\ 0&{\tilde{\alpha }} \end{array}} \right]\left[ {\begin{array}{@{}c@{}} {{v_{re}}}\\ {{v_{im}}} \end{array}} \right] +\\ \left[ {\begin{array}{@{}cc@{}} 0&{ - \tilde{\gamma }}\\ {\tilde{\gamma }}&0 \end{array}} \right]\left[ {\begin{array}{@{}c@{}} {{v_{re}}}\\ {{v_{im}}} \end{array}} \right] + \left[ {\begin{array}{@{}cc@{}} {\tilde{g}}&0\\ 0&{\tilde{g}} \end{array}} \right]\left[ {\begin{array}{@{}c@{}} {{v_{re}}}\\ {{v_{im}}} \end{array}} \right] = \left[ {\begin{array}{@{}c@{}} {Re\{ {v_{in}}\} \delta (Z)}\\ {Im\{ {v_{in}}\} \delta (Z)} \end{array}} \right]\end{array}$$

where ${\tilde{\beta }_3} = {\beta _{3,j}}/6$, ${\tilde{\beta }_2} = {\beta _{2,j}}/2$, $\tilde{\alpha } ={-} {\alpha _j}\; /2$, and $\tilde{\gamma }(v )= {\gamma _j}{P_0}({v_{re}^2 + v_{im}^2} )$. The parameter $\tilde{g} = {g_j}\delta \left( {Z - \mathop \sum \nolimits_{l = j + 1}^M {L_l}} \right)$, where j can be $1,\; 2,\; \ldots ,\; \textrm{or}\; M$. Without loss of generality, we assume that the ${j^{th}}$ span is discretized into ${p_j}$ spatial cells with a step size of h, and $p = \mathop \sum \nolimits_{j = 1}^M {p_j}$ is the total number of spatial cells in the computational domain. Writing (5) for the entire computational domain and approximating the spatial derivatives using finite-differences, Eq. (5) becomes:

(6)$${{\boldsymbol B}_3}{\dddot {\boldsymbol V}} + {{\boldsymbol B}_2}{\dddot {\boldsymbol V}} + {{\boldsymbol K}_c}{\boldsymbol V} + {\boldsymbol {AV}} + {\boldsymbol \varGamma }({\boldsymbol V} ){\boldsymbol V} + {\boldsymbol {GV}} = {{\boldsymbol V}_{in}}(t ), $$

where ${\boldsymbol V} = {\boldsymbol V}({Z,t} )= {[{{\boldsymbol v}_{re}^T(t )\; \; {\boldsymbol v}_{im}^T(t )} ]^T}$ is the system state vector, containing all real and imaginary parts of the complex field v in the whole computational domain, T denotes the transpose, ${\cdot} $ denotes time derivative, and

(7A)$${{\boldsymbol v}_{re}}(t )= {[{{v_{re}}({0,t} ),\; \ldots {v_{re}}({lh,t} ),\; \ldots {v_{re}}({({p - 1} )h,t} )} ]^T}, $$

(7B)$${{\boldsymbol v}_{im}}(t )= {[{{v_{im}}({0,t} ),\; \ldots {v_{im}}({lh,t} ),\; \ldots {v_{im}}({({p - 1} )h,t} )} ]^T}. $$

The values of the system state vector components implicitly depend on the vector ${\boldsymbol x}$, which includes all DBP design parameters. A general form of the design vector ${\boldsymbol x}$ is defined as:

(8)$${\boldsymbol x} = {[{{P_0}\; {L_1}\; {L_2} \cdots {L_M}\; {\gamma_1}\; {\gamma_2} \cdots {\gamma_M}\; {\beta_{3,1}}\; {\beta_{3,2}} \cdots {\beta_{3,M}}\; {\beta_{2,1}}\; {\beta_{2,2}}\; \cdots {\beta_{2,M}}\; {\alpha_1}\; {\alpha_2} \cdots {\alpha_M}\; {g_1}\; {g_2} \cdots {g_M}} ]^T}. $$

The vector ${{\boldsymbol V}_{{\boldsymbol in}}}(t )= {[{Re\{{{v_{in}}} \}{\boldsymbol e}_1^T\; \; Im\{{{v_{in}}} \}{\boldsymbol e}_1^T} ]^T}$ is the excitation vector, where ${{\boldsymbol e}_1} = {[{1\; 0 \cdots 0} ]^T}$ is the $1$^st elementary column vector of size p, and the constant matrix ${{\boldsymbol K}_c}$ approximates the spatial derivatives $\partial {{\boldsymbol v}_{re}}/\partial Z$ and $\partial {{\boldsymbol v}_{im}}/\partial Z$ using central finite-difference approximations. The form of the matrix ${{\boldsymbol K}_c}$ is given by

(9)$${{\boldsymbol K}_c} = \left[ \begin{array}{cccccccc} {}&{\tilde{h}}&{}&{}&{}&{}&{}&{}\\ { - \tilde{h}}&{}& \ddots &{}&{}&{}&{\mathbf 0}&{}\\ {}& \ddots &{}&{\tilde{h}}&{}&{}&{}&{}\\ {}&{}&{ - \tilde{h}}&{}&{}&{\tilde{h}}&{}&{}\\ {}&{}&{}&{}&{-\tilde{h}}&{}& \ddots &{}\\ {}&{\mathbf 0}&{}&{}&{}& \ddots &{}&{\tilde{h}}\\ {}&{}&{}&{}&{}&{}&{ - \tilde{h}}&{} \end{array} \right], $$

where $\tilde{h} = 1/({2h} )$ and $0$ is $p \times p$ zero matrix. The system matrices are given as: ${{\boldsymbol B}_3} = \left[ {\begin{array}{cc} {{{\tilde{{\boldsymbol \beta }}}_3}}&0\\ 0&{{{\tilde{{\boldsymbol \beta }}}_3}} \end{array}} \right]$, ${{\boldsymbol B}_2} = \left[ {\begin{array}{cc} 0&{{{\tilde{{\boldsymbol \beta }}}_2}}\\ { - {{\tilde{{\boldsymbol \beta }}}_2}}&0 \end{array}} \right]$, ${\boldsymbol \varGamma }({\boldsymbol V} )= \left[ {\begin{array}{cc} 0&{ - \tilde{{\boldsymbol \gamma }}}\\ {\tilde{{\boldsymbol \gamma }}}&0 \end{array}} \right]$, ${\boldsymbol A} = \left[ {\begin{array}{cc} {\tilde{{\boldsymbol \alpha }}}&0\\ 0&{\tilde{{\boldsymbol \alpha }}} \end{array}} \right]$, and ${\boldsymbol G} = \left[ {\begin{array}{cc} {\tilde{{\boldsymbol g}}}&0\\ 0&{\tilde{{\boldsymbol g}}} \end{array}} \right]$, where ${\tilde{{\boldsymbol \beta }}_3}$, ${\tilde{{\boldsymbol \beta }}_2}$, and $\tilde{{\boldsymbol \alpha }}$ are all diagonal matrices whose ${k^{th}}$ diagonal entry contains the corresponding values of ${\tilde{\beta }_3}$, ${\tilde{\beta }_2}$, and $\tilde{\alpha }$, respectively, at the $k$^th cell of the computational domain, and $k = 1,\; 2,\; \ldots ,\; p$. The diagonal matrix $\tilde{{\boldsymbol g}}$ is: $\tilde{{\boldsymbol g}} = \mathop \sum \nolimits_{j = 1}^M {g_j}{{\boldsymbol e}_{{k_j}}}{\boldsymbol e}_{{k_j}}^T$, where ${{\boldsymbol e}_{{k_j}}}$ is an elementary column vector of size p whose nonzero entry is ${k_j} = 1 + \mathop \sum \nolimits_{l = 1}^j {p_{l - 1}}$ with the assumption that ${p_0} = 0$. The matrix $\tilde{{\boldsymbol \gamma }}({\boldsymbol V} )$ is a diagonal matrix whose ${k^{th}}$ diagonal entry is given by: ${\gamma _k}{P_0}[{v_{re}^2({kh} )+ v_{im}^2({kh} )} ]$, $k = 0,\; 1, \ldots ,\; p - 1$. Equation (6) represents the original simulation of the multi-span DBP system, which is typically solved using the split-step Fourier scheme (SSFS) method [32].

Notice that the analysis throughout this paper considers a single channel with a single polarization system. However, the presented formulation can be extended to a system with polarization-multiplexed signal. In the case of dual polarization, the system state vector ${\boldsymbol V}$ would have $4$ independent components at each spatial cell, rather than $2$, corresponding to the real and imaginary parts of the x and y polarization components of the normalized complex-envelope field. In other words, the system state vector becomes: ${\boldsymbol V} = {\boldsymbol V}({Z,t} )= {[{{\boldsymbol v}_{x|re}^T(t )\; {\boldsymbol v}_{y|re}^T(t )\; \; {\boldsymbol v}_{x|im}^T(t )\; \; {\boldsymbol v}_{y|im}^T(t )\; } ]^T}$, where ${{\boldsymbol v}_{q|re}}$ and ${{\boldsymbol v}_{q|im}}$ represent the real and imaginary parts of the q polarization component of the complex field, respectively, and q is x or y. The system matrices of Eq. (6) would have the same forms but with $4$ block matrices instead of $2$ block matrices. For example, the matrices ${{\boldsymbol B}_3}$ and ${\boldsymbol \varGamma }({\boldsymbol V} )$ becomes: $\left[ {\begin{array}{cccc} {{{\tilde{{\boldsymbol \beta }}}_3}}&0&0&0\\ 0&{{{\tilde{{\boldsymbol \beta }}}_3}}&0&0\\ 0&0&{{{\tilde{{\boldsymbol \beta }}}_3}}&0\\ 0&0&0&{{{\tilde{{\boldsymbol \beta }}}_3}} \end{array}} \right]$ and $\left[ {\begin{array}{cccc} 0&0&{ - \widetilde {\tilde{{\boldsymbol \gamma }}}}&0\\ 0&0&0&{ - \widetilde {\tilde{{\boldsymbol \gamma }}}}\\ {\widetilde {\tilde{{\boldsymbol \gamma }}}}&0&0&0\\ 0&{\widetilde {\tilde{{\boldsymbol \gamma }}}}&0&0 \end{array}} \right]$, respectively, where $0$ is $p \times p$ zero matrix and the ${k^{th}}\; $ diagonal component of the diagonal matrix $\widetilde {\tilde{{\boldsymbol \gamma }}}({\boldsymbol V} )$ becomes: $\frac{8}{9}{\gamma _k}{P_0}[{v_{x|re}^2({kh} )+ v_{y|re}^2({kh} )+ v_{x|im}^2({kh} )+ v_{y|im}^2({kh} )} ]$, for $k = 0,\; 1, \ldots ,\; p - 1$.

Next, we derive an adjoint system simulation corresponding to Eq. (6). Using these 2 simulations (the original and adjoint simulations), the full gradient of a general objective function with respect to all DBP design parameters are estimated.

3. ASA for the DBP

Our objective is to estimate the sensitivities of the following objective function [23]:

(10)$$F = \mathop \smallint \limits_{ - {T_m}}^{{T_m}} \psi ({{\boldsymbol x},{\boldsymbol V}} )dt, $$

where the integral kernel $\psi $ is a predefined function, ${\boldsymbol x} \in {\mathrm{\mathbb{R}}^N}$ is the vector of design parameters, ${\boldsymbol V}$ is the system state vector, and ${T_m}$ is half the computational window width. For example, F may represent the mean-square difference between the output of the DBP ${v_{out}}(t )$ and the transmitter output ${u_{tx}}(t )$, i.e.

(11)$$F = \mathop \smallint \limits_{ - {T_m}}^{{T_m}} {|{{v_{out}}(t )- {u_{tx}}(t )} |^2}dt, $$

which we would wish to minimize. The analytic derivative of Eq. (10) with respect to the $l$^th parameter ${x_l}$, $l = 1,\; 2,\; \ldots ,\; N$, is given by:

(12)$$\frac{{\partial F}}{{\partial {x_l}}} = \mathop \smallint \limits_{ - {T_m}}^{{T_m}} \frac{{{\partial ^e}\psi }}{{\partial {x_l}}}dt + \mathop \smallint \limits_{ - {T_m}}^{{T_m}} {\left( {\frac{{\partial \psi }}{{\partial {\boldsymbol V}}}} \right)^T}\frac{{\partial {\boldsymbol V}}}{{\partial {x_l}}}dt, $$

where ${\partial ^e}/\partial {x_l}$ denotes the explicit dependence. The analytic expression in Eq. (12) cannot be evaluated unless the vector $\partial {\boldsymbol V}/\partial {x_l}$ is known for every time step. In order to derive an adjoint sensitivity analysis (ASA) approach for estimating the implicit derivative in Eq. (12) with respect to all the design parameters using only one extra system simulation, we follow the same derivation steps of the NLSE-based ASA method [30]. We start by differentiating the original multi-span DBP system (6) with respect to the ${l^{th}}$ parameter ${x_l}$, $l = 1,\; 2, \ldots ,N$, taking into account that the excitation vector ${{\boldsymbol V}_{in}}$ is independent of ${x_l}$. By reorganizing and moving the known terms to the right-hand side, we get

(13)$${{\boldsymbol B}_3}\frac{{{\partial ^4}{\boldsymbol V}}}{{\partial {x_l}\partial {t^3}}} + {{\boldsymbol B}_2}\frac{{{\partial ^3}{\boldsymbol V}}}{{\partial {x_l}\partial {t^2}}} + {{\boldsymbol K}_c}\frac{{\partial {\boldsymbol V}}}{{\partial {x_l}}} + {\boldsymbol A}\frac{{\partial {\boldsymbol V}}}{{\partial {x_l}}} + \frac{{\partial ({{\boldsymbol \varGamma \bar{V}}} )}}{{\partial {{\boldsymbol V}^T}}}\frac{{\partial {\boldsymbol V}}}{{\partial {x_l}}} + {\boldsymbol \varGamma }\frac{{\partial {\boldsymbol V}}}{{\partial {x_l}}} + {\boldsymbol G}\frac{{\partial {\boldsymbol V}}}{{\partial {x_l}}} ={-} {{\boldsymbol R}_l}, $$

where $\bar{{\boldsymbol V}}$ is the nominal value of ${\boldsymbol V}$, treated as constant during the differentiation operation in Eq. (13). Note that all system matrices arising in Eq. (6) are constants with time, except for the nonlinear coefficients matrix ${\boldsymbol \varGamma }({\boldsymbol V} )$. This dependency arises from the fact that the nonlinear coefficient $\tilde{\gamma } = \gamma {P_0}{|v |^2}$ is a function of the local field value. The residue vector ${{\boldsymbol R}_l}$, corresponding to the $l$^th parameter ${x_l}$, is given by:

(14)$${{\boldsymbol R}_l} = \frac{{{\partial ^e}{{\boldsymbol B}_3}}}{{\partial {x_l}}}{\dddot {\boldsymbol V}} + \frac{{{\partial ^e}{{\boldsymbol B}_2}}}{{\partial {x_l}}}\ddot {\boldsymbol V} + \frac{{{\partial ^e}{\boldsymbol A}}}{{\partial {x_l}}}{\boldsymbol V} + \frac{{{\partial ^e}{\boldsymbol \varGamma }}}{{\partial {x_l}}}{\boldsymbol V} + \frac{{{\partial ^e}{\boldsymbol G}}}{{\partial {x_l}}}{\boldsymbol V}. $$

Note that all explicit derivatives appearing in Eq. (14) are known and can be evaluated using the definitions of the DBP system matrices: ${{\boldsymbol B}_3}$, ${{\boldsymbol B}_2}$, ${\boldsymbol A}$, ${\boldsymbol \varGamma }$, and ${\boldsymbol G}$. For example, if the second order dispersion coefficient is the current design parameter (i.e. ${x_l} = {\beta _2}$), then ${\partial ^e}{{\boldsymbol B}_3}/\partial {x_l} = 0$, ${\partial ^e}{{\boldsymbol B}_2}/\partial {x_l} ={-} 0.5{{\boldsymbol I}_{2p}}$, ${\partial ^e}{\boldsymbol A}/\partial {x_l} = 0$, ${\partial ^e}{\boldsymbol \varGamma }/\partial {x_l} = 0$, and ${\partial ^e}{\boldsymbol G}/\partial {x_l} = 0$. Multiplying both sides of Eq. (13) by the yet-to-be-determined temporal adjoint vector ${\boldsymbol \lambda }$ (which is a function of time and space), integrating over time, and exploiting integration by parts to have no mixed derivative terms in the kernel of the integral, we obtain

(15)$$\mathop \smallint \nolimits_{ - {T_m}}^{{T_m}} \left[- {{{\dddot {\boldsymbol \lambda}}}^T}{{\boldsymbol B}_3} + {{\ddot {\boldsymbol \lambda}^T}{{\boldsymbol B}_2} + {{\boldsymbol \lambda }^T}\left( {{{\boldsymbol K}_c} + {\boldsymbol A} + \frac{{\partial ({{\boldsymbol \varGamma \bar{V}}} )}}{{\partial {{\boldsymbol V}^T}}} + {\boldsymbol \varGamma } + {\boldsymbol G}} \right)} \right]\frac{{\partial {\boldsymbol V}}}{{\partial {x_l}}}dt ={-} \mathop \smallint \nolimits_{ - {T_m}}^{{T_m}} {{\boldsymbol \lambda }^T}{{\boldsymbol R}_l}dt, $$

where ${\boldsymbol \lambda }({Z,t} )= {[{{\boldsymbol \lambda }_{re}^T\; \; {\boldsymbol \lambda }_{im}^T} ]^T}$ is the adjoint state vector. Notice that to obtain Eq. (15), we assume the following zero boundary conditions $:\; {\boldsymbol V}({Z,\; - {T_m}} )= \dot{{\boldsymbol V}}({Z,\; - {T_m}} )= {\ddot {\boldsymbol V}}({Z,\; - {T_m}} )= 0$, and ${\boldsymbol \lambda }({Z,\; {T_m}} )= \dot{{\boldsymbol \lambda }}({Z,\; {T_m}} )= \ddot {\boldsymbol \lambda} ({Z,\; {T_m}} )= 0$. Comparing the left-hand side term in Eq. (15) with the last term in the right-hand side (i.e. implicit integral) of Eq. (12), we enforce the condition:

(16)$$- {{\dddot {\boldsymbol \lambda} }^T}{{\boldsymbol B}_3} + {\ddot {\boldsymbol \lambda} ^T}{{\boldsymbol B}_2} + {{\boldsymbol \lambda }^T}\left[ {{{\boldsymbol K}_c} + {\boldsymbol A} + \frac{{\partial ({{\boldsymbol \varGamma \bar{V}}} )}}{{\partial {{\boldsymbol V}^T}}} + {\boldsymbol \varGamma } + {\boldsymbol G}} \right] = {\left( {\frac{{\partial \psi }}{{\partial {\boldsymbol V}}}} \right)^T}. $$

From the definitions of the original system matrices in Eq. (6), it is clear that ${{\boldsymbol B}_3}$, ${\boldsymbol A}$, and ${\boldsymbol G}$ are symmetric matrices, whereas ${{\boldsymbol B}_2}$, ${{\boldsymbol K}_c}$, and ${\boldsymbol \varGamma }$ are skew-symmetric matrices. Taking the transpose of both two sides of Eq. (16), we thus obtain the adjoint problem, given as

(17)$$- {{\boldsymbol B}_3}{\dddot {\boldsymbol \lambda} } - {{\boldsymbol B}_2}\ddot {\boldsymbol \lambda} - {{\boldsymbol K}_c}{\boldsymbol \lambda } + {\boldsymbol A\lambda } + {{\boldsymbol \varGamma }^\lambda }({\boldsymbol V} ){\boldsymbol \lambda } + {\boldsymbol G\lambda } = {\boldsymbol Q}_{in}^\lambda , $$

where the adjoint excitation vector is given by: ${\boldsymbol Q}_{in}^\lambda (t )= \partial \psi /\partial {\boldsymbol V} = {[{\partial \psi /\partial {\boldsymbol v}_{re}^T\; \partial \psi /\partial {\boldsymbol v}_{im}^T} ]^T}$. The matrix ${{\boldsymbol \varGamma }^\lambda }$ corresponding to the fiber nonlinearity of the adjoint problem is given by:

(18)$${{\boldsymbol \varGamma }^\lambda }({\boldsymbol V} )= \frac{{\partial {{({{\boldsymbol \varGamma \bar{V}}} )}^T}}}{{\partial {\boldsymbol V}}} - {\boldsymbol \varGamma } = \left[ {\begin{array}{cc} { - {\boldsymbol a}}&{\boldsymbol b}\\ {\boldsymbol c}&{\boldsymbol a} \end{array}} \right], $$

where ${\boldsymbol a}$, ${\boldsymbol b}$, and ${\boldsymbol c}$ are diagonal matrices whose ${k^{th}}$ diagonal elements are given, respectively, as follows:

(19A)$${a_{kk}} = 2\gamma {P_0}{v_{re}}({kh} ){v_{im}}({kh} ), $$

(19B)$${b_{kk}} = \gamma {P_0}[{3v_{re}^2({kh} )+ v_{im}^2({kh} )} ], $$

(19C)$${c_{kk}} ={-} \gamma {P_0}[{v_{re}^2({kh} )+ 3v_{im}^2({kh} )} ], $$

for $k = 0,\; 1, \ldots ,\; p - 1$. Equation (17) describes the evolution of the adjoint vector ${\boldsymbol \lambda }$ over time and space. It represents the modified INLSE simulation in the adjoint space. It has some resemblance to the original INLSE simulation (6) but the nonlinear term is different. It can also be noticed from the opposite sign of the spatial derivative matrix ($- {{\boldsymbol K}_c}$), as compared to the original problem (6), that the adjoint problem is solved in an opposite direction to the backward direction Z at which the original DBP problem is solved. In other words, the adjoint DBP problem (17) is solved in the forward direction $z = {L_{tot}} - Z$, i.e., the same direction of propagation in the physical fiber-optic link.

Figure 1 depicts the adjoint DBP simulation (17) as opposed to the original DBP simulation (6). As shown, each adjoint fiber span (AFS) has the same absolute values of the corresponding dispersion coefficients (${\beta _{3,j}}$ and ${\beta _{2,j}}$), but with reversed signs, as compared to the original DBP simulation. Since the dispersion coefficients used in the original DBP problem (6) have their signs opposite to those of the actual physical fiber spans, the dispersion coefficients used in the adjoint DBP simulation are identical to the corresponding parameters of the physical fiber spans. The gain coefficients ($- {\alpha _j}$) of the adjoint problem are identical to the corresponding gain parameters in the original DBP simulation. All loss elements $({{g_j}} )$ also remain the same in the adjoint simulation. The input of the adjoint problem ${\lambda _{in}}$ is obtained by differentiating the kernel of the objective function integral $\psi $ with respect to the original DBP field v. The nonlinear behavior of the adjoint DBP problem is completely different from the original DBP nonlinear effect. Moreover, the nonlinear coefficient of each AFS depends on the value of the corresponding original field v at the original DBP simulation, rather than the adjoint field $\lambda $, as shown in Fig. 1.

Fig. 1. Propagation model of an original DBP simulation, and its corresponding adjoint DBP simulation. Rx: receiver; DBP: digital back propagation; VFS: virtual fiber span; LE: loss element; AFS: adjoint fiber span.

Download Full Size | PDF

Once the original DBP response ${\boldsymbol V}$ and its corresponding adjoint response ${\boldsymbol \lambda }$ are evaluated for every time step in the entire computational domain, through solving Eqs. (6) and (17), respectively, the adjoint sensitivities of the objective function F are evaluated through the following formula, obtained by using Eqs. (16) and (15) in Eq. (12):

(20)$$\frac{{\partial F}}{{\partial {x_l}}} = \mathop \smallint \limits_{ - {T_m}}^{{T_m}} \frac{{{\partial ^e}\psi }}{{\partial {x_l}}}dt - \mathop \smallint \limits_{ - {T_m}}^{{T_m}} {{\boldsymbol \lambda }^T}{{\boldsymbol R}_l}dt,\; \; \; \; \; \; l = 1,\; 2,\; \ldots ,\; N. $$

Note that all these derivation steps are valid and hold for the case of dual-polarized system. The derived adjoint problem would also have a similar form to Eq. (17) except for the fact that all system matrices would consist of $4$ block matrices rather than $2$ block matrices. The main difference would only be on the structure of the adjoint nonlinear matrix ${{\boldsymbol \varGamma }^\mathrm{\lambda }}({\boldsymbol V} )$ whose new form would become as follows:

(21)$${{\boldsymbol \varGamma }^\lambda }({\boldsymbol V} )= \frac{{\partial {{({{\boldsymbol \varGamma \bar{V}}} )}^T}}}{{\partial {\boldsymbol V}}} - {\boldsymbol \varGamma } = \left[ {\begin{array}{cccc} { - {{\boldsymbol a}_x}}&{ - {{\boldsymbol a}_{xy}}}&{{{\boldsymbol b}_x}}&{{{\boldsymbol d}_1}}\\ { - {{\boldsymbol a}_{yx}}}&{ - {{\boldsymbol a}_y}}&{{{\boldsymbol d}_1}}&{{{\boldsymbol b}_y}}\\ {{{\boldsymbol c}_x}}&{{{\boldsymbol d}_2}}&{{{\boldsymbol a}_x}}&{{{\boldsymbol a}_{yx}}}\\ {{{\boldsymbol d}_2}}&{{{\boldsymbol c}_y}}&{{{\boldsymbol a}_{xy}}}&{{{\boldsymbol a}_y}} \end{array}} \right], $$

where all the block matrices ${{\boldsymbol a}_x}$, ${{\boldsymbol a}_y}$, ${{\boldsymbol a}_{xy}}$, ${{\boldsymbol a}_{yx}}$, ${{\boldsymbol b}_x}$, ${{\boldsymbol b}_y}$, ${{\boldsymbol c}_x}$, ${{\boldsymbol c}_y}$, ${{\boldsymbol d}_1}$, and ${{\boldsymbol d}_2}$ are diagonal matrices whose ${k^{th}}$ diagonal elements would be given, respectively, as follows:

(22A)$${a_{x,kk}} = \frac{{16}}{9}\gamma {P_0}{v_{x|re}}({kh} ){v_{x|im}}({kh} ), $$

(22B)$${a_{y,kk}} = \frac{{16}}{9}\gamma {P_0}{v_{y|re}}({kh} ){v_{y|im}}({kh} ), $$

(22C)$${a_{xy,kk}} = \frac{{16}}{9}\gamma {P_0}{v_{x|re}}({kh} ){v_{y|im}}({kh} ), $$

(22D)$${a_{yx,kk}} = \frac{{16}}{9}\gamma {P_0}{v_{y|re}}({kh} ){v_{x|im}}({kh} ), $$

(22E)$${b_{x,kk}} = \frac{8}{9}\gamma {P_0}[{2v_{x|re}^2({kh} )+ |{v_x^2({kh} )} |+ |{v_y^2({kh} )} |} ], $$

(22F)$${b_{y,kk}} = \frac{8}{9}\gamma {P_0}[{2v_{y|re}^2({kh} )+ |{v_x^2({kh} )} |+ |{v_y^2({kh} )} |} ], $$

(22G)$${c_{x,kk}} ={-} \frac{8}{9}\gamma {P_0}[{2v_{x|im}^2({kh} )+ |{v_x^2({kh} )} |+ |{v_y^2({kh} )} |} ], $$

(22H)$${c_{y,kk}} ={-} \frac{8}{9}\gamma {P_0}[{2v_{y|im}^2({kh} )+ |{v_x^2({kh} )} |+ |{v_y^2({kh} )} |} ], $$

(22I)$${d_{1,kk}} = \frac{{16}}{9}\gamma {P_0}{v_{x|re}}({kh} ){v_{y|re}}({kh} ), $$

(22J)$${d_{2,kk}} ={-} \frac{{16}}{9}\gamma {P_0}{v_{x|im}}({kh} ){v_{y|im}}({kh} ), $$

where ${|{{v_q}} |^2} = v_{q|re}^2 + v_{q|im}^2$, q refers to the x or y polarization component of the complex-envelope field, and $k = 0,\; 1, \ldots ,\; p - 1$.

3.1 Adjoint DBP problem solution

In the operator notation, the adjoint DBP problem (17) may be rewritten as:

(23)$$\frac{\partial }{{\partial z}}\hat{{\boldsymbol \lambda }} - ({{{\hat{{\boldsymbol D}}}_a} + {{\hat{{\boldsymbol N}}}_a}} )\; \hat{{\boldsymbol \lambda }} + \; {\hat{{\boldsymbol G}}_a}\hat{{\boldsymbol \lambda }} = {\hat{{\boldsymbol \lambda }}_{in}}\delta (z ), $$

where $\hat{{\boldsymbol \lambda }}({z,t} )= {[{{\lambda_{re}}\; \; {\lambda_{im}}} ]^T}$ is the vector of real and imaginary adjoint field at a certain grid point, ${\hat{{\boldsymbol \lambda }}_{in}}(t )= {[{\partial \psi /\partial {v_{re}}\; \partial \psi /\partial {v_{im}}} ]^T}$ is the vector of real and imaginary parts of the adjoint excitation at that spatial point. Note that the adjoint excitation ${\hat{{\boldsymbol \lambda }}_{in}}\mathrm{\delta }(z )$ is assumed to have a non-zero value only at $z = 0$. The linear, nonlinear, and attenuation operators (${\hat{{\boldsymbol D}}_a}$, ${\hat{{\boldsymbol N}}_a}$, and ${\hat{{\boldsymbol G}}_a}$) of the adjoint INLSE are given by:

(24A)$${\hat{{\boldsymbol D}}_a} = \left[ {\begin{array}{cc} {\frac{{{\beta_3}}}{6}\frac{{{\partial^3}}}{{\partial {t^3}}} + \frac{\alpha }{2}}&{\frac{{{\beta_2}}}{2}\frac{{{\partial^2}}}{{\partial {t^2}}}}\\ { - \frac{{{\beta_2}}}{2}\frac{{{\partial^2}}}{{\partial {t^2}}}}&{\frac{{{\beta_3}}}{6}\frac{{{\partial^3}}}{{\partial {t^3}}} + \frac{\alpha }{2}} \end{array}} \right], $$

(24B)$${\hat{{\boldsymbol N}}_a}(v )= \left[ {\begin{array}{cc} {2\gamma {P_0}{v_{re}}{v_{im}}}&{ - \gamma {P_0}({3v_{re}^2 + {v_{im}}} )}\\ {\gamma {P_0}({v_{re}^2 + 3v_{im}^2} )}&{ - 2\gamma {P_0}{v_{re}}{v_{im}}} \end{array}} \right], $$

(24C)$${\hat{{\boldsymbol G}}_a} = \left[ {\begin{array}{cc} {{g_j}\delta \left( {z - \mathop \sum \limits_{l = 1}^j {L_l}} \right)}&0\\ 0&{{g_j}\delta \left( {z - \mathop \sum \limits_{l = 1}^j {L_l}} \right)} \end{array}} \right], $$

where $j \in [{1,M} ]$ is the fiber span number and M is the total number of spans. We approximate the solution of Eq. (23), by employing the symmetric split-step scheme [32] in the fiber region, as follows:

(25)$$\hat{{\boldsymbol \lambda }}({({k + 1} )h,\; t} )= \exp \left( {\frac{h}{2}{{\hat{{\boldsymbol D}}}_a}} \right)\exp ({h{{\hat{{\boldsymbol N}}}_a}} )\exp \left( {\frac{h}{2}{{\hat{{\boldsymbol D}}}_a}} \right)\hat{{\boldsymbol \lambda }}({kh,\; t} )+ O({{h^3}} ), $$

for $k = 0,\; 1,\; 2, \ldots ,\; p - 1$. The parameter p is the total number of discretization cells, h is the spatial step size, and $\hat{{\boldsymbol \lambda }}({0,\; t} )= {\hat{{\boldsymbol \lambda }}_{in}}$. At the location of the attenuators, we have:

(26)$$\hat{{\boldsymbol \lambda }}({{k^ + }h,\; t} )= \exp ({ - {g_j}} )\hat{{\boldsymbol \lambda }}({{k^ - }h,\; t} ), $$

where $j = 1,\; 2,\; \ldots ,M$, and ${k^ + }/{k^ - }$ is the distance after/before the attenuator. To realize the first linear sub-step operation in Eq. (25), $\exp ({0.5h{{\hat{{\boldsymbol D}}}_a}} )\hat{{\boldsymbol \lambda }}({kh,\; t} )$, we write its equivalent complex differential equation, and apply a pair of fast Fourier transforms (FFTs) to obtain the linear sub-step solution ${\hat{{\boldsymbol \lambda }}^l}({({k + 1/2} )h,\; t} )$, as follows [32]:

(27)$${\lambda ^l}({({k + 1/2} )h,\; t} )= {\mathrm{{\cal F}}^{ - 1}}\left\{ {\exp \left[ {\frac{h}{2}\left( { - i\frac{{{\beta_3}}}{6}{\omega^3} + i\frac{{{\beta_2}}}{2}{\omega^2} + \frac{\alpha }{2}} \right)} \right]\mathrm{{\cal F}}\{{\lambda ({kh,\; t} )} \}} \right\}, $$

for $k = 0,\; 1,\; 2,\; \ldots ,\; p - 1$. The parameter $\omega $ is the angular frequency and $\lambda ({kh,t} )= {\lambda _{re}} + i{\lambda _{im}}$ is the complex adjoint field at $z = kh$. The Fourier and inverse Fourier transformations are denoted by $\mathrm{{\cal F}}\{\cdot \}$ and ${\mathrm{{\cal F}}^{ - 1}}\{\cdot \}$, respectively. After determining ${\lambda ^l}({({k + 1/2} )h,\; t} )$, we obtain the linear sub-step solution in complex domain as: ${\hat{{\boldsymbol \lambda }}^l}({({k + 1/2} )h,\; t} )= {[{Re\{{{\lambda^l}} \}\; Im\{{{\lambda^l}} \}} ]^T}$. Then, in order to apply the nonlinear operation $exp ({h{{\hat{{\boldsymbol N}}}_a}} ){\hat{{\boldsymbol \lambda }}^l}({({k + 1/2} )h,\; t} )$, and determine the nonlinear step solution ${\hat{{\boldsymbol \lambda }}^{nl}}({({k + 1/2} )h,\; t} )$, it is required to solve the following differential equation

(28)$$\frac{\partial }{{\partial z}}({{{\hat{{\boldsymbol \lambda }}}^{nl}}} )= {\hat{{\boldsymbol N}}_a}{\hat{{\boldsymbol \lambda }}^{nl}}({z,t} ), $$

where ${\hat{{\boldsymbol \lambda }}^{nl}}({0,t} )= {\hat{{\boldsymbol \lambda }}^l}({({k + 1/2} )h,\; t} )$. In the case of INLSE, the nonlinear operator ${\hat{{\boldsymbol N}}_b}$ is a diagonal matrix and a nonlinear step similar to Eq. (28) can easily be accomplished. However, in our case, ${\hat{{\boldsymbol N}}_a}\; $ is not a diagonal matrix and thus, Eq. (28) cannot be directly solved. We need first to diagonalize ${\hat{{\boldsymbol N}}_a}$ so that the systems corresponding to real and imaginary parts are decoupled. It can be shown that the eigenvalue decomposition (EVD) of ${\hat{{\boldsymbol N}}_a}$ is given as:

(29)$${\hat{{\boldsymbol N}}_a} = {\boldsymbol P}{{\boldsymbol D}_\mu }{{\boldsymbol P}^{ - 1}}, $$

where the diagonal matrix of the eigenvalues ${{\boldsymbol D}_\mu }$, the eigenvectors’ matrix ${\boldsymbol P}$ and its inverse ${{\boldsymbol P}^{ - 1}}$ are given by

(30A)$${{\boldsymbol D}_\mu }(v )= i\sqrt 3 \gamma {P_0}{|v |^2}\left[ {\begin{array}{cc} 1&0\\ 0&{ - 1} \end{array}} \right], $$

(30B)$${\boldsymbol P}(v )= \left[ {\begin{array}{cc} i&{ - i}\\ {\frac{{\sqrt 3 {{|v |}^2} + 2i{v_{re}}{v_{im}}}}{{{{|v |}^2} + 2v_{re}^2}}}&{\frac{{\sqrt 3 {{|v |}^2} - 2i{v_{re}}{v_{im}}}}{{{{|v |}^2} + 2v_{re}^2}}} \end{array}} \right], $$

(30C)$${{\boldsymbol P}^{ - 1}}(v )= \left[ {\begin{array}{cc} { - \frac{{{v_{re}}{v_{im}}}}{{\sqrt 3 {{|v |}^2}}} - i\frac{1}{2}\; \; }&{\frac{{({{{|v |}^2} + 2v_{re}^2} )}}{{2\sqrt 3 {{|v |}^2}}}}\\ { - \frac{{{v_{re}}{v_{im}}}}{{\sqrt 3 {{|v |}^2}}} + i\frac{1}{2}}&{\frac{{({{{|v |}^2} + 2v_{re}^2} )}}{{2\sqrt 3 {{|v |}^2}}}} \end{array}} \right], $$

where $v = {v_{re}} + i{v_{im}}$ is the original DBP problem solution at the current spatial point. It should be clear that in the case of a dual-polarized system, a diagonalization of $4 \times 4$ matrix would be required. Substituting the EVD of ${\hat{{\boldsymbol N}}_a}$ in Eq. (28), and introducing $\hat{{\boldsymbol W}} = {[{{W_1}\; {W_2}} ]^T} = {{\boldsymbol P}^{ - 1}}{\hat{{\boldsymbol \lambda }}^{nl}}$, Eq. (28) casts as:

(31)$$\frac{\partial }{{\partial z}}({\hat{{\boldsymbol W}}} )= {{\boldsymbol D}_\mu }\hat{{\boldsymbol W}}({z,t} ), $$

where $\hat{{\boldsymbol W}}({0,t} )= {{\boldsymbol P}^{ - 1}}{\hat{{\boldsymbol \lambda }}^l}({({k + 1/2} )h,\; t} )$. Note that this extra diagonalization step represents the main difference between the conventional SSFS and the modified SSFS (M-SSFS) method. The solution of Eq. (31) can now be obtained directly as:

(32)$$\hat{{\boldsymbol W}}({({k + 1} )h,\; t} )= \exp ({h{{\boldsymbol D}_{\boldsymbol \mu }}} )\hat{{\boldsymbol W}}({kh,\; t} ), $$

where $k = 0,\; 1,\; \ldots ,\; p - 1$. The vector of the nonlinear step solution is then obtained by:

(33)$${\hat{{\boldsymbol \lambda }}^{nl}}({({k + 1/2} )h,\; t} )= {[{\lambda_{re}^{nl}\; \; \lambda_{im}^{nl}} ]^T} = {\boldsymbol P\hat{W}}({({k + 1} )h,\; t} ). $$

Once the nonlinear step is performed, we apply the second linear sub-step operation $exp ({0.5h{{\hat{{\boldsymbol D}}}_a}} ){\hat{{\boldsymbol \lambda }}^{nl}}({({k + 1/2} )h,\; t} )$ to determine the adjoint field at the next grid point, as follows [32]:

(34)$$\lambda ({({k + 1} )h,\; t} )= {\mathrm{{\cal F}}^{ - 1}}\left\{ {\exp \left[ {\frac{h}{2}\left( { - i\frac{{{\beta_3}}}{6}{\omega^3} + i\frac{{{\beta_2}}}{2}{\omega^2} + \frac{\alpha }{2}} \right)} \right]\mathrm{{\cal F}}\{{{\lambda^{nl}}({({k + 1/2} )h,\; t} )} \}} \right\}, $$

where $k = 0,\; 1,\; 2,\; \ldots ,\; p - 1$ and ${\lambda ^{nl}}({({k + 1/2} )h,\; t} )= \lambda _{re}^{nl} + i\lambda _{im}^{nl}$. Finally, the vector of the next grid point solution is given by: $\hat{{\boldsymbol \lambda }}({({k + 1} )h,\; t} )= {[{Re\{\lambda \}\; Im\{\lambda \}} ]^T}$.

3.2 Algorithm

First, the output of the fiber-optic link is passed through a DBP block and the adjoint excitation ${\hat{{\boldsymbol \lambda }}_{in}}$ is calculated (See Fig. 1). The algorithm of the introduced M-SSFS utilized for solving the adjoint DBP problem (17) is then summarized as follows:

Step 0: The forward distance parameter is set to: $z = 0$, the current span index is set to: $j = 1$, and the initial adjoint field ${\lambda ^l}$ is calculated, according to the desired objective function as: ${\lambda ^l} = {\lambda _{in}} = \frac{{\partial \psi }}{{\partial {v_{re}}}} + i\frac{{\partial \psi }}{{\partial {v_{im}}}}$, where ${v_{re}}$ and ${v_{im}}$ are the real and imaginary parts of the original DBP problem output $.$ Notice that, in practice, the objective function F is defined in terms of the DBP output signal ${{\boldsymbol V}_{out}}$. We therefore assume that the adjoint excitation ${\boldsymbol Q}_{in}^\lambda (t )= \partial \psi /\partial {\boldsymbol V}$ is nonzero only at the input of the adjoint DBP problem, i.e., at $z = 0$ or $Z = {L_{tot}}$.

Step 1: At the beginning, the loss step is performed as follows:

(35)$${\lambda ^l} = \exp (g ){\lambda ^l}. $$

Step 2: The parameters of the current adjoint DBP and amplifier span are updated, and the step-size h utilized for the current span is determined. A first half-linear step is then performed as follows:

(36)$$\left\{ \begin{array}{c} {{\Lambda ^l} = \mathrm{{\cal F}}\{{{\lambda^l}} \}}\\ {\Lambda ^l} = {\Lambda ^l} \times H({\omega ,h/2} ) \\ {{\lambda^l} = {\mathrm{{\cal F}}^{ - 1}\{{\Lambda ^l}} \}}\\ {{\hat{{\boldsymbol \lambda }}}^l} = {{[{Re\{{{\lambda^l}} \}\; \; Im\{{{\lambda^l}} \}} ]}^T} \end{array}\right., $$

where $H({\omega ,z} )= \exp \left( {\left[ { - \frac{{i{\beta_3}{\omega^3}}}{6} + \frac{{i{\beta_2}{\omega^2}}}{2} + \frac{\alpha }{2}} \right]z} \right)$ is the linear transfer function of the adjoint DBP problem.

Step 3: We then apply a modified nonlinear step that includes the following operations:

(37)$$\left\{ {\begin{array}{c} {\begin{array}{c} {\hat{{\boldsymbol W}} = {{\boldsymbol P}^{ - 1}}{{\hat{{\boldsymbol \lambda }}}^l}}\\ {\hat{{\boldsymbol W}} = exp ({{{\boldsymbol D}_\mu }(v )h} )\hat{{\boldsymbol W}}} \end{array}}\\ {{{\hat{{\boldsymbol \lambda }}}^{nl}} = {\boldsymbol P\hat{W}}}\\ {{\lambda^l} = \lambda_{re}^{nl} + i\lambda_{im}^{nl}} \end{array}} \right., $$

where ${\hat{{\boldsymbol \lambda }}^{nl}} = {[{\lambda_{re}^{nl}\; \; \lambda_{im}^{nl}} ]^T}$ and $v = {v_{re}} + i{v_{im}}$. The matrices ${{\boldsymbol P}^{ - 1}}$, ${{\boldsymbol D}_\mu }$ and ${\boldsymbol P}$ are as given in Eq. (30).

Step 4: After applying the nonlinear step, the distance parameter is updated as: $z = z + h$. If z is still less than the length of the current span, a full linear-step is performed as follows:

(38)$$\left\{ {\begin{array}{c} {\begin{array}{c} {{\Lambda ^l} = \mathrm{{\cal F}}\{{{\lambda^l}} \}}\\ {{\Lambda ^l} = {\Lambda ^l} \times H({\omega ,h} )} \end{array}}\\ {{\lambda^l} = {\mathrm{{\cal F}}^{ - 1}}\{{{\Lambda ^l}} \}}\\ {{{\hat{{\boldsymbol \lambda }}}^l} = {{[{Re\{{{\lambda^l}} \}\; \; Im\{{{\lambda^l}} \}} ]}^T}} \end{array}} \right., $$

Steps 3 and 4 are repeated until z becomes larger than or equal to the current fiber span length. Once the end of the current span is reached, the last half-linear step is applied, as given by Eq. (36). The current span index j is then incremented as: $j = j + 1$, and steps 1, 2, 3, and 4 are repeated until j exceeds the total number of spans M. Then, the final output of the adjoint DBP problem is assigned as: ${\lambda _{out}} = {\lambda ^l}$, and the algorithm terminates.

3.3 Complexity analysis

The computational complexity of each adjoint nonlinear step in the M-SSFS algorithm is larger than that in the conventional SSFS algorithm due to the extra step required to diagonalize the adjoint nonlinear matrix ${{\boldsymbol \varGamma }^\lambda }({\boldsymbol V} )$. The adjoint linear step though has the same complexity of the linear step in the conventional SSFS algorithm.

To analyze the complexity of the M-SSFS algorithm, we consider a block size of ${N_s}$ samples. One step of the adjoint linear step, Eq. (36) or Eq. (38), requires the evaluation of two ${N_s} - $ point complex fast Fourier transforms (FFTs) and ${N_s}$ complex multiplications. Since each complex FFT operation costs $0.5{N_s}{log _2}({{N_s}} )$ complex multiplications, the complexity of one linear step is approximately $[{{N_s} + {N_s}{{log }_2}({{N_s}} )} ]$ complex multiplications [33–35]. Note that one complex multiplication involves 4-real multiplications [36]. Therefore, the total number of real multiplications per one adjoint linear step is $4{N_s}[{1 + {{log }_2}({{N_s}} )} ]$.

As for the first adjoint nonlinear sub-step, evaluating the components of the ${{\boldsymbol P}^{ - 1}}$ matrix costs $9{N_s} - $ real multiplications, and multiplying by the vector ${\hat{{\boldsymbol \lambda }}^l}$ requires $4{N_s} - $ real multiplication. Notice that the squared magnitude function evaluation requires two real multiplications, and the complexity of a real division operation is the same as that of a real multiplication [36]. To calculate the components of the ${{\boldsymbol D}_\mu }$ matrix, extra $2{N_s} - $ real multiplications are needed. Multiplying ${{\boldsymbol D}_\mu }$ by the current step h needs $2{N_s}$ extra real multiplications. The exponential function $\exp ({{{\boldsymbol D}_\mu }h} )$ is then implemented with a lookup table, while the multiplication in $\exp ({{{\boldsymbol D}_\mu }h} )\times \hat{{\boldsymbol W}}$ requires $2{N_s}$ more complex multiplications or $8{N_s}$ more real multiplications. Thus, the total cost of the second adjoint nonlinear sub-step is $12{N_s} - $ real multiplications. Evaluating the ${\boldsymbol P}$ matrix components in the third sub-step requires extra $5{N_s} - $ real multiplications, and the multiplication of ${\boldsymbol P} \times \hat{{\boldsymbol W}}$ needs extra $12{N_s} - $ real multiplications ($4{N_s} - $ real multiplications $+ $ $2{N_s} - $ complex multiplications). Therefore, each adjoint nonlinear step (37) in the M-SSFS algorithm requires a total of $42{N_s} - $ real multiplications.

Suppose that the entire $M $-span adjoint DBP simulation is performed using ${K_{NLS}}$ nonlinear steps. The total number of linear steps needed will thus be ${K_{NLS}} + M$, where the step length used in the first and last linear steps of each span is a half-step, i.e., $h/2$. However, all intermediate linear steps within a span use full length step h. Also, the total number of real multiplications required by the amplifier step (35) is $2{N_s}M$, where each amplifying step costs $2{N_s} - $ real multiplications. In total, the number of real multiplications for the symmetric M-SSFS algorithm is:

(39)$$4{N_s}({{K_{NLS}} + M} )({1 + {{log }_2}({{N_s}} )} )+ 42{N_s}{K_{NLS}} + 2{N_s}M, $$

where M, ${N_s}$, ${K_{NLS}}$ are the number of spans, the number of samples, and the number of performed nonlinear steps, respectively. The computational complexity of the M-SSFS algorithm as compared to the complexity of the conventional SSFS algorithm are summarized in Table 1. As can be seen, an overhead of $32{N_s} - $ real multiplications per step are required by the M-SSFS algorithm, as opposed to the conventional SSFS algorithm, which is due to the extra diagonalization step needed within each adjoint nonlinear step.

Table 1. Computational Complexity of the Symmetric Conventional SSFS and the Modified Split-Step Fourier Scheme (M-SSFS) Algorithms

View Table | View all tables in this article

To better compare between the computational cost of the conventional SSFS method and the M-SSFS method cost, we define an M-SSFS overhead parameter $\tau $ as the ratio of the modified SSFS cost over the conventional SSFS cost. Taking into account that the number of performed nonlinear steps ${K_{NLS}}$ is usually much larger than the number of spans M in practical simulations, it can be shown that the M-SSFS overhead parameter is given as:

(40)$$\tau = 1 + \frac{{16}}{{7 + 2{{\log }_2}({{N_s}} )}}. $$

Next, we introduce the adjoint-based optimization algorithm proposed for accelerating the optimization of the adaptive digital back propagation parameters.

4. Adjoint-based optimization algorithm

In order to train and optimize the design parameters of an adaptive digital back propagation (A-DBP), a constrained nonlinear optimization problem has to be solved. The general formulation of this training problem is given as:

(41)$${\mathrm{subject\; to}}{\kern 20pt}\begin{array}{c} {\mathop {\min }\limits_{\boldsymbol x} F({{\boldsymbol x},{\boldsymbol V}} )}\\ {{c_j}({\boldsymbol x} )\ge 0,{\kern 10pt}j = 1,\; 2,\; \ldots ,\; m,} \end{array}$$

where $F({{\boldsymbol x},{\boldsymbol V}} )$ is a general objective function that measures the error between the current and desired DBP outputs, ${\boldsymbol x} \in {\mathrm{\mathbb{R}}^N}$ is the DBP design parameters vector, and ${\boldsymbol V}$ is the system state vector. The inequality constraints ${c_j}({\boldsymbol x} )$, $j = 1,\; 2,\; \ldots ,\; m$, might be, in general, nonlinear functions of the optimization variables ${\boldsymbol x}$, and m is the total number of constraints. Particularly, in the training of the adaptive DBP, the constraints ${c_j}({\boldsymbol x} )$ restrict the variation of the optimization parameters within the practical ranges, and/or guide the optimizer to the global optimum solution. Note also that in the definition of (41), only inequality constraints are considered. However, an equal constraint $h({\boldsymbol x} )= 0$ can simply be imposed in Eq. (41) as a couple of two inequality constraints given by [37]: $h({\boldsymbol x} )\ge 0$ and $- h({\boldsymbol x} )\ge 0$.

The first order optimality condition of problem (41) is obtained utilizing the Lagrangian function [37]. Since the Lagrangian function is applied on equality constrains, we introduce an active set $\mathrm{{\cal A}}({\boldsymbol x} )$ that contains all indices of the inequality constraints ${c_j}({\boldsymbol x} )$ that are equal to $0$ at the current design vector ${\boldsymbol x}$, defined as [37]: $\mathrm{{\cal A}}({\boldsymbol x} )= l\; :{c_l}({\boldsymbol x} )= 0$. The Lagrangian function corresponding to problem (41) is thus formulated as follows:

(42)$$\; \mathrm{{\cal L}}({{\boldsymbol x},{\boldsymbol \pi }} )= F({{\boldsymbol x},{\boldsymbol V}} )- \mathop \sum \nolimits_{l \in \mathrm{{\cal A}}({\boldsymbol x} )} {\pi _l}{c_l}({\boldsymbol x} ), $$

where the Lagrangian multipliers ${\pi _l}$ are positive nonzero scalers. The first order Karush-Kuhn-Tucker (KKT) optimality conditions of Eq. (42) are obtained by:

(43)$$\nabla \mathrm{{\cal L}}({{\boldsymbol x},{\boldsymbol \pi }} )= \left( {\begin{array}{c} {{\nabla_{\boldsymbol x}}F - \mathop \sum \nolimits_{l \in \mathrm{{\cal A}}} {\pi_l}{\nabla_{\boldsymbol x}}{c_l}}\\ { - {\boldsymbol C}} \end{array}} \right) = 0, $$

where ${\pi _l} \ne 0$, $\forall \,l \in \mathrm{{\cal A}}({\boldsymbol x} )$, and $\nabla $ denotes gradient operation with respect to the vectors ${\boldsymbol x}$ and ${\boldsymbol \pi }$, i.e., $\nabla = {[{\nabla_{\boldsymbol x}^T\,\nabla_{\boldsymbol \pi }^T} ]^T} = {[{\partial /\partial {x_1}\,\partial /\partial {x_2} \cdots \,\partial /\partial {x_N}\,\partial /\partial {\pi_{{i_1}}}\,\partial /\partial {\pi_{{i_2}}}\, \cdots \,\partial /\partial {\pi_{{i_{q({\boldsymbol x} )}}}}} ]^T}$. The active constraint vector ${\boldsymbol c}({\boldsymbol x} )$ is a column vector that contains all active constraints at certain ${\boldsymbol x}$, i.e., ${\boldsymbol C}({\boldsymbol x} )= {[{{c_{{i_1}}}\,{c_{{i_2}}}\, \cdots \,{c_{{i_{q({\boldsymbol x} )}}}}} ]^T}$, where $\mathrm{{\cal A}}({\boldsymbol x} )= \{{{i_1},{i_2}, \cdots ,{i_{q({\boldsymbol x} )}}} \}$, and $q({\boldsymbol x} )\le m$ is the number of active constraints at ${\boldsymbol x}$. Starting from an initial guess ${{\boldsymbol x}_0}$ and ${{\boldsymbol \pi }_0}$, the optimal set (${{\boldsymbol x}^{\ast }}$, ${{\boldsymbol \pi }^{\ast }}$) that satisfies all constraints of Eq. (41) and at which the KKT conditions (43) holds can be obtained recursively using any well-established nonlinear optimization algorithm. In this work, we utilize the sequential quadratic programing (SQP) method [38–40].

SQP is one of the most effective methods for the numerical solution of nonlinear constrained optimization problems. Its basic idea is to model the objective function by its local quadratic function approximation at the current design point ${{\boldsymbol x}_k}$, and to linearize the most violated constraint(s), i.e., active constraints, at point ${{\boldsymbol x}_k}$, in order to construct a quadratic programing (QP) subproblem. The QP subproblem is then solved, using Newton or quasi-Newton method, to obtain a new solution iterate ${{\boldsymbol x}_{k + 1}}$. The procedure is repeated iteratively until converging to a local optimum solution ${{\boldsymbol x}^\ast }$ that minimizes the objective function and satisfies all problem constraints.

A framework of the SQP method is summarized as follows. Starting from a current iterate solution ${{\boldsymbol x}_k}$, the SQP algorithm determines the search direction of the next iterate by solving the following QP subproblem [38–40]:

(44)$${\mathrm{subject\; to}}\begin{array}{c} {\mathop {\min }\limits_{{{\boldsymbol s}_k}} \; {{({{\nabla_{\boldsymbol x}}{F_k}} )}^T}{{\boldsymbol s}_k} + \frac{1}{2}{{\boldsymbol s}_k}^T({\nabla_{{\boldsymbol xx}}^2{\mathrm{{\cal L}}_k}} ){\boldsymbol \; }{{\boldsymbol s}_k}}\\ {{c_l}({{{\boldsymbol x}_k}} )+ {{({{\nabla_{\boldsymbol x}}{c_l}({{{\boldsymbol x}_k}} )} )}^T}{{\boldsymbol s}_k} = 0,\; \; \forall \; l \in \mathrm{{\cal A}}({{{\boldsymbol x}_k}} ),} \end{array}$$

where ${{\boldsymbol s}_k} = {{\boldsymbol x}_{k + 1}} - {{\boldsymbol x}_k}$ is the search direction for the next iterate solution ${{\boldsymbol x}_{k + 1}}$. The matrix $\nabla _{{\boldsymbol xx}}^2{\mathrm{{\cal L}}_k}$ is the hessian matrix of the Lagrangian function $\mathrm{{\cal L}}({{{\boldsymbol x}_k},{{\boldsymbol \pi }_k}} )$, given in Eq. (42), with respect to the ${\boldsymbol x}$ parameters only. This QP problem is obtained through replacing the Lagrangian function (42) by its local quadratic approximation at ${{\boldsymbol x}_k}$. The current active constraints are also replaced by their local linear approximations at current ${{\boldsymbol x}_k}$. Using the constraint given in Eq. (44) and by omitting the constant terms, it can be shown that the objective function provided in the QP subproblem (44) is equivalent to the local quadratic approximation of the Lagrangian function $\mathrm{{\cal L}}({{\boldsymbol x},{\boldsymbol \pi }} )$ at ${{\boldsymbol x}_k}$ [40]. In order to solve the QP subproblem (44), we write the KKT conditions of Eq. (44), and apply the quasi-Newton’s method to solve the obtained nonlinear system of equations [40]:

(45)$$\left( {\begin{array}{c} {\nabla_{{\boldsymbol xx}}^2{{\mathbf {\cal L}}_{\boldsymbol k}}{{\boldsymbol s}_{\boldsymbol k}} + {\nabla_{\boldsymbol x}}{{\boldsymbol F}_{\boldsymbol k}} - {\boldsymbol J}_{\boldsymbol k}^{\boldsymbol T}{{\boldsymbol \pi }_{{\boldsymbol k} + 1}}}\\ {{{\boldsymbol J}_{\boldsymbol k}}{{\boldsymbol s}_{\boldsymbol k}} + {{\boldsymbol C}_{\boldsymbol k}}} \end{array}} \right) = 0, $$

where ${{\boldsymbol J}_k}$ is the Jacobian matrix of the active constraints ${\boldsymbol C}({\boldsymbol x} )$ at ${{\boldsymbol x}_k}$. It is given by: ${\boldsymbol J}{({\boldsymbol x} )^T} = [{\nabla {c_{{i_1}}}({\boldsymbol x} ),\; \nabla {c_{{i_1}}}({\boldsymbol x} ),\; \cdots ,\; \nabla {c_{{i_{q({\boldsymbol x} )}}}}({\boldsymbol x} )} ]$, where $\mathrm{{\cal A}}({\boldsymbol x} )= \{{{i_1},{i_2}, \cdots ,{i_{q({\boldsymbol x} )}}} \}$, and $q({\boldsymbol x} )\le m$. Reformulating (45), replacing the Hessian matrix $\nabla _{{\boldsymbol xx}}^2\mathrm{{\cal L}}({{{\boldsymbol x}_k},{{\boldsymbol \pi }_k}} )$ with the Broyden–Fletcher–Goldfarb–Shanno (BFGS) approximation [41], the search direction of the next iterate ${{\boldsymbol s}_k}$ and the new Lagrangian multipliers ${{\boldsymbol \pi }_{k + 1}}$ can thus be identified with the solution of:

(46)$$\left[ {\begin{array}{cc} {{{\boldsymbol B}_k}}&{ - {\boldsymbol J}_k^T}\\ {{{\boldsymbol J}_k}}&0 \end{array}} \right]\left[ {\begin{array}{c} {{{\boldsymbol s}_k}}\\ {{{\boldsymbol \pi }_{k + 1}}} \end{array}} \right] = \left[ {\begin{array}{c} { - {\nabla_{\boldsymbol x}}{F_k}}\\ { - {{\boldsymbol C}_k}} \end{array}} \right], $$

where the approximation of the Hessian matrix ${\boldsymbol B}(x )$ at the new point ${{\boldsymbol x}_{k + 1}}$ is obtained through the BFGS updating formula [41]:

(47)$${{\boldsymbol B}_{k + 1}} = {{\boldsymbol B}_k} - \frac{{{{\boldsymbol B}_k}{{\boldsymbol s}_k}{\boldsymbol s}_k^T{{\boldsymbol B}_k}}}{{{\boldsymbol s}_k^T{{\boldsymbol B}_k}{{\boldsymbol s}_k}}} + \frac{{{{\boldsymbol y}_k}{\boldsymbol y}_k^T}}{{{\boldsymbol y}_k^T{{\boldsymbol s}_k}}}, $$

where ${{\boldsymbol y}_k} = \; {\nabla _{\boldsymbol x}}\mathrm{{\cal L}}({{{\boldsymbol x}_{k + 1}},{{\boldsymbol \pi }_{k + 1}}} )- {\nabla _{\boldsymbol x}}\mathrm{{\cal L}}({{{\boldsymbol x}_k},{{\boldsymbol \pi }_k}} )$. Equations (46) and (47) summarize the SQP step to update the current design vector. It is clear that the calculations of Eqs. (46) and (47) involve evaluating the gradient of the objective function and the Jacobian of the active constraints ${\boldsymbol J}({\boldsymbol x} )$ at the current design vector ${{\boldsymbol x}_k}$. The Jacobian matrix ${\boldsymbol J}({\boldsymbol x} )$ is obtained analytically using the definitions of the constraints ${C_j}({\boldsymbol x} )$, $j = 1,\; 2, \cdots ,m$. Contrarily, the gradient ${\nabla _{\boldsymbol x}}{F_k}$ cannot be determined analytically due to the implicit dependence of the A-DBP training function $F({{\boldsymbol x},{\boldsymbol V}} )$ on the system state vector ${\boldsymbol V}$. However, we exploit the ASA algorithm, proposed in Section 3, to estimate the gradient of F at each optimization iterate using only one extra adjoint simulation. The introduced optimization algorithm is therefore denoted as an adjoint-based optimization (ABO) method.

4.1 ABO algorithm

A summary of the proposed ABO algorithm is as follows:

1. Define the training objective function $F({{\boldsymbol x},{\boldsymbol V}} )$, the initial solution ${{\boldsymbol x}_0}$, the initial Lagrangian multipliers ${{\boldsymbol \pi }_0}$, the stopping gradient criterion $\varepsilon $, and the problem constraints ${c_j}({\boldsymbol x} )$, $j = 1,\; 2,\; \ldots ,\; m$.
2. Set: $k = 0$, ${{\boldsymbol x}_k} = {{\boldsymbol x}_0}$, ${{\boldsymbol \pi }_k} = {{\boldsymbol \pi }_0}$, and ${{\boldsymbol B}_k} = {\boldsymbol I}$. Evaluate current constraints ${c_j}({{{\boldsymbol x}_k}} )$, $j = 1,\; 2,\; \ldots ,\; m$, determine current active constraints vector ${{\boldsymbol C}_k}$, update $\mathrm{{\cal A}}({{{\boldsymbol x}_k}} )$, and obtain current Jacobian matrix ${{\boldsymbol J}_k}$.
3. Determine the current objective function ${F_k}$ and its gradient ${\nabla _{\boldsymbol x}}{F_k}$, via performing the original and adjoint DBP simulations, Eqs. (6) and (17), respectively.
4. Check whether the current solution satisfies stopping criteria (i.e. satisfying all constraints and having a gradient norm value less than $\varepsilon $), or not. If satisfied, the optimal solution is assigned: ${{\boldsymbol x}^\ast } = {{\boldsymbol x}_k}$, and the algorithm terminates. Otherwise, continue to next step.
5. Solve the quasi-Newton-KKT system of equations (46) to obtain the next search direction and determine the new solution set (${{\boldsymbol x}_{k + 1}},\; {{\boldsymbol \pi }_{k + 1}}$), where ${{\boldsymbol x}_{k + 1}} = {{\boldsymbol x}_k} + {{\boldsymbol s}_k}$.
6. Evaluate next constraints ${c_j}({{{\boldsymbol x}_{k + 1}}} )$, $j = 1,\; 2,\; \ldots ,\; m$, determine next active constraints vector ${{\boldsymbol C}_{k + 1}}$, update $\mathrm{{\cal A}}({{{\boldsymbol x}_{k + 1}}} )$, and obtain next Jacobian matrix ${{\boldsymbol J}_{k + 1}}$. Determine the next objective function ${F_{k + 1}}$ and its gradient ${\nabla _{\boldsymbol x}}{F_{k + 1}}$, using the ASA approach.
7. Evaluate the next Hessian matrix approximation ${{\boldsymbol B}_{k + 1}}$, using the BFGS formula (47). Set next parameters as current, i.e., $\mathrm{{\cal A}}({{{\boldsymbol x}_k}} )= \mathrm{{\cal A}}({{{\boldsymbol x}_{k + 1}}} )$, ${{\boldsymbol C}_k} = {{\boldsymbol C}_{k + 1}}$, ${{\boldsymbol J}_k} = {{\boldsymbol J}_{k + 1}}$, ${F_k} = {F_{k + 1}}$, ${\nabla _{\boldsymbol x}}{F_k} = {\nabla _{\boldsymbol x}}{F_{k + 1}}$, ${{\boldsymbol B}_k} = {{\boldsymbol B}_{k + 1}}$, ${{\boldsymbol x}_k} = {{\boldsymbol x}_{k + 1}}$, and ${{\boldsymbol \pi }_{k + 1}} = {{\boldsymbol \pi }_k}$. Update the iteration index: $k = k + 1$.

We repeat steps 4 to 7 until terminating. It is worth emphasizing that a feasible initial solution is required to be provided to the algorithm, i.e., a solution that satisfies the problem constraints [40].

Next, we investigate the robustness and efficiency of the proposed ABO algorithm for the training of the adaptive DBP, through a number of numerical examples.

5. Results

For all results presented in this section, we consider Monte-Carlo simulations of the single-channel single-polarization fiber-optic communication system, shown in Fig. 2. The system is operating at $28$ Gbaud, and the transmitted data have a root raised cosine pulse shaping with a roll-off factor of $0.1$. The modulation format used in the simulation is a $16$-quadrature amplitude modulation ($16$-QAM) format. The transmission channel is a multi-span optical fiber system. Each span consists of a standard single-mode fiber (SSMF) followed by an inline erbium doped-fiber amplifier (EDFA). The simulation parameters of the fiber are given in Table 2. In order to simulate the scenario of a network communication, we assume that the lengths of fiber spans are unknown at the receiver. Although the fiber parameters may change slightly from span to span due to temperature and environmental fluctuations, we assume that they are the same for each span (except the length). The length of each fiber span is different, with values varying between $50$ km and $125\; $ km. This is the typical range of the fiber span lengths in North America [3]. The EDFA at the end of each fiber span has a gain that fully compensates for the respective fiber loss, with a noise figure ${N_f} = 4.77$ dB. In our simulations, we particularly consider two cases: 4-span and 20-span optical fiber communication systems. Their actual fiber lengths are respectively given in km as: $L = {[{{L_1}\; {L_2}\; {L_3}\; {L_4}} ]^T} = {[{50\; \; \; 80\; \; \; 85\; \; \; 125} ]^T}$, and $L = [50\; 65\; 80\; 125\; 50\; 80\; 85\; 75\; 125\; 50\; 50\; 50\; 50\; 80\; 85\; 80\; 80\; 80$

Fig. 2. Block diagram of the fiber-optic communication system considered in the simulation. Tx: transmitter; BPF: band pass filter; Rx: receiver; A/D: analog to digital converter; CDC: chromatic dispersion compensation; ABO: adjoint-based optimizer; A-DBP: adaptive digital back propagation; MF: matched filter.

Download Full Size | PDF

Table 2. Fiber Simulation Parameters

View Table | View all tables in this article

$80\; 80{]^T}$. The forward propagation in these scenarios are simulated using the split-step Fourier scheme (SSFS) algorithm [32], with an adaptive step-size guaranteeing that the maximum nonlinear phase per step does not exceed $0.01$ radian [42].

A Gaussian band pass filter (BPF) with $50$ GHz bandwidth is used before the coherent receiver. The front-end coherent receiver translates the in-phase and quadrature-phase components of the received optical signal into electrical signal components. An analog to digital converter (ADC) is utilized before the digital signal processing (DSP) unit. The sampling rate for the fiber-optic system simulation is $8$ samples per symbol whereas after the ADC, it is reduced to $2$ samples per symbol. The chromatic dispersion compensation (CDC) block is used to estimate the total length of the transmission fiber link ${L_{tot}}$. It is only applied in the training mode (see Fig. 2), i.e., during the training process of the adaptive digital back propagation (A-DBP). The CDC can be performed either in time domain or frequency domain [3]. In this paper, we used the frequency domain approach using the fast Fourier transforms (FFTs) with $({\beta _2} \times {L_{tot}}$) as an unknown parameter to be optimized. The optimum $({\beta _2} \times {L_{tot}}$) is found by minimizing the BER. Once the value of the total accumulated dispersion (${\beta _2} \times {L_{tot}}$) introduced by the channel is estimated, we divide it by the dispersion coefficient ${\beta _2}$ in order to obtain an estimation of the total transmitted distance ${L_{tot}}$. After the training, the A-DBP switches to the transmission mode where it blindly compensates for the linear and nonlinear distortions induced by the fiber channel. The output of the A-DBP passes through a matched filter (i.e. a root raised cosine filter) to limit the noise. The bit error rate (BER) is calculated by comparing the symbol sequence after the filter with that at the transmitter and symbols that cross the boundaries are counted as error symbols.

Unless otherwise specified, the A-DBP training problem is defined as:

(48)$$\textrm{subject to}\begin{array}{c} {\mathop {\min }\limits_{\boldsymbol{x}} \int_{ - {T_m}}^{{T_m}} {{{|{{v_{out}} - {u_{tx}}} |}^2}dt} }\\ {\left\{ {\begin{array}{l} {\begin{array}{ll} {{L_1} + {L_2} + \cdots + {L_M} = {L_{tot}},}\\ {50\textrm{ km} \le {L_j} \le 125\textrm{ km},{\kern 10pt}j = 1,2, \cdots ,M,} \end{array}}\\ {} \end{array}} \right.} \end{array}$$

where ${v_{out}}$ is the normalized output signal of the A-DBP, ${u_{tx}}$ is the normalized transmitted signal in the electrical domain, and ${T_m}$ is half the computational window width used in the DSP unit. The design vector ${\boldsymbol x} = {[{{L_1}\; {L_2}\; \cdots {L_M}} ]^T}$ contains the lengths of all fiber spans, where ${L_j}$ is the fiber length of the ${j^{th}}$ span, and M is the number of spans. The training problem (48) aims at obtaining the optimal design vector ${{\boldsymbol x}^\ast }$ that minimize the error between the equalized signal ${v_{out}}$ (i.e. output signal of the A-DBP) and the desired output ${u_{tx}}$. Note that in order to reduce the estimation time, we assume that all fiber span parameters are known to the receiver, except for the length of each fiber. However, the proposed ABO algorithm is general and can be used to estimate all fiber parameters as well as the average power launched to the fiber. In practice, most of the fiber links are standard single-mode fibers. Therefore, fiber span parameters rather than the length (e.g. 2^nd order dispersion coefficient ${\beta _2}$ and nonlinear coefficient $\gamma $) are known and similar for all spans. Their values might only be subject to slight deviations from nominal values given in Table 2 due to environmental changes. It is clear that the objective function in Eq. (48) has a form similar to the general form of Eq. (10) which has been used to derive the adjoint sensitivity problem of the original DBP simulation (6). Hence, our adjoint sensitivity analysis algorithm can be utilized to estimate the gradient of this objective function using only one extra adjoint simulation. The equality constraint in Eq. (48) is a linear compensation (LC) constraint. It is introduced to guide the ABO to a global solution that minimizes both linear and nonlinear distortions of the fiber. Obviously, any DBP whose fiber span lengths satisfies the LC condition can effectively compensate for the linear distortions of the transmission fiber link. All the linear solutions satisfying the LC condition may be seen as local minima of problem (48) without the LC constraint. While solving it, the ABO might get stuck at any of these local minima points. We therefore introduce the LC constraint to guide the ABO algorithm to search for the global optimal solution (that minimizes both linear and nonlinear distortions) among all possible linear solutions. The initial parameters of the ABO algorithm are given as: ${{\boldsymbol \pi }_0} = {[{1\; 1\; \cdots 1} ]^T}$ and $\varepsilon = {10^{ - 12}}$.

5.1 4-span A-DBP

We first investigate the accuracy and efficiency of the proposed ABO algorithm in training and optimizing the parameters of the A-DBP. We study the ability of the ABO to converge to the ideal DBP solution (ideal DBP refers to the case in which the receiver knows exactly the values of fiber-optic system parameters). We also compare our ABO algorithm to other finite-difference-based optimization algorithms in terms of convergence rate and computational complexity.

The ABO algorithm is first applied to train a 4-span A-DBP system for mitigating the fiber impairments of the 4-span fiber communication scenario. The virtual fibers of the A-DBP have the same transmission fibers’ parameters given in Table 2. However, their lengths are assumed to be unknown to the receiver. The number of known symbols used to train the A-DBP is $4096$. The average launch power at the training process is ${P_{av}} = 7$ dBm. The total length of the channel (estimated using the CDC unit which provides the best performance when the accumulated dispersion of CDC is $-{\beta _2} \times {L_{tot}}$) is ${L_{tot}} = 340$ km. Initially, the 4 spans of the A-DBP are assumed to have an equal length of ${L_{tot}}/4$. Clearly, this initial point is a feasible design point, i.e., it satisfies all the constraints of the training problem (48).

Figure 3 illustrates the evolution of the objective function in Eq. (48) with number of optimization iterates. The ABO algorithm terminates after $6$ iterates with the optimal solution given in Table 3. These values are quite close to the actual fiber span lengths. The Euclidian norm of the objective function gradient at this point is $4.1 \times {10^{ - 16}}$. As can be seen, the ABO algorithm is capable of training the A-DBP, and successfully converges to the physical fiber lengths. Note that the launch power during the training should be chosen such that the nonlinear effects are not negligible. In other words, the training launch power should be reasonably high, i.e., the range of powers for which the nonlinear effects are the dominant source of distortions. For example, if we choose a different launch power within the fiber nonlinearity dominant region, say $6$ or $9\,$ dBm, it would have a minimum impact on the estimated fiber lengths. However, if a low launch training power is chosen, say ${P_{av}} < 5\,$ dBm, the ABO algorithm will not converge to the correct lengths of the physical fiber. Rather, it will terminate at any local minima that satisfies the LC constraint, i.e., any solution that equalizes the total accumulated dispersion.

Fig. 3. Value of the training objective function in Eq. (48) versus the number of optimization iterates for the 4-span fiber-optic communication scenario. Four optimization algorithms are considered for the training of the A-DBP, namely, the ABO, SQP-CFD, SQP-FFD, and SQP-BFD algorithms.

Download Full Size | PDF

Table 3. Optimal Design Point Solutions of the A-DBP in Case of the 4-span Fiber-Optic Communication Scenario. The A-DBP is Trained Using the ABO, SQP-CFD, SQP-FFD, and SQP-BFD Algorithms. The Initial Design Point (in km) is: ${{\boldsymbol x}_0} = {[{85{\boldsymbol \; }85{\boldsymbol \; }85{\boldsymbol \; }85} ]^{\boldsymbol T}}$.

View Table | View all tables in this article

Once the training of the A-DBP is completed, we have an estimate of fiber lengths (column 3 of Table 3), which is used for the A-DBP. This scheme is referred to as ‘trained A-DBP’. We then transmit ${2^{17}}$ unknown data symbols through the 4-span communication system, and the trained A-DBP is switched into the transmission mode. If the exact parameters of the fiber channel are known at the receiver, the INLSE (Eq. (1)) can be solved using the SSFS approach. This scheme is referred to as ‘ideal DBP’. For both trained A-DBP and ideal DBP, we use the adaptive step-size that guarantees that the maximum nonlinear phase per step does not exceed $0.01$ radian. Figure 4 shows the bit error rate (BER) in the case of the initial A-DBP (i.e. equal virtual span lengths) and the trained A-DBP, as compared to the ideal DBP case, as a function of the average launch power. As compared to the initial A-DBP, the trained A-DBP effectively mitigates the distortions at the higher launch power ranges, achieving the same BER performance as the ideal DBP. Using the trained A-DBP, the BER is found to be zero for average fiber launch powers larger than $- 4$ dBm and less than $14$ dBm. For low launch power (${P_{av}} \le - 8$ dBm), the fiber nonlinear distortion is negligible, and the distortions are mainly due to fiber CD effects and amplifier noise [2,3]. Therefore, the initial A-DBP can provide a BER performance comparable to the trained A-DBP (as well as the ideal DBP) at this region. As the power slightly increases, the signal to noise ratio increases resulting in a BER reduction. However, at higher power levels (${P_{av}} \ge \; \; 2$ dBm), the fiber nonlinearity becomes the dominant source of distortions and the initial A-DBP cannot mitigate the impairments anymore. Contrarily, the trained A-DBP can still provide a BER performance benefit up to average launch power of $14$ dBm. For very high launch power (${P_{av}} > 14$ dBm), both the trained A-DBP and the ideal DBP fails to mitigate the distortions due to the severe signal-noise nonlinear interactions [43,44].

Fig. 4. BER versus average launch power for the 4-span optical fiber communication scenario. The equalization at the receiver is performed using the ideal DBP, the initial A-DBP, or the trained A-DBP. The number of transmitted symbols is ${2^{17}}$.

Download Full Size | PDF

In order to investigate the computational efficiency of the introduced ABO, the same gradient-based optimization algorithm is utilized to train the initial A-DBP. However, at this time the required sensitivity information is obtained using the central-finite difference (CFD), the forward-finite difference (FFD), or the backward-finite difference (BFD) approximations. These optimizers are respectively denoted as SQP-CFD, SQP-FFD, and SQP-BFD. The three algorithms (SQP-CFD, SQP-FFD, and SQP-BFD) terminate after $6 - $ iterations with an objective function gradient norm of $1.19 \times {10^{ - 16}}$, $2.9 \times {10^{ - 14}}$, and $2.6 \times {10^{ - 14}}$, respectively. The optimal design points obtained are given in Table 3. Owing to the proper definition of the training problem (48), all three optimizers converge to the ideal DBP solution as well as the ABO does. However, the solutions of the ABO and SQP-CFD algorithms are slightly better (closer to the ideal solution) than the SQP-FFD and SQP-BFD solutions. This is because the derivatives estimations of both the adjoint and CFD approaches are more accurate than the FFD and BFD estimations [23]. Figure 3 shows the objective function evolution in case of using the three finite-differences-based optimizers as compared to that of the ABO algorithm. Obviously, there are slight differences between routes that each algorithm takes towards the optimal solution. This is due to the slight differences between the gradient estimations of each sensitivity technique.

Table 4 compares the four optimizers (ABO, SQP-CFD, SQP-FFD, and SQP-BFD) in terms of the total number of SSFS simulations performed during the training process of the initial A-DBP. The total number of symbols used during the training process is $4096$, i.e., a total of $4096 \times 2$ training samples is used. Substituting in Eq. (40), the modified SSFS overhead parameter for these settings is $\tau \approx 1.48$. In other words, the complexity of one adjoint DBP simulation is approximately equivalent to $1.48 \times $ complexity of single original DBP simulation. At each ABO algorithm iterate, we need to perform $1 - $ original and $1 - $ adjoint DBP simulations, to evaluate current objective function value and its gradient vector. This is equivalent to performing $2.48$ conventional SSFS simulation. Contrarily, each SQP-CFD iterate requires $({2N + 1} )- $ original DBP simulations to determine current objective function value and the gradient vector, where N is the number of design parameters. Each of the SQP-FFD and SQP-BFD needs to run $({N + 1} )- $ original DBP simulations every optimization iterate, to obtain the current objective function value and a less accurate (as compared to the ABO and SQP-CFD algorithms) gradient vector estimation. As shown in Table 4, the proposed ABO algorithm saves around $39 - $ system simulations, as opposed to the SQP-CFD algorithm for the same accuracy. As compared to SQP-FFD/BFD, the ABO saves around $15 - $ system simulations; besides, the accuracy of SQP-FFD/BFD to estimate the fiber lengths is not so good (See Table 3).

Table 4. Computational Cost of Sensitivity Calculations Required for Training the A-DBP to Mitigate the Distortions of the 4-Span and 20-Span Fiber-Optic Communication Scenarios. The A-DBP is Trained Using ABO, SQP-CFD, SQP-FFD, and SQP-BFD Algorithms. The Number of Design Parameters are $4$ and $20$, Respectively. The Number of Executed Iterations are $6$ and $22$, Respectively.

View Table | View all tables in this article

In order to better understand the gain achieved by the ABO algorithm in terms of the computational cost, we define a simulation saving factor $\eta $ as the ratio of the total number of conventional SSFS simulations required by a certain optimizer to train an A-DBP over the number of equivalent conventional SSFS simulations needed by the ABO to train the same A-DBP. For this example, the value of $\eta $ is calculated to be $3.63$ and $2.02$ for the SQP-CFD and SQP-FFD/BFD, respectively, when the number of design parameters is $N = \; 4$. In other words, our ABO algorithm is $3.63$ times faster than the SQP-CFD in training the 4-span A-DBP.

5.2 20-span A-DBP

In order to further demonstrate the efficiency of the ABO algorithm, we train a 20-span A-DBP system to blindly compensate for the impairments of the 20-span optical fiber communication scenario. The average launch power during training is ${P_{av}} = 6$ dBm, and the number of training symbols is $4096$. The total length of the channel (estimated using the CDC) is ${L_{tot}} = 1500$ km. The length of each A-DBP’s span is initially set to $75$ km. The ABO algorithm requires $22$ iterates until successfully converging to the actual fiber span lengths, terminating at gradient norm of $1.05 \times {10^{ - 16}}$.

Figure 5 shows the BER performance of the 20-span transmission system versus a sweep of the average launch power. The BER obtained using the ideal DBP at the receiver is compared to the BER achieved in case of utilizing the initial A-DBP or the trained A-DBP. The number of actual transmitted symbols is ${2^{17}}$. As compared to the 4-span system, the overall BER performance significantly deteriorates because of the longer transmission distance that causes much higher linear and nonlinear signal distortions. The initial A-DBP cannot provide BER less than ${10^{ - 2}}$ for any average launch power value. However, the trained A-DBP achieves the same BER performance obtained using the ideal DBP. The BER is found to be zero using the trained A-DBP when the average launch power is larger than $2$ dBm and less than $10$ dBm.

Fig. 5. BER versus average launch power for the 20-span optical fiber communication scenario. The equalization at the receiver is performed using the ideal DBP, the initial A-DBP, or the trained A-DBP. The number of transmitted symbols is ${2^{17}}$.

Download Full Size | PDF

To investigate the computational cost of the training, we again re-train the initial A-DBP using the SQP-CFD, SQP-FFD, and SQP-BFD optimizers. The three algorithms terminate after $22 - $ iterations with gradient norms of $5.81 \times {10^{ - 16}}$, $1.09 \times {10^{ - 14}}$, and $2.02 \times {10^{ - 14}}$, respectively. Similar to the 4-span results, the solutions of the ABO and SQP-CFD algorithms are slightly more accurate than the SQP-FFD and SQP-BFD solutions. However, as shown in Table 4, the proposed ABO algorithm needs only around 55 conventional SSFS simulations to train the A-DBP, as opposed to $902$ and $462$ simulations required by the SQP-CFD and the SQP-FFD/BFD algorithms. For this example, the value of the simulation saving factor $\eta $ is calculated to be $16.5$ and $8.47$ for the SQP-CFD and SQP-FFD/BFD, respectively. In other words, the ABO algorithm is $16.5$ times faster than the SQP-CFD, and $8.47$ times faster than the SQP-FFD/BFD, in training the 20-span A-DBP, achieving the same high accuracy obtained using the SQP-CFD algorithm. Clearly, the time saving gain achieved by the ABO algorithm increases as the number of design parameters increases. This is because the number of extra simulations required by the finite-differences-based optimizers is linearly proportional to the number of design parameters. The ABO algorithm, on the other hand, always requires one extra simulation of the adjoint system (Eq. (17)) per iterate to evaluate the full gradient information regardless of the number of parameters.

5.3 Optimum low-complexity A-DBP

Typically, the computational cost of the DBP is expensive. It scales directly with the number of propagation steps and hence, DBP approaches with step size larger than the physical fiber span lengths have drawn significant attention [11,12,19,21,22]. In this subsection, we therefore investigate the reduction of the A-DBP computational cost at the expense of a little bit lower equalization performance. For the ideal DBP, we will replace the small step-size used in the SSFS simulation with a relatively larger step. Since the SSFS simulation has an error of order $O({{h^3}} )$, a coarse-mesh ideal DBP simulation, with relatively larger step-size, will not be identical to the inverse response of the fiber channel. As a result, the equalization performance with a coarse-mesh ideal DBP model degrades as opposed to the fine-mesh ideal DBP model. For the 4-span fiber-optic communication scenario, Fig. 6(a) shows the BER performance of the coarse-mesh ideal DBP as compared to the fine-mesh ideal DBP model. The adaptive step-size used in the coarse-mesh model is chosen such that the maximum nonlinear phase change within each step is $0.1$ radian, as opposed to $0.01$ radian in the fine-mesh case. As shown in Fig. 6(a), the BER performance of the coarse-mesh ideal DBP deteriorates significantly for average launch power above $0$ dBm. However, as depicted in Fig. 6(b), the coarse-mesh ideal DBP requires only a total of 4-SSFS steps at ${P_{av}} = 0$ dBm, as compared to $53 - $ SSFS steps required by the fine-mesh ideal DBP model at the same launch power.

Fig. 6. (a) BER, and (b) total number of required SSFS steps versus average launch power for the 4-span fiber-optic communication scenario. The equalization at the receiver is performed using ideal DBP with fine-mesh and coarse-mesh models. The number of transmitted symbols is ${2^{17}}$.

Download Full Size | PDF

In order to enhance the performance of the coarse-mesh ideal DBP model with a computational cost within the same range, we apply our ABO algorithm to train coarse-mesh A-DBP models with $2 - $, $3 - $, and 4-virtual spans. The fiber-optic system has 4 spans with parameters as given in Tables 2 and 3. We thus wish to compensate for the dispersion and nonlinear impairments of the 4-span fiber-optic system with M virtual spans, where $M\; < = \; 4$. We consider here the length and nonlinear coefficient of each virtual fiber span as design parameters. In other words, the design vector ${\boldsymbol x} = {[{{L_1}\; {L_2}\; \cdots {L_M}\; \; {\gamma_1}\; {\gamma_2}\; \cdots {\gamma_M}} ]^T}$ contains the lengths and nonlinear coefficients of all fiber spans. However, the proposed ABO algorithm is general and could be utilized to estimate all other A-DBP parameters, e.g., average launch power ${P_{av}}$, dispersion coefficients (${\beta _{3,j}}\; $ and ${\beta _{2,j}}$), loss coefficients ${\alpha _j}$, and amplifiers’ gains ${g_j}$. The A-DBP training problem is modified as follows:

(49)$$\textrm{subject to }\begin{array}{{c}} {\mathop {\min }\limits_x \int_{ - {T_m}}^{{T_m}} {{{|{{v_{out}} - {u_{tx}}} |}^2}dt} }\\ {\left\{ {\begin{array}{{l}} {{L_1} + {L_2} + \cdots {L_M} = {L_{tot}},}\\ {0\,\textrm{km} \le {L_j} \le {L_{tot}},{\kern 10pt}j = 1,2, \cdots ,M,}\\ {0.001{\textrm{W}^{ - 1}}\textrm{k}{\textrm{m}^{ - 1}} \le {\gamma_j} \le 20{\textrm{W}^{ - 1}}\textrm{k}{\textrm{m}^{ - 1}},{\kern 10pt}j = 1,2, \cdots ,M,} \end{array}} \right.} \end{array}$$

where ${L_{tot}} = 340$ km, and $M = 2$, $3$, or $4$. The average launch power at the training process is ${P_{av}} = 6$ dBm. The initial length and nonlinear coefficient of each span are set to $0.6$ W^-1km^-1, and ${L_{tot}}/M$, respectively. The number of symbols used to train the A-DBP is $4096$. The optimal design points of all the trained A-DBP models, obtained using the ABO algorithm, are tabulated in Table 5. Notice that in this example we do not wish to converge to the actual channel parameters, since the coarse-mesh ideal DBP model (whose parameters are identical to actual channel parameters) does not provide the global optimum solution anymore. We therefore relax the constraints of the training problem, owing to the fact that the A-DBP is a virtual fiber, which allows for non-physical values of the parameters. This relaxation extends the parameter space (i.e. the feasible region) at which the ABO algorithm searches for the optimal coarse-mesh A-DBP parameters, thus seeking to achieve the best possible compensation performance.

Table 5. Optimum Values of the A-DBP Design Parameters for Mitigating the 4-Span Fiber-Optic Communication Scenario. A Coarse-Mesh Model is Considered with a Maximum Allowed Nonlinear Phase Change per SSFS Step of 0.1 Rad. The ABO is Used to Train A-DBP with $2 - $, $3 - $, and $4 - $ Virtual Fiber Spans.

View Table | View all tables in this article

Figure 7(a) shows the Q-factor performance of the 4-span fiber-optic communication system as a function of fiber launch power for various cases of DBP equalization schemes. The Q-factor in $\textrm{dB}$ is calculated using [3]:

(50)$$Q = 20\textrm{lo}{\textrm{g}_{10}}\left( {\sqrt 2 erf{c^{ - 1}}({2 \times BER} )} \right), $$

where $erf{c^{ - 1}}({\cdot} )$ stands for the inverse of the complementary error function, and the $BER$ is obtained using the Monte-Carlo simulations. The total number of SSFS steps required by each equalizer versus a sweep of the launch power is also compared in Fig. 7(b). We keep coarse-mesh ideal DBP as our reference for comparison. As can be seen, any $M$-span trained A-DBP with $M = 2$, $3$, and $4$, outperforms the coarse-mesh ideal DBP. From Figs. 7(a) and (b), we see that the Q-factor of the 2-span A-DBP is $1$ dB higher than that of the coarse-mesh ideal DBP and its computational cost is lower. The optimum launch power is $- 2$ dBm and at this power, the 2-span A-DBP requires $2$ SSFS steps while the coarse-mesh ideal DBP requires $4$ SSFS steps. Q-factor increments of $2.2\; \textrm{dB}$ and $2.7\; \textrm{dB}$ are achieved by the 3-span and 4-span A-DBPs, respectively. The 3-span and 4-span A-DBPs though require 2-extra SSFS steps as compared to the coarse-mesh ideal DBP. Hence, the 2-span A-DBP provides a better trade-off between performance and computational cost, i.e. $1$ dB performance advantage and $50\%$ reduction in computational cost as compared to our reference system. It is worth emphasizing that for practical BER of ${10^{ - 2}}$ or ${10^{ - 3}}$ levels, the performance advantage of the 2-span A-DBP is reduced. The computational cost required by the 2-span A-DBP is still, however, half the coarse-mesh ideal DBP cost.

Fig. 7. (a) Q-factor, and (b) total number of required SSFS steps versus average launch power for the 4-span fiber-optic communication scenario. The equalization at the receiver is performed using the coarse-mesh ideal DBP, and the coarse-mesh trained A-DBPs with $2 - $, $3 - $, and $4 - $ virtual fiber spans. The number of transmitted symbols is ${2^{17}}$.

Download Full Size | PDF

It should conceptually be clear that the ABO algorithm can be used to train a $5$ or $10 - $ (virtual) span A-DBP models for the 20-(real) span fiber-optic system. These models could then be utilized to significantly reduce the computational complexity required by a 20-span coarse-mesh ideal DBP model for mitigating the distortions of the 20-span optical fiber communication scenario, achieving a similar or even better compensation performance.

6. Conclusion

We propose a computationally efficient adaptive digital back propagation (A-DBP) method to blindly mitigate the linear and nonlinear impairments-induced in optical fiber communication systems and networks. An efficient adjoint-based optimization (ABO) algorithm is introduced for accelerating the parameters extraction of the A-DBP. The ABO algorithm utilizes the sequential quadratic programming (SQP) optimization technique for solving the A-DBP training problem. The ABO algorithm also exploits an introduced adjoint sensitivity analysis (ASA) approach to rapidly evaluate the required sensitivity information at each optimization iterate. The efficiency and robustness of the proposed A-DBP method are illustrated through 4-span and 20-span optical fiber communication examples. As compared to the A-DBP trained using finite-difference (FD)-based SQP algorithms, the ABO algorithm is shown to be $8.47$ times faster than the backward/forward FD-based optimizers, and $16.5$ times faster than the more accurate central FD-based optimizer in the 20-span example. This computational complexity gain further increases with the number of parameters. Moreover, the reduction of the A-DBP computational cost at the expense of slightly lower equalization performance is demonstrated. In the 4-span example, an optimized $2 - $ virtual span coarse-mesh A-DBP model is shown to provide the 1 dB performance advantage over the reference coarse-mesh ideal DBP system, while reducing the computational complexity of the reference system by $50\%$.

Funding

Natural Sciences and Engineering Research Council of Canada (RGPIN-2016-05451, RGPIN-2016-06579).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. I. Kaminow, T. Li, and A. E. Willner, Optical Fiber Telecommunications Volume VIB: Systems and Networks (Academic Press, 2013).

2. G. P. Agrawal, Nonlinear Fiber Optics (Academic press, 2007).

3. S. Kumar and M. J. Deen, Fiber Optic Communications: Fundamentals and Applications (John Wiley & Sons, 2014).

4. M. Kuschnerov, F. N. Hauske, K. Piyawanno, B. Spinnler, M. S. Alfiad, A. Napoli, and B. Lankl, “Dsp for coherent single-carrier receivers,” J. Lightwave Technol. 27(16), 3614–3622 (2009). [CrossRef]

5. V. Curri, R. Gaudino, A. Napoli, and P. Poggiolini, “Electronic equalization for advanced modulation formats in dispersion-limited systems,” IEEE Photonics Technol. Lett. 16(11), 2556–2558 (2004). [CrossRef]

6. F. P. Guiomar and A. N. Pinto, “Simplified volterra series nonlinear equalizer for polarization-multiplexed coherent optical systems,” J. Lightwave Technol. 31(23), 3879–3891 (2013). [CrossRef]

7. M. M. T. Maghrabi, S. Kumar, and M. H. Bakr, “Dispersion compensation of fiber optic communication system with direct detection using artificial neural networks (anns),” Opt. Commun. 409, 109–116 (2018). [CrossRef]

8. R. J. Essiambre and P. J. Winzer, “Fibre nonlinearities in electronically pre-distorted transmission,” in Proc. 31st Eur. Conf. Opt. Commun., pp. 191–192 (2005).

9. E. Ip and J. M. Kahn, “Compensation of dispersion and nonlinear impairments using digital backpropagation,” J. Lightwave Technol. 26(20), 3416–3425 (2008). [CrossRef]

10. X. Li, X. Chen, G. Goldfarb, E. Mateo, I. Kim, F. Yaman, and G. Li, “Electronic post-compensation of wdm transmission impairments using coherent detection and digital signal processing,” Opt. Express 16(2), 880–888 (2008). [CrossRef]

11. L. B. Du and A and J. Lowery, “Improved single channel backpropagation for intra-channel fiber nonlinearity compensation in long-haul optical communication systems,” Opt. Express 18(16), 17075–17088 (2010). [CrossRef]

12. D. S. Millar, S. Makovejs, C. Behrens, S. Hellerbrand, R. I. Killey, P. Bayvel, and S. J. Savory, “Mitigation of fiber nonlinearity using a digital coherent receiver,” IEEE J. Sel. Top. Quantum Electron. 16(5), 1217–1226 (2010). [CrossRef]

13. D. Rafique and A. D. Ellis, “Impact of signal-ASE four-wave mixing on the effectiveness of digital backpropagation in 112 Gb/s PM-QPSK systems,” Opt. Express 19(4), 3449–3454 (2011). [CrossRef]

14. J. Shao, S. Kumar, and X. Liang, “Digital back propagation with optimal step size for polarization multiplexed transmission,” IEEE Photonics Technol. Lett. 25(23), 2327–2330 (2013). [CrossRef]

15. D. Sharma and S. Kumar, “An overview of elastic optical networks and its enabling technologies,” Int. J. Eng. Technol. 9(3), 1643–1649 (2017). [CrossRef]

16. E. P. da Silva, R. Asif, K. J. Larsen, and D. Zibar, “Nonlinear compensation with modified adaptive digital backpropagation in flexigrid networks,” in CLEO: Science and Innovations, pp. SM2M–5 (2015).

17. T. Tanimura, T. Hoshida, T. Tanaka, L. Li, S. Oda, H. Nakashima, Z. Tao, and J. C. Rasmussen, “Semi-blind nonlinear equalization in coherent multi-span transmission system with inhomogeneous span parameters,” in Optical Fiber Communication Conference (OFC), p. OMR6 (2010).

18. C. Lin, A. Napoli, B. Spinnler, V. Sleiffer, D. Rafique, M. Kuschnerov, M. Bohn, and B. Schmauss, “Adaptive digital back-propagation for optical communication systems,” in Optical Fiber Communication Conference (OFC), p. M3C-4 (2014).

19. L. Jiang, L. Yan, Z. Chen, A. Yi, Y. Pan, W. Pan, and B. Luo, “Low-complexity and adaptive nonlinearity estimation module based on godard’s error,” IEEE Photonics J. 8(1), 1–8 (2016). [CrossRef]

20. X. Liang and S. Kumar, “Multi-stage perturbation theory for compensating intra-channel nonlinear impairments in fiber-optic links,” Opt. Express 22(24), 29733–29745 (2014). [CrossRef]

21. F. Zhang, Q. Zhuge, M. Qiu, X. Zhou, M. Y. S. Sowailem, T. M. Hoang, M. Xiang, and D. V. Plant, “Blind adaptive digital backpropagation for fiber nonlinearity compensation,” J. Lightwave Technol. 36(9), 1746–1756 (2018). [CrossRef]

22. J. Zhou, Y. Wang, and Y. Zhang, “Blind backpropagation method for fiber nonlinearity compensation with low computational complexity and high performance,” Opt. Express 28(8), 11424–11438 (2020). [CrossRef]

23. M. Bakr, A. Elsherbeni, and V. Demir, Adjoint Sensitivity Analysis of High Frequency Structures with MATLAB (The Institution of Engineering and Technology (IET), 2017).

24. M. H. Bakr, O. S. Ahmed, M. H. El Sherif, and T. Nomura, “Time domain adjoint sensitivity analysis of electromagnetic problems with nonlinear media,” Opt. Express 22(9), 10831–10843 (2014). [CrossRef]

25. N. K. Georgieva, S. Glavic, M. H. Bakr, and J. W. Bandler, “Feasible adjoint sensitivity technique for em design optimization,” IEEE Trans. Microwave Theory Techn. 50(12), 2751–2758 (2002). [CrossRef]

26. M. M. T. Maghrabi, M. H. Bakr, S. Kumar, A. Elsherbeni, and V. Demir, “Fdtd-based adjoint sensitivity analysis of high frequency nonlinear structures,” IEEE Trans. Antennas Propagat. 68(6), 4727–4737 (2020). [CrossRef]

27. G. Veronis, R. W. Dutton, and S. Fan, “Method for sensitivity analysis of photonic crystal devices,” Opt. Lett. 29(19), 2288–2290 (2004). [CrossRef]

28. M. R. Abdelhafez and M. A. Swillam, “Efficient sensitivity analysis approach based on finite element solutions of photonic structures,” Opt. Commun. 313, 430–435 (2014). [CrossRef]

29. E. Sayed, M. H. Bakr, B. Bilgin, and A. Emadi, “Adjoint sensitivity analysis of switched reluctance motors,” Electric Power Components and Systems 46(18), 1959–1968 (2018). [CrossRef]

30. M. M. T. Maghrabi, M. H. Bakr, and S. Kumar, “Adjoint sensitivity analysis approach for the nonlinear schrödinger equation,” Opt. Lett. 44(16), 3940–3943 (2019). [CrossRef]

31. M. M. T. Maghrabi, M. H. Bakr, and S. Kumar, “Nonlinear Schrödinger Equation-Based Adjoint Sensitivity Analysis,” in 2020 International Applied Computational Electromagnetics Society Symposium (ACES), pp. 1–2 (2020).

32. R. Glowinski, S. J. Osher, and W. Yin, Splitting Methods in Communication, Imaging, Science, and Engineering (Springer, 2017).

33. B. Spinnler, “Equalizer design and complexity for digital coherent receivers,” IEEE J. Sel. Top. Quantum Electron. 16(5), 1180–1192 (2010). [CrossRef]

34. Y. Gao, J. H. Ke, K. P. Zhong, J. C. Cartledge, and S. S. Yam, “Assessment of intrachannel nonlinear compensation for 112 gb/s dual- polarization 16qam systems,” J. Lightwave Technol. 30(24), 3902–3910 (2012). [CrossRef]

35. F. Zhang, Q. Zhuge, M. Qiu, W. Wang, M. Chagnon, and D. V. Plant, “Xpm model-based digital backpropagation for subcarrier-multiplexing systems,” J. Lightwave Technol. 33(24), 5140–5150 (2015). [CrossRef]

36. H. Alt and J. V. Leeuwen, “The complexity of basic complex operations,” Computing 27(3), 205–215 (1981). [CrossRef]

37. S. S. Rao, Engineering Optimization: Theory and Practice (John Wiley & Sons, 2019).

38. U. M. G. Palomares and O. L. Mangasarian, “Superlinearly convergent quasi-newton algorithms for nonlinearly constrained optimization problems,” Mathematical Programming 11(1), 1–13 (1976). [CrossRef]

39. S. Han, “Superlinearly convergent variable metric algorithms for general nonlinear programming problems,” Mathematical Programming 11(1), 263–282 (1976). [CrossRef]

40. J. Nocedal and S. Wright, Numerical Optimization (Springer Science & Business Media, 2006).

41. M. Bakr, Nonlinear optimization in electrical engineering with applications in MATLAB (The Institution of Engineering and Technology (IET), 2013).

42. J. Shao, X. Liang, and S. Kumar, “Comparison of split-step Fourier schemes for simulating fiber optic communication systems,” IEEE Photonics J. 6(4), 1–15 (2014). [CrossRef]

43. J. P. Gordon and L. F. Mollenauer, “Phase noise in photonic communications systems using linear amplifiers,” Opt. Lett. 15(23), 1351–1353 (1990). [CrossRef]

44. S. Kumar, “Analysis of nonlinear phase noise in coherent fiber-optic transmission systems,” J. Lightwave Technol. 27(21), 4722–4733 (2009). [CrossRef]

Algorithm	Number of required real multiplications
SSFS [33]	$4 N_{s} (K_{N L S} + M) (1 + \log_{2} (N_{s})) + 10 N_{s} K_{N L S} + 2 N_{s} M$
M-SSFS	$4 N_{s} (K_{N L S} + M) (1 + \log_{2} (N_{s})) + 42 N_{s} K_{N L S} + 2 N_{s} M$

Parameter	Value
Second-order dispersion coefficient, $β_{2}$	$- 21$ ps²/km
Third-order dispersion coefficient, $β_{3}$	$0$ ps³/km
Loss coefficient, $α$	$0.2$ dB/km
Nonlinear coefficient, $γ$	$1.1$ W^-1km^-1

Fiber span number	Fiber span length (km)
Fiber span number	Actual fiber	ABO	SQP-CFD	SQP-FFD	SQP-BFD
1	50.00	$50.00$	$50.00$	$51.85$	$50.00$
2	80.00	$79.92$	$80.01$	$78.21$	$78.74$
3	85.00	$85.08$	$84.99$	$85.57$	$86.26$
4	125.00	$125.00$	$125.00$	$124.37$	$125.00$

Algorithm	Total number of equivalent conventional SSFS simulations
Algorithm	4-span system	20-span system
ABO	$14.88$	$54.56$
SQP-CFD	$54.00$	$902.00$
SQP-FFD	$30.00$	$462.00$
SQP-BFD	$30.00$	$462.00$

Fiber span number	Optimum parameters
	2-span A-DBP		3-span A-DBP		4-span A-DBP
	$L$ (km)	$γ (W^{- 1} k m^{- 1})$	$L$ (km)	$γ (W^{- 1} k m^{- 1})$	$L$ (km)	$γ (W^{- 1} k m^{- 1})$
1	$123.47$	$3.69$	$132.15$	$3.82$	$50.50$	$1.47$
2	$216.53$	$18.82$	$1.36$	$20.00$	$80.69$	$1.32$
3	$-$	$-$	$206.49$	$9.97$	$16.85$	$1.69$
4	$-$	$-$	$-$	$-$	$191.96$	$6.08$

Adaptive digital back propagation exploiting adjoint-based optimization for fiber-optic communications

Abstract

1. Introduction

2. Multi-span DBP mathematical model

3. ASA for the DBP

3.1 Adjoint DBP problem solution

3.2 Algorithm

3.3 Complexity analysis

4. Adjoint-based optimization algorithm

4.1 ABO algorithm

5. Results

5.1 4-span A-DBP

5.2 20-span A-DBP

5.3 Optimum low-complexity A-DBP

6. Conclusion

Funding

Disclosures

Data availability

References

Data availability

Cited By

Figures (7)

Tables (5)

Equations (66)

Optics Express