Efficient synchronous retrieval of OAM modes and AT strength using multi-task neural networks

Pinchao Meng; Jiabao Zhuang; Linhua Zhou; Weishi Yin; Dequan Qi

doi:10.1364/OE.511098

1. Introduction

The extensive use of today’s information technologies, such as urban IoT devices and 5G and B5G mobile communications, has driven rapid progress in high-speed data processing services and application devices, resulting in an increasing demand for high-capacity data transmission [1–3]. Unlike optical fibre communication systems, which are subject to environmental constraints, free-space optical communication allows links to be established rapidly under complex geographical conditions and offers high data rates, high reliability and wide coverage [4]. Today, free-space optical communication makes almost full use of the dimensional information of light, including amplitude, phase and wavelength. More recently, Laguerre-Gaussian (LG) beams carrying orbital angular momentum (OAM) have been shown to offer a theoretically unbounded set of exploitable modes [5–7]. Consequently, the application of OAM beams to free-space optical communications has attracted considerable interest. OAM beams contain a helical phase factor $\exp ( il\theta )$, where $\theta$ is the azimuthal angle and $l$ is the topological charge. In theory, $l$ can be any integer, and OAM beams with different topological charges are mutually orthogonal. As a result, multiplexed OAM beams can be used as information carriers. This approach can significantly alleviate the capacity crisis faced by communication systems, especially for last-kilometre communication architectures and high-dimensional quantum key distribution tasks [8,9].

However, when OAM beams are transmitted through the atmospheric channel, they are affected by refractive index fluctuations caused by atmospheric turbulence. This results in phase distortion and dispersion of beam intensity, leading to severe signal crosstalk and degrading the quality of laser information transmission. Therefore, detecting the characteristics of atmospheric turbulence is of great importance for free-space optical communication systems. In free-space optical communication systems, the atmospheric refractive index structure constant, denoted as $C_{n}^{2}$, is commonly used to characterise the intensity of atmospheric turbulence. Thus, by determining the value of $C_{n}^{2}$ in the turbulent channel, one can assess the extent of turbulence-induced damage to the communication link. This facilitates the implementation of adaptive optical measures to compensate for turbulence, ultimately improving transmission quality and reducing error rates.

OAM mode identification is a critical technology for the successful implementation of OAM-based free-space optical (OAM-FSO) communication systems. Established methods for demodulating OAM beams include spiral phase plates (SPPs), grating diffraction elements, computational holography and other techniques [10–13]. These demodulation techniques often require expensive physical equipment and leave room for improvement in demodulation efficiency and range, and the required optical components are difficult to fabricate; these limitations become more pronounced under atmospheric turbulence. Therefore, there is an opportunity to use machine learning to formulate new adaptive demodulators. By exploiting the powerful data analysis and information processing capabilities of machine learning, it is possible to extract and detect the characteristics of OAM modes, even in the presence of phenomena such as atmospheric turbulence-induced phase distortion [14,15]. This fulfils the requirement for efficient OAM demodulation.

In recent years, an increasing number of researchers have explored machine learning-based OAM mode demodulation and achieved promising results. Krenn et al. first used Self-Organising Map (SOM) neural networks, which are optimised iteratively through competitive learning, and successfully detected 16 different OAM modes over a 3-kilometre link [16]. With the rapid development of machine learning, convolutional neural networks (CNNs) have made significant breakthroughs in image recognition, which makes it possible to demodulate OAM beams by processing spatial and local image features. Doster et al. used the AlexNet model for feature extraction from spot images and efficiently demodulated OAM beams [17]. Zhao et al. used multi-view pooling and data augmentation during network training to improve the performance and robustness of OAM demodulation [18]. Wang et al. designed a six-layer CNN to extract features from the intensity distribution of the received Laguerre-Gaussian beam, achieving a balance between computational complexity and detection efficiency [19]. Zhou et al. proposed an improved model based on the ShuffleNet V2 network. Compared to previous studies that focused on single and multiplexed modes, it offers high accuracy and efficiency, and it maintains high recognition accuracy even under untrained turbulence intensities [20]. Building on these results, more researchers are considering more complex application scenarios. Further references can be found in [21–25].

In reality, the received LG intensity images contain OAM mode information and turbulence intensity information. Therefore, we constructed a multi-task model called OATNN. By simultaneously performing OAM mode inversion and turbulence intensity inversion tasks through OATNN, and exploiting the mutual influence and information exchange between these two tasks, we can achieve accurate and efficient simultaneous detection of OAM modes and turbulence intensity in complex atmospheric turbulence environments. Furthermore, due to the strong randomness of atmospheric turbulence, there is a high probability of excessive amplification of disturbances caused by this randomness during the forward propagation of the network [26,27]. This often places higher demands on the stability of neural networks. A stable algorithmic architecture allows feature information to be transmitted from the neural network’s input nodes to the end nodes without loss, preventing turbulence disturbances from being infinitely amplified during the forward propagation of the network [28–30]. Therefore, we consider the forward process of the neural network as a continuous dynamic system and, based on the forward Euler method, propose a stable neural network unit called RUEM. Its stability is verified by theoretical derivation and numerical experiments. Through a series of experiments, including ablation experiments, we have verified that the OATNN model with the coupled stability module can achieve high-precision demodulation of OAM modes in atmospheric turbulent environments and can simultaneously perform real-time inversion of turbulence intensity. We have also conducted experiments in scenarios with longer transmission distances and high-precision recognition requirements for turbulence intensity, demonstrating the generalizability of the proposed model.

2. Analytical model of atmospheric turbulence

In atmospheric turbulence, the random distribution of the refractive index disturbs the wavefront phase of laser beams, causing signal attenuation and interference. Since atmospheric turbulence is continuously distributed in time and space, the multi-layer phase screen method is essentially a discretisation of the integral along the light propagation path. Therefore, multiple layers of random phase screens can be used to simulate optical transmission in atmospheric turbulence. We used the classical and widely used power spectral density model based on the Hill spectrum as modified by Andrews:

(1)$$\begin{aligned}& {{\Phi }_{n}}\left( {{k}_{x}},{{k}_{y}} \right)=0.033C_{n}^{2}\times \left[ 1+1.802\sqrt{\frac{k_{x}^{2}+k_{y}^{2}}{k_{l}^{2}}}-0.254{{\left( \frac{k_{x}^{2}+k_{y}^{2}}{k_{l}^{2}} \right)}^{7/12}} \right]\\ & \quad \times \exp \left( -\frac{k_{x}^{2}+k_{y}^{2}}{k_{l}^{2}} \right){{\left( k_{x}^{2}+k_{y}^{2}+\frac{1}{L_{0}^{2}} \right)}^{-11/6}} \end{aligned}$$

where ${k_x}$ and ${k_y}$ are the wave numbers in the x and y directions, ${k_l} = 3.3/{l_0}$, where $l_0$ is the inner scale of turbulence and $L_0$ is the outer scale of turbulence. $C_{n}^{2}$ represents the refractive index structure constant of atmospheric turbulence. The disturbance of wavefront phase can be further simulated using a random distribution with a certain variance:

(2)$${{\sigma }^{2}}\left( {{k}_{x}},{{k}_{y}} \right)={{\left( \frac{2\pi }{N\Delta x} \right)}^{2}}2\pi \left( k_{x}^{2}+k_{y}^{2} \right)\Delta z{{\Phi }_{n}}\left( {{k}_{x}},{{k}_{y}} \right)$$

where $\Delta z$ is the distance between adjacent phase screens, and $N$ and $\Delta x$ are the size and the grid pitch of the random phase screen, respectively. For ease of calculation, the phase screen in the spatial domain is obtained from its frequency-domain representation using the Fast Fourier Transform

(3)$$\varphi \left( x,y \right)=FFT\left( {{C}_{N\times N}}\sigma \left( {{k}_{x}},{{k}_{y}} \right) \right)$$

where $FFT$ stands for Fast Fourier Transform and ${C}_{N\times N}$ is a complex Gaussian random matrix with a mean of 0 and a variance of 1.
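As a rough illustration of Eqs. (1)–(3), the following NumPy sketch builds one random phase screen by power spectrum inversion. The function names, the grid size and pitch, and the random seed are illustrative assumptions and are not taken from the paper. Two further assumptions are made: the spectrum is evaluated in its decaying form (negative exponential argument and exponent $-11/6$), and the factor multiplying $\Delta z\,\Phi_n$ in the variance is taken to be the squared optical wavenumber $(2\pi/\lambda)^2$, which is the usual thin-screen convention. The real part of the transform is kept as the screen.

```python
import numpy as np

def hill_spectrum(kx, ky, cn2, l0, L0):
    """Refractive-index power spectrum of Eq. (1), in its decaying form."""
    k2 = kx**2 + ky**2
    ratio = k2 / (3.3 / l0) ** 2                      # (kx^2 + ky^2) / k_l^2
    bracket = 1.0 + 1.802 * np.sqrt(ratio) - 0.254 * ratio ** (7.0 / 12.0)
    return 0.033 * cn2 * bracket * np.exp(-ratio) * (k2 + 1.0 / L0**2) ** (-11.0 / 6.0)

def phase_screen(n, dx, dz, cn2, wavelength, l0, L0, rng):
    """One n x n random phase screen (radians) by power spectrum inversion."""
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)          # angular spatial frequencies
    kx, ky = np.meshgrid(k, k, indexing="xy")
    k_opt = 2.0 * np.pi / wavelength                   # assumption: optical wavenumber
    var = ((2.0 * np.pi / (n * dx)) ** 2 * 2.0 * np.pi * k_opt**2 * dz
           * hill_spectrum(kx, ky, cn2, l0, L0))       # variance of Eq. (2)
    # Complex Gaussian matrix with zero mean and unit variance, as in Eq. (3)
    c = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2.0)
    return np.real(np.fft.fft2(c * np.sqrt(var)))      # assumption: keep the real part

# Illustrative grid (n, dx are assumptions); physical values follow Section 4.
screen = phase_screen(n=128, dx=0.005, dz=200.0, cn2=1e-14,
                      wavelength=1550e-9, l0=3e-4, L0=50.0,
                      rng=np.random.default_rng(0))
print(screen.shape, float(screen.std()))
```

Because the variance is linear in $C_{n}^{2}$, screens drawn with the same random matrix scale with $\sqrt{C_{n}^{2}}$, which is one way to see why the phase excursions grow with turbulence strength.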

As a result, we use the power spectrum inversion method to generate simulated random phase screens $\varphi ( x,y )$. Figure 1 illustrates the wavefront phase distortions caused by six different turbulence intensities, with the refractive index structure constant $C_{n}^{2}$ ranging from $5\times 10^{-16}\, \text {m}^{-2/3}$ to $1\times 10^{-13}\, \text {m}^{-2/3}$, covering turbulence of different intensities. As the turbulence intensity increases, the range of phase distortion extends from $( -1,1 )$ to $( -20,20 )$. This indicates a positive correlation between the degree of phase distortion and turbulence intensity.

Fig. 1. The wavefront phase disturbance generated by the random phase screen has $C_{n}^{2}$ values of a) $5\times 10^{-16}\, \text {m}^{-2/3}$, b) $1\times 10^{-15}\, \text {m}^{-2/3}$, c) $5\times 10^{-15}\, \text {m}^{-2/3}$, d) $1\times 10^{-14}\, \text {m}^{-2/3}$, e) $5\times 10^{-14}\, \text {m}^{-2/3}$, and f) $1\times 10^{-13}\, \text {m}^{-2/3}$.


We use an LG beam to simulate the transmission of a laser with an OAM mode in atmospheric turbulence. LG modes are solutions of the scalar paraxial wave equation in cylindrical coordinates $( r, \theta, z )$, where $r$ is the distance from the propagation axis, $\theta$ is the azimuthal angle, and $z$ is the propagation distance. The optical field distribution of the LG beam is

(4)$$\begin{aligned}& {{U}_{LG}}\left( r,\theta ,z \right)=\frac{1}{{{w}_{0}}}{{\left( \frac{r\sqrt{2}}{w(z)} \right)}^{\left| l \right|}}L_{p}^{\left| l \right|}\left( \frac{2{{r}^{2}}}{{{w}^{2}}(z)} \right)\exp \left( -\frac{{{r}^{2}}}{{{w}^{2}}(z)} \right)\exp \left( \frac{ik{{r}^{2}}z}{2\left( {{z}^{2}}+z_{R}^{2} \right)} \right)\\ & \text{ }\times \exp \left( il\theta \right)\exp \left[ i\left( 2p+\left| l \right|+1 \right)\arctan \frac{z}{{{z}_{R}}} \right] \end{aligned}$$

where ${z_{R}}=\pi w_{0}^{2}/\lambda$ is the Rayleigh distance, $k=2\pi /\lambda$ is the wavenumber, $w(z)=\sqrt {2( {{z}^{2}}+z_{R}^{2} )/( k{{z}_{R}} )}$ is the beam radius at distance $z$, ${w_{0}}$ is the beam waist radius, and $L_{p}^{| l |}$ is the associated Laguerre polynomial. The phase factor $\exp ( il\theta )$ in the LG beam expression is the component that enables the LG beam to carry OAM information.
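To make Eq. (4) concrete, the sketch below evaluates the LG field with NumPy. It is a minimal illustration, not the authors' implementation: the function names are hypothetical, the associated Laguerre polynomial is computed with the standard three-term recurrence, and the sample values in the usage lines are assumptions apart from the wavelength and waist quoted in Section 4.

```python
import numpy as np

def gen_laguerre(p, alpha, x):
    """Associated Laguerre polynomial L_p^alpha(x) via the three-term recurrence."""
    x = np.asarray(x, dtype=float)
    prev = np.ones_like(x)                 # L_0^alpha = 1
    if p == 0:
        return prev
    curr = 1.0 + alpha - x                 # L_1^alpha = 1 + alpha - x
    for k in range(1, p):
        prev, curr = curr, ((2 * k + 1 + alpha - x) * curr - (k + alpha) * prev) / (k + 1)
    return curr

def lg_field(r, theta, z, l, p, w0, wavelength):
    """Complex LG field of Eq. (4) at cylindrical coordinates (r, theta, z)."""
    k = 2.0 * np.pi / wavelength                       # wavenumber
    z_r = np.pi * w0**2 / wavelength                   # Rayleigh distance
    w = np.sqrt(2.0 * (z**2 + z_r**2) / (k * z_r))     # beam radius w(z)
    amp = ((1.0 / w0) * (np.sqrt(2.0) * r / w) ** abs(l)
           * gen_laguerre(p, abs(l), 2.0 * r**2 / w**2)
           * np.exp(-r**2 / w**2))
    curvature = np.exp(1j * k * r**2 * z / (2.0 * (z**2 + z_r**2)))
    gouy = np.exp(1j * (2 * p + abs(l) + 1) * np.arctan(z / z_r))
    return amp * curvature * np.exp(1j * l * theta) * gouy

# Intensity on a small Cartesian grid for l = 3, p = 0 at z = 1000 m (grid is illustrative).
x = np.linspace(-0.2, 0.2, 64)
xx, yy = np.meshgrid(x, x)
field = lg_field(np.hypot(xx, yy), np.arctan2(yy, xx), 1000.0, 3, 0, 0.06, 1550e-9)
intensity = np.abs(field) ** 2
print(intensity.shape)
```

For $p=0$ the intensity is a single ring whose radius at the waist is $w_0\sqrt{|l|/2}$, so larger topological charges produce wider rings; this is the feature the network sees in the received intensity images.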

3. OATNN model architecture design

3.1 Architecture of OATNN

Convolution operations possess unique advantages such as local connections, parameter sharing, and translation invariance, making them more suitable for handling image data. We use convolution operations as fundamental modules, combined with a multitask network structure, to construct an inversion model that takes received speckle images as input and outputs their corresponding orbital angular momentum mode and atmospheric turbulence strength.

The specific network structure is shown in Fig. 2. Figure 2(a) shows a schematic diagram of OATNN. The network takes speckle images affected by turbulence as input and processes them in small batches. To facilitate training, we resize all images, regardless of turbulence level, to $224\times 224$ pixels and normalise them to account for the different scales of the input data. The images are then pre-processed by an $11\times 11$ convolution with a stride of 4, a ReLU activation function, and a max-pooling layer. This stage uses a larger convolutional kernel to capture broader local features for initial feature extraction. In addition, the strided large-kernel convolution and the max-pooling layer reduce the size of the feature maps, which significantly reduces the computational complexity of subsequent network modules and thereby increases training and testing speed. After pre-processing, the feature maps enter OATNN’s main network, composed of 4 RUEM modules and $3\times 3$ convolution modules. The alternating arrangement of RUEM and $3\times 3$ convolution modules is pivotal in OATNN’s design. The stable RUEM module can capture long-term dependencies in speckle images, which is crucial for the inversion of OAM modes and turbulence strength. The convolution modules further extract and transform features, aiding the abstraction of information from images across network levels. Towards the end of the main network, the RUEM module output serves as the core data for the turbulence strength inversion task, while the convolution module output, combined with the RUEM output, serves as the core data for the OAM inversion task. In OATNN’s architecture, the turbulence strength and OAM inversion tasks are handled independently by classifiers with identical structures. Within each classifier, the feature data undergo two $3\times 3$ convolutions with ReLU activation, followed by classification via three linear layers (Fig. 2(b)).
Because OATNN has two outputs, we employ cross-entropy loss functions for both tasks so that the two loss values are on a comparable scale. A modulation coefficient $0<\rho <1$ then linearly combines the two task losses into a single objective, whose gradient is backpropagated to update the network weights.
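The combination of the two task losses can be sketched as follows. The text only states that $\rho$ combines the losses linearly, so the convex form $\rho L_{\text{OAM}} + (1-\rho) L_{\text{AT}}$ used here is an assumption, as are the function names and the default value of `rho`.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels given raw class scores (logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)          # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def multitask_loss(oam_logits, oam_labels, at_logits, at_labels, rho=0.5):
    """Assumed convex combination rho * L_OAM + (1 - rho) * L_AT of the two task losses."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    l_oam = cross_entropy(oam_logits, oam_labels)
    l_at = cross_entropy(at_logits, at_labels)
    return rho * l_oam + (1.0 - rho) * l_at, l_oam, l_at

# Toy batch: 5 images, 10 OAM classes, 4 turbulence classes (random scores).
rng = np.random.default_rng(0)
total, l_oam, l_at = multitask_loss(rng.normal(size=(5, 10)), rng.integers(0, 10, 5),
                                    rng.normal(size=(5, 4)), rng.integers(0, 4, 5),
                                    rho=0.5)
print(total, l_oam, l_at)
```

Using the same loss type for both heads keeps the two terms on the same scale, so $\rho$ directly expresses the relative weight given to each task.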

Fig. 2. Schematic diagram of OATNN and classifier


3.2 Architecture of RUEM

This section presents the stable recurrent unit RUEM, which maintains stability even in the absence of input variables. This structure ensures that noise in the input information is not amplified without bound as it propagates through the network, so that truly valuable information is not drowned out. It is worth noting that, in order to take full advantage of the characteristics of speckle images, we adapt RUEM to image data by implementing it with multiple convolution kernels. The specific structure of RUEM is shown in Fig. 3.

Fig. 3. Schematic diagram of the specific structure of RUEM


RUEM can be viewed as a state updater at discrete time steps, and its forward propagation expression is as follows

(5)$$\begin{aligned}&\quad{h}_{t} = \left( 1-{z}_{t} \right) \circ {h}_{t-1}+{z}_{t} \circ {\tilde{h}}_{t}\\ & \left\{ \begin{aligned} & {{r}_{t}}=\tanh \left( {{W}_{rh}}{{h}_{t-1}}+{{W}_{rx}}{{x}_{t}}+{{b}_{r}} \right) \\ & {\tilde{h}}_{t} = {r}_{t} \circ \tanh \left( {W}_{h}{h}_{t-1}+{b}_{h} \right) \\ & {{z}_{t}}=\sigma \left( {{W}_{zh}}{{h}_{t-1}}+{{W}_{zx}}{{x}_{t}}+{{b}_{z}} \right) \\ \end{aligned} \right. , \end{aligned}$$

where ${{h}_{t}}$ is the hidden layer state vector of RUEM at time $t$ and ${{x}_{t}}$ is the input layer state vector at time $t$. ${{W}_{rh}}$ and ${{W}_{h}}$ are weight matrices related to the state vector of the hidden layer, ${{W}_{rx}}$ is a weight matrix related to the state vector of the input layer, ${{W}_{zh}}$ and ${{W}_{zx}}$ are weight matrices related to the gating mechanism, ${{b}_{r}}$, ${{b}_{h}}$ and ${{b}_{z}}$ are biases, and $\sigma$ is the sigmoid function.
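A single forward step of (5) can be written in a few lines of NumPy. This is a dense-vector sketch only: the paper implements the unit with convolution kernels, whereas plain matrices are used here for readability, and the function name, the parameter dictionary and the toy dimensions are all hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def ruem_step(h_prev, x_t, p):
    """One RUEM update following Eq. (5); p is a dict of weights and biases."""
    r = np.tanh(p["W_rh"] @ h_prev + p["W_rx"] @ x_t + p["b_r"])
    h_tilde = r * np.tanh(p["W_h"] @ h_prev + p["b_h"])      # candidate state
    z = sigmoid(p["W_zh"] @ h_prev + p["W_zx"] @ x_t + p["b_z"])  # update gate
    return (1.0 - z) * h_prev + z * h_tilde                  # gated blend of old and new

# Toy example with n = 3 hidden units and m = 2 inputs (sizes are illustrative).
rng = np.random.default_rng(0)
n, m = 3, 2
params = {
    "W_rh": rng.normal(size=(n, n)), "W_rx": rng.normal(size=(n, m)), "b_r": np.zeros(n),
    "W_h": rng.normal(size=(n, n)), "b_h": np.zeros(n),
    "W_zh": rng.normal(size=(n, n)), "W_zx": rng.normal(size=(n, m)), "b_z": np.zeros(n),
}
h1 = ruem_step(np.array([0.15, 0.50, 0.50]), np.array([1.0, -1.0]), params)
print(h1)
```

Since the gate lies strictly between 0 and 1 and the candidate state is a product of two $\tanh$ terms, each component of the new state is a convex combination of the old component and a value of magnitude below 1.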

RUEM generates a new hidden state by taking the input from the current time step and the hidden state from the previous time step to preserve long term dependencies in the data. The evolution of this state is somewhat similar to the evolution of system states over time in ordinary differential equations. Since this recurrent unit and differential equations have some similarity in capturing and simulating the dynamic nature of time series data, we consider the following ODE

(6)$${h}'\left( t \right)=\tanh \left( {{W}_{rh}}h\left( t \right)+{{W}_{rx}}x\left( t \right)+{{b}_{r}} \right)\circ \tanh \left( {{W}_{h}}h\left( t \right)+{{b}_{h}} \right),$$

where $h(t) \in \mathbb {R}^n$, $x(t) \in \mathbb {R}^m$, ${{W}_{rh}},{{W}_{h}}\in \mathbb {R}^{n\times n}$, ${{W}_{rx}}\in \mathbb {R}^{n\times m}$, ${{b}_{r}},{{b}_{h}}\in \mathbb {R}^n$. $\tanh$ is the hyperbolic tangent function and $\circ$ is the Hadamard product. For most ODEs, numerical methods based on discretisation are typically used to obtain approximate solutions. The forward Euler method estimates the evolution of the system through stepwise iterations and is a simple explicit numerical ODE solver. Although it is less accurate than higher order methods (such as the improved Euler method or the Runge-Kutta method), it is cheap to evaluate and is stable provided the step size is sufficiently small. Therefore, it is applicable to various application domains. Using the forward Euler method with unit step size to solve (6) yields

(7)$${{h}_{t}}={{h}_{t-1}}+\tanh \left( {{W}_{rh}}{{h}_{t-1}}+{{W}_{rx}}{{x}_{t}}+{{b}_{r}} \right)\circ \tanh \left( {{W}_{h}}{{h}_{t-1}}+{{b}_{h}} \right)$$

where ${{h}_{t-1}}$ and ${{h}_{t}}$ represent the information at the start and at the end of step $t$, respectively, and ${{h}_{t}}\approx h(t)$ is an approximate solution to the differential equation. In fact, this corresponds to the forward expression of the recurrent unit.

Gate mechanisms can be used to adjust the rate of information accumulation, selectively introducing new information and forgetting previously accumulated information. Therefore, a gate structure ${{z}_{t}}$ can be added based on the discrete format (7), where ${{z}_{t}}$ is defined as

(8)$${{z}_{t}}=\sigma \left( {{W}_{zh}}{{h}_{t-1}}+{{W}_{zx}}{{x}_{t}}+{{b}_{z}} \right)$$

where ${{W}_{zh}}\in {{R}^{n\times n}}$, ${{W}_{zx}}\in {{R}^{n\times m}}$, ${{b}_{z}}\in {{R}^{n}}$, and $\sigma$ represents the sigmoid function. The values of the update gate lie in the interval $(0, 1)$ and describe the degree of information passing. It can be observed that the forward Euler discrete format with gate mechanisms corresponds to the forward propagation expression of RUEM (5), and as a result, the parameter learning problem of RUEM is equivalent to a parameter estimation problem for a system of ODEs. Therefore, we can draw on the theory of dynamical systems to design ordinary differential equations with desired properties, and make the corresponding networks inherit these properties. Stability is one of the important properties to consider and will be discussed in the next section.
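The correspondence with (5) can be checked in one step. The worked line below assumes that the gate $z_t$ acts as an element-wise step size and that the Euler increment is the relaxation term $\tilde{h}_t - h_{t-1}$, which is the form that appears in the input-free ODE (11):

$$\begin{aligned} {{h}_{t}} &= {{h}_{t-1}}+{{z}_{t}}\circ \left( {{\tilde{h}}_{t}}-{{h}_{t-1}} \right) \\ &= \left( 1-{{z}_{t}} \right)\circ {{h}_{t-1}}+{{z}_{t}}\circ {{\tilde{h}}_{t}}, \qquad {{\tilde{h}}_{t}}=\tanh \left( {{W}_{rh}}{{h}_{t-1}}+{{W}_{rx}}{{x}_{t}}+{{b}_{r}} \right)\circ \tanh \left( {{W}_{h}}{{h}_{t-1}}+{{b}_{h}} \right). \end{aligned}$$

The second line is exactly the state update in (5), with ${{r}_{t}}$ equal to the first $\tanh$ factor of ${{\tilde{h}}_{t}}$.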

3.3 Stability analysis of RUEM

The stability of RUEM implies that a perturbation of size $\delta$ in the initial state should have an influence on subsequent states of no more than $\varepsilon$. This limits the influence of meaningless random factors on the network to an acceptable range. In this section we analyse the stability of RUEM under the no-input condition, based on the stability of the ODE and the forward Euler discrete format.

From [31] we have the following two specific theorems

Theorem 3.1 An autonomous differential equation system ${h}'=g( h )$ has a zero solution $h( t )=0$, and the solution of this ODE is stable if it satisfies

(9)$$\underset{i=1,2,\ldots ,n}{\mathop{\max }}\,R\left( {{\lambda }_{i}}\left( h\left( t \right) \right) \right)\le 0,$$

where $R( \cdot )$ is the real part of the complex numbers and ${{\lambda }_{i}}$ is the i-th eigenvalue of the Jacobian matrix $J( h( t ) )$.

Theorem 3.2 An autonomous differential equation system ${h}'=g( h )$ has a zero solution $h( t )=0$, and its forward Euler discretisation is stable if it satisfies

(10)$$\underset{i=1,2,\ldots ,n}{\mathop{\max }}\,\left| 1+\mu {{\lambda }_{i}}\left( {{J}_{t}} \right) \right|\le 1,$$

where $| \cdot |$ is the modulus of a complex number, $\mu >0$ is the step size, and ${{J}_{t}}$ is the Jacobian matrix at point ${{h}_{t}}$.

Based on the two theorems mentioned above, to study the stability of RUEM without input variables, consider the corresponding ODE as

(11)$${h}'(t)=\sigma \left( {{W}_{zh}}h\left( t \right)+{{b}_{z}} \right)\circ \left[ \tanh \left( {{W}_{rh}}h\left( t \right)+{{b}_{r}} \right)\circ \tanh \left( {{W}_{h}}h\left( t \right)+{{b}_{h}} \right)-h\left( t \right) \right],$$

where $h( t )\in {{R}^{n}}$, ${{W}_{zh}},{{W}_{rh}},{{W}_{h}}\in {{R}^{n\times n}}$, ${{b}_{z}},{{b}_{r}},{{b}_{h}}\in {{R}^{n}}$.

Theorem 3.3 For any weight matrix, the input-free RUEM is stable at the zero solution, where $h(t) = 0$.

Proof The term ${{W}_{zh}}h( t )+{{b}_{z}}$ can be expressed as $[ {{W}_{zh}},{{b}_{z}} ]\cdot {{[ h( t ),1 ]}^{T}}={{\hat {W}}_{zh}}\hat {h}$, and the terms involving ${{W}_{rh}}$ and ${{W}_{h}}$ can be written in the same form. Therefore, to simplify the calculations, consider the stability of the ODE under the condition that ${{b}_{z}}={{b}_{r}}={{b}_{h}}=0$.

Consider the system of differential equations corresponding to a RUEM with initial value conditions as

(12)$$\begin{aligned}\left\{ \begin{array}{l} {h}'\left( t \right)=g\left( h\left( t \right) \right)=\sigma \left( {{W}_{zh}}h\left( t \right) \right)\circ \left[ \tanh \left( {{W}_{rh}}h\left( t \right) \right)\circ \tanh \left( {{W}_{h}}h\left( t \right) \right)-h\left( t \right) \right], \\ h\left( 0 \right)=0. \\ \end{array} \right. \end{aligned}$$

It is known that the values of the $\sigma$ function and the tanh function at the origin are ${1}/{2}$ and 0, respectively. Therefore, it can be concluded that the ODE system (12) has a zero solution $h(t) = 0$.

Consider $h( t )\in {{R}^{n}}$, $g:{{R}^{n}}\to {{R}^{n}}$, then

(13)$$\begin{aligned}h\left( t \right)={{\left( {{h}^{1}}\left( t \right),{{h}^{2}}\left( t \right),\ldots ,{{h}^{n}}\left( t \right) \right)}^{T}}, \end{aligned}$$

(14)$$\begin{aligned}g\left( h\left( t \right) \right)={{\left( {{g}_{1}}\left( h\left( t \right) \right),{{g}_{2}}\left( h\left( t \right) \right),\ldots ,{{g}_{n}}\left( h\left( t \right) \right) \right)}^{T}}. \end{aligned}$$

It follows that the i-th element of the system (12) has the expression

(15)$${{g}_{i}}\left( h\left( t \right) \right)=\sigma \left( W_{zh}^{i}h(t) \right)\circ \left[ \tanh \left( W_{rh}^{i}h\left( t \right) \right)\circ \tanh \left( W_{h}^{i}h\left( t \right) \right)-{{h}^{i}}\left( t \right) \right],$$

where $i=1,2,\ldots,n$. In the following, $h( t )$ can be abbreviated as $h$.

It is known that ${{W}_{zh}},{{W}_{rh}},{{W}_{h}}\in {{R}^{n\times n}}$, which can be written row by row as

(16)$$\begin{aligned}{{W}_{zh}}=\left[ \begin{matrix} W_{zh}^{1} \\ W_{zh}^{2} \\ \vdots \\ W_{zh}^{n} \\ \end{matrix} \right]=\left[ \begin{matrix} W_{zh}^{11} & W_{zh}^{12} & \cdots & W_{zh}^{1n} \\ W_{zh}^{21} & W_{zh}^{22} & \cdots & W_{zh}^{2n} \\ \vdots & \vdots & \cdots & \vdots \\ W_{zh}^{n1} & W_{zh}^{n2} & \cdots & W_{zh}^{nn} \\ \end{matrix} \right],{{W}_{rh}}=\left[ \begin{matrix} W_{rh}^{1} \\ W_{rh}^{2} \\ \vdots \\ W_{rh}^{n} \\ \end{matrix} \right],{{W}_{h}}=\left[ \begin{matrix} W_{h}^{1} \\ W_{h}^{2} \\ \vdots \\ W_{h}^{n} \\ \end{matrix} \right]. \end{aligned}$$

Furthermore, the Jacobian matrix corresponding to the system of differential equations (12) can be expressed as

(17)$$\begin{aligned}J\left( h \right)=\left[ \left( \frac{\partial {{g}_{i}}\left( h \right)}{\partial {{h}^{j}}} \right) \right]=\left[ \begin{matrix} \frac{\partial {{g}_{1}}\left( h \right)}{\partial {{h}^{1}}} & \frac{\partial {{g}_{1}}\left( h \right)}{\partial {{h}^{2}}} & \cdots & \frac{\partial {{g}_{1}}\left( h \right)}{\partial {{h}^{n}}} \\ \frac{\partial {{g}_{2}}\left( h \right)}{\partial {{h}^{1}}} & \frac{\partial {{g}_{2}}\left( h \right)}{\partial {{h}^{2}}} & \cdots & \frac{\partial {{g}_{2}}\left( h \right)}{\partial {{h}^{n}}} \\ \vdots & \vdots & \cdots & \vdots \\ \frac{\partial {{g}_{n}}\left( h \right)}{\partial {{h}^{1}}} & \frac{\partial {{g}_{n}}\left( h \right)}{\partial {{h}^{2}}} & \cdots & \frac{\partial {{g}_{n}}\left( h \right)}{\partial {{h}^{n}}} \\ \end{matrix} \right], \end{aligned}$$

where $i,j=1,2,\ldots,n$. Define the partial derivative

(18)$$\begin{aligned}{{\left. {{\gamma }_{ij}} \right|}_{h=0}}={{\left. \frac{\partial {{g}_{i}}\left( h \right)}{\partial {{h}^{j}}} \right|}_{h=0}}=\left\{ \begin{matrix} -\frac{1}{2} & i=j \\ 0 & i\ne j \\ \end{matrix} \right., \end{aligned}$$
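The values in (18) follow from the product rule applied to (15). As a worked step (with $\delta_{ij}$ denoting the Kronecker delta and $\operatorname{sech}^{2}$ the derivative of $\tanh$, notation introduced here only for illustration):

$$\begin{aligned} \frac{\partial {{g}_{i}}\left( h \right)}{\partial {{h}^{j}}} &= {\sigma }'\left( W_{zh}^{i}h \right)W_{zh}^{ij}\left[ \tanh \left( W_{rh}^{i}h \right)\tanh \left( W_{h}^{i}h \right)-{{h}^{i}} \right] \\ &\quad +\sigma \left( W_{zh}^{i}h \right)\left[ \operatorname{sech}^{2}\left( W_{rh}^{i}h \right)W_{rh}^{ij}\tanh \left( W_{h}^{i}h \right)+\tanh \left( W_{rh}^{i}h \right)\operatorname{sech}^{2}\left( W_{h}^{i}h \right)W_{h}^{ij}-{{\delta }_{ij}} \right]. \end{aligned}$$

At $h=0$ the first bracket vanishes and every term containing a $\tanh$ factor is zero, so only $-\sigma ( 0 ){{\delta }_{ij}}=-\frac{1}{2}{{\delta }_{ij}}$ remains, independently of the weight matrices.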

Substituting (18) into (17) yields

(19)$$\begin{aligned}{{\left. J\left( h \right) \right|}_{h=0}}=\left[ \begin{matrix} -\frac{1}{2} & 0 & \cdots & 0 \\ 0 & -\frac{1}{2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & -\frac{1}{2} \\ \end{matrix} \right]={-}\frac{1}{2}E, \end{aligned}$$

where $E\in {{R}^{n\times n}}$ is the identity matrix.

Substituting (19) into Theorem 3.1 yields

(20)$$\underset{i=1,2,\ldots ,n}{\mathop{\max }}\,R\left( {{\lambda }_{i}}\left( -\frac{1}{2}E \right) \right)={-}\frac{1}{2}\le 0.$$

Therefore, it can be concluded that ODE (11) is stable at the zero solution $h(t) = 0$. Furthermore, in the absence of input variables, RUEM (5) represents the forward Euler discretisation of ODE (11). From (19) it can be seen that the Jacobian at the zero solution is ${J_t} = -\frac {1}{2}E$. Substituting this into the left hand side of condition (10) in Theorem 3.2 yields

(21)$$\underset{i=1,2,\ldots ,n}{\mathop{\max }}\,\left| 1+\mu {{\lambda }_{i}}\left( -\frac{1}{2}E \right) \right|=\underset{i=1,2,\ldots ,n}{\mathop{\max }}\,\left| 1-\frac{\mu }{2} \right|\le 1.$$

Theorem 3.3 proves the stability of the RUEM without input variables: for any weight matrix, the input-free RUEM is stable at the zero solution. In particular, inequality (21) holds for any step size $0<\mu \le 4$, including the unit step implicit in (5). To validate this conclusion numerically, we arbitrarily choose three initial hidden layer states ($h_0\in {{R}^{3}}$) in 3D space: $(0.15, 0.50, 0.50)$, $(-0.45, -0.55, 0.50)$ and $(0.50, -0.35, -0.65)$. To visualise the trajectories of the hidden layer state $h$ of RUEM without input variables, we run 50 iterations. Figure 4 shows the dynamic behaviour of the hidden layer state $h$ under different weight matrices $W_{zh},W_{rh},W_h \in \mathbb {R}^{3\times 3}$.

Fig. 4. Dynamic behavior of RUEM hidden layer states in three-dimensional space


Figure 4 illustrates the propagation results, where the three pentagrams mark the three initial hidden layer states. Each coloured line is associated with one initial state and represents the feature propagation of the network in space. It can be observed that the three hidden layer states $h$ gradually converge towards the origin from different directions, indicating the stability of RUEM at the origin. The weight matrices ${W_{zh}}, {W_{rh}}, {W_{h}}$ take different values in panels a), b), c) and d) of Fig. 4: in a) their entries follow a normal distribution $N(0,1/3)$; in b) the eigenvalues of the weight matrices are greater than 0, less than 0 and purely imaginary; in c) the weight matrices are randomly generated. Consequently, the speeds at which the three hidden layer states $h$ converge to the origin differ between panels, as do their trajectories.
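The experiment can be reproduced in outline with the NumPy sketch below, which iterates the input-free update 50 times from the three initial states given above. Several choices are assumptions made for illustration: the entries are drawn from a normal distribution whose variance is taken to be $1/3$, the random seed is arbitrary, and each weight matrix is rescaled to unit row $\ell_1$ norm so that the decay of the largest component can be verified directly from these starting points (Theorem 3.3 itself concerns behaviour near the origin for arbitrary weights).

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def input_free_step(h, W_zh, W_rh, W_h):
    """Forward Euler step of ODE (11) with unit step size (RUEM without input or bias)."""
    z = sigmoid(W_zh @ h)
    h_tilde = np.tanh(W_rh @ h) * np.tanh(W_h @ h)
    return (1.0 - z) * h + z * h_tilde

def unit_row_norm(W):
    """Rescale every row to unit l1 norm (an assumption of this sketch)."""
    return W / np.abs(W).sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
W_zh, W_rh, W_h = (unit_row_norm(rng.normal(0.0, np.sqrt(1.0 / 3.0), (3, 3)))
                   for _ in range(3))

starts = [(0.15, 0.50, 0.50), (-0.45, -0.55, 0.50), (0.50, -0.35, -0.65)]
trajectories = []
for h0 in starts:
    h = np.array(h0)
    path = [h]
    for _ in range(50):                       # 50 iterations, as in the text
        h = input_free_step(h, W_zh, W_rh, W_h)
        path.append(h)
    trajectories.append(np.array(path))       # shape (51, 3) per initial state

for path in trajectories:
    print(np.abs(path[0]).max(), "->", np.abs(path[-1]).max())
```

Plotting each array in `trajectories` as a 3D curve gives a picture of the same kind as Fig. 4, with every path ending near the origin.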

4. Numerical results and analysis

We transmit LG beams through a numerically simulated turbulent channel. First, the binary data stream is mapped to different OAM modes and loaded onto the spatial light modulator to generate LG beams carrying different OAM modes. The LG beams then pass through a free-space turbulent channel simulated by random phase screens. The specific parameters for the numerical simulation are: wavelength $\lambda = 1550\,\text{nm}$, beam waist ${w}_{0} = 6\,\text{cm}$, interval between successive phase screens $\Delta z = 200\,\text{m}$, inner scale of AT ${l}_{0} = 0.0003\,\text{m}$, and outer scale of AT ${L}_{0} = 50\,\text{m}$.

In addition, we have set seven levels of turbulence intensity, corresponding to the following $C_{n}^{2}$ values: $1\times {{10}^{-16}}{{m}^{-2/3}}$, $5\times {{10}^{-16}}{{m}^{-2/3}}$, $1\times {{10}^{-15}}{{m}^{-2/3}}$, $5\times {{10}^{-15}}{{m}^{-2/3}}$, $1\times {{10}^{-14}}{{m}^{-2/3}}$, $5\times {{10}^{-14}}{{m}^{-2/3}}$ and $1\times {{10}^{-13}}{{m}^{-2/3}}$. These cover conditions ranging from weak to strong turbulence. Under these turbulent conditions, the wavefront phase of the LG beam experiences varying degrees of distortion, and the degree of distortion increases with the turbulence intensity. It should be noted that, due to the stochastic nature of turbulence, the image distortion can differ even under the same turbulence intensity. Conversely, at higher turbulence intensities, different turbulence intensities may produce similar wavefront perturbations and hence similar OAM intensity images. Details are given in Fig. 5.

At the end of the channel, the turbulence-distorted LG beam spot is captured by a CCD camera. The captured image is then passed to the OATNN inversion model for feature extraction and subsequent processing. To improve the accuracy and convergence speed of the OATNN inversion model, the collected data undergo zero-mean normalisation before being fed into the model. These processed images constitute our experimental dataset. To limit the training cost of OATNN, a subset of $v$ random samples is chosen for each training batch. In addition, the learning rate $\eta$, the number of epochs $e$, and the modulation coefficient $\rho$ significantly influence the inversion results. After preliminary experiments, we fixed the OATNN parameters listed in Table 1.

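Fig. 5. Received intensity images of LG beams in 10 OAM modes, in a simulated 1000 metre free-space turbulent channel, with $C_{n}^{2}$ values of a) $1\times 10^{-16}\, \text {m}^{-2/3}$, b) $5\times 10^{-16}\, \text {m}^{-2/3}$, c) $1\times 10^{-15}\, \text {m}^{-2/3}$, d) $5\times 10^{-15}\, \text {m}^{-2/3}$, e) $1\times 10^{-14}\, \text {m}^{-2/3}$, f) $5\times 10^{-14}\, \text {m}^{-2/3}$, and g) $1\times 10^{-13}\, \text {m}^{-2/3}$.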


Table 1. Network Parameters


4.1 Simultaneous identification of turbulence intensity and OAM patterns

We investigated the detection performance of the OATNN model for ten different OAM modes (including $\pm$1 to $\pm$10) under four different turbulence intensity levels ($1\times {{10}^{-13}}{{m}^{-2/3}}$, $1\times {{10}^{-14}}{{m}^{-2/3}}$, $1\times {{10}^{-15}}{{m}^{-2/3}}$ and $1\times {{10}^{-16}}{{m}^{-2/3}}$). The training dataset consists of 16,000 images of $224\times 224$ pixels, with 400 images for each combination of turbulence intensity and OAM mode. Similarly, the test dataset consists of 4,000 images of $224\times 224$ pixels, with 100 images for each combination of turbulence intensity and OAM mode.

After 500 epochs of OATNN training, our model achieved high accuracy on both the training and testing datasets. In the training set, we achieved accuracy rates of 99.98% for OAM mode and 99.68% for turbulence intensity. On the test set, our model also performed exceptionally well, with accuracy rates reaching 99.05% and 99.37% respectively. This series of highly accurate data demonstrates the outstanding performance of our model in the identification tasks of OAM mode and atmospheric turbulence. We further conducted detailed visualization processing of the test set results, as shown in Fig. 6.

Fig. 6. Confusion matrices for the OAM mode and the turbulence intensity.

Figure 6(a) illustrates the classification performance of the OATNN model on different OAM modes using a confusion matrix. The columns represent the actual OAM modes ${O_i}$ of the LG beams, while the rows represent the model’s predicted results ${\tilde {O_i}}$. Each element $(m, n)$ in the matrix gives the probability that an LG beam with actual OAM mode ${O_n}$ is predicted as ${\tilde {O_m}}$. Elements on the diagonal therefore correspond to correct classifications by OATNN. All OAM modes achieve over 97% accuracy, with misclassifications occurring mainly between adjacent modes, whose intensity images become similar under atmospheric turbulence. In particular, OAM modes 1 and 2, which have simpler intensity images, show higher recognition accuracy than the others. In Fig. 6(b), the rows of the confusion matrix represent the actual turbulence intensity levels, while the columns represent the inversion results of OATNN. The value in each cell gives the probability that a given actual turbulence intensity level is classified as a given inverted level. The turbulence intensity confusion matrix shows that OATNN classifies all turbulence intensity levels satisfactorily.
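A confusion matrix with the layout of Fig. 6(a), where columns are actual classes and rows are predictions, can be built as in the sketch below. The function names and the three-class toy labels are hypothetical; the code only illustrates the convention in which element $(m, n)$ is the fraction of samples of actual class $n$ that were predicted as class $m$, so that the diagonal holds the per-class accuracy.

```python
def confusion_matrix(actual, predicted, n_classes):
    """Entry [m][n] = fraction of samples with actual class n predicted as class m."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        counts[p][a] += 1  # row = predicted, column = actual
    col_totals = [sum(counts[m][n] for m in range(n_classes)) for n in range(n_classes)]
    return [
        [counts[m][n] / col_totals[n] if col_totals[n] else 0.0 for n in range(n_classes)]
        for m in range(n_classes)
    ]

def per_class_accuracy(matrix):
    """Diagonal entries are the fractions of correctly classified samples per class."""
    return [matrix[k][k] for k in range(len(matrix))]

# Toy labels: class 0 is once mistaken for class 1, class 2 once for class 1.
actual    = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
predicted = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 1, 2]
cm = confusion_matrix(actual, predicted, n_classes=3)
acc = per_class_accuracy(cm)
print(acc)  # [0.75, 1.0, 0.75]
```

For the layout of Fig. 6(b), where rows are actual levels and columns are inverted levels, the same counts would be transposed and normalized by row instead.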

On an NVIDIA Quadro P600 graphics accelerator (GPU), a single training iteration takes approximately 1 minute. Higher-performance hardware can reduce this training time significantly. Once trained, the OATNN model inverts the OAM mode and AT intensity of a single speckle image in only 2.5 milliseconds. In addition, we compare the performance of OATNN with several other models, denoted as CNN1 [22], CNN2 [25], CNN3 [19], CNN4 [23] and CNN5 [24]. We implemented these algorithms with the parameters and structures described in the corresponding references. The results indicate that the OATNN model outperforms the other models in the inversion of both OAM modes and turbulence intensity, as shown in Fig. 7.

Fig. 7. Comparison of inversion performance of algorithms

4.2 Ablation study

To further investigate the performance of our proposed OATNN model in the inversion tasks of LG beam OAM modes and atmospheric turbulence intensity, we performed a series of ablation experiments. In these experiments, we systematically disabled certain key components within the network to assess their impact on the overall model performance.

To compare the OATNN model with a single-output network, we built an equivalent single-output network that mirrors the number of layers and overall structure of OATNN. Figure 8(a) shows the architecture of this single-output network, which includes identical components such as the feature extraction and convolutional layers. Like OATNN, this network predicts both OAM mode and turbulence intensity, but it does so through a single output with 40 classification categories, one for each combination of the ten OAM mode labels and the four turbulence intensity labels. After training, the single-output network achieved an accuracy of 96.62%. The figure shows the variation of the loss function during training of the single-output network.

To assess the impact of the RUEM module, we conducted experiments with it removed from our network. This module plays a critical role in keeping the data transfer stable and in capturing long-term dependencies in the beam spot images. The modified network without this stability module, referred to below as the branch network, is shown in Fig. 8(b). After training, the branch network achieved 99.92% accuracy in OAM mode detection and 99.73% in turbulence intensity detection on the training dataset, comparable to OATNN. On the test dataset, however, it achieved 97.92% for OAM mode detection and 99.25% for turbulence intensity detection, both below the 99.05% and 99.37% obtained by OATNN.
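The 40-category output of the single-output network corresponds to one joint class per combination of the ten OAM modes and four turbulence levels of Section 4.1. The sketch below shows one possible encoding, which is an assumption because the paper does not state how the joint labels are ordered, together with how the two per-task accuracies could be recovered from joint predictions. All function names are hypothetical.

```python
N_OAM, N_TURB = 10, 4  # label counts from Section 4.1

def encode(oam_idx, turb_idx):
    """Map an (OAM, turbulence) index pair to one of N_OAM * N_TURB joint classes."""
    return oam_idx * N_TURB + turb_idx

def decode(joint_idx):
    """Recover the (OAM, turbulence) index pair from a joint class index."""
    return divmod(joint_idx, N_TURB)

def task_accuracies(true_joint, pred_joint):
    """Return (OAM accuracy, turbulence accuracy, joint accuracy) as fractions."""
    pairs = [(decode(t), decode(p)) for t, p in zip(true_joint, pred_joint)]
    n = len(pairs)
    oam = sum(t[0] == p[0] for t, p in pairs) / n
    turb = sum(t[1] == p[1] for t, p in pairs) / n
    joint = sum(t == p for t, p in pairs) / n
    return oam, turb, joint

# Toy example: one turbulence error and one OAM error among three samples.
true_joint = [encode(0, 0), encode(3, 2), encode(9, 3)]
pred_joint = [encode(0, 0), encode(3, 1), encode(8, 3)]
oam_acc, turb_acc, joint_acc = task_accuracies(true_joint, pred_joint)
```

In this toy case each task is right for two of three samples, but only one sample has both labels right, which illustrates why a joint class is correct only when both the mode and the turbulence level are correct.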

Fig. 8. Network structure design for ablation study

We perform a detailed performance comparison between OATNN, the branch network and the single-output network for retrieving beam OAM modes in turbulent environments. Figure 9(a) compares the detection accuracy of the three models for the different OAM modes. In general, the single-output network shows lower accuracy than the branch network and OATNN, with large variations between results. The branch network, while performing better than the single-output network, still shows significant instability in accuracy under varying atmospheric conditions. In contrast, OATNN consistently achieves over 97% accuracy, approaching 100% for both turbulence intensity and OAM mode detection. With minimal variation, OATNN shows a clear advantage in reliability over the other models.

Figure 9(b) compares OATNN, the branch network and the single-output network in turbulence intensity inversion. The single-output network achieves accuracies of 99.2%, 97.6%, 96.9% and 92.7% across the four turbulence intensity categories. The branch network performs better, achieving 100%, 99.3%, 98.8% and 99.5% in these categories. In particular, for inversion under strong turbulence, the branch network outperforms the single-output network. OATNN shows consistent improvements across the turbulence levels while remaining stable, although accuracies that are already close to 100% leave little room for further gains.
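The per-category turbulence accuracies quoted above can be condensed into a mean and a spread, as in the short calculation below. The numbers are taken from the text; the `summarize` helper is hypothetical, and the unweighted mean equals the overall accuracy only under the assumption that every category has the same number of test images, as in the balanced test set of Section 4.1.

```python
from statistics import fmean

# Per-category turbulence intensity accuracies (%) quoted in the text.
single_output = [99.2, 97.6, 96.9, 92.7]
branch = [100.0, 99.3, 98.8, 99.5]

def summarize(acc):
    """Mean, worst case and spread (max - min) of per-category accuracies."""
    return {"mean": fmean(acc), "min": min(acc), "range": max(acc) - min(acc)}

for name, acc in [("single-output", single_output), ("branch", branch)]:
    s = summarize(acc)
    print(f"{name}: mean={s['mean']:.1f} min={s['min']:.1f} range={s['range']:.1f}")
# single-output: mean=96.6 min=92.7 range=6.5
# branch: mean=99.4 min=98.8 range=1.2
```

The spread is the more telling figure here: the single-output network varies by 6.5 percentage points between categories, against 1.2 for the branch network.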

Fig. 9. Comparison of recognition accuracy of three models

4.3 Complex application scenario experiment

In previous experiments, we have extensively studied the OATNN model’s inversion tasks of OAM modes and turbulence intensity at a distance of 1000 metres and achieved satisfactory results. However, in practical applications, atmospheric adaptive transmission systems often require more accurate identification of turbulence intensity or may need to consider longer transmission distances. The following experiments were therefore carried out.

Fig. 10. Recognition accuracy in 2000 m transmission distance scenarios

We further investigated the performance of OATNN at a transmission distance of 2000 metres, compared with 1000 metres in the previous experiments. In this scenario, we used LG beams with the same channel transmission parameters and addressed the same OAM mode and turbulence intensity inversion tasks. Over the longer distance, atmospheric turbulence may have a greater impact on the beam, which places higher demands on OAM beam transmission and demodulation. The experimental results show that OATNN still performs well on the turbulence intensity inversion task at this distance, achieving an accuracy of 98.37%, a slight decrease of 1.31 percentage points compared to the 1000 metre case. For the OAM mode inversion task, the accuracy of OATNN at 2000 metres is 91.57%, a decrease of 8.41 percentage points compared to the 1000 metre scenario. Details are given in Fig. 10.

Fig. 11. Recognition accuracy in seven turbulence intensity scenarios

We further investigated the performance of OATNN in scenarios that require a finer resolution of turbulence intensity than the four levels used so far. Building on the experiments of Section 4.1, we extended the atmospheric turbulence intensity categories from four to seven: $1\times {{10}^{-13}}{{m}^{-2/3}}$, $1\times {{10}^{-14}}{{m}^{-2/3}}$, $1\times {{10}^{-15}}{{m}^{-2/3}}$, $1\times {{10}^{-16}}{{m}^{-2/3}}$, $5\times {{10}^{-14}}{{m}^{-2/3}}$, $5\times {{10}^{-15}}{{m}^{-2/3}}$, $5\times {{10}^{-16}}{{m}^{-2/3}}$. This finer classification gives adaptive systems a more accurate description of the atmospheric turbulence, thereby helping to optimise laser signal transmission. The experimental results show that OATNN still performs well with seven turbulence intensities, achieving an accuracy of 97.22% in turbulence intensity, only 2.15 percentage points lower than in the four-intensity task. Furthermore, as no modifications were made to the OAM mode related segments of OATNN, the inversion accuracy for the OAM mode remains high at 98.89%. This highlights the scalability of OATNN compared to single-output networks and its adaptability to multi-task requirements. Details are given in Fig. 11.

5. Conclusion

This paper proposes a multi-task network model, OATNN, which incorporates a stable network unit, RUEM, to efficiently demodulate OAM modes and simultaneously perform high-precision turbulence intensity inversion in atmospheric turbulence environments. Based on the theory of dynamical systems, the RUEM stable neural network unit is designed to ensure stability and prevent the amplification of noise disturbances in the input features. Beam spot images distorted by turbulence are used as model inputs. Coupled with the stable RUEM unit, OATNN performs layer-by-layer feature extraction, allowing stable sharing and transfer of feature information between the different tasks. Without the need for additional optical elements, the proposed model can efficiently and accurately demodulate the OAM of LG beams in turbulent scenes. At the same time, it can identify the intensity of atmospheric turbulence, so that damage to communication links can be detected in real time and the optical system adjusted to improve communication quality.

Funding

Natural Science Foundation of Jilin Province (20220101040JC); National Natural Science Foundation of China (12371422); National Natural Science Foundation of China (12271207).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. Y. Ji, J. Zhang, Y. Xiao, et al., “5G flexible optical transport networks with large-capacity, low-latency and high-efficiency,” China Commun. 16(5), 19–32 (2019).

2. B. Li, Z. Fei, and Y. Zhang, “UAV communications for 5G and beyond: Recent advances and future trends,” IEEE Internet Things J. 6(2), 2241–2263 (2019).

3. R. Ullah, S. Ullah, and W. A. Imtiaz, “High-capacity free space optics-based passive optical network for 5G front-haul deployment,” in Photonics, vol. 10 (2023), p. 1073.

4. M. A. Khalighi and M. Uysal, “Survey on free space optical communication: A communication theory perspective,” IEEE Commun. Surv. Tutorials 16(4), 2231–2258 (2014).

5. L. Allen, M. W. Beijersbergen, R. Spreeuw, et al., “Orbital angular momentum of light and the transformation of Laguerre-Gaussian laser modes,” Phys. Rev. A 45(11), 8185–8189 (1992).

6. J. Wang, J.-Y. Yang, and I. M. Fazal, “Terabit free-space data transmission employing orbital angular momentum multiplexing,” Nat. Photonics 6(7), 488–496 (2012).

7. Y. Lian, Y. Yu, and S. Han, “OAM beams generation technology in optical fiber: A review,” IEEE Sensors J. 22(5), 3828–3843 (2022).

8. S. Koenig, D. Lopez-Diaz, and J. Antes, “Wireless sub-THz communication system with high data rate,” Nat. Photonics 7(12), 977–981 (2013).

9. I. Vagniluca, B. Da Lio, and D. Rusca, “Efficient time-bin encoding for practical high-dimensional quantum key distribution,” Phys. Rev. Applied 14(1), 014051 (2020).

10. M. Beijersbergen, R. Coerwinkel, M. Kristensen, et al., “Helical-wavefront laser beams produced with a spiral phaseplate,” Opt. Commun. 112(5-6), 321–327 (1994).

11. T. Lei, M. Zhang, and Y. Li, “Massive individual orbital angular momentum channels for multiplexing enabled by Dammann gratings,” Light: Science & Applications 4(3), e257 (2015).

12. C. Kai, P. Huang, and F. Shen, “Orbital angular momentum shift keying based optical communication system,” IEEE Photonics J. 9(2), 1–10 (2017).

13. J. Zhou, W. Zhang, and L. Chen, “Experimental detection of high-order or fractional orbital angular momentum of light based on a robust mode converter,” Appl. Phys. Lett. 108(11), 1 (2016).

14. T. Giordani, A. Suprano, and E. Polino, “Machine learning-based classification of vector vortex beams,” Phys. Rev. Lett. 124(16), 160401 (2020).

15. E. Lamilla, C. Sacarelo, and M. S. Alvarez-Alvarado, “Optical encoding model based on orbital angular momentum powered by machine learning,” Sensors 23(5), 2755 (2023).

16. M. Krenn, R. Fickler, and M. Fink, “Communication with spatially modulated light through turbulent air across Vienna,” New J. Phys. 16(11), 113028 (2014).

17. T. Doster and A. T. Watnik, “Machine learning approach to OAM beam demultiplexing via convolutional neural networks,” Appl. Opt. 56(12), 3386–3396 (2017).

18. Q. Zhao, S. Hao, and Y. Wang, “Mode detection of misaligned orbital angular momentum beams based on convolutional neural network,” Appl. Opt. 57(35), 10152–10158 (2018).

19. Z. Wang, M. I. Dedo, K. Guo, et al., “Efficient recognition of the propagated orbital angular momentum modes in turbulences with the convolutional neural network,” IEEE Photonics J. 11(3), 1–14 (2019).

20. H. Zhou, Z. Pan, M. I. Dedo, et al., “High-efficiency and high-precision identification of transmitting orbital angular momentum modes in atmospheric turbulence based on an improved convolutional neural network,” J. Opt. 23(6), 065701 (2021).

21. J. Li, M. Zhang, and D. Wang, “Joint atmospheric turbulence detection and adaptive demodulation technique using the CNN for the OAM-FSO communication,” Opt. Express 26(8), 10494–10508 (2018).

22. W. Xiong, Y. Luo, J. Liu, et al., “Convolutional neural network assisted optical orbital angular momentum identification of vortex beams,” IEEE Access 8, 193801–193812 (2020).

23. Y. Hao, L. Zhao, T. Huang, et al., “High-accuracy recognition of orbital angular momentum modes propagated in atmospheric turbulences based on deep learning,” IEEE Access 8, 159542–159551 (2020).

24. Z. Li, J. Su, and X. Zhao, “Two-step system for image receiving in OAM-SK-FSO link,” Opt. Express 28(21), 30520–30541 (2020).

25. X. Li, L. Sun, J. Huang, et al., “Research on orbital angular momentum recognition technology based on a convolutional neural network,” Sensors 23(2), 971 (2023).

26. J. C. Wyngaard, Turbulence in the Atmosphere (Cambridge University, 2010).

27. J. C. Houbolt, “Atmospheric turbulence,” AIAA J. 11(4), 421–437 (1973).

28. E. Haber and L. Ruthotto, “Stable architectures for deep neural networks,” Inverse Problems 34(1), 014004 (2017).

29. B. Chang, M. Chen, E. Haber, et al., “AntisymmetricRNN: A dynamical system view on recurrent neural networks,” arXiv, arXiv:1902.09689 (2019).

30. Y. Lu, A. Zhong, Q. Li, et al., “Beyond finite layer neural networks: Bridging deep architectures and numerical differential equations,” in International Conference on Machine Learning, (2018), pp. 3276–3285.

31. J. C. Butcher, Numerical Methods for Ordinary Differential Equations (John Wiley & Sons, 2018).

Optics Express