
Deep-learning-assisted communication capacity enhancement by non-orthogonal state recognition of structured light

Open Access

Abstract

In light of the pending capacity crunch of the information era, orbital-angular-momentum-carrying vortex beams are gaining traction thanks to their enlarged transmission capability. However, high-order beams are confronted with fundamental limits of nontrivial divergence or distortion, which consequently intensifies research on new optical states such as low-order fractional vortex beams. Here, we experimentally demonstrate an alternative means to increase the capacity by simultaneously utilizing multiple non-orthogonal states of structured light, challenging the prevailing view that only orthogonal states can serve as information carriers. Specifically, six categories of beams are jointly recognized with an accuracy of >99% by harnessing an adapted deep neural network, thus providing the targeted wide bandwidth. We then demonstrate the efficiency by sending/receiving a grayscale image in 256-ary mode encoding and shift keying schemes, respectively. Moreover, the well-trained model is able to realize high-fidelity recognition (accuracy >0.8) of structured beams under unknown turbulence and restricted receiver aperture size. To gain insight into the framework, we further interpret the network by revealing the contributions of intensity signals from different positions. This work holds potential for intelligence-assisted large-capacity and secure communications, meeting the ever-growing demand for information bandwidth.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

There is a large body of ongoing research in the free-space optical (FSO) communication community on exploiting the various physically distinguishable degrees of freedom of light, e.g. amplitude, phase, frequency and polarization, aiming to ease the bandwidth crunch caused by exponentially growing data traffic in the information age [1,2]. As a special member of structured light, vortex beams carrying orbital angular momenta (OAM) are intensively investigated due to the unbounded Hilbert space they provide [3]. Such OAM modes are usually characterized by their topological charge $\ell $, a single quantum number that can in theory be any integer [4]. To date, vortex beams have expedited advances from optical trapping [5], imaging [6] and metrology [7] to quantum information processing [8].

Although arbitrary orders of OAM modes are available in principle, some challenging issues remain in terms of generation, propagation and detection. In practical communication scenarios, one can only generate a finite number of OAM modes, whether using a spatial light modulator [9] or a laser cavity [10]. Higher-order OAM modes have a larger effective beam width; they diverge faster and are more prone to turbulence than lower-order ones, leading to nontrivial channel cross-talk and power loss [2]. Besides, comparative results show that OAM-based multiplexing alone may not outperform either conventional line-of-sight multi-input multi-output transmission or spatial-mode multiplexing [11,12]. Therefore, rather than focusing only on eigenmodes with different topological charges, to further expand the available state space, recent developments highlight communication systems leveraging low-order fractional OAM beams [11,13–17], Laguerre-Gaussian (LG) modes with different radial indices [18–20], multi-vortex geometric modes [21] and novel broad-bandwidth OAM beams [22], to name a few. In particular, fractional vortex beams with edge dislocations can greatly increase the communication capacity, as manifested in a recent work transferring information via OAM modes with a minimum interval of $\mathrm{\Delta }\ell $ = 0.01 [17]. However, a bottleneck of fractional OAM modes is their propagation instability, which poses another challenge for communications [13,23]. Additionally, LG modes with nonzero radial indices and other crafted OAM modes offer only limited enlargement of the available state space.

Recently, data-driven deep learning algorithms have emerged as an exceptional tool for difficult problems in photonics such as holography [24], microscopy [25], device design [26] and sensing [27]. They have also advanced the recognition of orthogonal OAM modes [28], HG modes [29], multi-singularity modes [30] and non-orthogonal fractional OAM modes [15], constituting a novel and effective demodulation technique compared to traditional counterparts (e.g. interferometry [31], customized computer-generated holograms [32], geometric transformations [33,34]).

In this paper, we seek to utilize non-orthogonal structured beams from different categories as information carriers. We jointly use Laguerre-Gaussian (LG) modes [35], Hermite-Gaussian (HG) modes [36], Bessel-Gaussian (BG) modes [37], Airy vortex (AV) modes [38], SU(2) vortex modes [39] and Mathieu-Gaussian (MG) modes [40] to transmit messages. Embracing their non-orthogonality not only unleashes the tremendous state space at hand, but also lets us avoid the disadvantages brought by high-order modes. The key to harnessing them is the capability of resolving these states [41], for which we resort to end-to-end neural networks. We apply an adapted ResNet [42] to successfully classify 288 different single modes and 256 superposition modes using a transfer learning technique. Two structured-beam-based communication schemes, mode encoding and mode shift keying (SK), are demonstrated as proofs of principle, where a 256-grayscale Richard Feynman portrait is successfully reconstructed with satisfactory image quality. To alleviate the ‘black-box’ nature of neural networks, we interpret the model by figuring out which spatial intensity signals influence the final predictions and how. More importantly, we investigate more practical scenarios that incorporate (i) atmospheric turbulence (AT) and (ii) a limited receiver aperture into the system, where the proposed method remains feasible under unknown turbulence and with partially received signals thanks to the improved model generalization ability. This study liberates us from conventional limited spatial optical fields and paves the way towards next-generation high-capacity optical communications.

2. Methods

In addition to vortex beams carrying the well-known OAM, which have been extensively investigated over the years, the family of two-dimensional structured beams is rich in members. According to the Helmholtz equation (HE) or the paraxial wave equation (PWE) in different coordinate systems, one can obtain many groups of beams propagating in free space, such as HG beams (Cartesian coordinates) and LG beams (polar coordinates) [1]. In the following, to validate our proposed FSO data transmission scheme, we experimentally utilize six categories of structured beams, namely LG, HG, BG, AV, SU(2) and MG beams, as information carriers. We focus on the case of scalar optics without considering the spatial polarization states of the beams. Generally, one can express the electric field component as $\psi ({{q_x},{q_y},z} )$ where $({{q_x},{q_y}} )$ are the transverse coordinates that vary with the coordinate system and z is the propagation axis. For the two-dimensional HE, $\psi ({{q_x},{q_y},z} )$ satisfies [43]:

$$\left\{ \begin{array}{l} \psi ({{q_x},{q_y},z} )= U({{q_x},{q_y}} )\exp ({i{k_z}z} )\\ \nabla_t^2[{U({{q_x},{q_y}} )} ]+ k_t^2U({{q_x},{q_y}} )= 0\textrm{ },\\ k_t^2 + k_z^2 = {k^2} \end{array} \right.$$
where ${k_z}$, ${k_t}$ are longitudinal and transverse wave numbers respectively and $k = 2\pi /\lambda $ is the wave number with $\lambda $ the wavelength. $\nabla _t^2$ denotes the transverse Laplacian operator. For PWE with slowly varying envelope approximation, $\psi ({{q_x},{q_y},z} )$ satisfies:
$$\left\{ \begin{array}{l} \psi ({{q_x},{q_y},z} )= F({{q_x},{q_y},z} )\exp ({ikz} )\\ \nabla_t^2[{F({{q_x},{q_y},z} )} ]+ 2ik\frac{{\partial F({{q_x},{q_y},z} )}}{{\partial z}} = 0 \end{array} \right.\textrm{ }\textrm{.}$$

To increase the modal diversity, we solve Eq. (1) and Eq. (2) under different conditions (see analytical expressions for each category of beams in Supplement 1). For the sake of a vivid illustration of these exotic beams, we simulate six examples in Fig. 1. For each beam, the three-dimensional intensity shows its evolution along the propagation direction from $z = 0$ to $z = L$, where $L$ equals the Rayleigh distance for LG, HG, SU(2) and MG beams and half the Rayleigh distance for BG and AV beams, given the same beam width and optical frequency. The right panel of each subfigure displays the intensity and phase patterns at the plane $z = 0$.
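As a concrete illustration of how such fields can be evaluated numerically, the minimal sketch below computes an LG mode at its waist plane (the analytical expressions for all six families are in Supplement 1); the grid size and beam waist are assumed values, not those of the experiment.

```python
import numpy as np
from scipy.special import genlaguerre

def lg_mode(p, ell, w0=1e-3, grid=512, extent=5e-3):
    """Complex field of mode LG_{p,ell} at the waist plane z = 0."""
    x = np.linspace(-extent, extent, grid)
    X, Y = np.meshgrid(x, x)
    r, phi = np.hypot(X, Y), np.arctan2(Y, X)
    rho = 2 * r**2 / w0**2
    field = (rho**(abs(ell) / 2) * genlaguerre(p, abs(ell))(rho)
             * np.exp(-r**2 / w0**2) * np.exp(1j * ell * phi))
    return field / np.sqrt((np.abs(field)**2).sum())  # normalize to unit power

E = lg_mode(p=5, ell=6)                  # the LG mode shown in Fig. 1(a)
intensity, phase = np.abs(E)**2, np.angle(E)
```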


Fig. 1. The beam families that are exploited in this work. (a) LG mode with $p = 5$, $\ell = 6$. (b) HG mode with $m = 6$, $n = 5$. (c) BG mode with ${k_t} = 100$, $\ell = 6$. (d) AV mode with ${r_0} = 3.0 \times {10^{ - 5}}$, $\alpha = 0.4$. (e) SU(2) vortex mode with $Q = 6$, ${n_0} = 5$ and $M = 7$. (f) MG mode with ${k_t} = 3.0 \times {10^4}$, $m = 4$. Refer to Supplement 1 for the definitions of these parameters. For each subfigure, the left column displays the evolution of three-dimensional intensity (with opacity of 0.1) along propagation axis z. The right column depicts the intensity and phase at initial plane ($z = 0$).


Intuitively, these different categories endow us with a rich available state space even when counting only on low-order beams; the key to harnessing them is how to decode data from these non-orthogonal states.

Our system is conceptually illustrated in Fig. 2. We first collect the states via a conventional free-space setup, shown in Fig. 2(a). In particular, a continuous-wave light source (CNI Laser, MGL-III-532) is first expanded 8 times in diameter. A half-wave plate then tunes the polarization of the incoming beam to match the working polarization of the phase-only spatial light modulator (SLM, Hamamatsu, X13138-04, resolution of $1280 \times 1024$, pixel size of $12.5\,\mathrm{\mu m}$). Multiple computer-generated holograms (CGHs) are loaded onto the SLM in succession to modulate the fundamental Gaussian beam into different structures. The inset showcases a hologram corresponding to a superposition state under turbulence, generated using a method from [44,45]. The modulated beam is then projected onto the complementary metal-oxide-semiconductor (CMOS) camera (AVT Mako G-131B, resolution of $1280 \times 1024$, pixel size of $5.3\,\mathrm{\mu m}$) plane through a 4-f system. Note that the inserted polarizer adjusts the beam intensity to keep the camera from overexposure. The iris selects the target first-order signals at the spatial spectrum plane. This experimental setup allows us to acquire large datasets efficiently under LabVIEW program control. As elucidated in Fig. 2(b), the obtained beam sets are preprocessed and fed into the decoding DNN. In the following sections, we thoroughly validate our method by considering various situations, including distinct beam categories, either single or multiplexed, unknown AT and a partial receiver aperture. The core driving this work is the brute-force neural network, whose detailed architecture is shown in Fig. 2(c). The end-to-end network is a ResNet composed of 35 layers (adapted from ResNet-34), an architecture well known for preventing the vanishing-gradient problem [42]. Given an intensity image of $224 \times 224$ pixels, a series of convolutional (×33), pooling and nonlinear layers with skip connections extracts the hierarchical high-dimensional features, which are then gathered by fully connected layers (×2) to make a prediction. We treat the recognition process as a classification problem and optimize the DNN through standard error backpropagation, where the cross-entropy loss is calculated by

$$L({y,\hat{y};\theta } )={-} \frac{1}{N}\sum\limits_{i = 1}^N {\sum\limits_{j = 1}^M {[{{y_{ij}}\log ({{{\hat{y}}_{ij}}} )+ ({1 - {y_{ij}}} )\log ({1 - {{\hat{y}}_{ij}}} )} ]+ C||\theta ||_2^2\textrm{ ,}} }$$
where $\hat{y}$ and y are the output and corresponding ground truth, $\theta $ refers to the model parameters, N and M are the numbers of data samples and output neurons respectively, and the last term with a constant C is introduced as an ${L_2}$ regularization penalty to prevent overfitting. Gradient-descent-based updating is equivalent to minimizing the loss function in Eq. (3). The Adam optimizer [46] with a weight decay of $5.0 \times {10^{ - 3}}$ is utilized. To reduce the training time and improve training stability, we use a model pre-trained on ImageNet [47], transfer it to our case [48], and retrain the network in the PyTorch framework [49] on a personal desktop (GPU, NVIDIA RTX 3060 Ti).
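A minimal PyTorch sketch of this transfer-learning setup is given below. It assumes the adapted head is two fully connected layers with dropout of 0.5 as in Fig. 2(c); the hidden width of 512 and the learning rate are our assumptions, and `nn.CrossEntropyLoss` together with the optimizer's `weight_decay` stands in for the cross-entropy objective and L2 penalty of Eq. (3).

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 288                 # 288 single modes (256 for the SK task)

# ImageNet-pretrained ResNet-34 backbone with a replaced classification head.
model = models.resnet34(pretrained=True)
model.fc = nn.Sequential(         # two fully connected layers, dropout of 0.5
    nn.Linear(model.fc.in_features, 512),   # hidden width 512 is assumed
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(512, NUM_CLASSES),
)

criterion = nn.CrossEntropyLoss()                      # classification loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4,
                             weight_decay=5.0e-3)      # L2 penalty of Eq. (3)

def train_step(images, labels):
    """One gradient-descent update by standard error backpropagation."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```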


Fig. 2. Schematic of the experimental setup and workflow. (a) Beam preparation setup. BE, beam expander; HWP, half-wave plate; BS, beam splitter; SLM, spatial light modulator; P, polarizer; L1, lens with $f = 150\,\textrm{mm}$; L2, lens with $f = 75\,\textrm{mm}$; CMOS, camera. Inset, details of a hologram. The SLM and camera are monitored by the computer to work synchronously. (b) Once the datasets containing training, validation and test samples are prepared, they are fed into the deep neural network for decoding. (c) The network architecture, composed of 35 effective layers. Conv, convolutional layer; Pool, max or average pooling operation; ReLU, rectified linear units; Flatten, flatten operation; FCL, fully connected layer (with dropout of 0.5); Softmax, data transformation. Note that the number of output classes varies with the task (288 or 256).


3. Results

3.1 Single mode results

We first investigate the recognition capability of the DNN with respect to single modes. Note that here “single” means we do not superpose structured beams from different categories; SU(2) modes are therefore also treated as single modes. We record the model’s performance during updating via the visualization toolkit TensorBoard, as shown in Fig. 3(a). For single modes, we classify 288 different structured beams, with 48 elements per category (see Supplement 1 for detailed modal parameters). We then augment the dataset with different initial phases. Specifically, for each structured beam, we vary the initial phase from 0 to 1.96$\pi $ with a step of 0.04$\pi $, yielding 50 variants, of which 42 are used for training, 5 for validation and 3 for testing. Consequently, there are 12096 (1440, 864) data samples in the training (validation, test) set, and the structured beams in the test set are never seen by the model during the training and validation stages. To improve the model’s robustness and generalization ability, random rotation and horizontal flips are also applied to the input intensity images [50]. After hundreds of iterations, the training loss and validation loss are both low ($\sim 1 \times {10^{ - 2}}$) and close to each other, showing that our model is not overfitted. The blue line in Fig. 3(a) denotes the evolution of the validation accuracy as training proceeds. Here the accuracy $\mathrm{{\cal A}}$ is defined as the proportion of correctly classified beams over the whole set. We then test the obtained model and summarize the results in the confusion matrix of Fig. 3(b), where $\mathrm{{\cal A}} = 99.7\mathrm{\%}$ is achieved. Only $0.02\mathrm{\%}$ of the LG beams are incorrectly recognized as MG beams, indicating that the model’s performance is encouraging for further applications.
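The preprocessing and augmentation pipeline can be expressed compactly with torchvision transforms; the sketch below assumes the intensity images are stored as standard image files, and the rotation range is our choice rather than a reported value.

```python
from torchvision import transforms
from torchvision.datasets import ImageFolder

train_transform = transforms.Compose([
    transforms.Resize((224, 224)),           # network input size
    transforms.RandomRotation(degrees=180),  # random rotation augmentation
    transforms.RandomHorizontalFlip(p=0.5),  # random horizontal flip
    transforms.ToTensor(),
])

# Folder layout 'data/single/train/<class_name>/*.png' is a placeholder.
train_set = ImageFolder('data/single/train', transform=train_transform)
```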


Fig. 3. (a) Convergence plots of the model on the single-mode dataset. Left vertical axis, training and validation loss. Right vertical axis, validation accuracy. Inset, enlarged view of the model log after convergence. (b) Confusion matrix for the recognition of experimentally unseen structured modes. The diagonal entries represent modes that are correctly predicted.


Although we can attain impressive recognition results for optical beams through the DNN, we are not actually aware of how this happens, which is often called “the black-box nature of neural networks” [51]. This problem is especially relevant for medical image analysis, since a wrong diagnosis from an untrustworthy model can be detrimental to a patient [52,53]. In light of this, substantial effort has been devoted to the reasonable explanation and visualization of neural network models, leading to profound tools such as t-SNE [54], CAM [55] and Grad-CAM [56]. Similarly, unlocking how the model identifies different structured beams can benefit relevant applications such as communications and encryption. Here we conduct such an analysis with the help of the Captum library [57]. In particular, we calculate the contributions of the input intensities to the model’s final decisions. Figure 4(a) depicts six typical results, one for each category. Based on a well-trained model, the tagged numbers, e.g. $99.97\mathrm{\%}$, in the first and fourth columns of Fig. 4(a) represent the probabilities that the predicted classes match the corresponding labels. The integrals of the gradients of the model’s predictions with respect to the inputs are then evaluated, as shown in the second and fifth columns of Fig. 4(a). The greater the integrated gradients, the more important those pixels are to the correct outputs. Moreover, we use a sliding window of $15 \times 15$ pixels with a stride of 5 along both intensity image dimensions to occlude the corresponding image pixels, so as to gauge the attributions of different spatial signals. Quantitative results are presented in the third and sixth columns of Fig. 4(a). As expected, the spatial regions with nontrivial intensities are more critical to the final results. The combination of integrated gradients and occlusion-based attribution can help in structured-beam-based applications since, in real-world scenarios, the receiver’s aperture is always limited in size and the information may not be fully captured, especially considering the beams’ divergence upon propagation [59]. These model interpretation analyses can tell us which parts of the signals are more important, whether the input modes are proper for the model and whether the predictions are convincing.
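A minimal sketch of both attribution methods with Captum follows; `image_tensor` stands for one preprocessed $224 \times 224$ intensity image and is assumed to be defined elsewhere.

```python
import torch
from captum.attr import IntegratedGradients, Occlusion

model.eval()
image = image_tensor.unsqueeze(0)            # shape (1, C, 224, 224)
target = model(image).argmax(dim=1).item()   # predicted class to explain

# Integrated gradients: accumulate gradients along a path from a black baseline.
ig = IntegratedGradients(model)
ig_attr = ig.attribute(image, baselines=image * 0, target=target)

# Occlusion: slide a 15x15 window with stride 5 and record the output change.
channels = image.shape[1]
occ = Occlusion(model)
occ_attr = occ.attribute(image, target=target,
                         sliding_window_shapes=(channels, 15, 15),
                         strides=(channels, 5, 5))
```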


Fig. 4. (a) Attribution visualization results obtained by interpreting the well-trained model. Two different methods are used: integrated gradients and occlusion. The attributions in the 2nd, 3rd, 5th and 6th columns highlight the decisive intensity regions used for classification. (b) Conceptual illustration of the structured-beam encoding scheme used for transmitting a grayscale image. Reuse of the Richard Feynman portrait is officially permitted by Mark Roesler (CMGWorldwide.com) [58].


We then demonstrate a structured-beam encoding scheme, as shown in Fig. 4(b). Specifically, we transform the data into independent beams, transmit the beams to the receiver in succession, and decode them via the DNN. For an 8-bit-depth Richard Feynman portrait, 256 distinct beams from the six categories (48 LG, 48 HG, 48 BG, 48 AV, 48 SU(2) and 16 MG beams, detailed in Supplement 1) are used to represent the 256 gray levels. In our experiments, the resolution of the image is $100 \times 100$, which corresponds to a structured-light sequence of length 10000. After about 30 minutes of transmission and 75 seconds of decoding, the reconstructed image is as displayed in Fig. 4(b). The structural similarity index measure (SSIM), used as the image quality metric, is calculated to be 0.99, indicating a decent match with the original image and preliminarily demonstrating the feasibility of communication using non-orthogonal structured beams.
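Since each of the 256 beams stands for one gray level, decoding amounts to classifying every received frame and reassembling the pixels. The sketch below assumes, for illustration, that the class index equals the gray value and that the received frames are already preprocessed to the network's input size.

```python
import numpy as np
import torch
from skimage.metrics import structural_similarity as ssim

@torch.no_grad()
def decode_image(frames, model, shape=(100, 100)):
    """Classify 10000 received intensity frames and rebuild the gray image.

    frames: tensor of shape (10000, C, 224, 224), one frame per pixel."""
    model.eval()
    preds = [model(batch).argmax(dim=1) for batch in frames.split(256)]
    return torch.cat(preds).reshape(shape).cpu().numpy().astype(np.uint8)

# Image-quality check against the transmitted 8-bit original:
# quality = ssim(original, decode_image(frames, model), data_range=255)
```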

3.2 Multiplexed mode results

Although the mode encoding scheme of Fig. 4 works, it is clearly less efficient because it requires a substantial number of well-defined modes. It is therefore interesting to explore whether a mode shift keying (SK) system is feasible, where each mode is identified as a data symbol [60]. More specifically, we use 8 structured beams (2 LG, 2 HG, 1 BG, 1 AV, 1 SU(2) and 1 MG beam, detailed in Supplement 1) to represent 8 bits, as shown in Fig. 5(a). For instance, a gray value of 92 can be treated as the binary vector (0, 1, 0, 1, 1, 1, 0, 0); each binary digit is then assigned to its corresponding structured beam as a superposition coefficient. The eight beams are superposed coherently, which yields the multiplexed state. Mathematically, the final state ${E_f}$ is defined as

$${E_f} = \sum\limits_{i = 1}^8 {{b_i}{E_i}\textrm{ ,}}$$
where ${b_i}$ is the $i$th binary coefficient and ${E_i}$ is the $i$th mode basis. As shown in the subfigures of Fig. 5(a), the intensity and phase patterns of a multiplexed state can be exotic and thus difficult to decode using conventional methods.
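The encoding step of Eq. (4) is a weighted coherent sum; a minimal sketch, assuming the eight pseudo-basis fields are available as complex arrays (e.g. from `lg_mode`-style generators as above), is:

```python
import numpy as np

def gray_to_bits(g):
    """8-bit binary vector, MSB first; e.g. 92 -> [0, 1, 0, 1, 1, 1, 0, 0]."""
    return [(g >> (7 - i)) & 1 for i in range(8)]

def multiplex(gray_value, bases):
    """Coherent superposition E_f = sum_i b_i E_i of Eq. (4).

    bases: list of 8 complex field arrays (the pseudo-bases)."""
    bits = gray_to_bits(gray_value)
    return sum(b * E for b, E in zip(bits, bases))

# e.g. E_f = multiplex(92, bases); intensity = np.abs(E_f)**2
```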


Fig. 5. (a) The encoding principle of the structured-beam-based shift keying scheme. Eight beams from six categories are adopted as 8 bits for 256 gray levels $({{2^8} = 256} )$. (b) The intensity pattern evolutions of four different modes under different atmospheric turbulences. These modes represent gray values of (b1) 8, (b2) 29, (b3) 69 and (b4) 455. At around AT5, one can hardly recognize these modes with the naked eye.


To make the system more practical and the results convincing, we first investigate how AT affects the decoding performance. AT, caused by inhomogeneities of temperature and pressure, is ubiquitous and notoriously harmful to free-space optical communication schemes; a decoding technique robust against various and unknown AT is therefore of significance. Here, we digitally simulate AT phase screens according to the modified von Karman model [61]. Specifically, the phase power spectral density (PSD) of each screen can be described as

$${\phi _n}(k )= 0.49{r_0}^{ - {5 / 3}}{({{k^2} + k_0^2} )^{ - {{11} / 6}}}\exp ({ - {{{k^2}} / {k_m^2}}} )\textrm{ ,}$$
where k is the angular spatial frequency in rad/m, and ${k_0} = 2\pi /{L_0}$ and ${k_m} = 5.92/{l_0}$ with outer scale ${L_0} = 10\,\textrm{m}$ and inner scale ${l_0} = 0.01\,\textrm{m}$. Here ${r_0}$ is known as the Fried parameter, whose relation to the refractive-index structure parameter $C_n^2$ (measured in ${\textrm{m}^{ - 2/3}}$) is approximated as ${r_0} = {({0.423{k^2}C_n^2z} )^{ - 3/5}}$, with k the optical wavenumber $2\pi/\lambda$ at the working wavelength of 532 nm and z the propagation distance. Besides, the subharmonic approach is used to compensate for low spatial frequencies [62]. By generating 5 independent turbulence screens, each representing a propagation distance of 100 m, we calculate the distorted beams 500 m away. Figure 5(b) displays four experimental beams changing with AT. As the turbulence strength $C_n^2$ increases, the received beams become increasingly distorted. At around AT5, we are no longer able to confidently distinguish one from another, which also poses challenges for the decoding scheme.
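A minimal FFT-based sketch of one such screen follows; the subharmonic compensation of low spatial frequencies [62] is omitted here, and the overall amplitude normalization follows one common FFT convention and should be validated against the target $r_0$.

```python
import numpy as np

def von_karman_screen(N=512, dx=2e-3, r0=0.05, L0=10.0, l0=0.01, seed=None):
    """One random phase screen drawn from the modified von Karman PSD, Eq. (5)."""
    rng = np.random.default_rng(seed)
    df = 1.0 / (N * dx)                              # frequency grid spacing (1/m)
    f = np.fft.fftshift(np.fft.fftfreq(N, dx))
    FX, FY = np.meshgrid(f, f)
    k = 2 * np.pi * np.hypot(FX, FY)                 # angular spatial frequency (rad/m)
    k0, km = 2 * np.pi / L0, 5.92 / l0               # outer- and inner-scale cutoffs
    psd = 0.49 * r0**(-5 / 3) * np.exp(-(k / km)**2) * (k**2 + k0**2)**(-11 / 6)
    psd[N // 2, N // 2] = 0.0                        # remove the piston (DC) term
    # Random Fourier coefficients with variance set by the PSD.
    cn = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) \
         * np.sqrt(psd) * 2 * np.pi * df
    return np.real(np.fft.ifft2(np.fft.ifftshift(cn))) * N**2
```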

Here we classify 256 different multiplexed beams; the dataset at each degree of AT comprises $256 \times 50 = 12800$ samples, of which $84\%$ ($10\%$, $6\%$) are reserved for training (validation, test). The model’s performance is evaluated with the accuracy metric $\mathrm{{\cal A}}$, and the results are summarized in Fig. 6. First, we train a model for each AT environment (8 neural network models in total) and assess each model over all blind test sets. In Fig. 6(a), the diagonal values represent the accuracies of the models on test sets with the same turbulence, while the off-diagonal terms are accuracies on test sets with different turbulences. Ideally, we would hope the accuracy matrix, including diagonal and off-diagonal elements, consists entirely of ‘1.00’ values, representing perfect generalization across turbulences. It can be seen that a trained model performs remarkably well on data samples from the same $C_n^2$ but languishes on samples from different turbulences. In Fig. 6(b), the blue bars are calculated from the 7 test sets with unknown turbulences while the green bars are calculated from all 8 test sets. From these results, we conclude that the generalization ability of models trained in weak AT environments is better than that of models obtained in strong AT environments. Even so, the recognition accuracies are not outstanding. To improve the model’s generalization to unknown samples under other turbulences, we adopt a dataset scaling technique, which has proven effective at improving model performance in neural network studies. In other words, we now construct datasets with samples from two or more AT environments to retrain the neural networks. Indeed, Figs. 6(c) and 6(d) reveal that the accuracies are greatly enhanced. In particular, four new models are obtained under 2 (AT1 + AT5), 3 (AT1 + AT4 + AT6), 4 (AT0 + AT2 + AT4 + AT7) and 5 (AT0 + AT2 + AT4 + AT6 + AT7) types of AT. Comparing Figs. 6(a)-(b) with Figs. 6(c)-(d) indicates that, as the diversity of data samples increases, the model’s generalization across environments also increases. This enhanced generalization ability makes our model more suitable for real-world scenes.
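In code, this dataset scaling amounts to pooling the per-turbulence datasets before retraining; the sketch below reuses the `train_transform` from the earlier sketch, and the folder names are placeholders.

```python
from torch.utils.data import ConcatDataset, DataLoader
from torchvision.datasets import ImageFolder

# Pool samples recorded under several turbulence strengths (here AT1 + AT5)
# into a single training set used to retrain the network.
mixed_train = ConcatDataset([
    ImageFolder('data/AT1/train', transform=train_transform),
    ImageFolder('data/AT5/train', transform=train_transform),
])
loader = DataLoader(mixed_train, batch_size=64, shuffle=True, num_workers=4)
```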


Fig. 6. Experimental performance of the trained models on test sets with various turbulences. (a) Accuracy matrix of models trained on samples with single turbulence. (b) Statistical results based on (a). (c) Accuracy matrix of models trained on samples with multiple turbulences. (d) Statistical results based on (c).


Thanks to the powerful recognition capability of the DNN, we then demonstrate a structured-beam-based shift keying system, as illustrated in Fig. 7(a). We again transmit the 8-bit Richard Feynman gray portrait of Fig. 4(b). Each gray value, e.g. 152, is encoded as a superposition state of the 8 structured beams, which serve as pseudo-bases, and the multiplexed state then propagates 500 m under turbulence to the receiver. The distorted signals are injected into the trained DNN for decoding. All 10000 multiplexed states are received in a time-varying manner. Similarly, we investigate the image transmission accuracy under unknown AT environments. In particular, the transmitted beams carrying information are distorted by AT3, but during training the evaluated models only see samples from AT0, AT1 + AT5, AT1 + AT4 + AT6, AT0 + AT2 + AT4 + AT7 and AT0 + AT2 + AT4 + AT6 + AT7, respectively. We quantitatively assess the 5 models; the SSIM values are analyzed in Fig. 7(b). Not surprisingly, as the training dataset scales, the image fidelity with respect to the ground truth increases. The lowest bit error rate (BER) in this case can be $32\%$. Notably, the SSIM value tends to saturate at around 0.75; we argue that it can be further improved by adding more data to the training set or enlarging the network. Recently, foundation models like GPT-3 [63] with parameters at scale (hundred-billion level) have achieved state-of-the-art performance on various image and language tasks. It is thus believed that as the model enlarges, its internal complexity as well as its expressive ability will improve.


Fig. 7. Experimental results of structured-beam-based shift keying scheme. (a) The conceptual workflow of the data transmission system under unknown turbulences. (b) The reconstructed image quality versus training dataset. Here the structured beams carrying information are distorted by AT3 while the evaluated models are trained over beams distorted by other ATs. The image quality is enhanced as the diversity of data samples in terms of turbulence increases.


Apart from AT, another pronounced challenge for optical communication is the mismatch between the receiver aperture and the beam size [2,64]. Due to the diffraction limit and the geometric divergence of a transmitted beam, the aperture required to receive the whole beam can be extremely large, e.g. meter-scale in [65]. To tackle this problem, we examine below the model’s recognition performance for input modes with a limited aperture. As depicted in Fig. 8(a), the transmitted beam diverges upon propagation. At the receiver, we suppose the size of an ideal aperture that collects all of the beam’s information is a constant D, and the size of a cut-off aperture that filters the accessible signals is d. Ten different cases are quantitatively considered, from $d/D = 0.1$ to $d/D = 1$. When $d/D = 1$, the system degenerates to the ideal case that is generally assumed in recent works [11,14–18,22,28,29,65–67]. Notably, Fig. 8(b) shows the results from a model trained with data samples under $d/D = 1$. In general, the decoding model performs better on samples under larger apertures. When $d/D < 0.4$, the recognition accuracies are $\mathrm{{\cal A}} < 0.5$; when $d/D > 0.6$, the accuracies saturate at around $1.0$. Recalling the interpretation results reported in Fig. 4(a), the intensity signals at different locations contribute differently to the decision, which suggests that one may not need to capture every part of the signal to recognize beams correctly. Inspired by the dataset scaling experiments of Fig. 6 and Fig. 7, we retrain a model using data samples under $d/D = 0.5$ and $d/D = 1$ and present the blind test results in Fig. 8(c). Indeed, the accuracies now reach $\mathrm{{\cal A}} > 0.9$ when $d/D > 0.5$. We further add data at $d/D = 0.1$ to the previous training set; the corresponding results are shown in Fig. 8(d). Although the accuracy is $\mathrm{{\cal A}} > 0.9$ at $d/D = 0.1$, the generalization to other small apertures, e.g. $d/D = 0.15$, is still poor. This is reasonable because, as shown in Fig. 8(a), when $d/D\sim 0.1$ too much intensity information is lost and the performance of the DNN collapses. The results in Fig. 8 remind us that, in practice, when the receiver collects very little information the DNN’s predictions are generally not credible, whereas a proper aperture that captures sufficient signal renders the model reliable again.
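The restricted-aperture condition can be emulated on recorded intensity images with a simple circular mask; a minimal sketch, where the full aperture D is taken to span the image, is:

```python
import numpy as np

def apply_aperture(intensity, ratio):
    """Zero the intensity outside a centered circular aperture of diameter
    d = ratio * D, with D spanning the full image."""
    h, w = intensity.shape
    y, x = np.ogrid[:h, :w]
    r = np.hypot(x - (w - 1) / 2, y - (h - 1) / 2)
    return intensity * (r <= ratio * min(h, w) / 2)

# e.g. cropped = apply_aperture(img, ratio=0.5)   # the d/D = 0.5 case
```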


Fig. 8. Influences of receiver aperture sizes on the performances of the trained models. (a) Illustration of the mismatch between beam size and receiver size. Red circle denotes full receiver aperture size that collects all intensity signals. Orange circle denotes partial (effective) receiver aperture size that loses part of intensity signals. (b)-(d) The classification accuracies on different partial receivers using different models. The performance is moderately improved as the diversity of data samples in terms of receiver size increases.


4. Discussions and conclusions

The above results and analyses exhibit the potential of deep neural networks in unlocking non-orthogonal spatial beams for communications (also see Supplement 1 for intensity-based modal cross-talk analyses). To improve this work, it would be meaningful in future research to investigate how different modes diverge and to carefully select low-order beams with little divergence to constitute a structured-beam library, which can be used to extend the communication distance [68]. Second, when one uses a bigger model architecture to improve generalization for higher accuracies, decoding speed and energy cost may be sacrificed. To avoid this, model pruning [69] and knowledge distillation [70] can be applied to compress the network at the software level. Besides, novel all-optical or optoelectronic neural network hardware can be exploited as an optimal decoding platform [71]. Moreover, in our setup the SLM used for beam preparation has a low refresh rate, resulting in ${\sim}\,30$ minutes to send an image; replacing it with a digital micromirror device would certainly improve the speed. As for LG modes with the same values of p and $|\ell |$, their intensity signals are identical (degenerate intensity); to maintain recognition robustness, one should break this degeneracy before intensity detection through special diffraction [72], modulation [73], interference [28], etc. Beyond weakly diffusing media such as atmospheric turbulence, it remains enticing to explore the effectiveness and novel functionalities, e.g. encryption, in strongly scattering environments including ground glass and multimode fibers [74].

To conclude, by revisiting the rich families of structured light, we experimentally demonstrate that one can simultaneously utilize multiple categories of (low-order) structured light as information carriers for large-capacity optical communications. A well-trained model can identify these non-orthogonal beams with an accuracy of $> 99\%$ in a short time (∼ 7.5 ms), relaxing the rigorous alignment requirements in experiments. Model interpretation analysis is conducted to identify the nontrivial signals. In particular, structured-beam-based 256-ary mode encoding and SK schemes are successively verified. For the SK experiments, we investigate the model’s generalization ability from two aspects: unknown turbulences and partial signal input. At an AT of $1.0 \times {10^{ - 14}}\; {\textrm{m}^{ - 2/3}}$, a neural network trained on samples with other ATs can correctly recognize $85.3\%$ of the samples in the blind test set without any adaptive optics compensation, resulting in an SSIM value of 0.75 in an image transmission experiment. For a receiver aperture of finite size $d/D \ge 0.5$, the accuracy remains above $0.8$. Our platform is highly scalable and can readily be extended to other wavelengths, including millimeter and terahertz waves, and even to acoustic waves. The insights provided by this study may inspire further investigations into multi-channel communications, high-security encryption, novel information processing, etc.

Funding

National Natural Science Foundation of China (61975087).

Acknowledgments

H. Wang thanks Jia Guo and Zhensong Wan for useful discussions.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data and codes underlying the results presented in this paper can be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. A. Forbes, M. de Oliveira, and M. R. Dennis, “Structured light,” Nat. Photonics 15(4), 253–262 (2021). [CrossRef]  

2. A. E. Willner, K. Pang, H. Song, K. Zou, and H. Zhou, “Orbital angular momentum of light for communications,” Appl. Phys. Rev. 8(4), 041312 (2021). [CrossRef]  

3. Y. Shen, X. Wang, Z. Xie, C. Min, X. Fu, Q. Liu, M. Gong, and X. Yuan, “Optical vortices 30 years on: OAM manipulation from topological charge to multiple singularities,” Light: Sci. Appl. 8(1), 90 (2019). [CrossRef]  

4. Y. Wen, I. Chremmos, Y. Chen, G. Zhu, J. Zhang, J. Zhu, Y. Zhang, J. Liu, and S. Yu, “Compact and high-performance vortex mode sorter for multi-dimensional multiplexed fiber communication systems,” Optica 7(3), 254–262 (2020). [CrossRef]  

5. Y. Yang, Y.-X. Ren, M. Chen, Y. Arita, and C. Rosales-Guzmán, “Optical trapping with structured light: a review,” Adv. Photonics 3(3), 034001 (2021). [CrossRef]  

6. L. Torner, J. P. Torres, and S. Carrasco, “Digital spiral imaging,” Opt. Express 13(3), 873–881 (2005). [CrossRef]  

7. L. Fang, Z. Wan, A. Forbes, and J. Wang, “Vectorial Doppler metrology,” Nat. Commun. 12(1), 4186 (2021). [CrossRef]  

8. F. Brandt, M. Hiekkamäki, F. Bouchard, M. Huber, and R. Fickler, “High-dimensional quantum gates using full-field spatial modes of photons,” Optica 7(2), 98–107 (2020). [CrossRef]  

9. H. Wang, S. Fu, and C. Gao, “Tailoring a complex perfect optical vortex array with multiple selective degrees of freedom,” Opt. Express 29(7), 10811–10824 (2021). [CrossRef]  

10. J. Pan, Y. Shen, Z. Wan, X. Fu, H. Zhang, and Q. Liu, “Index-Tunable Structured-Light Beams from a Laser with an Intracavity Astigmatic Mode Converter,” Phys. Rev. Appl. 14(4), 044048 (2020). [CrossRef]  

11. Y. Na and D.-K. Ko, “Deep-learning-based high-resolution recognition of fractional-spatial-mode-encoded data for free-space optical communications,” Sci. Rep. 11(1), 2678 (2021). [CrossRef]  

12. N. Zhao, X. Li, G. Li, and J. M. Kahn, “Capacity limits of spatially multiplexed free-space communication,” Nat. Photonics 9(12), 822–826 (2015). [CrossRef]  

13. H. Zhang, J. Zeng, X. Lu, Z. Wang, C. Zhao, and Y. Cai, “Review on fractional vortex beam,” Nanophotonics 11(2), 241–273 (2022). [CrossRef]  

14. F. Cao, T. Pu, and C. Xie, “Superposition of two fractional optical vortices and the orbital angular momentum measurement by a deep-learning method,” Appl. Opt. 60(36), 11134–11143 (2021). [CrossRef]  

15. M. Cao, Y. Yin, J. Zhou, J. Tang, L. Cao, Y. Xia, and J. Yin, “Machine learning based accurate recognition of fractional optical vortex modes in atmospheric environment,” Appl. Phys. Lett. 119(14), 141103 (2021). [CrossRef]  

16. G. Jing, L. Chen, P. Wang, W. Xiong, Z. Huang, J. Liu, Y. Chen, Y. Li, D. Fan, and S. Chen, “Recognizing fractional orbital angular momentum using feed forward neural network,” Results Phys. 28, 104619 (2021). [CrossRef]  

17. Z. Liu, S. Yan, H. Liu, and X. Chen, “Superhigh-Resolution Recognition of Optical Vortex Modes Assisted by a Deep-Learning Method,” Phys. Rev. Lett. 123(18), 183902 (2019). [CrossRef]  

18. H. Luan, D. Lin, K. Li, W. Meng, M. Gu, and X. Fang, “768-ary Laguerre-Gaussian-mode shift keying free-space optical communication based on convolutional neural networks,” Opt. Express 29(13), 19807–19818 (2021). [CrossRef]  

19. A. Trichili, A. B. Salem, A. Dudley, M. Zghal, and A. Forbes, “Encoding information using Laguerre Gaussian modes over free space turbulence media,” Opt. Lett. 41(13), 3086–3089 (2016). [CrossRef]  

20. L. Li, G. Xie, Y. Yan, Y. Ren, P. Liao, Z. Zhao, N. Ahmed, Z. Wang, C. Bao, A. J. Willner, S. Ashrafi, M. Tur, and A. E. Willner, “Power loss mitigation of orbital-angular-momentum-multiplexed free-space optical links using nonzero radial index Laguerre–Gaussian beams,” J. Opt. Soc. Am. B 34(1), 1–6 (2017). [CrossRef]  

21. Z. Wan, Y. Shen, Z. Wang, Z. Shi, Q. Liu, and X. Fu, “Divergence-degenerate spatial multiplexing towards future ultrahigh capacity, low error-rate optical communications,” Light: Sci. Appl. 11(1), 144 (2022). [CrossRef]  

22. Z. Mao, H. Yu, M. Xia, S. Pan, D. Wu, Y. Yin, Y. Xia, and J. Yin, “Broad Bandwidth and Highly Efficient Recognition of Optical Vortex Modes Achieved by the Neural-Network Approach,” Phys. Rev. Appl. 13(3), 034063 (2020). [CrossRef]  

23. G. Gbur, “Fractional vortex Hilbert's Hotel,” Optica 3(3), 222–225 (2016). [CrossRef]  

24. L. Shi, B. Li, C. Kim, P. Kellnhofer, and W. Matusik, “Towards real-time photorealistic 3D holography with deep neural networks,” Nature 591(7849), 234–239 (2021). [CrossRef]  

25. X. Yang, L. Huang, Y. Luo, Y. Wu, H. Wang, Y. Rivenson, and A. Ozcan, “Deep-Learning-Based Virtual Refocusing of Images Using an Engineered Point-Spread Function,” ACS Photonics 8(7), 2174–2182 (2021). [CrossRef]  

26. J. Jiang, M. Chen, and J. A. Fan, “Deep neural networks for the evaluation and design of photonic devices,” Nat. Rev. Mater. 6(8), 679–700 (2021). [CrossRef]  

27. D. Jin, Y. Chen, Y. Lu, J. Chen, P. Wang, Z. Liu, S. Guo, and X. Bai, “Neutralizing the impact of atmospheric turbulence on complex scene imaging via deep learning,” Nat. Mach. Intell. 3(10), 876–884 (2021). [CrossRef]  

28. L.-F. Zhang, Y.-Y. Lin, Z.-Y. She, Z.-H. Huang, J.-Z. Li, X. Luo, H. Yan, W. Huang, D.-W. Zhang, and S.-L. Zhu, “Recognition of orbital-angular-momentum modes with different topological charges and their unknown superpositions via machine learning,” Phys. Rev. A 104(5), 053525 (2021). [CrossRef]  

29. L. R. Hofer, L. W. Jones, J. L. Goedert, and R. V. Dragone, “Hermite–Gaussian mode detection via convolution neural networks,” J. Opt. Soc. Am. A 36(6), 936–943 (2019). [CrossRef]  

30. H. Wang, X. Yang, Z. Liu, J. Pan, Y. Meng, Z. Shi, Z. Wan, H. Zhang, Y. Shen, X. Fu, and Q. Liu, “Deep-learning-based recognition of multi-singularity structured light,” Nanophotonics 11(4), 779–786 (2022). [CrossRef]  

31. Z. Xie, T. Lei, F. Li, H. Qiu, Z. Zhang, H. Wang, C. Min, L. Du, Z. Li, and X. Yuan, “Ultra-broadband on-chip twisted light emitter for optical communications,” Light: Sci. Appl. 7(4), 18001 (2018). [CrossRef]  

32. J. Pinnell, I. Nape, B. Sephton, M. A. Cox, V. Rodríguez-Fajardo, and A. Forbes, “Modal analysis of structured light with spatial light modulators: a practical tutorial,” J. Opt. Soc. Am. A 37(11), C146–C160 (2020). [CrossRef]  

33. G. C. G. Berkhout, M. P. J. Lavery, J. Courtial, M. W. Beijersbergen, and M. J. Padgett, “Efficient Sorting of Orbital Angular Momentum States of Light,” Phys. Rev. Lett. 105(15), 153601 (2010). [CrossRef]  

34. Y. Wen, I. Chremmos, Y. Chen, J. Zhu, Y. Zhang, and S. Yu, “Spiral Transformation for High-Resolution and Efficient Sorting of Optical Vortex Modes,” Phys. Rev. Lett. 120(19), 193904 (2018). [CrossRef]  

35. L. Allen, M. W. Beijersbergen, R. J. C. Spreeuw, and J. P. Woerdman, “Orbital angular momentum of light and the transformation of Laguerre-Gaussian laser modes,” Phys. Rev. A 45(11), 8185–8189 (1992). [CrossRef]  

36. Y. Shen, Y. Meng, X. Fu, and M. Gong, “Wavelength-tunable Hermite-Gaussian modes and an orbital-angular-momentum-tunable vortex beam in a dual-off-axis pumped Yb:CALGO laser,” Opt. Lett. 43(2), 291–294 (2018). [CrossRef]  

37. O. Céspedes Vicente and C. Caloz, “Bessel beams: a unified and extended perspective,” Optica 8(4), 451–457 (2021). [CrossRef]  

38. N. K. Efremidis, Z. Chen, M. Segev, and D. N. Christodoulides, “Airy beams and accelerating waves: an overview of recent advances,” Optica 6(5), 686–701 (2019). [CrossRef]  

39. Y. Shen, “Rays, waves, SU(2) symmetry and geometry: toolkits for structured light,” J. Opt. 23(12), 124004 (2021). [CrossRef]  

40. S. Chávez-Cerda, J. C. Gutiérrez-Vega, and G. H. C. New, “Elliptic vortices of electromagnetic wave fields,” Opt. Lett. 26(22), 1803–1805 (2001). [CrossRef]  

41. M. A. Cox, N. Mphuthi, I. Nape, N. Mashaba, L. Cheng, and A. Forbes, “Structured Light in Turbulence,” IEEE J. Sel. Top. Quantum Electron. 27(2), 1–21 (2021). [CrossRef]  

42. K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2016), 770–778.

43. U. Levy, S. Derevyanko, and Y. Silberberg, “Light Modes of Free Space,” in Prog. Opt., T. D. Visser, ed. (Elsevier, 2016), Chap. 4, pp. 237–281.

44. V. Arrizón, U. Ruiz, R. Carrada, and L. A. González, “Pixelated phase computer holograms for the accurate encoding of scalar complex fields,” J. Opt. Soc. Am. A 24(11), 3500–3507 (2007). [CrossRef]  

45. A. Forbes, A. Dudley, and M. McLaren, “Creation and detection of optical modes with spatial light modulators,” Adv. Opt. Photonics 8(2), 200–227 (2016). [CrossRef]  

46. D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” arXiv:1412.6980 (2014).

47. J. Deng, W. Dong, R. Socher, L. J. Li, L. Kai, and F.-F. Li, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2009), 248–255.

48. R. Zhu, T. Qiu, J. Wang, S. Sui, C. Hao, T. Liu, Y. Li, M. Feng, A. Zhang, C.-W. Qiu, and S. Qu, “Phase-to-pattern inverse design paradigm for fast realization of functional metasurfaces via transfer learning,” Nat. Commun. 12(1), 2974 (2021). [CrossRef]  

49. A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, “PyTorch: An Imperative Style, High-Performance Deep Learning Library,” arXiv:1912.01703 (2019).

50. A. Mikołajczyk and M. Grochowski, “Data augmentation for improving deep learning in image classification problem,” in 2018 International Interdisciplinary PhD Workshop (IIPhDW) (IEEE, 2018), 117–122.

51. X. Li, H. Xiong, X. Li, X. Wu, X. Zhang, J. Liu, J. Bian, and D. Dou, “Interpretable Deep Learning: Interpretations, Interpretability, Trustworthiness, and Beyond,” arXiv:2103.10689 (2021).

52. S. M. Lundberg and S.-I. Lee, “A Unified Approach to Interpreting Model Predictions,” arXiv:1705.07874 (2017).

53. D. T. Huff, A. J. Weisman, and R. Jeraj, “Interpretation and visualization techniques for deep learning models in medical imaging,” Phys. Med. Biol. 66(4), 04TR01 (2021). [CrossRef]  

54. L. van der Maaten and G. Hinton, “Visualizing Data using t-SNE,” J. Mach. Learn. Res. 9, 2579–2605 (2008).

55. B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, “Learning Deep Features for Discriminative Localization,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2016), 2921–2929.

56. R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization,” Int. J. Comput. Vis. 128(2), 336–359 (2020). [CrossRef]  

57. N. Kokhlikyan, V. Miglani, M. Martin, E. Wang, B. Alsallakh, J. Reynolds, A. Melnikov, N. Kliushkina, C. Araya, S. Yan, and O. Reblitz-Richardson, “Captum: A unified and generic model interpretability library for PyTorch,” arXiv:2009.07896 (2020).

58. “Richard P. Feynman Portrait,” (Princeton University Press, https://press.princeton.edu/our-authors/feynman-richard-p).

59. G. Xie, L. Li, Y. Ren, H. Huang, Y. Yan, N. Ahmed, Z. Zhao, M. P. J. Lavery, N. Ashrafi, S. Ashrafi, R. Bock, M. Tur, A. F. Molisch, and A. E. Willner, “Performance metrics and design considerations for a free-space optical orbital-angular-momentum-multiplexed communication link,” Optica 2(4), 357–365 (2015). [CrossRef]  

60. S. Fu, Y. Zhai, H. Zhou, J. Zhang, T. Wang, X. Liu, and C. Gao, “Experimental demonstration of free-space multi-state orbital angular momentum shift keying,” Opt. Express 27(23), 33111–33119 (2019). [CrossRef]  

61. J. D. Schmidt, “Propagation through atmospheric turbulence,” in Numerical Simulation of Optical Wave Propagation with Examples in MATLAB (SPIE, 2010), Chap. 9, pp. 149–184.

62. S. Fu and C. Gao, “Influences of atmospheric turbulence effects on the orbital angular momentum spectra of vortex beams,” Photonics Res. 4(5), B1–B4 (2016). [CrossRef]  

63. T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei, “Language Models are Few-Shot Learners,” arXiv:2005.14165 (2020).

64. S. Zheng, X. Hui, J. Zhu, H. Chi, X. Jin, S. Yu, and X. Zhang, “Orbital angular momentum mode-demultiplexing scheme with partial angular receiving aperture,” Opt. Express 23(9), 12251–12257 (2015). [CrossRef]  

65. M. Krenn, J. Handsteiner, M. Fink, R. Fickler, R. Ursin, M. Malik, and A. Zeilinger, “Twisted light transmission over 143 km,” Proc. Natl. Acad. Sci. U.S.A. 113(48), 13648–13653 (2016). [CrossRef]  

66. X. Wang, Y. Qian, J. Zhang, G. Ma, S. Zhao, R. Liu, H. Li, P. Zhang, H. Gao, F. Huang, and F. Li, “Learning to recognize misaligned hyperfine orbital angular momentum modes,” Photonics Res. 9(4), B81–B86 (2021). [CrossRef]  

67. Y. Qian, H. Chen, P. Huo, X. Wang, S. Gao, P. Zhang, H. Gao, R. Liu, and F. Li, “Towards fine recognition of orbital angular momentum modes through smoke,” Opt. Express 30(9), 15172–15183 (2022). [CrossRef]  

68. M. J. Padgett, F. M. Miatto, M. P. J. Lavery, A. Zeilinger, and R. W. Boyd, “Divergence of an orbital-angular-momentum-carrying beam upon propagation,” New J. Phys. 17(2), 023011 (2015). [CrossRef]  

69. H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf, “Pruning Filters for Efficient ConvNets,” arXiv:1608.08710 (2016).

70. G. Hinton, O. Vinyals, and J. Dean, “Distilling the Knowledge in a Neural Network,” arXiv:1503.02531 (2015).

71. C. Li, X. Zhang, J. Li, T. Fang, and X. Dong, “The challenges of modern computing and new opportunities for optics,” PhotoniX 2(1), 20 (2021). [CrossRef]  

72. B. P. da Silva, B. A. D. Marques, R. B. Rodrigues, P. H. S. Ribeiro, and A. Z. Khoury, “Machine-learning recognition of light orbital-angular-momentum superpositions,” Phys. Rev. A 103(6), 063704 (2021). [CrossRef]  

73. J. Wang, S. Fu, Z. Shang, L. Hai, and C. Gao, “Adjusted EfficientNet for the diagnostic of orbital angular momentum spectrum,” Opt. Lett. 47(6), 1419–1422 (2022). [CrossRef]  

74. F. Feng, J. Hu, Z. Guo, J.-A. Gan, P.-F. Chen, G. Chen, C. Min, X. Yuan, and M. Somekh, “Deep Learning-Enabled Orbital Angular Momentum-Based Information Encryption Transmission,” ACS Photonics 9(3), 820–829 (2022). [CrossRef]  

Supplementary Material (1)

Supplement 1: This file supplements the main text by providing more details and analyses.




