
Vision transformers motivating superior OAM mode recognition in optical communications

Open Access

Abstract

Orbital angular momentum (OAM) has recently attracted tremendous research interest in free-space optical (FSO) communications. During signal transmission within the free-space link, atmospheric turbulence (AT) poses a significant challenge: it diminishes the signal strength and introduces intermodal crosstalk, significantly reducing OAM mode detection accuracy. This issue directly degrades the performance of OAM-based communication systems and reduces the amount of received information. To address this critical bottleneck of low mode recognition accuracy in OAM-based FSO communications, a deep learning method based on vision transformers (ViT) is proposed for what we believe is the first time. The advanced self-attention mechanism of ViT captures more global information from the input image. The model is first pretrained on the large-scale ImageNet dataset and then fine-tuned on our specific dataset, consisting of OAM beams that have undergone varying AT strengths. Computer simulations show that, with the ViT-based method, multiple OAM modes can be recognized with high accuracy (nearly 100%) under weak-to-moderate turbulence, and with almost 98% accuracy even over a long transmission distance with strong turbulence ($C_N^2 = 1 \times 10^{-14}$). Our findings highlight that leveraging ViT enables robust detection of complex OAM beams, mitigating the adverse effects caused by atmospheric turbulence.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

In recent years, there has been increasing interest in light beams carrying orbital angular momentum (OAM) due to their great potential in dynamic manipulation and information processing. Since Allen et al. [1] first demonstrated in 1992 that each photon in a vortex beam carries an OAM of $l\hbar$, where $l$ is the topological charge and $\hbar$ is the reduced Planck constant, OAM has played a crucial role in optical manipulation [2,3] and optical communication [4,5]. To address the ever-increasing demands of data traffic and high-speed wireless communication, it is essential to transmit and receive OAM signals efficiently and with high fidelity while maintaining large-capacity transmission [6–11]. The primary challenge in OAM communication lies in accurately recognizing the topological charge at the receiving end of the communication channel. In real-world free-space optical (FSO) communication scenarios, OAM beams propagate through air and thus encounter atmospheric turbulence (AT), which distorts the OAM beam and gives rise to intermodal crosstalk [12]. To address this issue, several methods have been proposed to mitigate the effects of atmospheric turbulence on the beam. These range from more traditional approaches, such as auto-alignment systems [13], algorithm-based compensation [14–16], and adaptive-optics-based compensation techniques [17–20], to machine learning methods such as the self-organizing map (SOM) [21], support vector machine (SVM) [22–24], and multi-class support vector machine (MSVM) [24]. Additionally, deep learning methods such as deep neural networks (DNN) [25,26], diffractive deep neural networks (D2NN) [27–29], and various convolutional neural network (CNN) architectures, including ResNet [30], DenseNet [31], AlexNet [32], and others [33–37], have been explored to counteract the influence of atmospheric turbulence on OAM beams.

These machine learning and deep learning techniques offer potential solutions to the challenges posed by atmospheric turbulence in OAM-based FSO communication. By leveraging them, it becomes possible to improve OAM mode recognition accuracy and compensate for AT, leading to more reliable OAM communication systems. Among the methods reported above, CNNs have demonstrated the highest effectiveness in detecting and recognizing OAM modes under low-to-medium AT. However, as the AT strength and transmission distance increase, the mode recognition accuracy is known to drop significantly and the classification ability of CNNs becomes limited. The only viable way to achieve good communication performance under strong turbulence with CNNs is to increase model complexity, which is undesirable. Very recently, the vision transformer (ViT) has garnered significant attention and shown remarkable effectiveness in computer vision tasks, surpassing traditional CNNs [38]. Although ViT offers markedly better performance in image recognition, to the best of our knowledge no work exploring ViT for OAM mode recognition has been reported to date.

In this work, we explore and optimize the recognition of OAM modes in OAM-based FSO communication using ViT. Specifically, we utilize the ViT-Base 16 (ViT-B/16) variant, pretrained on the ImageNet dataset to ease feature extraction and achieve high accuracy in recognizing both simple and complex OAM modes. We briefly introduce the ViT and ViT-based OAM mode recognition. Our simulations assess different AT strengths, including medium and strong AT conditions. Additionally, we compare the per-sample prediction time and accuracy of the ViT model against EfficientNet [39], one of the strongest CNN architectures, revealing that ViT-based OAM detection yields considerably better results in both accuracy and prediction time. These findings may help realize high-capacity OAM-based optical communications in the future.

2. Methods

2.1 OAM generation

The Laguerre-Gaussian (LG) beam is the most common type of OAM beam in optical communication systems. It represents a specific solution to the Helmholtz equation in a cylindrical coordinate system, assuming a paraxial approximation. The LG beam's field distribution can be mathematically expressed as follows [40]:

$$\begin{aligned} u_{LG(l,p)}(r,\varphi,z) &= \sqrt{\frac{2p!}{\pi(p + |l|)!}}\,\frac{1}{w(z)}{\left(\frac{r\sqrt{2}}{w(z)}\right)}^{|l|} L_p^{|l|}\!\left(\frac{2r^2}{w^2(z)}\right)\\ &\quad\times \exp\left(-\frac{r^2}{w^2(z)} - \frac{ikr^2}{2R(z)}\right)\\ &\quad\times \exp\left[i(2p + |l| + 1)\tan^{-1}\frac{z}{z_R}\right]\exp(il\varphi) \end{aligned}$$
where $l$ is the OAM mode value, called the topological charge, which represents the phase change along the azimuthal angle; $\lambda$ is the wavelength; $k = 2\pi/\lambda$ is the wavenumber; $p$ is the radial index; $w(z) = w_0\sqrt{1 + (z/z_R)^2}$ is the beam radius at a distance $z$ from the beam waist, in which $w_0$ is the beam waist and $z_R = \pi w_0^2/\lambda$ is the Rayleigh range; $R(z) = z[1 + (z_R/z)^2]$ is the wavefront radius of curvature; $L_p^{|l|}(\cdot)$ is the generalized Laguerre polynomial; and $(2p + |l| + 1)\tan^{-1}(z/z_R)$ is the Gouy phase. For LG beams, $l$ and $p$ determine the light field distribution. When $l = 0$ and $p = 0$, Eq. (1) reduces to the field of a Gaussian beam; when $l \ne 0$ and $p = 0$, the intensity images of LG beams exhibit a donut-like ring distribution whose radius increases with $|l|$. The intensity and phase distributions of LG beams are shown in Fig. 1. As $l$ varies, with $p$ kept at 0, the intensity distribution changes accordingly.
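To make Eq. (1) concrete, the following Python sketch evaluates the LG field on a grid. The beam waist and wavelength below are illustrative placeholders (the actual values are those of Table 1), and the wavefront-curvature term is implemented via the standard identity $1/R(z) = z/(z^2 + z_R^2)$, which stays finite at the waist.

```python
import numpy as np
from scipy.special import genlaguerre, factorial

def lg_field(l, p, r, phi, z, w0=0.02, wl=1550e-9):
    """Laguerre-Gaussian field u_{LG(l,p)}(r, phi, z) of Eq. (1)."""
    k = 2 * np.pi / wl
    zR = np.pi * w0**2 / wl                      # Rayleigh range
    wz = w0 * np.sqrt(1 + (z / zR)**2)           # beam radius at distance z
    inv_R = z / (z**2 + zR**2)                   # 1/R(z), finite at z = 0
    amp = (np.sqrt(2 * factorial(p) / (np.pi * factorial(p + abs(l)))) / wz
           * (np.sqrt(2) * r / wz)**abs(l)
           * genlaguerre(p, abs(l))(2 * r**2 / wz**2)
           * np.exp(-r**2 / wz**2))
    gouy = (2 * p + abs(l) + 1) * np.arctan(z / zR)
    return amp * np.exp(1j * (-k * r**2 * inv_R / 2 + gouy + l * phi))

# Example: donut-shaped intensity of an l = +3, p = 0 mode at the waist
x = np.linspace(-0.05, 0.05, 256)
X, Y = np.meshgrid(x, x)
u = lg_field(3, 0, np.hypot(X, Y), np.arctan2(Y, X), z=0.0)
intensity = np.abs(u)**2                         # ring radius grows with |l|
```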

Fig. 1. Intensity and phase distributions of vortex beams with OAM modes l = +1, +3, +5. (a-c) Intensity distributions of vortex beams. (d-f) Phase distributions of vortex beams.

2.2 Atmospheric turbulence (AT)

AT is the manifestation of perturbations from inhomogeneous media, which result from random variations in convective motion and temperature that cause random fluctuations in the air's refractive index. AT causes phase distortion at different cross-sectional locations of the propagating OAM beams. These phase distortions give rise to intermodal crosstalk, which strongly limits the performance of OAM-based FSO communication systems [41]. To ensure robust communication in optical communication systems, it is necessary to model and analyze the properties of atmospheric turbulence channels. A useful approach for introducing turbulence into free-space OAM propagation is to insert random phase screens along the propagation path of the vortex beam. The stochastic phase screen method models AT by dividing the transmission path into multiple equal segments. Each segment is equivalent to a phase screen derived from the refractive-index power spectrum of the selected AT model, and the vortex light is transmitted through the successive phase screens, thereby acquiring phase distortions that mimic the effects of atmospheric turbulence. By simulating turbulence in this way, researchers can study the performance of OAM-based communication systems under realistic conditions and develop strategies to mitigate the adverse effects caused by turbulence. In this work, the turbulence channel was numerically simulated using the Hill and Andrews model [42,43]:

$$\begin{aligned} \Phi_n(k_x, k_y) &= 0.033\,C_n^2\left[1 + 1.802\sqrt{\frac{k_x^2 + k_y^2}{k_l^2}} - 0.254{\left(\frac{k_x^2 + k_y^2}{k_l^2}\right)}^{7/12}\right]\\ &\quad\times \exp\left(-\frac{k_x^2 + k_y^2}{k_l^2}\right){\left(k_x^2 + k_y^2 + \frac{1}{L_0^2}\right)}^{-11/6} \end{aligned}$$
where $k_x$ and $k_y$ are the spatial wavenumbers along the x and y axes, respectively, in the space coordinate system; $L_0$ and $l_0$ are the outer and inner scales of the AT, respectively; and $k_l = 3.3/l_0$. The parameters of the numerical simulation are listed in Table 1, and a schematic diagram of the OAM-based FSO communication system under AT, with the ViT recognizing OAM modes, is shown in Fig. 2.
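A minimal sketch of this phase-screen construction is given below: it samples Eq. (2), converts it to a per-slab phase power spectrum, and filters complex white noise with an FFT. The inner/outer scales, wavelength, and normalization convention are illustrative assumptions (FFT phase-screen normalizations vary across references), not the authors' exact simulation code.

```python
import numpy as np

def hill_andrews_psd(kx, ky, Cn2, l0=0.005, L0=50.0):
    """Modified Hill-Andrews refractive-index power spectrum, Eq. (2)."""
    kl = 3.3 / l0
    k2 = kx**2 + ky**2
    bump = 1 + 1.802 * np.sqrt(k2 / kl**2) - 0.254 * (k2 / kl**2)**(7 / 12)
    return 0.033 * Cn2 * bump * np.exp(-k2 / kl**2) * (k2 + 1 / L0**2)**(-11 / 6)

def phase_screen(N, dx, Cn2, dz, wl=1550e-9, l0=0.005, L0=50.0, seed=None):
    """One random phase screen for a turbulence slab of thickness dz."""
    rng = np.random.default_rng(seed)
    k = 2 * np.pi * np.fft.fftfreq(N, d=dx)      # angular spatial frequencies
    kx, ky = np.meshgrid(k, k)
    k0 = 2 * np.pi / wl                          # optical wavenumber
    # Phase PSD of one slab: 2*pi*k0^2*dz*Phi_n(kx, ky)
    psd = 2 * np.pi * k0**2 * dz * hill_andrews_psd(kx, ky, Cn2, l0, L0)
    dk = 2 * np.pi / (N * dx)
    noise = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    return np.real(np.fft.ifft2(noise * np.sqrt(psd) * dk)) * N**2

# Each screen multiplies the field as exp(1j * phase_screen(...)) between
# free-space propagation steps (split-step method).
```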

Table 1. Simulation parameters

Fig. 2. Schematic diagram of the OAM-based FSO communication system under AT, with the ViT recognizing OAM modes.

2.3 Principle of the ViT model for OAM recognition under AT

Machine learning and deep learning are two rapidly developing research fields. In computer vision tasks especially, CNNs long dominated the research scene thanks to their unique advantages and became the state of the art in image recognition. Very recently, however, the vision transformer (ViT) entered the competition and now outperforms CNNs in many tasks. Transformer models [44] were initially used exclusively in natural language processing (NLP), in tasks such as question answering [45], language translation [46], and text classification [47]. Vaswani et al. [44] pioneered the Transformer architecture, which revolutionized machine translation through the use of attention mechanisms, and the Transformer has since come to dominate the NLP field. In a significant advancement, a Google research team led by Devlin et al. [48] introduced bidirectional encoder representations from transformers (BERT) in 2018. BERT, comprising an impressive 340 million parameters, had a profound impact on subsequent research: leveraging its self-attention mechanism, the learned representations exhibited exceptional capability for extracting intrinsic features.

After transformers proved themselves in NLP, a variant called the vision transformer took hold in computer vision tasks. In addition to achieving top performance on various image recognition benchmarks, it has shown excellent results in segmentation [49], object detection [50], video understanding [51], and more. Thanks to the self-attention mechanism, ViT captures more global information from the input image and delivers the best results. Moreover, owing to pretraining on a large-scale dataset, it can achieve state-of-the-art performance with less training data than a CNN. ViT is also more interpretable than a CNN, as the self-attention mechanism allows visualization of which parts of the input the model attends to. Figure 3 presents a model overview of ViT.

Fig. 3. Model overview of ViT. The image is divided into fixed-size patches, which are embedded linearly; position embeddings are added, and the resulting sequence is fed to a Transformer encoder. For classification, a learnable "classification token" is appended to the sequence.

The ViT model's ability to analyze and understand the spatial layout and content of the input image is achieved through a coordinated series of steps. The ViT first divides the image into small, fixed-size square patches. These patches are linearly projected to form a sequence of token embeddings, with each patch treated as a separate token, allowing the model to process them individually. Positional embeddings are then added to the token embeddings to encode the relative positions of the patches within the image. The resulting sequence is passed to the Transformer encoder layers, each consisting of a self-attention mechanism and a feed-forward network (FFN). The self-attention mechanism enables the model to focus on different parts of the input image and capture dependencies between patches; the attention map is a matrix in which each cell represents the attention magnitude that a source token (row) pays to a target token (column). The FFNs process the intermediate representations to generate more complex features [52,53]. Overall, this process allows the ViT model to effectively analyze the content and spatial layout of the input image by leveraging self-attention and the Transformer-based architecture.
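The following PyTorch sketch makes this pipeline concrete with ViT-B/16-style dimensions (16-pixel patches, 768-dimensional tokens, 12 layers, 12 heads). It is a didactic approximation, not the authors' code: the production ViT differs in details such as pre-norm placement and GELU activations.

```python
import torch
import torch.nn as nn

class MiniViT(nn.Module):
    """Patchify -> linear embed -> +position -> Transformer encoder -> CLS head."""
    def __init__(self, img=224, patch=16, dim=768, depth=12, heads=12, classes=16):
        super().__init__()
        n = (img // patch) ** 2                       # number of patches (tokens)
        self.proj = nn.Conv2d(3, dim, patch, patch)   # patch split + projection
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))      # classification token
        self.pos = nn.Parameter(torch.zeros(1, n + 1, dim))  # position embeddings
        layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, classes)

    def forward(self, x):                             # x: (B, 3, img, img)
        t = self.proj(x).flatten(2).transpose(1, 2)   # (B, n, dim) patch tokens
        t = torch.cat([self.cls.expand(len(x), -1, -1), t], dim=1) + self.pos
        return self.head(self.encoder(t)[:, 0])       # logits from the CLS token
```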

3. Simulation results and discussion

We conducted all our simulations on an NVIDIA Tesla T4 GPU with 16 GB of VRAM. For the model, we utilized ViT-B/16 pretrained on ImageNet, a database created by Stanford researchers [54] consisting of 14 million images categorized into more than 20,000 classes. The main reason for using ImageNet pretraining is to make recognition and classification easier, more robust, and less resource-intensive. Our aim was to demonstrate the strong performance of ViT in OAM recognition, so we conducted several simulations to evaluate the model's capabilities.
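A fine-tuning setup along these lines can be sketched with torchvision, as below. The published torchvision weights are ImageNet-1k rather than the 21k-class corpus described above, and the optimizer, learning rate, and helper name are illustrative assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

# Load ViT-B/16 pretrained on ImageNet and replace the head for OAM classes.
num_classes = 16                                  # e.g., 16 multiplexed OAM states
model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
model.heads.head = nn.Linear(model.heads.head.in_features, num_classes)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # illustrative choice
criterion = nn.CrossEntropyLoss()

def finetune_epoch(model, loader, device="cuda"):
    """One fine-tuning pass over a DataLoader of (intensity image, label) batches."""
    model.to(device).train()
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images.to(device)), labels.to(device))
        loss.backward()
        optimizer.step()
```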

Initially, we trained our model on a limited set of four OAM modes (l = 0, +2, +4, +6). For each mode, we gathered 500 training samples and 100 testing samples, employing a medium AT ($C_N^2 = 1 \times 10^{-15}$) and a transmission distance of 2000 m, as can be seen from Fig. 4. Remarkably, our model achieved 100% accuracy, as shown in Fig. 5(a). All tested OAM modes are recognized correctly, demonstrating that the ViT can distinguish all the OAM modes, as is evident from the confusion matrix in Fig. 5(b): no OAM state is misclassified, i.e., classification is perfect. Subsequently, we conducted another training, this time expanding the set to eight OAM modes, with the same AT strength. Compared with previous reports, the ViT model converged surprisingly rapidly during training, even with a relatively small dataset [37], and reached a high accuracy of 99.7%, as shown in Fig. 5(c). In the confusion matrix of Fig. 5(d), almost all tested OAM modes are recognized correctly, with only two wrong predictions, demonstrating that even closely spaced modes can be separated clearly.
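Accuracies and confusion matrices of this kind follow from a standard evaluation loop; below is a minimal sketch, assuming a test DataLoader of (image, label) batches and the fine-tuned model from the previous sketch.

```python
import torch
from sklearn.metrics import accuracy_score, confusion_matrix

@torch.no_grad()
def evaluate(model, loader, device="cuda"):
    """Return test accuracy and the confusion matrix (cf. Fig. 5)."""
    model.to(device).eval()
    preds, trues = [], []
    for images, labels in loader:
        preds += model(images.to(device)).argmax(dim=1).cpu().tolist()
        trues += labels.tolist()
    return accuracy_score(trues, preds), confusion_matrix(trues, preds)
```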

Fig. 4. Intensity images of received LG beams (l from 1 to 8) after a 2000 m transmission under weak AT ($C_N^2 = 1 \times 10^{-16}$) and medium AT ($C_N^2 = 1 \times 10^{-15}$).

Fig. 5. OAM mode recognition accuracy based on ViT. (a) Accuracy and loss for four single OAM modes (l = 0, +2, +4, +6) over a 2000 m transmission under medium AT ($C_N^2 = 1 \times 10^{-15}$). (b) Corresponding confusion matrix. (c) Accuracy and loss for eight single OAM modes (l = 0 to +7) under the same conditions. (d) Corresponding confusion matrix.

After obtaining promising results with the ViT model in the previous simulations, we proceeded to explore multiplexed OAM modes. The dataset consists of 16 different OAM states formed by multiplexing four modes (+1, +3, +5, +7), as depicted in Fig. 6. First, we trained the ViT model on a dataset with medium AT ($C_N^2 = 1 \times 10^{-15}$). Notably, the ViT model exhibited rapid convergence, which can be attributed to its pretraining. This suggests that the ViT model can adapt quickly to new problems, even when trained on a smaller dataset than in the initial simulations, as can be seen from Fig. 7(a). In this case, we used 250 samples per class for training and 50 for testing. The confusion matrix in Fig. 7(b) shows that the ViT recognizes all 16 modes without a single wrong prediction, thanks in large part to its attention mechanism.

Fig. 6. Intensity images of received multiplexed LG beams after a 2000 m transmission under weak AT ($C_N^2 = 1 \times 10^{-16}$), medium AT ($C_N^2 = 1 \times 10^{-15}$), and strong AT ($C_N^2 = 1 \times 10^{-14}$).

Fig. 7. Recognition accuracy of 16 multiplexed OAM modes based on ViT. (a) Accuracy and loss over a 2000 m transmission under medium AT ($C_N^2 = 1 \times 10^{-15}$). (b) Corresponding confusion matrix.

Next, a strong AT ($C_N^2 = 1 \times 10^{-14}$) was applied, with training otherwise similar to the previous case; the recognition accuracy of the ViT dropped to 88%. To solve this problem, we first trained the ViT with a batch size of 1024, after which the model began to stabilize. Once stabilized, we repeatedly halved the batch size until reaching 64, at which point the recognition accuracy reached nearly 98%, as can be seen from Fig. 8. With a large batch size the model learns more generalized features, while smaller batch sizes introduce more noise and randomness into the training process. We also observe that all the wrong predictions involve OAM beams whose shapes resemble the true class: for example, class 14 (containing the OAM beams with l = +1, +3, +5) versus class 15 (l = +1, +3, +5, +7), or class 1 (l = +1) versus class 2 (l = +3), as can be seen from Fig. 9.
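A minimal sketch of this batch-size annealing is shown below, reusing the hypothetical finetune_epoch helper from the earlier sketch; train_set and epochs_per_stage are illustrative placeholders, not values reported here.

```python
from torch.utils.data import DataLoader

# Batch-size annealing: start large for generalized features, halve down to 64.
# train_set and epochs_per_stage are hypothetical placeholders.
epochs_per_stage = 5
for batch_size in (1024, 512, 256, 128, 64):
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    for _ in range(epochs_per_stage):
        finetune_epoch(model, loader)
```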

Fig. 8. Performance of recognizing 16 multiplexed OAM modes based on ViT. (a) Accuracy and loss over a 2000 m transmission under strong AT ($C_N^2 = 1 \times 10^{-14}$) with different batch sizes. (b) Zoomed-in accuracy curve with batch size 64. (c) Zoomed-in loss curve with batch size 64.

Fig. 9. Confusion matrix of the 16 multiplexed OAM modes.

After achieving good results with the ViT, we investigated the single-sample prediction time and compared it with EfficientNet, one of the strongest CNN architectures. EfficientNet is based on compound scaling of a CNN along several axes (network width, network depth, and input image resolution) to achieve high accuracy: depth is increased by adding network layers, width by increasing the number of neurons per layer, and higher-resolution input images are provided to capture more features. For this comparison we used a dataset of multiplexed OAM beams with medium AT ($C_N^2 = 1 \times 10^{-15}$). The EfficientNet was also pretrained on ImageNet and fine-tuned in the same manner as the ViT model. The accuracy of the two models is nearly the same: the ViT achieves 99.9%, while the EfficientNet achieves 99.38%. In terms of prediction time, the ViT's average prediction is up to 35% faster than EfficientNet's. This difference is particularly noteworthy given that we are dealing with predictions on light beams, the fastest information carrier known to date, and it addresses the needs of rapidly developing fields that demand high-speed transmission, such as the Internet of Things and self-driving cars. Every millisecond counts and can make a substantial difference in communication systems, where efficiency and speed are crucial for seamless data transmission and reception. In addition, as can be seen from Fig. 10(a), under medium AT ($C_N^2 = 1 \times 10^{-15}$) the ViT yields considerably better prediction accuracy than EfficientNet after 10 epochs of training for 4 OAM modes, 8 OAM modes, and 16 multiplexed OAM modes, respectively, showcasing its potential for future OAM-FSO communications.
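Per-sample latency comparisons of this kind can be reproduced with a GPU-synchronized timing loop, as sketched below. The paper does not name the EfficientNet variant, so efficientnet_b0 here is an assumption, and absolute numbers depend on the hardware (a Tesla T4 in this work).

```python
import time
import torch
from torchvision.models import efficientnet_b0, vit_b_16

def mean_latency_ms(model, n=100, device="cuda"):
    """Average single-image prediction time in milliseconds (GPU-synchronized)."""
    model.to(device).eval()
    x = torch.randn(1, 3, 224, 224, device=device)
    with torch.no_grad():
        for _ in range(10):                      # warm-up iterations
            model(x)
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(n):
            model(x)
        torch.cuda.synchronize()
    return (time.perf_counter() - t0) / n * 1e3

print("ViT-B/16       :", mean_latency_ms(vit_b_16()), "ms")
print("EfficientNet-B0:", mean_latency_ms(efficientnet_b0()), "ms")
```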

Fig. 10. (a) Performance comparison between ViT and EfficientNet in recognizing OAM modes after 10 epochs under medium AT ($C_N^2 = 1 \times 10^{-15}$) for four single OAM modes (l = 0, +2, +4, +6), eight single OAM modes (l = 0 to +7), and 16 multiplexed OAM modes. (b) Simulation of transmitting a picture using the multiplexed OAM modes.

To further illustrate the versatility of the ViT model in managing OAM communication, we conducted a demonstrative simulation involving the transmission of an image using multiplexed OAM modes over an extended distance of 2000 m under medium AT. This simulation aimed to demonstrate the robustness of the ViT model in real-world scenarios. A grayscale image of 200 × 200 pixels was used, and 240,000 OAM beams were generated during data transmission, as depicted in Fig. 10(b). The image was chosen to represent a challenging scenario in terms of data integrity and image quality. Despite the challenging atmospheric conditions and the considerable distance involved, the received image exhibited remarkable quality without any discernible loss of information, with a BER of 0.027%, showcasing the potential of ViT for future FSO communication.
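The exact bit-to-mode mapping is not specified; one plausible scheme, under the assumption that each of the 16 recognized classes carries a 4-bit symbol (two symbols per 8-bit pixel), would look like the following sketch.

```python
import numpy as np

def image_to_symbols(img_u8):
    """Split each 8-bit pixel of a uint8 image into two 4-bit symbols."""
    flat = img_u8.ravel()
    return np.stack([flat >> 4, flat & 0x0F], axis=1).ravel()

def symbols_to_image(symbols, shape):
    """Inverse mapping: recombine received 4-bit symbols into 8-bit pixels."""
    pairs = symbols.astype(np.uint8).reshape(-1, 2)
    return ((pairs[:, 0] << 4) | pairs[:, 1]).reshape(shape)

# Round trip on a 200 x 200 grayscale image (error-free channel)
img = np.random.randint(0, 256, (200, 200), dtype=np.uint8)
rx = symbols_to_image(image_to_symbols(img), img.shape)
assert np.array_equal(img, rx)
```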

4. Conclusion

In this study, we employed ViT for the recognition of OAM beams passing through AT. Notably, we are the first to apply the ViT model to this specific problem. To evaluate performance, we utilized datasets comprising simple OAM modes with medium AT, multiplexed OAM modes with medium AT, and multiplexed OAM modes with strong AT. Our results demonstrate the effectiveness of the ViT in OAM recognition, outperforming traditional CNN-based mode sorting methods. The simulations reveal that for the recognition of 4 simple OAM modes over a distance of 2000 m under medium AT, the accuracy reached 100%. For a more diverse set of 8 simple OAM modes with medium AT, the model exhibited exceptional accuracy, reaching 99.7%. Moreover, the ViT's robustness was further highlighted in recognizing 16 multiplexed OAM beams with medium AT, yielding an accuracy of nearly 100%. Even in the presence of strong AT, the model maintained impressive performance, with recognition accuracy for 16 multiplexed OAM beams nearing 98%. These results indicate the model's potential for real-world applications, especially in scenarios with challenging atmospheric conditions. Another important consideration in practical applications is computational efficiency: our simulations showed that the ViT's average prediction time is up to 35% faster than EfficientNet's, signifying the ViT's suitability for real-time and latency-sensitive applications where quick decisions are crucial. The outstanding accuracies achieved across different OAM modes and turbulence levels highlight the model's robustness and potential for various real-world applications, and our findings contribute to the fields of computer vision and pattern recognition. In addition, studying the applicability of the ViT-based method to other OAM-carrying beams, including Bessel-Gauss (BG) beams, is a direction for future work. Combining ViT with adaptive demodulation algorithms and turbulence compensation techniques may further enhance recognition accuracy and will be considered in our future investigations.

Funding

National Natural Science Foundation of China (61775050).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. L. Allen, M. W. Beijersbergen, R. J. C. Spreeuw, et al., “Orbital angular momentum of light and the transformation of Laguerre-Gaussian laser modes,” Phys. Rev. A 45(11), 8185–8189 (1992). [CrossRef]  

2. D. Gao, W. Ding, M. Nieto-Vesperinas, et al., “Optical manipulation from the microscale to the nanoscale: fundamentals, advances and prospects,” Light: Sci. Appl. 6(9), e17039 (2017). [CrossRef]  

3. B. Zhang, Z. D. Hu, J. Wu, et al., “Metasurface-based perfect vortex beams with trigonometric-function topological charge for OAM manipulation,” Opt. Lett. 48(9), 2409–2412 (2023). [CrossRef]  

4. M. Cheng, L. Guo, J. Li, et al., “Channel Capacity of the OAM-Based Free-Space Optical Communication Links With Bessel–Gauss Beams in Turbulent Ocean,” IEEE Photonics J. 8(1), 1–11 (2016). [CrossRef]  

5. K. Cheng, Z. Liu, Z. D. Hu, et al., “Generation of integer and fractional perfect vortex beams using all-dielectric geometrical phase metasurfaces,” Appl. Phys. Lett. 120(20), 1 (2022). [CrossRef]  

6. Z. Guo, C. F. Gong, H. J. Liu, et al., “Research advances of orbital angular momentum based optical communication technology,” Opto-Electronic Engineering 47(3), 1–34 (2020). [CrossRef]  

7. S. Fu, Y. Zhai, H. Zhou, et al., “Demonstration of high-dimensional free-space data coding/decoding through multi-ring optical vortices,” Chin. Opt. Lett. 17(8), 080602 (2019). [CrossRef]  

8. Z. Guo, A. Shporer, K. Hambleton, et al., “Tidally Excited Oscillations in Heartbeat Binary Stars: Pulsation Phases and Mode Identification,” Astrophys. J. 888(2), 95 (2020). [CrossRef]  

9. Z. Guo, Z. Wang, M. Irene, et al., “The Orbital Angular Momentum Encoding System With Radial Indices of Laguerre–Gaussian Beam,” IEEE Photonics J. 10(5), 1–11 (2018). [CrossRef]  

10. Z. Guo, Z. Pan, C. Gong, et al., “Research on router device of OAM optical communication,” Journal on Communications 41(11), 185–197 (2020). [CrossRef]  

11. H. Zhou, Z. Pan, M. I. Dedo, et al., “High-efficiency and high-precision identification of transmitting orbital angular momentum modes in atmospheric turbulence based on an improved convolutional neural network,” J. Opt. 23(6), 065701 (2021). [CrossRef]  

12. X. Ke and J. Wang, “Research progress of orbital angular momentum multiplexing communication in Xi'an University of Technology,” in 2017 IEEE/CIC International Conference on Communications in China (ICCC). (2017).

13. C. Cai, Y. Zhao, J. Zhang, et al., “Experimental demonstration of an underwater wireless optical link employing orbital angular momentum (OAM) modes with fast auto-alignment system,” in 2019 Optical Fiber Communications Conference and Exhibition (OFC). (IEEE, 2019). [CrossRef]  

14. Y. Ren, H. Huang, J. Y. Yang, et al., “Correction of phase distortion of an OAM mode using GS algorithm based phase retrieval,” in 2012 Conference on Lasers and Electro-Optics (CLEO). (IEEE, 2012).

15. M. I. Dedo, Z. Wang, K. Guo, et al., “OAM mode recognition based on joint scheme of combining the Gerchberg–Saxton (GS) algorithm and convolutional neural network (CNN),” Opt. Commun. 456, 124696 (2020). [CrossRef]  

16. H. Zhao, W. Deng, J. Li, et al., “Modified Gerchberg-Saxton algorithm-based probe-free wavefront distortion compensation of an OAM beam,” Optik 269, 169816 (2022). [CrossRef]  

17. M. Li, M. Cvijetic, Y. Takashima, et al., “Evaluation of channel capacities of OAM-based FSO link with real-time wavefront correction by adaptive optics,” Opt. Express 22(25), 31337–31346 (2014). [CrossRef]  

18. H. Chang, X. Yin, H. Yao, et al., “Low-complexity adaptive optics aided orbital angular momentum based wireless communications,” IEEE Trans. Veh. Technol. 70(8), 7812–7824 (2021). [CrossRef]  

19. L. Zhu, H. Yao, H. Chang, et al., “Adaptive optics for orbital angular momentum-based internet of underwater things applications,” IEEE Internet Things J. 9(23), 24281–24299 (2022). [CrossRef]  

20. B. Zhang, Z.D. Hu, J. Wang, et al., “Creating perfect composite vortex beams with a single all-dielectric geometric metasurface: erratum,” Opt. Express 31(1), 774–775 (2023). [CrossRef]  

21. M. Krenn, J. Handsteiner, M. Fink, et al., “Twisted light transmission over 143 km,” Proc. Natl. Acad. Sci. 113(48), 13648–13653 (2016). [CrossRef]  

22. X. Li, J. Huang, and L. Sun, “Identification of Orbital Angular Momentum by Support Vector Machine in Ocean Turbulence,” Journal of Marine Science and Engineering 10(9), 1284 (2022). [CrossRef]  

23. E. Lamilla, C. Sacarelo, M. S. Alvarez-Alvarado, et al., “Optical Encoding Model Based on Orbital Angular Momentum Powered by Machine Learning,” Sensors 23(5), 2755 (2023). [CrossRef]  

24. R. Sun, L. Guo, M. Cheng, et al., “Identifying orbital angular momentum modes in turbulence with high accuracy via machine learning,” J. Opt. 21(7), 075703 (2019). [CrossRef]  

25. A. ElHelaly, M. Kafafy, A. H. Mehanna, et al., “Hybrid machine learning detection for orbital angular momentum over turbulent MISO wireless channel,” IET Communications 14(22), 4116–4126 (2020). [CrossRef]  

26. S. Lohani, E. M. Knutson, M. O’Donnell, et al., “On the use of deep neural networks in optical communications,” Appl. Opt. 57(15), 4180–4190 (2018). [CrossRef]  

27. Z. Huang, P. Wang, J. Liu, et al., “All-Optical Signal Processing of Vortex Beams with Diffractive Deep Neural Networks,” Phys. Rev. Appl. 15(1), 014037 (2021). [CrossRef]  

28. P. Wang, X. Wenjie, H. Zebin, et al., “Diffractive Deep Neural Network for Optical Orbital Angular Momentum Multiplexing and Demultiplexing,” IEEE J. Sel. Top. Quantum Electron. 28(4), 1–11 (2022). [CrossRef]  

29. Q. Zhao, S. Hao, Y. Wang, et al., “Orbital angular momentum detection based on diffractive deep neural network,” Opt. Commun. 443, 245–249 (2019). [CrossRef]  

30. J. Zhou, Y. Yin, J. Tang, et al., “Recognition of high-resolution optical vortex modes with deep residual learning,” Phys. Rev. A 106(1), 013519 (2022). [CrossRef]  

31. P. L. Neary, A. T. Watnik, K. P. Judd, et al., “CNN classification architecture study for turbulent free-space and attenuated underwater optical OAM communications,” Appl. Sci. 10(24), 8782 (2020). [CrossRef]  

32. V. Raskatla and V. Kumar, “Deep learning assisted OAM modes demultiplexing,” in Fifteenth International Conference on Correlation Optics. (SPIE, 2021).

33. Z. Wang, Z. Wang, M. Chen, et al., “Coherent demodulated underwater wireless optical communication system based on convolutional neural network,” Opt. Commun. 534, 129316 (2023). [CrossRef]  

34. V. Raskatla, B. P. Singh, S. Patil, et al., “Speckle-based deep learning approach for classification of orbital angular momentum modes,” J. Opt. Soc. Am. A 39(4), 759–765 (2022). [CrossRef]  

35. X. Li, X. Li, L. Sun, et al., “Research on Orbital Angular Momentum Recognition Technology Based on a Convolutional Neural Network,” Sensors 23(2), 971 (2023). [CrossRef]  

36. Y. Zhang, H. Zhao, H. Wu, et al., “Recognition of Orbital Angular Momentum of Vortex Beams Based on Convolutional Neural Network and Multi-Objective Classifier,” in Photonics, (MDPI, 2023).

37. H. Qin, Q. Fu, W. Tan, et al., “Highly accurate OAM mode detection network for ring Airy Gaussian vortex beams disturbed by atmospheric turbulence based on interferometry,” J. Opt. Soc. Am. A 40(7), 1319–1326 (2023). [CrossRef]  

38. A. Dosovitskiy, L. Beyer, A. Kolesnikov, et al., “An image is worth 16 × 16 words: Transformers for image recognition at scale,” arXiv, arXiv:2010.11929 (2020). [CrossRef]  

39. M. Tan and Q. Le, “Efficientnet: Rethinking model scaling for convolutional neural networks,” in International conference on machine learning. (PMLR, 2019).

40. F. Pampaloni and J. Enderlein, “Gaussian, hermite-gaussian, and laguerre-gaussian beams: A primer,” arXiv, arXiv:0410021 (2004). [CrossRef]  

41. X. Ke, J. Wu, and S. Yang, “Research progress and prospect of atmospheric turbulence for wireless optical communication,” Chinese Journal of Radio Science 36(3), 323–339 (2021). [CrossRef]  

42. L. C. Andrews and R. L. Phillips, Laser Beam Propagation Through Random Media, 2nd ed. (SPIE Press, 2005).

43. R.J. Hill, “Models of the scalar spectrum for turbulent advection,” J. Fluid Mech. 88(3), 541–562 (1978). [CrossRef]  

44. A. Vaswani, N. Shazeer, N. Parmar, et al., “Attention is all you need,” Advances in Neural Information Processing Systems 30, 15 (2017).

45. D. Lukovnikov, A. Fischer, and J. Lehmann, “Pretrained transformers for simple question answering over knowledge graphs,” in The Semantic Web–ISWC 2019: 18th International Semantic Web Conference, Auckland, New Zealand, October 26–30, 2019, Proceedings, Part I 18. (Springer, 2019).

46. N.C. Camgoz, O. Koller, S. Hadfield, et al., “Sign language transformers: Joint end-to-end sign language recognition and translation,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. (2020).

47. Z. Shaheen, G. Wohlgenannt, and E. Filtz, “Large scale legal text classification using transformer models,” arXiv, arXiv:2010.12871 (2020). [CrossRef]  

48. J. Devlin, M.-W. Chang, K. Lee, et al., “BERT: Pre-training of deep bidirectional transformers for language understanding,” arXiv, arXiv:1810.04805 (2018). [CrossRef]

49. R. Strudel, R. Garcia, I. Laptev, et al., “Segmenter: Transformer for semantic segmentation,” in Proceedings of the IEEE/CVF international conference on computer vision. (2021).

50. M. Yang, “Visual Transformer for Object Detection,” arXiv, arXiv:2206.06323 (2022). [CrossRef]  

51. A. Arnab, M. Dehghani, G. Heigold, et al., “Vivit: A video vision transformer,” in Proceedings of the IEEE/CVF international conference on computer vision. (2021). [CrossRef]  

52. N. Park and S. Kim, “How do vision transformers work?” arXiv, arXiv:2202.06709 (2022). [CrossRef]  

53. A. Steiner, A. Kolesnikov, X. Zhai, et al., “How to train your ViT? Data, augmentation, and regularization in vision transformers,” arXiv, arXiv:2106.10270 (2021). [CrossRef]

54. J. Deng, W. Dong, R. Socher, et al., “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition. (2009).


