High-performance mode decomposition using physics- and data-driven deep learning

Zichen Tian; Li Pei; Jianshuai Wang; Kaihua Hu; Wenxuan Xu; Jingjing Zheng; Jing Li; Tigang Ning

doi:10.1364/OE.470445

1. Introduction

In recent years, few-mode fibers (FMFs) have received extensive attention due to their great potential in mode-division-multiplexed optical transmission [1,2], sensing [3,4], nonlinear effect studies [5,6], and fiber laser system improvements [7,8]. However, because multiple modes propagate simultaneously in an FMF, mode coupling is more likely to occur in practical applications than in single-mode fiber (SMF). The need for accurate characterization of mode information in FMFs has grown rapidly with the development of FMF-based devices. The mode decomposition (MD) technique, as an effective fiber mode analysis tool, can quantitatively reveal the beam properties associated with the transmitted modes in FMFs [9]. Through MD, both the modal weight (ρ²) and the modal relative phase (θ) of each mode are obtained. MD techniques are therefore applied broadly in mode instability monitoring [10], fiber-to-fiber coupling [11], mode-resolved gain analysis [12], adaptive mode control [13,14], laser self-focusing threshold measurement [15], and beam quality characterization [16].

Recently, a number of MD methods have been proposed, such as spatially and spectrally resolved imaging (S²) [17], optical correlation analysis (OCA) [18], wavefront measurement [19,20], inverse matrix solution [21,22], stochastic parallel gradient descent (SPGD) [23,24], and deep learning (DL) [9,11,25,26]. Among these methods, DL has proven to be an excellent solution in the MD field owing to its strong performance in image information analysis. The DL-based MD method extracts and analyzes the features of the beam pattern through a constructed neural network (NN) architecture. After the NN is trained, both the corresponding ρ² and θ are quantitatively obtained. DL-based MD schemes have therefore attracted the attention of researchers. As a numerical MD algorithm, the DL-based method needs only beam patterns as input images. It also offers a simple operation process, fast and accurate results, and modest requirements on the experimental environment and equipment. Moreover, NN-based MD schemes do not need to constrain the range of the initial values in order to avoid falling into a locally optimal solution. A series of effective DL-based MD algorithms has been proposed. In 2019, the first MD solution based on a VGG-16 convolutional NN (CNN) was proposed, achieving high-precision decomposition in the 5-mode case [9]. In 2020, X. Fan et al. [11] improved the MD accuracy by adding the errors associated with the reconstructed near-field and far-field beam patterns to the loss function during training. In the 6-mode case, the modal weight error was reduced to ∼1% and the modal relative phase error to ∼2%. In 2021, an NN combined with the principal component analysis (PCA) algorithm was proposed, in which the beam pattern is preprocessed by PCA before being fed to the NN to reduce the difficulty of feature extraction [26]. This method completes training in a short time without GPU acceleration, but the modal relative phase error is as high as 4% in the 3-mode case. To obtain high-accuracy MD results for 6 modes and above, S. Rothe et al. [25] used a DenseNet-CNN with up to 121 layers for MD in 2021, but it could only be trained on two GPUs simultaneously and consumes massive computational resources.

In summary, the current DL-based MD methods belong to traditional DL, which uses only data to drive the NN; we refer to this as data-driven deep learning (DDL). The DDL method relies strongly on the experience learned from the training dataset to analyze modal coefficients. Once the test beam pattern differs significantly from the training dataset or contains noise, serious errors result. In addition, when the fiber structure parameters change, the DDL-MD needs to retrain the NN with regenerated beam pattern datasets to maintain precision. The reason is that the NN does not extract the essential physical features from the training set to analyze the tested pattern. The DDL-MD therefore has several limitations. Firstly, in terms of accuracy, the errors of the modal coefficients are still high for 6 modes and above. Meanwhile, the signs of the MD results obtained by current DL-based schemes are indeterminate due to phase ambiguity [27]. Secondly, the DDL-based MD suffers from serious generalization problems: for tested patterns that differ greatly from the patterns in the training set, the errors of the modal weights and relative phases tend to be much higher than the average errors. Thirdly, the noise immunity of the DDL-MD methods deteriorates rapidly as the number of modes increases. Finally, even if the fiber structure parameters change only slightly, a massive set of corresponding beam patterns has to be regenerated as the training set and the NN has to be retrained, which is time-consuming and inconvenient. In general, relying solely on the dataset limits the performance of the DDL-based MD because of its poor ability to extract the real physical features and correct unexpected features.

In this paper, to overcome the limitations of the DDL-MD, we propose a physics- and data-driven deep learning (PDDL) method for high-performance MD. In PDDL, the network parameters are updated by combining the experience learned from the beam pattern dataset with the intrinsic physical modal features from the beam propagation model (BPM) of the FMF. Meanwhile, guided by the BPM of the FMF, the PDDL is able to cull the unexpected features that conflict with the physical laws. We compare the performance of the DDL-MD and PDDL-MD methods. In the 8-mode MD case, with the real modal weights and relative phases obtained, the errors of ρ² and θ are less than 0.25% and 0.65%, respectively. Furthermore, theoretical and experimental results demonstrate that the PDDL-MD performs well in generalization, noise resistance, and fiber parameter adaptation.

2. Methods

2.1 Principle of mode decomposition

The propagation field of FMF can be described as a linear superposition of the eigenmodes supported in the fiber. It can be expressed mathematically as

(1)$$U(r,\varphi )\textrm{ = }\sum\limits_{n = 1}^N {\sqrt {\rho _n^2} {e^{i{\theta _n}}}{\psi _n}} ({r,\varphi } )$$

where N is the total number of eigenmodes supported in the fiber. Under the weakly guiding approximation, we use linearly polarized (LP) modes to describe the eigenmodes [28]. ψ_n(r,φ) represents the normalized field distribution of the n-th mode. The modal coefficients of the n-th mode comprise the modal weight and the modal relative phase, denoted by ρ_n² and θ_n, respectively. The relative phase is defined as the phase difference between a higher-order mode and the fundamental mode LP₀₁. Commonly, the phase of the LP₀₁ mode is fixed as θ₁ = 0. Therefore, the beam intensity distribution in the FMF can be expressed as

(2)$$I(r,\varphi )\textrm{ = }{|{U(r,\varphi )} |^2}\textrm{ = }H({\rho ^2},\theta )$$

where H(·) stands for the mapping function that relates all modal weights ρ² and all modal relative phases θ to the beam intensity distribution I. The intensity distribution can be captured by a CCD as a near-field beam pattern. If the modal coefficients are known, the near-field image can be reconstructed according to the mapping function H(·). Conversely, if only the intensity distribution of the FMF is available, the solution for the modal coefficients depends on the inverse mapping function H^-1(·), which can be expressed as

(3)$$({\rho ^2},\theta ) = {H^{ - 1}}(I(r,\varphi ))$$

The DL-based MD algorithms can be regarded as schemes that fit the NN function, defined by a series of weights and biases, to the inverse mapping function H^-1(·) in the direction of minimizing the risk function. When only a near-field beam pattern is used as input to the numerical MD algorithm, the relative phase θ₂ of the LP_11e mode is restricted to the range from 0 to π in order to obtain real modal coefficient values with determined signs [29]. The range of the other relative phases θ_3,4,5…N is [-π, π]. Besides, with the introduction of normalization, the modal weights satisfy the inherent relationship ∑ ρ_n² = 1. Because θ₁ = 0 is fixed, MD aims to recover from the beam pattern the modal weights ρ²_1,2,3…N and the relative phases θ_2,3,4…N, that is, 2N-1 values in total.
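
The forward mapping H(·) of Eqs. (1) and (2) can be sketched numerically as follows. This is only an illustration: the helper names `toy_modes` and `intensity` are hypothetical, and simple real Gaussian-based profiles stand in for the LP eigenmodes of the actual fiber. The last lines show why a phase range has to be restricted: for real mode fields, flipping the sign of every relative phase leaves the intensity unchanged.

```python
import cmath
import math

def toy_modes(size=16, width=0.35):
    """Hypothetical stand-ins for LP01, LP11e, LP11o on a size x size grid.

    Real Gaussian-based profiles are an assumption made for illustration;
    they are not the eigenmodes of the fiber studied in the paper.
    """
    coords = [-1 + 2 * k / (size - 1) for k in range(size)]
    fields = [[], [], []]
    for y in coords:
        for x in coords:
            g = math.exp(-(x * x + y * y) / (width * width))
            fields[0].append(g)        # LP01-like profile
            fields[1].append(x * g)    # LP11e-like profile
            fields[2].append(y * g)    # LP11o-like profile
    # Normalize each field to unit power on the grid.
    normalized = []
    for f in fields:
        norm = math.sqrt(sum(v * v for v in f))
        normalized.append([v / norm for v in f])
    return normalized

def intensity(weights, phases, modes):
    """Eqs. (1)-(2): I = |sum_n sqrt(rho_n^2) exp(i theta_n) psi_n|^2."""
    coeffs = [math.sqrt(w) * cmath.exp(1j * t) for w, t in zip(weights, phases)]
    n_pix = len(modes[0])
    return [abs(sum(c * m[p] for c, m in zip(coeffs, modes))) ** 2
            for p in range(n_pix)]

modes = toy_modes()
weights = [0.5, 0.3, 0.2]      # modal weights, summing to 1
phases = [0.0, 1.0, -2.0]      # theta_1 is fixed to 0
I_plus = intensity(weights, phases, modes)
I_minus = intensity(weights, [-t for t in phases], modes)
# Largest pixel-wise difference between the two sign choices.
max_diff = max(abs(a - b) for a, b in zip(I_plus, I_minus))
```

Here `max_diff` vanishes up to rounding, so a single near-field pattern cannot distinguish θ from -θ unless one phase, such as θ₂, is confined to [0, π].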

2.2 Physics- and data-driven deep learning

The proposed PDDL-MD scheme is shown in Fig. 1. In the MD operation, the PDDL-MD is composed of two steps. The first step is a data-driven training process that extracts common features from beam pattern datasets. The second step is a physics-driven fine-tuning process that introduces the BPM of the FMF into the network optimization loop, enhancing the NN's ability to learn essential physical features. Thus, it can cull the unexpected features learned in the data-driven process that do not obey the physical laws of beam propagation in the FMF.

Fig. 1. Schematic diagram of PDDL-MD scheme.

As illustrated in Fig. 1, in the first, data-driven step the NN learns experience from the beam pattern dataset. The dataset should be generated so as to provide sufficient information for the NN. By calculating the normalized field distribution of each supported LP eigenmode, simulated near-field beam patterns with random ρ² and θ can be numerically generated as a grayscale dataset according to the fiber's eigenmode superposition. The dataset includes the training set D_T and the validation set D_V. The label vectors consist of the corresponding ρ² and θ. D_T is used to update the weights w and biases b of the NN, and D_V is used to tune the network's hyperparameters. During training, D_T is fed into the NN for multiple epochs, and the network parameters are updated to minimize the discrepancy between the label vectors and the output until the network converges. The training process can be expressed as

(4)$${R_{{\textrm{w}^\ast },{b^\ast }}} = \mathop {\arg \min }\limits_{w,b \in \Theta } {||{{R_{\textrm{w},b}}({{I_i}} )- ({\rho_i^2,{\theta_i}} )} ||^2},\textrm{ }\forall [{{I_i},({\rho_i^2,{\theta_i}} )} ]\in {D_T}$$

where R_w,b represents an NN defined by a set of network parameters w and b, I_i is a labeled beam pattern, and (ρ_i², θ_i) is the corresponding label vector from D_T. After training, a converged network R_w*,b* is obtained.
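
As a minimal sketch of how random label vectors for such a dataset might be drawn, the snippet below samples modal weights that sum to 1, θ₂ in [0, π], and the remaining relative phases in [-π, π], giving 2N-1 values per sample. The function name `random_label` and the sampling distributions (normalized uniform draws) are assumptions; the paper does not state how the random coefficients are distributed.

```python
import math
import random

def random_label(n_modes, rng):
    """Draw one label vector (rho^2_1..N, theta_2..N) of length 2N-1.

    Assumes n_modes >= 2. Normalizing uniform draws is an illustrative
    choice, not necessarily the sampling used by the authors.
    """
    raw = [rng.random() for _ in range(n_modes)]
    total = sum(raw)
    weights = [r / total for r in raw]            # modal weights sum to 1
    phases = [rng.uniform(0.0, math.pi)]          # theta_2 in [0, pi]
    phases += [rng.uniform(-math.pi, math.pi)     # theta_3..N in [-pi, pi]
               for _ in range(n_modes - 2)]
    return weights + phases

rng = random.Random(0)                            # fixed seed for repeatability
labels = [random_label(8, rng) for _ in range(5)]
```

Each label would then be passed through the forward mapping H(·) to produce the corresponding simulated beam pattern.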

The second, physics-driven step introduces the physics model of the FMF through a fine-tuning process to enhance the capabilities of the NN. The BPM of the FMF described by Eqs. (1)-(3) is exploited as a constraint on the feature learning process of the NN. This helps the NN learn essential physical features and cull the unexpected features learned in the data-driven step that conflict with the BPM. In the second step, the input is the single test beam pattern; the NN learns only from the physics model H(·), which corresponds to the BPM of the FMF, and thus finally obtains a viable solution that satisfies the physical laws of the FMF. Only iterations are needed to obtain the modal coefficients with high precision; no extra dataset or NN retraining is required in this step. To be specific, after the NN has been trained in the first step, the preliminary ρ² and θ predicted for the test pattern are used to reconstruct the corresponding beam pattern according to H(·). In the second step, the weights w and biases b of the network are optimized iteratively based on the errors between the test beam pattern and the reconstructed beam pattern. The process can be expressed as

(5)$${R_{{\textrm{w}^\ast }^{\ast },{b^\ast }^{\ast }}} = \mathop {\arg \min }\limits_{{w^{\ast}},{b^{\ast}} \in \Theta } {||{H({{R_{{\textrm{w}^{\ast}},{b^{\ast}}}}(I )} )- I} ||^2}$$

where I stands for the test beam pattern, which is not part of D_T. When the optimization is complete, the converged network R_w**,b** outputs the final predicted ${\tilde \rho ^2}$ and ${\tilde \theta}$, which can be expressed as

(6)$$\left( {\mathop {{\rho^2}}\limits^ \sim ,\mathop \theta \limits^ \sim } \right) = {R_{{\textrm{w}^\ast }^{\ast},{b^\ast }^{\ast}}}(I )$$
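
The reconstruction-error minimization of Eq. (5) can be illustrated with a deliberately small toy problem. The sketch below is not the authors' implementation: the trained network is replaced by two trainable numbers (a logit `u` for ρ₁² and the phase θ₂), two one-dimensional Hermite-Gaussian profiles stand in for the fiber modes, and finite-difference gradient descent with step halving stands in for backpropagation with Adam. All names are hypothetical.

```python
import math

XS = [-3 + 0.1 * k for k in range(61)]
# Toy real mode profiles (assumed, not the LP modes of the paper).
PSI1 = [math.exp(-x * x / 2) / math.pi ** 0.25 for x in XS]
PSI2 = [math.sqrt(2) * x * math.exp(-x * x / 2) / math.pi ** 0.25 for x in XS]

def forward(params):
    """Physics model H: (u, theta) -> intensity, with rho_1^2 = sigmoid(u)."""
    u, theta = params
    s = 1 / (1 + math.exp(-u))
    cross = 2 * math.sqrt(s * (1 - s)) * math.cos(theta)
    return [s * a * a + (1 - s) * b * b + cross * a * b
            for a, b in zip(PSI1, PSI2)]

def loss(params, target):
    """Squared reconstruction error ||H(params) - I||^2 as in Eq. (5)."""
    return sum((r - t) ** 2 for r, t in zip(forward(params), target))

def grad(params, target, h=1e-6):
    """Central finite-difference gradient (stand-in for backpropagation)."""
    g = []
    for k in range(len(params)):
        up = list(params)
        dn = list(params)
        up[k] += h
        dn[k] -= h
        g.append((loss(up, target) - loss(dn, target)) / (2 * h))
    return g

def fine_tune(params, target, iters=200, lr=1.0):
    """Gradient descent with step halving, so the loss never increases."""
    params = list(params)
    current = loss(params, target)
    for _ in range(iters):
        g = grad(params, target)
        step = lr
        while step > 1e-12:
            trial = [p - step * gk for p, gk in zip(params, g)]
            trial_loss = loss(trial, target)
            if trial_loss < current:
                params, current = trial, trial_loss
                break
            step /= 2
    return params, current

def logit(s):
    return math.log(s / (1 - s))

test_pattern = forward([logit(0.7), 1.2])   # plays the role of the test pattern I
initial = [logit(0.6), 0.9]                 # preliminary data-driven prediction
initial_loss = loss(initial, test_pattern)
tuned, final_loss = fine_tune(initial, test_pattern)
```

The point of the sketch is the structure of the loop: only the test pattern itself and the forward model are needed, and no labels or additional training data enter the update.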

It should be emphasized that an appropriate NN structure must be selected to obtain high-precision results with the PDDL-MD scheme.

The structure of the designed NN is shown in Fig. 2. It is a multi-output CNN with specialized branches. By dividing the modal coefficients into multiple learning tasks, the designed CNN can capture the different meanings of the modal weights and relative phases, which gives it good decomposition accuracy. Blocks 1 to 5 are common hidden layers, consisting of 3×3 convolutional layers and max-pooling layers, which extract and share common low-dimensional features across all tasks. Blocks 6 to 8 are task-specific hidden layers containing dropout layers and fully connected layers, which extract the specific high-dimensional features of the modal coefficients. In Blocks 1 to 7, every convolutional layer and fully connected layer uses Leaky_ReLU as the activation function. In Block 8, in order to ensure the validity of the output, the activation function is replaced by a softmax, sigmoid, or tanh function according to the value range of the modal weights and relative phases.
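
One plausible way to map the three output activations onto the value ranges given in Section 2.1 is sketched below. The assignment (softmax for the weights, sigmoid for θ₂, tanh for the other phases) and the scaling by π are assumptions made for illustration; the text only states that the activation is chosen according to the value of the outputs. The function name `output_heads` is hypothetical.

```python
import math

def softmax(z):
    """Numerically stable softmax; outputs are positive and sum to 1."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def sigmoid(v):
    return 1 / (1 + math.exp(-v))

def output_heads(weight_logits, theta2_logit, other_phase_logits):
    """Turn raw branch outputs into physically valid modal coefficients."""
    weights = softmax(weight_logits)                  # sum of rho_n^2 equals 1
    theta2 = math.pi * sigmoid(theta2_logit)          # theta_2 in (0, pi)
    others = [math.pi * math.tanh(v)                  # theta_3..N in (-pi, pi)
              for v in other_phase_logits]
    return weights, theta2, others

weights, theta2, others = output_heads([0.2, -1.0, 0.7], 0.3, [-2.0])
```

With such heads, every network output is a valid set of modal coefficients by construction, regardless of the raw values produced by the preceding layers.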

Fig. 2. Structure of the designed neural network which is a convolutional neural network suitable for multi-task learning.

In the implementation of the scheme, we adopt the following parameter settings. The input beam pattern size is 128×128×1. The leak parameter of Leaky_ReLU is 0.2. In the data-driven step, the learning rate is set between 10⁻⁴ and 10⁻⁶ in different epochs, and the momentum parameters of the Adam optimizer are β₁ = 0.9 and β₂ = 0.999. The mean absolute error (MAE) is applied to the sub-loss Loss_Di. In the physics-driven step, the reconstructed beam pattern has the same size as the input pattern. The network parameters are fine-tuned using the Adam optimizer with a learning rate of 10⁻⁵. The loss function used to calculate the difference between the reconstructed image and the input image can be expressed as

(7)$$L\textrm{os}{\textrm{s}_p} = \frac{1}{M}\sum\limits_{m = 1}^M {\left|{\mathop {{I_m}}\limits^ \sim{-} {I_m}} \right|}$$

where M represents the number of pixels in the beam pattern, and $\tilde{I}_m$ and I_m are the intensities of the reconstructed and the input pattern, respectively.
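
Eq. (7) is a pixel-wise mean absolute error. A minimal sketch, with patterns stored as flat lists of pixel intensities (the function name `loss_p` is chosen here for illustration):

```python
def loss_p(reconstructed, measured):
    """Eq. (7): mean absolute difference over the M pixels of two patterns."""
    if len(reconstructed) != len(measured):
        raise ValueError("patterns must contain the same number of pixels")
    m = len(measured)
    return sum(abs(r - i) for r, i in zip(reconstructed, measured)) / m

example = loss_p([0.0, 0.5, 1.0], [0.0, 0.25, 0.5])   # (0 + 0.25 + 0.5) / 3
```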

3. Results and discussion

All computations are performed on a desktop computer with an AMD R9-5900X CPU and an NVIDIA RTX 3060 GPU. To quantitatively evaluate the effectiveness of the PDDL-MD scheme, we examine its performance on both simulated and experimentally captured beam patterns.

3.1 Simulations

In order to verify the performance of the PDDL-MD scheme, a step-index FMF with a core radius of 12 µm and an NA of 0.1141 is considered. At the wavelength of 1550 nm, the first 8 modes supported in the fiber (LP₀₁, LP_11e, LP_11o, LP_21e, LP_21o, LP₀₂, LP_31e, and LP_31o) are taken as examples. The subscripts e and o indicate even and odd LP modes, respectively. Due to the mode degeneracy [9], there are four possible mode-superposition cases (superposition of the first 3, 5, 6, and 8 modes). For the four cases, about 100000, 200000, 400000, and 700000 simulated beam patterns with random modal coefficients are generated as D_T, respectively. In each mode-superposition case, another 2000 simulated patterns are used as the test set D_TE to verify the performance. In the first, data-driven step, for the 3-mode case, the total loss Loss_D converges to 0.08 after 30 epochs of training. When the mode number increases to 8, the NN structure becomes more complex and Loss_D converges to 0.35 after 55 epochs, as shown in Fig. 3(a). In the second, physics-driven step, the degree to which Loss_P is reduced differs for each test beam pattern. We randomly select three test beam patterns as examples. The three curves of Loss_P versus the number of iterations in the 8-mode case are shown in Fig. 3(b). Because the NN trained in the data-driven step has a different prediction accuracy for each test beam pattern, the initial value of Loss_P differs. After 200 iterations, Loss_P converges to below 0.0015 for the test beam patterns. The number of iterations needed for Loss_P to converge decreases with fewer modes. In the 3-mode case, it takes only 100 iterations for Loss_P to converge to below 0.0011.
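
As a quick plausibility check of the fiber parameters, the normalized frequency V = 2π·a·NA/λ can be compared with the textbook LP-mode cutoffs of a step-index fiber under the weakly guiding approximation. The cutoff values below are standard Bessel-function zeros and are not taken from the paper; the function names are hypothetical.

```python
import math

# (mode family, cutoff V, number of spatial orientations) for the lowest
# LP modes of a step-index fiber in the weakly guiding approximation.
LP_CUTOFFS = [
    ("LP01", 0.0, 1),
    ("LP11", 2.405, 2),
    ("LP21", 3.832, 2),
    ("LP02", 3.832, 1),
    ("LP31", 5.136, 2),
    ("LP12", 5.520, 2),
]

def v_number(core_radius_um, na, wavelength_um):
    """Normalized frequency V = 2 * pi * a * NA / lambda."""
    return 2 * math.pi * core_radius_um * na / wavelength_um

def guided_modes(v):
    """Mode families from the table whose cutoff lies below V."""
    return [(name, count) for name, cutoff, count in LP_CUTOFFS if v > cutoff]

v = v_number(12.0, 0.1141, 1.55)    # simulation fiber at 1550 nm
families = guided_modes(v)
```

For these parameters V is about 5.55, so LP₀₁, LP₁₁, LP₂₁, LP₀₂, and LP₃₁, which account for the eight spatial modes used here, all lie above cutoff. The LP₁₂ cutoff in this table (5.520) is only marginally below V, which is consistent with the text describing the eight modes as the first ones supported.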

Fig. 3. Loss evolution curves for (a) the data-driven step and (b) the physics-driven step in the 8-mode case.

To evaluate the precision of the PDDL-MD, the absolute weight error Δρ² is defined as |ρ_p² − ρ_r²| and the absolute relative phase error Δθ as |θ_p − θ_r|/2π. The subscripts r and p indicate the real and predicted modal coefficients, respectively. A correlation function is introduced to quantify the similarity between the reconstructed and original input patterns, which is expressed as [25]

(8)$$C = \left|{\frac{{\int\!\!\!\int {\triangle {I_{out}}(r,\varphi )\triangle {I_{in}}(r,\varphi )rdrd\varphi } }}{{\sqrt {\int\!\!\!\int {\triangle I_{out}^2(r,\varphi )rdrd\varphi \int\!\!\!\int {\triangle I_{in}^2(r,\varphi )rdrd\varphi } } } }}} \right|$$

where ΔI_i(r,φ) = I_i(r,φ) − $\bar{I}_i$ (i = out, in), and $\bar{I}_i$ stands for the mean intensity of the output reconstructed pattern I_out or the input real pattern I_in. The range of C is 0% to 100%.
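
The error metrics and the correlation of Eq. (8) can be sketched for patterns stored as flat lists of pixel intensities. On a uniform Cartesian pixel grid the area element r dr dφ corresponds to a constant pixel area that cancels between numerator and denominator, so plain sums are used here; that discretization, and the function names, are assumptions of this sketch.

```python
import math

def weight_error(rho2_pred, rho2_real):
    """Absolute modal weight error |rho_p^2 - rho_r^2|."""
    return abs(rho2_pred - rho2_real)

def phase_error(theta_pred, theta_real):
    """Absolute relative phase error |theta_p - theta_r| / (2 * pi)."""
    return abs(theta_pred - theta_real) / (2 * math.pi)

def correlation(i_out, i_in):
    """Discrete form of Eq. (8) for two equally sized, non-constant patterns."""
    mean_out = sum(i_out) / len(i_out)
    mean_in = sum(i_in) / len(i_in)
    d_out = [v - mean_out for v in i_out]
    d_in = [v - mean_in for v in i_in]
    num = sum(a * b for a, b in zip(d_out, d_in))
    den = math.sqrt(sum(a * a for a in d_out) * sum(b * b for b in d_in))
    return abs(num / den)

pattern = [0.1, 0.4, 0.9, 0.3, 0.0]
rescaled = [2 * v + 0.05 for v in pattern]   # same shape, different scale/offset
c_same = correlation(pattern, pattern)
c_rescaled = correlation(pattern, rescaled)
```

Because the mean is subtracted and the result is normalized, C is insensitive to a global intensity scale or offset between the two patterns.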

3.1.1 High accuracy output

In order to evaluate the accuracy of the proposed scheme, we compare the modal coefficient errors of the DDL-MD and PDDL-MD. Figure 4 shows the average errors over D_TE for both the DDL-MD and PDDL-MD under the same simulation conditions. Figure 4(a) illustrates the modal weight error as a function of the mode number. The weight error Δρ² of the PDDL-MD scheme is 0.21% in the 3-mode case, which is slightly larger than that of the DDL-MD. The reason is that we aim to obtain the statistically minimal average Δρ² and Δθ at the same time. This small growth of Δρ² is accompanied by a large reduction of Δθ, as shown in Fig. 4(b). In the 3-mode case, Δθ of the PDDL-MD is only 0.39%, less than half of the 0.86% obtained with the DDL-MD. As the mode number increases, the accuracy of the PDDL-MD remains stable. When the mode number increases from 3 to 8, Δρ² and Δθ increase by less than 0.05% and 0.3% (shown by the red line), respectively. In the 8-mode MD case, the Δρ² of the PDDL-MD is 0.24%. Compared with the DDL-MD, Δθ drops from 3.76% to 0.64%, a roughly sixfold reduction.

Fig. 4. Comparison of the PDDL-MD and the DDL-MD schemes on (a) modal weights and (b) modal relative phase errors.

Some typical MD examples of the PDDL-MD scheme for the four cases (3-, 5-, 6-, and 8-mode superposition), with the corresponding correlation values and residual patterns, are given in Fig. 5. The average correlation is higher than 99.96%, which demonstrates high similarity between the original and reconstructed patterns. The results show that the PDDL-MD attains high and stable accuracy despite the growth of the mode number.

Fig. 5. Typical MD examples of the PDDL scheme in (a) 3-mode, (b) 5-mode, (c) 6-mode, and (d) 8-mode cases. ORI: original pattern, REC: reconstructed pattern, RES: residual pattern, COR: correlation.

3.1.2 Generalization ability

We also investigate the generalization of the PDDL-MD method. The generalization of a DL-based MD method refers to its ability to process test data (the input of the NN) beyond the training set D_T. The generalization ability affects how much the MD error fluctuates across D_TE. Here, the error variance is used to evaluate the degree of MD error fluctuation. Based on the same D_T and D_TE, the modal coefficient error variances are evaluated, which are expressed as

(9)$${V_{\Delta {\rho ^2}}} = \frac{1}{Z}\sum\limits_{i = 1}^Z {{{\left( {\Delta \rho_i^2 - \overline {\Delta {\rho^2}} } \right)}^2}}$$

(10)$${V_{\Delta \theta }} = \frac{1}{Z}\sum\limits_{i = 1}^Z {{{\left( {\Delta {\theta_i} - \overline {\Delta \theta } } \right)}^2}}$$

where Z represents the number of samples in D_TE, and $\overline{\Delta\rho^2}$ and $\overline{\Delta\theta}$ are the mean values of Δρ² and Δθ, respectively.
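
A minimal sketch of the error variance over a test set, assuming the usual population variance (mean squared deviation from the mean); the function name `error_variance` and the sample values are illustrative only.

```python
def error_variance(errors):
    """Population variance of per-sample errors over the Z test samples."""
    z = len(errors)
    mean = sum(errors) / z
    return sum((e - mean) ** 2 for e in errors) / z

# Four made-up per-sample errors with mean 0.25.
sample_errors = [0.1, 0.2, 0.3, 0.4]
variance = error_variance(sample_errors)   # (0.0225 + 0.0025 + 0.0025 + 0.0225) / 4
```

A small variance means the error of individual test patterns stays close to the average error, which is how generalization is assessed in this section.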

The statistical modal coefficient error variances over D_TE for both the DDL-MD and PDDL-MD are illustrated in Fig. 6. Overall, the PDDL-MD performs better than the DDL-MD in both V_Δρ² and V_Δθ. As shown in Fig. 6(a), in the 3-mode case the V_Δρ² of the PDDL-MD is 0.028, which is somewhat higher than that of the DDL-MD, because the PDDL scheme aims to lower V_Δρ² and V_Δθ simultaneously. This small growth of V_Δρ² is accompanied by a large reduction of V_Δθ, as shown in Fig. 6(b): compared with the DDL-MD, the V_Δθ of the PDDL-MD is reduced from 3.00 to 0.901 in the 3-mode case. For the 8-mode case, Fig. 6(a) shows that V_Δρ² of the DDL-MD is larger than 0.06, whereas that of the PDDL-MD is lower than 0.025. The relationship between V_Δθ and the mode number is shown in Fig. 6(b). Compared with the DDL-MD, the PDDL-MD reduces V_Δθ from 7.847 to 0.882 in the 8-mode case, a reduction of nearly 10 times, and keeps V_Δθ below 0.91. The error variance results show that the PDDL-MD scheme greatly reduces the error fluctuation over D_TE, which means that high MD accuracy can be maintained for most mode-superposition cases.

Fig. 6. The generalization performance of PDDL-MD and DDL-MD on (a) modal weights error variance and (b) relative phases error variance.

Some typical examples related to generalization in the 8-mode case are shown in Fig. 7. Here, the original test patterns differ greatly from the patterns in D_T. For a single beam pattern, the improvement in Δρ² and Δθ can also be used to evaluate the generalization ability. Compared with the DDL-MD, Δθ of the PDDL-MD is reduced by up to 100 times, and Δρ² by up to 12 times. The 4th example in Fig. 7 shows that when a large MD error occurs with the DDL scheme, the beam pattern reconstructed by the PDDL-MD still shows very high similarity with the original test pattern. The results demonstrate the superiority of the PDDL scheme in generalization ability.

Fig. 7. Some typical examples for generalization ability of the PDDL-MD in the 8-mode case.

3.1.3 Noise resistance

In practical MD operations, noise is unavoidable because real devices are susceptible to disturbance. Since the noise is random, it severely hampers the extraction of mode features by a numerical MD algorithm, resulting in a large MD error. This is the main drawback in the practical application of current numerical MD methods. For the PDDL-MD method, however, the embedded BPM of the FMF helps the NN identify noise based on the essential physical features. To quantitatively verify the noise resistance of the PDDL-MD scheme, noise is added to the input test patterns. To be specific, the intensity value of every pixel of the ideal test pattern is multiplied by a factor equal to 1 + N(0,1)·σ. Here N(0,1) is the standard normal distribution and σ represents the noise intensity [9]. It should be noted that the threshold value of σ in practical scenarios is 0.08 [30].
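
The multiplicative noise model described above can be sketched as follows, with a pattern stored as a flat list of pixel intensities. The function name `add_noise` is hypothetical, and no clipping of negative values is applied because the text does not mention any.

```python
import random

def add_noise(pattern, sigma, rng):
    """Multiply every pixel by 1 + N(0,1) * sigma, drawn independently."""
    return [v * (1 + rng.gauss(0.0, 1.0) * sigma) for v in pattern]

rng = random.Random(1)                  # fixed seed for repeatability
clean = [0.0, 0.2, 0.5, 1.0]
unchanged = add_noise(clean, 0.0, rng)  # sigma = 0 leaves the pattern as it is
noisy = add_noise(clean, 0.08, rng)
```

Because the noise is multiplicative, dark pixels stay dark while bright pixels fluctuate in proportion to their intensity.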

Taking the 8-mode case as an example, 1000 test patterns are generated for each σ. The modal coefficient errors under different σ are depicted in Fig. 8. Figure 8(a) shows Δρ² under different σ in the 8-mode case. The PDDL-MD shows strong noise immunity: at σ = 0.12, Δρ² is only 0.67%, a tenfold reduction compared with the DDL-MD method. The same conclusion can be drawn from the comparison of Δθ shown in Fig. 8(b). Compared with the traditional DDL-MD, the Δθ of the PDDL-MD algorithm drops from 13.65% to 2.41%, a reduction of more than 5 times. The results confirm that the PDDL-MD can perform high-precision MD on input patterns with strong noise.

Fig. 8. The performance of the PDDL-MD and DDL-MD method on (a) modal weights error and (b) relative phases error in the 8-mode case under different noise intensity σ.

In order to visualize the superiority of the PDDL method in terms of noise immunity, some typical examples of PDDL-MD on input patterns with different σ are illustrated in Fig. 9. As σ increases from 0 to 0.12, the reconstructed beam pattern maintains a high similarity with the original one, with a pattern correlation higher than 0.9991. This demonstrates that the proposed scheme greatly improves the noise tolerance, which helps broaden the application range of numerical MD algorithms.

Fig. 9. Input and reconstructed beam patterns from the PDDL-MD under different noise intensity σ. The value included in the pattern is the correlation with the original image. The pattern in the red rectangle is the original image. REC: reconstructed pattern.

3.1.4 Adaptive enhancement

For the traditional DL-based MD method, a dataset must first be generated based on fixed FMF structure parameters. The generated dataset is used to train the NN until convergence, and the trained network can then output the MD result. However, if MD is to be performed for other kinds of FMFs, new datasets need to be generated and the NN needs to be retrained. This greatly hinders the large-scale and multi-scenario use of MD in practice. With the PDDL-MD method, in contrast, fast and high-precision MD can be achieved for a series of fibers with similar structures by simply changing the fiber structure parameters of the embedded BPM and re-performing the physics-driven step. This is because the NN can relearn the essential physical features based on the updated physics model, which helps it analyze an FMF whose structural parameters have changed. It is worth noting that the whole process neither requires regenerating datasets based on the new fiber structural parameters nor relies on any dataset to fine-tune the NN parameters.

In order to investigate the adaptive enhancement of the PDDL-MD, test beam patterns are generated with different core radii (R) and numerical apertures (NA). Taking the 3-mode case as an example, the variations of the modal coefficient errors with the difference ΔR between the changed core radius and the original core radius are depicted in Fig. 10(a) and Fig. 10(b), respectively. From Fig. 10(a), it can be seen that if ΔR < 12 µm, Δρ² of the PDDL-MD is less than 0.5%. Compared with the DDL-MD, Δρ² is reduced by 15 times when ΔR = 12 µm. Similarly, Δθ is reduced from 6.61% to 1.22%, as illustrated in Fig. 10(b). The variations of Δρ² and Δθ with the difference ΔNA between the changed NA and the original NA are shown in Fig. 10(c) and Fig. 10(d), respectively. Δρ² stays below 0.5% regardless of the NA fluctuation. Besides, with a large variation of ΔNA = 0.1, Δθ of the PDDL-MD is lower than 2.08%. The PDDL-MD thus makes it possible to obtain the modal coefficients in different kinds of FMFs with high precision without retraining the NN on regenerated datasets.

Fig. 10. Adaptive performance of the PDDL-MD and DDL-MD methods. (a) Modal weights error and (b) relative phases error in the 3-mode case under different tested fiber core radius. (c) Modal weights error and (d) relative phases error in the 3-mode case under different tested fiber NA.

3.1.5 Time cost

Although the PDDL-MD consists of two NN update steps, it is still very cost-effective. When MD is performed, only the physics-driven step needs to be executed to output high-precision MD results, since the data-driven step can be completed in advance. Furthermore, in all cases studied in this paper, the PDDL method completes one MD within at most 200 iterations (∼5 s) of the physics-driven step. Note that the proposed scheme is implemented here in Python and could be implemented in more efficient programming languages (such as C/C++) to speed it up. This suggests that the PDDL-MD scheme can run quickly while maintaining high performance.

3.2 Experiments

The performance of the PDDL-MD is investigated experimentally based on real near-field beam patterns captured by a CCD. A 3-mode commercial step-index FMF (FM SI-2) with standard cladding size produced by Y.O.F.C. is used. At the wavelength of 1550 nm, it has a core radius of 7 µm and an NA of 0.1141. The all-fiber experimental setup for recording real beam patterns is illustrated in Fig. 11. The light emitted by the laser source (Santec TSL-510) at the wavelength of 1550 nm, whose power is adjusted by the variable optical attenuator (VOA), is divided into three parts by an optical beam splitter. Each of the three beams passes through a polarization controller so that one of the polarization components of the beam can be selected. Mode conversion is implemented by a 3-mode photonic lantern (LP₀₁, LP_11e, and LP_11o), and the beam is coupled into the test FMF at the output of the photonic lantern. A CCD camera (Bobcat-320-GigE, pixel size: 20 µm) captures the output near-field beam from the FMF end face with the assistance of an objective lens. A total of 100 real near-field beam patterns with different mode superpositions in the 3-mode case are collected as test images.

Fig. 11. Schematic diagram of the experimental setup for capturing real beam patterns. VOA, variable optical attenuator; SMF, single-mode fiber; OBS, optical beam splitter; PC, polarization controller; PL, photonic lantern; FMF, few-mode fiber; OBJ, microscope objective lens; CCD, charge-coupled device.

For the entire test dataset consisting of real beam patterns recorded by a CCD, the average correlation between the real patterns and the patterns reconstructed from the predicted modal coefficients reaches 98%. Some typical examples with the corresponding reconstructed patterns and correlation values are shown in Fig. 12. The experimental results are slightly lower than the simulated ones because the simulated test datasets are numerically generated under ideal conditions, whereas the real test datasets are captured in the experiment by a common CCD without adaptive functions. An unprocessed image captured by the CCD inevitably suffers from disturbance, various types of noise, and pattern center shift. We believe that the practical MD accuracy will be further improved with the help of denoising equipment and a rigorous experimental environment.

Fig. 12. Typical experimental examples based on real beam patterns. ORI: original real beam pattern, REC: reconstructed beam pattern, COR: correlation value.

4. Conclusion

In conclusion, a PDDL-MD method is proposed and demonstrated. By introducing the BPM of the FMF into the NN, the PDDL-MD extracts essential physical features and culls the unexpected features that conflict with the physical laws. We quantitatively evaluate the PDDL-MD scheme in terms of accuracy, generalization ability, noise resistance, and adaptive enhancement. In the 8-mode MD case, with the real modal coefficients obtained, Δρ² and Δθ of the PDDL-MD are less than 0.25% and 0.65%, respectively. Furthermore, the PDDL-MD method greatly improves the generalization ability of the NN. In the 8-mode case, Δρ² and Δθ can be reduced by up to 12 times and 100 times, respectively, for beam patterns that differ greatly from D_T. In addition, the proposed method is robust against noise even in the 8-mode MD case with a noise intensity σ of 0.12. More strikingly, the PDDL-MD can maintain high precision for different kinds of FMFs without regenerating datasets or retraining the NN. Meanwhile, the PDDL-MD also achieves high correlation on real beam patterns in the experiment. We believe that further efforts on increasing the variety of NN structures will help bring the PDDL-MD to practical application.

Funding

National Key Research and Development Program of China (2018YFB1801003); National Natural Science Foundation of China (61827817).

Disclosures

The authors declare that there are no conflicts of interest related to this paper.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. G. Rademacher, B. J. Puttnam, R. S. Luis, T. A. Eriksson, N. K. Fontaine, M. Mazur, H. Chen, R. Ryf, D. T. Neilson, P. Sillard, F. Achten, Y. Awaji, and H. Furukawa, “Ultra-wide band transmission in few-mode fibers,” in European Conference on Optical Communication (ECOC) (2021), 1–4.

2. M. Zuo, D. Ge, J. Liu, Y. Gao, L. Shen, X. Lan, Z. Chen, Y. He, and J. Li, “Long-haul intermodal-MIMO-free MDM transmission based on a weakly coupled multiple-ring-core few-mode fiber,” Opt. Express 30(4), 5868–5878 (2022).

3. J. Jia, J. Cui, J. Zhang, M. Zuo, Y. Gao, Z. Chen, Y. He, and J. Li, “Distributed vibration sensor based on mode coupling in weakly coupled few-mode fibers,” Opt. Lett. 47(7), 1717–1720 (2022).

4. Y. Li, Z. Song, J. Pan, H. Lu, and J. Hu, “In-line reflected fiber sensor for simultaneous measurement of temperature and liquid level based on tapered few-mode fiber,” Opt. Express 30(5), 7870–7882 (2022).

5. H. K. Chandrasekharan, K. Ehrlich, M. G. Tanner, D. M. Haynes, S. Mukherjee, T. A. Birks, and R. R. Thomson, “Observing mode-dependent wavelength-to-time mapping in few-mode fibers using a single-photon detector array,” APL Photonics 5(6), 061303 (2020).

6. C. Fan, Y. An, T. Yao, H. Xiao, L. Huang, J. Xu, J. Leng, and P. Zhou, “Seeing the beam cleanup effect in a high-power graded-index-fiber Raman amplifier based on mode decomposition,” Opt. Lett. 46(17), 4220–4223 (2021).

7. J. Lv, H. Li, Y. Zhang, R. Tao, Z. Dong, C. Gu, P. Yao, Y. Zhu, W. Chen, Q. Zhan, and L. Xu, “Few-mode random fiber laser with a switchable oscillating spatial mode,” Opt. Express 28(26), 38973–38982 (2020).

8. X. B. Lin, Y. X. Gao, J. G. Long, J. W. Wu, W. Y. Hong, H. Cui, Z. C. Luo, W. C. Xu, and A. P. Luo, “All Few-mode Fiber Spatiotemporal Mode-Locked Figure-eight Laser,” J. Lightwave Technol. 39(17), 5611–5616 (2021).

9. Y. An, L. Huang, J. Li, J. Leng, L. Yang, and P. Zhou, “Learning to decompose the modes in few-mode fibers with deep convolutional neural network,” Opt. Express 27(7), 10127–10137 (2019).

10. L. Huang, T. Yao, J. Leng, S. Guo, R. Tao, P. Zhou, and X. a. Cheng, “Mode instability dynamics in high-power low-numerical-aperture step-index fiber amplifier,” Appl. Opt. 56(19), 5412–5417 (2017).

11. X. Fan, F. Ren, Y. Xie, Y. Zhang, J. Niu, J. Zhang, and J. Wang, “Mitigating ambiguity by deep-learning-based modal decomposition method,” Opt. Commun. 471, 125845 (2020).

12. C. Jollivet, A. Mafi, D. Flamm, M. Duparré, K. Schuster, S. Grimm, and A. Schülzgen, “Mode-resolved gain analysis and lasing in multi-supermode multi-core fiber laser,” Opt. Express 22(24), 30377–30386 (2014).

13. T. Qiu, I. Ashry, A. Wang, and Y. Xu, “Adaptive Mode Control in 4- and 17-Mode Fibers,” IEEE Photonics Technol. Lett. 30(11), 1036–1039 (2018).

14. Y. An, J. Li, L. Huang, L. Li, J. Leng, L. Yang, and P. Zhou, “Numerical mode decomposition for multimode fiber: From multi-variable optimization to deep learning,” Opt. Fiber Technol. 52, 101960 (2019).

15. X. Shen, H. Zhang, and M. Gong, “High Energy (100 mJ) and High Peak Power (8 MW) Nanosecond Pulses Delivered by Fiber Lasers and Self-Focusing Analysis Based on a Novel Mode Decomposition Method,” IEEE J. Sel. Top. Quantum Electron. 24(3), 1–6 (2018).

16. G. Bai, X. Chen, Y. Yang, Y. Zheng, X. Zhao, K. Liu, C. Zhao, Y. Qi, B. He, and J. Zhou, “Beam quality evaluation of 20/400 µm large-mode-area fiber based on mode decomposition and reconstruction,” Laser Phys. 28(2), 025101 (2018).

17. J. W. Nicholson, A. D. Yablon, S. Ramachandran, and S. Ghalmi, “Spatially and spectrally resolved imaging of modal content in large-mode-area fibers,” Opt. Express 16(10), 7233–7243 (2008).

18. T. Kaiser, D. Flamm, S. Schröter, and M. Duparré, “Complete modal decomposition for optical fibers using CGH-based correlation filters,” Opt. Express 17(11), 9347–9356 (2009).

19. M. Lyu, Z. Lin, G. Li, and G. Situ, “Fast modal decomposition for optical fibers using digital holography,” Sci. Rep. 7(1), 6556 (2017).

20. J. Li, X. Zhang, Y. Zheng, F. Li, X. Shan, Z. Han, and R. Zhu, “Fast fiber mode decomposition with a lensless fiber-point-diffraction interferometer,” Opt. Lett. 46(10), 2501–2504 (2021).

21. E. Manuylovich, A. Donodin, and S. Turitsyn, “Intensity-only-measurement mode decomposition in few-mode fibers,” Opt. Express 29(22), 36769–36783 (2021).

22. P. S. Anisimov, V. V. Zemlyakov, and J. Gao, “2D least-squares mode decomposition for mode division multiplexing,” Opt. Express 30(6), 8804–8813 (2022).

23. L. Li, J. Leng, P. Zhou, and J. Chen, “Multimode fiber modal decomposition based on hybrid genetic global optimization algorithm,” Opt. Express 25(17), 19680–19690 (2017).

24. K. Choi and C. Jun, “Sub-sampled modal decomposition in few-mode fibers,” Opt. Express 29(20), 32670–32681 (2021).

25. S. Rothe, Q. Zhang, N. Koukourakis, and J. Czarske, “Intensity-Only Mode Decomposition on Multimode Fibers Using a Densely Connected Convolutional Network,” J. Lightwave Technol. 39(6), 1672–1679 (2021).

26. H. Gao, Z. Chen, Y. X. Zhang, W. G. Zhang, H. F. Hu, and T. Y. Yan, “Rapid Mode Decomposition of Few-Mode Fiber By Artificial Neural Network,” J. Lightwave Technol. 39(19), 6294–6300 (2021).

27. R. Brüning, P. Gelszinnis, C. Schulze, D. Flamm, and M. Duparré, “Comparative analysis of numerical methods for the mode analysis of laser beams,” Appl. Opt. 52(32), 7769–7777 (2013).

28. A. W. Snyder and J. D. Love, Optical Waveguide Theory (Springer, 1983).

29. E. S. Manuylovich, V. V. Dvoyrin, and S. K. Turitsyn, “Fast mode decomposition in few-mode fibers,” Nat. Commun. 11(1), 5507 (2020).

30. A. Liu, T. Lin, H. Han, X. Zhang, Z. Chen, F. Gan, H. Lv, and X. Liu, “Analyzing modal power in multi-mode waveguide via machine learning,” Opt. Express 26(17), 22100–22109 (2018).

Optics Express