
Deep learning for multi-star recognition in optical turbulence

Open Access

Abstract

In the presence of strong turbulence, it is difficult to recognize close stars in ground-based imaging systems. Although adaptive optics can help reconstruct such images, some uncorrected residual phase always remains under different turbulence conditions and can hinder the recognition of close stars. Considering this, we introduce a classification-based method that uses a deep learning network to distinguish such star systems without correcting the wavefronts. To this aim, we configured a Convolutional Neural Network (CNN). Five turbulence models were used to generate a dataset of thousands of images, and four metrics were used to evaluate the CNN after the learning process. The accuracy of the network was above 80% for all of the turbulence models. A detailed comparison of the five turbulence models based on these metrics is presented, and the robustness of the deep learning network is reported.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

The atmosphere of the Earth is a turbulent medium that degrades ground-based images. The degradation results from random variations of temperature, which cause random variations in the refractive index of the air. These variations directly affect the phase, and also the angle of arrival, of the optical wave reaching an astronomical imaging system. Adaptive optics is a useful solution to overcome these unwanted effects. The atmosphere randomly imposes a combination of aberrations on wavefronts. Several models have been defined to interpret and formulate the optical turbulence of the atmosphere; they are described in the following section and are used to generate phase screens. In this connection, a number of works have used such phase screens to simulate atmospheric turbulence and mitigate its impact on imaging systems [1–5].

The usual way to remove the effect of turbulence from images is deblurring, which is typically done either with image processing techniques or with adaptive optics, depending on the purpose. For image processing methods, turbulence makes features and patterns so unclear that it is hard to distinguish the type of close star system (e.g., double stars), and because turbulence strength varies, it is difficult to find a single algorithm that works across scenarios. On the other hand, after wavefront sensing and phase correction in adaptive optics, some residual wavefront error always remains, so an ideal image cannot be achieved; adaptive optics may therefore fail to recognize a double star. In addition, there are practical limitations related to adapting the measured wavefront to the phase-correcting element, so the phase correction is never complete. Knowing these facts, we propose a deep learning method to recognize close star systems directly in the presence of atmospheric turbulence, without any phase correction or image processing. The deep learning network takes a large dataset that is generated by the methods described in the following section.

Recently, many works in optics have been based on machine learning. This field is growing fast and has become one of the widely used tools for automation purposes, including adaptive optics and wavefront sensing [6–13]. Machine learning is capable of extracting meaningful information from raw data and can turn datasets into significant insights. Currently, deep learning algorithms such as the CNN [14], Long Short-Term Memory (LSTM) networks [15], Generative Adversarial Networks (GANs) [16], and Auto-encoders (AEs) [17] have made significant progress in artificial intelligence. Artificial Neural Networks (ANNs) are the key components of most machine learning-based methods. The structure of an ANN imitates the human brain: the network is made up of successive layers of neurons connected to each other. Peterson et al. explain ANNs in detail [18].

Regarding the application of machine learning to the categorization of star systems, Čokina et al. introduced a method for the automatic classification of eclipsing binary stars using deep learning [19]. They considered two classes for the light curves, detached and over-contact binary stars, and compared different deep learning configurations such as CNN, LSTM, and GRU (Gated Recurrent Unit). Their dataset consists of a large amount of synthetic data generated with the ELISa software they created. They reported that a combination of LSTM and CNN achieves an accuracy above 98% and thereby presented an automatic approach to classifying binary stars. To highlight the differences between that work and the present study: we consider the turbulence effect on images taken from the Earth, and the type of data is different, since we simulate two-dimensional image arrays as input data whereas they generated one-dimensional light-curve vectors. Besides, they studied two-star systems with three categories of star separation, while in the present study multi-star systems are considered. In the following, by comparing the different turbulence models, we show that a deep learning network with high accuracy can be built and could be considered for automation purposes.

2. Theoretical framework

The most well-known model describing optical turbulence is the Kolmogorov model. Its power spectrum is defined as [20]:

$${\Phi ^K}(k) = 0.033\,C_n^2\,{k^{ - 11/3}},$$
where $C_n^2$ is the refractive-index structure constant and k is the angular spatial frequency in rad/m, equal to $k = \,|2\pi ({f_x}\hat{i} + {f_y}\hat{j})|$. This model is valid in the range $1/{L_0} < k < 1/{l_0}$, where ${L_0}$ is the outer scale and ${l_0}$ the inner scale of turbulence. The Kolmogorov model was extended by other researchers, and several models were introduced whose power spectra are defined as [20]:
$${\Phi ^T}(k) = 0.033\,C_n^2\,{k^{ - 11/3}}\textrm{exp} ( - {k^2}/k_m^2)\,,\,\,\,\,\,\,k > 1/{L_0}$$
$${\Phi ^{VK}}(k) = 0.033\,C_n^2\,{({k^2} + k_0^2)^{ - 11/6}}\,,\,\,\,\,\,0 < k < 1/{l_0}$$
$${\Phi ^{MVK}}(k) = 0.033\,C_n^2\,{({k^2} + k_0^2)^{ - 11/6}}\textrm{exp} ( - {k^2}/k_m^2)\,,\,\,\,\,\,0 \le k < \infty$$
$${\Phi ^{AS}}(k) = 0.033\,C_n^2\,[1 + 1.802(k/{k_l}) - 0.254{(k/{k_l})^{7/6}}]\,,\,\,\,\,0 \le k < \infty$$
where, ${k_m} = 5.92/{l_0}$, ${k_0} = 2\pi /{L_0}$, and ${k_l} = 3.3/{l_0}$. In these equations, the superscripts “T”, “VK”, “MVK”, and “AS” stand for “Tatarskii”, “Von-Karman”, “Modified Von-Karman”, and “Atmospheric Spectrum”, respectively. To generate phase screens, the phase $\varphi (x,y)$ could be written using Fourier transform as:
$$\varphi (x,y) = \sum\limits_n {\sum\limits_m {{c_{n,m}}\textrm{exp} (2\pi i(x{f_{{x_n}}} + y{f_{{y_m}}}))} } ,$$
in which the complex numbers ${c_{n,m}}$ are series coefficients, and ${f_{{x_n}}}$, ${f_{{y_m}}}$ are spatial frequencies in the x and y directions, respectively. It has been shown that the ${c_{n,m}}$ coefficients have a normal distribution with zero mean, and the relation between the variance of the Fourier series coefficients and the power spectral density is [21]:
$$\left\langle {|{c_{n,m}}{|^2}} \right\rangle = \Phi \,({f_{{x_n}}},{f_{{y_m}}})\,\Delta {f_{{x_n}}}\,\Delta {f_{{y_m}}},$$
where $\Phi ({f_{{x_n}}},{f_{{y_m}}})$ is the power spectral density, $\Delta {f_{{x_n}}} = 1/{L_x}{\; }, \Delta {f_{{y_m}}} = 1/{L_y}$, and ${L_x}$, ${L_y}$ are the grid sizes along the x and y directions, respectively. It should be emphasized that this method cannot generate low-order modes (such as tip-tilt) accurately. The subharmonic method is therefore used to overcome this issue. In this method, the phase screen for low spatial frequencies is [22]:
$${\varphi _{LF}}(x,y) = \sum\limits_{p = 1}^{{N_p}} {\,\,\sum\limits_{n ={-} 1}^1 {\,\sum\limits_{m ={-} 1}^1 {{c_{n,m}}\textrm{exp} \,(2\pi i\,(x{f_{{x_n},p}} + y{f_{{y_m},p}})),} } }$$
where the value of p corresponds to a different grid, and ${N_p}$ is the number of grids. The subscript “LF” stands for low frequency. This phase is added to the phase in Eq. (6) to cover the low spatial frequencies.
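As an illustration of how the Fourier series of Eqs. (6)–(7) can be turned into a numerical phase screen, the following Python/NumPy sketch draws complex Gaussian coefficients with the variance of Eq. (7) on an FFT grid. The modified von Kármán spectrum of Eq. (4) and the grid parameters are chosen only as an example, the subharmonic correction of Eq. (8) is omitted for brevity, and all numerical values are assumptions rather than the simulation settings of Table (1).

```python
import numpy as np

def mvk_spectrum(k, cn2, l0, L0):
    """Modified von Karman power spectrum of Eq. (4)."""
    km = 5.92 / l0
    k0 = 2.0 * np.pi / L0
    return 0.033 * cn2 * (k**2 + k0**2) ** (-11.0 / 6.0) * np.exp(-(k**2) / km**2)

def ft_phase_screen(N, L, cn2, l0, L0, rng=None):
    """FFT-based phase screen from Eqs. (6)-(7); subharmonics (Eq. (8)) omitted."""
    rng = np.random.default_rng() if rng is None else rng
    df = 1.0 / L                                # spatial-frequency spacing (cycles/m)
    f = np.fft.fftfreq(N, d=L / N)              # spatial frequencies (cycles/m)
    fx, fy = np.meshgrid(f, f)
    k = 2.0 * np.pi * np.hypot(fx, fy)          # angular spatial frequency (rad/m)
    k[0, 0] = 1e-10                             # avoid the k = 0 singularity
    # Eq. (7): variance of the coefficients from the PSD on this grid.
    # The (2*pi)**2 factor converts the PSD from rad/m to cycles/m frequencies.
    var = mvk_spectrum(k, cn2, l0, L0) * (2.0 * np.pi) ** 2 * df**2
    c = np.sqrt(var / 2.0) * (rng.standard_normal((N, N))
                              + 1j * rng.standard_normal((N, N)))
    # Eq. (6): evaluate the Fourier sum; ifft2 * N**2 turns the inverse FFT into a plain sum.
    return np.real(np.fft.ifft2(c)) * N**2

screen = ft_phase_screen(N=256, L=10.0, cn2=1e-14, l0=0.01, L0=50.0)  # example values
```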

3. Simulations

3.1 Data generation

In this paper, the first step in generating intensity images as input data for the CNN is simulating many distorted wavefronts. Therefore, many random phase screens are generated for the five turbulence models based on the Fourier transform and subharmonic methods mentioned in the previous section (Eqs. (6)–(8)). The randomness of these wavefronts comes from the coefficients ${c_{n,m}}$ in Eq. (7). The next step is calculating the Point Spread Function (PSF) for every wavefront. The PSF is defined as the irradiance of a point source in the image plane [23]:

$$PSF(x,y) = \,{|{h(x,y)} |^2},$$
where $h(x,y)$ is the impulse response defined as [23]:
$$h(x,y) = FT\,[A(x^{\prime},y^{\prime})\,\textrm{exp} (i\varphi (x^{\prime},y^{\prime}))],$$
in which “FT” stands for Fourier transform, $A(x^{\prime},y^{\prime})$ represents the aperture function, and $\varphi (x^{\prime},y^{\prime})$ is the phase simulated from the turbulence models. The next step is the convolution of the generated PSFs with the simulated aberration-free star images, which provides the aberrated images:
$${I_2} = PSF \otimes {I_1}\,,$$
where ${I_1}$ and ${I_2}$ are the aberration-free and aberrated images, respectively, and ${\otimes}$ denotes the convolution operation. This provides one part of the input data for the deep learning model. However, due to the phase diversity of turbulent wavefronts, the phase corresponding to an in-focus image is not guaranteed to be unique. Thus, out-of-focus images are taken into account as the second part of the data. The out-of-focus situation is simulated by adding the defocus term of the Zernike polynomials to the phases of the in-focus situation. This term is defined as:
$$Z_{\,2}^{\,0} \equiv {Z_4} = \sqrt 3 (2{\rho ^2} - 1),$$
where $\rho $ is the radial coordinate on the unit circle of the Zernike polynomials. Adding the defocus term to the phase is equivalent to slightly shifting the image plane of the optical system away from the focal plane, as shown in Fig. (1). In this paper, the amount of this shift corresponds to a defocus of $\lambda /8 \times {Z_4}$, in which $\lambda $ is the wavelength. New phases are thereby attained, and the out-of-focus images are generated through Eqs. (9)–(11). As an example, intensity images for both the in-focus and out-of-focus situations are presented in Fig. (2).
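A minimal Python sketch of the image-formation steps in Eqs. (9)–(12) is given below: it builds the PSF from a circular aperture and a turbulent phase, convolves it with a toy aberration-free star scene, and repeats the step with an added defocus term for the out-of-focus channel. The aperture sampling, the star positions, and the $\lambda/8$ defocus amplitude expressed as $2\pi/8$ radians are illustrative assumptions, and ft_phase_screen refers to the sketch in Section 2.

```python
import numpy as np
from scipy.signal import fftconvolve

def psf_from_phase(phase, aperture):
    """Eqs. (9)-(10): PSF as |FT(A * exp(i*phi))|^2, normalized to unit energy."""
    field = aperture * np.exp(1j * phase)
    h = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(field)))
    psf = np.abs(h) ** 2
    return psf / psf.sum()

def defocus_z4(N):
    """Zernike defocus Z4 = sqrt(3)*(2*rho^2 - 1) over the unit circle (Eq. (12))."""
    y, x = np.mgrid[-1:1:1j * N, -1:1:1j * N]
    rho = np.hypot(x, y)
    z4 = np.sqrt(3.0) * (2.0 * rho**2 - 1.0)
    return np.where(rho <= 1.0, z4, 0.0)

N = 128
y, x = np.mgrid[-1:1:1j * N, -1:1:1j * N]
aperture = (np.hypot(x, y) <= 1.0).astype(float)          # circular pupil (assumed sampling)

phase = ft_phase_screen(N=N, L=10.0, cn2=1e-14, l0=0.01, L0=50.0)
stars = np.zeros((N, N)); stars[60, 60] = 1.0; stars[60, 70] = 0.8  # toy two-star scene

psf_in = psf_from_phase(phase, aperture)
I_in = fftconvolve(stars, psf_in, mode="same")             # Eq. (11), in-focus channel

defocus_rms = 2.0 * np.pi / 8.0                            # assumed lambda/8 of defocus
psf_out = psf_from_phase(phase + defocus_rms * defocus_z4(N), aperture)
I_out = fftconvolve(stars, psf_out, mode="same")           # out-of-focus channel
```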

The basic parameters in the simulation process are presented in Table (1); they follow the well-known instrument parameters of the Keck Observatory [24]. Therefore, for every single realization of atmospheric turbulence we have a pair of images. By this procedure, we generated many aberrated images based on the five turbulence models described in the previous section: for each turbulence model, 6000 in-focus and 6000 out-of-focus images were simulated. The image dataset is divided into training and testing data to be fed into the CNN, as elaborated in the following section.


Fig. 1. Shifting the image plane to get out-of-focus intensity patterns


Fig. 2. An example of simulating the aberrated images for in-focus (upper row), and out-of-focus (lower row) situations, and for five atmospheric models: (a): Kolmogorov, (b): Von-Karman, (c): Modified Von-Karman, (d): Tatarskii, and (e): Atmospheric spectrum.


Table 1. Parameters in the simulation

In the data generation process, the ground-truth locations of the stars are chosen according to their mutual distances. In the simulation, the distance between two stars is defined as the distance between their centers (the maximum-value pixels) and is measured in pixels. These settings are applied to the different classes (two or more stars per image) presented in Table (2).


Table 2. Overview of the generated dataset and main parameters

The essential point in building a deep learning network is that it must generalize. Hence, the variable parameters of the atmosphere should vary over feasible ranges. Thus, during data generation, we covered these ranges by drawing random numbers from uniform distributions, as shown in Table (3).


Table 3. Ranges and distribution of random variables used in turbulence parameters

3.2 CNN configuration

In a typical CNN, feature extraction is the first step, in which the main layers are convolution and pooling. The convolution operation can be expressed as sliding a filter across an image with element-wise multiplication of the filter and the image in the x and y directions, defined as:

$$(I\ast f)(x,y) = \sum\limits_i {\sum\limits_j {I(x - i,y - j)\,f(i,j)} }$$
where $I$ and $f$ are the image and filter, respectively. The output image after convolution can differ in size from the original image, depending on the filter size, stride, and padding. Another essential layer in a CNN is pooling, which reduces the dimensions of the images while maintaining the features extracted by convolution. Max pooling is a common type of pooling, for which a kernel of a specific size is chosen; the kernel slides across the image and extracts the maximum value of each portion of the image.
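A small NumPy illustration of these sliding-window operations follows; note that, as in most deep learning libraries, the filter is not flipped, so this computes the correlation form of Eq. (13). The 8 × 8 test image and the edge filter are arbitrary examples, not values from the paper.

```python
import numpy as np

def conv2d(I, f, stride=1):
    """Direct 'valid' 2-D sliding product (correlation form of Eq. (13))."""
    H, W = I.shape
    kh, kw = f.shape
    out_h = (H - kh) // stride + 1
    out_w = (W - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            patch = I[r * stride:r * stride + kh, c * stride:c * stride + kw]
            out[r, c] = np.sum(patch * f)   # element-wise multiply and sum
    return out

def max_pool(I, size=2):
    """Non-overlapping max pooling with a size x size kernel."""
    H, W = I.shape
    Hc, Wc = H - H % size, W - W % size
    blocks = I[:Hc, :Wc].reshape(Hc // size, size, Wc // size, size)
    return blocks.max(axis=(1, 3))

img = np.random.rand(8, 8)
edge = np.array([[1.0, 0.0, -1.0]] * 3)     # simple vertical-edge filter
fmap = conv2d(img, edge)                    # 6 x 6 feature map
pooled = max_pool(fmap)                     # 3 x 3 after 2 x 2 max pooling
```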

Convolution and pooling significantly reduce the complexity of the data structure. After the features are extracted and the dimensions are reduced during learning, rescaling and normalizing the data is an important task, which is done by batch normalization. It stabilizes and standardizes the network and can accelerate the training procedure. This process is defined as:

$${y_n} = \alpha \left( {\frac{{{x_n} - {x_m}}}{{\sqrt {\sigma_m^2} }}} \right) + \beta$$
in which ${x_n}$ is the input and ${y_n}$ the output of the batch normalization layer. In this equation, $\alpha $ and $\beta $ are two trainable parameters that are updated during learning, and the mean ${x_m}$ and variance $\sigma _m^2$ are defined as usual. The resulting output is flattened and fed into dense (fully-connected) layers. Neurons in the dense layers are connected to each other such that every connection is established by a trainable parameter called a weight; weights are updated during the learning procedure. Several CNN configurations have recently been introduced: VGGNet, AlexNet, ResNet, and GoogLeNet [25–28]. In this paper, an AlexNet-like model with some modifications is applied. In our work, two channels containing in-focus and out-of-focus gray-scale images of size 128 × 128 are the input data for the CNN. For the first convolution layer, 64 kernels of size 11 × 11 with stride 2 are applied; this size and stride extract the large-scale features. The first set of feature maps is then obtained by passing through the max pooling layer. These steps and their outputs are depicted schematically in Fig. (3). Batch normalization is the next layer in this architecture. After repeating these three steps, three convolution layers and one pooling layer are applied. The number of neurons in the final layer must be equal to the number of classes. The details of this process are listed in Table (4).
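The following Keras sketch shows the kind of AlexNet-like architecture described above for two-channel 128 × 128 inputs. Only the first convolution layer (64 kernels, 11 × 11, stride 2) follows the stated configuration; the remaining layer sizes are placeholders standing in for the values listed in Table (4), so this is an illustrative assumption rather than the authors' exact network.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(num_classes):
    """AlexNet-like CNN sketch for two-channel (in/out-of-focus) 128x128 images."""
    model = models.Sequential([
        layers.Input(shape=(128, 128, 2)),
        # First convolution: 64 kernels, 11x11, stride 2, as stated in the text.
        layers.Conv2D(64, kernel_size=11, strides=2, activation="relu", padding="same"),
        layers.MaxPooling2D(pool_size=3, strides=2),
        layers.BatchNormalization(),
        # Remaining layer sizes are placeholders for the values in Table (4).
        layers.Conv2D(128, kernel_size=5, activation="relu", padding="same"),
        layers.MaxPooling2D(pool_size=3, strides=2),
        layers.BatchNormalization(),
        layers.Conv2D(256, kernel_size=3, activation="relu", padding="same"),
        layers.Conv2D(256, kernel_size=3, activation="relu", padding="same"),
        layers.Conv2D(128, kernel_size=3, activation="relu", padding="same"),
        layers.MaxPooling2D(pool_size=3, strides=2),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),  # one neuron per class
    ])
    return model

model = build_model(num_classes=6)
```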


Fig. 3. The CNN configuration. Two convolution and two max pooling layers, corresponding to the number of kernels in Table (4), are shown.


Table 4. Layers and parameters of the CNN configuration in this work

Updating the trainable parameters is done by an optimizer. During the optimization process, these parameters are chosen by the algorithm such that the cost function (also known as the loss function) approaches its minimum; in effect, the answers are evaluated by the cost function. For classification problems in machine learning, it is customary to use cross-entropy as the cost function. In this paper, we use sparse categorical cross-entropy, which is commonly used for multi-class classification problems and is expressed as [29]:

$$Loss = \, - \sum\limits_{i = 1}^N {{y_i}\log ({{\hat{y}}_i})\,,}$$
where N is the number of classes, and ${y_i}$ and ${\hat{y}_i}$ are the ground-truth and predicted labels of class “i”, respectively. In this work, the ADAM (Adaptive Moment estimation) optimizer, a modified version of the gradient descent algorithm, is utilized. The main feature of ADAM is its adaptive learning rate, so updating the weights during learning can be more effective than with conventional gradient descent algorithms [30].
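In a Keras-style implementation, the loss of Eq. (15) and the ADAM optimizer translate into a single compile call, as sketched below for the model defined in the previous sketch; the learning rate is an assumed value, not one reported in the paper.

```python
import tensorflow as tf

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # assumed learning rate
    loss="sparse_categorical_crossentropy",                   # Eq. (15) with integer labels
    metrics=["accuracy"],
)
```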

3.3 Data validation

In this paper, in order to validate the input data for the CNN, the image central moments are investigated. Our dataset contains a large number of images, and a deep learning network must see diverse input data during learning in order to operate properly. To check this diversity, statistical quantities such as the mean, variance, and distribution of pixel intensities can be examined; however, the coordinates of the patterns in the images are also important. Moreover, the dataset contains both in-focus and out-of-focus images, whose spatial distributions of pixels differ and must be interpreted separately from a statistical point of view. The moments of an image contain both intensity and coordinate information, which is why we use them here; they have long been popular for pattern recognition in computer vision. Chaumette studied image moments and their features in detail [31]. The basic equation for the image central moments is:

$${\mu _{p\,q}} = \sum\limits_x {\sum\limits_y {{{(x - {x_m})}^p}{{(y - {y_m})}^q}\,I(x,y).} }$$

In this equation, $({x_m},{y_m})$ is the coordinate of the center of mass of the image. A covariance matrix, whose elements are related to the image moments up to order 2, can be constructed as follows:

$${\mathop{\rm cov}} \,(I(x,y)) = \left[ \begin{array}{l} {{\mu^{\prime}}_{20}}\,\,\,{{\mu^{\prime}}_{11}}\\ {{\mu^{\prime}}_{11}}\,\,\,{{\mu^{\prime}}_{02}} \end{array} \right],$$
in which ${\mu ^{\prime}_{20}} = {\mu _{20}}/{\mu _{00}}$, ${\mu ^{\prime}_{02}} = {\mu _{02}}/{\mu _{00}}$, and ${\mu ^{\prime}_{11}} = {\mu _{11}}/{\mu _{00}}$, where ${\mu _{00}}$ is the total intensity of all pixels. The element ${\mu ^{\prime}_{11}}$ describes the joint spread of the pixel intensities around the center of mass in the x and y directions. A 2 × 2 covariance matrix has two eigenvectors and two eigenvalues, defined as:
$${\lambda _{\,i}} = \frac{{{{\mu ^{\prime}}_{20}} + {{\mu ^{\prime}}_{02}} \pm \sqrt {4{{{\mu ^{\prime}}_{11}}^2} + {{({{\mu ^{\prime}}_{20}} - {{\mu ^{\prime}}_{02}})}^2}} }}{2}.$$
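A short NumPy sketch of this moment-based check (Eqs. (16)–(18)): it computes the normalized central moments of an image, builds the covariance matrix, and returns its eigenvalues via np.linalg.eigvalsh, which is equivalent to the closed form of Eq. (18). The images I_in and I_out refer to the earlier simulation sketch and are assumptions of this example.

```python
import numpy as np

def moment_eigenvalues(I):
    """Central moments up to order 2 (Eq. (16)) and the eigenvalues of the
    normalized covariance matrix of Eq. (17)."""
    y, x = np.indices(I.shape)
    m00 = I.sum()
    xm = (x * I).sum() / m00          # center of mass
    ym = (y * I).sum() / m00
    mu20 = ((x - xm) ** 2 * I).sum() / m00
    mu02 = ((y - ym) ** 2 * I).sum() / m00
    mu11 = ((x - xm) * (y - ym) * I).sum() / m00
    cov = np.array([[mu20, mu11], [mu11, mu02]])
    return np.linalg.eigvalsh(cov)    # same result as the closed form in Eq. (18)

lam_in = moment_eigenvalues(I_in)     # in-focus image from the earlier sketch
lam_out = moment_eigenvalues(I_out)   # out-of-focus image; typically larger eigenvalues
```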

In machine learning, checking for outlier data is a common statistical task. An outlier is a data point that differs significantly from the rest of the data; outliers usually reduce learning performance and are handled in three ways: they may be accepted, corrected, or removed. If the number of outliers is large, they must be removed or corrected. Here, we show that outliers are rare in our dataset, because very few data points are separated from the others. Figure (4) shows the consistency of the eigenvalues with respect to each other for both the in-focus and out-of-focus situations. Moreover, theory indicates that in the focal plane the pixel intensities are more concentrated than in the out-of-focus situation. This is clearly seen in Fig. (4), where the marker size is chosen small to show the density of the points. The out-of-focus images have larger eigenvalues than the in-focus images, which is consistent with theory, since the out-of-focus images have more scattered intensity patterns; larger covariance-matrix elements for the out-of-focus images lead to larger eigenvalues. Also, the spread of the eigenvalue points indicates diverse images with different intensity patterns. Given these facts, the generated images for both the in-focus and out-of-focus situations are suitable as input data for the CNN.

4. Results

4.1 Training and validation accuracy

It is common in machine learning to divide data into three sets: training, validation, and testing. The training operation is performed on the training data, and the training is evaluated on the validation data at every epoch. In this work, 20% of the whole data is set aside as test data and the remaining data are used for training; in turn, 20% of the training data is dedicated to validation. After the learning process, the constructed model is tested on the test data, which are not included in the learning process. In this paper, the accuracy of the CNN is monitored for the training and validation processes. The plots in Fig. (5) show the training and validation accuracy of the CNN for situations with up to six stars per image under turbulence, for the five turbulence models mentioned in the previous sections. As can be seen, the training and validation accuracy increase until they finally approach one. The training accuracy in the early epochs is already high in the first row, mainly because there are only two classes. There are some small dips in the accuracy, mainly due to the learning rate, which is a hyperparameter of the optimization algorithm. The Adam optimizer applied here has an adaptive learning rate: a value is chosen automatically for each epoch based on the algorithm's experience with the input and output data. Updating the trainable parameters can therefore cause such dips in accuracy at some epochs, but in subsequent epochs the algorithm adjusts the learning rate accordingly and increases the accuracy again. It should be emphasized that, comparing the five atmospheric turbulence models, the CNN works well for all of them; if the data are generated with any of these turbulence models, the CNN is able to work with high accuracy. This is examined more precisely in the next section, where the test data are evaluated with several well-known metrics described in the following.
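The data split described above can be sketched as follows, where X is the array of two-channel image pairs and y the integer class labels from the generated dataset (both assumed here); the epoch count and batch size are also assumptions, while the 20% test split and the 20% validation split follow the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# X: (num_samples, 128, 128, 2) in/out-of-focus pairs, y: integer class labels (assumed).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

history = model.fit(
    X_train, y_train,
    validation_split=0.2,   # 20% of the training data held out for validation
    epochs=50,              # assumed
    batch_size=32,          # assumed
)
test_loss, test_acc = model.evaluate(X_test, y_test)
```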


Fig. 4. Eigenvalues scatter plot of the input data for Kolmogorov model, where the red dots show the case for in-focus images, and the blue dots show the case for out-of-focus images.


Fig. 5. Accuracy plots for training (solid) and validation (dashed) data for (column a): Kolmogorov, (column b): Von-Karman, (column c): modified Von-Karman, (column d): Tatarskii, and (column e): Atmospheric spectrum models. The first, second, third, fourth, and fifth rows are for two, three, four, five, and six classes, respectively.


4.2 Network performance assessment

The confusion matrix is a common method for describing the performance of classification models in deep learning. A typical (two-class) confusion matrix is expressed as:

$$C\, = \left[ \begin{array}{l} TP\,\,\,FN\\ FP\,\,\,TN \end{array} \right]\,\,,$$
where TP, FN, FP, and TN stand for true positive, false negative, false positive, and true negative, respectively. For two classes, these parameters are defined simply as follows:
  • TP is the number of samples that belong to class “1” and are correctly predicted by the model to belong to class “1”;
  • FN is the number of samples that actually belong to class “1” but are predicted by the model not to be in class “1”;
  • FP is the number of samples that the model predicts to be in class “1” but that do not actually belong to it;
  • TN is the number of samples that are not in class “1” and that the model correctly predicts not to be in class “1”.

These definitions can be generalized for more complex systems, e.g. 3, 4, 5, and 6 stars.

Based on these definitions, four metrics are used to evaluate the network: “accuracy”, (TP + TN)/(TP + TN + FP + FN); “precision”, TP/(TP + FP); “recall”, TP/(TP + FN); and “f1-score”, (2 × precision × recall)/(precision + recall). These metrics are shown in Fig. (6), in which the average of the metric values over the subclasses (labels) in Table (5) is calculated for each class and each turbulence model. As an example, the accuracy for class 6 with the Kolmogorov model is 0.95, which is the average of the six accuracies for each label; the other metrics and turbulence models are computed in the same way.
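With the test predictions in hand, the confusion matrix of Eq. (19), generalized to N classes, and the four metrics can be obtained directly, for example with scikit-learn as sketched below; X_test and y_test refer to the assumed split in the previous subsection.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

y_pred = np.argmax(model.predict(X_test), axis=1)

# Multi-class confusion matrix (Eq. (19) generalized) and per-class
# precision, recall, and f1-score; overall accuracy is also reported.
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, digits=3))
```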


Fig. 6. Four metrics for five atmospheric turbulence models


Table 5. Four metrics for five atmospheric turbulence models

If these four metrics approach 1, the network performs well on the test data after learning. The accuracy metric measures the ratio of correct predictions to all predictions. As shown in Fig. (6), the network accuracy is above 80% for all of the turbulence models. In this regard, the most robust model is the atmospheric spectrum, because its accuracy remains more stable than that of the other models across the different classes. The precision metric, by definition, involves only positive predictions: it describes the performance of the network on positive predictions, whether true or false. In Fig. (6), it can be seen that in class 4 the precision for all of the models is higher than 90%, except for Tatarskii, which is around 80%. For the recall metric, false negative predictions also matter: it measures the ratio of true positive predictions to the sum of true positive and false negative predictions, so recall is usually more sensitive than precision. If recall and precision are close to each other and both are high, the number of false predictions is low and the network's predictive power is high for both positive and negative predictions. In this sense, the atmospheric spectrum model is the best one, because its precision and recall are both above 90% and closer to each other than for the other models; thus, the false predictions for this model are fewer than for the other models. The f1-score is a weighted average of recall and precision. Generally, according to Fig. (6), the configured CNN works quite well for the two- and three-star classes; as the number of stars increases, the complexity of the classification increases as well. The most accurate and robust model, with the most stable metrics as the number of classes increases, is the atmospheric spectrum, which shows the smallest changes in the metrics. Despite these differences in performance, all of the turbulence models achieve metric values higher than 0.8, which shows that the network has high predictive power.

5. Conclusion

In this paper, we performed classification-based deep learning to distinguish close stars in the presence of atmospheric turbulence of different strengths. To this aim, we compared five well-known atmospheric turbulence models and, by simulation, generated image data using these models. The data consisted of two sets of in-focus and out-of-focus images, and the label of each pair of images, which is the type of star system, was fed to the deep learning network. The important variables in the simulation process were considered over feasible ranges to obtain a generalized deep learning model. Therefore, data were generated for different turbulence strengths (equivalent to different values of $C_n^2$) and diverse ranges of the turbulence parameters, including the inner (${l_0}$) and outer (${L_0}$) scales of turbulence. Moreover, the distances between the stars were adjusted so that we obtained thousands of star images with different separations, covering detached, semi-detached, and separated stars. Finally, these images were given to the CNN as input data. After generating the data through simulation, a CNN was designed and configured, comprising convolution, max pooling, batch normalization, and dense layers. To check the diversity and validity of the data, central image moments were utilized and the eigenvalues of the covariance matrix of each image (built from moments up to second order) were extracted. To evaluate the test data, the confusion matrix method was used, and four metrics (“accuracy”, “precision”, “recall”, and “f1-score”) were calculated; the comparison of these metrics and of the atmospheric turbulence models was presented. We conclude that the deep learning model presented in this paper works well for recognizing stars under different atmospheric turbulence conditions and strengths.

Disclosures

The authors declare no conflict of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. R. C. Hardie, M. A. Rucci, S. Bose-Pillari, and R. Van Hook, “Application of tilt correlation statistics to anisoplanatic optical turbulence modeling and mitigation,” Appl. Opt. 60(25), G181–G198 (2021). [CrossRef]  

2. Z. Chen, D. Zhang, C. Xiao, and M. Qin, “Precision analysis of turbulence phase screens and their influence on the simulation of Gaussian beam propagation in turbulent atmosphere,” Appl. Opt. 59(12), 3726–3735 (2020). [CrossRef]  

3. F. Dios, J. Recolons, A. Rodríguez, and O. Batet, “Temporal analysis of laser beam propagation in the atmosphere using computer-generated long phase screens,” Opt. Express 16(3), 2206–2220 (2008). [CrossRef]  

4. X. Chenlu, H. Shiqi, Z. Dai, Z. Qingsong, and W. Xiongfeng, “Design of atmospheric turbulence phase screen set under the influence of combined oblique propagation and beam propagation,” Infrared Laser Eng. 48(4), 404003 (2019). [CrossRef]  

5. D. A. Paulson, “Experimental characterization of atmospheric turbulence supported by advanced phase screen simulations,” PhD diss., University of Maryland, College Park, (2020).

6. D. Cunefare, C. S. Langlo, E. J. Patterson, S. Blau, A. Dubra, J. Carroll, and S. Farsiu, “Deep learning based detection of cone photoreceptors with multimodal adaptive optics scanning light ophthalmoscope images of achromatopsia,” Biomed. Opt. Express 9(8), 3740–3756 (2018). [CrossRef]  

7. Y. Jin, Y. Zhang, L. Hu, H. Huang, Q. Xu, X. Zhu, L. Huang, Y. Zheng, H. L. Shen, W. Gong, and K. Si, “Machine learning guided rapid focusing with sensor-less aberration corrections,” Opt. Express 26(23), 30162–30171 (2018). [CrossRef]  

8. Y. Nishizaki, M. Valdivia, R. Horisaki, K. Kitaguchi, M. Saito, J. Tanida, and E. Vera, “Deep learning wavefront sensing,” Opt. Express 27(1), 240–251 (2019). [CrossRef]  

9. J. Nousiainen, C. Rajani, M. Kasper, and T. Helin, “Adaptive optics control using model-based reinforcement learning,” Opt. Express 29(10), 15327–15344 (2021). [CrossRef]  

10. H. Guo, Y. Xu, Q. Li, S. Du, D. He, Q. Wang, and Y. Huang, “Improved machine learning approach for wavefront sensing,” Sensors 19(16), 3533 (2019). [CrossRef]  

11. D. Jin, Y. Chen, Y. Lu, J. Chen, P. Wang, Z. Liu, S. Guo, and X. Bai, “Neutralizing the impact of atmospheric turbulence on complex scene imaging via deep learning,” Nat. Mach. Intell. 3(10), 876–884 (2021). [CrossRef]  

12. D. M. Sampson, D. A. Caneiro, A. L. Chew, J. La, D. Roshandel, Y. Wang, J. C. Khan, E. Chelva, P. G. Stevenson, and F. K. Chen, “Evaluation of focus and deep learning methods for automated image grading and factors influencing image quality in adaptive optics ophthalmoscopy,” Sci. Rep. 11(1), 1–9 (2021). [CrossRef]  

13. A. M. Vorontsov, M. A. Vorontsov, G. A. Filimonov, and E. Polnau, “Atmospheric turbulence study with deep machine learning of intensity scintillation patterns,” Appl. Sci. 10(22), 8136 (2020). [CrossRef]  

14. Y. LeCun, “Generalization and network design strategies,” Connectionism in perspective 19(143-155), 18 (1989).

15. S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput. 9(8), 1735–1780 (1997). [CrossRef]  

16. I. Goodfellow, J. P. Abadie, M. Mirza, B. Xu, D. W. Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Advances in neural information processing systems 27 (2014).

17. M. Tschannen, O. Bachem, and M. Lucic, “Recent advances in autoencoder-based representation learning,” arXiv preprint arXiv: 1812.05069 (2018).

18. C. Peterson and T. S. Rögnvaldsson, “An introduction to artificial neural networks,” No. LU-TP-91-23. CERN, (1991).

19. M. Čokina, V. Maslej-Krešňáková, P. Butka, and Š. Parimucha, “Automatic classification of eclipsing binary stars using deep learning methods,” Astron. Comput. 36, 100488 (2021). [CrossRef]  

20. L. C. Andrews and R. L. Phillips, “Laser beam propagation through random media,” 2nd Ed., SPIE Press, Bellingham, WA (2005).

21. B. M. Welsh, “A Fourier series based atmospheric phase screen generator for simulating anisoplanatic geometries and temporal evolution,” Proc. SPIE 3125, 327–338 (1997). [CrossRef]  

22. J. D. Schmidt, “Numerical simulation of optical wave propagation: With examples in MATLAB” SPIE. pp. 169 (2010).

23. D. G. Voelz, “Computational Fourier optics: a MATLAB tutorial,” SPIE press, Bellingham, Washington, pp. 127 (2011).

24. https://keckobservatory.org/

25. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv: 1409.1556 (2014).

26. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” Communications of the ACM 60(6), 84 (2017). [CrossRef]  

27. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” In Proceedings of the IEEE conference on computer vision and pattern recognition, 770–778 (2016).

28. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” In Proceedings of the IEEE conference on computer vision and pattern recognition (2015), pp. 1–9.

29. P. De Boer, D. P. Kroese, S. Mannor, and R. Y. Rubinstein, “A tutorial on the cross-entropy method,” Ann. Oper. Res. 134(1), 19–67 (2005). [CrossRef]  

30. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv: 1412.6980 (2014).

31. F. Chaumette, “Image moments: a general and useful set of features for visual servoing,” IEEE Trans. Robot. 20(4), 713–723 (2004). [CrossRef]  
