An end-to-end laser-induced damage change detection approach for optical elements via siamese network and multi-layer perceptrons

Jingwei Kou; Jingwei Kou; Tao Zhan; Li Wang; Yu Xie; Yihui Zhang; Deyun Zhou; Maoguo Gong

doi:10.1364/OE.460417

1. Introduction

The healthy conditions of optical elements in high energy laser facility seriously affect the performance of the whole optical system and the corresponding inertial confine fusion physical experiments [1]. During the operating process of laser emission, the laser-induced damages may occur in the surface or body of optical elements due to the thermodynamic effects caused by laser induction [2–4]. As the number of laser irradiation increases, the size and shape of damaged regions will change accordingly. When the laser-induced damages grow to a certain extent, the optical elements will be out of actions for reaching their damage resistance thresholds [5–7]. Therefore, it is crucial to timely and effectively perform damage change detection of optical elements for laser-induced damage mechanism study, optics damage resistance threshold evaluation and security monitoring of high energy laser facilities.

The laser-induced damage change detection of optical elements can be considered as a pixel-level object classification problem. Namely, each pixel pair in the bi-temporal optical images captured before and after laser irradiation respectively need to be classified into two classes, i.e., changed class and unchanged class. Thus the damage change detection results can be highlighted by a binary image where one indicates changed pixels and zero indicates unchanged ones. As there are different kinds of noise and parasitic light in the captured optical images, the damage change detection performance is severely influenced and hindered. Besides, the dust and particles may adhere to the surface of optical elements and then lead to the appearance of fake damages in optical images. Hence, it’s still a challenging task to realize a precise and efficient damage change detection while suppressing the fake damages in real application scenes.

The traditional change detection methods generally use the hand-craft designed extractors to acquire damage features of optical elements and then classify them into changed class and unchanged class. Bovolo et al. proposed two representative manual feature extraction based methods: the change vector analysis (CVA) and the compressed CVA (C$^{2}$VA) methods in [8,9]. These two methods used the specific arithmetic expression to describe change information and have been applied for change detection of multi-spectral remote sensing images. Deng et al. proposed a PCA-based change detection method for multi-temporal and multi-sensor satellite data analysis in [10]. In [11], Li et al. presented a fuzzy clustering based approach for change detection which converted the change detection problem into a multiobjective optimization problem. The traditional methods have been verified in some conditions and have shown a relative good performance in some specific change detection tasks. However, they still have a common problem, i.e., their change detection results depend on rich industry experience and domain knowledge closely. When the application environment changes, their change detection capabilities degrade distinctly and even fail to work.

In recent years, as deep learning (DL) showing its powerful ability in the field of computer vision and pattern recognition, many DL based methods have been utilized to resolve optical object recognition and change detection problems [12–17]. In [18], Gong et al. proposed an unsupervised change detection model based on deep neural network (DNN) for change detection in synthetic aperture radar (SAR) images. Saha et al. proposed an unsupervised method which used deep change vector analysis (DCVA) for remote sensing image change detection in [19]. In [20], Gao et al. presented a convolutional-wavelet neural network based model (CWNN) for sea ice change detection in SAR images. By introducing dual-tree wavelet transform into convolutional neural networks, CWNN gained a stronger capacity of suppressing speckle noise. In [21], Du et al. proposed a deep network and slow feature analysis (SFA) based model to perform change detection in multi-temporal remote sensing images. In [22], Kou et al. proposed a similarity metric optimization driven model (SMO-SCNN) for damage change detection which exhibited good change detection performance in two real datasets. Furthermore, in [23], Saha et al. presented an unsupervised deep network for change detection of high spatial resolution multispectral images, which exploited all spectral bands to model contextual information and used transfer learning for network training. In [24], Wu et al. proposed a novel kernel principal component analysis convolutional mapping network (KPCA-MNet) for binary and multiclass change detection in multi-temporal remote sensing images. Compared to the traditional methods, these DL based change detection methods can obtain better performance because of their powerful feature learning abilities, but they still suffer from data imbalance and complex network structure. With the increasingly using of deep convolutional layers and attention modules, the architectures of the detection models become more and more bloated and have a lot of redundant components. The complex network structure brings big implementation pressure for applying these DL methods to real industrial applications. For the laser-induced damage change detection of optical elements, the damaged regions may be extremely small and are with less effective features in many cases, which leads to a seriously unbalanced distribution of hard and easy samples for model training. These problems significantly limit the extensive applications of the existing DL based change detection methods and need to be further studied.

As the convolutional neural network (CNN) structure based models have achieved great success in the field of visual object recognition and detection, many researchers take it for granted to utilize this type of structure in network design. Lately, the visual attention modules and vision transformer architecture have been widely used to boost the performance of network classification by extracting more efficient features and contextual information [25–29]. However, in the view of network complexity, these methods gain a relatively small performance improvement but introduce lots of extra computing overhead. Although these methods have shown its effectiveness and superiority in many tasks, there is a question worth thinking about by all of us in this field. Namely, do we have to use convolution structures and attention mechanisms in network design? From the point of view of [30–32], the CNN and attention modules is not necessary. In the image classification benchmarks, all multi-layer perceptrons (MLPs) based models can obtain competitive scores, which proves the effectiveness of MLP. The MLP based work gives us certain positive enlightenment and makes us rethink the scheme of our network design for damage change detection task.

Based on deep analysis of the existing works and our actual application scene, we plan to resolve the damage change detection problem by the ways of network structure design and hard degree metric of the samples. In this paper, a concise network structure only relying on the MLPs is designed to simplify the network complexity. By integrating the MLPs into siamese network, an end-to-end laser-induced damage change detection model (SiamMLP) is built. To deal with the data imbalance problem, we assign a classification difficulty for each individual sample by introducing a novel hard metric. What’s more, a novel hard loss is proposed for training the network. Benefiting from the hard metric and hard loss, the proposed model gains an excellent online hard sample mining ability. The contributions of this paper can be concluded as follows:

1) An end-to-end laser-induced damage change detection model is proposed for optical elements. By integrating the MLPs into siamese network, the proposed SiamMLP model obtains excellent damage change detection performance while effectively suppressing the fake damages and background interferences.
2) For addressing the unbalanced distribution of hard and easy samples, a novel metric called hard metric is introduced to quantitatively evaluate the classification difficulty of the samples. Moreover, a novel hard loss is presented to cooperate with the hard metric. The hard loss can up-weight the loss of hard samples and down-weight the loss of easy samples thus resulting in stronger hard sample mining ability of the proposed model.
3) The proposed model has been validated on two real damage change detection datasets for optical elements and the corresponding experimental results demonstrate its effectiveness and superiority. The proposed model provides a novel high-precision solution for condition change monitoring and is of great significance to damage growth mechanism study of optical elements.

The rest of this paper are organized as follows. Section 2. introduces some related background knowledge. Section 3. elaborates the proposed method in detail. Section 4. through 6. present the experimental setups, the experimental results and some relevant discussions with regard to our method. Finally, Section 7. draws the conclusion of this paper.

2. Background

2.1 Siamese network

The original siamese network is often used to extract the contrastive features by feeding two separate inputs into two identical sub-branches with shared weights [33,34]. The contrastive loss and cosine loss can be used to compute the loss between the extracted features of the two sub-networks and the ground truths. Through minimizing the training loss, the extracted features show a low intra-class and high inter-class difference. In [35], Liu et al. proposed a novel feature extraction method based on siamese CNN (SCNN) for hyperspectral image classification, which utilized a margin ranking loss to train this model and obtained discriminative features that were helpful in improving the classification accuracy. In [36], Hughes et al. presented a pseudo-siamese network for identifying corresponding patches in SAR and optical images. This model used two identical networks to extract features which were fused in a fully connected layer and trained by a binary cross-entropy loss. In [37], Zhang et al. proposed a siamese CNN based model for arbitrary object tracking. This model applied the siamese region proposal network to identify potential targets and used domain specific network updating to further improve the tracking performance. The approaches described above mainly treat siamese network as a feature extraction tool which can not be directly applied for object classification tasks. To address this problem, in [22], Kou et al. established a siamese architecture based classification network with some extra auxiliary layers to detect the damage changes of optical elements and obtained superior performance compared to the traditional approaches.

2.2 Introduction to MLP

MLP is a kind of forward artificial neural network (ANN) which has an input layer, an output layer and some hidden layers. The nodes in different layers of MLP are all fully connected. Despite of the nodes in the input layer, the rest nodes in MLP are non-linear activation function neurons thus making the network have a strong ability of dealing with nonlinear separable problems. Although MLP based methods have been widely used in last century, they are gradually neglected afterwards, while CNN and transformer based methods become popular for their powerful performance in computer vision [28]. Nevertheless, the computational complexity of these models is also increasing for using more layers and self-attention modules to capture more informative features and long distant dependences. In [30], Tolstikhin et al. presented a novel MLP-mixer model, which achieved a competitive results with state-of-the-art models. In [38], Ding et al. proposed a MLP based model (RepMLP) to lift the task performance of CNN based models, which replaced the original CNN modules by the MLPs. In [39], Guo et al. proposed a novel MLP like attention mechanism called “External Attention” which just used two linear layers and one normalization layer and could supersede current widespread self-attention frame.

2.3 Data imbalance problem

In object classification and detection tasks, the positive and negative samples often differ from each other significantly in counts which leads to a serious unbalanced data distribution. As the number of positive samples is small and its computed loss values are not many times larger than that of negative samples, the latter will dominate the train results in the model training process. Thus the pre-trained model will gain a poor classification and detection result in the testing data. For dealing with this problem, in [40], Lin et al. proposed the focal loss to gain the model’s online hard sample mining ability. The focal loss adjusted the loss weight of different classes by using a focusing strategy. Li et al. proposed a gradient harmonizing mechanism (GHM) based detector to resolve the huge difference between hard and easy samples in [41]. GHM handled the data imbalance problem by analyzing the gradients of all the samples and changed the contributions of different samples by weighting their gradients. On that basis, the GHM-C and GHM-R were designed respectively to balance the gradients for anchor classification and bounding box regression. In [42], Li et al. used the focal logistic loss to enhance feature representation learning ability in visual target tracking. This method alleviated the foreground-background data imbalance by modifying the logistic loss with a modulating factor. The existing methods have shown their effectiveness in different cases, but for damage change detection, the performance lifting by them seems relatively limited. It can be found that this problem needs to be resolved from theoretical level. This paper introduces a novel hard metric and a novel loss function to work together for dealing with the data imbalance issue.

3. Methodology

3.1 Overall description of the proposed method

As shown in Fig. 1, the proposed method consists of three stages: data preparation, network training and change detection testing. In the first stage, the training images are firstly converted into stacked grayscale image pair by data pre-processing module. Meanwhile, the corresponding ground truth map of the image pair is fed into the classification difficulty generation module (Section 3.3) to obtain the classification difficulty map. After that, the stacked image pair and the corresponding classification difficulty map are processed by the sample set generation module for producing the train dataset. The samples in the train dataset are all with the same size of $w\times w \times 2$ which are generated by partitioning pixels and their $N$ neighbors of the stacked image pair into image patches with the sliding window manner. In the second stage, each training sample is sent to the SiamMLP network (Section 3.2) for damage change classification. The SiamMLP network is built by integrating two identical MLPs modules into the siamese architecture so as to form an end-to-end classification network, which is then trained by the hard loss (Section 3.4), aiming to modulate the weights of hard and easy samples. Finally, all preprocessed test samples are fed into the well-trained network to obtain the corresponding labels. At length, all output data of the network are cached and reshaped thus we can gain a binary change map. The damage change detection result of optical element can be highlighted by the binary change map.

Fig. 1. The overall framework of the proposed SiamMLP model.

Download Full Size | PDF

3.2 Network architecture

From Fig. 1, the proposed SiamMLP network is comprised of one slice layer, two $1\times 1$ convolutional layers, two MLPs modules, one concatenation layer and one fully connected layer. The input data is firstly sliced into two parts by the slice layer. Then each part is fed to the identical sub-branch to extract its features. Each sub-branch of the siamese network is made up of a $1\times 1$ convolutional layer and a MLPs module. The MLPs module includes several cascaded MLP components while each MLP is composed of three fully connected layers and two non-linear layers. The rectified linear unit (ReLU) is utilized as the activation function of our non-linear layer. Here, for balancing the computation complexity and detection precision of damages, only one MLP is used in the MLPs module. The extracted features in the sub-branches are combined into a single vector by the concatenation layer. The concatenated vector is finally classified into changed and unchanged classes by an extra fully connected layer. The whole network is trained by the proposed hard loss. Through feeding the corresponding ground truth label and the classification difficulty of the input data into the hard loss, the network can be optimized by minimizing the computed loss value and converge in a good training result. The classification difficulty generation and the hard loss are described concretely in Section 3.3 and 3.4. The detailed parameter configuration of the proposed SiamMLP network can be found in Table 1.

Table 1. The detailed parameter configuration of the proposed SiamMLP network.^a

View Table | View all tables in this article

The overall structure of the proposed network plays a role of similarity driven classification network. By comparing the extracted features of the input bi-temporal image patches, the network outputs the classification results by the probability of similarity. The MLPs modules in the network are used for feature extraction and information fusion of different locations which do not use any convolutional layer or the attention mechanism. By integrating the MLPs module into the siamese network, the proposed network has a concise architecture while it can still achieve a competitive classification performance compared to some other complex networks.

3.3 Hard metric

For dealing with the data imbalance issue, most existing supervised learning based change detection methods just consider the ground truth information in computing the train loss and have not a quantitative metric to evaluate the classification difficulties of the samples. Different from them, this paper plans to address the online hard sample mining problem by introducing a novel hard metric. For one sample, we can measure its classification difficult degree by counting the number of pixels in the patch that have the same label as the center pixel. More specifically, suppose the center pixel $p_{ij}$ at the location $(i,j)$ in the ground truth map has the label $\Omega _{ij}$. Let $N_{ij}$ be the neighborhood of the center pixel $p_{ij}$. The total number of pixels in $N_{ij}$ is $w^{2}$. The classification difficulty of one sample can be represented as $d$ by Eq. (1), where $\sigma \ge 1$ is a scaling factor for adjusting the range of $d$, $h$ can be represented by Eq. (2) denoting the number of pixels in the sample with the same label as the center pixel, $w^{2}/8$ which is an experimental value denotes a threshold for distinguishing between the extremely difficult samples and ordinary samples. In Eq. (2), $\Omega _{st}$ denotes the label of the pixel $p_{st}\in N_{ij}$, $f(\cdot )$ denotes the label consistency of two single pixels which can be represented by Eq. (3). The easy samples which have large counts of $h$ will have large values of $d$, while hard samples will have relatively small values of $d$.

(1)$$d = \begin{cases} {(0.1\cdot h)^{1/\sigma}}, & {h\,\,>\,\, w^{2}/8} \\ {1,} & {h\le w^{2}/8}, \end{cases}$$

(2)$$h = \sum_{{<}s,t>{\in} N_{ij}}f(\mathrm{\Omega}_{st},\mathrm{\Omega}_{ij}),$$

(3)$$f(m,n) = \begin{cases} 1, & {\text{if}}\ {m=n} \\ {0,} & {\text{otherwise.}} \end{cases}$$

To intuitively understand the intention of the proposed hard metric, a visual contrast of bi-temporal images and the corresponding classification difficulty map generated under the hard metric is presented in Fig. 2. It can be seen that the generated classification difficulty map has a clear profile which explicitly highlights the hard pixels and regions needing to be given more attention. The motivation behind the hard metric is to up-weight the loss of hard samples as well as to down-weight the loss of easy samples. Based on the classification difficulties generated by the hard metric, the loss weights of hard and easy samples will be re-calibrated.

Fig. 2. A visual contrast of bi-temporal images, ground truth and the corresponding classification difficulty map. Note that the size of the image patch surrounding to the center pixel is set to be 9$\times$9 in this example. (a) Image acquired before laser irradiation. (b) Image acquired after laser irradiation. (c) Ground truth. (d) Classification difficulty map generated under the proposed hard metric.

Download Full Size | PDF

3.4 Hard loss

The focal loss which can be expressed by Eq. (4) has been proposed for dealing with the data imbalance problem in the classification and object detection tasks [40], where $\alpha _t$ indicates the weighting factor for the true class, $\gamma$ denotes the focusing degree of the sample. Though the focal loss has achieved good classification performance in some cases, there still exist some disadvantages when applying it to the real applications. For one thing, the focusing parameter $\gamma$ of the focal loss is set by hand-craft tuning in the actual experiments which is time-consuming and needs to be further optimized. For another thing, as the hard and easy samples sharing the same $\gamma$, the loss weights are only decided by the inferencing results under the specific value of $\gamma$. This loss weight adjusting manner is limited in some cases where the maximum adjusted proportion of loss weights between hard and easy samples is still not large enough to handle the class imbalance. Thus the settings of $\gamma$ for different samples need to be further explored for adapting to the practical sample distributions.

(4)$$\text{FL}(p_t) ={-}{\alpha_t} \cdot(1-p_t)^{\gamma}\cdot\log(p_t).$$

(5)$$\text{L\_hard} ={-}\sum_{k=1}^{N} \alpha \cdot (1-p_k)^{d_k}\cdot \log (p_k).$$

Here, we present a novel loss function called hard loss to resolve the class imbalance issue. The hard loss can be denoted by Eq. (5), where $p_k$ represents the classification probability of the $k$-th sample corresponding to the true class, $N$ is the number of samples in one batch, $\alpha \in (0,1]$ is the weighting factor for positive classes, $d_k$ is the classification difficulty of the $k$-th sample in one batch. As shown in Fig. 2, the smaller difficulty value of one sample, the harder it is to classify. Small classification difficulties assign relative large loss to the hard samples while large classification difficulties assign small loss to the easy samples. By integrating the $\alpha \cdot (1-p_k)^{d_k}$ into the cross-entropy (CE), the proposed loss function can realize powerful online hard sample mining in damage change detection of optical elements.

Compared to the fixed $\gamma$ in the original focal loss for all samples, the hard loss uses the customized classification difficulty for each individual sample. Thus the loss weights of hard and easy classes can be adjusted subtly by the classification difficulties, which leads to a more balanced class distribution. The parameter $\alpha$ may affect the performance of the hard loss, which needs to be set by the real data distribution of changed and unchanged classes. The setting of classification difficulty $d_k$ which is decided by the hard metric has a significant influence on the training result of the model. More reasonable classification difficulty settings result in better trained models.

3.5 Training

Before network training, the data set must have been prepared which consists of bi-temporal image patches, ground truth labels and the corresponding classification difficulties. For training the proposed model, the stochastic gradient descent (SGD) algorithm is used to optimize the computed loss. For alleviating the limitation of running memory of the computer device, the mini-batch learning approach is employed for computing and optimizing the loss. The settings of batch size is related to the size of sample and the hardware configuration of computer. The Gaussian initialization method is used for the last fully connected layer while the Xavier initialization method is used for the rest layers. To prevent model overfitting, the L2 regularization is utilized by setting an appropriate weight decay. The initial learning rate and the weight decay are set to be 0.02 and 0.00005 respectively. By using these skills and methods, we finally acquire a well-trained model in our experiments.

4. Experimental setups

As shown in Fig. 3, a small-aperture optical element damage testing system is used in this paper for laser-induced damage imaging and data acquiring. The damage testing system is composed of laser (355nm/532nm/1064nm switchable), illumination source, optical element, motion control platform, imaging lens and CCD (PixeLINK PL-D755). The location relationship of different components in this system can be found in Fig. 3. The work procedure of this system is as follows. (a) The parameters of the system are configured properly and the CCD image before laser irradiation (Image 1) is collected and stored in the computer. (b) The laser irradiates towards the tested optical element and the laser-induced damages may appear in the surface or body of the tested optical element. (c) The CCD image after laser irradiation (Image 2) is also collected and stored. (d) Adjust the parameters of the laser or the location of the tested optical element and then repeat the above process. Finally, pairs of bi-temporal damage images captured before and after laser irradiation respectively can be obtained.

For evaluating the performance of the proposed model, two real damage change detection datasets have been collected by the above-mentioned damage testing system for small aperture optical elements. The first dataset consists of four pairs of bi-temporal images with the same size of $740\times 740$ pixels. All the images in this dataset are optical images with three bands including red, green and blue. For each pair, there are two registered images which are captured by dark field imaging technique before and after laser irradiation respectively. The ground truth maps are acquired by integrating manual interpretation with prior information. The four image pairs and the corresponding ground truth maps in dataset 1 are shown in Fig. 4. Dataset 2 is collected by the same damage testing system but using the bright field imaging technique. There are four pairs of registered bi-temporal optical images in this dataset which are all with the same size of $501\times 501$ pixels and have three bands including red, green and blue. Different from the images in dataset 1, we can find that there exists interference of strong stray light in the background of the captured images in dataset 2, which increases the difficulty of change detection. The ground truth maps are also acquired by integrating manual interpretation with prior information. The bi-temporal images and the corresponding ground truth maps in dataset 2 are shown in Fig. 5.

Fig. 3. The diagram of damage testing system used in our experiments.

Download Full Size | PDF

Fig. 4. The bi-temporal images and the corresponding ground truth maps in dataset 1.

Download Full Size | PDF

Fig. 5. The bi-temporal images and the corresponding ground truth maps in dataset 2.

Download Full Size | PDF

4.1 Evaluation metrics

To quantitatively evaluate the results of proposed model and the comparison methods, we use the overall error (OE), the overall accuracy (OA), kappa coefficient (KC) and the first error measure (F$_1$) as the main evaluation indicators in this paper [11,43]. The detailed computing expressions of these indices can be denoted as follows:

(6)$${\textrm {OE}}=\textrm {FN}+\textrm {FP}.$$

(7)$${\textrm {OA}}=\frac{\textrm {TP}+\textrm {TN}}{\textrm {TP} + \textrm {TN} + \textrm {FP} + \textrm {FN}}.$$

(8)$${\textrm {KC}}=\frac{\textrm {OA}-\textrm {PRE}}{1-\textrm {PRE}},$$

(9)$$\begin{aligned} {\textrm {PRE}}= & \frac{(\textrm {TP}+\textrm {FP})\cdot (\textrm {TP}+\textrm {FN})}{(\textrm {TP} + \textrm {TN} + \textrm {FP} + \textrm {FN})^{2}} \\ & +\frac{(\textrm {TN}+\textrm {FN})\cdot (\textrm {TN}+\textrm {FP})}{(\textrm {TP} + \textrm {TN} + \textrm {FP} + \textrm {FN})^{2}}. \end{aligned}$$

(10)$${\textrm {F}_1}={\frac{2\cdot \textrm {TP}}{2\cdot \mathrm {TP+FN+FP}}}.$$

where TP, TN, FP, FN represent the number of true positive pixels, true negative pixels, false positive pixels and false negative pixels respectively. For the damage change detection task, TP and TN indicate the number of correctly classified changed and unchanged pixels, while FP and FN indicate the number of misclassified changed and unchanged pixels.

The OE indicates the total number of misclassified pixels. The OA denotes the overall accuracy of damage change detection of optical elements. KC and F$_1$ are two comprehensive indices which measure the global detection performance of the methods. Higher values of KC and F$_1$ indicate better change detection results and vice versa.

4.2 Implementation details

The proposed model is implemented by Caffe [44] and Matlab R2016b. The configuration of the computer used in our experiments is as follows: Ubuntu 16.04 LTS, Intel Core i5-7200U CPU at 2.5 GHz and RAM 8.0 GB. For better evaluating the performance of SimaMLP, five recently presented DL based approaches including DNN [18], DCVA [19], CWNN [20], SMO-SCNN [22] and KPCA-MNet [24] are used as the comparison methods in this paper.

4.3 Parameter setting

4.3.1 Setting of parameter $w$

As described in Section 3.1, the $w$ determines the neighbor size of the center pixel, namely the patch size of the generated samples. As the value of $w$ can distinctly affect the performance of the proposed model, we must select a reasonable setting of $w$ in our experiments. Here, $w$ is set to be 5, 7, 9, 11 and 13 respectively and then we can acquire the corresponding damage change detection results. Note that $\sigma$ and $\alpha$ are set to be 2.0 and 1.0 respectively during these experiments. The OE and KC results are shown with the form of broken line in Figs. 6 to 7. As we can find from the figures, the best damage change detection results can be got with $w = 9$ for both datasets. Too small $w$ makes the model hardly learning sufficient inherent feature representations of damages, while too large $w$ can result in blurring some boundaries of the changed objects and consuming more computation time. Therefore, $w$ is set to be 9 in all experiments of this paper.

Fig. 6. Relationship between the OE of the change detection results and neighborhood size $w$. (a) The image pair 2 in dataset 1. (b) The image pair 3 in dataset 1. (c) The image pair 3 in dataset 2. (d) The image pair 4 in dataset 2.

Download Full Size | PDF

Fig. 7. Relationship between the KC of the change detection results and neighborhood size $w$. (a) The image pair 2 in dataset 1. (b) The image pair 3 in dataset 1. (c) The image pair 3 in dataset 2. (d) The image pair 4 in dataset 2.

Download Full Size | PDF

4.3.2 Setting of parameters $\sigma$ and $\alpha$

As $\sigma$ is the key parameter for generating the classification difficulties of the samples, its setting directly influences the range of the classification difficulty thus greatly affecting the computed loss and even the training result of the network. For $\alpha$, its settings may play an active role in balancing the weights of positive and negative classes. Therefore, in this paper, we plan to find the best settings of $\sigma$ and $\alpha$ by alternately fixing one parameter and tuning the other within a certain range. First of all, with a fixed setting of $\alpha =1.0$, we set $\sigma$ to be 1.0 to 3.5 with a step of 0.5 and obtain the corresponding damage change detection results. By comparing the acquired results, we can determine the best setting of $\sigma$. After that, we tune $\alpha$ in the range of 0.1 to 1.0 with a fixed $\sigma$ and gain the corresponding change detection results. Thus we can find an appropriate setting of $\alpha$ for our experiments. The test results about $\sigma$ and $\alpha$ on two datasets are shown in Fig. 8. As shown in Fig. 8(a) and (b), the proposed model gains the optimal damage change detection results on dataset 1 with $\sigma =2.0, \alpha =1.0$. Meanwhile, Figs. 8(c) to (d) show that the proposed model achieves the best F$_1$ results on dataset 2 with $\sigma =3.0, \alpha =0.9$. Hence, we can get the best settings of $\sigma$ and $\alpha$ for the two damage change detection datasets in our experiments.

Fig. 8. The F$_1$ results about $\sigma$ and $\alpha$ on two datasets. (a) The test result about $\sigma$ on dataset 1. (b) The test result about $\alpha$ on dataset 1. (c) The test result about $\sigma$ on dataset 2. (d) The test result about $\alpha$ on dataset 2.

Download Full Size | PDF

5. Experimental results

5.1 Results on dataset 1

For dataset 1, as the second and the third image pairs have a great difference in size of damage changed regions, we select these two pairs as the test image pairs while the rest two pairs are used for generating the training samples. For training the proposed model, we collect a total of 1071648 samples in this dataset. The proportion of training set and validation set is set to be 2:1 with random sampling. Table 2 shows the quantitative results on dataset 1 by six different methods. Figure 9 provides the visual results on dataset 1 by all the comparison methods.

Fig. 9. Damage change detection results on dataset 1 obtained by different methods. (a) Test results on pair 2 by different methods. (b) Test results on pair 3 by different methods. Note that Image 1 and Image 2 represent the bi-temporal images acquired before and after laser irradiation.

Download Full Size | PDF

Table 2. Quantitative comparison of experimental results obtained by different methods on dataset 1.^a

View Table | View all tables in this article

As we can find from the statistical data in Table 2, the proposed method achieves the second best result on pair 2 and the best result on pair 3. Especially, our method obtains about 3$\%$ improvement to the second best method on image pair 3. For KPCA-MNet, it gains the worst result on image pair 3 and the second worst result on image pair 2. As shown in Fig. 9, there are so many FPs in the results of KPCA-MNet, which shows strong sensitivity to the noise in the optical images for its using simple threshold or clustering method to produce binary change map. CWNN gets the worst result on image pair 2 and second worst result on image pair 3, since the wavelet transform in CWNN is mainly designed to deal with speckle noise but has little effect to the noise in the captured laser-induced damage images. The DNN and DCVA both can detect the large damage changes while have weak capacity in detecting small damage changed regions due to their limited feature representation ability. For SMO-SCNN, as it uses a curve fitting based method to find the best weights for balancing the positive and negative classes, it achieves the best result on pair 2 and the second result on pair 3. Although SMO-SCNN has shown a strong damage change detection performance on this dataset, it still has a distinct disadvantage when applying it to the real applications. Namely, the curve fitting method used in SMO-SCNN is strenuous and time-consuming in selecting the best weights for the modified weighted softmax loss to address the data imbalance problem. For the proposed SiamMLP model, its parameter settings are more efficient than that of SMO-SCNN. At the same time, the computation cost of SiamMLP is less than that of SMO-SCNN. The proposed model has a powerful capability in detecting some detailed changes such as edges and bulges and has obtained competitive results to that of SMO-SCNN. Overall, considering both effectiveness and efficiency, our method is more suitable to the practical industrial application conditions compared to the other methods.

5.2 Results on dataset 2

In this dataset, following previous work in [22], the image pair 3 and image pair 4 which have relatively small damage changes compared to the other image pairs are selected for testing. While the first two pairs are used for producing the training samples. We have collected about 486098 samples in this dataset for training and validating the proposed SiamMLP model. The proportion of training set and validation set is also set to be 2:1. Table 3 and Fig. 10 present the results of six different methods on dataset 2.

Fig. 10. Damage change detection results on dataset 2 obtained by different methods. (a) Test results on pair 3 by different methods. (b) Test results on pair 4 by different methods. Note that Image 1 and Image 2 represent the bi-temporal images acquired before and after laser irradiation.

Download Full Size | PDF

Table 3. Quantitative comparison of experimental results obtained by different methods on dataset 2.^a

View Table | View all tables in this article

As shown in the statistical and visual results, our model achieves the best damage change detection performance among all the comparison methods. In terms of F$_1$ metric, our model outperforms the second best method by more than 1$\%$ both on pair 3 and pair 4. The results of DCVA and CWNN show that they have poor detection ability on small damage changes. As we can see from the first row in Fig. 10, only the largest and the first two largest changed regions can be correctly detected by DCVA and CWNN respectively, because they cannot address the unbalanced data distribution issue. For KPCA-MNet, its change detection results are a little better than that of DCVA and CWNN but still show weak ability in detecting small damage changed regions. That is caused by its weak feature representation and classification ability on hard samples which exhibit less effective information and contain a lot of noise. For DNN, although its statistical results are better than that of KPCA-MNet, it can be seen that there are still many FNs in its visual results. That is because there are seriously unbalanced data distributions between changed and unchanged classes in its generated samples, which has not been specially considered and handled by DNN. Benefiting from using the modified weighted softmax loss in the model training, SMO-SCNN gains the second best damage change detection results on both test pairs. However, there is regional adhesion phenomenon in the result of pair 3 which is caused by some FPs near the two real changed regions. For test pair 4, SMO-SCNN fails to detect the second changed region right above the central large changed region. For SiamMLP, it achieves the best damage change detection performance on this dataset compared to the other methods. As we can find from the last column in Fig. 10, the results of SiamMLP show the closest appearances as the ground truth change maps. The design of hard metric and hard loss equips SiamMLP to possess stronger online hard sample mining ability thus resulting in better change detection performance on the small changed regions.

6. Discussion

To validate the effectiveness of the core modules designed in the proposed method, we further conduct some additional experiments to investigate how these modules affect damage change detection performance. The MLP modules and hard loss are added in the original Siamese CNN network progressively in our experiments and we can obtain the corresponding damage change detection results. The softmax loss which is a classical and widely-used loss function in the classification task is utilized as the baseline loss to validate the effect of the proposed hard loss. Note that, all the experiments are conducted under the same hardware and software environment. Due to the limitation of space, only the visual results on dataset 2 are exhibited in this subsection.

6.1 Effects of MLP modules

To verify the MLP modules, a pair of experiments, i.e., Siamese CNN without MLP modules and SiamMLP with MLP modules have been performed. As listed in Table 4, except for pair 3 in dataset 2, the SiamMLP model which integrates the MLP modules in its network indeed improves the performance of the damage change detection in most test cases. Besides, using the MLP modules in the Siamese CNN makes the classification model inference in a more efficient manner and cost less computation time. As shown in Fig. 11, the model with MLP modules achieves better visual results than that of the model without MLP modules, since it can not only detect the extra small changed regions but also produce more accurate edges of changed regions. These above-described facts prove the effectiveness of the MLP modules in the proposed model.

Fig. 11. Visual comparison results on dataset 2 obtained by different methods. The first and second rows indicate the test results on pair 3 and pair 4 respectively. Note that Image 1 and Image 2 represent the bi-temporal images acquired before and after laser irradiation.

Download Full Size | PDF

Table 4. Performance comparison on the MLP modules and the hard loss by F$_1$ metric.^a

View Table | View all tables in this article

6.2 Effects of the hard loss

For evaluating the effect of introducing the hard loss in our model, four experiments (i.e., Siamese CNN with softmax loss, Siamese CNN with hard loss, SiamMLP with softmax loss and SiamMLP with hard loss) have been conducted in our experiments. As shown in Fig. 11 and listed in Table 4, the proposed hard loss outperforms the softmax loss both in statistical and visual results on all datasets. As we can find from the last two columns in Fig. 11, the models with hard loss gain more powerful detection capacity on the small changed regions, which gives credit to the introducing of hard metric and the weight re-adjusting strategy used in the computation of the loss. Benefiting from the hard loss, our model has stronger ability of online hard sample mining thus resulting in better damage change detection performance than that of the models with softmax loss.

7. Conclusion

This paper presents a novel DL based damage change detection method for optical elements. The proposed model integrates MLP modules into the siamese structure thus forming an end-to-end classification network, which can directly output a binary change result. The introducing of the MLP modules makes the proposed model run more efficiently without using any CNN and attention modules. For addressing the unbalanced distribution of hard and easy samples, we propose a novel hard metric and hard loss to adjust the weights of different samples by rescaling their loss weights in the computation of the loss. The proposed hard loss can enlarge the loss weights of hard samples and decrease the loss weights of easy samples, which leads to a better trained model. Benefiting from the MLP modules and the hard loss, the proposed SiamMLP model achieves the best damage change detection performance on two real datasets. The results on different datasets and the ablation study confirm the effectiveness and superiority of the proposed model. In the future, we plan to further decrease the computation complexity of the proposed model by optimizing its network architecture and using some model compression techniques.

Funding

National Natural Science Foundation of China (62005077, 62103311).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. P. A. Baisden, L. J. Atherton, R. A. Hawley, T. A. Land, J. A. Menapace, P. E. Miller, M. J. Runkel, M. L. Spaeth, C. J. Stolz, T. I. Suratwala, P. J. Wegner, and L. L. Wong, “Large optics for the national ignition facility,” Fusion Sci. Technol. 69(1), 295–351 (2016). [CrossRef]

2. J. Cao, Y. Jiang, R. Qiu, and T. Lü, “Filamentary damage of fused silica irradiated by a 532nm nanosecond laser,” Opt. Mater. Express 11(3), 936–942 (2021). [CrossRef]

3. G. Hallo, C. Lacombe, R. Parreault, N. Roquin, T. Donval, L. Lamaignère, J. Néauport, and F. Hild, “Sub-pixel detection of laser-induced damage and its growth on fused silica optics using registration residuals,” Opt. Express 29(22), 35820–35836 (2021). [CrossRef]

4. X. Zhang, Y. Jiang, R. Qiu, J. Meng, J. Cao, C. Zhang, Y. Zhao, and T. Lü, “Concentric ring damage on the front surface of fused silica induced by a nanosecond laser,” Opt. Mater. Express 9(12), 4811–4817 (2019). [CrossRef]

5. Y. Li, Y. Hui, Z. Yuan, Z. Liu, Z. Yi, Z. Zhe, S. Zhao, W. Jian, and X. Qiao, “Generation of scratches and their effects on laser damage performance of silica glass,” Sci. Rep. 6(1), 34818 (2016). [CrossRef]

6. X. Chai, P. Li, G. Wang, D. Zhu, J. Zhao, B. Zhang, Q. Zhu, K. Zheng, B. Chen, Z. Peng, L. Wang, F. Li, B. Feng, and Y. Jing, “Research on the growth interfaces of pyramidal and prismatic sectors in rapid grown KDP and DKDP crystals,” Opt. Mater. Express 9(12), 4605–4613 (2019). [CrossRef]

7. Y. Lian, D. Cai, T. Sui, M. Xu, Y. Zhao, X. Sun, and J. Shao, “Study on defect-induced damage behaviors of ADP crystals by 355 nm pulsed laser,” Opt. Express 28(13), 18814–18828 (2020). [CrossRef]

8. F. Bovolo and L. Bruzzone, “A theoretical framework for unsupervised change detection based on change vector analysis in the polar domain,” IEEE Trans. Geosci. Remote Sens. 45(1), 218–236 (2007). [CrossRef]

9. F. Bovolo, S. Marchesi, and L. Bruzzone, “A framework for automatic and unsupervised detection of multiple changes in multitemporal images,” IEEE Trans. Geosci. Remote Sens. 50(6), 2196–2212 (2012). [CrossRef]

10. J. S. Deng, K. Wang, Y. H. Deng, and Q. I. G. J., “PCA-based land-use change detection and analysis using multitemporal and multisensor satellite data,” Int. J. Remote Sens. 29(16), 4823–4838 (2008). [CrossRef]

11. H. Li, M. Gong, Q. Wang, J. Liu, and L. Su, “A multiobjective fuzzy clustering method for change detection in SAR images,” Appl. Soft Comput. 46, 767–777 (2016). [CrossRef]

12. Z. Li, L. Han, X. Ouyang, P. Zhang, Y. Guo, D. Liu, and J. Zhu, “Three-dimensional laser damage positioning by a deep-learning method,” Opt. Express 28(7), 10165–10178 (2020). [CrossRef]

13. K. Usmani, G. Krishnan, T. O’Connor, and B. Javidi, “Deep learning polarimetric three-dimensional integral imaging object recognition in adverse environmental conditions,” Opt. Express 29(8), 12215–12228 (2021). [CrossRef]

14. Z. Wang, Y. Zhang, L. Luo, and N. Wang, “TransCD: scene change detection via transformer-based architecture,” Opt. Express 29(25), 41409–41427 (2021). [CrossRef]

15. Y. Lei, D. Peng, P. Zhang, Q. Ke, and H. Li, “Hierarchical paired channel fusion network for street scene change detection,” IEEE Trans. Image Process. 30, 55–67 (2021). [CrossRef]

16. G. Krishnan, R. Joshi, T. O’Connor, and B. Javidi, “Optical signal detection in turbid water using multidimensional integral imaging with deep learning,” Opt. Express 29(22), 35691–35701 (2021). [CrossRef]

17. X. Li, S. An, H. Ji, J. Li, W. Shieh, and Y. Su, “Deep-learning-enabled high-performance full-field direct detection with dispersion diversity,” Opt. Express 30(7), 11767–11788 (2022). [CrossRef]

18. M. Gong, J. Zhao, J. Liu, Q. Miao, and L. Jiao, “Change detection in synthetic aperture radar images based on deep neural networks,” IEEE Trans. Neural Netw. Learn. Syst. 27(1), 125–138 (2016). [CrossRef]

19. S. Saha, F. Bovolo, and L. Bruzzone, “Unsupervised deep change vector analysis for multiple-change detection in VHR images,” IEEE Trans. Geosci. Remote Sens. 57(6), 3677–3693 (2019). [CrossRef]

20. F. Gao, X. Wang, Y. Gao, J. Dong, and S. Wang, “Sea ice change detection in SAR images based on convolutional-wavelet neural networks,” IEEE Geosci. Remote Sens. Lett. 16(8), 1240–1244 (2019). [CrossRef]

21. B. Du, L. Ru, C. Wu, and L. Zhang, “Unsupervised deep slow feature analysis for change detection in multi-temporal remote sensing images,” IEEE Trans. Geosci. Remote Sens. 57(12), 9976–9992 (2019). [CrossRef]

22. J. Kou, T. Zhan, D. Zhou, W. Wang, Z. Da, and M. Gong, “The laser-induced damage change detection for optical elements using siamese convolutional neural networks,” Appl. Soft Comput. 87, 106015 (2020). [CrossRef]

23. S. Saha, Y. T. Solano-Correa, F. Bovolo, and L. Bruzzone, “Unsupervised deep transfer learning-based change detection for HR multispectral images,” IEEE Geosci. Remote Sens. Lett. 18(5), 856–860 (2021). [CrossRef]

24. C. Wu, H. Chen, B. Du, and L. Zhang, “Unsupervised change detection in multitemporal VHR images based on deep kernel PCA convolutional mapping network,” IEEE Trans. Cybern.1–15 (2021).

25. J. Hu, L. Shen, S. Albanie, G. Sun, and E. Wu, “Squeeze-and-excitation networks,” IEEE Trans. Pattern Anal. Mach. Intell. 42(8), 2011–2023 (2020). [CrossRef]

26. Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo, and Q. Hu, “ECA-Net: Efficient channel attention for deep convolutional neural networks,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR) (2020), pp. 11531–11539.

27. J. He, J.-N. Chen, S. Liu, A. Kortylewski, C. Yang, Y. Bai, and C. Wang, “TransFG: A transformer architecture for fine-grained recognition,” arXiv:2103.07976 (2021).

28. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16×16 words: Transformers for image recognition at scale,” arXiv:2010.11929 (2020).

29. Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” arXiv:2103.14030 (2021).

30. I. Tolstikhin, N. Houlsby, A. Kolesnikov, L. Beyer, X. Zhai, T. Unterthiner, J. Yung, A. Steiner, D. Keysers, J. Uszkoreit, M. Lucic, and A. Dosovitskiy, “MLP-Mixer: An all-mlp architecture for vision,” arXiv:2105.01601 (2021).

31. D. Lian, Z. Yu, X. Sun, and S. Gao, “AS-MLP: An axial shifted mlp architecture for vision,” arXiv:2107.08391 (2021).

32. T. Yu, X. Li, Y. Cai, M. Sun, and P. Li, “Rethinking token-mixing MLP for MLP-based vision backbone,” arXiv:2106.14882 (2021).

33. S. Zagoruyko and N. Komodakis, “Learning to compare image patches via convolutional neural networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR) (2015), pp. 4353–4361.

34. L. Leal-Taixé, C. Canton-Ferrer, and K. Schindler, “Learning by tracking: Siamese CNN for robust target association,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW) (2016), pp. 418–425.

35. B. Liu, X. Yu, P. Zhang, A. Yu, Q. Fu, and X. Wei, “Supervised deep feature extraction for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens. 56(4), 1909–1921 (2018). [CrossRef]

36. L. H. Hughes, M. Schmitt, L. Mou, Y. Wang, and X. Zhu, “Identifying corresponding patches in SAR and optical images with a pseudo-siamese cnn,” IEEE Geosci. Remote Sens. Lett. 15(5), 784–788 (2018). [CrossRef]

37. H. Zhang, W. Ni, W. Yan, J. Wu, H. Bian, and D. Xiang, “Visual tracking using siamese convolutional neural network with region proposal and domain specific updating,” Neurocomputing 275, 2645–2655 (2018). [CrossRef]

38. X. Ding, C. Xia, X. Zhang, X. Chu, J. Han, and G. Ding, “RepMLP: Re-parameterizing convolutions into fully-connected layers for image recognition,” arXiv:2105.01883 (2021).

39. M.-H. Guo, Z.-N. Liu, T.-J. Mu, and S.-M. Hu, “Beyond self-attention: External attention using two linear layers for visual tasks,” arXiv:2105.02358 (2021).

40. T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV) (2017), pp. 2999–3007.

41. B. Li, Y. Liu, and X. Wang, “Gradient harmonized single-stage detector,” arXiv:1811.05181 (2018).

42. D. Li, G. Wen, Y. Kuai, L. Zhu, and F. Porikli, “Robust visual tracking with channel attention and focal loss,” Neurocomputing 401, 295–307 (2020). [CrossRef]

43. B. Desclée, P. Bogaert, and P. Defourny, “Forest change detection by statistical object-based method,” Remote Sens. Environ. 102(1-2), 1–11 (2006). [CrossRef]

44. Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” in Proc. 22nd ACM Int. Conf. Multimedia (2014), pp. 675–678.

Layer type	Node number	Output size
input	-	9 $\times$ 9 $\times 2$
slice	-	9 $\times$ 9 $\times$ 1 $\times$ 2
conv1/conv1_p	-	9 $\times$ 9 $\times$ 1
fc1/fc1_p	500	1 $\times$ 1 $\times$ 500
relu1/relu1_p	500	1 $\times$ 1 $\times$ 500
fc2/fc2_p	10	1 $\times$ 1 $\times$ 10
relu2/relu2_p	10	1 $\times$ 1 $\times$ 10
fc3/fc3_p	2	1 $\times$ 1 $\times$ 2
concat	4	1 $\times$ 1 $\times$ 4
fc4	2	1 $\times$ 1 $\times$ 2

Method	Pair 2				Pair 3
Method	OE	OA	KC	F $_{1}$	OE	OA	KC	F $_{1}$
DNN	5412	0.9901	0.9127	0.9180	404	0.9993	0.4817	0.4821
DCVA	5763	0.9895	0.8957	0.9013	362	0.9993	0.5667	0.5670
CWNN	8505	0.9845	0.8749	0.8832	18360	0.9665	0.0430	0.0458
SMO-SCNN	$2678$	$0.9951$	$0.9553$	$0.9579$	237	0.9996	0.7302	0.7304
KPCA-MNet	7380	0.9865	0.8811	0.8883	170800	0.6881	0.0029	0.0043
SiamMLP	3073	0.9944	0.9475	0.9505	$166$	$0.9997$	$0.7521$	$0.7522$

Method	Pair 3				Pair 4
Method	OE	OA	KC	F $_{1}$	OE	OA	KC	F $_{1}$
DNN	$223$	$0.9991$	0.9322	0.9326	246	0.9990	0.9053	0.9057
DCVA	498	0.9980	0.8468	0.8478	807	0.9968	0.6874	0.6890
CWNN	698	0.9972	0.8261	0.8275	1064	0.9958	0.7091	0.7112
SMO-SCNN	252	0.9990	0.9265	0.9270	229	0.9991	0.9134	0.9139
KPCA-MNet	397	0.9984	0.8918	0.8926	359	0.9986	0.8763	0.8770
SiamMLP	224	$0.9991$	$0.9362$	$0.9366$	$202$	$0.9992$	$0.9228$	$0.9232$

Method	Loss	Dataset 1		Dataset 2
Method	Loss	Pair 2	Pair 3	Pair 3	Pair 4
Siamese CNN	Softmax Loss	0.9423	0.6798	0.9270	0.9139
	Hard Loss	0.9468	0.7313	$0.9376$	0.9154
SiamMLP	Softmax Loss	0.9499	0.7394	0.9317	0.9181
	Hard Loss	$0.9505$	$0.7522$	0.9366	$0.9232$

Layer type	Node number	Output size
input	-	9 $\times$ 9 $\times 2$
slice	-	9 $\times$ 9 $\times$ 1 $\times$ 2
conv1/conv1_p	-	9 $\times$ 9 $\times$ 1
fc1/fc1_p	500	1 $\times$ 1 $\times$ 500
relu1/relu1_p	500	1 $\times$ 1 $\times$ 500
fc2/fc2_p	10	1 $\times$ 1 $\times$ 10
relu2/relu2_p	10	1 $\times$ 1 $\times$ 10
fc3/fc3_p	2	1 $\times$ 1 $\times$ 2
concat	4	1 $\times$ 1 $\times$ 4
fc4	2	1 $\times$ 1 $\times$ 2

An end-to-end laser-induced damage change detection approach for optical elements via siamese network and multi-layer perceptrons

Abstract

1. Introduction

2. Background

2.1 Siamese network

2.2 Introduction to MLP

2.3 Data imbalance problem

3. Methodology

3.1 Overall description of the proposed method

3.2 Network architecture

3.3 Hard metric

3.4 Hard loss

3.5 Training

4. Experimental setups

4.1 Evaluation metrics

4.2 Implementation details

4.3 Parameter setting

4.3.1 Setting of parameter $w$

4.3.2 Setting of parameters $\sigma$ and $\alpha$

5. Experimental results

5.1 Results on dataset 1

5.2 Results on dataset 2

6. Discussion

6.1 Effects of MLP modules

6.2 Effects of the hard loss

7. Conclusion

Funding

Disclosures

Data availability

Supplemental document

References

Supplementary Material (1)

Data availability

Cited By

Figures (11)

Tables (4)

Equations (10)

Optics Express