Node-splitting optimized canonical correlation forest algorithm for sea fog detection using MODIS data

Jianhua Wan; Jiajia Li; Mingming Xu; Shanwei Liu; Hui Sheng

doi:10.1364/OE.454570

1. Introduction

Sea fog is a weather phenomenon, caused by the condensation of water vapor in the lower atmosphere over the sea surface, that reduces horizontal visibility at the sea surface to less than 1 km. It adversely affects coastal and marine production activities [1], including marine aviation and transportation, marine fisheries, and oil and gas development [2,3]. Sea fog is especially dangerous for ship navigation: more than 80% of collision accidents are caused by low visibility [4]. Hence, it is crucial to accurately detect the temporal and spatial distribution of sea fog in order to reduce the losses it causes and to protect society and people's livelihoods.

Traditional sea fog observation equipment cannot monitor sea fog quickly or over large areas because observation stations are sparse [5]. The advantages of satellite remote sensing, such as wide observation range, fast information acquisition and real-time dynamic observation, make it a reliable technical means of monitoring sea fog. At present, most sea fog detection research is based on data from Aqua/Terra [6–10], GMS [11], GOCI [12–15], Himawari-8 [16,17], FY-2E [18,19] and other satellites. However, accurately distinguishing low clouds from sea fog remains a problem that detection algorithms based on optical satellites cannot solve well [5]. Active detection satellites can acquire vertical profile information of the atmosphere, which provides a solution to this problem [5,20]. Therefore, this study combined active and passive satellite data to distinguish sea fog from low clouds and thereby improve sea fog detection.

Several satellite-based sea fog detection methods have been developed. Threshold-based methods rely mainly on the statistical analysis of spectral and texture information in satellite images [5]. Cermak and Bendix [21] detected and analyzed sea fog and stratus off the southwest coast of Africa using Meteosat-8 Spinning-Enhanced Visible and Infra-Red Imager (SEVIRI) data. Zhang and Yi [22] used Moderate-Resolution Imaging Spectroradiometer (MODIS) data and sea surface temperature (SST) data to detect sea fog and stratus. Ryu and Hong [17] constructed a normalized difference snow index (NDSI) and used it for sea fog detection. However, the thresholds must be re-selected for different regions and times, which severely limits the practical application of this approach.

In recent years, deep learning (DL) and ensemble learning have been gradually applied to remote sensing image classification [23–29]. Compared with typical machine learning methods, these two approaches can automatically learn features from samples and have stronger generalization ability and robustness. However, in the field of cloud and sea fog detection, the application of DL and ensemble learning is still in its infancy. The convolutional neural network (CNN) is currently the most widely used DL model in image classification and sea fog detection [30–31]. Wang et al. [32] proposed a transfer convolutional neural network (TCNN) method that achieved fast, high-precision classification of ground-based visible cloud images. Random forest (RF) is the most popular ensemble learning method based on decision trees; Fu et al. [33], Tan et al. [34] and Welch et al. [35] used it with FY satellite data for cloud classification or the estimation of cloud base height. Zhang et al. [36] integrated CNNs with different base learners and RNNs with different depths and achieved higher cloud classification accuracy.

Although the methods mentioned above have achieved good classification results, the complicated parameter adjustment in the algorithm model reduces the automation and intelligence of the method, and the ensemble learning method based on the neural network requires a large number of samples for model training, which puts forward a higher requirement for the selection of samples.

In order to reduce the parameter setting in the process of sea fog detection and use fewer training samples while maintaining the prediction accuracy, we employed the Canonical correlation forest (CCF) [37] algorithm in which only one parameter needs to be set to detect the sea fog [38–41]. It has been found that the CCF algorithm has good applicability in binary and multi-class classification problems, and its performance was found to be superior or comparable to other ensemble methods based on decision trees [38].

However, the original canonical correlation tree (CCT) in the CCF does not consider the attribute characteristics of the data points when splitting, and the majority voting rule used to combine the trees does not fully consider the reliability of each tree. As a result, the spectral characteristics of each ground object sample are not fully exploited by the algorithm. To overcome these disadvantages, we improved the split criterion and integration rule of the CCF algorithm and proposed a node-splitting optimized canonical correlation forest algorithm. Moreover, we constructed the sample dataset using active satellite (Cloud-Aerosol Lidar with Orthogonal Polarization, CALIOP) data and MODIS L1B data and established a sea fog detection model based on the proposed algorithm, thus realizing remote sensing detection of sea fog.

Section 2 describes the study area and the data used. In Section 3, the proposed sea fog detection algorithm and the flowchart of sea fog detection algorithm based on it are described, and we clarify the selection of samples and evaluation indicators and the setting of experimental parameters. Section 4 provides the results of CCF, the proposed sea fog detection algorithm, and compares the results of sea fog detection with those of three typical machine learning algorithms, which verifies the effectiveness of our algorithm. Section 5 gives the conclusion.

2. Study area and data

2.1 Study area

The Yellow Sea and the Bohai Sea are adjacent to the foggy area along the coast of the northwest Pacific [42] and form a region with a high incidence of sea fog in China's offshore waters. Sea fog in the Yellow Sea is strongly seasonal and occurs frequently from April to June, whereas sea fog in the Bohai Sea is less frequent and more evenly distributed over the year, occurring most often in December [43]. Therefore, the Yellow Sea and Bohai Sea area bounded by eastern China and the Korean Peninsula was selected as the study area (Fig. 1).

Fig. 1. Study area.

2.2 Data

2.2.1 Optical satellite data

MODIS is carried on the Terra and Aqua satellites. Terra and Aqua (collectively referred to as EOS satellites) belong to the United States Earth Observing System; they pass over at around 10:30 and 13:30 local time each day, respectively, and support long-term observation of and research on atmospheric and Earth environment changes through comprehensive observation of solar radiation, the atmosphere, the ocean and the land. MODIS has 36 bands, with a spectral range from 0.4 µm (visible light) to 14.4 µm (thermal infrared). Its finest spatial resolution is 250 m, and its swath width is 2330 km. MODIS products are divided into six levels, from level 0 (original data) to level 5, among which L1B data are commonly used. This study used version 6.0 MODIS L1B data (MOD021KM/MYD021KM) and L1 data (MOD03/MYD03).

2.2.2 Spaceborne lidar data

The Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO) satellite was jointly developed by the National Aeronautics and Space Administration (NASA) and the Centre National d’Etudes Spatiales (CNES) and was successfully launched on April 28th, 2006. The satellite operates in a sun-synchronous orbit at an altitude of 705 km with an orbital inclination of 98.2°. Its payload consists of three instruments: the cloud-aerosol lidar with orthogonal polarization (CALIOP), the infrared imaging radiometer (IIR), and the wide-field camera (WFC), among which CALIOP is the most critical detector.

CALIOP provides global measurements. As an active remote sensor, it simultaneously emits laser pulses at 532 nm and 1064 nm and detects vertical profile information between the satellite and the sub-satellite point. The vertical and horizontal resolutions of CALIOP data can reach 30 m and 333 m, respectively. However, because the properties of the atmosphere differ with altitude, CALIOP uses different spatial resolutions at different altitudes, as shown in Table 1. The CALIOP Level 2 vertical feature mask (VFM) data and Level 1B lidar data were selected to verify the methods. The VFM data provide information such as cloud and aerosol types, geographical location and UTC time over vast ocean areas where in situ measurements are unavailable.

Table 1. The spatial resolution of CALIOP product

2.2.3 Meteorological station data

We downloaded the measured data of nine meteorological stations along the Yellow Sea and Bohai Sea coast from June to September 2019. The stations are Dalian, Yingkou, Laoting, Longkou, Changdao, Chengshantou, Haiyang, Qingdao and Rizhao. The meteorological parameters observed at each station include cloud cover, air temperature, air pressure, horizontal visibility, relative humidity, dew point temperature, precipitation, and so forth. The spatial distribution of the meteorological stations is shown in Fig. 2.

3. Method

3.1 Dataset

The ensemble learning algorithm used here is supervised, so a training sample dataset must be built. The traditional method of selecting samples by visual inspection alone is highly subjective and prone to misjudgment. Therefore, in this paper, CALIOP VFM data and MODIS image data with close transit times were selected as synchronous data, and the ground object samples were chosen by visual interpretation combined with the CALIOP VFM data. The ground object samples were classified into cloud, sea fog, and sea surface.

Fig. 2. Spatial distribution of meteorological stations. The red points indicate the location of the coastal meteorological stations.

In remote sensing images, medium-high clouds appear as thin slices with rough textures and fuzzy boundaries, often in the form of irregular filaments. Low clouds are usually individual or flake clouds with pebble-like surfaces, which are relatively uniform but have fuzzy boundaries, while sea fog is milky white, with clear boundaries and a uniform, smooth surface overall. Based on these interpretation characteristics, two kinds of points were selected as sea fog samples: cloud that was visually interpreted as suspected sea fog in the image and was close to the sea surface in the CALIOP VFM data, and abnormal sea surface, i.e., the no-signal area above the sea surface in the VFM data [6]. Cloud that was visually interpreted as suspected cloud in the image and whose cloud base height exceeded 1 km was selected as cloud samples.

The horizontal resolution of CALIPSO data is 333 m, whereas that of MODIS data is 1 km, so a single MODIS pixel contains multiple CALIPSO sample points. Therefore, each MODIS pixel was assigned the mode (the most frequent type) of the CALIPSO sample points it contains, and adjacent MODIS pixels that were connected to that pixel within a 10×10 neighborhood and had a similar visual interpretation result in the image were assigned the same type.
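The mode-based labeling rule can be illustrated with a minimal sketch. The function name `modis_pixel_label`, the string class names, and the tie-breaking behavior (the class seen first wins) are assumptions made for illustration; the text does not state how ties are resolved.

```python
from collections import Counter


def modis_pixel_label(caliop_labels):
    """Return the most frequent CALIOP class among the samples in one MODIS pixel.

    caliop_labels: list of class names for the CALIOP sample points that fall
    inside a single 1 km MODIS pixel (hypothetical input format).
    """
    if not caliop_labels:
        return None  # no CALIOP sample falls inside this pixel
    counts = Counter(caliop_labels)
    # most_common(1) returns [(label, count)]; on a tie, the label that was
    # encountered first is returned (an assumption, not stated in the paper).
    return counts.most_common(1)[0][0]


# Three CALIOP samples inside one MODIS pixel: two fog, one cloud.
label = modis_pixel_label(["fog", "fog", "cloud"])
```

In this example the pixel is labeled as fog because two of the three samples are fog.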

3.2 Channel selection

Analyzing the radiation characteristics of clouds and fog in remote sensing images is the basis for the remote sensing identification of sea fog. Clouds and fog have higher reflectivity than the ocean surface in the visible and near-infrared MODIS channels, and the reflectivity of medium-high clouds is significantly higher than that of low clouds and sea fog because of their larger optical thickness. In the mid-infrared channel, the reflection intensity of clouds and sea fog depends on particle size: the smaller the particles, the greater the reflection intensity. Since most fog particles are smaller than those of low clouds and medium-high clouds, the solar radiation reflected by fog in the mid-infrared channel is larger than that reflected by low clouds. The radiation signal of the far-infrared channel mainly comes from the infrared radiation emitted by the Earth: the lower the temperature, the lower the radiation value received by the satellite. Therefore, the radiation of medium-high clouds is lower than that of other objects, while the brightness temperatures of sea fog, low clouds and the ocean surface are similar [5].

MODIS data has high spatial resolution and rich band information. MODIS channels 1–7 and 17–19 are commonly used for daytime sea fog detection in the visible and near-infrared bands [44,45]. Deng et al. [6] used MODIS channels 1, 2, 18, 20, 31 and their combinations for sea fog detection and achieved good classification results. Based on the analysis of spectral characteristics of different bands, this paper selected MODIS channels 1, 2, 3, 4, 5, 7, 17–19, 20, 26, 31 and 32 to fully obtain the information on the cloud, fog and sea surface from the visible band to far-infrared band. The band characteristics of MODIS are depicted in Table 2.

Table 2. The band characteristics of selected MODIS bands

3.3 Node-splitting optimized canonical correlation forest algorithm

The canonical correlation forest (CCF) algorithm is a decision-tree-based ensemble learning algorithm proposed by Rainforth and Wood [37]. Like other traditional decision tree ensemble methods, CCTs are also generated from top to bottom. However, the main difference is that the algorithm first performs canonical correlation analysis (CCA) between features and category labels, then performs hyperplane splitting in the projected space and uses exhaustive search to select the segmentation instead of searching on axis-aligned segmentation. Besides, since the hyperplane splitting is calculated separately on each node, CCF is better at incorporating local correlations than the previous methods [37].

In addition, the CCF algorithm requires no parameter tuning other than defining the number of decision trees (ntrees). Therefore, this algorithm is also described as a parameter-free and user-friendly ensemble learning algorithm [40]. Compared with other state-of-the-art tree ensemble learning methods (such as RF), the CCF algorithm performs better and is more competitive at smaller ensemble sizes [5]. However, the original CCT does not consider the attribute characteristics of data points when splitting, and the majority voting rule used to combine the CCTs does not fully consider the reliability of each tree. Therefore, this study proposes an improved CCF algorithm (I-CCF), the node-splitting optimized canonical correlation forest algorithm.

The original CCF algorithm takes the entropy of the node as the split criterion, so the gain of a split depends only on how the split allocates the data points to each child node; consequently, any two splits that partition the training data identically give the same gain [37]. To further improve the classification accuracy of the algorithm, attribute information is added to the split criterion: the information gain rate replaces the entropy of the node as the new split criterion, and the segmentation that reduces impurity is selected. The information gain rate is given by:

(1)$$GR(Y({\omega _p},:),{\delta _i}) = \frac{\sum\limits_{k = 1}^K {p_k}\log {p_k} - \sum\limits_{j = 1}^{{m_i}} \sum\limits_{k = 1}^K p(k|{v_{i,j}})\log p(k|{v_{i,j}})}{\sum\limits_{j = 1}^{{m_i}} p({v_{i,j}})\log p({v_{i,j}})},$$

where ${\omega _p}$ and $Y({\omega _p},:)$ are, respectively, the index set and the label dataset of the training points that fall into the partition of node p; ${\delta _i}$ is a feature from the feature dataset in the projected space ($i = 1,2, \cdots ,a$, where a is the number of features in the projected space); and the values of each discrete feature ${\delta _i}$ range from ${v_{i,1}}$ to ${v_{i,{m_i}}}$ (${m_i}$ is the number of values of ${\delta _i}$). ${p_k}$ is the empirical probability of class k at the node, $p({v_{i,j}})$ is the probability that the value of ${\delta _i}$ is ${v_{i,j}}$, $p(k|{v_{i,j}})$ is the probability that a data point belongs to class k given that the value of ${\delta _i}$ is ${v_{i,j}}$, and $GR(Y({\omega _p},:),{\delta _i})$ is the information gain rate of feature ${\delta _i}$ at node p.
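As a rough illustration of the split criterion, the sketch below computes an information gain rate for one discrete feature. It is not the authors' implementation. It follows the textbook gain ratio, in which the conditional entropy of each feature value is weighted by $p({v_{i,j}})$; that weighting, the base-2 logarithm, the function names, and the convention of returning 0 when the feature takes a single value are all assumptions.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(labels, feature_values):
    """Information gain of a discrete feature divided by its split information."""
    n = len(labels)
    # Conditional entropy H(Y | feature): entropy of the labels within each
    # feature value, weighted by how often that value occurs (assumed weighting).
    conditional = 0.0
    for value, count in Counter(feature_values).items():
        subset = [y for y, f in zip(labels, feature_values) if f == value]
        conditional += (count / n) * entropy(subset)
    split_info = entropy(feature_values)
    if split_info == 0.0:
        return 0.0  # the feature takes one value only; assumed convention
    return (entropy(labels) - conditional) / split_info


labels = ["fog", "fog", "cloud", "cloud"]
separating = gain_ratio(labels, ["left", "left", "right", "right"])
uninformative = gain_ratio(labels, ["left", "right", "left", "right"])
```

A split that separates the two classes perfectly yields a gain rate of 1, whereas a split that leaves both child nodes evenly mixed yields 0.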

On the other hand, in CCF, different CCTs are trained on samples with different characteristics, so their performance varies. Some CCTs have a more significant impact on the prediction results than others. To improve the overall performance of CCF, each CCT should therefore be weighted appropriately. Hence, the posterior probability of each CCT is used to indicate its reliability and to set its weight in the CCF, so that the better a CCT performs, the higher its weight, thereby improving the classification performance of CCF.

The calculation formula of weight used in this paper was defined as:

(2)$${p_i} = \frac{{\mathrm{error}_i}}{N}, $$

(3)$${w_i} = \frac{{1/{p_i}}}{{\sum\limits_{j = 1}^T {1/{p_j}} }}, $$

where ${p_i}$ is the error rate (i.e., posterior probability) of the i th CCT, $\mathrm{error}_i$ is the number of samples misclassified by the i th tree, N is the total number of samples in the training dataset, ${w_i}$ is the weight of the i th CCT, and T is the total number of trees in the CCF.

After obtaining the weight of each CCT, the final classification result of the model can be obtained by Eq. (4):

(4)$$R(x) = \mathop {\textrm{arg max}}\limits_{c = 1,2, \cdots ,k} \left\{ {\sum\limits_{{t_i}(x) = c,i = 1}^T {{w_i}} } \right\}, $$

where ${t_i}(x)$ is the classification result of the i th CCT for the test sample x, k is the total number of classes in the label dataset, and c is one of the k classes. $R(x)$ is the final classification result of test sample x after weighted voting.
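Eqs. (2)–(4) can be sketched in a few lines. The function names and the small floor `eps` that guards against a tree with zero errors are assumptions for illustration; the paper does not say how a zero error rate is handled.

```python
def tree_weights(error_counts, n_samples, eps=1e-12):
    """Eqs. (2)-(3): normalized inverse error rate of each tree.

    error_counts: number of misclassified samples per tree.
    eps: hypothetical floor that avoids division by zero for a perfect tree.
    """
    inverse = [1.0 / max(errors / n_samples, eps) for errors in error_counts]
    total = sum(inverse)
    return [value / total for value in inverse]


def weighted_vote(predictions, weights):
    """Eq. (4): return the class whose supporting trees have the largest total weight."""
    scores = {}
    for label, weight in zip(predictions, weights):
        scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get)


# Three trees that misclassify 10, 20 and 40 of 100 samples, respectively.
weights = tree_weights([10, 20, 40], 100)
result = weighted_vote(["fog", "cloud", "cloud"], weights)
```

Here the weights are approximately 0.571, 0.286 and 0.143, so the single most reliable tree outvotes the other two and the result is fog, whereas plain majority voting would return cloud.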

To further improve the prediction accuracy and realize the continuous learning function of the model, the weights are updated whenever a new testing dataset is input, so as to achieve the purpose of optimizing the model. The weight in the new testing dataset is adjusted as follows:

(5)$${w_i} = \frac{{{w_i} \times N + w_i^{\prime} \times {N^{\prime}}}}{{N + {N^{\prime}}}}. $$
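A minimal sketch of the update in Eq. (5) follows, under the assumption that $w_i^{\prime}$ is the weight of the i th CCT computed on the new dataset and ${N^{\prime}}$ is the number of samples in that dataset; neither symbol is defined explicitly in the text, and the function name is hypothetical.

```python
def update_weights(old_weights, n_old, new_weights, n_new):
    """Eq. (5): sample-size-weighted average of the old and new tree weights."""
    return [
        (w_old * n_old + w_new * n_new) / (n_old + n_new)
        for w_old, w_new in zip(old_weights, new_weights)
    ]


# Weights learned from 100 samples, then updated with weights from 300 new samples.
updated = update_weights([0.5, 0.3, 0.2], 100, [0.2, 0.3, 0.5], 300)
```

Because the update is a convex combination, the updated weights still sum to 1 whenever both the old and the new weights do.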

Figure 3 shows the flowchart of the node-splitting optimized CCF algorithm for the remote sensing detection of daytime sea fog. First, the MODIS image data were preprocessed and the features were selected. Then, combined with the CALIOP VFM data, the sample dataset was selected as input data for model training, and finally, the proposed CCF algorithm model was generated.

Fig. 3. Flowchart of the remote sensing detection of daytime sea fog based on the proposed algorithm.

3.4 Parameter setting and evaluation metrics

For the proposed algorithm model, only the ntrees parameter needs to be tuned. Through multiple experiments, we found that the misclassification rate and kappa coefficient stabilized at about 300 trees (Fig. 4). Since the running time increases linearly with ntrees, this study selected 300 trees as the optimal choice for the proposed algorithm model.

Fig. 4. Number of trees optimizing for CCF model: Kappa coefficient (a); misclassification rate (b).

To verify and compare the accuracy and stability of the algorithms, this study used the probability of detection (POD), the false alarm rate (FAR), the critical success index (CSI) and the Matthews correlation coefficient (MCC) to quantitatively evaluate the sea fog detection performance of each model. These indexes are calculated from the numbers of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) using the following relationships:

(6)$$POD = \frac{{TP}}{{TP + FN}}, $$

(7)$$FAR = \frac{{FP}}{{FP + TN}}, $$

(8)$$CSI = \frac{{TP}}{{TP + FN + FP}}, $$

(9)$$MCC = \frac{{TP \times TN - FP \times FN}}{{\sqrt {({TP + FP} )({TP + FN} )({TN + FP} )({TN + FN} )} }}.$$
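Eqs. (6)–(9) translate directly into code. The sketch below uses made-up confusion-matrix counts purely for illustration, and it assumes that every denominator is nonzero (both fog and non-fog pixels occur and are predicted); the function name is hypothetical.

```python
import math


def detection_metrics(tp, tn, fp, fn):
    """Compute POD, FAR, CSI and MCC from confusion-matrix counts, Eqs. (6)-(9)."""
    pod = tp / (tp + fn)
    far = fp / (fp + tn)
    csi = tp / (tp + fn + fp)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"POD": pod, "FAR": far, "CSI": csi, "MCC": mcc}


# Hypothetical counts: 80 fog pixels detected, 20 missed, 20 false alarms,
# 900 non-fog pixels correctly rejected.
metrics = detection_metrics(tp=80, tn=900, fp=20, fn=20)
```

For these counts, POD is 0.8, FAR is about 0.022, CSI is about 0.667 and MCC is about 0.778.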

Meanwhile, to show that the proposed algorithm has the advantage of few-shot learning and to determine the best number of training samples, this study also designed a sensitivity analysis. We set the minimum size of the training dataset to 5000 sample points, incremented it to 40000 in 5000 intervals, and constructed five different training datasets for model training. All the trained models were evaluated on the same test dataset to ensure a fair comparison. The distribution of POD and CSI is shown in Fig. 5.

Fig. 5. Distribution of evaluation metrics for different sample numbers.

Fig. 5 shows that POD and CSI tend to increase with the number of training samples. When the number of training samples is more than 35000, POD is stable above 83% and CSI is stable above 60%, indicating that the proposed model can obtain good sea fog detection results with relatively few samples. Therefore, this study selected 30000 sample points to form the training dataset to train the model.

Fig. 6. Comparison of marine fog detection results by different algorithms: MODIS satellite cloud image (a), marine fog detection results based on RF algorithm (b) and the proposed algorithm (c), and CALIOP VFM profile image (d).

We completed the interpretation of 17 MODIS images from 2017 to 2020 and obtained 35000 sample points. The sample dataset was randomly divided into the training dataset and testing dataset according to the ratio of 4:1 (see Table 3). Further, the training dataset samples were randomly divided into the model training dataset and weight training dataset according to the ratio of 7:3, in which the model training dataset was used as an input dataset to train the CCTs based on the node splitting optimization. In contrast, the weight training dataset was used to calculate the weight of each CCT.
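The two-stage random split can be sketched as follows. The function name, the fixed seed, and the use of integer indices as stand-ins for samples are assumptions for illustration; the example uses 1000 hypothetical samples rather than the paper's dataset.

```python
import random


def split_samples(samples, seed=0):
    """Split samples 4:1 into training/testing, then split training 7:3.

    Returns (model_training, weight_training, testing).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = len(shuffled) * 4 // 5
    training, testing = shuffled[:n_train], shuffled[n_train:]
    n_model = len(training) * 7 // 10
    # The first part trains the CCTs; the second part is used to weight them.
    return training[:n_model], training[n_model:], testing


model_set, weight_set, test_set = split_samples(range(1000))
```

With 1000 samples this produces 560 points for training the CCTs, 240 for computing the tree weights, and 200 for testing.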

Table 3. The sample dataset

4. Results and discussion

In order to better verify the effectiveness of this method and compare its classification accuracy with three typical machine learning algorithms and ensemble learning algorithms, some sea fog cases were evaluated and discussed visually, and the CALIOP data and observation data from meteorological stations along the Yellow Sea and Bohai Sea were also used for multiple verifications.

4.1 Validation with foggy and partly cloudy cases

Taking a sea fog event at 13:00 on June 6th, 2020 as an example, the sea fog detection of MODIS data was carried out based on RF algorithm and the proposed sea fog detection algorithm, and the VFM product provided by CALIPSO satellite was selected to conduct a case test. The MODIS satellite image shows that there is a mature sea fog area in the middle of the Yellow Sea and a cloud area on the south side. In this case, the cloud top height is below 500m and the cloud base height is close to the sea level in point A (33.745°N, 124.415°E) and the area from point B (34.543°N, 124.191°E) to point C (35.618°N, 123.884°E), which meets the criterion of the identification of the sea fog, so this area is considered to be the sea fog area.

As can be observed in Fig. 6, both sea fog detection algorithms can accurately identify the main location of the sea fog and cloud areas. The range of the sea fog area in the CALIOP VFM image was also identified as sea fog pixels, which shows that the detection results were credible. As shown in Fig. 6 (a), there are two mist areas surrounded by sea fog on the west coast of the Yellow Sea. The sea fog detection result based on the RF algorithm can identify most of the sea fog, but there was a severe phenomenon of missing detection for the mist. However, the proposed sea fog detection algorithm performs well on thinner structures, making the detection result more continuous, although some of the sea surface surrounded by the mist was misjudged as sea fog. Meanwhile, since point A is located at the edge of the sea fog area, the sea fog is also relatively thin. Compared with the detection result of the proposed sea fog detection algorithm, the RF algorithm failed to detect the sea fog. In the adjacent area of point A, it misidentified part of the sea fog as cloud, which resulted in misclassification.

Therefore, the sea fog detection algorithm based on the proposed algorithm can distinguish between sea fog and cloud better, with a more continuous detection effect. More importantly, it performed better in the thin cloud and sea fog, although part of the clear sea surface was still misjudged as sea fog areas.

Figure 7 shows the detection results of the proposed algorithm in foggy and partly cloudy weather cases. The proposed algorithm identifies the location of sea fog that is not covered by clouds well and detects small, broken and thin sea fog areas effectively. However, because the spectral characteristics of low clouds and sea fog are similar, some low clouds, especially at the edges of cloud areas, were misidentified as sea fog. In Fig. 7 (d)-(f), the red boxes mark the areas where cloud was mistakenly detected as sea fog, and the yellow box in Fig. 7 (e) marks an area of sea fog that was missed. The proposed algorithm is thus prone to false detection at the edges of stratus and in broken low cloud areas.

Fig. 7. MODIS satellite cloud image on 29 March 2017, at 5:30 UTC (a), 8 June 2018, at 5:05 UTC (b), and 4 June 2019, at 5:00 UTC (c). Results of the proposed sea fog algorithm on 29 March 2017, at 5:30 UTC (d), 8 June 2018, at 5:05 UTC (e), and 4 June 2019, at 5:00 UTC (f).

4.2 Validation with CALIOP data

We downloaded all the CALIOP VFM data and the matched MODIS daytime data with synchronous transits from January to September 2018. Following the method for matching CALIOP VFM data and MODIS data described in Section 3.1, we extracted only the MODIS pixels along the CALIPSO ground track, which yielded the verification data (see Table 4). The typical machine learning methods (the logistic algorithm, the support vector machine (SVM) algorithm and the decision tree (DT) algorithm), the RF algorithm, the CCF algorithm, and the proposed sea fog detection algorithm were each used to detect sea fog in the MODIS images. The fog and non-fog pixels of the VFM data were compared with those detected by the six algorithms, and the POD, FAR, CSI and MCC were calculated.

Table 4. The sample distribution of valid dataset

Table 5 shows that the sea fog detection algorithms based on ensemble learning outperformed those based on typical machine learning and were more stable, although some misclassification remained. Among them, the POD and CSI of the CCF algorithm were 81.67% and 60.14%, better than those of the three typical machine learning algorithms, which verifies the effectiveness of this method. Compared with the RF algorithm, the CCF algorithm had lower POD and CSI. This is because the RF algorithm uses the Gini index as the segmentation criterion and considers attribute features when nodes are split, which gives it better classification accuracy. However, with little difference in accuracy, the running time of the CCF algorithm was shorter than that of the RF algorithm.

Table 5. Accuracy comparison of six methods based on CALIOP data verification

The proposed sea fog detection algorithm scored better on the evaluation indexes than the three typical machine learning algorithms, the RF algorithm, and the CCF algorithm. In particular, compared with the CCF algorithm, it improved POD by 2.96%, reduced FAR by 0.63%, and increased CSI by 3.87%. In addition, the MCC, which takes into account true and false positives as well as true and false negatives, is commonly used to evaluate the overall performance of a model. The experimental results show that the MCC of the proposed sea fog detection algorithm reached 74.14%, the highest among all the models, indicating that the algorithm achieves better classification and stronger stability without requiring many parameters to be set. The proposed sea fog detection algorithm is therefore more practical, and its performance in sea fog detection is satisfactory.

4.3 Validation with meteorological station data

To further demonstrate the effectiveness of the proposed algorithm, we used the observations from nine meteorological stations (Dalian, Yingkou, Laoting, Longkou, Changdao, Chengshantou, Haiyang, Qingdao and Rizhao) along the Yellow Sea and Bohai Sea from June to September 2019 for verification, and compared the evaluation metrics of the proposed algorithm with those of RF. The experimental results are shown in Table 6. Following Wu [46], meteorological stations with horizontal visibility less than 10 km and relative humidity greater than 90% were regarded as stations with sea fog. Since the detection result at a station cannot be read directly from the detection image, this study took the pixel where each meteorological station was located as the center, examined all classified pixels within the surrounding 30×30 pixel window, and took the category with the largest proportion as the algorithm's detection result.
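The station-matching step can be sketched as below. The function names, the nested-list class map, and the clipping of the window at the image border are assumptions for illustration. Because 30 is even, the window cannot be centered exactly on the station pixel, so this sketch covers 15 pixels before it and 14 after it in each direction.

```python
from collections import Counter


def station_reports_fog(visibility_km, relative_humidity_pct):
    """Fog criterion from the text: visibility < 10 km and relative humidity > 90%."""
    return visibility_km < 10.0 and relative_humidity_pct > 90.0


def window_majority(class_map, row, col, size=30):
    """Most frequent class in a size x size window around (row, col).

    class_map: 2D list of class names, one per classified pixel (hypothetical format).
    The window is clipped at the image border (an assumption).
    """
    half = size // 2
    r0, r1 = max(0, row - half), min(len(class_map), row + half)
    c0, c1 = max(0, col - half), min(len(class_map[0]), col + half)
    counts = Counter(
        class_map[r][c] for r in range(r0, r1) for c in range(c0, c1)
    )
    return counts.most_common(1)[0][0]


# Hypothetical 40 x 40 classified image: sea surface in the first 10 columns,
# sea fog elsewhere; the station sits at pixel (20, 20).
class_map = [["sea"] * 10 + ["fog"] * 30 for _ in range(40)]
detected = window_majority(class_map, 20, 20)
observed = station_reports_fog(visibility_km=0.8, relative_humidity_pct=95.0)
```

In this example the window contains 750 fog pixels and 150 sea pixels, so the algorithm result is fog, which agrees with the station observation.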

Table 6. Comparison between sea fog detection results and coastal meteorological station observation data

Table 6 shows that the proposed algorithm had higher POD and MCC than RF, and its CSI was still greater than 60%, consistent with the results in Section 4.2. The proposed algorithm therefore has practical value in sea fog detection. Meanwhile, the accuracy of both algorithms was slightly lower than in the verification with CALIOP data. The reasons are as follows:

(1) Some stations are located some distance from the shore, and sea fog sometimes does not extend onto the land or rises into stratocumulus when it moves inland [47]. In such cases the meteorological station reports no fog while the detection result shows fog, which lowers the MCC and raises the FAR of the detection result;
(2) In most cases, clouds cover the sea fog, and the sensor cannot penetrate the clouds to observe the characteristics of the sea fog. The meteorological station then reports fog while the detection result shows cloud, which lowers the POD and MCC and raises the FAR of the detection result.

5. Conclusion

In this study, a sea fog remote sensing detection algorithm based on the node-splitting optimized canonical correlation forest (CCF) is proposed. The CCF algorithm is optimized by changing the split criterion from node entropy to the information gain rate of node entropy and by changing the integration strategy from the majority voting rule to a weighted voting rule based on the reliability of each CCT. Sea fog and cloud characteristics are used to train each CCT, thus realizing automatic and effective detection of sea fog events. This study demonstrates the practicability of the CCF algorithm in sea fog detection through verification against the CALIOP data and the observation data from meteorological stations. Moreover, compared with three typical machine learning algorithms and the RF algorithm, the proposed algorithm performed better in sea fog remote sensing detection, with POD, FAR, CSI, and MCC of 84.63%, 5.70%, 64.01%, and 74.14%, respectively, and it can accurately identify heavy fog and thin fog that are difficult for the other algorithms to identify. However, because only spectral features are selected for classification in this study, sea fog and cloud are still sometimes misclassified. In addition, Aqua and Terra are optical satellites whose sensors cannot penetrate cloud, so when cloud covers the sea fog, the fog area cannot be identified accurately. In general, the algorithm is feasible for sea fog detection, but it needs further improvement to enhance its ability to separate sea fog from cloud.

Funding

National Key Research and Development Program of China (2017YFC1405600); Natural Science Foundation of Shandong Province (2019GHY112034).

Acknowledgments

We would like to thank the NASA Goddard Earth Sciences Data and Information Services Center for MODIS and CALIOP data support and the China National Meteorological Information Center for meteorological station data support.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. S. Zhang, L. Man, X. Meng, F. Gang, and S. Gao, “A comparison study between spring and summer fogs in the Yellow Sea-observations and mechanisms,” Pure Appl. Geophys. 169(5-6), 1001–1017 (2012). [CrossRef]

2. S. Zhang and X. Bao, “The main advances in sea fog research in China,” J. Ocean U. China 38(3), 359–366 (2008).

3. D. Koračin, C. E. Dorman, J. M. Lewis, J. G. Hudson, E. M. Wilcox, and A. Torregrosa, “Marine fog: A review,” Atmos. Res. 143, 142–175 (2014). [CrossRef]

4. G. Fu, J. Guo, S. P. Xie, Y. Duan, and M. Zhang, “Analysis and high-resolution modeling of a dense sea fog event over the Yellow Sea,” Atmos. Res. 81(4), 293–303 (2006). [CrossRef]

5. Y. Xiao, J. Zhang, T. Cui, and P. Qin, “Review of sea fog detection from satellite remote sensing data,” J. Mar. Sci. 41(12), 146–154 (2017).

6. Y. Deng, J. Wang, and J. Cao, “Detection of daytime fog in South China Sea using MODIS data,” J. Trop. Meteorol. 29(6), 1046–1050 (2013).

7. J. Wan, J. Su, H. Shen, S. Liu, and J. Li, “Spatial and temporal characteristics of sea fog in Yellow Sea and Bohai Sea based on active and passive remote sensing,” in IEEE International Geoscience and Remote Sensing Symposium (IGARSS) (2020), pp. 5477–5480.

8. X. Wen, D. Hu, X. Dong, F. Yu, D. Tan, Z. Li, Y. Liang, D. Xiang, S. Shen, C. Hu, and B. Cao, “An object-oriented daytime land fog detection approach based on NDFI and fractal dimension using EOS/MODIS data,” Int. J. Remote Sens. 35(13), 4865–4880 (2014). [CrossRef]

9. X. Wu and S. Li, “Automatic sea fog detection over Chinese adjacent oceans using Terra/MODIS data,” Int. J. Remote Sens. 35(21), 7430–7457 (2014). [CrossRef]

10. L. Yi, K. Li, X. Chen, and K. Tung, “Arctic fog detection using infrared spectral measurements,” J. Atmos. Ocean. Tech. 36(8), 1643–1656 (2019). [CrossRef]

11. M. H. Ahn, E. H. Sohn, and B. J. Hwang, “A new algorithm for sea fog/stratus detection using GMS-5 IR data,” Adv. Atmos. Sci. 20(6), 899–913 (2003). [CrossRef]

12. A. Harun-Al-Rashid and C. Yang, “A simple sea fog prediction approach using GOCI observations and sea surface winds,” Remote Sensing Letters 9(1), 21–30 (2018). [CrossRef]

13. Y. Yuan, Z. Qiu, D. Sun, S. Wang, and X. Yue, “Daytime sea fog retrieval based on GOCI data: a case study over the Yellow Sea,” Opt. Express 24(2), 787–801 (2016). [CrossRef]

14. D. Kim and M. S. Park, “Severe visibility marine fog detection using GOCI/COMS VIS bands,” in Conference on Remote Sensing of Clouds and the Atmosphere XXIV (2019), vol. 11152.

15. D. Kim, M. S. Park, Y. J. Park, and W. Kim, “Geostationary Ocean Color Imager (GOCI) marine fog detection in combination with Himawari-8 based on the decision tree,” Remote Sens. 12(1), 149 (2020). [CrossRef]

16. Y. Huang, S. Siems, M. Manton, A. Protat, L. Majewski, and H. Nguyen, “Evaluating Himawari-8 cloud products using shipborne and CALIPSO observations: cloud-top height and cloud-top temperature,” J. Atmos. Ocean. Tech. 36(12), 2327–2347 (2019). [CrossRef]

17. H. S. Ryu and S. Hong, “Sea fog detection based on Normalized Difference Snow Index using advanced Himawari imager observations,” Remote Sens. 12(9), 1521 (2020). [CrossRef]

18. Y. Deng, Y. Tian, and J. Wang, “Dynamic detection of daytime sea fog using geostationary meteorological satellite data,” Sci. Geographica Sinica 36(10), 1581–1587 (2016).

19. Q. Li, X. Sun, and X. Wang, “Reliability evaluation of the joint observation of cloud top height by FY-4A and Himawari-8,” Remote Sens. 13(19), 3851 (2021). [CrossRef]

20. D. Wu, B. Lu, T. Zhang, and F. Yan, “A method of detecting sea fogs using CALIOP data and its application to improve MODIS-based sea fog detection,” J. Quant. Spectrosc. Radiat. Transfer 153, 88–94 (2015). [CrossRef]

21. J. Cermak and J. Bendix, “Dynamical nighttime fog/low stratus detection based on Meteosat SEVIRI data: A feasibility study,” Pure Appl. Geophys. 164(6-7), 1179–1192 (2007). [CrossRef]

22. S. Zhang and L. Yi, “A comprehensive dynamic threshold algorithm for daytime sea fog retrieval over the Chinese adjacent seas,” Pure Appl. Geophys. 170(11), 1931–1944 (2013). [CrossRef]

23. B. Banerjee, F. Bovolo, A. Bhattacharya, L. Bruzzone, S. Chaudhuri, and B. K. Mohan, “A new self-training-based unsupervised satellite image classification technique using cluster ensemble strategy,” IEEE Geosci. Remote Sensing Lett. 12(4), 741–745 (2015). [CrossRef]

24. X. Dai, X. Wu, B. Wang, and L. Zhang, “Semi-Supervised scene classification for remote sensing images based on CNN and ensemble learning,” in IEEE International Geoscience and Remote Sensing Symposium (IGARSS) (2018), pp. 4732–4735.

25. D. Hong, N. Yokoya, J. Chanussot, and X. X. Zhu, “An augmented linear mixing model to address spectral variability for hyperspectral unmixing,” IEEE Trans. on Image Process. 28(4), 1923–1938 (2019). [CrossRef]

26. H. Su, Y. Yu, Q. Du, and P. Du, “Ensemble learning for hyperspectral image classification using tangent collaborative representation,” IEEE Trans. Geosci. Remote Sensing 58(6), 3778–3790 (2020). [CrossRef]

27. H. Su, Y. Yu, Z. Wu, and Q. Du, “Random subspace-based K-nearest class collaborative representation for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sensing 59(8), 6840–6853 (2021). [CrossRef]

28. D. Hong, L. Gao, N. Yokoya, J. Yao, J. Chanussot, Q. Du, and B. Zhang, “More diverse means better: multimodal deep learning meets remote sensing imagery classification,” IEEE Trans. Geosci. Remote Sensing 59(5), 4340–4354 (2021). [CrossRef]

29. X. Wu, D. Hong, and J. Chanussot, “Convolutional neural networks for multimodal remote sensing data classification,” IEEE Trans. Geosci. Remote Sensing 60, 1–10 (2022). [CrossRef]

30. H. Su, W. Yao, Z. Wu, P. Zheng, and Q. Du, “Kernel low rank representation with elastic network for China coastal wetland classification using GF-5 hyperspectral imagery,” ISPRS J. Photogrammetry and Remote Sens. 171, 238–252 (2021). [CrossRef]

31. D. Hong, L. Gao, J. Yao, B. Zhang, A. Plaza, and J. Chanussot, “Graph convolutional networks for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sensing 59(7), 5966–5978 (2021). [CrossRef]

32. M. Wang, Z. Zhuang, K. Wang, S. Zhou, and Z. Liu, “Intelligent classification of ground-based visible cloud images using a transfer convolutional neural network and fine-tuning,” Opt. Express 29(25), 41176–41190 (2021). [CrossRef]

33. H. Fu, J. Feng, J. Li, and J. Liu, “Cloud detection method of FY-2G satellite images based on random forest,” Bull. Surv. Mapping 3, 61–66 (2019). [CrossRef]

34. Z. Tan, S. Ma, D. Han, D. Gao, and W. Yuan, “Estimation of cloud base height for FY-4A satellite based on random forest algorithm,” J. Infrared Millim. W. 38(3), 381–388 (2019). [CrossRef]

35. R. M. Welch, S. Asefi, J. Zeng, U. S. Nair, Q. Han, R. O. Lawton, D. K. Ray, and V. S. Manoharan, “Biogeography of tropical montane cloud forests. Part I: Remote sensing of cloud-base heights,” J. Appl. Meteorol. Clim. 47(4), 960–975 (2008). [CrossRef]

36. J. Zhang, P. Liu, F. Zhang, H. Iwabuchi, A. A. d. H. e de Moura, and V. H. C. de Albuquerque, “Ensemble meteorological cloud classification meets internet of dependable and controllable things,” IEEE Internet Things J. 8(5), 3323–3330 (2021). [CrossRef]

37. T. Rainforth and F. Wood, “Canonical correlation forests,” arXiv:1507.05444v6 (2017).

38. I. Colkesen and T. Kavzoglu, “Ensemble-based canonical correlation forest (CCF) for land use and land cover classification using sentinel-2 and Landsat OLI imagery,” Remote Sensing Letters 8(11), 1082–1091 (2017). [CrossRef]

39. D. T. Ocansey, M. Aidoo, M. Bikdash, H. D. Ismail, C. White, R. H. Newman, and B. K. Dukka, “Performance of canonical correlation forest in phosphorylation site predictions,” in Proc. IEEE SoutheastCon (2018), pp. 1–7.

40. E. K. Sahin, I. Colkesen, and T. Kavzoglu, “A comparative assessment of canonical correlation forest, random forest, rotation forest and logistic regression methods for landslide susceptibility mapping,” Geocarto International 35(4), 341–363 (2020). [CrossRef]

41. J. Xia, N. Yokoya, and A. Iwasaki, “Hyperspectral image classification with canonical correlation forests,” IEEE Trans. Geosci. Remote Sensing 55(1), 421–431 (2017). [CrossRef]

42. J. M. Lewis, D. Koračin, and K. T. Redmond, “Sea fog research in the United Kingdom and United States: A historical essay including outlook,” Bull. Am. Meteorol. Soc. 85(3), 395–408 (2004). [CrossRef]

43. X. Wu, S. Li, M. Liao, Z. Cao, L. Wang, and J. Zhu, “Analyses of seasonal feature of sea fog over the Yellow Sea and Bohai Sea based on the recent 20 years of satellite remote sensing data,” Acta Oceanol. Sin. 37(1), 63–72 (2015).

44. J. Bendix, B. Thies, T. Nauss, and J. Cermak, “A feasibility study of daytime fog and low stratus detection with Terra/Aqua-MODIS over land,” Meteorol. App. 13(02), 111–125 (2006). [CrossRef]

45. C. Zhang, Y. Cai, and J. Zhang, “The Application of monitoring sea fog in Taiwan Strait using MODIS remote sensing data,” J. Appl. Meteorol. Sci. 20(1), 8–16 (2009).

46. D. Wu, “A discussion on difference between haze and fog and warning of ash haze weather,” (in Chinese) Meteorol. Mon. 31(4), 3–7 (2005).

47. I. Gultepe, G. Pearson, J. A. Milbrandt, B. Hansen, S. Platnick, P. Taylor, M. Gordon, J. P. Oakley, and S. G. Cober, “The fog remote sensing and modeling field project,” Bull. Am. Meteorol. Soc. 90(3), 341–360 (2009). [CrossRef]

Altitude (km)	Horizontal Resolution (m)	Vertical Resolution (m)
-2 ∼ -0.5	333	300
-0.5 ∼ 8.2	333	30
8.2 ∼ 20.2	1000	60
20.3 ∼ 30.1	1667	180
30.1 ∼ 40	5000	300

Bands	Wavelengths (µm)	Spatial Resolution (km)	Bands	Wavelengths (µm)	Spatial Resolution (km)
1	0.620 ∼ 0.670	0.25	7	2.105 ∼ 2.155	0.5
2	0.841 ∼ 0.876	0.25	17 ∼ 19	0.890 ∼ 0.965	0.5
3	0.456 ∼ 0.479		20	3.660 ∼ 3.840	1
4	0.545 ∼ 0.565	0.5	26	1.360 ∼ 1.390
5	1.230 ∼ 1.250	0.5	31 ∼ 32	10.78 ∼ 12.27

Object	Training	Testing
Sea surface	8849	2184
Sea fog	5276	1182
Cloud	13875	3634
Total	28000	7000

Types	Valid
Surface	2666
Sea fog	1282
Cloud	4581
Total	8529

Method	POD	FAR	CSI	MCC
Logistic	65.05%	12.79%	37.75%	46.14%
SVM	75.74%	9.60%	49.09%	59.60%
DT	81.59%	20.42%	37.87%	47.88%
RF	81.83%	6.31%	60.32%	70.78%
CCF	81.67%	6.33%	60.14%	70.56%
I-CCF	84.63%	5.70%	64.01%	74.14%

Optics Express

Predicted result \ True result	MS fog pixel	MS non-fog pixel
RF fog pixel	79 (TP)	27 (FP)
I-CCF fog pixel	85 (TP)	23 (FP)
RF non-fog pixel	28 (FN)	173 (TN)
I-CCF non-fog pixel	25 (FN)	174 (TN)

Method	POD	FAR	CSI	MCC
RF	72.90%	13.50%	58.21%	59.66%
I-CCF	77.27%	11.68%	63.91%	65.87%
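
As a worked check of how the four metrics relate to the confusion-matrix counts, the sketch below recomputes them for the I-CCF row of the station comparison. The metric definitions belong to Section 3.4 and are not reproduced here, so the formulas in the code are assumptions: POD, CSI and MCC follow their standard forms, and FAR is taken as FP / (FP + TN) because that form matches the reported I-CCF value. The function name `detection_metrics` is hypothetical.

```python
import math


def detection_metrics(tp, fp, fn, tn):
    """Compute POD, FAR, CSI and MCC from confusion-matrix counts.

    Assumed definitions (not quoted from the paper):
      POD = TP / (TP + FN)
      FAR = FP / (FP + TN)
      CSI = TP / (TP + FP + FN)
      MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    """
    pod = tp / (tp + fn)
    far = fp / (fp + tn)
    csi = tp / (tp + fp + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"POD": pod, "FAR": far, "CSI": csi, "MCC": mcc}


# I-CCF counts from the station comparison: TP=85, FP=23, FN=25, TN=174.
iccf = detection_metrics(tp=85, fp=23, fn=25, tn=174)
for name, value in iccf.items():
    print(f"{name}: {value * 100:.2f}%")
```

With these assumed definitions, the I-CCF counts give POD 77.27%, FAR 11.68%, CSI 63.91% and MCC 65.87%, matching the values listed for I-CCF.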