OM-NAS: pigmented skin lesion image classification based on a neural architecture search

Tiejun Yang; Tiejun Yang; Qing He; Qing He; Lin Huang

doi:10.1364/BOE.483828

1. Introduction

Skin cancer is one of the most common cancers, and melanoma is the most fatal skin cancer, with a mortality rate of approximately 75% [1]. Early identification of melanoma can significantly improve the survival rate of patients. Generally, images of pigmented skin lesions are acquired by dermoscopy. Due to the effects of skin color or hair and the high visual similarity between melanoma and non-melanoma lesions, a misdiagnosis or a missed diagnosis is very likely (Fig. 1 shows 8 types of skin lesions). Therefore, automatic classification methods for pigmented skin lesions can help doctors accurately identify melanoma and improve their work efficiency.

Fig. 1. Examples of different types of skin lesions.

Download Full Size | PDF

At present, deep learning has made significant progress in medical image processing [2]. Some researchers have started applying this approach to identify melanoma; in particular, convolutional neural networks (CNNs) have been used for image classification of pigmented skin lesions [3–5]. The design of the neural network architecture is critical for extracting image features and classification properties. However, this approach largely depends on expert experience, for example, when selecting the convolution kernel and setting the hyperparameters. In addition, the experience, knowledge and mindset of an individual may make it difficult to discover new neural network architectures to some extent.

NAS was developed to realize the automatic design of CNN architectures [12–14]. In general, a search strategy is used to evaluate the performance of a large number of candidate neural architectures in a search space composed of several operations in an attempt to find the optimal neural architecture.

At the beginning, NAS employs the global search space with a chain structure [12,13,15]. All the components of the entire neural network are searched, which makes the NAS task quite complex. To effectively solve this problem, the search space is reconstructed, and a cell-based search space is proposed [6,11,14]. This approach is beneficial because only the best local neural architecture (cell) is searched, and then, the entire neural network is built by repeatedly stacking the cells.

These two search spaces mainly include basic operations, such as convolution and pooling [11,13]. Some high-performance neural network architectures also include elaborate neural network modules comprising basic operations [7–10]. The InceptionV1 module [7] simultaneously combines convolution operations and pooling operations of three sizes. With a global pooling layer and a bottleneck structure composed of two fully connected layers, the SE module [10] can be embedded in most neural networks to improve their performance. The Fire module designed by SqueezeNet [9] is a lightweight and efficient neural network module with a convolutional layer, a pooling layer and a fully connected layer. PSPnet [8] uses convolution operations of four sizes to obtain feature maps with different scales. If these neural network modules (hereinafter referred to as “macro operations”) are used in the search space, the performance of the neural network is expected to be further improved.

The search strategy is also the key of NAS. Strategies include reinforcement learning [11–13], evolutionary computation [6,15,16] and gradient optimization [17]. In particular, evolutionary computation first encodes the neural architecture and then initializes a group of individuals to create a population. On this basis, various evolutionary operations based on the idea of biological evolution (such as copy, selection and mutation) are performed to modify individual codes and produce later generations, and individuals with a better performance are continuously selected through iterations to generate the optimal neural architecture.

Early on, the strategy based on evolutionary computation was used to evolve the weight of neural networks [18]. The NeuroEvolution of Augmenting Topologies (NEAT) algorithm [19], which can evolve both the weight and the topological structure of a neural network, was eventually developed. On this basis, GeNet [16] expresses the neural architecture as a binary code with a fixed length and then iteratively updates the population based on the genetic algorithm to obtain the optimal neural architecture. Simply taking a single-layer model without convolution as the beginning of individual evolution, Large-scale Evolution [15] uses a directed graph code as the neural architecture. Both GeNet and Large-scale Evolution use the tournament algorithm to eliminate individuals with less adaptive capacities to maintain an appropriate population size. In this case, excellent individuals remain, though many individuals in the population have the same ancestor. AmoebaNet [6] further improves the tournament algorithm. By assigning an age to individuals, this approach prefers young individuals in the population, which truly simulates the reproductive process of natural species. The problem is that excellent individuals may be eliminated.

To improve the accuracy of pigmented skin lesion image classification based on CNNs, we developed the macro operation mutation-based neural architecture search (OM-NAS) approach based on AmoebaNet [6] and applied it in pigmented skin lesion image classification. The contributions of this study are as follows:

1) The use of macro operations in the search space. Based on the search space of cells, InceptionV1 [7], pyramid scene parsing (PSP) [8], Fire [9] and squeeze and excitation (SE) [10] are incorporated into the search space so that it includes both basic operations such as a convolutional layer and elaborate manually designed macro operations. This approach fully utilizes existing experience in the design of the neural architecture and simplifies the original neural architecture search (NAS) search space [11].
2) The proposal of an evolutionary computation search strategy based on macro operation mutation. In the search process, an evolutionary algorithm based on macro operation mutation is used to iteratively change parent cells and thereby produce child cells. Specifically, macro operation as the target cell of mutation replaces the operation type of a cell, and the mutation connection method in AmoebaNet is used to change the connection mode of the cells. On the pigmented skin lesion datasets HAM10000 and ISIC2017, this approach is almost as accurate as or more accurate than state-of-the-art (SOTA) approaches in terms of image classification (average sensitivity: 72.4% and 58.5%, respectively).

2. Material and method

This approach includes three main steps, i.e., improving the search space, searching for the optimal cell and cell stacking and testing. As shown in Fig. 2, the search space of self-defined macro operations is first expanded, and AmoebaNet's original micro search space is simplified. Then, the optimal cell is searched for by using the evolutionary search strategy based on macro operation mutation. Finally, the optimal cells are stacked to build a neural network for classification of pigmented skin lesions, which is trained and tested on the HAM10000 and ISIC2017 datasets.

Fig. 2. Flow chart - First, add macro operations to the search space; then, employ the OM-NAS approach to search for the optimal cell in the improved search space containing macro operations; finally, build the entire neural architecture by stacking cells.

Download Full Size | PDF

2.1 Image data

The HAM10000 [29] and ISIC2017 [30] datasets were used in the experiment. HAM10000 contains 10,015 dermatoscopic images, including seven types of skin lesions: melanoma, melanocytic nevus, basal cell carcinoma, actinic keratosis, benign keratosis, dermatofibroma and vascular lesions. By employing a previously reported method for dividing a dataset [21], we randomly divided HAM10000 into 4 equal parts. Specifically, 1 part was used as the test set (2,500 images), and the other 3 were used for training and verification (5,000 for training and 2,500 for verification).

ISIC2017 consists of 2,750 images, including 3 types of skin lesion images: melanoma (MEL), pigmented nevus and seborrheic keratosis (SKL). ISIC2017 includes a training set (2,000), validation set(150) and test set (600). Examples of different types of skin lesions are demonstrated in Fig. 1.

2.2 Improving the search space

At present, the search space defined by NAS is generally composed of basic operations such as convolution and pooling (hereinafter referred to as “micro operations”). In this space, you can find the elaborate network architecture based on these micro operations. However, this method creates a large search space and results in a low search efficiency [12,15]. CNNs based on manually designed neural architectures can also have SOTA performance [7–10]. Thus, to make full use of neural architectures that have been well designed and verified (macro operations), we defined a new macro search space and inserted four macro operations, i.e., InceptionV1 [7], PSP [8], Fire [9] and SE [10], into the search space (Table 1). Moreover, to further optimize the original micro search space, micro operations that hardly appear in the optimal cells, such as NASNet-A [11] and AmoebaNet-A [6], were removed, including 1 × 1 convolution, 5 × 5 maximum pooling and 7 × 7 maximum pooling.

Table 1. Macro search space

View Table | View all tables in this article

In addition, 3 × 3 and 5 × 5 dilated convolution operations were included to create a convolutional layer with a larger receptive field. For the optimized micro search space, see Table 2.

Table 2. Micro search space

View Table | View all tables in this article

Then, the optimal micro and macro operations were used to build candidate normal cells and reduction cells [11]. Each cell is composed of B blocks, and each block comprises a five-tuple (I₁, I₂, O₁, O₂, Add). The structure example is shown in Fig. 3. For the block, each of the two operations has 1 input. These two operations, either micro operation or macro operation, are added to obtain the final output of the block.

Fig. 3. Example of block structure.

Download Full Size | PDF

2.3 Evolutionary search strategy based on macro operation mutation

Next, the optimal cell is searched for by using an evolutionary search strategy similar to AmoebaNet. The difference is that the mutation operator employs the proposed macro operation for mutation. For the overall process of this algorithm, see Fig. 2. First, P cells are initialized based on the micro search space, and the approach mentioned in section III.C is used to generate P candidate neural networks (initial populations) prior to evolving the populations. In each process of evolution, S candidate neural networks (individuals) are randomly selected from the population, and the individuals with the best adaptive capability are used as the parent of the object of mutation. Then, the macro operation mutation is used to mutate the parent's cell, a candidate neural network child is generated based on the mutated cell and its performance is evaluated. Both the child and the parent population enter the next iteration until the convergence conditions are satisfied. For a detailed description of this algorithm, see Algorithm 1.

Viruses inject their genes into host genes when an invasion occurs. Inspired by this fact, we regarded macro operations as viral genes and the candidate neural architecture as the host gene. Then, the mutation of the candidate neural network was taken as the “invasion” process of macro operations, where the macro operation replaces a certain operation in the candidate neural network cell.

A detailed description of macro operation mutation is listed in Algorithm 2. First, a mutation site of the neural architecture is randomly determined. The mutated cell is selected from the normal cell and the reduction cell contained in the optimal candidate neural network; a block is selected from the B Blocks that make up the cell; and an operation op is selected in the block. Then, the macro operation mutation process proceeds. A macro operation macro op is selected from the macro search space to replace op. Now, the macro operation mutation of the neural network has been completed.

boe-14-5-2153-i001

We also employed the same approach [6] as that used in AmoebaNet for mutation of the candidate neural network. The implementation of this approach is similar to changing the operation type of the neural architecture in the mutation of the operation input. At the same time, for the algorithm presented herein, no individual is eliminated in each process of evolution. As a result, excellent individuals are not eliminated, and most individuals will not be generated by the same ancestor.

2.4 Candidate neural network architecture

At the time of training and evaluation, the candidate neural network uses the architecture shown in Fig. 4. To extract features and reduce the dimensionality of feature mapping during output, we define N normal cells followed by a reduction cell as a group, and the cells are connected by residuals. The original input first passes through the 3 × 3 convolutional layer to the first group (note that each cell receives two inputs), then connects with G-1 groups and finally uses the global average pool layer (GAP) and fully connected layer (FC) as classification layers [20]. Then, the depth of the neural architecture is (N + 1) *G. In addition, stacking enables the group to work across multiple scales.

Fig. 4. Architecture of candidate neural network.

Download Full Size | PDF

2.5 Assessment indicators

In terms of assessment indicators, MC-Sensitivity S was used to fully measure the accuracy of classification. It is defined as follows:

(1)$$S = \frac{1}{C}\sum\limits_i^C {\frac{{T{P_i}}}{{T{P_i} + F{N_i}}}} $$

where TP is true positive, FN is false negative and C is the number of categories.

3. Results

This experiment was implemented by using Pytorch 1.5. The system environment was as follows: Win10, Intel i7-8700 CPU, 16 GB DDR4 RAM and GTX1080 8 G GPU. There were two phases, i.e., search and test. For the search, we used the OM-NAS approach to search for the optimal cell on the HAM10000 dataset. In the test phase, we built the neural network with the identified optimal cells and compared the classification performance of neural networks with different cell stacking depths (N, G) on the HAM10000 and ISIC2017 datasets. In addition, we compared them with SOTA approaches, such as InceptV3 + Attention [21], DenseNet-121 [22], VGG19 [23], Inception + ResNet [24], ARL-CNN [25], SA + AS [26], G-CNN [27], LIN [28] and AmoebaNet [6].

3.1 Setting of parameters

In the search phase, the parameters in Algorithm 1 were set as follows: P = 5, S = 5, M = 50 and B = 4. When training and assessing the candidate neural network, we used 224 × 224 images and the Adam optimizer and trained 5 epochs, with a batch size of 5 and a learning rate of 0.001.

Images (224 × 224) were also used in the test phase. For HAM10000, the initial learning rate was 0.0005, and the Adam optimizer with betas of (0.9, 0.999) and a weight decay of 3e-4 was used; for ISIC2017, the initial learning rate was 0.00125, and the SGD optimizer with a momentum of 0.9 and a weight decay of 3e-4 was used. During training, cosine annealing was used after each epoch to reduce the learning rate (the lowest value for HAM10000 was 0.0001, and that for ISIC2017 was 0.001). The batch size was set as 5 or 8; 120 epochs of iterative training were conducted.

Moreover, Table 3 shows that the dataset category was not balanced, so the weighted loss function [21] was used in the search and test phases.

Table 3. Average MC-sensitivity value of the verification set at different stacking depths

View Table | View all tables in this article

3.2 Search for the optimal cell

First, we searched for the optimal cell on the HAM10000 dataset. To improve the search efficiency, a large N is not appropriate. Second, the G value, which affects the depth and field of view of the neural network, should not be too small or large. A small G will lead to a small field of view, while a large N will cause a large depth of the neural network, thereby reducing the search efficiency. Therefore, we set N = 1, G = 5 and N = 2, G = 5 for the search. When N = 1, G = 5, the search took approximately 1.5 GPU days; when N = 2, G = 5, 2.5 GPU days were required.

Figure 5 shows the optimal cell (N = 1, G = 5). The normal cell (left) and the reduction cell (right) contain both micro operations and macro operations (PSP and SE modules). Macro operations were present because macro operation mutation was applied in the process of evolution. Integrating features at different scales can better express the features, while the attention mechanism in SE can focus on the information about lesions. The combination of micro and macro operations improved the generalization ability of the entire neural network to some degree.

Fig. 5. Optimal cell structure found on HAM10000 by using MON-NAS, including 4 blocks - left: normal cell; right: reduction cell. In particular, both H[i-1] and H[i] are inputs, and i is the number of layers. For all the block outputs without subsequent connections, the concat operation acts as the final output of the cell

Download Full Size | PDF

3.3 Stacking of the optimal cell

By changing N (N = 1, 2, 3) and G (G = 3, 4, 5), we analyzed the impact of the stacking depth of the best cell on the classification performance of the HAM10000 and ISIC2017 validation sets in order to determine their respective optimal neural networks.

Figure 6 shows the MC-Sensitivity-Epoch curves of the HAM10000 and ISIC2017 verification sets at different stacking depths. Figure 6 (a) shows that when N = 1 and G = 3, the performance of OM-NAS on the HAM10000 verification set was much better than that of the others. By contrast, the MC-Sensitivity for the ISIC2017 verification set showed a slight difference. However, when N = 1, G = 3 or G = 4, the MC-Sensitivity was higher.

Fig. 6. MC-sensitivity-epoch curves of the (a) HAM10000 and (b) ISIC2017 verification sets at different stacking depths

Download Full Size | PDF

To select the optimal cell stacking approach, we divided them into 3 groups according to the N value and calculated the average MC-Sensitivity value of 100 ∼ 120 epochs (see Table 3) as a reference index (as shown in Fig. 6, after training 100 epochs, the candidate neural networks basically converged). As shown in Table 3, when N was constant, the average MC-Sensitivity value decreased as G increased. Therefore, we selected the stacking approach with the maximum average MC-Sensitivity value in each group for testing.

3.4 Test result of HAM10000

Ultimately, we compared the optimal OM-NAS from the three stacking approaches with 5 SOTA approaches: InceptV3 + Attention [21], DenseNet-121 [22], Inception + ResNet [24], VGG19 [23] and AmoebaNet [6]. According to Table 4, the MC-Sensitivity values of both OM-NAS (N = 1, G = 3) and InceptV3 + Attention reached the peak (approximately 72.4%). However, InceptV3 + Attention uses high-resolution images (600 × 450) to present more fine-grained lesion features, while OM-NAS employs 224 × 224 images, requiring a lower resolution of images. Inception + ResNet, DenseNet-121 and VGG19 are all manually designed CNNs, with MC-Sensitivity values ranging from approximately 63.1 ∼ 66.7%. OM-NAS can automatically search neural network architectures based on different datasets, and the MC-Sensitivity values of the three neural networks all exceed those of the above three manually created neural networks. In addition, we compared the proposed method with AmoebaNet. The results show that the MC-Sensitivity value obtained by OM-NAS was higher than that obtained by AmoebaNet, especially when N = 1 and G = 3 (approximately 8.2% higher than AmoebaNet). AmoebaNet's search space only contains micro operations. Although this is used by most NAS methods, the search strategy limits the diversity of candidate neural network architectures. It is difficult to search for sophisticated architectures such as PSP and Fire modules, etc. It can be seen that directly adding the artificially designed delicate neural architectures to the search space is beneficial to improve the classification accuracy of the neural network.

Table 4. Comparison of the test results of different approaches on HAM10000

View Table | View all tables in this article

We also compared the neural networks with other stacking depths of OM-NAS on the HAM10000 test set. As shown in Fig. 7 (a), when N = 1, G = 3, OM-NAS had the maximum MC-Sensitivity value on the HAM10000 test set (approximately 72.4%). As G increased, the MC-Sensitivity value decreased. In particular, when N = 3 and G = 5, the stacking depth was the largest, but the MC-Sensitivity was the minimum. This means that when the stacking depth is too large, OM-NAS shows a poor performance in extracting feature information at a low spatial resolution, which results in a loss of some detailed information. As a result, shallow and deep information cannot be well integrated.

Fig. 7. Classification results of (a) HAM10000 and (b) ISIC2017 test sets at different stacking depths

Download Full Size | PDF

3.5 Test result of transfer to ISIC2017

To verify transferability of the optimal cell, we continued to test on ISIC2017 and compared it with 5 SOTA approaches, i.e., ARL-CNN [25], SA + AS [26], LIN [27], G-CNN [28] and AmoebaNet [6], in terms of performance. As shown in Table 5, the MC-Sensitivity values of the three optimal OM-NAS neural networks in the classification of seborrheic keratosis and three-type classification tasks all surpass those of other approaches. In particular, when N = 2 and G = 3, OM-NAS has the maximum MC-Sensitivity values in all classification tasks. The ARL block in ARL-CNN combines with the residual block and attention mechanism to identify the location of lesions. However, it does not fuse the features extracted at the same scale by increasing the network width. Its performance in melanoma classification is slightly lower than that of OM-NAS (N = 2, G = 3).

Table 5. Comparison of the test results of different approaches on ISIC2017

View Table | View all tables in this article

SA + AS uses active learning strategies to screen useful samples to improve classification performance with the help of human. G-CNN mainly focuses on the information about global scale of lesions, while LIN pays more attention to the difference in local fine-grain between lesions. Neither approach properly combines global information with local information. Additionally, compared with that of AmoebaNet, the performance of OM-NAS has been improved in all classification tasks. In particular, in three-type classification, the performance of OM-NAS (N = 2, G = 3) is approximately 8.7% higher than that of AmoebaNet.

Likewise, we also compared the neural networks of OM-NAS with other stacking depths on the ISIC2017 test set (three-type classification task). As shown in Fig. 7 (b), when N = 2 and G = 3, the MC-Sensitivity of OM-NAS is the maximum (approximately 58.5%). Moreover, with the increase in G, the MC-Sensitivity value continues decreasing. Therefore, the stacking depth has a great impact on the classification of pigmented skin lesions.

3.6 Discussion

Table 4 and Table 5 also provide the number of parameters of some CNN to compare and analyze the complexity of the algorithms. The number of parameters of OM-NAS is proportional to N and G, and it is between 0.29 M and 1.48 M, which is far lower than those of VGG19 (20.29 M) and ARL-CNN (23.0 M). It took approximately 1.5 to 2.5 GPU days to find the best cells. The retraining time on HAM10000 and ISIC2017 is about 18 GPU hours and 8 GPU hours respectively, and the average time of testing each sample is about 30 ms and 200 ms respectively. Compared with the traditional methods of manually constructing CNN for skin lesion classification, OM-NAS automatically searches a CNN model for skin lesion classification, which is more practical and efficient.

4. Conclusion

This paper presents the OM-NAS approach, a simple and effective approach to classify pigmented skin lesions based on NAS. This approach uses the search strategy of evolutionary computation and introduces macro operations into the process of cell mutation. In addition to the experience of the manually designed CNN containing macro operations, the identified cells are as flexible as traditional micro operations. OM-NAS was used to search for the optimal cell in the HAM10000 database and build the neural network by stacking cells; the classification performance was almost as good as or better than that of other SOTA approaches. The optimal cell was successfully transferred to ISIC2017 and showed a good classification performance. In the future, the search strategy using the search space based on micro operations, which can search for sophisticated neural architectures like SE modules, and the connection between the operations at the search channel level will be studied to further improve the diversity of neural architecture connections and the search efficiency.

Funding

National Natural Science Foundation of China (62166012, 62266015); Natural Science Foundation of Guangxi Province (2022GXNSFAA035644); Guangxi Key Laboratory Fund of Embedded Technology and Intelligent System (2020-1-8).

Disclosures

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Data availability

Data underlying the results presented in this paper are available in Ref. [29] and [30].

References

1. A. F. Jerant, J. T. Johnson, C. D. Sheridan, and T. J. Caffrey, “Early detection and treatment of skin cancer,” Am. Fam. Physician 62(2), 357–368 (2000).

2. G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. W. M. Van der Laak, B. Van Ginneken, and C. I. Sánchez, “A survey on deep learning in medical image analysis,” Med. Image Anal. 42, 60–88 (2017). [CrossRef]

3. J. Kawahara and G. Hamarneh, “Multi-resolution-tract CNN with hybrid pretrained and skin-lesion trained layers,” In: L. Wang, E. Adeli, Q. Wang, Y. Shi, and H.I. Suk, (eds.) Machine Learning in Medical Imaging. MLMI 2016. Lecture Notes in Computer Science (Springer, Cham, 2016), pp. 164–171

4. A.R. Lopez, X. Giro-i-Nieto, J. Burdick, and O. Marques, “Skin lesion classification from dermoscopic images using deep learning techniques,” In: The 13th IASTED International Conference on Biomedical Engineering (2017), pp. 49–54.

5. L. Yu, H. Chen, Q. Dou, J. Qin, and P. Heng, “Automated melanoma recognition in dermoscopy images via very deep residual networks,” IEEE Trans. Med. Imaging 36(4), 994–1004 (2017). [CrossRef]

6. E. Real, A. Aggarwal, Y. Huang, and Q.V. Le, “Regularized evolution for image classifier architecture search,” In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 4780–4789. Association for the Advancement of Artificial Intelligence, Honolulu, HI (2019).

7. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” In: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2015), pp. 1–9.

8. H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid scene parsing network,” In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), pp. 2881–2890.

9. F.N. Iandola, S. Han, M.W. Moskewicz, K. Ashraf, W.J. Dally, and K. Keutzer, “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and < 0.5 MB model size,” arXiv, arXiv:1602.07360 (2016). [CrossRef]

10. J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” In: 2018 IEEE CVF Conference on Computer Vision and Pattern Recognition (2018), pp. 7132–7141.

11. B. Zoph, V. Vasudevan, J. Shlens, and Q.V. Le, “Learning transferable architectures for scalable image recognition,” In: 2018 IEEE CVF Conference on Computer Vision and Pattern Recognition(2018), pp. 8697–8710.

12. B. Zoph and Q.V. Le, “Neural architecture search with reinforcement learning,” arXiv, arXiv:1611.01578 (2016). [CrossRef]

13. B. Baker, O. Gupta, N. Naik, and R. Raskar, “Designing neural network architectures using reinforcement learning,” arXiv, arXiv:1611.02167 (2016). [CrossRef]

14. Z. Zhong, J. Yan, W. Wu, J. Shao, and C.L. Liu, “Practical block-wise neural network architecture generation,” In: 2018 IEEE CVF Conference on Computer Vision and Pattern Recognition (2018), pp. 2423–2432 .

15. E. Real, S. Moore, A. Selle, S. Saxena, Y.L. Suematsu, J. Tan, Q.V. Le, and A. Kurakin, Large-scale evolution of image classifiers. In: Proceedings of the 34th International Conference on Machine Learning, pp. 2902–2911. JMLR.org, Sydney, Australia (2017).

16. L. Xie and A. Yuille, “Genetic CNN,” In: 2017 IEEE International Conference on Computer Vision (ICCV) (2017), pp. 1379–1388.

17. H. Liu, K. Simonyan, and Y. Yang, “Darts: differentiable architecture search,” arXiv, arXiv:1806.09055 (2018). [CrossRef]

18. G.F. Miller, P.M. Todd, and S.U. Hegde, “Designing neural networks using genetic algorithms,” In: Proceedings of the 3rd International Conference on Genetic Algorithms (Morgan Kaufmann Publishers, 1989), pp. 379–384.

19. K. O. Stanley and R. Miikkulainen, “Evolving neural networks through augmenting topologies,” Evol. Comput. 10(2), 99–127 (2002). [CrossRef]

20. M. Lin, Q. Chen, and S. Yan, “Network in network,” arXiv, arXiv:1312.4400 (2014). [CrossRef]

21. N. Gessert, T. Sentker, F. Madesta, R. Schmitz, H. Kniep, I. Baltruschat, R. Werner, and A. Schlaefer, “Skin lesion classification using cnns with patch-based attention and diagnosis-guided loss weighting,” IEEE Trans. Biomed. Eng. 67(2), 495–503 (2020). [CrossRef]

22. E. Mohamed and W. El-Behaidy, “Enhanced skin lesions classification using deep convolutional networks,” In: 2019 IEEE 9th International Conference on Intelligent Computing and Information Systems (ICICIS) (IEEE, 2019), pp. 180–188.

23. H. Mureşan, “Skin lesion diagnosis using deep learning,” In: 2019 IEEE 15th International Conference on Intelligent Computer Communication and Processing (ICCP) (IEEE, 2019), pp. 499–506.

24. A.M.R. Ratul, M.H. Mozaffari, W.S. Lee, and E. Parimbelli, “Skin lesions classification using deep learning based on dilated convolution,” bioRxiv, 2020.860700 (2020). [CrossRef]

25. J. Zhang, Y. Xie, Y. Xia, and C. Shen, “Attention residual learning for skin lesion classification,” IEEE Trans. Med. Imaging 38(9), 2092–2103 (2019). [CrossRef]

26. X. Shi, Q. Dou, C. Xue, J. Qin, H. Chen, and P.A. Heng, “An active learning approach for reducing annotation cost in skin lesion analysis,” In: International Workshop on Machine Learning in Medical Imaging (Springer, 2019), pp. 628–636.

27. Y. Li and L. Shen, “Skin lesion analysis towards melanoma detection using deep learning network,” Sensors 18(2), 556 (2018). [CrossRef]

28. P. Tang, Q. Liang, X. Yan, S. Xiang, and D. Zhang, “GP-CNN-DTEL: global-part CNN model with data-transformed ensemble learning for skin lesion classification,” IEEE J. Biomed. Health Inform. 24(10), 2870–2882 (2020). [CrossRef]

29. P. Tschandl, C. Rosendahl, and H. Kittler, “The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions,” Sci. Data 5, 180161 (2018). [CrossRef]

30. N. Codella, D. Gutman, M. Celebi, B. Helba, M. Marchetti, S. Dusza, A. Kalloo, K. Liopyris, N. Mishra, H. Kittler, and A. Halpern, “Skin lesion analysis toward melanoma detection: a challenge at the 2017 International symposium on biomedical imaging (ISBI), hosted by the International skin imaging collaboration (ISIC),” In: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018) (IEEE, 2018), pp. 168–172.

Operation	Expression
PSP module	psp module
Fire module	fire module
SE module	se module
InceptionV1 module	inception_v1 module

Operation	Expression
Copy	identity
3 × 3 deep separable convolution	sep_conv 3 × 3
5 × 5 deep separable convolution	sep_conv 5 × 5
7 × 7 deep separable convolution	sep_conv 7 × 7
3 × 3 dilated convolution	dil_conv 3 × 3
5 × 5 dilated convolution	dil_conv 5 × 5
3 × 3 maximum pooling	max_pool 3 × 3
3 × 3 average pooling	avg_pool 3 × 3

Group	Stacking Depth	HAM10000	ISIC2017
N = 1	G = 3	0.7200	0.6029
	G = 4	0.6624	0.5787
	G = 5	0.6443	0.4595
N = 2	G = 3	0.6667	0.5476
	G = 4	0.6673	0.5241
	G = 5	0.6406	0.5085
N = 3	G = 3	0.6932	0.6015
	G = 4	0.6462	0.5306
	G = 5	0.5874	0.5455

Methods	MC-Sensitivity (%)	# Parameters
InceptionV3 + Attention [21]	72.4	-
Inception + ResNet [24]	63.1	-
DenseNet-121 [22]	66.3	-
VGG19 [23]	66.7	20.29M
AmoebaNet [6]	64.6	-
OM-NAS(N = 2, G = 4)	68.6	1.48M
OM-NAS(N = 3, G = 3)	71.6	0.50M
OM-NAS(N = 1, G = 3)	72.4	0.29M

Methods	MEL (%)	SKL (%)	All (%)	# Parameters
ARL-CNN [25]	59.0	77.8	-	23.0M
SA + AS [26]	47.9	75.6	-	-
G-CNN [28]	45.3	76.0	-	-
LIN [27]	-	-	50.4	-
AmoebaNet [6]	57.2	75.2	49.8	-
OM-NAS(N = 3, G = 3)	49.6	79.7	56.4	0.50M
OM-NAS(N = 1, G = 3)	55.5	79.2	57.9	0.29M
OM-NAS(N = 2, G = 3)	59.3	79.7	58.5	0.39M

OM-NAS: pigmented skin lesion image classification based on a neural architecture search

Abstract

1. Introduction

2. Material and method

2.1 Image data

2.2 Improving the search space

2.3 Evolutionary search strategy based on macro operation mutation

2.4 Candidate neural network architecture

2.5 Assessment indicators

3. Results

3.1 Setting of parameters

3.2 Search for the optimal cell

3.3 Stacking of the optimal cell

3.4 Test result of HAM10000

3.5 Test result of transfer to ISIC2017

3.6 Discussion

4. Conclusion

Funding

Disclosures

Data availability

References

Data availability

Cited By

Figures (7)

Tables (5)

Equations (1)

Biomedical Optics Express