Optica Publishing Group

Deep learning classification of cervical dysplasia using depth-resolved angular light scattering profiles

Open Access

Abstract

We present a machine learning method for detecting and staging cervical dysplastic tissue from light scattering data using a convolutional neural network (CNN) architecture. Depth-resolved angular scattering measurements from two clinical trials were used to generate independent training and validation sets as inputs to our model. We report 90.3% sensitivity, 85.7% specificity, and 87.5% accuracy in classifying cervical dysplasia, demonstrating consistent classification of a/LCI scans across different instruments. Further, our deep learning approach significantly improved processing speed over the traditional Mie theory inverse light scattering analysis (ILSA) method, with a hundredfold reduction in processing time, offering a promising approach for using a/LCI in the clinic to assess cervical dysplasia.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Cervical cancer, a common epithelial malignancy, is one of the leading causes of cancer death among women worldwide [1]. Overall, early screening for cervical cancer remains the most reliable route to improved patient outcomes. Precancerous tissue (cervical dysplasia) presents as a range of severity from low-grade squamous intraepithelial lesions (LSIL) to high-grade squamous intraepithelial lesions (HSIL), and staging of dysplasia is critical in preventing the progression of malignancy, with the severity of dysplasia drastically influencing treatment strategies. LSIL commonly resolves independently without intervention, and only periodic observation is employed to monitor its progression, while HSIL requires more immediate and invasive treatment if discovered [2]. Current cervical screening techniques such as the Papanicolaou test [3] enjoy widespread use. Still, this test is naturally prone to sampling error and requires significant time and resources to prepare and analyze each sample to complete the screening and diagnostic process. Consequently, the need for reliable real-time diagnostics has inspired many optical techniques [4–10] aimed at detecting cervical cancer at early stages. Angle-resolved low-coherence interferometry (a/LCI) is a promising technique that offers the unique capability of extracting depth-resolved nuclear morphology information at or near the basal layer of the cervical epithelium. a/LCI has demonstrated substantial diagnostic capability in detecting cervical dysplasia in vivo in early clinical trials, achieving 100% sensitivity and 97% specificity [11] in classifying dysplastic and non-dysplastic cervical epithelial tissue. A prospective follow-up study used the decision line from the first study to analyze data from a different a/LCI instrument with improved clinical utility and achieved 90% sensitivity and 82% specificity [12].

a/LCI measures depth-resolved angular scattering fields from the tissue at the basal layer of the epithelium to extract nuclear morphology, establishing biomarkers of nuclear diameter and nuclear density (relative nuclear refractive index) to characterize the disease state of the tissue [11–14]. Nuclear morphology is extracted computationally from the depth-resolved angular scattering data using an inverse light scattering analysis (ILSA) algorithm [15–17] based on Mie theory. Although Mie theory-based ILSA is an effective technique for classifying epithelial tissue based on nuclear morphology parameters [11–14], the method is currently limited by excessive data processing time, ∼ 25 seconds per optical biopsy [11].

Current a/LCI processing involves an iterative process in which sample data are compared to a database of known Mie scattering profiles. Further, a/LCI scans processed with ILSA require a calibration step in which the a/LCI data are summed over all angles as a function of depth to determine the epithelium surface in the optical scan [13]. Not only is there a constant system-dependent offset resulting from the tissue start depth and the unique optical sampling geometry of each probe, but variability in probe placement on the cervix requires minor adjustments across individual patient scans. Accounting for these variations adds substantially to overall processing times. Additionally, a/LCI scans undergo low-pass filtering that removes high-frequency oscillations arising from scattering contributions from neighboring cell nuclei or larger organelles [16]; this step is invariant between patients but depends on the specific instrumentation used in the a/LCI setup. Previously reported methods aimed at improving a/LCI processing include continuous wavelet transform [18] and T-matrix [19] based approaches; however, these suffer from substantial trade-offs in processing accuracy and speed, respectively. Based on clinician input, processing should take under one second to provide immediate feedback and limit the total time of an imaging session.
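
The calibration step described above can be sketched in a few lines. This is a hypothetical illustration of the idea, not the authors' code: sum the scan over its angular axis and take the depth bin of peak summed intensity as the epithelium surface; the scan shape and the peak criterion are assumptions.

```python
import numpy as np

def find_surface_depth(scan):
    """Locate the epithelium surface in an a/LCI scan.

    scan: 2D array of scattering intensity, shape (n_depth, n_angle).
    Returns the depth index with the strongest angle-summed return.
    """
    depth_profile = scan.sum(axis=1)      # collapse the angular axis
    return int(np.argmax(depth_profile))  # depth bin of peak intensity

# Synthetic example: a bright band at depth bin 4 stands in for the surface.
scan = np.ones((10, 69))
scan[4, :] += 5.0
print(find_surface_depth(scan))  # -> 4
```

In practice the system-dependent offset mentioned above would be added to this index before extracting the basal-layer depths.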

In recent years, deep learning techniques have advanced the state-of-the-art for a variety of image classification tasks. One particular family is convolutional neural networks (CNNs), which implement a series of transforming layers to extract features directly from training data [20]. When paired with optical imaging techniques, CNNs have shown potential towards rapid, high-accuracy classification; examples include quantitative phase imaging [21–23], optical coherence tomography [24–27], fundus photography [28,29], adaptive optics scanning light ophthalmoscopy [30], multiphoton microscopy [31], and fluorescence microscopy [32]. Similarly, light scattering information is well-suited for this approach due to the abundant information provided within each dataset. For example, recent work by Zheng et al. [33] used 2D maps of optical scattering parameters extracted from differential interference contrast images to perform CNN-based diagnosis and achieved high classification accuracy.

This study trains a CNN on a/LCI angle vs. depth scattering profiles of cervical epithelium for rapid classification of benign, LSIL, and HSIL tissue. Raw a/LCI scans from two previous clinical studies assessing cervical dysplasia in vivo comprise the model dataset. The major distinction between the two studies is the instrumentation, with the first featuring a point-probe a/LCI instrument [11] and the second a multipoint scanning a/LCI instrument [12]. Training was performed on a/LCI scans from the point-probe study only, and testing was performed on a/LCI scans from both the point-probe study and the scanning a/LCI study. The CNN-based approach to a/LCI processing demonstrates high accuracy, increased processing speed, and generalizability across different instruments, making it a powerful tool that can improve the clinical feasibility of using a/LCI to detect dysplasia.

2. Materials and methods

2.1 a/LCI datasets

The two datasets used here are composed of depth-resolved angular scattering scans of the cervical epithelium acquired during two separate clinical studies using two different a/LCI instruments [11,12]. Each a/LCI scan is a two-dimensional map of scattering intensity as a function of angle and depth within the tissue; an example is shown on the left of Fig. 1. In both clinical studies, the physical locations of the optical biopsies were predetermined by the instrumentation to specific quadrants of the cervix. Colposcopy immediately followed a/LCI data collection, and co-registration using tissue marking or white light imaging allowed each specific site to be identified for acquisition of a physical biopsy. For each patient, four separate optical biopsy sites were sampled and analyzed by a pathologist blinded to the Mie theory-based a/LCI optical biopsy results. These pathological results were set as the ground truth for the CNN, with possible classifications of benign, LSIL, and HSIL.


Fig. 1. Visual representation of the convolutional neural network architecture. Raw depth-resolved angular scattering profiles are input to the network, with stacks of generated feature maps annotated with the number of features. Training takes an average of 96.2 ± 1.2 s.


Dataset A was collected with the point-probe a/LCI instrument and consists of 6660 individual scans collected from 40 patients, including 3260 scans from 33 benign biopsy sites, 2040 scans from 17 LSIL biopsy sites, and 1360 scans from 13 HSIL biopsy sites. Dataset B was collected with the scanning a/LCI instrument, where multiple scans were acquired in one imaging session. This device also included an onboard white light camera to visualize the cervix and enable better registration with physical biopsies. This set consists of 1600 individual scans collected from 20 patients, including 980 scans from 49 benign biopsy sites, 380 scans from 19 LSIL biopsy sites, and 240 scans from 12 HSIL biopsy sites. Altogether, 60 different patients were involved in this study. The scans from dataset A were cropped from an angular range of 30.2° down to 20.8° and interpolated to match the angular sampling of dataset B, so that the angular range of the training data (dataset A) matched that of the test data (dataset B).
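
The crop-and-interpolate harmonization just described can be sketched as follows. This is an illustrative reconstruction, not the study code: the sample counts and the assumption of a uniformly sampled angular axis starting at the same minimum angle are ours.

```python
import numpy as np

def match_angular_sampling(scan_a, range_a=30.2, range_b=20.8, n_angles_b=69):
    """Crop a dataset-A scan's angular range and resample it onto
    dataset B's angular grid, row by row along the depth axis."""
    n_depth, n_angles_a = scan_a.shape
    angles_a = np.linspace(0.0, range_a, n_angles_a)   # original angular axis
    angles_b = np.linspace(0.0, range_b, n_angles_b)   # cropped target axis
    # interpolate each depth row onto the target angular grid
    return np.stack([np.interp(angles_b, angles_a, row) for row in scan_a])

# Illustrative scan: 10 depth bins x 100 angular samples over 30.2 degrees
scan_a = np.random.rand(10, 100)
scan_matched = match_angular_sampling(scan_a)
print(scan_matched.shape)  # -> (10, 69)
```

Because the target range lies inside the original range, `np.interp` never extrapolates here.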

Each patient provided voluntary informed consent, and both studies were approved by the Institutional Review Boards (IRBs) of the University of California, San Francisco, the Albert Einstein College of Medicine, and Duke University. A more detailed description of each clinical study's protocols and ethical standards is provided by Ho et al. [11] and Kendall et al. [12], respectively. The data used in this study were deidentified before analysis.

2.2 Mie theory-based inverse light scattering analysis

In the original cervix studies [11,12], Mie theory-based ILSA was used to extract nuclear morphology information as a function of depth into the tissue; a more detailed description of the algorithm is provided in Brown et al. [16]. The algorithm produces a prediction of the nuclear diameter and nuclear density at each depth within the tissue scan. Multiple scans from a given biopsy site are averaged to determine the average nuclear diameter and density representing that site. In the first study [11], linear discriminant analysis (LDA) was applied to find the optimal decision line separating diseased from normal tissue based on nuclear diameter and nuclear density. The threshold was varied to create a receiver operating characteristic (ROC) curve, and the corresponding sensitivity and specificity were determined using the optimal point nearest the top-left corner. Positive predictive value (PPV) and negative predictive value (NPV) can also be quantified using the decision line; these performance metrics are beneficial in many types of diagnostic testing. In the second study [12], the decision line from the first study was used to prospectively grade the nuclear morphology measurements for each biopsy site. Performance in this prospective study was again calculated by determining sensitivity, specificity, PPV, and NPV.
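
The four performance metrics used throughout the paper follow directly from a 2x2 dichotomized outcome. A minimal sketch, with illustrative counts that are not taken from the study:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts:
    tp/fp/tn/fn = true/false positives and negatives."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Hypothetical dysplastic vs. non-dysplastic tally for 60 biopsy sites
m = diagnostic_metrics(tp=28, fp=3, tn=27, fn=2)
print(round(m["sensitivity"], 3))  # -> 0.933
```

Sweeping the decision threshold and recomputing these counts at each setting traces out the ROC curve described above.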

2.3 Machine learning architecture

The algorithm was developed using a standard CNN architecture [20] based on eight convolutional layers. For both classification tasks, a batch size of 50 was used for training. Results were obtained using Adam optimization [34] with a step size of 10⁻³ and coefficients β₁ = 0.9, β₂ = 0.999, and ɛ = 10⁻⁸. The architecture of the CNN is visually represented in Fig. 1. Each depth-resolved angular scattering scan, of size 10 × 69 (depth × angle), is fed into the network as input. Layers one and two use 3 × 3 convolutional kernels with 128 features, a rectified linear unit (ReLU) activation function, and a stride of two for down-sampling. Layers three through eight each use a 1 × 3 convolutional kernel with 128 features and a ReLU activation function. Layer nine is a densely connected layer with 512 neurons and ReLU activation, and layer ten is a densely connected readout layer. Training and data analysis were performed on a desktop computer with an i7-8700 CPU at 3.2 GHz, 32 GB of RAM, and a GeForce RTX 2070 Super GPU.
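
The layer description above maps onto a short Keras model. This is a sketch consistent with the stated architecture, not the authors' code: the `same` padding, single input channel, and logit readout are assumptions the text does not specify.

```python
import tensorflow as tf

def build_model(n_classes=3):
    """CNN sketch: 8 conv layers, 1 dense hidden layer, 1 readout layer."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(10, 69, 1)),  # depth x angle scan
        # layers 1-2: 3x3 kernels, 128 features, stride 2 for down-sampling
        tf.keras.layers.Conv2D(128, (3, 3), strides=2, padding="same",
                               activation="relu"),
        tf.keras.layers.Conv2D(128, (3, 3), strides=2, padding="same",
                               activation="relu"),
    ])
    # layers 3-8: 1x3 kernels, 128 features
    for _ in range(6):
        model.add(tf.keras.layers.Conv2D(128, (1, 3), padding="same",
                                         activation="relu"))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(512, activation="relu"))  # layer 9
    model.add(tf.keras.layers.Dense(n_classes))               # layer 10: readout
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-3, beta_1=0.9, beta_2=0.999,
                                           epsilon=1e-8),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )
    return model

model = build_model()
print(model.output_shape)  # -> (None, 3)
```

With stride-2 `same` convolutions, the 10 × 69 input shrinks to 5 × 35 and then 3 × 18 feature maps before the dense layers.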

2.4 Data processing and statistical analysis

k-fold cross-validation (k=32) was first used to assess the predictive power of the CNN. Dataset A consists of 6660 depth-resolved angular scattering scans, randomly partitioned into 32 subsets; 31 subsets were used as the training dataset to create a learned network model, and the remaining subset was used as a testing set to measure the network's performance. This analysis was repeated until all 32 subsets had been used once as a testing set. The average performance of the model was reported over ten rounds of cross-validation with newly randomized partitions to minimize variability. The classification results were then dichotomized using the two approaches from the original study: a histologic dysplastic (LSIL/HSIL) versus non-dysplastic dichotomy based on morphological distinction, and a clinical HSIL versus benign/LSIL dichotomy based on clinical treatment paths. Finally, majority voting over all scans from each biopsy site was used to create an overall prediction for the given biopsy site, from which sensitivity, specificity, PPV, and NPV were determined. The workflow of this training approach is shown in Fig. 2(a).
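
The site-level majority vote is simple to state in code. A minimal sketch with illustrative labels and groupings (not study data):

```python
from collections import Counter

def site_prediction(scan_labels):
    """Majority vote over the per-scan predictions for one biopsy site.

    scan_labels: list of class labels, one per scan at the site.
    Returns the most common label.
    """
    return Counter(scan_labels).most_common(1)[0][0]

# Five scans from one hypothetical site, three voting HSIL
print(site_prediction(["HSIL", "LSIL", "HSIL", "HSIL", "benign"]))  # -> HSIL
```

`Counter.most_common` breaks exact ties by insertion order; a clinical implementation would need an explicit tie-breaking rule (e.g., defaulting to the more severe class).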


Fig. 2. Workflow of the two CNN training approaches for the automated classification of cervical dysplasia using 8260 clinical a/LCI scans. (a) Dataset A used as both the training and test set. (b) Dataset A used as the training set and dataset B as the test set.


The scans in dataset A were cropped and interpolated to match the angular range of dataset B. Dataset A was used exclusively as training data to create a learned network model for identifying dysplasia across different a/LCI instruments. Network performance was then evaluated using dataset B to assess the generalizability of the network. The classification results were dichotomized using the same two approaches as above. Thresholding values for each prediction were used to determine the sensitivity, specificity, PPV, and NPV of the network in identifying dysplasia (HSIL/LSIL) or HSIL for each biopsy site. The workflow of this training approach is shown in Fig. 2(b).

Testing and training of the CNN were performed using TensorFlow 2 (Google Inc, Mountain View, CA). Statistical analyses were performed using MATLAB R2019A (MathWorks, Inc., Natick, MA).

3. Results

Figure 3 summarizes the average CNN classification performance for distinguishing individual benign, LSIL, and HSIL angular scattering scans (evaluated using ten rounds of 32-fold cross-validation). The average training time for each fold was 96.2 ± 1.2 seconds over 100 epochs. Figure 3(a) shows the classification accuracies for each stage of cervical dysplasia. The histopathological results were set as ground truth and are represented in the left column. The prediction score for each identity is listed along the rows of the confusion matrix. The colormap of the confusion matrix is shown on the right, where darker matrix elements represent higher classification accuracies. The diagonal arrangement of the higher classification scores highlights the correct classifications. Figure 3(b) presents the classification results graphically in stacked bar plots, where the three classes, benign, LSIL, and HSIL, are represented as blue, orange, and yellow bars, respectively. The CNN shows high classification accuracy in distinguishing the three classes using single biopsy scans (91.96%, 89.5%, and 86.9%, respectively, for classifying benign, LSIL, and HSIL), with an overall scan-level accuracy above 90%.


Fig. 3. Classification performance of the CNN-based approach for grading cervical dysplasia using 8260 clinical a/LCI scans. Performance evaluated using k-fold cross-validation (k=32) is summarized in the confusion matrix in (a) and the stacked bar plot in (b). To minimize variability, the reported performance is the average over ten rounds of cross-validation with newly randomized partitions. The standard deviation of each class is shown in the confusion matrix.


The obtained predictions were then dichotomized based on histological classification and clinical classification. The ability of the algorithm to identify dysplasia (LSIL/HSIL) and HSIL under the two approaches is illustrated by the sensitivities and specificities shown in Table 1. Overall, the network offered high sensitivity and specificity for detecting dysplasia (94.8% and 92.0%, respectively) and HSIL (86.9% and 96.5%, respectively). Based on the histological classification, the high sensitivity and NPV (94.8% and 94.4%, respectively) indicate that false-negative outcomes, in which dysplasia or HSIL is incorrectly classified as benign, are rare. For the clinical classification, although high specificity and NPV (both above 95%) were retained, lower sensitivity and PPV (86.9% and 85.7%, respectively) were observed compared with the dysplastic vs. non-dysplastic dichotomization.


Table 1. Performance of the convolutional neural network for identifying dysplasia (LSIL/HSIL) or HSIL.

An overall prediction for each biopsy site was generated by performing majority voting of all the scan predictions at each given biopsy site to compare the machine learning algorithm's performance with the original Mie theory-based ILSA algorithm [11]. Here we first assess the ability of our machine learning approach to identify dysplastic (LSIL/HSIL) biopsy sites (Table 2). The network predictions based on majority voting offer high sensitivity, specificity, PPV, and NPV (100%, 94%, 94%, and 100% respectively), which is comparable to that using Mie-theory (100%, 97%, 97%, and 100% respectively). For distinguishing HSIL from benign/LSIL biopsy sites, the network results based on majority voting produced a sensitivity of 85%, a specificity of 90%, a PPV of 69%, and an NPV of 96%.


Table 2. Performance comparison between the CNN and Mie theory-based ILSA for identifying dysplasia (LSIL/HSIL) or HSIL using the point-probe a/LCI data as the test set

Data processing times were calculated by analyzing 1000 depth-resolved angular scattering profiles with each algorithm, and the average time per profile and per biopsy site was computed accordingly. Mie theory-based ILSA required an average of 235 ms to process each profile and approximately 23 s per biopsy site (based on averaging 100 scans from the same patient). The machine learning algorithm realized a 100-fold increase in processing speed, achieving an average processing time of 2.23 ms per profile and 0.24 s per biopsy site (based on majority voting of 100 scans from the same patient).

The approach was then evaluated by training and testing on datasets collected from two different instruments to mirror the design of the prospective study [12]. Dataset B, obtained from the scanning a/LCI instrument [12], was classified using the network pre-trained on dataset A. ROC curves were determined for each of the two dichotomizations, shown in Fig. 4. For dysplastic versus non-dysplastic, the ROC analysis resulted in an area under the curve (AUC) of 0.932 (Fig. 4(a)), compared to an AUC of 0.884 using Mie theory-based ILSA. The optimal threshold produced a sensitivity and specificity of 90.3% and 85.7%, respectively, compared to 90.3% and 81.6% for the Mie theory ILSA algorithm. For distinguishing HSIL versus benign/LSIL biopsy sites, the machine learning ROC analysis produced an AUC of 0.853 (Fig. 4(b)), a sensitivity of 91.7%, a specificity of 77.9%, a PPV of 44.3%, and an NPV of 98.2%. This was compared against the Mie theory ILSA algorithm, which yielded an AUC of 0.846, a sensitivity of 100%, a specificity of 70.6%, a PPV of 37.5%, and an NPV of 100%. The full analysis for each dichotomization is presented in Table 3.
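
As a reminder of what the AUC figures above summarize, the area under an ROC curve equals the probability that a randomly chosen positive site scores higher than a randomly chosen negative one (the Mann-Whitney identity), so it can be computed without sweeping thresholds. The sketch below uses synthetic scores and labels, not study data:

```python
import numpy as np

def roc_auc(labels, scores):
    """ROC AUC via the rank-sum identity: fraction of (positive, negative)
    score pairs ranked correctly, with ties counted as half."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# Perfectly separated synthetic scores give AUC = 1.0
scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(roc_auc(labels, scores))  # -> 1.0
```

This pairwise form is O(n²) but exact; threshold-sweeping implementations produce the same value for the same data.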


Fig. 4. ROC curves for both the machine learning approach and the Mie theory-based ILSA algorithm for biopsies dichotomized as (a) dysplastic versus non-dysplastic and (b) HSIL versus benign/LSIL.



Table 3. Performance comparison between the CNN and Mie theory-based ILSA for identifying dysplasia (LSIL/HSIL) or HSIL using the scanning a/LCI data as the test set

4. Discussion

Using a CNN to classify the severity of dysplasia in the cervix based on a/LCI scattering data was successful, with comparable performance and superior speed and generalizability compared to ILSA. Overall accuracy was high when considering only data from the point-probe study, with 97% accuracy attained by the CNN in the histological (dysplastic vs. non-dysplastic) case compared to 98% from ILSA, and 89% accuracy from the CNN in the treatment-based dichotomy (benign/LSIL vs. HSIL) compared to 86% from ILSA. Although excellent sensitivity and negative predictive values were realized in the treatment-based approach using ILSA analysis, these were not attained using the CNN, possibly because the original linear discriminant analysis (LDA)-based classification line was weighted to favor higher sensitivity to maximize clinical utility, as discussed further below. However, performance in all other aspects, including specificity and PPV independent of the classification scheme, and sensitivity and NPV in the histology-based approach, improved compared to ILSA. Future work will involve adjusting initial parameters to maximize clinical utility by weighting the classifier towards higher sensitivity.

The clinical advantage of using the CNN to classify dysplastic tissue from raw a/LCI scans, rather than ILSA, is most apparent from the increased generalizability. The instrument-specific low-pass filtering and the recalibration of tissue start depth required between patients add substantial processing complexity and time that hinder the clinical utility of a/LCI, especially if it were to be employed at scale. The CNN, when trained on the point-probe a/LCI data and tested on the scanning a/LCI data, performed very similarly to the ILSA-based prospective study using scanning a/LCI, with the CNN achieving an overall accuracy of 87.5% in the histological case vs. 85% from ILSA, and an accuracy of 80% in the treatment-based case compared to 75% from ILSA. The LDA-based classification line used in the prospective ILSA-based study was also weighted to maximize sensitivity, producing variations in sensitivity, specificity, PPV, and NPV similar to those observed in the point-probe a/LCI data when classified using the CNN. The overall lower accuracy with the scanning a/LCI instrument compared to the point-probe a/LCI instrument was mainly because it was a prospective study using a previously established decision line. However, there were also differences between the study cohorts: the scanning a/LCI was used in a patient population with a known diagnosis of dysplasia or other cervical disease. The greater prevalence of dysplasia in the second study can produce skewed measurements, where a lesion in one region of the cervix can influence the overall health of the cervix due to inflammation or the field effect of carcinogenesis [12,35]. The reduced performance with the scanning a/LCI instrument, seen in both the CNN classification and ILSA, lends some credibility to this idea; the agreement between the CNN results and ILSA implies that the slightly weaker performance is less likely due to flaws in instrumentation. Indeed, a/LCI has produced evidence of the field effect previously [36]. However, the comparable level of accuracy seen with the CNN, which avoids the need to individually calibrate scans between patients or instruments as required for ILSA-based analysis, points to a much broader universal application of a/LCI; future studies investigating dysplasia using a/LCI will likely use deep learning in conjunction with, or even in place of, ILSA for classification.

Processing speed was greatly improved using the CNN-based method, based solely on the raw computational time required to perform classification. Traversing the Mie library database and iteratively comparing each theoretical profile to a given scan is computationally intensive, and the cost is amplified by the number of repeated scans generally needed to obtain averaged nuclear morphology information at a given biopsy site. Previous methods, including a hybrid algorithm [37], were introduced to reduce the database search range and decrease computational time; with that approach, we were able to achieve a processing speed of 73.2 milliseconds per profile, a threefold improvement over ILSA with a slight trade-off in accuracy [11]. Here, for a fair comparison of the CNN method and ILSA, both approaches were run on the same computer to obtain an average processing time for each method. Overall, the CNN requires 2.23 milliseconds per profile to produce a classification result for each scan vs. 235.4 milliseconds using ILSA, representing an over 100-fold improvement in processing time while achieving comparable accuracy. Rapid classification of tissue, in conjunction with the obviation of a calibration step, presents the possibility of real-time cervical dysplasia diagnosis using a/LCI.

Although our results already show an advance for cervical dysplasia detection using depth-resolved angular scattering profiles, some limitations remain that must be addressed before clinical implementation. One limiting factor that would influence the clinical utility of this method is the presence of unanalyzable scans due to low signal quality during acquisition. Mie theory-based algorithms overcome this by setting thresholds during the χ2 analysis to exclude scans that are nonunique or that have higher χ2 error than a “null-solution” scan [16]. However, such scans cannot be excluded in the same way using a CNN, and mislabeling an uninformative scan as benign or diseased would result in misclassified predictions. Therefore, the current study used only scans that had already cleared the threshold. Another parameter that influences diagnostic accuracy is the angular range. For this analysis, we used an angular range of 20.8 degrees for each depth-resolved angular scattering profile. This was the physical limitation of the scanning a/LCI system, which was reduced during design to increase the scan range and, in turn, the surveillance coverage of at-risk tissue. Future work will optimize parameters such as angular range and sampling to improve the network's classification performance [38]. Also, in this study we used raw depth-resolved angular scattering scans as the CNN input. Recent work suggests that CNN diagnosis using extracted physical parameters provides better accuracy and robustness [21,33]. Future work will therefore also explore combining the CNN with ILSA to provide a more accurate and robust diagnostic method. Finally, a/LCI-based diagnostics at other tissue sites, such as detection of esophageal or colorectal dysplasia, will benefit from deep learning-based processing. A processing method that generalizes to any a/LCI instrument and across tissue sites would significantly improve the clinical utility of a/LCI in detecting dysplasia.

Overall, this work demonstrates the first use of deep learning to identify the disease state of tissue from raw a/LCI light scattering data, and shows that its performance is comparable to that of the Mie theory gold standard of inverse light scattering analysis while being faster and more generalizable across patients and a/LCI instruments.

Funding

National Institutes of Health (R01 CA167421, R01 CA210544); National Science Foundation (2009841).

Acknowledgments

The authors would like to thank Dr. Roarke W. Horstmeyer for their technical advice.

Disclosures

Adam Wax is the founder and president of Lumedica Inc.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. RL Siegel, KD Miller, and A. Jemal, “Cancer statistics, 2019,” CA: A Cancer J. Clin. 69(1), 7–34 (2019). [CrossRef]  

2. R Nayar and D. C. Wilbur, “The Bethesda system for reporting cervical cytology. 2019.

3. L. G. Koss, “The Papanicolaou test for cervical cancer detection: a triumph and a tragedy,” JAMA 261(5), 737–743 (1989). [CrossRef]  

4. H. Weingandt, H. Stepp, R. Baumgartner, J. Diebold, W. Xiang, and P. Hillemanns, “Autofluorescence spectroscopy for the diagnosis of cervical intraepithelial neoplasia,” BJOG: An Int. J. Obstetrics & Gynaecol. 109(8), 947–951 (2002). [CrossRef]  

5. I Pavlova, K Sokolov, R Drezek, A Malpica, M. Follen, and R. Richards-Kortum, “Microanatomical and biochemical origins of normal and precancerous cervical autofluorescence using laser-scanning fluorescence confocal microscopy,” Photochem. and Photobiol. 77(5), 550–555 (2003). [CrossRef]  

6. P. Escobar, J. Belinson, A. White, N. Shakhova, F. Feldchtein, M. Kareta, and N. D. Gladkova, “Diagnostic efficacy of optical coherence tomography in the management of preinvasive and invasive cancer of uterine cervix and vulva,” Int. J. Gynecol. Cancer 14(3), 470–474 (2004). [CrossRef]  

7. S. K. Chang, Y. N. Mirabal, E. N. Atkinson, D. D. Cox, A. Malpica, M. Follen, and R. Richards-Kortum, “Combined reflectance and fluorescence spectroscopy for in vivo detection of cervical pre-cancer,” J. Biomed. Opt. 10(2), 024031 (2005). [CrossRef]  

8. P. R. Jess, D. D. Smith, M. Mazilu, K. Dholakia, A. C. Riches, and C. S. Herrington, “Early detection of cervical neoplasia by Raman spectroscopy,” Int. J. Cancer 121(12), 2723–2728 (2007). [CrossRef]  

9. J. A. Freeberg, J. Benedet, L. West, E. Atkinson, C. MacAulay, and M. Follen, “The clinical effectiveness of fluorescence and reflectance spectroscopy for the in vivo diagnosis of cervical neoplasia: an analysis by phase of trial design,” Gynecologic Oncol. 107(1), S270–S280 (2007). [CrossRef]  

10. J. Tan, M. Quinn, J. Pyman, P. Delaney, and W. McLaren, “Detection of cervical intraepithelial neoplasia in vivo using confocal endomicroscopy,” BJOG: An Int. J. Obstetrics & Gynaecol. 116(12), 1663–1670 (2009). [CrossRef]  

11. D. Ho, T. K. Drake, K. K. Smith-McCune, T. M. Darragh, L. Y. Hwang, and A. Wax, “Feasibility of clinical detection of cervical dysplasia using angle-resolved low coherence interferometry measurements of depth-resolved nuclear morphology,” Int. J. Cancer 140(6), 1447–1556 (2017). [CrossRef]  .

12. W. Y. Kendall, D. Ho, K. Chu, M. Zinaman, D. Wieland, K. Moragne, and A. Wax, “Prospective detection of cervical dysplasia with scanning angle-resolved low coherence interferometry,” Biomed. Opt. Express 11(9), 5197–5211 (2020). [CrossRef]  

13. Y Zhu, N Terry, J Woosley, N Shaheen, and A. Wax, “Design and validation of an angle-resolved low-coherence interferometry fiber probe for in vivo clinical measurements of depth-resolved nuclear morphology,” J. Biomed. Opt. 16(1), 011003 (2011). [CrossRef]  

14. N. G. Terry, Y. Zhu, M. T. Rinehart, W. J. Brown, S. C. Gebhart, and S. Bright, “Detection of dysplasia in Barrett's esophagus with in vivo depth-resolved nuclear morphology measurements,” Gastroenterology 140(1), 42–50 (2011). [CrossRef]  

15. J. D. Keener, K. J. Chalut, J. W. Pyhtila, and A. Wax, “Application of Mie theory to determine the structure of spheroidal scatterers in biological materials,” Opt. Lett. 32(10), 1326–1328 (2007). [CrossRef]  

16. W. J. Brown, J. W. Pyhtila, N. G. Terry, K. J. Chalut, T. A. D’Amico, T. A. Sporn, and J. V. Obando, “Review and recent development of angle-resolved low-coherence interferometry for detection of precancerous cells in human esophageal epithelium,” IEEE J. Sel. Top. Quantum Electron. 14(1), 88–97 (2008). [CrossRef]  

17. K. J. Chalut, S. Chen, J. D. Finan, M. G. Giacomelli, F. Guilak, K. W. Leong, and A. Wax, “Label-free, high-throughput measurements of dynamic changes in cell nuclei using angle-resolved low coherence interferometry,” Biophys. J. 94(12), 4948–4956 (2008). [CrossRef]  

18. D. Ho, S. Kim, T. K. Drake, W. J. Eldridge, and A. Wax, “Wavelet transform fast inverse light scattering analysis for size determination of spherical scatterers,” Biomed. Opt. Express 5(10), 3292–3304 (2014). [CrossRef]  

19. C. Amoozegar, M. G. Giacomelli, J. D. Keener, K. J. Chalut, and A. Wax, “Experimental verification of T-matrix-based inverse light scattering analysis for assessing structure of spheroids as models of cell nuclei,” Appl. Opt. 48(10), D20–D25 (2009). [CrossRef]  

20. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems 25 (NIPS 2012).

21. H. S. Park, M. T. Rinehart, K. A. Walzer, J.-T. A. Chi, and A. Wax, “Automated detection of P. falciparum using machine learning algorithms with quantitative phase images of unstained cells,” PLoS One 11(9), e0163045 (2016). [CrossRef]  

22. C. L. Chen, A. Mahjoubfar, L.-C. Tai, I. K. Blaby, A. Huang, K. R. Niazi, and B. Jalali, “Deep learning in label-free cell classification,” Sci. Rep. 6(1), 21471 (2016). [CrossRef]  

23. Y. Ozaki, H. Yamada, H. Kikuchi, A. Hirotsu, T. Murakami, and T. Matsumoto, “Label-free classification of cells based on supervised machine learning of subcellular structures,” PLoS One 14(1), e0211347 (2019). [CrossRef]  

24. R. Rasti, A. Mehridehnavi, H. Rabbani, and F. Hajizadeh, “Automatic diagnosis of abnormal macula in retinal optical coherence tomography images using wavelet-based convolutional neural network features and random forests classifier,” J. Biomed. Opt. 23(3), 035005 (2018). [CrossRef]  

25. M. A. Hussain, A. Bhuiyan, C. D. Luu, R. Theodore Smith, R. H. Guymer, H. Ishikawa, J. S. Schuman, and K. Ramamohanarao, “Classification of healthy and diseased retina using SD-OCT imaging and Random Forest algorithm,” PLoS One 13(6), e0198281 (2018). [CrossRef]  

26. F. Li, H. Chen, Z. Liu, X.-D. Zhang, M.-S. Jiang, Z.-Z. Wu, and K.-Q. Zhou, “Deep learning-based automated detection of retinal diseases using optical coherence tomography images,” Biomed. Opt. Express 10(12), 6204–6226 (2019). [CrossRef]  

27. S. P. K. Karri, D. Chakraborty, and J. Chatterjee, “Transfer learning based classification of optical coherence tomography images with diabetic macular edema and dry age-related macular degeneration,” Biomed. Opt. Express 8(2), 579–592 (2017). [CrossRef]  

28. V. Gulshan, L. Peng, M. Coram, M. C. Stumpe, D. Wu, and A. Narayanaswamy, “Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs,” JAMA 316(22), 2402–2410 (2016). [CrossRef]  

29. M. J. Van Grinsven, B. van Ginneken, C. B. Hoyng, T. Theelen, and C. I. Sánchez, “Fast convolutional neural network training using selective data sampling: Application to hemorrhage detection in color fundus images,” IEEE Trans. Med. Imaging 35(5), 1273–1284 (2016). [CrossRef]  

30. D. Cunefare, A. L. Huckenpahler, E. J. Patterson, A. Dubra, J. Carroll, and S. Farsiu, “RAC-CNN: multimodal deep learning based automatic detection and classification of rod and cone photoreceptors in adaptive optics scanning light ophthalmoscope images,” Biomed. Opt. Express 10(8), 3815 (2019). [CrossRef]  

31. M. J. Huttunen, R. Hristu, A. Dumitru, I. Floroiu, M. Costache, and S. G. Stanciu, “Multiphoton microscopy of the dermoepidermal junction and automated identification of dysplastic tissues with deep learning,” Biomed. Opt. Express 11(1), 186–199 (2020). [CrossRef]  

32. X. Zhang and S.-G. Zhao, “Fluorescence microscopy image classification of 2D HeLa cells based on the CapsNet neural network,” Med. Biol. Eng. Comput. 57(6), 1187–1198 (2019). [CrossRef]  

33. L. Zheng, K. Yu, S. Cai, Y. Wang, B. Zeng, and M. Xu, “Lung cancer diagnosis with quantitative DIC microscopy and a deep convolutional neural network,” Biomed. Opt. Express 10(5), 2446–2456 (2019). [CrossRef]  

34. D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” arXiv preprint arXiv:1412.6980 (2014).

35. D. Damania, H. K. Roy, D. Kunte, J. A. Hurteau, H. Subramanian, and L. Cherkezyan, “Insights into the field carcinogenesis of ovarian cancer based on the nanocytology of endocervical and endometrial epithelial cells,” Int. J. Cancer 133(5), 1143–1152 (2013). [CrossRef]  

36. F. E. Robles, Y. Zhu, J. Lee, S. Sharma, and A. Wax, “Detection of early colorectal cancer development in the azoxymethane rat carcinogenesis model with Fourier domain low coherence interferometry,” Biomed. Opt. Express 1(2), 736–745 (2010). [CrossRef]  

37. D. Ho, T. K. Drake, R. C. Bentley, F. A. Valea, and A. Wax, “Evaluation of hybrid algorithm for analysis of scattered light using ex vivo nuclear morphology measurements of cervical epithelium,” Biomed. Opt. Express 6(8), 2755 (2015). [CrossRef]  

38. H. Zhang, Z. A. Steelman, D. S. Ho, K. K. Chu, and A. Wax, “Angular range, sampling and noise considerations for inverse light scattering analysis of nuclear morphology,” J. Biophotonics 12(2), e201800258 (2019). [CrossRef]  

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.



Figures (4)

Fig. 1. Visual representation of the convolutional neural network architecture. Raw depth-resolved angular scattering profiles are input to the network, with layer stacks of generated feature maps annotated by the number of features. Training takes an average of 96.2 ± 1.2 s.
Fig. 2. Workflow of the two CNN training approaches for the automated classification of cervical dysplasia using 8260 clinical a/LCI scans. (a) Dataset A used as both the training and test set. (b) Dataset A used as the training set and dataset B as the test set.
Fig. 3. Classification performance of the CNN-based approach for grading cervical dysplasia using 8260 clinical a/LCI scans. Performance evaluated using k-fold cross-validation (k = 32) is summarized in the confusion matrix in (a) and the stacked bar plot in (b). To minimize variability, the reported performance is the average over ten rounds of cross-validation, with subsets randomly repartitioned in each round. The standard deviation of each class is shown in the confusion matrix.
Fig. 4. ROC curves for both the machine learning approach and the Mie theory-based ILSA algorithm for biopsies dichotomized as (a) dysplastic versus non-dysplastic and (b) HSIL versus benign/LSIL.

Tables (3)

Table 1. Performance of the convolutional neural network for identifying dysplasia (LSIL/HSIL) or HSIL.

Table 2. Performance comparison between the CNN and Mie theory-based ILSA for identifying dysplasia (LSIL/HSIL) or HSIL using the point-probe a/LCI data as the test set

Table 3. Performance comparison between the CNN and Mie theory-based ILSA for identifying dysplasia (LSIL/HSIL) or HSIL using the scanning a/LCI data as the test set
