Optica Publishing Group

Nonlinear microscopy and deep learning classification for mammary gland microenvironment studies

Open Access

Abstract

Tumors, their microenvironment, and the mechanisms by which collagen morphology changes throughout cancer progression have recently been topics of interest. Second harmonic generation (SHG) and polarization-resolved SHG (P-SHG) microscopy are label-free, hallmark methods that can highlight this alteration of the extracellular matrix (ECM). This article uses automated sample-scanning SHG and P-SHG microscopy to investigate ECM deposition associated with tumors residing in the mammary gland. We show two different analysis approaches using the acquired images to distinguish collagen fibrillar orientation changes in the ECM. Lastly, we apply a supervised deep-learning model to classify naïve and tumor-bearing mammary gland SHG images. We benchmark the trained model against transfer learning with the well-known MobileNetV2 architecture. By fine-tuning the different parameters of these models, we obtain a trained deep-learning model suited to such a small dataset, with 73% accuracy.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Over the last decade, we have improved our understanding of the microenvironment in which a tumor grows, composed of co-mingling tumor cells, immune cells, stromal cells, and the extracellular matrix (ECM) [1]. Various studies have correlated the arrangement of collagen in the microenvironment surrounding the tumor with patient survival [2–8]. Collagen organization at the tumor-stroma boundary is an essential indicator of breast cancer disease progression and the subsequent risk of local invasion and metastasis. Studying these so-called tumor-associated collagen signatures (TACS) can help determine the invasiveness of a breast tumor [2,9]. TACS classification sorts heterogeneous tumor-associated collagen patterns into three physically distinct types: TACS-1, densely packed collagen close to the tumor boundary; TACS-2, a sphere-like collagen organization around TACS-1; and TACS-3, linear collagen pointing away from the tumor boundary [2]. Studying the underlying mechanism of the formation of these TACS (especially TACS-3) can give valuable information about the pro-metastatic features of the tumor, as locally invasive tumor cells have been shown to use radially aligned collagen fibers as migration tracks to leave the primary site [10–12]. Moreover, it has been demonstrated that collagen fiber width, length, alignment, and angle provide cues for distinguishing malignant from benign tumors and for patient survival [9].

Many methods are available for studying the ECM around a tumor, of which histological staining is the most common [9]. Many stains are available, but the resolution of these stains, and the inability to quantify collagen features from them, have been limiting factors for ECM studies in cancer [9]. Another method used for ECM studies is liquid crystal polarizing microscopy [13]. It is fast and inexpensive and can be added to a microscope with a few optical components, but its signal processing and structure discrimination are challenging [9].

To image collagen in tissues, SHG microscopy is the gold standard among imaging methods, offering improved spatial resolution, limited phototoxicity and photobleaching, focal-plane selectivity, and simple sample preparation [14]. This label-free, non-invasive method provides a way to detect alterations of fibrillar collagen in the tumor microenvironment, which is impossible using other imaging techniques. SHG has played an essential role in cancer studies and has been successfully applied to assess collagen restructuring in breast [4,15,16], ovarian [17], prostate [18], and lung cancers [19]. All these studies have documented collagen morphological changes around the tumor using SHG microscopy. Nevertheless, orientation studies based only on SHG intensity can be subject to interference that masks the underlying structure [20] and makes fibril orientation imaging impossible [21].

Polarization-resolved SHG microscopy (P-SHG) overcomes such limitations and combines the advantages of SHG microscopy with polarimetry [21–24]. It is used in collagen-related studies and provides accurate information about the structure of the fibrils in the imaging plane, which is highly advantageous for cancer research [25]. More advanced P-SHG microscopy systems exist, such as polarization-in, polarization-out (PIPO), that can also extract the asymmetry of fibril distribution [26] and have been successfully applied in lung [19] and breast cancer studies [27].

SHG and P-SHG image analyses have also developed over the past few years due to increasing amounts of information that can be extracted from acquired images. Collagen fiber alignment, width, length, texture, density, and TACS are all exciting metrics that can be identified using post-image-processing methods [9]. Image analysis and processing usually rely on human visual inspection for data validation. Deep learning can eliminate manual data inspection and automate image analysis, such as image classification [28–31]. Deep learning and machine learning in SHG microscopy have been applied to lymphedema [28], ovarian tissue [29,32], and breast cancer [30], to name a few. Care must be taken using these methods, as deep learning for smaller datasets can be challenging and therefore requires measures for the trained model to be accurate [33].

In this study, we imaged naïve and tumor-bearing murine mammary glands using an automated SHG and P-SHG microscopy system. From the SHG images, we identified collagen aggregations around the tumor boundary and a dim SHG signal due to the tumor's takeover of a significant portion of the mammary gland. Afterward, we applied our custom-written Python program to analyze P-SHG images and CurveAlign to analyze SHG images to measure fibrillar orientations. CurveAlign is an effective technique for quantifying collagen fibers and can quickly extract orientation data from SHG images [34]. Nonetheless, it requires human inspection and can be prone to missing the finer fibers in images. In contrast, our automated P-SHG image analysis can resolve and detail finer collagen fibers at the cost of requiring increased imaging acquisition time. Following this, we trained a supervised deep-learning model for the SHG images to evaluate whether we could classify naïve and tumor-bearing mammary glands using a small dataset. In this process, different data splits were assessed, and other parameters of the trained model were also fine-tuned for each case to find the best possible deep-learning model for our data. A comparison was made with the well-known image classification model MobileNetV2 [35].

2. Methodology

2.1 Tissue preparation

Female BALB/c mice were purchased from Charles River Laboratories. All animal experiments were conducted according to the regulations established by the Canadian Council on Animal Care, under protocols approved by the McGill University Animal Care and Use Committee. For the collection of naïve glands, mice were euthanized at approximately 8 weeks of age, and the 4th (inguinal) mammary gland was removed. Whole mount preparations were made using blunt tweezers to manipulate the mammary glands, spreading the tissue flat against a Superfrost microscope slide (VWR). Mounted mammary glands were then immediately placed in Carnoy's fixative (60% ethanol, 30% chloroform, 10% acetic acid) for 24 hours at 4 °C, after which they were stored in 70% ethanol.

The murine tumor-bearing samples used in this study were derived from two orthotopic models: (1) injection of 4T1 cells into nulliparous mice and (2) injection of 66cl4 cells into mice in the postpartum period (an aggressive form of breast cancer). The 4T1 cells were provided by Dr. Peter Siegel's group (McGill University) and were cultured in DMEM (Wisent) supplemented with 10% FBS and antibiotics. The 66cl4 cells were provided by Dr. Josie Ursini-Siegel's group (McGill University) and cultured in RPMI (Wisent) supplemented with 10% FBS and antibiotics. Cells were maintained at a low passage number before use. For both models, 1 × 10^5 cells were injected into the 4th mammary fat pad, and tumors were allowed to grow for two weeks. At 14 days post-injection, mice were euthanized, and primary tumors and surrounding stroma were removed. Samples were fixed in 10% Neutral Buffered Formalin (VWR) for 48 hours at 4 °C, after which they were stored in 70% ethanol.

Following fixation, naïve and tumor-bearing mammary glands were embedded in paraffin and serially sectioned (5 µm thickness). Slides were deparaffinized and rehydrated by submersion in three rounds of xylene, two rounds of 100% ethanol, one round of 95% ethanol, and one round of 70% ethanol (5 minutes per round). Rehydrated slides were rinsed for 5 minutes in distilled water. Coverslips (VWR, No. 1) were then mounted onto slides using Permount mounting medium (Fisher). Slides were allowed to dry overnight before downstream microscopy.

2.2 SHG imaging setup

SHG microscopy and P-SHG microscopy were performed using a custom laser stage-scanning inverted microscope (for more details, see Fig. 1). A mode-locked Ti:Sa laser (Tsunami, Spectra-Physics) pumped by a 12 W Millennia Pro laser (Spectra-Physics) was used. This laser emits pulses around 810 nm, delivering a 150 fs pulse duration at an 80 MHz repetition rate with an average power of 2.5 W. For power control, a half-wave plate and a Glan-Thompson polarizer were used to adjust the average power to 50 mW (0.625 nJ pulse energy) at the focus of the objective. Given the size of the samples to image, sample scanning using a high-speed motorized XY scanning stage (MLS203, Thorlabs, Newton, NJ) was used. The focus was adjusted coarsely and finely with mechanical and piezoelectric motors (PI Nano-Z, USA). An air objective (UPlanSApo 20X, NA 0.75, Olympus, Japan) was used for illumination. A condenser collected the sample's SHG emission, which was detected by a photomultiplier tube (R6357, Hamamatsu Photonics) set at 800 V. The SHG signal was isolated by two spectral filters placed before the photomultiplier: a short-pass filter that blocks wavelengths above 720 nm (i.e., the input laser light), used with a bandpass filter centered at 405 nm to filter out any residual input light. A multichannel I/O board (National Instruments) and a custom-written Python program were used for signal acquisition and synchronization. Given the sample size and the acceleration and deceleration times of the motorized scanning stage, each SHG image had an acquisition time of a few minutes. Raw data were visualized using Fiji-ImageJ (NIH, USA). For P-SHG measurements, 1000 × 1000 µm regions of interest were imaged, and for image classification, 9000 × 5000 µm whole-sample images were taken.


Fig. 1. Layout of the SHG and P-SHG inverted microscope setup. When using only SHG, the motorized half-wave plate is removed. For P-SHG, the angles range from 0 to 170 degrees. The microscope and the motorized half-wave plate operate under a unified custom Python program for P-SHG measurements.


For P-SHG, a motorized half-wave plate was used to rotate the linear polarization of the laser beam while acquiring the images. To avoid any polarization distortion, and due to the size of the sample, we used sample scanning instead of laser scanning. Images were taken for 18 polarization states in 10-degree steps from 0 to 170 degrees. The motorized half-wave plate and the sample scanning were synchronized with a home-built Python program (for a complete description of the program, see [36]). A custom MATLAB script based on [37,38] was used for processing the P-SHG images. The Fourier transform of the measured intensity with respect to the polarization angle is computed with a fast Fourier transform algorithm. For more theoretical information about the script and how to obtain the fibril orientation from the SHG intensity, please refer to [37,38]. To summarize, the SHG intensity of collagen fibers with respect to the linear polarization angle Ω of the input light source can be written as [37]:

$$I_{SHG}(\Omega) = K\left[ A\cos(4\Omega - 4\theta) + B\cos(2\Omega - 2\theta) + 1 \right]$$
where A and B are associated with the susceptibility components, K is the mean number of photons detected, and θ is the collagen fiber in-plane orientation. By varying the angle Ω (i.e., using the half-wave plate to change the linear polarization of the input laser), the main direction of the fibrils (θ) can be extracted [18,19]. After a reliability test between the associated susceptibility components and the experimental data, the P-SHG data can be extracted [37]. This modified MATLAB script integrates with our imaging pipeline and accepts 32-bit images [24].
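As an illustration, the per-pixel fit above can be sketched in a few lines of NumPy. This is a minimal sketch, not the actual MATLAB script of [37,38]: with 10-degree steps over 0-170 degrees, the cos(2Ω − 4θ/2) term falls in FFT bin 1 and the cos(4Ω − 4θ) term in bin 2, so θ follows from the phase of bin 1 (the function name and bin assignment are our illustrative assumptions):

```python
import numpy as np

def fit_pshg_pixel(intensities):
    """Extract K, A, B and the in-plane orientation theta (radians) from
    18 SHG intensities measured at polarization angles 0..170 degrees.
    The cos(2*Omega - 2*theta) term lands in FFT bin 1 and the
    cos(4*Omega - 4*theta) term in bin 2 for 10-degree steps."""
    I = np.asarray(intensities, dtype=float)
    n = I.size                        # 18 polarization states
    F = np.fft.fft(I)
    K = F[0].real / n                 # DC term -> mean intensity K
    B = 2 * np.abs(F[1]) / F[0].real  # amplitude of the cos(2...) term
    A = 2 * np.abs(F[2]) / F[0].real  # amplitude of the cos(4...) term
    theta = -np.angle(F[1]) / 2       # fiber in-plane orientation
    return K, A, B, theta % np.pi
```

Applying this pixel by pixel yields the orientation map; in practice a goodness-of-fit test (Section 3.2) decides which pixels are kept.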

In this study, we benchmark our method against CurveAlign, a well-known tool for fibrillar collagen quantification at the tumor boundary [34]. CurveAlign is a curvelet-transform-based fibrillar collagen quantification platform. It consists of a few steps: first, a two-dimensional fast discrete curvelet transform is performed. Second, based on the scale of interest and the thresholding of the remaining coefficients, the center and spatial orientation of each curvelet are found; by grouping adjacent curvelets, the local fiber orientations are estimated [39]. A simplified diagram of the P-SHG and CurveAlign analyses is depicted in Fig. 2.
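As a rough illustration of the second step (estimating a dominant local fiber direction), the sketch below uses an image structure tensor, a simpler stand-in for CurveAlign's actual curvelet transform; the function name and the whole-image averaging are our illustrative simplifications:

```python
import numpy as np

def dominant_fiber_orientation(img):
    """Estimate the dominant fiber orientation (radians, in [0, pi)) of a
    fibrillar texture using a structure tensor -- NOT CurveAlign's curvelet
    transform, just a simpler stand-in for the same idea."""
    Iy, Ix = np.gradient(img.astype(float))
    # Average the tensor components over the whole image for brevity;
    # a real implementation would smooth locally (e.g., a Gaussian window)
    # to obtain per-region orientations.
    Jxx, Jyy, Jxy = (Ix * Ix).mean(), (Iy * Iy).mean(), (Ix * Iy).mean()
    grad_angle = 0.5 * np.arctan2(2 * Jxy, Jxx - Jyy)  # dominant gradient direction
    return (grad_angle + np.pi / 2) % np.pi            # fibers run perpendicular to it
```

For example, for an image of vertical stripes (intensity varying along x), the estimated fiber direction is π/2, i.e., along the y axis.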


Fig. 2. Simplified P-SHG and CurveAlign analysis flowchart. For P-SHG analysis, 18 SHG images (32-bit TIFF) of the regions of interest (ROI) are taken in 10-degree steps from 0 to 170 degrees, and the results (color wheel, orientation map, anisotropy parameter map, and histogram data) are stored. For CurveAlign, a single SHG image is inputted to the CurveAlign script, and the results (overlay image alongside its histogram data) are stored.


2.3 Image classification using deep learning and transfer learning

The image classification was done using TensorFlow [40], an open-source Python library developed by Google. Moreover, transfer learning was performed using the MobileNetV2 architecture, one of the most commonly used architectures for image classification. Forty-six images, comprising 28 naïve and 18 tumor-bearing mammary gland samples, were used. Due to the small sample size, data augmentation involving flips, rotations, and zooms was performed before classification. The model was trained for twenty-five epochs, and the accuracy and loss on the training and test data were recorded for data splits of 10, 20, 30, and 40% between the training and test sets; the results were plotted to determine the overall performance of the model.

3. Results and discussions

3.1 SHG imaging

Given the sample size, a 9000 × 5000 µm area was selected to encompass most of the mammary gland and its surroundings. In this configuration, stage scanning was used for the imaging, and each image was acquired over 3 min with a step size of 10 µm/pixel and a 150 ms exposure time. Figures 3(a) and 3(b) show naïve, non-tumor-bearing mammary glands. As can be seen, the mammary gland and its surroundings have well-defined ductal structures.


Fig. 3. SHG images with normalized intensity calibration bars of (a, b) naïve and (c-f) tumor-bearing mammary glands. SHG microscopy can resolve the intricacies of the microenvironment. It shows that the tumor and the lymph nodes (LN) do not produce an SHG signal, leading to a loss of SHG signal as the tumor progresses throughout the gland. Moreover, the yellow arrows in (c, d) indicate the collagen barrier formed between the tumor and the rest of the mammary gland ECM. (e, f) are more advanced cases where the tumor has taken over the majority of the mammary gland, with little normal tissue structure remaining.


The mammary gland tumors in the bottom row do not generate SHG. In Fig. 3(c,d), the tumor edge is more pronounced, and aggregated collagen can be seen forming a barrier at the boundary between the tumor and the stroma, evident from the stronger SHG signal in the center of the figure (see yellow arrows). This finding agrees with previous studies suggesting that collagen deposition around a tumor can form a barrier (collagen fibers being parallel to the tumor boundary) that provides a protective layer to physically constrain the spread of the tumor [9,41]. The fibrillar orientation and angle of the formed barriers will be addressed in Section 3.2. In extreme cases such as Fig. 3(e,f), there is a very dim SHG signal, since the tumor has taken over most of the mammary gland. Although the SHG intensity reveals some aspects of tumor biology, extracting the orientation of the collagen fibrils and defining the potential risk of invasion require P-SHG acquisition and image post-processing.

3.2 P-SHG and CurveAlign analysis

Restructuring of collagen fibers at the tumor-stroma junction is known to help promote local invasion and metastasis; therefore, extraction of fibrillar orientation data is essential [42]. Two approaches can be used to extract the orientation of the collagen fibrils: (i) P-SHG followed by extraction of the angle data using a custom MATLAB script, or (ii) taking SHG images and processing them using CurveAlign. For P-SHG microscopy, a 1000 × 1000 µm area was chosen with a 3 µm/pixel step size and a 90 µs exposure time, leading to a 4 min acquisition time per image. For each P-SHG analysis, 18 images were taken, bringing the whole imaging process to 72 min per sample. The boundaries around the tumor were examined in three samples and then processed using the fast Fourier transform procedure described above.

To benchmark the capabilities of our P-SHG imaging and data analysis in studying the environment around the tumor boundary, CurveAlign software was used on SHG images taken from the same region of interest. Figure 4 provides a summary of the results obtained during this experiment:


Fig. 4. SHG, P-SHG, and CurveAlign analyses of five samples with normalized intensity calibration bars. Each row represents the same region of interest. P-SHG images are accompanied by a color wheel, with each angle (0-360 degrees) represented by a color, and by the fibrillar histogram. Both approaches provide an excellent distinction between the tumor and its surroundings, although in both cases there is some underfilling and overfilling of regions, shown using white dashed lines for P-SHG and red dashed lines for CurveAlign. For example, in the first row, comparing both approaches to the reference SHG image, the P-SHG method overfills in one region indicated by the white dashed line, and there are four areas in which CurveAlign has either underfilled or overfilled, indicated by the red dashed lines. Similar errors can be seen in the subsequent rows.


For our samples, P-SHG analysis resolves orientation details for smaller and finer collagen fibers than its CurveAlign counterpart. In CurveAlign, the estimation of the orientation angle leads to insufficient detection in regions where the SHG signal is dim and overestimation in other regions (red dashed lines in Fig. 4). Overall, P-SHG analysis is more detailed, albeit noisier, with some overfilling in different samples (white dashed lines in Fig. 4). Overfilling in the P-SHG analysis stems from the goodness-of-fit (R², ranging from 0 to 1) threshold defined during analysis: pixels whose fit falls below this threshold are omitted from the analysis. Keeping the threshold between 0.3 and 0.4 ensures that noisier pixels are not omitted, at the cost of overfilling the images in some areas. We also computed the multi-scale structural similarity index (MS-SSIM, ranging from 0 to 1, with 0 being not similar and 1 being identical), taking the SHG images used for the analyses as the ground truth (GT) and comparing the P-SHG and CurveAlign analyses to the GT images and to each other:

Based on the results in Table 1, a quality metric for image similarity is necessary to fine-tune the parameters of both analysis methods. In addition, we found that increasing the minimum R² of the P-SHG analysis increases the MS-SSIM between the P-SHG and CurveAlign overlay images (see Supplement 1, Tables S1-S5, for more details). The solution to underfilling and overfilling is a user-adjustable noise threshold (which our analysis method provides), tuned according to the similarity metric between the analyzed image and a GT image in the form of either a noise-free SHG image or a complementary fluorescence image. Given that P-SHG image acquisition takes 72 min, compared to 3 min for standard SHG imaging followed by CurveAlign processing, there is a tradeoff between imaging time and accuracy that depends on the goals of the study, such as boundary requirements and available imaging time. To summarize, CurveAlign is a powerful image processing tool for collagen quantification around the tumor boundary, but it requires human inspection and high-quality SHG imaging and can miss or overfill some fiber orientations. P-SHG imaging, in contrast, provides a more detailed view of the tumor microenvironment and can resolve finer fibers, at the cost of noisier images and much longer acquisition times. A solution for overfilling and underfilling in both methods is to introduce a similarity index for comparison between the analyzed image and the GT image (a complementary fluorescence image or a high-quality SHG image), together with a flexible noise threshold, as the appropriate value varies from sample to sample.


Table 1. MS-SSIM index of P-SHG and CurveAlign analysis
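For intuition, the similarity comparison above can be illustrated with a stripped-down, single-scale SSIM computed from whole-image statistics; the study itself uses MS-SSIM over multiple scales and local windows, so this is only a sketch of the index's ingredients (luminance, contrast, and structure):

```python
import numpy as np

def ssim_global(x, y, L=1.0):
    """Simplified single-scale SSIM from whole-image statistics.
    L is the dynamic range of the images (1.0 for normalized data);
    C1 and C2 are the standard stabilizing constants."""
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + C1) * (2 * cov + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))
```

Identical images score 1, and the score drops as the overlays diverge, which is the behavior exploited when tuning the noise threshold against a GT image.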

3.3 Image classification

Image classification was performed using the SHG images discussed in the previous section. SHG images of naïve and tumor-bearing mammary glands were first preprocessed in ImageJ to adjust brightness and denoise. Two models were trained on the dataset. A custom sequential model was built using Keras, an open-source deep-learning API integrated with TensorFlow, and transfer learning was performed using the MobileNetV2 model. Both models were written and trained using Google Colaboratory. Finally, to determine the effectiveness and precision of the architectures, the accuracy and loss were plotted to visualize how the models fit the data. The image processing pipeline can be seen in Fig. 5, and the architecture of the sequential model in Fig. 6.


Fig. 5. Image processing pipeline from SHG imaging to evaluate the trained model’s accuracy and precision.



Fig. 6. Architecture of the convolutional neural network (CNN) model built using the Keras API. The data augmentation layer (Sequential) creates new training examples by applying random transformations to existing images, such as rotating, flipping, or zooming. The Rescaling layer rescales the input image pixel values from the range [0, 255] to the range [0, 1]. The Conv2D layers apply convolutional operations with 16, 32, and 64 filters, and the MaxPooling2D layers reduce the spatial dimensions of the input by taking the maximum value in each 2 × 2 window. The Dropout layer randomly sets a fraction of the input units to zero during each training epoch (the rate is 0.2). The Flatten layer flattens the previous layer's output into a one-dimensional vector, which is fed into two fully connected neural network layers (Dense). We added more Conv2D and MaxPooling2D layers, up to 20 layers, to test how the addition of layers affected the accuracy of the classifier model.


Overall, this model consists of convolutional layers that extract features from the input image, followed by fully connected layers that make the final classification decision. The model is trained using the sparse categorical cross-entropy loss function and optimized using the adaptive moment estimation (Adam) backpropagation algorithm.
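A minimal Keras sketch of the architecture described in Fig. 6 might look as follows; the input size, dense-layer width, and augmentation magnitudes are illustrative assumptions, not values reported here:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_model(img_size=224, num_classes=2):
    """Sketch of the sequential CNN of Fig. 6: augmentation, rescaling,
    three Conv2D/MaxPooling2D blocks (16/32/64 filters), 20% dropout,
    flatten, and two dense layers."""
    augment = tf.keras.Sequential([
        layers.RandomFlip("horizontal_and_vertical"),
        layers.RandomRotation(0.2),
        layers.RandomZoom(0.2),
    ])
    model = tf.keras.Sequential([
        tf.keras.Input((img_size, img_size, 3)),
        augment,                       # active only during training
        layers.Rescaling(1.0 / 255),   # [0, 255] -> [0, 1]
        layers.Conv2D(16, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Dropout(0.2),           # 20% dropout against overfitting
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes),     # logits for the two classes
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"])
    return model
```

Training then reduces to `model.fit(train_ds, validation_data=test_ds, epochs=25)` for each data split.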

3.3.1 Deep learning feasibility on a small dataset

Before we examine the efficiency of the trained model, we must define the terms used to quantify its performance. Several performance markers exist for measuring the capability of a trained model; for simplicity, we perform cross-validation by splitting the data into training and test datasets, i.e., a percentage of the data is selected as the training dataset and the remaining percentage as the test dataset. The model learns from the training data, and the test data is used to assess the model's performance [43]. We chose the following data splits: (90% training, 10% test), (80% training, 20% test), (70% training, 30% test), and (60% training, 40% test). For brevity, they will be called 10%, 20%, 30%, and 40% data splitting, respectively.
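The splitting procedure above can be sketched in a few lines; the function name and fixed seed are ours, chosen only for reproducibility of the illustration:

```python
import random

def split_dataset(items, test_fraction, seed=0):
    """Shuffle and split a list of samples into training and test sets.
    For example, test_fraction=0.3 reproduces the "30% data split"
    (70% training / 30% test)."""
    items = list(items)
    random.Random(seed).shuffle(items)     # fixed seed for reproducibility
    n_test = round(len(items) * test_fraction)
    return items[n_test:], items[:n_test]  # (train, test)
```

For the 46-image dataset, a 30% split yields 32 training and 14 test images.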

With the definition of the data splits that will be evaluated, we can define some terms that refer to each data set. Training/test accuracy refers to how well the model fits the specified training/test data. Training/test loss assesses the model’s error when learning from the training/test data. Training/test accuracy and loss are good metrics to assess the model’s fit on the data. When test loss is greater than the training loss, the model is “memorizing” the training data set, and therefore its ability to be applied to unseen data is impaired [44]. This is called overfitting. There is also underfitting, in which the model needs more steps (epochs) to go through the data and be fully trained. One good indicator that can reveal many aspects of the system is the training/test loss curve, which shows how well the model performs. Each case presented in this section will be summarized in a table that shows the average training/test accuracy after 25 epochs and represented in the training/test loss curve to give a clearer picture of the model’s performance.

Data augmentation and more complex architectures are two common strategies to avoid overfitting [43,45,46]. Many complex architectures address overfitting by adding extra processing layers, but data augmentation targets the root of the problem: the available training data. In image classification, the amount of available data is artificially inflated by altering different aspects of the training images, such as cropping, flipping, or rotating [43,45,46]. All these measures mitigate overfitting.

Due to the small size of the available data, we also apply a variety of data augmentations, such as crop, zoom, translation, and flip, to inflate our dataset artificially. We also introduced a 20% dropout layer to avoid overfitting by randomly removing 20% of the nodes and their connections from the neural network, resulting in a new network architecture independent of the parent network [47]. The average accuracies of the first model are given in Table 2.


Table 2. Average training and test accuracy for different data splits using a custom sequential model

It is evident from the 10% data split that the test accuracy is constant, which can be due to the small test dataset. Still, the training accuracy increases with each epoch, meaning the model fits the training data progressively better. In the 20% data split, the accuracies are closer, and in some epochs the model was more accurate on the test dataset than on the training set. In the 30% data split, there is a gap between the training and test accuracy, with the test accuracy higher than the training accuracy. This gap could mean that the test dataset is more straightforward for the model than the training dataset. It could also be because the data augmentation we introduced makes it harder for the model to learn from the training dataset.

Moreover, since we are using dropout during training, in which some information from the training data is lost, consecutive layers may make predictions based on incomplete data, making it harder for the model to adapt. We explore solutions to this problem in later sections, where we change different parameters to see how they affect the trained model. There is also a gap between the training and test accuracy in the 40% data split, where the training accuracy is again lower than the test accuracy.

Besides accuracy, the loss curve can provide relevant information about the model’s state, whether it is fitting, overfitting, or underfitting. Figure 7 provides the loss curve of the four data splits that were evaluated with the model:


Fig. 7. Loss curves of the model for the four data splits. There is clear overfitting for the first data split after epoch 5, which is undesirable. For the 20% data split, the model fits the data well aside from the overshoot seen at epoch 10. The best-case scenario of all the splits is the 30% data split, where the model performs well on the data provided. For the last case, there is underfitting until epoch 10, followed by overfitting.


Therefore, the best test accuracy of this model without overfitting, approximately 73%, is obtained with the 30% data split. Based on this first investigation, we can fine-tune other model parameters to see whether we can overcome the gap between training and test accuracy. We performed receiver-operating characteristic (ROC) analysis, but our data has a 44% data bias (33 healthy cases vs. 13 cancer cases), indicating a significant class imbalance. Therefore, we implemented precision, recall, F1 score, and area under the curve (AUC) metrics to understand our model's accuracy better. The model has a high recall (1.00), which means it can correctly identify the positive cases, at the cost of having false positives. The precision of our model was low (0.3), and the F1 score was 0.5. We have an AUC range of 0.51-0.55, which is expected given the high class imbalance in the data.
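These supplementary metrics follow directly from confusion-matrix counts; the counts in the example below are generic placeholders, since the full confusion matrix is not reported here:

```python
def classification_metrics(tp, fp, fn):
    """Precision, recall, and F1 from confusion-matrix counts (true
    positives, false positives, false negatives), the metrics used to
    supplement plain accuracy under class imbalance."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For instance, `classification_metrics(3, 7, 0)` gives precision 0.3, recall 1.0, and F1 ≈ 0.46, illustrating how perfect recall can coexist with low precision when the classifier over-predicts the positive class.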

3.3.2 Addition of more data augmentation layers and elimination of dropout

As previously mentioned, two elements, namely (i) dropout and (ii) insufficient data, could explain the gap between the test and training accuracy. Therefore, we can introduce more data augmentation layers to the model to increase the available data, eliminate the dropout layer, and see how well the model performs. The results of these changes are summarized in Table 3:


Table 3. Average training and test accuracy for different data splits using more data augmentation layers and no dropout

Table 3 shows that, due to the extra layers of data augmentation, the training accuracy decreases, as we have made it harder for the model to learn from the training dataset. In one scenario, we also preferentially augmented the tumor-bearing data to balance the dataset. Nevertheless, the test accuracy of the model remains the same even with the elimination of dropout and more training data. For this trained model, the 10% data split underfits until epoch 18 and then overfits after epoch 24. For the 20, 30, and 40% data splits, overfitting happens at much earlier epochs. Taking the loss curve and the accuracy of this model into consideration, adding more data augmentation and eliminating dropout do not improve the model's performance. Experimenting with the model parameters leads to the conclusion that the test accuracy is sometimes higher than the training accuracy because of the limited availability of data and because the test data is more straightforward than the training data. Moreover, it could be beneficial to omit some of the naïve samples to balance the dataset, but given the already limited dataset, doing so could negatively impact the model's performance, and therefore we decided against it.

3.4 Transfer learning using MobileNetV2

In this section, we explore whether transfer learning with the well-established MobileNetV2 model is the better approach. We chose the MobileNetV2 architecture because it is lightweight and has fewer tunable parameters, which suits our very small dataset. Other networks, such as ResNet, AlexNet, and GoogLeNet, were considered, but they are much deeper than MobileNetV2, with many more layers and parameters. This makes them more prone to overfitting, especially when dealing with small datasets. Deeper architectures enable networks to capture more complex features in the data, including noise and outliers, which can hinder the model's ability to generalize to new data. As mentioned before, machine learning usually relies on the high quality and size of the training data, which is not always readily available in real-world scenarios (SHG microscopy images are an excellent example of this problem) [29]. Moreover, models trained on small datasets are usually very application-specific and cannot be applied to other datasets. In these situations, transfer learning is appealing because researchers can leverage models trained on much larger datasets and refine the learning process for their own application [29]. Table 4 provides the accuracies of this model when trained using our dataset.


Table 4. Average training and test accuracy for different data splits using MobileNetV2

We see that the model's accuracy deteriorates beyond the 10% data split. From the loss curve, overfitting appears after epoch 4 for the 10% split, after epoch 8 for the 20% split, and after epoch 12 for the 40% split. Interestingly, for MobileNetV2 itself, overfitting occurs only in the 10% and 40% data split cases, after epochs 8 and 23, respectively. For the 20% and 30% data splits, the 0.001 dropout rate in the MobileNetV2 architecture causes the training loss to fluctuate, while the gap between the test and training loss remains roughly constant [47].
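
The transfer-learning recipe used here, freezing a pretrained feature extractor and training only a small classification head, can be sketched generically. In the sketch below, a fixed random projection stands in for MobileNetV2's frozen convolutional base and the data are synthetic, so this is a conceptual illustration, not our actual Keras implementation:

```python
# Conceptual sketch of transfer learning: a frozen "backbone" maps inputs to
# features, and only a small logistic-regression head is trained on top.
# A fixed random projection stands in for MobileNetV2's pretrained base;
# the two classes are synthetic stand-ins for naive/tumor-bearing images.
import numpy as np

rng = np.random.default_rng(42)

# "Pretrained" backbone: weights are fixed and never updated during training
W_backbone = rng.normal(size=(64, 8))
def backbone(x):
    return np.maximum(x @ W_backbone, 0.0)  # frozen ReLU feature extractor

x0 = rng.normal(0.0, 1.0, size=(40, 64))    # class 0 ("naive")
x1 = rng.normal(0.8, 1.0, size=(40, 64))    # class 1 ("tumor-bearing")
X = backbone(np.vstack([x0, x1]))
X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)  # standardize features
y = np.array([0] * 40 + [1] * 40)

# Trainable head: logistic regression fitted by gradient descent
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid
    w -= 0.1 * (X.T @ (p - y)) / len(y)
    b -= 0.1 * np.mean(p - y)

acc = float(np.mean(((1.0 / (1.0 + np.exp(-(X @ w + b)))) >= 0.5) == y))
print(f"training accuracy of the head: {acc:.2f}")
```

Because only the small head is optimized, the number of trainable parameters, and hence the opportunity to overfit a small dataset, stays low; this is the rationale for freezing the MobileNetV2 base.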

To summarize, with such a small dataset, training a simple classification network from scratch is optimal but remains data-specific. Transfer learning with well-known networks is an alternative, but due to the complexity and number of layers in such architectures, overfitting and underfitting are more prominent in that case.

4. Conclusion

In this study, SHG and P-SHG microscopy were used to study the ECM within tumor-bearing mammary glands. SHG microscopy can help identify the collagen aggregates that appear at the tumor-stroma boundary, and P-SHG microscopy is an excellent tool for analyzing collagen fibrillar orientations in the ECM. We have shown an automated SHG and P-SHG microscopy system that minimizes human intervention. We applied two image analysis methods for collagen fibrillar orientation analysis: CurveAlign is a powerful tool that can be applied to SHG images to distinguish collagen fibrillar orientation with respect to the tumor boundary, and our custom-written P-SHG analysis method achieves the same results in greater detail. Furthermore, deep learning and image classification can be powerful tools to differentiate between healthy and tumor-bearing samples within the limitations of a small training dataset. Therefore, if deep learning is to be used for SHG imaging, a database should be established where imaging labs worldwide can pool their images; this would remove data availability as a limiting problem. After investigating the variation of the different parameters, the best model for our small dataset used the 30% data split with a 0.2 dropout rate and three layers of data augmentation, giving a test accuracy of 73%. Another limitation of our study is the imaging speed, which requires further improvement. It is worth highlighting that P-SHG analysis, in conjunction with image classification and widefield imaging, has shown great promise in cancer research and provides excellent insight into the underlying mechanisms of collagen formation and remodeling in the ECM [30]. In future studies, we will explore the feasibility of adapting the machine learning approach used in this study to other tissue types.
This will allow us to determine whether the approach is amenable to a broader range of applications and to identify any potential limitations or challenges. It will also be valuable to expand this work's scope to include the analysis of metastatic lung tissue, letting us assess the changes in collagen patterns that occur during metastatic outgrowth and potentially identify markers for early detection. Moreover, SHG and P-SHG imaging are well-established methods that have been used successfully for many years. In contrast, image analysis for these methods is still in its infancy, necessitating the exploration of different analysis methods that can be used alongside these imaging techniques.

Funding

Canada Foundation for Innovation; Fonds de recherche du Québec – Nature et technologies; Natural Sciences and Engineering Research Council of Canada; New Frontiers Research Fund; NSERC CREATE; Canadian Cancer Society (707140); Fonds de Recherche du Québec - Santé; Epstein Fellowship in Women's Health.

Acknowledgments

The authors acknowledge technical support from Antoine Laramée.

Disclosures

The authors declare no competing interests.

Data availability

Data underlying the results presented in this paper are not publicly available but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. L. Bejarano, M. J. C. Jordāo, and J. A. Joyce, “Therapeutic targeting of the tumor microenvironment,” Cancer Discov. 11(4), 933–959 (2021). [CrossRef]  

2. E. A. Brett, M. A. Sauter, H.-G. Machens, and D. Duscher, “Tumor-associated collagen signatures: pushing tumor boundaries,” Cancer Metab. 8(1), 14–15 (2020). [CrossRef]  

3. S. Xu, H. Xu, W. Wang, S. Li, H. Li, T. Li, W. Zhang, X. Yu, and L. Liu, “The role of collagen in cancer: from bench to bedside,” J. Transl. Med. 17(1), 309 (2019). [CrossRef]  

4. M. W. Conklin, J. C. Eickhoff, K. M. Riching, C. A. Pehlke, K. W. Eliceiri, P. P. Provenzano, A. Friedl, and P. J. Keely, “Aligned collagen is a prognostic signature for survival in human breast carcinoma,” Am. J. Pathol. 178(3), 1221–1232 (2011). [CrossRef]  

5. K.-S. Hsu, J. M. Dunleavey, C. Szot, et al., “Cancer cell survival depends on collagen uptake into tumor-associated stroma,” Nat. Commun. 13(1), 7078 (2022). [CrossRef]  

6. K. Song, Z. Yu, X. Zu, G. Li, Z. Hu, and Y. Xue, “Collagen remodeling along cancer progression providing a novel opportunity for cancer diagnosis and treatment,” Int. J. Mol. Sci. 23(18), 10509 (2022). [CrossRef]  

7. W. Han, S. Chen, W. Yuan, Q. Fan, J. Tian, X. Wang, L. Chen, X. Zhang, W. Wei, R. Liu, J. Qu, Y. Jiao, R. H. Austin, and L. Liu, “Oriented collagen fibers direct tumor cell intravasation,” Proc. Natl. Acad. Sci. 113(40), 11208–11213 (2016). [CrossRef]  

8. D. Chen, H. Chen, L. Chi, M. Fu, G. Wang, Z. Wu, S. Xu, C. Sun, X. Xu, L. Lin, J. Cheng, W. Jiang, X. Dong, J. Lu, J. Zheng, G. Chen, G. Li, S. Zhuo, and J. Yan, “Association of tumor-associated collagen signature with prognosis and adjuvant chemotherapy benefits in patients with gastric cancer,” JAMA Netw. Open 4(11), e2136388 (2021). [CrossRef]  

9. J. N. Ouellette, C. R. Drifka, K. B. Pointer, Y. Liu, T. J. Lieberthal, W. J. Kao, J. S. Kuo, A. G. Loeffler, and K. W. Eliceiri, “Navigating the collagen jungle: the biomedical potential of fiber organization in cancer,” Bioengineering 8(2), 17 (2021). [CrossRef]  

10. A. Ray, Z. M. Slama, R. K. Morford, S. A. Madden, and P. P. Provenzano, “Enhanced directional migration of cancer stem cells in 3D aligned collagen matrices,” Biophys. J. 112(5), 1023–1036 (2017). [CrossRef]  

11. K. M. Riching, B. L. Cox, M. R. Salick, C. Pehlke, A. S. Riching, S. M. Ponik, B. R. Bass, W. C. Crone, Y. Jiang, A. M. Weaver, K. W. Eliceiri, and P. J. Keely, “3D collagen alignment limits protrusions to enhance breast cancer cell persistence,” Biophys. J. 107(11), 2546–2558 (2014). [CrossRef]  

12. S. I. Fraley, P. Wu, L. He, Y. Feng, R. Krisnamurthy, G. D. Longmore, and D. Wirtz, “Three-dimensional matrix fiber alignment modulates cell migration and MT1-MMP utility by spatially and temporally directing protrusions,” Sci. Rep. 5(1), 14580 (2015). [CrossRef]  

13. R. Oldenbourg and G. Mei, “New polarized light microscope with precision universal compensator,” J. Microsc. 180(Pt 2), 140–147 (1995). [CrossRef]  

14. H. Lim, “Harmonic generation microscopy 2.0: new tricks empowering intravital imaging for neuroscience,” Front. Mol. Biosci. 6, 99 (2019). [CrossRef]  

15. P. P. Provenzano, K. W. Eliceiri, J. M. Campbell, D. R. Inman, J. G. White, and P. J. Keely, “Collagen reorganization at the tumor-stromal interface facilitates local invasion,” BMC Med. 4(1), 38 (2006). [CrossRef]  

16. P. P. Provenzano, D. R. Inman, K. W. Eliceiri, J. G. Knittel, L. Yan, C. T. Rueden, J. G. White, and P. J. Keely, “Collagen density promotes mammary tumor initiation and progression,” BMC Med. 6(1), 11 (2008). [CrossRef]  

17. B. L. Wen, M. A. Brewer, O. Nadiarnykh, J. Hocker, V. Singh, T. R. Mackie, and P. J. Campagnola, “Texture analysis applied to second harmonic generation image data for ovarian cancer classification,” J. Biomed. Opt. 19(9), 096007 (2014). [CrossRef]  

18. K. R. Campbell, R. Chaudhary, M. Montano, R. V. Iozzo, W. A. Bushman, and P. J. Campagnola, “Second-harmonic generation microscopy analysis reveals proteoglycan decorin is necessary for proper collagen organization in prostate,” J. Biomed. Opt. 24(06), 1 (2019). [CrossRef]  

19. A. Golaraei, L. B. Mostaço-Guidolin, V. Raja, R. Navab, T. Wang, S. Sakashita, K. Yasufuku, M.-S. Tsao, B. C. Wilson, and V. Barzda, “Polarimetric second-harmonic generation microscopy of the hierarchical structure of collagen in stage I-III non-small cell lung carcinoma,” Biomed. Opt. Express 11(4), 1851–1863 (2020). [CrossRef]  

20. M. Rivard, M. Laliberté, A. Bertrand-Grenier, C. Harnagea, C. P. Pfeffer, M. Vallières, Y. St-Pierre, A. Pignolet, M. A. El Khakani, and F. Légaré, “The structural origin of second harmonic generation in fascia,” Biomed. Opt. Express 2(1), 26–36 (2011). [CrossRef]  

21. P. Stoller, K. M. Reiser, P. M. Celliers, and A. M. Rubenchik, “Polarization-Modulated Second Harmonic Generation in Collagen,” Biophys. J. 82(6), 3330–3342 (2002). [CrossRef]  

22. P. J. Campagnola and L. M. Loew, “Second-harmonic imaging microscopy for visualizing biomolecular arrays in cells, tissues and organisms,” Nat. Biotechnol. 21(11), 1356–1360 (2003). [CrossRef]  

23. S. G. Stanciu, F. J. Ávila, R. Hristu, and J. M. Bueno, “A study on image quality in polarization-resolved second harmonic generation microscopy,” Sci. Rep. 7(1), 15476 (2017). [CrossRef]  

24. G. Latour, I. Gusachenko, L. Kowalczuk, I. Lamarre, and M.-C. Schanne-Klein, “In vivo structural imaging of the cornea by polarization-resolved second harmonic microscopy,” Biomed. Opt. Express 3(1), 1–15 (2012). [CrossRef]  

25. R. Cisek, A. Joseph, M. Harvey, and D. Tokarz, “Polarization-sensitive second harmonic generation microscopy for investigations of diseased collagenous tissues,” Front. Phys. 9, 726996 (2021). [CrossRef]  

26. A. E. Tuer, M. K. Akens, S. Krouglov, D. Sandkuijl, B. C. Wilson, C. M. Whyne, and V. Barzda, “Hierarchical model of fibrillar collagen organization for interpreting the second-order susceptibility tensors in biological tissue,” Biophys. J. 103(10), 2093–2105 (2012). [CrossRef]  

27. A. Golaraei, L. Kontenis, R. Cisek, D. Tokarz, S. J. Done, B. C. Wilson, and V. Barzda, “Changes of collagen ultrastructure in breast cancer tissue determined by second-harmonic generation double Stokes-Mueller polarimetric microscopy,” Biomed. Opt. Express 7(10), 4054–4068 (2016). [CrossRef]  

28. Y. V. Kistenev, V. V. Nikolaev, O. S. Kurochkina, A. V. Borisov, D. A. Vrazhnov, and E. A. Sandykova, “Application of multiphoton imaging and machine learning to lymphedema tissue analysis,” Biomed. Opt. Express 10(7), 3353–3368 (2019). [CrossRef]  

29. M. J. Huttunen, A. Hassan, C. W. McCloskey, S. Fasih, J. Upham, B. C. Vanderhyden, R. W. Boyd, and S. Murugkar, “Automated classification of multiphoton microscopy images of ovarian tissue using deep learning,” J. Biomed. Opt. 23(6), 1–7 (2018). [CrossRef]  

30. K. Mirsanaye, L. Uribe Castaño, Y. Kamaliddin, A. Golaraei, R. Augulis, L. Kontenis, S. J. Done, E. Žurauskas, V. Stambolic, B. C. Wilson, and V. Barzda, “Machine learning-enabled cancer diagnostics with widefield polarimetric second-harmonic generation microscopy,” Sci. Rep. 12(1), 10290 (2022). [CrossRef]  

31. B. Shen, S. Liu, Y. Li, Y. Pan, Y. Lu, R. Hu, J. Qu, and L. Liu, “Deep learning autofluorescence-harmonic microscopy,” Light: Sci. Appl. 11(1), 76 (2022). [CrossRef]  

32. G. Wang, Y. Sun, S. Jiang, G. Wu, W. Liao, Y. Chen, Z. Lin, Z. Liu, and S. Zhuo, “Machine learning-based rapid diagnosis of human borderline ovarian cancer on second-harmonic generation images,” Biomed. Opt. Express 12(9), 5658–5669 (2021). [CrossRef]  

33. A. Mikołajczyk and M. Grochowski, “Data augmentation for improving deep learning in image classification problem,” in 2018 International Interdisciplinary PhD Workshop (IIPhDW) (2018), pp. 117–122.

34. Y. Liu, A. Keikhosravi, G. S. Mehta, C. R. Drifka, and K. W. Eliceiri, “Methods for quantifying fibrillar collagen alignment,” Methods Mol. Biol. 1627, 429–451 (2017). [CrossRef]  

35. M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “MobileNetV2: Inverted residuals and linear bottlenecks,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019), pp. 4510–4520.

36. M. Pinsard, “Multimodal and advanced interferometric second harmonic generation microscopy for an improved characterization of biopolymers in cells and tissues,” University of Quebec (2020).

37. C. Teulon, I. Gusachenko, G. Latour, and M.-C. Schanne-Klein, “Theoretical, numerical and experimental study of geometrical parameters that affect anisotropy measurements in polarization-resolved SHG microscopy,” Opt. Express 23(7), 9313–9328 (2015). [CrossRef]  

38. G. Ducourthial, J.-S. Affagard, M. Schmeltz, X. Solinas, M. Lopez-Poncelas, C. Bonod-Bidaud, R. Rubio-Amador, F. Ruggiero, J.-M. Allain, E. Beaurepaire, and M.-C. Schanne-Klein, “Monitoring dynamic collagen reorganization during skin stretching with fast polarization-resolved second harmonic generation imaging,” J. Biophotonics 12(5), e201800336 (2019). [CrossRef]  

39. Y. Liu, A. Keikhosravi, C. A. Pehlke, J. S. Bredfeldt, M. Dutson, H. Liu, G. S. Mehta, R. Claus, A. J. Patel, M. W. Conklin, D. R. Inman, P. P. Provenzano, E. Sifakis, J. M. Patel, and K. W. Eliceiri, “Fibrillar collagen quantification with curvelet transform based computational methods,” Front. Bioeng. Biotechnol. 8, 198 (2020). [CrossRef]  

40. M. Abadi, A. Agarwal, P. Barham, et al., “TensorFlow: large-scale machine learning on heterogeneous distributed systems,” arXiv, arXiv:1603.04467 (2016). [CrossRef]  

41. L. Gole, J. Yeong, J. C. T. Lim, K. H. Ong, H. Han, A. A. Thike, Y. C. Poh, S. Yee, J. Iqbal, W. Hong, B. Lee, W. Yu, and P. H. Tan, “Quantitative stain-free imaging and digital profiling of collagen structure reveal diverse survival of triple negative breast cancer patients,” Breast Cancer Res. 22(1), 42 (2020). [CrossRef]  

42. S. E. J. Preston, M. Bartish, V. R. Richard, A. Aghigh, C. Gonçalves, J. Smith-Voudouris, F. Huang, F. Légaré, L.-M. Postovit, R. Lapointe, R. P. Zahedi, C. H. Borchers, W. H. Miller Jr., and S. V. del Rincón, “Phosphorylation of eIF4E in the stroma drives the production and spatial organisation of collagen type I in the mammary gland,” Matrix Biology 111, 264–288 (2022). [CrossRef]  

43. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, Illustrated edition (The MIT Press, 2016).

44. M. J. Anzanello and F. S. Fogliatto, “Learning curve models and applications: Literature review and research directions,” Int. J. Ind. Ergon. 41(5), 573–583 (2011). [CrossRef]  

45. K. Maharana, S. Mondal, and B. Nemade, “A review: Data preprocessing and data augmentation techniques,” Glob. Transit. Proc. 3(1), 91–99 (2022). [CrossRef]  

46. A. P. Piotrowski and J. J. Napiorkowski, “A comparison of methods to avoid overfitting in neural networks training in the case of catchment runoff modelling,” J. Hydrol. 476, 97–111 (2013). [CrossRef]  

47. N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res. 15(1), 1929–1958 (2014). [CrossRef]  

Supplementary Material (1)

Supplement 1: H&E images of samples and MS-SSIM metric table




Figures (7)

Fig. 1. Layout of the SHG and P-SHG inverted microscope setup. When using only SHG, the motorized half-wave plate is removed. For P-SHG, the angles range from 0 to 170 degrees. The microscope and the motorized half-wave plate operate under a unified custom Python program for P-SHG measurements.

Fig. 2. Simplified P-SHG and CurveAlign analysis flowchart. For P-SHG analysis, 18 SHG images (32-bit TIFF) of the regions of interest (ROI) are taken in 10-degree steps from 0 to 170 degrees, and the results (color wheel, orientation map, anisotropy parameter map, and histogram data) are stored. For CurveAlign, a single SHG image is inputted to the CurveAlign script, and the results (overlay image alongside its histogram data) are stored.

Fig. 3. SHG images with normalized intensity calibration bars of (a, b) naïve and (c-f) tumor-bearing mammary glands. SHG microscopy can resolve the intricacies of the microenvironment. It shows that the tumor and the lymph nodes (LN) do not produce an SHG signal, leading to a loss of SHG signal as the tumor progresses throughout the gland. Moreover, the yellow arrows in (c, d) indicate the collagen barrier formed between the tumor and the rest of the mammary gland ECM. (e, f) are more advanced cases where the tumor has taken over the majority of the mammary gland, with little normal tissue structure remaining.

Fig. 4. SHG, P-SHG, and CurveAlign analysis of 5 samples with normalized intensity calibration bar. Each row represents the same region of interest. P-SHG images are accompanied by a color wheel, with each angle (0-360 degrees) represented by a color, and the fibrillar histogram. Both approaches provide an excellent distinction between the tumor and its surroundings, although in both cases there is some underfilling and overfilling of regions, shown using white dashed lines in P-SHG and red dashed lines in CurveAlign. As an example, in the first row, when we compare both approaches to the reference SHG image, we can see that the P-SHG method overfills in one region indicated by the white dashed line, and there are four areas in which CurveAlign has either underfilled or overfilled, indicated by the red dashed lines. Similar errors can be seen in the subsequent rows as well.

Fig. 5. Image processing pipeline from SHG imaging to evaluation of the trained model’s accuracy and precision.

Fig. 6. Architecture of the convolutional neural network (CNN) model built using the Keras API. The data augmentation layer (sequential) creates new training examples by applying random transformations to existing images, such as rotating, flipping, or zooming. The rescaling layer rescales the input image pixel values from the range [0, 255] to the range [0, 1]. The conv2D layer applies a convolutional operation with 16, 32, and 64 filters, and Maxpooling2D reduces the spatial dimensions of the input by taking the maximum value in each 2 × 2 window. The Dropout layer randomly sets a fraction of the input units to zero during each training epoch (the rate is 0.2). Flatten layers flatten the previous layer’s output into a one-dimensional vector, fed into two dense fully connected neural network layers (Dense). We added more Conv2D and Maxpooling2D layers, up to 20 layers, to test how the addition of layers affected the accuracy of the classifier model.

Fig. 7. Loss curve of the model for 4 data splits. We have clear overfitting for the first data split after epoch 5, which is undesirable. The model fits the data well for 20% data splitting besides the overshoot seen at epoch 10. The best-case scenario from all the splits can be seen for 30% data splitting, where the model performs well for the data provided. For the last case, we have underfitting until epoch 10, followed by overfitting.

Tables (4)

Table 1. MS-SSIM index of P-SHG and CurveAlign analysis

Table 2. Average training and test accuracy for different data splits using a custom sequential model

Table 3. Average training and test accuracy for different data splits using more data augmentation layers and no dropout

Table 4. Average training and test accuracy for different data splits using MobileNetV2

Equations (1)

$$I_{\mathrm{SHG}}(\Omega) = K\left[A\cos(4\Omega - 4\theta) + B\cos(2\Omega - 2\theta) + 1\right]$$
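
Because cos(nΩ − nθ) expands into cos nΩ and sin nΩ components, the P-SHG intensity model can be fitted by plain linear least squares, after which K, A, B, and the fibrillar angle θ are recovered from the coefficients. A minimal NumPy sketch, with data synthesized from assumed parameter values (K, A, B, θ here are illustrative, not measured):

```python
# Sketch of fitting the P-SHG intensity model by rewriting it as a linear
# least-squares problem. Data are synthesized from assumed ground-truth
# parameters; the paper's own fitting routine may differ.
import numpy as np

# 18 polarization angles, 0-170 degrees in 10-degree steps (as in the experiment)
omega = np.deg2rad(np.arange(0, 180, 10))

# Assumed parameters used to synthesize a noiseless test signal
K, A, B, theta = 2.0, 0.6, 0.3, np.deg2rad(25)
I = K * (A * np.cos(4 * omega - 4 * theta) + B * np.cos(2 * omega - 2 * theta) + 1)

# Design matrix for I = c0 + c1 cos4w + c2 sin4w + c3 cos2w + c4 sin2w
M = np.column_stack([np.ones_like(omega),
                     np.cos(4 * omega), np.sin(4 * omega),
                     np.cos(2 * omega), np.sin(2 * omega)])
c, *_ = np.linalg.lstsq(M, I, rcond=None)

K_fit = c[0]                               # c0 = K
theta_fit = 0.5 * np.arctan2(c[4], c[3])   # c3 = KB cos2t, c4 = KB sin2t
B_fit = np.hypot(c[3], c[4]) / K_fit
A_fit = np.hypot(c[1], c[2]) / K_fit

print(np.rad2deg(theta_fit), K_fit, A_fit, B_fit)
```

On noiseless data the fit recovers the parameters exactly; with experimental images, the same linear system is solved per pixel to produce the orientation and anisotropy maps.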