Optica Publishing Group

Convolutional neural network-based retrieval of Raman signals from CARS spectra

Open Access

Abstract

We report studies on the automatic extraction of the Raman signal from coherent anti-Stokes Raman scattering (CARS) spectra using a convolutional neural network (CNN) model. The model architecture is adapted from the literature and retrained with synthetic and semi-synthetic data. The synthesized CARS spectra better approximate the experimental CARS spectra. The retrained model accurately predicts spectral lines throughout the spectral range, even those with minute intensities, which demonstrates its potential. Further, the extracted Raman line shapes are in good agreement with the original ones, with an RMS error of less than 7% on average and correlation coefficients of more than 0.9. Finally, this approach has strong potential for accurately estimating Raman signals from complex CARS data in various applications.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Coherent anti-Stokes Raman scattering (CARS) spectroscopy is a third-order nonlinear phenomenon in which the pump and Stokes beams coherently excite molecular vibrations. The photons of the probe beam inelastically scatter off the excited mode with an energy shift equal to that of the vibrational state. Compared to conventional spontaneous Raman spectroscopy, this optical technique offers molecular fingerprint information of the samples at higher speeds and without competing autofluorescence [1,2]. CARS has been demonstrated as a promising micro-imaging spectroscopic candidate for biological and materials applications such as the analysis of brain tumor masses [3] and polymer blends [4], respectively. Further, this technique needs only a fraction of a second to generate a micrograph of a particular vibrational mode, and only a few minutes for a complete hyperspectral image.

A CARS spectrum typically has coherent contributions from photons generated through resonant and non-resonant processes [5]. The first corresponds to the vibrational Raman signal, while the second represents the electronic response. The electronic contribution to the CARS spectrum is usually referred to as the “non-resonant background” (NRB) and is considered the origin of CARS spectral distortion. Several efforts were made to suppress the NRB in the early stages of CARS microscopy development via different optical approaches such as polarization CARS [6], frequency-modulation CARS [7], interferometric CARS [8], and others [9,10]. All these alternatives reduced the NRB, but at the cost of increased experimental complexity. Further, the NRB acts like a stable homodyne amplifier for the Raman-generated signal, so reducing the NRB also decreases the Raman components of the CARS signal. Therefore, the NRB plays an essential role in signal amplification [2], and without it, CARS shows no advantage over the conventional Raman technique, especially for fast imaging applications [11]. In Raman spectroscopy, fluorescence acts as an additive background signal, whereas in CARS the NRB is coherent and co-generated with the Raman-resonant CARS components. Hence, it may amplify weak signals above the noise level, albeit this coherent mixing also introduces distortions in the spectral line shapes of the CARS spectra that cannot simply be subtracted. However, there is a fixed phase relationship between the NRB and the Raman components. This characteristic property has led to the realization that computational methods can be employed to extract the Raman signal from complex CARS spectra.

In the literature, so-called “phase retrieval” methods such as the maximum entropy method (MEM) [12] and the Kramers–Kronig relation (KK) [13] were deployed to perform this task. These investigations assumed that the NRB is either known a priori or that the NRB of an appropriate surrogate material (e.g., water, coverslip glass, or salt) can be used as an estimate [14]. However, it was later found that these surrogate materials lead to errors in phase and amplitude, as the two are analytically connected [15]. These errors were corrected using “scale-error correction (SEC)” and “phase-error correction (PEC)” approaches, which inherently exposed the connection between the surrogate material and the NRB [15]. Moreover, a new approach, “factorized Kramers–Kronig and error correction” (fKK-EC), has been reported in the literature [16]. It extracts the Raman signal by denoising the CARS spectra, followed by phase retrieval and correction. The singular value decomposition (SVD) method is utilized for denoising, and the basis-vector concept is applied for phase retrieval and correction. Further, wavelet prism (WP) decomposition analysis has been exploited for correcting experimental artifacts in CARS spectra [17].

All the aforementioned techniques either require a surrogate reference material or parameters that must be optimized by the user to achieve good results. However, these complexities can be overcome by employing deep learning (DL), which has shown unprecedented performance improvements in different areas [18]. Consequently, deep neural networks (DNNs) have become an active area of research and are explored for various applications [19–21]. These methods have also been applied in spectroscopic applications such as molecular excitation spectroscopy [22], laser-induced breakdown spectroscopy [23–25], and vibrational spectroscopy [26].

In the context of CARS spectroscopy, DL has been studied in only two recent papers [27,28]. Houhou et al. applied DL for phase retrieval from CARS spectra [27]. They utilized a Long Short-Term Memory (LSTM) model and compared the results with the MEM and KK methods. Moreover, Valensise et al. employed DL, specifically a convolutional neural network (CNN) model, for predicting the imaginary part of CARS spectra [28]. This trained CNN model is referred to as SpecNet in the literature. However, it was not able to predict spectral lines with minimal intensities, and its prediction capability was poor near the extrema of the CARS spectra. Further, to the best of our knowledge, the two DL models reported to date have been trained only with purely synthetic data, and no quantitative measurements on the predicted Raman line shapes have been performed.

Hence, in this work, unlike the previous works, we trained the DL model with both synthetic and semi-synthetic data for extracting Raman signals from CARS spectra. In addition, the predicted output (peak height, position, area, and width of semi-synthetic CARS data) is compared with the true one to estimate model performance, as these attributes are critical in concentration measurements and practical applications. We have chosen a CNN model for this task; its architecture is the same as SpecNet, but it is trained with synthetic and semi-synthetic data, so hereafter it is referred to as the ‘retrained model’. Finally, the potential of the retrained model is demonstrated by extracting the Raman signal from real experimental CARS spectra, where it shows better predictive performance than the original SpecNet model. It is worth noting that the LSTM architecture details are not openly available for comparison with the retrained model results.

2. Experimental details

The CNN model has been trained with 51024 CARS spectra, 50000 of which were simulated. The remainder were semi-synthetic spectra; the details are given in the following sections.

2.1 Synthetic CARS spectra generation

The theoretical/synthetic CARS spectrum S(ω) is a combination of the resonant, non-resonant, and noise contributions, and it can be defined as

$$S(\omega ) = \varepsilon (\omega ){|{\chi_{NR}^{(3)} + \chi_R^{(3)}(\omega )} |^2} + \eta (\omega )$$
where $\chi _R^{(3)}$ and $\chi _{NR}^{(3)}$ are the resonant and non-resonant third-order susceptibilities, respectively, $\varepsilon(\omega)$ is a line-shape distortion error arising from experimental artifacts [17], and η(ω) is the noise contribution. Further, $\chi _R^{(3)}$ (Chi3) can be defined as
$${\chi _R}^{(3)}(\omega) = \sum\limits_k {\frac{{{A_k}}}{{{\Omega _k} - ({\omega _p} - {\omega _s}) - i{\Gamma _k}}}}$$
where Ak, Ωk, ωp, ωs, and Γk represent the peak amplitude, resonance frequency, pump frequency, Stokes frequency, and linewidth, respectively.

A maximum of 15 spectral lines/peaks is allowed while generating each synthetic spectrum, and vibrational frequencies are sampled over a normalized scale, i.e., [0, 1]. The amplitudes and spectral linewidths are varied over the ranges [0.01, 1] and [0.001, 0.008], respectively. The frequency scale in our data covers the range 200–3200 cm-1, so the linewidth range [0.001, 0.008] on the normalized scale corresponds to 2–25.6 cm-1 on the wavenumber scale. Further, a non-resonant background (NRB) and uniformly distributed noise η(ω) are added to the $\chi _R^{(3)}$ for generating the CARS spectra (see the flow chart in Fig. 1(b)). The NRB is generated as given in Eq. (3):

$$NRB = {\sigma _1}\,{\sigma _2};\qquad{\sigma _i} = \frac{1}{{1 + {e^{ - (\omega - {c_i}){s_i}}}}}$$
where σ1 and σ2 are two sigmoid functions. The sigmoid parameters (ci, si) are randomly selected to simulate various NRBs as shown in Fig. 1(b).
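The generation pipeline of Eqs. (1)–(3) can be sketched in Python as follows. This is a minimal sketch: the sigmoid parameter ranges and the noise amplitude are illustrative assumptions, and the line-shape distortion term ε(ω) is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def chi_r(omega, amps, centers, widths):
    """Resonant third-order susceptibility: sum of complex Lorentzians (Eq. 2)."""
    # omega is the normalized frequency axis on [0, 1]
    return sum(a / (c - omega - 1j * g)
               for a, c, g in zip(amps, centers, widths))

def sigmoid_nrb(omega, c1, s1, c2, s2):
    """Non-resonant background as a product of two sigmoids (Eq. 3)."""
    s_a = 1.0 / (1.0 + np.exp(-(omega - c1) * s1))
    s_b = 1.0 / (1.0 + np.exp(-(omega - c2) * s2))
    return s_a * s_b

def synth_cars(n_points=640, max_peaks=15):
    """One synthetic CARS spectrum and its training target Im(chi_R)."""
    omega = np.linspace(0, 1, n_points)
    n = rng.integers(1, max_peaks + 1)
    amps = rng.uniform(0.01, 1.0, n)        # peak amplitudes
    centers = rng.uniform(0.0, 1.0, n)      # resonance frequencies (normalized)
    widths = rng.uniform(0.001, 0.008, n)   # linewidths, i.e. 2-25.6 cm^-1
    chi = chi_r(omega, amps, centers, widths)
    # Rising and falling sigmoid; parameter ranges are assumed, not from the paper.
    nrb = sigmoid_nrb(omega, rng.uniform(0, 0.3), rng.uniform(5, 20),
                      rng.uniform(0.7, 1.0), -rng.uniform(5, 20))
    noise = rng.uniform(0, 0.01, n_points)  # uniformly distributed noise
    cars = np.abs(nrb + chi) ** 2 + noise   # Eq. 1, with the NRB as chi_NR
    return omega, cars, chi.imag
```

A call to `synth_cars()` returns the model input (the CARS spectrum) and the corresponding ground-truth imaginary part used as the training label.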


Fig. 1. a) Flow chart for generating the Chi3 spectral data. b) Flow chart for producing the retrained SpecNet model.


2.2 Semi-synthetic CARS spectra generation

Here the procedure shown in Fig. 1(a) is used, which simulates “experimental $\chi _R^{(3 )}$ spectra” from experimentally recorded Raman spectra. In the first step, the background of the spectra is identified with a baseline-removal package in Python [29]. The baseline is automatically estimated with the “adaptive iteratively reweighted penalized least squares” approach [30] and then subtracted from the spectra. In order to obtain more training spectra, the 64 background-corrected spectra are augmented to 1024; in the augmentation process, Poisson noise is randomly generated and added. Later, the Kramers–Kronig relation is applied to the augmented spectra to estimate the real part of the $\chi _R^{(3 )}$, from which $\chi _R^{(3 )}$ is assembled. Then both the simulated and experimental Chi3 spectra are combined, so the total training data size becomes 51024 spectra of 640 points each. The procedure described in the previous section is then applied to generate CARS spectra from the Chi3 data. The stack plot in Fig. 2 shows the TNT experimental Raman spectrum in black (at the top) and the corresponding CARS spectrum after conversion in blue (at the bottom).
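The two numerical steps above can be sketched as follows. The Kramers–Kronig real-part estimate is approximated here by a discrete Hilbert transform, and the Poisson augmentation resamples each spectrum as photon counts; the count scale and the use of `scipy.signal.hilbert` (whose sign convention may differ from the KK definition used in the paper) are assumptions.

```python
import numpy as np
from scipy.signal import hilbert

def kk_real_part(im_chi):
    """Estimate Re(chi_R) from Im(chi_R) via the Kramers-Kronig relation,
    approximated by the discrete Hilbert transform.  Depending on the sign
    convention of the susceptibility, the result may need negating."""
    # hilbert() returns the analytic signal; its imaginary part is the
    # Hilbert transform of the input.
    return np.imag(hilbert(im_chi))

def augment(raman, n_copies, scale=50.0, rng=None):
    """Augment a background-corrected Raman spectrum with Poisson noise.
    `scale` is an assumed photon-count level, not a value from the paper."""
    rng = rng or np.random.default_rng(0)
    out = []
    for _ in range(n_copies):
        noisy = rng.poisson(raman * scale) / scale  # shot-noise realization
        out.append(noisy / noisy.max())             # renormalize to [0, 1]
    return np.stack(out)
```

The augmented imaginary parts and their KK-derived real parts together give the complex semi-synthetic Chi3 spectra fed into the CARS generation step of Section 2.1.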


Fig. 2. The CARS spectrum after conversion from the experimentally recorded Raman spectrum of TNT


Similarly, “CARS experimental test spectra” are generated from the experimentally recorded Raman spectra. It is worth noting that all the NRB and noise simulation parameters utilized for simulating the CARS spectra (train and test set) are kept the same as in the original SpecNet model.

2.3 Details of the experimental Raman data

The Raman spectra of all the samples were recorded with a compact portable Raman spectrometer (i-Raman Plus, M/s B&W Tek). It offers a resolution of 4.5 cm-1 at 912 nm and a wide spectral range of ∼200–3200 cm-1 in a single acquisition. A 50 mW laser beam at 785 nm excitation wavelength is utilized to record the spectra. Each spectrum is acquired with an exposure time of 5 s and averaged over three acquisitions. All the parameters are optimized to obtain the best signal-to-noise ratio without burning the sample. All the powder samples are pressed into pellets with a hydraulic press, and spectra are acquired by focusing the laser beam on the pellet surface. Liquid samples are poured into a quartz cuvette before recording their Raman spectra. All the mixtures are prepared in a 50-50 weight-percent ratio. The complete details of the samples (train & test) are given in Table 1 in Supplement 1. The sample set consists of 5 different groups (see column 1 in the table), where each group has a different number of samples. Groups 1-5 correspond to high-energy materials (HEMs)/explosives, amino acids/optical isomers, liquid mixtures, polymers, and pharmaceuticals, respectively. A certain number of samples is chosen for training from each group, as presented in Table 1 in Supplement 1. The data considered for testing is entirely different from the training data so as to evaluate the model performance. For example, in the case of polymers, ABS, HDPE, HIP, LDPE, PC, PET, PP, and Teflon are considered for training, whereas PPCP, PS, and SIHET are taken for testing. Further, the number of spectra acquired for each sample is in the range of 1-100, depending on the sample availability. However, to balance the data, a maximum of five spectra per sample is considered for training the model. All the spectra contain 640 data points, and spectral line intensities are normalized between zero and one. The background was corrected before the normalization by the polynomial-fit approach.

2.4 Details of the experimental CARS data

The CARS experimental details are described elsewhere [31,32]. The optical layout of the multiplex CARS experimental setup can be found in [32]. In brief, a 10 ps laser with a bandwidth of ∼1.5 cm-1 at 710 nm is used as the pump/probe beam. Further, an ∼80 fs laser pulse with a bandwidth of ∼184 cm-1, tunable from ∼750 to 950 nm, is utilized as the Stokes beam. This tunable range corresponds to 750-3500 cm-1 in vibrational frequency. The Stokes (105 mW power) and pump/probe (75 mW power) beams are focused into a tandem cuvette with an achromatic lens of 5 cm focal length. A long-wave pass filter and an interference filter are used on the Stokes and pump/probe beams to block amplified spontaneous emission (ASE) from the lasers. The anti-Stokes signal generated from the sample is filtered and spectrally resolved on a spectrometer with an effective spectral resolution of ∼5 cm-1. All multiplex CARS spectra shown have an acquisition time of 800 ms. The experiment is performed on three different samples. The first is an equimolar mixture of AMP, ADP, and ATP in water at a total concentration of 500 mM. The second is a 75 mM DMPC small unilamellar vesicle (SUV) suspension. The third test sample is yeast: a living budding yeast cell (a zygote of Saccharomyces cerevisiae), measured at the mitochondria of the cell [33].

3. Deep neural network model

Similar to common artificial neural networks (ANNs), the main goal of a DNN is to learn a nonlinear mapping from the input data to the desired model outputs [34]. The learning adjusts the model parameters during the training process. After training, predictions can be made for unknown data samples by utilizing the learned model. The learning is typically implemented in a supervised manner using training data consisting of data samples and the desired output for each sample, together with the backpropagation algorithm, which passes the error signal from the output towards the input and adjusts the model parameters. As a result of the training process, the mapping is represented by the learned weights of the DNN architecture. For complex modeling problems, such architectures may contain a large number of interconnected layers of computational units called neurons and, thus, a huge number of parameters, implying that a large training set is required. However, not all applications require a deep architecture with many layers. Among the different ANN architectures, convolutional neural networks (CNNs) have recently become an efficient solution for various machine learning problems such as image processing [35], time-series classification [36], and object detection [37].

The CNN architecture consists of different types of layers such as convolution, fully connected, pooling, and flattening layers. The convolutional part of a CNN is formed by convolution layers which are responsible for extracting the relevant features from the data and producing new data representations called feature maps. Convolution layers can be considered to implement filter banks in which the parameters are learned, and the level of abstraction of the data representation increases layer by layer in the convolutional part of the architecture. One common application is in image processing, to recognize useful patterns irrespective of their position in the given input. Each neuron in a convolution layer is connected to a limited neighborhood of neurons of the preceding layer, and the weights are shared with the remaining neurons of the same layer. This significantly reduces the number of tunable parameters, which is an added advantage in simplifying the back-propagation approach. Accessing the information in a spatially invariant way is another benefit achieved by weight sharing. It is of particular interest for Raman spectroscopy applications, where the spectral lines/peaks can appear anywhere in the spectrum. In the second part of the architecture, fully connected layers have no limitations concerning the connections from the preceding layer and their respective weights. They are used to generalize the information provided by the convolution layers that produce the feature maps derived from the given input data. The CNN architecture used in this study is that of SpecNet, retrained with synthetic and semi-synthetic data [28]. The code is implemented in Python. It comprises five 1D convolution layers with 128, 64, 16, 16, and 16 filters of kernel size 32, 16, 8, 8, and 8, respectively, followed by three fully connected layers of 32, 16, and 640 neurons.
Rectified Linear Unit (ReLU) is utilized as the activation function and mean squared error (MSE) is used as the loss function. Adam was used for performing the optimization with a batch size of 256 samples. 10-fold cross-validation was used in the training process.
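The layer widths quoted above can be arranged into a Keras sketch as follows. The padding, the linear output activation, and other unstated details are assumptions; the original SpecNet code remains the authoritative reference for the exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(n_points=640):
    """SpecNet-style CNN sketch: layer sizes follow the text, the rest is assumed."""
    model = models.Sequential([
        layers.Input(shape=(n_points, 1)),
        # Five 1D convolution layers: 128/64/16/16/16 filters,
        # kernel sizes 32/16/8/8/8, ReLU activations.
        layers.Conv1D(128, 32, padding="same", activation="relu"),
        layers.Conv1D(64, 16, padding="same", activation="relu"),
        layers.Conv1D(16, 8, padding="same", activation="relu"),
        layers.Conv1D(16, 8, padding="same", activation="relu"),
        layers.Conv1D(16, 8, padding="same", activation="relu"),
        layers.Flatten(),
        # Three fully connected layers: 32, 16, and 640 neurons.
        layers.Dense(32, activation="relu"),
        layers.Dense(16, activation="relu"),
        layers.Dense(n_points),  # predicted Im(chi3), one value per spectral point
    ])
    # MSE loss and Adam optimizer, as stated in the text.
    model.compile(optimizer="adam", loss="mse")
    return model
```

Training would then call `model.fit(x_train, y_train, batch_size=256, ...)` with the CARS spectra as inputs and the true Im(χ3) as targets.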

4. Results and discussion

In the following sections, we demonstrate and analyze the capability of the retrained SpecNet model for direct extraction of the imaginary part Im (χ3) from the input CARS spectrum.

4.1 Retrieval of the imaginary part

The predictive ability of the retrained SpecNet model can be assessed by extracting the Im (χ3) part for various test samples. The analysis has been performed by deploying the trained model on 30 test spectra belonging to 9 different samples; their complete details are given in Table 1 in Supplement 1. The test set is not utilized during the training process. In the first step, all the test CARS spectra are produced from the experimentally recorded Raman spectra by following the same procedure described in the previous section. All the necessary parameters (background correction and simulation) are kept the same as when simulating the CARS spectra for the experimental training set. The imaginary parts of different test samples obtained from the retrained model and the original SpecNet model are visualized in Fig. 3. Here the frequency scale is normalized; hence the Raman shift can be considered a relative Raman shift. Each plot in the figure, for example Fig. 3(a), shows the true and predicted imaginary part of the test CARS spectrum at the top with black and red lines, respectively. Further, their difference/error is estimated, and the square of the error is illustrated in the same plot with the blue line at the bottom. Insets in Fig. 3(c) and 3(e) show close views of the squared errors for better visualization. The squared error (SE) plot can critically assess the quality of a prediction. The SE interpretation becomes prominent when comparing the results to the existing model, and it can be used as a tool for validating the performance of the retrained model.


Fig. 3. Comparison of the results obtained from the retrained and original SpecNet model. (a and b) The imaginary parts predicted by the retrained model and SpecNet model, respectively for the TNT- HMX mixture. (c and d) for the TNT. (e and f) for the PPCP. The Insets in the figure represent the close view of the squared errors.


As seen from Fig. 3(a), the predicted imaginary spectrum resembles the true spectrum for the retrained model, whereas the original SpecNet was not able to predict/detect all the spectral features/peaks, as shown in Fig. 3(b). It is also noticed by qualitative assessment that the original SpecNet performance deteriorated in predicting peaks at either end of the spectra. This holds even when the spectral line intensities are higher, as shown in Fig. 3(f). Further, the predicted peak intensities/heights are found to be in good agreement with the true ones for the retrained model, as shown in Fig. 3(a), 3(c), and 3(e). On the contrary, they deviate for the original SpecNet model, as depicted in Fig. 3(b), 3(d), and 3(f). In order to make a better comparison, one strong characteristic peak/line for each sample is considered for the interpretation. For example, in the case of the first spectrum/sample, the spectral line at 0.7 on the normalized frequency axis (0-1) is selected.

It is noticed that the measured SE for the original SpecNet is 3 times higher at 0.7 compared to the retrained model. In addition, the deviation is found to be higher for some samples. For instance, in the case of the second sample, the measured SE at 0.7 for the original SpecNet is 50 times that of the retrained model, as illustrated in Fig. 3(d), and it is 8 times for the third sample at 0.89, as visualized in Fig. 3(f). Further, the deviation varies strongly across the total spectral range depending on the sample. The retrained model has shown better performance than the original SpecNet in predicting the imaginary part for the majority of the samples. However, it has given partial results for a very few samples. For example, in the case of the fourth and fifth samples, the performance is limited when predicting the peaks at the ends of the spectrum, as shown in Fig. 4(a) and 4(c).


Fig. 4. Comparison of the results obtained from the retrained and original SpecNet model. (a and b) The imaginary parts predicted by the models for the D Leucine, (c and d) for the Ethanol.


The reason can be explained as follows. The original SpecNet model was trained with 640 data points, so, to make the comparison, the same number of data points was used for the retrained model. However, experimental Raman spectra contain more than 640 points over their total spectral range. Hence, we have considered only part of the full spectral range, which for some samples resulted in selecting only the rising/falling part of the spectral lines at the ends, as shown in Fig. 4(a) and 4(c). This limited spectral-range selection challenged the predictive ability of the model; nevertheless, it can be avoided by considering the total spectral range. Even so, the prediction of the retrained model is better than that of SpecNet. The SE obtained from our model for the fourth sample at 0.56 is 36 times lower than that of the original SpecNet, as illustrated in Fig. 4(a), and it is 56 times lower at 0.45 for the fifth sample, as depicted in Fig. 4(c).

Further, the efficiency of the retrained model can be improved by training it with different spectral simulation parameters such as different peak widths, heights, numbers of peaks, and noise levels. Moreover, the model architecture can be modified to detect the spectral lines at the edges.

4.2 Mean square error analysis

The SE plot can illustrate the discrepancies between the true and predicted spectra. Nonetheless, for a quantitative evaluation of the SE across the total spectral range for all the test samples, the mean square error (MSE) is calculated, and the results are presented in Fig. 5. The total spectral range is divided into three parts, (a) [0, 0.1], (b) [0.1, 0.9], and (c) [0.9, 1], for better visualization of the deviation at each point. The dots in the schematic represent the mean, whereas the bars correspond to the standard deviation measured over the entire test set. It is noticed from Fig. 5 that the measured MSE is close to zero for 90% of the data points for the retrained model, though it is slightly higher at either end of the spectra, as shown in Fig. 5(a) and 5(c). Within the three regions, the highest variation (standard deviation) is observed in the first spectral region, 0 to 0.1, as illustrated in Fig. 5. Nevertheless, the mean is close to zero for the retrained model, while it is around 0.1 for the original SpecNet. As aforementioned, this can be attributed to the inefficient selection of the spectral peaks at the ends, as visualized in Fig. 4(c) and 4(d). Similar behavior is noticed in the third spectral region, as shown in Fig. 5(c), which can be ascribed to considering only the rising/falling part of the peak when selecting the 640 data points from the total spectral range, as visualized in Fig. 4(a) and 4(b). Further, the lowest variation is observed in the second spectral region, which corresponds to ∼90% of the total data points. The retrained model has shown solid performance there, with a maximum standard deviation of only ∼0.03, whereas it is 5 times higher for the original SpecNet, i.e., 0.15. It can also be seen that the mean is close to zero for most of the data points for the retrained model, as its prediction is close to the true spectra. However, the mean deviates from zero for the original SpecNet because of the inaccurate estimation of the spectral features in the imaginary spectra.
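The per-point mean and standard deviation plotted in Fig. 5 amount to the following computation, assuming the test set is stored as a `(n_spectra, n_points)` array:

```python
import numpy as np

def pointwise_mse_stats(true_spectra, pred_spectra):
    """Per-point squared-error statistics over a set of test spectra.

    true_spectra, pred_spectra: arrays of shape (n_spectra, n_points).
    Returns the mean and standard deviation of the squared error at each
    of the n_points positions (the dots and bars of Fig. 5)."""
    se = (np.asarray(true_spectra) - np.asarray(pred_spectra)) ** 2
    return se.mean(axis=0), se.std(axis=0)
```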


Fig. 5. Comparison of the MSE obtained from the retrained model (top) and original SpecNet model (bottom). a, b, and c correspond to the following three spectral regions on the normalized x-axis scale: (0, 0.1), (0.1, 0.9), and (0.9, 1), respectively. The dots (in black) and bars (in red) correspond to the mean and standard deviation estimated from all the test spectra.


4.3 Correlation analysis

Analysis based on correlation can serve as a performance metric when evaluating the retrained model. It offers a statistical measure of the strength of the relationship between two variables [38]. In detail, this evaluation criterion numerically quantifies the similarity between the true and predicted imaginary spectra. Three different techniques, one correlation and two distance metrics, are considered in the present analysis: a) the Pearson correlation coefficient (PCC), b) the Euclidean distance (ED), and c) the cosine distance (CD). The PCC generally varies between -1 and 1, where -1 represents a negative linear correlation and 1 a positive linear correlation [39]. For all three measures, 1 corresponds to the best correlation, i.e., the true and predicted spectra are identical, and 0 represents no similarity between them. Initially, the total data (640 points) are given as input for the metric evaluation. The PCC, ED, and CD analyses have been performed for all the test samples, and the results are presented in Fig. 6(a), 6(b), and 6(c), respectively, with a black line connected with dots. It has been found that half of the total test spectra have a correlation coefficient above 0.95, and 22 test spectra exceed 0.9 for all three techniques. Further, only four spectra (the 19th to 22nd test spectra) have caused the metric to fall below 0.8. It has been verified visually that the imaginary spectrum shown in Fig. 4(c) corresponds to one of those four test spectra and that the remaining three correspond to spectra of the same sample.
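A minimal sketch of the three metrics, with each score arranged so that 1 means identical spectra; how the Euclidean distance is mapped onto [0, 1] is not stated in the text, and `1/(1 + d)` is one plausible choice used here as an assumption:

```python
import numpy as np

def similarity_metrics(true, pred):
    """Return (PCC, Euclidean-based score, cosine similarity), each in
    a convention where 1 corresponds to identical spectra."""
    t = np.asarray(true, dtype=float)
    p = np.asarray(pred, dtype=float)
    pcc = np.corrcoef(t, p)[0, 1]                         # Pearson correlation
    ed = 1.0 / (1.0 + np.linalg.norm(t - p))              # assumed [0,1] mapping
    cd = t @ p / (np.linalg.norm(t) * np.linalg.norm(p))  # cosine similarity
    return pcc, ed, cd
```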


Fig. 6. Different correlation metrics obtained for the 30 test spectra a) Pearson correlation coefficient (PCC), b) Euclidean distance (ED), and c) Cosine distance (CD).


Similarly, the correlation strengths of the 15th to 18th spectra are in the range of 0.8-0.92 for all three measures. These lower correlation coefficients arise from considering only the rising/falling half of the spectral lines at either end of the spectra, as shown in Fig. 4(a) and 4(c). In order to overcome this limitation, the metrics have been re-estimated after removing the inconsistent spectral lines at both ends of the spectra. Thus, only 540 data points have been considered as input for evaluating the PCC, ED, & CD, and the results are shown in Fig. 6(a), 6(b), and 6(c), respectively, with a red line connected with triangles. It is evident from Fig. 6(a), 6(b), and 6(c) that the correlation coefficients increase for those four spectra, with a minimum coefficient above 0.9. To summarize, these metrics represent the predictive ability of the retrained model on the test set. In the next sections, individual peak analysis is presented.

4.4 Peak analysis

This section investigates the details of each selected individual spectral line for all the test spectra. The spectral line-shape parameters such as peak center, width (FWHM, full width at half maximum), height, and area/intensity play a prominent role in quantitative CARS measurements. Hence, this thorough interrogation further validates the potential of the retrained model. A peak-finding approach was carried out in MATLAB to select the spectral lines in the true and predicted imaginary spectra. It automatically finds the peaks present in the spectra by considering the spectral width & height as two input parameters. These parameters are tuned such that the maximum number of peaks is identified. Finally, the algorithm detected 108 peaks/spectral lines in both the true and predicted imaginary spectra. Then, each peak has been fitted with a Lorentzian function, and the coefficient of determination (R2) is found to be more than 0.85 for all the peaks. For example, Fig. 7(a) and 7(b) show spectral lines of the true and predicted spectra, respectively.
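The peak analysis here was performed in MATLAB; an equivalent Python sketch using `scipy.signal.find_peaks` and a Lorentzian least-squares fit is shown below. The fit window size and initial guesses are illustrative assumptions.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.optimize import curve_fit

def lorentzian(x, x0, gamma, a, c):
    """Lorentzian line shape: center x0, half-width gamma, amplitude a, offset c."""
    return a * gamma**2 / ((x - x0)**2 + gamma**2) + c

def fit_peaks(x, y, height=0.05, width=2):
    """Find peaks by height/width thresholds and fit each with a Lorentzian.
    Returns a list of (center, FWHM, area) tuples."""
    idx, _ = find_peaks(y, height=height, width=width)
    results = []
    for i in idx:
        lo, hi = max(i - 15, 0), min(i + 15, len(x))  # local fit window (assumed)
        p0 = [x[i], 0.004, y[i], 0.0]                 # initial guesses (assumed)
        popt, _ = curve_fit(lorentzian, x[lo:hi], y[lo:hi], p0=p0, maxfev=5000)
        x0, gamma, a, _ = popt
        # FWHM = 2*gamma; area of a Lorentzian integrates to pi*a*gamma.
        results.append((x0, 2 * gamma, np.pi * a * gamma))
    return results
```

Running this over the true and predicted imaginary spectra gives paired line-shape parameters whose differences correspond to the distributions of Fig. 7(d)-(f).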


Fig. 7. a) and b) represent one of the spectral lines of true and predicted imaginary spectra, respectively. c) Stack plot of the true and estimated peak centers and their differences. A red dotted vertical line is drawn to show the reference data points. d), e) and f) represent the statistical distribution of deviation of the predicted peak center, peak area & width from the true ones respectively.


The dots correspond to the actual data, and a red dotted line represents the Lorentzian fit. The fit provides the peak center, width, and area. These spectral parameters play a vital role in determining analyte concentrations in the CARS technique, particularly at lower concentrations. The stack plot in Fig. 7(c) shows the predicted peak center, the true peak center, and their difference, respectively (top to bottom). It can be clearly seen that the difference between the true and predicted peak centers is close to zero for more than 90% of the peaks, as visualized in Fig. 7(c). However, a negligible deviation is observed for some of the spectral lines. For example, the predicted peak center of the 52nd line (the peak shown in Fig. 7(b)) deviates by ∼0.007 on the normalized scale, which is trivial. A red dotted line is drawn vertically at the 52nd line for reference, as shown in Fig. 7(c). Further, to represent the statistical distribution of the differences, a bar graph is presented in Fig. 7(d). It is noticed that more than 90% of the spectral lines have a deviation of less than 0.0025 in the estimated peak centers. Similar results were obtained for the peak areas and peak widths, as shown in Fig. 7(e) and Fig. 7(f), respectively. The estimated line-shape parameters of the predicted imaginary spectra show a clear correspondence to the true parameters for all the test spectra, which ascertains the predictive power of the retrained model.

4.5 Prediction on experimental CARS spectra

In this section, we evaluated the model performance on real experimental CARS spectra. Figure 8 presents the results obtained for all three test samples. The retrained model's prediction on the CARS spectrum of the AMP, ADP, and ATP mixture is visualized in Fig. 8(a). The first plot in Fig. 8(a) is the input CARS spectrum (top, green). The second plot shows the true (black) and predicted (red) imaginary parts. Here, the true spectrum corresponds to the imaginary part extracted with the maximum entropy method, which is considered a standard approach in the CARS community. The last plot in Fig. 8(a) shows the squared error (bottom, blue), i.e., the square of the difference between the true and predicted imaginary parts.
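The squared-error trace at the bottom of each panel is simply the pointwise square of the difference between the two imaginary parts; the arrays below are placeholders for the MEM-extracted and CNN-predicted spectra.

```python
import numpy as np

# Placeholder spectra: a Gaussian "true" imaginary part and a slightly perturbed prediction
w = np.linspace(0.0, 1.0, 640)
true_im = np.exp(-((w - 0.5) / 0.02) ** 2)
pred_im = true_im + 0.01 * np.sin(40.0 * w)     # small model error

se = (true_im - pred_im) ** 2                   # pointwise squared error (blue trace)
mse = se.mean()
```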


Fig. 8. a) and b) The imaginary parts predicted by the retrained and original SpecNet models, respectively for the ADP/AMP/ATP. (c and d) for the DMPC, (e and f) for the yeast.


The adenine ring vibrations are observed at identical frequencies for ATP/AMP/ADP, i.e., at ∼1360 cm−1, as shown in Fig. 8(a) [40]. The tri-phosphate group of ATP shows a strong resonance at ∼1123 cm−1, while ADP exhibits a broadened resonance at ∼1100 cm−1. It is noticed from Fig. 8(a) and 8(b) that the retrained model performs better than the SpecNet model. It correctly extracted the major Raman lines, and the estimated squared error (SE) is very low, i.e., ∼10−3. The original SpecNet, however, could not predict all the lines, as shown in Fig. 8(b). For example, the spectral lines at 1087 and 1500 cm−1 are very weak in the 1060–1140 and 1440–1520 cm−1 regions, respectively. Further, the predicted Raman line intensities deviate from the actual ones, leading to a high SE that is almost ∼25 times that of the retrained model at ∼1332 cm−1. Similar behavior was noticed for the other samples as well.

The retrained and SpecNet model predictions for the DMPC sample are presented in Fig. 8(c) and 8(d). The spectral assignment of the prominent vibrational frequencies in the fingerprint and CH-stretch regions is well known [41]. The symmetric methylene stretch vibration is the strongest peak, observed at ∼2847 cm−1. Both models extracted the characteristic Raman lines, although the SE of the retrained model is one order of magnitude lower than that of the original SpecNet at ∼2847 cm−1. Further, the predictions on the yeast sample are illustrated in Fig. 8(e) and 8(f). The C–H bend of the aliphatic chain and the amide band are observed at 1440 cm−1 and 1654 cm−1, respectively, for both models. Nevertheless, the SpecNet SE is 35 times that of the retrained model, which demonstrates the potential of our approach. In future work, it would be interesting to include a broader range of experimental data sets in training in addition to simulated spectra. Moreover, the model efficiency can be further improved by simulating different types of NRB. Also, NRB extracted from experimentally recorded CARS spectra can be utilized in training.

5. Conclusions

We presented the extraction of the Raman signal from complex CARS spectra by utilizing a deep learning model, which automatically extracts the information without any user intervention after training. The model was trained with a large set of synthetic and semi-synthetic data. The retrained model showed a clear improvement over the original SpecNet model in terms of efficient extraction of the imaginary part across the full spectral range. The correlation analysis demonstrated the potential of the retrained model, with a correlation coefficient > 0.9 achieved for most of the samples. Further, the individual peak analysis revealed that the proposed model predicted the peak amplitudes, widths, and centers without much deviation from the true ones. Finally, the results on the experimental CARS spectra demonstrated the potential of the approach, with a prediction error 10 times lower than that of the original SpecNet model. These findings can be helpful in real-time microscopy imaging applications for the rapid extraction of the Raman signal from CARS spectra.

Funding

Academy of Finland (FIRI/327734).

Acknowledgments

This work is a part of “Quantitative Chemically-Specific Imaging Infrastructure for Material and Life Sciences (qCSI)” project funded by the Academy of Finland (Grant No. FIRI/327734). Also, we thank Michiel Müller and Hilde Rinia for providing the experimental measurements of the DMPC lipid sample and the AMP/ADP/ATP mixture, as well as, Masanari Okuno and Hideaki Kano for providing the experimental measurements of the yeast sample.

Disclosures

The authors declare no conflicts of interest.

Data availability

The data supporting this study’s findings are available from the corresponding author on request. See Supplement 1 for supporting content.

Supplemental document

See Supplement 1 for supporting content.

References

1. A. Zumbusch, G. R. Holtom, and X. S. Xie, “Three-dimensional vibrational imaging by coherent anti-Stokes Raman scattering,” Phys. Rev. Lett. 82(20), 4142–4145 (1999). [CrossRef]  

2. M. Müller and A. Zumbusch, “Coherent anti-Stokes Raman scattering microscopy,” ChemPhysChem 8(15), 2156–2170 (2007). [CrossRef]  

3. B.-R. Lee, K.-I. Joo, E. S. Choi, J. Jahng, H. Kim, and E. Kim, “Evans blue dye-enhanced imaging of the brain microvessels using spectral focusing coherent anti-Stokes Raman scattering microscopy,” PLoS One 12(10), e0185519 (2017). [CrossRef]  

4. S. Xu, C. H. Camp Jr, and Y. J. Lee, “Coherent anti-Stokes Raman scattering microscopy for polymers,” J. Polym. Sci. 60(7), 1244–1265 (2022). [CrossRef]  

5. W. M. Tolles, J. W. Nibler, J. R. McDonald, and A. B. Harvey, “A review of the theory and application of coherent anti-Stokes Raman spectroscopy (CARS),” Appl. Spectrosc. 31(4), 253–271 (1977). [CrossRef]  

6. J.-X. Cheng, L. D. Book, and X. S. Xie, “Polarization coherent anti-Stokes Raman scattering microscopy,” Opt. Lett. 26(17), 1341–1343 (2001). [CrossRef]  

7. F. Ganikhanov, C. L. Evans, B. G. Saar, and X. S. Xie, “High-sensitivity vibrational imaging with frequency modulation coherent anti-Stokes Raman scattering (FM CARS) microscopy,” Opt. Lett. 31(12), 1872–1874 (2006). [CrossRef]  

8. M. Jurna, J. P. Korterik, C. Otto, J. L. Herek, and H. L. Offerhaus, “Background free CARS imaging by phase sensitive heterodyne CARS,” Opt. Express 16(20), 15863–15869 (2008). [CrossRef]  

9. N. Dudovich, D. Oron, and Y. Silberberg, “Single-pulse coherently controlled nonlinear Raman spectroscopy and microscopy,” Nature 418(6897), 512–514 (2002). [CrossRef]  

10. E. T. Garbacik, J. P. Korterik, C. Otto, S. Mukamel, J. L. Herek, and H. L. Offerhaus, “Background-free nonlinear microspectroscopy with vibrational molecular interferometry,” Phys. Rev. Lett. 107(25), 253902 (2011). [CrossRef]  

11. M. Cui, B. R. Bachler, and J. P. Ogilvie, “Comparing coherent and spontaneous Raman scattering under biological imaging conditions,” Opt. Lett. 34(6), 773–775 (2009). [CrossRef]  

12. E. M. Vartiainen, “Phase retrieval approach for coherent anti-Stokes Raman scattering spectrum analysis,” J. Opt. Soc. Am. B 9(8), 1209–1214 (1992). [CrossRef]  

13. Y. Liu, Y. J. Lee, and M. T. Cicerone, “Broadband CARS spectral phase retrieval using a time-domain Kramers–Kronig transform,” Opt. Lett. 34(9), 1363–1365 (2009). [CrossRef]  

14. A. Karuna, F. Masia, P. Borri, and W. Langbein, “Hyperspectral volumetric coherent anti-Stokes Raman scattering microscopy: quantitative volume determination and NaCl as non-resonant standard,” J. Raman Spectrosc. 47(9), 1167–1173 (2016). [CrossRef]  

15. C. H. Camp Jr, Y. J. Lee, and M. T. Cicerone, “Quantitative, comparable coherent anti-Stokes Raman scattering (CARS) spectroscopy: correcting errors in phase retrieval,” J. Raman Spectrosc. 47(4), 408–415 (2016). [CrossRef]  

16. C. H. Camp Jr, J. S. Bender, and Y. J. Lee, “Real-time and high-throughput Raman signal extraction and processing in CARS hyperspectral imaging,” Opt. Express 28(14), 20422–20437 (2020). [CrossRef]  

17. Y. Kan, L. Lensu, G. Hehl, A. Volkmer, and E. M. Vartiainen, “Wavelet prism decomposition analysis applied to CARS spectroscopy: a tool for accurate and quantitative extraction of resonant vibrational responses,” Opt. Express 24(11), 11905–11916 (2016). [CrossRef]  

18. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521(7553), 436–444 (2015). [CrossRef]  

19. Y. Guo, Y. Liu, A. Oerlemans, S. Lao, S. Wu, and M. S. Lew, “Deep learning for visual understanding: A review,” Neurocomputing 187, 27–48 (2016). [CrossRef]  

20. F. Lussier, V. Thibault, B. Charron, G. Q. Wallace, and J.-F. Masson, “Deep learning and artificial intelligence methods for Raman and surface-enhanced Raman scattering,” TrAC Trends in Analytical Chemistry 124, 115796 (2020). [CrossRef]  

21. A. Ozdemir and K. Polat, “Deep learning applications for hyperspectral imaging: a systematic review,” J. Inst. Electron. Comput. 2(1), 39–56 (2020). [CrossRef]  

22. K. Ghosh, A. Stuke, M. Todorović, P. B. Jørgensen, M. N. Schmidt, A. Vehtari, and P. Rinke, “Deep learning spectroscopy: Neural networks for molecular excitation spectra,” Adv. Sci. 6(9), 1801367 (2019). [CrossRef]  

23. J. Vrábel, E. Képeš, L. Duponchel, V. Motto-Ros, C. Fabre, S. Connemann, F. Schreckenberg, P. Prasse, D. Riebe, and R. Junjuri, “Classification of challenging LIBS soil sample data-EMSLIBS contest,” Spectrochimica Acta Part B: Atomic Spectroscopy 169, 105872 (2020). [CrossRef]  

24. R. Junjuri, A. Prakash Gummadi, and M. Kumar Gundawar, “Single-shot compact spectrometer based standoff LIBS configuration for explosive detection using artificial neural networks,” Optik 204, 163946 (2020). [CrossRef]  

25. R. Junjuri, S. A. Rashkovskiy, and M. K. Gundawar, “Dependence of radiation decay constant of laser produced copper plasma on focal position,” Phys. Plasmas 26(12), 122107 (2019). [CrossRef]  

26. R. Junjuri, C. Zhang, I. Barman, and M. K. Gundawar, “Identification of post-consumer plastics using laser-induced breakdown spectroscopy,” Polym. Test. 76, 101–108 (2019). [CrossRef]  

27. R. Houhou, P. Barman, M. Schmitt, T. Meyer, J. Popp, and T. Bocklitz, “Deep learning as phase retrieval tool for CARS spectra,” Opt. Express 28(14), 21002–21024 (2020). [CrossRef]  

28. C. M. Valensise, A. Giuseppi, F. Vernuccio, A. De la Cadena, G. Cerullo, and D. Polli, “Removing non-resonant background from CARS spectra via deep learning,” APL Photonics 5(6), 061305 (2020). [CrossRef]  

29. S. Yao, L. Zhang, Y. Zhu, J. Wu, Z. Lu, and J. Lu, “Evaluation of heavy metal element detection in municipal solid waste incineration fly ash based on LIBS sensor,” Waste Manag. 102, 492–498 (2020). [CrossRef]  

30. Z.-M. Zhang, S. Chen, and Y.-Z. Liang, “Baseline correction using adaptive iteratively reweighted penalized least squares,” Analyst 135(5), 1138–1146 (2010). [CrossRef]  

31. E. M. Vartiainen, H. A. Rinia, M. Müller, and M. Bonn, “Direct extraction of Raman line-shapes from congested CARS spectra,” Opt. Express 14(8), 3622–3630 (2006). [CrossRef]  

32. M. Müller and J. M. Schins, “Imaging the thermodynamic state of lipid membranes with multiplex CARS microscopy,” J. Phys. Chem. B 106(14), 3715–3723 (2002). [CrossRef]  

33. M. Okuno, H. Kano, P. Leproux, V. Couderc, J. P. R. Day, M. Bonn, and H. Hamaguchi, “Quantitative CARS molecular fingerprinting of single living cells with the use of the maximum entropy method,” Angew. Chem. Int. Ed. Engl. 122(38), 6925–6929 (2010).

34. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” Adv. Neural Inf. Process. Syst. 60(6), 84–90 (2017). [CrossRef]  

35. S. Hijazi, R. Kumar, and C. Rowen, “Using convolutional neural networks for image recognition,” Cadence Des. Syst. Inc. San Jose, CA, USA1–12 (2015).

36. Y. Zheng, Q. Liu, E. Chen, Y. Ge, and J. L. Zhao, “Time series classification using multi-channels deep convolutional neural networks,” in International Conference on Web-Age Information Management (Springer, 2014), pp. 298–310.

37. K. Kang, W. Ouyang, H. Li, and X. Wang, “Object detection from video tubelets with convolutional neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 817–825.

38. X. Tan, X. Chen, and S. Song, “A computational study of spectral matching algorithms for identifying Raman spectra of polycyclic aromatic hydrocarbons,” J. Raman Spectrosc. 48(1), 113–118 (2017). [CrossRef]  

39. P. Schober, C. Boer, and L. A. Schwarte, “Correlation coefficients: appropriate use and interpretation,” Anesth. Analg. 126(5), 1763–1768 (2018). [CrossRef]  

40. K. T. Yue, C. L. Martin, D. Chen, P. Nelson, D. L. Sloan, and R. Callender, “Raman spectroscopy of oxidized and reduced nicotinamide adenine dinucleotides,” Biochemistry 25(17), 4941–4947 (1986). [CrossRef]  

41. R. Mendelsohn and D. J. Moore, “Vibrational spectroscopic studies of lipid domains in biomembranes and model systems,” Chem. Phys. Lipids 96(1-2), 141–157 (1998). [CrossRef]  

Supplementary Material (1)

Supplement 1: Supplemental Table




Figures (8)

Fig. 1. a) Flow chart for generating the Chi3 spectral data. b) Flow chart for producing the retrained SpecNet model.

Fig. 2. The CARS spectrum after conversion from the experimentally recorded Raman spectrum of TNT.

Fig. 3. Comparison of the results obtained from the retrained and original SpecNet model. (a and b) The imaginary parts predicted by the retrained model and SpecNet model, respectively, for the TNT–HMX mixture; (c and d) for the TNT; (e and f) for the PPCP. The insets show a close view of the squared errors.

Fig. 4. Comparison of the results obtained from the retrained and original SpecNet model. (a and b) The imaginary parts predicted by the models for the D-leucine; (c and d) for the ethanol.

Fig. 5. Comparison of the MSE obtained from the retrained model (top) and original SpecNet model (bottom). a, b, and c correspond to the spectral regions (0, 0.1), (0.1, 0.9), and (0.9, 1) on the normalized x-axis, respectively. The black dots and red bars correspond to the mean and standard deviation estimated from all the test spectra.

Fig. 6. Different correlation metrics obtained for the 30 test spectra: a) Pearson correlation coefficient (PCC), b) Euclidean distance (ED), and c) cosine distance (CD).

Fig. 7. a) and b) represent one of the spectral lines of true and predicted imaginary spectra, respectively. c) Stack plot of the true and estimated peak centers and their differences. A red dotted vertical line is drawn to show the reference data points. d), e) and f) represent the statistical distribution of deviation of the predicted peak center, peak area, and width from the true ones, respectively.

Fig. 8. a) and b) The imaginary parts predicted by the retrained and original SpecNet models, respectively, for the ADP/AMP/ATP; (c and d) for the DMPC; (e and f) for the yeast.

Equations (3)

$$S(\omega) = \varepsilon(\omega)\left|\chi^{(3)}_{NR} + \chi^{(3)}_{R}(\omega)\right|^{2} + \eta(\omega)$$

$$\chi^{(3)}_{R} = \sum_{k}\frac{A_{k}}{\Omega_{k} - (\omega_{p} - \omega_{s}) - i\Gamma_{k}}$$

$$\mathrm{NRB} = \sigma_{1}\sigma_{2};\qquad \sigma_{i} = \frac{1}{1 + e^{-(\omega - c_{i})/s_{i}}}$$
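The three equations listed above (the measured CARS intensity, the resonant susceptibility as a sum of complex Lorentzians, and the sigmoid-product NRB) can be sketched numerically as follows. All line positions, amplitudes, and sigmoid parameters are illustrative, and the modulation ε(ω) and noise η(ω) are omitted.

```python
import numpy as np

w = np.linspace(0.0, 1.0, 640)      # normalized Raman-shift axis (omega_p - omega_s)

# Resonant susceptibility: sum of complex Lorentzians A_k / (Omega_k - w - i*Gamma_k)
def chi_r(w, amps, centers, widths):
    return sum(a / (c - w - 1j * g) for a, c, g in zip(amps, centers, widths))

# NRB modeled as a product of two sigmoids, sigma_i = 1 / (1 + exp(-(w - c_i)/s_i))
def nrb(w, c1, s1, c2, s2):
    sigma = lambda c, s: 1.0 / (1.0 + np.exp(-(w - c) / s))
    return sigma(c1, s1) * sigma(c2, s2)

chi = chi_r(w, amps=[0.5, 0.3], centers=[0.3, 0.7], widths=[0.008, 0.012])
background = nrb(w, 0.2, 0.3, 0.9, 0.4)

# Measured CARS intensity: |chi_NR + chi_R|^2 (epsilon and eta terms omitted)
cars = np.abs(background + chi) ** 2

# The Raman-like signal the CNN is trained to recover is Im{chi_R}
raman = np.imag(chi)
```

Pairs of such `cars` inputs and `raman` targets are the kind of synthetic training data described in the paper.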