Contactless SpO₂ with an RGB camera: experimental proof of calibrated SpO₂

Mark van Gastel; Wim Verkruysse

doi:10.1364/BOE.471332

1. Introduction

Pulse-oximeters are routinely used to monitor the health of people with conditions that affect blood oxygen levels, especially while they are in the hospital. Outside the hospital, monitoring of peripheral oxygen saturation (SpO₂) has also become common, with the technology being integrated into (smart)watches and with the availability of low-cost pulse-oximeters. Applications range from monitoring of the critically ill, e.g., in the operating room, to monitoring during patient transport, respiratory monitoring during narcotic administration, detection of apneas, and evaluation of home-oxygen therapy. Despite the reduced threshold to monitor SpO₂ routinely, it still requires a dedicated device which has to be attached to an extremity. This can negatively impact the clinical workflow and cause discomfort, especially during continuous monitoring. There is therefore a need to monitor SpO₂ without touching the skin. Remote photoplethysmography (rPPG) is an emerging technology that enables contactless monitoring of blood volume variations with a camera. The monitoring capabilities of rPPG include the common parameters pulse rate [1,2], respiration [3,4], blood pressure [5] and SpO₂ [6], but also less commonly monitored but clinically relevant parameters such as jugular venous pulse (JVP) [7]. Applications range from monitoring of infants in a neonatal intensive care unit (NICU) [8] to triaging at the emergency department (ED) [9]. Furthermore, the two-dimensional data acquisition provides maps that allow the analysis of spatial perfusion properties [10–12].

The feasibility of contactless SpO₂ using cameras has been demonstrated via studies under both normoxic and hypoxic conditions, and for different room temperatures to simulate centralization [13]. Although the principle is similar to conventional contact probes, i.e. multi-wavelength PPG, a camera captures light that has predominantly travelled through the superficial layers of the skin over a relatively short distance. This leads to a reduced modulation index, i.e. PPG signal strength, by a factor of 10 to 100 [13] compared to a typical contact probe in transmission mode. Furthermore, since most of the pulsatile arterioles reside in the deeper layers of the skin, the different source-detector geometry of the camera poses a risk to the calibratability of the measurement. It was however found that the accuracy of the camera measurement is compatible with the ISO standard for pulse-oximeters [14], demonstrating its feasibility. The vulnerability of the SpO₂ measurement to motion and noise artifacts because of the small amplitudes of the camera-based PPG signals could be mitigated by an indirect method where instead of extracting amplitude features from the individual PPG waveforms, the PPG waveforms are combined into a pulse signal where the blood oxygenation level is estimated based on the quality of that signal [6].

The aforementioned promising results were obtained with rather bulky setups consisting of multiple cameras and illumination with a large form factor. Multiple monochrome cameras with optical bandpass filters were used to capture the optimal wavelengths for SpO₂ because of the unavailability of affordable multi-spectral cameras with the required sensor quality for PPG. To reduce the setup’s form factor previous work explored the use of ambient light in the visible range with multiple cameras [15], and temporally modulated illumination such that multiple wavelengths can be measured with a single monochrome camera [16,17]. Both approaches however have disadvantages such as a dependency on (unpredictable) ambient light conditions [15], and the requirement of custom illumination synchronized with the camera acquisition [16,17].

Due to its convenience and the feasibility of wide use when integrated into smartphones, tablets, and other consumer devices, researchers have begun to estimate SpO₂ levels using an RGB camera [18–26]. Guazzi et al. [18] used a ratio of blue and green to detect changes in SpO₂ values with an RGB camera, without claiming an absolute measure of oxygen saturation. Instead of using the information from the camera color channels directly to extract the PPG signals and estimate SpO₂, Kim et al. [19] used both chrominance components after converting from the RGB to the YCgCr color space. A ratio of both components was used to find the calibration model via linear regression. Wei et al. [20] used a webcam with an inherent low signal-to-noise ratio (SNR). To enable a reliable measurement on the noisy data, blind source separation was used with the PPG signals from the three color channels as input.

Despite interesting results, many studies where RGB cameras have been used for SpO₂ estimation suffer from shortcomings, which makes it difficult to assess the value of these methods in clinical practice. First, the datasets used for training and evaluation are often limited. The number of subjects involved is very small, with little variation in factors such as skin type and age. Second, for calibration and to demonstrate sensitivity to changes in SpO₂, breath-hold events are often used to induce a dip. Since it is difficult for the subjects to remain still during these events, breath-holds are often accompanied by head movements. As a consequence, it is difficult to assess whether dips in SpO₂ values are indeed caused by changes in relative PPG amplitudes, or by motion artifacts during a breath-hold, which are known to cause dips in SpO₂ as well due to equal noise in all channels. Another limitation of using breath-holds is the difficulty of correcting for the physiological and processing delay. It is therefore important to perform calibration based on long-term segments (> 1 minute of low SpO₂) rather than temporary dips in SpO₂, and to robustly verify that the observed changes in SpO₂ are indeed associated with changes in PPG amplitudes. Third, often only trending of SpO₂ is demonstrated rather than absolute values for SpO₂, i.e. values obtained with a calibration that holds for a subject population rather than per individual. While trending could still have relevance for event detection, e.g., for apnea, it cannot be used as an alternative to contact probes. As a result, it remains uncertain whether RGB cameras may be deployed to perform contactless SpO₂ in a calibrated manner, and if so, under which conditions.

In this study we propose a setup consisting of commercial off-the-shelf (COTS) elements added to a regular RGB camera for contactless SpO₂ in a calibrated manner. We used video data with SpO₂ values in the wide range $70\%-100\%$ to determine the calibration model, and then applied this model to a dataset in which realistic scenarios for spot-check measurements in a clinical setting have been simulated.

2. Materials and methods

2.1 RGB concept

Pulse-oximetry is based on two key assumptions. First, the probing volumes of the different wavelengths are identical and homogeneous. Second, only arterial blood is pulsatile at the frequency of the cardiac cycle. For the wavelengths that are typically used in pulse-oximeters, in the “optical window” between 600 and 950 nm, the first assumption largely holds because of similar scattering. In visible light there is a much larger wavelength dependency for both scattering and absorption, leading to distinct skin penetration depths for the wavelengths within this range. This jeopardizes the calibratability of the SpO₂ measurement because factors such as skin geometry and blood volume are now part of the equation. To investigate to what extent the measurement is affected, studies have been executed where a comparison has been made between a combination of red and green, and the traditional combination of red and near-infrared wavelengths [27]. Here it was found that the visible wavelengths had an error of $3.0\%$ compared to $1.7\%$ for the conventional wavelengths. The errors for red and green increased for lower temperatures or when measured at a slightly different skin site (cheek instead of forehead), as a result of probing the vasculature at different depths. It has to be noted that for this study a setup consisting of multiple monochrome cameras with bandpass filters has been used in combination with incandescent light. When using an RGB camera additional challenges arise, which are summarized in Table 1. These challenges add either a bias or noise to the measurement. Whereas noise is undesirable but can be mitigated by using longer processing windows and/or filtering techniques, a bias is much harder to detect and correct for. In the next paragraphs we will describe how the challenges that pose the largest risk to the accuracy of the measurement, challenges I-III, can be mitigated with the addition of low-cost components. The other challenges have a rather small impact on the accuracy and have no easy solution with low-cost components (IV), or are inherent to the use of RGB cameras (V and VI).

Table 1. A list of the main challenges associated with measuring SpO₂ with an RGB camera.

2.2 Experimental setup

The experimental setup used for both studies consisted of COTS components only, and is visualized in Fig. 1. We used an RGB camera (UI-3200SE-C-HQ, IDS Imaging GmbH) with a triple-band bandpass filter (FF01-457/530/628, Semrock), and a 50 mm C-mount lens (Computar). As illumination we used an RGB floodlight (FUTT07, MiLight) with a remote control to change the color and brightness. The spectra and sensitivities are visualized in Fig. 2, where for this study we used color setting 71, which corresponds to a combination of light from the red and green LEDs. As mitigation of challenges I and III, the triple-band bandpass filter reduces the effect of ambient light on the blood oxygenation measurement and was selected to match the emission spectra of the LEDs. Polarizers were attached to the camera and illumination to enable cross-polarized illumination/detection to reduce specular reflection (challenge II). Video data was captured with a spatial resolution of $3328 \times 2464$ pixels, at a sampling frequency of 15 Hz, and was stored in an uncompressed binary format.

Fig. 1. The setup used for the creation of Dataset I. The subjects are seated with reference probes attached to the finger and ear, an oxygen mask covering their nose and mouth, and with the camera and illumination placed at a distance of approximately 80 cm from the face.

Fig. 2. The camera channel sensitivities (solid RGB), emission spectra of the LED illumination (dashed RGB), transmission spectrum of the triple-band bandpass filter (black), and PPG amplitude spectrum for different SpO₂ values.

2.3 Processing for SpO₂ extraction

The processing pipeline for the SpO₂ extraction is visualized in Fig. 3. We will now describe the steps in more detail.

Fig. 3. The processing pipeline for contactless SpO₂ estimation with an RGB camera. The selected region-of-interest (ROI) is tracked to compensate for head movements during acquisition. The ROI is downsampled and for each sub-region (skin patch) the ‘ratio-of-ratios’ (RoR), i.e. the ratio of the relative PPG amplitudes of the two wavelengths, is calculated. The final RoR for a specific time instance is calculated by weighting the skin patches by their corresponding values of the quality metric.

Tracking and downsampling. First, the region-of-interest (ROI) consisting of the forehead was manually selected in the first video frame. This ROI was tracked with the Minimum Output Sum of Squared Error (MOSSE) tracker [28] and downsampled by a factor of 20 with a box-shaped kernel. The intensities of the pixels within the downsampled ROI are concatenated over time for the three color channels to generate PPG traces.
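The downsampling step can be illustrated with a minimal NumPy sketch. It assumes the tracked ROI is already available as a cropped array per frame (the MOSSE tracker itself is not reproduced here), and the function names `box_downsample` and `ppg_traces` are hypothetical. Discarding edge pixels that do not fill a complete block is an assumption, not something stated in the text.

```python
import numpy as np

def box_downsample(frame: np.ndarray, factor: int = 20) -> np.ndarray:
    """Average non-overlapping factor x factor blocks of an (H, W, 3) ROI crop."""
    h, w, c = frame.shape
    h_out, w_out = h // factor, w // factor
    # Assumption: rows/columns that do not fill a complete block are dropped.
    cropped = frame[: h_out * factor, : w_out * factor].astype(np.float64)
    blocks = cropped.reshape(h_out, factor, w_out, factor, c)
    return blocks.mean(axis=(1, 3))

def ppg_traces(roi_frames, factor: int = 20) -> np.ndarray:
    """Concatenate the downsampled ROI over time: shape (T, patches, 3)."""
    patches = [box_downsample(f, factor).reshape(-1, 3) for f in roi_frames]
    return np.stack(patches, axis=0)

# Synthetic frames standing in for tracked forehead crops.
rng = np.random.default_rng(0)
frames = [rng.integers(0, 256, size=(200, 400, 3)) for _ in range(5)]
traces = ppg_traces(frames)
print(traces.shape)  # (5, 200, 3): 5 frames, 10 x 20 skin patches, 3 channels
```

Each column of `traces[:, s, :]` is then the raw color trace of one skin patch.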

PPG pre-processing and RoR estimation. We divided the raw PPG signal of each wavelength $i$ and skin patch $s$, $C_{i,s}$, by its quasi-DC signal, obtained by low-pass filtering (LPF), and bandpass filtered (BPF) the result to obtain the DC-normalized PPG waveforms:

$$\tilde{C}_{i,s}=\mathbf{BPF}\Bigg(\frac{C_{i,s}}{\mathbf{LPF}(C_{i,s})}\Bigg), \tag{1}$$

where $\tilde {C}_{i,s}$ denotes the DC-normalized signal for wavelength $i$ and skin patch $s$. $\mathbf {LPF}(\cdot )$ is a first-order Butterworth filter with a cut-off frequency of 0.7 Hz, and the $\mathbf {BPF}(\cdot )$ is a fourth-order zero-phase Butterworth filter with a passband in the range $0.7-4$ Hz, corresponding to $42-240$ beats per minute (bpm), the typical range of pulse rates for healthy adults. The DC-normalized color signals $\mathbf {\tilde {C}_{s}}$ are the input for the APBV (Adaptive Pulsatile Blood Vector) method [6], which can be mathematically summarized as:

$$[O_{s},Q_{s}] = \mathop{\textrm{arg max}}\limits_{SpO_{2}\in \mathbf{SpO_{2}}}Q\bigg(\underbrace{\overbrace{k\vec{P_{bv}}(SpO_{2})[\mathbf{\tilde{C}_{s}}\mathbf{\tilde{C}^{T}_{s}}]^{{-}1}}^{\vec{W}_{PBV}}\mathbf{\tilde{C}_{s}}}_{P_{s}}\bigg), \tag{2}$$

where $P_{s}$ is the pulse signal, scalar $k$ is chosen such that $\vec {W}_{PBV}$ has unit length, and $O_{s}$ and $Q_{s}$ are the estimated SpO₂ value and the corresponding quality index for skin patch $s$, respectively. The calculation of the weights for extraction of the pulse signal, $\vec {W}_{PBV}$, is formulated as a least squares problem using pulse signatures $\vec {P_{bv}}$ as representation for the different SpO₂ values. For this study we used the red and blue color channels of the RGB camera, where we set the value for the red channel within the signature to one and sampled the values for the blue channel between 1 and 5 with a resolution of 0.01. The reason why we used the blue instead of the green color channel to capture the light from the green LED is the much lower sensitivity of the blue channel to the red LED, as can be observed in Fig. 2. This results in a better SpO₂ contrast, i.e. a larger variation of the ‘ratio-of-ratios’ (RoR) as a function of SpO₂ (Eq. (3)).

$$SpO_{2}=C_{1}+C_{2}\cdot\overbrace{\Bigg(\frac{(\frac{AC}{DC})_{\lambda_{1}}}{(\frac{AC}{DC})_{\lambda_{2}}} \Bigg)}^{RoR} \mathop\Leftrightarrow\limits^{{\textrm{Direct vs Indirect}}} \vec{P_{bv}}(SpO_{2})=\begin{bmatrix} (\frac{AC}{DC})_{\lambda_{1}} \\ (\frac{AC}{DC})_{\lambda_{2}} \end{bmatrix}(SpO_{2}). \tag{3}$$
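The pre-processing of Eq. (1), which produces the DC-normalized traces that enter Eqs. (2) and (3), might be sketched with SciPy as follows. Several details are assumptions: the low-pass filter is also run forward-backward (zero phase), "fourth-order" is read as the order passed to the Butterworth design, and the function name `dc_normalize` is hypothetical.

```python
import numpy as np
from scipy.signal import butter, filtfilt, sosfiltfilt

FS = 15.0  # camera sampling frequency in Hz (Sec. 2.2)

def dc_normalize(raw: np.ndarray, fs: float = FS) -> np.ndarray:
    """Eq. (1): divide by the quasi-DC level, then band-pass to 0.7-4 Hz.

    raw: (T,) or (T, n) array of positive pixel intensities, time on axis 0.
    """
    # First-order Butterworth low-pass at 0.7 Hz gives the quasi-DC signal.
    # Assumption: applied forward-backward so that it adds no phase lag.
    b, a = butter(1, 0.7, btype="lowpass", fs=fs)
    quasi_dc = filtfilt(b, a, raw, axis=0)
    # Zero-phase Butterworth band-pass, 0.7-4 Hz (42-240 bpm).
    # Assumption: design order N=4; the text does not say how order is counted.
    sos = butter(4, [0.7, 4.0], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, raw / quasi_dc, axis=0)

# Synthetic traces: a 72 bpm pulse with relative amplitude 1% (red) and 3% (blue).
t = np.arange(900) / FS
pulse = np.sin(2 * np.pi * 1.2 * t)
red = 100.0 * (1 + 0.01 * pulse)
blue = 50.0 * (1 + 0.03 * pulse)
norm = dc_normalize(np.column_stack([red, blue]))

core = norm[150:-150]  # ignore 10 s at each end to avoid filter edge effects
ratio = core[:, 1].std() / core[:, 0].std()
print(round(float(ratio), 2))  # close to 3 despite the different DC levels
```

Because both channels pass through the same filters, the ratio of their normalized amplitudes is preserved even though the quasi-DC estimate still contains part of the pulsatile component.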

The SpO₂ value corresponds to the signature vector that yields the best pulse signal quality (Eq. (2)). To assess the pulse quality, we used the skewness of the pulse spectrum. The rationale behind the skewness metric is that the spectrum of a clean pulse is highly peaked (i.e. a high skewness), whereas a noisy signal has a clearly lower skewness. The quality metric, $Q$, can be described as:

$$Q = \frac{\frac{1}{N}\sum\limits_{f=1}^{N} \big(H_{P}(f)-\bar{H_{P}}\big)^{3}}{\Bigg(\sqrt{\frac{1}{N}\sum\limits_{f=1}^{N} (H_{P}(f)-\bar{H_{P}})^{2}}\Bigg)^{3}}, \tag{4}$$

where $H_{P}$ denotes the frequency spectrum of pulse signal $P$ (Eq. (2)) and $\bar {H_{P}}$ is the average of all spectral components of $H_{P}$. Once we have calculated the SpO₂ signature for each skin patch, the final estimate is obtained by quadratic weighting of each signature by its corresponding value for the quality metric.
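As a hedged illustration of Eqs. (2) and (4), the sketch below searches a grid of blue-channel signature values for one skin patch and fuses several patches. It is not the authors' implementation: the function names are hypothetical, the magnitude of the one-sided FFT is assumed for $H_{P}$, "quadratic weighting" is read as weights $Q^{2}$, and the synthetic data contain only white noise.

```python
import numpy as np

def spectral_skewness(p: np.ndarray) -> float:
    """Eq. (4): skewness of the magnitude spectrum of pulse signal p."""
    h = np.abs(np.fft.rfft(p))  # assumption: one-sided magnitude spectrum
    d = h - h.mean()
    return float(np.mean(d**3) / np.sqrt(np.mean(d**2)) ** 3)

def apbv_patch(c_norm: np.ndarray, blue_grid: np.ndarray):
    """Eq. (2) for one skin patch.

    c_norm: (2, T) DC-normalized [red, blue] traces.
    Returns the blue signature value with the best quality, and that quality.
    """
    cov_inv = np.linalg.inv(c_norm @ c_norm.T)
    best_value, best_q = float(blue_grid[0]), -np.inf
    for value in blue_grid:
        signature = np.array([1.0, value])  # red entry fixed to one
        w = signature @ cov_inv
        w /= np.linalg.norm(w)              # scalar k: unit-length weights
        q = spectral_skewness(w @ c_norm)   # quality of pulse signal P_s
        if q > best_q:
            best_value, best_q = float(value), q
    return best_value, best_q

def fuse_patches(values, qualities) -> float:
    """Quality-weighted average over patches (assumption: weights are Q**2)."""
    w = np.asarray(qualities, dtype=float) ** 2
    return float(np.sum(w * np.asarray(values, dtype=float)) / np.sum(w))

# Synthetic patches: 20 s at 15 Hz, true blue/red amplitude ratio of 3.
rng = np.random.default_rng(1)
t = np.arange(300) / 15.0
pulse = np.sin(2 * np.pi * 1.2 * t)
grid = np.arange(1.0, 5.0 + 1e-9, 0.01)
results = []
for _ in range(4):
    clean = np.vstack([0.01 * pulse, 0.03 * pulse])
    results.append(apbv_patch(clean + 0.002 * rng.standard_normal((2, 300)), grid))
values, qualities = zip(*results)
fused = fuse_patches(values, qualities)
print(round(fused, 2))  # expected to land near the true ratio of 3
```

The selected signature value plays the role of the RoR in Eq. (3); mapping it to an SpO₂ value requires the calibration coefficients $C_{1}$ and $C_{2}$.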

2.4 Datasets

We created two datasets with different purposes for this study. The research followed the principles of the Declaration of Helsinki, and informed consent was obtained from all subjects prior to the start of the protocol.

2.4.1 Dataset I - hypoxia lab

In order to determine the calibration coefficients for the proposed setup, a study has been executed at the hypoxia lab of UCSF, San Francisco, CA. Twenty adult subjects were enrolled, where the data from one subject had to be excluded from the analysis because the reference data was absent. Of the remaining 19 subjects, 12 were female, the skin type distribution was [0, 5, 4, 6, 2, 2] for type I-VI on Fitzpatrick’s scale, age $29.6\pm 7.54$ years, weight $67.1\pm 11.5$ kg and height $167.2\pm 9.69$ cm. All subjects were in good general health with no evidence of any medical problems and had no facial make-up and/or tattoos. As visualized in Fig. 1, subjects were seated, had two reference oximeters attached to their finger (Nellcor) and ear (Masimo), and an oxygen mask covered their nose and mouth. Saturation levels involved one period with air breathing and then periods at one of six levels with reduced oxygen, e.g. 94%, 90%, 85%, 80%, 75% and 70% saturation. Each level of saturation was held for 30-60 seconds. The operator then changed the inspired oxygen concentration to attain the next desired steady-state level of hypoxia. A run consists of several stable steady-state hypoxia levels and is terminated by a breath of 100% O₂ followed by room air. All subjects performed two runs of 10-15 minutes each, leading to a dataset consisting of video data with a total duration of 402 minutes. An overview of the distribution of SpO₂ and pulse rate values is visualized in Fig. 4. The participants were asked to try to remain stationary while the video was captured.

Fig. 4. Histograms of the SpO₂ (left) and pulse rate (right) values present in the Dataset I, which is used to determine the calibration coefficients for the setup.

2.4.2 Dataset II - different scenarios

Whereas the purpose of Dataset I is to determine the calibration model, Dataset II is used to assess the accuracy of the measurement for realistic spot check scenarios and settings. Ten adult subjects were enrolled (two female) for the study, where similar in- and exclusion criteria have been used as for Dataset I. The protocol we used to create this dataset is visualized in Fig. 5. To study the impact of motion on the measurement accuracy, participants were asked to take different positions and to breathe with different air volumes. Furthermore, decreasing respiratory rate and increasing tidal volume has been shown to improve ventilation efficiency via alveolar recruitment and distension, thus reducing alveolar dead space [29]. Although the effect is small in a healthy population, there is an expected difference in oxygen saturation between ‘shallow’ and ‘deep’ breathing, which is also stable compared to short-term breath-hold events. The positions consisted of seated with and without head support (pillow), and standing, where we adjusted the height-adjustable desk on which the camera and illumination were mounted. To study the impact of ambient light, we opened the blinds to enable daylight to enter the lab for one scenario, and for the other scenario we turned on the fluorescent lamps on the ceiling. All subjects executed the different scenarios of the protocol twice leading to a dataset consisting of video data with a total duration of 240 minutes. As reference, we used three finger-oximeters (Philips, Masimo, Nellcor) connected to the index, middle and ring finger, respectively, and one nasal alar sensor (Philips).

Fig. 5. Protocol used to create Dataset II, with various realistic scenarios for spot check measurements in a clinical setting.

2.5 Evaluation metrics

To evaluate the performance of the camera-based SpO₂ estimates we computed the error (bias) and absolute error metrics, which are calculated as:

$$\begin{aligned} \text{Error} &= \frac{\sum\limits_{i=1}^{L} \bigg(SpO_{2}^{Cam}(i)-SpO_{2}^{Ref}(i)\bigg)}{L} \\ \text{Absolute Error} &= \frac{\sum\limits_{i=1}^{L} \bigg|SpO_{2}^{Cam}(i)-SpO_{2}^{Ref}(i)\bigg|}{L}, \end{aligned}$$

where $SpO_{2}^{Ref}$ is the median of all contact-probes to improve the reliability of the reference. The reference data was synchronized with the camera data based on the extracted pulse rates (Dataset I, with hospital equipment), or based on timestamps when the clocks of the acquisition systems were synchronized or the offset could be determined (Dataset II, with lab equipment). Additionally, to quantify the effect of motion in Dataset II, we calculated the pixel displacement of the ROI for each scenario as the standard deviation of the radius variation within a processing window.
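The two error metrics and the median reference can be written out in a few lines. This is a minimal sketch with hypothetical function names, and the numbers below are made up purely for illustration; they are not measurements from either dataset.

```python
import numpy as np

def reference_spo2(probe_readings: np.ndarray) -> np.ndarray:
    """Median across contact probes; rows are time instances, columns are probes."""
    return np.median(probe_readings, axis=1)

def error_metrics(cam, ref):
    """Return (error/bias, absolute error), both in percentage points."""
    diff = np.asarray(cam, dtype=float) - np.asarray(ref, dtype=float)
    return float(diff.mean()), float(np.abs(diff).mean())

# Illustrative values only: four probes, three time instances.
probes = np.array([[97, 98, 96, 97],
                   [96, 97, 97, 95],
                   [98, 98, 97, 99]], dtype=float)
ref = reference_spo2(probes)      # medians: 97.0, 96.5, 98.0
cam = np.array([95.0, 97.5, 96.0])
bias, abs_err = error_metrics(cam, ref)
print(bias, round(abs_err, 2))    # -1.0 1.67
```

A negative bias here corresponds to the camera underestimating SpO₂ relative to the contact probes.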

2.6 Processing for determining calibration model

To compute the calibration coefficients $C_{1}$ and $C_{2}$ (Eq. (3)) we used robust regression using iteratively re-weighted least squares [30]. For each subject in Dataset I we estimated the RoR and calculated the corresponding SNR values for time windows of 20 s and with a step size of 1 s, as described in Sec. 2.3. We then applied a 10th-order temporal median filter to each data point and its corresponding SpO₂ reference value to smooth the signal. The data points from the two desaturation runs of each subject were combined before performing the robust regression. After obtaining the calibration coefficients for each subject, the final values for $C_{1}$ and $C_{2}$ were calculated by quadratic weighting of the coefficients by the median SNR values.
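A possible shape of this calibration step is sketched below in NumPy. The weight function is an assumption: Huber weights with a MAD-based scale are used here, whereas the text only states that iteratively re-weighted least squares was applied. The median filtering is omitted, the function names are hypothetical, "quadratic weighting" is read as weights equal to the squared median SNR, and the synthetic coefficients (50 and 12) are illustrative only.

```python
import numpy as np

def irls_line(x, y, n_iter: int = 50, tuning: float = 1.345) -> np.ndarray:
    """Fit y = c1 + c2 * x with Huber-weighted IRLS; returns [c1, c2]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    w = np.ones_like(y)
    coef = np.zeros(2)
    for _ in range(n_iter):
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        resid = y - design @ coef
        # Robust scale from the median absolute deviation of the residuals.
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        u = np.abs(resid) / (tuning * max(scale, 1e-12))
        # Huber weights: 1 inside the threshold, 1/u outside.
        w = np.where(u <= 1.0, 1.0, 1.0 / np.maximum(u, 1e-12))
    return coef

def pool_coefficients(per_subject_coefs, median_snrs) -> np.ndarray:
    """Combine per-subject [c1, c2] pairs with weights SNR**2 (assumption)."""
    w = np.asarray(median_snrs, dtype=float) ** 2
    coefs = np.asarray(per_subject_coefs, dtype=float)
    return (w[:, None] * coefs).sum(axis=0) / w.sum()

# Synthetic subject: linear relation plus noise and 10% motion-like outliers.
rng = np.random.default_rng(2)
ror = rng.uniform(1.5, 4.0, 200)
spo2 = 50.0 + 12.0 * ror + rng.normal(0.0, 0.5, 200)
spo2[:20] -= 15.0
c1, c2 = irls_line(ror, spo2)
print(round(c1, 1), round(c2, 1))

pooled = pool_coefficients([[50.0, 12.0], [54.0, 10.0]], [3.0, 1.0])
print(pooled)  # weights 9 and 1 give [50.4, 11.8]
```

Down-weighting large residuals keeps short motion-induced dips from dominating the fit, which is the reason a robust regression is preferable to ordinary least squares here.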

3. Results and discussion

With the procedure described in Section 2.6, the calibration coefficients $C_{1}$ (offset) and $C_{2}$ (slope) could be determined as 53.3 and 11.8, respectively. These coefficients were used to calculate the SpO₂ values for Dataset II. When performing the same coefficient estimation procedure on the scenarios with the least motion (seated and with supported head) of Dataset II where the variation in SpO₂ values is much smaller, we obtained very similar coefficients: 53.6 and 11.5 for $C_{1}$ and $C_{2}$.
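To make these numbers concrete, the short example below evaluates Eq. (3) with the reported coefficients. It assumes the RoR is the blue-channel signature value with the red entry fixed to one, as described in Sec. 2.3; the helper names are hypothetical.

```python
C1, C2 = 53.3, 11.8  # offset and slope reported for Dataset I

def ror_to_spo2(ror: float) -> float:
    """Eq. (3): linear calibration from ratio-of-ratios to SpO2 in percent."""
    return C1 + C2 * ror

def spo2_to_ror(spo2: float) -> float:
    """Inverse of the calibration line."""
    return (spo2 - C1) / C2

for spo2 in (70, 80, 90, 100):
    print(spo2, round(spo2_to_ror(spo2), 2))
# 70 1.42
# 80 2.26
# 90 3.11
# 100 3.96
```

Under this reading, the calibrated range of $70\%-100\%$ corresponds to RoR values of roughly 1.4 to 4.0, which lies inside the sampled signature range of 1 to 5, and the signature resolution of 0.01 corresponds to about 0.12 percentage points of SpO₂.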

The results of Dataset II are visualized in Fig. 6 and in Fig. 7, where for the boxplots the central mark indicates the median, and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. The first boxplot of Fig. 6 shows that the error (bias) is overall negative, indicating an underestimation of SpO₂, especially for the most challenging scenarios. For the (preliminary) screening applications for which this setup is envisioned, an underestimation leads to false positives (i.e. overtriage), but not to false negatives (i.e. undertriage), which would lead to patients not getting the treatment they need. The absolute errors displayed in the second boxplot indicate that when the person is seated, the system is sufficiently accurate to discriminate between normal, around $95\%$ to $100\%$, and low ($<90\%$ [31]) blood oxygen levels. Also, ambient light disturbances only have a small effect on the accuracy. This is an important result, as it does not restrict the possible applications to a screening booth that is completely shielded from the (clinical) environment, but also allows measurements in a more open space. Unsurprisingly, reliable measurements while the subject is standing seem a bridge too far due to motion, with an error of about 8 percentage points.

As a result of using only two wavelengths, the impact of motion on the measurement is much larger compared to our previously reported results where we used a system of three or more wavelengths [6,32]. The direct relation between the amount of motion and the measurement accuracy can be easily observed from Fig. 6. For a fair comparison and to give a good impression of the performance for the different use cases, we did not exclude any data points from the analysis, e.g., based on a quality metric. To deploy the system in a clinical setting, it would be important to provide real-time feedback to both the user and the clinician. The user can then be instructed to keep still during the measurement and the clinician can better interpret the reading when displaying a corresponding confidence index. This would result in a faster and more reliable triaging/screening process.

Prior to starting the recordings for both datasets we adjusted the aperture of the camera lens to prevent overexposure. Clipped pixels do not contain blood volume variation information, and furthermore, when not excluded from the spatial/pixel averaging process, clipped pixels affect the relative amplitude of the PPG signals and hence the SpO₂ measurement. Besides overexposure, underexposure should also be prevented, which is especially relevant for subjects with dark skin. For a subject with dark skin in Dataset II, the maximum skin pixel values we would get with maximum aperture were only $15\%$ of the dynamic range of the camera. For some of the subjects in Dataset I this number was even lower. Since the strength of the PPG signals is multiplicative with the light intensity, and other disruptive components are mostly additive, e.g. read noise, it is important to make good use of the camera’s dynamic range such that the PPG signal quality is least affected by the additive noise components. If this cannot be achieved with the aperture alone, there are several possible solutions: 1) reduce the frame rate to be able to increase the exposure time, 2) use more cameras to average over more pixel values, or 3) use the green instead of the blue color channel to capture the light from the green LED. The last option, however, negatively impacts the SpO₂ sensitivity, i.e. it leads to a steeper slope, because of the sensitivity of the green color channel to red wavelengths, as can be observed from Fig. 2. The steeper slope would largely cancel out the benefit of the better SNR of the green channel, and with it the overall beneficial effect on SpO₂ accuracy.

While the presented results are promising, further validation on patients in a clinical setting is needed to verify that our conclusions still hold. It should be noted that this concept is especially relevant because it allows rapid deployment and scale-up in contrast to earlier developed setups. There is however still a need for a cost-efficient camera that can capture the optimal wavelength for SpO₂ to make the measurement more accurate and tolerant to motion as demonstrated earlier [6,32], such that reliable measurements are also possible when people are not compliant.

Fig. 6. Boxplots of the results on Dataset II: Error/Bias in percentage points (left), Mean Absolute Error in percentage points (middle), and Motion in pixels (right) for each scenario. The central marks indicate the median, and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively.

Fig. 7. Cumulative density function (CDF) of the results obtained on Dataset II.

4. Conclusion

We presented a practical and cost-efficient setup for contactless SpO₂ estimation for screening applications with an RGB camera. Based on the identified challenges we could mitigate the most critical ones with the addition of COTS components only. After having determined the calibration coefficients on a self-created dataset with different skin types and a wide range of SpO₂ values, results show that the error can be reduced to acceptable levels for realistic screening applications on a different population. Remaining challenges are subject motion and low SNR because of underexposure. We think our findings are important to help accelerate the deployment of contactless SpO₂ for spot check measurements, e.g. in case of an outbreak such as COVID-19.

Acknowledgments

The authors would like to thank Dr. P. Bickler and his team at the UCSF hypoxia lab and Philips colleagues G. Lötgerink and R. Groen for their support in the preparation and execution of the desaturation study. Furthermore we thank all volunteers for their participation in the experiments and the reviewers for their valuable feedback.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. G. De Haan and V. Jeanne, “Robust pulse rate from chrominance-based RPPG,” IEEE Trans. Biomed. Eng. 60(10), 2878–2886 (2013).

2. M. Van Gastel, S. Stuijk, and G. de Haan, “Motion robust remote-PPG in infrared,” IEEE Trans. Biomed. Eng. 62(5), 1425–1433 (2015).

3. D. Luguern, S. Perche, Y. Benezeth, V. Moser, L. A. Dunbar, F. Braun, A. Lemkaddem, K. Nakamura, R. Gomez, and J. Dubois, “An assessment of algorithms to estimate respiratory rate from the remote photoplethysmogram,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, (2020), pp. 304–305.

4. M. van Gastel, S. Stuijk, and G. de Haan, “Robust respiration detection from remote photoplethysmography,” Biomed. Opt. Express 7(12), 4941–4957 (2016).

5. I. C. Jeong and J. Finkelstein, “Introducing contactless blood pressure assessment using a high speed video camera,” J. Med. Syst. 40(4), 77 (2016).

6. M. van Gastel, S. Stuijk, and G. De Haan, “New principle for measuring arterial blood oxygenation, enabling motion-robust remote monitoring,” Sci. Rep. 6(1), 38609 (2016).

7. R. Amelard, R. L. Hughson, D. K. Greaves, K. J. Pfisterer, J. Leung, D. A. Clausi, and A. Wong, “Non-contact hemodynamic imaging reveals the jugular venous pulse waveform,” Sci. Rep. 7(1), 40150 (2017).

8. M. van Gastel, B. Balmaekers, S. B. Oetomo, and W. Verkruysse, “Near-continuous non-contact cardiac pulse monitoring in a neonatal intensive care unit in near darkness,” Proc. SPIE 10501, 38 (2018).

9. G. A. Capraro, B. Balmaekers, A. C. den Brinker, M. Rocque, Y. DePina, M. W. Schiavo, K. Brennan, and L. Kobayashi, “Contactless vital signs acquisition using video photoplethysmography, motion analysis and passive infrared thermography devices during emergency department walk-in triage in pandemic conditions,” The J. Emer. Med. 63(1), 115–129 (2022).

10. U. Rubins, A. Miscuks, and M. Lange, “Simple and convenient remote photoplethysmography system for monitoring regional anesthesia effectiveness,” in EMBEC & NBC 2017, (Springer, 2017), pp. 378–381.

11. A. A. Kamshilin, V. Teplov, E. Nippolainen, S. Miridonov, and R. Giniatullin, “Variability of microcirculation detected by blood pulsation imaging,” PLoS One 8(2), e57117 (2013).

12. M. Lai, S. D. van der Stel, H. C. Groen, M. van Gastel, K. F. Kuhlmann, T. J. Ruers, and B. H. Hendriks, “Imaging PPG for in vivo human tissue perfusion assessment during surgery,” J. Imaging 8(4), 94 (2022).

13. W. Verkruysse, M. Bartula, E. Bresch, M. Rocque, M. Meftah, and I. Kirenko, “Calibration of contactless pulse oximetry,” Anesth. Analg. 124(1), 136–145 (2017).

14. Medical electrical equipment – Part 2-61: Particular requirements for basic safety and essential performance of pulse oximeter equipment, Standard, International Organization for Standardization, Geneva, CH (2018).

15. L. Kong, Y. Zhao, L. Dong, Y. Jian, X. Jin, B. Li, Y. Feng, M. Liu, X. Liu, and H. Wu, “Non-contact detection of oxygen saturation based on visible light imaging device using ambient light,” Opt. Express 21(15), 17464–17471 (2013).

16. K. Humphreys, T. Ward, and C. Markham, “A CMOS camera-based pulse oximetry imaging system,” in 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference, (IEEE, 2006), pp. 3494–3497.

17. D. Shao, C. Liu, F. Tsow, Y. Yang, Z. Du, R. Iriya, H. Yu, and N. Tao, “Noncontact monitoring of blood oxygen saturation using camera and dual-wavelength imaging system,” IEEE Trans. Biomed. Eng. 63(6), 1091–1098 (2015).

18. A. R. Guazzi, M. Villarroel, J. Jorge, J. Daly, M. C. Frise, P. A. Robbins, and L. Tarassenko, “Non-contact measurement of oxygen saturation with an RGB camera,” Biomed. Opt. Express 6(9), 3320–3338 (2015).

19. N. H. Kim, S.-G. Yu, S.-E. Kim, and E. C. Lee, “Non-contact oxygen saturation measurement using YCgCr color space with an RGB camera,” Sensors 21(18), 6120 (2021).

20. B. Wei, X. Wu, C. Zhang, and Z. Lv, “Analysis and improvement of non-contact SpO₂ extraction using an RGB webcam,” Biomed. Opt. Express 12(8), 5227–5245 (2021).

21. I. Nishidate, K. Nakano, D. McDuff, K. Niizeki, Y. Aizu, and H. Haneishi, “Evaluation of arterial oxygen saturation using RGB camera-based remote photoplethysmography,” Proc. SPIE 10501, 35 (2018).

22. M. Yoshizawa, N. Sugita, A. Tanaka, A. Togashi, I. Kaji, and T. Yambe, “Basic approach to estimation of blood oxygen saturation using an RGB color camera without infrared light,” in 2022 IEEE 4th Global Conference on Life Sciences and Technologies (LifeTech), (IEEE, 2022), pp. 68–71.

23. J. Mathew, X. Tian, M. Wu, and C.-W. Wong, “Remote blood oxygen estimation from videos using neural networks,” arXiv, arXiv:2107.05087 (2021).

24. J. Brieva, E. Moya-Albor, and H. Ponce, “A non-contact SpO₂ estimation using a video magnification technique,” Proc. SPIE 12088, 21 (2021).

25. A. Al-Naji, G. A. Khalid, J. F. Mahdi, and J. Chahl, “Non-contact SpO₂ prediction system based on a digital camera,” Appl. Sci. 11(9), 4255 (2021).

26. U. Bal, “Non-contact estimation of heart rate and oxygen saturation using ambient light,” Biomed. Opt. Express 6(1), 86–97 (2015).

27. A. Moço and W. Verkruysse, “Pulse oximetry based on photoplethysmography imaging with red and green light,” J. Clin. Monit. Comput. 35(1), 123–133 (2021).

28. D. S. Bolme, J. R. Beveridge, B. A. Draper, and Y. M. Lui, “Visual object tracking using adaptive correlation filters,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, (IEEE, 2010), pp. 2544–2550.

29. G. Bilo, M. Revera, M. Bussotti, D. Bonacina, K. Styczkiewicz, G. Caldara, A. Giglio, A. Faini, A. Giuliano, C. Lombardi, K. Kawecka-Jaszcz, G. Mancia, P. Agostoni, and G. Parati, “Effects of slow deep breathing at high altitude on oxygen saturation, pulmonary and systemic hemodynamics,” PLoS One 7(11), e49074 (2012).

30. P. W. Holland and R. E. Welsch, “Robust regression using iteratively reweighted least-squares,” Commun. Stat. Theory Methods 6(9), 813–827 (1977).

31. S. R. Majumdar, D. T. Eurich, J.-M. Gamble, A. Senthilselvan, and T. J. Marrie, “Oxygen saturations less than 92% are associated with major adverse events in outpatients with pneumonia: a population-based cohort study,” Clin. Infect. Dis. 52(3), 325–331 (2011).

32. M. van Gastel, W. Wang, and W. Verkruysse, “Reducing the effects of parallax in camera-based pulse-oximetry,” Biomed. Opt. Express 12(5), 2813–2824 (2021).

Challenge	Description	Error type
I	Broad spectral sensitivities of RGB channels make the apparent SpO₂ highly dependent on the spectral content of ambient illumination.	Bias
II	Specular reflectance drastically changes the apparent SpO₂, in particular in dark skin.	Bias
III	Higher measurement noise due to less favorable contrast for SpO₂ with red and green.	Noise
IV	Unequal penetration depths in skin of red and green wavelengths, causing intra-individual bias errors.	Bias
V	Relatively poor signal to noise ratio with RGB cameras, compared to state of the art SpO₂ camera prototypes.	Noise and bias
VI	Device dependent spectral sensitivities of RGB channels.	Bias
