Reducing the effects of parallax in camera-based pulse-oximetry

Mark van Gastel; Wenjin Wang; Wim Verkruysse

doi:10.1364/BOE.419199

1. Introduction

Monitoring physiological parameters without touching the skin is highly attractive, e.g., for people with burned or fragile skin, when there is a serious risk of cross-infection such as with COVID-19, or where conventional contact-based measurement causes discomfort that could affect its diagnostic value. Over the years, various contactless methods for physiological monitoring have been proposed, including radar [1], sonar [2], thermal imaging [3], WiFi [4] and regular cameras. Using consumer-grade cameras is particularly interesting as it allows a cost-efficient and convenient measurement of physiological information, movements and context information simultaneously. Furthermore, compared to RF-based techniques (e.g., radar, WiFi) that only measure the motion signal induced by breathing and the beating of the heart, remote photoplethysmography (rPPG) is based on blood absorption and can therefore be used to measure the blood oxygenation level.

So far, most attention in the field of rPPG has been given to heart (pulse) rate and derived features [5,6], and respiration [7]. For (preterm) infants [8], COPD patients [9] and for the detection of sleep apnea [10] it is important to also monitor the oxygen content of arterial blood, i.e. the fraction of hemoglobin saturated with oxygen. Peripheral oxygen saturation (SpO₂) is the optical measurement of arterial oxygen saturation (SaO₂). In contrast to SaO₂, for which arterial blood samples have to be taken, SpO₂ enables a non-invasive and continuous measurement using low-cost hardware, a pulse-oximeter, attached to a peripheral site such as a finger or earlobe. The cardiac-synchronous periodicity of the PPG waveform is used to isolate the absorption of arterial blood from that of the other absorbers. One of the challenges in rPPG is the very small signal, because only a limited amount of the skin-reflected light has interacted with pulsatile blood. Compared to conventional transmissive contact-based sensors, the modulation depth (PPG signal strength) is much smaller for the camera, by up to a factor of 100 [11], mostly because of the different source-detector geometry, although anatomical location (finger vs forehead) may also play a role.

To extract physiological information from the often noisy and motion-corrupted PPG waveforms, methods have been proposed which exploit the fact that the spectral signature of blood differs from those of the disturbances. By capturing the scene at different wavelengths, suppression of distortions within the heart rate band, e.g., $40-240$ beats per minute, is possible. Compared to other physiological parameters, the validity of the assumptions on the direction of the physiological and disturbance information is more critical for SpO₂. For pulse, violation of these assumptions typically leads to a worsened signal-to-noise ratio, which would often still allow determination of the correct pulse rate based on frequency characteristics. For SpO₂, however, the measurement is based on amplitude characteristics and is therefore directly affected. Part of the violation of the algorithm assumptions is caused by the hardware, because a cost-efficient equivalent of an RGB camera, i.e. a camera with a color filter array placed in front of the image sensor, is not readily available in the red and near-infrared range, the ‘optical window’ where the sensitivity to SpO₂ variations is largest and the light is able to reach the deeper layers of the skin. The available single-optics multi-spectral options: 1) do not have the desired spectral characteristics for SpO₂, 2) are expensive and bulky due to the concept, e.g. a prism to project the incident light to multiple image sensors with different spectral filters, 3) focus on DC spectroscopy applications, e.g. agriculture, and do not offer the required signal-to-noise ratio for rPPG, or 4) suffer from color inhomogeneities and crosstalk between channels. As a consequence, multiple monochrome cameras with different optical filters are typically used to obtain spectral selectivity for the SpO₂ measurement. These cameras, however, have slightly different viewpoints, which introduces parallax. Parallax is a difference in the apparent position of an object viewed along two different lines of sight. The algorithms assume that the wavelengths, i.e. the cameras or color channels, capture the same skin area. A systematic investigation of how much violation of this assumption is acceptable for SpO₂ measurements has not yet been performed.

In this paper we investigate and quantify the effects of parallax by a large-scale experiment. We construct a dedicated dataset with three different parallax settings. For each of these settings subjects were asked to follow a protocol which includes scenarios ranging from suppressed motion, to realistic motion, to challenging head motions, to simulate different use cases. We estimate the oxygenation levels with our earlier published SpO₂ method (“APBV” [12]) for a global image registration approach and compare this to a proposed adaptive local registration method to further reduce the parallax-induced image misalignment. The outcomes of this investigation are an important indicator of the practicality of camera-based pulse-oximetry for different clinical use cases such as sleep, screening for infectious diseases such as COVID-19, or emergency department (ED) triage.

The paper is organized as follows: in the next section image registration methods, the processing framework, the experimental setup, dataset and evaluation metrics are described. Next, the results are presented including a discussion. Finally, the conclusions are drawn in the last section.

2. Materials and methods

In this section we first describe the problems of parallax and introduce global and local image registration approaches. Next, we describe the processing framework for SpO₂ estimation, the experimental setup, the created dataset and the metrics used for evaluation.

2.1 Image registration

To capture the desired wavelengths for SpO₂, multi-camera setups are currently typically used [11,13–15]. However, a significant challenge of such multi-camera setups is the parallax between cameras. If the image planes of different cameras cannot be well aligned due to parallax, the pixels across multiple channels will not measure the same optical information, which may jeopardize the measurement of SpO₂, which is highly sensitive to the relative color changes between wavelength channels. A hardware solution to reduce parallax is to increase the distance between subject and camera. This option is limited in scenarios with short subject-to-camera distances, such as a Neonatal Intensive Care Unit (NICU), a patient monitoring room, a sleeping room, or an in-car setting. An existing algorithmic solution to reduce parallax is image registration.

Conventional image registration methods [16] transform image pixels via a global model estimated from the epipolar geometry of at least two cameras (called “global registration”). The procedure has two steps: (i) calibration: in the first few frames of a video sequence, a global linear transformation model between the image planes of two cameras is estimated. The model can have different degrees of freedom (see Fig. 1), such as translation, rotation, scaling, Euclidean, affine and homography, but the underlying transformations are all assumed linear; (ii) transformation: for the subsequent frames (i.e. assuming that the camera setup is fixed during the recording), the estimated model is used to transform the images from one camera to the reference camera (e.g. by warping the pixels) so that they share a single alignment.
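The calibration and transformation steps can be illustrated with a minimal sketch. It assumes that matched point coordinates between the non-reference and reference image are already available (how they are obtained is not specified here) and fits an affine model by least squares with NumPy. The function names `estimate_affine` and `apply_affine` and all numbers are hypothetical and only serve the illustration.

```python
import numpy as np

def estimate_affine(src_pts, dst_pts):
    """Least-squares 2D affine model (2 x 3) mapping src_pts to dst_pts (both N x 2, N >= 3)."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    # Homogeneous source coordinates [x, y, 1]
    A = np.hstack([src, np.ones((src.shape[0], 1))])
    # Solve A @ M.T ~= dst for the 2 x 3 matrix M
    M_T, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M_T.T

def apply_affine(M, pts):
    """Apply the 2 x 3 affine model M to points (N x 2)."""
    pts = np.asarray(pts, dtype=float)
    return pts @ M[:, :2].T + M[:, 2]

# (i) Calibration: synthetic correspondences generated from a known model
true_M = np.array([[1.02, 0.01, 5.0],
                   [-0.01, 0.98, -3.0]])
pts_nonref = np.array([[10.0, 20.0], [200.0, 40.0], [50.0, 300.0],
                       [400.0, 350.0], [250.0, 180.0]])
pts_ref = apply_affine(true_M, pts_nonref)
M = estimate_affine(pts_nonref, pts_ref)

# (ii) Transformation: the fixed model is reused for all later frames
mapped = apply_affine(M, pts_nonref)
```

Because the model is estimated once and then kept fixed, any later change in subject-to-camera distance is not captured, which is the limitation discussed next.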

Fig. 1. The exemplified linear models for global image registration.

Such image registration has two apparent limitations: (i) the transformation is linear and therefore cannot cope with scenes that contain clear depth variation or objects with a clear 3D geometry, i.e. a linear transformation is essentially only effective for a 2D plane; (ii) the registration model is not adaptive to the video content and is therefore only valid for a fixed subject-to-camera distance. If body motion changes the subject's distance to the camera during the measurement, registration with a fixed model will fail. Poor image registration leads to misaligned image planes and introduces color gradients/artifacts in the multi-channel images, which may invalidate the SpO₂ measurement, as it relies on accurate color amplitude information between the multi-wavelength channels.

To address the limitations of existing registration approaches, we propose a non-linear adaptive camera registration method. In general, the proposed method has two essential steps: (i) estimate the displacement of local image patches between different image planes that correspond to the same object; (ii) transform one image with respect to the other using the local displacements measured per image patch, which is a highly non-linear transformation, i.e. there is no assumption posed on epipolar geometric constraints for this step (see the illustration in Fig. 2).

Fig. 2. Flowchart of the proposed local adaptive image registration, which consists of two steps: (a) local image patch displacement measurement, (b) local image patch interpolation. The red dots represent the image patches in the unregistered images viewing the same object; the blue dots represent the image patches registered using the proposed local registration method.

More specifically, the proposed registration method is a patch-to-patch alignment approach, depending on the unit of pixel representation, where each image patch across multiple cameras has its own interpolation in the new image plane. A typical first step is to select a reference camera. Assuming a use case with three cameras, the camera in the central position is selected as the reference because it has the shortest distance to the other cameras. We then measure the patch-to-patch displacement between the reference image and each non-reference image. The local displacement can be measured by dense optical flow, such as Lucas-Kanade flow [17], Farnebäck flow [18], Horn-Schunck flow [19], block-matching flow [20], deep-network flow [21] or 3DRS [22]. This step can be expressed as:

$$D=\mathbf{DOF}(I_{nonref}, I_{ref}), \tag{1}$$

where $\mathbf {DOF}(\cdot )$ denotes the dense optical flow; $I_{ref}$ and $I_{nonref}$ denote the reference and non-reference images, respectively; $D$ denotes a 2-channel image representing the displacement of pixels, i.e. one channel contains the horizontal shift and the other channel contains the vertical shift. We note that (dense) optical flow is conventionally used to measure pixel movement between time-sequential images. Here we use it instead to measure the parallax-induced displacement of image patches between images from different cameras sampled at the same time.
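As an illustration of Eq. (1), the sketch below uses exhaustive block matching with a sum-of-absolute-differences cost as a simple stand-in for the dense optical flow methods listed above; it is not the specific estimator used in this work. The function name `block_matching_flow`, the block size, the search range and the sign convention (the displacement points from a reference block to its best match in the non-reference image) are assumptions made for the example.

```python
import numpy as np

def block_matching_flow(i_nonref, i_ref, block=8, search=4):
    """Per-block displacement D of shape (rows, cols, 2).

    D[..., 0] is the horizontal and D[..., 1] the vertical shift (in pixels)
    from a reference block to its best match in the non-reference image.
    """
    h, w = i_ref.shape
    rows, cols = h // block, w // block
    D = np.zeros((rows, cols, 2), dtype=int)
    for r in range(rows):
        for c in range(cols):
            y0, x0 = r * block, c * block
            ref_patch = i_ref[y0:y0 + block, x0:x0 + block].astype(float)
            best = np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y1, x1 = y0 + dy, x0 + dx
                    # Skip candidates that fall outside the image
                    if y1 < 0 or x1 < 0 or y1 + block > h or x1 + block > w:
                        continue
                    cand = i_nonref[y1:y1 + block, x1:x1 + block].astype(float)
                    sad = np.abs(cand - ref_patch).sum()
                    if sad < best:
                        best = sad
                        D[r, c] = (dx, dy)
    return D

# Synthetic test pair: the non-reference view is the reference view
# shifted 3 px to the right and 1 px down (a crude model of parallax).
rng = np.random.default_rng(0)
i_ref = rng.random((32, 32))
i_nonref = np.roll(i_ref, shift=(1, 3), axis=(0, 1))
D = block_matching_flow(i_nonref, i_ref)
```

Blocks whose true match lies outside the image border cannot be matched correctly by this simple search, so only the blocks away from the lower and right borders recover the imposed shift here.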

Next, we use $D$ to interpolate the non-reference images such that the image patches will be aligned with the ones in the reference image:

$$I_{reg}=\mathbf{Interpolate}(I_{nonref},D), \tag{2}$$

where $\mathbf {Interpolate}(\cdot )$ denotes the pixel interpolation and $I_{reg}$ denotes the registered image. Because every image patch has its own displacement, this patch-based interpolation is a highly nonlinear image transformation.
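A minimal sketch of Eq. (2), assuming a per-pixel displacement field and bilinear interpolation (the interpolation kernel is not specified in the text). The convention assumed here is that the registered image samples the non-reference image at the displaced position, $I_{reg}(y,x)=I_{nonref}(y+D_y, x+D_x)$; the function name `interpolate` is hypothetical.

```python
import numpy as np

def interpolate(i_nonref, D):
    """Warp i_nonref onto the reference grid with bilinear interpolation.

    D has shape (H, W, 2); D[..., 0] is the horizontal and D[..., 1] the
    vertical shift, so that out(y, x) = i_nonref(y + D_y, x + D_x).
    """
    h, w = i_nonref.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    # Sampling positions in the non-reference image, clamped to the border
    x = np.clip(xs + D[..., 0], 0, w - 1)
    y = np.clip(ys + D[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    img = i_nonref.astype(float)
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]
    bottom = (1 - fx) * img[y1, x0] + fx * img[y1, x1]
    return (1 - fy) * top + fy * bottom

# Synthetic check: undo a known shift of 3 px right and 1 px down
rng = np.random.default_rng(0)
i_ref = rng.random((16, 16))
i_nonref = np.roll(i_ref, shift=(1, 3), axis=(0, 1))
D = np.zeros((16, 16, 2))
D[..., 0] = 3.0
D[..., 1] = 1.0
i_reg = interpolate(i_nonref, D)
```

In this toy case the displacement is the same everywhere; with a spatially varying field the same function performs the non-linear, per-patch warp described above.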

These two steps, the measurement of local image patch displacements and the image patch interpolation, are performed for each individual frame. Therefore, our image registration is adaptive to real-time video content/events, i.e. it is robust to changes in scene depth or object position (e.g., a change of the distance between subject and camera) during the monitoring (see Fig. 3).

Fig. 3. Qualitative comparison between the unregistered, fixed global registered, and adaptive local registered images at two different subject-to-camera distances (i.e. different amounts of parallax), where the images of the three separate cameras are visualized as the Red, Green and Blue channels, respectively, of an RGB image. The proposed method clearly improves image registration, reducing the color gradients that appear with the fixed global registration.

The strength of our method is that it allows non-linear registration between different cameras and that this registration is adaptive to the video content. Thus, it can effectively reduce the color artifacts due to imperfect image alignment that are common in existing registration methods, which assume a global linear transformation model that is not adaptively re-estimated in real time. In the following section, we shall benchmark its performance against the existing registration approaches.

2.2 Processing for SpO₂ extraction

After registration, 13 facial skin patches are defined and tracked using a Histogram of Oriented Gradients (HOG)-based facial landmark detector, as visualized in Fig. 4. The pixel intensities within the skin patches are spatially averaged for the three wavelengths and concatenated over time to generate PPG traces. We divided the raw PPG signal for each wavelength and skin patch ($C_{i,s}$) by its quasi-DC signal, obtained by low-pass filtering (LPF), and bandpass filtered (BPF) the result to obtain the DC-normalized PPG waveforms:

$$\tilde{C}_{i,s}=\mathbf{BPF}\Bigg(\frac{C_{i,s}}{\mathbf{LPF}(C_{i,s})}\Bigg), \tag{3}$$

where $\tilde {C}_{i,s}$ denotes the DC-normalized signal for wavelength $i$ and skin patch $s$. $\mathbf {LPF}(\cdot )$ is a first-order Butterworth filter with a cut-off frequency of 0.7 Hz, and the $\mathbf {BPF}(\cdot )$ is a fourth-order zero-phase Butterworth filter with a passband in the range $0.7-4$ Hz, corresponding to $42-240$ beats per minute (bpm), the typical range of pulse rates for healthy adults. The DC-normalized color signals $\mathbf {\tilde {C}_{s}}$ are the input for the APBV method [12], which can be mathematically summarized as:
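Equation (3) can be sketched with SciPy as follows. Several choices are assumptions for illustration: the low-pass filter is applied with zero phase as well (the text only states this for the band-pass), `butter(2, ..., btype="band")` is used to obtain a fourth-order band-pass, and the test trace is synthetic. The function name `dc_normalize` is hypothetical.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 15.0  # camera frame rate in Hz, as reported for the setup

def dc_normalize(c, fs=FS, f_lo=0.7, f_hi=4.0):
    """Eq. (3): BPF(C / LPF(C)) for one raw trace c (1D array)."""
    c = np.asarray(c, dtype=float)
    # First-order Butterworth low-pass gives the quasi-DC level
    # (zero-phase filtering here is an assumption, it avoids a start-up transient)
    b_lp, a_lp = butter(1, f_lo, btype="low", fs=fs)
    dc = filtfilt(b_lp, a_lp, c)
    # Order 2 per band edge yields a fourth-order band-pass (assumed convention)
    b_bp, a_bp = butter(2, [f_lo, f_hi], btype="band", fs=fs)
    return filtfilt(b_bp, a_bp, c / dc)

# Synthetic raw trace: DC level 100, 1% pulsatile component at 1.2 Hz (72 bpm),
# plus a slow illumination drift
t = np.arange(900) / FS
raw = 100.0 * (1.0 + 0.01 * np.sin(2 * np.pi * 1.2 * t)) + 5.0 * t / 60.0
normalized = dc_normalize(raw)

# Dominant frequency of the normalized trace
freqs = np.fft.rfftfreq(normalized.size, d=1.0 / FS)
peak_hz = freqs[np.argmax(np.abs(np.fft.rfft(normalized)))]
```

Note that a first-order low-pass at 0.7 Hz still passes part of the pulsatile component, so the amplitude after normalization is somewhat smaller than the relative amplitude in the raw trace; since the same filter is applied to every wavelength, this scales all channels alike.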

$$[O_{s},Q_{s}] = \mathop{\textrm{arg}\,\textrm{max}}\limits_{SpO_{2}\in \mathbf{SpO_{2}}} Q\bigg(\underbrace{\overbrace{k\vec{P_{bv}}(SpO_{2})[\mathbf{\tilde{C}_{s}}\mathbf{\tilde{C}^{T}_{s}}]^{-1}}^{\vec{W}_{PBV}}\mathbf{\tilde{C}_{s}}}_{P_{s}}\bigg), \tag{4}$$

where $P_{s}$ is the pulse signal and the scalar $k$ is chosen such that $\vec {W}_{PBV}$ has unit length. The calculation of the weights for extraction of the pulse signal, $\vec {W}_{PBV}$, is formulated as a least squares problem using pulse signatures $\vec {P_{bv}}$ as a representation of the different SpO₂ values, where we use the pulse signatures from our earlier calibration study [23] because the experimental setup is the same. Although an SpO₂ measurement requires only two wavelengths, adding dimensionality through a third ‘redundant’ wavelength allows distortions to be suppressed, as we showed earlier [12]. We sample pulse signatures in the range $70-105$ % SpO₂ with a resolution of $0.1$ %. The SpO₂ estimate is obtained by quadratic weighting of the individual estimates, $O_{s}$, by the corresponding pulse quality indices, $Q_{s}$. The quality index $Q$ is calculated as the skewness in the frequency domain. We calculated the SpO₂ value every second using processing windows of 10 s and used a 3 s moving average filter as post-processing to arrive at the final estimate.
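The search in Eq. (4) can be sketched as below. This is an illustration under stated assumptions, not the published implementation: the two anchor signatures `p70` and `p100` are invented placeholders (the calibrated signatures of [23] are not reproduced here), the signature is assumed to vary linearly with SpO₂, the quality index is taken as the skewness of the magnitude spectrum, and "quadratic weighting" is interpreted as weights $Q_s^2$. All function names are hypothetical.

```python
import numpy as np

def spectral_skewness(p):
    """Quality index: skewness of the magnitude spectrum of pulse candidate p."""
    spec = np.abs(np.fft.rfft(p - p.mean()))
    z = (spec - spec.mean()) / spec.std()
    return float(np.mean(z ** 3))

def apbv(c_norm, signatures, spo2_grid):
    """Eq. (4) for one skin patch.

    c_norm: (n_wavelengths, n_samples) DC-normalized signals.
    signatures: (n_candidates, n_wavelengths), one pulse signature per grid value.
    Returns (SpO2 with the highest quality, that quality, unit-length weights).
    """
    cov_inv = np.linalg.inv(c_norm @ c_norm.T)
    best = (None, -np.inf, None)
    for spo2, p_bv in zip(spo2_grid, signatures):
        w = p_bv @ cov_inv
        w = w / np.linalg.norm(w)        # scalar k: unit-length weights
        q = spectral_skewness(w @ c_norm)
        if q > best[1]:
            best = (float(spo2), q, w)
    return best

def fuse_patches(estimates, qualities):
    """Combine per-patch estimates O_s with weights Q_s**2 (assumed interpretation)."""
    w = np.asarray(qualities, dtype=float) ** 2
    return float(np.sum(w * np.asarray(estimates, dtype=float)) / np.sum(w))

# Candidate grid: 70-105 % in 0.1 % steps, as stated in the text
spo2_grid = np.linspace(70.0, 105.0, 351)
p70 = np.array([0.65, 0.58, 0.48])    # HYPOTHETICAL signature (760, 800, 905 nm)
p100 = np.array([0.45, 0.55, 0.70])   # HYPOTHETICAL signature (760, 800, 905 nm)
frac = (spo2_grid - 70.0) / 30.0
signatures = p70 + frac[:, None] * (p100 - p70)
signatures /= np.linalg.norm(signatures, axis=1, keepdims=True)

# Synthetic 10 s window at 15 Hz: pulse along one signature, motion along [1, 1, 1]
rng = np.random.default_rng(0)
t = np.arange(150) / 15.0
pulse = 0.01 * np.sin(2 * np.pi * 1.1 * t)
motion = 0.005 * rng.standard_normal(t.size)
c_norm = (np.outer(signatures[270], pulse) + np.outer(np.ones(3), motion)
          + 0.0005 * rng.standard_normal((3, t.size)))
spo2_est, q_est, w_est = apbv(c_norm, signatures, spo2_grid)
```

The motion term is placed along the all-ones direction because intensity variations that are equal in all DC-normalized channels are the typical disturbance; a candidate signature close to that direction would extract the motion instead of the pulse.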

Fig. 4. To investigate the effects of parallax, three identical monochrome cameras are horizontally and equally spaced by three distances: 8 cm (small parallax), 24 cm (medium parallax) and 50 cm (large parallax).

2.3 Experimental setup

The experimental setup consists of three monochrome CCD cameras (AVT Manta G-283B, Allied Vision GmbH, Stadtroda, Germany) placed on an optical table. All cameras were equipped with identical 150 mm lenses (Schneider-Kreuznach 7805791, Bad Kreuznach, Germany). To obtain spectral selectivity, optical fluorescence single-band bandpass filters with center wavelengths (CWLs) of 760, 800 and 905 nm and full widths at half maximum (FWHMs) of 20, 20 and 43 nm, respectively, were used. The cameras were externally triggered at a stable frame rate of $15$ Hz and were horizontally spaced by 8 cm (small parallax), 24 cm (medium parallax) and 50 cm (large parallax), as visualized in Fig. 4. All cameras had an exposure time of 20 ms, and we adjusted the apertures such that the maximum pixel intensities within the region-of-interest (face) were at $80\%$ of the dynamic range. The image data was captured at a resolution of $968\times 728$ pixels with $2\times 2$ binning and 8-bit pixel depth. Homogeneous and diffuse illumination was provided by 2 armatures (Falcon Eyes, Hong Kong, China), each equipped with 9 incandescent lamps (Philips 60 W), at a distance of about 1.5 m on both sides of the subject. A current-limited DC power supply set to 210 V, 3.95 A (SM330-AR-22, Delta Elektronica, Zierikzee, The Netherlands) powered the lamps. For the SpO₂ reference we used 4 conventional SpO₂ probes coupled to Philips MP2 patient monitors: a Philips finger sensor, a Philips ear sensor (M1191B and M1194A, Philips Medizin-Systeme, Böblingen, Germany), a Masimo finger sensor (LNCS DC-I, Masimo Corporation, Irvine, USA), and a Nellcor finger sensor (DS-100A, Medtronic, Dublin, Ireland). A sample-wise (1 Hz) median of all 4 probes was defined as the reference signal.

2.4 Dataset

In order to quantify the effects of parallax we created a dataset consisting of 150 videos with a total duration of 1125 minutes. Five subjects (4 male) were enrolled, and each subject was asked to follow the same recording protocol 10 times for each of the three parallax settings, to reduce the effects of random errors such that solid conclusions can be drawn. A summary of the dataset is listed in Table 1, and the distributions of the physiological values present in the dataset are visualized in Fig. 5. The subjects were asked to sit on a chair in an upright position at a distance of approximately 8 meters from the cameras. The recording of the reference data was started 20 seconds prior to the protocol and stopped 20 seconds after it, to allow synchronization of the camera and contact data in the presence of processing and physiological delays. The protocol consisted of five scenarios of 90 s each and has a duration of 7.5 min, as visualized in Fig. 6. The five scenarios representing different use cases are:

• Supported head (0–90 s): To prevent involuntary ballistocardiographic movements the head was supported by a soft support on the left side of the face.
• Unsupported head I (90–180 s): The subject was asked to keep their head in the (unsupported) central position.
• Motion I (180–270 s): The subject was instructed to make small movements and have an irregular respiration including short breath-hold events. This scenario was included to simulate a screening setting, e.g. for COVID-19.
• Motion II (270–360 s): Using an auditory stimulus, the subject was instructed to move their head from the central position to the head support (distance: 15 cm) at a frequency of 25 BPM. This scenario was included to simulate a non-cooperative, restless patient.
• Unsupported head II (360–450 s): Similar to ‘Unsupported head I’.
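The protocol timing and dataset size stated above can be cross-checked with a few lines of arithmetic. The sketch below is only a bookkeeping aid; the names `SCENARIOS` and `scenario_at` are hypothetical, and the last scenario is labeled ‘Unsupported head II’ on the assumption that it repeats ‘Unsupported head I’.

```python
# Scenario name, start time (s), stop time (s), as listed in the protocol
SCENARIOS = [
    ("Supported head", 0, 90),
    ("Unsupported head I", 90, 180),
    ("Motion I", 180, 270),
    ("Motion II", 270, 360),
    ("Unsupported head II", 360, 450),
]

def scenario_at(t_seconds):
    """Return the scenario active at time t_seconds, or None outside the protocol."""
    for name, start, stop in SCENARIOS:
        if start <= t_seconds < stop:
            return name
    return None

n_subjects, n_repeats, n_settings = 5, 10, 3
n_recordings = n_subjects * n_repeats * n_settings       # 150 videos
protocol_min = SCENARIOS[-1][2] / 60                     # 7.5 minutes
total_min = n_recordings * protocol_min                  # 1125 minutes
total_hours = total_min / 60                             # 18.75 hours
supported_min_per_subject = n_repeats * n_settings * 90 / 60   # 45 minutes
```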

Fig. 5. Distributions of SpO₂ values and pulse rates from all five subjects in the dataset.

Fig. 6. To investigate the effects of parallax for realistic motion and physiological challenges, a protocol consisting of four different scenarios was used: 1) with head support, 2) without head support, 3) small movements with irregular respiration, and 4) large, periodic movements.

Table 1. Description of the dataset.

2.5 Evaluation metrics

To evaluate the performance of the camera-based SpO₂ estimates we computed the mean-absolute-error (MAE) and the bias (mean difference) metrics, which are calculated as:

$$\begin{aligned} MAE &= \frac{1}{L}\sum_{i=1}^{L}\left|SpO_{2}^{Cam}(i)-SpO_{2}^{Ref}(i)\right| \\ Bias &= \frac{1}{L}\sum_{i=1}^{L}\left(SpO_{2}^{Cam}(i)-SpO_{2}^{Ref}(i)\right), \end{aligned}$$

where $SpO_{2}^{Ref}$ is the median of all four contact probes, which improves the reliability of the reference, and $L$ is the number of samples. Before calculating the metrics it is important to compensate for processing and physiological delays between the contact probes, and between the contact probes and the camera. The delay between contact probes is mostly caused by processing and was determined in our earlier study [11]. In this study we applied the same delay before calculating the sample-wise median. Similarly, we applied a delay of 20 s to compensate for the time offset between the camera and reference measurements.
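The two metrics and the offset compensation can be sketched as follows. The direction of the offset is an assumption: the reference series is taken to start earlier than the camera series, so its first samples are dropped. The function names and the toy numbers are hypothetical.

```python
import numpy as np

def align(cam, ref, ref_lead_samples):
    """Drop the first ref_lead_samples of the reference and crop both to equal length.

    Assumes the reference recording starts earlier than the camera recording.
    """
    ref = np.asarray(ref, dtype=float)[ref_lead_samples:]
    cam = np.asarray(cam, dtype=float)
    n = min(len(cam), len(ref))
    return cam[:n], ref[:n]

def mae_and_bias(cam, ref):
    """Mean absolute error and bias (mean difference) in percentage points."""
    diff = np.asarray(cam, dtype=float) - np.asarray(ref, dtype=float)
    return float(np.mean(np.abs(diff))), float(np.mean(diff))

# Toy series sampled at 1 Hz: the reference starts 20 s before the camera
ref = np.array([97.0] * 20 + [98.0, 97.0, 96.0, 97.0])
cam = np.array([97.0, 96.0, 95.0, 98.0])
cam_a, ref_a = align(cam, ref, ref_lead_samples=20)
mae, bias = mae_and_bias(cam_a, ref_a)   # differences are -1, -1, -1, +1
```

A negative bias, as in this toy case, means the camera underestimates the reference on average, while the MAE also counts the positive deviation.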

3. Results

Examples of raw PPG signals with corresponding relative amplitudes from all subjects are visualized in Fig. 7. For the estimation of the amplitudes, all segments where the head was supported were used, i.e. the first 90 s of the protocol, a total of 45 min for each subject. The amplitudes are estimated for the camera channel with the strongest PPG signal, 905 nm, using peak-valley detection in combination with outlier rejection. It can be observed that Subjects C and D have a weaker pulse signal than the other three subjects. The spread per individual could partly be explained by the three-week period over which the data was collected, with associated variations in temperature, metabolism and physiology.

Fig. 7. Raw PPG signals (left) with corresponding relative amplitudes (right) of all subjects in the dataset.

The evaluation metrics calculated for the dataset are visualized in Fig. 8. Figure 9 shows the average SpO₂ values, including the 75th percentile values in shaded color, for the different parallax settings and image registration methods. The results show the clear impact of parallax on the accuracy of the measurement, with average errors of $2.53$ percentage points (pp), $1.74$ pp and $0.99$ pp from large to small parallax. During the most challenging ‘Motion’ scenarios the impact of parallax is largest, with errors of $4.46$ pp, $3.17$ pp and $2.08$ pp from large to small parallax. Furthermore, there is a dominant negative bias in the error. This can partly be explained by the direction of the motion-induced intensity variations. Under color-homogeneous illumination conditions these variations are equal in all channels in the DC-normalized space, corresponding to a pulse signature of $\mathbf {1}$, which resembles the signature of low saturation values. Another reason for the negative bias could be the asymmetry of the sampled pulse signature vectors with respect to the oxygen saturation values in this dataset, i.e. the vectors are sampled in the range $70-105$ % SpO₂, whereas the blood oxygen levels in the dataset lie in the normal range of $95-100$ %. For screening applications a negative bias leading to false positives is much preferred over a positive bias, with which critical conditions of patients could be missed. The ISO standard for pulse-oximeters is relatively lenient ($\pm 4\%$) [24], and for some adult applications the accuracy might be acceptable since there is typically no adverse health effect associated with providing more oxygen (higher FiO₂ levels). For premature infants, however, inaccuracies larger than $2$ % may be unacceptable given the relatively narrow target range [25]. In these patients, too high SpO₂ levels are associated with adverse health effects including retinopathy of prematurity.

Fig. 8. The MAE and bias of the camera-based SpO₂ measurement for the three parallax settings and for both image registration methods.

Fig. 9. Aggregated results for the three parallax settings and two image registration methods. The shaded color around the solid lines in the first three columns indicates the 75th percentile values. The plots in the fourth column are an overlay of the plots in the first three columns.

It can be observed that the proposed adaptive local registration method reduced the error compared to global image registration. On average the error is reduced by $0.47$ pp, where the largest gain is obtained for the large parallax setting with an error reduction of $0.83$ pp. For this setting the error is reduced by more than a factor of 2 for the most realistic screening scenarios, i.e. excluding ‘Motion II’. For large parallax the improvement from local registration is significant at the $p<0.05$ level ($p=0.0217$, one-tailed paired t-test). The p-values for medium and small parallax are 0.0757 and 0.3008, respectively. Linear extrapolation of the results for the three parallax settings suggests that the error during the most challenging ‘Motion II’ scenario can be reduced to approximately $2$ pp when the parallax is reduced to zero. With a parallax-free system, individual calibration errors relative to the population will then soon become dominant, in addition to factors such as face tracking inaccuracies and temperature-dependent vignetting that violate the assumptions of the SpO₂ method.

Representative examples of the pulse and SpO₂ signals from all subjects are visualized in Fig. 10. Here the pulse signal is constructed by concatenation of the pulse segments from the selected pulse signatures of the SpO₂ estimation. Clear variations in pulse rate can be observed during the ‘Motion’ scenario, where Subjects A and C additionally show a variation in SpO₂, visible in both the camera and the reference signals.

Fig. 10. Examples of the camera-derived pulse and SpO₂ signals from all five subjects in the dataset.

It is worth mentioning that the parallax is defined as the ratio between the distance between the cameras and the distance between the cameras and the subject. In our setup we used rather large cameras and lenses and consequently had to place the setup at a large distance from the subject to get a small parallax. If the distance between cameras can be reduced by a factor of 4, e.g. as in smartphones or cameras with s-mount lenses, the ‘small’ parallax setting with a more practical working distance of 2 m can easily be realized.
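The ratio definition above can be made concrete with the distances reported for the setup. The sketch assumes the subject distance of approximately 8 m given in the dataset description and takes the camera spacing as the baseline; the function names are hypothetical.

```python
import math

def parallax_ratio(baseline_m, distance_m):
    """Parallax as the ratio of camera spacing to camera-subject distance."""
    return baseline_m / distance_m

def parallax_angle_deg(baseline_m, distance_m):
    """Equivalent viewing-angle difference between two cameras, in degrees."""
    return math.degrees(math.atan2(baseline_m, distance_m))

SUBJECT_DISTANCE_M = 8.0   # approximate distance stated in the dataset description
spacings_m = {"small": 0.08, "medium": 0.24, "large": 0.50}

ratios = {name: parallax_ratio(b, SUBJECT_DISTANCE_M) for name, b in spacings_m.items()}
angles = {name: parallax_angle_deg(b, SUBJECT_DISTANCE_M) for name, b in spacings_m.items()}

# A camera spacing reduced by a factor of 4 at a working distance of 2 m
compact_ratio = parallax_ratio(spacings_m["small"] / 4, 2.0)
```

Under these assumptions the small, medium and large settings correspond to ratios of about 0.01, 0.03 and 0.0625, and the compact configuration at 2 m reproduces the ratio of the small setting.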

4. Conclusion

In this study we systematically investigated the impact of parallax on the accuracy of camera-based pulse-oximetry by means of large scale experiments. A dataset consisting of recordings with a total duration of almost 19 hours was created where subjects were asked to perform realistic and challenging head movements to simulate possible use cases. Three different parallax settings were evaluated with three identical monochrome cameras with different bandpass filters in near-infrared. Oxygen saturation values were estimated with the motion-tolerant APBV method, where the performance of global image registration was compared to that of a newly proposed adaptive local image registration method to further reduce the image misalignment.

The results showed the clear impact of parallax on the accuracy of the measurement, especially during the most challenging motion scenarios, with errors of $2.53$ pp and $0.99$ pp for the largest and smallest evaluated parallax setting, respectively. The proposed adaptive local registration method reduced the error by more than a factor of 2 for the most common motion scenarios in screening settings, e.g. for COVID-19. This study gives important insights into the possible applications and use cases of remote pulse-oximetry with current affordable and readily available cameras. Furthermore, extrapolation of the results suggests that the error during the most challenging motion scenario can be reduced to approximately $2$ pp when using a parallax-free single-optics camera.

Acknowledgments

The authors would like to thank all volunteers for their participation in the experiments and the reviewers for their valuable feedback.

Disclosures

The authors declare no conflicts of interest.

References

1. M. Y. W. Chia, S. W. Leong, C. K. Sim, and K. M. Chan, “Through-wall UWB radar operating within FCC’s mask for sensing heart beat and breathing rate,” in 2005 European Radar Conference (EURAD), (2005), pp. 267–270.

2. X. Wang, R. Huang, and S. Mao, “Sonarbeat: Sonar phase for breathing beat monitoring with smartphones,” in 2017 26th International Conference on Computer Communication and Networks (ICCCN), (IEEE, 2017), pp. 1–8.

3. Y. Cho, S. J. Julier, N. Marquardt, and N. Bianchi-Berthouze, “Robust tracking of respiratory rate in high-dynamic range scenes using mobile thermal imaging,” Biomed. Opt. Express 8(10), 4480–4503 (2017).

4. J. Liu, Y. Wang, Y. Chen, J. Yang, X. Chen, and J. Cheng, “Tracking vital signs during sleep leveraging off-the-shelf wifi,” in Proceedings of the 16th ACM International Symposium on Mobile Ad Hoc Networking and Computing, (ACM, 2015), pp. 267–276.

5. M.-Z. Poh, D. J. McDuff, and R. W. Picard, “Advancements in noncontact, multiparameter physiological measurements using a webcam,” IEEE Trans. Biomed. Eng. 58(1), 7–11 (2011).

6. Y. Sun and N. Thakor, “Photoplethysmography revisited: from contact to noncontact, from point to imaging,” IEEE Trans. Biomed. Eng. 63(3), 463–477 (2016).

7. P. H. Charlton, D. A. Birrenkott, T. Bonnici, M. A. Pimentel, A. E. Johnson, J. Alastruey, L. Tarassenko, P. J. Watkinson, R. Beale, and D. A. Clifton, “Breathing rate estimation from the electrocardiogram and photoplethysmogram: A review,” IEEE Rev. Biomed. Eng. 11, 2–20 (2018).

8. N. Finer and T. Leone, “Oxygen saturation monitoring for the preterm infant: the evidence basis for current practice,” Pediatr. Res. 65(4), 375–380 (2009).

9. J. Pilling and M. Cutaia, “Ambulatory oximetry monitoring in patients with severe COPD: a preliminary study,” Chest 116(2), 314–321 (1999).

10. S. Dumitrache-Rujinski, G. Calcaianu, D. Zaharia, C. L. Toma, and M. Bogdan, “The role of overnight pulse-oximetry in recognition of obstructive sleep apnea syndrome in morbidly obese and non obese patients,” Maedica 8, 237 (2013).

11. W. Verkruysse, M. Bartula, E. Bresch, M. Rocque, M. Meftah, and I. Kirenko, “Calibration of contactless pulse oximetry,” Anesthesia and Analgesia 124(1), 136–145 (2017).

12. M. Van Gastel, S. Stuijk, and G. De Haan, “New principle for measuring arterial blood oxygenation, enabling motion-robust remote monitoring,” Sci. Rep. 6(1), 38609 (2016).

13. L. Kong, Y. Zhao, L. Dong, Y. Jian, X. Jin, B. Li, Y. Feng, M. Liu, X. Liu, and H. Wu, “Non-contact detection of oxygen saturation based on visible light imaging device using ambient light,” Opt. Express 21(15), 17464–17471 (2013).

14. T. Vogels, M. van Gastel, W. Wang, and G. de Haan, “Fully-automatic camera-based pulse-oximetry during sleep,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, (2018), pp. 1349–1357.

15. M. van Gastel, S. Stuijk, S. Overeem, J. P. van Dijk, M. M. Van Gilst, and G. de Haan, “Camera-based vital signs monitoring during sleep–A proof of concept study,” IEEE Journal of Biomedical and Health Informatics (to appear).

16. B. Zitova and J. Flusser, “Image registration methods: A survey,” Image and Vision Comput. 21(11), 977–1000 (2003).

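17. B. D. Lucas and T. Kanade, “An iterative image registration technique with an application to stereo vision,” in Proceedings of the 7th International Joint Conference on Artificial Intelligence (IJCAI), (1981), pp. 674–679.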

18. G. Farnebäck, “Two-frame motion estimation based on polynomial expansion,” in Scandinavian conference on Image analysis, (Springer, 2003), pp. 363–370.

19. E. Meinhardt-Llopis, J. S. Pérez, and D. Kondermann, “Horn-schunck optical flow with a multi-scale strategy,” Image Process. on line 3, 151–172 (2013).

20. M. Liu and T. Delbruck, “Adaptive time-slice block-matching optical flow algorithm for dynamic vision sensors,” in Proceedings of BMVC 2018, (2018).

21. E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy, and T. Brox, “Flownet 2.0: Evolution of optical flow estimation with deep networks,” in Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, (2017), pp. 2462–2470.

22. G. de Haan, P. W. Biezen, H. Huijgen, and O. A. Ojo, “True-motion estimation with 3-D recursive search block matching,” IEEE Trans. Circuits Syst. Video Technol. 3(5), 368–379, 388 (1993).

23. M. van Gastel, W. Verkruysse, and G. de Haan, “Data-driven calibration estimation for robust remote pulse-oximetry,” Appl. Sci. 9(18), 3857 (2019).

24. “Medical electrical equipment – Part 2-61: Particular requirements for basic safety and essential performance of pulse oximeter equipment,” Standard, International Organization for Standardization, Geneva, CH (2018).

25. BOOST-II Australia and United Kingdom Collaborative Groups, “Outcomes of two trials of oxygen-saturation targets in preterm infants,” N. Engl. J. Med. 374(8), 749–760 (2016).

Table 1. Description of the dataset.

Item	Description
Protocol	4 levels of motion
Protocol duration	7.5 minutes
Parallax settings	Small (8 cm), Medium (24 cm), Large (50 cm)
Subjects	5
Number of recordings	10 per subject for each setting, 150 in total
Total recording duration	Almost 19 hours, over a period of 3 weeks
Cameras	3, with different bandpass filters (760 nm, 800 nm, 905 nm)
Illumination	2 incandescent armatures, diffuse
Reference	Median of 4 contact probes
