
Photonic human identification based on deep learning of back scattered laser speckle patterns

Open Access

Abstract

The analysis of the dynamics of speckle patterns generated when laser light is back scattered from tissue has recently been shown to be highly applicable to remote sensing of various bio-medical parameters. In this work, we present how the analysis of a single static speckle pattern scattered from the forehead of a subject, together with advanced machine learning techniques based on multilayered neural networks, can offer a novel approach to accurate identification within a small predefined number of classes (e.g., a 'smart home' setting which restricts its operation to family members only). Processing the static scattered speckle pattern with neural networks enables extraction of unique features without requiring prior expert knowledge. With the right model, a very accurate differentiation between the desired categories is possible, and that model can form the basis for using speckle patterns as a form of identity measure, a 'forehead-print'.

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Optical means employed in the service of human recognition have been common for many years now. There is an abundance of methods for iris recognition [1,2,3], as well as facial feature recognition methods in widespread use, such as facial landmarks used for template matching [4], 3D modeling sensitive to facial expressions [5], thermal face recording [6] and bio-medical image analysis [7,8]. Although the speckle phenomenon has been known and investigated for a long time, especially speckle image formation by coherent sensing of photons diffracted from a broadband laser source [9,10], its use in a practical sense in this area of expertise is less common. Generally speaking, a speckle pattern is the observed physical effect of a random pattern produced by the self-interference of different light waves of the same frequency but different phases and amplitudes, which add together to create a resultant wave with a combined intensity. This intensity is hard to predict, as it depends on the properties of the light scattering medium [11].

Specifically, our interest lies in speckle patterns that occur in the far field as a result of reflection from a surface illuminated by monochromatic laser light. The speckle patterns formed when light hits a surface can carry information about its deformations and allow for imaging of changes in the illuminated surface, be they dynamic or static [8].

The work by Zalevsky et al. [12] proposed measuring the speckles in the far field (with a camera that is not focused on the object, i.e., keeping the object defocused) in order to extract various bio-medical parameters from their dynamics. There, tilting motion is converted into translational movement of the speckle patterns, which can easily be extracted via a correlation-based operation. The field in the defocused plane is given by:

$$\begin{aligned} T_m(x_o, y_o) &= \int\!\!\!\int \exp[i\phi(x, y)]\, \exp\left[\frac{\pi i}{\lambda Z_1}\left((x - x_o)^2 + (y - y_o)^2\right)\right] dx\, dy \\ &= A_m(x_o, y_o)\, \exp[i\psi(x_o, y_o)] \end{aligned}\tag{1}$$
where $(x, y)$ are the coordinates of the transversal plane, the axial axis is denoted by $Z$, and $(x_o, y_o)$ are the resulting coordinates after the free space propagation. $\exp[i\phi(x, y)]$ is the random phase of the light scattered by the surface, with $\phi$ being the random phase created by the surface roughness; this field is propagated via the Fresnel integral in free space over the distance $Z_1$ between the object and the plane captured by the imaging system, and $\lambda$ is the optical wavelength. The resulting field has an amplitude $A_m(x_o, y_o)$ and a new phase distribution $\exp[i\psi(x_o, y_o)]$. This field distribution is then imaged by our imaging system, as expressed in Eq. (2), with the intensity of the obtained speckle image being:
$$I(x_s, y_s) = \left| \int\!\!\!\int T_m(x_o, y_o)\, h(x_o - M x_s,\, y_o - M y_s)\, dx_o\, dy_o \right|^2\tag{2}$$
where h is the spatial impulse response of the imaging system, M is the inverse of the magnification of the imaging system, and $(x_s, y_s)$ is the set of sensor plane coordinates.

Imaging is equivalent to performing a convolution between the field distribution of Eq. (1) and the point spread function of the imaging system, h. Thus, instead of directly imaging the back scattered field with its random phase, the authors of [12] image a plane positioned at a distance Z1 from the back-scattering surface. The speckles that were captured are therefore not generated directly in the back-scattering plane but are rather defocused speckles, where Z1 is the defocusing distance. A toy numerical illustration of this process is given below.
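To make Eqs. (1) and (2) concrete, the following minimal numerical sketch propagates a random surface phase over a defocus distance Z1 and then images it through a finite pupil, producing a synthetic defocused speckle pattern. It is an illustration only: the grid size, pixel pitch, defocus distance and pupil cutoff are assumed values, not the parameters of the actual setup.

```python
# Illustrative simulation of Eqs. (1)-(2): Fresnel propagation of a random
# surface phase, followed by imaging through a finite pupil. Grid size, pitch,
# defocus distance Z1 and pupil cutoff are assumptions, not the paper's values.
import numpy as np

N = 512            # grid size in pixels (assumed)
dx = 10e-6         # sample pitch in meters (assumed)
lam = 1550e-9      # wavelength, matching the 1550 nm laser of Section 2
Z1 = 0.1           # defocus distance in meters (assumed)

# exp[i*phi(x, y)]: random phase imposed by the rough surface
phi = 2 * np.pi * np.random.rand(N, N)
field = np.exp(1j * phi)

# Free-space propagation over Z1 (Fresnel transfer function), giving T_m of Eq. (1)
fx = np.fft.fftfreq(N, dx)
FX, FY = np.meshgrid(fx, fx)
H = np.exp(-1j * np.pi * lam * Z1 * (FX ** 2 + FY ** 2))
Tm = np.fft.ifft2(np.fft.fft2(field) * H)

# Imaging: a circular pupil stands in for the impulse response h of Eq. (2)
pupil = (np.sqrt(FX ** 2 + FY ** 2) < 0.25 * fx.max()).astype(float)
I = np.abs(np.fft.ifft2(np.fft.fft2(Tm) * pupil)) ** 2   # speckle intensity
```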

Combining these calculations, Zalevsky et al. [12] use this fact for authenticating a person based on the extraction of heart beats from speckle recordings coupled with pulse measurements. This kind of identification approach relies on the dynamic properties of the sensed speckle patterns, which are expressed via the unique heart-beat vibration signature of the specific person.

In contrast to the work of [12], the approach presented here is centered on processing static speckle patterns created by a laser beam back scattered from the skin of the forehead. Instead of using theoretical means and deriving features from domain knowledge, we apply a set of deep neural network models [13], adapted to analyzing visual data, to perform a classification task where each class is a person we wish to identify from a predefined set. In this paper we show that the authentication of a person is very repeatable even when the person is tested under different conditions involving sweating, wetting of the forehead and more.

2. Experiment setup

The data was gathered from five human subjects in a controlled lab environment. Each person was subjected to the following experimental procedure: the room in which the speckle measurements were taken was darkened to avoid background noise, with the subject seated 50 cm from a sensor equipped with a camera recorder. With the aim of recording the forehead area with as few disturbances as possible, the face of the subject was restrained in a headset equipped with protective gear, facing the sensor, to mitigate involuntary head movements. The subject was then illuminated by a laser with an output power of 770 µW at a wavelength of 1550 nm (an invisible, eye safe laser). An image of one of the subjects participating in the data acquisition process can be seen in Fig. 1.


Fig. 1. Collecting the data. (a). Recording speckle patterns of a person. (b). Lab equipment and headset.


The camera, controlled with MATLAB, produced video at 100 frames per second, focused on the area of interest, for a duration of 10 seconds per sample. Each sample was taken at 3 resolutions: 32×32, 64×64 and 128×128. During the recording the subject remained static.

In the interest of varying the data, and to avoid ingraining in the deep learning model biases that might be inherent in the experimental procedure itself, the data gathering was carried out on separate dates, where in one continuous session each subject was recorded under 7 different physical circumstances and facial expressions: (1) neutral, (2) angry facial expression (wrinkling of the forehead skin as obtained in an angry expression), (3) smiling, (4) eyebrows arched upward, (5) sad facial expression, (6) water sprayed over the forehead, and (7) after physical exercise. Each situation was recorded at the above-mentioned resolutions. In the final phase, each video produced per subject, physical category and resolution was cut down frame by frame into its component images, as sketched below. Note that unlike in the dynamic speckle processing of [12], we did not try to correlate between the frames of the recorded video. The video was only used to extract a large number of different static speckle patterns; no temporal variation analysis was performed.
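As a rough illustration of this preprocessing step, the sketch below cuts a recorded video into individual grayscale frames and saves each frame at the three resolutions used in the experiment. The function name, paths and use of OpenCV are our assumptions; the actual pipeline in the repository may differ.

```python
# Hedged sketch of the video-to-frames step; OpenCV usage and naming are assumed.
import os
import cv2

def video_to_frames(video_path, out_dir,
                    sizes=((32, 32), (64, 64), (128, 128))):
    """Save every frame of `video_path` as grayscale PNGs at several resolutions."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:                      # end of the 10-second recording
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for w, h in sizes:
            out = cv2.resize(gray, (w, h))
            cv2.imwrite(os.path.join(out_dir, f"frame{idx:05d}_{w}x{h}.png"), out)
        idx += 1
    cap.release()
```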

The data itself contains about 300,000 frames (3 GB), with each image resolution containing about 100,000 frames. As part of the preparation for the training process, the data at each image resolution was divided into 3 sets: a training set (80% of the data) and a validation set (10% of the data), which we used in our model training process, and a test set (10% of the data), on which we ran the model after training. Each set contains randomly selected data of our 5 subjects from different times, but recorded with the same hardware, covering all the variations listed above. A split along these lines is sketched below.
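The following minimal sketch realizes such an 80/10/10 split; the array names `frames` and `labels` are ours, and stratified shuffling is one reasonable way to implement the randomized split described above.

```python
# Hedged sketch of the 80/10/10 train/validation/test split described above.
# `frames` holds the speckle images and `labels` the subject index (0-4);
# both names are assumptions.
from sklearn.model_selection import train_test_split

X_train, X_rest, y_train, y_rest = train_test_split(
    frames, labels, test_size=0.2, stratify=labels, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)
```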

3. Deep learning approach

3.1 Method overview

The nature of the problem, modeling a relation between identity and indirect measurements such as speckle patterns given in a visual data format, falls squarely within the realm of machine learning algorithms, with Deep Learning (DL) models having shown especially good performance on visual data such as pictures, as evidenced by the ImageNet challenge [14]. The architectures at the base of these models are neural networks, which are inspired by biological neurons [15]. These models are built in layers consisting of several “neurons”, where each neuron is a unit that applies a nonlinear function (in our case ReLU [16]) to the data it receives and then composes it with other neurons to pass it on to the next layer, until the data reaches the end of the network, where it is evaluated by a loss function and changes are propagated back from the end to the start of the net to adjust the learning process [17]. In that way each layer learns how to transform its input into a more complex representation and feeds it forward to the next layer, which refines the representation even further. In this manner, with each pass over the data (called an epoch) and more data fed into it, the network learns on its own, in each layer, a specified task of recognizing a certain feature or set of features (in the case of speckle patterns, visual features) relevant to the input data. The depth of the neural network refers to the number of layers it employs.

Adapting this framework to our needs, the input layer reads the speckle patterns as the frames we produced in the previous section from all the experiment subjects, all at the same image resolution. The output at the end of the final layer of the net is 5 values organized in a vector, each corresponding to one of the aforementioned classes. Each value represents the probability of the input data being taken from the class (i.e. person in the experiment) associated with it, as illustrated below.
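For instance, a softmax over the five raw output scores yields such a probability vector; the numbers below are invented purely for illustration.

```python
# Toy illustration of the 5-way output vector: a softmax turns raw scores
# (logits) into class probabilities. The logit values are invented.
import numpy as np

logits = np.array([2.1, -0.3, 0.4, -1.2, 0.0])    # one score per subject
probs = np.exp(logits) / np.exp(logits).sum()     # probabilities, sum to 1
predicted_subject = int(np.argmax(probs))         # most likely person
```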

3.2 Models for speckle patterns analysis

3.2.1 Simple neural networks

As an initial attempt at tackling the problem and assessing its feasibility, we begin by building different variations of a simple neural network model consisting of only 1-2 hidden layers. The main characteristic of this type of network is that its layers are fully connected [18]; that is, every neuron in a fully connected layer is connected to every neuron of the previous layer. Using this architecture, we construct nets which consist of a few hidden fully connected layers (layers other than the input and output layers), where the bulk of the feature processing is performed. Figure 2 is a schematic of such a network.


Fig. 2. General architecture of simple fully connected neural network.


The structure of the final fully connected model we decided to use, which yielded the best results, contains a set of blocks each consisting of a fully connected layer and a batch normalization layer [19]. The inclusion of the latter layer contributed to improving the speed, performance, and stability of the model.

A known disadvantage of fully connected layers is a strong possibility of overfitting [20], which is the inability of the trained network model to generalize to previously unseen data. In order to curb this negative effect, we employ the dropout technique [21], which, at each phase of training the model, temporarily removes individual neurons from the net with a given probability p, selected beforehand as a parameter of the training process. A sketch of such a model follows.
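The sketch below expresses a fully connected model along these lines in Keras, combining dense layers with batch normalization and dropout; the layer widths, the dropout rate and the choice of optimizer are our assumptions rather than the exact configuration used in the paper.

```python
# Hedged Keras sketch of the fully connected model: Dense + BatchNormalization
# blocks with dropout, ending in a 5-way softmax. Widths and rates are assumed.
from tensorflow.keras import layers, models

def build_fc_model(resolution=64, n_classes=5, p=0.5):
    model = models.Sequential([
        layers.Flatten(input_shape=(resolution, resolution, 1)),
        layers.Dense(512), layers.BatchNormalization(), layers.ReLU(),
        layers.Dropout(p),        # each training step drops neurons with prob. p
        layers.Dense(128), layers.BatchNormalization(), layers.ReLU(),
        layers.Dropout(p),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```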

3.2.2 Convolutional neural networks

Another popular architecture known to perform well on visual data is the convolutional neural network (CNN) [22]. The strength of CNNs is the convolution layer, where the neurons form a two-dimensional filter applied across the input data (all the combined frames of our speckles), computing the dot product between the filter and each patch of the input image it covers, so that each convolution layer learns unique visual features as it processes more time and data. In addition, CNNs use down-sampling to compress the convolved data, which might be very large, for faster operation.

The general setup we chose for our CNNs is the following hierarchical order through which the input travels:

1. a two-dimensional convolutional layer updating the input
2. a max pooling layer for down-sampling the data dimensions in a non-linear fashion [23]
3. a set of fully connected layers, as in the previous DNN model, before the output level
Figure 3 illustrates the architecture of the CNN we used; a hedged code sketch follows the figure.


Fig. 3. General architecture of CNN model.

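A minimal Keras sketch following this ordering is given below; the filter counts, kernel sizes and optimizer are illustrative assumptions, not the exact configuration reported in the paper.

```python
# Hedged Keras sketch of the CNN: convolution, max pooling, then fully
# connected layers, as listed above. Filter counts and kernel sizes are assumed.
from tensorflow.keras import layers, models

def build_cnn_model(resolution=64, n_classes=5, p=0.5):
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu",
                      input_shape=(resolution, resolution, 1)),
        layers.MaxPooling2D((2, 2)),        # non-linear down-sampling [23]
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(p),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```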

4. Experimental results

We tested our models on a separate set of samples (the test set), drawn from all our subjects, which the models had not encountered during the training process. The best model we achieved with our data reaches an accuracy of 80.1% on the test set.

Further analysis of the model achieving the highest accuracy reveals a broader picture of how well the classification of the best model from the previous section operates, as well as of its weak spots. We supply two graphs below: Fig. 4 illustrates the intra-class identification sensitivity for the distinct subcategories (physical situations) of each class (person), while Fig. 5 demonstrates the between-class identification performance, comparing for each class the correct classification rate (auto) to the false negative rate (cross).

Figure 6 presents the confusion matrix and classification statistics on our test set using the trained CNN model; the sketch below shows how such a report can be computed.
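The following is one way to produce such statistics with scikit-learn, reusing the names from the earlier sketches; it is illustrative rather than the exact evaluation script used in the paper.

```python
# Hedged sketch of producing Fig. 6-style statistics; `model`, `X_test` and
# `y_test` follow the naming of the earlier sketches.
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

y_pred = np.argmax(model.predict(X_test), axis=1)   # predicted subject per frame
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```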


Fig. 4. Accuracy rates of classifying correctly each class within every physical subset of it. Each bar represents a mean accuracy rate based on the model accuracy on randomly chosen subsets of the data. Straight lines indicate the variance of the accuracies. (a)-(e) represent the five different subjects.


Fig. 5. Accuracy rates of classifying correctly each class as opposed to false negative classification rate. (a)-(e). Five different subjects.


Fig. 6. Confusion matrix and classification report on our test-set with trained CNN model.


5. Conclusions

In this paper, we have presented a novel approach, with its experimental validation, in which photonic speckle pattern based authentication was demonstrated on a small predefined number of classes. This problem is applicable to smart home and smart car applications. The authentication was performed using static speckles, and it showed highly accurate separation between the different classes, also under different measurement conditions.

As mentioned in the paper, the last layer of each model is a fully connected layer. This means that the model knows a “closed world” that includes only the five subjects it was trained on. Therefore, for example, if a sixth person arrives, he will be classified into one of the five subjects the model was trained on. Ideally, the input of the sixth person entered into the model would receive a score low enough to be rejected, as sketched below. The different physical circumstances and facial expressions are designed to represent multiple states of a person (subject) at any given moment, that is, to create a variety of information for each subject (to avoid correlation in the data of each person). We use this diverse data to train a robust model that can recognize a person in every variation we have trained on (smiling, angry, after gymnastics, etc.).
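One simple realization of this idea, shown below as an assumption rather than a tested component of this work, is to threshold the highest softmax probability and reject inputs that fall below it.

```python
# Hedged sketch of open-set rejection: if no class probability is high enough,
# the input is treated as an unknown person. The threshold value is assumed
# and would have to be tuned on validation data.
import numpy as np

def identify_or_reject(model, frame, threshold=0.9):
    probs = model.predict(frame[np.newaxis, ...])[0]   # 5 class probabilities
    best = int(np.argmax(probs))
    return best if probs[best] >= threshold else None  # None = reject
```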

It should be noted that processing dynamic speckles using long short-term memory based neural networks could further improve the performance.

The main criterion for the proposed model is first and foremost the usefulness it provides in driving insight into questions not easily determined otherwise. In the case of speckle patterns that are hardly discernible to the human eye, the use of neural networks opens a new avenue of research with a lower entry barrier and a wide array of applications into which it can be assimilated.

The full code employed in our research, including both architectures, can be found in the GitHub repository at: https://github.com/zeevikal/speckles-classification. In addition, a custom training run can be performed on your own data with both architectures and with your own configuration and hyper-parameters, as can be seen in Fig. 7.


Fig. 7. An example of a training process configuration file, with the relevant variables and hyper-parameters. In this specific example the values in “conf_list” are: input shape, 1st layer filter size, 2nd layer filter size and the dropout layer fraction of input units to drop.
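For illustration only, a configuration along the lines of the Fig. 7 caption might look as follows; the actual file format in the repository may differ.

```python
# Hypothetical configuration mirroring the Fig. 7 caption; not the
# repository's actual format.
conf_list = [
    (64, 64, 1),   # input shape
    32,            # 1st layer filter size
    64,            # 2nd layer filter size
    0.5,           # dropout layer fraction of input units to drop
]
```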


Given that no extraordinarily tailored model was used for the problem, and no prior knowledge regarding speckle properties was required, a result such as this is very promising, and the identification of further features, such as facial features correlated with sentiment and more advanced physical conditions, can be investigated in the future.

Disclosures

The authors declare that they do not have any conflicts of interest with respect to the research described in this paper.

References

1. J. Daugman, “High confidence visual recognition of persons by a test of statistical independence,” IEEE T. Pattern Anal. 15(11), 1148–1161 (1993). [CrossRef]  

2. Y. Zhu, T. Tan, and Y. Wang, “Biometric personal identification based on iris patterns,” Proceedings 15th International Conference on Pattern Recognition 2 (2000), pp. 801–804 .

3. H. Zhaofeng, Z. Sun, T. Tan, X. Qiu, C. Zhong, and W. Dong, “Boosting Ordinal Features for Accurate and Fast Iris Recognition,” Proc. of the 26th IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2008), pp. 1–8 .

4. R. Brunelli and T. Poggio, “Face recognition: features versus templates,” IEEE T. Pattern Anal. 15(10), 1042–1052 (1993). [CrossRef]  

5. A. M. Bronstein, M. M. Bronstein, and R. Kimmel, “Three-Dimensional Face Recognition,” Int. J. Comput. Vis. 64(1), 5–30 (2005). [CrossRef]  

6. D. A. Socolinsky and A. Selinger, “Thermal face recognition in an operational scenario,” Proceedings of 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2004), pp. 1012–1019 .

7. L. Maier-Hein, A. Reinke, M. Kozubek, A. L. Martel, T. Arbel, M. Eisenmann, and J. Saez-Rodriguez, “BIAS: Transparent reporting of biomedical image analysis challenges,” arXiv preprint arXiv:1910.04071 (2019).

8. P. Zhang, Y. Zhong, Y. Deng, Z. Tang, and X. Li, “A Survey on Deep Learning of Small Sample in Biomedical Image Analysis,” arXiv preprint arXiv:1908.00473 (2019).

9. D. Sheet, S. P. K. Karri, A. Katouzian, N. Navab, A. K. Ray, and J. Chatterjee, “Deep learning of tissue specific speckle representations in optical coherence tomography and deeper exploration for in situ histology,” 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI) (IEEE, 2015).

10. Y. Rivenson, Z. Göröcs, H. Günaydin, Y. Zhang, H. Wang, and A. Ozcan, “Deep learning microscopy,” Optica 4(11), 1437–1443 (2017). [CrossRef]  

11. J. C. Dainty, Laser Speckle and Related Phenomena, 2nd ed. (Springer-Verlag, Berlin, 1989).

12. Z. Zalevsky, Y. Beiderman, I. Margalit, S. Gingold, M. Teicher, V. Mico, and J. Garcia, “Simultaneous remote extraction of multiple speech sources and heart beats from secondary speckles pattern,” Opt. Express 17(24), 21566–21580 (2009). [CrossRef]  

13. J. Schmidhuber, “Deep Learning in Neural Networks: An Overview,” Neural Networks 61, 85–117 (2015). [CrossRef]  

14. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Commun. ACM 60(6), 84–90 (2017). [CrossRef]  

15. S. Kleene, “Representation of Events in Nerve Nets and Finite Automata,” Automata Studies (AM-34) 34, 3–42 (1956). [CrossRef]  

16. V. Nair and G. E. Hinton, “Rectified Linear Units Improve Restricted Boltzmann Machines,” Proceedings of ICML, 807–814 (2010).

17. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, 2016), p. 196.

18. T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer, 2009).

19. S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167 (2015).

20. D. M. Hawkins, “The Problem of Overfitting,” J. Chem. Inf. Comput. Sci. 44(1), 1–12 (2004). [CrossRef]  

21. G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, “Improving neural networks by preventing co-adaptation of feature detectors,” arXiv:1207.0580v1 (2012).

22. Stanford’s CS231n Convolutional Neural Networks course (2018): https://cs231n.github.io/convolutional-networks/

23. B. Graham, “Fractional max-pooling,” arXiv preprint arXiv:1412.6071 (2014).
