Abstract

We present spatial-temporal human gesture recognition in degraded conditions, including low light levels and occlusions, using a passive-sensing three-dimensional (3D) integral imaging (InIm) system and 3D correlation filters. The four-dimensional (4D) reconstructed data (lateral, longitudinal, and temporal) is processed using a variety of algorithms, including linear and nonlinear distortion-invariant filters, and compared with the previously reported space-time interest points (STIP) feature detector and 3D histogram of oriented gradients (3D HOG) feature descriptor used within a standard bag-of-features support vector machine (SVM) framework. The gesture recognition results obtained with the different classification algorithms are compared using a variety of performance metrics, such as receiver operating characteristic (ROC) curves, area under the curve (AUC), signal-to-noise ratio (SNR), probability of classification error, and confusion matrices. Integral imaging video sequences of human gestures are captured under degraded conditions, such as low light illumination and partial occlusion. A 4D reconstructed video sequence is computed that provides the lateral and depth information of the scene over time, i.e., (x, y, z, t). A total-variation denoising algorithm is applied to the signal to further reduce noise while preserving the data in the video frames. We show that the 4D signal exhibits reduced scene noise, partial occlusion removal, and improved SNR owing to the computational InIm reconstruction and/or the denoising algorithm. Finally, gesture recognition is performed by applying classification algorithms, such as distortion-invariant correlation filters, and STIP or 3D HOG features with an SVM, to the reconstructed 4D gesture signal. Experiments are conducted using a synthetic aperture InIm system under ambient light. Our experiments indicate that the proposed approach is promising for detecting human gestures in degraded conditions, such as low illumination with partial occlusion. To the best of our knowledge, this is the first report on spatial-temporal human gesture recognition in degraded conditions using passive-sensing 4D integral imaging with nonlinear correlation filters.

© 2018 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

References

1. K. R. Gibson and T. Ingold, Tools, Language and Cognition in Human Evolution (Cambridge University Press, 1994).
2. S. Mitra and T. Acharya, "Gesture Recognition: A Survey," IEEE Trans. Syst. Man Cybern. C 37(3), 311–324 (2007).
3. T. B. Moeslund and E. Granum, "A survey of computer vision-based human motion capture," Comput. Vis. Image Underst. 81(3), 231–268 (2001).
4. J. Aggarwal and M. Ryoo, "Human activity analysis: A review," ACM Comput. Surv. 43, 1–43 (2011).
5. D. Weinland, R. Ronfard, and E. Boyer, "A survey of vision-based methods for action representation, segmentation and recognition," Comput. Vis. Image Underst. 115(2), 224–241 (2011).
6. J. M. Chaquet, E. J. Carmona, and A. Fernández-Caballero, "A survey of video datasets for human action and activity recognition," Comput. Vis. Image Underst. 117(6), 633–659 (2013).
7. C. R. Wren, A. Azarbayejani, T. Darrell, and A. P. Pentland, "Pfinder: Real-time tracking of the human body," IEEE Trans. Pattern Anal. Mach. Intell. 19(7), 780–785 (1997).
8. Y. Wu and T. S. Huang, "Vision-based gesture recognition: a review," Lect. Notes Comput. Sci. 1739, 103–115 (1999).
9. F. A. Sadjadi and A. Mahalanobis, "Target-adaptive polarimetric synthetic aperture radar target discrimination using maximum average correlation height filters," Appl. Opt. 45(13), 3063–3070 (2006).
10. S. Ali and S. Lucey, "Are correlation filters useful for human action recognition?" in International Conference on Pattern Recognition (ICPR) (IEEE, 2010), pp. 2608–2611.
11. A. Mahalanobis, R. Stanfill, and K. Chen, "A Bayesian approach to activity detection in video using multi-frame correlation filters," Proc. SPIE 8049, 80490P (2011).
12. H. Kiani, T. Sim, and S. Lucey, "Multi-channel correlation filters for human action recognition," in IEEE International Conference on Image Processing (ICIP) (2014), pp. 1485–1489.
13. L. Chen, W. Hong, and J. Ferryman, "A survey of human motion analysis using depth imagery," Pattern Recognit. Lett. 34(15), 1995–2006 (2013).
14. L. L. Presti and M. La Cascia, "3D skeleton-based human action classification: A survey," Pattern Recognit. 53, 130–147 (2016).
15. G. Lippmann, "Épreuves réversibles donnant la sensation du relief," J. Phys. Theor. Appl. 7(1), 821–825 (1908).
16. H. E. Ives, "Optical properties of a Lippmann lenticulated sheet," J. Opt. Soc. Am. 21(3), 171–176 (1931).
17. C. B. Burckhardt, "Optimum parameters and resolution limitation of integral photography," J. Opt. Soc. Am. 58(1), 71–76 (1968).
18. Y. Igarishi, H. Murata, and M. Ueda, "3D display system using a computer-generated integral photograph," Jpn. J. Appl. Phys. 17(9), 1683–1684 (1978).
19. T. Okoshi, "Three-dimensional displays," Proc. IEEE 68(5), 548–564 (1980).
20. J. Arai, F. Okano, H. Hoshino, and I. Yuyama, "Gradient-index lens-array method based on real-time integral photography for three-dimensional images," Appl. Opt. 37(11), 2034–2045 (1998).
21. H. Hoshino, F. Okano, H. Isono, and I. Yuyama, "Analysis of resolution limitation of integral photography," J. Opt. Soc. Am. A 15(8), 2059–2065 (1998).
22. F. Okano, J. Arai, K. Mitani, and M. Okui, "Real-time integral imaging based on extremely high resolution video system," Proc. IEEE 94(3), 490–501 (2006).
23. X. Xiao, B. Javidi, M. Martinez-Corral, and A. Stern, "Advances in three-dimensional integral imaging: Sensing, display, and applications," Appl. Opt. 52(4), 546–560 (2013).
24. B. Javidi, X. Shen, A. S. Markman, P. Latorre-Carmona, A. Martínez-Uso, J. M. Sotoca, F. Pla, M. Martínez-Corral, G. Saavedra, Y. P. Huang, and A. Stern, "Multidimensional Optical Sensing and Imaging System (MOSIS): From Macroscales to Microscales," Proc. IEEE 105(5), 850–875 (2017).
25. B. Javidi, R. Ponce-Díaz, and S. H. Hong, "Three-dimensional recognition of occluded objects by using computational integral imaging," Opt. Lett. 31(8), 1106–1108 (2006).
26. A. Stern, D. Aloni, and B. Javidi, "Experiments with three-dimensional integral imaging under low light levels," IEEE Photonics J. 4(4), 1188–1195 (2012).
27. A. Markman, X. Shen, and B. Javidi, "Three-dimensional object visualization and detection in low light illumination using integral imaging," Opt. Lett. 42(16), 3068–3071 (2017).
28. I. Moon and B. Javidi, "Three-dimensional visualization of objects in scattering medium by use of computational integral imaging," Opt. Express 16(17), 13080–13089 (2008).
29. Y.-K. Lee and H. Yoo, "Three-dimensional visualization of objects in scattering medium using integral imaging and spectral analysis," Opt. Lasers Eng. 77, 31–38 (2016).
30. M. Cho and B. Javidi, "Peplography-a passive 3D photon counting imaging through scattering media," Opt. Lett. 41(22), 5401–5404 (2016).
31. J.-S. Jang and B. Javidi, "Three-dimensional synthetic aperture integral imaging," Opt. Lett. 27(13), 1144–1146 (2002).
32. S. H. Hong, J. S. Jang, and B. Javidi, "Three-dimensional volumetric object reconstruction using computational integral imaging," Opt. Express 12(3), 483–491 (2004).
33. H. Yoo, "Artifact analysis and image enhancement in three-dimensional computational integral imaging using smooth windowing technique," Opt. Lett. 36(11), 2107–2109 (2011).
34. V. Javier Traver, P. Latorre-Carmona, E. Salvador-Balaguer, F. Pla, and B. Javidi, "Human gesture recognition using three-dimensional integral imaging," J. Opt. Soc. Am. A 31(10), 2312–2320 (2014).
35. V. Javier Traver, P. Latorre-Carmona, E. Salvador-Balaguer, F. Pla, and B. Javidi, "Three-dimensional integral imaging for gesture recognition under occlusions," IEEE Signal Process. Lett. 24(2), 171–175 (2017).
36. I. Laptev, "On space-time interest points," Int. J. Comput. Vis. 64(2–3), 107–123 (2005).
37. A. Klaser, M. Marszałek, and C. Schmid, "A spatio-temporal descriptor based on 3D-gradients," in 19th British Machine Vision Conference (BMVC) (2008), p. 275.
38. R. Dupre, V. Argyriou, D. Greenhill, and G. Tzimiropoulos, "A 3D Scene Analysis Framework and Descriptors for Risk Evaluation," in International Conference on 3D Vision (3DV) (2015), pp. 100–108.
39. H. Wang, M. M. Ullah, A. Klaser, I. Laptev, and C. Schmid, "Evaluation of local spatio-temporal features for action recognition," in British Machine Vision Conference (BMVC) (2009), pp. 124.1–124.11.
40. L. I. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Physica D 60(1–4), 259–268 (1992).
41. B. Javidi and J. Wang, "Optimum distortion-invariant filter for detecting a noisy distorted target in nonoverlapping background noise," J. Opt. Soc. Am. A 12(12), 2604–2614 (1995).
42. B. Javidi and D. Painchaud, "Distortion-invariant pattern recognition with Fourier-plane nonlinear filters," Appl. Opt. 35(2), 318–331 (1996).
43. B. Javidi, "Nonlinear joint power spectrum based optical correlation," Appl. Opt. 28(12), 2358–2367 (1989).

Figures (7)

Fig. 1 Concept of 3D integral imaging: (a) Synthetic aperture integral imaging (SAII) [31], and (b) computational reconstruction of integral imaging [32].
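
To make the computational reconstruction in Fig. 1(b) concrete, the following is a minimal Python/NumPy sketch (not the authors' code) of reconstructing one depth slice by shifting and averaging the elemental images of a single video frame; the parameter names (pitch, pixel_size, focal_length) and the magnification model M = z/focal_length are illustrative assumptions.

import numpy as np

def reconstruct_depth(elemental_images, z, pitch, pixel_size, focal_length):
    # elemental_images: (K, L, H, W) array, one image per camera in the K x L array
    K, L, H, W = elemental_images.shape
    M = z / focal_length  # magnification at reconstruction depth z (assumed model)
    recon = np.zeros((H, W), dtype=float)
    for i in range(K):
        for j in range(L):
            # pixel shift grows with the camera index and shrinks with magnification
            sx = int(round(i * pitch / (M * pixel_size)))
            sy = int(round(j * pitch / (M * pixel_size)))
            # np.roll wraps at the borders; a zero-padded shift with an explicit
            # overlap map O(x, y) gives the exact normalization of Eq. (1)
            recon += np.roll(elemental_images[i, j], shift=(-sy, -sx), axis=(0, 1))
    return recon / (K * L)

Repeating this per frame and per depth yields the 4D (x, y, z, t) data used throughout the paper.
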
Fig. 2 Flow chart of the proposed 3D correlation method for human gesture recognition. SAII = synthetic aperture integral imaging; FFT = Fast Fourier Transform; Thresh. = threshold; POE = correlation peak-to-output-energy ratio [see Eq. (7)].
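
As a hedged sketch of the correlation and detection stages in this flow chart (not the authors' implementation), the snippet below applies a kth-law nonlinearity (k = 0.3, as used for Figs. 6 and 7) to the Fourier magnitudes of a reconstructed (x, y, t) gesture cube and a same-sized training template, and scores the output with a peak-to-output-energy ratio.

import numpy as np

def nonlinear_correlation(scene, template, k=0.3):
    # kth-law correlation of two same-sized 3D (x, y, t) arrays
    S = np.fft.fftn(scene)
    T = np.fft.fftn(template)
    # the nonlinearity acts on the magnitudes; the phase difference is retained
    spectrum = (np.abs(S) * np.abs(T)) ** k * np.exp(1j * (np.angle(S) - np.angle(T)))
    return np.fft.ifftn(spectrum)

def peak_to_output_energy(corr):
    # ratio of the squared correlation peak to the mean output energy
    energy = np.abs(corr) ** 2
    return energy.max() / energy.mean()

# usage sketch: declare the target gesture when POE exceeds a chosen threshold
# poe = peak_to_output_energy(nonlinear_correlation(recon_cube, template_cube, k=0.3))
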
Fig. 3 (a) A 3 × 3 camera array for synthetic aperture integral imaging (SAII) in the human gesture recognition experiment. (b) Examples of the training video frames. (c) Example of a true class test video frame. (d) Examples of the false class test gestures.
Fig. 4 Example of a partially occluded human gesture under regular ambient illumination conditions: (a) (i-iii) Three captured 2D elemental images from a single video sequence, and (b) (i-iii) the corresponding 4D (x, y, t; z) integral imaging reconstructed frames with occlusion removal at a fixed reconstruction depth z.
Fig. 5 Image frames for a partially occluded human finger gesture [see Fig. 4] under low light illumination conditions: (a) (i-iii) Three separate 2D elemental images from the captured video sequence, (b) (i-iii) 2D elemental images after the total variation (TV) denoising algorithm, (c) (i-iii) integral imaging reconstructed images using 9 perspective video data sets, and (d) (i-iii) integral imaging reconstructed images with the TV denoising algorithm. [a-d (iv)] One-dimensional intensity profiles of the finger along the yellow lines in [a-d (ii)].
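
The total-variation denoising compared in Fig. 5 follows the Rudin-Osher-Fatemi model [40]; a minimal sketch using scikit-image's Chambolle solver is shown below, with the regularization weight assumed rather than taken from the paper.

import numpy as np
from skimage.restoration import denoise_tv_chambolle

def tv_denoise_video(frames, weight=0.1):
    # frame-by-frame TV denoising of a (T, H, W) grayscale video array
    return np.stack([denoise_tv_chambolle(frame, weight=weight) for frame in frames])
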
Fig. 6 Receiver operating characteristic (ROC) curves for human gesture recognition using the optimum linear distortion-invariant filter and nonlinear transformations of the filter under degraded conditions. (a) Linear correlation process, k = 1: integral imaging reconstructed video with the TV algorithm (InIm + TV, green dashed line). (b) Nonlinear correlation process, k = 0.3 [see Eq. (6)]: (i) captured 2D video data (EI, black solid line), (ii) captured 2D video with the TV algorithm (EI + TV, magenta dashed line), (iii) integral imaging reconstructed video (InIm, blue dash-dotted line), and (iv) integral imaging reconstructed video with the TV algorithm (InIm + TV, red solid line).
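
For reference, ROC curves and AUC values like those in Figs. 6 and 7 can be computed from per-video correlation scores with scikit-learn; the labels and scores below are placeholders, not the experimental data.

import numpy as np
from sklearn.metrics import roc_curve, auc

labels = np.array([1, 1, 1, 0, 0, 0])               # 1 = true-class gesture (placeholder)
scores = np.array([9.1, 7.4, 8.2, 2.3, 3.0, 1.8])   # POE scores (placeholder)
fpr, tpr, _ = roc_curve(labels, scores)
print("AUC =", auc(fpr, tpr))
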
Fig. 7 Receiver operating characteristic (ROC) curves for human gesture recognition using the nonlinear distortion-invariant filter under degraded conditions with k = 0.3 [see Eq. (6)]: (i) captured 2D video data (EI, black solid line), (ii) captured 2D video with the TV algorithm (EI + TV, magenta dashed line), (iii) integral imaging reconstructed video (InIm, blue dash-dotted line), (iv) integral imaging reconstructed video with the TV algorithm (InIm + TV, red solid line); and (v) integral imaging reconstructed video with the TV algorithm, where the correlation filter is trained on the averaged template videos as described at the end of Section 3.2 (InIm + TV, brown dotted line).

Tables (1)

Table 1 Confusion matrix for a variety of human gesture recognition and classification algorithms. TP = True Positive, TN = True Negative, FP = False Positive, FN = False Negative, Tot. = Total
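
A small helper of the kind below (hypothetical, with placeholder counts) turns such a confusion matrix into the summary rates discussed alongside the ROC results.

def summarize_confusion(tp, tn, fp, fn):
    # accuracy, true-positive rate, and false-positive rate from raw counts
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "true_positive_rate": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
    }

print(summarize_confusion(tp=18, tn=17, fp=3, fn=2))  # placeholder counts
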

Equations (8)

$$\mathrm{Recon}(x,y,z,t)=\frac{1}{O(x,y,t)}\sum_{i=0}^{K-1}\sum_{j=0}^{L-1}EI_{i,j}\left(x-i\frac{r_{x}\times p_{x}}{M\times d_{x}},\; y-j\frac{r_{y}\times p_{y}}{M\times d_{y}},\; t\right) \qquad (1)$$
$$s(p)=FT^{-1}\left[S_{vec}(\omega)\right]=\mathrm{vec}\left[s(x,y,t)\right]=\sum_{i=1}^{N}a_{i}r_{i}(p-\tau)+n_{b}(p)\left[w_{0}(p)-\sum_{i=1}^{N}a_{i}w_{ri}(p-\tau)\right]+n_{a}(p)\,w_{0}(p) \qquad (2)$$
$$H_{opt}^{*}(\omega)=\frac{E\left[S(\omega,\tau)\exp(j\omega\tau)\right]}{E\left[\left|S(\omega,\tau)\right|^{2}\right]}=\frac{\sum_{i=1}^{N}\left[R_{i}(\omega)+m_{b}W_{1i}(\omega)+m_{a}\left|W_{0}(\omega)\right|^{2}/d\right]}{\sum_{i=1}^{N}\left\{\left|R_{i}(\omega)+m_{b}W_{1i}(\omega)+\frac{m_{a}\left|W_{0}(\omega)\right|^{2}}{d}\right|^{2}+\frac{W_{2i}(\omega)N_{b_{0}}(\omega)}{2\pi}+\frac{\left|W_{0}(\omega)\right|^{2}N_{a_{0}}(\omega)}{2\pi}+(m_{a}+m_{b})^{2}\left[W_{2i}(\omega)-\left|W_{1i}(\omega)\right|^{2}\right]\right\}} \qquad (3)$$
$$\mathbf{v}_{k}=\left[\,\left|v_{1}\right|^{k}\exp(j\phi_{1}),\ \left|v_{2}\right|^{k}\exp(j\phi_{2}),\ \ldots,\ \left|v_{d}\right|^{k}\exp(j\phi_{d})\,\right]^{T} \qquad (4)$$
$$H_{k}(\omega)=\left\{\mathbf{S}_{k}\left(\left[\mathbf{S}_{k}\right]^{+}\mathbf{S}_{k}\right)^{-1}\mathbf{c}^{*}\right\}^{1/k} \qquad (5)$$
$$g(x,y,t;z)=FT^{-1}\left\{\left[\mathrm{abs}(H)\,\mathrm{abs}(T)\right]^{k}\exp\left[j\left(\phi_{T}-\phi_{H}\right)\right]\right\},\qquad k\in[0,1] \qquad (6)$$
$$POE=\left|E\left[g(\tau,\tau)\right]\right|^{2}\Big/\,E\left\{\overline{\left[g(p,\tau)\right]^{2}}\right\} \qquad (7)$$
$$N_{photons}=\phi\,t_{e}=\mathrm{SNR}\times n_{r}/\eta \qquad (8)$$
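
As a hedged numeric illustration of Eq. (8), the snippet below estimates the photon count needed to reach a target SNR for an assumed read noise and quantum efficiency; the values are examples, not the experimental parameters.

snr_target = 10.0   # desired signal-to-noise ratio (assumed)
read_noise = 5.0    # sensor read noise n_r in electrons (assumed)
eta = 0.5           # quantum efficiency (assumed)
n_photons = snr_target * read_noise / eta
print(f"N_photons ~ {n_photons:.0f}")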
