Abstract

Human eye-fixation prediction in 3D images is important for many 3D applications, such as fine-grained 3D video object segmentation and intelligent bulletproof curtains. Most existing 2D-based approaches cannot be applied directly, because the main challenge lies in the inconsistency, or even conflict, between the RGB and depth saliency maps. In this paper, we propose a three-stream architecture to accurately predict human visual attention on 3D images in an end-to-end manner. First, a two-stream feature extraction network based on advanced convolutional neural networks is trained for RGB and depth, and hierarchical information is extracted from each ResNet-18. Then, these multi-level features are fed into a channel attention mechanism that suppresses the inconsistency between the two feature spaces and makes the network focus on salient targets. The enhanced feature maps are fused step by step by a VGG-16 network to generate the final coarse saliency map. Finally, each coarse map is refined through refinement blocks, which correct the network's own prediction errors based on the learned knowledge and thereby convert the predicted saliency map from coarse to fine. Comparisons with six other state-of-the-art approaches on the NUS dataset (CC of 0.5579, KLDiv of 1.0903, AUC of 0.8339, and NSS of 2.3373) and the NCTU dataset (CC of 0.8614, KLDiv of 0.2681, AUC of 0.9143, and NSS of 2.3795) indicate that the proposed model, by fully exploiting the channel attention mechanism, consistently outperforms them by a considerable margin.
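The abstract describes a pipeline of two ResNet-18 streams (RGB and depth), channel attention, VGG-16-based fusion, and refinement blocks. As a rough orientation only, the sketch below shows the two-stream hierarchical feature extraction stage using torchvision ResNet-18 backbones; the module name TwoStreamEncoder and the assumption that the depth map is replicated to three channels are illustrative and not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torchvision.models as models


class TwoStreamEncoder(nn.Module):
    """Two ResNet-18 branches returning hierarchical features for RGB and depth."""

    def __init__(self):
        super().__init__()
        # Randomly initialized here; the paper trains its backbones on saliency data.
        self.rgb_net = models.resnet18()
        self.depth_net = models.resnet18()

    @staticmethod
    def _hierarchy(net, x):
        # Standard torchvision ResNet stem followed by the four residual stages.
        x = net.maxpool(net.relu(net.bn1(net.conv1(x))))
        f1 = net.layer1(x)   # low-level features
        f2 = net.layer2(f1)
        f3 = net.layer3(f2)
        f4 = net.layer4(f3)  # high-level semantic features
        return [f1, f2, f3, f4]

    def forward(self, rgb, depth):
        # The depth map is assumed to be replicated to three channels so the same stem applies.
        return self._hierarchy(self.rgb_net, rgb), self._hierarchy(self.depth_net, depth)


if __name__ == "__main__":
    enc = TwoStreamEncoder()
    rgb = torch.randn(1, 3, 224, 224)
    depth = torch.randn(1, 3, 224, 224)
    rgb_feats, depth_feats = enc(rgb, depth)
    print([f.shape for f in rgb_feats])  # four feature maps at decreasing resolution
```

In the full model, these multi-level RGB and depth features would then pass through the channel attention and fusion stages summarized in the equations listed below.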

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement


References


1. M. Liu, C. Lu, H. Li, and X. Liu, “Near eye light field display based on human visual features,” Opt. Express 25(9), 9886–9900 (2017).
2. C. E. Connor, H. E. Egeth, and S. Yantis, “Visual attention: bottom-up versus top-down,” Curr. Biol. 14(19), R850–R852 (2004).
3. M. DaneshPanah, B. Javidi, and E. A. Watson, “Three dimensional object recognition with photon counting imagery in the presence of noise,” Opt. Express 18(25), 26450–26460 (2010).
4. K. H. Yoon, M. K. Kang, H. Lee, and S. K. Kim, “Autostereoscopic 3D display system with dynamic fusion of the viewing zone under eye tracking: principles, setup, and evaluation,” Appl. Opt. 57(1), A101–A117 (2018).
5. B. Kan, Y. Zhao, and S. Wang, “Objective visual comfort evaluation method based on disparity information and motion for stereoscopic video,” Opt. Express 26(9), 11418–11437 (2018).
6. W. Zhou, L. Yu, W. Qiu, Y. Zhou, and M. Wu, “Local gradient patterns (LGP): an effective local-statistical-feature extraction scheme for no-reference image quality assessment,” Inf. Sci. 397–398, 1–14 (2017).
7. X. Huang, C. Shen, X. Boix, and Q. Zhao, “SALICON: reducing the semantic gap in saliency prediction by adapting deep neural networks,” in International Conference on Computer Vision (ICCV) (IEEE, 2015), pp. 262–270.
8. S. S. Kruthiventi, K. Ayush, and R. Babu, “DeepFix: a fully convolutional neural network for predicting human eye fixations,” IEEE Trans. Image Process. 26(9), 4446–4456 (2017).
9. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556 (2014).
10. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Computer Vision and Pattern Recognition (CVPR) (IEEE, 2015), pp. 1–9.
11. F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” arXiv preprint arXiv:1511.07122 (2015).
12. J. Pan, C. C. Ferrer, K. McGuinness, N. E. O’Connor, J. Torres, E. Sayrol, and X. Giro-i Nieto, “SalGAN: visual saliency prediction with generative adversarial networks,” arXiv preprint arXiv:1701.01081 (2017).
13. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems, 2672–2680 (2014).
14. W. Wang and J. Shen, “Deep visual attention prediction,” IEEE Trans. Image Process. 27(5), 2368–2378 (2018).
15. M. Cornia, L. Baraldi, G. Serra, and R. Cucchiara, “Predicting human eye fixations via an LSTM-based saliency attentive model,” IEEE Trans. Image Process. 27(10), 5142–5154 (2018).
16. A. Kroner, M. Senden, K. Driessens, and R. Goebel, “Contextual encoder-decoder network for visual saliency prediction,” arXiv preprint arXiv:1902.06634 (2019).
17. Z. Che, A. Borji, G. Zhai, X. Min, G. Guo, and P. Callet, “Leverage eye-movement data for saliency modeling: invariance analysis and a robust new model,” arXiv preprint arXiv:1905.06803 (2019).
18. Q. Zhang, X. Wang, J. Jiang, and L. Ma, “Deep learning features inspired saliency detection of 3D images,” in Pacific Rim Conference on Multimedia, 580–589 (2016).
19. B. Li, Q. Liu, X. Shi, and Y. Yang, “Graph-based saliency fusion with superpixel-level belief propagation for 3D fixation prediction,” in IEEE International Conference on Image Processing (ICIP) (IEEE, 2018), pp. 2321–2325.
20. W. Wang, H. Song, S. Zhao, J. Shen, S. Zhao, S. C. Hoi, and H. Ling, “Learning unsupervised video object segmentation through visual attention,” in Computer Vision and Pattern Recognition (CVPR) (IEEE, 2019), pp. 3064–3074.
21. M.-X. Jiang, C. Deng, J.-S. Shan, Y.-Y. Wang, Y.-J. Jia, and X. Sun, “Hierarchical multi-modal fusion FCN with attention model for RGB-D tracking,” Information Fusion 50, 1–8 (2019).
22. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Computer Vision and Pattern Recognition (CVPR) (IEEE, 2016), pp. 770–778.
23. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 1097–1105 (2012).
24. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in NIPS, 2012.
25. S. Ioffe and C. Szegedy, “Batch normalization: accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167 (2015).
26. C. Lang, T. Nguyen, H. Katti, K. Yadati, M. Kankanhalli, and S. Yan, “Depth matters: influence of depth cues on visual saliency,” in European Conference on Computer Vision (ECCV) (Springer, 2012), pp. 101–115.
27. C.-Y. Ma and H. Hang, “Learning-based saliency model with depth information,” J. Vis. 15(6), 19 (2015).
28. L. Prechelt, “Early stopping - but when?” in Neural Networks: Tricks of the Trade, Lecture Notes in Computer Science 1524, 55–69 (1998).
29. A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” in NIPS Workshop (NIPS-W), 2017.
30. Y. Fang, J. Wang, M. Narwaria, P. Le Callet, and W. Lin, “Saliency detection for stereoscopic images,” IEEE Trans. Image Process. 23(6), 2625–2636 (2014).
31. M. Cornia, L. Baraldi, G. Serra, and R. Cucchiara, “A deep multi-level network for saliency prediction,” in International Conference on Pattern Recognition (ICPR), 3488–3493 (2016).
32. H. R. Tavakoli, A. Borji, J. Laaksonen, and E. J. N. Rahtu, “Exploiting inter-image similarity and ensemble of extreme learners for fixation prediction using deep features,” Neurocomputing 244, 10–18 (2017).


Figures (5)

Fig. 1. Architecture of the proposed human attention prediction network.
Fig. 2. Illustration of the channel-wise attention mechanism.
Fig. 3. Structure of the recurrent convolutional upsample layer (RCUL).
Fig. 4. Qualitative results: we compare our results with six saliency prediction models. Column 1: original left images; Column 2: ground truth; Column 3: the proposed model; Columns 4–9: saliency maps from [29,8,30,14,31,15].
Fig. 5. Visual comparison with different modules. The meaning of the indexes is given in the caption of Table 2.

Tables (2)

Table 1. Comparison of quantitative scores on NUS [26] and NCTU [27] datasets.
Table 2. Ablation studies of different modules.

Equations (12)


$X_m = \mathrm{Re}(F_m)\,\mathrm{Re}(F_m)^{T}$
$W_m = \delta\left[\mathrm{Max}(X_m, 1) - X_m\right]$
$A_m = (W_m F_m) + \gamma F_m$
$Z = \beta\big(\sigma(\mathrm{Conv}(A_m))\big)$
$S_{t+1} = \beta\big(\sigma(\mu(Z) + \mu(\mathrm{Re}(S_t)))\big)$
$M = \mathrm{sig}\big(\mathrm{Conv}(S_3)\big)$
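For concreteness, here is a hedged PyTorch sketch of the channel-attention step in Eqs. (1)-(3). It assumes Re(·) is a reshape to C x (H*W), delta is a softmax over the channel dimension, and gamma is a learnable scalar; these readings follow common channel-attention implementations and are not taken verbatim from this paper.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Channel attention roughly following Eqs. (1)-(3)."""

    def __init__(self):
        super().__init__()
        # gamma in Eq. (3), initialized to zero as in common implementations (assumption)
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, f):                                      # f: (B, C, H, W)
        b, c, h, w = f.shape
        flat = f.view(b, c, -1)                                # Re(F_m): reshape to (B, C, H*W)
        energy = torch.bmm(flat, flat.transpose(1, 2))         # Eq. (1): X_m = Re(F_m) Re(F_m)^T
        energy = energy.max(dim=-1, keepdim=True)[0] - energy  # Max(X_m, 1) - X_m
        attn = torch.softmax(energy, dim=-1)                   # Eq. (2): W_m, with delta read as softmax
        out = torch.bmm(attn, flat).view(b, c, h, w)           # W_m F_m
        return out + self.gamma * f                            # Eq. (3): A_m


if __name__ == "__main__":
    x = torch.randn(2, 64, 28, 28)
    print(ChannelAttention()(x).shape)  # torch.Size([2, 64, 28, 28])
```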
$CC(P, Q) = 1 - \frac{\sigma(P, Q)}{\sigma(P) \times \sigma(Q)}$
$L = \frac{1}{N}\sum_{i=1}^{n} \| P - Q \|^{2} + 1 - \frac{\sigma(P, Q)}{\sigma(P) \times \sigma(Q)}$
$CC = \frac{\mathrm{cov}(S, G)}{\sigma(S) \times \sigma(G)}$
$NSS = \frac{1}{N}\sum_{i=1}^{N} \bar{S}(i) \times Q(i)$
$\text{where } N = \sum_{i} Q(i) \text{ and } \bar{S} = \frac{S - \mu(S)}{\sigma(S)}$
$D(S, Q) = \sum_{i=1}^{n} P_S(x) \log \frac{P_S(x)}{P_Q(x)}$
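A minimal NumPy sketch of the evaluation metrics in Eqs. (9)-(12) follows. S is a predicted saliency map, G a continuous ground-truth map, and Q a binary fixation map; the epsilon terms guard against division by zero and are an implementation detail, not part of the equations. The KL divergence is written in the direction printed in Eq. (12), with P_S in front; some benchmark toolkits use the opposite direction.

```python
import numpy as np


def cc(s, g):
    """Linear correlation coefficient between saliency map s and ground truth g, Eq. (9)."""
    s = (s - s.mean()) / (s.std() + 1e-8)
    g = (g - g.mean()) / (g.std() + 1e-8)
    return float((s * g).mean())


def nss(s, q):
    """Normalized scanpath saliency, Eqs. (10)-(11); q is a binary fixation map."""
    s_bar = (s - s.mean()) / (s.std() + 1e-8)
    return float(s_bar[q > 0].mean())


def kldiv(s, q):
    """KL divergence D(S, Q) with both maps normalized to distributions, Eq. (12)."""
    p_s = s / (s.sum() + 1e-8)
    p_q = q / (q.sum() + 1e-8)
    return float(np.sum(p_s * np.log((p_s + 1e-8) / (p_q + 1e-8))))


if __name__ == "__main__":
    pred = np.random.rand(48, 64)
    gt = np.random.rand(48, 64)
    fix = (np.random.rand(48, 64) > 0.98).astype(float)
    print(cc(pred, gt), nss(pred, fix), kldiv(pred, gt))
```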
