
Human steering angle estimation in video based on key point detection and Kalman filter

  • Research Article
Control Theory and Technology

Abstract

Human pose recognition and estimation in video is widely used. However, process noise and local occlusion make pose recognition challenging. In this paper, we introduce the Kalman filter into pose recognition to reduce noise and to handle local occlusion. The core of pose recognition in video is the fast detection of key points and the calculation of human steering angles. We therefore first build a human key point detection model. Frames are skipped based on the Hamming distance between the hash values of every two adjacent frames in the video. The Kalman filter is then applied to the key point coordinates to reduce noise. To calculate the human steering angle, the current state of the key points is predicted from their optimal estimate at the previous time step, and the steering angle is then calculated from the current and previous state information. Improved SENet, NLNet and GCNet modules are integrated into the key point detection model to improve accuracy. Tests illustrate the effectiveness of the proposed algorithm.
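
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the hash function, the skip threshold, the motion model, the noise covariances, or the exact angle definition. The sketch assumes an 8×8 average hash, a hypothetical threshold of 5 bits, a constant-velocity state [x, y, vx, vy] per key point, and, purely for illustration, a steering angle defined as the change in orientation of the line joining two key points (for example, the two shoulders). All function and class names are hypothetical.

```python
import numpy as np

def average_hash(gray, size=8):
    """Assumed hash: block-mean downsample to size x size, threshold at the mean."""
    h, w = gray.shape
    # Crop so the frame divides evenly into size x size blocks.
    gray = gray[: h - h % size, : w - w % size].astype(np.float64)
    blocks = gray.reshape(size, gray.shape[0] // size, size, gray.shape[1] // size)
    small = blocks.mean(axis=(1, 3))
    return (small > small.mean()).ravel()

def hamming(a, b):
    """Number of differing bits between two boolean hash arrays."""
    return int(np.count_nonzero(a != b))

def should_skip(prev_gray, cur_gray, threshold=5):
    """Skip the current frame if it is nearly identical to the previous one.
    The threshold value is an assumption, not taken from the paper."""
    return hamming(average_hash(prev_gray), average_hash(cur_gray)) <= threshold

class KeyPointKalman:
    """Constant-velocity Kalman filter for one key point (assumed model)."""

    def __init__(self, x, y, dt=1.0, q=1e-2, r=1.0):
        self.x = np.array([x, y, 0.0, 0.0])   # state: [x, y, vx, vy]
        self.P = np.eye(4)                    # state covariance
        self.F = np.eye(4)                    # state transition
        self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.eye(2, 4)                 # only the position is measured
        self.Q = q * np.eye(4)                # process noise (assumed value)
        self.R = r * np.eye(2)                # measurement noise (assumed value)

    def predict(self):
        """Predict the current state from the previous optimal estimate."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        """Correct the prediction with a detected key point position z."""
        innovation = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]

def line_angle_deg(p_left, p_right):
    """Orientation of the line from p_left to p_right in image coordinates."""
    d = np.asarray(p_right, dtype=float) - np.asarray(p_left, dtype=float)
    return float(np.degrees(np.arctan2(d[1], d[0])))

def angle_change_deg(prev_left, prev_right, cur_left, cur_right):
    """Assumed steering angle: orientation change, wrapped to [-180, 180)."""
    delta = line_angle_deg(cur_left, cur_right) - line_angle_deg(prev_left, prev_right)
    return (delta + 180.0) % 360.0 - 180.0
```

In such a setup, when a key point is occluded in a frame, one option is to call `predict()` without `update()` so that the filter carries the estimate forward; whether the paper handles occlusion in exactly this way cannot be confirmed from the abstract alone.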



Author information


Corresponding author

Correspondence to Yinghui Wang.

Additional information

This work was supported by the National Natural Science Foundation of China (Nos. 72101026, 61621063) and the State Key Laboratory of Intelligent Control and Decision of Complex Systems.

About this article


Cite this article

Hu, Y., Liu, Y., Xu, Y. et al. Human steering angle estimation in video based on key point detection and Kalman filter. Control Theory Technol. 20, 408–417 (2022). https://doi.org/10.1007/s11768-022-00100-3


  • DOI: https://doi.org/10.1007/s11768-022-00100-3
