Effect of Positive-Negative Image Ratio on the Performance of Pedestrian Detection Model

Authors

  • Lai Kok Yee Malaysia–Japan International Institute of Technology (MJIIT), Universiti Teknologi Malaysia, Jalan Sultan Yahya Petra, 54100 Kuala Lumpur, Malaysia
  • Tan Lit Ken Malaysia–Japan International Institute of Technology (MJIIT), Universiti Teknologi Malaysia, Jalan Sultan Yahya Petra, 54100 Kuala Lumpur, Malaysia
  • Hau Sim Choo Malaysia–Japan International Institute of Technology (MJIIT), Universiti Teknologi Malaysia, Jalan Sultan Yahya Petra, 54100 Kuala Lumpur, Malaysia
  • Yutaka Asako Malaysia–Japan International Institute of Technology (MJIIT), Universiti Teknologi Malaysia, Jalan Sultan Yahya Petra, 54100 Kuala Lumpur, Malaysia
  • Lee Kee Quen Malaysia–Japan International Institute of Technology (MJIIT), Universiti Teknologi Malaysia, Jalan Sultan Yahya Petra, 54100 Kuala Lumpur, Malaysia
  • Hooi-Siang Kang Marine Technology Center, Institute for Vehicle System & Engineering, School of Mechanical Engineering, Universiti Teknologi Malaysia, 81310 UTM Johor Bahru, Johor, Malaysia
  • Y. S. Gan School of Architecture, Feng Chia University, Taichung 40724, Taiwan R.O.C
  • Zun-Liang Chuan Faculty of Industrial Sciences and Technology, Universiti Malaysia Pahang
  • Wah Yen Tey Department of Mechanical Engineering, UCSI University, Cheras, Kuala Lumpur, Malaysia
  • Nor Azwadi Che Sidik Malaysia–Japan International Institute of Technology (MJIIT), Universiti Teknologi Malaysia, Jalan Sultan Yahya Petra, 54100 Kuala Lumpur, Malaysia

DOI:

https://doi.org/10.11113/mjfas.v20n2.3300

Keywords:

Pedestrian detection, ratio, Histogram of Oriented Gradient, Support Vector Machines, Medium Neural Network

Abstract

Pedestrian detection is an important task in computer vision, with applications in video surveillance, human-computer interaction, and autonomous vehicles. Little research, however, has addressed the optimal ratio of positive to negative images for training detection models. This study addresses that gap by training several detection models and determining the ratio that works best. Two scenarios are investigated, one with an equal total image count and one with an equal number of positive images, using images from the CVC-14 night/visible, CVC-14 night/FIR, and INRIA databases. Features are extracted with the Histogram of Oriented Gradients (HOG), and the detection models are built with both Support Vector Machines (SVM) and Medium Neural Networks. The experiments show that model accuracy remains relatively stable as the proportion of negative images increases, whereas sensitivity and specificity move in opposite directions as the ratio rises. Based on the Youden index, the optimal training ratio for pedestrian detection models falls within the range of 1:0.5 to 1:2. In the CVC-14 nighttime database, the Youden index reached 100% when the model was trained with SVM at a ratio of 1:0.5 using a total of 4500 training images. In the INRIA dataset, the Youden index peaked at 98.50% when either SVM or a Medium Neural Network was trained at a ratio of 1:2 using a total of 3000 training images. The processing time for SVM models is shorter than that of Medium Neural Networks, because medium-sized neural networks have a higher computational complexity and are therefore more demanding than SVMs. This study contributes insights into the relationship between the positive-negative image ratio and the performance of pedestrian detection models.
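
The two quantities the abstract relies on, the positive-negative ratio and the Youden index, can be illustrated with a minimal Python sketch. It is not the authors' code. The function names are hypothetical, and the sketch assumes that a ratio of 1:r means r negative images per positive image and that the stated total is split exactly between the two classes; the abstract does not spell out either point. The Youden index is the standard J = sensitivity + specificity - 1, which the abstract reports as a percentage.

```python
from fractions import Fraction


def split_counts(total, neg_per_pos):
    """Split a fixed image budget into (positives, negatives) for a 1:r ratio.

    Assumption (not stated in the abstract): 1:r means r negatives per positive.
    """
    r = Fraction(str(neg_per_pos))        # exact arithmetic, e.g. "0.5" -> 1/2
    positives = Fraction(total) / (1 + r)
    if positives.denominator != 1:
        raise ValueError("total is not evenly divisible for this ratio")
    positives = int(positives)
    return positives, total - positives


def youden_index(tp, fn, tn, fp):
    """Return Youden's J = sensitivity + specificity - 1 from confusion counts."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    return sensitivity + specificity - 1


if __name__ == "__main__":
    # Ratios and totals taken from the abstract; the resulting splits are
    # a consequence of the assumption above, not figures from the paper.
    print(split_counts(4500, 0.5))
    print(split_counts(3000, 2))
    # Hypothetical confusion counts, for illustration only.
    print(youden_index(tp=95, fn=5, tn=90, fp=10))
```

Because J weights sensitivity and specificity equally, it penalizes a model that gains one at the expense of the other, which is why it is a more informative selection criterion here than accuracy alone.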

Published

24-04-2024