Multimodal Convolutional Neural Networks for Sperm Motility and Concentration Predictions


  • Voon Hueh Goh Department of Biomedical Engineering and Health Sciences, Faculty of Electrical Engineering, Universiti Teknologi Malaysia, 81310 UTM Johor Bahru, Johor, Malaysia
  • Muhammad Asraf Mansor Department of Biomedical Engineering and Health Sciences, Faculty of Electrical Engineering, Universiti Teknologi Malaysia, 81310 UTM Johor Bahru, Johor, Malaysia
  • Muhammad Amir As'ari ᵃDepartment of Biomedical Engineering and Health Sciences, Faculty of Electrical Engineering, Universiti Teknologi Malaysia, 81310 UTM Johor Bahru, Johor, Malaysia; ᵇSport Innovation and Technology Centre (SITC), Universiti Teknologi Malaysia, 81310 UTM Johor Bahru, Johor, Malaysia
  • Lukman Hakim Ismail Department of Biomedical Engineering and Health Sciences, Faculty of Electrical Engineering, Universiti Teknologi Malaysia, 81310 UTM Johor Bahru, Johor, Malaysia



Sperm parameters prediction, Semen analysis, 3DCNN, ResNet18, Multimodal learning


Semen analysis is central to the primary investigation of male infertility, and it is conventionally performed manually. Manual semen analysis, however, has been shown to have limited accuracy and precision because of noncompliance with guidelines and procedures. Sperm motility and concentration are the main indicators of pregnancy and conception rates, so they were selected as the parameters to predict. Convolutional neural networks (CNNs) have been widely applied to computer vision tasks in recent years. In this paper, a three-dimensional CNN (3DCNN) was designed to extract the motion and temporal features that are vital for sperm motility prediction. For sperm concentration, because two-dimensional CNNs (2DCNNs) are efficient at recognizing and extracting spatial features, the well-established Residual Network (ResNet) architecture was adopted and customized. Multimodal learning aggregates the features learnt by different deep learning architectures from different input modalities, which can give a model better insight into its task. Hence, a multimodal deep learning architecture was designed to receive both image-based input (frames extracted from video samples) and video-based input (stacked frames pre-processed from video samples), providing well-extracted spatial and temporal features for sperm parameter prediction. The results obtained with the proposed methodology surpass those of similar deep learning studies. For sperm motility, the best average mean absolute error (MAE) achieved was 8.048, and for sperm concentration the model obtained a competitive Pearson’s correlation coefficient (RP) of 0.853.
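The two evaluation metrics quoted in the abstract, mean absolute error and Pearson's correlation coefficient, can be illustrated with a minimal pure-Python sketch. This is not the authors' evaluation code: the function names `mean_absolute_error` and `pearson_r` are hypothetical, and the sample values are invented purely for illustration.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all samples."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations and the two root sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)


if __name__ == "__main__":
    # Invented values for illustration only (not data from the paper).
    true_motility = [50.0, 30.0, 20.0]
    pred_motility = [45.0, 35.0, 20.0]
    print("MAE:", mean_absolute_error(true_motility, pred_motility))

    true_concentration = [10.0, 20.0, 30.0, 40.0]
    pred_concentration = [12.0, 18.0, 33.0, 41.0]
    print("Pearson r:", pearson_r(true_concentration, pred_concentration))
```

A lower MAE means predictions lie closer to the reference values on average, while a Pearson coefficient closer to 1 means predictions rise and fall with the reference values more consistently.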

