Evaluating the Factors and Forecasting Childhood Anemia Through Machine Learning Algorithms

Authors

  • Nahid Salma ᵃSchool of Mathematical Sciences, Universiti Sains Malaysia, Penang, Malaysia ᵇDepartment of Statistics and Data Science, Jahangirnagar University, Savar, Dhaka, Bangladesh-1342
  • Majid Khan Majahar Ali School of Mathematical Sciences, Universiti Sains Malaysia, Penang, Malaysia

DOI:

https://doi.org/10.11113/mjfas.v21n1.3520

Abstract

Anemia, characterized by insufficient hemoglobin levels, is one of the most prevalent health conditions worldwide and affects a significant portion of the population in both developed and developing nations. With timely diagnosis and proper care, the risk of anemia can be reduced, potentially saving many lives. In this context, machine learning (ML) techniques can serve as valuable tools for disease diagnosis. The objective of this study was therefore to determine the most effective machine learning approach while considering the risk factors for childhood anemia in Bangladesh. Secondary data from the 2011 Bangladesh Demographic and Health Survey were analyzed, with both filter (chi-square test) and wrapper (Boruta algorithm) feature selection methods used to identify significant factors. The findings revealed that 52.11% of all Bangladeshi children suffered from varying degrees of anemia (mild: 29.72%, moderate: 21.60%, and severe: 0.8%). Nine key variables, including father's education, child age, breastfeeding status, mother's age, mother's education, toilet type, water source, and the number of children under five years old, were found to be directly linked to anemia. Seven machine learning algorithms (KNN, NB, SVM, RF, Bagging, Gradient Boosting, and XGBoost) were compared based on model evaluation metrics, including accuracy, sensitivity, specificity, precision, Cohen's Kappa, F1-score, and AUC. The results showed that Gradient Boosting outperformed the other algorithms with 87.46% accuracy, 85.31% sensitivity, 96.56% specificity, 95.35% precision, 0.5713 Kappa, 0.8990 F1-score, and 0.9099 AUC. Random Forest followed closely with 83.13% accuracy, 87.36% sensitivity, 83.01% specificity, 84.10% precision, 0.3601 Kappa, 0.8555 F1-score, and 0.8531 AUC. Support Vector Machine (SVM) showed 84.46% accuracy, 0.3501 Kappa, 0.8046 F1-score, and 0.8264 AUC, while XGBoost demonstrated 75.99% accuracy, 75.87% sensitivity, 76.23% specificity, 82.76% precision, 0.3319 Kappa, 0.7913 F1-score, and 0.7599 AUC. These findings suggest that machine learning techniques, especially Gradient Boosting, can be highly effective for predicting anemia in Bangladeshi children, assisting medical professionals in early detection and intervention. The results of this study are expected to guide policymakers and healthcare providers in improving patient care and advancing Bangladesh's progress towards achieving the Sustainable Development Goal (SDG) related to health.
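The threshold-based metrics named in the abstract (accuracy, sensitivity, specificity, precision, F1-score, and Cohen's Kappa) all derive from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' code. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the study. AUC is omitted because it requires predicted scores rather than counts.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    precision: float
    f1: float
    kappa: float


def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> BinaryMetrics:
    """Compute standard metrics from confusion-matrix counts.

    tp/fn: anemic children predicted anemic / non-anemic.
    fp/tn: non-anemic children predicted anemic / non-anemic.
    (Hypothetical helper; the labelling convention is an assumption.)
    """
    n = tp + fn + fp + tn
    if min(tp + fn, fp + tn, tp + fp) == 0:
        raise ValueError("a metric is undefined for these counts")

    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)

    # Cohen's kappa: observed agreement corrected for the agreement
    # expected by chance from the row and column marginals.
    p_pos = ((tp + fn) / n) * ((tp + fp) / n)
    p_neg = ((fp + tn) / n) * ((fn + tn) / n)
    p_chance = p_pos + p_neg
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return BinaryMetrics(accuracy, sensitivity, specificity, precision, f1, kappa)


# Illustrative counts only (not from the study).
example = binary_metrics(tp=40, fn=10, fp=5, tn=45)
```

Because Kappa corrects for chance agreement, it can stay moderate even when accuracy is high, which is one reason to report it alongside accuracy when classes are imbalanced.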

Published

21-02-2025