Discovery of Interpretable Patterns of Breast Cancer Diagnosis via Class Association Rule Mining (CARM) With SHAP-Based Explainable AI (XAI)

Shahiratul Amalina Abd Karim; Ummul Hanan Mohamad; Puteri Nor Ellyza Nohuddin

doi:10.11113/mjfas.v21n3.3792

Authors

Shahiratul Amalina Abd Karim, Institute of Visual Informatics, Bangunan Akademia Siber Teknopolis, Universiti Kebangsaan Malaysia, 43600 Bangi, Malaysia
Ummul Hanan Mohamad, Institute of Visual Informatics, Bangunan Akademia Siber Teknopolis, Universiti Kebangsaan Malaysia, 43600 Bangi, Malaysia; iAI-UKM Research Group, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia
Puteri Nor Ellyza Nohuddin, Institute of Visual Informatics, Bangunan Akademia Siber Teknopolis, Universiti Kebangsaan Malaysia, 43600 Bangi, Malaysia; iAI-UKM Research Group, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia

DOI:

https://doi.org/10.11113/mjfas.v21n3.3792

Keywords:

Breast cancer, class association rule, pattern discovery, SHAP, XAI model.

Abstract

Breast cancer remains the most common cancer among women globally, highlighting the importance of early and reliable diagnostic methods. While previous studies have applied association rule mining (ARM) to explore factors contributing to breast cancer, many lacked robust validation of the extracted rules. To address this gap and deepen understanding of the key biological markers linked to the disease, this study proposes a hybrid framework that integrates Class Association Rule Mining (CARM) with SHapley Additive exPlanations (SHAP) values based on Random Forest (RF) and Gradient Boosting (GB) models to uncover and validate meaningful diagnostic patterns. Using the Breast Cancer Coimbra (BCC) dataset, comprising 116 patient samples and nine biological markers, a total of 723,938 association rules (AR) were generated, from which 17,720 significant class association rules (CAR) were extracted. These rules were pruned using lift, leverage and conviction to retain the most relevant ones. Among the healthy group, combinations involving low glucose, low insulin, low resistin and low Homeostatic Model Assessment (HOMA) were consistently observed, while high BMI appeared particularly among younger individuals. These features were associated with negative SHAP values, validating their contribution to healthy classifications. In contrast, patterns such as high glucose, medium resistin and medium Monocyte Chemoattractant Protein-1 (MCP.1) among middle-aged individuals were common in the patient group. These features consistently showed strong positive SHAP values across both classifiers, highlighting their influence in predicting patient outcomes. By combining CARM rule extraction with SHAP feature contributions, this study provides a validated and interpretable approach for breast cancer diagnosis. The findings highlight the importance of feature interactions and offer promising directions for personalized risk assessment and early detection.
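
The three pruning measures named in the abstract (lift, leverage and conviction) have standard textbook definitions, and a small worked example may help show how they score a class association rule. The sketch below is illustrative only: the eight toy records, the item labels such as `glucose=low`, the helper names `support`, `rule_metrics` and `is_interesting`, and the pruning thresholds are all assumptions made for this example. They are not the study's data, code or cut-offs.

```python
from math import inf
from typing import Dict, FrozenSet, List

Record = FrozenSet[str]

# Hypothetical toy records (NOT the BCC dataset): each record holds
# discretized marker items plus one class item.
RECORDS: List[Record] = [
    frozenset({"glucose=low", "insulin=low", "class=healthy"}),
    frozenset({"glucose=low", "insulin=low", "class=healthy"}),
    frozenset({"glucose=low", "insulin=high", "class=healthy"}),
    frozenset({"glucose=high", "insulin=low", "class=healthy"}),
    frozenset({"glucose=high", "insulin=high", "class=patient"}),
    frozenset({"glucose=high", "insulin=high", "class=patient"}),
    frozenset({"glucose=high", "insulin=low", "class=patient"}),
    frozenset({"glucose=low", "insulin=low", "class=patient"}),
]


def support(records: List[Record], items: FrozenSet[str]) -> float:
    """Fraction of records that contain every item in `items`."""
    return sum(1 for r in records if items <= r) / len(records)


def rule_metrics(
    records: List[Record], antecedent: FrozenSet[str], class_item: str
) -> Dict[str, float]:
    """Standard interest measures for the rule antecedent -> class_item."""
    consequent = frozenset({class_item})
    supp_a = support(records, antecedent)
    supp_c = support(records, consequent)
    supp_ac = support(records, antecedent | consequent)
    confidence = supp_ac / supp_a if supp_a > 0 else 0.0
    # lift > 1: antecedent and class co-occur more often than if independent
    lift = confidence / supp_c if supp_c > 0 else 0.0
    # leverage > 0: observed joint support exceeds the independence baseline
    leverage = supp_ac - supp_a * supp_c
    # conviction > 1: the rule is wrong less often than chance would predict
    conviction = inf if confidence == 1.0 else (1.0 - supp_c) / (1.0 - confidence)
    return {
        "support": supp_ac,
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


def is_interesting(m: Dict[str, float]) -> bool:
    """Assumed pruning thresholds: keep only positively associated rules."""
    return m["lift"] > 1.0 and m["leverage"] > 0.0 and m["conviction"] > 1.0


healthy_rule = rule_metrics(
    RECORDS, frozenset({"glucose=low", "insulin=low"}), "class=healthy"
)
patient_rule = rule_metrics(RECORDS, frozenset({"glucose=high"}), "class=patient")
weak_rule = rule_metrics(RECORDS, frozenset({"insulin=low"}), "class=patient")

for name, m in [("healthy", healthy_rule), ("patient", patient_rule), ("weak", weak_rule)]:
    print(name, {k: round(v, 4) for k, v in m.items()}, "keep:", is_interesting(m))
```

On these toy records the rule {glucose=low, insulin=low} -> healthy scores a lift of about 1.33 and is kept, while {insulin=low} -> patient has a lift below 1 and is pruned. Requiring all three measures to exceed their independence baselines is one simple choice of filter; a real analysis would normally set stricter, data-driven thresholds.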

