Discovery of Interpretable Patterns of Breast Cancer Diagnosis via Class Association Rule Mining (CARM) With SHAP-Based Explainable AI (XAI)
DOI: https://doi.org/10.11113/mjfas.v21n3.3792
Keywords: Breast cancer, class association rule, pattern discovery, SHAP, XAI model.
Abstract
Breast cancer remains the most common cancer among women globally, highlighting the importance of early and reliable diagnostic methods. While previous studies have applied association rule mining (ARM) to explore factors contributing to breast cancer, many lacked robust validation of the extracted rules. To address this gap and deepen our understanding of the key biological markers linked to the disease, this study proposes a hybrid framework that integrates Class Association Rule Mining (CARM) with SHapley Additive exPlanations (SHAP) values computed from Random Forest (RF) and Gradient Boosting (GB) models to uncover and validate meaningful diagnostic patterns. Using the Breast Cancer Coimbra (BCC) dataset, comprising 116 patient samples and nine biological markers, a total of 723,938 association rules (AR) were generated, from which 17,720 significant class association rules (CAR) were extracted. These rules were pruned using lift, leverage and conviction to retain the most relevant ones. In the healthy group, combinations involving low glucose, low insulin, low resistin and low Homeostatic Model Assessment (HOMA) were consistently observed, while high BMI appeared particularly among younger individuals. These features were associated with negative SHAP values, validating their contribution to healthy classifications. In contrast, in the patient group, patterns such as high glucose, medium resistin and medium Monocyte Chemoattractant Protein-1 (MCP.1) were common among middle-aged individuals. These features consistently showed strong positive SHAP values across both classifiers, highlighting their influence in predicting patient classification. By combining CARM rule extraction with SHAP-based feature contributions, this study provides a validated and interpretable approach to breast cancer diagnosis.
The findings highlight the importance of feature interactions and offer promising directions for personalized risk assessment and early detection.
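The abstract states that the extracted CARs were pruned using lift, leverage and conviction. The following is a minimal sketch, assuming markers already discretized into items such as `glucose=low` and a binary healthy/patient label, of how these measures can be computed for a rule X → class and used as filters. The toy records, the thresholds (`min_support`, `min_conf`) and the helper names `measures` and `mine_cars` are hypothetical and are not taken from the study. The exhaustive enumeration of short antecedents is a simple stand-in, not the Apriori or other mining algorithm the authors may have used, and the cut-offs lift > 1, leverage > 0 and conviction > 1 are a common "positive dependence" choice rather than the paper's settings.

```python
from itertools import combinations
import math

# Hypothetical toy records (NOT the BCC data): each record is a set of
# discretized marker items plus a class label.
RECORDS = [
    ({"glucose=low", "insulin=low", "resistin=low"}, "healthy"),
    ({"glucose=low", "insulin=low", "bmi=high"}, "healthy"),
    ({"glucose=low", "resistin=low", "homa=low"}, "healthy"),
    ({"glucose=high", "resistin=medium", "mcp1=medium"}, "patient"),
    ({"glucose=high", "resistin=medium", "insulin=low"}, "patient"),
    ({"glucose=high", "mcp1=medium", "homa=low"}, "patient"),
]


def measures(antecedent, label, records):
    """Interestingness measures of the class association rule antecedent -> label."""
    n = len(records)
    n_x = sum(1 for items, _ in records if antecedent <= items)
    n_y = sum(1 for _, y in records if y == label)
    n_xy = sum(1 for items, y in records if antecedent <= items and y == label)
    supp_x, supp_y, supp_xy = n_x / n, n_y / n, n_xy / n
    conf = n_xy / n_x if n_x else 0.0
    lift = conf / supp_y if supp_y else 0.0
    leverage = supp_xy - supp_x * supp_y
    # Conviction is infinite when a non-empty antecedent never fails (confidence 1).
    conviction = math.inf if n_x and n_xy == n_x else (1 - supp_y) / (1 - conf)
    return {"support": supp_xy, "confidence": conf, "lift": lift,
            "leverage": leverage, "conviction": conviction}


def mine_cars(records, max_len=2, min_support=0.3, min_conf=0.8):
    """Enumerate antecedents up to max_len items and keep CARs passing all filters."""
    items = sorted(set().union(*(items for items, _ in records)))
    labels = sorted({y for _, y in records})
    rules = []
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            antecedent = frozenset(combo)
            for label in labels:
                m = measures(antecedent, label, records)
                # Support/confidence thresholds, then positive-dependence pruning.
                if (m["support"] >= min_support and m["confidence"] >= min_conf
                        and m["lift"] > 1.0 and m["leverage"] > 0.0
                        and m["conviction"] > 1.0):
                    rules.append((antecedent, label, m))
    return rules


if __name__ == "__main__":
    for antecedent, label, m in mine_cars(RECORDS):
        print(sorted(antecedent), "->", label,
              {k: round(v, 3) for k, v in m.items()})
```

On these toy records, a rule such as `insulin=low -> healthy` is dropped by the confidence filter because the item also occurs in a patient record. Note that rules with confidence 1 always have infinite conviction, so conviction alone cannot rank them; lift or leverage is needed for that.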
Copyright (c) 2025 Shahiratul Amalina Abd Karim, Ummul Hanan Mohamad, Puteri N. E. Nohuddin

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.