A Framework to Spatially Cluster Air Quality Monitoring Stations in Peninsular Malaysia using the Hybrid Clustering Method


  • Nurul Alia Azizan School of Biomedical Science, Faculty of Health Sciences, Universiti Sultan Zainal Abidin, Gong Badak Campus, 21300 Kuala Nerus, Terengganu, Malaysia
  • Ahmad Syibli Othman School of Biomedical Science, Faculty of Health Sciences, Universiti Sultan Zainal Abidin, Gong Badak Campus, 21300 Kuala Nerus, Terengganu, Malaysia
  • Asheila AK Meramat Department of Basic Medical Science, Faculty of Medicine & Health Science, Universiti Malaysia Sarawak, 94300 Kota Samarahan, Sarawak, Malaysia
  • Siti Noor Syuhada Muhammad Amin Universiti Sultan Zainal Abidin Science and Medicine Foundation Centre, Gong Badak Campus, 21300 Kuala Nerus, Terengganu, Malaysia
  • Azman Azid School of Animal Science, Aquatic Science and Environment, Faculty of Bioresources and Food Industry, Universiti Sultan Zainal Abidin, Kampus Besut, 22200 Besut, Terengganu, Malaysia




Assessing air quality trends requires the analysis of multiple variables, which makes it a multidimensional problem that calls for multivariate methods. This study set out to demonstrate the hybrid clustering method on air quality monitoring stations in Peninsular Malaysia, with the aim of obtaining an improved spatial cluster distribution that is validated by a separate technique. The Department of Environment, Malaysia (DOE) provided the data set, which covers the two-year period from 2018 to 2019. Six air pollutants were included: PM10, PM2.5, SO2, NO2, O3, and CO. Before the data were grouped, principal component analysis (PCA), a multivariate technique, was used to reduce dimensionality by condensing the information in the large data tables into a smaller number of components, making the variables easier to interpret. The PCA factor scores were then used as input to agglomerative hierarchical clustering (AHC), and the resulting clusters were validated using discriminant analysis (DA). According to the PCA factor scores, 36 of the 47 stations required further analysis using AHC. AHC grouped these 36 stations into three regions whose members share similar characteristics: the Low Polluted Region (LPR, 7 stations), the Moderate Polluted Region (MPR, 20 stations), and the High Polluted Region (HPR, 9 stations). DA yielded an 84% correct classification rate for the clusters. The framework presented here offers an improved method for identifying and categorizing stations according to their air quality characteristics. The results show that the hybrid clustering method used in this work provides a new way of describing pollutant distributions that is helpful in air pollution investigations.
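The abstract does not state how the PCA was configured. The following minimal sketch, using only the Python standard library, shows one way to obtain component scores from a station × pollutant table before clustering. Several choices here are assumptions for illustration and are not the authors' settings: PCA on the correlation matrix (z-scored variables), eigen-decomposition by the cyclic Jacobi method, and retention of components with eigenvalue > 1 (the Kaiser criterion). The input table is synthetic and is not DOE data. The returned scores are unrotated principal component scores. A study that reports "factor scores" may have applied a rotation such as varimax.

```python
import math
import random
import statistics


def standardize(rows):
    """Z-score every column (variable) using the sample standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def jacobi_eigen(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that element (p, q) becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V P
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def pca_scores(rows, min_eigenvalue=1.0):
    """Correlation-matrix PCA: sorted eigenvalues plus scores of retained components."""
    z = standardize(rows)
    n_obs, n_var = len(z), len(z[0])
    r = [[sum(z[k][i] * z[k][j] for k in range(n_obs)) / (n_obs - 1)
          for j in range(n_var)] for i in range(n_var)]
    values, vectors = jacobi_eigen(r)
    order = sorted(range(n_var), key=lambda i: values[i], reverse=True)
    # Kaiser criterion (assumed here): keep eigenvalues > 1, but at least one component.
    keep = [i for i in order if values[i] > min_eigenvalue] or order[:1]
    scores = [[sum(zr[j] * vectors[j][i] for j in range(n_var)) for i in keep] for zr in z]
    return [values[i] for i in order], scores


# Synthetic table: 12 stations x (PM10, PM2.5, SO2, NO2, O3, CO). Not DOE data.
rng = random.Random(42)
data = []
for _ in range(12):
    driver = rng.gauss(0, 1)  # hypothetical shared source, e.g. traffic
    data.append([50 + 10 * driver + rng.gauss(0, 3), 30 + 8 * driver + rng.gauss(0, 3),
                 2 + rng.gauss(0, 0.5), 20 + 5 * driver + rng.gauss(0, 2),
                 25 + rng.gauss(0, 4), 1 + 0.3 * driver + rng.gauss(0, 0.1)])

eigenvalues, scores = pca_scores(data)
```

As a quick check, the eigenvalues of a six-variable correlation matrix sum to 6. Each retained score column also has a sample variance equal to its eigenvalue.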

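The abstract states that the PCA factor scores served as input to AHC but does not name the linkage rule or the distance measure. The minimal sketch below assumes Ward's linkage on squared Euclidean distances, updated with the Lance-Williams formula. This is a common choice in chemometric air quality studies but remains an assumption here. Stations are merged bottom-up until the requested number of clusters remains. The nine 2-D points stand in for retained factor scores and are hypothetical, not study results.

```python
def ward_ahc(points, n_clusters):
    """Agglomerative clustering with Ward linkage; returns one label per point."""
    clusters = {i: [i] for i in range(len(points))}
    # Initial dissimilarities: squared Euclidean distances between points.
    dist = {(a, b): sum((x - y) ** 2 for x, y in zip(points[a], points[b]))
            for a in clusters for b in clusters if a < b}
    next_id = len(points)
    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest Ward dissimilarity.
        (i, j), d_ij = min(dist.items(), key=lambda item: item[1])
        n_i, n_j = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) + clusters.pop(j)
        # Lance-Williams update for Ward linkage.
        updated = {}
        for k, members in clusters.items():
            n_k = len(members)
            d_ki = dist[(min(k, i), max(k, i))]
            d_kj = dist[(min(k, j), max(k, j))]
            updated[(k, next_id)] = ((n_i + n_k) * d_ki + (n_j + n_k) * d_kj
                                     - n_k * d_ij) / (n_i + n_j + n_k)
        dist = {pair: d for pair, d in dist.items() if i not in pair and j not in pair}
        dist.update(updated)
        clusters[next_id] = merged
        next_id += 1
    # Number the clusters by their smallest member index for stable labels.
    labels = [0] * len(points)
    for label, members in enumerate(sorted(clusters.values(), key=min)):
        for m in members:
            labels[m] = label
    return labels


# Hypothetical 2-D factor scores for nine stations (not study results).
scores = [(-2.0, -1.8), (-2.2, -2.1), (-1.9, -2.0),
          (0.1, 0.0), (0.0, 0.2), (-0.1, -0.1),
          (2.1, 1.9), (1.8, 2.2), (2.0, 2.0)]
labels = ward_ahc(scores, n_clusters=3)
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

This naive version scans every pair at each merge, which is adequate for a few dozen stations. Cutting at three clusters mirrors the three regions (LPR, MPR, HPR) reported in the abstract. Mapping cluster numbers to pollution levels would still require inspecting each cluster's pollutant concentrations.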

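The abstract reports a correct classification rate from DA as the cluster validation step. One way to compute such a rate is sketched below. It assumes linear discriminant classification with a pooled within-group covariance matrix and equal priors, which reduces to assigning each station to the group with the smallest Mahalanobis distance. It also assumes evaluation on the same stations that were clustered (resubstitution). These settings are assumptions, and the study may have used different DA options such as cross-validation. The points and labels are hypothetical.

```python
import statistics


def pooled_covariance(points, labels):
    """Group means and pooled within-group covariance matrix."""
    groups = sorted(set(labels))
    p = len(points[0])
    means = {g: [statistics.fmean([pt[d] for pt, lab in zip(points, labels) if lab == g])
                 for d in range(p)] for g in groups}
    s = [[0.0] * p for _ in range(p)]
    for pt, lab in zip(points, labels):
        dev = [x - m for x, m in zip(pt, means[lab])]
        for i in range(p):
            for j in range(p):
                s[i][j] += dev[i] * dev[j]
    dof = len(points) - len(groups)  # n minus number of groups
    return means, [[v / dof for v in row] for row in s]


def invert(m):
    """Gauss-Jordan inverse with partial pivoting (small, well-conditioned matrices)."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def correct_classification_rate(points, labels):
    """Percent of points whose discriminant-predicted group matches the cluster label."""
    means, cov = pooled_covariance(points, labels)
    inv = invert(cov)

    def mahalanobis_sq(x, mu):
        d = [a - b for a, b in zip(x, mu)]
        return sum(d[i] * inv[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))

    # Equal priors assumed: predict the group with the smallest Mahalanobis distance.
    predicted = [min(means, key=lambda g: mahalanobis_sq(x, means[g])) for x in points]
    hits = sum(pred == lab for pred, lab in zip(predicted, labels))
    return 100.0 * hits / len(points), predicted


# Hypothetical factor scores and cluster labels (e.g. from an AHC step).
scores = [(-2.0, -1.8), (-2.2, -2.1), (-1.9, -2.0),
          (0.1, 0.0), (0.0, 0.2), (-0.1, -0.1),
          (2.1, 1.9), (1.8, 2.2), (2.0, 2.0)]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
rate, predicted = correct_classification_rate(scores, labels)
print(rate)  # 100.0
```

With well-separated toy clusters, the resubstitution rate is 100%. Real station data with overlapping clusters would give a lower rate, and a rate below 100% flags stations whose cluster membership is ambiguous.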