Predicting the performance of the players in NBA Players by divided regression analysis


  • Yann Ling Goh Universiti Tunku Abdul Rahman
  • Yeh Huann Goh Kolej Universiti Tunku Abdul Rahman
  • Ling Leh Bin Raymond Universiti Tunku Abdul Rahman
  • Weng Hoong Chee Universiti Tunku Abdul Rahman



Keywords: Divided Regression, Multiple Linear Regression, Variance Inflation Factor


A divided regression model is built to predict the performance of players in the National Basketball Association (NBA) from 1997 to 2017. The full data set is divided into five sub data sets, and a multiple linear regression model is fitted to each of them. The relationships among the independent variables are checked with the variance inflation factor (VIF) to assess the risk of multicollinearity in the data. Non-linearity of the regression model, non-constancy of the error variance and non-normality of the error terms are investigated using residual plots and quantile-quantile plots. Finally, the divided regression model is built by combining the results obtained from the sub data sets, and its performance is verified.
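The workflow described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not state: the data are split into five groups at random, the sub-model results are combined by averaging their coefficients, and synthetic data stand in for the NBA statistics. The function names `fit_ols`, `divided_regression` and `vif` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients (intercept first) for y regressed on X."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def divided_regression(X, y, n_groups=5, seed=0):
    """Fit one linear model per sub data set and combine the results.

    Assumption: rows are assigned to groups at random and the combined
    model is the plain average of the per-group coefficient vectors.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    groups = np.array_split(order, n_groups)
    sub_coefs = np.array([fit_ols(X[idx], y[idx]) for idx in groups])
    return sub_coefs.mean(axis=0), sub_coefs


def vif(X):
    """Variance inflation factor of each column: 1 / (1 - R_j^2)."""
    values = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        coef = fit_ols(others, target)
        fitted = np.column_stack([np.ones(len(others)), others]) @ coef
        ss_res = np.sum((target - fitted) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        # 1 / (1 - R^2) simplifies to ss_tot / ss_res
        values.append(ss_tot / ss_res)
    return np.array(values)


# Synthetic stand-in data (not NBA statistics): 1000 rows, 3 predictors.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
true_coef = np.array([2.0, 1.0, -0.5, 0.3])  # intercept, then slopes
y = true_coef[0] + X @ true_coef[1:] + rng.normal(scale=0.1, size=1000)

combined, per_group = divided_regression(X, y)
print("combined coefficients:", combined)
print("VIF per predictor:", vif(X))
```

A VIF close to 1 indicates that a predictor is nearly uncorrelated with the others, while large values point to multicollinearity. The averaging step is only one possible way of combining the sub-models; weighted combinations would fit the same outline.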

Author Biographies

Yann Ling Goh, Universiti Tunku Abdul Rahman

Department of Mathematical and Actuarial Sciences

Yeh Huann Goh, Kolej Universiti Tunku Abdul Rahman

Department of Mechanical Engineering

Ling Leh Bin Raymond, Universiti Tunku Abdul Rahman

Department of Accountancy

Weng Hoong Chee, Universiti Tunku Abdul Rahman

Department of Mathematical and Actuarial Sciences

