Predicting the performance of the players in NBA Players by divided regression analysis

Yann Ling Goh, Yeh Huann Goh, Ling Leh Bin Raymond, Weng Hoong Chee


A divided regression model is built to predict the performance of players in the National Basketball Association (NBA) from 1997 to 2017. The whole data set is divided into five sub data sets, and a multiple linear regression model is fitted to each sub data set. The relationships among the independent variables are checked with the variance inflation factor (VIF) to assess the risk of multicollinearity in the data. Non-linearity of the regression model, non-constancy of the error variance and non-normality of the error terms are investigated using residual plots and quantile-quantile plots. Finally, the divided regression model is built by combining the results obtained from the sub data sets, and its performance is verified.
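The workflow described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the data are synthetic rather than NBA statistics, the observations are assigned to the five groups at random, and the sub-model results are combined by a simple average of the coefficients. The function names `fit_ols`, `divided_regression` and `vif` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) for y regressed on X."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def divided_regression(X, y, n_groups=5, seed=0):
    """Fit OLS on each of n_groups random subsets and average the coefficients.

    Random splitting and simple averaging are assumptions of this sketch.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    betas = np.array([fit_ols(X[idx], y[idx])
                      for idx in np.array_split(order, n_groups)])
    return betas.mean(axis=0), betas

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on the remaining columns; equivalently SS_total / SS_residual."""
    values = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        beta = fit_ols(others, X[:, j])
        fitted = np.column_stack([np.ones(len(X)), others]) @ beta
        ss_res = np.sum((X[:, j] - fitted) ** 2)
        ss_tot = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        values.append(ss_tot / ss_res)
    return np.array(values)

# Synthetic stand-in data: 1000 observations, 3 independent predictors.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
true_beta = np.array([2.0, 1.5, -0.7, 0.3])  # intercept, then slopes
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.5, size=1000)

combined, per_group = divided_regression(X, y)
full = fit_ols(X, y)          # single model on the whole data set
vif_values = vif(X)

print("combined coefficients:", combined)
print("full-data coefficients:", full)
print("VIF per predictor:", vif_values)
```

Comparing `combined` with `full` is one simple way to check the divided model against a single regression on the whole data set. VIF values close to 1 suggest little multicollinearity, while large values point to predictors that are nearly linear combinations of the others.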


Divided Regression; Multiple Linear Regression; Variance Inflation Factor






Copyright (c) 2019 Yann Ling Goh, Yeh Huann Goh, Weng Hoong Chee

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Copyright © 2005-2019 Penerbit UTM Press, Universiti Teknologi Malaysia.