Improvement of estimation based on a small number of events per variable (EPV) using a bootstrap logistic regression model


  • Muhamad Safiih Lola, Universiti Malaysia Terengganu
  • Nurul Hila Zainuddin, Universiti Malaysia Terengganu
  • Mohd Noor Afiq Ramlee, Universiti Malaysia Terengganu
  • Muhamad Na’eim Abdul Rahman, Universiti Malaysia Terengganu
  • Mohd Tajuddin Abdullah, Universiti Malaysia Terengganu



Endemic dengue, Logistic regression, Bootstrap approach, Monte Carlo simulation


In this research, a bootstrap approach is proposed, namely the Bootstrap Logistic Regression Model (BLRM), which is designed specifically to address the problem of a small number of events per variable (EPV). A simulation study is conducted using sample data from a case study of endemic dengue at several localities in Kelantan, Malaysia. We generated 5, 10, 20 and 25 mean samples with 500 replications (sampling with replacement) and 1500 bootstrap replications for each small EPV value (EPV = 2, 3, 4 and 5), set according to the basic reproduction number, R0, of endemic dengue. The results for the proposed BLRM showed that the frequency distribution of the estimated regression coefficients became less peaked and had thinner tails; that the average percent relative bias decreased consistently, with estimates close to the true parameters; and that the mean squared error (MSE) and root mean squared error (RMSE) of the estimated regression coefficients were smaller than those of the original model.
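The abstract outlines the simulation design only briefly. As a minimal sketch under stated assumptions, the Python/NumPy code below illustrates the quantities it refers to: EPV (taken here, as is common in the EPV literature, as the count of the less frequent outcome divided by the number of predictors), a logistic regression fitted by Newton-Raphson, case-resampling bootstrap estimates, and the percent relative bias, MSE and RMSE of the estimated coefficients over Monte Carlo replications. The data generator, the true coefficients, the sample sizes, the small ridge penalty and every function name (`fit_logistic`, `bootstrap_estimates`, `monte_carlo` and so on) are hypothetical. The abstract does not state how BLRM combines the bootstrap estimates, so the sketch reports both the plain bootstrap mean and the standard bootstrap bias-corrected estimate 2*beta_hat - mean(beta*) rather than claiming either is the authors' estimator.

```python
import numpy as np


def _penalized_loglik(X, y, beta, ridge):
    """Bernoulli log-likelihood minus a small ridge penalty (stable via logaddexp)."""
    eta = X @ beta
    return np.sum(y * eta - np.logaddexp(0.0, eta)) - 0.5 * ridge * beta @ beta


def fit_logistic(X, y, max_iter=50, tol=1e-8, ridge=1e-6):
    """Logistic regression by Newton-Raphson with step halving.

    X must already contain an intercept column. The tiny ridge penalty is an
    assumption that keeps estimates finite when a small-EPV sample or bootstrap
    resample is separated; with ridge=0 it reduces to ordinary maximum likelihood.
    """
    beta = np.zeros(X.shape[1])
    ll = _penalized_loglik(X, y, beta, ridge)
    for _ in range(max_iter):
        prob = 0.5 * (1.0 + np.tanh(0.5 * (X @ beta)))  # overflow-free sigmoid
        grad = X.T @ (y - prob) - ridge * beta
        hess = X.T @ (X * (prob * (1.0 - prob))[:, None]) + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        t = 1.0
        while _penalized_loglik(X, y, beta + t * step, ridge) < ll and t > 1e-10:
            t *= 0.5  # halve the step until the penalized likelihood stops dropping
        beta = beta + t * step
        ll = _penalized_loglik(X, y, beta, ridge)
        if np.max(np.abs(t * step)) < tol:
            break
    return beta


def events_per_variable(y, n_predictors):
    """EPV: count of the less frequent outcome divided by the number of predictors."""
    return int(min(y.sum(), len(y) - y.sum())) / n_predictors


def simulate(n, beta_true, rng):
    """Hypothetical generator: intercept plus standard-normal covariates."""
    X = np.column_stack([np.ones(n), rng.standard_normal((n, len(beta_true) - 1))])
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_true))))
    return X, y


def bootstrap_estimates(X, y, n_boot, rng):
    """Refit the model on n_boot case resamples drawn with replacement."""
    n = len(y)
    out = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        out[b] = fit_logistic(X[idx], y[idx])
    return out


def percent_relative_bias(est, beta_true):
    return 100.0 * (est.mean(axis=0) - beta_true) / beta_true


def mse(est, beta_true):
    return ((est - beta_true) ** 2).mean(axis=0)


def rmse(est, beta_true):
    return np.sqrt(mse(est, beta_true))


def monte_carlo(n, beta_true, n_rep, n_boot, seed=0):
    """Percent relative bias and RMSE of three estimators over n_rep simulated samples."""
    rng = np.random.default_rng(seed)
    fits = {"MLE": [], "bootstrap mean": [], "bias-corrected": []}
    for _ in range(n_rep):
        X, y = simulate(n, beta_true, rng)
        beta_hat = fit_logistic(X, y)
        boot_mean = bootstrap_estimates(X, y, n_boot, rng).mean(axis=0)
        fits["MLE"].append(beta_hat)
        fits["bootstrap mean"].append(boot_mean)
        fits["bias-corrected"].append(2.0 * beta_hat - boot_mean)  # bootstrap bias correction
    return {name: (percent_relative_bias(np.array(e), beta_true), rmse(np.array(e), beta_true))
            for name, e in fits.items()}


if __name__ == "__main__":
    beta_true = np.array([-1.0, 0.8, -0.6])  # hypothetical true coefficients
    X, y = simulate(30, beta_true, np.random.default_rng(42))
    print("EPV of one simulated sample:", events_per_variable(y, n_predictors=2))
    for name, (prb, err) in monte_carlo(30, beta_true, n_rep=50, n_boot=100).items():
        print(f"{name:15s} %RB={np.round(prb, 1)} RMSE={np.round(err, 3)}")
```

With n = 30 and two predictors each simulated sample has a small EPV, so some samples and bootstrap resamples may be (quasi-)separated; individual estimates can then be large, and the tiny ridge term only keeps them finite. Larger `n_rep` and `n_boot` (the abstract uses 1500 bootstrap replications) give more stable summaries at a higher run time.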